Blog

  • Framework Choice Is Now an AI Tooling Decision

    If you’re a CTO or engineering lead making framework decisions in 2025, you’re probably evaluating based on familiar criteria: performance, ecosystem maturity, team expertise, hiring pool. Those still matter. But AI tooling support is becoming a critical new variable in framework selection. How well your framework plays with AI coding tools matters more than most teams realize.

    I’ve spent six months running production systems where AI tools are part of the daily workflow. Not toy projects, not experiments – actual infrastructure work across Laravel, Django, Flask, and Python microservices. What I’ve learned is that the quality of the LLM matters less than the quality of the context it’s working with.

    And right now, language ecosystems are diverging fast on context quality.

    The Real Problem With AI Coding Tools

    Context poisoning is real. If you’ve used Claude Code, GitHub Copilot, or any AI coding assistant for more than a few hours, you’ve hit it. The AI gets bad information stuck in its context window and keeps making the same mistake. Over and over.

    I restart Claude Code sessions regularly not because I want to, but because the context from one problem bleeds into the next. Work on authentication, then switch to a background job system, and the AI is still trying to apply auth patterns where they don’t belong. The longer the session, the worse it gets.

    Sometimes it falls into a loop. Fix a bug, cause a crash, revert the fix, crash again, revert, crash. It’ll run this cycle until you kill it. I’ve watched it happen enough times to recognize the pattern early.

    Once, in an ephemeral environment where I had --allow-insecure and --dangerously-skip-permissions enabled (never do this outside throwaway containers), Claude Code tried to delete my entire codebase. Another time it attempted commands that would have bricked my laptop if I hadn’t been running in a sandbox. These aren’t bugs in Claude – they’re the natural result of an AI operating with poisoned context.

    The solution isn’t just better LLMs. It’s better context.

    Why AI Tooling Quality Matters in Framework Selection

    Generic AI assistants know programming languages. They don’t know your framework. They’ve seen millions of lines of code, but they don’t understand the conventions, the magic methods, the implicit behavior that makes frameworks productive. They definitely don’t understand business context, or the quirky way your team named everything in its internal libraries.

    Laravel Boost is the first mature attempt I’ve seen at solving this. It’s not just documentation lookup. It gives the LLM a way to work WITH the framework, not just read about it. When Claude Code needs Laravel-specific knowledge, it reaches out to Boost and gets answers in the context of your actual application.

    I realized how powerful this was when building a control plane for running Python scripts. First iteration used Django. Second iteration moved to Laravel. The difference was night and day when the LLM needed to understand something. With Django, every framework question required me to intervene or point it at docs. With Laravel and Boost, it could figure things out.

    I tried Flask later for a simpler version. Complexity dropped, but Laravel still won. Not because Flask is bad – it’s excellent. But because Laravel had the tooling ecosystem.

    The Tinker Factor

    Here’s what really separates frameworks in the AI era: can your AI safely experiment with your backend?

    Laravel Tinker is a REPL for your application. It’s not a general PHP shell – it’s your actual app, with all your services, all your dependencies, all your state. An AI can use Tinker to validate assumptions, check actual data, and verify that pipelines work end-to-end.

    I was working on a feature where data needed to flow through multiple systems: UI form validation, queue processing, database storage, Redis caching with TTL. In traditional development, you’d write the code, deploy it, test it manually, find what broke, fix it, repeat.

    With Tinker in the loop, the testing agent could close that cycle automatically. Create a test record. Check it’s in the database. Verify the cache key exists and expires correctly. Confirm the queue processed the job. If any step failed, it could investigate and fix it. The first implementation attempt was dramatically more successful because the AI could actually see what was happening.

    This matters most when your stack gets complex. App plus MySQL? You’re probably fine without this. App plus MySQL plus Redis plus Memcached plus mail queue plus SQS plus external APIs plus Python microservices? Now you need visibility. The AI needs to know what state your system is actually in, not what it thinks the state should be.

    Python has something similar – Django shell, Flask shell, the improved Python 3.13+ REPL. But there’s no Django Boost. There’s no Flask Boost. You get generic AI assistants that work across all languages, which means they’re not particularly good at any specific framework.
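
    To make the comparison concrete, here’s roughly what that Tinker-style verification loop looks like from a Django shell. This is a minimal sketch with a hypothetical Order model, status field, and cache key – the REPL part is easy; what’s missing in the Python world is a Boost-level tool that lets the AI drive it with framework context.

    # Run inside `python manage.py shell`. The app, model, fields, and cache key
    # below are hypothetical – substitute your own.
    from django.core.cache import cache
    from orders.models import Order

    order = Order.objects.create(email="test@example.com", total=42)  # create a test record
    assert Order.objects.filter(pk=order.pk).exists()                 # it landed in the database
    assert cache.get(f"order:{order.pk}") is not None                 # the cache entry was written
    order.refresh_from_db()
    assert order.status == "processed"                                # the queued job ran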

    Node.js is even further behind. NestJS added an AI module in 2024, but that’s for building AI into your apps, not for AI-assisted development. There’s no Express equivalent to Tinker, no framework-aware AI context tool.

    Laravel has both. That combination is unique right now. When evaluating frameworks for their AI tooling support, this level of integration should be your benchmark.

    What AI Tooling Means For Your Stack

    I’m not telling you to rewrite your Scala services in Laravel. That would be insane. But I am telling you to pay attention to this pattern.

    We’re in the early curve of language-specific AI tooling. Laravel Boost launched recently. Python and Node.js ecosystems will catch up eventually. But “eventually” might be six months or two years, and in the meantime, teams using frameworks with mature AI tooling will ship faster.

    If you’re starting a new project or evaluating a framework migration, the AI tooling ecosystem should be on your decision matrix. Not the top priority – you still need to match the framework to your problem – but on the list.

    If you want 10x developers, you need to give them actual tooling and connectivity, not just access to better LLMs. Context matters. Building clean, usable context is hard. Frameworks that solve this problem will have an advantage.

    Watch For This Pattern

    Here’s what I’m watching for in other ecosystems:

    1. Framework-aware AI assistants (not generic language assistants)
    2. Safe REPL experimentation environments that understand framework conventions
    3. Ability for AI to verify state across complex service architectures
    4. Community investment in building these integrations

    Ecosystems that deliver these capabilities will differentiate themselves quickly.

    When Python gets a Django-specific AI assistant with Django shell integration at the quality level of Laravel Boost, that’s a signal. When Node.js gets there, that’s another signal. When your language of choice gets there, pay attention.

    This isn’t about Laravel winning. It’s about ecosystems that invest in AI-native developer experience pulling ahead of ecosystems that don’t.

    Early adopters in frameworks with strong AI tooling will have a productivity edge. How big? Hard to quantify yet. But I’ve seen the difference between working with and without it, and it’s significant enough to influence architecture decisions.

    The Bottom Line

    Framework selection has always been about tradeoffs. Performance vs. developer experience. Ecosystem maturity vs. innovation. Type safety vs. flexibility.

    Add a new dimension: AI tooling integration.

    It’s early. The tooling will improve across all ecosystems eventually. But if you’re making framework decisions now, or evaluating whether your current stack is positioned well for AI-augmented development, this is worth thinking about.

    Don’t rewrite everything. But when you’re choosing a framework for a new service, or considering a migration, or planning your 2025-2026 architecture evolution, factor this in.

    The frameworks that win the AI tooling race will be the ones that make it easiest for developers and AI to work together. Right now, that’s Laravel. Tomorrow, it might be something else. The pattern is what matters.


    Making framework and architecture decisions with AI tooling in mind?

    I help engineering teams evaluate stacks, design infrastructure, and make strategic technology choices. If you’re thinking through how AI coding tools fit into your architecture plans, or you’re wondering whether your current framework choices position you well for AI-augmented development, let’s talk.

    Get in touch →

  • MCP Servers in My Homelab: What I Actually Learned

    My four-year-old learned to use a shape sorter yesterday. Watching her figure it out reminded me of my experience with MCP servers over the last six months. Not because the toy was sophisticated, but because she had the right context. She already understood that objects have properties, that matching matters, and that trying different approaches leads to success. The toy just gave her the right signals at the right time.

    For the past six months, I’ve been running MCP (Model Context Protocol) servers in my homelab. Not as demos or proof-of-concepts, but as actual tools integrated into my daily workflow. Cloudflare deployments, WordPress content management, GitHub operations, Kubernetes cluster management. Some have transformed how I work. Others burned more time debugging than they saved.

    Building context for LLMs is the same problem my daughter faced, just exponentially more complicated. LLMs are missing the additional signals like sizes, shapes, and colors. MCP servers aim to close that gap. Here’s what I’ve learned.

    Why MCP Servers Exist

    LLMs are incredibly capable and frustratingly limited. They can write brilliant code but can’t see your actual codebase. They understand Kubernetes architecture but don’t know your cluster’s current state. Similarly, they can debug WordPress issues but can’t access your actual site. MCP servers solve this by giving LLMs structured access to external systems. Instead of copy-pasting error logs or describing your infrastructure, you connect the AI directly to the source. The promise is beautiful: AI agents that can actually take action, not just give advice.

    However, there’s a cost. Just like my daughter would have been overwhelmed if I’d dumped all 12 shapes in front of her at once, LLMs have limits on how much information they can actually process. Every LLM has a context window, a hard limit on how much information it can consider at once. For Claude, that’s 200k tokens. Sounds like a lot until you start feeding it MCP responses. A single Kubernetes MCP query for pod status across a namespace? 3,000 tokens. Your entire WordPress site’s page structure? 15,000 tokens. GitHub repository file tree with recent commits? 8,000 tokens. As a result, stringing together a few MCP calls in one conversation eats through that context window fast.

    Worse, there’s context poisoning. When you give an LLM too much information, it starts losing track of what matters. Flooding an LLM with marginally relevant MCP responses degrades its ability to solve the actual problem. The reality is messier than the demos suggest.
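
    To put rough numbers on it, here’s the back-of-the-envelope math with the per-call figures above (estimates, not measurements) – and remember that tool definitions, your code, and the rest of the conversation share the same window:

    # Rough context-budget math using the per-call estimates above
    CONTEXT_WINDOW = 200_000  # tokens the model can consider at once

    mcp_responses = {
        "Kubernetes pod status (one namespace)": 3_000,
        "WordPress page structure": 15_000,
        "GitHub file tree + recent commits": 8_000,
    }

    used = 0
    for call, tokens in mcp_responses.items():
        used += tokens  # each response stays in context for every later turn
        print(f"{call}: {tokens:,} tokens (cumulative {used:,}, {used / CONTEXT_WINDOW:.0%} of the window)")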

    Three Hard Truths About MCP Servers

    MCP Server Compatibility Is a Nightmare

    Here’s something nobody tells you: just because an MCP server works perfectly in Claude Desktop doesn’t mean it works in Cursor, Claude Code, or whatever oddly named tool ships next month.

    I spent three hours one night debugging the GitHub MCP server. It worked flawlessly in Claude Desktop. Beautiful integration, clean responses, perfect tool execution. Then I tried to use it programmatically via a Python script for an automation workflow. Complete failure. Different authentication flow, different response structures, different error handling. The time I spent getting it to work exceeded the time it would have taken to just use the GitHub API directly.

    That’s the pattern I kept hitting. The demo works. But the production integration breaks in subtle ways.
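
    For scale, here’s what “just use the GitHub API directly” amounts to for that kind of automation step – a minimal sketch where OWNER/REPO and the GITHUB_TOKEN environment variable are placeholders:

    import os
    import requests

    # Fetch the five most recent commits straight from the GitHub REST API
    resp = requests.get(
        "https://api.github.com/repos/OWNER/REPO/commits",
        headers={
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        params={"per_page": 5},
        timeout=10,
    )
    resp.raise_for_status()

    for commit in resp.json():
        print(commit["sha"][:7], commit["commit"]["message"].split("\n")[0])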

    Security and Scale Are Afterthoughts

    MCP servers are a security nightmare waiting to happen. Sure, Anthropic, Cloudflare, and a few others have production-quality implementations. However, every developer with a GitHub account has published an MCP server. Random repositories with 47 stars, zero security audits, and authentication patterns that would make your security team cry. Unless you’re doing due diligence on every MCP server you install, you’re giving random code from the internet access to your infrastructure. The MCP protocol itself doesn’t enforce any security standards. It’s just a communication layer, so security is entirely up to the implementation.

    Want to scale these beyond your laptop? Good luck. Most MCP servers are designed for single-user, local-only scenarios. When you try to make them available to your team, you hit questions nobody’s answered: How do you handle authentication? How do you manage permissions? How do you audit who did what when something breaks at 3am?

    The transport mechanisms have gaps too. MCP uses different protocols (stdio, SSE, WebSocket) and not every server implements them well. I found this out with the Kubernetes MCP. Querying pod status? Perfect. Streaming logs from a running container? Intermittent failures and timeouts.

    {
      "error": "SSE transport timeout",
      "context": "Streaming response exceeded configured limit",
      "duration": "30000ms"
    }

    These aren’t bugs exactly. Instead, they’re the growing pains of a young protocol. But when you’re trying to scale beyond your homelab, the gaps become glaring.

    Token Costs Add Up Fast

    This is the silent killer. Every MCP interaction costs tokens. Not just for the query and response, but also for tool definitions, context passing, and response formatting. Consider asking Claude to check a Kubernetes deployment status using the MCP:

    • Tool definition: ~800 tokens
    • Query processing: ~200 tokens
    • Response formatting: ~1,500 tokens
    • Total: ~2,500 tokens for information I could get with kubectl get deployment

    For one-off queries, that’s fine. For automation or frequent operations, though, the costs compound quickly.

    Here’s a real example: a morning deployment check. You ask Claude to verify last night’s release went smoothly. First, it checks Kubernetes deployment status (2,500 tokens). Then it lists the pods (3,200), pulls logs from a failing pod (8,400), checks recent GitHub commits (4,100), grabs the diff (6,800), and reviews Cloudflare traffic (5,200). Six MCP calls, 30,000+ tokens, roughly $0.09 in input costs alone. Do that five times a week and you’re at $24/month for something a bash script could do for free.

    The value proposition has to be crystal clear. “Slightly more convenient” doesn’t justify the cost. MCP shines when you need Claude to reason across systems or explain findings to someone else. Conversely, it’s overkill for routine checks you could automate.
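
    For comparison, the kubectl and git half of that morning check is a handful of CLI calls. A minimal sketch – the namespace, deployment name, and repo path are placeholders, and the Cloudflare piece would need its API and a token:

    # Morning check without MCP: plain CLI calls, no tokens consumed
    import subprocess

    def run(cmd: list[str]) -> str:
        return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout

    print(run(["kubectl", "rollout", "status", "deployment/web", "-n", "production", "--timeout=30s"]))
    print(run(["kubectl", "get", "pods", "-n", "production"]))

    # Pull recent logs only from pods that aren't Running
    failing = run(["kubectl", "get", "pods", "-n", "production",
                   "--field-selector=status.phase!=Running", "-o", "name"]).split()
    for pod in failing:
        print(run(["kubectl", "logs", pod, "-n", "production", "--tail=50"]))

    print(run(["git", "-C", "/srv/repos/web", "log", "--oneline", "-5"]))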

    [Diagram: LLM queries flow through an MCP server, which makes API calls to external systems like Kubernetes, GitHub, and WordPress, with token costs at every step]
    MCP servers sit between your LLM and external systems, handling authentication and response formatting. Every step costs tokens.

    The GitHub MCP Rabbit Hole

    Last week I needed to debug how two services communicate. Different repos, shared message contracts, something wasn’t matching up. I figured the GitHub MCP would help Claude understand both codebases.

    What followed was a token bonfire. The MCP started searching and listing directories. Next, it pulled file trees and searched for “message” and “contract” and “schema” across both repos. Then it found tangentially related files, asked clarifying questions, and searched some more. Each step burned tokens building context it didn’t actually need.

    After watching this for five minutes, I killed the conversation. I opened both repos in my browser, grabbed the two Helm manifests I already knew were the problem, and pasted them directly into a fresh chat. Thirty seconds. Zero MCP calls. Claude immediately spotted the mismatch.

    The MCP approach would have made sense if I didn’t know where to look. But I did. I just needed Claude to analyze two specific files. Feeding them directly was faster, cheaper, and got me to the answer without the search expedition. The lesson: MCP servers are discovery tools. If you already know what context the LLM needs, skip the MCP and provide it directly. Don’t pay for search when you already have the answer in hand.

    Build Your Own MCP Servers

    Here’s my take: don’t wait for vendors to build the perfect MCP server for your use case. Instead, build your own. The barrier is lower than you think. You need Python, an LLM to help write the boilerplate, and a clear understanding of what context your AI actually needs. I built a custom MCP server for an internal deployment system in an afternoon:

    import asyncio

    from mcp.server import Server
    from mcp.server.stdio import stdio_server
    from mcp.types import Tool, TextContent

    # Initialize the MCP server with a unique name
    server = Server("custom-deploy")

    # Define what tools this MCP server exposes to the LLM
    @server.list_tools()
    async def list_tools() -> list[Tool]:
        return [
            Tool(
                name="check_deployment_status",
                description="Check status of application deployment",
                # Input schema tells the LLM what parameters are available
                inputSchema={
                    "type": "object",
                    "properties": {
                        "app_name": {"type": "string"},
                        "environment": {"type": "string"}
                    },
                    "required": ["app_name"]
                }
            )
        ]

    # Stub for the domain-specific part: replace with your actual deployment status logic
    async def get_deployment_status(app_name: str, environment: str) -> str:
        return f"{app_name} ({environment}): status lookup not implemented yet"

    # Handle incoming tool calls from the LLM
    @server.call_tool()
    async def call_tool(name: str, arguments: dict) -> list[TextContent]:
        if name == "check_deployment_status":
            status = await get_deployment_status(
                arguments["app_name"],
                arguments.get("environment", "production")  # Default to production
            )
            # Return structured text that the LLM can parse
            return [TextContent(type="text", text=status)]
        raise ValueError(f"Unknown tool: {name}")

    # Run over stdio so an MCP client (Claude Desktop, Claude Code, etc.) can launch this server
    async def main() -> None:
        async with stdio_server() as (read_stream, write_stream):
            await server.run(read_stream, write_stream, server.create_initialization_options())

    if __name__ == "__main__":
        asyncio.run(main())
    That skeleton gets you 80% of the way there. The remaining 20% is your domain-specific logic, which you already have. Don’t spend money with companies that gate their MCP servers behind unnecessary paywalls. The protocol is open and the tooling is available. Support the companies building real value, like Cloudflare and Anthropic, who are releasing production-quality servers and contributing to the ecosystem. For everything else, build your own.
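
    Before wiring a server like that into an AI client, it’s worth a quick smoke test. Here’s a minimal sketch using the MCP Python SDK’s stdio client, assuming the skeleton above is saved as custom_deploy_server.py (a placeholder filename):

    import asyncio

    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def main() -> None:
        # Launch the server as a subprocess over stdio, the same way a desktop client would
        params = StdioServerParameters(command="python", args=["custom_deploy_server.py"])
        async with stdio_client(params) as (read_stream, write_stream):
            async with ClientSession(read_stream, write_stream) as session:
                await session.initialize()

                # Confirm the tool is advertised, then call it with sample arguments
                tools = await session.list_tools()
                print("tools:", [tool.name for tool in tools.tools])

                result = await session.call_tool(
                    "check_deployment_status",
                    {"app_name": "web", "environment": "staging"},
                )
                print(result.content[0].text)

    asyncio.run(main())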

    The Right Context at the Right Time

    MCP servers aren’t perfect. They’re expensive and they break in frustrating ways. But they’re solving a real problem: giving LLMs the context they need to be genuinely useful. Just like that shape sorter, it’s not about the sophistication of the tool. It’s about providing the right context at the right time. Show too little and the AI can’t help. Show too much and it loses focus. We’re still figuring out where that line is. I’ll be writing more about individual MCP servers, what works, and what doesn’t. Next up: WordPress MCP in production.


    Working on AI tooling for your infrastructure? I help teams figure out what actually works in production, whether that’s custom MCP servers, Kubernetes architecture, or cloud deployments. Let’s talk.