Siri Intents Walked So Model Context Protocol Could Run
- AI & Data Engineering
Adam Smith
June 15, 2026 • 6 min read
Building AI into software is an incredibly hot topic. Deciding how much autonomy to grant the AI in your system is a foundational question. In this post, we'll look at three levels, from practically none all the way up to a fully autonomous agent.
You can think of AI autonomy in three levels:
Level 1: AI answers one narrow question.
Level 2: AI makes judgments inside a workflow you designed.
Level 3: AI pursues a goal using tools, permissions, and oversight.
Each comes with a tradeoff. At the simple end, the AI just answers one question, which also means there isn't much it can mess up. At the autonomous end, it can handle problems you never could have scripted, and that same freedom is the thing you have to manage. The right question is never "which level is best." It's "which level does this particular job need."
To keep things concrete, every example below comes from the same fictional business: Sit & Stay, a dog walking company with about thirty walkers and a few hundred clients.
At level 1, the AI has a single narrow job inside a process you already run. It gets asked the same kind of question every time, and it has to answer in a fixed format your other systems can use. No conversation, no follow-ups, no initiative.
Sit & Stay gets a steady stream of customer email. Every message that arrives gets put to the model with the same question: what's the mood here, what's it about, and does anything need attention today? The answer comes back in a predictable structure and lands in a database. Three weeks later Priya, the owner, notices that complaints about late appointments tripled in March and they clustered around two walkers. That's a real insight, and the AI portion of the system is maybe forty lines of code.
Nothing here loops. Nothing plans. The model never decides what happens next. That constraint is exactly what makes level 1 so fast and safe to ship. Most "we added AI" features that actually make it to production are level 1, and there's no shame in that. We built almost exactly this system a while back and wrote up how it works under the hood if you want the mechanics.
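Here is a minimal sketch of what that triage step might look like in Python. It is not the production code mentioned above. The label sets, the prompt wording, and the `call_model` parameter are all assumptions made for illustration, and `fake_model` stands in for a real model client so the sketch runs offline. The point it shows is the level 1 shape: one question, one reply, validated against a fixed format before it goes anywhere near the database.

```python
import json
from dataclasses import dataclass

# Hypothetical label sets. The post only says the model reports mood,
# topic, and whether anything needs attention today.
MOODS = {"positive", "neutral", "negative"}
TOPICS = {"late_appointment", "billing", "scheduling", "praise", "other"}

# Hypothetical prompt wording.
PROMPT = (
    "Classify the customer email below. Reply with JSON only, using the keys "
    '"mood", "topic", and "urgent".\n\nEmail:\n'
)


@dataclass(frozen=True)
class Triage:
    mood: str
    topic: str
    urgent: bool


def parse_triage(raw: str) -> Triage:
    """Validate the model's reply so only well-formed rows reach storage."""
    data = json.loads(raw)
    if data["mood"] not in MOODS:
        raise ValueError(f"unexpected mood: {data['mood']!r}")
    if data["topic"] not in TOPICS:
        raise ValueError(f"unexpected topic: {data['topic']!r}")
    if not isinstance(data["urgent"], bool):
        raise ValueError("urgent must be true or false")
    return Triage(data["mood"], data["topic"], data["urgent"])


def triage_email(body: str, call_model) -> Triage:
    """One question in, one structured answer out. No loop, no follow-up."""
    return parse_triage(call_model(PROMPT + body))


def fake_model(prompt: str) -> str:
    """Stand-in for a real model client (an assumption for this sketch)."""
    return '{"mood": "negative", "topic": "late_appointment", "urgent": true}'


result = triage_email("Our walker was 40 minutes late again.", fake_model)
print(result)
```

Rejecting any reply that falls outside the expected labels is what keeps the downstream report trustworthy: a count of late-appointment complaints only means something if every row uses the same vocabulary.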
Level 2 bakes the AI into a workflow. You map a business process out as steps and the paths between them. Some steps are plain code, some are AI, some are a human being. The flow can branch, loop back, and pause for days waiting on a person.
Sit & Stay is always hiring walkers, so they built a hiring portal. A job application comes in and the LLM reads the resume, scoring it against what their good walkers tend to have in common: experience with large breeds, schedule overlap with peak demand, distance from the service area. Then deterministic code (aka not AI) checks the hard requirements, like consent to a background check. No judgment needed there; the answer is either yes or no.
Now the path branches. Strong candidates get an interview slot offered automatically. Borderline ones land in a review queue, and the workflow simply stops until a human clicks something. After the interview, the LLM drafts either an offer or a polite rejection, and a person approves it before anything is sent.
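The first branch of that hiring flow could be sketched roughly as below. The thresholds, the step names, and the `score_resume` callable (which stands in for the LLM scoring step) are hypothetical. The post also doesn't say what happens to weak candidates or to applicants who fail a hard requirement, so routing them to a drafted rejection is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Step(Enum):
    OFFER_INTERVIEW = "offer_interview"    # slot offered automatically
    HUMAN_REVIEW = "human_review"          # workflow pauses until a person acts
    DRAFT_REJECTION = "draft_rejection"    # assumed path; a person still approves it


@dataclass
class Application:
    name: str
    resume: str
    background_check_consent: bool


# Hypothetical score thresholds on a 0.0 to 1.0 scale.
STRONG = 0.75
BORDERLINE = 0.50


def meets_hard_requirements(app: Application) -> bool:
    """Plain code, no model: the answer is either yes or no."""
    return app.background_check_consent


def route(app: Application, score_resume) -> Step:
    """Decide the next step. The model scores; the code owns the branching."""
    score = score_resume(app.resume)          # AI step: judgment
    if not meets_hard_requirements(app):      # deterministic step: rules
        return Step.DRAFT_REJECTION
    if score >= STRONG:
        return Step.OFFER_INTERVIEW
    if score >= BORDERLINE:
        return Step.HUMAN_REVIEW
    return Step.DRAFT_REJECTION


# Usage with a fixed score standing in for the LLM.
applicant = Application("Jo", "Three years walking large breeds", True)
print(route(applicant, lambda resume: 0.62))  # Step.HUMAN_REVIEW
```

Notice that the model never chooses the next step. It returns a number, and the branching lives in code you can read, test, and draw.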
The defining property of level 2 is that you could draw this whole thing on a whiteboard before writing any code. The model exercises judgment, but only inside the steps and branches you drew for it.
At level 3 you scrap the workflow and build an autonomous agent. You give the agent a goal, a set of tools, and rules about what it's allowed to do. The agent works in a loop: look at the situation, pick an action, see what happened, pick the next one.
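As a rough sketch, that loop can be written in a few lines of Python. Everything named here is hypothetical: `choose_action` stands in for the model call that picks the next move, the tool names are made up, and the `max_steps` cap is an added safety assumption rather than something the post specifies.

```python
def run_agent(goal, tools, choose_action, max_steps=10):
    """Look at the situation, pick an action, see what happened, repeat."""
    history = []  # (action, observation) pairs the agent has seen so far
    for _ in range(max_steps):
        action = choose_action(goal, history)    # model picks the next move
        if action["tool"] == "done":
            break
        tool = tools.get(action["tool"])
        if tool is None:
            # Report the mistake back to the agent instead of crashing.
            observation = f"unknown tool: {action['tool']}"
        else:
            observation = tool(action["args"])
        history.append((action, observation))
    return history


def scripted_policy(goal, history):
    """Stand-in for the model: walks a fixed plan so the sketch runs offline."""
    plan = [
        {"tool": "read_schedule", "args": {"day": "Tuesday"}},
        {"tool": "send_message", "args": {"to": "walker-7"}},
        {"tool": "done", "args": {}},
    ]
    return plan[len(history)]


# Hypothetical tools; real ones would call the schedule and messaging systems.
tools = {
    "read_schedule": lambda args: f"schedule for {args['day']} loaded",
    "send_message": lambda args: f"message sent to {args['to']}",
}

history = run_agent("every booked dog gets walked", tools, scripted_policy)
for action, observation in history:
    print(action["tool"], "->", observation)
```

The difference from level 2 is visible in the code: there is no branching logic for the business process at all, only a loop and a list of tools. What happens next is decided at run time.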
Here's why Sit & Stay needs one. It's 6:52 am on a Tuesday and Dana texts in sick with six walks on her schedule. By 7:40 am a storm breaks over one zip code and a client wants to move Biscuit from 1 pm to 3 pm. Marco can absorb three of Dana's walks, except one is Rufus, who can't share a sidewalk with another dog.
😅 Nobody can flowchart that morning.
So the dispatcher agent gets tools instead of a script: the schedule, messaging, walker profiles, weather, maps, and the ability to escalate to Priya. Its standing goals: every booked dog gets walked by someone qualified and every affected client hears about changes before they notice them.
The agent reads Dana's text, ranks substitutes by distance, skills, and workload, messages two walkers, and gets one yes and one silence. It offers the remaining clients a thirty-minute shift, which two of three accept, and moves the storm-zone walks earlier on its own. Rufus it can't solve, so it hands Priya a decision instead of a problem: "No qualified walker is free between 1 and 4. Refund today's walk, or reschedule to 10 am tomorrow with Sam, who's walked Rufus before. I'd suggest the reschedule." She taps one button and gets a summary of the whole morning at 8:00 am.
The part that makes this safe to run is the part that never shows up in demos: guardrails. The agent can issue credits up to $20 and not a dollar more, because that limit lives in code, not in the prompt. It can't cancel a walk without offering an alternative. Certain message types require approval before sending. Every action lands in an audit log Priya can read. The real design work at level 3 isn't prompting; it's deciding what the agent may do alone, what needs a one-tap approval, and what it must hand to a human entirely.
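To make "the limit lives in code" concrete, here is a small sketch of a guardrail layer that sits between the agent and its tools. The class and method names are hypothetical. The post gives the $20 cap but doesn't say what happens above it, so escalating to a human for approval is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

MAX_CREDIT_DOLLARS = 20  # the cap from the post, enforced here, not in the prompt


class NeedsApproval(Exception):
    """Raised when an action has to go to a human instead of running."""


@dataclass
class Guardrails:
    audit_log: list = field(default_factory=list)

    def _log(self, action: str, detail: dict, outcome: str) -> None:
        """Record every attempt, allowed or not, so the owner can review it."""
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "detail": detail,
            "outcome": outcome,
        })

    def issue_credit(self, client: str, dollars: int) -> str:
        detail = {"client": client, "dollars": dollars}
        if dollars > MAX_CREDIT_DOLLARS:
            # Assumed behavior: over the cap, hand the decision to a person.
            self._log("issue_credit", detail, "escalated")
            raise NeedsApproval(f"credit of ${dollars} exceeds ${MAX_CREDIT_DOLLARS}")
        self._log("issue_credit", detail, "allowed")
        return f"credited {client} ${dollars}"

    def cancel_walk(self, walk_id: str, alternative) -> str:
        detail = {"walk_id": walk_id, "alternative": alternative}
        if not alternative:
            # The agent can't cancel a walk without offering an alternative.
            self._log("cancel_walk", detail, "blocked")
            raise PermissionError("cancellation requires an alternative offer")
        self._log("cancel_walk", detail, "allowed")
        return f"cancelled {walk_id}, offered {alternative}"
```

Because the agent only ever reaches credits and cancellations through these methods, no prompt, however cleverly worded, can talk it past the cap, and the audit log fills in whether the action went through or not.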
The test we keep coming back to: try to draw the flowchart before you build. If it's one box, that's level 1. If you can draw a diagram and trust it to stay accurate, that's level 2. If the diagram starts to look like a plate of spaghetti that would be out of date by tomorrow, you're looking at level 3.