I Built an AI Agent to Do My Grocery Order (It Can't Check Out)

Every week my wife sits down and builds the grocery list. She pulls the recipes we're cooking, works out what we already have, and writes down the rest. It's not hard, but it's the same hour every week. I wanted to see if I could start taking it off her plate.

So I started small. Forget the full meal plan for now. Could I hand an agent a single recipe and have it come back as a real cart at our Kroger, priced and ready, without me touching a keyboard? If that works, the weekly list is just more of the same.

It works. A real 17-ingredient moussaka cart, every ingredient priced and added, sitting in my actual Kroger account waiting for me to press "place order." That's the proof I was after: an agent can read a recipe and load the cart. Then I started checking what the API could actually do, and that's where the plan fell apart in an interesting way. The thing I'd written down as a rule for myself (stop before checkout) turned out to be a thing the API enforces whether I want it to or not.

Here's what I wanted. A good agent builds the cart and waits for me. A bad agent places the order. I wrote that line at the start as discipline. I didn't know yet how load-bearing it was.

Letting the agent build it

Kroger has a developer API. The plan I started from assumed a public app could reach all of it: search products, look up stores, write to the cart, read the cart back. That last one was the assumption I should have tested first.

I gave Claude Code the plan, pointed it at the CupOfOwls/kroger-api library as a reference to read (not import, since I wanted Python stdlib only, no requests), and handed it the prompt you'll see at the end. About an hour later I had a working skill: a SKILL.md and one kroger.py script with subcommands for auth, store lookup, search, and adding to the cart.

It broke on the first live call, the way these things do. A couple of small fixes later it was running against my real account, which is where the actual lessons started. (One of them: Kroger compresses every response, including its error messages, so the first error I tried to read was unreadable until the script learned to decompress. Minor, but it set the tone. The plan was right about the shape and wrong about the friction.)

Where it went wrong, and how it got fixed

I like building this way because the agent gets you to "runs" fast and then real data does the teaching. Both of the things that mattered came from pointing the skill at a real Kroger account, not a tidy example.

The public cart API is write-only. This is the one. I needed to add items and then read the cart back, so I could show a clean summary with real prices. I assumed the read endpoint was there, just behind a permission I hadn't asked for yet. It isn't. Reading the cart requires Partner API access, which a public developer account can't get. I confirmed it the hard way by testing every cart permission Kroger's own validator would accept, then found the same wall documented in the popular CupOfOwls/kroger-mcp server, whose author left a comment in the code: "The Kroger API does not provide permission to query the actual user cart contents." Nobody's even filed a bug about it. It's just accepted as how the public API works.

And here's the part I didn't expect: the limit made the design better. I'd planned to fetch the cart after the add so I could show a polished summary. I can't. What I show instead is the list of items the skill just added, with the prices it already pulled during the product search, plus a link to the cart on kroger.com for me to review. The summary is built from data I already have, and the human reviews the real cart in the real place. Which is exactly the "stop before checkout" rule I'd written for myself. I'd been treating it as a principle I had to enforce. The API enforces it for me. The agent literally can't place the order or read the cart, so it stops because there's nowhere else to go. Same destination, better reason.
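A rough sketch of what "summary from data already in hand" can look like. The Pick record, the summarize function, and the cart URL are all assumptions for illustration; the real skill's data shapes aren't shown in this post.

```python
from dataclasses import dataclass
from decimal import Decimal

# Assumed review link; confirm the cart URL for your own account.
CART_URL = "https://www.kroger.com/cart"


@dataclass
class Pick:
    """One item the skill just added (hypothetical shape)."""
    name: str
    qty: int
    line_price: Decimal  # price for the whole line, captured at search time
    reason: str


def summarize(picks: list[Pick]) -> str:
    """Build the summary without ever reading the cart back."""
    lines = [f"{p.qty} x {p.name}: ${p.line_price} ({p.reason})" for p in picks]
    total = sum((p.line_price for p in picks), Decimal("0"))
    lines.append(f"Estimated total: ${total}")
    lines.append(f"Review before ordering: {CART_URL}")
    return "\n".join(lines)
```

The total is labeled an estimate on purpose: it reflects search-time prices for this run's items, not whatever else is already sitting in the cart.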

You can't remove items either. This is the corollary, and it's a real one. From a public app the Kroger cart is add-only. Every run stacks on top of whatever's already in there, and nothing in the public API can take an item out. So the skill opens with a reminder before anything else: go review or empty the cart at kroger.com first. Claude can add. It can never subtract. If that warning weren't there, you'd run it twice, see a cart with 34 items, and wonder who's in charge.

What it looks like when it works

I gave it "look up a Greek moussaka recipe and add it to my cart," my zip code, and a note that I wanted ground beef. The skill:

Found a real recipe and pulled the ingredients (the three layers: eggplant and potato base, spiced meat sauce, béchamel).
Resolved my store: the nearest Kroger to my zip.
Searched all 17 ingredients at once. Every one in stock.
Picked a product for each: cheapest in-stock option that matched what the recipe meant, quantities rounded up to real package sizes.
Added all 17 in one shot.
Stopped, and summarized with its reasoning, including which calls were judgment and which were obvious.
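The mechanical half of the product pick (cheapest in-stock option, quantity rounded up to whole packages) can be sketched like this. It deliberately leaves out the judgment half, matching what the recipe meant, which the skill hands to the model. The Candidate fields and the sample prices are hypothetical, and everything is in integer cents and ounces to keep the rounding exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One search result (hypothetical shape)."""
    description: str
    price_cents: int
    package_oz: int
    in_stock: bool


def packages_needed(needed_oz: int, package_oz: int) -> int:
    """Round up to whole packages using integer ceiling division."""
    return -(-needed_oz // package_oz)


def cheapest_cover(candidates: list[Candidate], needed_oz: int):
    """Pick the in-stock candidate that covers the amount for the least money.

    Returns (candidate, package_count), or None when nothing is in stock.
    """
    in_stock = [c for c in candidates if c.in_stock]
    if not in_stock:
        return None

    def cost(c: Candidate) -> int:
        return packages_needed(needed_oz, c.package_oz) * c.price_cents

    best = min(in_stock, key=cost)
    return best, packages_needed(needed_oz, best.package_oz)


if __name__ == "__main__":
    beef = [
        Candidate("Ground beef 80/20, 1 lb", 699, 16, True),
        Candidate("Ground beef 80/20, 3 lb", 1899, 48, True),
    ]
    # A recipe asking for 1.5 lb (24 oz) rounds up to two 1 lb packages.
    print(cheapest_cover(beef, 24))
```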

$58.06 total, or $47.58 if you skip the four spices. And it showed its work instead of hiding it: rounded 1.5 lb of beef up to 2 lb, grabbed 3 eggplants, swapped in Parmesan because the Kefalotyri the recipe wanted wasn't available, left the optional red wine out, and flagged the $3.99 nutmeg as the thing I probably already own. Nothing ordered.

Ingredient	Qty	Price
Eggplant	3	$4.47
Russet potatoes	1	$3.19
Ground beef 80/20	2	$13.98
Yellow onion	1	$2.99
Garlic	1	$0.89
Crushed tomatoes	1	$1.89
Tomato paste	1	$0.99
Olive oil	1	$6.99
Unsalted butter	1	$3.79
All-purpose flour	1	$2.59
Whole milk	1	$2.49
Large eggs	1	$0.99
Parmesan, shredded	1	$2.33
Ground cinnamon	1	$1.25
Ground allspice	1	$3.99
Ground nutmeg	1	$3.99
Dried oregano	1	$1.25
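If you want to check the arithmetic, the table adds up. This snippet sums the prices exactly as listed, using Decimal to avoid float drift. Treating the last four rows as "the four spices" is my reading of the table, not something the skill reported.

```python
from decimal import Decimal

# Line prices copied from the table above.
PRICES = {
    "Eggplant": "4.47",
    "Russet potatoes": "3.19",
    "Ground beef 80/20": "13.98",
    "Yellow onion": "2.99",
    "Garlic": "0.89",
    "Crushed tomatoes": "1.89",
    "Tomato paste": "0.99",
    "Olive oil": "6.99",
    "Unsalted butter": "3.79",
    "All-purpose flour": "2.59",
    "Whole milk": "2.49",
    "Large eggs": "0.99",
    "Parmesan, shredded": "2.33",
    "Ground cinnamon": "1.25",
    "Ground allspice": "3.99",
    "Ground nutmeg": "3.99",
    "Dried oregano": "1.25",
}

# Assumption: these four rows are the spices you might already own.
SPICES = {"Ground cinnamon", "Ground allspice", "Ground nutmeg", "Dried oregano"}

total = sum((Decimal(p) for p in PRICES.values()), Decimal("0"))
without_spices = sum(
    (Decimal(p) for name, p in PRICES.items() if name not in SPICES),
    Decimal("0"),
)

print(f"All {len(PRICES)} items: ${total}")
print(f"Skipping spices: ${without_spices}")
```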

Why "it works" still isn't "it's right"

The line I keep coming back to is the same one from the fantasy football post, in a different costume. The skill produced a number that looked like a verdict and wasn't. Fixing the obvious mistakes made the cart trustworthy. It did not make the judgment trustworthy.

The skill stopped picking the wrong brand. But rounding the beef up to 2 lb, grabbing three eggplants, swapping Parmesan in for the Kefalotyri — those were judgment calls, not facts, and a couple of them could just as easily have gone the other way. A good cart is more than a correct total, and a good agent knows the difference between "I added what you probably want" and "I bought it." That gap is the reason a human still reviews the cart, and it's the reason I'm nowhere near letting this run my wife's whole list unsupervised yet.

The playbook you can steal

Test what the public API can actually reach on day one, not on the third iteration. The add-only cart isn't a bug; it's the shape of the public surface. Find that out with a real account before you build a "read the cart back" feature you'll just have to delete.

Let the constraint do some of the design for you. I wanted the agent to stop before checkout. The API can't check out. Instead of fighting that, I leaned on it — the principle and the limitation pointed the same direction.

This is step one. The real goal is the weekly list my wife builds, pulled from a meal-plan note and turned into a cart she just glances at and approves. I've proven the connection works. Now I get to build on it. The same skill can already run on its own in another agent runtime, pulling recipes from a note instead of waiting for me to ask — that's the path from "I tested it once" to "it quietly did the shopping."

Build this yourself

Time: about 2 hours if your environment is ready, or 4 to 6 if you count the one-time developer registration and first login.

Cost: $0. Public Kroger API access is free for personal use.

Stack:

Python 3.13, standard library only. No requests, no httpx.
Claude Code, or any agent that can run a SKILL.md plus a script.
A Kroger Developer account at developer.kroger.com, set up in the Production environment, with the product, location, and cart.basic:write scopes.
A local .env with your client ID, client secret, redirect URI, and API base.
A .cache/ directory for tokens and lookups.
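Since the stack is stdlib-only, the .env file needs a small hand-rolled reader. A minimal sketch follows; the variable names are hypothetical (the post only says the file holds a client ID, client secret, redirect URI, and API base), so rename them to whatever your script expects.

```python
from pathlib import Path

# Hypothetical key names, chosen for illustration only.
REQUIRED = (
    "KROGER_CLIENT_ID",
    "KROGER_CLIENT_SECRET",
    "KROGER_REDIRECT_URI",
    "KROGER_API_BASE",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env(path: str = ".env") -> dict[str, str]:
    """Read the file and fail early if a required key is missing or empty."""
    values = parse_env(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED if not values.get(name)]
    if missing:
        raise SystemExit(f"Missing in {path}: {', '.join(missing)}")
    return values
```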

Steps:

Register a Production app at developer.kroger.com. Use Production, not Certification. The Certification environment rejects real kroger.com logins, so the login step won't work there. Add the three scopes and save the client ID and secret to .env.
Run the one-time login: python kroger.py auth. The script opens your browser, you approve, and it catches the redirect, swaps the code for a token plus a 180-day refresh token, and saves them locally.
Resolve your store from a zip: python kroger.py locations --zip <your-zip>. Cached for 30 days.
Hand the skill a recipe. It pulls the ingredients, searches your store, picks products, adds them in one batch, summarizes, and stops.
Review the cart at kroger.com and place the order yourself. The skill can't read the cart and the public API can't remove items, so this step is on you.

Pitfalls:

Register in Production, and re-select the API products after you switch environments or the portal errors out at submit.
Kroger compresses every response, including errors, so have your client handle that or your first error will be unreadable.
The login callback server needs to keep listening, not handle a single request, or a stray browser probe eats the one request you care about.
Open the flow with a "review or empty the cart first" reminder, because the public API can't subtract.
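The callback pitfall is the easiest one to show in code. Below is a sketch of a local redirect catcher that keeps answering requests until one actually carries a code parameter, so a stray favicon request can't use up the only turn. The function names and the port are placeholders, and this covers only catching the redirect, not the token exchange that follows.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class CallbackHandler(BaseHTTPRequestHandler):
    """Answers every GET, but only records a request that carries ?code=..."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        if "code" in query:
            self.server.auth_code = query["code"][0]
            body = b"Login complete. You can close this tab."
        else:
            body = b"Waiting for the login redirect."
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the terminal quiet


def make_server(port: int, host: str = "127.0.0.1") -> HTTPServer:
    server = HTTPServer((host, port), CallbackHandler)
    server.auth_code = None
    return server


def wait_for_code(server: HTTPServer) -> str:
    """Handle requests one at a time until one delivers the auth code."""
    try:
        while server.auth_code is None:
            server.handle_request()
    finally:
        server.server_close()
    return server.auth_code
```

The loop is the whole fix: a single handle_request() call would return after the first hit, whatever it was.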

The prompt I used

Copy this and point it at your own recipes. Don't delete the constraints at the bottom. They're the lesson:

Build something that turns a recipe into a Kroger pickup cart and stops
before checkout. This core behavior holds regardless of how it's packaged
or what tool builds it:

- Take a recipe (text, URL, or note path) and a zip code. Resolve the zip
  to a single store via the Locations API and cache the result for 30 days.
- Step 1: parse the recipe into a structured ingredient list with name,
  quantity, unit, prep, and notes. One LLM call. Strict JSON output.
- Step 2: for each ingredient, search the Products API at the resolved
  store, limit 5 results, in parallel. Cache per (term, locationId) for
  24 hours. An ingredient with zero results goes to a "manual shopping
  list" appendix, not the cart.
- Step 3: pick from the 1-5 candidates per ingredient based on the
  recipe's intent. Match brand, cut, and fat content if specified; pick
  the cheapest in-stock option otherwise. Round quantity up to a real
  package size. One LLM call. Strict JSON output.
- Step 4: add the chosen SKUs in one batch. After the call, summarize
  from data already in hand: per-item name, size, price, and reason, plus
  a Kroger cart URL for the human to review. Do not try to read the cart
  back, and do not place the order.
- These are the hard-won constraints, so keep them exact:
  - The public cart API is write-only. Reading the cart (and removing
    items) needs partner API access, which a public account doesn't have.
    Build the summary from data already in hand.
  - The cart is add-only. Before each new recipe, the user reviews or
    empties the cart at kroger.com manually.
  - Kroger compresses every response, including errors. Decompress before
    reading.
  - The app must be registered in Production (not Certification) for
    real-account login. Make the API base configurable.

The flow when a person is driving it: recipe, then zip, then store, then
cart summary, then stop.

Hand that to Claude Code and you get a skill: a SKILL.md and a script. I give it a recipe, it picks the store, builds the cart, summarizes, and stops. The same skill travels to a different agent runtime: drop it somewhere with no human in the loop and it runs on its own, pulling recipes from a weekly meal-plan note and reporting back what it found.
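To make Step 2 of the prompt concrete, here is a rough sketch of the parallel search with the "zero results go to a manual list" rule. The search_one callable is a stand-in you would supply yourself (it is not a real Kroger client function), and caching is left out to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical signature: (term, location_id) -> list of product dicts.
SearchFn = Callable[[str, str], list[dict]]


def search_all(
    terms: list[str],
    location_id: str,
    search_one: SearchFn,
    limit: int = 5,
    max_workers: int = 8,
) -> tuple[dict[str, list[dict]], list[str]]:
    """Search every ingredient in parallel.

    Returns (candidates by term, terms with no results). The second list is
    the "manual shopping list" appendix: those items never reach the cart.
    """
    candidates: dict[str, list[dict]] = {}
    manual_list: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input terms.
        results = pool.map(lambda term: search_one(term, location_id), terms)
        for term, products in zip(terms, results):
            if products:
                candidates[term] = products[:limit]
            else:
                manual_list.append(term)
    return candidates, manual_list
```

Threads are enough here because the work is waiting on network calls, and passing the search function in keeps the fan-out logic testable without touching the API.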

Copy the prompt, point it at your own recipes, and build on it. Hit reply and tell me what you make.

— Ben
