Issue No. 42 · Technology
The AI in the Room
I spent three months trying to use a coding assistant to do my actual job. Here is what I will and will not let it touch.
By Adam Reilly
For the first three months of this year I ran a small, fairly serious experiment, which was: I tried to use a large-language-model coding assistant for as much of my weekly work as I could justify. The work, for context, is a mix of writing, GIS analysis, the occasional Python script for a chart, and a fair amount of correspondence. The assistant was Claude, the one from Anthropic. It was the model that came out late last year, and it is, by some distance, the best one I have used.
I did not approach the experiment from a posture of "this changes everything" or from a posture of "this is mostly nonsense." Both of those postures, in my experience, are tells. The first usually belongs to somebody who is paid to hold it. The second usually belongs to somebody who has not actually tried the tools.
What I came out with, after three months, is a small and specific set of rules about what I will and will not let the assistant touch. I am writing them down here, because I think they are more useful than the usual essay about whether AI is good or bad.
What I let it do
I let it write the boring kind of code. Specifically: I let it write the data-loading and data-cleaning code at the start of a GIS analysis, the kind that reads in a CSV from the California Energy Commission, casts the columns, drops the rows with missing geocoordinates, and joins to a county shapefile. It writes this code in about thirty seconds, and the code is correct about ninety percent of the time, and the ten percent where it is wrong I catch by reading it before I run it. This used to take me forty-five minutes. It now takes me five.
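For concreteness, the boring kind of code looks roughly like the sketch below. It uses only Python's standard library so that it runs anywhere. The sample data, the column names, and the small county lookup table standing in for the shapefile's attribute table are all assumptions made for illustration. A real analysis would more likely use a GIS library for the join.

```python
import csv
import io

# Hypothetical sample standing in for a CSV export; the column names
# (facility, county_fips, capacity_mw, lat, lon) are assumptions.
RAW_CSV = """facility,county_fips,capacity_mw,lat,lon
Alpha Solar,06037,12.5,34.05,-118.25
Beta Wind,06029,40,,
Gamma Hydro,06019,7.25,36.75,-119.77
Delta Solar,06999,3.0,35.10,-120.00
"""

# Stand-in for the attribute table of a county shapefile, keyed by FIPS code.
COUNTIES = {
    "06019": "Fresno",
    "06029": "Kern",
    "06037": "Los Angeles",
}


def load_and_clean(text, counties):
    """Read rows, cast columns, drop rows without coordinates, join county names."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Drop rows with missing geocoordinates.
        if not row["lat"].strip() or not row["lon"].strip():
            continue
        fips = row["county_fips"].strip()  # keep as text: leading zeros matter
        cleaned.append({
            "facility": row["facility"].strip(),
            "county_fips": fips,
            "capacity_mw": float(row["capacity_mw"]),
            "lat": float(row["lat"]),
            "lon": float(row["lon"]),
            # Left join: unmatched codes get None instead of being dropped.
            "county": counties.get(fips),
        })
    return cleaned


rows = load_and_clean(RAW_CSV, COUNTIES)
for r in rows:
    print(r["facility"], r["county"], r["capacity_mw"])
```

Keeping the unmatched row, with its county set to None, is a deliberate choice in this sketch: a bad join stays visible when the output is read, which is the same check that catches the wrong ten percent.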
I let it format. I let it convert a verbal description of a chart into the matplotlib code for the chart. I let it refactor a long Python notebook into a script, which is a thing I previously dreaded. I let it write the SQL query for the kind of join I have written ten thousand times in my life.
I let it summarize, with one specific caveat. I let it summarize documents I have read, as a check on my own summary. I do not let it summarize documents I have not read. The difference is the difference between using a calculator to check your arithmetic and using a calculator to do your thinking.
What I do not let it do
I do not let it write prose for me. This is going to sound like a writer’s ego, but it is not. It is a fairly hard-headed observation about what writing actually is, for the kind of writing I do. The model produces prose that is grammatical and well-organized and, in a way that took me a few weeks to articulate, deeply uninteresting. The model has read a great deal of average prose and writes more of it. My job, such as it is, is to not write average prose.
I do not let it make analytical judgments. I do not let it tell me whether a data set supports a claim, because I have caught it, more than once, telling me yes when the answer was clearly no. The model is a tireless and confident pattern-matcher, and analysis is, in a deep sense, not a pattern-matching problem. Analysis is the question of which pattern is the one that actually applies. The model is not good at that question, because the question is mostly about what is not in the data.
I do not let it write anything that goes out under my name without my having read every word of it. This sounds obvious. It is not obvious. There is a slope, and I have seen people slide down it in real time. The slope starts with "I had it write the rough draft and then I edited it heavily." The slope ends with "I had it write the email." I am trying to stay off the slope.
What this means in practice
In practice the model has eaten about thirty percent of the busywork of my week. The busywork is, importantly, the worst part of my week. It is the part where I am writing the seventh import statement of the day and wondering whether this is what I went to graduate school for. The model takes that part. I get back about five hours a week. I have, mostly, spent those five hours writing more, which is to say writing for Field Letter rather than against a deadline for a freelance client.
The model has not made me a better writer, or a better analyst, or a better thinker. It has not, particularly, made me a worse one either. It has, very specifically, replaced a pile of small annoyances with a different and smaller pile of new annoyances. The new annoyances are: catching its errors, deciding when to trust it, and figuring out where the line is.
I think that is the actual story of useful AI in 2026. It is not a transformation. It is not a destruction. It is a tool that, for the specific narrow tasks at which it is good, is genuinely much better than the tool we had before. And it is a tool which, if you give it tasks it is not good at, will fail in ways that look exactly like success until you check.
I am keeping the assistant. I am keeping the rules. The boring kind of code, yes. The prose, no. The analytical judgments, absolutely not. Every word of anything that goes out under my name read by me, every time, no exceptions. I think those are pretty good rules for now. Ask me again in a year.