
Last Week, an LLM Out-Programmed Me

With last week’s release of Codex 5.3 and Opus 4.6, I had a new experience: an LLM showed itself to be a better programmer than I am. If you’ve seen my code, you may not think that’s a big achievement. But for the first time, I saw in practice how an AI could outperform me at something I take some measure of pride in. It was like Google’s Nano Banana Pro moment, but for coding.

Unlike my previous experiences with LLM coding, Codex 5.3 didn’t just have more familiarity with the syntax of a language or the functionality of a module; it solved an architectural problem better than I did. (It reused existing file artifacts instead of creating intermediate files.) Likely it had pulled the architectural pattern from somewhere else, but it was an elegant solution—superior to the workable-but-basic approach I’d been planning. In that instant, I felt like the future had arrived in a small way: it was better at this task than I was, not just faster at it.

LLMs have let me compress weeks of coding work into a few days. For the Bible Passage Reference Parser, I normally follow a six-month release schedule because changes take a lot of time, especially big refactoring changes like the one I’ve been planning for the next version (which moves language data to a different repo and adds 2,000 more languages). I’d been dreading this work for years because, with so many languages, dealing with exceptions would consume the bulk of the coding effort. I could barely manage exceptions with the 40 languages in the current repo, so adding 50x more didn’t sound fun.
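To give a sense of what “dealing with exceptions” means here: each language needs its own book names, abbreviations, and one-off forms that generic matching rules don’t cover. The sketch below is purely illustrative (the interface, field names, and French data are invented for this post, not the parser’s actual schema), but it shows the shape of the problem that gets multiplied across 2,000 languages.

```typescript
// Illustrative only -- not the parser's real data format.
// Each language supplies its accepted book names plus any one-off
// forms that the generic matching rules would otherwise miss.
interface LanguageData {
  code: string;                      // e.g. "fr"
  books: Record<string, string[]>;   // OSIS book id -> accepted names/abbreviations
  exceptions?: {
    pattern: RegExp;                 // an input form that breaks the generic rules
    osis: string;                    // the book it should resolve to
  }[];
}

const fr: LanguageData = {
  code: "fr",
  books: { "John": ["Jean", "Jn"], "1John": ["1 Jean", "1Jn"] },
  // A one-off spelling the generic matcher wouldn't catch.
  exceptions: [{ pattern: /1\s*ere?\s*Jean/i, osis: "1John" }],
};
```

Nearly every language brings a handful of entries like that final exceptions list, and those one-offs are where the hand-tuning used to go.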

However, Codex 5.3 made short work of the task, taking a few minutes to accomplish what would’ve taken me days of dedicated work, not that I’d ever be able to dedicate days straight to this project. I published the latest branch five months ahead of schedule (and remember, the schedule is six months long).

These models still make mistakes; you can’t yet let them code unattended. But their ability to plan ahead and write code according to that plan is now (at least sometimes) stronger than mine. A year ago, converting the reference-parser code from CoffeeScript to TypeScript involved a bunch of back-and-forth with ChatGPT; even with a straight 1:1 conversion, it still made questionable decisions that I corrected. With the latest models, LLMs are now correcting my questionable decisions.
