LLMs will never be alive or intelligent
And "agents" will never know and cater to our every need
Over the holiday I thought I’d try my hand again at finding a way to, as automatically as possible, map my personal banking transactions (exported to a CSV file) to named accounts in my accounting system to help speed up the process of tracking my finances.1
This led me down a rabbit hole of trying to understand various approaches, algorithms and data structures that also happen to power modern AI systems (embeddings; supervised learning algorithms like k-nearest neighbours, support vector machines and logistic regression; transformers; and so on).2
While my little sojourn into this territory and my playing around with some of the popular LLMs by no means makes me an expert on the algorithms themselves, I’ve come to a few assessments so far:
LLMs are, as I’d initially suspected, ultimately just probabilistic token predictors (i.e. they predict words, symbols, etc.). This may sound obvious to anyone familiar with the space, but I had to verify my intuitions. They’re very impressive token predictors, of course, and it’s quite a marvel that we can get anything useful out of them at all, but token predictors nonetheless.
“Garbage in, garbage out” still applies to LLMs. This is probably why LLMs won’t be killing software development any time soon:
There’s so much garbage software out there today and it seems highly unlikely to me that LLMs are only trained on the highest quality software available.
We software developers generally spend our time doing fairly novel things, for which it’s highly unlikely there’s existing training data solving the exact problem we’re facing. So while I’m sure that breaking the problem down into steps where there is relevant training data might help, humans still need to break those problems down and then put the LLMs’ recommendations together in a sane and manageable way. And it’ll probably stay that way for as long as we need to build novel things.
What about all the proprietary code out there that LLMs will never be able to be trained on? While that code may not be the best quality, it could represent solutions to entire classes of problems that LLMs will never be able to holistically predict.
Even if it’s not “garbage in”, and an LLM is only trained on the highest quality software there is, there are so many ways to still get garbage out of an LLM depending on how its parameters are tweaked.
LLMs are not alive or intelligent, and never will be.
The promise of robotic/computer “agents” (especially ones based on LLMs) eventually being able to cater to your every need is dead on arrival.
The security and privacy implications of the push for agents to have deeper and deeper integration in our everyday lives are pretty scary.
I’ll go into a little more depth on how I currently reason about points 4 and 5, and will link to other more knowledgeable folks’ insights on point 6.
LLMs are not alive or intelligent
As token predictors, LLMs are certainly not “alive” or “intelligent”, and they never will be. We can argue about technical definitions of “life” and “intelligence” but those arguments usually lead nowhere because we really don’t have good definitions of those terms at this time.
Life and intelligence seem to me to require that an entity can at least sense what would be “better”, and generally also move or act in that direction.3
Even the simplest of biological life can detect and act in the direction of “better”, like how single-celled organisms propel themselves in the direction of food sources and enact behaviours that protect them from harmful substances or environments.
As some of the most complex living organisms of which we currently know, humans have the ability to mentally map out paths of action potentially many steps ahead. In some cases, these paths may initially seem “worse” but ultimately result in “better” circumstances for the person (e.g. deliberately exercising to eventually become healthier, acting frugally to save up for a big purchase, or investing time and effort into relationships or learning new skills). This ability to act in ways that result in better conditions for the being in the future seems to be what we intuitively associate with “intelligence” (and the more steps into the future we’re able to plan and successfully execute, the greater the “intelligence”).
Token prediction to articulate multiple steps in solving a problem might look like intelligence, but it really isn’t. It’s the probabilistic output of a machine that’s acting on encodings of language that have been painstakingly articulated by humans over centuries. LLMs have no inherent notion of or ability to act towards “better”.
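To make “probabilistic token prediction” a little more concrete, here’s a toy sketch. The vocabulary and probabilities are made up and hard-coded, and this is nothing like how a real model is implemented, but generation really is just repeatedly sampling a next token from a probability distribution like this one:

```python
import random

# A made-up distribution over possible next tokens, given a prompt like
# "The cat sat on the". A real LLM produces something like this over a
# vocabulary of tens of thousands of tokens, with the probabilities
# computed by a neural network rather than hard-coded.
next_token_probs = {
    "mat": 0.55,
    "floor": 0.20,
    "couch": 0.15,
    "keyboard": 0.08,
    "moon": 0.02,
}

# Sample one token according to those probabilities. "Generation" is just
# this, repeated one token at a time.
tokens, weights = zip(*next_token_probs.items())
print(random.choices(tokens, weights=weights, k=1)[0])
```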
Even if you do look at their winning predictions (i.e. the ones they output to their users) as having been produced from some pre-encoded notion of “better”, this speaks more to the notions of “better” defined by the people facilitating the training of the underlying models than any notion of “better” that the models themselves hold.
One could argue that the ecosystems surrounding LLMs (the software developers, businesspeople, customers, etc.) can detect and move towards “better”, but the LLMs themselves cannot. This would mean that LLMs could possibly get progressively better over time at predicting the right tokens to be useful to people in the ecosystem, but the moment they stop being useful in the ecosystem we’d stop paying for them and nobody would run them. Or we’d only really use them in specific, niche circumstances where they could reliably predict the right tokens.
The promise of “agents” is DOA
The idea that a robotic/computational “agent” will eventually know and tend to your needs before you do is utter nonsense. Even more so if they leverage token predictors under the hood, as probabilistic failures compound in multiplicative and potentially exponential ways.4
Even if machines could detect “better”, they certainly would not be able to detect better on your behalf. We humans can’t even detect better on behalf of other humans, which is why even in healthy, long-term relationships and friendships we still need to communicate about what we need and want. We can only really detect “better” on behalf of ourselves, and even then we often struggle. What hubris to think that we could automate a process that we ourselves can’t even understand or execute manually!
If we built machines that could detect “better” for themselves, we would most likely end up with Skynet. Doesn’t seem to me like a risk worth taking, especially in light of all the other problems humanity faces right now.
This doesn’t even begin to cover the security and privacy concerns introduced by modern agents being pushed progressively lower down the stack to the OS level. As if gaining more “context” by spying on your every move will somehow allow an agent to detect your intentions and needs, and ultimately to detect “better” on your behalf. This is, in my opinion, probably a combination of magical thinking, enormous stupidity5 when it comes to security, and possibly even a ruse to trick people into allowing themselves to be spied on to an even greater degree than they already are.
Security and privacy implications of “agent” integration
I’ll leave the more detailed discussion of the security and privacy implications of attempting to integrate agents into the OS layer to the vastly more knowledgeable folks at Signal.
Why spend a few hours every week/month doing something that you can spend weeks failing to automate, right? /s
My current aim with my personal finances is to try out k-NN to facilitate mapping transactions (descriptions and amounts) to accounts. Still very much a WIP, as I’m still working on figuring out the right embedding process.
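For the curious, here’s a rough sketch (in Python, using scikit-learn) of the kind of mapping I have in mind. The transaction descriptions and account names are made up, and the character n-gram TF-IDF step is just a stand-in for whatever embedding process I eventually settle on:

```python
# A rough sketch, not a final approach: classify new transaction
# descriptions by their nearest already-labelled neighbours.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Transactions I've already categorised by hand (description -> account).
labelled_descriptions = [
    "GROCERY STORE 123 CARD PURCHASE",
    "MONTHLY RENT STANDING ORDER",
    "COFFEE SHOP CARD PURCHASE",
    "ELECTRICITY PROVIDER DEBIT ORDER",
]
labelled_accounts = [
    "Expenses:Groceries",
    "Expenses:Rent",
    "Expenses:EatingOut",
    "Expenses:Utilities",
]

# Character n-gram TF-IDF as a placeholder "embedding", followed by a
# 1-nearest-neighbour classifier.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    KNeighborsClassifier(n_neighbors=1),
)
model.fit(labelled_descriptions, labelled_accounts)

print(model.predict(["GROCERY STORE 456 CARD PURCHASE"]))
# -> ['Expenses:Groceries'] (hopefully)
```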
This perspective is heavily influenced by Robert Pirsig’s Zen and the Art of Motorcycle Maintenance.
As an example, let’s say an LLM is correct 95% of the time (0.95) in predicting the “right” tokens to drive the tools that power an “agent” to accomplish what you’ve asked of it. Each step the agent takes therefore has a 95% probability of being correct. For a task that takes 2 steps, that’s a probability of 0.95^2 = 0.9025 (90.25%) that the agent will get the task right. For a task that takes 30 steps, we get 0.95^30 = 0.2146 (21.46%). Even if the LLM were right 99% of the time, a 30-step task would only have a probability of about 74% of having been done correctly.
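For anyone who wants to check the arithmetic or try other step counts, a quick sketch (assuming each step is independent and equally likely to be correct):

```python
# Probability that every step of an n-step task is done correctly,
# assuming each step is independent and correct with probability p.
def task_success_probability(p: float, steps: int) -> float:
    return p ** steps

print(task_success_probability(0.95, 2))   # ~0.9025
print(task_success_probability(0.95, 30))  # ~0.2146
print(task_success_probability(0.99, 30))  # ~0.7397
```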
I use Krishnamurti’s definition of “stupidity” to mean “wrong values”.

