FYI: The author has predicted that "AGI" will be here in 1-2 years and has staked his public reputation on it. He is personally invested in trendlines being lindy rather than sigmoid.
I don't think you can use lindy on trends as if trends are static objects, but that's another conversation.
Mind you, he is only personally invested insofar as he's staked his reputation on it. Throughout his writing, he makes the same point over and over again: he desperately wants AI to slow down, he advocates for policies that would slow it down, and most likely nothing would bring him greater peace than to see a sigmoid curve appear.
Ok, but you can just look at the METR curve. Mythos saturated the 50% time horizon. The 80% horizon is now at 3 hours. The rate of progress is accelerating, not slowing down. There’s no indication yet that this is a sigmoid!
> FYI: The author has predicted that "AGI" will be here in 1-2 years and has staked his public reputation on it. He is personally invested in trendlines being lindy rather than sigmoid.
He wrote articles arguing that pro-AI people are dismissive of risks, or even suggesting they are intellectually lazy. He's taken a side. If he's wrong, I would hope he owns up to it.
He only has 1.5 more months.
If he's wrong he needs to own it. Same for Eliezer Yudkowsky. But these people have too much riding on their brands. No one has the courage to fess up to being wrong. Given how many podcasts he and others have been on professing this belief, it will be hard to just pretend otherwise.
AI has scaled well according to convenient measures. Neural networks have the property that, whatever objective you define, they can rapidly be trained to master it. We’re able to show that various tasks of increasing complexity do not require intelligence and can be framed as autoregressive RL problems. I personally don’t think AI is any closer to sentient intelligence than LeNet was; it’s almost trivially clear, since we know how it works. So we’re measuring something orthogonal, basically how well a universal function approximator can fit a function we define, given arbitrary computing power, and calling that progress. What will be really interesting is if we find a way to properly measure what they can’t do and what’s different about real intelligence.
Edit: in particular, I don’t agree with:
> But if someone claims that the trend toward increasing AI capabilities will never reach some particular scary level...
One has to agree that the benchmark results are getting “scarier”, which is not automatically implied by finding more goals to optimize for.
If we don't understand the fundamental limits to any particular kind of trend, our default assumption should be that it will continue for about as long as it has gone on already.
We can, in fact, easily put a confidence interval on this. With 90% odds we're not in the first 5% or the last 5% of the trend. Therefore it will probably continue for somewhere between 1/19 and 19 times as long as it has lasted so far, with a median remaining duration equal to how long it has gone on already.
This is deeply counterintuitive. When we expect something to last a finite time, every year it goes on brings us a year closer to when it stops. But under this reasoning, every year that it goes on should raise our expectation of how much longer it will last by about a year.
We're looking at a trend. We believe that it will be finite. Our intuition is that every year spent is a year closer to the end. But our expectation becomes that every year spent means it will last yet another year more!
How can we apply that? A simple example is stocks. How long should we expect a rapidly growing company to continue growing rapidly?
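The interval above follows from assuming we observe the trend at a uniformly random point in its lifetime, so the fraction already elapsed, f, is uniform and the remaining time is elapsed × (1 − f) / f. A minimal Python sketch of that arithmetic (the function name `lindy_interval` is ours, not from any library):

```python
def lindy_interval(elapsed: float, confidence: float = 0.90) -> tuple[float, float, float]:
    """Return (low, median, high) estimates of the *remaining* duration.

    Assumes the fraction of the trend already elapsed, f, is uniform on (0, 1),
    so remaining / elapsed = (1 - f) / f.
    """
    tail = (1.0 - confidence) / 2.0            # 0.05 on each side for a 90% interval
    # Late in the trend (f = 1 - tail): little time remains.
    low = elapsed * tail / (1.0 - tail)
    # Early in the trend (f = tail): a lot of time remains.
    high = elapsed * (1.0 - tail) / tail
    # f = 0.5 gives remaining == elapsed.
    median = elapsed
    return low, median, high


if __name__ == "__main__":
    low, median, high = lindy_interval(10.0)    # a trend that has run for 10 years
    print(f"remaining: {low:.2f} to {high:.0f} years, median {median:.0f}")
```

For a 10-year-old trend this gives roughly 0.53 to 190 more years, with a median of 10, which is the 1/19 to 19 range described above.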
I feel like Lindy's law doesn't work for things whose observation is partly controlled by the thing itself.
For example, take something like a fad or trend; they don't have a hard end date like human lifespan, so it should follow Lindy's law.
However, the likelihood, on average across the population, that you observe a trend is going to be higher at the end of a trend lifecycle than at the beginning. This is baked into the definition - more and more people hear about a trend over time, so the largest quantity of observers will be at the end of the lifecycle, when the popularity reaches its peak.
In other words, if you are a random person, finding out about a trend likely means it is near the end rather than the middle.
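The observer-bias argument can be made concrete with a toy model. Assume (purely for illustration) that awareness of a trend grows like e^(rt) over its lifetime, rescaled here to [0, 1] so that `growth` stands for r times the lifetime. The rate at which new people hear about it is then also proportional to e^(rt), and the share of first encounters falling in the last 20% of the lifecycle has a closed form:

```python
import math


def share_of_discoveries_in_last(fraction: float, growth: float) -> float:
    """Share of first encounters that happen in the final `fraction` of a trend.

    Toy model (an assumption, not data): awareness grows as exp(growth * t) on
    t in [0, 1], so new encounters arrive at a rate proportional to exp(growth * t).
    The common 1/growth factor of both integrals cancels in the ratio.
    """
    total = math.exp(growth) - 1.0
    late = math.exp(growth) - math.exp(growth * (1.0 - fraction))
    return late / total


def share_numeric(fraction: float, growth: float, steps: int = 100_000) -> float:
    """Same quantity by a midpoint Riemann sum, as a sanity check."""
    dt = 1.0 / steps
    rates = [math.exp(growth * (i + 0.5) * dt) for i in range(steps)]
    cutoff = int(round((1.0 - fraction) * steps))
    return sum(rates[cutoff:]) / sum(rates)


if __name__ == "__main__":
    for g in (1.0, 5.0, 10.0):
        print(f"growth={g:>4}: {share_of_discoveries_in_last(0.2, g):.1%} of people "
              "first hear about it in the last 20% of its life")
```

With no growth the answer would be 20%; with steep growth most people first encounter the trend near its end, which is the commenter's point.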
While this is very fun as a mathematical exercise, it's completely irrelevant as a real tool for getting a better understanding of unknown processes in the real world.
The law only applies to certain types of processes, and is completely wrong for other types (e.g. a human who has lived 50 years may live 50 more, but one who has lived 100 years will certainly not live 100 more). So the question becomes: what type of process are you looking at? And that turns out to be exactly the question you started with: is there a fundamental limit to this growth curve or not?
It's an interesting idea, and it may be something that could be mathematically justified, but I do think this is an abuse of Lindy's Law in the absence of such a justification. Per Wikipedia [1]:
"The Lindy effect applies to non-perishable items, like books, those that do not have an "unavoidable expiration date"."
And later in the article you can see the mathematical formulation which says the law holds for things with a Pareto distribution [2]. I'd want to see some sort of good analysis that "the life span of exponential growth curves" is drawn from some Pareto distribution. I don't think it's completely out of the question. But I'm also nowhere near confident enough that it is a true statement to casually apply Lindy's Law to it.
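To see why the Pareto condition matters, here is a small simulation. Under a Pareto lifetime, the median remaining life of survivors grows in proportion to their age (the Lindy property); under an exponential, memoryless lifetime it stays constant. The distributions and parameters are illustrative assumptions, not estimates for any real trend:

```python
import random
import statistics


def median_remaining(samples: list[float], age: float) -> float:
    """Median of (lifetime - age) among samples that survived past `age`."""
    survivors = [x - age for x in samples if x > age]
    return statistics.median(survivors)


def main(n: int = 200_000, seed: int = 0) -> dict[str, list[float]]:
    rng = random.Random(seed)
    # Pareto with shape 1 and scale 1: P(T > x) = 1 / x for x >= 1.
    # Given survival to age a, the median lifetime is 2a, so median remaining = a.
    pareto = [rng.paretovariate(1.0) for _ in range(n)]
    # Exponential with mean 4: memoryless, median remaining = 4 * ln 2 at any age.
    expo = [rng.expovariate(0.25) for _ in range(n)]
    ages = [2.0, 4.0, 8.0]
    return {
        "pareto": [median_remaining(pareto, a) for a in ages],
        "exponential": [median_remaining(expo, a) for a in ages],
    }


if __name__ == "__main__":
    for name, medians in main().items():
        print(name, [round(m, 2) for m in medians])
```

The Pareto medians come out near 2, 4 and 8 (Lindy), while the exponential ones all sit near 2.77, so whether the heuristic applies really does hinge on which distribution the lifetimes come from.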
I hadn't tried to give it a name, or thought to apply it outside of that context.
As for the mathematical qualms, I'm a big believer in not letting formal mathematical technicalities get in the way of adopting an effective heuristic. And the heuristic reasoning here is compelling enough that I would like to adopt it.
The argument sounds nice, but it's just wrong. It only works if most processes you're going to encounter that you know nothing about happen to be Lindy processes. If most processes happening around you that you know nothing about are not of that type, then the argument fails.
Closely related is Laplace's Rule of Succession [1], which basically says that (absent other information) the odds of something happening next time go down the more times in a row it doesn't happen (and vice versa).
So for example, the longer a time bomb ticks, the less likely it is to go off any time soon. (Assuming the timer isn't visible.) :)
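Laplace's rule estimates the probability of the event on the next trial as (s + 1) / (n + 2) after s occurrences in n trials. A short sketch with exact fractions, applied to the ticking-bomb example; treating each observed second as an independent trial is our simplifying assumption:

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate that the next trial is a 'success'.

    Equivalent to starting from a uniform prior on the unknown success
    probability and taking the posterior mean after `trials` observations.
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


if __name__ == "__main__":
    # The bomb has ticked for n seconds without going off (0 "successes").
    for n in (0, 8, 98):
        print(n, rule_of_succession(0, n))   # 1/2, then 1/10, then 1/100
```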
You can do that but you're laundering ignorance into precise-seeming mathematics. Better to just say "we're probably somewhere in the middle, not at the beginning or end" and leave it at that. Calling a peak is hard.
You speak about laundering ignorance into precise-seeming mathematics as if it was a bad thing.
But that's the entire idea of Bayesian reasoning, which has proven to be surprisingly effective in a wide range of domains.
I'm all for quantifying my ignorance, and using it as an outside view to help guide my expectations. Read the book Superforecasting to understand how effective forecasters use an outside view to adjust their inside view, to allow them to forecast things more precisely.
I think an interesting thing about recent AI developments is that it's all happening right as we hit the diminishing-returns side of another "exponential that's actually a sigmoid": Moore's law.
The naive expectation is that AI will slow down b/c Moore's law is coming to an end, but if you really think about the models and how they are currently implemented in silicon, they are still inefficient as hell.
At some point someone will build a tensor processing chip that replaces all the digital matmuls with analogue log-amp matmuls, or some breakthrough in memristors will start breaking down the barrier between memory and compute.
With the right level of research funding in hardware, the ceiling for AI can be very high.
Even at orders of magnitude greater speed, we've still hit diminishing returns for quality of output. We simply haven't found anything like superhuman reasoning ability, just (potentially) superhuman reasoning speed.
It's not that easy to assess diminishing returns with saturated benchmarks, where asymptoting to 100% is mathematically baked in. I could point to the number of Erdős proofs being solved by AI going from 0 to many very recently as evidence of acceleration.
That is not evidence of acceleration, just of some measurable improvement compared to a previous model. After all, humans have made these breakthroughs since before recorded history—that never by itself implied accelerating intelligence.
I disagree with this. Reinforcement learning with verifiable rewards is actually the secret sauce that is letting Claude and GPT automate software engineering tasks.
All the easily verifiable domains such as mathematics, coding, and things that can be run inside a reasonable simulation are falling very very fast.
By next year if not sooner, mathematicians will be wildly outpaced by LLMs for reasoning.
They already put a model into silicon, and it's crazy fast: https://chatjimmy.ai/
I'm pretty sure there's a 3-year design goal starting this year that'll do that to any of the Qwen, DeepSeek, etc. models. There's a lot you could do with sped-up models of this quality.
It might even turn out that the real bubble is in giant data centers we don't need, when 80-90% of use cases could just be served by a silicon chip with a model baked in rather than, as you say, bloated SOTA models.
And this is an ASIC that still operates digitally. Imagine a chip with baked-in weights that does its math in analogue, with a 20x reduction in the number of circuit elements needed for a multiplication op.
If there's a breakthrough in memristors, you could end up with another 20x reduction in circuit elements (get rid of memory bottlenecks, and start doing multiplication ops as addition of log-transformed voltages).
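The "multiplication as addition of logs" idea rests on the identity log(a) + log(b) = log(ab), the same principle behind slide rules and logarithmic number systems. A minimal numeric sketch; it only illustrates the arithmetic, says nothing about how an actual analogue or memristor circuit would implement it, and the separate sign handling is one simple assumption:

```python
import math


def log_domain_multiply(a: float, b: float) -> float:
    """Multiply two numbers by adding their logarithms.

    Magnitudes are combined in the log domain; the sign is tracked separately,
    since log() is only defined for positive values.
    """
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return sign * math.exp(math.log(abs(a)) + math.log(abs(b)))


def log_domain_dot(xs: list[float], ws: list[float]) -> float:
    """Dot product: each product formed in the log domain, then summed linearly."""
    return sum(log_domain_multiply(x, w) for x, w in zip(xs, ws))


if __name__ == "__main__":
    xs, ws = [0.5, -2.0, 3.0], [4.0, 0.25, -1.5]
    print(log_domain_dot(xs, ws), sum(x * w for x, w in zip(xs, ws)))
```

The sketch glosses over the hard part: accumulating the products requires converting back out of the log domain, which is where such designs tend to spend their complexity.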
> The moral of the story is that, even though all exponentials eventually become sigmoids, this doesn’t necessarily happen at the exact moment you’re doing your analysis. Sometimes they stay exponential for much longer than that!
All exponentials eventually become sigmoids? Don’t think this can be true without qualifiers.
All models are wrong, of course, but this is kind of "common sense", so it's not hard to accept as true in a natural system. How can something continue growing exponentially forever without hitting a new blocker that causes a slowdown, or encountering pushback that turns it into an oscillator? A pendulum looks exponential when it is at its peak and accelerating down.
The issue is that the exponential-looking part of the sigmoid might contain all of human history, sure, but most folks who espouse this theory probably agree that over time everything reaches a steady-enough state to be considered non-exponential, or become oscillatory.
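A quick way to see why the switch from exponential to sigmoid is hard to time: compare a logistic curve K / (1 + e^(−r(t − t0))) with the pure exponential K·e^(r(t − t0)) that it tracks early on. Their ratio is exactly 1 + e^(r(t − t0)), so the two agree to within about 1% until roughly 4.6/r time units before the midpoint t0, and differ by only a factor of 2 at t0 itself. The parameter values are arbitrary:

```python
import math


def logistic(t: float, K: float, r: float, t0: float) -> float:
    """Sigmoid with ceiling K, growth rate r and midpoint t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def matching_exponential(t: float, K: float, r: float, t0: float) -> float:
    """Exponential that the logistic approaches for t << t0."""
    return K * math.exp(r * (t - t0))


if __name__ == "__main__":
    K, r, t0 = 1000.0, 0.5, 20.0   # illustrative values only
    for t in range(0, 31, 5):
        s = logistic(t, K, r, t0)
        e = matching_exponential(t, K, r, t0)
        print(f"t={t:2d}  logistic={s:9.3f}  exponential={e:11.3f}  ratio={e / s:8.3f}")
```

Until the data reach the neighborhood of t0, they are essentially uninformative about where the ceiling K is.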
I don't know what the Y-axis is supposed to be on that Wharton AI capabilities graph, but I am not really convinced that Opus 4.6 has more than double the intelligence/capability/whatever of GPT 5.1 Max.
IIRC that graph tracks capabilities as time_to_solve a task for humans (i.e. the model can now handle tasks that usually take a human ~8h). Which, depending on what tasks you look at, could be a reasonable finding. I could see Opus 4.6 handling tasks that take ~8h for humans, and that 5.1 couldn't previously handle (with 5.1 being "limited" at 4h tasks let's say). It is a bit arbitrary, but I think this is what they're tracking.
It also measures task coherence—ability to plan, form contingencies, recover from errors, mitigate accumulation of errors, and reconcile findings across a long context window.
Without knowing more about their methodology, it seems like a lot of the recent improvements have involved the AI itself taking time to complete the task.
At first the models turned a 5 minute task into a 5 second task (by 5 seconds I mean a very short amount of time, not precisely 5 seconds). Then they turned a 15 minute task into a 5 second task.
Opus 4.6 completes 8 hour tasks all the time but (at least in my experience) it isn't spitting the answer out in 5 seconds anymore. It's using chain of thought and tools and the time to completion is measured in minutes or maybe hours.
In my experiments with local LLMs, a substantial part of the gap between frontier and local (for everyday use) is in tooling and infrastructure.
That is why I am sympathetic to the idea we are leveling off. But to bring in the air speed example from the article, I don't think we've reached the equivalent of the ramjet yet. I suspect in the coming years there will be new architectures, new hardware, and new ways to get even more capable models.
It measures the ability to complete (with a given success rate) a task with a known human benchmark time to complete. I.e., they set the task to human volunteers and timed how long they took to complete it.
The tasks are obviously all of the form "Go do this, and if you get the following output you passed". Setting up a web server apparently takes 15 minutes for a human, which is news to me since I'm able to search for https://gist.github.com/willurd/5720255, find the python one-liner, and copy it within about ten seconds.
Anyway, this is cool but it does not mean Claude can perform any human tasks that take less than 8 hours and are within its physical capabilities.
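As I understand METR's approach (treat this as an assumption, not their code), the time horizon comes from fitting a logistic curve of a model's success probability against the logarithm of the human completion time, then reading off where that curve crosses 50% (or 80%). A sketch of just the read-off step, with made-up fit parameters chosen so the 50% horizon lands at 8 hours; `success_prob` and `horizon` are hypothetical names:

```python
import math


def success_prob(minutes: float, a: float, b: float) -> float:
    """Logistic model of success vs. log2(human task time); b < 0 means longer tasks are harder."""
    return 1.0 / (1.0 + math.exp(-(a + b * math.log2(minutes))))


def horizon(p: float, a: float, b: float) -> float:
    """Human task length (minutes) at which the fitted success rate equals p."""
    logit = math.log(p / (1.0 - p))
    return 2.0 ** ((logit - a) / b)


if __name__ == "__main__":
    b = -0.6                        # hypothetical slope per doubling of task length
    a = -b * math.log2(480.0)       # hypothetical intercept: 50% horizon at 480 min (8 h)
    for p in (0.5, 0.8):
        print(f"{p:.0%} horizon = {horizon(p, a, b) / 60:.1f} hours")
```

With a negative slope, the 80% horizon is always shorter than the 50% one; how much shorter depends entirely on the fitted slope, which is why the two figures quoted in this thread differ.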
> more than double the intelligence/capability/whatever
I'm curious what people really mean when they say this. Intelligence is famously hard to define, let alone measure; it certainly doesn't scale linearly; it only loosely correlates to real-world qualities that are easy to measure; etc. Are you referring to coding ability or...?
According to this article: whenever someone games a benchmark to make an upward chart on some y-axis, it's YOUR responsibility to prove how and why that trend can't continue indefinitely.
I don't know when the sigmoid is going to kick in, but Nvidia's quarterly data center revenue has grown 15-fold over the past 3 years [1], and nobody, including Scott, believes this is sustainable for 3 more years; otherwise Nvidia's market cap would conservatively be at least an order of magnitude higher than it is.
All exponentials eventually become sigmoids because exponential growth always exposes limiting factors that weren't limiting at the beginning. Silicon manufacturing had lots of room for high-margin customers like Nvidia even a year ago (by the mere virtue of outbidding lower-margin customers), but now that room is mostly gone, and no amount of money will make fabs build themselves overnight.
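The arithmetic behind that skepticism, taking the 15-fold-in-3-years figure at face value (it is the commenter's number, not checked here):

```python
def annual_factor(total_growth: float, years: float) -> float:
    """Constant yearly multiplier that compounds to `total_growth` over `years`."""
    return total_growth ** (1.0 / years)


def project(total_growth: float, years: float, extra_years: float) -> float:
    """Cumulative multiple versus the starting level if the same rate continues."""
    return annual_factor(total_growth, years) ** (years + extra_years)


if __name__ == "__main__":
    g = annual_factor(15.0, 3.0)
    print(f"implied growth: {g:.2f}x per year")
    print(f"3 more years at that pace: {project(15.0, 3.0, 3.0):.0f}x the starting level")
```

About 2.47x per year; three more years at that pace means another 15x, i.e. 225x the level of three years ago, which is the scenario the market cap argument says nobody is pricing in.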
My mental model has been 3D computer graphics: doubling the polygon count had huge returns early on but delivered diminishing returns over time.
Ultimately, you can't make something look more realistic than real.
I don't know what the future holds, but the answer to the question "can LLMs be more realistic than real" will determine much about whether or not you think the curve will level off soon.
> What if you don’t fully understand the process? AI forecasters know some things (like how data centers work and how much it costs to build them). But they’re unsure about other things (researchers keep inventing new paradigms of data generation that get over data walls, but for how long?), and other things are entirely opaque (What is intelligence really? Why do scaling laws work? Might they just stop working at some point?) Is there anything you can do here?
This is the crux of the article. To a large extent continued progress depends on a stable increase in compute, an increase in training data, and an increase in good ideas to squeeze more out of both of them.
One calculation you could do is a survival function: for each of the above, how long before it is disrupted? For example, China could crack down on AI or invade Taiwan. Or data centers become politically unpopular in the US. Or, we could run out of great ideas. Very hard to predict.
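One hedged way to set up that survival calculation: give each disruption an annual hazard rate, assume the risks are independent and constant over time, and multiply the individual survival curves. The risk names loosely follow the examples above, but the rates are placeholders, not forecasts:

```python
import math

# Hypothetical constant hazard rates (per year), for illustration only.
HAZARDS = {
    "Taiwan / compute supply shock": 0.05,
    "political backlash against data centers": 0.04,
    "running out of good ideas": 0.08,
}


def survival(years: float, hazards: dict[str, float]) -> float:
    """P(no disruption within `years`), assuming independent constant hazards.

    Each risk i is avoided with probability exp(-h_i * t); independence means the
    joint survival is the product, i.e. exp(-(sum of h_i) * t).
    """
    return math.prod(math.exp(-h * years) for h in hazards.values())


def median_time_to_disruption(hazards: dict[str, float]) -> float:
    """Solve survival(t) = 0.5 for constant hazards: t = ln 2 / sum(h_i)."""
    return math.log(2.0) / sum(hazards.values())


if __name__ == "__main__":
    for t in (1, 3, 5, 10):
        print(f"P(trend undisrupted after {t:2d} years) = {survival(t, HAZARDS):.2f}")
    print(f"median time to first disruption: {median_time_to_disruption(HAZARDS):.1f} years")
```

The hard part, as the comment says, is the inputs: the structure is simple, but every number in `HAZARDS` is a guess.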
We did hit the sigmoid's plateau on airplane speed, but the applications of airplane speed are still coming (how fast can a Chinese company airship the PCB you ordered three minutes ago?). I expect the same will happen with LLMs, though I also happen to believe things are just getting started on end capabilities.
> It’s true that birth rates must eventually flatten out and become sigmoid
All positive growth eventually flattens out and becomes sigmoid, but a lot of phenomena experience negative growth and nose dive. No gentle curve, but a hard kink and perfect flat line at zero. Forever. I think it would be a stretch to categorize that pattern as sigmoid. Predicting a sigmoid pattern for negative growth implies some sort of a soft landing (depending on your definition of soft).
We can think of many populations that are no longer with us. So just a caution about over applying this reasoning in the negative case.
The curve is a smoothed step function (y = 1 if x > 1, otherwise 0). Nature doesn't allow any change to happen instantly, at any order of rate of change. The curve is just the manifestation of a change with its sharp corners exponentially smoothed.
For example, when a car starts, its speed and acceleration become greater than zero. But what about higher-order rates of change? The car doesn't jump suddenly from zero acceleration to non-zero acceleration. That means the car has a non-zero derivative at every order. In other words, the movement is exponential. The same thing happens in reverse when the car reaches a constant speed.
I like how this article, which argues that at any given point we should assume we are exactly halfway through a phenomenon, relies on a single data point on a graph (one that apparently doesn’t need its relevance or importance explained) to illustrate that this is obviously true for AI in particular.
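The "smoothed step" picture can be checked numerically: a logistic 1 / (1 + e^(−k(x − 1))) is smooth for every finite steepness k, and as k grows it converges to the step y = 1 if x > 1, otherwise 0 (at x = 1 itself it always equals 1/2). A minimal sketch:

```python
import math


def step(x: float) -> float:
    """The sharp step: 1 if x > 1, otherwise 0."""
    return 1.0 if x > 1.0 else 0.0


def smooth_step(x: float, k: float) -> float:
    """Logistic centered at x = 1; larger k means a sharper transition."""
    z = -k * (x - 1.0)
    if z > 700.0:            # avoid overflow in math.exp for very steep curves
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


if __name__ == "__main__":
    xs = [0.5, 0.9, 0.99, 1.01, 1.1, 1.5]
    for k in (1.0, 10.0, 100.0, 1000.0):
        worst = max(abs(smooth_step(x, k) - step(x)) for x in xs)
        print(f"k={k:>6}: max gap to step on sample points = {worst:.4f}")
```

The logistic is one convenient smoothing; it is not the only smooth function that approaches a step.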
Such a long article to say that neither side has a fucking idea about what will happen next.
While we're at it, the "exponentials are actually sigmoids" meme is not necessarily true. While real-world exponentials never stay exponential forever, they are not guaranteed to turn into sigmoids. Overshoot-and-collapse examples also happen in tech, e.g. the dot-com bubble, or the successive AI winters.
If the scary AI is so inevitable, why do you feel such an overwhelming need to convince people about that? Surely you can just wait a bit, and they'll see for themselves.
By that reasoning, why even warn people about anything? Why do road construction crews put up signs saying "ROAD CLOSED AHEAD" when you can just drive on and see for yourself?
Indeed, why warn people about real things that exist in the world? That is EXACTLY the same as inciting fear about something imaginary (not even projected).
In your mind, dangers from AI are imaginary and not even projected, therefore, you don't see any reason to warn about them, because you don't think the dangers are real. You don't believe the road is actually closed up ahead, so you don't think it's necessary to post the sign.
In Scott's mind, dangers from AI are not a known fact, but are somewhere between highly probable and a near-certainty. In his mind, there are well-grounded justifications for believing that AI poses substantial future dangers to the public. Therefore he also believes he should inform people about this, and strives to convince skeptics, so that we might steer clear.
It's easy to understand why someone who believes what you believe about AI would of course not warn people about AI. It's also easy to understand why someone who believes what Scott believes about AI would want to warn people about AI. Your contention is with his confidence for being worried about AI, not his reason for wanting to warn people.
Gosh, it's quite embarrassing to have to spell it out, but you inserted the part about Scott's motivations. It can't be found in the text.
Neither can any specific discussion of what the dangers are and how we can steer clear. It all comes preplanted in your head. The only thing that Scott is playing on (as far as we can see) is your ingrained fear, by using an ominous headline, and a vague reference to something "scary" in the conclusion.
Of course there was no reason to "warn" you, you already believed in the scary future. Scott is just giving you fuel, which you seem to appreciate.
Yeah! And if climate change is so inevitable, why do the people who want to prevent it from happening seem hell-bent on convincing people that climate change is real?
1. It's not inevitable.
2. Those that see AI as an existential risk don't generally think it's a guarantee, but if it's say a 5% chance then that's worth addressing/mitigating.
3. That's not what this article was even about.
Sounds like the burden is on you to explain either
1. If you're not treating my claim as a black box, explain explicitly: what is your model of what the article was about? Are you aware, for example, of the last paragraph of the article? I think that WAS what the article was about. Do you have specific opinions on, e.g., how I went wrong and where my model differs?
2. If you are treating it as a black box, what's your default expectation based on the law of Nothing Ever Happens?
Just kidding, you don't need to explain anything. A"I" fearmongers should though.
The point of the article is that people are historically bad at predicting when exponential curves plateau, even if they're correct that there will be a plateau.
This does *not* imply the inevitability of AGI. It does not imply AGI is necessarily bad.
It does mean that "the capabilities of AI will eventually plateau" offers no meaningful predictive power or relevance to the overall AI discussion.
My own bet is end of that decade: somewhere between 2045 and 2050.
Ofc "full labor automation" has a certain spread of meaning. A sliver of the population will always find ways to hold on to a job or run one or many businesses. But there will be "enough" labor automation for it to be a social ticking bomb. That, in fact, does not depend on better models or better AI than we have today. By 2045 there will be a couple of generations that have been outsourcing their thinking to AI for most of their adult lives. Some of them may still work as legal flesh of sorts, but many won't get to be middlemen and will find no job.
Also, if you could replace your senator today by an untainted version of a frontier model (of today), would you do it? Would it be a better ruler? What are the odds of you not wanting to push that button in the next twenty years, after a few more batches of incompetent and self-serving politicians?
The complexity of our human world has gone up so much that humanity actually needs something like AI to ensure further progress. It's impossible to expect a human to learn all the fields in a shallow manner (and be a generalist politician) or one field in full depth (i.e. be an expert pushing the frontier).
The other thing people don’t understand is that exponential curves are self-similar: the start of an exponential looks like an exponential, and so does any later stretch of it. People always look at one and think ‘well, that’s it, it’s exponential now, I’ve missed it, it can’t be sustained’. Nope.
A good example of this is the number of submissions to NeurIPS/ICML/ICLR. In 2017 that curve was exponential.
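The self-similarity point is easy to verify: for f(t) = e^(rt), f(t + c) / f(t) = e^(rc) wherever t is, so every window of the same width shows the same growth factor and the same doubling time. Rescaled to its own starting value, any segment of an exponential is the same curve. A minimal check with an arbitrary rate `R`:

```python
import math

R = 0.35  # arbitrary growth rate per unit time, for illustration


def f(t: float) -> float:
    return math.exp(R * t)


def rescaled_window(start: float, width: float, points: int = 5) -> list[float]:
    """Sample f on [start, start + width], divided by f(start)."""
    step = width / (points - 1)
    return [f(start + i * step) / f(start) for i in range(points)]


if __name__ == "__main__":
    print("doubling time:", math.log(2) / R)
    early = rescaled_window(0.0, 10.0)
    late = rescaled_window(50.0, 10.0)
    print([round(v, 3) for v in early])
    print([round(v, 3) for v in late])   # same shape as the early window
```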
"Exponentials all tend to become sigmoids but you can't predict exactly when" is a true statement, but I'm not sure it needed an article.
This doesn't say much, and the author fights their own points a couple times, suggesting that they maybe didn't think through what they wanted to write until they were in the middle of writing it and started realizing their assumptions didn't match what they expected the data to say.
The point is the tiring arguments from AI skeptics saying “things are flattening, they have to” which while technically correct says nothing because no one knows when that will happen and we see no mechanism for this yet. Lindy’s law as a reasonable prediction under total uncertainty is interesting and insightful and a lot of people don’t know about it or why it holds. I did enjoy the reference to this!
Nah this is making a category error. You're assuming that AI skeptics agree that models are demonstrating intelligence along the same axis as humans and that with further improvement they will become equivalent to humans. I am an AI skeptic, and I disagree with this assessment.
Model reasoning is on an s-curve, which is improving.
Model intelligence is not the same as reasoning. It's a different axis, and one I have not seen much movement on.
See, humans have a recursive form of intelligence which is capable of self-reflection and introspection. LLMs can only reason about tokens which have already been emitted. Humans and LLMs do not share the same form of reasoning, and general human-like intelligence will not arise from the current architecture of LLMs. Therefore it is a mistake to assume that continual improvement on the reasoning scale will result in something that is equivalent enough to humans on the intelligence axis to replace all labor.
> You're assuming that AI skeptics agree that models are demonstrating intelligence along the same axis as humans and that with further improvement they will become equivalent to humans.
No definitely not saying this and I don’t quite know what it means
> Model reasoning is on an s-curve, which is improving.
Is this saying two different things? I think I might agree with this in principle, as in maybe there is some sort of S-curve or something like it, but do we see evidence of this? Where?
> Model intelligence is not the same as reasoning. It's a different axis, and one I have not seen much movement on.
Can you clarify this? What is the distinction and what makes you say you have “not seen much progress?”
> See, humans have a recursive form of intelligence which is capable of self-reflection and introspection. LLMs can only reason about tokens which have already been emitted
LLMs do self reflection and introspection in context, and tweaks such as value functions (serving a similar purpose to intuition or emotion) may make this better? Why do you feel self reflection and introspection are a fundamental limitation here? Models reason over tokens they have emitted and also with their own sense and learned behavior already. Are you just talking about continual learning? Also I feel people just latch onto LLMs as if this is all of AI. Why? SSMs, memory networks, recurrent neural networks etc etc etc are all part of AI but aren’t as popular because they can’t yet compete with LLMs in terms of scaling laws and training efficiency due to e.g. hardware and software optimization and investment being focused on LLMs. If something else comes along that works better we’ll just start scaling that.
> Humans and LLMs do not share the same form of reasoning, and general human-like intelligence will not arise from the current architecture of LLMs.
Very strong statement, any theoretical or experimental basis for this? I also don’t particularly care personally other than as a point of curiosity. Why does it matter if AI systems will develop equivalent reasoning mechanisms as humans? In fact it may be much better not to.
> Therefore it is a mistake to assume that continual improvement on the reasoning scale will result in something that is equivalent enough to humans to replace all labor.
Idk, I didn’t say this explicitly, but I also don’t think it matters whether we have a system “equivalent to humans” or one that “replaces all labor”.
Slate Star Codex, the original article, was making the argument that "model intelligence" is on an s-curve and from there it was drawing the conclusion that the curve will likely continue and models will reach human level intelligence or beyond.
I am making that argument that how we measure model intelligence is flawed, and we are actually measuring something that is closer to "reasoning" than "intelligence". If you want evidence, we'll need a different form of tests, but how about I just gesture at the fact that GPT supposedly outscored PhDs on a broad range of subjects at least a year ago and to date is not replacing PhD jobs.
We see this pattern of high scores on tests but mediocre performance in the real world all over the place. From that, I draw the conclusion that it can reason like a PhD, but it can't think like a PhD.
So, we may see an s-curve on the measure of model reasoning but that doesn't imply they will overtake us or even match us on measures of intelligence.
As to your other questions:
> LLMs do self reflection and introspection in context,
> Why do you feel self reflection and introspection are a fundamental limitation here? Models reason over tokens they have emitted and also with their own sense and learned behavior already. Are you just talking about continual learning?
I disagree that models are reflecting and introspecting in a way equivalent to human intelligence here. They can reason over tokens which have been emitted, but by the same measure they cannot reason over tokens which have not been emitted. It's hard to make this point without drawing some diagrams, but I believe that human intelligence has internal loops, where many ideas may be turned over simultaneously before an action is taken. In comparison, an LLM might "feel uncertain" about a token before emitting it, but once it is emitted that uncertainty and the other near neighbor options are lost and the LLM is locked into the track that was set by the top-choice token. I think this is where hallucinations arise from, amongst other issues.
Context isn't sufficient for an internal reasoning loop because the tokens that compose context lose a lot of the information the network itself generated when picking those tokens. They occupy a much lower dimensional space than the "internal reasoning" processes of the transformer do.
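A toy illustration of the point about emitted tokens: a hypothetical next-token distribution is produced from made-up logits, and greedy decoding appends only the winning token to the context. The runner-up and the model's uncertainty (measured here as entropy) are not part of what gets appended. This deliberately ignores the hidden states a real transformer caches for earlier positions, so it illustrates the argument rather than settling it:

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def entropy_bits(probs: dict[str, float]) -> float:
    """Shannon entropy of the distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def greedy_step(context: list[str], logits: dict[str, float]) -> tuple[list[str], dict[str, float]]:
    """Append the most likely token; return the new context and the discarded distribution."""
    probs = softmax(logits)
    choice = max(probs, key=probs.get)
    return context + [choice], probs


if __name__ == "__main__":
    # Hypothetical logits for a near tie between two continuations.
    logits = {"yes": 2.0, "no": 1.9, "maybe": 0.5}
    context, probs = greedy_step(["The", "answer", "is"], logits)
    print("distribution:", {t: round(p, 3) for t, p in probs.items()})
    print(f"uncertainty before emitting: {entropy_bits(probs):.2f} bits")
    print("what the next step sees:", context)
```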
>> Humans and LLMs do not share the same form of reasoning, and general human-like intelligence will not arise from the current architecture of LLMs.
> Very strong statement, any theoretical or experimental basis for this?
It's just my theory, but this is what I have been gesturing at. You already know about RNNs so I'll put it in those terms: the core of an intelligent network should be an RNN, not a transformer, but we fundamentally cannot train a network like that to work like an LLM because backprop doesn't work when there is infinite recursion and without being able to bootstrap off of the knowledge and reasoning baked into human text, there's no sufficient source of training material beyond being embodied.
---
EDIT:
I missed this, which I also want to reply to:
> Why does it matter if AI systems will develop equivalent reasoning mechanisms as humans? In fact it may be much better not to.
I actually agree that it may be better if they did not develop equivalent reasoning, but I don't see a world in which machines replace human labor without being intellectually equivalent.
As I think about it, though, "dumb" machines which can follow reasoning but not think like humans are a rather scary proposition, honestly. Seems like a tool that would be wielded without restraint by those in power to control those who aren't.
But those skeptics are initially responding to the constant AI hype claims that we are exponentially growing to AGI. So this article is in fact just a (very poorly thought through) attempt at saying “nuh uh, the hype might be true, you can’t prove it’s not yet!”
Yet the evidence is on the side of the hype? We don’t see any mechanism or cogent framework for what limits exist here theoretically that I’m aware of, are you? Epoch had a great article a year ago looking at several bottlenecks in terms of scale and back then we were about 4 orders of magnitude away from hitting them. We’re probably now closer to 3. Yet scale is only part of the performance equation, a fairly big chunk of progress is from algorithmic or curation related contributions. The point of the article is:
> But those skeptics are initially responding to the constant AI hype claims that we are exponentially growing to AGI.
This is a meaningless statement or at best just strawmanning.
I find him more interesting when he talks about non-AI topics. Lots of other interesting people are like this too. I'd rather get my knowledge on AI from people who have unique insights into it. Scott has a lot of unique perspectives of his own, but his views on AI are bog-standard for his social group.
Lindy's Law is not actually a law and many exact minds will be provoked by the very name; it also fails spectacularly in certain contexts (e.g. lifetime of a single organism, though not necessarily existence of entire species).
But at the same time, I am willing to take its invocation in the context of AI somewhat seriously. There is an international arms race with China, which has less compute, but more engineers and scientists. This sort of intellectual arms race does not exhaust itself easily.
A similar space race in the 1950s and 1960s progressed from the first unmanned spaceflight to a moonwalk in a mere 12 years, which is probably less than it takes to approve a bicycle lane in Chicago now.
I keep seeing this. Where did it come from? Has China said that they intend to attack other countries using AI? Have other countries declared that they intend to attack China with AI?
Also, why does anyone believe that AI could actually be that dangerous, given its inherently unpredictable and unreliable performance? I would be terrified to rely on AI in a life or death situation.
AI in war is like Palantir's whole business model. You have a system that can deal effectively with ambiguity and has superhuman performance on reasoning, plus superhuman physical abilities via embodiment…
Inherent unpredictable and unreliable performance is also quite the feature of human beings as well.
It was a metaphor. I meant, and later clarified, an intellectual arms race.
BTW your handle is an actual Czech word, minus a diacritic sign ("křupan"), and a bit amusing one. It basically means hillbilly. Not that it matters, just FYI.
Anyway: AI will be used in military context, and it probably already is. Both for target acquisition and maybe even driving the weapon itself. As of now, the Ukrainians are almost certainly operating some AI-enabled killer drones.
It's not a law per se, but there are rules for reasoning under uncertainty to get the most out of what limited knowledge you have, and Lindy's law arises from that. To do better than Lindy's law requires having additional information about the problem beyond just the one data point.
I think there are many ways someone with his lack of expertise can still be valuable, including:
- Making connections to other subjects that an expert would miss. The hall of fame of sigmoid predictions is just excellent, I already know I'm going to be reminded of it some time in the future. Very entertaining way to get the point across.
- Writing about tricky concepts in a very accessible and elegant way, which experts are notoriously bad at doing themselves - they are often optimizing for other specialists.
- Being able to write with an air of speculation and experimentation with ideas that experts and institutions often can't afford. Experts have to maintain their track record; Scott Alexander can say "lol just double the timeline"
you do you, I don't come here for superficially informed-looking articles written by people who are in fact not experts, informed or educated, I come here for the real deal
it doesn't help that sCotT aLexAndEr is also as close as you can come to the modern dressed up version of a eugenicist (again, not based on any actual expertise)
I don't think you can use lindy on trends as if trends are static objects, but that's another conversation.
I mean, that's called "having an opinion".
Edit: in particular I don’t agree with
One has to agree that the benchmark results are getting “scarier”, which is not automatically implied by finding more goals to optimize for.
If we don't understand the fundamental limits to any particular kind of trend, our default assumption should be that it will continue for about as long as it has gone on already.
We can, in fact, easily put a confidence interval on this. With 90% odds we're in neither the first 5% nor the last 5% of the trend. Therefore it will probably go on for somewhere between 1/19th and 19 times as long as it has lasted so far, with a median of exactly as long as it has gone on so far.
This is deeply counterintuitive. When we expect something to last a finite time, every year it goes on brings us a year closer to when it stops. But, properly reasoned, every year that it goes on raises the expectation that it will go on for a year longer still.
We're looking at a trend. We believe that it will be finite. Our intuition for that is that every year spent is a year closer to the end. But our expectation becomes that every year spent means it will last yet another year more!
How can we apply that? A simple way is stocks. How long should we expect a rapidly growing company to continue growing rapidly?
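The 1/19-to-19 interval from this comment can be computed directly. A minimal sketch, assuming (as the comment does) that the fraction of the process's total lifetime already elapsed is uniformly distributed; `elapsed` is just however long the trend has lasted so far, in any unit.

```python
def remaining_interval(elapsed, confidence=0.90):
    """Bounds on how much longer a process lasts, given how long it has lasted so far.

    Assumes the fraction f of the total lifetime already elapsed is uniform on (0, 1),
    so the remaining time is elapsed * (1 - f) / f.
    """
    tail = (1.0 - confidence) / 2.0  # 0.05 for a 90% interval
    f_late, f_early = 1.0 - tail, tail
    shortest = elapsed * (1.0 - f_late) / f_late  # we're near the end: elapsed / 19
    longest = elapsed * (1.0 - f_early) / f_early  # we're near the start: elapsed * 19
    median = elapsed  # f = 0.5 gives (1 - 0.5) / 0.5 = 1
    return shortest, median, longest
```

For a company that has grown rapidly for 4 years, `remaining_interval(4)` gives roughly 0.21 to 76 more years of rapid growth, with a median of 4 more.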
For example, take something like a fad or trend: fads don't have a hard end date the way a human lifespan does, so they should follow Lindy's law.
However, the likelihood, on average across the population, that you observe a trend is going to be higher at the end of a trend lifecycle than at the beginning. This is baked into the definition - more and more people hear about a trend over time, so the largest quantity of observers will be at the end of the lifecycle, when the popularity reaches its peak.
In other words, if you are a random person, finding out about a trend likely means it is near the end rather than the middle.
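The selection effect described here can be made concrete with a toy model. Assume, purely for illustration, that the number of people who have heard of a trend grows exponentially by a total factor `growth` over its lifecycle, and that a "random observer" is a random person among everyone who eventually hears of it. The share of observers who first hear about it in the last slice of the lifecycle is then:

```python
import math


def share_hearing_late(growth, last_fraction):
    """Share of all eventual observers who first hear of a trend in the final `last_fraction` of its life.

    Toy assumption: the number of people aware of the trend grows as growth ** s,
    where s in [0, 1] is the fraction of the lifecycle elapsed (growth > 1).
    """
    if growth <= 1:
        raise ValueError("growth must be > 1")
    k = math.log(growth)
    everyone = math.exp(k) - 1.0  # people who first hear during the whole lifecycle
    late = math.exp(k) - math.exp(k * (1.0 - last_fraction))  # ...during the final slice
    return late / everyone
```

With 1000-fold growth in awareness, `share_hearing_late(1000, 0.1)` is about 0.50: half of all observers first hear of the trend in the last tenth of its life, versus about 0.10 if awareness spread evenly.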
The law only applies for certain types of processes, and is completely wrong for other types (e.g. a human who has lived 50 years may live 50 more, but one who has lived 100 years will certainly not live 100 more). So the question becomes: what type of process are you looking at? And that turns out to be exactly the question you started with: is there a fundamental limit to this growth curve, or not.
"The Lindy effect applies to non-perishable items, like books, those that do not have an "unavoidable expiration date"."
And later in the article you can see the mathematical formulation which says the law holds for things with a Pareto distribution [2]. I'd want to see some sort of good analysis that "the life span of exponential growth curves" is drawn from some Pareto distribution. I don't think it's completely out of the question. But I'm also nowhere near confident enough that it is a true statement to casually apply Lindy's Law to it.
[1]: https://en.wikipedia.org/wiki/Lindy_effect
[2]: https://en.wikipedia.org/wiki/Pareto_distribution
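The difference between the two kinds of process shows up in the expected remaining lifetime given survival to age t. For a Pareto lifetime with shape alpha > 1, it is a standard result that E[T - t | T > t] = t / (alpha - 1), which grows with t (Lindy behaviour); for a lifetime with a hard ceiling it has to shrink. The sketch below checks both numerically. The distributions and parameters are arbitrary choices for illustration, not a claim about which one fits AI progress.

```python
def mean_residual_life(survival, t, upper, steps=200_000):
    """E[T - t | T > t] = (integral of S(u) du from t to upper) / S(t), by the midpoint rule.

    `upper` should be where S is (effectively) zero; truncating a heavy tail there biases the result low.
    """
    width = (upper - t) / steps
    area = sum(survival(t + (i + 0.5) * width) for i in range(steps)) * width
    return area / survival(t)


def pareto_survival(alpha, x_min=1.0):
    """S(u) = (x_min / u) ** alpha for u >= x_min: no built-in expiration date."""
    return lambda u: 1.0 if u < x_min else (x_min / u) ** alpha


def capped_survival(max_age):
    """Lifetime uniform on [0, max_age]: a hard expiration date."""
    return lambda u: max(0.0, 1.0 - u / max_age)


lindy = pareto_survival(alpha=3.0)
perishable = capped_survival(max_age=100.0)
print(mean_residual_life(lindy, 10.0, upper=1e4), mean_residual_life(lindy, 20.0, upper=1e4))
print(mean_residual_life(perishable, 50.0, upper=100.0), mean_residual_life(perishable, 90.0, upper=100.0))
```

Under the Pareto model the expected remaining life roughly doubles (about 5 to about 10) when age doubles from 10 to 20; under the capped model it falls from 25 to 5 as age goes from 50 to 90.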
The argument given is the same as the one that I first ran across, not by that name, in https://www.nature.com/articles/363315a0. https://en.wikipedia.org/wiki/Doomsday_argument claims that it was a rediscovery of something that was hypothesized a decade earlier.
I hadn't tried to give it a name, or thought to apply it outside of that context.
As for the mathematical qualms, I'm a big believer in not letting formal mathematical technicalities get in the way of adopting an effective heuristic. And the heuristic reasoning here is compelling enough that I would like to adopt it.
So for example, the longer a time bomb ticks, the less likely it is to go off any time soon. (Assuming the timer isn't visible.) :)
[1] https://en.wikipedia.org/wiki/Rule_of_succession
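The time-bomb remark can be worked out with the rule of succession linked above. A minimal sketch, assuming a uniform prior on the bomb's per-second chance of going off and independent seconds (an assumption that ignores any fixed timer): after n seconds of ticking with no explosion, the chance it goes off in the next second is 1/(n + 2).

```python
from fractions import Fraction


def p_boom_next_second(seconds_ticked):
    """Laplace's rule of succession with zero explosions in n trials: (0 + 1) / (n + 2)."""
    return Fraction(1, seconds_ticked + 2)


def p_quiet_for(more_seconds, seconds_ticked):
    """P(no explosion in the next m seconds | none in the first n).

    With a uniform prior the posterior on the per-second chance is Beta(1, n + 1),
    and E[(1 - p) ** m] under that posterior is (n + 1) / (n + m + 1).
    """
    n, m = seconds_ticked, more_seconds
    return Fraction(n + 1, n + m + 1)
```

`p_quiet_for(n, n)` is (n + 1) / (2n + 1), just over one half: having ticked for n seconds, it is about as likely as not to keep ticking for n more, which is the Lindy rule again.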
But that's the entire idea of Bayesian reasoning, which has proven to be surprisingly effective in a wide range of domains.
I'm all for quantifying my ignorance, and using it as an outside view to help guide my expectations. Read the book Superforecasting to understand how effective forecasters use an outside view to adjust their inside view, to allow them to forecast things more precisely.
We expect fresh processes to terminate quickly and long running processes to last for a while longer.
The naive expectation is that AI will slow down b/c Moore's law is coming to an end, but if you really think about the models and how they are currently implemented in silicon, they are still inefficient as hell.
At some point someone will build a tensor processing chip that replaces all the digital matmuls with analogue logamp matmuls, or some breakthrough in memristors will start breaking down the barrier between memory and compute.
With the right level of research funding in hardware, the ceiling for AI can be very high.
All the easily verifiable domains such as mathematics, coding, and things that can be run inside a reasonable simulation are falling very very fast.
By next year if not sooner, mathematicians will be wildly outpaced by LLMs for reasoning.
So it's not impossible to have things that seem orthogonal, like generation speed or context length, have an impact on quality of result.
I'm pretty sure there's a 3-year design goal starting this year that'll do that to any of the Qwen, DeepSeek, etc. models. There's a lot you could do with sped-up models of this quality.
It might even be bad enough that the real bubble turns out to be how little we need giant data centers, when 80-90% of use cases could just be served by a silicon chip with a model on it rather than, as you say, bloated SOTA.
If there's a breakthrough in memristors, you could end up with another 20x reduction in circuit elements (get rid of memory bottlenecks, start doing multiplication ops as log-transformed voltage addition).
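The "multiplication as log-domain addition" idea behind log-amp matmuls can be sketched in software. This only simulates the arithmetic in floating point; it says nothing about real analogue hardware, noise, or precision. Signs are tracked separately because the log of a negative number is undefined.

```python
import math


def log_domain_mul(a, b):
    """Multiply two reals by adding their logarithms (sign tracked separately)."""
    if a == 0 or b == 0:
        return 0.0
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return sign * math.exp(math.log(abs(a)) + math.log(abs(b)))


def log_domain_dot(xs, ws):
    """Dot product where each multiply is a log-add; the accumulation is still an ordinary sum."""
    return sum(log_domain_mul(x, w) for x, w in zip(xs, ws))
```

Note that only the multiplies move into the log domain here; the accumulation step is still ordinary linear-domain addition.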
The ceiling is ultra high for how far AI can go.
All exponentials eventually become sigmoids? Don’t think this can be true without qualifiers.
The issue is that the exponential-looking part of the sigmoid might contain all of human history, sure, but most folks who espouse this theory probably agree that over time everything reaches a steady-enough state to be considered non-exponential, or becomes oscillatory.
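The claim that the early part of a sigmoid looks exponential can be checked numerically. A sketch with arbitrary parameters: a logistic curve with carrying capacity `capacity`, growth rate `r` and starting value `x0`, next to the exponential with the same `r` and `x0`. Their ratio is capacity / (capacity + exponential - x0), so it stays within about 2% of 1 until the exponential reaches roughly 2% of the capacity.

```python
import math


def exponential(t, x0, r):
    return x0 * math.exp(r * t)


def logistic(t, x0, r, capacity):
    """Solution of dx/dt = r * x * (1 - x / capacity) with x(0) = x0."""
    return capacity / (1 + (capacity - x0) / x0 * math.exp(-r * t))


x0, r, K = 1.0, 0.5, 1e6
for t in (0, 5, 10, 15, 20, 25, 30, 35):
    e, s = exponential(t, x0, r), logistic(t, x0, r, K)
    print(f"t={t:2d}  exp={e:14.1f}  logistic={s:12.1f}  ratio={s / e:.3f}")
```

Looking only at the early rows, there is no way to tell the two curves apart; the difference appears only once the limiting factor starts to bite.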
I don't know if they can get their numbers right this way, but this seems a far more useful metric than theoretical capabilities.
It is purely a test of capabilities (can it do a thing that takes a human $X hours), not efficiency (how fast will it do it).
At least I want AI to solve my problems, not score high on an academic leaderboard.
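One way to turn "can it do a thing that takes a human X hours" into a single number is to fit a logistic curve of success probability against log task length and read off where it crosses 50% (or 80%). As far as I know this is roughly how METR defines its time horizon, but the sketch below is my own simplification, and the task data is made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_horizon_model(minutes, succeeded, lr=0.5, steps=5000):
    """Fit P(success) = sigmoid(a + b * (log2(minutes) - c)) by gradient descent on log-loss.

    c is the mean log2 task length; centring on it just makes gradient descent converge faster.
    """
    xs = [math.log2(m) for m in minutes]
    c = sum(xs) / len(xs)
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, succeeded):
            err = sigmoid(a + b * (x - c)) - y  # derivative of log-loss w.r.t. the logit
            grad_a += err
            grad_b += err * (x - c)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b, c


def time_horizon(a, b, c, p):
    """Human task length (minutes) at which the predicted success probability equals p."""
    logit_p = math.log(p / (1 - p))
    return 2 ** (c + (logit_p - a) / b)


# Hypothetical results: human time per task (minutes) and whether the model succeeded.
minutes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
succeeded = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
a, b, c = fit_horizon_model(minutes, succeeded)
print(f"50% horizon ~ {time_horizon(a, b, c, 0.5):.0f} min, 80% horizon ~ {time_horizon(a, b, c, 0.8):.0f} min")
```

Because success falls with task length, the fitted slope is negative and the 80% horizon is always shorter than the 50% one. Note this measures capability only; nothing in it accounts for how long the model takes.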
At first the models turned a 5 minute task into a 5 second task (by 5 seconds I mean a very short amount of time, not precisely 5 seconds). Then they turned a 15 minute task into a 5 second task.
Opus 4.6 completes 8-hour tasks all the time but (at least in my experience) it isn't spitting the answer out in 5 seconds anymore. It's using chain of thought and tools, and the time to completion is measured in minutes or maybe hours.
In my experiments with local LLMs, a substantial part of the gap between frontier and local (for everyday use) is in tooling and infrastructure.
That is why I am sympathetic to the idea we are leveling off. But to bring in the air speed example from the article, I don't think we've reached the equivalent of the ramjet yet. I suspect in the coming years there will be new architectures, new hardware, and new ways to get even more capable models.
I trained an LLM to write the whole Harry Potter series, and that took J.K. Rowling like 17 years.
For my next point on the graph, I'll train the LLM to write the Bible, something that took humans >1500 years.
The tasks are obviously all of the form "Go do this, and if you get the following output you passed". Setting up a web server apparently takes 15 minutes for a human, which is news to me since I'm able to search for https://gist.github.com/willurd/5720255, find the python one-liner, and copy it within about ten seconds.
Anyway, this is cool but it does not mean Claude can perform any human tasks that take less than 8 hours and are within its physical capabilities.
I'm curious what people really mean when they say this. Intelligence is famously hard to define, let alone measure; it certainly doesn't scale linearly; it only loosely correlates to real-world qualities that are easy to measure; etc. Are you referring to coding ability or...?
🙄
Scott makes a Lindy effect argument which is plausible, but don't let that fool you, we still don't know what's going to happen.
Every exponential eventually becomes a sigmoid, because exponential growth always exposes limiting factors that weren't limiting at the beginning. Silicon manufacturing had lots of room for high-margin customers like Nvidia even a year ago (by the mere virtue of outbidding lower-margin customers), but now that room is mostly gone, and no amount of money will make fabs build themselves overnight.
[1]: https://stockanalysis.com/stocks/nvda/metrics/revenue-by-seg...
https://news.ycombinator.com/item?id=46199723
My mental model has been 3D computer graphics: doubling the polygon count had huge returns early on but delivered diminishing returns over time.
Ultimately, you can't make something look more realistic than real.
I don't know what the future holds, but the answer to the question "can LLMs be more realistic than real" will determine much about whether or not you think the curve will level off soon.
This is the crux of the article. To a large extent continued progress depends on a stable increase in compute, an increase in training data, and an increase in good ideas to squeeze more out of both of them.
One calculation you could do is a survival function: for each of the above, how long before it is disrupted? For example, China could crack down on AI or invade Taiwan. Or data centers become politically unpopular in the US. Or, we could run out of great ideas. Very hard to predict.
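The survival-function idea can be sketched under a strong simplifying assumption: each kind of disruption arrives independently at its own constant yearly hazard. The chance that nothing has disrupted the trend after t years is then the product of the individual survival curves, and the combined hazard is just the sum. The hazard rates below are hypothetical placeholders, not estimates.

```python
import math

# Hypothetical annual hazard rates, purely for illustration.
HAZARDS = {
    "compute supply disruption": 0.05,
    "data-center political backlash": 0.05,
    "running out of great ideas": 0.10,
}


def survival(t_years, hazards=HAZARDS):
    """P(no disruption within t years), assuming independent constant hazards."""
    return math.prod(math.exp(-h * t_years) for h in hazards.values())


def median_time_to_disruption(hazards=HAZARDS):
    # exp(-H * t) = 1/2  =>  t = ln 2 / H, where H is the total hazard
    return math.log(2) / sum(hazards.values())
```

With these placeholder rates the total hazard is 0.2 per year, so the median time to the first disruption is ln 2 / 0.2, about 3.5 years. Correlated risks (a single event hitting several factors at once) would break the independence assumption.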
All positive growth eventually flattens out and becomes sigmoid, but a lot of phenomena experience negative growth and nosedive. No gentle curve, but a hard kink and a perfectly flat line at zero. Forever. I think it would be a stretch to categorize that pattern as sigmoid. Predicting a sigmoid pattern for negative growth implies some sort of a soft landing (depending on your definition of soft).
We can think of many populations that are no longer with us. So just a caution about over applying this reasoning in the negative case.
For example, when a car starts, its speed and acceleration become greater than zero. But what about the higher-order rates of change? The acceleration doesn't jump suddenly from zero to non-zero. That means the car has a non-zero derivative at all orders. In other words, the movement is exponential. The same thing happens in reverse when the car reaches a constant speed.
While we're at it, the "exponentials are actually sigmoïds" meme is not necessarily true. While real-world exponentials never stay exponential forever, sigmoids are not guaranteed either. Overshoot-and-collapse examples also happen in tech, e.g. the dotcom bubble or the successive AI winters.
Except innovation. When one sigmoid tapers off we keep finding new ones to keep the climb going.
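The "new sigmoids keep the climb going" picture can be sketched by summing logistic curves, each assumed (arbitrarily) to have ten times the ceiling of the previous one and to arrive ten time units later. Each component saturates, but the sum keeps growing by roughly the same factor per generation, so its envelope looks exponential.

```python
import math


def s_curve(t, ceiling, midpoint, rate=1.0):
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def stacked(t, generations=6, step=10.0, factor=10.0):
    """Sum of successive S-curves; generation i has ceiling factor**i and midpoint i*step."""
    return sum(s_curve(t, factor ** i, i * step) for i in range(generations))


for t in range(5, 60, 10):
    print(t, round(stacked(t), 2))
```

If no new generation arrives, the sum is itself bounded by the total of the ceilings, which is the sigmoid case again.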
In Scott's mind, dangers from AI are not a known fact, but are somewhere between highly probable and a near-certainty. In his mind, there are well-grounded justifications for believing that AI poses substantial future dangers to the public. Therefore he also believes he should inform people about this, and strives to convince skeptics, so that we might steer clear.
It's easy to understand why someone who believes what you believe about AI would of course not warn people about AI. It's also easy to understand why someone who believes what Scott believes about AI would want to warn people about AI. Your contention is with his confidence for being worried about AI, not his reason for wanting to warn people.
Neither can any specific discussion of what the dangers are and how we can steer clear. It all comes preplanted in your head. The only thing that Scott is playing on (as far as we can see) is your ingrained fear, by using an ominous headline, and a vague reference to something "scary" in the conclusion.
Of course there was no reason to "warn" you, you already believed in the scary future. Scott is just giving you fuel, which you seem to appreciate.
If only there were a way to see more of Scott's thoughts on the subject of AI..
This does *not* imply the inevitability of AGI. It does not imply AGI is necessarily bad.
It does mean that "the capabilities of AI will eventually plateau" offers no meaningful predictive power or relevance to the overall AI discussion.
https://xcancel.com/peterwildeford/status/202963666232244661...
Ofc "full labor automation" has a certain spread of meaning. A sliver of the population will always find ways to hold on to a job or run one or many businesses. But there will be "enough" labor automation for it to be a social ticking bomb. That, in fact, does not depend on better models or better AI than we have today. By 2045 there will be a couple of generations that have been outsourcing their thinking to AI for most of their adult lives. Some of them may still work as legal flesh of sorts, but many won't get to be middlemen and will find no job.
Also, if you could replace your senator today by an untainted version of a frontier model (of today), would you do it? Would it be a better ruler? What are the odds of you not wanting to push that button in the next twenty years, after a few more batches of incompetent and self-serving politicians?
Yeah well my prophet says he can beat up your prophet in a fight.
---
Here in reality, I'm not accustomed to taking random predictions without backing evidence as if they were truth.
Going to need a big citation for that claim
Lol
A good example of this is the number of submissions to NeurIPS/ICML/ICLR. In 2017 that curve was exponential.
This doesn't say much, and the author fights their own points a couple times, suggesting that they maybe didn't think through what they wanted to write until they were in the middle of writing it and started realizing their assumptions didn't match what they expected the data to say.
I really don't get the point of what I just read.
Model reasoning is on an s-curve, which is improving.
Model intelligence is not the same as reasoning. It's a different axis, and one I have not seen much movement on.
See, humans have a recursive form of intelligence which is capable of self-reflection and introspection. LLMs can only reason about tokens which have already been emitted. Humans and LLMs do not share the same form of reasoning, and general human-like intelligence will not arise from the current architecture of LLMs. Therefore it is a mistake to assume that continual improvement on the reasoning scale will result in something that is equivalent enough to humans on the intelligence axis to replace all labor.
No, definitely not saying this, and I don’t quite know what it means.
> Model reasoning is on an s-curve, which is improving.
Is this saying two different things? I think I might agree with this in principle, as in maybe there is some sort of S-curve or something like it, but do we see evidence of this? Where?
> Model intelligence is not the same as reasoning. It's a different axis, and one I have not seen much movement on.
Can you clarify this? What is the distinction and what makes you say you have “not seen much progress?”
> See, humans have a recursive form of intelligence which is capable of self-reflection and introspection. LLMs can only reason about tokens which have already been emitted
LLMs do self reflection and introspection in context, and tweaks such as value functions (serving a similar purpose to intuition or emotion) may make this better? Why do you feel self reflection and introspection are a fundamental limitation here? Models reason over tokens they have emitted and also with their own sense and learned behavior already. Are you just talking about continual learning? Also I feel people just latch onto LLMs as if this is all of AI. Why? SSMs, memory networks, recurrent neural networks etc etc etc are all part of AI but aren’t as popular because they can’t yet compete with LLMs in terms of scaling laws and training efficiency due to e.g. hardware and software optimization and investment being focused on LLMs. If something else comes along that works better we’ll just start scaling that.
> Humans and LLMs do not share the same form of reasoning, and general human-like intelligence will not arise from the current architecture of LLMs.
Very strong statement, any theoretical or experimental basis for this? I also don’t particularly care personally other than as a point of curiosity. Why does it matter if AI systems will develop equivalent reasoning mechanisms as humans? In fact it may be much better not to.
> Therefore it is a mistake to assume that continual improvement on the reasoning scale will result in something that is equivalent enough to humans to replace all labor.
Idk, I didn’t say this explicitly, but I also don’t think it matters whether we have a system “equivalent to humans” or one that “replaces all labor”.
I am making the argument that how we measure model intelligence is flawed, and that we are actually measuring something closer to "reasoning" than "intelligence". If you want evidence, we'll need a different form of test, but how about I just gesture at the fact that GPT supposedly outscored PhDs on a broad range of subjects at least a year ago and to date is not replacing PhD jobs.
We see this pattern of high scores on tests but mediocre performance in the real world all over the place. From that, I draw the conclusion that it can reason like a PhD, but it can't think like a PhD.
So, we may see an s-curve on the measure of model reasoning but that doesn't imply they will overtake us or even match us on measures of intelligence.
As to your other questions:
> LLMs do self reflection and introspection in context,
> Why do you feel self reflection and introspection are a fundamental limitation here? Models reason over tokens they have emitted and also with their own sense and learned behavior already. Are you just talking about continual learning?
I disagree that models are reflecting and introspecting in a way equivalent to human intelligence here. They can reason over tokens which have been emitted, but by the same measure they cannot reason over tokens which have not been emitted. It's hard to make this point without drawing some diagrams, but I believe that human intelligence has internal loops, where many ideas may be turned over simultaneously before an action is taken. In comparison, an LLM might "feel uncertain" about a token before emitting it, but once it is emitted that uncertainty and the other near neighbor options are lost and the LLM is locked into the track that was set by the top-choice token. I think this is where hallucinations arise from, amongst other issues.
Context isn't sufficient for an internal reasoning loop because the tokens that compose context lose a lot of the information the network itself generated when picking those tokens. They occupy a much lower dimensional space than the "internal reasoning" processes of the transformer do.
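The comparison in this comment can be put in rough numbers. An emitted token is one choice out of a vocabulary of size V, so it can carry at most log2(V) bits, and the sampled id by itself does not record the probability distribution it was drawn from. The hidden state that produced it is a vector of d numbers. The vocabulary size, hidden width, precision, and logits below are hypothetical round numbers, not any particular model's, and raw storage is only an upper bound on useful information.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical sizes, for order-of-magnitude comparison only.
VOCAB_SIZE = 100_000
HIDDEN_DIM = 4096
BITS_PER_ACTIVATION = 16

max_bits_per_token = math.log2(VOCAB_SIZE)  # upper bound on what one emitted token id carries
raw_bits_per_hidden_state = HIDDEN_DIM * BITS_PER_ACTIVATION  # raw storage of one hidden vector

# Toy next-token distribution where the model is torn between two options.
probs = softmax([2.0, 1.9, 0.5, -1.0])
chosen = max(range(len(probs)), key=probs.__getitem__)  # greedy decoding keeps only this index
print(f"token id carries <= {max_bits_per_token:.1f} bits; hidden state stores {raw_bits_per_hidden_state} raw bits")
print(f"distribution entropy {entropy_bits(probs):.2f} bits; greedy keeps only index {chosen} (p={probs[chosen]:.2f})")
```

In this toy case the top two options are nearly tied, yet the emitted id records only which one won, not how close the runner-up was.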
The entire plot of The Lord of the Rings could probably be compressed into less than 10 kB of text too.
Edit: this seems to be a controversial comment, but IMHO a blog of Scott Alexander's type is an art form, not just a communication channel.
but I rest my case
Allowing slop articles like this literally prints them evaluation money.