The headline as stated is categorically false, buuuut... I think it's pretty salient that a company called "Builder.ai" only had 15 engineers working on actual ai and actually mostly functioned as an outsourcing intermediary for 500-1000 engineers (ie, the builders). When it comes to these viral misunderstandings, you kind of reap what you sow.
Is ‘misunderstanding’ silicon valley slang for ‘scam’ or ‘securities fraud’?
The AI engineers were based in the UK and from what I've seen on LinkedIn many came from top unis. They're probably worth 100x more than the builders. Not to mention their boss, who was a well-known AI figure = $$$
> Builder hired 300 internal engineers and kicked off building internal tools, all of which could have simply been purchased
Dear god, PLEASE hire an actual Enterprise IT professional early in your startup expansion phase. A single competent EIT person (or dinosaur like me) could have - if this story is true - possibly saved the whole startup by understanding what’s immediately needed versus what’s nice-to-have, what should be self-hosted versus what should be XaaS, stitching everything together to reduce silos, and ensuring every cent is not just accounted for but wisely invested in future success.
Even if the rest of your startup isn’t “worrying about the money”, your IT and Finance people should always be worried about the money.
Perhaps the grand vision was to later use all those newly built internal tools as reusable components in their customer-facing apps…
I would assume they were collecting data. If you're building a chatbot to talk to clients, you need all the transcripts from meetings, chat logs, project info, analytics, metrics, and so on. It's the only way to train your model properly.
Do they have any patents related to using chatbots for project management?
Most likely. But that's exactly what someone who hasn't experienced enough cycles in industry would come up with :).
Yeah, I feel like if their goal was to make custom software cheap and easy, this was basically just dogfooding.
I interviewed there a few years back. I bailed on the interview within the first fifteen minutes, the first time I’ve ever done that. I told them they’d given me the ick — not my most professional moment, admittedly! But it was an awkward and unpleasant interview.
They spent the first ten minutes of the call predicting the death of software engineering (this was a software engineering interview) and complaining about expensive devs (ahem). I wouldn’t have minded so much if the only demo apps they had on their website weren’t some of the worst, non-native iOS apps I’ve ever seen. Truly awful.
A month or two later I noticed on LinkedIn that a dodgy CTO I’d worked with, who had attempted to avoid paying me (and did avoid paying several colleagues of mine), had joined there too. It felt like a good fit.
Yeah, I have to say, none of this is a surprise to me.
[flagged]
Can assure you it wasn’t AI generated, and considering the time I spent writing the post, it’s a little upsetting to be accused of it. That said, I do get it. Especially as I’m on a new account.
I’ve always used em-dashes (mild typography nerd) but recently have been considering stopping, for exactly this reason. They always flew under the radar a little, but I’d always notice when others used them, so it’s a shame and I’ll miss them.
fyi, I was based in London at the time. This was for a position helping to build the supposed AI.
> I’ve always used em-dashes (mild typography nerd) but recently have been considering stopping, for exactly this reason. They always flew under the radar a little, but I’d always notice when others used them, so it’s a shame and I’ll miss them.
Please don’t stop using them. Keep on doing the right thing, no matter what!
(I love em dashes too!)
Proper typography ≠ AI generated, this meme needs to stop. Feel free to stop reading anything with emdashes, but don’t accuse people of being AI for this reason.
(Also, ChatGPT usually uses emdashes without spaces.)
ChatGPT uses a combination of characters to fingerprint its output. In this case:
1. Straight apostrophe (human) vs. curved apostrophe (ChatGPT): weren't instead of weren’t. Most humans type straight quotes instead of using shortcuts like Alt+0146 or Option + Shift + ]
2. Dash (human) vs. em dash (ChatGPT). Most people type regular dashes instead of em dashes (—)
3. The second paragraph seems like it was manually updated, which is why it's grammatically incorrect, but it still kept the curved apostrophe in the word "weren’t"
4. The user only has one comment, a conspiracy theory: "a dodgy CTO joined the company" without providing names or any kind of evidence. Hmmm...
This isn't Reddit guys. We need to keep the site reliable. Thanks
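For what it's worth, the "fingerprint" being described here boils down to a trivial character count, roughly the sketch below (Python, purely illustrative). It is a weak heuristic at best, since many devices insert this punctuation automatically.

```python
# Naive "AI fingerprint" heuristic as described above: count typographic
# characters that casual typists rarely enter by hand. Purely illustrative;
# smart punctuation on iOS/macOS inserts these automatically, so hits here
# are not evidence of machine authorship.
TYPOGRAPHIC_CHARS = {
    "\u2019": "curly apostrophe",
    "\u201c": "curly open quote",
    "\u201d": "curly close quote",
    "\u2014": "em dash",
}

def typographic_hits(text: str) -> dict[str, int]:
    """Count occurrences of 'fancy' punctuation in a comment."""
    return {name: text.count(ch) for ch, name in TYPOGRAPHIC_CHARS.items()}

if __name__ == "__main__":
    sample = "The em-dash was me \u2014 you can long press the dash button."
    print(typographic_hits(sample))  # e.g. {'curly apostrophe': 0, ..., 'em dash': 1}
```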
I used iOS to write my comment. It adds curved apostrophes automatically. macOS does too.
The em-dash was me — you can long press on the dash button and you get the choice.
Also, next time? Maybe consider that you might be wrong, and you’re responding to an actual human. Who spent half an hour or so writing a comment to share what they thought was an interesting anecdote. And who is now feeling a bit rubbish.
No need to express emotions. This ain't a Turing test :)
sorry some white privileged Internet jackass caused you to feel rubbish. Not everyone here is like that.
> ChatGPT uses a combination of characters to fingerprint its output.
Does it, or are you inferring a fingerprint based on your own experiences?
> Most humans type straight quotes instead of using shortcuts like Alt+0146 or Option + Shift + ]
Well, I use X11, and ’ is an easy Compose ' >.
Most humans do all sorts of foolish stuff. Modern computers are not typewriters; we have the full panoply of Unicode at our literal fingertips!
Using better punctuation is not exactly rocket science — it’s just a matter of memorising keyboard shortcuts, which is something the HN crowd is pretty good at.
Or they just typed it on an Apple device…
> Builder hired 300 internal engineers and kicked off building internal tools, all of which could have simply been purchased
Tempted to say there was a bit of corruption here, crazy decision. Like someone had connections to the contractor providing all those devs.
otoh they were an "app builder" company. Maybe they really wanted to dogfood.
Hmm, this is the reasoning of a kid: "You hire devs instead of using AI, therefore you are corrupt". More conspiracy theories. Based on what the article says, it was a dev shop like Infosys or any other Indian dev company, they were working on hundreds of projects.
A similar thing happened at Uber before the 2021 re-org. At one point they had 3 competing internal chat apps, from what I've heard from peers working there, and having previously worked for a vendor of Uber's, I noticed a significant amount of disjointedness in their environment (it seemed very EM-driven, with no overarching product vision).
Ofc, Gergely might have some thoughts about that ;)
One good thing that came out of that disjointedness is temporal.io, which is the ultimate microservice glue-code tool
Kudos to the author for the update - and also to others including @dang for calling it out at the time:
https://news.ycombinator.com/item?id=44169759
(Builder.ai Collapses: $1.5B 'AI' Startup Exposed as 'Indians'?, 367 points, 267 comments)
Hey, I did some digging too (cough cough). Here's what I said on that thread:
- Proven: BuilderAI collapsed after fabricating revenue.
- Unsubstantiated: The rumour that 700 devs were the chatbot is false, not backed by evidence or insiders.
- Marketing vs. reality: They marketed features as "AI-assisted", not AI-generated, two very different things.
- Bottom line: The real scandal is financial fraud, not some fake-AI front.
Yeah, nice work - I mentioned "and others" for that reason; dangs was just the top comment.
Are there people who actually believe that a user would enter a text prompt and then a human programmer would generate the code?
My assumption when the story broke was that the 700 engineers were using various AI tools (Replit, Cursor, ChatGPT, etc.) to create code and documentation and then stitching it all together somewhat manually. Sort of like that original Devin demo where AI was being used at each step but there was a ton of manual intervention along the way and the final video was edited to make it seem as if the whole thing ran end to end fully automated all from the initial prompt.
Yes, 90% of people with no tech background reading the news
They don't know the difference between AI and ChatGPT
I worked with an "AI data vendor" at work where you'd put in a query and "the AI gave you back a dataset" but it usually took 24hrs, so it was obvious they had humans pulling the data. The company still purchased a data plan. It happens, in this case, they have a unique dataset, though.
Builder.ai had a totally different flow, but yeah, when boring stories and exciting ones compete to tell the same story, a very large percentage of people will run with the exciting one. It’s like the death tax in US political history - the US has never had a death tax but it’s way more exciting to call it a death tax than an estate tax. Only now, instead of media being the primary disseminator of spin, we have people sharing exciting stories on social media instead of boring stories about building an internal Zoom and accounting issues.
Then social animals kick in, likes pour in and more people share. Social media has created a world where an exciting lie can drown out boring truth for a large percentage of people.
Unfortunately a lot of people!!
that was not the flow
But it was the flow in the examples.
Majority of HN commenters
I don’t find this article particularly convincing. The main argument seems to be “It couldn’t be real engineers, it would be too slow” but I have no clue what the Builder.ai interface or response times were like. Also it says 10-20min would be too long… kind of? Not really though? Depends on the output. Claude Code has run for quite a while on its own before (I’ve never timed it) but 5-10+min doesn’t shock me. Yes, Claude is giving real-time output but I’ve seen a number of dev tools that don’t (or didn’t, this area is moving fast).
Also, re: hiring outsourced contractors
> However, we didn't anticipate the significant fraud that would ensue
First time? Every experience I have personally had with outsourced contractors has been horrible. Bad code quality, high billing hours for low output, language and time barriers, the list goes on. I’m quick to flip the bozo bit on anyone pushing for outsourcing, engineers are not just cogs in a machine to start with and outsourced contractors are almost less useful than current LLM coding tools IMHO. If you already have to explain things in excruciating detail, you might as well talk to an LLM.
People really want this black box that they can feed money and input into and have full-fledged applications and platforms pop out the other side. It doesn’t exist. I have only seen failures with outsourcing on this front, and so far LLMs haven’t been able to do it either. Don’t get me wrong, LLMs are actually useful in my opinion, just not for writing all the code unsupervised or “vibe coding”.
It's common sense: you either believe what you read on Bernhard Engelbrecht's Twitter account (this is the same crypto influencer who scammed startup founders out of thousands of dollars) or you trust what's published on The Pragmatic Engineer blog, whose author actually read Bernhard's tweet and spoke to the people who built the tech.
> I don't find this article particularly convincing
I think you missed the point of the article. It's saying: this is what the conspiracy theorists want me to believe but it doesn't add up, so I'm going to pick up the phone and call the people who built it.
At that point, it's engineers talking to engineers. And the post is the outcome of that conversation.
I always knew this story was fake. Even if you have a trillion expert developers it would still be impossible to get fast enough answers to "Fake an LLM". Humans obviously aren't _parallelizable_ like that.
The accusation I heard was these developers they hired were essentially prompt wrangling behind the scenes using other AI services to make Builder seem better than it really was.
I just saw the click-bait garbage headline, and from that alone I knew it was a lie, so I never even clicked it to read any articles.
[dead]
“Building internal versions of Slack, Zoom, JIRA, and more…”
Did they really do this or customize Jira schemas and workflows for example ?
Unnamed former employees of a dead company say company didn't fake it. Film at 11.
This was analyzed on HN a week or so ago: https://news.ycombinator.com/item?id=44176241
The "700 engineers faking AI" claim seems to have been sloppy[0] reasoning by an influencer, which spread like wildfire.
[0] I won't attribute malice here, but this version was certainly more interesting than the truth
The original story doesn't make any sense. How would you fake an "AI" agent coding by using people on the other side? Wouldn't it be...obvious? People cannot type code that fast.
What's your non-snarky theory about how this could possibly be true?
You claim you have a queue and it takes up to 24 hours for your job to run?
It was obviously not a prompt-and-get-a-response model like ChatGPT.
I tend to trust Gergely Orosz (the writer of The Pragmatic Engineer). He validates sources and has a good track record of reporting on the European tech scene and engineering management.
His blog and newsletter are both fairly popular on HN.
LLMs are all fake AI. As the recently released Apple study demonstrates, LLMs don't reason, they just pattern match. That's not "intelligence" however you define it because they can only solve things that are already within their training set.
In this case, it would have been better for the AI industry if it had been 700 programmers, because then the rest of the industry could have argued that the utter trash code Builder.ai generated was the result of human coders spending a few minutes haphazardly typing out random code, and not the result of a specialty-trained LLM.
> because they can only solve things that are already within their training set
I just gave up on using SwiftUI for a rewrite of a backend dashboard tool.
The LLM didn't give up. It kept suggesting wilder and less stable ideas, until I realized that this was a rabbit hole full of misery, and went back to UIKit.
It wasn't the LLM's fault. SwiftUI just isn't ready for the particular functionality I needed, and I guess that a day of watching ChatGPT get more and more desperate, saved me a lot of time.
But the LLM didn't give up, which is maybe ot-nay oo-tay ight-bray.
https://despair.com/cdn/shop/files/stupidity.jpg
>As the recently released Apple study demonstrates, LLMs don't reason, they just pattern match
Hold on a minute, I was under the impression that "reasoning" was just a marketing buzzword, the same as "hallucinations", because how tf would anyone expect GPUs to "reason" and "hallucinate" when even neurology/psychology don't have strict definitions of those processes.
No, the definitions are very much up for debate, but there is an actual process here. "Reasoning" in this case means having the model not just produce whatever output is requested directly, but also spend some time writing out its thoughts about how to produce that output. Early versions of this were just prompt engineering, where you ask the model to produce its "chain of thought" or "work step by step" on how to approach the problem. Later this was trained into the model directly, with traces of this intermediate thinking, especially for multistep problems, without the need for explicit prompting. And then, architecturally, these models now have different ways to determine when to stop "reasoning" and skip to generating actual output.
I don't have a strict enough definition to debate if this reasoning is "real" - but from personal experience it certainly appears to be performing something that at least "looks" like inductive thought, and leads to better answers than prior model generations without reasoning/thinking enabled.
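A minimal sketch of the distinction described above. `call_model` is a hypothetical stand-in for whatever completion API you use (not a real library call); the only real difference is whether the prompt elicits intermediate steps before the final answer.

```python
# Sketch of "reasoning" as prompt structure rather than magic. `call_model`
# is a hypothetical stand-in for any text-completion API.

def call_model(prompt: str) -> str:
    """Hypothetical completion call; replace with your provider's API."""
    raise NotImplementedError

QUESTION = "A train leaves at 9:40 and arrives at 11:05. How long is the trip?"

# Direct answer: the model must emit the result in one shot.
direct_prompt = f"{QUESTION}\nAnswer with just the duration."

# Chain-of-thought: the model is asked to write out intermediate steps first,
# then the answer. Later "reasoning" models bake this behaviour in via
# training instead of relying on the prompt.
cot_prompt = (
    f"{QUESTION}\n"
    "Think step by step, showing your working, then give the final answer "
    "on a line starting with 'Answer:'."
)

def extract_answer(completion: str) -> str:
    """Keep only the final answer line, discarding the visible 'thinking'."""
    for line in reversed(completion.splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return completion.strip()
```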
It's gradient descent. Why are we surprised when the answers get better the more we do it? Sometimes you're stuck in a local max/minima, and you hallucinate.
Am I oversimplifying it? Is everybody else over-mystifying it?
Gradient descent is how the model weights are adjusted during training. There is no gradient descent, and nothing even remotely similar to it, that happens during inference.
Fair, thanks for pointing that out.
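To make the training/inference split concrete, here is a toy example (plain numpy, nothing transformer-specific, purely illustrative): gradient descent only appears in the training loop; inference is a forward pass over frozen weights.

```python
import numpy as np

# Toy linear model, y = w*x + b, to show where gradient descent lives.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(0, 0.05, size=100)  # ground truth: w=3, b=0.5

w, b = 0.0, 0.0
lr = 0.1

# Training: gradient descent repeatedly nudges the weights downhill on the loss.
for _ in range(500):
    pred = w * x + b
    err = pred - y
    grad_w = 2 * np.mean(err * x)  # d(MSE)/dw
    grad_b = 2 * np.mean(err)      # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b

# Inference: weights are frozen; it's a single forward computation,
# no gradients, no descent.
def predict(x_new: float) -> float:
    return w * x_new + b

print(round(w, 2), round(b, 2), round(predict(0.5), 2))
```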
If you allow me to view the weights of a model as the axioms in an axiomatic system, my (admittedly limited) understanding of modern "AI" inference is that it adds no net new information/knowledge, just more specific expressions of the underlying structure (as defined by the model weights).
So while that does undercut my original flippancy of it being "nothing but gradient descent" I don't think it runs counter to my original point that nothing particularly "uncanny" is happening here, no?
Reasoning means what reasoning always meant.
Selling an algorithm that can write a list of steps as reasoning is bordering on fraud.
It's not uncommon that they guess the right solution, and then "reason" their way out of it.
> Reasoning means what reasoning always meant.
"""Reasoning: The deduction of inferences or interpretations from premises."""
Sounds like any logic program to me?
Apparently from Latin, ratiō, which has meanings including "explanation" and "computation"?
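Under that dictionary definition, even a few lines of forward chaining over premises qualify. A toy sketch (not a claim about what LLMs do internally):

```python
# "Deduction of inferences from premises", done mechanically: a tiny
# forward-chaining loop over Horn-clause-style rules. By the dictionary
# definition quoted above this "reasons"; whether that matches what people
# mean by the word is exactly the debate in this thread.
facts = {"socrates_is_a_man"}
rules = [
    ({"socrates_is_a_man"}, "socrates_is_mortal"),
    ({"socrates_is_mortal"}, "socrates_will_die"),
]

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)  # includes 'socrates_is_mortal' and 'socrates_will_die'
```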
AI skepticism is like a religion at this point. Weird it's so prominent on a tech site.
(The Apple paper has had many serious holes poked in it.)
Well, if the thing is truly capable of reason, then we have an obligation to put the kibosh on the entire endeavor because we're using a potentially intelligent entity as slave labor. At best, we're re-inventing factory farming and at worst we're re-inventing chattel slavery. Neither of those situations is something I'm personally ok with allowing to continue
I concur.
I also find the assumption that tech-savvy individuals would inherently be in favor of what we currently call AI to be, in itself, weird. Unfortunately, I feel as though being knowledgeable or capable within an area is conflated with an over-acceptance of that area.
If anything, the more I've learned about technology, and the more experienced I am, the more fearful and cautious I am with it.
Reason can be distinct from sapience.
Skepticism cannot, by definition, be religious.
You should re-think the bias that led you to this belief.
small-s skepticism, perhaps. 'Skeptics' can fall foul of the same kind of groupthink, magical and motivated reasoning, and fallacies as everyone else, which are often characterized as 'religious'. Not believing in a god doesn't make you immune to being wrong.
(FWIW, I do think there's some very unhealthy attitudes to AI and LLMs going around, like people feel the only two options are 'the singularity is coming' and 'they're useless scams', which tends to result in a large quantity of bullshit on the topic)
>like people feel the only two options are 'the singularity is coming' and 'they're useless scams'
No, you are turning a few loud people into a false dichotomy. The vast majority of people are somewhere between "LLMs are neat" and "I don't think LLMs are AGI"
The vast majority of people do not comment. Using only the comments that people go out of their way to make as your data source is a huge sampling error.
> because they can only solve things that are already within their training set.
That is just plain wrong, as anybody who has spent more than 10 minutes with an LLM within the last 3 years can attest. Give it a try, especially if you care to have an opinion on them. Ask an absurd question (one that can, in principle, be answered) that nobody has asked before and see how it performs at generalizing. The hype is real.
I'm interested what study you refer to. Because I'm interested in their methods and what they actually found out.
"The apple study" is being overblown too, but here it is: https://machinelearning.apple.com/research/illusion-of-think...
The crux is that beyond a bit of complexity the whole house of cards comes tumbling down. This is trivially obvious to any user of LLMs who has trained themselves to use LLMs (or LRMs in this case) to get better results ... the usual "But you're prompting it wrong" answer to any LLM skepticism. Well, that's definitely true! But it's also true that these aren't magical intelligent subservient omniscient creatures, because that would imply that they would learn how to work with you. And before you say "moving goalpost" remember, this is essentially what the world thinks they are being sold.
It can be both breathless hysteria and an amazing piece of revolutionary and useful technology at the same time.
The training set argument is just a fundamental misunderstanding, yes, but you should think about the contrapositive - can an LLM do well on things that are _inside_ its training set? This paper does use examples that are present all over the internet including solutions. Things children can learn to do well. Figure 5 is a good figure to show the collapse in the face of complexity. We've all seen that when tearing through a codebase or trying to "remember" old information.
I think Apple published that study right before WWDC to have an excuse not to ship foundation models larger than 3B locally, and to push you to their cloud for "reasoning" on harder tasks.
The APIs are still in beta, so things are in flux, but those are my thoughts after playing with it; the paper makes much more sense in that context.
What you think is an absurd question may not be as absurd as it seems, given the trillions of tokens of data on the internet, including its darkest corners.
In my experience, it's better to simply try using LLMs in areas where they don't have a lot of training data (e.g. reasoning about the behaviour of Terraform plans). It's not a hard cutoff of being _only_ able to reason exactly about solved things, but it's not too far off as a first approximation.
The researchers took existing known problems and parameterised their difficulty [1]. While most of these are by no means easy for humans, the interesting observation to me was that the failure_N was not proportional to the complexity of the problem, but more to how commonly solution "printouts" for that size of the problem can typically be encountered in the training data. For example, "towers of hanoi", which has printouts of solutions for a variety of sizes, went to a very large number of steps N, while the river crossing, which is almost entirely absent from the training data for N larger than 3, failed above pretty much that exact number.
[1]: https://machinelearning.apple.com/research/illusion-of-think...
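For context on why solution "printouts" balloon with N: the optimal Tower of Hanoi solution is a textbook recursion and takes 2^N - 1 moves, so the full move list for larger N quickly outgrows any model's output budget. A quick sketch:

```python
# Classic recursive Tower of Hanoi: the full solution is 2**n - 1 moves, so a
# complete move-by-move "printout" grows exponentially with n and quickly
# exceeds any model's output budget, which bounds how far the paper can push N.
def hanoi(n: int, src: str = "A", aux: str = "B", dst: str = "C") -> list[tuple[str, str]]:
    if n == 0:
        return []
    return (
        hanoi(n - 1, src, dst, aux)   # move n-1 discs out of the way
        + [(src, dst)]                # move the largest disc
        + hanoi(n - 1, aux, src, dst) # move the n-1 discs on top of it
    )

for n in (3, 10, 15):
    moves = hanoi(n)
    assert len(moves) == 2**n - 1
    print(n, len(moves))  # 3 -> 7, 10 -> 1023, 15 -> 32767 moves
```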
It doesn't help that thanks to RLHF, every time a good example of this gains popularity, e.g. "How many Rs are in 'strawberry'?", it's often snuffed out quickly. If I worked at a company with an LLM product, I'd build tooling to look for these kinds of examples in social media or directly in usage data so they can be prioritized for fixes. I don't know how to feel about this.
On the one hand, it's sort of like red teaming. On the other hand, it clearly gives consumers a false sense of ability.
Indeed. Which is why I think the only way to really evaluate the progress of LLMs is to curate your own personal set of example failures that you don't share with anyone else and only use it via APIs that provide some sort of no-data-retention and no-training guarantees.
On that basis AI has been "fake AI" since the term AI was first coined in 1956.
The "AI isn't really intelligence" argument is so tired now it has a whole Wikipedia page about it: https://en.m.wikipedia.org/wiki/AI_effect
I agree that all AI has been fake AI since the term was first coined.
Researchers in the field used to acknowledge that their computational models weren't anywhere close to AI. That all changed when greed became the driving motivation of tech.
> As the recently released Apple study demonstrates
The Apple study that did Towers of Hanoi and concluded that giving up when the answers would have been too long to fit in the output window was a sign of "not reasoning"?
https://xcancel.com/scaling01/status/1931783050511126954
I mean, on that basis, anyone who ever went "TL;DR" is also demonstrating that humans don't reason.
> That's not "intelligence" however you define it because they can only solve things that are already within their training set.
This is proven untrue by, amongst other things, looking at them playing chess. They can and do play moves not found in the training data: https://www.lesswrong.com/posts/yzGDwpRBx6TEcdeA5/a-chess-gp...
> As the recently released Apple study demonstrates, LLMs don't reason
Where is everyone getting this misconception? I have seen it several times. First off, the study doesn't even try to qualify whether or not these models use "actual reasoning" - that's outside of its scope. They merely examine how effective thinking/reasoning _is_ at producing better results. They found that - indeed - reasoning improves performance. But the crucial result is that it only improves performance up to a certain difficulty cliff - at which point thinking makes no discernible difference, due to a model collapse of sorts.
It's important to read the papers you're using to champion your personal biases.
> They found that - indeed - reasoning improves performance.
You're oversimplifying the results a bit here. They show that reasoning decreases performance for simple problems, improves performance for more complex ones, and does nothing for very complex problems.
I think you need to re-read the paper.
The LLMs don't "reason" by any definition of the term. If they did, then the Tower of Hanoi and the river problem would have been trivial for them to handle at any level because ultimately the solutions are just highly recursive.
What the LLMs do is attempt to pattern match to existing solved problems in their training set and just copy those solutions. But this results in overthinking for very simple problems (because they're copying too much of the solutions from their training set), works well for the somewhat complex problems like a basic Tower of Hanoi, and not at all for the problems that would require actual reasoning because...they're just copying solutions.
The point of the paper is that what LLMs do is not reasoning, however much the AI industry may want to redefine the word to suit their commercial interests.
An LLM was given the algorithm to Towers of Hanoi and was unable to solve it. There is no "reasoning".
You're conflating two definitions of reasoning. LLMs have struggled with visual reasoning since their inception because, guess what!? they're trained mostly on language, not a 3D environment.
LLMs aren't magic - those who claim they are are hyping for some reason or another. Ignore them. View AI objectively. Ignore your bias.
Visual reasoning is not required to follow an algorithm that was laid out. The fact that the model can't execute an algorithm it was provided proves that it is not able to do deductive symbolic reasoning in general, since that is all that would have been required.
I would agree with you if this were about inventing the algorithm for itself - it may well be that you'd need some amount of visual reasoning to come up with it. But that's not what the GP (or the paper) were talking about.
The deep-seated hate for Indians (and, among them, Hindus) has gone on unchecked in the West for many hundreds of years. That's precisely why such fake news goes viral so quickly.
Hell, when the woke "bleeding-heart" academics are the leading voices behind this hate festival, you know there's something deeply wrong.
I was so shocked by the things "South-Asia depts." do in the US that it's hard not to consider them to be in the same bag as the medieval religious nuts, pagan-hunting padre "saints" & "race-science pioneers".
I don’t believe that their business entirely depended on 700 actual humans, just as much as I don’t believe that to be true for the Amazon store. However, both probably relied on humans in the loop which is not sustainable at scale.
If you read the article, they had two separate products: one of which was 700 actual humans, and the other was an LLM-powered coding tool.
at what scale though? as long as money line go up faster than cost line go up, it's fine?