rs186 12 hours ago

Many of the examples seem very easy -- I suspect that without LLMs, a simple Google search would lead you to a Stack Overflow question asking the same thing. I wonder how this performs in bigger, more complex codebases.

Also, my personal experience with LLMs fixing compilation errors is: when it works, it works great. But when it doesn't, it's so clueless and lost that it's a complete waste of time to employ an LLM in the first place -- you are much better off debugging the code yourself using old-fashioned methods.

  • lolinder 11 hours ago

    Yep. This is true for all languages that I've tried, but it's particularly true in Rust. The model will get into a loop where it gets further and further away from the intended behavior while trying to fix borrow checker errors, then eventually (if you're lucky) gives up and hands the mess back over to you.

    With Cursor's implementation, at least, that means it by default gives you the last iteration of its attempt to fix the problem, which when this happens is almost always far worse than its first attempt.

    • _bin_ 7 hours ago

      Absolutely; I retry using LLMs to debug this every so often and they just aren't capable of "fixing" anything borrow-checker-related. They spit out some slop amalgamation of Rc/Arc/even UnsafeCell. They don't understand futures being Send + Sync. They don't understand lifetimes. The other path is that they sometimes loop between two or three broken "fixes" that still don't compile.
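
      A hypothetical sketch of the pattern (my illustration, not from any actual model transcript): the one-line fix is to end the borrow before mutating, but the "fix" that comes back instead bolts on interior mutability:

          fn main() {
              let mut names = vec!["a".to_string(), "b".to_string()];

              // Original error (E0502): `first` borrows `names`, so a
              // push would be rejected while the borrow is still alive:
              //     let first = &names[0];
              //     names.push("c".to_string());
              //     println!("{first}");

              // Straightforward fix: end the borrow before mutating.
              let first = names[0].clone();
              names.push("c".to_string());
              println!("{first}");

              // The slop amalgamation: Rc<RefCell<...>> bolted on to
              // silence the checker rather than resolve the conflict.
              use std::{cell::RefCell, rc::Rc};
              let names = Rc::new(RefCell::new(vec!["a".to_string()]));
              let first = names.borrow()[0].clone();
              names.borrow_mut().push("c".to_string());
              println!("{first}");
          }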

      "Sure! Let me...." (writes the most ungodly garbage Rust known to man)

      Now, I certainly hope I'm wrong. It would be nice to enjoy similar benefits to guys doing more python/typescript work. I just doubt it's that good.

      • OJFord 5 hours ago

        > It would be nice to enjoy similar benefits to guys doing more python/typescript work.

        No need to be envious: it doesn't give me compilation errors in python, but that ain't because it always gives correct code!

        (It can be helpful too, but I get a lot of hallucinated APIs/arguments/etc.)

      • steveklabnik 5 hours ago

        This is pretty contrary to my experience, for whatever it's worth. I wonder what the differences are.

    • greenavocado 10 hours ago

      That's why you need to implement logical residual connections to keep the results focused over successive prompts (like ResNets do)

    • cmrdporcupine 5 hours ago

      Yeah, borrow checker errors in Rust fed to an LLM inevitably lead to the thing just going in circles. You correct it; it offers something else that doesn't work; when notified of that, it gives you some variant of the original problem.

      Usually when you come to an LLM with an error like this, it's because you tried something that made sense to you and it didn't compile. Turns out those things usually "make sense" to an LLM too. They don't step through and reason about the "logic"; they just vibe off of it, and the pattern they come back with is usually just some variant of the original pattern that led you to the problem in the first place.

  • nicce 11 hours ago

    There have been cases where o1/o3 helped me solve issues that I could not solve with Stack Overflow or the Rust forum.

    The LLM was able to connect the dots between some of Rust's more complex and rarely used features and my requirements. I did not know they could be used like that. One case, for example, involved complex usage of generic associated types (GATs).
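
    For a flavor of the feature (an illustrative sketch, not my actual case): GATs let you write a "lending iterator" whose items borrow from the iterator itself, something the plain Iterator trait cannot express:

        // The associated type gets its own lifetime parameter -- a GAT.
        trait LendingIterator {
            type Item<'a> where Self: 'a;
            fn next(&mut self) -> Option<Self::Item<'_>>;
        }

        // Overlapping *mutable* windows into a buffer: each item borrows
        // from `self`, which std's Iterator cannot model.
        struct WindowsMut<'s> {
            slice: &'s mut [i32],
            size: usize,
            pos: usize,
        }

        impl<'s> LendingIterator for WindowsMut<'s> {
            type Item<'a> = &'a mut [i32] where Self: 'a;

            fn next(&mut self) -> Option<Self::Item<'_>> {
                let end = self.pos + self.size;
                if end > self.slice.len() {
                    return None;
                }
                self.pos += 1;
                Some(&mut self.slice[self.pos - 1..end])
            }
        }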

    When it comes to lifetime issues, though, trying to solve them with LLMs is usually a waste of time.

    • seanw444 5 hours ago

      I rarely touch LLMs (I think they're very overhyped and usually unnecessary), but I did find one use case that was handy recently. I was writing some Nim code that interfaced with Cairo, and one Cairo feature wasn't really documented at all, and the design pattern was one I was not familiar with.

      Asking it to write Nim code to solve my problem resulted in a mess of compilation errors, and it kept looping through multiple broken approaches when I pointed out what was flawed.

      Finally, I just decided to ask it to explain the C design pattern I was unfamiliar with, and it was capable of bridging the gap enough for me to be able to write the correct Nim code. That was pretty cool. There was no documentation anywhere for what I needed, and nobody else had ever encountered that problem before. That said, I could have just gone to the Nim forum with a snippet and asked for help, and I would have solved the problem with a fraction of the electricity usage.

  • onlyrealcuzzo 8 hours ago

    > But when it doesn't, it's so clueless and lost that it's a complete waste of time to employ an LLM in the first place -- you are much better off debugging the code yourself using old-fashioned methods.

    So why not try it automatically, see if it fixes the error on its own, and if not, then actually debug it yourself?
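
    Concretely, the cheap version of that is a bounded retry loop around the compiler. A minimal sketch (hypothetical; `ask_llm_for_patch` stands in for whatever model call and patch application you use), capped so it hands off to a human instead of looping forever:

        use std::process::Command;

        // Run `cargo check` and return the compiler output on failure.
        fn check() -> Result<(), String> {
            let out = Command::new("cargo")
                .args(["check", "--message-format=short"])
                .output()
                .expect("failed to run cargo");
            if out.status.success() {
                Ok(())
            } else {
                Err(String::from_utf8_lossy(&out.stderr).into_owned())
            }
        }

        // Hypothetical stub: send the errors plus the relevant source to
        // a model and apply whatever patch it proposes.
        fn ask_llm_for_patch(_errors: &str) {}

        fn main() {
            const MAX_ATTEMPTS: usize = 3; // bail out instead of looping forever
            for attempt in 1..=MAX_ATTEMPTS {
                match check() {
                    Ok(()) => return,
                    Err(errors) => {
                        eprintln!("attempt {attempt} failed; asking the model");
                        ask_llm_for_patch(&errors);
                    }
                }
            }
            eprintln!("no convergence -- time to debug it yourself");
        }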

  • mountainriver 10 hours ago

    LLMs have made me at least twice as fast at writing Rust code. I now think that more people should be writing Rust, as it's been made fairly simple to do.

    And yes, there are some errors it gets stuck in a loop on. It's not often, and generally just switching to another LLM in Cursor will fix it.

  • themusicgod1 6 hours ago

    "very easy" if you have access to the correct dependencies which outside of microsoft's walled garden, and access to a free LLM (https://elevenfreedoms.org/) which is not guaranteed at all

    All of this looks very different when you have to patch in Rust dependencies by hand outside of GitHub.

  • rvz 10 hours ago

    > Also, my personal experience with LLMs fixing compilation errors is: when it works, it works great. But when it doesn't, it's so clueless and lost that it's a complete waste of time to employ an LLM in the first place -- you are much better off debugging the code yourself using old-fashioned methods.

    Or just 'learning the Rust syntax' and standard library?

    As you said, LLMs are unpredictable in their output and can generate functions that don't exist and incorrect code as you use more advanced features, wasting more time than they save if you don't know the language well enough.

    I guess those coming from dynamically typed languages have a very hard time getting used to strongly typed languages, and then struggle with the basic syntax of, say, Rust or C++.

    Looking at this AI hype with vibe-coding/debugging and LLMs, it just favours throwing code at the wall with a lack of understanding of what it does after it compiles.

    This is why many candidates won't ever do Leetcode with Rust in a real interview.

jumploops 15 hours ago

I’m curious how this performs against Claude Code/Codex.

The “RustAssistant Algorithm” looks to be a simple LLM workflow[0], and their testing was limited to GPT-4 and GPT-3.5.

In my experience (building a simple Rust service using OpenAI’s o1), the LLM will happily fix compilation issues but will also inadvertently change some out-of-context functionality to make everything “just work.”

The most common issues I experienced were subtle changes to ownership, especially when using non-standard or frequently updated crates, which caused performance degradations in the test cases.
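
A made-up but representative example of that failure mode: the model resolves a lifetime error by switching a borrow to owned data. It still compiles and behaves the same, but now it clones on every call:

    // Before: returns a borrow into the caller's data -- no allocation.
    fn longest<'a>(items: &'a [String]) -> Option<&'a str> {
        items.iter().map(|s| s.as_str()).max_by_key(|s| s.len())
    }

    // After the "fix": same observable behavior, but the winning String
    // is cloned on every call -- a quiet performance regression that
    // only shows up in the test cases' timings.
    fn longest_cloned(items: &[String]) -> Option<String> {
        items.iter().max_by_key(|s| s.len()).cloned()
    }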

Therefore I wouldn’t really trust GPT-4 (and certainly not 3.5) to modify my code, even if just to fix compilation errors, without some additional reasoning steps or oversight.

[0] https://www.anthropic.com/engineering/building-effective-age...

  • woadwarrior01 13 hours ago

    I tried Claude Code with a small-ish C++ codebase recently and found it to be quite lacking. It kept making a lot of silly syntax errors and going around in circles. Spent about $20 in credits without it getting anywhere close to being able to solve the task I was trying to guide it through. OTOH, I know a lot of people who swear by it. But they all seem to be Python or Front-end developers.

    • Wheaties466 11 hours ago

      Do we really know why LLMs seem to score the highest on Python-related coding tasks? I would think there are equally good examples of JavaScript/C++/Java code to train on, but I always see Python with the highest scores.

      • triyambakam 6 hours ago

        Could be related to how flexible Python is. Pretty easy to write bad and "working" Python code

    • greenavocado 10 hours ago

      May I ask what you tried? I have had strong successes with C++ generation

      • woadwarrior01 7 hours ago

        It was a bit esoteric, but not terribly so, some metal-cpp based code for a macOS app.

  • mfld 12 hours ago

    I find that Claude Code works well to fix Rust compile errors in most cases. Interestingly, the paper didn't compare against agentic coding tools at all, which of course will be easier to use and more generally applicable.

pveierland 14 hours ago

Anecdotally, Gemini 2.5 Pro has been yielding good results lately for Rust. It's been able to one-shot pretty intricate proc macros that required multiple supporting functions (~200 LoC).

Strong typing is super helpful when using AI: if you're properly grounded, understand the interface well, and are specifying against that interface, then the mental burden of understanding the output and integrating it with the rest of the system is much lower than when large amounts of new structure are created without well-defined and well-understood bounds.

  • goeiedaggoeie 13 hours ago

    I find that these are all still pretty bad with more advanced code, especially once FFI comes into play. Small chunks are OK, but even when working from a specification (think some ISO video standard) on something simple (e.g. a small GStreamer Rust plugin), it is still not quite there. C(++): same story.

    All round, though: 10 years ago I would have taken this assistance!

    • danielbln 11 hours ago

      And 5 years ago this would have been firmly science fiction.

  • mountainriver 10 hours ago

    Agree, I’ve been one-shotting entire features into my rust code base with 2.5

    It’s been very fun!

croemer 8 hours ago

Was the paper really written 2 years ago?

The paper states "We exclude error codes that are no longer relevant in the latest version of the Rust compiler (1.67.1)".

A quick search shows that Rust 1.68.0 was released in March 2023: https://releases.rs/docs/1.68.0/

Update: looks like it really is 2 years old. "We evaluate both GPT-3.5-turbo (which we call as GPT-3.5) and GPT-4"

  • meltyness 7 hours ago

    Yeah, the problem LLMs will have with Rust is the adherence to the type system, and the type system's capability to perform type inference. It essentially demands coherent processing memory, similar to the issues LLMs have performing arithmetic while working with limited total features.

    https://leetcode.com/problems/zigzag-grid-traversal-with-ski...

    Here's an example of an ostensibly simple problem that I've solved (pretty much adversarially) with a type like: StepBy< Cloned< FlatMap< Chunks<Vec<i32>>, FnMut<&[i32]> -> Chain<Iter<i32>, Rev<Iter<i32>> > > > >

    So this (pretty much) maximally dips into the type system to solve the problem, and as a result any comprehension the LLM must develop mechanistically about the type system is redundant.
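
    A hedged reconstruction of what that chain might look like (my guess, inferred from the quoted type; the actual solution may differ):

        // Alternate row direction by chaining a forward iterator with a
        // reversed one, where one side is always the empty slice -- that's
        // the Chain<Iter<i32>, Rev<Iter<i32>>> in the quoted type.
        fn zigzag_traversal(grid: Vec<Vec<i32>>) -> Vec<i32> {
            let cols = grid.first().map_or(0, |row| row.len());
            if cols == 0 {
                return Vec::new();
            }
            let flat: Vec<i32> = grid.into_iter().flatten().collect();
            let mut row = 0usize;
            flat.chunks(cols)
                .flat_map(move |chunk| {
                    let (fwd, bwd) = if row % 2 == 0 {
                        (chunk, &chunk[..0])
                    } else {
                        (&chunk[..0], chunk)
                    };
                    row += 1;
                    fwd.iter().chain(bwd.iter().rev())
                })
                .cloned()
                .step_by(2) // the "skip": keep every other element
                .collect()
        }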

    It's a pretty wicked problem: solutions vary in the degree to which they lean on the type system versus imperative code, and, short of hopes and wishes, which portions map from purpose to execution will likely remain incomprehensible to the model.

chaosprint 7 hours ago

I am the creator and maintainer of several Rust projects:

https://github.com/chaosprint/glicol

https://github.com/chaosprint/asak

For LLMs, even the latest Gemini 2.5 Pro and Claude 3.7 Thinking, it is difficult to produce code that compiles on the first try.

I think the main challenges are:

1. Their training material lags behind. Most Rust projects are not at 1.0, and their APIs are constantly changing, which is also the source of most compilation errors.

2. Trying to do too much at one time increases the probability of errors.

3. The agent does not follow a human's work habits very well: go to docs.rs to read the latest documentation and look at examples, and after making mistakes, search resources such as GitHub.

Maybe this is where Cursor rules and MCP can help. But at present, they are far behind.

rgoulter 14 hours ago

At a glance, this seems really neat. -- I reckon one thing LLMs have been useful to help with is "the things I'd copy-paste from stack overflow". A loop of "let's fix each error" reminds me of that.

I'd also give +1 to "LLMs as force multiplier". -- If you know what you're doing & understand what's going on, it seems very useful to have an LLM-supported tool able to help automatically resolve compilation errors. -- But if you don't know what you're doing, I'd worry perhaps the LLM will help you implement code that's written with poor taste.

I can imagine LLMs could also help explain errors on demand. -- "You're trying to do this, you can't do that because..., instead, what you should do is...".

MathiasPius 14 hours ago

I suspect this might be helpful for minor integration challenges or library upgrades like others have mentioned, but in my experience, the vast majority of Rust compilation issues fall into one of two buckets:

1. Typos, oversights (like when adding new enum variants), etc. All things which in most cases are solved with a single keystroke using non-LLM LSPs (see the sketch after this list).

2. Wrong assumptions (on my part) about lifetimes, ownership, or overall architecture. All problems which I very much doubt an LLM will be able to reason about, because the problems usually lie in my understanding or modelling of the problem domain, not anything to do with the code itself.
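
For the first bucket, a small illustration of why non-LLM tooling already has it covered: adding a variant breaks every exhaustive match, and the compiler names the missing arm exactly:

    enum Shape {
        Circle(f64),
        Square(f64),
        Triangle(f64, f64), // newly added variant
    }

    fn area(s: &Shape) -> f64 {
        match s {
            Shape::Circle(r) => std::f64::consts::PI * r * r,
            Shape::Square(w) => w * w,
            // Without this arm, rustc reports E0004 ("non-exhaustive
            // patterns: `&Shape::Triangle(_, _)` not covered"), and
            // rust-analyzer offers a one-keystroke "fill match arms" fix.
            Shape::Triangle(b, h) => 0.5 * b * h,
        }
    }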

croemer 8 hours ago

Paper is a bit pointless if one can't use the tool.

The paper links to a GitHub repo with nothing but a three-sentence README and no activity for 9 months, reading:

> We are in the process of open-sourcing the implementation of RustAssistant. Watch this space for updates.

NoboruWataya 14 hours ago

Anecdotally, ChatGPT (I use the free tier) does not seem to be very good at Rust. For any problem with any complexity it will very often suggest solutions which violate the borrowing rules. When the error is pointed out to it, it will acknowledge the error and suggest a revised solution with either the same or a different borrowing issue. And repeat.

A 74% success rate may be an impressive improvement over the SOTA for LLMs, but frankly a tool designed to fix your errors being wrong, at best, 1 in 4 times seems like it would be rather frustrating.

noodletheworld 15 hours ago

Hot take: this is the future.

Strongly typed languages have a fundamentally superior iteration strategy for coding agents.

The Rust compiler, particularly, will often give extremely specific "how to fix" advice… but in general I see this as a future trend with Rust and, increasingly, other languages.

Fundamentally, being able to assert "this code compiles" (and iterate until it does) before returning "completed task" gives agents an advantage over dynamic languages, where the only possible verification is at runtime.

(And at best the agent can assert “i guess it looks ok”)

  • pseudony 15 hours ago

    I actually don't think it's that cut and dried. I expect that Rust especially (due to lifetimes) will stump LLMs: fixing locally triggers a need to refactor elsewhere.

    I actually think a language like Clojure (very functional, very compositional, focused on local, stand-alone functions, manipulating base data structures (list, set, map) rather than specialist types (~classes)) would do well.

    That said, at the moment I get WAY more issues in OCaml suggestions from Claude than for Python. Training is king -- the LLM cannot reason, so types are not as big a help as one might think.

    • littlestymaar 15 hours ago

      > fixing locally triggers a need to refactor elsewhere.

      Yes, but such refactors are most of the time very mechanical, and there's no reason to believe the agent won't be able to do it.

      > the LLM cannot reason, so types are not as big a help as one might think.

      You are missing the point: the person you are responding to expects it to be superior in an agentic scenario, where the LLM can try its code and see the compiler output, rather than in a pure text-generation situation where the LLM can only assess the code from a bird's-eye view.

      • inciampati 9 hours ago

        Mechanical repairs, and often indicative of mistakes about lifetimes. So it's just part of the game.

      • Capricorn2481 8 hours ago

        No, I think others are missing the point. An "Agentic scenario" is not dissimilar from passing code manually to an AI, it just does it by itself. And if you've tried to use AI for Rust, you would understand why this is not reliable.

        An LLM can read compiler output, but how it corrects the code is, ultimately, a semantic guess. It can look at the names of types, it can use its training to guess where new code should go based on types, but it's not able to actually use those types while making changes. It would use them in the same way it would use comments, to inform what code it should output. It makes a guess, checks the compiler output, makes another guess, etc. This may lead to code that compiles, but not code that should be committed by any means. And Rust is not what I'd call a "flexible language," where lots of different coding styles are acceptable in a codebase. You can easily end up with brittle code.

        So you don't get much benefits from types, but you do have the overhead of semantic complexity. This is a huge problem for a language like Rust, which is one of the most semantically complicated languages. The best languages are going to be ones that are semantically simple but popular, like Golang. Although I do think Clojure's support is impressive given how little code there is compared to other languages.

    • noodletheworld 14 hours ago

      > so types are not as big a help as one might think.

      Yes, they are.

      An agent can combine the compiler type system and iterate.

      That is impossible using clojure.

      The reason you have problems with OCaml is that the tooling you're using is too shit to support iterating until the compiler passes before returning the results to you.

      …not because the tooling doesn't exist. Not because the tooling doesn't work.

      -> because you are not using it.

      Sure, Rust ownership makes it hard for LLMs. Faaair point; but ultimately, why would a coding agent ever suggest code to you that doesn't compile?

      Either: a) the agent tooling is poor or b) it is impossible to verify if the code compiles.

      One of those is a solvable problem.

      One is not.

      (Yes, what many current agents do is run test suites; but dynamically generating valid tests is tricky; checking if code compiles is not tricky.)

      • diggan 13 hours ago

        > An agent can combine the compiler type system and iterate.

        > That is impossible using clojure.

        It might be impossible to use the compiler type system, but in Clojure you have much more powerful tools for actually working with your program as it runs; one would think this would be a much better setup for an LLM that aims to implement something.

        Instead of just relying on the static types based on text, the LLM could actually inspect the live data as the program runs.

        Besides, the LLM could also replace individual functions/variables in a running program, without having to restart.

        The more I think about it, the more obvious it becomes how well fitted Clojure would be for an LLM to iteratively build an actual working program, compared to other static approaches like using Rust.

        • michalsustr 11 hours ago

          I understand the point; however, I think explicit types are still superior, due to the abundance of data in the training phase. It seems too computationally hard to incorporate a REPL-like interactive interface into the GPU training loop. Since it's processing large amounts of data, you want to keep it simple, without back-and-forth with CPUs that would kill performance. And if you can't do it at training time, it's hard to expect the LLM to do well at inference time.

          Well, if you could run clojure purely on gpu/inside the neural net, that might be interesting!

          • diggan 11 hours ago

            Why would it be more expensive to include a REPL-like experience than to run the whole Rust compiler in the GPU training loop?

            Not that I argued you should do that (I don't think either makes much sense; my point was about inference time, not training), but if you apply that to one side of the argument (for Clojure, a REPL), don't you think you should also apply it to the other side (for Rust, a compiler) for a fair comparison?

            • michalsustr 7 hours ago

              I agree. I am under the impression that, unlike Rust, Clojure doesn't require explicit types. (I don't know Clojure.)

              So there are examples online with Rust code, types, and compiler errors, and how to fix them. But for Clojure, the type information is missing and you need to get it from the REPL.

              • diggan 6 hours ago

                > So there are examples online with Rust code, types, and compiler errors, and how to fix them. But for Clojure, the type information is missing and you need to get it from the REPL.

                Right, my point is that instead of the LLM relying on static types and text, with Clojure the LLM could actually inspect the live application. So instead of trying to "understand" that variable A contains 123, it'll do "<execute>(println A)</execute>" or whatever, and then see the results for itself.

                Haven't thought deeply about it, but my intuition tells me the more (accurate and fresh) relevant data you can give the LLM for solving problems, the better. So having the actual live data available is better than trying to figure out what the data would be based on static types and manually following the flow.

              • michalsustr 7 hours ago

                If you want to build an LLM specific to Clojure, it could probably be engineered: add the types as traces for training via a synthetic dataset, and provide them from the REPL at inference time. Sounds like an awfully large amount of work for a non-mainstream language.

  • jillesvangurp 12 hours ago

    I'm waiting for someone to figure out that coding is essentially a sequence of refactoring steps where each step is a code transformation that transforms it from one valid state to another. Equipping refactoring IDEs with an MCP facade would give direct access to that as well as feedback on compilation state and lots of other information. That makes it a lot easier to do structured transformations of entire code bases without having to feed the entire code base as a context and then hope the LLM hallucinates together the right tokens and uses reasoning to figure out if it might be correct. They are actually pretty good at doing that but it doesn't scale very well currently and gets expensive quickly (in time and tokens).

    This stuff is indeed inherently harder for dynamic languages. But it's been standard for (some) statically compiled languages like Java, Kotlin, C#, Scala, etc. for most of this century. I was using refactoring IDEs for Java as early as 2002.

    • igouy 7 hours ago

      Smalltalk Refactoring Browser! (Where do you think Java IDEs got the idea from?)

      "A very large Smalltalk application was developed at Cargill to support the operation of grain elevators and the associated commodity trading activities. The Smalltalk client application has 385 windows and over 5,000 classes. About 2,000 classes in this application interacted with an early (circa 1993) data access framework. The framework dynamically performed a mapping of object attributes to data table columns.

      Analysis showed that although dynamic look up consumed 40% of the client execution time, it was unnecessary.

      A new data layer interface was developed that required the business class to provide the object attribute to column mapping in an explicitly coded method. Testing showed that this interface was orders of magnitude faster. The issue was how to change the 2,100 business class users of the data layer.

      A large application under development cannot freeze code while a transformation of an interface is constructed and tested. We had to construct and test the transformations in a parallel branch of the code repository from the main development stream. When the transformation was fully tested, then it was applied to the main code stream in a single operation.

      Less than 35 bugs were found in the 17,100 changes. All of the bugs were quickly resolved in a three-week period.

      If the changes were done manually we estimate that it would have taken 8,500 hours, compared with 235 hours to develop the transformation rules.

      The task was completed in 3% of the expected time by using Rewrite Rules. This is an improvement by a factor of 36."

      from “Transformation of an application data layer” Will Loew-Blosser OOPSLA 2002

      https://dl.acm.org/doi/10.1145/604251.604258

      • pjmlp 6 hours ago

        > Smalltalk Refactoring Browser! (Where do you think Java IDEs got the idea from?)

        Eclipse still has the navigation browser from Visual Age for Smalltalk. :)

    • _QrE 10 hours ago

      It's not really that much harder, if at all, for dynamic languages, because you can use type hints in some cases (e.g. Python), or a different language (TypeScript) in the case of JavaScript; there are plenty of tools that'll tell you if you're not respecting those type hints, and you can feed their output to the LLM.

      But yeah, if we get better & faster models, then hopefully we might get to a point where we can let the LLM manage its own context itself, and then we can see what it can do with large codebases.

    • pjmlp 11 hours ago

      Which based many of their tools on what Xerox PARC had done with its Smalltalk, Mesa (XDE), Mesa/Cedar, and Interlisp-D environments.

      This kind of processing is possible in dynamic languages when using an image-based system, as it also contains metadata that somehow takes the role of static types.

      From the previous list only Mesa and Cedar are statically typed.

  • diggan 14 hours ago

    On the other hand, using "it compiles" as a heuristic for "it does what I want" seems to be missing the goal of why you're coding what you're coding in the first place. I'd much rather set up one E2E test with how I want the thing to work, then let the magical robot figure out how to get there, while also being able to run the test and see whether it's there yet or not.

  • slashdev 8 hours ago

    I've been saying this for years on X. I think static languages are winning in general now, having gained much of the ergonomics of dynamic languages without sacrificing anything.

    But AI thrives with a tight feedback loop, and that works best with static languages. A Python linter (or even mypy) just isn't as good as the Rust compiler.

    The future will be dominated by static languages.

    I say this as a long-time dynamic-languages and Python proponent who started seeing the light back when Go was first released.

  • sega_sai 12 hours ago

    I think this is a great point! For humans it's easier to write loosely typed Python-like code, since you skip a lot of boilerplate; but for AI, the boilerplate is probably useful, because it reinforces which variable has which type, and it's obviously easier to detect errors early, at compilation time.

    I actually wonder if that will push languages like Python to create more strictly enforced typing modes, as boilerplate is much less of an issue now.

  • cardanome 12 hours ago

    Not really. Even humans regularly get lifetimes wrong.

    As someone not super experienced in Rust, my workflow was often very compiler-error-driven. I would type a bit, see what the compiler says, change it, and so on. Maybe someone more experienced can write whole chunks in a single pass that compile on the first try, but that should far exceed anything generative AI will be able to do in the next few years.

    The problem here is that iteration with AI is slow and expensive at the moment.

    If anything, you want to use a language with automatic garbage collection, as it removes mental overhead for generative AI as well as humans. You also want a more boilerplate-heavy language, because such languages are easier to reason about, and the boilerplate doesn't matter when the AI does the work.

    I haven't tried it but I suspect golang should work very well. The language is very stable so older training data still works fine. Projects are very uniform, there isn't much variation in coding style, so easy to grok for AI.

    Also probably Java but I suspect it might get confused with the different versions and all the magic certain frameworks use.

    • greenavocado 7 hours ago

      All LLMs still massively struggle with resource lifetimes irrespective of the language

      • hu3 5 hours ago

        IMO they struggle a whole lot more with low level/manual lifetimes like C, C++ and Rust.

  • pjmlp 11 hours ago

    Hot take: this is a transition step, like the -S switch back when Assembly developers didn't believe compilers could output code as good as their own.

    Eventually a few decades later, optimising backends made hand written Assembly a niche use case.

    Eventually AI-based programming tools will be able to generate executables. And as happened with -S, we might require generation into a classical programming language to validate what the AI compiler backend is doing, until it gets good enough that only those arguing on AI Compiler Explorer will care.

    • secondcoming 11 hours ago

      It's probably pointless writing run of the mill assembly these days, but SIMD has seen a resurgence in low-level coding, at least until compilers get better at generating it. I don't think I'd fully trust LLM generated SIMD code as if it was flawed it'd be a nightmare to debug.

      • pjmlp 9 hours ago

        Well, that won't stop folks trying though.

        "Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning"

        https://arxiv.org/html/2311.13721v3

    • imtringued 7 hours ago

      This won't be a thing, for very obvious reasons.

      Programming languages solve the specification problem (which happens to be equivalent to "The Control Problem"). If you want the computer to behave in a certain way, you have to provide a complete specification of the behavior. The more loose and informal that specification is, the more blanks have to be filled in, and the more you are letting the AI make decisions for you.

      You tell your robotic chef to make a pizza, and he does, but it turns out it decided to make a vegan pizza. You yell at the robot for making a mistake and it sure gets that you don't want a vegan pizza, so it decides to add canned tuna. Except, turns out you don't like tuna either. You yell at the robot again and again until it gets it. Every single time you're telling the AI that it made a mistake, you're actually providing a negative specification of what not to do. In the extreme case you will have to give the AI an exhaustive list of your preferences and dislikes, in other words, a complete specification.

      By directly producing executables, you have reduced the number of knobs and levers that can be used to steer the AI and made it that much harder to provide a specification of what the application is supposed to do. In other words, you're assuming that the model in itself is already a complete specification and your prompt is just retrieving the already existing specification.

      • pjmlp 6 hours ago

        That was the argument from many Assembly developers against FORTRAN compilers, if you dive into the literature of the time.

        Also, this is already happening in low-code SaaS products, where integrations now have AI in their workflows.

        For example, https://www.sitecore.com/products/xm-cloud/ai-workflow

        These are, in a way, like high-level interpreters, and eventually we will have compilers as well.

        Not saying it'll happen tomorrow, but it will come.

  • inciampati 9 hours ago

    I've found this to be very true. I don't think this is a hot take. It's the big take.

    Now I code almost all tools that aren't shell scripting in rust. I'm only using dynamic languages when forced to by platform or dependencies. I'm looking at you, pytorch.

  • justanything 15 hours ago

    Would this mean that LLMs would be able to generate code more easily for strongly typed languages?

    • littlestymaar 15 hours ago

      In an agentic scenario (when they can actually run the compiler by themselves) yes.

      • hu3 14 hours ago

        Yep.

        I just tell the LLM to create and run unit tests after applying changes.

        When tests fail, the LLM can use the error message to fix the code, be it a compilation error or a logic error in the unit tests.

        • greenavocado 7 hours ago

          I find LLMs extremely susceptible to spinning in circles and effectively halting in these situations.

          • hu3 5 hours ago

            True. That's why my instructions file tells them to try to fix it once and stop if it fails again.

k_bx 14 hours ago

So far the best way to fix Rust for me has been to use OpenAI's Codex tool. Rust libraries change APIs often and evolve quickly, but luckily all the code is available under ~/.cargo/registry, so it can go and read the actual library code. Very useful!

manmal 14 hours ago

Maybe this is the right thread to ask: I've read that Elixir is a bit under-supported by many LLMs, whereas Ruby/Rails and Python work very well. Are there any recommendations for models that seem particularly useful for Elixir?

  • arrowsmith 14 hours ago

    Claude is the best for Elixir in my experience, although you still need to hold its hand quite a lot (cursor rules etc).

    None of the models are updated for Phoenix 1.8 either, which has been very frustrating.

    • manmal 13 hours ago

      Thank you!

infogulch 12 hours ago

I wonder if the reason LLMs are not very good at debugging is that there's not much published code in this intermediate state, with obvious compilation errors.

  • qiine 11 hours ago

    Huh, aren't Stack Overflow questions a big source? ;p

flohofwoe 15 hours ago

So Microsoft programmers will become code monkeys that stumble from one compiler error to the next without any idea what they are actually doing, got it ;)

(it's also a poor look for Rust's ergonomics tbh, but that's not a new issue)

  • jeffreygoesto 10 hours ago

    Yupp. And they brag about bangin' on it without any understanding until it magically compiles.

pjmlp 11 hours ago

Limited bandwidth, so I will check later, but it would be great if it could do code suggestions for affine-types-related errors, or explain what is wrong; this would help a lot with Rust's adoption.

CryZe 14 hours ago

I'd love to see VSCode integrate all the LSP information into Copilot. That seems to be the natural evolution of this idea.

delduca 13 hours ago

> unlike unsafe languages like C/C++

The world is unsafe!

petesergeant 14 hours ago

Every coding assistant or LLM I've used generally makes a real hash of TypeScript's types, so I'm a little skeptical, but also:

> RustAssistant is able to achieve an impressive peak accuracy of roughly 74% on real-world compilation errors in popular open-source Rust repositories.

74% feels like just the right amount that people would keep hitting "retry" without thinking about the error at all. I've found LLMs great for throwing together simple scripts in languages I just don't know or can't look up the syntax for, but I'm still struggling to get serious work out of them in languages I know well where I'm trying to do anything vaguely complicated.

Worse, they often produce plausible code that does something in a weird or suboptimal way: tests that don't actually test anything, or subtle but real bugs in logic that you wouldn't write yourself but need to be very on the ball to catch in code you're reviewing.

  • jcgrillo 12 hours ago

    74% feels way too low to be useful, which aligns with my limited experience trying to get any value from LLMs for software engineering. It's just too frustrating making the machine guess and check its way to the answer you already know.