AI Programming has completely missed the mark

I recently read an article about outages caused by AI-generated code, and I have read similar stories in the past year about quality metrics suffering when AI-generated code is used. When I was researching Developer-Centred Security, one of the key takeaways for me was that to build more secure applications we don’t just need better tools, we need more support for developers and programmers to build good-quality code. I don’t think LLM-based AI, in particular AI-for-code tools like Microsoft’s Copilot, is helping here, because it is fundamentally misaligned with these goals.

Helping developers build more secure code means a number of different things:

  • Security and code quality as a business goal
  • Processes that put security of code front and centre
  • Improved collaboration between dev, test, and security teams both within and outwith organisations
  • Better QA, including better testing, more structured discussion, and reflection
  • Better education that puts security, QA, and critique front and centre in practical activities
  • Co-Design of tools and libraries that involves end users (programmers)
  • For tools, libraries, and assistance, align aims and sources with trusted sources of security principles
  • Don’t distract developers from using their expertise
  • Improve documentation in third party libraries and in developed applications

While I’m not totally against using tools to help generate code (and indeed tools like templating, linting, suggested fixes, and more have been around for a long time), I think that AI code generation has really missed the mark, on many different levels.

  • Business goals – The fundamental selling point is not to offer secure code; it is to offer faster generation of code. More lines written per day is not a metric that is conducive to building more secure code. Managers may also be tricked into thinking an AI can do more than it can, which could impact security as well.
  • Security Front and Centre – LLMs are language models; they are not logic models, and they are certainly not security models. And attempts to increase their ability to reason do not appear to be bearing fruit. There are domain-specific languages, and associated tools, that can be used to define, model, and test security protocols, but an AI that spits out code cannot help you understand these.
  • Improved Collaboration – LLM-based AI in no way assists in collaboration. If used as an alternative to pair programming, you’re reducing the human discourse. If you’re using it within discussions like emails, it’s impairing your directness.
  • Better QA – So far, most of the LLM-based AI I have seen is focused squarely on code generation. I haven’t seen any big announcements from companies building AI into tools like static and dynamic analysers, fuzzers, test suite builders, or CI systems. That’s likely because these tools already exist and don’t need LLMs. It’s not a sexy proposal, but boring behind-the-scenes tools like these are what is needed for good QA, not AI (one small example is sketched after this list).
  • Co-Design – I think that the way these LLM-based tools are being deployed is antithetical to good co-design principles. They are being built as general-purpose chatbot tools, then shoehorned into programming, and despite the protestations of being ‘open’, some companies take steps to actively hide how their AI tools are designed.
  • Trusted Sources – While there is some utility to having places like Stack Overflow and GitHub where people can look for assistance with their programming, the quality of examples in these places is often poor, and developers who use them can fail to correctly assess the code quality and can ignore comments that offer analysis and alternatives. These are also the sources going into training LLMs, so it is only natural that LLMs will end up outputting faulty code (one such snippet is sketched after this list).
  • Distractions – The purpose of an LLM is to generate language that is statistically probable. Not correct, just probable. This results in LLMs randomly inventing APIs, which makes them impossible to rely on as secure tools and generates more work for developers. If a developer takes the generated output and realises they have to fix a non-existent library or call, that’s one more job getting in the way of critically analysing whether the code that does compile is actually secure (also illustrated in a sketch after this list).
  • Improving Documentation – LLMs generate a lot of slop. They are often misused to generate reams of text, far more than is needed, and as a tool to assist with documentation this is unhelpful. What is needed for good documentation is a developer who understands the intrinsic motivations and goals of the other developers who will use their code in the future, and who can anticipate what assistance and hints they will need. That is a valuable skill, and one that is challenging to acquire. Handing off the task to an AI helps no-one.
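
On the QA point above, the ‘boring’ tooling already exists and needs no LLM. As a minimal sketch, here is a property-based test written with the Hypothesis library; parse_amount is a hypothetical function standing in for real application code:

```python
from hypothesis import given, strategies as st

def parse_amount(text: str) -> int:
    """Hypothetical parser: a 'pounds.pence' string converted to pence."""
    pounds, _, pence = text.partition(".")
    return int(pounds) * 100 + int(pence or 0)

# Hypothesis generates many inputs and shrinks any failure to a minimal
# counterexample; no language model is involved.
@given(st.integers(min_value=0), st.integers(min_value=0, max_value=99))
def test_roundtrip(pounds, pence):
    assert parse_amount(f"{pounds}.{pence:02d}") == pounds * 100 + pence
```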
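
On the trusted-sources point, here is a hypothetical sketch (the table and column names are invented) of the kind of snippet that circulates on Q&A sites and ends up in training data, alongside the parameterised version that the comments on such threads often point out:

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Common copy-paste pattern: SQL built by string formatting,
    # which is vulnerable to SQL injection.
    query = f"SELECT id, email FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The fix often sits in the comments under the accepted answer:
    # a parameterised query lets the driver handle quoting.
    return conn.execute(
        "SELECT id, email FROM users WHERE name = ?", (username,)
    ).fetchall()
```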
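
And on the hallucinated-APIs point, a hypothetical illustration: the suggested call looks plausible, but no such function exists in the real requests library, so the developer’s first job is repairing the call rather than reviewing its security.

```python
import requests

def get_profile(user_id: int) -> dict:
    url = f"https://api.example.com/users/{user_id}"

    # A plausible-looking but invented suggestion; requests has no
    # fetch_json(), so this line would fail with AttributeError:
    #   return requests.fetch_json(url)

    # The developer has to rewrite it with calls that do exist, and only
    # then get back to the real question: is this secure (timeouts, TLS
    # verification, error handling, and so on)?
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()
```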

If AI really were going to help build more secure code, here are some of the things it would need to do:

  • A fundamental shift in the logic used to build models in the first place. LLMs may model things like written English grammar reasonably well, but new principles are needed for designing models that can build code according to a clearly defined specification. Any training needs to be done with trusted and proven secure examples only, and output should be constrained to what is genuinely valid, with no ‘hallucinations’.
  • In addition to the above, security modelling needs to be embedded alongside language modelling. This may require building AI systems that offer traceability for how they arrive at outputs.
  • The systems have to be built in an entirely open fashion, allowing for scrutiny and customisation to each developer and project’s needs.
  • We need changes to education – With students increasingly using LLMs to help write assignments, the entire system of setting coding assignments is going to need a paradigm shift towards analysis, critique, and discussion. If AI is going to be normal, though some doubt that it is ‘inevitable’, that critical approach to AI output is going to be essential. AI cannot and should not assist with that critique, and I hope educators are aware of this, not just in computing studies, but across all areas.

I do not think that the companies making AI have any incentive, or even the ability, to address these needs. So I’m not holding my breath for secure AI code generators. And I would strongly advise that anyone in a business-critical situation not use AI code generation in any way.
