
With the rise of AI chatbots, the question everybody asks is: “is my job still safe?”. The technology and the models are changing rapidly, and AI assistants get better and ‘smarter’ every day. In the summer of 2024 I did my first analysis of how far we are in letting an AI build an entire (web) application. Using GitHub’s Copilot plugin in IntelliJ, I started with the generally available ‘Pet store API’ from the Swagger website, and a prompt to create a Spring Boot application implementing that API. The results back then were somewhat promising, but nowhere near the point where a non-developer could create an application of their own. The AI struggled with the versioning of Spring Security: it used the Spring Security v6 dependency, but the configuration for version 5. It also produced some strange implementations in the service and repository layers. Most of all it was limited by the fact that the Copilot agent didn’t have full access to the project scope, which became clear when it tried to fix a unit test on a badly implemented service method: it kept trying to make the test match the method, instead of altering the method itself and really solving the problem.
I recently stumbled upon an article on medium.com mentioning a ‘new’ IDE called Windsurf: an entire IDE built on VS Code, with the AI assistant Cascade. Windsurf fixes one major issue I experienced last summer: it promises full context awareness and multi-file edits. Together with the fact that LLMs have improved a lot since then, this was a good reason to repeat the experiment and ask ‘the’ question: are we there yet?
The Challenge
Similar to my previous attempt, the central question is:
“Can an AI assistant create an entire Spring Boot application based only on prompts, without a human having to intervene in any code changes directly?”
Just as last time, I’m using the petstore API to create a Spring Boot application using Maven. Implementing an API with Spring Boot is a widely practised setup, and it requires the implementation of all the basic components (security, database, serialisation and testing) seen in most applications. The principle is to let the AI build the entire application on its own. Given Windsurf’s promise of complete context awareness, I’m even stricter this time: I won’t interfere in the process of generating code myself, and I’ll let the AI fix everything completely on its own.
How the AI Built the App
The first step was to see if the claims of Windsurf were true. After installing the application I asked the simple question:
“Please create a springboot application using maven. Implement the “Pet” endpoints from the swagger documentation found at https://petstore.swagger.io/.”
The first signs were promising: it jumped right in, creating a pom.xml and a Spring Boot application, which built successfully. This was going to be fun! I added an extra prompt enabling database support, and it reacted nearly flawlessly, providing the config for the connection and upgrading the Pet models to entities. Time to get serious and see “what this baby can do” 🙂. I added some extra requirements to my original prompt, created a new project folder and let Cascade repeat its trick in an empty workspace.
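To give an idea of what that entails: upgrading a model to an entity mostly comes down to adding JPA annotations. A minimal sketch, reduced to a few fields (the generated version will differ):

```java
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;

// A plain Pet model, upgraded to a JPA entity
@Entity
public class Pet {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;
    private String status; // e.g. "available", "pending", "sold"

    // getters and setters omitted for brevity
}
```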
1. Setting the Foundation
“Please create a springboot application using maven. Implement the “Pet” endpoints from the swagger documentation found at https://petstore.swagger.io/.
Enable spring security, but allow all requests on the /pet endpoint. Implement a datasource to an in-memory database, and use flyway to automatically update the database schema. Add 30 different Pets to the database. Please use Lombok for getters and setters, and implement the builder pattern. Add a service layer using interfacing, so the controller cannot directly interact with the repository.”

On which Cascade responded:
As you can see, Cascade’s response is descriptive and follows the given requirements in a logical way. It finalises with a summary of the changes it made, and suggests a command-line command to run the application, which can be executed directly from the prompt. Neat!
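To illustrate the ‘service layer using interfacing’ requirement from the prompt: the controller depends on an interface only, and never touches the repository. A rough sketch of that shape (names are mine, not necessarily what Cascade generated):

```java
import java.util.List;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// The contract the controller depends on
interface PetService {
    List<Pet> getAllPets();
    Pet getPetById(Long id);
}

@RestController
@RequestMapping("/pet")
class PetController {

    private final PetService petService; // injected by Spring; no repository in sight

    PetController(PetService petService) {
        this.petService = petService;
    }

    @GetMapping
    List<Pet> getAllPets() {
        return petService.getAllPets();
    }

    @GetMapping("/{id}")
    Pet getPetById(@PathVariable Long id) {
        return petService.getPetById(id);
    }
}
```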
2. Build Failure
The first attempt to build the application gave a nice Build Failure (just like in the real world). I don’t know whether I should have been surprised or disappointed, but before I could ask myself that question, the failure had already been analysed and the necessary changes were made.

3. A running application
A second try at building the application, and now it’s a Build Success. So let’s start the application and see that it all works like a charm (right?). Well, it appeared that port 8080 was still in use from a previous attempt. Cascade noticed this and suggested a couple of commands: first to get the PID of the responsible process, and then one to kill it. Again, all by suggesting the right commands, which can be executed with a single click. A second try, and the application started successfully.
4. Still some issues
Testing the endpoints Cascade suggested did reveal a flaw in the application: instead of a JSON response showing the details of the pets in the database, it returned a merely disappointing { }. When I pointed this out to Cascade, however, it responded in a way that didn’t disappoint:

This did solve the issue, resulting in a JSON response that included all the Pet details.
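For context, one possible cause of such an empty { } response is that Jackson cannot find any readable properties on the object, for instance when the getters are missing. With the Lombok setup from the prompt, that would come down to a missing @Getter (or @Data) annotation, or annotation processing that isn’t active in the build. A sketch of what the Lombok-based fix would look like, assuming that was the culprit here:

```java
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;

// Without @Getter, Jackson finds no readable properties on the
// class, so every Pet can end up as an empty {} in the response.
// (JPA annotations omitted for brevity.)
@Getter
@Setter
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class Pet {
    private Long id;
    private String name;
    private String status;
}
```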
A Loop of Challenges
However promising this all may seem, I’ve also experienced some less successful results along the way. Explicitly choosing Lombok in the initial prompt turned out to be kind of necessary. In some attempts the assistant started out using a Lombok implementation, but made some other errors along the way. It responded by disabling Lombok and switching back to good ol’ getters and setters, which obviously didn’t solve the original problem, and which led to:

This also happened once with Spring Security. Enabling it in a Spring Security 6 environment using a v5 config made Cascade loop through several changes, eventually disabling Spring Security completely. Although the application ran fine with security disabled, it’s not the solution we’re looking for. Some autowiring issues were identified by the IDE itself. Although it required me to find the file with the original issue in the project and point it out myself, all I had to do was click on the issue to feed it to the assistant. Scoped to that single issue, Cascade was then able to come up with a solution and solve it in a proper manner.
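The root of that loop: Spring Security 6 removed the v5 way of configuring things (extending WebSecurityConfigurerAdapter) in favour of a SecurityFilterChain bean. What the prompt asked for looks roughly like this in the v6 style (a sketch, not Cascade’s literal output):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configurers.AbstractHttpConfigurer;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
public class SecurityConfig {

    // Spring Security 6: declare a SecurityFilterChain bean instead of
    // extending the removed WebSecurityConfigurerAdapter (v5 style).
    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http
            .csrf(AbstractHttpConfigurer::disable)
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/pet/**").permitAll() // open, as the prompt required
                .anyRequest().authenticated());
        return http.build();
    }
}
```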
The Final Result
It runs, so it must be good, right? And to be honest, there’s not much to complain about in the final result. Sure, my prompt was pretty complete and asked for some specific solutions and patterns. But overall the assistant made decent choices, and the quality of the generated code is high.
Some examples:
- Versions of dependencies are located in the properties section of the pom.xml.
- Use of JPA when configuring a database.
- All code is neatly formatted.
- Unit test method names include the expected outcome (e.g. getPetById_ShouldReturnPet()); see the sample test below.
- Tests are clearly segmented in an ‘Arrange’, ‘Act’, ‘Assert’ structure.
- Use of Java streaming where possible.
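To illustrate the naming and structure of those tests, a sketch along the lines of what was generated (class and constructor names assumed, mocking the repository with Mockito):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.Optional;
import org.junit.jupiter.api.Test;

class PetServiceImplTest {

    private final PetRepository petRepository = mock(PetRepository.class);
    private final PetService petService = new PetServiceImpl(petRepository);

    @Test
    void getPetById_ShouldReturnPet() {
        // Arrange
        Pet pet = Pet.builder().id(1L).name("Rex").status("available").build();
        when(petRepository.findById(1L)).thenReturn(Optional.of(pet));

        // Act
        Pet result = petService.getPetById(1L);

        // Assert
        assertEquals("Rex", result.getName());
    }
}
```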

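The stream usage in the generated service code was along these lines (a sketch; the method name is assumed):

```java
// From the service implementation: filtering and mapping with streams
public List<String> getPetNamesByStatus(String status) {
    return petRepository.findAll().stream()
            .filter(pet -> status.equals(pet.getStatus()))
            .map(Pet::getName)
            .collect(Collectors.toList());
}
```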
Although this is pretty straightforward code, and it doesn’t include complex patterns or logic, it’s still very impressive. In this example you would probably choose the immutable toList() method instead of the Collectors variant, but apart from that, I would approve the merge request with minimal comments. One thing to keep in mind, though, is that I specifically asked for certain patterns in my prompts. The assistant, for example, didn’t mind at all exposing the database entities directly in the REST response; it required me to ask for separate DTOs and mappers to keep the database entities out of the JSON (a small sketch follows below). And there’s also the unpredictability of the outcome of a prompt: repeating the same prompt several times in a row, the outcomes all differed in some way, which resulted in half of the attempts starting flawlessly, and the other half ending in a build failure that required some sort of (human) intervention.
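For reference, the dto-and-mapper setup I had to ask for explicitly can be as small as this (a sketch; it keeps the JPA entity out of the JSON):

```java
// PetDto: what the API exposes, decoupled from the JPA entity
public record PetDto(Long id, String name, String status) {

    // A minimal mapper from entity to DTO
    public static PetDto fromEntity(Pet pet) {
        return new PetDto(pet.getId(), pet.getName(), pet.getStatus());
    }
}
```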
Conclusion
So, reading this, you might think we’re there: no more developers needed, and anyone with a proper prompt can create a functional and running web application. Right? Yeah, but no.
The recent improvements in LLMs and the introduction of AI-based IDEs are a huge step forward. Creating a basic application shouldn’t take more than half an hour, and adding features to an existing one is peanuts. But, and this still remains the main issue, you need a skilled developer to tell it not only what you want, but also how you want it. In this experiment I used the Petstore API, knowing that it is a fully specified and correct API. And from my own experience I knew I had to tell the assistant to use Flyway, Lombok, interfacing and (for this case) an H2 database. Without these strict requirements the outcomes are unpredictable, and chances are that the assistant ends up in an infinite loop of error solving, or creates something that does run, but doesn’t work.
Given how an AI assistant (and the underlying LLM) works, this shouldn’t be a surprise: it is designed to generate answers, and it does so by predicting the most likely next word in a sentence. It therefore lacks the ability to really challenge your prompt when things are unclear or underspecified; it makes its own decisions instead of asking for additional information. And that’s exactly where an experienced developer is still needed: to review the decisions the assistant made, and to formulate boundaries that ensure the right patterns are followed. With the right prompt it’ll run almost fully automatically, but getting to that right prompt, and validating the outcome, is what still makes the developer necessary.
So are we there yet? No, not yet. And we might be further away from the point where AI takes over than we think. But when you describe precisely what you want and how you want it, and give the assistant enough boundaries to function at its best, it’ll minimise the need for intervention, and you’ve got a code companion that boosts productivity like never before.
