AI Coding Assistant

This repository contains a proof-of-concept AI coding assistant. Given a prompt from the user, it examines the repository to find the most appropriate place to update, generates code for the prompt, and then generates a unit test and a commit message.

Solution Approach

The following approach is taken by this implementation:

  • The repository is scanned and chunked, and embeddings are generated for all supported source code.
  • The user-provided prompt is also embedded, and a vector search is performed to find the most appropriate chunk of code to modify.
  • The identified chunk and its surrounding context are provided to a code-appropriate LLM, which is asked to write code that solves the prompt.
  • The generated code is extracted from the LLM response and a diff patch is produced. The diff is then applied to the source file.
  • The generated code and the original prompt are provided to the code LLM, which is asked to generate an appropriate unit test.
  • The unit test is either appended to an existing test file or written to a newly generated test file.
  • The changed files are staged and committed to Git after an appropriate commit message is generated using a chat-appropriate LLM.
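
The first two steps above can be sketched in a few lines of Go. This is a minimal illustration rather than the repository's code: the names Chunk, ChunkFixed, CosineSimilarity and MostSimilar are hypothetical, the search runs in memory instead of in PostgreSQL with pgvector, and toy two-dimensional vectors stand in for real model embeddings.

```go
package main

import (
	"fmt"
	"math"
)

// Chunk is a hypothetical record for a fixed-size slice of a source file.
type Chunk struct {
	Start, End int // byte offsets into the file, End exclusive
	Text       string
}

// ChunkFixed splits src into consecutive chunks of at most size bytes.
// It splits on bytes, so a multi-byte UTF-8 character may be cut in two.
func ChunkFixed(src string, size int) []Chunk {
	if size <= 0 {
		return nil
	}
	var chunks []Chunk
	for start := 0; start < len(src); start += size {
		end := start + size
		if end > len(src) {
			end = len(src)
		}
		chunks = append(chunks, Chunk{Start: start, End: end, Text: src[start:end]})
	}
	return chunks
}

// CosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 if the lengths differ or either vector has zero magnitude.
func CosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// MostSimilar returns the index and score of the embedding closest to query.
// The index is -1 when embeddings is empty.
func MostSimilar(query []float64, embeddings [][]float64) (int, float64) {
	best, bestScore := -1, math.Inf(-1)
	for i, e := range embeddings {
		if s := CosineSimilarity(query, e); s > bestScore {
			best, bestScore = i, s
		}
	}
	return best, bestScore
}

func main() {
	chunks := ChunkFixed("package main\n\nfunc main() {}\n", 12)
	// Toy vectors, one per chunk; a real run would call an embedding model.
	embeddings := [][]float64{{1, 0}, {0, 1}, {0.6, 0.8}}
	idx, score := MostSimilar([]float64{0, 2}, embeddings)
	fmt.Printf("%d chunks, best chunk index %d, score %.2f\n", len(chunks), idx, score)
}
```

With pgvector, the same ranking would normally be done by the database rather than by a loop over every embedding.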

This implementation is very simple and has numerous drawbacks (see the Limitations / Assumptions section below), but it does work.

This approach was chosen because it could be implemented quickly and produces reasonable-looking results while being a solid platform for further iteration.
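
The test-file step in the list above (append to an existing test file, otherwise create one) might look like the following sketch. The helper names PathForTest and AppendTest are hypothetical and not taken from this repository. The only fixed point is Go's convention that the tests for main.go live in main_test.go. A newly created test file would also need its imports added, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// PathForTest maps a Go source path to its conventional test file,
// for example main.go to main_test.go. A path that already ends in
// _test.go is returned unchanged.
func PathForTest(src string) string {
	if strings.HasSuffix(src, "_test.go") {
		return src
	}
	return strings.TrimSuffix(src, ".go") + "_test.go"
}

// AppendTest appends testCode to the test file for src. If the file does
// not exist yet, it is created and a package clause is written first.
func AppendTest(src, pkg, testCode string) error {
	path := PathForTest(src)
	_, statErr := os.Stat(path)
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	if os.IsNotExist(statErr) {
		if _, err := fmt.Fprintf(f, "package %s\n", pkg); err != nil {
			return err
		}
	}
	_, err = fmt.Fprintf(f, "\n%s\n", testCode)
	return err
}

func main() {
	fmt.Println(PathForTest("main.go"))
}
```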

Code Structure

Package                  Description
cmd                      Main entrypoint.
cmd/autopatch            Command to generate a git patch automatically from a prompt.
cmd/indexer              Command to (re-)index a repository and regenerate embeddings.
cmd/chunks               Command to search chunks based on an embeddings prompt.
pkg/config               Configuration file definition.
pkg/database             Database abstraction, used to perform bootstrapping / migrations.
pkg/database/migrations  Database schema.
pkg/indexer              Embedding generator for Git repositories.
pkg/llm                  Abstraction layer over LLM implementations.
pkg/llm/prompt           Prompt templates for the LLM.

Requirements

To use this application you must provide the following:

  • A PostgreSQL database server with the pgvector extension installed and configured.
  • A working Go toolchain.
  • An Ollama server for prompts and embeddings.
  • (Optional) An OpenAI-compatible API for running larger models.

Usage

To configure the application, copy config.yaml.tmpl to config.yaml in the root of the working directory and modify it as appropriate.

To run the application (drop --execute if you only want a dry run, leave it if you like to live dangerously):

go run ./cmd autopatch --repo /path/to/git/repo --task "coding prompt" --execute

To see what got generated:

git log --full-diff -p -n 1

Limitations / Assumptions

The following shortcuts have been taken to reduce time to implementation:

  • The application does not use an autonomous agentic approach, as this would have required implementing verification tools for agent-executed steps along with robust retry logic. Instead, this implementation uses a much simpler rule-based approach.
  • This application currently only supports modifying codebases written in Go. Nothing fundamental about the approach prevents supporting other languages, but doing so would have added complexity to the implementation and testing.
  • The application currently only supports modifying a single chunk of code. Supporting multi-chunk editing would require an autonomous agentic approach that recursively identifies interesting segments of code to patch. This would have greatly increased the time to implementation.
  • No attempt is made to verify the correctness of the generated code. Verification could be done by making sure the code compiles, passes its generated test(s), and produces no warnings when run against a linter. If an agentic approach were used, feedback from these tools could be passed back to the code generator to correct the code segments.
  • The approach taken to generate the embeddings is very simple: files are segmented into fixed-size chunks, which are then embedded. A more robust approach would be to generate an AST for all code in the repository along with an accompanying reference graph. A coordinating agent would consume the generated symbols to produce a set of modification tasks. Context from the coordinator could then be passed to a patching agent, along with the relevant section of code to modify, in order to produce a set of patches.
  • No attempt was made to tune the models to the task, and there are almost certainly better models to use than the ones selected. These were the first set of models that produced a workable result.
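
As a rough illustration of the AST-based direction described above, the sketch below uses the standard go/parser package to split a Go file into one chunk per top-level function instead of fixed-size pieces. The names FuncChunk and FuncChunks are hypothetical and nothing here is part of the current implementation. Because parsing fails on invalid source, the same call also gives a cheap syntax check for generated code, which is much weaker than compiling it or running its tests.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// FuncChunk is a hypothetical record for one top-level function declaration.
type FuncChunk struct {
	Name       string
	Start, End int // byte offsets into the source, End exclusive
}

// FuncChunks parses src as a Go file and returns one chunk per top-level
// function or method. A non-nil error means the source does not parse.
func FuncChunks(src string) ([]FuncChunk, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var chunks []FuncChunk
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, vars and consts
		}
		chunks = append(chunks, FuncChunk{
			Name:  fn.Name.Name,
			Start: fset.Position(fn.Pos()).Offset,
			End:   fset.Position(fn.End()).Offset,
		})
	}
	return chunks, nil
}

func main() {
	src := "package p\n\nfunc A() {}\n\nfunc B() int { return 1 }\n"
	chunks, err := FuncChunks(src)
	if err != nil {
		fmt.Println("source does not parse:", err)
		return
	}
	for _, c := range chunks {
		fmt.Printf("%s: %q\n", c.Name, src[c.Start:c.End])
	}
}
```

Chunks cut at function boundaries keep each embedded unit syntactically complete, which fixed-size chunks cannot guarantee.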

Models Used in Testing

The following models were used while developing the application.

Model             Purpose
nomic-embed-text  Used for generating embeddings from source chunks.
gemma2-9b-it      Used for generating code.
llama3.2          Used for conversational prompts and generating git commit messages.

Run Example

An example of an execution:

git clone https://github.com/andyjessop/simple-go-server

go run ./cmd autopatch \
  --repo /path/to/simple-go-server \
  --task "add a new api route to main that responds to GETs to /echo with 'hello world' with the current time appended at the end" \
  --execute

Log:

{"time":"2025-04-20T10:16:53.742200259-04:00","level":"INFO","msg":"repo already indexed, skipping"}
{"time":"2025-04-20T10:16:53.793235826-04:00","level":"INFO","msg":"found most relevant file chunk","file":"~/Projects/simple-go-server/main.go","start":2048,"end":2461,"score":0.4782574772834778,"id":1}
{"time":"2025-04-20T10:16:55.908966322-04:00","level":"INFO","msg":"applying generated patch to file","file":"~/Projects/simple-go-server/main.go"}
{"time":"2025-04-20T10:16:56.412154285-04:00","level":"INFO","msg":"applying generated unit test to file","file":"~/Projects/simple-go-server/main_test.go"}
{"time":"2025-04-20T10:16:56.758200341-04:00","level":"INFO","msg":"committed changes to git repo","repo":"~/Projects/simple-go-server"}

Diff Generated:

commit 29c5b5a2a2b71a05b4b782e94d482938cf51ba2b (HEAD -> main)
Author: Michael Powers <swedishborgie@gmail.com>
Date:   Sun Apr 20 10:16:56 2025 -0400

    "Added new API route to echo 'Hello World' with current time in response to GET requests to /echo"

diff --git a/main.go b/main.go
index 9acae7f..b475345 100644
--- a/main.go
+++ b/main.go
@@ -1,3 +1,4 @@
+
 package main
 
 import (
@@ -8,6 +9,7 @@ import (
        "net/http"
        "strconv"
        "sync"
+       "time"
 )
 
 type Post struct {
@@ -24,6 +26,7 @@ var (
 func main() {
        http.HandleFunc("/posts", postsHandler)
        http.HandleFunc("/posts/", postHandler)
+       http.HandleFunc("/echo", echoHandler)
 
        fmt.Println("Server is running at http://localhost:8080")
        log.Fatal(http.ListenAndServe(":8080", nil))
@@ -121,3 +124,7 @@ func handleDeletePost(w http.ResponseWriter, r *http.Request, id int) {
        delete(posts, id)
        w.WriteHeader(http.StatusOK)
 }
+
+func echoHandler(w http.ResponseWriter, r *http.Request) {
+       w.Write([]byte("hello world " + time.Now().Format(time.RFC3339)))
+}
diff --git a/main_test.go b/main_test.go
index 4ca1d78..a051498 100644
--- a/main_test.go
+++ b/main_test.go
@@ -81,3 +81,25 @@ func TestHandlePostAndDeletePosts(t *testing.T) {
                t.Errorf("handler returned wrong status code for delete: got %v want %v", status, http.StatusOK)
        }
 }
+
+
+func TestEchoHandler(t *testing.T) {
+       req, err := http.NewRequest("GET", "/echo", nil)
+       if err != nil {
+               t.Fatal(err)
+       }
+
+       rr := httptest.NewRecorder()
+       echoHandler(rr, req)
+
+       if rr.Code != http.StatusOK {
+               t.Errorf("handler returned wrong status code: got %v want %v",
+                       rr.Code, http.StatusOK)
+       }
+
+       expected := "hello world " + time.Now().Format(time.RFC3339)
+       if rr.Body.String() != expected {
+               t.Errorf("handler returned unexpected body: got %v want %v",
+                       rr.Body.String(), expected)
+       }
+}