# AI Coding Assistant

This repository contains a proof of concept of an AI coding assistant. Given a prompt from the user, it examines the repository to find the most appropriate place to make a change, generates code to satisfy the prompt, and then produces a unit test and a commit message.

## Requirements

To use this application you must provide the following:

- A PostgreSQL database server with the pgvector extension installed and configured.
- A working Go toolchain.
- An Ollama server for prompts and embeddings.
- (Optional) An OpenAI-compatible API for running larger models.

## Usage

To configure the application, copy `config.yaml.tmpl` to `config.yaml` in the root of the working directory and modify it as appropriate.
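The template defines the real schema; purely as a rough illustration of the components listed under Requirements, a filled-in `config.yaml` might look something like the sketch below. All key names and values here are hypothetical, so consult `config.yaml.tmpl` for the actual keys.

```yaml
# Hypothetical example only -- the real key names live in config.yaml.tmpl.
database:
  dsn: "postgres://user:pass@localhost:5432/assistant?sslmode=disable"
ollama:
  url: "http://localhost:11434"
  embed_model: "nomic-embed-text"
  code_model: "gemma2-9b-it"
  chat_model: "llama3.2"
openai:                 # optional OpenAI-compatible endpoint for larger models
  base_url: ""
  api_key: ""
```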

To run the application (drop `--execute` if you only want a dry run; leave it in if you like to live dangerously):

```shell
go run ./cmd autopatch --repo /path/to/git/repo --task "coding prompt" --execute
```

To see what got generated:

```shell
git log --full-diff -p -n 1
```

## Models Used in Testing

The following models were used while developing the application.

| Model              | Purpose                                                       |
| ------------------ | ------------------------------------------------------------- |
| `nomic-embed-text` | Generating embeddings from source chunks.                     |
| `gemma2-9b-it`     | Generating code.                                              |
| `llama3.2`         | Conversational prompts and generating git commit messages.    |

## Limitations / Assumptions

The following shortcuts have been taken to reduce time to implementation:

- The application does not use an autonomous agentic approach, as this would have required implementing verification tools for agent-executed steps along with robust retry logic. Instead, this implementation uses a much simpler rules-based approach.
- The application currently only supports modifying codebases written in Go. There's nothing fundamental about the approach taken that prevents supporting other languages, but it would have added complexity to the implementation and testing.
- The application currently only supports modifying a single chunk of code. Supporting multi-chunk editing would require an autonomous agentic approach that recursively identifies interesting segments of code to patch, which would have greatly increased the time to implementation.
- No attempt is made to verify the correctness of the generated code. This could be done by checking that the code compiles, passes its generated test(s), and produces no warnings when run through a linter. If an agentic approach were used, feedback from these tools could be passed back to the code generator to correct the code segments.
- The approach taken to generate the embeddings is very simple: files are segmented into fixed-size chunks, and each chunk is embedded. A more robust approach would be to generate an AST for all code in the repository along with an accompanying reference graph. A coordinating agent would consume the generated symbols to produce a set of modification tasks, and context from the coordinator could then be passed to a patching agent, along with the relevant section of code to modify, to produce a set of patches.
- No attempt was made to tune the models to the task; there are almost certainly better models to use than the ones selected. These were simply the first models that produced a workable result.
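The naive fixed-size chunking strategy described above can be sketched in a few lines of Go. This is an illustration only: the `chunk` function, chunk size, and overlap below are hypothetical and are not the application's actual implementation or settings.

```go
package main

import "fmt"

// chunk splits source text into fixed-size pieces with a small overlap
// between neighbours, mirroring the naive segmentation strategy described
// above. The size and overlap values are illustrative, not the real config.
func chunk(src string, size, overlap int) []string {
	var chunks []string
	step := size - overlap
	if step < 1 {
		step = size // guard against a non-positive stride
	}
	for start := 0; start < len(src); start += step {
		end := start + size
		if end > len(src) {
			end = len(src)
		}
		chunks = append(chunks, src[start:end])
		if end == len(src) {
			break
		}
	}
	return chunks
}

func main() {
	src := "package main\n\nfunc main() {}\n"
	for i, c := range chunk(src, 16, 4) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Each chunk would then be embedded (e.g. with `nomic-embed-text`) and stored in pgvector; the overlap reduces the chance that a declaration is split cleanly across two chunks and lost to retrieval.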