diff --git a/DESIGN.md b/DESIGN.md
new file mode 100644
index 0000000..3c23f21
--- /dev/null
+++ b/DESIGN.md
@@ -0,0 +1,98 @@
+# System Design
+
+This document contains a systems design for a product capable of helping software engineers by accepting prompts
+from a variety of sources (GitHub, Jira, IDEs, etc.) and autonomously retrieving context, creating a plan to solve the
+prompt, executing the plan, verifying the results, and then pushing the results back to the most appropriate tool.
+
+At a very high level the system looks like this: \
+![arch-diagram](diagrams/arch-diagram.png)
+
+Components:
+
+* API - A REST-based API capable of accepting incoming webhooks from a variety of tools. The webhooks will generate
+  tasks that will require context gathering, planning, execution, and testing to resolve. These tasks are passed to the
+  Indexer to gather context.
+* Indexer - An event-based processor that accepts tasks from the API and gathers context (e.g. Git files / commits, Jira
+  ticket status, etc.) that will need to be indexed and stored for efficient retrieval by later stages.
+* Agent Runner - Takes the task and associated context generated by the Indexer and generates an execution plan. Works
+  synchronously with the Task Executor to execute the plan. As tasks are executed the plan should be adjusted to ensure
+  the task is accomplished.
+* Task Executor - A protected process running under [gVisor](https://gvisor.dev/) (an application-kernel container
+  sandbox) that exposes a set of tools the Agent Runner can invoke to perform its task. The executor
+  will have a Network Policy applied such that network access is restricted to the bare minimum required to use the
+  tools. \
+  Example tools include:
+  * Code Context - Select appropriate context for code generators.
+  * Code Generators - Generate code for given tasks.
+  * Compilers - Ensure code compiles.
+  * Linters - Ensure code is well formatted and doesn't violate any defined coding standards.
+  * Run Tests - Ensure tests (unit, integration, system) continue to pass after making changes, and make sure new tests
+    pass.
+* Result Processor - Responsible for receiving the result of a task from the Agent Runner and disseminating it to
+  interested parties through the API and directly to integrated services.
+
+## System Dependencies
+
+* The solution sits on top of `Amazon Web Services` as an industry-standard compute provider. We intentionally will
+  not use AWS products that do not have good analogs with other compute providers (e.g. Kinesis, Nova, Bedrock, etc.) to
+  avoid vendor lock-in.
+* The solution is built and deployed on top of `Elastic Kubernetes Service` to provide a flexible orchestration layer
+  that will allow us to deploy, scale, monitor, and repair the application with relatively low effort. Updates with EKS
+  can be orchestrated such that they are delivered without downtime to consumers.
+* For asynchronous event flows we'll use `Apache Kafka`; this will allow us to handle a very large event volume with low
+  performance overhead. Events can be processed as cluster capacity allows, and events will not be dropped in the event
+  of an application availability issue.
+* For observability, we'll use `Prometheus` and `Grafana` to provide application metrics. For logs we'll use `Grafana
+  Loki`. This will allow us to see how the application is performing as well as identify any issues as they arise.
+* To provide large language and embedding models we can host `ollama` on top of GPU-equipped EKS nodes. Models can be
+  distributed via persistent volumes and pre-loaded into vRAM with an init container. Autoscalers can be
+  used to scale specific model versions up and down based on demand. This doesn't preclude using LLM-as-a-service
+  providers.
+* Persistent data storage will be done via `PostgreSQL` hosted on top of `Amazon Relational Database Service`. The
+  `pgvector` extension will be used to provide efficient vector searches over embeddings.
+
+## Scaling Considerations
+
+The system can dynamically scale based on load. A horizontal pod autoscaler can be used on each component of the system
+to allow the system to scale up or down based on the current load. For "compute" instances CPU utilization can be used
+to determine when to scale. For "gpu" instances an external metric measuring GPU utilization can be used to determine
+when scaling operations are appropriate. For efficient usage of GPU vRAM and load spreading, models can be packaged
+together such that they saturate most of the available vRAM, and each package can be scaled independently.
+
+To accommodate load bursts the system will operate largely asynchronously. Boundaries between system components
+will be buffered with Kafka to allow components to consume only what they're able to handle, without data loss or the
+need for complex retry mechanisms. If the number of models gets large, a proxy could be developed that dynamically
+routes requests to the appropriate backend with the appropriate model pre-loaded.
+
+## Testing / Model Migration Strategy
+
+An essential property of any AI-based system is the ability to measure the performance of the system over time. This is
+important to ensure that models can be safely migrated as the market evolves and better models are released.
+
+A simple approach to measuring performance over time is to create a representative set of example tasks that is run
+whenever changes are introduced. Performance should be measured against the baseline on a number of different metrics
+such as:
+
+* Number of agentic iterations required to solve the task (fewer is better).
+* Amount of code / responses generated (less is better).
+* Success rate (higher is better).
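The metrics above can be aggregated into a simple regression gate that blocks a model migration when a benchmark run is worse than the baseline. A minimal sketch in Python; the `TaskResult` record and the 2% tolerance are illustrative assumptions, not part of the system:

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    """Outcome of one benchmark task (hypothetical harness output)."""
    task_id: str
    solved: bool
    iterations: int       # agentic iterations used
    chars_generated: int  # volume of code / responses produced

def score_run(results: list[TaskResult]) -> dict:
    """Aggregate a benchmark run into the three metrics listed above."""
    total = len(results)
    solved = [r for r in results if r.solved]
    return {
        "success_rate": len(solved) / total if total else 0.0,
        "avg_iterations": (sum(r.iterations for r in solved) / len(solved))
                          if solved else float("inf"),
        "avg_output_chars": (sum(r.chars_generated for r in solved) / len(solved))
                            if solved else float("inf"),
    }

def regressed(candidate: dict, baseline: dict, tolerance: float = 0.02) -> bool:
    """Flag a candidate model that is worse than baseline beyond a small tolerance."""
    return (
        candidate["success_rate"] < baseline["success_rate"] - tolerance
        or candidate["avg_iterations"] > baseline["avg_iterations"] * (1 + tolerance)
    )
```

A CI job could run the task set against the candidate model, compare `score_run` output to the stored baseline with `regressed`, and fail the migration on regression.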
+
+Over time, as problematic areas are identified, new tasks should be added to the set to improve its coverage.
+
+## Migrating / Managing Embeddings
+
+One particularly sensitive area of model migration is embedding models. Better models are routinely published,
+but it is expensive to re-index data, especially if the volumes are large.
+
+The vector database should store the model that produced each embedding. When new embedding models are introduced the
+indexer should use the new embedding model to index new content, but should allow old content to be searched using old
+models. If models are permanently retired the content should be re-indexed with a supported embedding model. The vector
+database should allow the same content to be indexed with multiple models at the same time.
+
+## Data Flow
+
+![dataflow-diagram](diagrams/dataflow-diagram.png)
+
+## Sequence Diagram
+
+![sequence-diagram](diagrams/sequence-diagram.png)
\ No newline at end of file
diff --git a/README.md b/README.md
index 716a3d5..6a51bc4
--- a/README.md
+++ b/README.md
@@ -20,6 +20,8 @@
 work.
 
 The reason this approach was taken was that it could be implemented quickly and produces reasonable-looking results
 while being a solid platform for further iteration.
 
+A design for a more full-featured and robust implementation can be found in [DESIGN.md](DESIGN.md).
+
 ## Code Structure
 
 | Package | Description |
diff --git a/diagrams/arch-diagram.drawio b/diagrams/arch-diagram.drawio
new file mode 100644
index 0000000..b4c1f61
--- /dev/null
+++ b/diagrams/arch-diagram.drawio
diff --git a/diagrams/arch-diagram.png b/diagrams/arch-diagram.png
new file mode 100644
index 0000000..d4f6521
Binary files /dev/null and b/diagrams/arch-diagram.png differ
diff --git a/diagrams/dataflow-diagram.drawio b/diagrams/dataflow-diagram.drawio
new file mode 100644
index 0000000..a881d4e
--- /dev/null
+++ b/diagrams/dataflow-diagram.drawio
diff --git a/diagrams/dataflow-diagram.png b/diagrams/dataflow-diagram.png
new file mode 100644
index 0000000..8c6e5b7
Binary files /dev/null and b/diagrams/dataflow-diagram.png differ
diff --git a/diagrams/sequence-diagram.drawio b/diagrams/sequence-diagram.drawio
new file mode 100644
index 0000000..c56eb58
--- /dev/null
+++ b/diagrams/sequence-diagram.drawio
diff --git a/diagrams/sequence-diagram.png b/diagrams/sequence-diagram.png
new file mode 100644
index 0000000..4f734c4
Binary files /dev/null and b/diagrams/sequence-diagram.png differ