AI Workflow Architecture
This document provides the detailed technical specification for the LangGraph-powered workflow at the heart of Aurix. This state machine transforms raw audio into structured documents and calculates the Cognitive Load Index (θ).
1. High-Level Workflow
The process is a multi-stage pipeline designed for robustness, real-time feedback, and error recovery.
2. State Definition
The workflow operates on a state object that is passed between nodes, each enriching it with new information.
// A simplified version of the full state
interface WorkflowState {
  // Input
  audioBuffer: ArrayBuffer | null;

  // Transcription
  rawTranscript: string;

  // Analysis
  documentType: 'technical' | 'meeting' | 'general';
  topics: string[];
  entities: string[];
  complexityScore: number; // A score from 0-1 representing content complexity

  // Generation
  sections: Array<{ title: string; content: string }>;
  diagrams: Array<{ title: string; mermaidCode: string }>;

  // Output
  finalDocument: string;
  cognitiveLoadIndex: number;

  // Control
  errors: any[];
}
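The "each node enriches the state" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the Aurix implementation: `NodeUpdate`, `createInitialState`, and `applyUpdate` are hypothetical names, and the merge rule (errors accumulate, every other field is overwritten) is an assumption rather than something this specification defines.

```typescript
// Copy of the state shape above so the sketch is self-contained.
interface WorkflowState {
  audioBuffer: ArrayBuffer | null;
  rawTranscript: string;
  documentType: 'technical' | 'meeting' | 'general';
  topics: string[];
  entities: string[];
  complexityScore: number;
  sections: Array<{ title: string; content: string }>;
  diagrams: Array<{ title: string; mermaidCode: string }>;
  finalDocument: string;
  cognitiveLoadIndex: number;
  errors: any[];
}

// Hypothetical: a node returns only the fields it changed.
type NodeUpdate = Partial<WorkflowState>;

// Hypothetical defaults for a fresh run.
function createInitialState(audioBuffer: ArrayBuffer | null): WorkflowState {
  return {
    audioBuffer,
    rawTranscript: '',
    documentType: 'general',
    topics: [],
    entities: [],
    complexityScore: 0,
    sections: [],
    diagrams: [],
    finalDocument: '',
    cognitiveLoadIndex: 0,
    errors: [],
  };
}

// Assumed merge rule: errors are appended, all other fields are overwritten.
function applyUpdate(state: WorkflowState, update: NodeUpdate): WorkflowState {
  return {
    ...state,
    ...update,
    errors: [...state.errors, ...(update.errors ?? [])],
  };
}

// Usage: the transcription step contributes only rawTranscript.
const initial = createInitialState(null);
const afterTranscription = applyUpdate(initial, { rawTranscript: 'hello world' });
```

Returning a new object instead of mutating the old one keeps each node's contribution easy to inspect when a run has to be debugged.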
3. Node Specifications
transcription_node
- Purpose: To convert the user's voice into text.
- Process:
- Receives a raw audio buffer.
- Uses a local whisper.cpp model to perform speech-to-text.
- Provides a real-time stream of the transcript to the UI.
- Output: Populates state.rawTranscript.
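One way to picture the streaming behaviour described above is an accumulator that receives decoded segments one at a time, notifies the UI after each, and yields the final transcript at the end. The sketch below assumes a hypothetical `TranscriptSegment` shape and a hypothetical `createTranscriptAccumulator` helper. How segments actually come out of whisper.cpp depends on the binding in use and is not shown.

```typescript
// Hypothetical shape of one decoded chunk; real whisper.cpp bindings may differ.
interface TranscriptSegment {
  text: string;
}

// Collects segments as they arrive and reports the running transcript
// to a UI callback after each non-empty segment.
function createTranscriptAccumulator(onPartial: (transcriptSoFar: string) => void) {
  const parts: string[] = [];
  return {
    push(segment: TranscriptSegment): void {
      const trimmed = segment.text.trim();
      if (trimmed.length === 0) return; // skip empty segments
      parts.push(trimmed);
      onPartial(parts.join(' '));
    },
    // The value the node would write to state.rawTranscript.
    finish(): string {
      return parts.join(' ');
    },
  };
}

// Usage: the array stands in for the UI.
const uiUpdates: string[] = [];
const accumulator = createTranscriptAccumulator((t) => uiUpdates.push(t));
accumulator.push({ text: ' The login flow ' });
accumulator.push({ text: '' });
accumulator.push({ text: 'starts at the gateway.' });
const rawTranscript = accumulator.finish();
```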
analysis_node
- Purpose: To understand the content and complexity of the user's speech.
- Process:
- Uses a local LLM (via Ollama) to analyze the rawTranscript.
- Classifies the content (e.g., technical spec, meeting notes).
- Extracts key topics and named entities.
- Calculates a complexityScore based on factors like technical jargon, density of concepts, and sentence structure. This score is a primary input for the Cognitive Load Index.
- Output: Populates state.documentType, state.topics, state.entities, and state.complexityScore.
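Because an LLM reply is free-form text, the analysis output usually needs defensive parsing before it is written into the state. The sketch below assumes the prompt asks the model to answer with a JSON object holding the four fields listed above. `parseAnalysis` and `toStringArray` are hypothetical helpers, the fallback values are assumptions, and the Ollama call itself is deliberately left out.

```typescript
type DocumentType = 'technical' | 'meeting' | 'general';

interface AnalysisResult {
  documentType: DocumentType;
  topics: string[];
  entities: string[];
  complexityScore: number;
}

const DOCUMENT_TYPES: DocumentType[] = ['technical', 'meeting', 'general'];

// Keeps only string items; anything that is not an array becomes [].
function toStringArray(value: unknown): string[] {
  return Array.isArray(value)
    ? value.filter((item): item is string => typeof item === 'string')
    : [];
}

// Parses the model's reply and falls back to safe defaults
// (assumed here: 'general', empty lists, score 0) when a field is missing or malformed.
function parseAnalysis(llmReply: string): AnalysisResult {
  let data: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(llmReply);
    if (typeof parsed === 'object' && parsed !== null) {
      data = parsed as Record<string, unknown>;
    }
  } catch {
    // Not valid JSON: keep the empty object so defaults apply.
  }
  const rawScore = data['complexityScore'];
  const score = typeof rawScore === 'number' && Number.isFinite(rawScore) ? rawScore : 0;
  return {
    documentType: DOCUMENT_TYPES.find((t) => t === data['documentType']) ?? 'general',
    topics: toStringArray(data['topics']),
    entities: toStringArray(data['entities']),
    complexityScore: Math.min(1, Math.max(0, score)), // clamp to the 0-1 range
  };
}

// Usage: an out-of-range score is clamped, and a non-JSON reply yields defaults.
const analysis = parseAnalysis(
  '{"documentType":"technical","topics":["auth"],"entities":["OAuth", 7],"complexityScore":1.4}',
);
const fallbackAnalysis = parseAnalysis('Sorry, I cannot help with that.');
```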
document_generation_node
- Purpose: To structure the raw transcript into a coherent document.
- Process:
- Takes the transcript and analysis as input.
- Prompts a local LLM to generate a logical outline (headings, subheadings).
- Generates content for each section based on the outline.
- Output: Populates state.sections.
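The outline-then-content process above can be sketched as two passes over an injected LLM client. Everything here is illustrative: `Complete` is a hypothetical client type, the prompt wording is invented, and the sketch assumes the outline prompt asks for one Markdown-style heading per line so that `parseOutline` can recover the titles.

```typescript
interface Section {
  title: string;
  content: string;
}

// Hypothetical LLM client: takes a prompt, resolves to the model's text reply.
type Complete = (prompt: string) => Promise<string>;

// Extracts heading titles from the outline reply and ignores any other lines.
// Assumes headings are written as "# Title", "## Title", and so on.
function parseOutline(outlineReply: string): string[] {
  return outlineReply
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => /^#{1,6}\s+\S/.test(line))
    .map((line) => line.replace(/^#{1,6}\s+/, ''));
}

// First pass: ask for an outline. Second pass: one request per heading.
async function generateSections(transcript: string, complete: Complete): Promise<Section[]> {
  const outline = await complete(
    `Outline this transcript as headings, one per line, each starting with "#":\n${transcript}`,
  );
  const sections: Section[] = [];
  for (const title of parseOutline(outline)) {
    const content = await complete(
      `Write the "${title}" section using only this transcript:\n${transcript}`,
    );
    sections.push({ title, content: content.trim() });
  }
  return sections;
}

// Usage of the parsing step on its own.
const outlineTitles = parseOutline('# Overview\nsome chatter\n## Login Flow\n');
```

Generating sections one at a time keeps each prompt small, at the cost of one model call per heading.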
diagram_generation_node
- Purpose: To create visual diagrams from the user's descriptions.
- Process:
- Scans the transcript for keywords indicating a diagram is being described (e.g., "flowchart," "sequence diagram," "shows how the user logs in").
- If hints are found, it prompts an LLM with the relevant text, asking it to generate valid Mermaid.js syntax.
- Includes a validation step to ensure the generated Mermaid code is renderable.
- Output: Populates state.diagrams.
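The keyword scan and a cheap pre-check on the generated code might look like the sketch below. The hint list extends the examples given above with a generic "diagram" pattern, which is an assumption, and `findDiagramHints` and `looksLikeMermaid` are hypothetical helpers. The pre-check only inspects the first line for a known Mermaid diagram keyword; it does not prove the code renders, so it complements rather than replaces validation with the Mermaid parser.

```typescript
// Patterns suggesting the speaker is describing a diagram (illustrative list).
const DIAGRAM_HINTS: RegExp[] = [
  /\bflow\s?chart\b/i,
  /\bsequence diagram\b/i,
  /\bdiagram\b/i,
  /\bshows how\b/i,
];

// Splits the transcript into rough sentences and returns those containing a hint.
function findDiagramHints(transcript: string): string[] {
  const sentences = transcript.match(/[^.!?]+[.!?]*/g) ?? [];
  return sentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0 && DIAGRAM_HINTS.some((hint) => hint.test(s)));
}

// A few diagram-type keywords that can open a Mermaid definition (not exhaustive).
const MERMAID_STARTS = [
  'flowchart',
  'graph',
  'sequenceDiagram',
  'classDiagram',
  'stateDiagram',
  'erDiagram',
];

// Cheap sanity check: rejects replies that are obviously prose rather than Mermaid.
function looksLikeMermaid(code: string): boolean {
  const firstLine = code.trim().split('\n')[0] ?? '';
  return MERMAID_STARTS.some((keyword) => firstLine.indexOf(keyword) === 0);
}

// Usage
const hints = findDiagramHints(
  'We shipped on Monday. The flowchart shows how the user logs in. Nothing else!',
);
const plausible = looksLikeMermaid('sequenceDiagram\n  User->>API: login');
```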
assembly_node
- Purpose: To combine all generated parts into a final document.
- Process:
- Creates a single Markdown string.
- Iterates through the sections and diagrams, inserting them in the correct order.
- Output: Populates state.finalDocument.
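A minimal sketch of the assembly step is shown below. The specification does not define what "the correct order" is, so this example assumes the simplest layout: all sections first, then all diagrams, each diagram wrapped in a Mermaid code fence. The heading levels and the `assembleDocument` name are likewise assumptions.

```typescript
interface Section {
  title: string;
  content: string;
}

interface Diagram {
  title: string;
  mermaidCode: string;
}

// Builds one Markdown string: sections first, then diagrams (assumed ordering).
function assembleDocument(sections: Section[], diagrams: Diagram[]): string {
  const fence = '`'.repeat(3); // Markdown code fence, built at runtime
  const sectionBlocks = sections.map((s) => `## ${s.title}\n\n${s.content.trim()}`);
  const diagramBlocks = diagrams.map(
    (d) => `### ${d.title}\n\n${fence}mermaid\n${d.mermaidCode.trim()}\n${fence}`,
  );
  return [...sectionBlocks, ...diagramBlocks].join('\n\n') + '\n';
}

// Usage
const finalDocument = assembleDocument(
  [{ title: 'Overview', content: ' Users sign in through the gateway. ' }],
  [{ title: 'Login Flow', mermaidCode: 'graph TD\n  A[User] --> B[Gateway]' }],
);
```

If diagrams should instead appear next to the section that describes them, each diagram would need a reference to its section, which the state shape above does not currently carry.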
cognitive_load_index_node
- Purpose: To calculate the final Cognitive Load Index (θ).
- Process:
- This node acts as a final aggregator.
- It combines the complexityScore from the analysis_node with other factors like the total length of the transcript and the number of diagrams generated.
- It applies a weighting formula, which can be adjusted over time by user feedback, to produce the final score.
- Output: Populates state.cognitiveLoadIndex.
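The specification does not give the weighting formula, so the sketch below shows only one plausible shape: a normalised weighted average of the three inputs named above. The weights, the saturation caps (2000 words, 5 diagrams), the 0-1 output range, and the assumption that more diagrams mean a higher load are all placeholders chosen for illustration, not values from Aurix.

```typescript
interface LoadInputs {
  complexityScore: number; // 0-1, from analysis_node
  transcriptWordCount: number;
  diagramCount: number;
}

interface LoadWeights {
  complexity: number;
  length: number;
  diagrams: number;
}

// Placeholder weights; in the design above these would be tuned by user feedback.
const DEFAULT_WEIGHTS: LoadWeights = { complexity: 0.6, length: 0.3, diagrams: 0.1 };

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Weighted average of normalised factors, yielding a value between 0 and 1.
function cognitiveLoadIndex(inputs: LoadInputs, weights: LoadWeights = DEFAULT_WEIGHTS): number {
  const lengthFactor = clamp01(inputs.transcriptWordCount / 2000); // assumed cap
  const diagramFactor = clamp01(inputs.diagramCount / 5); // assumed cap
  const totalWeight = weights.complexity + weights.length + weights.diagrams;
  if (totalWeight <= 0) return 0; // avoid dividing by zero on degenerate weights
  const weightedSum =
    weights.complexity * clamp01(inputs.complexityScore) +
    weights.length * lengthFactor +
    weights.diagrams * diagramFactor;
  return weightedSum / totalWeight;
}

// Usage: mid complexity, 1000 words, 5 diagrams gives roughly 0.55 with the default weights.
const theta = cognitiveLoadIndex({
  complexityScore: 0.5,
  transcriptWordCount: 1000,
  diagramCount: 5,
});
```

Dividing by the total weight keeps the result in range even after feedback changes individual weights.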
4. Conditional Logic & Error Handling
The graph uses conditional edges to provide resilience:
- If diagram generation fails, the workflow can bypass it and still produce a text-only document.
- If transcription quality is low, it can prompt the user to re-record or use a more powerful (optional) cloud model.
- A central error_handler_node catches failures from any step, logs them, and updates the UI with a helpful message, preventing the application from crashing.
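The routing rules above can be expressed as small pure functions, which is the kind of router a LangGraph conditional edge takes. The sketch below avoids the LangGraph API so that it stays self-contained. `transcriptionConfidence`, the `rerecord_prompt` target, and the 0.6 threshold are hypothetical and not part of the state defined earlier, and `runOptional` is an illustrative helper for steps that are allowed to fail, such as diagram generation.

```typescript
// Hypothetical slice of the state used for routing decisions.
interface RoutingState {
  rawTranscript: string;
  transcriptionConfidence: number; // assumed 0-1 quality estimate
  errors: unknown[];
}

type AfterTranscription = 'analysis_node' | 'rerecord_prompt' | 'error_handler_node';

const MIN_CONFIDENCE = 0.6; // placeholder threshold

// Picks the next node after transcription.
function routeAfterTranscription(state: RoutingState): AfterTranscription {
  if (state.errors.length > 0) return 'error_handler_node';
  const unusable =
    state.rawTranscript.trim() === '' || state.transcriptionConfidence < MIN_CONFIDENCE;
  return unusable ? 'rerecord_prompt' : 'analysis_node';
}

// Runs an optional step; on failure it records the error and returns the fallback,
// so the rest of the workflow can continue (for example, with no diagrams).
function runOptional<T>(step: () => T, fallback: T, errors: unknown[]): T {
  try {
    return step();
  } catch (error) {
    errors.push(error);
    return fallback;
  }
}

// Usage: a failing diagram step degrades to a text-only document.
const collectedErrors: unknown[] = [];
const diagrams = runOptional<string[]>(
  () => {
    throw new Error('invalid Mermaid');
  },
  [],
  collectedErrors,
);
const next = routeAfterTranscription({
  rawTranscript: 'uh',
  transcriptionConfidence: 0.2,
  errors: [],
});
```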
This architecture turns spoken ideas into structured documentation, keeps working when an individual stage fails, and gives users a measure of their own cognitive workload through the Cognitive Load Index (θ).