CLI Project: Document Q&A System
In the previous tutorial, you built a complete RAG pipeline using PostgreSQL that can ingest documents and answer questions. Now, let's take that foundation and create something truly practical - a command-line interface (CLI) that real users can interact with.
Think of this as transforming your RAG pipeline from a programmatic library into a user-friendly application. Instead of hardcoded documents and queries, users will be able to point the CLI to their own files and ask questions interactively, all while leveraging PostgreSQL for persistent vector storage.
By the end of this tutorial, you'll have a CLI tool that can:
- Read documents from actual files (.txt, .md, .json)
- Process multiple documents at once using PostgreSQL storage
- Provide an interactive chat interface for asking questions
- Use PostgreSQL with pgvector for persistent vector storage
- Handle real-world scenarios like file errors and invalid queries
- Execute SQL similarity searches for optimal performance
What is a CLI Document QA System?
A CLI Document QA System is a command-line application that combines your PostgreSQL-based RAG pipeline with user interface capabilities. It bridges the gap between your technical RAG implementation and practical user needs.
Here's how users will interact with your system:
# Ingest documents from a directory
npm run cli -- ingest ./my-docs
# Ask a single question
npm run cli -- query "How do I handle TypeScript errors?"
# Start an interactive conversation
npm run cli -- interactive
# Process specific file types
npm run cli -- ingest ./docs --type md
# Check database status
npm run cli -- status
Key Differences from Tutorial 4.4:
Tutorial 4.4 (RAG Pipeline):
- Programmatic interface with hardcoded documents
- Demonstrated concepts with sample text
- Required code modifications to test new documents
- PostgreSQL storage but no CLI interface
Tutorial 4.5 (CLI System):
- Interactive command-line interface
- Reads actual files from the filesystem
- PostgreSQL persistent vector database storage
- User-friendly commands and error handling
- Real-world file processing and validation
- SQL-based similarity search with pgAdmin debugging
CLI Application Architecture
Our CLI application will have several distinct modes of operation, all backed by PostgreSQL:
graph TD
A[CLI Application] --> B[Ingest Mode]
A --> C[Query Mode]
A --> D[Interactive Mode]
A --> E[Status Mode]
B --> F[Read Files]
F --> G[Process Documents]
G --> H[PostgreSQL Storage]
C --> I[PostgreSQL Query]
I --> J[SQL Similarity Search]
J --> K[Return Answer]
D --> L[PostgreSQL Connection]
L --> M[Interactive Loop]
M --> N[SQL-based Questions]
N --> M
E --> O[Database Statistics]
Core Components (sketched as TypeScript interfaces after this list):
- Command Parser: Handle command-line arguments and flags
- File Processor: Read and validate different file formats
- PostgreSQL Document Manager: Enhanced version with database persistence
- Interactive Interface: Chat-like experience for questions
- Progress Indicators: User feedback during long operations
- Database Connection: PostgreSQL with pgvector integration
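Before diving into the implementation, here is a rough sketch of the responsibilities each component owns, expressed as TypeScript interfaces. The names here are illustrative only; the concrete classes are built in the steps below.
// Illustrative sketch: the real classes appear in the following steps
interface ProcessedFileLike {
  path: string;
  content: string;
  type: string;
}
// File Processor: reads and validates files from disk
interface FileProcessorLike {
  processFile(filePath: string): Promise<ProcessedFileLike | null>;
  processDirectory(dirPath: string, recursive?: boolean): Promise<ProcessedFileLike[]>;
}
// PostgreSQL Document Manager: persistence, similarity search, and answering
interface DocumentManagerLike {
  connect(): Promise<void>;
  ingestFile(file: ProcessedFileLike): Promise<void>;
  askQuestion(question: string): Promise<string>;
  disconnect(): Promise<void>;
}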
Project Setup & Requirements
Let's set up our CLI project with proper structure and dependencies:
Environment Setup
# Install required dependencies
npm install @google/generative-ai pg @types/pg
npm install commander chalk ora
npm install fs-extra dotenv
npm install --save-dev typescript ts-node @types/node @types/fs-extra
# Note: readline is built into Node.js and needs no install
# Install PostgreSQL and pgvector (if not already installed)
# On macOS with Homebrew:
brew install postgresql
brew install pgvector
# On Ubuntu/Debian:
sudo apt-get install postgresql postgresql-contrib
sudo apt-get install postgresql-14-pgvector
New Dependencies Explained:
- commander: Professional command-line argument parsing
- chalk: Colored terminal output for better UX
- ora: Elegant terminal spinners for progress indication
- fs-extra: Enhanced file system operations
- readline: Interactive input handling for chat mode (a built-in Node.js module)
- pg: PostgreSQL client for database operations
- dotenv: Loads .env variables into process.env
Create the project structure:
mkdir cli-document-qa
cd cli-document-qa
mkdir src test-docs
touch src/cli.ts src/document-manager.ts src/file-processor.ts
Make sure your .env file contains both your Google API key and database credentials:
# .env
GOOGLE_API_KEY=your_gemini_api_key_here
DB_HOST=localhost
DB_PORT=5432
DB_NAME=cli_document_qa_db
DB_USER=postgres
DB_PASSWORD=your_password
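Because every command depends on these values, it helps to fail fast when one is missing. Here's a minimal, optional sketch (it assumes the dotenv package from the install step; the file name src/env-check.ts is just a suggestion):
// src/env-check.ts (optional helper, not part of the main tutorial code)
import 'dotenv/config';
const required = ['GOOGLE_API_KEY', 'DB_HOST', 'DB_PORT', 'DB_NAME', 'DB_USER', 'DB_PASSWORD'];
// Collect every missing variable so the user sees them all at once
const missing = required.filter((name) => !process.env[name]);
if (missing.length > 0) {
  console.error(`❌ Missing environment variables: ${missing.join(', ')}`);
  process.exit(1);
}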
Working Code Example
Let's build our CLI Document QA System step by step using PostgreSQL for vector storage:
Step 1: Database Setup
First, create the database schema in pgAdmin or psql:
-- Create the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create document_chunks table for our CLI system
CREATE TABLE document_chunks (
id SERIAL PRIMARY KEY,
chunk_id VARCHAR(255) UNIQUE NOT NULL,
content TEXT NOT NULL,
embedding vector(768), -- Google's text-embedding-004 uses 768 dimensions
metadata JSONB NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create an index for faster similarity searches
CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- Create an index on chunk_id for faster lookups
CREATE INDEX ON document_chunks (chunk_id);
-- Create an index on metadata for filtering
CREATE INDEX ON document_chunks USING gin (metadata);
This SQL creates our vector-enabled table specifically designed for the CLI system.
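If you prefer creating the schema from code rather than pgAdmin or psql, a sketch like the following runs the same DDL through the pg client (the file name src/setup-db.ts is a suggestion):
// src/setup-db.ts (optional: run the schema from code instead of pgAdmin)
import { Client } from 'pg';
import 'dotenv/config';
async function setupSchema(): Promise<void> {
  const client = new Client({
    host: process.env.DB_HOST || 'localhost',
    port: parseInt(process.env.DB_PORT || '5432'),
    database: process.env.DB_NAME || 'cli_document_qa_db',
    user: process.env.DB_USER || 'postgres',
    password: process.env.DB_PASSWORD || 'password',
  });
  await client.connect();
  // Same DDL as above; IF NOT EXISTS keeps the script re-runnable
  await client.query('CREATE EXTENSION IF NOT EXISTS vector');
  await client.query(`
    CREATE TABLE IF NOT EXISTS document_chunks (
      id SERIAL PRIMARY KEY,
      chunk_id VARCHAR(255) UNIQUE NOT NULL,
      content TEXT NOT NULL,
      embedding vector(768),
      metadata JSONB NOT NULL,
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
  `);
  await client.query(`
    CREATE INDEX IF NOT EXISTS document_chunks_embedding_idx
    ON document_chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)
  `);
  await client.end();
  console.log('✅ Schema ready');
}
setupSchema().catch(console.error);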
Step 2: PostgreSQL Document Manager with CLI Integration
// src/document-manager.ts
import { GoogleGenerativeAI } from '@google/generative-ai';
import { Client } from 'pg';
import * as path from 'path';
import { ProcessedFile } from './file-processor';
interface DocumentChunk {
id: number;
chunk_id: string;
content: string;
embedding: number[];
metadata: {
source: string;
chunkIndex: number;
fileType: string;
lastModified: Date;
};
}
interface SearchResult {
chunk: DocumentChunk;
similarity: number;
}
export class PostgreSQLDocumentManager {
private client: Client;
private genAI: GoogleGenerativeAI;
private embeddingModel: any;
private generativeModel: any;
constructor() {
this.client = new Client({
host: process.env.DB_HOST || 'localhost',
port: parseInt(process.env.DB_PORT || '5432'),
database: process.env.DB_NAME || 'cli_document_qa_db',
user: process.env.DB_USER || 'postgres',
password: process.env.DB_PASSWORD || 'password',
});
this.genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
this.embeddingModel = this.genAI.getGenerativeModel({
model: "text-embedding-004"
});
this.generativeModel = this.genAI.getGenerativeModel({
model: "gemini-2.5-flash"
});
}
async connect(): Promise<void> {
await this.client.connect();
console.log('📊 Connected to PostgreSQL database');
}
async disconnect(): Promise<void> {
await this.client.end();
console.log('📊 Disconnected from PostgreSQL database');
}
}
This creates our PostgreSQL-based document manager specifically designed for CLI usage.
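As a quick sanity check of the connection lifecycle, you could run a sketch like this once your .env is in place:
// Hypothetical smoke test: verifies credentials and the connect/disconnect flow
import { PostgreSQLDocumentManager } from './document-manager';
import 'dotenv/config';
async function smokeTest() {
  const manager = new PostgreSQLDocumentManager();
  await manager.connect();    // expect: 📊 Connected to PostgreSQL database
  await manager.disconnect(); // expect: 📊 Disconnected from PostgreSQL database
}
smokeTest().catch(console.error);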
Step 3: Database Statistics and Management
// src/document-manager.ts (add to PostgreSQLDocumentManager class)
async getStats() {
try {
const query = `
SELECT
COUNT(*) as total_chunks,
COUNT(DISTINCT metadata->>'source') as total_documents,
array_agg(DISTINCT metadata->>'fileType') as file_types,
MAX(created_at) as last_updated
FROM document_chunks
`;
const result = await this.client.query(query);
const row = result.rows[0];
return {
totalChunks: parseInt(row.total_chunks),
totalDocuments: parseInt(row.total_documents),
fileTypes: row.file_types ? row.file_types.filter(Boolean) : [],
lastUpdated: row.last_updated ? new Date(row.last_updated) : null
};
} catch (error) {
console.error('Error getting stats:', error);
return {
totalChunks: 0,
totalDocuments: 0,
fileTypes: [],
lastUpdated: null
};
}
}
async clearDatabase(): Promise<void> {
try {
await this.client.query('DELETE FROM document_chunks');
console.log('🗑️ Database cleared');
} catch (error) {
console.error('Error clearing database:', error);
throw error;
}
}
async checkDocumentExists(filePath: string): Promise<boolean> {
try {
const query = `
SELECT EXISTS(
SELECT 1 FROM document_chunks
WHERE metadata->>'source' = $1
)
`;
const result = await this.client.query(query, [filePath]);
return result.rows[0].exists;
} catch (error) {
console.error('Error checking document existence:', error);
return false;
}
}
This adds database management capabilities specific to CLI operations.
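For example, a hypothetical pre-ingestion check could combine these methods like so:
// Hypothetical usage of the management methods
import { PostgreSQLDocumentManager } from './document-manager';
import 'dotenv/config';
async function showStatus() {
  const manager = new PostgreSQLDocumentManager();
  await manager.connect();
  const stats = await manager.getStats();
  console.log(`${stats.totalChunks} chunks across ${stats.totalDocuments} documents`);
  // checkDocumentExists lets us skip files that are already ingested;
  // the path must match exactly what was stored during ingestion
  if (await manager.checkDocumentExists('test-docs/typescript-basics.md')) {
    console.log('Document already ingested');
  }
  await manager.disconnect();
}
showStatus().catch(console.error);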
Step 4: File Processing Implementation
// src/file-processor.ts
import fs from 'fs-extra';
import * as path from 'path';
export interface ProcessedFile {
path: string;
content: string;
type: string;
size: number;
lastModified: Date;
}
export class FileProcessor {
private supportedExtensions = ['.txt', '.md', '.json'];
async processFile(filePath: string): Promise<ProcessedFile | null> {
try {
const stats = await fs.stat(filePath);
const ext = path.extname(filePath).toLowerCase();
if (!this.supportedExtensions.includes(ext)) {
console.warn(`⚠️ Skipping unsupported file type: ${filePath}`);
return null;
}
let content: string;
switch (ext) {
case '.json':
const jsonData = await fs.readJson(filePath);
content = this.extractTextFromJson(jsonData);
break;
default:
content = await fs.readFile(filePath, 'utf-8');
}
if (content.trim().length === 0) {
console.warn(`⚠️ Skipping empty file: ${filePath}`);
return null;
}
return {
path: filePath,
content: content.trim(),
type: ext.slice(1), // Remove the dot
size: stats.size,
lastModified: stats.mtime
};
} catch (error) {
console.error(`❌ Error processing file ${filePath}:`, error);
return null;
}
}
private extractTextFromJson(data: any): string {
const textParts: string[] = [];
const extractText = (obj: any, prefix: string = ''): void => {
if (typeof obj === 'string') {
textParts.push(obj);
} else if (typeof obj === 'object' && obj !== null) {
if (Array.isArray(obj)) {
obj.forEach((item, index) => extractText(item, `${prefix}[${index}]`));
} else {
Object.entries(obj).forEach(([key, value]) => {
const newPrefix = prefix ? `${prefix}.${key}` : key;
if (typeof value === 'string' && value.length > 10) {
textParts.push(`${key}: ${value}`);
} else {
extractText(value, newPrefix);
}
});
}
}
};
extractText(data);
return textParts.join('\n\n');
}
async processDirectory(dirPath: string, recursive: boolean = true): Promise<ProcessedFile[]> {
const files: ProcessedFile[] = [];
try {
const items = await fs.readdir(dirPath);
for (const item of items) {
const itemPath = path.join(dirPath, item);
const stats = await fs.stat(itemPath);
if (stats.isDirectory() && recursive) {
const subFiles = await this.processDirectory(itemPath, recursive);
files.push(...subFiles);
} else if (stats.isFile()) {
const processedFile = await this.processFile(itemPath);
if (processedFile) {
files.push(processedFile);
}
}
}
} catch (error) {
console.error(`❌ Error processing directory ${dirPath}:`, error);
}
return files;
}
}
This handles reading and processing different file types safely with enhanced error handling.
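A short sketch of how the processor might be exercised on its own (it assumes the test-docs directory created later in this tutorial):
// Hypothetical standalone test of the FileProcessor
import { FileProcessor } from './file-processor';
async function listFiles() {
  const processor = new FileProcessor();
  const files = await processor.processDirectory('./test-docs');
  // Each ProcessedFile carries the metadata the ingestion step will need
  for (const file of files) {
    console.log(`${file.path} (${file.type}, ${file.size} bytes)`);
  }
}
listFiles().catch(console.error);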
Step 5: PostgreSQL Document Ingestion
// src/document-manager.ts (add to PostgreSQLDocumentManager class)
private splitIntoChunks(text: string, chunkSize: number = 600, overlap: number = 100): string[] {
const chunks: string[] = [];
let start = 0;
while (start < text.length) {
let end = start + chunkSize;
if (end < text.length) {
// Find better breakpoints
const breakChars = ['\n\n', '\n', '. ', '!', '?', ';'];
let bestBreak = -1;
for (const breakChar of breakChars) {
const breakIndex = text.lastIndexOf(breakChar, end);
if (breakIndex > start + chunkSize * 0.5) {
bestBreak = breakIndex + breakChar.length;
break;
}
}
if (bestBreak > -1) {
end = bestBreak;
}
}
chunks.push(text.slice(start, end).trim());
if (end >= text.length) break; // stop after the final chunk to avoid re-emitting the overlap tail
start = end - overlap;
}
return chunks.filter(chunk => chunk.length > 20);
}
async ingestFile(file: ProcessedFile): Promise<void> {
console.log(`📄 Processing: ${path.basename(file.path)} (${file.size} bytes)`);
try {
// Check if file already exists and remove old chunks
const existsQuery = `DELETE FROM document_chunks WHERE metadata->>'source' = $1`;
await this.client.query(existsQuery, [file.path]);
const textChunks = this.splitIntoChunks(file.content);
console.log(` └─ Created ${textChunks.length} chunks`);
for (let i = 0; i < textChunks.length; i++) {
const chunk = textChunks[i];
// Create embedding for the chunk
const result = await this.embeddingModel.embedContent(chunk);
const embedding = result.embedding.values;
// Prepare chunk data
const chunkId = `${file.path}_chunk_${i}`;
const metadata = {
source: file.path,
chunkIndex: i,
fileType: file.type,
lastModified: file.lastModified.toISOString()
};
// Insert into PostgreSQL
const insertQuery = `
INSERT INTO document_chunks (chunk_id, content, embedding, metadata)
VALUES ($1, $2, $3, $4)
`;
const values = [
chunkId,
chunk,
`[${embedding.join(',')}]`, // Convert array to PostgreSQL vector format
JSON.stringify(metadata)
];
await this.client.query(insertQuery, values);
}
console.log(`✅ Successfully processed ${file.path}`);
} catch (error) {
console.error(`❌ Error ingesting file ${file.path}:`, error);
throw error;
}
}
This implements PostgreSQL-based document ingestion with proper chunk management.
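To build intuition for the chunkSize and overlap parameters, here is a simplified, standalone version of the sliding window (without the breakpoint search) that just prints chunk boundaries:
// Standalone chunking sanity check (mirrors the sliding-window logic only)
function chunkPreview(text: string, chunkSize = 600, overlap = 100): void {
  let start = 0;
  let index = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    console.log(`chunk ${index}: chars ${start}-${end}`);
    if (end >= text.length) break;
    start = end - overlap; // consecutive chunks share `overlap` characters
    index++;
  }
}
chunkPreview('x'.repeat(1500)); // chunks at 0-600, 500-1100, 1000-1500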
Step 6: PostgreSQL Similarity Search Implementation
// src/document-manager.ts (add to PostgreSQLDocumentManager class)
async searchSimilarChunks(query: string, topK: number = 4): Promise<SearchResult[]> {
try {
// Create embedding for the query
const result = await this.embeddingModel.embedContent(query);
const queryEmbedding = result.embedding.values;
// Execute similarity search using PostgreSQL with pgvector
const searchQuery = `
SELECT
id,
chunk_id,
content,
metadata,
created_at,
1 - (embedding <=> $1) AS similarity
FROM document_chunks
ORDER BY embedding <=> $1
LIMIT $2
`;
const queryVector = `[${queryEmbedding.join(',')}]`;
const res = await this.client.query(searchQuery, [queryVector, topK]);
return res.rows.map(row => ({
chunk: {
id: row.id,
chunk_id: row.chunk_id,
content: row.content,
embedding: [], // We don't need to return the full embedding
metadata: {
source: row.metadata.source,
chunkIndex: row.metadata.chunkIndex,
fileType: row.metadata.fileType,
lastModified: new Date(row.metadata.lastModified)
}
},
similarity: parseFloat(row.similarity)
}));
} catch (error) {
console.error('Error searching similar chunks:', error);
throw error;
}
}
async askQuestion(question: string, maxResults: number = 4): Promise<string> {
// Check if database has any documents
const stats = await this.getStats();
if (stats.totalChunks === 0) {
return "❌ No documents have been ingested yet. Please ingest some documents first using --ingest.";
}
try {
// Search for relevant chunks using PostgreSQL
const searchResults = await this.searchSimilarChunks(question, maxResults);
if (searchResults.length === 0 || searchResults[0].similarity < 0.3) {
return "🤔 I couldn't find relevant information in the ingested documents to answer your question.";
}
// Prepare context from retrieved chunks
const context = searchResults.map((result, index) => {
const source = path.basename(result.chunk.metadata.source);
return `[Source: ${source}]\n${result.chunk.content}`;
}).join('\n\n---\n\n');
// Generate answer using the LLM
const prompt = `You are a helpful assistant answering questions based on provided documentation.
Context from documents:
${context}
Question: ${question}
Instructions:
- Answer based only on the provided context
- If the context doesn't contain enough information, say so clearly
- Be specific and reference the sources when possible
- Keep the answer concise but complete
- Format your response in a friendly, conversational tone
Answer:`;
const response = await this.generativeModel.generateContent(prompt);
const answer = response.response.text();
// Add source information
const sources = Array.from(new Set(searchResults.map(r => path.basename(r.chunk.metadata.source))));
const sourceInfo = `\n\n📚 Sources: ${sources.join(', ')}`;
return answer + sourceInfo;
} catch (error) {
console.error('❌ Error processing question:', error);
return "❌ An error occurred while processing your question. Please try again.";
}
}
This is the key piece! 🎯 The search runs entirely in PostgreSQL using its native vector operations: the <=> operator computes cosine distance, so 1 - (embedding <=> $1) yields the cosine similarity used for ranking.
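Since <=> returns cosine distance, 1 - distance is cosine similarity. If you ever want to sanity-check pgvector's numbers, a local cosine similarity computed in TypeScript should agree up to floating-point rounding:
// Cosine similarity computed locally; for the same pair of vectors this should
// match 1 - (embedding <=> query) from pgvector (up to rounding)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)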
Step 7: CLI Interface Implementation
// src/cli.ts
import { Command } from 'commander';
import chalk from 'chalk';
import ora from 'ora';
import * as readline from 'readline';
import fs from 'fs-extra';
import { PostgreSQLDocumentManager } from './document-manager';
import { FileProcessor } from './file-processor';
import 'dotenv/config';
const program = new Command();
program
.name('doc-qa')
.description('CLI Document Question & Answer System using PostgreSQL RAG')
.version('1.0.0');
const documentManager = new PostgreSQLDocumentManager();
const fileProcessor = new FileProcessor();
// Ensure proper cleanup on exit
process.on('SIGINT', async () => {
console.log(chalk.yellow('\n🔄 Cleaning up...'));
await documentManager.disconnect();
process.exit(0);
});
This sets up our CLI framework with proper PostgreSQL connection management.
Step 8: Ingest Command Implementation
// src/cli.ts (add after program setup)
program
.command('ingest')
.description('Ingest documents from files or directories into PostgreSQL')
.argument('<path>', 'File or directory path to ingest')
.option('-r, --recursive', 'Process directories recursively', true)
.option('-t, --type <type>', 'Filter by file type (txt, md, json)')
.action(async (inputPath, options) => {
const spinner = ora('Connecting to PostgreSQL database...').start();
try {
await documentManager.connect();
spinner.succeed('Connected to PostgreSQL database');
// Check if path exists
if (!(await fs.pathExists(inputPath))) {
spinner.fail(`Path does not exist: ${inputPath}`);
return;
}
spinner.start('Discovering files...');
const stats = await fs.stat(inputPath);
let files;
if (stats.isFile()) {
const file = await fileProcessor.processFile(inputPath);
files = file ? [file] : [];
} else {
files = await fileProcessor.processDirectory(inputPath, options.recursive);
}
// Filter by type if specified
if (options.type) {
files = files.filter(f => f.type === options.type);
}
if (files.length === 0) {
spinner.warn('No valid files found to process');
return;
}
spinner.succeed(`Found ${files.length} files to process`);
// Process files
console.log(chalk.blue('\n📄 Processing documents...\n'));
for (let i = 0; i < files.length; i++) {
const file = files[i];
const progress = `(${i + 1}/${files.length})`;
console.log(chalk.gray(progress), `Processing ${file.path}...`);
await documentManager.ingestFile(file);
}
// Show final stats
const finalStats = await documentManager.getStats();
console.log(chalk.green('\n✅ Ingestion complete!'));
console.log(chalk.blue(`📊 Total chunks: ${finalStats.totalChunks}`));
console.log(chalk.blue(`📁 Documents: ${finalStats.totalDocuments}`));
console.log(chalk.blue(`📋 File types: ${finalStats.fileTypes.join(', ')}`));
} catch (error) {
spinner.fail('Ingestion failed');
console.error(chalk.red('❌ Error:'), error);
} finally {
await documentManager.disconnect();
}
});
This implements the document ingestion command with PostgreSQL storage and progress indicators.
Step 9: Query Command Implementation
// src/cli.ts (add after ingest command)
program
.command('query')
.description('Ask a question about the ingested documents')
.argument('<question>', 'Question to ask')
.action(async (question) => {
const spinner = ora('Connecting to PostgreSQL database...').start();
try {
await documentManager.connect();
const stats = await documentManager.getStats();
if (stats.totalChunks === 0) {
spinner.fail('No documents found in database. Please ingest documents first.');
return;
}
spinner.text = 'Processing question with PostgreSQL similarity search...';
const answer = await documentManager.askQuestion(question);
spinner.stop();
console.log(chalk.blue('\n🤔 Question:'), question);
console.log(chalk.green('\n🤖 Answer:'));
console.log(answer);
} catch (error) {
spinner.fail('Query failed');
console.error(chalk.red('❌ Error:'), error);
} finally {
await documentManager.disconnect();
}
});
This implements the single-query command functionality using PostgreSQL.
Step 10: Interactive Mode Implementation
// src/cli.ts (add after query command)
program
.command('interactive')
.description('Start an interactive Q&A session')
.alias('chat')
.action(async () => {
const spinner = ora('Connecting to PostgreSQL database...').start();
try {
await documentManager.connect();
const stats = await documentManager.getStats();
if (stats.totalChunks === 0) {
spinner.fail('No documents found in database. Please ingest documents first.');
return;
}
spinner.succeed('Connected to PostgreSQL database');
console.log(chalk.blue('\n🎯 Interactive Document Q&A Session (PostgreSQL Backend)'));
console.log(chalk.gray(`📊 ${stats.totalChunks} chunks from ${stats.totalDocuments} documents`));
console.log(chalk.gray('💡 Type "quit", "exit", or press Ctrl+C to end\n'));
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
const askQuestion = async (): Promise<void> => {
rl.question(chalk.cyan('❓ Your question: '), async (question) => {
if (!question.trim()) {
console.log(chalk.yellow('Please enter a question.\n'));
askQuestion();
return;
}
if (['quit', 'exit', 'q'].includes(question.toLowerCase().trim())) {
console.log(chalk.blue('\n👋 Thanks for using the Document Q&A system!'));
rl.close();
await documentManager.disconnect();
return;
}
console.log(chalk.gray('🤔 Searching PostgreSQL database...'));
try {
const answer = await documentManager.askQuestion(question);
console.log(chalk.green('\n🤖 Answer:'));
console.log(answer);
console.log(chalk.gray('\n' + '─'.repeat(50) + '\n'));
} catch (error) {
console.log(chalk.red('❌ Error processing question:'), error);
}
askQuestion(); // Continue the loop
});
};
askQuestion();
} catch (error) {
spinner.fail('Failed to start interactive mode');
console.error(chalk.red('❌ Error:'), error);
await documentManager.disconnect();
}
});
This creates an interactive chat interface for continuous Q&A using PostgreSQL.
Step 11: Status Command and SQL Debugging
// src/cli.ts (add after interactive command)
program
.command('status')
.description('Show PostgreSQL database statistics')
.alias('stats')
.action(async () => {
const spinner = ora('Connecting to PostgreSQL database...').start();
try {
await documentManager.connect();
const stats = await documentManager.getStats();
if (stats.totalChunks === 0) {
spinner.warn('No documents found in database');
console.log(chalk.yellow('🔍 No documents have been ingested yet.'));
console.log(chalk.blue('💡 Use "ingest <path>" to add documents.'));
return;
}
spinner.succeed('Connected to PostgreSQL database');
console.log(chalk.blue('\n📊 PostgreSQL Database Statistics'));
console.log(chalk.gray('─'.repeat(40)));
console.log(`📄 Total chunks: ${chalk.green(stats.totalChunks)}`);
console.log(`📁 Documents: ${chalk.green(stats.totalDocuments)}`);
console.log(`📋 File types: ${chalk.green(stats.fileTypes.join(', '))}`);
if (stats.lastUpdated) {
console.log(`🕒 Last updated: ${chalk.green(stats.lastUpdated.toLocaleString())}`);
}
console.log(chalk.blue('\n🔍 SQL Debugging Queries for pgAdmin:'));
console.log(chalk.gray('─'.repeat(40)));
console.log(chalk.cyan(`
-- View all document chunks
SELECT
chunk_id,
LEFT(content, 100) as content_preview,
metadata->>'source' as source,
metadata->>'fileType' as file_type,
created_at
FROM document_chunks
ORDER BY created_at DESC;
-- Test similarity search (replace with actual query embedding)
SELECT
chunk_id,
LEFT(content, 150) as content_preview,
metadata->>'source' as source,
1 - (embedding <=> '[0.1,-0.2,0.3,...]') AS similarity
FROM document_chunks
ORDER BY similarity DESC
LIMIT 5;
-- Get file type statistics
SELECT
metadata->>'fileType' as file_type,
COUNT(*) as chunk_count,
COUNT(DISTINCT metadata->>'source') as document_count
FROM document_chunks
GROUP BY metadata->>'fileType';
`));
} catch (error) {
spinner.fail('Failed to load status');
console.error(chalk.red('❌ Error:'), error);
} finally {
await documentManager.disconnect();
}
});
// Parse command line arguments
program.parse();
This completes our CLI with PostgreSQL status information and SQL debugging queries.
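One loose end: the clearDatabase() method from Step 3 isn't exposed as a command yet. Here's a sketch of how you might wire it up (register it before program.parse()):
// src/cli.ts (sketch: expose clearDatabase() as its own command)
program
  .command('clear')
  .description('Delete all ingested chunks from PostgreSQL')
  .action(async () => {
    try {
      await documentManager.connect();
      await documentManager.clearDatabase(); // logs: 🗑️ Database cleared
    } catch (error) {
      console.error(chalk.red('❌ Error:'), error);
    } finally {
      await documentManager.disconnect();
    }
  });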
Testing with Real Documents
Let's create some test documents and demonstrate the complete PostgreSQL-based system:
Create Test Documents
# Create test directory and files
mkdir test-docs
# Create a TypeScript tutorial
cat > test-docs/typescript-basics.md << 'EOF'
# TypeScript Basics
TypeScript is a superset of JavaScript that adds static typing.
## Installation
Install TypeScript globally using npm:
npm install -g typescript
## Basic Types
- string: Text values
- number: Numeric values
- boolean: True/false values
- Array<T>: Arrays of type T
## Error Handling
Use try-catch blocks for error handling:
try {
const result = await fetchData();
} catch (error) {
console.error('Error:', error);
}
EOF
# Create a debugging guide
cat > test-docs/debugging.txt << 'EOF'
Debugging TypeScript Applications
1. Use console.log() for quick debugging
2. Set breakpoints in VS Code debugger
3. Use the TypeScript compiler error messages
4. Enable source maps for better debugging experience
5. Use the browser developer tools for web applications
Common debugging techniques:
- Add debugger; statements in your code
- Use console.table() for objects
- Enable strict mode for better error detection
EOF
# Create a JSON configuration example
cat > test-docs/config-example.json << 'EOF'
{
"name": "TypeScript Project",
"description": "A sample TypeScript project with configuration examples",
"dependencies": {
"typescript": "^5.0.0",
"@types/node": "^18.0.0"
},
"scripts": {
"build": "tsc",
"dev": "ts-node src/index.ts",
"test": "jest"
},
"tsconfig": {
"compilerOptions": {
"target": "ES2020",
"module": "commonjs",
"strict": true,
"sourceMap": true
}
},
"tips": "Always use strict mode for better type checking and error detection in TypeScript projects."
}
EOF
Running the Complete PostgreSQL System (tsx version)
# ⚙️ No build step needed: tsx compiles TypeScript on the fly
# (ts-node works too: npx ts-node src/cli.ts)
alias doc-qa="npx -y tsx src/cli.ts"
# Test the complete workflow:
# 1. Check initial status (should be empty)
doc-qa status
# 2. Ingest test documents into PostgreSQL
doc-qa ingest test-docs
# 3. Check status again (should show ingested documents)
doc-qa status
# 4. Ask single questions (uses PostgreSQL similarity search)
doc-qa query "How do I install TypeScript?"
doc-qa query "What are the basic TypeScript types?"
doc-qa query "How can I debug TypeScript applications?"
# 5. Start interactive session (PostgreSQL backend)
doc-qa interactive
Example Interactive Session with PostgreSQL
🎯 Interactive Document Q&A Session (PostgreSQL Backend)
📊 15 chunks from 3 documents
💡 Type "quit", "exit", or press Ctrl+C to end
❓ Your question: How do I install TypeScript?
🤔 Searching PostgreSQL database...
🤖 Answer:
To install TypeScript, you can use npm (Node Package Manager). Run the following command
to install TypeScript globally:
npm install -g typescript
Once installed, you can verify the installation by running 'tsc --version' to check the
TypeScript compiler version.
📚 Sources: typescript-basics.md
──────────────────────────────────────────────────
❓ Your question: What are the basic TypeScript types?
🤔 Searching PostgreSQL database...
🤖 Answer:
TypeScript provides several basic types:
- string: For text values
- number: For numeric values (integers and floats)
- boolean: For true/false values
- Array<T>: For arrays of a specific type T
These types help you catch errors during development and provide better IDE support
with autocomplete and type checking.
📚 Sources: typescript-basics.md
──────────────────────────────────────────────────
❓ Your question: quit
👋 Thanks for using the Document Q&A system!
SQL Queries for pgAdmin Debugging
You can run these queries directly in pgAdmin to debug your CLI system:
-- View all ingested document chunks
SELECT
chunk_id,
LEFT(content, 100) as content_preview,
metadata->>'source' as source,
metadata->>'fileType' as file_type,
created_at
FROM document_chunks
ORDER BY created_at DESC;
-- Test similarity search manually (replace with actual query embedding)
SELECT
chunk_id,
LEFT(content, 150) as content_preview,
metadata->>'source' as source,
1 - (embedding <=> '[0.1,-0.2,0.3,...]') AS similarity
FROM document_chunks
ORDER BY similarity DESC
LIMIT 5;
-- Get statistics about your CLI database
SELECT
COUNT(*) as total_chunks,
COUNT(DISTINCT metadata->>'source') as unique_sources,
AVG(LENGTH(content)) as avg_chunk_length,
MIN(created_at) as first_ingestion,
MAX(created_at) as last_ingestion
FROM document_chunks;
-- Find chunks from a specific file type
SELECT
chunk_id,
content,
metadata->>'source' as source
FROM document_chunks
WHERE metadata->>'fileType' = 'md'
ORDER BY (metadata->>'chunkIndex')::int;
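The similarity-search queries above need a real embedding in place of the [0.1,-0.2,0.3,...] placeholder. A small throwaway script (a sketch; the file name is a suggestion) can print one in pgvector's literal format for pasting into pgAdmin:
// print-embedding.ts (sketch): generate a query embedding for pgAdmin testing
import { GoogleGenerativeAI } from '@google/generative-ai';
import 'dotenv/config';
async function printEmbedding(query: string) {
  const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
  const model = genAI.getGenerativeModel({ model: 'text-embedding-004' });
  const result = await model.embedContent(query);
  // pgvector accepts vectors as '[v1,v2,...]' string literals
  console.log(`[${result.embedding.values.join(',')}]`);
}
printEmbedding('How do I install TypeScript?').catch(console.error);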
Summary
Congratulations! You've built a complete CLI Document QA System using PostgreSQL that transforms your RAG pipeline into a practical, user-friendly application with production-ready database storage.
Key achievements:
- PostgreSQL Integration: Built a CLI tool that uses PostgreSQL with pgvector for persistent vector storage
- SQL Similarity Search: Implemented efficient similarity search using PostgreSQL's native <=> operator
- CLI Interface: Created a professional command-line tool with proper argument parsing and user feedback
- File Processing: Implemented support for multiple file formats (.txt, .md, .json) with extensible architecture
- Interactive Mode: Created a chat-like interface for natural conversation with documents stored in PostgreSQL
- Database Management: Added proper connection handling, statistics, and debugging capabilities
- Error Handling: Implemented robust error handling and user-friendly error messages
- Progress Indicators: Added visual feedback for long-running operations
- Real-world Testing: Demonstrated the system with actual files and practical use cases
Your CLI application now provides:
- Document ingestion from files and directories into PostgreSQL
- Persistent vector storage with proper indexing for fast searches
- Single-query mode for quick questions using SQL similarity search
- Interactive chat mode for extended conversations
- Status reporting and database statistics
- SQL debugging queries for pgAdmin exploration
- Colored output and progress indicators for better UX
This PostgreSQL-based CLI system serves as the foundation for building production-ready RAG applications with enterprise-grade data persistence and scalability. In the next module, we'll explore advanced protocols and multi-agent systems that can further enhance your AI applications.
Complete Code
You can find the complete, runnable code for this tutorial on GitHub: https://github.com/avestalabs/academy/tree/main/4-rag/cli-document-q%26a-system