CLI Project: Document Q&A System
In the previous tutorial, you built a complete RAG pipeline using PostgreSQL that can ingest documents and answer questions. Now, let's take that foundation and create something truly practical - a command-line interface (CLI) that real users can interact with.
Think of this as transforming your RAG pipeline from a programmatic library into a user-friendly application. Instead of hardcoded documents and queries, users will be able to point the CLI to their own files and ask questions interactively, all while leveraging PostgreSQL for persistent vector storage.
By the end of this tutorial, you'll have a CLI tool that can:
- Read documents from actual files (.txt, .md, .json)
- Process multiple documents at once using PostgreSQL storage
- Provide an interactive chat interface for asking questions
- Use PostgreSQL with pgvector for persistent vector storage
- Handle real-world scenarios like file errors and invalid queries
- Execute SQL similarity searches for optimal performance
What is a CLI Document QA System?
A CLI Document QA System is a command-line application that combines your PostgreSQL-based RAG pipeline with user interface capabilities. It bridges the gap between your technical RAG implementation and practical user needs.
Here's how users will interact with your system:
# Ingest documents from a directory
npm run cli -- ingest ./my-docs
# Ask a single question
npm run cli -- query "How do I handle TypeScript errors?"
# Start an interactive conversation
npm run cli -- interactive
# Process specific file types
npm run cli -- ingest ./docs --type md
# Check database status
npm run cli -- status
Key Differences from Tutorial 4.4:
Tutorial 4.4 (RAG Pipeline):
- Programmatic interface with hardcoded documents
- Demonstrated concepts with sample text
- Required code modifications to test new documents
- PostgreSQL storage but no CLI interface
Tutorial 4.5 (CLI System):
- Interactive command-line interface
- Reads actual files from the filesystem
- PostgreSQL persistent vector database storage
- User-friendly commands and error handling
- Real-world file processing and validation
- SQL-based similarity search with pgAdmin debugging
CLI Application Architecture
Our CLI application will have several distinct modes of operation, all backed by PostgreSQL:
graph TD
A[CLI Application] --> B[Ingest Mode]
A --> C[Query Mode]
A --> D[Interactive Mode]
A --> E[Status Mode]
B --> F[Read Files]
F --> G[Process Documents]
G --> H[PostgreSQL Storage]
C --> I[PostgreSQL Query]
I --> J[SQL Similarity Search]
J --> K[Return Answer]
D --> L[PostgreSQL Connection]
L --> M[Interactive Loop]
M --> N[SQL-based Questions]
N --> M
E --> O[Database Statistics]
Core Components (sketched as TypeScript interfaces after this list):
- Command Parser: Handle command-line arguments and flags
- File Processor: Read and validate different file formats
- PostgreSQL Document Manager: Enhanced version with database persistence
- Interactive Interface: Chat-like experience for questions
- Progress Indicators: User feedback during long operations
- Database Connection: PostgreSQL with pgvector integration
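Before diving into the implementation, here is a rough sketch of the responsibilities each component owns, expressed as TypeScript interfaces. The names here are illustrative only; the concrete classes are built in the steps below.
// Illustrative sketch: the real classes appear in the following steps
interface ProcessedFileLike {
  path: string;
  content: string;
  type: string;
}
// File Processor: reads and validates files from disk
interface FileProcessorLike {
  processFile(filePath: string): Promise<ProcessedFileLike | null>;
  processDirectory(dirPath: string, recursive?: boolean): Promise<ProcessedFileLike[]>;
}
// PostgreSQL Document Manager: persistence, similarity search, and answering
interface DocumentManagerLike {
  connect(): Promise<void>;
  ingestFile(file: ProcessedFileLike): Promise<void>;
  askQuestion(question: string): Promise<string>;
  disconnect(): Promise<void>;
}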
Project Setup & Requirements
Let's set up our CLI project with proper structure and dependencies:
Environment Setup
# Install required dependencies
npm install @google/generative-ai pg @types/pg
npm install commander chalk ora
npm install fs-extra dotenv
npm install --save-dev typescript ts-node @types/node @types/fs-extra
# Note: readline is built into Node.js and needs no install
# Install PostgreSQL and pgvector (if not already installed)
# On macOS with Homebrew:
brew install postgresql
brew install pgvector
# On Ubuntu/Debian:
sudo apt-get install postgresql postgresql-contrib
sudo apt-get install postgresql-14-pgvector
New Dependencies Explained:
- commander: Professional command-line argument parsing
- chalk: Colored terminal output for better UX
- ora: Elegant terminal spinners for progress indication
- fs-extra: Enhanced file system operations
- readline: Interactive input handling for chat mode (a built-in Node.js module)
- pg: PostgreSQL client for database operations
- dotenv: Loads .env variables into process.env
Create the project structure:
mkdir cli-document-qa
cd cli-document-qa
mkdir src test-docs
touch src/cli.ts src/document-manager.ts src/file-processor.ts
Make sure your .env file contains both your Google API key and database credentials:
# .env
GOOGLE_API_KEY=your_gemini_api_key_here
DB_HOST=localhost
DB_PORT=5432
DB_NAME=cli_document_qa_db
DB_USER=postgres
DB_PASSWORD=your_password
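Because every command depends on these values, it helps to fail fast when one is missing. Here's a minimal, optional sketch (it assumes the dotenv package from the install step; the file name src/env-check.ts is just a suggestion):
// src/env-check.ts (optional helper, not part of the main tutorial code)
import 'dotenv/config';
const required = ['GOOGLE_API_KEY', 'DB_HOST', 'DB_PORT', 'DB_NAME', 'DB_USER', 'DB_PASSWORD'];
// Collect every missing variable so the user sees them all at once
const missing = required.filter((name) => !process.env[name]);
if (missing.length > 0) {
  console.error(`❌ Missing environment variables: ${missing.join(', ')}`);
  process.exit(1);
}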
Working Code Example
Let's build our CLI Document QA System step by step using PostgreSQL for vector storage:
Step 1: Database Setup
First, create the database schema in pgAdmin or psql:
-- Create the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create document_chunks table for our CLI system
CREATE TABLE document_chunks (
id SERIAL PRIMARY KEY,
chunk_id VARCHAR(255) UNIQUE NOT NULL,
content TEXT NOT NULL,
embedding vector(768), -- Google's text-embedding-004 uses 768 dimensions
metadata JSONB NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create an index for faster similarity searches
CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- Create an index on chunk_id for faster lookups
CREATE INDEX ON document_chunks (chunk_id);
-- Create an index on metadata for filtering
CREATE INDEX ON document_chunks USING gin (metadata);
This SQL creates our vector-enabled table specifically designed for the CLI system.
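If you prefer creating the schema from code rather than pgAdmin or psql, a sketch like the following runs the same DDL through the pg client (the file name src/setup-db.ts is a suggestion):
// src/setup-db.ts (optional: run the schema from code instead of pgAdmin)
import { Client } from 'pg';
import 'dotenv/config';
async function setupSchema(): Promise<void> {
  const client = new Client({
    host: process.env.DB_HOST || 'localhost',
    port: parseInt(process.env.DB_PORT || '5432'),
    database: process.env.DB_NAME || 'cli_document_qa_db',
    user: process.env.DB_USER || 'postgres',
    password: process.env.DB_PASSWORD || 'password',
  });
  await client.connect();
  // Same DDL as above; IF NOT EXISTS keeps the script re-runnable
  await client.query('CREATE EXTENSION IF NOT EXISTS vector');
  await client.query(`
    CREATE TABLE IF NOT EXISTS document_chunks (
      id SERIAL PRIMARY KEY,
      chunk_id VARCHAR(255) UNIQUE NOT NULL,
      content TEXT NOT NULL,
      embedding vector(768),
      metadata JSONB NOT NULL,
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
  `);
  await client.query(`
    CREATE INDEX IF NOT EXISTS document_chunks_embedding_idx
    ON document_chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)
  `);
  await client.end();
  console.log('✅ Schema ready');
}
setupSchema().catch(console.error);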
Step 2: PostgreSQL Document Manager with CLI Integration
// src/document-manager.ts
import { GoogleGenerativeAI } from '@google/generative-ai';
import { Client } from 'pg';
import * as path from 'path';
import { ProcessedFile } from './file-processor';
interface DocumentChunk {
id: number;
chunk_id: string;
content: string;
embedding: number[];
metadata: {
source: string;
chunkIndex: number;
fileType: string;
lastModified: Date;
};
}
interface SearchResult {
chunk: DocumentChunk;
similarity: number;
}
export class PostgreSQLDocumentManager {
private client: Client;
private genAI: GoogleGenerativeAI;
private embeddingModel: any;
private generativeModel: any;
constructor() {
this.client = new Client({
host: process.env.DB_HOST || 'localhost',
port: parseInt(process.env.DB_PORT || '5432'),
database: process.env.DB_NAME || 'cli_document_qa_db',
user: process.env.DB_USER || 'postgres',
password: process.env.DB_PASSWORD || 'password',
});
this.genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
this.embeddingModel = this.genAI.getGenerativeModel({
model: "text-embedding-004"
});
this.generativeModel = this.genAI.getGenerativeModel({
model: "gemini-2.5-flash"
});
}
async connect(): Promise<void> {
await this.client.connect();
console.log('📊 Connected to PostgreSQL database');
}
async disconnect(): Promise<void> {
await this.client.end();
console.log('📊 Disconnected from PostgreSQL database');
}
}
This creates our PostgreSQL-based document manager specifically designed for CLI usage.
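As a quick sanity check of the connection lifecycle, you could run a sketch like this once your .env is in place:
// Hypothetical smoke test: verifies credentials and the connect/disconnect flow
import { PostgreSQLDocumentManager } from './document-manager';
import 'dotenv/config';
async function smokeTest() {
  const manager = new PostgreSQLDocumentManager();
  await manager.connect();    // expect: 📊 Connected to PostgreSQL database
  await manager.disconnect(); // expect: 📊 Disconnected from PostgreSQL database
}
smokeTest().catch(console.error);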
Step 3: Database Statistics and Management
// src/document-manager.ts (add to PostgreSQLDocumentManager class)
async getStats() {
try {
const query = `
SELECT
COUNT(*) as total_chunks,
COUNT(DISTINCT metadata->>'source') as total_documents,
array_agg(DISTINCT metadata->>'fileType') as file_types,
MAX(created_at) as last_updated
FROM document_chunks
`;
const result = await this.client.query(query);
const row = result.rows[0];
return {
totalChunks: parseInt(row.total_chunks),
totalDocuments: parseInt(row.total_documents),
fileTypes: row.file_types ? row.file_types.filter(Boolean) : [],
lastUpdated: row.last_updated ? new Date(row.last_updated) : null
};
} catch (error) {
console.error('Error getting stats:', error);
return {
totalChunks: 0,
totalDocuments: 0,
fileTypes: [],
lastUpdated: null
};
}
}
async clearDatabase(): Promise<void> {
try {
await this.client.query('DELETE FROM document_chunks');
console.log('🗑️ Database cleared');
} catch (error) {
console.error('Error clearing database:', error);
throw error;
}
}
async checkDocumentExists(filePath: string): Promise<boolean> {
try {
const query = `
SELECT EXISTS(
SELECT 1 FROM document_chunks
WHERE metadata->>'source' = $1
)
`;
const result = await this.client.query(query, [filePath]);
return result.rows[0].exists;
} catch (error) {
console.error('Error checking document existence:', error);
return false;
}
}
This adds database management capabilities specific to CLI operations.
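For example, a hypothetical pre-ingestion check could combine these methods like so:
// Hypothetical usage of the management methods
import { PostgreSQLDocumentManager } from './document-manager';
import 'dotenv/config';
async function showStatus() {
  const manager = new PostgreSQLDocumentManager();
  await manager.connect();
  const stats = await manager.getStats();
  console.log(`${stats.totalChunks} chunks across ${stats.totalDocuments} documents`);
  // checkDocumentExists lets us skip files that are already ingested;
  // the path must match exactly what was stored during ingestion
  if (await manager.checkDocumentExists('test-docs/typescript-basics.md')) {
    console.log('Document already ingested');
  }
  await manager.disconnect();
}
showStatus().catch(console.error);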
Step 4: File Processing Implementation
// src/file-processor.ts
import fs from 'fs-extra';
import * as path from 'path';
export interface ProcessedFile {
path: string;
content: string;
type: string;
size: number;
lastModified: Date;
}
export class FileProcessor {
private supportedExtensions = ['.txt', '.md', '.json'];
async processFile(filePath: string): Promise<ProcessedFile | null> {
try {
const stats = await fs.stat(filePath);
const ext = path.extname(filePath).toLowerCase();
if (!this.supportedExtensions.includes(ext)) {
console.warn(`⚠️ Skipping unsupported file type: ${filePath}`);
return null;
}
let content: string;
switch (ext) {
case '.json':
const jsonData = await fs.readJson(filePath);
content = this.extractTextFromJson(jsonData);
break;
default:
content = await fs.readFile(filePath, 'utf-8');
}
if (content.trim().length === 0) {
console.warn(`⚠️ Skipping empty file: ${filePath}`);
return null;
}
return {
path: filePath,
content: content.trim(),
type: ext.slice(1), // Remove the dot
size: stats.size,
lastModified: stats.mtime
};
} catch (error) {
console.error(`❌ Error processing file ${filePath}:`, error);
return null;
}
}
private extractTextFromJson(data: any): string {
const textParts: string[] = [];
const extractText = (obj: any, prefix: string = ''): void => {
if (typeof obj === 'string') {
textParts.push(obj);
} else if (typeof obj === 'object' && obj !== null) {
if (Array.isArray(obj)) {
obj.forEach((item, index) => extractText(item, `${prefix}[${index}]`));
} else {
Object.entries(obj).forEach(([key, value]) => {
const newPrefix = prefix ? `${prefix}.${key}` : key;
if (typeof value === 'string' && value.length > 10) {
textParts.push(`${key}: ${value}`);
} else {
extractText(value, newPrefix);
}
});
}
}
};
extractText(data);
return textParts.join('\n\n');
}
async processDirectory(dirPath: string, recursive: boolean = true): Promise<ProcessedFile[]> {
const files: ProcessedFile[] = [];
try {
const items = await fs.readdir(dirPath);
for (const item of items) {
const itemPath = path.join(dirPath, item);
const stats = await fs.stat(itemPath);
if (stats.isDirectory() && recursive) {
const subFiles = await this.processDirectory(itemPath, recursive);
files.push(...subFiles);
} else if (stats.isFile()) {
const processedFile = await this.processFile(itemPath);
if (processedFile) {
files.push(processedFile);
}
}
}
} catch (error) {
console.error(`❌ Error processing directory ${dirPath}:`, error);
}
return files;
}
}
This handles reading and processing different file types safely with enhanced error handling.
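A short sketch of how the processor might be exercised on its own (it assumes the test-docs directory created later in this tutorial):
// Hypothetical standalone test of the FileProcessor
import { FileProcessor } from './file-processor';
async function listFiles() {
  const processor = new FileProcessor();
  const files = await processor.processDirectory('./test-docs');
  // Each ProcessedFile carries the metadata the ingestion step will need
  for (const file of files) {
    console.log(`${file.path} (${file.type}, ${file.size} bytes)`);
  }
}
listFiles().catch(console.error);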
Step 5: PostgreSQL Document Ingestion
// src/document-manager.ts (add to PostgreSQLDocumentManager class)
private splitIntoChunks(text: string, chunkSize: number = 600, overlap: number = 100): string[] {
const chunks: string[] = [];
let start = 0;
while (start < text.length) {
let end = start + chunkSize;
if (end < text.length) {
// Find better breakpoints
const breakChars = ['\n\n', '\n', '. ', '!', '?', ';'];
let bestBreak = -1;
for (const breakChar of breakChars) {
const breakIndex = text.lastIndexOf(breakChar, end);
if (breakIndex > start + chunkSize * 0.5) {
bestBreak = breakIndex + breakChar.length;
break;
}
}
if (bestBreak > -1) {
end = bestBreak;
}
}
chunks.push(text.slice(start, end).trim());
if (end >= text.length) break; // stop after the final chunk to avoid re-emitting the overlap tail
start = end - overlap;
}
return chunks.filter(chunk => chunk.length > 20);
}
async ingestFile(file: ProcessedFile): Promise<void> {
console.log(`📄 Processing: ${path.basename(file.path)} (${file.size} bytes)`);
try {
// Check if file already exists and remove old chunks
const existsQuery = `DELETE FROM document_chunks WHERE metadata->>'source' = $1`;
await this.client.query(existsQuery, [file.path]);
const textChunks = this.splitIntoChunks(file.content);
console.log(` └─ Created ${textChunks.length} chunks`);
for (let i = 0; i < textChunks.length; i++) {
const chunk = textChunks[i];
// Create embedding for the chunk
const result = await this.embeddingModel.embedContent(chunk);
const embedding = result.embedding.values;
// Prepare chunk data
const chunkId = `${file.path}_chunk_${i}`;
const metadata = {
source: file.path,
chunkIndex: i,
fileType: file.type,
lastModified: file.lastModified.toISOString()
};
// Insert into PostgreSQL
const insertQuery = `
INSERT INTO document_chunks (chunk_id, content, embedding, metadata)
VALUES ($1, $2, $3, $4)
`;
const values = [
chunkId,
chunk,
`[${embedding.join(',')}]`, // Convert array to PostgreSQL vector format
JSON.stringify(metadata)
];
await this.client.query(insertQuery, values);
}
console.log(`✅ Successfully processed ${file.path}`);
} catch (error) {
console.error(`❌ Error ingesting file ${file.path}:`, error);
throw error;
}
}
This implements PostgreSQL-based document ingestion with proper chunk management.
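To build intuition for the chunkSize and overlap parameters, here is a simplified, standalone version of the sliding window (without the breakpoint search) that just prints chunk boundaries:
// Standalone chunking sanity check (mirrors the sliding-window logic only)
function chunkPreview(text: string, chunkSize = 600, overlap = 100): void {
  let start = 0;
  let index = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    console.log(`chunk ${index}: chars ${start}-${end}`);
    if (end >= text.length) break;
    start = end - overlap; // consecutive chunks share `overlap` characters
    index++;
  }
}
chunkPreview('x'.repeat(1500)); // chunks at 0-600, 500-1100, 1000-1500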
Step 6: PostgreSQL Similarity Search Implementation
// src/document-manager.ts (add to PostgreSQLDocumentManager class)
async searchSimilarChunks(query: string, topK: number = 4): Promise<SearchResult[]> {
try {
// Create embedding for the query
const result = await this.embeddingModel.embedContent(query);
const queryEmbedding = result.embedding.values;
// Execute similarity search using PostgreSQL with pgvector
const searchQuery = `
SELECT
id,
chunk_id,
content,
metadata,
created_at,
1 - (embedding <=> $1) AS similarity
FROM document_chunks
ORDER BY embedding <=> $1
LIMIT $2
`;
const queryVector = `[${queryEmbedding.join(',')}]`;
const res = await this.client.query(searchQuery, [queryVector, topK]);
return res.rows.map(row => ({
chunk: {
id: row.id,
chunk_id: row.chunk_id,
content: row.content,
embedding: [], // We don't need to return the full embedding
metadata: {
source: row.metadata.source,
chunkIndex: row.metadata.chunkIndex,
fileType: row.metadata.fileType,
lastModified: new Date(row.metadata.lastModified)
}
},
similarity: parseFloat(row.similarity)
}));
} catch (error) {
console.error('Error searching similar chunks:', error);
throw error;
}
}
async askQuestion(question: string, maxResults: number = 4): Promise<string> {
// Check if database has any documents
const stats = await this.getStats();
if (stats.totalChunks === 0) {
return "❌ No documents have been ingested yet. Please ingest some documents first using --ingest.";
}
try {
// Search for relevant chunks using PostgreSQL
const searchResults = await this.searchSimilarChunks(question, maxResults);
if (searchResults.length === 0 || searchResults[0].similarity < 0.3) {
return "🤔 I couldn't find relevant information in the ingested documents to answer your question.";
}
// Prepare context from retrieved chunks
const context = searchResults.map((result, index) => {
const source = path.basename(result.chunk.metadata.source);
return `[Source: ${source}]\n${result.chunk.content}`;
}).join('\n\n---\n\n');
// Generate answer using the LLM
const prompt = `You are a helpful assistant answering questions based on provided documentation.
Context from documents:
${context}
Question: ${question}
Instructions:
- Answer based only on the provided context
- If the context doesn't contain enough information, say so clearly
- Be specific and reference the sources when possible
- Keep the answer concise but complete
- Format your response in a friendly, conversational tone
Answer:`;
const response = await this.generativeModel.generateContent(prompt);
const answer = response.response.text();
// Add source information
const sources = Array.from(new Set(searchResults.map(r => path.basename(r.chunk.metadata.source))));
const sourceInfo = `\n\n📚 Sources: ${sources.join(', ')}`;
return answer + sourceInfo;
} catch (error) {
console.error('❌ Error processing question:', error);
return "❌ An error occurred while processing your question. Please try again.";
}
}
This is the key piece! 🎯 The search runs entirely in PostgreSQL using its native vector operations: the <=> operator computes cosine distance, so 1 - (embedding <=> $1) yields the cosine similarity used for ranking.
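Since <=> returns cosine distance, 1 - distance is cosine similarity. If you ever want to sanity-check pgvector's numbers, a local cosine similarity computed in TypeScript should agree up to floating-point rounding:
// Cosine similarity computed locally; for the same pair of vectors this should
// match 1 - (embedding <=> query) from pgvector (up to rounding)
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)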
Step 7: CLI Interface Implementation
// src/cli.ts
import { Command } from 'commander';
import chalk from 'chalk';
import ora from 'ora';
import * as readline from 'readline';
import fs from 'fs-extra';
import { PostgreSQLDocumentManager } from './document-manager';
import { FileProcessor } from './file-processor';
import 'dotenv/config';
const program = new Command();
program
.name('doc-qa')
.description('CLI Document Question & Answer System using PostgreSQL RAG')
.version('1.0.0');
const documentManager = new PostgreSQLDocumentManager();
const fileProcessor = new FileProcessor();
// Ensure proper cleanup on exit
process.on('SIGINT', async () => {
console.log(chalk.yellow('\n🔄 Cleaning up...'));
await documentManager.disconnect();
process.exit(0);
});
This sets up our CLI framework with proper PostgreSQL connection management.
Step 8: Ingest Command Implementation
// src/cli.ts (add after program setup)
program
.command('ingest')
.description('Ingest documents from files or directories into PostgreSQL')
.argument('<path>', 'File or directory path to ingest')
.option('-r, --recursive', 'Process directories recursively', true)
.option('-t, --type <type>', 'Filter by file type (txt, md, json)')
.action(async (inputPath, options) => {
const spinner = ora('Connecting to PostgreSQL database...').start();
try {
await documentManager.connect();
spinner.succeed('Connected to PostgreSQL database');
// Check if path exists
if (!(await fs.pathExists(inputPath))) {
spinner.fail(`Path does not exist: ${inputPath}`);
return;
}
spinner.start('Discovering files...');
const stats = await fs.stat(inputPath);
let files;
if (stats.isFile()) {
const file = await fileProcessor.processFile(inputPath);
files = file ? [file] : [];
} else {
files = await fileProcessor.processDirectory(inputPath, options.recursive);
}
// Filter by type if specified
if (options.type) {
files = files.filter(f => f.type === options.type);
}
if (files.length === 0) {
spinner.warn('No valid files found to process');
return;
}
spinner.succeed(`Found ${files.length} files to process`);
// Process files
console.log(chalk.blue('\n📄 Processing documents...\n'));
for (let i = 0; i < files.length; i++) {
const file = files[i];
const progress = `(${i + 1}/${files.length})`;
console.log(chalk.gray(progress), `Processing ${file.path}...`);
await documentManager.ingestFile(file);
}
// Show final stats
const finalStats = await documentManager.getStats();
console.log(chalk.green('\n✅ Ingestion complete!'));
console.log(chalk.blue(`📊 Total chunks: ${finalStats.totalChunks}`));
console.log(chalk.blue(`📁 Documents: ${finalStats.totalDocuments}`));
console.log(chalk.blue(`📋 File types: ${finalStats.fileTypes.join(', ')}`));
} catch (error) {
spinner.fail('Ingestion failed');
console.error(chalk.red('❌ Error:'), error);
} finally {
await documentManager.disconnect();
}
});
This implements the document ingestion command with PostgreSQL storage and progress indicators.
Step 9: Query Command Implementation
// src/cli.ts (add after ingest command)
program
.command('query')
.description('Ask a question about the ingested documents')
.argument('<question>', 'Question to ask')
.action(async (question) => {
const spinner = ora('Connecting to PostgreSQL database...').start();
try {
await documentManager.connect();
const stats = await documentManager.getStats();
if (stats.totalChunks === 0) {
spinner.fail('No documents found in database. Please ingest documents first.');
return;
}
spinner.text = 'Processing question with PostgreSQL similarity search...';
const answer = await documentManager.askQuestion(question);
spinner.stop();
console.log(chalk.blue('\n🤔 Question:'), question);
console.log(chalk.green('\n🤖 Answer:'));
console.log(answer);
} catch (error) {
spinner.fail('Query failed');
console.error(chalk.red('❌ Error:'), error);
} finally {
await documentManager.disconnect();
}
});
This implements the single-query command functionality using PostgreSQL.
Step 10: Interactive Mode Implementation
// src/cli.ts (add after query command)
program
.command('interactive')
.description('Start an interactive Q&A session')
.alias('chat')
.action(async () => {
const spinner = ora('Connecting to PostgreSQL database...').start();
try {
await documentManager.connect();
const stats = await documentManager.getStats();
if (stats.totalChunks === 0) {
spinner.fail('No documents found in database. Please ingest documents first.');
return;
}
spinner.succeed('Connected to PostgreSQL database');
console.log(chalk.blue('\n🎯 Interactive Document Q&A Session (PostgreSQL Backend)'));
console.log(chalk.gray(`📊 ${stats.totalChunks} chunks from ${stats.totalDocuments} documents`));
console.log(chalk.gray('💡 Type "quit", "exit", or press Ctrl+C to end\n'));
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout
});
const askQuestion = async (): Promise<void> => {
rl.question(chalk.cyan('❓ Your question: '), async (question) => {
if (!question.trim()) {
console.log(chalk.yellow('Please enter a question.\n'));
askQuestion();
return;
}
if (['quit', 'exit', 'q'].includes(question.toLowerCase().trim())) {
console.log(chalk.blue('\n👋 Thanks for using the Document Q&A system!'));
rl.close();
await documentManager.disconnect();
return;
}
console.log(chalk.gray('🤔 Searching PostgreSQL database...'));
try {
const answer = await documentManager.askQuestion(question);
console.log(chalk.green('\n🤖 Answer:'));
console.log(answer);
console.log(chalk.gray('\n' + '─'.repeat(50) + '\n'));
} catch (error) {
console.log(chalk.red('❌ Error processing question:'), error);
}
askQuestion(); // Continue the loop
});
};
askQuestion();
} catch (error) {
spinner.fail('Failed to start interactive mode');
console.error(chalk.red('❌ Error:'), error);
await documentManager.disconnect();
}
});
This creates an interactive chat interface for continuous Q&A using PostgreSQL.
Step 11: Status Command and SQL Debugging
// src/cli.ts (add after interactive command)
program
.command('status')
.description('Show PostgreSQL database statistics')
.alias('stats')
.action(async () => {
const spinner = ora('Connecting to PostgreSQL database...').start();
try {
await documentManager.connect();
const stats = await documentManager.getStats();
if (stats.totalChunks === 0) {
spinner.warn('No documents found in database');
console.log(chalk.yellow('🔍 No documents have been ingested yet.'));
console.log(chalk.blue('💡 Use "ingest <path>" to add documents.'));
return;
}
spinner.succeed('Connected to PostgreSQL database');
console.log(chalk.blue('\n📊 PostgreSQL Database Statistics'));
console.log(chalk.gray('─'.repeat(40)));
console.log(`📄 Total chunks: ${chalk.green(stats.totalChunks)}`);
console.log(`📁 Documents: ${chalk.green(stats.totalDocuments)}`);
console.log(`📋 File types: ${chalk.green(stats.fileTypes.join(', '))}`);
if (stats.lastUpdated) {
console.log(`🕒 Last updated: ${chalk.green(stats.lastUpdated.toLocaleString())}`);
}
console.log(chalk.blue('\n🔍 SQL Debugging Queries for pgAdmin:'));
console.log(chalk.gray('─'.repeat(40)));
console.log(chalk.cyan(`
-- View all document chunks
SELECT
chunk_id,
LEFT(content, 100) as content_preview,
metadata->>'source' as source,
metadata->>'fileType' as file_type,
created_at
FROM document_chunks
ORDER BY created_at DESC;
-- Test similarity search (replace with actual query embedding)
SELECT
chunk_id,
LEFT(content, 150) as content_preview,
metadata->>'source' as source,
1 - (embedding <=> '[0.1,-0.2,0.3,...]') AS similarity
FROM document_chunks
ORDER BY similarity DESC
LIMIT 5;
-- Get file type statistics
SELECT
metadata->>'fileType' as file_type,
COUNT(*) as chunk_count,
COUNT(DISTINCT metadata->>'source') as document_count
FROM document_chunks
GROUP BY metadata->>'fileType';
`));
} catch (error) {
spinner.fail('Failed to load status');
console.error(chalk.red('❌ Error:'), error);
} finally {
await documentManager.disconnect();
}
});
// Parse command line arguments
program.parse();
This completes our CLI with PostgreSQL status information and SQL debugging queries.
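One loose end: the clearDatabase() method from Step 3 isn't exposed as a command yet. Here's a sketch of how you might wire it up (register it before program.parse()):
// src/cli.ts (sketch: expose clearDatabase() as its own command)
program
  .command('clear')
  .description('Delete all ingested chunks from PostgreSQL')
  .action(async () => {
    try {
      await documentManager.connect();
      await documentManager.clearDatabase(); // logs: 🗑️ Database cleared
    } catch (error) {
      console.error(chalk.red('❌ Error:'), error);
    } finally {
      await documentManager.disconnect();
    }
  });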
Testing with Real Documents
Let's create some test documents and demonstrate the complete PostgreSQL-based system:
Create Test Documents
# Create test directory and files
mkdir test-docs
# Create a TypeScript tutorial
cat > test-docs/typescript-basics.md << 'EOF'
# TypeScript Basics
TypeScript is a superset of JavaScript that adds static typing.
## Installation
Install TypeScript globally using npm:
npm install -g typescript
## Basic Types
- string: Text values
- number: Numeric values
- boolean: True/false values
- Array<T>: Arrays of type T
## Error Handling
Use try-catch blocks for error handling:
try {
const result = await fetchData();
} catch (error) {
console.error('Error:', error);
}
EOF
# Create a debugging guide
cat > test-docs/debugging.txt << 'EOF'
Debugging TypeScript Applications
1. Use console.log() for quick debugging
2. Set breakpoints in VS Code debugger
3. Use the TypeScript compiler error messages
4. Enable source maps for better debugging experience
5. Use the browser developer tools for web applications
Common debugging techniques:
- Add debugger; statements in your code
- Use console.table() for objects
- Enable strict mode for better error detection
EOF
# Create a JSON configuration example
cat > test-docs/config-example.json << 'EOF'
{
"name": "TypeScript Project",
"description": "A sample TypeScript project with configuration examples",
"dependencies": {
"typescript": "^5.0.0",
"@types/node": "^18.0.0"
},
"scripts": {
"build": "tsc",
"dev": "ts-node src/index.ts",
"test": "jest"
},
"tsconfig": {
"compilerOptions": {
"target": "ES2020",
"module": "commonjs",
"strict": true,
"sourceMap": true
}
},
"tips": "Always use strict mode for better type checking and error detection in TypeScript projects."
}
EOF
Running the Complete PostgreSQL System (tsx version)
# ⚙️ No build step needed: tsx compiles TypeScript on the fly
# (ts-node works too: npx ts-node src/cli.ts)
alias doc-qa="npx -y tsx src/cli.ts"
# Test the complete workflow:
# 1. Check initial status (should be empty)
doc-qa status
# 2. Ingest test documents into PostgreSQL
doc-qa ingest test-docs
# 3. Check status again (should show ingested documents)
doc-qa status
# 4. Ask single questions (uses PostgreSQL similarity search)
doc-qa query "How do I install TypeScript?"
doc-qa query "What are the basic TypeScript types?"
doc-qa query "How can I debug TypeScript applications?"
# 5. Start interactive session (PostgreSQL backend)
doc-qa interactive
Example Interactive Session with PostgreSQL
🎯 Interactive Document Q&A Session (PostgreSQL Backend)
📊 15 chunks from 3 documents
💡 Type "quit", "exit", or press Ctrl+C to end
❓ Your question: How do I install TypeScript?
🤔 Searching PostgreSQL database...
🤖 Answer:
To install TypeScript, you can use npm (Node Package Manager). Run the following command
to install TypeScript globally:
npm install -g typescript
Once installed, you can verify the installation by running 'tsc --version' to check the
TypeScript compiler version.
📚 Sources: typescript-basics.md
──────────────────────────────────────────────────
❓ Your question: What are the basic TypeScript types?
🤔 Searching PostgreSQL database...
🤖 Answer:
TypeScript provides several basic types:
- string: For text values
- number: For numeric values (integers and floats)
- boolean: For true/false values
- Array<T>: For arrays of a specific type T
These types help you catch errors during development and provide better IDE support
with autocomplete and type checking.
📚 Sources: typescript-basics.md
──────────────────────────────────────────────────
❓ Your question: quit
👋 Thanks for using the Document Q&A system!
SQL Queries for pgAdmin Debugging
You can run these queries directly in pgAdmin to debug your CLI system:
-- View all ingested document chunks
SELECT
chunk_id,
LEFT(content, 100) as content_preview,
metadata->>'source' as source,
metadata->>'fileType' as file_type,
created_at
FROM document_chunks
ORDER BY created_at DESC;
-- Test similarity search manually (replace with actual query embedding)
SELECT
chunk_id,
LEFT(content, 150) as content_preview,
metadata->>'source' as source,
1 - (embedding <=> '[0.1,-0.2,0.3,...]') AS similarity
FROM document_chunks
ORDER BY similarity DESC
LIMIT 5;
-- Get statistics about your CLI database
SELECT
COUNT(*) as total_chunks,
COUNT(DISTINCT metadata->>'source') as unique_sources,
AVG(LENGTH(content)) as avg_chunk_length,
MIN(created_at) as first_ingestion,
MAX(created_at) as last_ingestion
FROM document_chunks;
-- Find chunks from a specific file type
SELECT
chunk_id,
content,
metadata->>'source' as source
FROM document_chunks
WHERE metadata->>'fileType' = 'md'
ORDER BY (metadata->>'chunkIndex')::int;
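The similarity-search queries above need a real embedding in place of the [0.1,-0.2,0.3,...] placeholder. A small throwaway script (a sketch; the file name is a suggestion) can print one in pgvector's literal format for pasting into pgAdmin:
// print-embedding.ts (sketch): generate a query embedding for pgAdmin testing
import { GoogleGenerativeAI } from '@google/generative-ai';
import 'dotenv/config';
async function printEmbedding(query: string) {
  const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
  const model = genAI.getGenerativeModel({ model: 'text-embedding-004' });
  const result = await model.embedContent(query);
  // pgvector accepts vectors as '[v1,v2,...]' string literals
  console.log(`[${result.embedding.values.join(',')}]`);
}
printEmbedding('How do I install TypeScript?').catch(console.error);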
Summary
Congratulations! You've built a complete CLI Document QA System using PostgreSQL that transforms your RAG pipeline into a practical, user-friendly application with production-ready database storage.
Key achievements:
- PostgreSQL Integration: Built a CLI tool that uses PostgreSQL with pgvector for persistent vector storage
- SQL Similarity Search: Implemented efficient similarity search using PostgreSQL's native <=> operator
- CLI Interface: Created a professional command-line tool with proper argument parsing and user feedback
- File Processing: Implemented support for multiple file formats (.txt, .md, .json) with extensible architecture
- Interactive Mode: Created a chat-like interface for natural conversation with documents stored in PostgreSQL
- Database Management: Added proper connection handling, statistics, and debugging capabilities
- Error Handling: Implemented robust error handling and user-friendly error messages
- Progress Indicators: Added visual feedback for long-running operations
- Real-world Testing: Demonstrated the system with actual files and practical use cases
Your CLI application now provides:
- Document ingestion from files and directories into PostgreSQL
- Persistent vector storage with proper indexing for fast searches
- Single-query mode for quick questions using SQL similarity search
- Interactive chat mode for extended conversations
- Status reporting and database statistics
- SQL debugging queries for pgAdmin exploration
- Colored output and progress indicators for better UX
This PostgreSQL-based CLI system serves as the foundation for building production-ready RAG applications with enterprise-grade data persistence and scalability. In the next module, we'll explore advanced protocols and multi-agent systems that can further enhance your AI applications.
Complete Code
You can find the complete, runnable code for this tutorial on GitHub: https://github.com/avestalabs/academy/tree/main/4-rag/cli-document-q%26a-system