CLI Interactive Chat Assistant

In the previous tutorials, you learned how to engineer effective prompts, manage conversation history, and implement streaming responses. You also built a basic CLI assistant in tutorial 1.7. Now it's time to combine all these skills and create something truly powerful - a conversational AI assistant that runs in your terminal, remembers everything you've discussed, and delivers responses in real-time.

This tutorial represents the culmination of everything you've learned in module 2. We'll combine prompt engineering (2.1), conversation management (2.2), and streaming responses (2.3) into one sophisticated CLI application.

What We're Building

We're creating an enhanced CLI chat assistant that:

  • Maintains full conversation history throughout the session
  • Uses system messages to define AI personality and behavior
  • Streams responses in real-time for immediate feedback
  • Provides a natural, flowing conversation experience
  • Shows conversation context and statistics
  • Allows users to view conversation history
  • Shuts down gracefully and shows a session summary on exit

Here's what the enhanced interaction will look like:

🤖 AI Programming Tutor Ready!
💡 I'm here to help you learn programming. Type 'help' for commands.

You: What is TypeScript?
🤔 AI: TypeScript is a strongly typed programming language that builds on JavaScript by adding static type definitions. It helps catch errors early and makes your code more maintainable...

📊 Messages: 2 | Session tokens: 89

You: Can you show me an example?
🤔 AI: Absolutely! Since we were just discussing TypeScript, here's a simple example that demonstrates type annotations:

function greet(name: string): string {
  return `Hello, ${name}!`;
}

📊 Messages: 4 | Session tokens: 156

You: What are the benefits over regular JavaScript?
🤔 AI: Great follow-up question! Based on the TypeScript example I just showed you, here are the key benefits over regular JavaScript...

📊 Messages: 6 | Session tokens: 234

Notice two things: the responses stream in word by word rather than arriving all at once, and the AI references earlier parts of the conversation ("Since we were just discussing TypeScript...", "Based on the TypeScript example I just showed you..."). This is streaming and conversation management working together.

Understanding the Architecture

graph TD
    A[Initialize Assistant] --> B[Set System Message]
    B --> C[Start Conversation Loop]
    C --> D[User Input]
    D --> E{Command Check}
    E -->|Regular Message| F[Add to History]
    E -->|Special Command| G[Handle Command]
    F --> H[Start Streaming Response]
    H --> I[Display Chunks in Real-time]
    I --> J[Build Complete Response]
    J --> K[Add Response to History]
    K --> L[Display Stats]
    L --> C
    G --> C
    E -->|Exit| M[Show Session Summary]

The key enhancement is the streaming response flow that provides immediate feedback while maintaining conversation context.

Environment Setup

You'll use the same setup from previous tutorials:

  • Google AI API key in your .env file
  • @google/genai package installed
  • TypeScript project with readline support

If you need to set up a new project:

mkdir streaming-chat-assistant
cd streaming-chat-assistant
npm init -y
npm install @google/genai dotenv
npm install -D typescript @types/node ts-node
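
Then create a .env file in the project root with your API key. The value below is a placeholder; substitute your own key from Google AI Studio:

GEMINI_API_KEY=your-api-key-here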

Working Code Example

Let's build our streaming conversational CLI assistant step by step, incorporating everything you've learned in this module.

Step 1: Define the Message Interface and Setup

import { GoogleGenAI } from "@google/genai";
import * as readline from "readline";
import * as dotenv from "dotenv";

dotenv.config();

interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

We're using the same message interface from tutorial 2.2, which allows us to track who said what throughout the conversation.

Step 2: Initialize the AI Client and Conversation

const apiKey = process.env.GEMINI_API_KEY;

if (!apiKey) {
  console.error("❌ GEMINI_API_KEY not found in environment variables");
  process.exit(1);
}

const genAI = new GoogleGenAI({ apiKey });
const messages: Message[] = [];
let totalTokensUsed = 0;

The key difference here is the messages array that will store our entire conversation history, just like in tutorial 2.2.

Step 3: Create Conversation Management Functions

function addMessage(role: "system" | "user" | "assistant", content: string) {
  messages.push({ role, content });
}

function formatMessagesForAPI(): string {
  return messages.map((msg) => `${msg.role}: ${msg.content}`).join("\n");
}

These functions handle adding messages to our conversation history and formatting them for the API call, maintaining the conversation context.
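
To make that concrete, here's what the flattened prompt looks like after the system message and a first question have been added (illustrative output shown as comments):

addMessage("user", "What is TypeScript?");

console.log(formatMessagesForAPI());
// system: You are a helpful programming tutor. Keep your answers practical...
// user: What is TypeScript?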

Step 4: Set Up the System Message

function initializeAssistant() {
  addMessage(
    "system",
    "You are a helpful programming tutor. Keep your answers practical and concise. When users ask follow-up questions, reference previous parts of our conversation to maintain context."
  );

  console.log("🤖 AI Programming Tutor Ready!");
  console.log(
    "💡 I'm here to help you learn programming. Type 'help' for commands.\n"
  );
}

This system message is crucial - it tells the AI to act as a programming tutor and specifically instructs it to reference previous conversation parts.

Step 5: Core Streaming Chat Function

async function streamingChat(userMessage: string): Promise<string> {
  try {
    // Add user message to conversation history
    addMessage("user", userMessage);

    // Use write() instead of console.log() so the streamed reply stays on this line
    process.stdout.write("🤔 AI: ");

    // Send entire conversation history to AI for streaming
    const conversationPrompt = formatMessagesForAPI();

    const result = await genAI.models.generateContentStream({
      model: "gemini-2.5-flash",
      contents: conversationPrompt,
      config: {
        temperature: 0.7,
        maxOutputTokens: 500,
      },
    });

    let fullResponse = "";
    let usage;

    // Process streaming chunks
    for await (const chunk of result) {
      const chunkText = chunk.text || "";
      fullResponse += chunkText;

      // Display each chunk immediately for real-time effect
      process.stdout.write(chunkText);

      // Usage metadata arrives on the chunks themselves; the final
      // chunk reports cumulative totals for the whole response
      if (chunk.usageMetadata) {
        usage = chunk.usageMetadata;
      }
    }

    console.log("\n"); // Add newline when streaming is complete

    // Add complete AI response to conversation history
    addMessage("assistant", fullResponse);

    // Track token usage from the final chunk's metadata
    if (usage) {
      const sessionTokens =
        (usage.promptTokenCount || 0) + (usage.candidatesTokenCount || 0);
      totalTokensUsed += sessionTokens;
    }

    return fullResponse;
  } catch (error) {
    console.error("\n❌ Error in conversation:", error);
    return "Sorry, I encountered an error. Please try again.";
  }
}

This is the heart of our streaming conversational assistant. It combines conversation management from 2.2 with streaming responses from 2.3, creating a real-time chat experience that maintains context.
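
The flattened-string prompt keeps this tutorial consistent with 2.2, but the @google/genai SDK also accepts structured, role-based contents. Here's a minimal sketch of that alternative, assuming the system message sits at messages[0] (which initializeAssistant guarantees) and mapping our "assistant" role onto the SDK's "model" role:

function formatMessagesAsContents() {
  // The SDK uses "user"/"model" roles; the system message moves into config
  return messages
    .filter((msg) => msg.role !== "system")
    .map((msg) => ({
      role: msg.role === "assistant" ? "model" : "user",
      parts: [{ text: msg.content }],
    }));
}

// Usage sketch:
// const result = await genAI.models.generateContentStream({
//   model: "gemini-2.5-flash",
//   contents: formatMessagesAsContents(),
//   config: { systemInstruction: messages[0].content, temperature: 0.7 },
// });

Either form works here; the structured form simply scales better as conversations grow more complex.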

Step 6: Enhanced Command Handling

function handleSpecialCommands(input: string): boolean {
  const command = input.toLowerCase().trim();

  switch (command) {
    case "help":
      console.log("📋 Available commands:");
      console.log("  help     - Show this help message");
      console.log("  history  - Show conversation history");
      console.log("  clear    - Clear conversation (keeps system message)");
      console.log("  stats    - Show session statistics");
      console.log("  exit     - End the conversation\n");
      return true;

    case "history":
      showConversationHistory();
      return true;

    case "clear":
      clearConversation();
      return true;

    case "stats":
      showStats();
      return true;

    default:
      return false;
  }
}

These commands give users control over their conversation experience, allowing them to view history, clear the conversation, or get help.

Step 7: Helper Functions for Commands

function showConversationHistory() {
  console.log("📜 Conversation History:");
  console.log("=".repeat(50));

  messages.forEach((msg, index) => {
    if (msg.role === "system") return; // Skip system message in display

    const speaker = msg.role === "user" ? "You" : "AI";
    const content =
      msg.content.length > 100
        ? msg.content.substring(0, 100) + "..."
        : msg.content;

    console.log(`${index}. ${speaker}: ${content}`);
  });

  console.log("=".repeat(50) + "\n");
}

function clearConversation() {
  // Keep only the system message
  const systemMessage = messages.find((msg) => msg.role === "system");
  messages.length = 0;

  if (systemMessage) {
    messages.push(systemMessage);
  }

  console.log("🧹 Conversation cleared! Starting fresh.\n");
}

function showStats() {
  const userMessages = messages.filter((msg) => msg.role === "user").length;
  const aiMessages = messages.filter((msg) => msg.role === "assistant").length;

  console.log("📊 Session Statistics:");
  console.log(`  Total messages: ${messages.length - 1}`); // Exclude system message
  console.log(`  Your messages: ${userMessages}`);
  console.log(`  AI responses: ${aiMessages}`);
  console.log(`  Total tokens used: ${totalTokensUsed}\n`);
}

These helper functions provide useful conversation management features that enhance the user experience.

Step 8: Main Conversation Loop with Streaming

function startConversation(): void {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  const askQuestion = (): void => {
    rl.question("You: ", async (input) => {
      const userInput = input.trim();

      if (userInput.toLowerCase() === "exit") {
        console.log("\n👋 Thanks for chatting! Here's your session summary:");
        showStats();
        rl.close();
        return;
      }

      if (userInput === "") {
        console.log("Please enter a message or command.\n");
        askQuestion();
        return;
      }

      // Check for special commands
      if (handleSpecialCommands(userInput)) {
        askQuestion();
        return;
      }

      // Regular streaming conversation
      await streamingChat(userInput);

      // Show quick stats after each response
      const messageCount = messages.length - 1; // Exclude system message
      console.log(
        `📊 Messages: ${messageCount} | Session tokens: ${totalTokensUsed}\n`
      );

      askQuestion();
    });
  };

  askQuestion();
}

This main loop handles user input, processes commands, manages the streaming conversation flow, and provides helpful feedback after each interaction.

Step 9: Application Startup and Cleanup

// Graceful shutdown handling
process.on("SIGINT", () => {
  console.log("\n\n👋 Conversation ended!");
  showStats();
  process.exit(0);
});

// Start the assistant
initializeAssistant();
startConversation();

This ensures proper cleanup when users exit with Ctrl+C and starts our streaming conversational assistant.
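
Assuming you saved all of the above in a single file such as index.ts (the file name is your choice), you can start the assistant with ts-node:

npx ts-node index.ts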

Understanding the Streaming Conversation Flow

Here's what happens in a typical streaming conversation:

  1. Initialization: System message sets AI personality
  2. User Input: User asks a question
  3. History Building: Question added to conversation array
  4. Stream Start: Begin streaming response from AI
  5. Real-time Display: Show each chunk as it arrives
  6. Response Building: Combine chunks into full response
  7. History Update: Add complete response to conversation array
  8. Loop: Process repeats with growing context

The key insight is that streaming changes only how the response is delivered. The conversation management underneath is identical to the non-streaming version from tutorial 2.2; the user simply sees the answer appear in real time instead of waiting for it to complete.
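
Stripped of the helpers, the accumulate-while-streaming pattern at the core of step 5 is just a few lines (a minimal sketch, where stream stands for the async iterable returned by generateContentStream):

let full = "";
for await (const chunk of stream) {
  const text = chunk.text || "";
  process.stdout.write(text); // real-time display
  full += text; // complete copy for conversation history
}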

Common Pitfalls to Avoid

Forgetting to build the complete response: While displaying chunks in real-time, make sure to build the full response for conversation history.

Not handling streaming errors: Streaming connections can be interrupted, so always wrap streaming calls in try-catch blocks.

Mixing streaming and non-streaming: Stick to one approach throughout your application for consistency.

Poor visual feedback: Make sure users understand when the AI is thinking vs. when it's streaming a response.

Token tracking issues: Usage metadata lives on the stream's chunks, not on the stream object itself; read the cumulative metadata rather than trying to add up token counts chunk by chunk.
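
Concretely, the final chunk typically carries cumulative totals for the whole exchange, so keep the last usageMetadata you see instead of summing values per chunk. A minimal sketch:

let usage;
for await (const chunk of stream) {
  process.stdout.write(chunk.text || "");
  if (chunk.usageMetadata) usage = chunk.usageMetadata; // last chunk wins
}
const tokens =
  (usage?.promptTokenCount || 0) + (usage?.candidatesTokenCount || 0);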

Summary

You've built a sophisticated streaming conversational CLI assistant that represents the culmination of everything learned in module 2. This assistant combines prompt engineering fundamentals, conversation management, and real-time streaming responses into a production-ready chat interface. The key achievements are: maintaining conversation history while streaming responses in real-time, using system messages for consistent AI personality, providing enhanced user experience through commands and statistics, and creating a natural dialogue flow where the AI references previous conversation parts. This foundation prepares you perfectly for advanced topics like function calling and structured interactions in the next module.

Complete Code

You can find the complete, runnable code for this tutorial on GitHub: https://github.com/avestalabs/academy/tree/main/2-core-llm-interactions/cli-interactive-chat-assistant
