Understanding Tokens and Costs
Now that you've made your first LLM calls, it's crucial to understand the "currency" of AI interactions: tokens. Think of tokens as the fuel that powers your AI applications - every prompt you send and every response you receive consumes tokens.
As an AI Engineer, understanding tokens helps you build efficient applications that use AI resources wisely. Let's explore what tokens are and how to track them.
What Are Tokens?
Tokens are the basic units that LLMs use to process text. They're not exactly words, characters, or sentences - they're something in between that represents how the AI "reads" and "thinks" about text.
Here's a simple way to think about it:
```mermaid
graph TD
    A[Raw Text] --> B[Tokenization Process]
    B --> C[Individual Tokens]
    C --> D[LLM Processing]
    D --> E[Generated Response]
    E --> F[Response Tokens]
```
How Text Becomes Tokens
The process of converting text into tokens is called "tokenization." Here's how it works:
Step 1: Text Input - You provide a string of text: "Hello, world!"
Step 2: Tokenization - The AI breaks it down into meaningful chunks: ["Hello", ",", " world", "!"]
Step 3: Processing - Each token gets converted to numbers that the AI can understand and process.
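Real tokenizers use learned subword algorithms such as byte-pair encoding, but a toy sketch can make the idea concrete. The `toyTokenize` function below is purely illustrative - it is not how Gemini actually splits text:

```typescript
// A toy tokenizer, for illustration only - real LLMs use learned
// subword algorithms (e.g. byte-pair encoding), not simple splitting.
function toyTokenize(text: string): string[] {
  // Split into words and standalone punctuation marks
  return text.match(/\w+|[^\w\s]/g) ?? [];
}

console.log(toyTokenize("Hello, world!"));
// ["Hello", ",", "world", "!"]
```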
Token Examples
Let's see how different text gets broken into tokens:
Simple words:
- "Hello" = 1 token
- "TypeScript" = 2 tokens (Type + Script)
Common phrases:
- "Hello world" = 2 tokens
- "How are you?" = 4 tokens
Technical terms:
- "async/await" = 3 tokens (async + / + await)
- "console.log" = 3 tokens (console + . + log)
Code snippets:
- "const x = 5;" = 6 tokens
- "function hello() {}" = 7 tokens
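If you want a ballpark figure before making an API call, a common rule of thumb for English text is roughly four characters per token. The heuristic below is approximate and varies by language and content - only the API's `usageMetadata` gives you real counts:

```typescript
// Rough heuristic: ~4 characters per token for English text.
// Approximate only - actual counts come from the API response.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("Hello, world!")); // 4
console.log(estimateTokens("function hello() {}")); // 5
```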
Why Understanding Tokens Matters
Tokens are important because they affect:
- API usage limits - Free tiers have token quotas
- Response speed - More tokens = longer processing time
- Context windows - Models have maximum token limits
- Application efficiency - Better token usage = better performance
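A practical consequence of context windows is checking prompt size before you spend an API call. Here is a minimal sketch using the rough four-characters-per-token heuristic; the limit constant is a placeholder, so substitute your model's documented context window:

```typescript
// Guard against oversized prompts before making an API call.
// MAX_CONTEXT_TOKENS is a placeholder - use your model's documented limit.
const MAX_CONTEXT_TOKENS = 32_000;

function fitsInContext(prompt: string): boolean {
  const estimated = Math.ceil(prompt.length / 4); // rough heuristic
  return estimated <= MAX_CONTEXT_TOKENS;
}

if (!fitsInContext("some very long prompt...")) {
  console.warn("Prompt may exceed the context window - consider trimming it.");
}
```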
Working Code Example
Let's build a simple token tracker to see actual token consumption from Gemini's API responses.
Basic Token Tracking
Create a new file `src/token-tracker.ts`:
```typescript
import { GoogleGenAI } from "@google/genai";
import * as dotenv from "dotenv";

// Load your API key from the .env file
dotenv.config();
```
We'll use the same setup as before, but now we'll focus on extracting token usage information from the API response.
```typescript
const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) {
  console.error("GEMINI_API_KEY not found");
  process.exit(1);
}

const genAI = new GoogleGenAI({ apiKey });
```
This establishes our connection to Gemini, just like in previous chapters.
```typescript
async function trackTokenUsage(prompt: string) {
  try {
    const response = await genAI.models.generateContent({
      model: "gemini-2.5-flash",
      contents: prompt,
    });

    console.log("📝 Prompt:", prompt);
    console.log("🤖 Response:", response.text);

    // Token counts come back in the response's usageMetadata
    console.log("\n📊 Token Usage:");
    console.log("  Input tokens:", response.usageMetadata?.promptTokenCount);
    console.log(
      "  Thinking tokens:",
      response.usageMetadata?.thoughtsTokenCount
    );
    console.log(
      "  Output tokens:",
      response.usageMetadata?.candidatesTokenCount
    );
    console.log("  Total tokens:", response.usageMetadata?.totalTokenCount);
  } catch (error) {
    console.error("Error:", error);
  }
}
```
This function makes an API call and displays the actual token counts from Gemini's response metadata. The `usageMetadata` object contains four key properties:
- `promptTokenCount`: tokens used for your input prompt
- `thoughtsTokenCount`: tokens the model used for internal thinking
- `candidatesTokenCount`: tokens used for the AI's response
- `totalTokenCount`: the sum of all of the above
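For reference, here is the approximate shape of that metadata as we use it in this tutorial. This is a simplified sketch, not the SDK's actual type - the real one has more fields, and every property may be undefined:

```typescript
// Simplified sketch of the usage metadata fields used in this tutorial.
// The SDK's real type includes more fields; all of them are optional.
interface UsageMetadataSketch {
  promptTokenCount?: number;     // tokens in your input
  thoughtsTokenCount?: number;   // tokens spent on internal reasoning
  candidatesTokenCount?: number; // tokens in the generated response
  totalTokenCount?: number;      // everything combined
}
```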
```typescript
// Test with a simple prompt
trackTokenUsage("What is TypeScript?");
```
This demonstrates how to see real token usage for any prompt you send to Gemini.
Testing Different Prompts
You can test different types of prompts to see how token usage varies:
```typescript
// Try these different prompts one at a time
trackTokenUsage("Hello");
trackTokenUsage("Write a simple TypeScript function");
trackTokenUsage("Explain variables in programming");
```
Replace the prompt in your `trackTokenUsage()` call to see how different prompt lengths and types affect token consumption.
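If you run several prompts in one session, you may want to accumulate the totals. Here is a minimal sketch, assuming you modify `trackTokenUsage()` to return `totalTokenCount` instead of only logging it:

```typescript
// Accumulate token usage across several prompts. Assumes trackTokenUsage
// has been changed to return response.usageMetadata?.totalTokenCount.
async function trackSession(prompts: string[]) {
  let sessionTotal = 0;
  for (const prompt of prompts) {
    const total = await trackTokenUsage(prompt); // number | undefined
    sessionTotal += total ?? 0;
  }
  console.log(`Session total: ${sessionTotal} tokens`);
}

trackSession(["Hello", "Write a simple TypeScript function"]);
```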
Understanding Usage Metadata
The `usageMetadata` object from Gemini's response gives you precise information about token consumption:
Key Properties
`promptTokenCount`: The number of tokens in your input prompt
- This includes your question, instructions, and any context you provide
- Longer, more detailed prompts use more input tokens
`thoughtsTokenCount`: The number of tokens the model used to think before responding
- This represents the internal processing tokens the LLM uses for its reasoning and planning.
- It reflects the computational effort the model expends to formulate a response, distinct from input or output tokens.
`candidatesTokenCount`: The number of tokens in the AI's response
- This is the generated text that comes back from the AI
- Longer responses use more output tokens
`totalTokenCount`: The sum of input, thinking, and output tokens
- This represents the total "cost" of the interaction
- Input, thinking, and output tokens all count toward your usage limits
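On paid tiers, these counts translate directly into money. The sketch below shows the arithmetic; the per-token prices are hypothetical placeholders, not Gemini's actual rates, and providers often bill thinking tokens at the output rate - check the current pricing page for real numbers:

```typescript
// Estimate the dollar cost of a call from its token counts.
// Prices are hypothetical placeholders - substitute your model's real rates.
const USD_PER_MILLION_INPUT = 0.1;  // hypothetical
const USD_PER_MILLION_OUTPUT = 0.4; // hypothetical

function estimateCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * USD_PER_MILLION_INPUT +
    (outputTokens / 1_000_000) * USD_PER_MILLION_OUTPUT
  );
}

// Using the counts from the example output below (8 in, 581 + 26 out):
console.log(`~$${estimateCost(8, 607).toFixed(6)}`);
```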
Practical Examples
When you run the code, you might see output like:
```text
📝 Prompt: What is TypeScript in one sentence?
🤖 Response: TypeScript is a typed superset of JavaScript that compiles to plain JavaScript, adding static type-checking to improve code quality and scalability.

📊 Token Usage:
  Input tokens: 8
  Thinking tokens: 581
  Output tokens: 26
  Total tokens: 615
```
This shows you exactly how many tokens each part of the interaction consumed.
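Notice that thinking tokens (581) dwarf both the prompt and the visible answer here. At the time of writing, the @google/genai SDK exposes a `thinkingConfig` option that can limit this; the sketch below assumes your SDK version and model support it, so confirm against the official docs:

```typescript
// Limit (or disable) thinking tokens via thinkingConfig.
// Assumes SDK/model support - confirm in the @google/genai documentation.
const response = await genAI.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "What is TypeScript in one sentence?",
  config: {
    thinkingConfig: { thinkingBudget: 0 }, // 0 requests no thinking at all
  },
});
console.log("Thinking tokens:", response.usageMetadata?.thoughtsTokenCount);
```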
Summary
You now understand the fundamentals of tokens in LLM applications! Here's what you've learned:
- Tokens are the basic units that LLMs use to process and generate text
- Both input and output consume tokens - prompts and responses both count
- Gemini provides actual token counts in the `usageMetadata` of API responses
- You can track real usage by examining `promptTokenCount`, `thoughtsTokenCount`, `candidatesTokenCount`, and `totalTokenCount`
Understanding tokens helps you build more efficient AI applications and monitor your usage effectively. This knowledge becomes especially important as you build larger applications and need to manage resource consumption.
In the next chapters, we'll use this token knowledge as we build more sophisticated AI interactions.
Complete Code
You can find the complete, runnable code for this tutorial on GitHub: [Link to GitHub Repository]