Understanding Tokens and Costs
Now that you've made your first LLM calls, it's crucial to understand the "currency" of AI interactions: tokens. Think of tokens as the fuel that powers your AI applications - every prompt you send and every response you receive consumes tokens.
As an AI Engineer, understanding tokens helps you build efficient applications that use AI resources wisely. Let's explore what tokens are and how to track them.
What Are Tokens?
Tokens are the basic units that LLMs use to process text. They're not exactly words, characters, or sentences - they're something in between that represents how the AI "reads" and "thinks" about text.
Here's a simple way to think about it:
```mermaid
graph TD
    A[Raw Text] --> B[Tokenization Process]
    B --> C[Individual Tokens]
    C --> D[LLM Processing]
    D --> E[Generated Response]
    E --> F[Response Tokens]
```
How Text Becomes Tokens
The process of converting text into tokens is called "tokenization." Here's how it works:
Step 1: Text Input - You provide a string of text: "Hello, world!"
Step 2: Tokenization - The AI breaks it down into meaningful chunks: ["Hello", ",", " world", "!"]
Step 3: Processing - Each token gets converted to numbers that the AI can understand and process.
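Real tokenizers use learned subword algorithms such as byte-pair encoding, but a toy sketch can make the idea concrete. The `toyTokenize` function below is purely illustrative - it is not how Gemini actually splits text:

```typescript
// A toy tokenizer, for illustration only - real LLMs use learned
// subword algorithms (e.g. byte-pair encoding), not simple splitting.
function toyTokenize(text: string): string[] {
  // Split into words and standalone punctuation marks
  return text.match(/\w+|[^\w\s]/g) ?? [];
}

console.log(toyTokenize("Hello, world!"));
// ["Hello", ",", "world", "!"]
```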
Token Examples
Let's see how different text gets broken into tokens:
Simple words:
- "Hello" = 1 token
- "TypeScript" = 2 tokens (Type + Script)
Common phrases:
- "Hello world" = 2 tokens
- "How are you?" = 4 tokens
Technical terms:
- "async/await" = 3 tokens (async + / + await)
- "console.log" = 3 tokens (console + . + log)
Code snippets:
- "const x = 5;" = 6 tokens
- "function hello() {}" = 7 tokens
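If you want a ballpark figure before making an API call, a common rule of thumb for English text is roughly four characters per token. The heuristic below is approximate and varies by language and content - only the API's `usageMetadata` gives you real counts:

```typescript
// Rough heuristic: ~4 characters per token for English text.
// Approximate only - actual counts come from the API response.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("Hello, world!")); // 4
console.log(estimateTokens("function hello() {}")); // 5
```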
Why Understanding Tokens Matters
Tokens are important because they affect:
- API usage limits - Free tiers have token quotas
- Response speed - More tokens = longer processing time
- Context windows - Models have maximum token limits
- Application efficiency - Better token usage = better performance
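A practical consequence of context windows is checking prompt size before you spend an API call. Here is a minimal sketch using the rough four-characters-per-token heuristic; the limit constant is a placeholder, so substitute your model's documented context window:

```typescript
// Guard against oversized prompts before making an API call.
// MAX_CONTEXT_TOKENS is a placeholder - use your model's documented limit.
const MAX_CONTEXT_TOKENS = 32_000;

function fitsInContext(prompt: string): boolean {
  const estimated = Math.ceil(prompt.length / 4); // rough heuristic
  return estimated <= MAX_CONTEXT_TOKENS;
}

if (!fitsInContext("some very long prompt...")) {
  console.warn("Prompt may exceed the context window - consider trimming it.");
}
```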
Working Code Example
Let's build a simple token tracker to see actual token consumption from Gemini's API responses.
Basic Token Tracking
Create a new file `src/token-tracker.ts`:
```typescript
import { GoogleGenAI } from "@google/genai";
import * as dotenv from "dotenv";

// Load your API key from the .env file
dotenv.config();
```
We'll use the same setup as before, but now we'll focus on extracting token usage information from the API response.
```typescript
const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) {
  console.error("GEMINI_API_KEY not found");
  process.exit(1);
}

const genAI = new GoogleGenAI({ apiKey });
```
This establishes our connection to Gemini, just like in previous chapters.
```typescript
async function trackTokenUsage(prompt: string) {
  try {
    const response = await genAI.models.generateContent({
      model: "gemini-2.5-flash",
      contents: prompt,
    });

    console.log("📝 Prompt:", prompt);
    console.log("🤖 Response:", response.text);

    // Token counts come back in the response's usageMetadata
    console.log("\n📊 Token Usage:");
    console.log("  Input tokens:", response.usageMetadata?.promptTokenCount);
    console.log(
      "  Thinking tokens:",
      response.usageMetadata?.thoughtsTokenCount
    );
    console.log(
      "  Output tokens:",
      response.usageMetadata?.candidatesTokenCount
    );
    console.log("  Total tokens:", response.usageMetadata?.totalTokenCount);
  } catch (error) {
    console.error("Error:", error);
  }
}
```
This function makes an API call and displays the actual token counts from Gemini's response metadata. The `usageMetadata` object contains four key properties:
- `promptTokenCount`: tokens used for your input prompt
- `thoughtsTokenCount`: tokens the model used for internal thinking
- `candidatesTokenCount`: tokens used for the AI's response
- `totalTokenCount`: the sum of all of the above
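For reference, here is the approximate shape of that metadata as we use it in this tutorial. This is a simplified sketch, not the SDK's actual type - the real one has more fields, and every property may be undefined:

```typescript
// Simplified sketch of the usage metadata fields used in this tutorial.
// The SDK's real type includes more fields; all of them are optional.
interface UsageMetadataSketch {
  promptTokenCount?: number;     // tokens in your input
  thoughtsTokenCount?: number;   // tokens spent on internal reasoning
  candidatesTokenCount?: number; // tokens in the generated response
  totalTokenCount?: number;      // everything combined
}
```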
```typescript
// Test with a simple prompt
trackTokenUsage("What is TypeScript?");
```
This demonstrates how to see real token usage for any prompt you send to Gemini.
Testing Different Prompts
You can test different types of prompts to see how token usage varies:
```typescript
// Try these different prompts one at a time
trackTokenUsage("Hello");
trackTokenUsage("Write a simple TypeScript function");
trackTokenUsage("Explain variables in programming");
```
Replace the prompt in your `trackTokenUsage()` call to see how different prompt lengths and types affect token consumption.
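If you run several prompts in one session, you may want to accumulate the totals. Here is a minimal sketch, assuming you modify `trackTokenUsage()` to return `totalTokenCount` instead of only logging it:

```typescript
// Accumulate token usage across several prompts. Assumes trackTokenUsage
// has been changed to return response.usageMetadata?.totalTokenCount.
async function trackSession(prompts: string[]) {
  let sessionTotal = 0;
  for (const prompt of prompts) {
    const total = await trackTokenUsage(prompt); // number | undefined
    sessionTotal += total ?? 0;
  }
  console.log(`Session total: ${sessionTotal} tokens`);
}

trackSession(["Hello", "Write a simple TypeScript function"]);
```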
Understanding Usage Metadata
The `usageMetadata` object from Gemini's response gives you precise information about token consumption:
Key Properties
`promptTokenCount`: The number of tokens in your input prompt
- This includes your question, instructions, and any context you provide
- Longer, more detailed prompts use more input tokens
`thoughtsTokenCount`: The number of tokens the model used to think before responding
- This represents the internal processing tokens the LLM uses for its reasoning and planning.
- It reflects the computational effort the model expends to formulate a response, distinct from input or output tokens.
`candidatesTokenCount`: The number of tokens in the AI's response
- This is the generated text that comes back from the AI
- Longer responses use more output tokens
`totalTokenCount`: The sum of input, thinking, and output tokens
- This represents the total "cost" of the interaction
- Input, thinking, and output tokens all count toward your usage limits
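On paid tiers, these counts translate directly into money. The sketch below shows the arithmetic; the per-token prices are hypothetical placeholders, not Gemini's actual rates, and providers often bill thinking tokens at the output rate - check the current pricing page for real numbers:

```typescript
// Estimate the dollar cost of a call from its token counts.
// Prices are hypothetical placeholders - substitute your model's real rates.
const USD_PER_MILLION_INPUT = 0.1;  // hypothetical
const USD_PER_MILLION_OUTPUT = 0.4; // hypothetical

function estimateCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * USD_PER_MILLION_INPUT +
    (outputTokens / 1_000_000) * USD_PER_MILLION_OUTPUT
  );
}

// Using the counts from the example output below (8 in, 581 + 26 out):
console.log(`~$${estimateCost(8, 607).toFixed(6)}`);
```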
Practical Examples
When you run the code, you might see output like:
```text
📝 Prompt: What is TypeScript in one sentence?
🤖 Response: TypeScript is a typed superset of JavaScript that compiles to plain JavaScript, adding static type-checking to improve code quality and scalability.

📊 Token Usage:
  Input tokens: 8
  Thinking tokens: 581
  Output tokens: 26
  Total tokens: 615
```
This shows you exactly how many tokens each part of the interaction consumed.
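Notice that thinking tokens (581) dwarf both the prompt and the visible answer here. At the time of writing, the @google/genai SDK exposes a `thinkingConfig` option that can limit this; the sketch below assumes your SDK version and model support it, so confirm against the official docs:

```typescript
// Limit (or disable) thinking tokens via thinkingConfig.
// Assumes SDK/model support - confirm in the @google/genai documentation.
const response = await genAI.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "What is TypeScript in one sentence?",
  config: {
    thinkingConfig: { thinkingBudget: 0 }, // 0 requests no thinking at all
  },
});
console.log("Thinking tokens:", response.usageMetadata?.thoughtsTokenCount);
```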
Summary
You now understand the fundamentals of tokens in LLM applications! Here's what you've learned:
- Tokens are the basic units that LLMs use to process and generate text
- Both input and output consume tokens - prompts and responses both count
- Gemini provides actual token counts in the `usageMetadata` of API responses
- You can track real usage by examining `promptTokenCount`, `thoughtsTokenCount`, `candidatesTokenCount`, and `totalTokenCount`
Understanding tokens helps you build more efficient AI applications and monitor your usage effectively. This knowledge becomes especially important as you build larger applications and need to manage resource consumption.
In the next chapters, we'll use this token knowledge as we build more sophisticated AI interactions.
Complete Code
You can find the complete, runnable code for this tutorial on GitHub: [Link to GitHub Repository]