Gemini 3 Agent Development: @google/genai SDK

Summary #

For building a software-writing agent with Gemini 3, use the @google/genai SDK (v1.30.0)[1]. It is Google's official, production-ready TypeScript/JavaScript SDK and replaces the deprecated @google/generative-ai package, which loses all support on August 31, 2025[2].

Gemini 3 Pro (model: gemini-3-pro-preview) was released November 25, 2025[3]. It offers state-of-the-art reasoning, a 1M-token context window, and 64K output tokens, and scores 54.2% on Terminal-Bench 2.0 for tool use[4]. Pricing is $2/M input tokens and $12/M output tokens for prompts under 200K tokens[3].

Gemini 2.5 Flash (model: gemini-2.5-flash) is the recommended "quick" model for high-speed, low-cost tasks.

Imagen 4 Ultra (model: imagen-4.0-ultra-generate-001) is Google's strongest image generation model, excelling at photorealism and text rendering. Veo 3.1 (model: veo-3.1-generate-preview) generates 8-second video clips with native audio from text or image prompts[6].

Philosophy & Mental Model #

The @google/genai SDK provides a unified interface for all Gemini capabilities:

  1. ai.models - Direct content generation with generateContent and generateContentStream
  2. ai.chats - Stateful multi-turn conversations that automatically manage history and thought signatures
  3. ai.files - Upload and reference files for multimodal prompts
  4. ai.operations - Poll async operations (video generation)
  5. ai.live - Real-time interactions with audio/video

Key Gemini 3 concepts:

  1. Thought signatures - encrypted reasoning traces returned with each response; they must be sent back on subsequent turns, and ai.chats does this for you[7][8]
  2. Temperature - leave it at the default 1.0; lowering it can cause looping or degraded reasoning[7]
  3. Context - a 1M-token input window with up to 64K output tokens[4]

Mental model for agents: Use ai.chats.create() for your agent loop. The chat object maintains conversation history, handles thought signatures automatically, and supports streaming with sendMessageStream(). Define tools via functionDeclarations for the model to invoke your agent's capabilities.
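
For one-shot calls outside a chat loop, ai.models.generateContent returns the complete response; the response.text accessor aggregates the text parts. A minimal sketch:

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// One-shot, non-streaming call
const response = await ai.models.generateContent({
  model: "gemini-3-pro-preview",
  contents: "Explain the difference between Promise.all and Promise.allSettled",
});
console.log(response.text);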

Setup #

Step 1: Install the SDK #

npm install @google/genai

Step 2: Configure environment #

Add to your mise.toml:

[env]
GEMINI_API_KEY = "{{env.GEMINI_API_KEY}}"

Or set directly:

export GEMINI_API_KEY="your-api-key-from-ai-studio"

Get your API key from Google AI Studio.

Step 3: Create the client #

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

For Vertex AI (enterprise):

const ai = new GoogleGenAI({
  vertexai: true,
  project: process.env.GOOGLE_CLOUD_PROJECT,
  location: process.env.GOOGLE_CLOUD_LOCATION || "us-central1",
});
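
In Vertex mode the SDK authenticates with Application Default Credentials (for local development, run gcloud auth application-default login) rather than an API key.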

Core Usage Patterns #

Pattern 1: Streaming Text Generation #

For responsive CLI output, use streaming to display tokens as they arrive:

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function streamResponse(prompt: string): Promise<string> {
  const response = await ai.models.generateContentStream({
    model: "gemini-3-pro-preview",
    contents: prompt,
  });

  let fullText = "";
  for await (const chunk of response) {
    const text = chunk.text;
    if (text) {
      process.stdout.write(text);
      fullText += text;
    }
  }
  return fullText;
}

Pattern 2: Multi-Turn Chat with Streaming #

Use ai.chats for agent conversations. History and thought signatures are managed automatically:

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function runAgentLoop() {
  const chat = ai.chats.create({
    model: "gemini-3-pro-preview",
    config: {
      systemInstruction: "You are a software engineering assistant...",
    },
  });

  // First turn
  const stream1 = await chat.sendMessageStream({
    message: "Help me refactor this function to use async/await",
  });
  for await (const chunk of stream1) {
    process.stdout.write(chunk.text ?? "");
  }

  // Second turn - history is maintained automatically
  const stream2 = await chat.sendMessageStream({
    message: "Now add error handling",
  });
  for await (const chunk of stream2) {
    process.stdout.write(chunk.text ?? "");
  }

  // Access conversation history if needed
  const history = chat.getHistory(true); // curated history
}
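
Note: getHistory(true) returns the curated history (only the turns the SDK considers valid), while getHistory() with no argument returns the comprehensive history, including any empty or invalid model turns.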

Pattern 3: Function Calling for Agent Tools #

Define tools the model can invoke. The SDK handles thought signatures automatically:

import { GoogleGenAI, Type } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const readFileTool = {
  name: "read_file",
  description: "Read the contents of a file from the filesystem",
  parameters: {
    type: Type.OBJECT,
    properties: {
      path: {
        type: Type.STRING,
        description: "The absolute path to the file to read",
      },
    },
    required: ["path"],
  },
};

const writeFileTool = {
  name: "write_file",
  description: "Write content to a file",
  parameters: {
    type: Type.OBJECT,
    properties: {
      path: { type: Type.STRING, description: "File path" },
      content: { type: Type.STRING, description: "Content to write" },
    },
    required: ["path", "content"],
  },
};

// Simple logging helper with timestamp
function log(message: string) {
  console.log(`[${new Date().toISOString()}] ${message}`);
}

// Dispatch a requested tool call to its local implementation
async function executeFunction(name: string | undefined, args: any): Promise<string> {
  switch (name) {
    case "read_file":
      return fs.readFileSync(args.path, "utf-8");
    case "write_file":
      fs.writeFileSync(args.path, args.content);
      return "ok";
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}

async function agentWithTools() {
  const chat = ai.chats.create({
    model: "gemini-3-pro-preview",
    config: {
      tools: [{ functionDeclarations: [readFileTool, writeFileTool] }],
    },
  });

  const response = await chat.sendMessage({
    message: "Read the file at /src/index.ts and add error handling",
  });

  // Check if the model wants to call a function
  if (response.functionCalls && response.functionCalls.length > 0) {
    const functionCall = response.functionCalls[0];
    log(`Tool requested: ${functionCall.name}`);
    log(`Args: ${JSON.stringify(functionCall.args)}`);

    // Execute the tool and send the result back
    const result = await executeFunction(functionCall.name, functionCall.args);

    const followUp = await chat.sendMessage({
      message: [{ functionResponse: { name: functionCall.name, response: { result } } }],
    });
    console.log(followUp.text);
  }
}
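
In a real agent one round is rarely enough: the model may chain several tool calls before producing a final answer. Here is a minimal loop sketch, reusing the chat configuration and the executeFunction dispatcher above (the Chat and Part types are exported by the SDK):

import { Chat, Part } from "@google/genai";

// Keep satisfying tool calls until the model answers in plain text
async function runToolLoop(chat: Chat, userMessage: string): Promise<string | undefined> {
  let response = await chat.sendMessage({ message: userMessage });

  while (response.functionCalls && response.functionCalls.length > 0) {
    // Execute every requested call and send all results back in one message
    const parts: Part[] = [];
    for (const call of response.functionCalls) {
      const result = await executeFunction(call.name, call.args);
      parts.push({ functionResponse: { name: call.name, response: { result } } });
    }
    response = await chat.sendMessage({ message: parts });
  }

  return response.text; // final natural-language answer
}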

Pattern 4: Image Generation with Imagen 4 Ultra #

Generate high-quality images. Imagen models are called through the dedicated ai.models.generateImages API rather than generateContent; use imagen-4.0-ultra-generate-001 for the strongest results:

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function generateImage(prompt: string, outputPath: string) {
  const response = await ai.models.generateImages({
    model: "imagen-4.0-ultra-generate-001",
    prompt: prompt,
    config: {
      numberOfImages: 1,
    },
  });

  // Imagen returns images as base64-encoded bytes on generatedImages
  for (const generatedImage of response.generatedImages ?? []) {
    if (generatedImage.image?.imageBytes) {
      const buffer = Buffer.from(generatedImage.image.imageBytes, "base64");
      fs.writeFileSync(outputPath, buffer);
      console.log(`[${new Date().toISOString()}] Image saved to ${outputPath}`);
    }
  }
}
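
Usage (the prompt and output path are illustrative):

await generateImage(
  "A photorealistic red panda reading a book in a library",
  "./red-panda.png",
);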

Pattern 5: Multi-Turn Image Editing #

Refine images conversationally. Editing in a chat requires a model that can emit images as output: Gemini 3 Pro Image (gemini-3-pro-image-preview, a.k.a. Nano Banana Pro[5]) handles interleaved text-and-image turns, while Imagen 4 Ultra remains strongest for one-shot generation:

import { GenerateContentResponse, GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function iterativeImageEditing() {
  // Gemini 3 Pro Image (Nano Banana Pro) supports interleaved text + image output
  const chat = ai.chats.create({
    model: "gemini-3-pro-image-preview",
    config: {
      responseModalities: ["TEXT", "IMAGE"],
    },
  });

  // Generate the initial image
  let response = await chat.sendMessage({
    message: "Create a diagram showing microservices architecture",
  });
  saveImage(response, "diagram-v1.png");

  // Refine it
  response = await chat.sendMessage({
    message: "Add a message queue between the services",
    config: {
      responseModalities: ["TEXT", "IMAGE"],
      imageConfig: { aspectRatio: "16:9", imageSize: "2K" },
    },
  });
  saveImage(response, "diagram-v2.png");
}

function saveImage(response: GenerateContentResponse, filename: string) {
  for (const part of response.candidates?.[0]?.content?.parts ?? []) {
    if (part.inlineData?.data) {
      fs.writeFileSync(filename, Buffer.from(part.inlineData.data, "base64"));
      console.log(`[${new Date().toISOString()}] Image saved to ${filename}`);
    }
  }
}

Pattern 6: Video Generation with Veo 3.1 #

Video generation is async - submit a request, poll for completion, then download:

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function generateVideo(prompt: string, outputPath: string) {
  // Submit generation request
  let operation = await ai.models.generateVideos({
    model: "veo-3.1-generate-preview",
    prompt: prompt,
    config: {
      aspectRatio: "16:9",
      negativePrompt: "low quality, blurry, distorted",
    },
  });

  // Poll until complete
  while (!operation.done) {
    console.log(`[${new Date().toISOString()}] Generating video...`);
    await new Promise((resolve) => setTimeout(resolve, 10000));
    operation = await ai.operations.getVideosOperation({ operation });
  }

  // Download the video
  await ai.files.download({
    file: operation.response.generatedVideos[0].video,
    downloadPath: outputPath,
  });
  console.log(`[${new Date().toISOString()}] Video saved to ${outputPath}`);
}

Pattern 7: Image-to-Video Pipeline #

Generate an image first, then animate it:

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function imageToVideo(description: string, outputPath: string) {
  // Step 1: Generate an image with Imagen 4 Ultra
  const imageResponse = await ai.models.generateImages({
    model: "imagen-4.0-ultra-generate-001",
    prompt: description,
  });

  const image = imageResponse.generatedImages[0].image;

  // Step 2: Animate it with Veo
  let operation = await ai.models.generateVideos({
    model: "veo-3.1-generate-preview",
    prompt: description,
    image: {
      imageBytes: image.imageBytes,
      mimeType: image.mimeType,
    },
  });

  while (!operation.done) {
    await new Promise((resolve) => setTimeout(resolve, 10000));
    operation = await ai.operations.getVideosOperation({ operation });
  }

  await ai.files.download({
    file: operation.response.generatedVideos[0].video,
    downloadPath: outputPath,
  });
}

Anti-Patterns & Pitfalls #

Don't: Use the deprecated @google/generative-ai package #

// Bad - deprecated, loses support August 2025
import { GoogleGenerativeAI } from "@google/generative-ai";

Why it's wrong: The old SDK won't receive Gemini 3 features and will be completely unsupported after August 31, 2025[2].

Instead: Use @google/genai #

// Good - actively maintained, supports all Gemini 3 features
import { GoogleGenAI } from "@google/genai";

Don't: Lower temperature for Gemini 3 #

// Bad - degrades reasoning quality
const response = await ai.models.generateContent({
  model: "gemini-3-pro-preview",
  contents: prompt,
  config: { temperature: 0.2 },
});

Why it's wrong: Gemini 3's reasoning engine is optimized for temperature 1.0. Lowering it may cause looping or degraded performance on complex tasks[7].

Instead: Keep temperature at 1.0 (default) #

// Good - let the model reason optimally
const response = await ai.models.generateContent({
  model: "gemini-3-pro-preview",
  contents: prompt,
  // temperature defaults to 1.0
});

Don't: Manually manage thought signatures when using ai.chats #

// Bad - unnecessary complexity
const response = await chat.sendMessage({ message: "..." });
const signature = response.candidates[0].content.parts[0].thoughtSignature;
// manually tracking signatures...

Why it's wrong: The SDK handles thought signatures automatically when using ai.chats. Manual management is error-prone and unnecessary[8].

Instead: Let the SDK handle it #

// Good - SDK manages signatures automatically
const chat = ai.chats.create({ model: "gemini-3-pro-preview" });
await chat.sendMessage({ message: "first message" });
await chat.sendMessage({ message: "follow up" }); // signatures handled

Don't: Use lowercase for imageSize #

// Bad - will be rejected
config: {
  imageConfig: { imageSize: "2k" }  // lowercase fails!
}

Why it's wrong: The API requires uppercase "K" in image sizes[9].

Instead: Always use uppercase #

// Good
config: {
  imageConfig: { imageSize: "2K" }  // uppercase required
}

Don't: Forget to poll video generation #

// Bad - returns immediately without the video
const operation = await ai.models.generateVideos({
  model: "veo-3.1-generate-preview",
  prompt: "...",
});
// operation.response is undefined here!

Why it's wrong: Video generation is asynchronous. The initial response is just an operation handle.

Instead: Poll until done #

// Good - wait for completion
let operation = await ai.models.generateVideos({ model: "veo-3.1-generate-preview", prompt: "..." });
while (!operation.done) {
  await new Promise(r => setTimeout(r, 10000));
  operation = await ai.operations.getVideosOperation({ operation });
}
// Now operation.response.generatedVideos is available

Don't: Expose API keys in client-side code #

// Bad - exposes key to users
const ai = new GoogleGenAI({ apiKey: "AIza..." }); // hardcoded in frontend

Why it's wrong: Anyone can extract the key from your JavaScript bundle and abuse it.

Instead: Use server-side proxy #

// Good - key stays on server
// Server: handles auth, proxies to Gemini API
// Client: calls your server endpoint
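
A minimal sketch of such a proxy using Node's built-in http module (the /api/generate route and the request shape are illustrative, not a prescribed API):

import http from "node:http";
import { GoogleGenAI } from "@google/genai";

// The key stays in the server's environment, never in the browser bundle
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

http
  .createServer(async (req, res) => {
    if (req.method === "POST" && req.url === "/api/generate") {
      let body = "";
      for await (const chunk of req) body += chunk;
      const { prompt } = JSON.parse(body);

      const response = await ai.models.generateContent({
        model: "gemini-3-pro-preview",
        contents: prompt,
      });

      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ text: response.text }));
    } else {
      res.writeHead(404);
      res.end();
    }
  })
  .listen(3000);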

Caveats #

  1. gemini-3-pro-preview and veo-3.1-generate-preview are preview releases; names, limits, and behavior may change before general availability
  2. The $2/$12 per-million-token pricing applies to prompts under 200K tokens; pricing above that threshold differs[3]
  3. Video generation can take minutes; build timeouts and retry handling around the polling loop
  4. Image-output support varies by model; check the current docs[9] for which models accept responseModalities: ["IMAGE"]

References #

[1] @google/genai - npm - Official SDK package, v1.30.0, 723 dependents

[2] @google/generative-ai - npm - Deprecation notice, support ends August 31, 2025

[3] New Gemini API updates for Gemini 3 - Google Developers Blog - Gemini 3 release announcement, November 25, 2025

[4] Gemini 3 Pro - Google DeepMind - Model specifications and benchmarks

[5] Nano Banana Pro: Gemini 3 Pro Image - Google Blog - Nano Banana Pro announcement, November 20, 2025

[6] Generate videos with Veo 3.1 - Google AI Developers - Veo 3.1 API documentation and code examples

[7] Thought Signatures - Google AI Developers - Thought signature handling requirements

[8] Chat Class - @google/genai - Chat API reference, automatic signature handling

[9] Image generation with Gemini - Google AI Developers - Nano Banana documentation, configuration options

[10] GitHub - googleapis/js-genai - SDK source code and samples

[11] Gemini 3 Developer Guide - Google AI Developers - Comprehensive Gemini 3 developer documentation

[12] Function calling with the Gemini API - Google AI Developers - Function calling patterns and examples
