Summary #
For LLM agents that need web search capabilities, there are three main approaches: built-in model grounding (like Gemini's Google Search), AI-native search APIs (Tavily, Exa.ai), and traditional SERP APIs (Serper, SerpAPI). Each serves different use cases with distinct trade-offs.
For this project using the Gemini SDK, the simplest path is Gemini's built-in googleSearch tool, which is already partially implemented. It provides automatic search-to-response integration at $35/1,000 queries with full citation support. For budget-conscious, high-volume usage, Serper offers Google results at $0.30/1,000 queries. For semantic, research-heavy applications requiring rich content extraction, Tavily ($0.008/query, with 1,000 free queries/month) and Exa.ai provide superior results optimized for LLM consumption.
The recommended architecture is a tiered approach: use Gemini's native grounding as the primary tool for its seamless integration, with Tavily or Serper as fallback options for specific use cases or cost optimization.
Project Context #
This project is a TypeScript coding agent built with:
- Runtime: Bun
- AI SDK: @google/genai (Gemini 2.5 Pro)
- Existing tools: read_file, list_files, edit_file, web_fetch, run_typecheck, add_package, remove_package
- Current search: Gemini's googleSearch tool is already declared in src/agent.ts:37
The agent uses a tool-calling pattern where tools are defined with inputSchema and execute functions. Web search should integrate seamlessly with this pattern.
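The exact ToolDefinition type isn't shown in this document, but the examples below assume roughly the following shape. This is a minimal sketch, with Type standing in for the schema-type enum from @google/genai; the project's real type may differ:

```typescript
// Sketch of the tool-calling pattern described above. Type mimics the
// schema-type enum from @google/genai; names here are illustrative.
enum Type {
  OBJECT = "OBJECT",
  STRING = "STRING",
  NUMBER = "NUMBER",
  ARRAY = "ARRAY",
}

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: Type;
    properties: Record<string, unknown>;
    required?: string[];
  };
  execute: (args: Record<string, unknown>) => Promise<unknown>;
}

// Example: a trivial echo tool following the same pattern.
const echoTool: ToolDefinition = {
  name: "echo",
  description: "Returns its input unchanged.",
  inputSchema: {
    type: Type.OBJECT,
    properties: { text: { type: Type.STRING } },
    required: ["text"],
  },
  execute: async ({ text }) => ({ echoed: text }),
};
```

Each search option below plugs into this pattern as another ToolDefinition.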
Detailed Findings #
Option 1: Gemini Built-in Google Search Grounding #
What it is: Native integration in the Gemini API that allows the model to autonomously search Google and incorporate results into responses.
Why consider it: Already integrated into this project's tech stack. Zero additional dependencies. Model decides when to search automatically. Full citation and source attribution built-in.
How to implement:
The project already has the basic setup in src/agent.ts:
```typescript
private getToolDeclarations(): Tool[] {
  return [
    {
      functionDeclarations: tools.map((t) => ({
        name: t.name,
        description: t.description,
        parameters: t.inputSchema,
      })),
    },
    { googleSearch: {} }, // Already present!
  ];
}
```
To access grounding metadata (citations), process the response:
```typescript
interface GroundingMetadata {
  webSearchQueries?: string[];
  searchEntryPoint?: { renderedContent: string };
  groundingChunks?: Array<{ web: { uri: string; title: string } }>;
  groundingSupports?: Array<{
    segment: { startIndex: number; endIndex: number };
    groundingChunkIndices: number[];
    confidenceScores: number[];
  }>;
}

// In processResponse, extract grounding metadata:
const groundingMetadata = candidate.groundingMetadata as GroundingMetadata | undefined;
if (groundingMetadata?.groundingChunks) {
  console.log("Sources:", groundingMetadata.groundingChunks.map(c => c.web.uri));
}
```
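Beyond listing source URLs, the segment offsets in groundingSupports can be used to splice citation markers into the response text. A sketch assuming the GroundingMetadata shape above, and treating offsets as plain character indices for simplicity (verify against the live API, which may count bytes):

```typescript
// Sketch: append [n] citation markers to response text using the
// groundingSupports segments from GroundingMetadata.
interface Support {
  segment: { startIndex: number; endIndex: number };
  groundingChunkIndices: number[];
}

function insertCitations(text: string, supports: Support[]): string {
  // Insert from the end of the string backwards so earlier
  // offsets stay valid as markers are added.
  const sorted = [...supports].sort(
    (a, b) => b.segment.endIndex - a.segment.endIndex,
  );
  let out = text;
  for (const s of sorted) {
    const marker = s.groundingChunkIndices.map((i) => `[${i + 1}]`).join("");
    out =
      out.slice(0, s.segment.endIndex) + marker + out.slice(s.segment.endIndex);
  }
  return out;
}
```

For example, a support covering the whole sentence "Paris is the capital." with chunk index 0 yields "Paris is the capital.[1]".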
Trade-offs:
- Pro: Zero configuration, seamless integration, model autonomously decides when to search
- Pro: Automatic citations with source URLs and text highlighting
- Pro: Works with all Gemini 2.x models
- Con: $35 per 1,000 grounded queries (relatively expensive at scale)
- Con: Multiple searches in one API call count as multiple billable uses
- Con: Must display Google Search branding per license requirements
- Con: Less control over search parameters (no domain filtering, date ranges)
Option 2: Tavily - RAG-Optimized Search API #
What it is: A search engine built specifically for AI agents that handles searching, scraping, and content preparation in a single API call.
Why consider it: Purpose-built for RAG workflows. Returns concise, LLM-ready snippets. 93.3% grounding accuracy on OpenAI's SimpleQA benchmark.
How to implement:
```shell
bun add tavily
```
```typescript
import { TavilyClient } from "tavily";

const tavilySearchTool: ToolDefinition = {
  name: "tavily_search",
  description: "Search the web for current information using Tavily's AI-optimized search. Returns concise results with citations.",
  inputSchema: {
    type: Type.OBJECT,
    properties: {
      query: {
        type: Type.STRING,
        description: "The search query",
      },
      search_depth: {
        type: Type.STRING,
        description: "Search depth: 'basic' for quick results, 'advanced' for comprehensive",
        enum: ["basic", "advanced"],
      },
      max_results: {
        type: Type.NUMBER,
        description: "Maximum number of results (1-10)",
      },
    },
    required: ["query"],
  },
  execute: async ({ query, search_depth = "basic", max_results = 5 }) => {
    const client = new TavilyClient({ apiKey: process.env.TAVILY_API_KEY });

    const response = await client.search(query as string, {
      searchDepth: search_depth as "basic" | "advanced",
      maxResults: max_results as number,
      includeAnswer: true,
      includeRawContent: false,
    });

    return {
      answer: response.answer,
      results: response.results.map(r => ({
        title: r.title,
        url: r.url,
        content: r.content,
        score: r.score,
      })),
    };
  },
};
```
Trade-offs:
- Pro: 1,000 free searches/month for prototyping
- Pro: Single API call handles search + content extraction
- Pro: Output optimized for LLM context windows
- Pro: Good accuracy for direct question answering
- Con: $0.008/query ($8 per 1,000) on paid tier
- Con: Less semantic depth than Exa for research tasks
- Con: Relies on Google for underlying search results
Option 3: Exa.ai - Semantic Embeddings Search #
What it is: A neural search engine using embeddings for meaning-based retrieval, offering deep content extraction and multi-step research capabilities.
Why consider it: 94.9% accuracy on complex benchmarks. Ideal for research-heavy tasks requiring semantic understanding. Proprietary search index (not just a Google wrapper).
How to implement:
```shell
bun add exa-js
```
```typescript
import Exa from "exa-js";

const exaSearchTool: ToolDefinition = {
  name: "exa_search",
  description: "Semantic web search using embeddings. Best for research tasks requiring deep understanding and related content discovery.",
  inputSchema: {
    type: Type.OBJECT,
    properties: {
      query: {
        type: Type.STRING,
        description: "Natural language search query",
      },
      num_results: {
        type: Type.NUMBER,
        description: "Number of results (1-10)",
      },
      type: {
        type: Type.STRING,
        description: "'auto', 'neural' (semantic), or 'keyword' search",
        enum: ["auto", "neural", "keyword"],
      },
      include_domains: {
        type: Type.ARRAY,
        items: { type: Type.STRING },
        description: "Limit to specific domains (e.g., ['github.com', 'stackoverflow.com'])",
      },
    },
    required: ["query"],
  },
  execute: async ({ query, num_results = 5, type = "auto", include_domains }) => {
    const exa = new Exa(process.env.EXA_API_KEY);

    const searchResults = await exa.searchAndContents(query as string, {
      numResults: num_results as number,
      type: type as "auto" | "neural" | "keyword",
      includeDomains: include_domains as string[] | undefined,
      text: { maxCharacters: 2000 },
      highlights: true,
    });

    return {
      results: searchResults.results.map(r => ({
        title: r.title,
        url: r.url,
        text: r.text,
        highlights: r.highlights,
        publishedDate: r.publishedDate,
      })),
    };
  },
};
```
Exa also offers specialized endpoints:
```typescript
// Find similar pages to a given URL
const similar = await exa.findSimilar("https://example.com/article", {
  numResults: 5,
});

// Direct answers with citations
const answer = await exa.answer("What is the capital of France?");

// Automated research with structured output
const research = await exa.research("Latest trends in TypeScript 2025");
```
Trade-offs:
- Pro: Superior semantic understanding via embeddings
- Pro: Rich content extraction with highlights
- Pro: Zero data retention option for privacy
- Pro: Best for multi-hop reasoning and research tasks
- Con: Pricing not publicly listed (enterprise-oriented)
- Con: Slower than keyword-based alternatives (350ms-3.5s depending on mode)
- Con: May be overkill for simple factual queries
Option 4: Serper - Budget-Friendly Google Results #
What it is: A lightweight SERP API providing fast, structured Google search results at the lowest cost.
Why consider it: $0.30 per 1,000 queries (10x cheaper than competitors). 2,500 free queries to start. 1-2 second response times. Clean JSON output.
How to implement:
```typescript
const serperSearchTool: ToolDefinition = {
  name: "google_search",
  description: "Search Google for current information. Returns structured search results including snippets, links, and related questions.",
  inputSchema: {
    type: Type.OBJECT,
    properties: {
      query: {
        type: Type.STRING,
        description: "The search query",
      },
      num_results: {
        type: Type.NUMBER,
        description: "Number of results (default 10)",
      },
      type: {
        type: Type.STRING,
        description: "Search type: 'search', 'news', 'images', 'places'",
        enum: ["search", "news", "images", "places"],
      },
    },
    required: ["query"],
  },
  execute: async ({ query, num_results = 10, type = "search" }) => {
    const response = await fetch("https://google.serper.dev/search", {
      method: "POST",
      headers: {
        "X-API-KEY": process.env.SERPER_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        q: query,
        num: num_results,
        type: type,
      }),
    });

    const data = await response.json();

    return {
      organic: data.organic?.map((r: any) => ({
        title: r.title,
        link: r.link,
        snippet: r.snippet,
        position: r.position,
      })),
      answerBox: data.answerBox,
      peopleAlsoAsk: data.peopleAlsoAsk,
      relatedSearches: data.relatedSearches,
    };
  },
};
```
Trade-offs:
- Pro: Extremely cost-effective ($0.0003 per query at volume)
- Pro: 2,500 free queries to start
- Pro: Fast response times (1-2 seconds)
- Pro: Rich SERP data (answer boxes, PAA, related searches)
- Con: Raw SERP data requires post-processing for LLM consumption
- Con: No built-in content extraction (just snippets)
- Con: Need separate tool for full page content
Option 5: MCP Server Architecture #
What it is: Using the Model Context Protocol to expose web search as a standardized tool that any MCP-compatible client can use.
Why consider it: Standardized interface across different AI applications. Can swap search providers without changing client code. Claude Code, Cursor, and other tools support MCP.
How to implement:
This project could expose its tools as an MCP server, or connect to existing MCP search servers:
```typescript
// Using langchain-mcp-tools for client-side consumption
import { convertMcpToLangchainTools } from "langchain-mcp-tools";

// Or using FastMCP to create a server (Python example pattern)
// See: https://github.com/vikrambhat2-mcp-server-web-search
```
Trade-offs:
- Pro: Standardized protocol with growing ecosystem
- Pro: Tools become reusable across applications
- Pro: Can combine multiple search backends
- Con: Additional architectural complexity
- Con: Better suited for tool distribution than internal use
- Con: MCP ecosystem still maturing
Recommendation #
For this Gemini-based coding agent, I recommend a layered approach:
Primary: Gemini Google Search Grounding (Already Implemented) #
Keep and enhance the existing googleSearch tool. It's already in the codebase and provides:
- Seamless model integration (model decides when to search)
- Automatic citation generation
- No additional dependencies
Enhancement: Add grounding metadata extraction to surface sources to users.
Secondary: Add Serper for Cost-Effective Control #
For scenarios where you need explicit search control or want to reduce costs, add Serper:
- 2,500 free queries to start
- $0.30/1,000 for high volume
- Explicit control over when searches happen
Use case: When the agent needs to search specific domains, do news searches, or when you want to batch searches efficiently.
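Since Serper passes the q parameter through to Google, standard query operators such as site: can express domain restrictions. A small helper sketch (the withDomains name is illustrative, not part of the project):

```typescript
// Illustrative helper: compose a Google query restricted to given
// domains via the site: operator, for use with the serper tool above.
function withDomains(query: string, domains: string[]): string {
  if (domains.length === 0) return query;
  const filter = domains.map((d) => `site:${d}`).join(" OR ");
  return `${query} (${filter})`;
}
```

For example, withDomains("bun test runner", ["bun.sh", "github.com"]) produces a query scoped to those two sites.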
Optional: Add Tavily for RAG Workflows #
If the agent evolves toward RAG-heavy use cases (e.g., researching documentation, synthesizing multiple sources):
- 1,000 free/month is generous for development
- Single API call for search + content
- Output already optimized for LLM context
Implementation Priority #
- Immediate: Extract and display groundingMetadata from Gemini responses
- Short-term: Add Serper tool for explicit, cost-effective searches
- As needed: Add Tavily for content-heavy research tasks
Example Multi-Tool Architecture #
```typescript
// In tools.ts, add both options for flexibility:
export const tools: ToolDefinition[] = [
  // ... existing tools
  serperSearchTool, // For explicit, cost-effective searches
  tavilySearchTool, // For RAG-optimized content (optional)
];

// In agent.ts, keep Gemini grounding for automatic searches:
private getToolDeclarations(): Tool[] {
  return [
    { functionDeclarations: tools.map(t => ({ ... })) },
    { googleSearch: {} }, // Model-initiated grounding
  ];
}
```
This gives the model three options:
- Use googleSearch grounding automatically for general questions
- Call google_search (Serper) explicitly for specific SERP needs
- Call tavily_search for deep content extraction
When NOT to Use This #
- Offline/air-gapped environments: Web search requires internet connectivity. Use local knowledge bases or RAG with local embeddings instead.
- Highly sensitive queries: Search APIs log queries by default. For sensitive applications, use Exa's zero data retention tier or implement your own search infrastructure.
- Real-time streaming needs: Search adds 1-15 seconds of latency. For chat applications requiring instant responses, pre-fetch likely needed information or use cached results.
- Cost-critical high-volume applications: At millions of queries/month, even Serper's low costs add up. Consider building your own search index with tools like Meilisearch or Elasticsearch.
- Domain-specific search: For searching your own documentation or codebase, use dedicated tools like vector databases (Pinecone, Weaviate) or documentation search (Algolia DocSearch) rather than web search APIs.
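For the latency- and cost-sensitive cases above, repeated queries can be served from a small in-memory cache placed in front of any of the search tools. A minimal sketch with an injectable clock so expiry is testable; all names here are illustrative, not from the project:

```typescript
// Illustrative TTL cache for search results. The `now` function is
// injectable so expiry can be exercised without real waiting.
class SearchCache<T> {
  private store = new Map<string, { value: T; expiresAt: number }>();

  constructor(
    private ttlMs: number,
    private now: () => number = Date.now,
  ) {}

  get(query: string): T | undefined {
    const hit = this.store.get(query);
    if (!hit) return undefined;
    if (this.now() > hit.expiresAt) {
      // Entry has expired; drop it and report a miss.
      this.store.delete(query);
      return undefined;
    }
    return hit.value;
  }

  set(query: string, value: T): void {
    this.store.set(query, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A tool's execute function would check the cache first and only hit the search API on a miss; this trades freshness for latency and cost, so keep the TTL short for news-style queries.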
Sources #
- Grounding with Google Search | Gemini API
- Tavily - The Web Access Layer for AI Agents
- Exa.ai - Web Search API for AI
- Serper - Google Search API
- Beyond Tavily - Complete Guide to AI Search APIs 2025
- Exa.ai vs Tavily Comparison
- LangChain MCP Documentation
- The Ultimate Guide to Web Search APIs for LLMs
- The Complete Guide to Web Search APIs 2025