Summary #
For LLM agents that need web search capabilities, there are three main approaches: built-in model grounding (like Gemini's Google Search), AI-native search APIs (Tavily, Exa.ai), and traditional SERP APIs (Serper, SerpAPI). Each serves different use cases with distinct trade-offs.
For this project using the Gemini SDK, the simplest path is Gemini's built-in googleSearch tool, which is already partially implemented. It provides automatic search-to-response integration at $35/1,000 queries with full citation support. For budget-conscious, high-volume usage, Serper offers Google results at $0.30/1,000 queries. For semantic, research-heavy applications requiring rich content extraction, Tavily ($0.008/query, with 1,000 free queries/month) and Exa.ai provide superior results optimized for LLM consumption.
The recommended architecture is a tiered approach: use Gemini's native grounding as the primary tool for its seamless integration, with Tavily or Serper as fallback options for specific use cases or cost optimization.
Project Context #
This project is a TypeScript coding agent built with:
- Runtime: Bun
- AI SDK: @google/genai (Gemini 2.5 Pro)
- Existing tools: read_file, list_files, edit_file, web_fetch, run_typecheck, add_package, remove_package
- Current search: Gemini's googleSearch tool is already declared in src/agent.ts:37
The agent uses a tool-calling pattern where tools are defined with inputSchema and execute functions. Web search should integrate seamlessly with this pattern.
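The exact ToolDefinition type isn't shown in this document, but the examples below assume roughly the following shape. This is a minimal sketch, with Type standing in for the schema-type enum from @google/genai; the project's real type may differ:

```typescript
// Sketch of the tool-calling pattern described above. Type mimics the
// schema-type enum from @google/genai; names here are illustrative.
enum Type {
  OBJECT = "OBJECT",
  STRING = "STRING",
  NUMBER = "NUMBER",
  ARRAY = "ARRAY",
}

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: Type;
    properties: Record<string, unknown>;
    required?: string[];
  };
  execute: (args: Record<string, unknown>) => Promise<unknown>;
}

// Example: a trivial echo tool following the same pattern.
const echoTool: ToolDefinition = {
  name: "echo",
  description: "Returns its input unchanged.",
  inputSchema: {
    type: Type.OBJECT,
    properties: { text: { type: Type.STRING } },
    required: ["text"],
  },
  execute: async ({ text }) => ({ echoed: text }),
};
```

Each search option below plugs into this pattern as another ToolDefinition.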
Detailed Findings #
Option 1: Gemini Built-in Google Search Grounding #
What it is: Native integration in the Gemini API that allows the model to autonomously search Google and incorporate results into responses.
Why consider it: Already integrated into this project's tech stack. Zero additional dependencies. Model decides when to search automatically. Full citation and source attribution built-in.
How to implement:
The project already has the basic setup in src/agent.ts:
```typescript
private getToolDeclarations(): Tool[] {
  return [
    {
      functionDeclarations: tools.map((t) => ({
        name: t.name,
        description: t.description,
        parameters: t.inputSchema,
      })),
    },
    { googleSearch: {} }, // Already present!
  ];
}
```
To access grounding metadata (citations), process the response:
```typescript
interface GroundingMetadata {
  webSearchQueries?: string[];
  searchEntryPoint?: { renderedContent: string };
  groundingChunks?: Array<{ web: { uri: string; title: string } }>;
  groundingSupports?: Array<{
    segment: { startIndex: number; endIndex: number };
    groundingChunkIndices: number[];
    confidenceScores: number[];
  }>;
}

// In processResponse, extract grounding metadata:
const groundingMetadata = candidate.groundingMetadata as GroundingMetadata | undefined;
if (groundingMetadata?.groundingChunks) {
  console.log("Sources:", groundingMetadata.groundingChunks.map(c => c.web.uri));
}
```
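Beyond listing source URLs, the segment offsets in groundingSupports can be used to splice citation markers into the response text. A sketch assuming the GroundingMetadata shape above, and treating offsets as plain character indices for simplicity (verify against the live API, which may count bytes):

```typescript
// Sketch: append [n] citation markers to response text using the
// groundingSupports segments from GroundingMetadata.
interface Support {
  segment: { startIndex: number; endIndex: number };
  groundingChunkIndices: number[];
}

function insertCitations(text: string, supports: Support[]): string {
  // Insert from the end of the string backwards so earlier
  // offsets stay valid as markers are added.
  const sorted = [...supports].sort(
    (a, b) => b.segment.endIndex - a.segment.endIndex,
  );
  let out = text;
  for (const s of sorted) {
    const marker = s.groundingChunkIndices.map((i) => `[${i + 1}]`).join("");
    out =
      out.slice(0, s.segment.endIndex) + marker + out.slice(s.segment.endIndex);
  }
  return out;
}
```

For example, a support covering the whole sentence "Paris is the capital." with chunk index 0 yields "Paris is the capital.[1]".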
Trade-offs:
- Pro: Zero configuration, seamless integration, model autonomously decides when to search
- Pro: Automatic citations with source URLs and text highlighting
- Pro: Works with all Gemini 2.x models
- Con: $35 per 1,000 grounded queries (relatively expensive at scale)
- Con: Multiple searches in one API call count as multiple billable uses
- Con: Must display Google Search branding per license requirements
- Con: Less control over search parameters (no domain filtering, date ranges)
Option 2: Tavily - RAG-Optimized Search API #
What it is: A search engine built specifically for AI agents that handles searching, scraping, and content preparation in a single API call.
Why consider it: Purpose-built for RAG workflows. Returns concise, LLM-ready snippets. 93.3% grounding accuracy on OpenAI's SimpleQA benchmark.
How to implement:
```shell
bun add tavily
```
```typescript
import { TavilyClient } from "tavily";

const tavilySearchTool: ToolDefinition = {
  name: "tavily_search",
  description: "Search the web for current information using Tavily's AI-optimized search. Returns concise results with citations.",
  inputSchema: {
    type: Type.OBJECT,
    properties: {
      query: {
        type: Type.STRING,
        description: "The search query",
      },
      search_depth: {
        type: Type.STRING,
        description: "Search depth: 'basic' for quick results, 'advanced' for comprehensive",
        enum: ["basic", "advanced"],
      },
      max_results: {
        type: Type.NUMBER,
        description: "Maximum number of results (1-10)",
      },
    },
    required: ["query"],
  },
  execute: async ({ query, search_depth = "basic", max_results = 5 }) => {
    const client = new TavilyClient({ apiKey: process.env.TAVILY_API_KEY });

    const response = await client.search(query as string, {
      searchDepth: search_depth as "basic" | "advanced",
      maxResults: max_results as number,
      includeAnswer: true,
      includeRawContent: false,
    });

    return {
      answer: response.answer,
      results: response.results.map(r => ({
        title: r.title,
        url: r.url,
        content: r.content,
        score: r.score,
      })),
    };
  },
};
```
Trade-offs:
- Pro: 1,000 free searches/month for prototyping
- Pro: Single API call handles search + content extraction
- Pro: Output optimized for LLM context windows
- Pro: Good accuracy for direct question answering
- Con: $0.008/query ($8 per 1,000) on paid tier
- Con: Less semantic depth than Exa for research tasks
- Con: Relies on Google for underlying search results
Option 3: Exa.ai - Semantic Embeddings Search #
What it is: A neural search engine using embeddings for meaning-based retrieval, offering deep content extraction and multi-step research capabilities.
Why consider it: 94.9% accuracy on complex benchmarks. Ideal for research-heavy tasks requiring semantic understanding. Proprietary search index (not just a Google wrapper).
How to implement:
```shell
bun add exa-js
```
```typescript
import Exa from "exa-js";

const exaSearchTool: ToolDefinition = {
  name: "exa_search",
  description: "Semantic web search using embeddings. Best for research tasks requiring deep understanding and related content discovery.",
  inputSchema: {
    type: Type.OBJECT,
    properties: {
      query: {
        type: Type.STRING,
        description: "Natural language search query",
      },
      num_results: {
        type: Type.NUMBER,
        description: "Number of results (1-10)",
      },
      type: {
        type: Type.STRING,
        description: "'auto', 'neural' (semantic), or 'keyword' search",
        enum: ["auto", "neural", "keyword"],
      },
      include_domains: {
        type: Type.ARRAY,
        items: { type: Type.STRING },
        description: "Limit to specific domains (e.g., ['github.com', 'stackoverflow.com'])",
      },
    },
    required: ["query"],
  },
  execute: async ({ query, num_results = 5, type = "auto", include_domains }) => {
    const exa = new Exa(process.env.EXA_API_KEY);

    const searchResults = await exa.searchAndContents(query as string, {
      numResults: num_results as number,
      type: type as "auto" | "neural" | "keyword",
      includeDomains: include_domains as string[] | undefined,
      text: { maxCharacters: 2000 },
      highlights: true,
    });

    return {
      results: searchResults.results.map(r => ({
        title: r.title,
        url: r.url,
        text: r.text,
        highlights: r.highlights,
        publishedDate: r.publishedDate,
      })),
    };
  },
};
```
Exa also offers specialized endpoints:
```typescript
// Find similar pages to a given URL
const similar = await exa.findSimilar("https://example.com/article", {
  numResults: 5,
});

// Direct answers with citations
const answer = await exa.answer("What is the capital of France?");

// Automated research with structured output
const research = await exa.research("Latest trends in TypeScript 2025");
```
Trade-offs:
- Pro: Superior semantic understanding via embeddings
- Pro: Rich content extraction with highlights
- Pro: Zero data retention option for privacy
- Pro: Best for multi-hop reasoning and research tasks
- Con: Pricing not publicly listed (enterprise-oriented)
- Con: Slower than keyword-based alternatives (350ms-3.5s depending on mode)
- Con: May be overkill for simple factual queries
Option 4: Serper - Budget-Friendly Google Results #
What it is: A lightweight SERP API providing fast, structured Google search results at the lowest cost.
Why consider it: $0.30 per 1,000 queries (10x cheaper than competitors). 2,500 free queries to start. 1-2 second response times. Clean JSON output.
How to implement:
```typescript
const serperSearchTool: ToolDefinition = {
  name: "google_search",
  description: "Search Google for current information. Returns structured search results including snippets, links, and related questions.",
  inputSchema: {
    type: Type.OBJECT,
    properties: {
      query: {
        type: Type.STRING,
        description: "The search query",
      },
      num_results: {
        type: Type.NUMBER,
        description: "Number of results (default 10)",
      },
      type: {
        type: Type.STRING,
        description: "Search type: 'search', 'news', 'images', 'places'",
        enum: ["search", "news", "images", "places"],
      },
    },
    required: ["query"],
  },
  execute: async ({ query, num_results = 10, type = "search" }) => {
    const response = await fetch("https://google.serper.dev/search", {
      method: "POST",
      headers: {
        "X-API-KEY": process.env.SERPER_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        q: query,
        num: num_results,
        type: type,
      }),
    });

    const data = await response.json();

    return {
      organic: data.organic?.map((r: any) => ({
        title: r.title,
        link: r.link,
        snippet: r.snippet,
        position: r.position,
      })),
      answerBox: data.answerBox,
      peopleAlsoAsk: data.peopleAlsoAsk,
      relatedSearches: data.relatedSearches,
    };
  },
};
```
Trade-offs:
- Pro: Extremely cost-effective ($0.0003 per query at volume)
- Pro: 2,500 free queries to start
- Pro: Fast response times (1-2 seconds)
- Pro: Rich SERP data (answer boxes, PAA, related searches)
- Con: Raw SERP data requires post-processing for LLM consumption
- Con: No built-in content extraction (just snippets)
- Con: Need separate tool for full page content
Option 5: MCP Server Architecture #
What it is: Using the Model Context Protocol to expose web search as a standardized tool that any MCP-compatible client can use.
Why consider it: Standardized interface across different AI applications. Can swap search providers without changing client code. Claude Code, Cursor, and other tools support MCP.
How to implement:
This project could expose its tools as an MCP server, or connect to existing MCP search servers:
```typescript
// Using langchain-mcp-tools for client-side consumption
import { convertMcpToLangchainTools } from "langchain-mcp-tools";

// Or using FastMCP to create a server (Python example pattern)
// See: https://github.com/vikrambhat2-mcp-server-web-search
```
Trade-offs:
- Pro: Standardized protocol with growing ecosystem
- Pro: Tools become reusable across applications
- Pro: Can combine multiple search backends
- Con: Additional architectural complexity
- Con: Better suited for tool distribution than internal use
- Con: MCP ecosystem still maturing
Recommendation #
For this Gemini-based coding agent, I recommend a layered approach:
Primary: Gemini Google Search Grounding (Already Implemented) #
Keep and enhance the existing googleSearch tool. It's already in the codebase and provides:
- Seamless model integration (model decides when to search)
- Automatic citation generation
- No additional dependencies
Enhancement: Add grounding metadata extraction to surface sources to users.
Secondary: Add Serper for Cost-Effective Control #
For scenarios where you need explicit search control or want to reduce costs, add Serper:
- 2,500 free queries to start
- $0.30/1,000 for high volume
- Explicit control over when searches happen
Use case: When the agent needs to search specific domains, do news searches, or when you want to batch searches efficiently.
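Since Serper passes the q parameter through to Google, standard query operators such as site: can express domain restrictions. A small helper sketch (the withDomains name is illustrative, not part of the project):

```typescript
// Illustrative helper: compose a Google query restricted to given
// domains via the site: operator, for use with the serper tool above.
function withDomains(query: string, domains: string[]): string {
  if (domains.length === 0) return query;
  const filter = domains.map((d) => `site:${d}`).join(" OR ");
  return `${query} (${filter})`;
}
```

For example, withDomains("bun test runner", ["bun.sh", "github.com"]) produces a query scoped to those two sites.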
Optional: Add Tavily for RAG Workflows #
If the agent evolves toward RAG-heavy use cases (e.g., researching documentation, synthesizing multiple sources):
- 1,000 free/month is generous for development
- Single API call for search + content
- Output already optimized for LLM context
Implementation Priority #
- Immediate: Extract and display groundingMetadata from Gemini responses
- Short-term: Add Serper tool for explicit, cost-effective searches
- As needed: Add Tavily for content-heavy research tasks
Example Multi-Tool Architecture #
```typescript
// In tools.ts, add both options for flexibility:
export const tools: ToolDefinition[] = [
  // ... existing tools
  serperSearchTool, // For explicit, cost-effective searches
  tavilySearchTool, // For RAG-optimized content (optional)
];

// In agent.ts, keep Gemini grounding for automatic searches:
private getToolDeclarations(): Tool[] {
  return [
    { functionDeclarations: tools.map(t => ({ ... })) },
    { googleSearch: {} }, // Model-initiated grounding
  ];
}
```
This gives the model three options:
- Use googleSearch grounding automatically for general questions
- Call google_search (Serper) explicitly for specific SERP needs
- Call tavily_search for deep content extraction
When NOT to Use This #
- Offline/air-gapped environments: Web search requires internet connectivity. Use local knowledge bases or RAG with local embeddings instead.
- Highly sensitive queries: Search APIs log queries by default. For sensitive applications, use Exa's zero data retention tier or implement your own search infrastructure.
- Real-time streaming needs: Search adds 1-15 seconds of latency. For chat applications requiring instant responses, pre-fetch likely needed information or use cached results.
- Cost-critical high-volume applications: At millions of queries/month, even Serper's low costs add up. Consider building your own search index with tools like Meilisearch or Elasticsearch.
- Domain-specific search: For searching your own documentation or codebase, use dedicated tools like vector databases (Pinecone, Weaviate) or documentation search (Algolia DocSearch) rather than web search APIs.
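For the latency- and cost-sensitive cases above, repeated queries can be served from a small in-memory cache placed in front of any of the search tools. A minimal sketch with an injectable clock so expiry is testable; all names here are illustrative, not from the project:

```typescript
// Illustrative TTL cache for search results. The `now` function is
// injectable so expiry can be exercised without real waiting.
class SearchCache<T> {
  private store = new Map<string, { value: T; expiresAt: number }>();

  constructor(
    private ttlMs: number,
    private now: () => number = Date.now,
  ) {}

  get(query: string): T | undefined {
    const hit = this.store.get(query);
    if (!hit) return undefined;
    if (this.now() > hit.expiresAt) {
      // Entry has expired; drop it and report a miss.
      this.store.delete(query);
      return undefined;
    }
    return hit.value;
  }

  set(query: string, value: T): void {
    this.store.set(query, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A tool's execute function would check the cache first and only hit the search API on a miss; this trades freshness for latency and cost, so keep the TTL short for news-style queries.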
Sources #
- Grounding with Google Search | Gemini API
- Tavily - The Web Access Layer for AI Agents
- Exa.ai - Web Search API for AI
- Serper - Google Search API
- Beyond Tavily - Complete Guide to AI Search APIs 2025
- Exa.ai vs Tavily Comparison
- LangChain MCP Documentation
- The Ultimate Guide to Web Search APIs for LLMs
- The Complete Guide to Web Search APIs 2025