Chrome Automation: Chrome DevTools Protocol (CDP) Direct

· combray's blog


Summary #

The Chrome DevTools Protocol (CDP) is the low-level protocol that powers Chrome DevTools and serves as the foundation for libraries like Puppeteer and Playwright[1]. It communicates over WebSocket, allowing direct control of Chromium-based browsers through JSON-RPC messages[2]. While higher-level libraries provide convenience, using CDP directly gives you complete control with zero abstraction overhead.

For your use case (general automation, single browser, Chrome-only), CDP direct is a viable choice. It's particularly interesting because it can be driven from bash scripts using tools like websocat[3], making it accessible without any Node.js dependencies. The protocol is well-documented and stable for common operations like navigation, screenshots, and PDF generation[4].

Key metrics: CDP is the official protocol maintained by the Chrome DevTools team at Google. It's used internally by Puppeteer (90.3k GitHub stars, 6M+ weekly npm downloads) and Playwright (77k+ GitHub stars, 22M+ weekly npm downloads)[5][6].

Philosophy & Mental Model #

CDP operates on a domains and commands model[1]:

The mental model is simple: you're sending JSON messages over a WebSocket and receiving JSON responses. Every command has:

Browser <--WebSocket--> Your Script
         JSON-RPC messages

Key abstractions:

  1. Browser: The top-level Chrome process, has its own WebSocket endpoint
  2. Targets: Things you can debug (pages, workers, service workers)
  3. Sessions: Connections to specific targets (each page gets a session)

Setup #

Option 1: Bash with websocat (Zero Dependencies) #

Install websocat (WebSocket client for command line):

 1# macOS
 2brew install websocat
 3
 4# Linux (download binary)
 5wget https://github.com/vi/websocat/releases/download/v1.13.0/websocat.x86_64-unknown-linux-musl
 6chmod +x websocat.x86_64-unknown-linux-musl
 7sudo mv websocat.x86_64-unknown-linux-musl /usr/local/bin/websocat
 8
 9# Or with cargo
10cargo install websocat

You also need jq for JSON parsing:

1# macOS
2brew install jq
3
4# Linux
5apt install jq

Option 2: Node.js/TypeScript #

No special libraries needed—just the built-in ws package:

1pnpm add ws
2pnpm add -D @types/ws

Launch Chrome with Remote Debugging #

 1# macOS
 2"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
 3  --remote-debugging-port=9222 \
 4  --user-data-dir=/tmp/chrome-debug
 5
 6# Linux
 7google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
 8
 9# Headless mode
10google-chrome --headless --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
11
12# New headless (Chrome 112+) - recommended
13google-chrome --headless=new --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug

Core Usage Patterns #

Pattern 1: Bash - Get WebSocket URL and Navigate #

 1#!/bin/bash
 2set -e
 3
 4# Start Chrome headless (background)
 5google-chrome --headless=new --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-cdp &
 6CHROME_PID=$!
 7sleep 2  # Wait for Chrome to start
 8
 9# Get the WebSocket URL for a new page
10PAGE_INFO=$(curl -s http://127.0.0.1:9222/json/new)
11WS_URL=$(echo "$PAGE_INFO" | jq -r '.webSocketDebuggerUrl')
12echo "WebSocket URL: $WS_URL"
13
14# Navigate to a page
15echo '{"id":1,"method":"Page.navigate","params":{"url":"https://example.com"}}' | \
16  websocat -n1 "$WS_URL"
17
18# Wait for page to load
19sleep 2
20
21# Clean up
22kill $CHROME_PID

Pattern 2: Bash - Take a Screenshot #

 1#!/bin/bash
 2set -e
 3
 4WS_URL="$1"  # Pass WebSocket URL as argument
 5
 6# Capture screenshot (returns base64)
 7RESPONSE=$(echo '{"id":1,"method":"Page.captureScreenshot","params":{"format":"png"}}' | \
 8  websocat -n1 "$WS_URL")
 9
10# Extract base64 data and decode to file
11echo "$RESPONSE" | jq -r '.result.data' | base64 -d > screenshot.png
12
13echo "Screenshot saved to screenshot.png"

Pattern 3: Bash - Get Page HTML #

 1#!/bin/bash
 2set -e
 3
 4WS_URL="$1"
 5
 6# Get the document root node
 7DOC_RESPONSE=$(echo '{"id":1,"method":"DOM.getDocument","params":{"depth":-1}}' | \
 8  websocat -n1 "$WS_URL")
 9
10ROOT_NODE_ID=$(echo "$DOC_RESPONSE" | jq -r '.result.root.nodeId')
11
12# Get outer HTML of the entire document
13HTML_RESPONSE=$(echo "{\"id\":2,\"method\":\"DOM.getOuterHTML\",\"params\":{\"nodeId\":$ROOT_NODE_ID}}" | \
14  websocat -n1 "$WS_URL")
15
16echo "$HTML_RESPONSE" | jq -r '.result.outerHTML'

Pattern 4: Bash - Run JavaScript (evaluate) #

 1#!/bin/bash
 2set -e
 3
 4WS_URL="$1"
 5JS_EXPRESSION="$2"
 6
 7# Enable Runtime domain first (required for evaluate)
 8echo '{"id":1,"method":"Runtime.enable"}' | websocat -n1 "$WS_URL" > /dev/null
 9
10# Run JavaScript expression
11RESPONSE=$(echo "{\"id\":2,\"method\":\"Runtime.evaluate\",\"params\":{\"expression\":\"$JS_EXPRESSION\",\"returnByValue\":true}}" | \
12  websocat -n1 "$WS_URL")
13
14echo "$RESPONSE" | jq '.result.result.value'

Usage:

1./evaluate.sh "$WS_URL" "document.title"
2./evaluate.sh "$WS_URL" "document.querySelectorAll('a').length"
3./evaluate.sh "$WS_URL" "Array.from(document.querySelectorAll('h1')).map(h => h.textContent)"

Pattern 5: Bash - Generate PDF #

 1#!/bin/bash
 2set -e
 3
 4WS_URL="$1"
 5OUTPUT="${2:-output.pdf}"
 6
 7# Generate PDF (returns base64)
 8RESPONSE=$(echo '{"id":1,"method":"Page.printToPDF","params":{"printBackground":true,"format":"A4"}}' | \
 9  websocat -n1 "$WS_URL")
10
11# Decode and save
12echo "$RESPONSE" | jq -r '.result.data' | base64 -d > "$OUTPUT"
13
14echo "PDF saved to $OUTPUT"

Pattern 6: Bash - Query Selectors #

 1#!/bin/bash
 2set -e
 3
 4WS_URL="$1"
 5SELECTOR="$2"
 6
 7# Get document first
 8DOC=$(echo '{"id":1,"method":"DOM.getDocument"}' | websocat -n1 "$WS_URL")
 9ROOT_ID=$(echo "$DOC" | jq -r '.result.root.nodeId')
10
11# Query selector
12QUERY=$(echo "{\"id\":2,\"method\":\"DOM.querySelector\",\"params\":{\"nodeId\":$ROOT_ID,\"selector\":\"$SELECTOR\"}}" | \
13  websocat -n1 "$WS_URL")
14
15NODE_ID=$(echo "$QUERY" | jq -r '.result.nodeId')
16
17if [ "$NODE_ID" = "0" ] || [ "$NODE_ID" = "null" ]; then
18  echo "Element not found"
19  exit 1
20fi
21
22# Get the HTML of the matched element
23HTML=$(echo "{\"id\":3,\"method\":\"DOM.getOuterHTML\",\"params\":{\"nodeId\":$NODE_ID}}" | \
24  websocat -n1 "$WS_URL")
25
26echo "$HTML" | jq -r '.result.outerHTML'

Pattern 7: TypeScript - Full Example #

 1import WebSocket from 'ws';
 2
 3interface CDPResponse {
 4  id: number;
 5  result?: Record<string, unknown>;
 6  error?: { code: number; message: string };
 7}
 8
 9class CDPClient {
10  private ws: WebSocket;
11  private messageId = 0;
12  private pending = new Map<number, {
13    resolve: (value: CDPResponse) => void;
14    reject: (error: Error) => void;
15  }>();
16
17  private constructor(ws: WebSocket) {
18    this.ws = ws;
19    this.ws.on('message', (data) => {
20      const msg = JSON.parse(data.toString()) as CDPResponse;
21      if (msg.id !== undefined) {
22        const handler = this.pending.get(msg.id);
23        if (handler) {
24          this.pending.delete(msg.id);
25          if (msg.error) {
26            handler.reject(new Error(msg.error.message));
27          } else {
28            handler.resolve(msg);
29          }
30        }
31      }
32    });
33  }
34
35  static async connect(wsUrl: string): Promise<CDPClient> {
36    const ws = new WebSocket(wsUrl);
37    await new Promise<void>((resolve, reject) => {
38      ws.once('open', resolve);
39      ws.once('error', reject);
40    });
41    return new CDPClient(ws);
42  }
43
44  async send(method: string, params: Record<string, unknown> = {}): Promise<CDPResponse> {
45    const id = ++this.messageId;
46    return new Promise((resolve, reject) => {
47      this.pending.set(id, { resolve, reject });
48      this.ws.send(JSON.stringify({ id, method, params }));
49    });
50  }
51
52  close() {
53    this.ws.close();
54  }
55}
56
57// Usage
58async function main() {
59  // Get WebSocket URL from Chrome's JSON endpoint
60  const response = await fetch('http://127.0.0.1:9222/json/new');
61  const { webSocketDebuggerUrl } = await response.json();
62
63  const client = await CDPClient.connect(webSocketDebuggerUrl);
64
65  // Navigate
66  await client.send('Page.navigate', { url: 'https://example.com' });
67  await new Promise(r => setTimeout(r, 2000)); // Wait for load
68
69  // Take screenshot
70  const screenshot = await client.send('Page.captureScreenshot', { format: 'png' });
71  const imageData = Buffer.from(screenshot.result!.data as string, 'base64');
72  await Bun.write('screenshot.png', imageData); // or fs.writeFileSync
73
74  // Get HTML via evaluate (simpler than DOM methods)
75  const html = await client.send('Runtime.evaluate', {
76    expression: 'document.documentElement.outerHTML',
77    returnByValue: true
78  });
79  console.log('HTML length:', (html.result!.result as any).value.length);
80
81  // Run any JavaScript
82  const title = await client.send('Runtime.evaluate', {
83    expression: 'document.title',
84    returnByValue: true
85  });
86  console.log('Title:', (title.result!.result as any).value);
87
88  // Generate PDF
89  const pdf = await client.send('Page.printToPDF', {
90    printBackground: true,
91    format: 'A4'
92  });
93  const pdfData = Buffer.from(pdf.result!.data as string, 'base64');
94  await Bun.write('page.pdf', pdfData);
95
96  client.close();
97}
98
99main().catch(console.error);

Pattern 8: Complete Bash Automation Script #

  1#!/bin/bash
  2# chrome-automate.sh - Complete browser automation in bash
  3set -e
  4
  5CHROME_PORT=9222
  6CHROME_PID=""
  7WS_URL=""
  8
  9# Start Chrome
 10start_chrome() {
 11  local headless="${1:-true}"
 12  local url="${2:-about:blank}"
 13
 14  local headless_flag=""
 15  if [ "$headless" = "true" ]; then
 16    headless_flag="--headless=new"
 17  fi
 18
 19  google-chrome $headless_flag \
 20    --remote-debugging-port=$CHROME_PORT \
 21    --user-data-dir=/tmp/chrome-cdp-$$ \
 22    --no-first-run \
 23    --no-default-browser-check \
 24    "$url" &
 25  CHROME_PID=$!
 26
 27  # Wait for CDP to be ready
 28  for i in {1..30}; do
 29    if curl -s "http://127.0.0.1:$CHROME_PORT/json/version" > /dev/null 2>&1; then
 30      break
 31    fi
 32    sleep 0.1
 33  done
 34}
 35
 36# Get WebSocket URL for first page
 37get_page_ws() {
 38  curl -s "http://127.0.0.1:$CHROME_PORT/json/list" | jq -r '.[0].webSocketDebuggerUrl'
 39}
 40
 41# Create new page and get its WebSocket URL
 42new_page() {
 43  curl -s "http://127.0.0.1:$CHROME_PORT/json/new" | jq -r '.webSocketDebuggerUrl'
 44}
 45
 46# Send CDP command
 47cdp() {
 48  local ws_url="$1"
 49  local method="$2"
 50  local params="${3:-{}}"
 51
 52  local id=$RANDOM
 53  echo "{\"id\":$id,\"method\":\"$method\",\"params\":$params}" | websocat -n1 "$ws_url"
 54}
 55
 56# Navigate to URL
 57navigate() {
 58  local ws_url="$1"
 59  local url="$2"
 60  cdp "$ws_url" "Page.navigate" "{\"url\":\"$url\"}"
 61}
 62
 63# Wait for page load (simple version)
 64wait_load() {
 65  sleep "${1:-2}"
 66}
 67
 68# Take screenshot
 69screenshot() {
 70  local ws_url="$1"
 71  local output="${2:-screenshot.png}"
 72  local format="${output##*.}"
 73
 74  cdp "$ws_url" "Page.captureScreenshot" "{\"format\":\"$format\"}" | \
 75    jq -r '.result.data' | base64 -d > "$output"
 76}
 77
 78# Generate PDF
 79pdf() {
 80  local ws_url="$1"
 81  local output="${2:-page.pdf}"
 82
 83  cdp "$ws_url" "Page.printToPDF" '{"printBackground":true}' | \
 84    jq -r '.result.data' | base64 -d > "$output"
 85}
 86
 87# Get page HTML
 88get_html() {
 89  local ws_url="$1"
 90  cdp "$ws_url" "Runtime.evaluate" '{"expression":"document.documentElement.outerHTML","returnByValue":true}' | \
 91    jq -r '.result.result.value'
 92}
 93
 94# Run JavaScript
 95evaluate() {
 96  local ws_url="$1"
 97  local expression="$2"
 98  cdp "$ws_url" "Runtime.evaluate" "{\"expression\":$(echo "$expression" | jq -Rs .),\"returnByValue\":true}" | \
 99    jq -r '.result.result.value'
100}
101
102# Query selector and get text
103query_text() {
104  local ws_url="$1"
105  local selector="$2"
106  evaluate "$ws_url" "document.querySelector('$selector')?.textContent"
107}
108
109# Close browser
110close_chrome() {
111  if [ -n "$CHROME_PID" ]; then
112    kill "$CHROME_PID" 2>/dev/null || true
113  fi
114}
115
116# Cleanup on exit
117trap close_chrome EXIT
118
119# ============ EXAMPLE USAGE ============
120if [ "${BASH_SOURCE[0]}" = "$0" ]; then
121  echo "Starting Chrome..."
122  start_chrome true
123
124  WS_URL=$(get_page_ws)
125  echo "WebSocket: $WS_URL"
126
127  echo "Navigating to example.com..."
128  navigate "$WS_URL" "https://example.com"
129  wait_load 2
130
131  echo "Page title: $(evaluate "$WS_URL" 'document.title')"
132
133  echo "Taking screenshot..."
134  screenshot "$WS_URL" "example.png"
135
136  echo "Generating PDF..."
137  pdf "$WS_URL" "example.pdf"
138
139  echo "Getting HTML..."
140  get_html "$WS_URL" > example.html
141
142  echo "Done! Created: example.png, example.pdf, example.html"
143fi

Anti-Patterns & Pitfalls #

Don't: Forget to Wait for Navigation #

1# BAD - screenshot might capture blank page
2navigate "$WS_URL" "https://example.com"
3screenshot "$WS_URL" "bad.png"

Why it's wrong: Page.navigate returns immediately when navigation starts, not when the page is loaded.

Instead: Wait for Load Event or Use Timeout #

1# GOOD - wait for page to load
2navigate "$WS_URL" "https://example.com"
3sleep 2  # Simple but effective
4screenshot "$WS_URL" "good.png"
5
6# BETTER - wait for specific condition via evaluate
7while [ "$(evaluate "$WS_URL" 'document.readyState')" != "complete" ]; do
8  sleep 0.1
9done

Don't: Use DOM Methods for Simple HTML Extraction #

1# BAD - complex chain of commands
2cdp "$WS_URL" "DOM.getDocument" '{"depth":-1}'
3# Then extract nodeId, then getOuterHTML...

Why it's wrong: DOM methods require multiple round trips and node ID management.

Instead: Use Runtime.evaluate #

1# GOOD - single command
2evaluate "$WS_URL" "document.documentElement.outerHTML"
3evaluate "$WS_URL" "document.querySelector('h1').textContent"

Don't: Hardcode Message IDs in Production #

1# BAD - will break with concurrent requests
2echo '{"id":1,"method":"Page.navigate"...}'
3echo '{"id":1,"method":"Page.captureScreenshot"...}'  # Same ID!

Why it's wrong: If you send concurrent requests with the same ID, you can't match responses.

Instead: Generate Unique IDs #

1# GOOD
2cdp() {
3  local id=$RANDOM  # Or use incrementing counter
4  echo "{\"id\":$id,\"method\":\"$method\"...}"
5}

Don't: Forget Error Handling #

1# BAD - ignores errors
2RESPONSE=$(cdp "$WS_URL" "Page.navigate" '{"url":"invalid"}')
3echo "Success!"

Instead: Check for Errors #

1# GOOD
2RESPONSE=$(cdp "$WS_URL" "Page.navigate" '{"url":"https://example.com"}')
3ERROR=$(echo "$RESPONSE" | jq -r '.error.message // empty')
4if [ -n "$ERROR" ]; then
5  echo "Error: $ERROR" >&2
6  exit 1
7fi

Don't: Leave Chrome Processes Running #

1# BAD - orphaned Chrome process
2start_chrome
3navigate "$WS_URL" "https://example.com"
4# Script ends without cleanup

Instead: Use Trap for Cleanup #

1# GOOD
2trap close_chrome EXIT
3start_chrome
4# ... do work ...
5# Chrome is automatically killed on exit

Caveats #

References #

[1] Chrome DevTools Protocol - Official protocol documentation

[2] Getting Started With Chrome DevTools Protocol - Excellent tutorial by Puppeteer maintainer

[3] Chrome DevTools Remote Control Using Linux Bash - Bash + websocat examples

[4] CDP Page Domain - Navigate, screenshot, PDF commands

[5] Puppeteer npm - npm package stats

[6] Playwright npm trends - npm download stats

[7] Chrome Headless Mode - Official headless documentation

[8] websocat GitHub - WebSocket CLI tool

[9] CDP Runtime Domain - JavaScript evaluation

[10] CDP DOM Domain - DOM manipulation commands

last updated: