Updated on 2026-05-12 by NovaDataHub Engineering

Tutorial

How to Use a Google SERP API with Python

Python is one of the easiest ways to start working with Google search result data, but the first successful request is only part of the job. This tutorial moves from a simple requests-based call into reusable session setup, response parsing, timeout handling, retry behavior, and storage patterns that work better for recurring rank-tracking or search-monitoring jobs.

- Authenticate with x-api-key
- Send a sync request with Python requests
- Parse structured JSON blocks safely
- Handle 400, 401, 402, 429, and 504 responses
- Use timeouts and backoff without retry storms
- Store SERP results for rank tracking and recurring monitoring

Prerequisites

Before you start, create a NovaDataHub account, enable the Google SERP API, and keep your x-api-key available in a local environment variable or secret manager. Install requests if it is not already available, and decide which market you want to test first (country, language, and device) so your first payload reflects results you actually care about.

pip install requests

Set up a reusable session and API key

For production code, use a Session so common headers and connection reuse stay in one place. Keep the API key in an environment variable rather than in source files.

import os
import requests

api_key = os.environ['NOVADATAHUB_SERP_KEY']
session = requests.Session()
session.headers.update({'x-api-key': api_key})
url = 'https://novadatahub.com/search'
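If the environment variable is missing, os.environ[...] raises a bare KeyError, which can be hard to read in job logs. A minimal sketch of a loader that fails with a clearer message follows. The helper name load_api_key and its environ argument are illustrative assumptions, not part of the NovaDataHub client.

```python
import os


def load_api_key(env_var="NOVADATAHUB_SERP_KEY", environ=None):
    """Return the API key from the environment, or raise a readable error.

    `environ` is a hypothetical hook so the function can be exercised
    without touching the real process environment.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

The returned value can then be passed to session.headers.update({'x-api-key': ...}) exactly as in the snippet above.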

Send a sync request first

Start with sync mode so you can inspect the full payload in one response while you validate the endpoint, query parameters, authentication, and locale settings.

params = {
    'q': 'google serp api python',
    'gl': 'us',
    'hl': 'en',
    'device': 'desktop',
    'sync': 'true'
}
response = session.get(url, params=params, timeout=60)
response.raise_for_status()
payload = response.json()
print(payload)

Parse the response structure into normalized rows

A typical sync response includes top-level fields such as ok, status, jobId, and result. Inside result, look for arrays such as organic, ads_top, paa, related_searches, and local_pack so your code can route each block into the right downstream model.

from datetime import datetime, timezone

result = payload.get('result', {})
collected_at = datetime.now(timezone.utc).isoformat()
organic_rows = []
for row in result.get('organic', []):
    organic_rows.append({
        'query': params['q'],
        'gl': params['gl'],
        'hl': params['hl'],
        'device': params['device'],
        'collected_at': collected_at,
        'position': row.get('position'),
        'title': row.get('title'),
        'url': row.get('url')
    })
print(organic_rows[:3])
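The same idea can be extended to the other result blocks named above. The sketch below routes each block into rows tagged with the block name. It assumes that every block is a list of dictionaries and that items expose title and url keys like the organic rows do; that shape is an assumption for non-organic blocks, so check a real payload before relying on it. The function name normalize_blocks and the fallback to the list index when position is missing are also illustrative choices.

```python
from datetime import datetime, timezone

# Block names taken from the tutorial text; the per-item keys are assumed.
BLOCK_NAMES = ("organic", "ads_top", "paa", "related_searches", "local_pack")


def normalize_blocks(payload, context, block_names=BLOCK_NAMES):
    """Flatten known result blocks into rows that share one request context."""
    result = payload.get("result") or {}
    collected_at = datetime.now(timezone.utc).isoformat()
    rows = []
    for block in block_names:
        items = result.get(block) or []
        if not isinstance(items, list):
            continue  # skip blocks with an unexpected shape
        for index, item in enumerate(items, start=1):
            if not isinstance(item, dict):
                continue
            rows.append({
                **context,
                "collected_at": collected_at,
                "block": block,
                # Assumption: fall back to list order when no position is given.
                "position": item.get("position", index),
                "title": item.get("title"),
                "url": item.get("url"),
            })
    return rows
```

Calling normalize_blocks(payload, {'query': params['q'], 'gl': params['gl'], 'hl': params['hl'], 'device': params['device']}) would return one flat list that can be filtered by block later.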

Handle API errors, timeouts, and sync time-limit responses

Your client should treat malformed requests, invalid keys, quota issues, rate limits, and sync timeouts differently. Keep network timeouts separate from HTTP responses so your logs stay useful. For sync timeout responses, capture the returned jobId and continue through the async workflow instead of discarding the request.

try:
    response = session.get(url, params=params, timeout=60)
except requests.Timeout as exc:
    raise TimeoutError('Network timeout while waiting for SERP response.') from exc

if response.status_code == 400:
    raise ValueError('Check required query parameters such as q.')
elif response.status_code == 401:
    raise PermissionError('Verify x-api-key and enabled service access.')
elif response.status_code == 402:
    raise RuntimeError('Check active plan or quota state before retrying.')
elif response.status_code == 429:
    raise RuntimeError('Back off and retry later to avoid a retry storm.')
elif response.status_code == 504:
    body = response.json()
    print('Sync request timed out. Continue with async workflow using jobId:', body.get('jobId'))
else:
    response.raise_for_status()
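When several jobs share this logic, it can help to separate the decision (what to do next) from the side effects (raising, sleeping, polling). The sketch below maps the status codes discussed in this section to an action label. The labels and the function name classify_status are hypothetical; only the status codes come from the tutorial.

```python
# Action labels are illustrative; only the status codes come from the tutorial.
STATUS_ACTIONS = {
    400: "fix_request",         # malformed request, e.g. missing q
    401: "fix_credentials",     # invalid x-api-key or service not enabled
    402: "check_plan",          # plan or quota problem, do not retry blindly
    429: "retry_with_backoff",  # rate limited
    504: "continue_async",      # sync time limit hit, follow up with jobId
}


def classify_status(status_code):
    """Return an action label for an HTTP status code."""
    if status_code in STATUS_ACTIONS:
        return STATUS_ACTIONS[status_code]
    if 200 <= status_code < 300:
        return "ok"
    return "unexpected"
```

A caller can log the label next to the request context and then branch on it, which keeps retryable and non-retryable failures clearly apart.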

Add retry and backoff for production

Transient network failures and 429 responses should not use tight loops. Prefer a capped exponential backoff pattern and log the request context that caused the failure so you can debug production jobs later.

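import time

max_attempts = 5
for attempt in range(max_attempts):
    try:
        resp = session.get(url, params=params, timeout=60)
    except (requests.ConnectionError, requests.Timeout):
        resp = None  # transient network failure: retry after backoff
    if resp is not None and resp.status_code != 429:
        break
    if attempt == max_attempts - 1:
        break  # out of attempts; skip the final sleep
    delay = min(60, 2 ** attempt)  # capped exponential backoff: 1, 2, 4, 8 seconds
    print(f'attempt {attempt + 1} failed, sleeping {delay}s before retry')
    time.sleep(delay)
if resp is None:
    raise RuntimeError('No response after retries.')
resp.raise_for_status()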
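If many workers hit a rate limit at the same moment, identical delays can make them all retry together. A common mitigation is to add random jitter to the capped exponential delay. The sketch below uses the "full jitter" variant, which is a swapped-in technique rather than something the API prescribes; backoff_delays and its parameters are illustrative names.

```python
import random


def backoff_delays(max_attempts=5, base=1.0, cap=60.0, rng=None):
    """Yield one randomized delay per attempt (capped exponential, full jitter)."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        ceiling = min(cap, base * 2 ** attempt)
        # Full jitter: pick uniformly between 0 and the exponential ceiling.
        yield rng.uniform(0, ceiling)
```

Each yielded value can replace the fixed delay passed to time.sleep in a retry loop.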

Know when async workflow is the better fit

If you are running many searches on a schedule, async mode is often a cleaner fit than waiting for every result inline. Submit the job, store the jobId with the request context, and poll terminal states on a measured cadence rather than tying the whole workflow to a single long sync request.
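A polling loop on a measured cadence might look like the sketch below. The async endpoint itself is not shown in this tutorial, so the HTTP call is injected as fetch_status, a hypothetical callable that takes a jobId and returns the decoded JSON body. The terminal state names are also assumptions; replace them with the values from the API docs.

```python
import time

# Hypothetical terminal states; confirm the real values in the API docs.
TERMINAL_STATES = {"completed", "failed"}


def poll_job(job_id, fetch_status, interval=5.0, max_polls=24, sleep=time.sleep):
    """Poll until the job reaches a terminal state or the poll budget runs out.

    `fetch_status` is a caller-supplied function: job_id -> dict with a
    "status" key. It stands in for the real async status request.
    """
    for attempt in range(max_polls):
        body = fetch_status(job_id)
        if body.get("status") in TERMINAL_STATES:
            return body
        if attempt < max_polls - 1:
            sleep(interval)  # fixed cadence between polls
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls.")
```

Bounding the number of polls keeps a stuck job from blocking the rest of a scheduled batch, and storing the jobId with the request context lets a later run pick the job up again.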

Store SERP results for rank tracking

Do not store only position numbers. Keep the query, locale, device, timestamp, and raw JSON payload or normalized result rows together so you can compare matched result sets over time and explain later changes in ranking or result composition.

record = {
    'query': params['q'],
    'gl': params['gl'],
    'hl': params['hl'],
    'device': params['device'],
    'collected_at': collected_at,
    'payload': payload,
    'organic_rows': organic_rows
}
print(record.keys())
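One way to persist a record like this is a small SQLite schema that keeps the raw payload next to the normalized rows. The sketch below uses only the standard library. The table names, column names, and the save_snapshot helper are illustrative assumptions, and a production job may prefer a different database.

```python
import json
import sqlite3

# Hypothetical schema: one snapshot per request, many rows per snapshot.
SCHEMA = """
CREATE TABLE IF NOT EXISTS serp_snapshots (
    id INTEGER PRIMARY KEY,
    query TEXT NOT NULL,
    gl TEXT,
    hl TEXT,
    device TEXT,
    collected_at TEXT NOT NULL,
    payload_json TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS serp_rows (
    snapshot_id INTEGER NOT NULL REFERENCES serp_snapshots(id),
    position INTEGER,
    title TEXT,
    url TEXT
);
"""


def save_snapshot(conn, record):
    """Store one record (context, raw payload, organic rows); return its id."""
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO serp_snapshots (query, gl, hl, device, collected_at, payload_json) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            record["query"],
            record.get("gl"),
            record.get("hl"),
            record.get("device"),
            record["collected_at"],
            json.dumps(record.get("payload", {})),
        ),
    )
    snapshot_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO serp_rows (snapshot_id, position, title, url) VALUES (?, ?, ?, ?)",
        [
            (snapshot_id, row.get("position"), row.get("title"), row.get("url"))
            for row in record.get("organic_rows", [])
        ],
    )
    conn.commit()
    return snapshot_id
```

With conn = sqlite3.connect('serp.db'), calling save_snapshot(conn, record) stores the request context once and links each organic row to it, so two snapshots of the same query, locale, and device can be compared later.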

Move from test request to production workflow

Once one sync request works, connect the same request pattern to your storage layer, monitoring jobs, and alerting. Then use the docs page for parameter details, the async tutorial for larger jobs, and the rank-tracking tutorials for comparison-ready storage and reporting flows.

FAQ

Tutorial questions

Do I need browser automation in Python?

No. The API handles result collection so your Python code can work directly with structured JSON.

Should I start with sync or async mode?

Start with sync=true while you validate the request and payload shape. Move to async jobs when the workflow becomes bulk, recurring, or timeout-sensitive.

What should I store for rank tracking?

Store at least the query, locale, device, timestamp, and either the raw JSON or normalized result rows so later comparisons stay grounded in the original context.

How should I handle 429 or 504 responses?

Use backoff for 429 responses, and for 504 sync timeouts continue by polling the async job using the returned jobId when available.

Should I use requests.Session in production?

Yes. A shared Session makes headers, connection reuse, and common request behavior easier to manage cleanly.
