GM Scraper

Repo: savvydealer-adam/gm-scraper · Path: C:/Users/adam/gm-scraper · Owner: Adam · Status: Active dev · % Done: 55 · Last commit: 2026-03-14 · Deployed: not deployed

What it is

GM / Chevrolet OEM parts catalog scraper. Pulls parts, fitments, and images from multiple SimplePart-based GM parts sites in parallel.

Why it exists

savvy-parts needs a clean, canonical GM parts catalog (part numbers, descriptions, fitments, pricing, images). Scraping directly from SimplePart dealers is faster and more complete than relying on GM's feeds.

How it works

Python + httpx + lxml + Pydantic. CLI via click (gm command).

  • Multi-source parallel scraping across 4 SimplePart domains, cross-referenced for completeness
  • Sitemap-driven URL discovery
  • HTML parser for part detail pages
  • Rate limiter to avoid getting blocked
  • Image downloader + local cache
  • Enrichment + export pipelines
  • SQLite (gm_parts.db) as working store
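
As a rough illustration of how the parallel scraping and the rate limiter might fit together, here is a minimal sketch using only the standard library. It is not the project's actual code: RateLimiter, scrape_domain, scrape_all, and the fetch callable are hypothetical names, and in the real scraper fetch would presumably wrap an httpx client. The assumption is one limiter per domain, so each site sees a steady request rate while the domains run concurrently.

```python
import asyncio
import time
from typing import Awaitable, Callable

Fetch = Callable[[str], Awaitable[str]]  # hypothetical: takes a URL, returns HTML


class RateLimiter:
    """Enforce a minimum interval between consecutive requests to one domain."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = asyncio.Lock()
        self._next_allowed = 0.0

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                await asyncio.sleep(delay)
            # Schedule the next slot one interval after this one.
            self._next_allowed = max(now, self._next_allowed) + self.min_interval


async def scrape_domain(domain: str, paths: list[str], fetch: Fetch,
                        limiter: RateLimiter) -> dict[str, str]:
    """Fetch every path on one domain sequentially, pausing between requests."""
    pages = {}
    for path in paths:
        await limiter.wait()
        pages[path] = await fetch(f"https://{domain}{path}")
    return pages


async def scrape_all(sources: dict[str, list[str]], fetch: Fetch,
                     min_interval: float = 1.0) -> dict[str, dict[str, str]]:
    """Run all domains concurrently, each with its own rate limiter."""
    tasks = [
        scrape_domain(domain, paths, fetch, RateLimiter(min_interval))
        for domain, paths in sources.items()
    ]
    per_domain = await asyncio.gather(*tasks)
    return dict(zip(sources, per_domain))
```

Because each domain gets its own limiter, total wall-clock time is driven by the slowest domain rather than by the sum of all four.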

What's done

  • 4-domain parallel scraper
  • Sitemap, page scraper, image downloader, rate limiter modules
  • Enrichment + exporter
  • Pydantic models for parts + fitments
  • Click CLI (gm ...)
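
For readers unfamiliar with sitemap-driven discovery, the sketch below shows the general idea: parse a sitemap document and separate page URLs from nested sitemap URLs. It uses the standard library's xml.etree.ElementTree to stay dependency-free (the project itself uses lxml), and extract_urls plus the example.com URLs are illustrative assumptions, not the module's real API or the real dealer domains.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, child_sitemap_urls) found in one sitemap document."""
    root = ET.fromstring(sitemap_xml)
    pages = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    children = [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return pages, children


# Hypothetical sitemap index and URL set, for illustration only.
SAMPLE_INDEX = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://parts.example.com/sitemap-1.xml</loc></sitemap>
</sitemapindex>
"""

SAMPLE_URLSET = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://parts.example.com/oem-parts/part-a</loc></url>
  <url><loc>https://parts.example.com/oem-parts/part-b</loc></url>
</urlset>
"""
```

A crawler would call extract_urls on the index first, fetch each child sitemap it returns, and queue the resulting page URLs for the page scraper.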

What's next

  • Full-catalog production run + dedupe across the 4 sources
  • Export format that savvy-parts can ingest directly
  • Ongoing refresh schedule
  • Image hosting pipeline (cache/CDN)
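
One possible shape for the cross-source dedupe step listed above is sketched here. It is an assumption-laden illustration, not a design decision: PartRecord is a plain dataclass standing in for the project's Pydantic models, the field names are hypothetical, and the merge policy (key on a normalized part number, keep the first non-empty value, union the image URLs) is just one reasonable option.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PartRecord:
    """Hypothetical stand-in for the project's Pydantic part model."""
    part_number: str
    sources: list[str]
    description: str = ""
    price: Optional[float] = None
    image_urls: list[str] = field(default_factory=list)


def normalize_part_number(raw: str) -> str:
    # Assumed normalization: trim whitespace and uppercase.
    return raw.strip().upper()


def merge_sources(records: list[PartRecord]) -> list[PartRecord]:
    """Collapse records that share a part number into one merged record."""
    merged: dict[str, PartRecord] = {}
    for rec in records:
        key = normalize_part_number(rec.part_number)
        current = merged.get(key)
        if current is None:
            merged[key] = PartRecord(
                part_number=key,
                sources=list(rec.sources),
                description=rec.description,
                price=rec.price,
                image_urls=list(rec.image_urls),
            )
            continue
        # Fill gaps from later sources without overwriting existing values.
        if not current.description and rec.description:
            current.description = rec.description
        if current.price is None and rec.price is not None:
            current.price = rec.price
        for url in rec.image_urls:
            if url not in current.image_urls:
                current.image_urls.append(url)
        for src in rec.sources:
            if src not in current.sources:
                current.sources.append(src)
    return list(merged.values())
```

Keeping the list of contributing sources on each merged record makes it possible to audit later which dealer site supplied which part.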

Where the code lives

  • gm_scraper/scraper/ — page scraper, image downloader, rate limiter
  • gm_scraper/sitemap/ — sitemap discovery
  • gm_scraper/enrichment/ — post-scrape enrichment
  • gm_scraper/exporter/ — output formats
  • gm_scraper/cli.py — CLI entry point

Integrations

  • Feeds savvy-parts (OEM parts e-commerce platform)
  • Pairs with future Ford / Chrysler scrapers for multi-brand parts coverage

Don't rebuild this — extend it

For another OEM on the SimplePart platform, copy the scraper modules and swap domain configs in config.toml — the architecture is already parallel-safe.