GM Scraper
Repo: savvydealer-adam/gm-scraper · Path: C:/Users/adam/gm-scraper · Owner: Adam
Status: Active dev · % Done: 55 · Last commit: 2026-03-14
Deployed: not deployed
What it is
GM / Chevrolet OEM parts catalog scraper. Pulls parts, fitments, and images from multiple SimplePart-based GM parts sites in parallel.
Why it exists
savvy-parts needs a clean, canonical GM parts catalog (part numbers, descriptions, fitments, pricing, images). Scraping directly from SimplePart dealers is faster and more complete than relying on GM's feeds.
How it works
Python + httpx + lxml + Pydantic. CLI via click (gm command).
- Multi-source parallel scraping across 4 SimplePart domains — cross-reference for completeness
- Sitemap-driven URL discovery
- HTML parser for part detail pages
- Rate limiter to avoid getting blocked
- Image downloader + local cache
- Enrichment + export pipelines
- SQLite (gm_parts.db) as working store
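Two of the steps above, sitemap-driven URL discovery and rate limiting, can be sketched as follows. This is a minimal illustration, not the project's code: the names `extract_urls` and `RateLimiter` are hypothetical, the standard library's `xml.etree.ElementTree` stands in for lxml so the sketch has no dependencies, and the fixed minimum-interval policy is an assumption about how the limiter might work.

```python
import time
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL from a sitemap document, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


class RateLimiter:
    """Enforce a minimum interval between consecutive requests (hypothetical)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request; None before the first

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

A scraper loop would call `wait()` before each request to a given domain. Keeping one limiter per domain would let the four sources run in parallel without sharing a budget, though whether the project does this is not stated here.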
What's done
- 4-domain parallel scraper
- Sitemap, page scraper, image downloader, rate limiter modules
- Enrichment + exporter
- Pydantic models for parts + fitments
- Click CLI (gm ...)
What's next
- Full-catalog production run + dedupe across the 4 sources
- Export format that savvy-parts can ingest directly
- Ongoing refresh schedule
- Image hosting pipeline (cache/CDN)
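The cross-source dedupe listed above is not built yet. One possible approach is sketched below. Everything in it is an assumption: `PartRecord` is a dataclass standing in for the project's Pydantic models, and the merge policy (normalize the part number, keep the longest description, keep the first known price, union the source names) is illustrative rather than decided.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class PartRecord:
    """Hypothetical stand-in for the project's Pydantic part model."""
    part_number: str
    description: str = ""
    price: Optional[float] = None
    sources: list[str] = field(default_factory=list)


def normalize_part_number(raw: str) -> str:
    """Uppercase and drop separators so differently formatted numbers match."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def dedupe(records: Iterable[PartRecord]) -> dict[str, PartRecord]:
    """Merge records that share a normalized part number."""
    merged: dict[str, PartRecord] = {}
    for rec in records:
        key = normalize_part_number(rec.part_number)
        kept = merged.get(key)
        if kept is None:
            # First sighting: store a copy keyed by the normalized number.
            merged[key] = PartRecord(key, rec.description, rec.price, list(rec.sources))
            continue
        # Assumed policy: the longest description is the most informative.
        if len(rec.description) > len(kept.description):
            kept.description = rec.description
        # Assumed policy: keep the first price seen, fill in only if missing.
        if kept.price is None:
            kept.price = rec.price
        for name in rec.sources:
            if name not in kept.sources:
                kept.sources.append(name)
    return merged
```

Recording which sources contributed to each merged record keeps the cross-reference auditable, which matters if the four sites disagree on price or fitment.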
Where the code lives
- gm_scraper/scraper/: page scraper, image downloader, rate limiter
- gm_scraper/sitemap/: sitemap discovery
- gm_scraper/enrichment/: post-scrape enrichment
- gm_scraper/exporter/: output formats
- gm_scraper/cli.py: CLI entry point
Integrations
- Feeds savvy-parts (OEM parts e-commerce platform)
- Pairs with future Ford / Chrysler scrapers for multi-brand parts coverage
Don't rebuild this: extend it
For another OEM on the SimplePart platform, copy the scraper modules and swap domain configs in config.toml — the architecture is already parallel-safe.