Past: EvenRank Data Science (archived)
EvenRank — Distributed Web Crawlers at Scale
Early-career work at EvenRank Data Science (2019): distributed web crawlers for LinkedIn profiles, Google Maps listings, and email discovery, with robust anti-bot handling, proxy rotation, and a MongoDB-backed job queue.
Backend Engineer
Jan 2019 – Dec 2019
What I built
- Async Python crawler framework with coroutine-based scheduling
- LinkedIn scraper (Selenium-driven, usable from a CLI and an API) with login and session rotation
- Google Maps business-listings extractor with deduplication across locales
- Email scraper with an MX-record lookup and SMTP verification pipeline
- Proxy rotation and user-agent fingerprinting across 500+ concurrent workers
- MongoDB-backed distributed job queue with retry and dead-letter handling
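Coroutine-based scheduling in a crawler usually means a fixed pool of worker coroutines draining a shared queue, so concurrency stays bounded while I/O waits overlap. The following is a minimal sketch of that pattern, not the original framework. `fetch` is a hypothetical stand-in that only simulates latency with `asyncio.sleep`, and the worker count is illustrative.

```python
import asyncio


async def fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP request: simulates network latency only.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker coroutine pulls URLs until it is cancelled.
    while True:
        url = await queue.get()
        try:
            results[url] = await fetch(url)
        except Exception as exc:  # record per-URL failures so the worker stays alive
            results[url] = exc
        finally:
            queue.task_done()


async def crawl(urls: list, concurrency: int = 5) -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: dict = {}
    # A fixed number of workers bounds how many fetches run at once.
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # wait until every queued URL has been processed
    for w in workers:
        w.cancel()  # workers are idle on queue.get(); stop them
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    pages = asyncio.run(crawl([f"https://example.com/p/{i}" for i in range(20)]))
    print(len(pages))  # 20
```

Bounding concurrency with a worker pool, rather than spawning one task per URL, keeps memory and open connections predictable as the frontier grows.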
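Retry and dead-letter handling comes down to a small state machine. A job is attempted, re-queued on failure, and parked in a dead-letter collection once it exceeds a retry budget. The sketch below models those transitions in memory so it runs without a database. `Job`, `JobQueue` and `max_attempts` are hypothetical names, not the original schema.

In a MongoDB-backed version, the claim step would typically be an atomic `find_one_and_update` that flips a job's status, so that two workers never pick up the same job. That detail is an assumption, not something the original states.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class Job:
    job_id: str
    payload: dict
    attempts: int = 0
    last_error: Optional[str] = None


class JobQueue:
    """In-memory stand-in for a database-backed job queue (hypothetical design)."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.pending: Deque[Job] = deque()
        self.done: List[Job] = []
        self.dead_letter: List[Job] = []

    def enqueue(self, job: Job) -> None:
        self.pending.append(job)

    def run_once(self, handler: Callable[[dict], None]) -> bool:
        """Process one job; return False when nothing is pending."""
        if not self.pending:
            return False
        job = self.pending.popleft()
        job.attempts += 1
        try:
            handler(job.payload)
        except Exception as exc:
            job.last_error = repr(exc)
            if job.attempts >= self.max_attempts:
                self.dead_letter.append(job)  # retry budget spent: park for inspection
            else:
                self.pending.append(job)  # re-queue for another attempt
        else:
            self.done.append(job)
        return True

    def drain(self, handler: Callable[[dict], None]) -> None:
        while self.run_once(handler):
            pass


if __name__ == "__main__":
    def flaky(payload: dict) -> None:
        if payload["url"].endswith("bad"):
            raise RuntimeError("blocked")

    q = JobQueue(max_attempts=3)
    q.enqueue(Job("1", {"url": "https://example.com/ok"}))
    q.enqueue(Job("2", {"url": "https://example.com/bad"}))
    q.drain(flaky)
    print([j.job_id for j in q.done], [j.job_id for j in q.dead_letter])
```

A production queue would usually add a backoff delay before re-queueing. This sketch retries immediately to keep the state transitions easy to follow.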
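The same business often shows up several times when it is searched from different locales, with accents, casing or punctuation that differ slightly. One common approach is to build a dedup key from a normalized name plus rounded coordinates. The sketch below shows that idea under those assumptions. The field names `name`, `lat` and `lng` and the 4-decimal precision are hypothetical, not the original pipeline's schema.

```python
import re
import unicodedata
from typing import Dict, Iterable, List


def normalize_name(name: str) -> str:
    # Decompose accented characters, drop the combining marks ("Café" -> "Cafe"),
    # casefold, and collapse punctuation/whitespace runs (Unicode letters are kept).
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[\W_]+", " ", stripped.casefold()).strip()


def dedup_key(listing: dict, precision: int = 4) -> tuple:
    # 4 decimal places of latitude is roughly 11 m; nearby points can still fall
    # on opposite sides of a rounding boundary, so this is a heuristic.
    return (
        normalize_name(listing["name"]),
        round(listing["lat"], precision),
        round(listing["lng"], precision),
    )


def dedup(listings: Iterable[dict]) -> List[dict]:
    seen: Dict[tuple, dict] = {}
    for listing in listings:
        seen.setdefault(dedup_key(listing), listing)  # keep the first occurrence
    return list(seen.values())


if __name__ == "__main__":
    rows = [
        {"name": "Café Rouge", "lat": 51.50740, "lng": -0.12780},
        {"name": "CAFE ROUGE", "lat": 51.50741, "lng": -0.12781},
        {"name": "Pizza Place", "lat": 51.5, "lng": -0.1},
    ]
    print(len(dedup(rows)))
```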
Hard problems
- Keeping up with LinkedIn's evolving anti-bot measures across multiple quarters
- Keeping extractors resilient to DOM changes without accumulating fragile XPath selectors
- Verifying email validity at scale without burning sender reputation
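One general way to avoid a pile of brittle XPath expressions is to give each field an ordered list of selectors. Stable semantic hooks such as schema.org `itemprop` or `data-*` attributes come first, and layout-specific class names serve only as fallbacks. Fields that no selector matches are reported rather than silently emitted as blanks.

The sketch below shows that idea using only the standard library's `html.parser`. It is an illustration, not the original extractors. The class name `biz-title` and the `data-field` attribute are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}
Selector = Callable[[Dict[str, str]], bool]


class ElementIndex(HTMLParser):
    """Records (tag, attrs, text chunks) for every element: a tiny stand-in for a DOM."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Tuple[str, Dict[str, str], List[str]]] = []
        self._open: List[int] = []  # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, {k: v or "" for k, v in attrs}, []))
        if tag not in VOID_ELEMENTS:  # void elements never receive text
            self._open.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        # Close back to the most recent matching open tag (tolerates sloppy HTML).
        for i in range(len(self._open) - 1, -1, -1):
            if self.elements[self._open[i]][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        for idx in self._open:  # text counts toward every enclosing element
            self.elements[idx][2].append(data)


def attr_equals(name: str, value: str) -> Selector:
    return lambda attrs: attrs.get(name) == value


def has_class(cls: str) -> Selector:
    return lambda attrs: cls in attrs.get("class", "").split()


# Selectors in priority order: semantic hooks first, layout classes last (hypothetical rules).
FIELD_RULES: Dict[str, List[Selector]] = {
    "name": [attr_equals("itemprop", "name"), has_class("biz-title")],
    "phone": [attr_equals("itemprop", "telephone"), attr_equals("data-field", "phone")],
}


def first_match(index: ElementIndex, selectors: List[Selector]) -> Optional[str]:
    for selector in selectors:
        for _tag, attrs, chunks in index.elements:
            if selector(attrs):
                text = " ".join("".join(chunks).split())
                if text:
                    return text
    return None


def extract(html: str, rules: Dict[str, List[Selector]] = FIELD_RULES):
    index = ElementIndex()
    index.feed(html)
    index.close()
    record, missing = {}, []
    for field_name, selectors in rules.items():
        value = first_match(index, selectors)
        if value is None:
            missing.append(field_name)  # surface breakage instead of emitting blanks
        else:
            record[field_name] = value
    return record, missing
```

When a site redesign removes the class names but keeps the `itemprop` markup, the same rules still find the name. The `missing` list makes a broken field visible right away.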
Tech stack
Python, Node.js, Selenium, Puppeteer, Scrapy, MongoDB, Redis, Express
Tags
Web scraping, Distributed systems, Anti-bot, Async, Data pipelines
The source code is not publicly available. I'm happy to walk through the architecture or specific modules on a call; get in touch.