arnabpal.me
EvenRank — Distributed Web Crawlers at Scale
Past: EvenRank Data Science (archived)

Early-career work at EvenRank Data Science (2019): distributed web crawlers for LinkedIn profiles, Google Maps listings, and email discovery, with robust anti-bot handling, proxy rotation, and a MongoDB-backed job queue.

Backend Engineer (2019)
Jan 2019 – Dec 2019

What I built

  • Async Python crawler framework with coroutine-based scheduling
  • LinkedIn scraper (Selenium CLI + API) with login and session rotation
  • Google Maps business-listings extractor with de-dup across locales
  • Mail scraper with MX-record lookup + SMTP verification pipeline
  • Proxy rotation and user-agent fingerprinting across 500+ concurrent workers
  • MongoDB-backed distributed job queue with retry and dead-letter handling
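
The retry and dead-letter pattern in the last bullet can be illustrated with a small in-memory sketch. This is not the EvenRank code, which is not public: `asyncio.Queue` stands in for the MongoDB-backed queue, and `Job`, `MAX_ATTEMPTS`, `worker`, and `run_jobs` are hypothetical names chosen for this example.

```python
import asyncio
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed retry budget, not taken from the original system


@dataclass
class Job:
    url: str
    attempts: int = 0


async def worker(queue, fetch, done, dead_letter):
    """Pull jobs forever; retry failures, then park them in dead_letter."""
    while True:
        job = await queue.get()
        try:
            job.attempts += 1
            done[job.url] = await fetch(job.url)
        except Exception:
            if job.attempts < MAX_ATTEMPTS:
                queue.put_nowait(job)      # re-queue for another attempt
            else:
                dead_letter.append(job)    # give up; keep for inspection
        finally:
            queue.task_done()


async def run_jobs(urls, fetch, concurrency=4):
    """Run fetch over urls with a fixed pool of coroutine workers."""
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(Job(url))
    done, dead_letter = {}, []
    workers = [
        asyncio.create_task(worker(queue, fetch, done, dead_letter))
        for _ in range(concurrency)
    ]
    await queue.join()                     # wait until every job is settled
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return done, dead_letter
```

Calling `asyncio.run(run_jobs(urls, fetch))` with any coroutine function `fetch(url)` returns the successful results keyed by URL plus the list of jobs that exhausted their attempts. A persistent queue would also need job leasing and visibility timeouts so that a crashed worker does not lose its job, which this sketch leaves out.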

Hard problems

  • Staying ahead of LinkedIn's anti-bot evolution across multiple quarters
  • Keeping extractors resilient to DOM changes without a fragile XPath soup
  • Verifying email validity at scale without burning sender reputation
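
One common way to approach the DOM-resilience problem is an ordered chain of extraction strategies, where the first strategy that yields a value wins and the winning strategy's name is recorded so that drift becomes visible. The sketch below only illustrates that idea and is not the approach confirmed by this project: the strategy names and patterns are hypothetical, and regular expressions stand in for a real HTML parser to keep the example self-contained.

```python
import re
from typing import Callable, List, Optional, Tuple

Extractor = Callable[[str], Optional[str]]


def by_og_title(html: str) -> Optional[str]:
    # Metadata tags tend to change less often than page layout.
    m = re.search(r'<meta\s+property="og:title"\s+content="([^"]*)"', html)
    return m.group(1) if m else None


def by_h1(html: str) -> Optional[str]:
    # Fallback: the first <h1> on the page.
    m = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.S)
    return m.group(1).strip() if m else None


# Ordered from most to least stable (an assumption for this example).
STRATEGIES: List[Tuple[str, Extractor]] = [
    ("og_title", by_og_title),
    ("h1", by_h1),
]


def extract_first(html: str, strategies=STRATEGIES):
    """Return (strategy_name, value) for the first strategy that matches."""
    for name, fn in strategies:
        value = fn(html)
        if value:
            return name, value
    return None, None
```

Logging the returned strategy name per page makes it possible to notice when a preferred strategy stops matching and the fallback starts carrying the load, which is usually a sign that the markup has changed.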

Tech stack

Python, Node.js, Selenium, Puppeteer, Scrapy, MongoDB, Redis, Express

Tags

Web-scraping, Distributed systems, Anti-bot, Async, Data pipelines

The source code is not publicly available. Happy to walk through the architecture or specific modules on a call; get in touch.

© 2026 Arnab Pal. All rights reserved.