
Why Serverless (Lambda, Cloud Functions) Is Overkill for Most API Workers

Lambda and Cloud Functions solve for stateless compute at arbitrary scale. For data extraction and enrichment workers, their complexity and cold start penalties are rarely worth it.

Seek API Team

AWS Lambda is remarkable technology. The ability to run arbitrary code without provisioning or managing servers, scaling automatically from zero to thousands of concurrent executions, billed to the millisecond — it’s a genuine engineering achievement.

The question isn’t whether Lambda works. It’s whether Lambda is the right abstraction for data extraction and enrichment workers. For most teams, it isn’t.

What serverless is actually good at

AWS Lambda and Google Cloud Functions shine for:

  • Event-driven compute: An image is uploaded → trigger a function to resize it
  • API backend handlers: Stateless request → response without sustained traffic
  • Lightweight transformations: Parse a webhook payload and route to a queue
  • Burst workloads: Unpredictable traffic spikes that don’t justify reserved capacity

The Lambda model works when:

  1. Your function is stateless (no side effects beyond the return value)
  2. Cold starts are acceptable (or mitigated by provisioned concurrency)
  3. External dependencies are minimal
  4. Execution time is short (under 15 minutes for Lambda)
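As a concrete illustration of criterion 1, here is a minimal stateless handler in the Lambda style. The event types and queue names are hypothetical; the point is that the function's entire effect is its return value:

```python
import json

def handler(event, context):
    """Minimal stateless Lambda-style handler: parse a webhook
    payload and return a routing decision. No side effects beyond
    the return value, so it fits the serverless model well."""
    payload = json.loads(event.get("body", "{}"))
    # Route by event type; unknown types fall through to a default queue.
    queue = {
        "image.uploaded": "resize-queue",
        "user.signed_up": "welcome-queue",
    }.get(payload.get("type"), "default-queue")
    return {"statusCode": 200, "body": json.dumps({"queue": queue})}
```

Everything the function needs arrives in `event`; everything it produces leaves in the return value. That property is what makes scale-to-zero and per-request billing work cleanly.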

Why Lambda struggles with scraping workers

Cold starts with heavy dependencies

A web scraping worker typically needs Playwright, Chromium, and related libraries. The Chromium binary alone is ~200MB. Adding it to a Lambda deployment package or container image means cold start times of 3–8 seconds before any business logic runs.

For a worker called infrequently, this is fine. For a worker processing 100 jobs in quick succession, you’re paying 3–8 seconds of overhead per cold start — or you pay for provisioned concurrency ($50–$200/month per function) to keep instances warm.

Memory requirements

A headless Chromium instance requires 1–2 GB of RAM to run reliably. Lambda pricing is memory × time:

  • 1024 MB × 30 seconds = 30 GB-seconds ≈ $0.0005 per execution
  • 2048 MB × 30 seconds = 60 GB-seconds ≈ $0.001 per execution

At 100,000 executions/month, that’s roughly $50–$100 in raw compute. Substantial on its own, and the true cost also includes:

  • Cold start mitigation (provisioned concurrency)
  • Container image storage
  • VPC configuration for proxy access
  • Monitoring and alerting infrastructure
  • Development overhead
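The raw compute portion of that bill can be estimated from AWS's published x86-64 rates (about $0.0000166667 per GB-second plus $0.20 per million requests at the time of writing; check the current pricing page). A small sketch:

```python
GB_SECOND_RATE = 0.0000166667    # AWS Lambda x86-64, USD per GB-second
REQUEST_RATE = 0.20 / 1_000_000  # USD per invocation

def monthly_compute_cost(memory_mb, seconds, executions):
    """Estimate raw Lambda compute cost: memory × duration × rate,
    plus the per-request charge. Excludes provisioned concurrency,
    NAT gateway, image storage, and data transfer."""
    gb_seconds = (memory_mb / 1024) * seconds * executions
    return gb_seconds * GB_SECOND_RATE + executions * REQUEST_RATE

# A 2 GB scraping worker running 30 s, 100k times a month:
print(round(monthly_compute_cost(2048, 30, 100_000), 2))  # prints 100.02
```

The surrounding line items (NAT gateway, provisioned concurrency, storage) are flat or step-function costs, so at low volume they can easily exceed the metered compute itself.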

15-minute execution limit

Lambda’s maximum execution time is 15 minutes. For most scraping tasks this isn’t a constraint. But for workers that process multi-page documents, handle complex navigation flows, or need to wait for anti-bot challenges to clear, hitting the limit kills the invocation mid-job. Unless you detect the timeout and re-queue the remaining work, it simply vanishes from the pipeline.

Google Cloud Functions (2nd gen) raises this to 60 minutes for HTTP-triggered functions, which is better. But the 15-minute Lambda limit is a gotcha that tends to reveal itself only in production.
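One way to avoid being killed mid-job is a soft deadline: the Lambda context object exposes `get_remaining_time_in_millis()`, so a worker can checkpoint and hand back unfinished work before the hard cutoff. A sketch, with the page-processing step stubbed out:

```python
SAFETY_MARGIN_MS = 60_000  # stop a full minute before the hard limit

def process_pages(pages, context):
    """Process as many pages as the time budget allows, returning
    unfinished work so the caller can re-queue it instead of
    letting Lambda kill the function mid-job."""
    done = []
    for i, page in enumerate(pages):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return done, pages[i:]  # checkpoint: hand back the remainder
        done.append(page.upper())   # stand-in for real extraction work
    return done, []
```

The caller then pushes the returned remainder back onto the queue, which is more plumbing to build and test, and exactly the kind of overhead the next section tallies up.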

VPC + proxy complexity

Serious scraping requires proxy rotation. Lambda running inside a VPC needs a NAT gateway to access the internet — which costs ~$32/month in hourly charges alone, before per-GB data processing fees. Configuring proxy rotation through Lambda requires either:

  • A proxy provider accessible over HTTPS (simple but expensive per request)
  • A proxy pool running on dedicated infrastructure (negates “serverless” simplicity)
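For illustration, a minimal rotation scheme can be built from the standard library alone; the proxy endpoints below are hypothetical placeholders for whatever pool you run or rent:

```python
import itertools
import urllib.request

# Hypothetical proxy pool; a real one comes from a provider or a
# self-hosted fleet.
PROXY_POOL = [
    "http://proxy-1.example.com:8080",
    "http://proxy-2.example.com:8080",
    "http://proxy-3.example.com:8080",
]
_rotation = itertools.cycle(PROXY_POOL)

def opener_for_next_proxy():
    """Build a urllib opener routed through the next proxy in the
    pool. Calling this once per job spreads requests across exits."""
    proxy = next(_rotation)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler), proxy
```

Note what this sketch omits: health-checking dead proxies, per-target stickiness, and credentials. Each of those is more code you own, whichever of the two options above you pick.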

Operational overhead

Building a “scraping Lambda” still requires:

  • Container image with Chromium
  • IAM roles and policies
  • CloudWatch logging configuration
  • Error handling and retry logic
  • Dead letter queues for failed invocations
  • VPC configuration if using proxies
  • A queue (SQS) if processing batches asynchronously

This is not “zero ops.” This is a moderate amount of ops, paid in setup time and architectural complexity even if not in server management.
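As one example of the retry and dead-letter plumbing, an SQS-triggered handler can use SQS's partial batch response contract (`ReportBatchItemFailures`) so that only failed messages are retried rather than the whole batch. The `run_job` step is a hypothetical stand-in for the actual scrape:

```python
import json

def handler(event, context):
    """SQS-triggered batch handler using partial batch responses:
    failed messages are reported individually so SQS retries only
    those, and eventually routes them to the dead letter queue."""
    failures = []
    for record in event["Records"]:
        try:
            job = json.loads(record["body"])
            run_job(job)  # stand-in for the actual scrape/enrich step
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}

def run_job(job):
    # Hypothetical worker step; raises to simulate a failed job.
    if job.get("fail"):
        raise RuntimeError("job failed")
```

This only behaves as described if the event source mapping has `ReportBatchItemFailures` enabled; otherwise any raised exception causes the entire batch to be retried.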

The Seek API worker model vs Lambda

When you use a Seek API worker instead of a Lambda function:

| Concern | Lambda | Seek API Worker |
|---|---|---|
| Cold start | 3–8s (unless provisioned) | None (platform manages warmth) |
| Memory config | You choose, you pay | Managed |
| Proxy setup | VPC + NAT or external service | Included |
| Chromium bundling | You manage | Included |
| Retry logic | You build | Platform-provided |
| Anti-bot updates | Your responsibility | Worker maintainer’s responsibility |
| IAM/security config | You configure | N/A |
| Monitoring | CloudWatch config | Dashboard included |

The trade: you lose flexibility (you can’t run arbitrary code) but you gain operational simplicity and zero infrastructure management.

When Lambda is still the right choice

Lambda makes sense for worker infrastructure when:

  1. Proprietary logic: The worker does something that no managed platform covers, and you need to run it on your own infrastructure
  2. Data residency: Compliance requires data never leaves your AWS account
  3. Tight integration with existing AWS services: If your data pipeline is deeply embedded in AWS (S3, RDS, SQS), keeping the worker in Lambda reduces latency and cross-service data transfer costs
  4. Extreme scale with cost optimization: At millions of executions/month, a highly optimized Lambda function can be cheaper than per-job pricing

The architectural principle

Lambda solves for: “I need to run arbitrary code reactively without managing servers.”

Seek API workers solve for: “I need structured data from a source without maintaining extraction infrastructure.”

These are different problems. Lambda is compute infrastructure. Workers are data infrastructure. For data extraction and enrichment — the vast majority of API worker use cases — the worker platform model is simpler, cheaper to operate, and ready to use immediately.

Use Lambda for the glue between systems. Use workers for the data acquisition. Combine them as needed.