AI pipelines are slow. There’s no way around it. Whether you’re calling GPT-4o, running a multi-step RAG pipeline, or processing a batch of documents, the execution time is measured in seconds — and often tens of seconds.
This creates a fundamental mismatch with the standard HTTP request/response model. Most web APIs are designed to respond in milliseconds. Your frontend, your orchestrator, your serverless function — they all have timeouts that were set before AI workloads existed.
In this article, we’ll look at why synchronous AI pipelines fail at scale, what the async job queue pattern looks like, and how to implement it cleanly on Seek API.
The synchronous API anti-pattern
Here’s what a naive synchronous AI pipeline looks like:
// Client
const result = await fetch('/api/summarize', {
method: 'POST',
body: JSON.stringify({ url }),
});
const data = await result.json();
// → This hangs for 15 seconds. Then times out.
The problems compound quickly:
Client-side timeouts. Browser fetch defaults to no timeout, but many HTTP clients default to 30s. Your AI pipeline routinely exceeds that.
Server timeouts. AWS API Gateway hard-caps every request at 29 seconds, no matter how long your Lambda is allowed to run. Serverless platforms like Vercel impose similar limits. Many reverse proxies time out at 60-90 seconds. Your pipeline doesn't care.
Retry storms. When a request times out, clients retry. Three retries on a 15-second operation mean 45 seconds of duplicate work on top of the original call, and you pay the LLM API bill four times over.
Poor user experience. The user sees a spinner for 15 seconds, then an error. There’s no progress, no feedback, no way to check later.
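Even a deliberately generous client-side timeout doesn't rescue the synchronous design; it just turns a hang into an abort, and the server-side work (and LLM spend) still happens. A minimal sketch against the hypothetical `/api/summarize` endpoint from above; the `fetchImpl` parameter is an assumption added here purely so the sketch can be exercised with a mock:

```javascript
// Synchronous call with an explicit client-side timeout. If the pipeline
// takes longer than timeoutMs, the request aborts; the work is wasted anyway.
async function summarizeSync(url, { timeoutMs = 30_000, fetchImpl = fetch } = {}) {
  const res = await fetchImpl('/api/summarize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url }),
    // Standard way to abort a fetch after a deadline (client-side only).
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

In the browser you would call it with the defaults; the injectable fetch only exists for testing.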
The async job pattern
The solution is to decouple job submission from job completion. This is a 30-year-old pattern from message queues, applied to AI APIs:
Client                              Server
  |                                   |
  |-- POST /jobs ------------------→  |   (submit, returns instantly)
  |←- 200 { job_uuid } -------------  |
  |                                   |
  |                           [work happening]
  |                                   |
  |-- GET /jobs/{uuid} ------------→  |   (poll, ~2s later)
  |←- 200 { status: "pending" } ----  |
  |                                   |
  |-- GET /jobs/{uuid} ------------→  |   (poll again)
  |←- 200 { status: "completed", result: {...} }
The key insight: submission and execution are separate concerns. The submission endpoint returns instantly. The work happens asynchronously. The client polls until ready.
This maps perfectly to how AI APIs actually behave:
- Bounded latency: Each poll returns in milliseconds. No request ever hangs for 15 seconds.
- Natural retry behavior: If a poll fails, retry the poll — not the expensive work.
- Progress visibility: You can return intermediate status (pending → processing → completed).
- Fire-and-forget: Clients can submit and check back later, even from a mobile app with a spotty connection.
Implementing async jobs with Seek API
On Seek API, every worker call is async by default. Here’s the full flow:
1. Submit the job
curl -X POST https://api.seek-api.com/v1/workers/gpt-summarizer/jobs \
-H "X-Api-Key: sk_prod_xxxx" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/long-article",
"maxLength": 500
}'
Response (immediate, ~50ms):
{
"job_uuid": "job_f3a2b1c4d5e6",
"worker_id": "gpt-summarizer",
"status": "PENDING",
"submitted_at": "2026-03-05T10:24:39.483Z"
}
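The JavaScript examples later in this article use a small `submitJob` helper that wraps this call. A minimal sketch, with the URL and headers mirroring the curl example above; the `fetchImpl` parameter is an assumption added here only so the helper can be exercised with a mock fetch:

```javascript
// Submit a job to a worker. Resolves with the immediate submission response
// ({ job_uuid, worker_id, status, submitted_at }), not the job result.
async function submitJob(workerId, input, apiKey, fetchImpl = fetch) {
  const res = await fetchImpl(
    `https://api.seek-api.com/v1/workers/${workerId}/jobs`,
    {
      method: 'POST',
      headers: { 'X-Api-Key': apiKey, 'Content-Type': 'application/json' },
      body: JSON.stringify(input),
    }
  );
  if (!res.ok) throw new Error(`Job submission failed: HTTP ${res.status}`);
  return res.json();
}
```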
2. Poll for completion
curl https://api.seek-api.com/v1/jobs/job_f3a2b1c4d5e6 \
-H "X-Api-Key: sk_prod_xxxx"
After 2-3 seconds, you’ll get:
{
"job_uuid": "job_f3a2b1c4d5e6",
"status": "completed",
"duration_ms": 2340,
"cost_usd": 0.003,
"response_json": {
"title": "The Future of Infrastructure",
"summary": "A deep analysis of how serverless computing is reshaping...",
"keyPoints": ["Point 1", "Point 2", "Point 3"],
"wordCount": 3420
}
}
3. Handle the result
In your application, you’d poll with exponential backoff:
async function pollJob(jobUuid, apiKey, maxAttempts = 30) {
  for (let i = 0; i < maxAttempts; i++) {
    const res = await fetch(`https://api.seek-api.com/v1/jobs/${jobUuid}`, {
      headers: { 'X-Api-Key': apiKey },
    });
    if (!res.ok) throw new Error(`Polling failed: HTTP ${res.status}`);
    const job = await res.json();
    if (job.status === 'completed') return job.response_json;
    if (['failed', 'timeout', 'cancelled'].includes(job.status)) {
      throw new Error(job.error ?? `Job ended with status: ${job.status}`);
    }
    // Exponential backoff: 1s, 2s, 4s, then capped at 8s
    const delay = Math.min(1000 * Math.pow(2, i), 8000);
    await new Promise((r) => setTimeout(r, delay));
  }
  throw new Error('Job still not finished after polling');
}
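One refinement worth knowing about: when many clients poll the same API on identical backoff schedules, their requests cluster into synchronized bursts. Adding random jitter to each delay (the "full jitter" variant) spreads them out. This is a general polling hygiene technique, not something Seek API requires; the `random` parameter is injectable only to make the sketch deterministic in tests:

```javascript
// Full jitter: pick a uniform random delay between 0 and the capped
// exponential value. Drop-in replacement for the delay computation above.
function backoffDelay(attempt, { baseMs = 1000, capMs = 8000, random = Math.random } = {}) {
  const exp = Math.min(baseMs * Math.pow(2, attempt), capMs);
  return Math.floor(random() * exp);
}
```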
Multi-step AI pipelines
The async job model becomes even more powerful for multi-step pipelines, where the output of one job feeds the input of the next.
Here’s a real-world example: a lead enrichment pipeline that:
- Scrapes a company website
- Extracts key information
- Runs an AI analysis
async function enrichLead(websiteUrl, apiKey) {
  // Step 1: Scrape the website
  const scrapeJob = await submitJob('website-scraper', { url: websiteUrl }, apiKey);
  const scrapeResult = await pollJob(scrapeJob.job_uuid, apiKey);

  // Step 2: Extract structured data
  const extractJob = await submitJob('data-extractor', {
    html: scrapeResult.html,
    schema: ['company_name', 'industry', 'tech_stack', 'team_size'],
  }, apiKey);
  const extractResult = await pollJob(extractJob.job_uuid, apiKey);

  // Step 3: AI analysis
  const analysisJob = await submitJob('lead-scorer', {
    company: extractResult,
    criteria: ['b2b', 'funded', 'tech-enabled'],
  }, apiKey);
  const analysis = await pollJob(analysisJob.job_uuid, apiKey);

  return { ...extractResult, analysis };
}
Each step runs on a dedicated worker with the right resources. Steps are isolated — a failure in step 2 doesn’t affect the step 1 result. You can retry individual steps without re-running the full pipeline.
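Per-step retries fall out naturally from this structure. A minimal sketch of a step wrapper: `submit` and `poll` are the `submitJob`/`pollJob` helpers used elsewhere in this article, pre-bound to your API key (e.g. `poll = (id) => pollJob(id, apiKey)`); making them injectable is an assumption of this sketch, not part of Seek API. Resubmitting is only safe if the worker is idempotent, as discussed under "Retry-safe workers" below:

```javascript
// Run one pipeline step, resubmitting the job on transient failure.
// Only this step is retried; earlier step results are untouched.
async function runStep(workerId, input, { attempts = 3, submit, poll }) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      const job = await submit(workerId, input);
      return await poll(job.job_uuid);
    } catch (err) {
      lastError = err; // transient failure: try a fresh submission
    }
  }
  throw lastError;
}
```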
When to go parallel
For pipelines where steps are independent, run them in parallel:
async function analyzeMultiplePages(urls, apiKey) {
  // Submit all jobs simultaneously
  const jobs = await Promise.all(
    urls.map((url) => submitJob('page-analyzer', { url }, apiKey))
  );
  // Poll them all in parallel
  const results = await Promise.all(
    jobs.map((job) => pollJob(job.job_uuid, apiKey))
  );
  return results;
}
You get near-constant wall-clock time: analyzing 10 URLs in parallel takes roughly as long as analyzing one, so throughput scales almost linearly with batch size.
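One caveat: `Promise.all` over hundreds of URLs opens hundreds of concurrent submissions at once, which can trip rate limits. For large batches you may want to cap concurrency. A minimal worker-pool sketch with no external dependencies:

```javascript
// Map `fn` over `items` with at most `limit` invocations in flight.
// Results come back in input order.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  // Each "worker" pulls the next index until the list is exhausted.
  // Reading and incrementing `next` is synchronous, so there is no race.
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  });
  await Promise.all(workers);
  return results;
}
```

With the job helpers it reads like: `mapWithConcurrency(urls, 5, async (url) => { const job = await submitJob('page-analyzer', { url }, apiKey); return pollJob(job.job_uuid, apiKey); })`.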
Handling failures gracefully
Async jobs fail. Networks fail. LLM APIs hit rate limits. Your pipeline needs to handle this cleanly.
Categories of failure
| Status | Meaning | Action |
|---|---|---|
| failed | Job threw an error | Check the error field. Retry if transient. |
| timeout | Job exceeded its timeout | Increase the timeout or optimize the worker. |
| cancelled | Manually cancelled | Resubmit if needed. |
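One way to act on this table is a small classifier that decides whether a terminal job is worth resubmitting. The error-string patterns below are illustrative assumptions, not documented Seek API messages; treating a timed-out job as non-retryable by default is also a judgment call (resubmitting an already-too-slow job usually just burns budget):

```javascript
// Decide whether a terminal job status warrants a resubmission.
// Case-insensitive on status; only transient-looking failures retry.
function isRetryable(job) {
  switch ((job.status || '').toLowerCase()) {
    case 'failed':
      // Hypothetical transient markers: rate limits, 5xx, overload, timeouts.
      return /rate.?limit|timeout|temporar|overloaded|5\d\d/i.test(job.error ?? '');
    case 'timeout':
    case 'cancelled':
    default:
      return false;
  }
}
```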
Retry-safe workers
Design your workers to be idempotent — running them twice with the same input should produce the same output. This makes retries safe:
// `cache` is any shared key-value store (e.g. a Redis client); `hashInput`
// must be a stable hash so identical inputs map to the same key.
export const handler = async (input) => {
  // Check cache first (using the input hash as the key)
  const cacheKey = hashInput(input);
  const cached = await cache.get(cacheKey);
  if (cached) return cached;
  // Do the expensive work only on a cache miss
  const result = await doExpensiveWork(input);
  // Cache the result for an hour so retries return instantly
  await cache.set(cacheKey, result, { ttl: 3600 });
  return result;
};
Webhooks vs polling
Polling is simple and universally compatible. But for long-running jobs (5-60 minutes), you may prefer webhooks to avoid polling loops.
Seek API will support webhook callbacks on job completion (planned for Q2 2026):
{
"url": "https://example.com/webhook",
"maxResults": 500,
"__callback_url": "https://your-server.com/hooks/job-done"
}
For now, polling with exponential backoff is the recommended approach and works well for jobs up to a few minutes.
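If you plan ahead for webhooks, the main rule is: acknowledge fast, process later. Since the feature isn't released, the payload shape below is an assumption (mirroring the job object returned by polling); the handler is deliberately framework-agnostic so you can mount it on Express, Fastify, or anything else:

```javascript
// Framework-agnostic webhook receiver. Returns { status, body } so it can
// be adapted to any HTTP framework's (req, res) handler.
// `onJobDone` is your app's hook; keep it fast or enqueue internally.
function handleJobWebhook(payload, onJobDone) {
  if (!payload || typeof payload.job_uuid !== 'string') {
    return { status: 400, body: { error: 'missing job_uuid' } };
  }
  // Ack immediately; never make the sender wait on downstream work.
  queueMicrotask(() => onJobDone(payload));
  return { status: 200, body: { received: payload.job_uuid } };
}
```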
Summary
Synchronous HTTP and AI pipelines are fundamentally mismatched. The async job pattern (submit, get a UUID, poll) addresses each of these problems:
- No timeouts
- Natural retry behavior
- Fire-and-forget capability
- Linear parallelism
The overhead is minimal (a few extra HTTP calls) and the resilience gains are massive. Whether you’re building a one-off enrichment script or a production AI pipeline processing thousands of documents per day, async jobs are the right default.