Boilerplate: Serverless video processing pipeline on Cloudflare + FFmpeg for microdramas

frees
2026-02-01
12 min read

Deploy a cost-aware serverless video pipeline using Cloudflare Workers, R2 and FFmpeg for microdramas — upload, transcode, thumbnail, and CDN delivery.

Ship microdramas without breaking your budget

If you're building short, episodic video — microdramas, vertical shorts, or fast-turn prototypes — you need a pipeline that is fast to deploy, predictable in cost, and avoids vendor lock‑in. This boilerplate shows how to accept uploads, run FFmpeg-based transcodes and thumbnails on lightweight compute, store originals in Cloudflare R2, and deliver with the Cloudflare CDN — all orchestrated by Cloudflare Workers. It’s optimized for cost-conscious teams using free tiers and tiny instances, and it’s deployable end-to-end.

Why this matters in 2026

Short-form and vertical video exploded in 2024–2026; investors and platforms are doubling down on microdramas and episodic short content. At the same time, edge compute and CDN vendors moved aggressively to reduce egress friction — Cloudflare’s product and platform plays (including recent acquisitions and AI investments) make R2+Workers an increasingly attractive hosting and orchestration surface for media workflows. That means you can get low-latency delivery and low-cost storage while keeping the heavy transcoding compute off the edge, on cheap ephemeral containers.

Architecture overview (most important first)

At a glance, this pipeline follows three principles: keep storage on R2, orchestrate at the edge with Workers, and run FFmpeg on a small, ephemeral runner for heavy CPU work.

  1. Client -> Upload Worker: client uploads file directly through a Cloudflare Worker endpoint, which writes the original to R2.
  2. Job Queue / Webhook: the Worker enqueues a lightweight job (job metadata only) to a secure job endpoint or queue. The job is tiny — path, id, callback token. (A Cloudflare Queues variant is sketched after this list.)
  3. FFmpeg Runner (ephemeral): a tiny container (Fly.io/Railway/Render) pulls the original (via a secure Worker proxy), runs FFmpeg workflows (transcode + thumbnails), and writes outputs back to R2.
  4. Delivery Worker / CDN: a Cloudflare Worker serves media from R2 with cache headers tuned for CDN caching. Optionally use Cloudflare Images for adaptive formats.
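
As a variant of step 2, Cloudflare Queues can carry the job metadata so you do not have to run your own webhook endpoint. A minimal sketch of the producer side; the TRANSCODE_QUEUE binding name and the ES-module syntax are assumptions, not part of the boilerplate below:

// Upload Worker variant with a Queues producer binding named TRANSCODE_QUEUE
export default {
  async fetch(request, env) {
    // ... write the original to R2 exactly as in the Upload Worker below ...
    const job = { id: crypto.randomUUID(), key: 'uploads/<id>/<name>', created: Date.now() };
    await env.TRANSCODE_QUEUE.send(job); // enqueue job metadata only; the file itself stays in R2
    return new Response(JSON.stringify(job), { headers: { 'Content-Type': 'application/json' } });
  }
};

On the consumer side, a small Worker can receive batches and forward them to the runner, since the container itself does not consume from Queues directly in this setup.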

Why run FFmpeg off‑edge?

  • Cloudflare Workers can’t execute native binaries; heavy CPU tasks like AV1 encoding still belong on a containerized runner.
  • Running on tiny ephemeral runners gives predictable billing: you pay for seconds of CPU instead of long-running VMs.
  • This split keeps the hot path (upload, metadata, serving) on the edge for latency and cost benefits.

Cost-aware design patterns

Keep these rules in your build checklist — they minimize surprises as traffic grows.

  • Store originals in R2 and serve derivatives through the Cloudflare CDN to reduce egress charges and improve cache hit rates. See the Zero‑Trust Storage Playbook for governance patterns around bucket access.
  • Avoid full re-encodes per request. Generate a standard set of renditions (e.g., 1080p, 720p, 480p, thumbnail) and use manifest-driven ABR or simple device hints to select a variant (a selection sketch follows this list).
  • Prefer fast presets and CRF-based rate control for speed — use faster encoding and a slightly higher CRF (e.g., CRF 24–28) for prototypes and side projects to reduce CPU time.
  • Throttle uploads and transcodes with queue depth limits and backpressure to avoid burst costs.
  • Use ephemeral runners (scale-to-zero platforms) with per-second billing and a small memory footprint; schedule batch jobs during off-peak if you want lower spot pricing.
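
As a concrete example of the device-hint approach above, a delivery Worker can pick a pre-generated rendition from cheap request signals. A hedged sketch; the derivatives/{id}_{height}.mp4 naming convention is an assumption for illustration:

// Hedged sketch: choose a pre-generated rendition from request signals.
function pickRendition(request) {
  const saveData = request.headers.get('Save-Data') === 'on';
  const ect = request.headers.get('ECT') || ''; // effective connection type client hint, if the client sends it
  if (saveData || ect === 'slow-2g' || ect === '2g') return 480;
  if (ect === '3g') return 720;
  return 1080;
}

// Usage inside a delivery Worker:
//   const key = `derivatives/${id}_${pickRendition(request)}.mp4`;
//   const obj = await BUCKET.get(key);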

Deployable recipe: step-by-step

Below are the minimal, deployable components. Replace placeholders with your domain, account, and secret names.

1) Upload Worker (write to R2)

Goal: accept multipart uploads and write the original file into R2. The Worker also enqueues a job to the transcode endpoint.

// workers/upload/index.js
addEventListener('fetch', event => event.respondWith(handle(event.request)));

async function handle(req) {
  if (req.method !== 'POST') return new Response('Use POST', { status: 405 });

  const form = await req.formData();
  const file = form.get('file');
  if (!file) return new Response('file required', { status: 400 });

  const id = crypto.randomUUID();
  const key = `uploads/${id}/${file.name}`;

  // R2 binding: BUCKET
  await BUCKET.put(key, file.stream(), {
    httpMetadata: { contentType: file.type || 'application/octet-stream' }
  });

  // enqueue job (call an internal webhook with HMAC token)
  const job = { id, key, created: Date.now() };
  await fetch(TRANSCODE_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Signature': hmac(job) },
    body: JSON.stringify(job)
  });

  return new Response(JSON.stringify({ id, key }), { headers: { 'Content-Type': 'application/json' } });
}

function hmac(payload) {
  // simple HMAC helper - use SubtleCrypto
  return 'TODO_SIGNATURE';
}

Notes: bind your R2 bucket to the Worker (BUCKET) and store the webhook URL in an environment variable. Replace the hmac() stub with a real HMAC built on SubtleCrypto to sign job payloads; a sketch follows.
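
A minimal signing/verification pair with SubtleCrypto might look like the following. JOB_SIGNING_SECRET is an assumed Worker secret name, and because SubtleCrypto is async the Upload Worker would call await signHmac(job, JOB_SIGNING_SECRET) rather than the synchronous stub above:

// workers/shared/hmac.js (hedged sketch; JOB_SIGNING_SECRET is an assumed secret binding name)
async function importHmacKey(secret, usage) {
  return crypto.subtle.importKey(
    'raw',
    new TextEncoder().encode(secret),
    { name: 'HMAC', hash: 'SHA-256' },
    false,
    [usage]
  );
}

// Sign a job payload. In the Upload Worker: headers: { 'X-Signature': await signHmac(job, JOB_SIGNING_SECRET) }
async function signHmac(payload, secret) {
  const key = await importHmacKey(secret, 'sign');
  const data = new TextEncoder().encode(JSON.stringify(payload));
  const sig = await crypto.subtle.sign('HMAC', key, data);
  return [...new Uint8Array(sig)].map(b => b.toString(16).padStart(2, '0')).join(''); // hex-encode
}

// Verify a signature before trusting a job (transcode webhook and proxy Worker).
async function verifyHmac(payload, hexSig, secret) {
  const key = await importHmacKey(secret, 'verify');
  const data = new TextEncoder().encode(JSON.stringify(payload));
  const sig = new Uint8Array((hexSig.match(/.{2}/g) || []).map(h => parseInt(h, 16)));
  return crypto.subtle.verify('HMAC', key, sig, data);
}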

2) Transcode runner (ephemeral container)

Goal: receive job webhook, fetch source via a secure Worker-proxy URL, run FFmpeg, and upload outputs back to R2 using the S3-compatible API.

Dockerfile (minimal with static FFmpeg)

FROM debian:bookworm-slim
# nodejs/npm run the runner; xz-utils unpacks the static FFmpeg build
RUN apt-get update && apt-get install -y curl ca-certificates xz-utils nodejs npm \
  && rm -rf /var/lib/apt/lists/*
# Use a small static ffmpeg build (replace with your pinned release or build at CI)
ADD https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz /tmp/
RUN tar -xJf /tmp/ffmpeg-release-amd64-static.tar.xz -C /usr/local/bin --strip-components=1 \
  && chmod +x /usr/local/bin/ffmpeg /usr/local/bin/ffprobe && rm -rf /tmp/*
COPY runner /app
WORKDIR /app
RUN npm install node-fetch@2
CMD ["node", "index.js"]

Node runner (simplified)

// runner/index.js
const http = require('http');
const fs = require('fs');
const { spawn } = require('child_process');
const fetch = require('node-fetch'); // v2 (CommonJS), installed in the Docker image

const PORT = process.env.PORT || 8080;
const PROXY_URL = process.env.R2_PROXY_URL; // Worker endpoint that streams R2 objects with HMAC check

http.createServer(async (req, res) => {
  if (req.method !== 'POST') { res.writeHead(405); return res.end(); }

  let body = '';
  for await (const chunk of req) body += chunk;
  const job = JSON.parse(body);

  // fetch source via the proxy Worker (streams the original from R2)
  const sourceUrl = `${PROXY_URL}?key=${encodeURIComponent(job.key)}&token=${job.id}`;
  const r = await fetch(sourceUrl);
  if (!r.ok) { res.writeHead(502); return res.end('source fetch failed'); }

  // write the source to a local temp file
  const tmpIn = `/tmp/${job.id}.in`;
  const tmpOut = `/tmp/${job.id}.mp4`;
  const outThumb = `/tmp/${job.id}.jpg`;
  const fileStream = fs.createWriteStream(tmpIn);
  await new Promise((resolve, reject) => { r.body.pipe(fileStream).on('finish', resolve).on('error', reject); });

  // run FFmpeg: transcode and thumbnail
  await runFFmpeg(['-i', tmpIn, '-c:v', 'libx264', '-preset', 'fast', '-crf', '24', '-c:a', 'aac', '-b:a', '96k', tmpOut]);
  await runFFmpeg(['-ss', '00:00:02', '-i', tmpIn, '-vframes', '1', '-q:v', '2', outThumb]);

  // upload outputs: POST to an upload Worker endpoint that writes to R2 (keeps secrets in Workers)
  await uploadToWorker(`/upload-derivative?id=${job.id}&type=mp4`, tmpOut);
  await uploadToWorker(`/upload-derivative?id=${job.id}&type=thumb`, outThumb);

  res.writeHead(200);
  res.end('ok');
}).listen(PORT);

function runFFmpeg(args) {
  return new Promise((resolve, reject) => {
    const ff = spawn('ffmpeg', ['-y', ...args]);
    ff.stderr.on('data', d => console.error(d.toString()));
    ff.on('close', code => code === 0 ? resolve() : reject(new Error(`ffmpeg exited with code ${code}`)));
  });
}

async function uploadToWorker(path, filePath) {
  const stats = fs.statSync(filePath);
  const r = await fetch(`${process.env.UPLOAD_WORKER_URL}${path}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/octet-stream', 'Content-Length': String(stats.size) },
    body: fs.createReadStream(filePath)
  });
  if (!r.ok) throw new Error('upload failed');
}

Notes: this runner keeps secrets out of the container by uploading outputs to a secure Worker endpoint (which writes to R2). If you prefer the runner to write directly to R2, use the S3-compatible API with short-lived, scoped credentials (a sketch follows) — but keeping write credentials only in Workers is usually the safer default.
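
If you do go the direct-write route, R2 exposes an S3-compatible endpoint at https://<account_id>.r2.cloudflarestorage.com. A hedged sketch using the AWS SDK v3; the env var names and bucket name in the usage line are assumptions:

// runner/r2-direct.js: hedged sketch for writing a derivative straight to R2 via the S3-compatible API.
// Requires `npm install @aws-sdk/client-s3`.
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
const fs = require('fs');

const s3 = new S3Client({
  region: 'auto',
  endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID,        // prefer short-lived, scoped API tokens
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY
  }
});

async function putDerivative(bucket, key, filePath, contentType) {
  await s3.send(new PutObjectCommand({
    Bucket: bucket,
    Key: key,
    Body: fs.createReadStream(filePath),
    ContentLength: fs.statSync(filePath).size, // declare the length when streaming from disk
    ContentType: contentType
  }));
}

// Usage: await putDerivative('microdrama-derivatives', `derivatives/${job.id}.mp4`, tmpOut, 'video/mp4');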

3) Worker proxy for secure reads

To avoid making the R2 bucket public, provide a Worker endpoint that validates a signed token and streams the result of BUCKET.get() back to the runner. The runner calls this proxy to fetch the original.

// workers/proxy/index.js
addEventListener('fetch', evt => evt.respondWith(handle(evt.request)));
async function handle(req) {
  const url = new URL(req.url);
  const key = url.searchParams.get('key');
  const token = url.searchParams.get('token');
  if (!validate(token, key)) return new Response('unauthorized', { status: 401 });

  const obj = await BUCKET.get(key);
  if (!obj) return new Response('not found', { status: 404 });
  return new Response(obj.body, { headers: { 'Content-Type': obj.httpMetadata.contentType || 'application/octet-stream' } });
}

function validate(token, key) { /* TODO: verify an HMAC over the key (see the verifyHmac sketch above); never ship this always-true stub */ return true; }

4) Serving derivatives via CDN-friendly Worker

Serve files with strong Cache-Control headers and long TTLs. The Worker sets CORS for players and advertises Accept-Ranges; a Range-aware variant is sketched after the code.

// workers/serve/index.js
addEventListener('fetch', evt => evt.respondWith(handle(evt.request)));
async function handle(req) {
  const url = new URL(req.url);
  const key = url.pathname.replace(/^\//, ''); // e.g., derivatives/{id}.mp4

  // Let R2 evaluate conditional headers (If-None-Match, etc.) directly
  const obj = await BUCKET.get(key, { onlyIf: req.headers });
  if (obj === null) return new Response('not found', { status: 404 });

  const headers = new Headers();
  headers.set('Content-Type', (obj.httpMetadata && obj.httpMetadata.contentType) || 'video/mp4');
  headers.set('Cache-Control', 'public, max-age=31536000, immutable');
  headers.set('ETag', obj.httpEtag);
  headers.set('Accept-Ranges', 'bytes');
  // Add CORS for players
  headers.set('Access-Control-Allow-Origin', '*');

  // When the conditional fails (e.g., If-None-Match matches), R2 returns the object without a body
  if (obj.body === undefined) return new Response(null, { status: 304, headers });

  return new Response(obj.body, { headers });
}

This Worker sits behind Cloudflare’s CDN — the first request will hit the edge, and subsequent requests will be cache hits at POPs worldwide.
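
The Worker above always returns the whole object. For scrubbing and partial downloads you can honor Range requests straight from R2. A hedged sketch, assuming R2's get() accepts the request headers for its range option (suffix ranges like bytes=-500 are omitted for brevity):

// workers/serve/range.js: hedged sketch of a Range-aware handler (drop-in for handle() above)
async function handleRange(req) {
  const key = new URL(req.url).pathname.replace(/^\//, '');
  const obj = await BUCKET.get(key, { range: req.headers }); // let R2 parse the Range header
  if (obj === null) return new Response('not found', { status: 404 });

  const headers = new Headers();
  headers.set('Content-Type', (obj.httpMetadata && obj.httpMetadata.contentType) || 'video/mp4');
  headers.set('Cache-Control', 'public, max-age=31536000, immutable');
  headers.set('Accept-Ranges', 'bytes');
  headers.set('Access-Control-Allow-Origin', '*');

  if (obj.range && obj.range.offset !== undefined) {
    // Partial content: report exactly which bytes are being returned
    const start = obj.range.offset;
    const length = obj.range.length !== undefined ? obj.range.length : obj.size - start;
    headers.set('Content-Range', `bytes ${start}-${start + length - 1}/${obj.size}`);
    return new Response(obj.body, { status: 206, headers });
  }
  return new Response(obj.body, { headers });
}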

FFmpeg flags and profiles for microdramas (practical)

Profiles below are tuned for cost and quality. Choose one set for prototypes and upgrade later for production-quality encodes; a helper that maps these profiles onto the runner follows the list.

  • 1080p web (master): ffmpeg -i in.mp4 -c:v libx264 -preset fast -crf 24 -c:a aac -b:a 96k out_1080.mp4
  • 720p mobile: ffmpeg -i in.mp4 -vf scale=-2:720 -c:v libx264 -preset fast -crf 26 -c:a aac -b:a 64k out_720.mp4
  • 480p low: ffmpeg -i in.mp4 -vf scale=-2:480 -c:v libx264 -preset veryfast -crf 28 -c:a aac -b:a 48k out_480.mp4
  • Thumbnail (poster): ffmpeg -ss 00:00:02 -i in.mp4 -vframes 1 -q:v 2 thumb.jpg
  • WebM / AV1 (optional): be aware AV1 is heavier on CPU — only enable for high-value content.
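
To wire these profiles into the runner, one option is a small map from rendition name to FFmpeg argument array so the job handler stays declarative. A hedged sketch; the output naming is an assumption, and runFFmpeg() is the helper from the runner above:

// Hedged sketch: rendition ladder for the runner.
const PROFILES = {
  '1080': ['-c:v', 'libx264', '-preset', 'fast', '-crf', '24', '-c:a', 'aac', '-b:a', '96k'],
  '720':  ['-vf', 'scale=-2:720', '-c:v', 'libx264', '-preset', 'fast', '-crf', '26', '-c:a', 'aac', '-b:a', '64k'],
  '480':  ['-vf', 'scale=-2:480', '-c:v', 'libx264', '-preset', 'veryfast', '-crf', '28', '-c:a', 'aac', '-b:a', '48k']
};

async function transcodeAll(tmpIn, jobId) {
  const outputs = [];
  for (const [name, args] of Object.entries(PROFILES)) {
    const out = `/tmp/${jobId}_${name}.mp4`; // illustrative naming, e.g. /tmp/<id>_720.mp4
    await runFFmpeg(['-i', tmpIn, ...args, out]);
    outputs.push({ name, path: out });
  }
  return outputs;
}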

Operational considerations & observability

Don’t treat this as a simple upload: transcodes are where costs and failures appear.

  • Idempotency: tag jobs by the original's checksum and ensure retries aren't double-processing (a checksum-based sketch follows this list). A one-page stack audit helps when reconciling repeated failures and removing redundant processing steps.
  • Retries and backoff: keep exponential backoff and a DLQ if your runner is overloaded.
  • Metrics: emit timing for upload, fetch, encode CPU seconds, and bytes stored. Log ffmpeg stderr to a centralized store for debugging codec failures.
  • Cost alerts: track R2 storage growth and runner CPU time; set thresholds to switch encodes from AV1 to H.264 if cost spikes.
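
For the idempotency bullet, one lightweight approach inside the Upload Worker is to hash the original and skip the enqueue when a matching derivative already exists. A hedged sketch; the derivatives/{sha256}.mp4 key convention is an assumption, and buffering the upload to hash it is only reasonable for short clips:

// Hedged sketch: checksum-keyed idempotency check in the Upload Worker.
async function alreadyProcessed(fileBuffer) {
  const digest = await crypto.subtle.digest('SHA-256', fileBuffer);
  const sha = [...new Uint8Array(digest)].map(b => b.toString(16).padStart(2, '0')).join('');
  const existing = await BUCKET.head(`derivatives/${sha}.mp4`); // null when nothing has been produced yet
  return { sha, done: existing !== null };
}

// Usage: const { sha, done } = await alreadyProcessed(await file.arrayBuffer());
// if (done) skip the webhook/queue call and return the existing derivative key.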

Security and permissions

  • Keep secrets in Workers — Workers should own R2 write permissions; the runner uploads results through authenticated Worker endpoints to avoid embedding R2 keys in containers. See the Zero‑Trust Storage Playbook for patterns on short-lived credentials.
  • Signed job tokens: use HMAC with a rotating key to validate webhooks between Workers and runners.
  • Least privilege: upload endpoints accept only known content types and enforce file-size limits to prevent abuse (a minimal guard is sketched below).
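
A minimal guard for the last bullet, suitable for the top of the Upload Worker handler; the size cap and type list are assumptions to tune for your content:

// Hedged sketch: reject oversized or unexpected uploads before touching R2.
const MAX_UPLOAD_BYTES = 200 * 1024 * 1024; // assumption: 200 MB cap for raw episodes
const ALLOWED_TYPES = ['video/mp4', 'video/quicktime', 'video/webm'];

function rejectIfInvalid(req, file) {
  const declared = Number(req.headers.get('Content-Length') || 0);
  if (declared > MAX_UPLOAD_BYTES || file.size > MAX_UPLOAD_BYTES)
    return new Response('file too large', { status: 413 });
  if (!ALLOWED_TYPES.includes(file.type))
    return new Response('unsupported media type', { status: 415 });
  return null; // valid: continue with BUCKET.put()
}

// Usage in handle(): const rejected = rejectIfInvalid(req, file); if (rejected) return rejected;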

Scaling patterns

Start serverless and evaluate these upgrades as you need scale:

  • Batch workers — accumulate jobs and run batched FFmpeg processes when concurrency is limited (a dependency-free limiter is sketched after this list).
  • Spot / preemptible runners — use cheaper capacity for non-real-time batches.
  • Edge thumbnailing — for quick low-res previews, use FFmpeg.wasm at the edge (in Workers) for thumbnails only; avoid full transcodes in WASM for now.
  • Adaptive bitrates — pre-generate renditions and use a simple manifest to switch client playback based on bandwidth.
  • AV1 and successor codecs are mainstream — but CPU cost remains high; plan for accelerated encoding or outsource to dedicated hardware for large catalogs.
  • Edge inference for thumbnails and metadata — Cloudflare’s recent moves into AI and data marketplaces (e.g., acquisitions announced late 2025) make it easier to integrate creator metadata and automated tagging at the edge.
  • Egress pricing keeps shifting — CDN providers continue to blur the line between storage and CDN egress pricing; design assuming improved edge-to-storage economics but monitor billing closely.
  • Serverless containers rise — scale-to-zero container platforms are now better for short transcode jobs than large VMs; architect accordingly.
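
For the batch-worker pattern above, a dependency-free concurrency limiter keeps FFmpeg from oversubscribing a tiny instance. A minimal sketch:

// Hedged sketch: run at most `limit` encode tasks at once on a small runner.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];
  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    task().then(resolve, reject).finally(() => { active--; next(); });
  };
  return task => new Promise((resolve, reject) => { waiting.push({ task, resolve, reject }); next(); });
}

// Usage in the runner:
//   const limit = createLimiter(1); // one encode at a time on a 256 MB instance
//   await limit(() => runFFmpeg(['-i', tmpIn, ...args, out]));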

Real-world example: cost-sane microdrama rollout

Scenario: a 6-episode microdrama series; each episode 90s, average 30MB upload.

  1. Store originals in R2 — long-term storage at a low rate relative to compute.
  2. Run a single fast-preset transcode for the initial release: H.264 720p at roughly CRF 24 to reduce CPU time and storage.
  3. Generate one thumbnail per episode with FFmpeg.wasm on the edge for instant UI while full thumbnails come from the runner.
  4. Serve all assets through Cloudflare CDN with long cache TTLs; update cache when new promos are published.

Outcome: predictable, small compute bill for a handful of ephemeral runs and minimized egress when most traffic hits cached assets at the CDN.
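
Back-of-envelope, hedged because bitrates vary: 6 originals × 30 MB ≈ 180 MB, and a single 720p derivative plus a thumbnail per episode keeps total storage comfortably under 1 GB. Six 90-second encodes at a fast x264 preset amount to minutes of CPU on a small instance, so the transcode line item is effectively a rounding error next to storage and delivery.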

Troubleshooting quick hits

  • Uploader fails: confirm Worker receives multipart/form-data and R2.put() has correct Content-Type.
  • Runner can’t fetch source: validate proxy worker token HMAC and verify the proxy’s BUCKET.get() path exists.
  • FFmpeg out-of-memory: reduce runner memory or downscale inputs before encode; consider chunked transcodes for long-form content.
  • Cache not serving: ensure Cache-Control headers are set and that the Worker is routed through Cloudflare's CDN rather than bypassed.

Advanced ideas and next steps

If you graduate from prototype to production, consider:

  • Integrating a small video processing farm with hardware acceleration for AV1/H.265.
  • Implementing multi-cloud runners and a centralized job scheduler to avoid single-provider compute limits.
  • Using Cloudflare Images or Stream for advanced feature sets (adaptive images, on-the-fly transforms) if budget allows; R2+Workers remains flexible and cost-efficient.
  • Adding automated content moderation and AI-based scene tagging using edge AI services — useful for microdrama discovery and recommendation.

Actionable checklist before you deploy

  1. Bind R2 and create the Buckets for originals and derivatives.
  2. Deploy Upload Worker with proper size limits and HMAC signing.
  3. Implement Runner with secure Worker proxy for reads and secure upload endpoints.
  4. Set Cache-Control headers in the Serve Worker and test CDN hits globally.
  5. Instrument metrics for upload count, encode CPU seconds, storage bytes, and bandwidth served.

Wrap-up and call to action

This boilerplate is designed to get a microdrama pipeline live in hours, not weeks: edge orchestration with Workers, durable low-cost storage in R2, and ephemeral FFmpeg runners together balance cost, speed, and quality. Start small with fast presets and a single runner, monitor CPU seconds and storage, and iterate to add AV1 or hardware acceleration only when necessary.

Ready to deploy? Clone the starter repo, wire your Workers to an R2 bucket, and spin a tiny Fly.io or Railway runner. If you want the complete boilerplate (Workers code, Docker runner, CI scripts), grab the repo linked from this article page and follow the Quickstart. Share your results and optimizations — we’ll publish community-tested presets for microdramas in a follow-up.

For teams prototyping microdramas in 2026: ship fast, measure CPU, and cache aggressively.

Next step: deploy the Upload Worker and test with a single 30s clip. If it succeeds, push the job and watch the runner transcode — you’ll have a reliable end-to-end pipeline in under an hour.


Related Topics

#boilerplate #video #serverless

frees

Contributor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.
