Chapter 08 | Traffic Hijacking: Injecting WARP Proxies into Dockerized Firecrawl

7 MIN READ | UPDATED: 2026-06-16
DIRECT SUMMARY // KEY TAKEAWAY

Solve the communication barrier between Docker containers and the host machine. Learn how to use 'host.docker.internal' to route Firecrawl's traffic through your local WARP proxy for stealthy scraping.

8.1 The Communication Barrier Between Docker and Host

We now have two components:

  1. Host machine: A Cloudflare WARP proxy listening on localhost:40000.
  2. Docker container: The Firecrawl service running inside.

The Core Issue: Inside a Docker container, localhost refers to the container itself, not your Mac or Linux host. If we set PROXY_SERVER=socks5://localhost:40000 in the Firecrawl configuration, requests will be sent to the container's own port 40000, resulting in a Connection refused error.

The host.docker.internal Magic

To solve this, Docker provides a special DNS name: host.docker.internal. On Docker Desktop (macOS and Windows), this name resolves from inside any container to an address that reaches the host machine. On Linux it has to be mapped explicitly, as described in the note in 8.2.
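The mapping rule can be made concrete with a small shell helper. to_container_proxy is a hypothetical function (not part of Firecrawl or Docker): it takes a proxy URL as you would write it on the host and rewrites a localhost or 127.0.0.1 host to host.docker.internal. It assumes the form scheme://host[:port] with no user:pass@ part.

```shell
# Hypothetical helper: convert a host-side proxy URL into the form a
# container must use. Assumes scheme://host[:port][/path] with no user:pass@.
to_container_proxy() {
  local url="$1" scheme rest host suffix
  scheme=${url%%://*}          # e.g. socks5
  rest=${url#*://}             # e.g. localhost:40000
  host=${rest%%[:/]*}          # strip :port and any path -> localhost
  suffix=${rest#"$host"}       # whatever followed the host -> :40000
  case "$host" in
    # Inside a container these names point at the container itself
    localhost|127.0.0.1) host="host.docker.internal" ;;
  esac
  printf '%s://%s%s\n' "$scheme" "$host" "$suffix"
}
```

For example, to_container_proxy socks5://localhost:40000 prints socks5://host.docker.internal:40000, while a URL that already names another host comes back unchanged. To see what the name maps to on a Linux host, you can print the hosts file of a throwaway container: docker run --rm --add-host=host.docker.internal:host-gateway alpine cat /etc/hosts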


8.2 Modifying Firecrawl Environment Variables

We need to tell Firecrawl that all outbound scraping requests must go through port 40000 on the host machine.

Open the .env file in your cloned Firecrawl directory and add the following lines at the end:

# /firecrawl/.env

# Global SOCKS5 proxy configuration (pointing to host's WARP proxy)
# Note: Do not use localhost; you must use host.docker.internal
PROXY_SERVER="socks5://host.docker.internal:40000"

# (Optional) Only needed if your proxy requires authentication; the local WARP proxy does not
# PROXY_USERNAME="usr"
# PROXY_PASSWORD="pwd"

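Note: Docker Engine on Linux does not define host.docker.internal by default. If you are running on a Linux server, add an extra_hosts entry (supported since Docker Engine 20.10) to each relevant container in docker-compose.yaml, for example:

services:
  firecrawl-api:
    extra_hosts:
      - "host.docker.internal:host-gateway"

Also note that on Linux, host-gateway resolves to the Docker bridge address rather than the host's loopback interface. A WARP proxy bound only to 127.0.0.1 will not accept connections arriving on that address, so check which address the proxy listens on if you still get Connection refused.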
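Before moving on, it can help to sanity-check the .env file. lint_proxy_env below is a hypothetical helper (not a Firecrawl tool): it finds the uncommented PROXY_SERVER line, strips surrounding quotes, and fails if the host is localhost or a 127.x address. It only understands simple KEY=value lines, not the full dotenv syntax.

```shell
# Hypothetical .env check: fail if PROXY_SERVER is missing or points at
# the container's own loopback. Handles simple KEY=value / KEY="value" lines.
lint_proxy_env() {
  local file="$1" line value rest host
  # Ignore commented-out lines; if the key repeats, check the last one
  line=$(grep -E '^[[:space:]]*PROXY_SERVER=' "$file" | tail -n 1)
  if [ -z "$line" ]; then
    echo "PROXY_SERVER is not set in $file" >&2
    return 1
  fi
  value=${line#*=}
  # Strip one pair of surrounding double or single quotes
  value=${value#\"}; value=${value%\"}
  value=${value#\'}; value=${value%\'}
  rest=${value#*://}          # drop the scheme
  rest=${rest##*@}            # drop user:pass@ if present
  host=${rest%%[:/]*}         # drop :port and any path
  case "$host" in
    "")
      echo "Could not parse a host from PROXY_SERVER=$value" >&2
      return 1 ;;
    localhost|127.*)
      echo "PROXY_SERVER points at $host; use host.docker.internal instead" >&2
      return 1 ;;
  esac
  echo "OK: PROXY_SERVER host is $host"
}
```

Run it from the Firecrawl directory as lint_proxy_env .env; a non-zero exit status means the proxy line needs attention.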


8.3 Dual Injection: Configuring Node.js and Playwright

As analyzed in Chapter 6, Firecrawl has two scraping engines. Modifying .env alone is not enough; we must ensure that docker-compose.yaml passes this proxy environment variable to both critical containers.

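Open docker-compose.yaml to check or modify it. Service and image names may differ between Firecrawl versions; what matters is that every container that performs scraping receives PROXY_SERVER: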

services:
  # 1. Core API service (handles static page scraping)
  firecrawl-api:
    image: mendableai/firecrawl-api:latest
    environment:
      - PROXY_SERVER=${PROXY_SERVER}  # ✅ Ensure this line exists
      # ... other configurations

  # 2. Playwright browser service (handles dynamic JS rendering)
  playwright-service:
    image: mendableai/firecrawl-playwright:latest
    environment:
      - PROXY_SERVER=${PROXY_SERVER}  # ✅ Ensure this line exists
      # ... other configurations
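A missing line here is easy to overlook, because the affected container may simply scrape without the proxy. check_proxy_env is a hypothetical helper (not a Docker or Firecrawl command) that scans the services: section of a compose file and reports whether each named service mentions PROXY_SERVER. It is a plain-text heuristic that assumes two-space indentation for service names, not a real YAML parser.

```shell
# Hypothetical check: does each named service block mention PROXY_SERVER?
# Usage: check_proxy_env FILE SERVICE...   (FILE may be "-" for stdin)
check_proxy_env() {
  local file="$1" svc status=0
  shift
  for svc in "$@"; do
    if awk -v want="$svc" '
      # Any top-level key starts or ends the services: section
      /^[^ #]/ { in_services = ($0 ~ /^services:/); current = ""; next }
      # Service names sit at exactly two spaces of indentation
      in_services && /^  [A-Za-z0-9_.-]+:/ { current = $1; sub(/:$/, "", current); next }
      current == want && /PROXY_SERVER/ { found = 1 }
      END { exit (found ? 0 : 1) }
    ' "$file"; then
      printf 'OK      %s\n' "$svc"
    else
      printf 'MISSING %s\n' "$svc"
      status=1
    fi
  done
  return "$status"
}
```

Running it against the rendered configuration also catches interpolation problems: docker compose config | check_proxy_env - firecrawl-api playwright-service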

After modifying, restart the containers to apply the changes:

docker compose down
docker compose up -d
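Once the containers are back up, you can confirm that the variable actually reached them. verify_container_proxy is a hypothetical sketch (not part of Firecrawl); it relies on the real docker compose exec -T command and assumes each image ships the standard printenv utility.

```shell
# Hypothetical check: compare PROXY_SERVER inside each running service with
# the expected value. Assumes the images provide the printenv utility.
# Usage: verify_container_proxy EXPECTED SERVICE...
verify_container_proxy() {
  local expected="$1" svc actual status=0
  shift
  for svc in "$@"; do
    # -T: no TTY, so the output can be captured cleanly
    actual=$(docker compose exec -T "$svc" printenv PROXY_SERVER 2>/dev/null)
    if [ "$actual" = "$expected" ]; then
      printf 'OK       %s -> %s\n' "$svc" "$actual"
    else
      printf 'MISMATCH %s -> %s\n' "$svc" "${actual:-<unset>}"
      status=1
    fi
  done
  return "$status"
}
```

For the setup in this chapter: verify_container_proxy "socks5://host.docker.internal:40000" firecrawl-api playwright-service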

8.4 Penetration Test: Verifying the Proxy

It's time to verify our hard work. We will have the local Firecrawl scrape a website that displays the client IP (such as httpbin.org/ip or cloudflare.com/cdn-cgi/trace).

Open your terminal and send the following request to test static scraping:

curl -X POST http://localhost:3002/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.cloudflare.com/cdn-cgi/trace",
    "formats": ["markdown"]
  }'

Analysis of Expected Results: If the returned Markdown contains warp=on and an ip= value that differs from your own public IP, the Node.js engine in the API container is successfully using the WARP proxy!
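Reading the result by eye works, but the check is easy to script. The /cdn-cgi/trace page is a list of key=value lines (ip=, loc=, warp= and so on), and in the scrape response they usually arrive inside the JSON markdown string with newlines escaped as \n. trace_field is a hypothetical helper that pulls one key's value out of either form with grep, without a JSON parser. It matches by substring and only captures simple values, so prefer distinctive keys such as warp or ip.

```shell
# Hypothetical helper: print the value of one trace key (warp, ip, loc, ...)
# from text on stdin. Works on raw trace output or on the JSON returned by
# /v1/scrape, where newlines inside "markdown" are escaped as \n.
trace_field() {
  # The value stops at the first character outside letters, digits, _ . : -
  # (so an escaped "\n" or a closing quote ends it)
  grep -oE "$1=[[:alnum:]_.:-]+" | head -n 1 | cut -d= -f2-
}
```

Pipe the scrape response straight in, for example by appending | trace_field warp to the curl command above. For a baseline, compare curl -s --socks5-hostname 127.0.0.1:40000 https://www.cloudflare.com/cdn-cgi/trace | trace_field ip (through WARP, run on the host) with curl -s https://www.cloudflare.com/cdn-cgi/trace | trace_field ip (your real IP).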

Next, test dynamic browser rendering:

curl -X POST http://localhost:3002/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.cloudflare.com/cdn-cgi/trace",
    "formats": ["markdown"],
    "waitFor": 2000
  }'

(Note: waiting for the page requires a JavaScript-capable browser, so a request with waitFor is typically handled by the underlying Playwright engine.)
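Since the two requests differ only in waitFor, both checks can run from one loop. scrape_body and run_both_checks are hypothetical helpers: the first prints the JSON body used above, adding waitFor (in milliseconds) only when a second argument is given; it does no JSON escaping, so it assumes a plain URL without quotes or backslashes. The second assumes Firecrawl is listening on localhost:3002 as in the examples.

```shell
# Hypothetical helper: build the /v1/scrape request body used above.
# $1 = URL (must not contain quotes or backslashes), $2 = optional waitFor in ms
scrape_body() {
  if [ -n "${2:-}" ]; then
    printf '{"url":"%s","formats":["markdown"],"waitFor":%d}\n' "$1" "$2"
  else
    printf '{"url":"%s","formats":["markdown"]}\n' "$1"
  fi
}

# Run the static and the waitFor (browser) variants back to back and show
# the warp= value each one returns. Assumes Firecrawl on localhost:3002.
run_both_checks() {
  local wait
  for wait in "" 2000; do
    printf 'waitFor=%s: ' "${wait:-none}"
    curl -s -X POST http://localhost:3002/v1/scrape \
      -H "Content-Type: application/json" \
      -d "$(scrape_body https://www.cloudflare.com/cdn-cgi/trace "$wait")" \
      | grep -oE 'warp=[a-z]+' | head -n 1
  done
}
```

Calling run_both_checks after the containers are up should print a warp=on line for each variant if both engines go through the proxy.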

If it still returns warp=on, congratulations! The entire Firecrawl cluster now sends its scraping traffic out through Cloudflare's edge network instead of your own IP. And because WARP runs as a local proxy, only traffic explicitly sent to port 40000 is tunneled; all other software on your computer is unaffected!


8.5 Chapter Review

  1. Why can't you set the proxy address to socks5://127.0.0.1:40000 in the Docker container configuration?
  2. What happens if you don't inject the PROXY_SERVER environment variable into the playwright-service container?
  3. If you want to temporarily disable the proxy for a specific scraping task, does the Firecrawl API support overriding the environment variable in the request body?