7.1 Why Self-Host Firecrawl?
While Firecrawl offers a convenient cloud API (500 free requests/month), self-hosting is usually the more practical choice for large-scale data acquisition:
- Bypass Request Limits: The self-hosted version is completely free with no request limits.
- Data Privacy and Security: Sensitive business data never passes through third-party servers.
- Greater Control: You can customize underlying Playwright settings or, as shown in this tutorial, force traffic through your own WARP proxy pool.
7.2 Firecrawl's Docker Architecture
Firecrawl is not a single service; it's a microservice architecture composed of multiple components. Understanding its internal structure helps us inject proxy configurations later.
graph TD
User[Initiates API Request] --> API["firecrawl-api (Node.js/Express)"]
API -->|Cache / Queue Tasks| Redis[(Redis)]
API -->|Directly Scrapes HTML| Target[Target Website]
API -->|JS Rendering Tasks sent to Headless Browser| PW[playwright-service]
PW -->|Drives Browser Instance| Browser((Chromium))
Browser -->|Scrapes Dynamic Content| Target
subgraph Docker Compose Network
API
Redis
PW
end
As shown in the diagram, when scraping static pages, firecrawl-api issues the HTTP request directly. However, when formats: ["screenshot"] is requested or a single-page application (SPA) that requires JS rendering is encountered, the task is handed over to playwright-service.
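This means that for every outbound request to go through the proxy, both containers (firecrawl-api and playwright-service) must be configured to use it. Proxying only one of them leaves the other exposed to anti-scraping systems.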
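The routing described above can be observed from the client side by sending two requests that differ only in the requested format. The following is a minimal sketch, assuming the v1 scrape endpoint on localhost:3002 used later in this chapter accepts a JSON body with url and formats fields. The names scrape_payload and scrape are hypothetical helpers, and example.com is a placeholder target.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Firecrawl itself.

# Build a JSON request body. $1 = target URL, $2 = requested format.
# Values are inserted verbatim, so they must not contain double quotes.
scrape_payload() {
  printf '{"url":"%s","formats":["%s"]}' "$1" "$2"
}

# POST the body to a local instance (assumed to listen on port 3002).
scrape() {
  curl -s -X POST "http://localhost:3002/v1/scrape" \
    -H "Content-Type: application/json" \
    -d "$(scrape_payload "$1" "$2")"
}

# Usage (requires the stack to be running):
#   scrape "https://example.com" markdown     # may be fetched by firecrawl-api directly
#   scrape "https://example.com" screenshot   # handed to playwright-service
```

Watching the logs of both containers while running the two calls is one way to check which component actually contacts the target site.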
7.3 Basic Installation: Pull and Start
Step 1: Clone the Repository
git clone https://github.com/mendableai/firecrawl.git
cd firecrawl
Step 2: Configure Environment Variables
Copy the environment configuration file. The default settings are sufficient for running the service locally.
cp .env.example .env
Note: The default self-hosted version does not require an API key (the USE_DB_AUTHENTICATION=true setting in .env is commented out). All requests sent to localhost:3002 are allowed by default.
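To confirm that authentication really is disabled in your copy of .env, a quick check like the one below can help. This is a sketch that assumes the setting appears on its own line in plain KEY=value form; db_auth_enabled is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) only if the given env file contains
# an uncommented USE_DB_AUTHENTICATION=true line. A line that starts with '#'
# does not match, so the commented-out default counts as disabled.
db_auth_enabled() {
  grep -Eq '^[[:space:]]*USE_DB_AUTHENTICATION=true[[:space:]]*$' "$1"
}

# Usage:
#   if db_auth_enabled .env; then
#     echo "authentication is on: requests need an API key"
#   else
#     echo "authentication is off: local requests are accepted"
#   fi
```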
Step 3: Launch with One Command
Since the project consists of several microservices, it is managed using Docker Compose:
docker compose up -d
The initial startup requires pulling images and building the Playwright service image, which may take 5-10 minutes.
Step 4: Verify Service Readiness
Once the containers are up, check the API health status:
curl -X POST http://localhost:3002/v1/scrape -H "Content-Type: application/json" -d "{}"
If it returns something like {"success":false,"error":"Invalid URL"} (or similar error indicating it reached the API), the service is successfully running and listening for requests.
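Because the first startup can take several minutes, it can be convenient to poll the API instead of retrying by hand. The sketch below assumes curl is installed; wait_for_api is a hypothetical helper name, and the attempt count and delay are arbitrary defaults.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until the server answers or attempts run out.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 2).
wait_for_api() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Without -f, curl exits 0 for any HTTP response, including 4xx errors,
    # so an error reply still counts as "the API is listening".
    if curl -s -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_api "http://localhost:3002/v1/scrape" 60 5 && echo "API is up"
```

The function only tells you that something is answering HTTP on the port; a real scrape request is still the better end-to-end check.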
7.4 Common Startup Issues
| Symptom | Cause | Solution |
|---|---|---|
| docker compose command not found | Docker not installed correctly | Install Docker Desktop (macOS/Windows) or docker-ce (Linux) |
| Port conflict: Bind for 0.0.0.0:3002 failed | 3002 or 6379 (Redis) already in use | Stop the service using the port or change the port mapping in docker-compose.yaml |
| Container keeps restarting (Exit 1) | Missing .env or permission issues | Run docker compose logs firecrawl-api to see the specific error |
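The port conflict in the table can be detected before starting the stack. The following sketch assumes a netcat (nc) build that supports the -z flag; port_in_use and check_firecrawl_ports are hypothetical helper names, and the two port numbers are the ones listed in the table.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if something is already listening on the
# given local TCP port. Assumes a netcat (nc) that supports the -z flag.
port_in_use() {
  nc -z localhost "$1" >/dev/null 2>&1
}

# Report on the two host ports mentioned in the table above.
check_firecrawl_ports() {
  for port in 3002 6379; do
    if port_in_use "$port"; then
      echo "port $port: in use"
    else
      echo "port $port: free"
    fi
  done
}

# Usage (run before docker compose up -d):
#   check_firecrawl_ports
```

Run it before launching the containers. Once Firecrawl is up, its own services will occupy these ports and the check will report them as in use.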
7.5 Chapter Review
- Why does Firecrawl require a separate playwright-service container? What problem does it solve?
- What happens if you are already running Redis on your machine when you start Firecrawl? How do you resolve it?
- Why is it generally unnecessary to configure a FIRECRAWL_API_KEY in a local self-hosted environment?