A Docker-based API server for scraping LinkedIn job listings. This application converts job search results into JSON format instead of CSV files, making it perfect for integration with other applications.
- RESTful API: Easy-to-use HTTP endpoints for job scraping
- Docker-based: Runs in Ubuntu containers for consistency
- JSON Output: Returns structured JSON data instead of CSV files
- Configurable: Support for custom search parameters
- Authentication: Supports both LinkedIn credentials and session cookies
- Rate Limiting: Built-in delays and retry logic to respect LinkedIn's limits
```bash
git clone <repository-url>
cd n8n_linkedinJobScrapper
```

Copy the example environment file and fill in your LinkedIn credentials:

```bash
cp .env.example .env
```

Edit `.env` and add your LinkedIn credentials (see the Authentication section below).
```bash
# Build and start the API server
docker-compose up --build

# Or run in background
docker-compose up -d --build
```

The API will be available at http://localhost:8000.
Add your LinkedIn credentials to `.env`:

```bash
LINKEDIN_EMAIL=your_email@example.com
LINKEDIN_PASSWORD=your_password
```

Alternatively, use session cookies:

- Log into LinkedIn in your browser
- Open Developer Tools (F12)
- Go to Application/Storage > Cookies > https://www.linkedin.com
- Copy these cookie values to your `.env` file: `li_at`, `JSESSIONID`, `liap`, `lidc`, `bcookie`, `bscookie`
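Those cookies can then be attached to an HTTP client. A minimal sketch using a `requests` session; only `LINKEDIN_LI_AT` appears in the environment-variable table below, so the other environment variable names here are hypothetical:

```python
import os

import requests

# Map browser cookie names to .env variable names.
# LINKEDIN_LI_AT is documented below; the other env var names are
# assumptions mirroring the browser cookie names above.
cookie_env = {
    "li_at": "LINKEDIN_LI_AT",
    "JSESSIONID": "LINKEDIN_JSESSIONID",
    "liap": "LINKEDIN_LIAP",
    # lidc, bcookie, bscookie would follow the same pattern
}

session = requests.Session()
for cookie_name, env_var in cookie_env.items():
    value = os.getenv(env_var, "")
    if value:  # only attach cookies that were actually provided
        session.cookies.set(cookie_name, value, domain=".linkedin.com")
```

The `li_at` cookie is the one LinkedIn uses to identify a logged-in session, so it is the minimum needed for cookie-based authentication.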
`GET /health`: Health check.

`GET /jobs`: Scrapes jobs using the configuration from `config.json`.
`POST /jobs/custom`

Content-Type: `application/json`

```json
{
  "search_urls": [
    {
      "name": "Python Jobs",
      "url": "https://www.linkedin.com/jobs/search/?keywords=python&location=...",
      "description": "Python developer jobs"
    }
  ],
  "max_pages": 2,
  "filter_easy_apply": true
}
```

`GET /jobs/search?keywords=python&location=101282230&experience_level=2&time_filter=r3600&max_pages=1`

Parameters:

- `keywords`: Job search keywords (required)
- `location`: LinkedIn location ID (default: 101282230 - India)
- `experience_level`: 1=Internship, 2=Entry level, 3=Associate, 4=Mid-Senior, 5=Director, 6=Executive
- `time_filter`: r3600=1h, r86400=24h, r604800=1w
- `max_pages`: Maximum pages to scrape
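Both endpoints can be driven from Python with `requests`. A sketch using the payload and parameters shown above; `BASE_URL` and the `run_searches` helper are illustrative, and the API server must already be running:

```python
import requests

BASE_URL = "http://localhost:8000"  # server started via docker-compose

# Request body for POST /jobs/custom, same shape as the example above.
payload = {
    "search_urls": [
        {
            "name": "Python Jobs",
            "url": "https://www.linkedin.com/jobs/search/?keywords=python",
            "description": "Python developer jobs",
        }
    ],
    "max_pages": 2,
    "filter_easy_apply": True,
}

# Query string for GET /jobs/search, using the parameters listed above.
params = {
    "keywords": "python",        # required
    "location": "101282230",     # India (default)
    "experience_level": "2",     # Entry level
    "time_filter": "r3600",      # posted in the last hour
    "max_pages": "1",
}

def run_searches():
    # Requires the API server to be up; not executed on import.
    custom = requests.post(f"{BASE_URL}/jobs/custom", json=payload, timeout=120)
    search = requests.get(f"{BASE_URL}/jobs/search", params=params, timeout=120)
    return custom.json(), search.json()
```

Scraping can take a while per page, so a generous client timeout is used here.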
All job endpoints return JSON in this format:
```json
{
  "success": true,
  "message": "Successfully scraped 15 jobs",
  "jobs": [
    {
      "job_id": "3775423847",
      "job_url": "https://www.linkedin.com/jobs/view/3775423847",
      "company_name": "TechCorp",
      "job_title": "Software Engineer",
      "time_posted": "1 hour ago",
      "num_applicants": "50 applicants",
      "job_location": "San Francisco, CA",
      "experience_needed": "2+ years experience in...",
      "description_content": "We are looking for...",
      "has_easy_apply": true,
      "application_type": "Easy Apply"
    }
  ],
  "total_jobs": 15,
  "timestamp": "2024-01-15T10:30:00"
}
```

Edit `config.json` to customize default search parameters:
```json
{
  "MAX_PAGES_PER_SEARCH": 1,
  "SEARCH_URLS": [
    {
      "name": "Firmware Jobs - Last Hour",
      "url": "https://www.linkedin.com/jobs/search/?f_E=2&f_TPR=r3600&keywords=firmware",
      "description": "Firmware jobs posted in last hour"
    }
  ],
  "REQUEST_TIMEOUT": 30,
  "PAGE_DELAY": 10
}
```

```bash
# Health check
curl http://localhost:8000/health

# Get jobs with default config
curl http://localhost:8000/jobs

# Search for specific jobs
curl "http://localhost:8000/jobs/search?keywords=python&max_pages=2"
```

```python
import requests

# Health check
response = requests.get("http://localhost:8000/health")
print(response.json())

# Get jobs
response = requests.get("http://localhost:8000/jobs")
jobs_data = response.json()
print(f"Found {jobs_data['total_jobs']} jobs")
for job in jobs_data['jobs']:
    print(f"- {job['job_title']} at {job['company_name']}")
```

```bash
# Build image
docker-compose build

# Start service
docker-compose up -d

# View logs
docker-compose logs -f

# Stop service
docker-compose down

# Rebuild and restart
docker-compose down && docker-compose up --build -d
```

| Variable | Description | Required |
|---|---|---|
| `LINKEDIN_EMAIL` | LinkedIn email/username | Yes* |
| `LINKEDIN_PASSWORD` | LinkedIn password | Yes* |
| `LINKEDIN_LI_AT` | LinkedIn `li_at` cookie | Yes* |
| `PORT` | API server port | No (default: 8000) |
| `HOST` | API server host | No (default: 0.0.0.0) |
| `DEBUG_MODE` | Enable debug logging | No |
*Either email/password OR session cookies are required.
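A sketch of how a server might consume these variables, with the defaults from the table and the either/or validation rule from the footnote; the `load_config` helper is illustrative, not the application's actual code:

```python
import os

# Read the variables from the table above with their documented defaults.
# Hypothetical helper: the real server's config loading may differ.
def load_config(env=None):
    env = os.environ if env is None else env
    cfg = {
        "port": int(env.get("PORT", "8000")),
        "host": env.get("HOST", "0.0.0.0"),
        "debug": env.get("DEBUG_MODE", "").lower() in ("1", "true", "yes"),
        "email": env.get("LINKEDIN_EMAIL"),
        "password": env.get("LINKEDIN_PASSWORD"),
        "li_at": env.get("LINKEDIN_LI_AT"),
    }
    # Footnote rule: email/password OR the li_at session cookie.
    if not ((cfg["email"] and cfg["password"]) or cfg["li_at"]):
        raise RuntimeError(
            "Provide LINKEDIN_EMAIL/LINKEDIN_PASSWORD or LINKEDIN_LI_AT"
        )
    return cfg
```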
- The scraper includes built-in delays between requests
- Uses rotating user agents to avoid detection
- Respects LinkedIn's rate limits with exponential backoff
- Avoid running too frequently to prevent account restrictions
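The delay-and-retry behaviour described above can be sketched as exponential backoff with jitter; the delay values here are illustrative, not the scraper's actual settings (those come from `config.json`):

```python
import random
import time

# Retry a fetch with exponentially growing delays plus jitter,
# so repeated failures back off instead of hammering the site.
def fetch_with_backoff(fetch, max_retries=4, base_delay=2.0):
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # base_delay * 1, 2, 4, ... plus jitter to spread retries out
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Jitter matters when several searches run back to back: without it, retries line up and hit LinkedIn in synchronized bursts.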
- Authentication Failed: Check your credentials in `.env`
- CAPTCHA Required: Use session cookies instead of username/password
- No Jobs Found: Verify your search URLs are correct
- Rate Limited: Increase delays in `config.json`
Check container logs for detailed information:
```bash
docker-compose logs -f linkedin-scraper-api
```

Once the server is running, visit http://localhost:8000/docs for interactive API documentation (Swagger UI).
This project is licensed under the terms specified in the LICENSE file.