feat(docker): add user-provided hooks support to Docker API #1388

ntohidi · 2025-08-11T05:28:51Z

Summary

Adds comprehensive user-provided hooks support to the Docker API, allowing users to inject custom Python functions as strings that execute at specific points in the crawling pipeline. This enables advanced customization of crawler behavior without modifying the core codebase.
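As a rough illustration of "Python functions as strings", the sketch below assembles a request body that carries one hook. The `code` and `timeout` fields come from the HookConfig description in this PR. The `urls` key, the mapping from hook point to source string, the function name `hook`, and its `(page, context, **kwargs)` signature are assumptions made for the example and may not match the actual schema.

```python
import json

# Hook source travels as a plain string. The signature and the Playwright-style
# call inside it are assumptions for illustration; the string is never executed here.
HOOK_SOURCE = '''
async def hook(page, context, **kwargs):
    await page.set_viewport_size({"width": 1280, "height": 720})
    return page
'''


def build_crawl_payload(url: str, timeout: int = 30) -> dict:
    """Assemble a hypothetical /crawl request body carrying one hook."""
    return {
        "urls": [url],  # assumed field name
        "hooks": {
            # assumed shape: hook point name -> source string
            "code": {"on_page_context_created": HOOK_SOURCE},
            # the PR describes a configurable timeout of 1-120 seconds
            "timeout": timeout,
        },
    }


payload = build_crawl_payload("https://httpbin.org/html")
body = json.dumps(payload)  # JSON text that a client would POST to /crawl
```

The body would then be posted to the /crawl endpoint of the running container (port 11235 in the testing notes below).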

Fixes #1377

List of files changed and why

deploy/docker/hook_manager.py - New module implementing UserHookManager for safe validation, compilation, and execution of user-provided hook functions with error isolation
deploy/docker/api.py - Updated handle_crawl_request and handle_stream_crawl_request to integrate hooks and to keep a hook_manager instance for execution tracking
deploy/docker/schemas.py - Added HookConfig and CrawlRequestWithHooks models to support hooks in API requests
deploy/docker/server.py - Updated /crawl and /crawl/stream endpoints to accept hooks parameter, added /hooks/info endpoint for hook discovery
docs/md_v2/core/docker-deployment.md - Added comprehensive hooks documentation section with security warnings, real-world examples, and best practices
docs/examples/docker_hooks_examples.py - Created example file demonstrating all 8 hooks with practical use cases (authentication, performance optimization, content extraction)
tests/docker/test_hooks_client.py - Added test suite for basic hooks functionality and error handling
tests/docker/test_hooks_comprehensive.py - Added comprehensive test suite covering all hook points, timeout handling, and edge cases
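The validation step that hook_manager.py is described as performing could look roughly like the following. This is a minimal sketch, not the actual UserHookManager code: the function name `validate_hook`, the "exactly one top-level function" rule, and the `async def` requirement are assumptions. Only the eight hook point names are taken from this PR.

```python
import ast

# The eight hook points listed in this PR.
HOOK_POINTS = {
    "on_browser_created",
    "on_page_context_created",
    "before_goto",
    "after_goto",
    "on_user_agent_updated",
    "on_execution_started",
    "before_retrieve_html",
    "before_return_html",
}


def validate_hook(hook_point: str, source: str) -> list[str]:
    """Return a list of validation errors; an empty list means the hook passed."""
    errors = []
    if hook_point not in HOOK_POINTS:
        errors.append(f"unknown hook point: {hook_point}")
    try:
        tree = ast.parse(source)  # parses only, never executes the code
    except SyntaxError as exc:
        errors.append(f"syntax error: {exc.msg} (line {exc.lineno})")
        return errors
    funcs = [
        node for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if len(funcs) != 1:
        # assumed rule: one function per hook string
        errors.append("expected exactly one top-level function definition")
    elif not isinstance(funcs[0], ast.AsyncFunctionDef):
        # assumed rule: hooks are coroutines
        errors.append("hook must be declared with 'async def'")
    return errors
```

Returning a list of messages rather than raising lets the API report every problem in a malformed hook at once.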

How Has This Been Tested?

Built Docker image locally using docker compose build
Ran container on port 11235 and verified all endpoints
Executed comprehensive test suite (docs/examples/docker_hooks_examples.py) with 100% success rate
Tested all 8 hook points individually and in combination
Verified error isolation - malformed hooks don't crash the crawler
Tested timeout protection with slow-running hooks
Validated execution tracking and statistics reporting
Confirmed JSON serialization of all hook results
Tested with real URLs (httpbin.org, Wikipedia, BBC, GitHub)
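The error isolation and timeout checks above can be pictured with a small wrapper like this one. It is a sketch under assumptions: `run_hook_isolated` and the status strings are hypothetical names, and the real IsolatedHookWrapper may work differently.

```python
import asyncio


async def run_hook_isolated(hook, *args, timeout: float = 30.0, **kwargs):
    """Run a hook and return (status, result) instead of letting it raise."""
    try:
        result = await asyncio.wait_for(hook(*args, **kwargs), timeout=timeout)
        return "success", result
    except asyncio.TimeoutError:
        return "timeout", None
    except Exception as exc:  # isolate any hook failure from the crawl
        return "failed", repr(exc)


async def ok_hook(page):
    return page


async def slow_hook(page):
    await asyncio.sleep(1)  # longer than the timeout used in the demo
    return page


async def broken_hook(page):
    raise ValueError("boom")


async def demo():
    return [
        await run_hook_isolated(ok_hook, "page"),
        await run_hook_isolated(slow_hook, "page", timeout=0.05),
        await run_hook_isolated(broken_hook, "page"),
    ]


results = asyncio.run(demo())
```

Note that `asyncio.wait_for` can only cancel a coroutine at an `await` point, so a hook stuck in a CPU-bound loop would not be interrupted by this approach alone.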

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added/updated unit tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Implements comprehensive hooks functionality allowing users to provide custom Python functions as strings that execute at specific points in the crawling pipeline.

Key Features:
- Support for all 8 crawl4ai hook points:
  • on_browser_created: Initialize browser settings
  • on_page_context_created: Configure page context
  • before_goto: Pre-navigation setup
  • after_goto: Post-navigation processing
  • on_user_agent_updated: User agent modification handling
  • on_execution_started: Crawl execution initialization
  • before_retrieve_html: Pre-extraction processing
  • before_return_html: Final HTML processing

Implementation Details:
- Created UserHookManager for validation, compilation, and safe execution
- Added IsolatedHookWrapper for error isolation and timeout protection
- AST-based validation ensures code structure correctness
- Sandboxed execution with restricted builtins for security
- Configurable timeout (1-120 seconds) prevents infinite loops
- Comprehensive error handling ensures hooks don't crash main process
- Execution tracking with detailed statistics and logging

API Changes:
- Added HookConfig schema with code and timeout fields
- Extended CrawlRequest with optional hooks parameter
- Added /hooks/info endpoint for hook discovery
- Updated /crawl and /crawl/stream endpoints to support hooks

Safety Features:
- Malformed hooks return clear validation errors
- Hook errors are isolated and reported without stopping crawl
- Execution statistics track success/failure/timeout rates
- All hook results are JSON-serializable

Testing:
- Comprehensive test suite covering all 8 hooks
- Error handling and timeout scenarios validated
- Authentication, performance, and content extraction examples
- 100% success rate in production testing

Documentation:
- Added extensive hooks section to docker-deployment.md
- Security warnings about user-provided code risks
- Real-world examples using httpbin.org, GitHub, BBC
- Best practices and troubleshooting guide

ref #1377
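To make "sandboxed execution with restricted builtins" concrete, here is a minimal sketch of compiling a hook string into a callable with a reduced set of builtins. The allow-list contents, the helper name `compile_hook`, and the hook signatures are assumptions for illustration; the actual UserHookManager may choose a different set and mechanism.

```python
import asyncio
import inspect

# Hypothetical allow-list; the real set of permitted builtins is not given in the PR.
SAFE_BUILTINS = {"len": len, "str": str, "int": int, "range": range,
                 "dict": dict, "list": list}


def compile_hook(source: str):
    """Compile a hook string and return the single coroutine function it defines."""
    namespace = {"__builtins__": dict(SAFE_BUILTINS)}
    exec(compile(source, "<user_hook>", "exec"), namespace)
    hooks = [v for v in namespace.values() if inspect.iscoroutinefunction(v)]
    if len(hooks) != 1:
        raise ValueError("expected exactly one async function in hook source")
    return hooks[0]


# A hook that only uses allowed builtins runs normally.
count_chars = compile_hook("async def hook(html, **kwargs):\n    return str(len(html))")
safe_result = asyncio.run(count_chars("abcd"))

# A hook that reaches for open() fails at call time, because the name is not
# present in the restricted builtins.
read_file = compile_hook("async def hook(path, **kwargs):\n    return open(path).read()")
try:
    asyncio.run(read_file("/etc/hostname"))
    blocked = False
except NameError:
    blocked = True
```

Restricting builtins narrows what a hook can reach by name, but it is not a hard security boundary on its own, which is consistent with the security warnings this PR adds to the documentation.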

coderabbitai · 2025-08-11T05:28:59Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.
