feat(docker): add user-provided hooks support to Docker API #1388
base: develop
Implements comprehensive hooks functionality allowing users to provide custom Python functions as strings that execute at specific points in the crawling pipeline.

Key Features:
- Support for all 8 crawl4ai hook points:
  • on_browser_created: Initialize browser settings
  • on_page_context_created: Configure page context
  • before_goto: Pre-navigation setup
  • after_goto: Post-navigation processing
  • on_user_agent_updated: User agent modification handling
  • on_execution_started: Crawl execution initialization
  • before_retrieve_html: Pre-extraction processing
  • before_return_html: Final HTML processing

Implementation Details:
- Created UserHookManager for validation, compilation, and safe execution
- Added IsolatedHookWrapper for error isolation and timeout protection
- AST-based validation ensures code structure correctness
- Sandboxed execution with restricted builtins for security
- Configurable timeout (1-120 seconds) prevents infinite loops
- Comprehensive error handling ensures hooks don't crash main process
- Execution tracking with detailed statistics and logging

API Changes:
- Added HookConfig schema with code and timeout fields
- Extended CrawlRequest with optional hooks parameter
- Added /hooks/info endpoint for hook discovery
- Updated /crawl and /crawl/stream endpoints to support hooks

Safety Features:
- Malformed hooks return clear validation errors
- Hook errors are isolated and reported without stopping crawl
- Execution statistics track success/failure/timeout rates
- All hook results are JSON-serializable

Testing:
- Comprehensive test suite covering all 8 hooks
- Error handling and timeout scenarios validated
- Authentication, performance, and content extraction examples
- 100% success rate in production testing

Documentation:
- Added extensive hooks section to docker-deployment.md
- Security warnings about user-provided code risks
- Real-world examples using httpbin.org, GitHub, BBC
- Best practices and troubleshooting guide

ref #1377
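For reviewers who want a feel for the validate, compile, and isolate steps described above, here is a minimal sketch. It is not the code in hook_manager.py. The function names `validate_hook`, `compile_hook`, and `run_isolated`, the `SAFE_BUILTINS` allow-list, the returned status strings, and the convention that user code defines `async def hook(...)` are all assumptions made for illustration.

```python
import ast
import asyncio

# The eight hook points listed in this PR description.
HOOK_POINTS = {
    "on_browser_created", "on_page_context_created", "before_goto",
    "after_goto", "on_user_agent_updated", "on_execution_started",
    "before_retrieve_html", "before_return_html",
}

# Hypothetical allow-list; the PR does not say which builtins it exposes.
SAFE_BUILTINS = {"len": len, "str": str, "int": int, "dict": dict, "list": list}


def validate_hook(name: str, code: str) -> list[str]:
    """Return validation errors; an empty list means the hook looks well-formed."""
    if name not in HOOK_POINTS:
        return [f"unknown hook point: {name}"]
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]
    # Assumed convention: a module-level coroutine function named 'hook'.
    has_hook = any(
        isinstance(node, ast.AsyncFunctionDef) and node.name == "hook"
        for node in tree.body
    )
    return [] if has_hook else ["code must define 'async def hook(...)'"]


def compile_hook(code: str):
    """Execute the code string in a namespace that only sees SAFE_BUILTINS."""
    namespace = {"__builtins__": SAFE_BUILTINS}
    exec(compile(code, "<user_hook>", "exec"), namespace)
    return namespace["hook"]


async def run_isolated(hook, *args, timeout: float = 30.0) -> dict:
    """Run a hook so that errors and timeouts become a result dict, not a crash."""
    try:
        result = await asyncio.wait_for(hook(*args), timeout=timeout)
        return {"status": "success", "result": result}
    except asyncio.TimeoutError:
        return {"status": "timeout"}
    except Exception as exc:  # isolate any failure raised by user code
        return {"status": "failed", "error": str(exc)}
```

Two caveats on this sketch: `asyncio.wait_for` can only interrupt a hook at an `await`, so a busy loop with no awaits would not be stopped by it, and a restricted `__builtins__` dictionary narrows what user code can reach but is not a full sandbox on its own.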
Summary
Adds comprehensive user-provided hooks support to the Docker API, allowing users to inject custom Python functions as strings that execute at specific points in the crawling pipeline. This enables advanced customization of crawler behavior without modifying the core codebase.
Fixes #1377
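To make the request shape concrete, the sketch below assembles a crawl request body with hooks attached. The `hooks` parameter and the `code` and `timeout` fields come from this PR's description; the `urls` key, the assumption that `code` maps hook-point names to source strings, the helper name `build_crawl_request`, and the hook signature in the example are illustrative guesses, so check the HookConfig schema before relying on them.

```python
import json

# The eight hook points listed in this PR description.
HOOK_POINTS = (
    "on_browser_created", "on_page_context_created", "before_goto",
    "after_goto", "on_user_agent_updated", "on_execution_started",
    "before_retrieve_html", "before_return_html",
)


def build_crawl_request(urls, hooks, timeout=30):
    """Build a JSON-serializable request body (hypothetical helper).

    hooks: dict mapping a hook-point name to Python source code as a string.
    timeout: per-hook timeout in seconds; the PR states a 1-120 second range.
    """
    unknown = sorted(set(hooks) - set(HOOK_POINTS))
    if unknown:
        raise ValueError(f"unknown hook points: {unknown}")
    if not 1 <= timeout <= 120:
        raise ValueError("timeout must be between 1 and 120 seconds")
    return {
        "urls": list(urls),  # assumed field name
        "hooks": {"code": dict(hooks), "timeout": timeout},
    }


# Example body; the hook signature here is an assumption, not the documented one.
example = build_crawl_request(
    ["https://httpbin.org/html"],
    {"before_goto": "async def hook(page, context, url, **kwargs):\n    return page"},
)
example_json = json.dumps(example, indent=2)
```

A body like `example_json` would then be sent to the /crawl or /crawl/stream endpoint mentioned in this PR, and /hooks/info can be queried to discover what each hook point expects.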
List of files changed and why
deploy/docker/hook_manager.py
- New module implementing UserHookManager for safe validation, compilation, and execution of user-provided hook functions with error isolation
deploy/docker/api.py
- Updated handle_crawl_request and handle_stream_crawl_request to integrate hooks, maintain hook_manager instance for execution tracking
deploy/docker/schemas.py
- Added HookConfig and CrawlRequestWithHooks models to support hooks in API requests
deploy/docker/server.py
- Updated /crawl and /crawl/stream endpoints to accept hooks parameter, added /hooks/info endpoint for hook discovery
docs/md_v2/core/docker-deployment.md
- Added comprehensive hooks documentation section with security warnings, real-world examples, and best practices
docs/examples/docker_hooks_examples.py
- Created example file demonstrating all 8 hooks with practical use cases (authentication, performance optimization, content extraction)
tests/docker/test_hooks_client.py
- Added test suite for basic hooks functionality and error handling
tests/docker/test_hooks_comprehensive.py
- Added comprehensive test suite covering all hook points, timeout handling, and edge cases
How Has This Been Tested?
docker compose build
docs/examples/docker_hooks_examples.py
) with 100% success rate
Checklist: