[WIKI-553] chore: improved pages components tracking #7966
base: preview
Linked to Plane Work Item(s). This comment was auto-generated by Plane.
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Centralizes HTML component extraction into a `COMPONENT_MAP` with `extract_all_components` and `get_entity_details`, updates `page_transaction` to use unified extraction and diffing (bulk insert/delete of `PageLog`), adds logging and import adjustments, and updates view call sites to pass `new_description_html`/`old_description_html`/`page_id`.
Sequence Diagram(s)

sequenceDiagram
autonumber
participant Task as PageTransactionTask
participant Extract as ComponentExtractor
participant DB as DB (Page / PageLog)
participant Log as Logger
Note over Task,Extract: Single-pass extraction for old and new HTML
Task->>DB: Fetch Page by page_id
alt Page missing / error
Task->>Log: log exception
Task-->>Task: exit
else Page found
Task->>Extract: extract_all_components(old_description_html)
Extract-->>Task: old_components {component: [mentions]}
Task->>Extract: extract_all_components(new_description_html)
Extract-->>Task: new_components {component: [mentions]}
Note over Task: Normalize mentions -> entity details
loop each component in new_components
Task->>Task: get_entity_details(component, mention) -> (id,type,name)
end
Task->>DB: Query existing PageLog entries for page
Task->>Task: Compute deleted_entity_ids and new_unseen_entities
Task->>DB: Bulk insert new PageLog (ignore_conflicts=True)
Task->>DB: Bulk delete removed PageLog entries
Task-->>Task: finish
end
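
The extraction step in the diagram can be illustrated with a minimal sketch. This is not the PR's implementation, which is not shown in this conversation. It assumes the components appear as `mention-component` and `image-component` tags; the attribute names (`id`, `entity_identifier`, `entity_name`, `src`) and the helper class are hypothetical, and the standard-library `html.parser` is used here only to keep the example self-contained.

```python
from html.parser import HTMLParser

# Hypothetical map: tag name -> function turning the tag's attributes into a record.
# The attribute names are assumptions made for illustration only.
COMPONENT_MAP = {
    "mention-component": lambda attrs: {
        "id": attrs.get("id"),
        "entity_identifier": attrs.get("entity_identifier"),
        "entity_name": attrs.get("entity_name"),
    },
    "image-component": lambda attrs: {
        "id": attrs.get("id"),
        "src": attrs.get("src"),
    },
}


class _ComponentCollector(HTMLParser):
    """Collects every tag listed in the component map while the HTML is parsed once."""

    def __init__(self, component_map):
        super().__init__()
        self.component_map = component_map
        self.found = {name: [] for name in component_map}

    def handle_starttag(self, tag, attrs):
        extractor = self.component_map.get(tag)
        if extractor is not None:
            self.found[tag].append(extractor(dict(attrs)))


def extract_all_components(description_html, component_map=COMPONENT_MAP):
    """Return {component_type: [records]}; empty lists when there is no HTML."""
    if not description_html:
        return {name: [] for name in component_map}
    collector = _ComponentCollector(component_map)
    collector.feed(description_html)
    collector.close()
    return collector.found
```

Because every component type is looked up in the same map during a single parse, adding a new type in this sketch only requires a new map entry rather than another pass over the HTML.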
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Pull Request Overview
This PR enhances page component tracking by expanding support beyond mentions to include image components and implementing a more flexible, optimized architecture for tracking page content changes.
- Refactored component extraction to handle multiple component types in a single pass
- Added support for image-component tracking alongside existing mention-component tracking
- Improved performance with optimized HTML parsing and increased bulk operation batch size
Actionable comments posted: 2
🧹 Nitpick comments (1)
apps/api/plane/bgtasks/page_transaction_task.py (1)
130-136: Consider adding success logging for observability. Adding log statements for successful bulk operations helps with monitoring and debugging in production.

     # Bulk insert and cleanup
     if new_transactions:
         PageLog.objects.bulk_create(
             new_transactions, batch_size=50, ignore_conflicts=True
         )
    +    logger.info(f"Created {len(new_transactions)} PageLog entries for page {page_id}")
     if deleted_transaction_ids:
         PageLog.objects.filter(transaction__in=deleted_transaction_ids).delete()
    +    logger.info(f"Deleted {len(deleted_transaction_ids)} PageLog entries for page {page_id}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- apps/api/plane/bgtasks/page_transaction_task.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/plane/bgtasks/page_transaction_task.py (2)
- apps/api/plane/db/models/page.py (2): Page (19-73), PageLog (76-113)
- apps/api/plane/utils/exception_logger.py (1): log_exception (9-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint API
- GitHub Check: Analyze (javascript)
🔇 Additional comments (4)
apps/api/plane/bgtasks/page_transaction_task.py (4)
3-3: LGTM! The addition of logging import and logger setup improves observability for background task execution.
Also applies to: 11-17
42-69: LGTM! The single-pass component extraction approach is efficient and the defensive error handling (returning empty dicts for missing HTML or exceptions) prevents cascading failures.
72-79: LGTM! The normalization logic cleanly separates raw attribute extraction from entity detail mapping, and the fallback for unknown components prevents errors.
18-35: No change needed for `entity_type`. `PageLog.entity_type` isn’t used in any downstream filters or logic, so leaving it null in both extractors is intentional.
@NarayanBavisetti Can you benchmark the old and new logic and add the results to the PR description?
Actionable comments posted: 0
🧹 Nitpick comments (2)
apps/api/plane/bgtasks/page_transaction_task.py (2)
41-68: Consider logging exceptions for better observability. The exception handler at line 66 silently returns empty results without logging. While this defensive approach prevents the task from failing, it makes debugging harder when HTML parsing issues occur.
Apply this diff to add logging:

     except Exception:
    +    logger.warning("Failed to extract components from HTML", exc_info=True)
         return {component: [] for component in component_map.keys()}

This maintains the defensive behavior while providing visibility into parsing failures.
41-77: Optional: Consider adding type hints for clarity. The new utility functions would benefit from type hints to make the interface clearer:

    from typing import Dict, List, Any

    def extract_all_components(description_html: str) -> Dict[str, List[Dict[str, Any]]]:
        """
        Extracts all component types from the HTML value in a single pass.
        Returns a dict mapping component_type -> list of extracted entities.
        """
        # ...

    def get_entity_details(component: str, mention: dict) -> Dict[str, Any]:
        """
        Normalizes mention attributes into entity_name, entity_type, entity_identifier.
        """
        # ...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- apps/api/plane/app/views/page/base.py (4 hunks)
- apps/api/plane/bgtasks/page_transaction_task.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-17T08:23:54.935Z
Learnt from: NarayanBavisetti
PR: makeplane/plane#7966
File: apps/api/plane/bgtasks/page_transaction_task.py:108-127
Timestamp: 2025-10-17T08:23:54.935Z
Learning: In the page transaction system (apps/api/plane/bgtasks/page_transaction_task.py), entity names (entity_name) and entity identifiers (entity_identifier) for mentions and components remain constant once set and are not hardcoded values that change, so PageLog records don't need to handle updates to existing entity attributes.
Applied to files:
apps/api/plane/bgtasks/page_transaction_task.py
📚 Learning: 2025-10-17T08:21:37.502Z
Learnt from: NarayanBavisetti
PR: makeplane/plane#7966
File: apps/api/plane/bgtasks/page_transaction_task.py:37-39
Timestamp: 2025-10-17T08:21:37.502Z
Learning: In `apps/api/plane/bgtasks/page_transaction_task.py`, the `component_map` variable is intentionally kept separate from `COMPONENT_MAP` to allow other modules to extend it with additional component types in the future, even though it currently just copies `COMPONENT_MAP`.
Applied to files:
apps/api/plane/bgtasks/page_transaction_task.py
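
The second learning above (a working `component_map` kept separate from `COMPONENT_MAP`) can be pictured with a small sketch. The extractor bodies and the `embed-component` entry are hypothetical and only illustrate why a copy, rather than the base constant, would be the thing other modules extend.

```python
# Base component types; the extractor bodies are placeholders for illustration.
COMPONENT_MAP = {
    "mention-component": lambda attrs: {"id": attrs.get("id")},
    "image-component": lambda attrs: {"id": attrs.get("id")},
}

# Shallow copy: additions made below do not mutate COMPONENT_MAP itself.
component_map = {**COMPONENT_MAP}

# Hypothetical extension registered by another module.
component_map["embed-component"] = lambda attrs: {"id": attrs.get("id")}
```

With this layout the base map stays a stable default, while extraction code that iterates over `component_map` picks up any extra types.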
🧬 Code graph analysis (2)
apps/api/plane/bgtasks/page_transaction_task.py (2)
- apps/api/plane/db/models/page.py (2): Page (19-73), PageLog (76-113)
- apps/api/plane/utils/exception_logger.py (1): log_exception (9-20)
apps/api/plane/app/views/page/base.py (1)
- apps/api/plane/bgtasks/page_transaction_task.py (1): page_transaction (81-142)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (javascript)
🔇 Additional comments (3)
apps/api/plane/app/views/page/base.py (1)
140-144: LGTM! Consistent parameter refactoring across all call sites. All `page_transaction.delay()` invocations correctly pass the new structured parameters (`new_description_html`, `old_description_html`, `page_id`). The old/new values are captured at the right time:
- Create and duplicate flows correctly pass `old_description_html=None`
- Update flows correctly capture the old value before save
- Transaction tasks are triggered before the database save, ensuring correct old values
Also applies to: 174-178, 508-512, 558-562
apps/api/plane/bgtasks/page_transaction_task.py (2)
81-142: LGTM! Efficient component-based extraction with correct diff logic. The refactored approach is solid:
- Single-pass HTML parsing for all component types (lines 92-94)
- Correct set-based diffing to identify additions and deletions (lines 103-105)
- Proper skip logic to avoid duplicate logs (line 109)
- Bulk operations for performance (lines 131-136)
The transformation from ad-hoc extraction to a unified, extensible component map is a clear improvement in maintainability.
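
The set-based diffing mentioned in the list above can be sketched as follows. This is an illustrative sketch, not the code under review: the function name `diff_components`, the `id` key, and the `existing_ids` parameter (standing in for identifiers already present in `PageLog`) are assumptions.

```python
def diff_components(old_components, new_components, existing_ids=frozenset()):
    """Compare two {component_type: [records]} dicts by record id.

    Returns (added, deleted_ids): `added` is a list of (component_type, record)
    pairs that are new and not already logged; `deleted_ids` is the set of ids
    present in the old HTML but missing from the new HTML.
    """
    old_ids = {
        record["id"]
        for records in old_components.values()
        for record in records
    }
    new_by_id = {
        record["id"]: (component, record)
        for component, records in new_components.items()
        for record in records
    }
    new_ids = set(new_by_id)

    deleted_ids = old_ids - new_ids
    # Skip ids that already have a log entry so no duplicates are created.
    added = [
        new_by_id[entity_id]
        for entity_id in sorted(new_ids - old_ids)
        if entity_id not in existing_ids
    ]
    return added, deleted_ids
```

In a task like the one reviewed here, `added` would feed a bulk insert and `deleted_ids` a bulk delete, so each save costs a bounded number of queries regardless of how many components changed.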
81-142: Provide benchmark results as requested. Per the PR comments, @dheeru0198 requested benchmark comparison between the old and new logic. Please add the benchmark results to the PR description to help assess the performance impact of this refactoring.
Consider benchmarking:
- HTML parsing time for typical page sizes
- Database bulk operation performance
- Overall task execution time
This data will help validate that the refactoring maintains or improves performance.
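
One way to collect such numbers is a small harness built on the standard-library `timeit` and `tracemalloc` modules. The sketch below is only a suggestion: `benchmark` is a hypothetical helper, and the functions passed to it would be the old and new extraction paths, which are not reproduced here.

```python
import timeit
import tracemalloc


def benchmark(func, payload, repeat=5, number=20):
    """Time `func(payload)` and record its peak traced memory for a single call."""
    # timeit.repeat returns one total duration per round of `number` calls.
    timings = timeit.repeat(lambda: func(payload), repeat=repeat, number=number)

    tracemalloc.start()
    func(payload)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "best_seconds_per_call": min(timings) / number,
        "peak_bytes": peak_bytes,
    }
```

Running it once per implementation on the same representative `description_html` payload gives comparable timing and memory figures for the PR description. Database bulk operations would need to be measured separately against a real database.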
Description
Improved the tracking of images and mentions in pages.
Type of Change
Summary by CodeRabbit
Bug Fixes
Refactor
Before the refactor (screenshot)
After the refactor (screenshot)
There is no major shift in memory usage, but the size of the content passed to the worker has been reduced: previously all description formats were passed to the task, whereas now only `description_html` is sent.