Skip to content

Conversation

NarayanBavisetti
Copy link
Collaborator

@NarayanBavisetti NarayanBavisetti commented Oct 14, 2025

Description

improved the tracking in pages for the images and mentions.

Type of Change

  • Improvement (change that would cause existing functionality to not work as expected)

Summary by CodeRabbit

  • Bug Fixes

    • Prevents duplicate page activity entries and removes outdated ones when mentions change.
    • Handles empty or missing content gracefully to avoid errors.
  • Refactor

    • Streamlined component extraction into a single pass for faster, more reliable updates to page activity.
    • Consolidated logging and error handling to improve observability.

before the refactor
Screenshot 2025-10-17 at 3 10 42 PM
after the refactor
Screenshot 2025-10-17 at 3 11 04 PM

there is no major shift in the memory, but have reduced the size of the content being passed in the worker (before this all the description formats were being passed in the task, now its change only description_html is been sent.

Copy link

makeplane bot commented Oct 14, 2025

Linked to Plane Work Item(s)

This comment was auto-generated by Plane

Copy link
Contributor

coderabbitai bot commented Oct 14, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Centralizes HTML component extraction into a COMPONENT_MAP with extract_all_components and get_entity_details, updates page_transaction to use unified extraction and diffing (bulk insert/delete of PageLog), adds logging and import adjustments, and updates view call sites to pass new_description_html/old_description_html/page_id.

Changes

Cohort / File(s) Summary
Page transaction refactor and helpers
apps/api/plane/bgtasks/page_transaction_task.py
- Added COMPONENT_MAP-driven extract_all_components(description_html) for single-pass parsing of HTML into component->mentions lists.
- Added get_entity_details(component, mention) to normalize entity_name, entity_type, entity_identifier per component.
- Refactored page_transaction(new_description_html, old_description_html, page_id) to compute old_components/new_components, diff IDs per component, bulk-insert new PageLog entries (ignore_conflicts=True) and bulk-delete removed entries; preserved Page retrieval and error handling.
- Added logger and adjusted imports placement.
View call-site updates (transaction payload args)
apps/api/plane/app/views/page/base.py
- Replaced prior new_value/old_value call-site payloads with structured keyword args: new_description_html, old_description_html, and page_id across create, partial_update, binary update, duplication, and pages description endpoints. No public API signature changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Task as PageTransactionTask
  participant Extract as ComponentExtractor
  participant DB as DB (Page / PageLog)
  participant Log as Logger

  Note over Task,Extract: Single-pass extraction for old and new HTML

  Task->>DB: Fetch Page by page_id
  alt Page missing / error
    Task->>Log: log exception
    Task-->>Task: exit
  else Page found
    Task->>Extract: extract_all_components(old_description_html)
    Extract-->>Task: old_components {component: [mentions]}
    Task->>Extract: extract_all_components(new_description_html)
    Extract-->>Task: new_components {component: [mentions]}

    Note over Task: Normalize mentions -> entity details
    loop each component in new_components
      Task->>Task: get_entity_details(component, mention) -> (id,type,name)
    end

    Task->>DB: Query existing PageLog entries for page
    Task->>Task: Compute deleted_entity_ids and new_unseen_entities
    Task->>DB: Bulk insert new PageLog (ignore_conflicts=True)
    Task->>DB: Bulk delete removed PageLog entries
    Task-->>Task: finish
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I hopped through HTML in one quick pass,
Found mentions, names, each little class.
Old crumbs removed, new crumbs kept near,
I bulk-insert, bulk-delete — tidy and clear.
A rabbit’s logbook saved without a fuss. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description is incomplete and lacks critical information required by the template. While the "Description" and "Type of Change" sections are present, the "Test Scenarios" section—which appears to be a required field based on the template structure—is entirely missing. The description itself is also quite brief, providing only minimal detail about what was improved. Additionally, the PR objectives note indicates that a reviewer requested benchmark results be added to the PR description, but this has not been completed. The author should expand the PR description to include a "Test Scenarios" section that documents how the changes were tested and validated. More technical detail should be added to the "Description" section to explain the refactoring approach and benefits. Additionally, as requested by the reviewer, benchmark results comparing the old and new logic should be included in the description to demonstrate the performance impact of these changes.
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title "[WIKI-553] chore: improved pages components tracking" accurately reflects the main objective of this pull request. The summary shows the changes introduce a centralized component extraction mechanism and refactor page transaction workflows to improve tracking of images and mentions in pages. The title is concise, specific, and clearly communicates the primary improvement being made without unnecessary noise or vague terms.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore-page-component-tracking

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances page component tracking by expanding support beyond mentions to include image components and implementing a more flexible, optimized architecture for tracking page content changes.

  • Refactored component extraction to handle multiple component types in a single pass
  • Added support for image-component tracking alongside existing mention-component tracking
  • Improved performance with optimized HTML parsing and increased bulk operation batch size

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
apps/api/plane/bgtasks/page_transaction_task.py (1)

130-136: Consider adding success logging for observability.

Adding log statements for successful bulk operations helps with monitoring and debugging in production.

         # Bulk insert and cleanup
         if new_transactions:
             PageLog.objects.bulk_create(
                 new_transactions, batch_size=50, ignore_conflicts=True
             )
+            logger.info(f"Created {len(new_transactions)} PageLog entries for page {page_id}")

         if deleted_transaction_ids:
             PageLog.objects.filter(transaction__in=deleted_transaction_ids).delete()
+            logger.info(f"Deleted {len(deleted_transaction_ids)} PageLog entries for page {page_id}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 97e6622 and 9790ff4.

📒 Files selected for processing (1)
  • apps/api/plane/bgtasks/page_transaction_task.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
apps/api/plane/bgtasks/page_transaction_task.py (2)
apps/api/plane/db/models/page.py (2)
  • Page (19-73)
  • PageLog (76-113)
apps/api/plane/utils/exception_logger.py (1)
  • log_exception (9-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Lint API
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (4)
apps/api/plane/bgtasks/page_transaction_task.py (4)

3-3: LGTM!

The addition of logging import and logger setup improves observability for background task execution.

Also applies to: 11-17


42-69: LGTM!

The single-pass component extraction approach is efficient and the defensive error handling (returning empty dicts for missing HTML or exceptions) prevents cascading failures.


72-79: LGTM!

The normalization logic cleanly separates raw attribute extraction from entity detail mapping, and the fallback for unknown components prevents errors.


18-35: No change needed for entity_type. PageLog.entity_type isn’t used in any downstream filters or logic, so leaving it null in both extractors is intentional.

@pushya22 pushya22 modified the milestones: 0.12.1-dev, 0.13-dev Oct 15, 2025
@dheeru0198
Copy link
Member

@NarayanBavisetti Can you benchmark the old and new logic and add the results to the PR description?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
apps/api/plane/bgtasks/page_transaction_task.py (2)

41-68: Consider logging exceptions for better observability.

The exception handler at line 66 silently returns empty results without logging. While this defensive approach prevents the task from failing, it makes debugging harder when HTML parsing issues occur.

Apply this diff to add logging:

     except Exception:
+        logger.warning("Failed to extract components from HTML", exc_info=True)
         return {component: [] for component in component_map.keys()}

This maintains the defensive behavior while providing visibility into parsing failures.


41-77: Optional: Consider adding type hints for clarity.

The new utility functions would benefit from type hints to make the interface clearer:

from typing import Dict, List, Any

def extract_all_components(description_html: str) -> Dict[str, List[Dict[str, Any]]]:
    """
    Extracts all component types from the HTML value in a single pass.
    Returns a dict mapping component_type -> list of extracted entities.
    """
    # ...

def get_entity_details(component: str, mention: dict) -> Dict[str, Any]:
    """
    Normalizes mention attributes into entity_name, entity_type, entity_identifier.
    """
    # ...
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9790ff4 and 913c134.

📒 Files selected for processing (2)
  • apps/api/plane/app/views/page/base.py (4 hunks)
  • apps/api/plane/bgtasks/page_transaction_task.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-17T08:23:54.935Z
Learnt from: NarayanBavisetti
PR: makeplane/plane#7966
File: apps/api/plane/bgtasks/page_transaction_task.py:108-127
Timestamp: 2025-10-17T08:23:54.935Z
Learning: In the page transaction system (apps/api/plane/bgtasks/page_transaction_task.py), entity names (entity_name) and entity identifiers (entity_identifier) for mentions and components remain constant once set and are not hardcoded values that change, so PageLog records don't need to handle updates to existing entity attributes.

Applied to files:

  • apps/api/plane/bgtasks/page_transaction_task.py
📚 Learning: 2025-10-17T08:21:37.502Z
Learnt from: NarayanBavisetti
PR: makeplane/plane#7966
File: apps/api/plane/bgtasks/page_transaction_task.py:37-39
Timestamp: 2025-10-17T08:21:37.502Z
Learning: In `apps/api/plane/bgtasks/page_transaction_task.py`, the `component_map` variable is intentionally kept separate from `COMPONENT_MAP` to allow other modules to extend it with additional component types in the future, even though it currently just copies `COMPONENT_MAP`.

Applied to files:

  • apps/api/plane/bgtasks/page_transaction_task.py
🧬 Code graph analysis (2)
apps/api/plane/bgtasks/page_transaction_task.py (2)
apps/api/plane/db/models/page.py (2)
  • Page (19-73)
  • PageLog (76-113)
apps/api/plane/utils/exception_logger.py (1)
  • log_exception (9-20)
apps/api/plane/app/views/page/base.py (1)
apps/api/plane/bgtasks/page_transaction_task.py (1)
  • page_transaction (81-142)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (3)
apps/api/plane/app/views/page/base.py (1)

140-144: LGTM! Consistent parameter refactoring across all call sites.

All page_transaction.delay() invocations correctly pass the new structured parameters (new_description_html, old_description_html, page_id). The old/new values are captured at the right time:

  • Create and duplicate flows correctly pass old_description_html=None
  • Update flows correctly capture the old value before save
  • Transaction tasks are triggered before the database save, ensuring correct old values

Also applies to: 174-178, 508-512, 558-562

apps/api/plane/bgtasks/page_transaction_task.py (2)

81-142: LGTM! Efficient component-based extraction with correct diff logic.

The refactored approach is solid:

  • Single-pass HTML parsing for all component types (lines 92-94)
  • Correct set-based diffing to identify additions and deletions (lines 103-105)
  • Proper skip logic to avoid duplicate logs (line 109)
  • Bulk operations for performance (lines 131-136)

The transformation from ad-hoc extraction to a unified, extensible component map is a clear improvement in maintainability.


81-142: Provide benchmark results as requested.

Per the PR comments, @dheeru0198 requested benchmark comparison between the old and new logic. Please add the benchmark results to the PR description to help assess the performance impact of this refactoring.

Consider benchmarking:

  • HTML parsing time for typical page sizes
  • Database bulk operation performance
  • Overall task execution time

This data will help validate that the refactoring maintains or improves performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants