
✨ [RUM-10415] Add support for action name allowlist masking #3648


Open · wants to merge 16 commits into main


@cy-moi (Collaborator) commented Jun 19, 2025

Motivation

Mask action names using allowlists generated by the RUM privacy build plugin (WIP). This approach is purely client-side and allowlist-based. We aim to mask all action names (custom and automatic) out of the box, using build-time configuration provided by the build plugin.

The raw string literals would be extracted at build time and loaded on demand at runtime through pre-injected helpers. In the SDK, we only need the logic to tokenize the raw strings and save the tokens to a dictionary, which is then used to mask action names (tokenized the same way).
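A minimal sketch of the tokenize-then-look-up idea, assuming tokens are runs of Unicode letters and digits, lookups are case-sensitive, and each disallowed token is replaced with a hypothetical MASKED placeholder. The names tokenize(), buildAllowlist() and maskActionName() are illustrative, not the PR's actual API.

```typescript
// Hypothetical placeholder text; the real wording is still under discussion in this PR.
const MASK_PLACEHOLDER = 'MASKED'

// Split a string into tokens: runs of Unicode letters/digits (assumed tokenizer).
function tokenize(text: string): string[] {
  return text.split(/[^\p{L}\p{N}]+/u).filter((token) => token.length > 0)
}

// Build the dictionary of allowed tokens from raw string literals extracted at build time.
function buildAllowlist(rawStrings: Iterable<string>): Set<string> {
  const allowlist = new Set<string>()
  for (const raw of rawStrings) {
    for (const token of tokenize(raw)) {
      allowlist.add(token)
    }
  }
  return allowlist
}

// Tokenize the action name the same way and replace every token missing from the
// allowlist, keeping the original separators in place.
function maskActionName(name: string, allowlist: Set<string>): { name: string; masked: boolean } {
  let masked = false
  const maskedName = name.replace(/[\p{L}\p{N}]+/gu, (word) => {
    if (allowlist.has(word)) {
      return word
    }
    masked = true
    return MASK_PLACEHOLDER
  })
  return { name: maskedName, masked }
}
```

With an allowlist built from ['Add to cart', 'Checkout'], the name "Add Jane to cart" becomes "Add MASKED to cart", while "Add to cart" is left untouched.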

Changes

  • Add allowlist processing helpers
  • Mask all action names with the allowlist when privacy build plugin is opt-in
| Bundle | Old Size | New Size | Δ Size | Old Gzip | New Gzip | Δ Gzip |
|---|---|---|---|---|---|---|
| rum | 147.37 KiB | 148.89 KiB | +1.52 KiB | 50.61 KiB | 51.12 KiB | +0.51 KiB |
| rum_slim | 106.94 KiB | 108.47 KiB | +1.53 KiB | 36.74 KiB | 37.31 KiB | +0.57 KiB |

Test instructions

Tests on BrowserStack should pass (to check regex compatibility across browsers).

Checklist

  • Tested locally
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.

@codecov-commenter commented Jun 19, 2025

Codecov Report

Attention: Patch coverage is 88.07339% with 13 lines in your changes missing coverage. Please review.

Project coverage is 92.36%. Comparing base (4233472) to head (a4bd48c).
Report is 46 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ore/src/domain/action/privacy/allowedDictionary.ts | 87.14% | 9 Missing ⚠️ |
| packages/rum-core/src/domain/privacy.ts | 50.00% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3648      +/-   ##
==========================================
- Coverage   92.44%   92.36%   -0.08%     
==========================================
  Files         322      324       +2     
  Lines        8129     8215      +86     
  Branches     1838     1862      +24     
==========================================
+ Hits         7515     7588      +73     
- Misses        614      627      +13     


@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch 2 times, most recently from 8e29d87 to dc2ecea on June 23, 2025 10:05
cit-pr-commenter bot commented Jun 23, 2025

Bundles Sizes Evolution

| 📦 Bundle Name | Base Size | Local Size | 𝚫 | 𝚫% | Status |
|---|---|---|---|---|---|
| Rum | 0 B | 148.80 KiB | 148.80 KiB | N/A% | |
| Rum Recorder | 0 B | 18.02 KiB | 18.02 KiB | N/A% | |
| Rum Profiler | 0 B | 4.63 KiB | 4.63 KiB | N/A% | |
| Logs | 0 B | 51.88 KiB | 51.88 KiB | N/A% | |
| Flagging | 0 B | 935 B | 935 B | N/A% | |
| Rum Slim | 0 B | 108.38 KiB | 108.38 KiB | N/A% | |
| Worker | 0 B | 23.59 KiB | 23.59 KiB | N/A% | |
🚀 CPU Performance
| Action Name | Base Average CPU Time (ms) | Local Average CPU Time (ms) | 𝚫 (ms) |
|---|---|---|---|
| addglobalcontext | N/A | 0.009 | 0.009 |
| addaction | N/A | 0.034 | 0.034 |
| addtiming | N/A | 0.009 | 0.009 |
| adderror | N/A | 0.030 | 0.030 |
| startstopsessionreplayrecording | N/A | 0.001 | 0.001 |
| startview | N/A | 0.009 | 0.009 |
| logmessage | N/A | 0.028 | 0.028 |
🧠 Memory Performance
| Action Name | Base Memory Consumption | Local Memory Consumption | 𝚫 |
|---|---|---|---|
| addglobalcontext | NaN KiB | 33.21 KiB | NaN KiB |
| addaction | NaN KiB | 59.18 KiB | NaN KiB |
| addtiming | NaN KiB | 33.22 KiB | NaN KiB |
| adderror | NaN KiB | 63.18 KiB | NaN KiB |
| startstopsessionreplayrecording | NaN KiB | 30.62 KiB | NaN KiB |
| startview | NaN KiB | 434.50 KiB | NaN KiB |
| logmessage | NaN KiB | 64.98 KiB | NaN KiB |

🔗 RealWorld

@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch from 9654e39 to 489bf17 on June 23, 2025 13:06
@cy-moi changed the title from "✨ [RUM-10415] Add support for privacy plugin extracted data for masking" to "✨ [RUM-10415] Add support for action name allowlist masking" Jun 23, 2025
@cy-moi marked this pull request as ready for review June 23, 2025 14:26
@cy-moi requested a review from a team as a code owner June 23, 2025 14:26
lifeCycle.notify(LifeCycleEventType.RAW_RUM_EVENT_COLLECTED, processAction(action))
)
const actionNameDictionary = createActionAllowList()
addAllowlistObserver(actionNameDictionary)
Contributor

Could createActionAllowList() register the observer? It feels like this code shouldn't need to know about that. (Maybe createActionAllowList() is really startActionAllowListObserver() or something?)

Going a bit further: right now, there is no code that removes the allowlist observer when startActionCollection()'s stop() is called. So we create a new dictionary and add another observer whenever startActionCollection() is called, but nothing ever removes them, and they keep building up. I don't think we want that.

We should pick one of two approaches:

  • Use a single global allowlist observer and dictionary, and never replace them or clear them. (Reasonable, since $DD_ALLOW is also global.)
  • Register the allowlist observer and set up the dictionary when action collection starts; unregister the allowlist observer and clear the underlying dictionary when action collection stops.

The first avoids reprocessing the raw strings when recording restarts, so it has a performance benefit, but naturally you'll have to add some additional affordances for testing with that approach.
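A sketch of the second approach, under stated assumptions: rawAllowList stands in for the global $DD_ALLOW set, and allowlistObservers is a hypothetical hook that the build plugin's runtime helpers would call after loading new strings. startActionAllowlistObserver() is an illustrative name in the spirit of the reviewer's suggestion, not the PR's code.

```typescript
// Hypothetical stand-ins for the global raw string set and the observer hook.
const rawAllowList = new Set<string>()
const allowlistObservers = new Set<() => void>()

function notifyAllowlistObservers(): void {
  allowlistObservers.forEach((observer) => observer())
}

// Assumed tokenizer: runs of Unicode letters/digits.
function tokenize(text: string): string[] {
  return text.split(/[^\p{L}\p{N}]+/u).filter((token) => token.length > 0)
}

interface ActionAllowlist {
  allowlist: Set<string>
  stop: () => void
}

// Approach 2: register the observer when action collection starts; unregister it and
// clear the dictionary when collection stops, so start/stop cycles do not leak observers.
function startActionAllowlistObserver(): ActionAllowlist {
  const allowlist = new Set<string>()
  // For simplicity this re-tokenizes every raw string on each notification.
  const update = (): void => {
    rawAllowList.forEach((raw) => {
      tokenize(raw).forEach((token) => allowlist.add(token))
    })
  }
  update() // pick up strings loaded before collection started
  allowlistObservers.add(update)
  return {
    allowlist,
    stop: () => {
      allowlistObservers.delete(update)
      allowlist.clear()
    },
  }
}
```

Pairing every start with stop() keeps exactly one observer registered across restarts. The first approach would instead create the dictionary and observer once at module level and never clear them, avoiding the reprocessing cost at the price of extra test-only reset hooks.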

@cy-moi (Collaborator Author), Jun 24, 2025

I see.
With the first approach, would we need end-to-end tests, or would keeping the contexts in unit tests be enough? I'm OK with proceeding with the second approach for now, but clearing the dictionary and reprocessing the raw strings could get expensive. We might need some field data on this.
Fixed with the second approach.


let masked = false
return {
name: name.replace(getMatchRegex(), (word: string) => {
Collaborator

❓ question: IIUC, when multiple tokens are masked, we might end up with a name like "MASKED MASKED MASKED", which could be confusing from a UI perspective. Especially given that the other masking strategy displays "Masked Element", shouldn't we aim for consistency? Do we have product input on this?

@cy-moi (Collaborator Author), Jun 24, 2025

Indeed. That said, "Masked Element" isn't ideal for tokenized names either; maybe we should look for another approach, e.g. "XXX XXX" as in Session Replay. I'll ping product on this.
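To make the trade-off concrete, here is a sketch contrasting per-token masking with a variant that collapses each run of consecutive masked tokens into a single placeholder. The MASKED text, the letter/digit token definition and both function names are hypothetical; the thread leaves the final wording to product.

```typescript
// Hypothetical placeholder; the final wording is a product decision.
const PLACEHOLDER = 'MASKED'
// Tokens are assumed to be runs of Unicode letters and digits.
const TOKEN = /[\p{L}\p{N}]+/gu
const SEPARATOR_CAPTURE = /([^\p{L}\p{N}]+)/u

// One placeholder per disallowed token.
function maskPerToken(name: string, allowlist: Set<string>): string {
  return name.replace(TOKEN, (word) => (allowlist.has(word) ? word : PLACEHOLDER))
}

// One placeholder per run of consecutive disallowed tokens. Separators inside a
// masked run are dropped; separators around it are kept.
function maskCollapsed(name: string, allowlist: Set<string>): string {
  // The capturing group makes split() keep separators: even indices hold tokens
  // (empty only at the very start or end), odd indices hold separators.
  const parts = name.split(SEPARATOR_CAPTURE)
  let result = ''
  let pendingSeparator = ''
  let previousWasMasked = false
  parts.forEach((part, index) => {
    if (index % 2 === 1) {
      pendingSeparator = part
      return
    }
    if (part === '' || allowlist.has(part)) {
      result += pendingSeparator + part
      previousWasMasked = false
    } else if (!previousWasMasked) {
      result += pendingSeparator + PLACEHOLDER
      previousWasMasked = true
    }
    // Otherwise the token merges into the previous placeholder and its separator is dropped.
    pendingSeparator = ''
  })
  return result
}
```

With the allowlist {Send, to}, "Send Jane Doe" becomes "Send MASKED MASKED" per token but "Send MASKED" when collapsed; allowed words between masked runs still keep them apart.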

@@ -14,8 +14,9 @@ export const enum ActionNameSource {
TEXT_CONTENT = 'text_content',
STANDARD_ATTRIBUTE = 'standard_attribute',
BLANK = 'blank',
MASK_DISALLOWED = 'mask_disallowed',
Collaborator

💬 suggestion: I find this source name confusing. As mentioned in my other comment, I think we should ensure consistent behavior across our masking strategies. That way, we can keep using MASK_PLACEHOLDER as the source. If we need to identify which masking strategy was applied, we could include it as a separate event attribute. Wdyt?

Collaborator Author

Ideally, yes, we would like to distinguish the two strategies. We did add mask_disallowed as a name source in rum-event-format, as a separate attribute, but I'm open to changing it to another name.

@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch 3 times, most recently from 1de7145 to ce564c5 on June 27, 2025 13:30
@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch 2 times, most recently from 617489d to 129b1aa on June 27, 2025 14:55
@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch from 129b1aa to 549159f on June 27, 2025 14:55
@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch from 16affe6 to 6f20b2b on June 27, 2025 15:49
@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch from d69e70c to 6e5063b on June 30, 2025 12:45
@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch from 7c8f655 to 5105112 on July 1, 2025 14:59
@cy-moi (Collaborator Author) commented Jul 7, 2025

/to-staging

dd-devflow bot (Contributor) commented Jul 7, 2025

View all feedbacks in Devflow UI.

2025-07-07 12:05:26 UTC ℹ️ Start processing command /to-staging


2025-07-07 12:05:35 UTC ℹ️ Branch Integration: starting soon, merge expected in approximately 0s (p90)

Commit 5105112a2d will soon be integrated into staging-28.


2025-07-07 12:21:28 UTC ℹ️ Branch Integration: This commit was successfully integrated

Commit 5105112a2d has been merged into staging-28 in merge commit e142f86173.

Check out the triggered pipeline on Gitlab 🦊

If you need to revert this integration, you can use the following command: /code revert-integration -b staging-28

dd-mergequeue bot added a commit that referenced this pull request Jul 7, 2025
…o staging-28

Integrated commit sha: 5105112

Co-authored-by: cy-moi <[email protected]>
dd-devflow bot added the staging-28 label Jul 7, 2025
const dictionary: AllowedDictionary = {
rawStringCounter: 0,
allowlist: new Set<string>(),
rawStringIterator: window.$DD_ALLOW?.values(),
Contributor

Nit: you may as well just let processRawAllowList() handle initializing this, and set it to undefined here. That way, it always gets initialized using the same mechanism.
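A sketch of what the nit could look like, assuming $DD_ALLOW is an append-only Set<string> (passed in as a parameter here instead of being read from window). The fields mirror the AllowedDictionary excerpt above; createAllowedDictionary() and the body of processRawAllowList() are illustrative, not the PR's implementation.

```typescript
interface AllowedDictionary {
  rawStringCounter: number
  allowlist: Set<string>
  rawStringIterator: Iterator<string> | undefined
}

// The iterator starts undefined so processRawAllowList() is the only place that creates it.
function createAllowedDictionary(): AllowedDictionary {
  return { rawStringCounter: 0, allowlist: new Set<string>(), rawStringIterator: undefined }
}

function processRawAllowList(rawAllowList: Set<string> | undefined, dictionary: AllowedDictionary): void {
  if (!rawAllowList) {
    return
  }
  if (!dictionary.rawStringIterator) {
    dictionary.rawStringIterator = rawAllowList.values()
  }
  // Only pull the strings added since the last call. Never calling next() past the
  // current size keeps the Set iterator from finishing, so it still yields values
  // appended later (Set iteration follows insertion order).
  while (dictionary.rawStringCounter < rawAllowList.size) {
    const result = dictionary.rawStringIterator.next()
    if (result.done) {
      break
    }
    // Assumed tokenizer: runs of Unicode letters/digits.
    for (const token of result.value.split(/[^\p{L}\p{N}]+/u)) {
      if (token) {
        dictionary.allowlist.add(token)
      }
    }
    dictionary.rawStringCounter++
  }
}
```

Because initialization lives in one place, a dictionary created before the build plugin has populated $DD_ALLOW picks up the iterator the first time strings are available.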

}
}

const { splitRegex } = cachedRegexes!
Contributor

Nit: perhaps, to avoid the need for a non-null assertion here, you could replace initializeUnicodeSupport() with a function that returns the regular expressions if Unicode regular expressions are supported, and returns undefined otherwise? So you'd have e.g.:

const regexes = getOrInitRegexes();
if (!regexes) {
  // No Unicode regular expression support; fall back.
  return
}

// Unicode regular expressions were supported; use them!
const maskedName = name.replace(regexes.split, // ...
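One way the suggested helper might be filled in, as a sketch under assumptions: getOrInitRegexes(), ActionNameRegexes, the letter/digit token definition and the MASKED placeholder are hypothetical. Support is detected by building the Unicode regexes with the RegExp constructor inside try/catch, since engines without Unicode property escapes (or the u flag) throw a SyntaxError there rather than failing to parse the whole bundle.

```typescript
interface ActionNameRegexes {
  split: RegExp // separators between tokens
  match: RegExp // tokens, global so replace() visits every match
}

// undefined = not checked yet, null = checked and unsupported
let cachedRegexes: ActionNameRegexes | null | undefined = undefined

function getOrInitRegexes(): ActionNameRegexes | undefined {
  if (cachedRegexes === undefined) {
    try {
      cachedRegexes = {
        split: new RegExp('[^\\p{L}\\p{N}]+', 'u'),
        match: new RegExp('[\\p{L}\\p{N}]+', 'gu'),
      }
    } catch {
      cachedRegexes = null
    }
  }
  return cachedRegexes ?? undefined
}

function tokenize(text: string): string[] | undefined {
  const regexes = getOrInitRegexes()
  return regexes ? text.split(regexes.split).filter(Boolean) : undefined
}

function maskActionName(name: string, allowlist: Set<string>): string | undefined {
  const regexes = getOrInitRegexes()
  if (!regexes) {
    // No Unicode regular expression support; let the caller fall back.
    return undefined
  }
  return name.replace(regexes.match, (word) => (allowlist.has(word) ? word : 'MASKED'))
}
```

Returning undefined from the helper makes callers branch on support explicitly, which removes the need for a non-null assertion such as cachedRegexes!.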

@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch from aea0f2c to da92015 on July 9, 2025 11:36
@cy-moi force-pushed the congyao/RUM-10415-add-privacy-allowlist-support branch from ca8df7f to 045d430 on July 9, 2025 12:16

4 participants