Fix UserAgent ANR - Take 2 #14431

Merged
merged 10 commits into from
Aug 20, 2025

Conversation

hichamboushaba
Member

@hichamboushaba hichamboushaba commented Aug 5, 2025

Closes WOOMOB-968

Description

This PR is a second attempt at fixing the UserAgent ANR. As a reminder, this ANR occurs when calling WebSettings.getDefaultUserAgent. We call it in UserAgent#init, which runs on app launch, and it sometimes results in a background ANR. This is a known issue for Google: the call is heavy, and they suggest making it on a background thread to avoid blocking the main thread. We tried that in the first attempt, but it resulted in another WebView crash (peaMlT-Tk-p2), and we had to revert our fix.

My unproven theory for the crash is that using a background thread in UserAgent#init increased the chances of the stuck-process situation that's explained here, which in turn led to the AwDataDirLock crashes.

Now we need to take a different approach for the fix. I'll list the options we have so we can discuss them and pick the best one:

Option 1: Use two UserAgent variants, one for API requests and one for WebView

My understanding is that for API requests, the most important part of the UserAgent is the app name and version. The other parts matter more when viewing HTML content, where the web server might need to adapt the content to the WebView's capabilities.
Based on this, the idea is to use two UserAgent variants:

  • One for API requests: it uses the VM property http.agent, which is the device's default UserAgent before the WebView parts are added, and is the default value used by HttpURLConnection.
  • One for the WebView: we'll keep using WebSettings.getDefaultUserAgent, since it will then be called in the foreground when the WebView is being initialized, which should generally be fine.

For comparison, with this change, and with an emulator running Android 15, we'll use the following values:

  • apiUserAgent='Dalvik/2.1.0 (Linux; U; Android 15; sdk_gphone64_arm64 Build/AE3A.240806.043) wc-android/22.9-rc-2'
  • webViewUserAgent='Mozilla/5.0 (Linux; Android 15; sdk_gphone64_arm64 Build/AE3A.240806.043; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/124.0.6367.219 Mobile Safari/537.36 wc-android/22.9-rc-2'
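
To make the split concrete, here is a minimal sketch of the idea, not the actual code in this PR. The class name `TwoVariantUserAgent`, its constructor parameters, and the `webViewDefaultUserAgent` lambda are all hypothetical; on Android the lambda would presumably wrap `WebSettings.getDefaultUserAgent(context)`, but it is injected here so the sketch stays runnable on a plain JVM.

```kotlin
// Hypothetical sketch: names and parameters are illustrative, not the PR's API.
class TwoVariantUserAgent(
    private val appName: String,
    private val appVersion: String,
    // Assumption: on Android this would wrap WebSettings.getDefaultUserAgent(context).
    private val webViewDefaultUserAgent: () -> String
) {
    private val appSuffix: String
        get() = "$appName/$appVersion"

    // Cheap variant: only reads the "http.agent" system property and never
    // touches WebView, so it is safe to compute at app launch.
    val apiUserAgent: String by lazy {
        val httpAgent = System.getProperty("http.agent").orEmpty()
        "$httpAgent $appSuffix".trim()
    }

    // Expensive variant: computed lazily, the first time a WebView asks for it.
    val webViewUserAgent: String by lazy {
        "${webViewDefaultUserAgent()} $appSuffix"
    }
}
```

Because both properties are lazy, reading `apiUserAgent` for network requests never triggers the WebView call; that only happens once something reads `webViewUserAgent`.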

Option 2: Use SharedPreferences for caching the UserAgent.

(This was the initial approach I used in this PR; see commit c74f359. After further research on AwDataDirLock, and considering the theory shared above, I believe this could cause the same crashes. Sharing for discussion only.)
In this approach, SharedPreferences serves as a cache for the value. The plan is to load the initial value from SharedPreferences and then, after a set delay, update the cache (in case WebView has been updated).
The key factor in this fix is the delay before calling WebSettings.getDefaultUserAgent. We believe the AwDataDirLock crashes mainly happened during app startup, as there were no encrypted logs available (edit: I'm still confused about the lack of logs, but I'm not convinced it necessarily means app startup).

Option 3: Use SharedPreferences for caching the UserAgent 2

(This is a third option that's similar to Option 2, but could be more robust. I didn't implement it only because Option 1 seemed simpler; I can implement it if we believe keeping the same UserAgent value for both API requests and the WebView is beneficial.)
In this option, we'll use SharedPreferences as a cache, but we'll make sure to call WebSettings.getDefaultUserAgent only when the app is going to the foreground. At that point there are fewer chances of the process getting stuck, as the system gives the app a higher priority. To achieve this, we can use ProcessLifecycleOwner and trigger the loading of the UserAgent when the app reaches the Started state.
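
A rough sketch of how this caching could look, under stated assumptions: `KeyValueStore`, `InMemoryStore`, `CachedUserAgent`, the `fallback` value, and the `loadFromWebView` lambda are hypothetical names introduced for illustration. `KeyValueStore` stands in for SharedPreferences so the sketch runs on a plain JVM, and the lambda stands in for `WebSettings.getDefaultUserAgent(context)`.

```kotlin
// Hypothetical stand-in for SharedPreferences, so the sketch has no Android dependency.
interface KeyValueStore {
    fun getString(key: String): String?
    fun putString(key: String, value: String)
}

class InMemoryStore : KeyValueStore {
    private val values = mutableMapOf<String, String>()
    override fun getString(key: String): String? = values[key]
    override fun putString(key: String, value: String) { values[key] = value }
}

class CachedUserAgent(
    private val store: KeyValueStore,
    // Assumption: some value is needed before the first refresh has happened.
    private val fallback: String,
    // Assumption: on Android this would wrap WebSettings.getDefaultUserAgent(context).
    private val loadFromWebView: () -> String
) {
    // Never touches WebView: returns the cached value, or the fallback on first launch.
    val value: String
        get() = store.getString(KEY) ?: fallback

    // Meant to be called when the app reaches the Started state, for example from
    // DefaultLifecycleObserver.onStart registered on ProcessLifecycleOwner's lifecycle.
    fun onAppStarted() {
        val fresh = loadFromWebView()
        if (fresh != store.getString(KEY)) store.putString(KEY, fresh)
    }

    private companion object {
        const val KEY = "cached_user_agent"
    }
}
```

The trade-off in this shape is that the very first launch uses the fallback until the app has been started in the foreground once; later launches read the cached value immediately.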

@JorgeMucientes @malinajirka @wzieba pinging you as you have more context on this issue given the discussions on Linear, please share your thoughts on the suggested approaches.

Testing information

API requests

  1. Use a tool to inspect network requests (App Inspection from Android Studio or Flipper)
  2. Launch the app.
  3. Check some requests and confirm they have the expected UserAgent, in the format <http.agent> wc-android/<version>

WebView

  1. Open Blaze campaign creation screen.
  2. Enter all details and tap on confirm.
  3. Tap on the payment method button.
  4. Tap on Add a new payment method.
  5. Confirm the WebView loads as expected and that no nav bar is shown (I mean the Calypso nav bar)

The tests that have been performed

The above.

  • I have considered if this change warrants release notes and have added them to RELEASE-NOTES.txt if necessary. Use the "[Internal]" label for non-user-facing changes.

@hichamboushaba hichamboushaba added the type: crash The worst kind of bug. label Aug 5, 2025
@dangermattic
Collaborator

dangermattic commented Aug 5, 2025

2 Warnings
⚠️ View files have been modified, but no screenshot or video is included in the pull request. Consider adding some for clarity.
⚠️ This PR is assigned to the milestone 23.1. This milestone is due in less than 2 days.
Please make sure to get it merged by then or assign it to a milestone with a later deadline.

Generated by 🚫 Danger

private const val APP_VERSION = "1.0"

@OptIn(ExperimentalCoroutinesApi::class)
@RunWith(RobolectricTestRunner::class)
class UserAgentTest {
Member Author

The unit tests are broken now. I updated them when I implemented Option 1, but they will currently fail; I will fix them once we agree on the approach.

@wpmobilebot
Collaborator

wpmobilebot commented Aug 5, 2025

📲 You can test the changes from this Pull Request in WooCommerce-Wear Android by scanning the QR code below to install the corresponding build.
App Name: WooCommerce-Wear Android
Platform: ⌚️ Wear OS
Flavor: Jalapeno
Build Type: Debug
Commit: 4ad7b33
Direct Download: woocommerce-wear-prototype-build-pr14431-4ad7b33.apk

@wpmobilebot
Collaborator

wpmobilebot commented Aug 5, 2025

📲 You can test the changes from this Pull Request in WooCommerce Android by scanning the QR code below to install the corresponding build.

App Name: WooCommerce Android
Platform: 📱 Mobile
Flavor: Jalapeno
Build Type: Debug
Commit: 4ad7b33
Direct Download: woocommerce-prototype-build-pr14431-4ad7b33.apk

Contributor

@wzieba wzieba left a comment

the idea here is to use two UserAgent variants:

This sounds good to me 👍

I can't say I fully understood the AwDataDirLock issue (I read the attached comment from the tracker, but still) or how moving to the background thread could increase this, but having two user agents sounds to me like a completely valid approach to test.

@malinajirka
Contributor

I forgot to reply yesterday 🤦‍♂️.

Thanks for clearly summarizing your findings @hichamboushaba! I also think testing the two-agents approach is worth a shot.

@JorgeMucientes
Contributor

Same as shared by Wojtek. I didn't really get why moving the UserAgent initialization to the background led to the crashes last time. In any case, the two-UserAgents approach sounds like a good one to test.

@hichamboushaba
Member Author

hichamboushaba commented Aug 11, 2025

I can't say I fully understood the AwDataDirLock issue (I read the attached comment from the tracker, but still) or how moving to the background thread could increase this

Thank you all for the input. Regarding this, I'll try to explain my theory further here.
In our app, we use WorkManager to handle some background tasks, and these tasks use the NetworkType.CONNECTED constraint. According to my theory, the following could happen when the network is unstable:

  1. WorkManager starts the execution of a task, which triggers the background thread for getting the UserAgent.
  2. The network disconnects shortly after, and WorkManager stops the Worker, then reschedules it for when the network connects again.
  3. For some reason, the process gets stuck (as discussed in the above issue).
  4. The network gets connected again, and the Worker is launched.
  5. Android starts a new process, and we launch a new background thread for getting the UserAgent.
  6. An AwDataDirLock exception is thrown, as we now have two processes accessing the same data dir.

This is just a theory, and I can't prove it, but it seems to match what we had, as all the crashes happened after a NETWORK_AVAILABLE event (as mentioned here peaMlT-Tk-p2#comment-2286).


The PR is now ready for review.

@hichamboushaba hichamboushaba force-pushed the issue/WOOMOB-968-fix-UserAgent-ANR branch from 4a3f4d6 to 87ac35a Compare August 11, 2025 16:28
@hichamboushaba hichamboushaba added this to the 23.1 milestone Aug 11, 2025
@hichamboushaba hichamboushaba marked this pull request as ready for review August 11, 2025 16:29
@codecov-commenter

codecov-commenter commented Aug 11, 2025

Codecov Report

❌ Patch coverage is 0% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.92%. Comparing base (895a038) to head (4ad7b33).
⚠️ Report is 43 commits behind head on trunk.

Files with missing lines | Patch % | Lines
...a/org/wordpress/android/fluxc/network/UserAgent.kt | 0.00% | 14 Missing ⚠️
...erce/android/ui/compose/component/web/WCWebView.kt | 0.00% | 1 Missing ⚠️
...pplicationpasswords/ApplicationPasswordsNetwork.kt | 0.00% | 1 Missing ⚠️
...onpasswords/WPApiApplicationPasswordsRestClient.kt | 0.00% | 1 Missing ⚠️
...pcom/jetpackai/JetpackAITranscriptionRestClient.kt | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##              trunk   #14431      +/-   ##
============================================
- Coverage     37.92%   37.92%   -0.01%     
  Complexity     9329     9329              
============================================
  Files          2015     2015              
  Lines        113218   113222       +4     
  Branches      14984    14985       +1     
============================================
  Hits          42942    42942              
- Misses        66360    66364       +4     
  Partials       3916     3916              


@JorgeMucientes JorgeMucientes self-assigned this Aug 12, 2025
}

- override fun toString(): String = userAgent
+ override fun toString(): String = apiUserAgent
Contributor

Is there any reason to keep this? It's unused.

Member Author

Are you sure it's really unused? I left it because I'm not entirely sure it isn't used somewhere. Android Studio's Find Usages doesn't work well here, because it's an overridden function.

If we can confirm it's unused, I'd also prefer to have a better toString implementation here, or to get rid of the implementation completely.

Contributor

I can't confirm it 100%, but when I looked at this, I used "Find in Files" and ran the following searches:

  • agent.toS
  • Agent.to
  • gent.to

None of those searches showed any matches, so I presumed there were no usages of this toString implementation. However, this method doesn't confirm 100% that there aren't any.

Member Author

@hichamboushaba hichamboushaba Aug 20, 2025

I found out that IntelliJ IDEA's "Find Usages" has an options dialog that allows disabling the search for base functions:
Screenshot 2025-08-20 at 09 46 29

This confirmed there are no usages at all, so I removed it in 4ad7b33.

Contributor

@JorgeMucientes JorgeMucientes left a comment

Great work @hichamboushaba, everything works as expected and code looks good. I just left a minor suggestion but nothing blocking.

@malinajirka malinajirka removed their request for review August 19, 2025 13:03
@hichamboushaba hichamboushaba force-pushed the issue/WOOMOB-968-fix-UserAgent-ANR branch from 8e062f7 to 72d56b7 Compare August 20, 2025 08:47
We'll save the user agent to SharedPreferences, and then load it from there on subsequent launches.
We'll keep the value up to date with a lazy call to `WebSettings.getDefaultUserAgent`, hoping this avoids the race conditions leading to the `AwDataDirLock` crash.
We now have two user agents: one used for API calls, and one for the WebView. The one used for API calls uses the `http.agent` property, to avoid ANRs caused by `WebSettings.getDefaultUserAgent`.
@hichamboushaba hichamboushaba force-pushed the issue/WOOMOB-968-fix-UserAgent-ANR branch from 72d56b7 to 4ad7b33 Compare August 20, 2025 08:49
@hichamboushaba hichamboushaba merged commit b87e087 into trunk Aug 20, 2025
17 checks passed
@hichamboushaba hichamboushaba deleted the issue/WOOMOB-968-fix-UserAgent-ANR branch August 20, 2025 09:19
Labels
type: crash The worst kind of bug.
Projects
None yet
7 participants