feat: unable to import top pages... missed opportunities/suggestions #1121

tkotthakota-adobe · 2025-08-06T15:34:37Z

Please ensure your pull request adheres to the following guidelines:

Make sure to link the related issues in this description. If no issue has been created, describe the problem you're solving here.
When merging/squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes.

If the PR is changing the API specification:

Make sure you add a "Not implemented yet" note to the endpoint description if the implementation is not ready yet. Ideally, return a 501 status code with a message explaining that the feature is not implemented yet.
Make sure you add at least one example of the request and response.

If the PR is changing the API implementation or an entity exposed through the API:

Make sure you update the API specification and the examples to reflect the changes.

If the PR is introducing a new audit type:

Make sure you update the API specification with the type, the schema of the audit result, and an example.

Related Issues

Thanks for contributing!

https://jira.corp.adobe.com/browse/SITES-34081


ramboz

Looks good overall, but there are a few things we could potentially improve on the original code I had.

src/support/utils.js

github-actions · 2025-08-06T16:53:36Z

This PR will trigger no release when merged.

rpapani · 2025-08-06T23:38:43Z

src/support/slack/commands/onboard.js

@@ -284,6 +286,19 @@ function OnboardCommand(context) {

      log.info(`Enabled the following imports for ${siteID}: ${reportLine.imports}`);

+      // Build the normalized URL for the site from the base URL
+      const normalizedUrl = await getNormalizedUrl(baseURL, log);


In general, utils shouldn't log; they should throw errors instead.

I initially added logs to see how the URL is resolved in different scenarios. Unfortunately, throwing an error is not an option in some cases, as we want to continue further processing. I agree that it's not ideal to pass log into a util method. Let me remove it for now, as we don't really need to know the internals of how the URL is resolved.

Unfortunately throwing error is not an option in some cases as we want to continue further processing

The caller should handle the exception and move forward; the util should return the best possible response as per its contract.
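A minimal sketch of that split, assuming a hypothetical resolveUrlOrThrow util (HEAD request only) and a hypothetical resolveBaseUrlForOnboarding caller; neither is code from this PR. The util never logs and throws when it cannot resolve; the caller owns the log line and the decision to continue with the original base URL. fetchFn is injected so the sketch can be exercised with a stub instead of a live network call.

```javascript
// Hypothetical util: resolves a URL or throws. It never logs.
// `fetchFn` is injected so callers (and tests) can supply their own fetch.
async function resolveUrlOrThrow(urlString, fetchFn) {
  const resp = await fetchFn(urlString, { method: 'HEAD' });
  if (!resp.ok) {
    throw new Error(`Failed to resolve ${urlString}: HTTP ${resp.status}`);
  }
  // With fetch's default redirect handling, resp.url is the final URL.
  return resp.url || urlString;
}

// Hypothetical caller: owns logging and the "keep going" decision.
async function resolveBaseUrlForOnboarding(baseURL, fetchFn, log) {
  try {
    return await resolveUrlOrThrow(baseURL, fetchFn);
  } catch (err) {
    // Covers both non-2xx responses and network errors thrown by fetch.
    // Onboarding continues with the original URL instead of aborting.
    log.warn(`Could not resolve ${baseURL}, using it as-is: ${err.message}`);
    return baseURL;
  }
}
```

Keeping the fallback in the caller means the util's contract stays simple (return a URL or throw), and each call site can choose its own recovery.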

rpapani · 2025-08-06T23:43:21Z

src/support/slack/commands/onboard.js

+      const normalizedUrl = await getNormalizedUrl(baseURL, log);
+
+      // Extract domain from URL if it has paths, query parameters, or hash fragments
+      const finalUrl = extractDomainFromUrl(normalizedUrl, log);


We only need to update the fetch config when normalizedUrl has a path after the domain name (i.e., it redirected to a different path, which would cause issues when we import); otherwise, no config update is required.

If normalizedUrl is the same as finalUrl, do we want to use normalizedUrl as the baseURL for the rest of the processing?
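The rule above can be sketched as follows. This assumes "domain" means the URL origin (scheme, host and port), and the helper names (extractOrigin, hasPathQueryOrHash, planOnboarding) are hypothetical; the PR's extractDomainFromUrl may behave differently. A bare origin with only a trailing slash counts as having no path, so no config update is needed and the origin itself can serve as baseURL.

```javascript
/**
 * Hypothetical helper: returns the origin (scheme + host + port) of a URL,
 * dropping the path, query string and hash fragment.
 * @param {string} urlString
 * @returns {string}
 */
function extractOrigin(urlString) {
  return new URL(urlString).origin;
}

/**
 * Hypothetical helper: true when the URL carries anything beyond the bare
 * origin. A lone trailing slash ("https://example.com/") counts as no path.
 * @param {string} urlString
 * @returns {boolean}
 */
function hasPathQueryOrHash(urlString) {
  const url = new URL(urlString);
  return url.pathname !== '/' || url.search !== '' || url.hash !== '';
}

/**
 * Hypothetical planning step: decide the base URL and whether the fetch
 * config needs an override (only when the redirect landed on a path).
 * @param {string} normalizedUrl - URL after following redirects.
 * @returns {{ baseURL: string, needsFetchConfigUpdate: boolean }}
 */
function planOnboarding(normalizedUrl) {
  return {
    baseURL: extractOrigin(normalizedUrl),
    needsFetchConfigUpdate: hasPathQueryOrHash(normalizedUrl),
  };
}
```

For example, a redirect to https://www.example.com/en/home yields baseURL https://www.example.com with needsFetchConfigUpdate set, while a redirect to https://www.example.com/ needs no config change.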

rpapani · 2025-08-06T23:59:36Z

src/support/utils.js

+ * @param {string} method - HTTP method to use ('HEAD' or 'GET').
+ * @returns {Promise<string>} A Promise that resolves to the normalized URL.
+ */
+export async function getNormalizedUrl(urlString, log, method = 'HEAD') {


Can we use getCanonicalUrl/getResolvedUrl for this? @ramboz, any thoughts?
I think it's better to put this in https://github.com/adobe/spacecat-shared/blob/main/packages/spacecat-shared-utils/src/url-helpers.js, so that we can use it everywhere rather than restricting it to api-service.

🤷🏼‍♂️ whatever the team feels more comfortable with. I really don't mind as long as the JSDoc is clear on what it does, really.

Below are some good naming options Cursor suggested based on the functionality. Perhaps we can name it resolveCanonicalUrl?

resolveAndNormalizeUrl, normalizeUrlWithRedirects, resolveUrlWithFallback, fetchCanonicalUrlWithFallback, resolveCanonicalUrl

Agreed on moving it to shared-utils. I suggest doing that once this method is solid enough to use across the board.

I'd prefer getCanonicalUrl, but I'm fine with resolveCanonicalUrl as well.
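Whichever name wins, a log-free resolver with a clear JSDoc contract could look roughly like this. It is a sketch, not the PR's getNormalizedUrl or any existing spacecat-shared-utils export: the name resolveCanonicalUrl comes from the thread, the HEAD-then-GET fallback is an assumption suggested by the method parameter in the diff, and fetchFn is injected for testability. Because fetch follows redirects by default, resp.url is already the final URL, so no recursion is needed.

```javascript
/**
 * Resolves a URL to the final URL reached after following redirects.
 *
 * Sketch only: tries HEAD first and falls back to GET whenever HEAD does not
 * return a 2xx response (some servers reject HEAD). Does not log; throws on
 * failure so callers decide whether to continue.
 *
 * @param {string} urlString - URL to resolve.
 * @param {object} [options]
 * @param {Function} [options.fetchFn=globalThis.fetch] - fetch implementation.
 * @returns {Promise<string>} The final URL after redirects.
 * @throws {Error} If neither HEAD nor GET returns a 2xx response,
 *   or if fetchFn itself throws (e.g. a network error).
 */
async function resolveCanonicalUrl(urlString, { fetchFn = globalThis.fetch } = {}) {
  let lastStatus;
  for (const method of ['HEAD', 'GET']) {
    // redirect: 'follow' is the fetch default; resp.url is then the final URL.
    const resp = await fetchFn(urlString, { method, redirect: 'follow' });
    // Only the status and final URL matter, so discard any GET body.
    if (resp.body && typeof resp.body.cancel === 'function') {
      await resp.body.cancel();
    }
    if (resp.ok) {
      return resp.url || urlString;
    }
    lastStatus = resp.status;
  }
  throw new Error(`Unable to resolve ${urlString} (last status ${lastStatus})`);
}
```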

src/support/utils.js

ramboz · 2025-08-07T15:03:14Z

src/support/utils.js

+ * @param {string} method - HTTP method to use ('HEAD' or 'GET').
+ * @returns {Promise<string>} A Promise that resolves to the normalized URL.
+ */
+export async function getNormalizedUrl(urlString, log, method = 'HEAD') {


🤷🏼‍♂️ whatever the team feels more comfortable with. I really don't mind as long as the JSDoc is clear on what it does, really.

ramboz · 2025-08-07T15:05:13Z

src/support/utils.js

+    if (urlString !== resp.url) {
+      log.info(`Redirected to ${resp.url}`);
+      return getNormalizedUrl(resp.url, log, method);
+    }


I would double-check if resp.url is the same as resp.headers.get('Location')… I can't remember if I consciously separated them initially for a specific reason, or if it's just an oversight and I forgot to align the property access.

Looks like they are not the same.
resp.url seems to be what we need in both the HEAD and GET scenarios?

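If you make a request to http://example.com and the server returns 301 Moved Permanently with Location: https://example.com, then with fetch's default redirect: 'follow' the redirect is followed automatically:
resp.url would be https://example.com/ (the final URL after following the redirect).
resp.headers.get('Location') would be null, because the headers belong to the final 200 response, not the intermediate 301.
The Location header (https://example.com) is only readable with redirect: 'manual' in a server-side fetch, and in that case resp.url is still the requested http://example.com/.

If you make a request to https://example.com and it returns 200 OK with no Location header, then:
resp.url would be https://example.com/ (the final URL).
resp.headers.get('Location') would be null (no redirect).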
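To see the difference in practice, redirects can be followed by hand with redirect: 'manual', reading Location at each hop. This is a sketch assuming a server-side fetch (such as Node's built-in one) that exposes the 3xx response under redirect: 'manual'; browsers return an opaque response with no readable headers instead. traceRedirects and its maxHops guard are hypothetical names, not part of this PR.

```javascript
// Statuses that carry a Location header pointing at the next hop.
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

// Hypothetical helper: follows redirects manually and records every hop.
async function traceRedirects(urlString, fetchFn = globalThis.fetch, maxHops = 10) {
  const hops = [urlString];
  let current = urlString;
  for (let i = 0; i < maxHops; i += 1) {
    const resp = await fetchFn(current, { method: 'HEAD', redirect: 'manual' });
    const location = resp.headers.get('location');
    if (!REDIRECT_STATUSES.has(resp.status) || !location) {
      // Not a redirect: `current` is the final URL.
      return { finalUrl: current, status: resp.status, hops };
    }
    // Location may be relative; resolve it against the URL that returned it.
    current = new URL(location, current).href;
    hops.push(current);
  }
  throw new Error(`Too many redirects starting from ${urlString}`);
}
```

With redirect: 'follow' the same chain collapses into a single response whose url is the last hop, which is why resp.url is the value the resolver needs.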

fix on unable to import top pages leading missed opportunities and suggestions

d188867

tkotthakota-adobe requested review from solaris007, rpapani, iuliag and ramboz August 6, 2025 15:34

ramboz reviewed Aug 6, 2025


src/support/utils.js (resolved)

simplify normalized url logic

8209b9c

tkotthakota-adobe requested a review from ramboz August 6, 2025 16:54

fetch config update fix

70f3487

rpapani reviewed Aug 6, 2025


rpapani reviewed Aug 7, 2025


src/support/utils.js (resolved)

ramboz reviewed Aug 7, 2025

