feat: unable to import top pages... missed opportunities/suggestions #1121
Looks good overall, but there are a few things we could potentially improve on the original code I had.
This PR will trigger no release when merged.
@@ -284,6 +286,19 @@ function OnboardCommand(context) {

log.info(`Enabled the following imports for ${siteID}: ${reportLine.imports}`);

// Build the normalized URL for the site from the base URL
const normalizedUrl = await getNormalizedUrl(baseURL, log);
Utils shouldn't use log in general; they should throw errors instead.
I initially added logs to see how the URL is resolved in different scenarios. Unfortunately, throwing an error is not an option in some cases, as we want to continue further processing. I agree that it is not ideal to pass log into a util method. Let me remove it for now, as we really do not need to know the internals of how the URL is resolved.
> Unfortunately, throwing an error is not an option in some cases, as we want to continue further processing

The caller should handle the exception and move forward; the util should return the best possible response as per its contract.
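The split of responsibilities described here could look roughly like the following. This is a minimal sketch, not the PR's code: `getOrigin` and `resolveSiteUrl` are hypothetical names, and falling back to the original URL is only an assumption about what "continue further processing" should mean.

```javascript
// Hypothetical util: takes no logger and signals failure by throwing.
function getOrigin(urlString) {
  // `new URL` throws a TypeError when the input is not an absolute URL.
  const { origin } = new URL(urlString);
  return origin;
}

// Hypothetical caller: owns the logger and decides how to recover.
function resolveSiteUrl(baseURL, log) {
  try {
    return getOrigin(baseURL);
  } catch (e) {
    log.warn(`Could not resolve ${baseURL}: ${e.message}`);
    return baseURL; // assumed fallback so that processing can continue
  }
}
```

With this shape the util stays free of logging concerns, and each caller can choose its own recovery (fallback, retry, or abort).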
const normalizedUrl = await getNormalizedUrl(baseURL, log);

// Extract domain from URL if it has paths, query parameters, or hash fragments
const finalUrl = extractDomainFromUrl(normalizedUrl, log);
We'll only have to update the fetch config when the normalizedUrl has a path after the domain name (i.e., it redirected to a different path, so we'd run into issues when we import). Otherwise, no config update is required.
If normalizedUrl is the same as finalUrl, do we want to use normalizedUrl as the baseURL for the rest of the processing?
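The check discussed in this thread could be sketched as follows. The names `hasPathQueryOrHash` and `planOnboardingUrl` are hypothetical, and the returned `needsConfigUpdate` flag only stands in for whatever fetch config update the onboarding flow actually performs.

```javascript
// Hypothetical helper: true when the URL carries anything beyond the bare origin.
function hasPathQueryOrHash(urlString) {
  const { pathname, search, hash } = new URL(urlString);
  return pathname !== '/' || search !== '' || hash !== '';
}

// Hypothetical decision step: flag a config update only when the resolved
// URL landed on a path, and keep the bare domain for the rest of the flow.
function planOnboardingUrl(normalizedUrl) {
  const needsConfigUpdate = hasPathQueryOrHash(normalizedUrl);
  const finalUrl = needsConfigUpdate
    ? new URL(normalizedUrl).origin
    : normalizedUrl;
  return { finalUrl, needsConfigUpdate };
}
```

When the normalized URL is already a bare domain, the sketch returns it unchanged, so normalizedUrl and finalUrl coincide and no update is flagged.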
 * @param {string} method - HTTP method to use ('HEAD' or 'GET').
 * @returns {Promise<string>} A Promise that resolves to the normalized URL.
 */
export async function getNormalizedUrl(urlString, log, method = 'HEAD') {
Can we use getCanonicalUrl/getResolvedUrl for this? @ramboz any thoughts?
I think it's better to put this in https://github.com/adobe/spacecat-shared/blob/main/packages/spacecat-shared-utils/src/url-helpers.js, so that we can use it everywhere rather than restricting it to api-service.
🤷🏼♂️ whatever the team feels more comfortable with. I really don't mind as long as the JSDoc is clear on what it does, really.
Below are some good naming options suggested by Cursor, based on the functionality. Perhaps we can name it resolveCanonicalUrl?
resolveAndNormalizeUrl
normalizeUrlWithRedirects
resolveUrlWithFallback
fetchCanonicalUrlWithFallback
resolveCanonicalUrl
Agree on moving to shared-utils. Suggest doing it once this method is solidified to use across the board.
I'd prefer getCanonicalUrl, but I'm fine with resolveCanonicalUrl as well.
if (urlString !== resp.url) {
  log.info(`Redirected to ${resp.url}`);
  return getNormalizedUrl(resp.url, log, method);
}
I would double-check if resp.url is the same as resp.headers.get('Location')… I can't remember if I consciously separated them initially for a specific reason, or if it's just an oversight and I forgot to align the property access.
Looks like they are not the same.
resp.url seems to be what we need in both HEAD and GET method scenarios?
If you make a request to http://example.com and it returns:
Status: 301 Moved Permanently
Location: https://example.com
Then:
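resp.url would be https://example.com (the final URL after following the redirect)
resp.headers.get('Location') would be https://example.com (the redirect target), but only when read from the 301 response itself; once the redirect has been followed, the final response normally carries no Location header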
But if you make a request to https://example.com and it returns:
Status: 200 OK
No Location header
Then:
resp.url would be https://example.com (the final URL)
resp.headers.get('Location') would be null (no redirect)
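To make the difference concrete, here is a small sketch that handles both cases. It assumes fetch semantics: with the default `redirect: 'follow'` the returned response is the final one, so `resp.url` is the final URL, while a `Location` header is only visible on an unfollowed 3xx response (which, as far as I know, Node.js exposes when the request uses `redirect: 'manual'`). `describeRedirect` is a hypothetical helper, not part of the PR.

```javascript
// Hypothetical helper: works out where a fetch-style response points,
// given the URL that was requested.
function describeRedirect(requestUrl, resp) {
  // Normalize the requested URL (e.g. add a missing trailing slash) so it
  // can be compared with resp.url, which is always serialized this way.
  const requested = new URL(requestUrl).href;
  const location = resp.headers.get('location');

  if (resp.status >= 300 && resp.status < 400 && location) {
    // Unfollowed 3xx: Location may be relative, so resolve it against the request.
    return { followed: false, target: new URL(location, requested).href };
  }
  // Final response: resp.url already reflects any redirects that were followed.
  return { followed: resp.url !== requested, target: resp.url };
}
```

Comparing against the normalized request URL avoids treating `https://example.com` and `https://example.com/` as a redirect, which a plain string comparison would do.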
Related Issues
https://jira.corp.adobe.com/browse/SITES-34081