introduce ZyteAPITextResponse and ZyteAPIResponse to store raw Zyte Data API Response #10
Merged
30 commits
9a83471 create ZyteAPITextResponse and ZyteAPIResponse (BurnzZ)
8909473 update README and CHANGES with notes on new response classes (BurnzZ)
d0dc08d set the encoding consistently to be 'utf-8' (BurnzZ)
109dbf0 improve example and docs (BurnzZ)
9695880 override replace() to prevent 'zyte_api_response' attribute from bein… (BurnzZ)
8812a05 fix mypy failures (BurnzZ)
ba64103 enforce 'utf-8' encoding on Text responses (BurnzZ)
84dac7d update expectation for replacing zyte_api_response attribute (BurnzZ)
5b83443 update README regarding default params (BurnzZ)
fb0b412 remove 'Content-Encoding' header when returning responses (BurnzZ)
10a4603 remove the ZYTE_API_ENABLED setting (BurnzZ)
b7102fa remove zyte_api_default_params in the spider (BurnzZ)
2b4a0fb refactor TestAPI to have single producer of requests and responses (BurnzZ)
97ea1e4 implement ZYTE_API_DEFAULT_PARAMS in the settings (BurnzZ)
5dd1bec fix failing tests (BurnzZ)
052d0d6 Merge pull request #14 from scrapy-plugins/fix-decompression-error (kmike)
48a4766 rename zyte_api_response into zyte_api (BurnzZ)
2455bdf Merge pull request #13 from scrapy-plugins/default-settings (BurnzZ)
910085b add tests for css/xpath selectors (BurnzZ)
e3214d8 enable css/xpath selectors on httpResponseBody (BurnzZ)
e530053 handle empty 'browserHtml' or 'httpResponseBody' (BurnzZ)
27c7a7d Fix typos in docs (BurnzZ)
5b7cf6f update how replace() works (BurnzZ)
2adc8a6 update README in line with the ZYTE_API_DEFAULT_PARAMS expectations (BurnzZ)
32faf3d add test case to ensure zyte_api is intact when replacing other attribs (BurnzZ)
cec0677 make process_response() private (BurnzZ)
e0865e7 update tests to ensure other response attribs are not updated on .rep… (BurnzZ)
34a427f raise an error if zyte_api is passed to .replace() (BurnzZ)
37a4cc7 rename '.zyte_api' attribute as '.raw_api_response' (BurnzZ)
f5a9bb0 refactor to accept 'True' and '{}' to trigger Zyte API Requests (BurnzZ)
@@ -0,0 +1,128 @@
from base64 import b64decode
from typing import Dict, List, Optional, Tuple, Union

from scrapy import Request
from scrapy.http import Response, TextResponse
from scrapy.responsetypes import responsetypes

_DEFAULT_ENCODING = "utf-8"


class ZyteAPIMixin:

    REMOVE_HEADERS = {
        # Zyte API already decompresses the HTTP Response Body. Scrapy's
        # HttpCompressionMiddleware will error out when it attempts to
        # decompress an already decompressed body based on this header.
        "content-encoding"
    }

    def __init__(self, *args, raw_api_response: Optional[Dict] = None, **kwargs):
        super().__init__(*args, **kwargs)
        self._raw_api_response = raw_api_response

    def replace(self, *args, **kwargs):
        if kwargs.get("raw_api_response"):
            raise ValueError("Replacing the value of 'raw_api_response' isn't allowed.")
        return super().replace(*args, **kwargs)

    @property
    def raw_api_response(self) -> Optional[Dict]:
        """Contains the raw API response from Zyte API.

        To see the full list of parameters and their description, kindly refer to the
        `Zyte API Specification <https://docs.zyte.com/zyte-api/openapi.html#zyte-openapi-spec>`_.
        """
        return self._raw_api_response

    @classmethod
    def _prepare_headers(cls, init_headers: Optional[List[Dict[str, str]]]):
        if not init_headers:
            return None
        return {
            h["name"]: h["value"]
            for h in init_headers
            if h["name"].lower() not in cls.REMOVE_HEADERS
        }

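The header handling in `_prepare_headers` can be tried on its own, without Scrapy installed. The following is a minimal standalone sketch of that logic, assuming `httpResponseHeaders` is a list of `{"name": ..., "value": ...}` dicts as the code above implies. The function name `prepare_headers` and the sample headers are hypothetical and not part of this PR.

```python
from typing import Dict, List, Optional

# Lowercase header names to drop, mirroring REMOVE_HEADERS in the diff.
REMOVE_HEADERS = {"content-encoding"}


def prepare_headers(
    init_headers: Optional[List[Dict[str, str]]],
) -> Optional[Dict[str, str]]:
    """Standalone copy of the _prepare_headers logic, for illustration only."""
    if not init_headers:
        # Covers both a missing header list (None) and an empty one.
        return None
    return {
        h["name"]: h["value"]
        for h in init_headers
        if h["name"].lower() not in REMOVE_HEADERS
    }


# Hypothetical sample input, not real API output.
sample = [
    {"name": "Content-Type", "value": "text/html"},
    {"name": "Content-Encoding", "value": "gzip"},
]
print(prepare_headers(sample))
```

The comparison is done on the lowercased name, so `Content-Encoding` is dropped regardless of how the origin server capitalised it, while the surviving headers keep their original casing.
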
class ZyteAPITextResponse(ZyteAPIMixin, TextResponse):

    attributes: Tuple[str, ...] = TextResponse.attributes + ("raw_api_response",)

    @classmethod
    def from_api_response(
        cls, api_response: Dict, *, request: Optional[Request] = None
    ):
        """Alternative constructor to instantiate the response from the raw
        Zyte API response.
        """
        body = None
        encoding = None

        if api_response.get("browserHtml"):
            encoding = _DEFAULT_ENCODING  # Zyte API has "utf-8" by default
            body = api_response["browserHtml"].encode(encoding)
        elif api_response.get("httpResponseBody"):
            body = b64decode(api_response["httpResponseBody"])

        return cls(
            url=api_response["url"],
            status=200,
            body=body,
            encoding=encoding,
            request=request,
            flags=["zyte-api"],
            headers=cls._prepare_headers(api_response.get("httpResponseHeaders")),
            raw_api_response=api_response,
        )

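The body and encoding selection in `ZyteAPITextResponse.from_api_response` follows a fixed precedence: `browserHtml` first, then `httpResponseBody`, otherwise no body. A minimal sketch of just that step, using only the standard library, might look as follows. The helper name `extract_body` and the sample payloads are hypothetical and not part of this PR.

```python
from base64 import b64decode, b64encode
from typing import Dict, Optional, Tuple


def extract_body(api_response: Dict) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, encoding) with the same precedence as the constructor above."""
    if api_response.get("browserHtml"):
        # browserHtml arrives as text, so it is encoded to UTF-8 bytes.
        return api_response["browserHtml"].encode("utf-8"), "utf-8"
    if api_response.get("httpResponseBody"):
        # httpResponseBody arrives base64-encoded; no encoding is assumed here.
        return b64decode(api_response["httpResponseBody"]), None
    # Neither field is present (or both are empty).
    return None, None


# Hypothetical sample payloads, not real API output.
html_response = {"url": "https://example.com", "browserHtml": "<html>caf\u00e9</html>"}
raw_response = {
    "url": "https://example.com",
    "httpResponseBody": b64encode(b"hello").decode("ascii"),
}

print(extract_body(html_response))
print(extract_body(raw_response))
```

Because the checks use `.get()` and truthiness, an empty string in either field is treated the same as a missing field.
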
class ZyteAPIResponse(ZyteAPIMixin, Response):

    attributes: Tuple[str, ...] = Response.attributes + ("raw_api_response",)

    @classmethod
    def from_api_response(
        cls, api_response: Dict, *, request: Optional[Request] = None
    ):
        """Alternative constructor to instantiate the response from the raw
        Zyte API response.
        """
        return cls(
            url=api_response["url"],
            status=200,
            body=b64decode(api_response.get("httpResponseBody") or ""),
            request=request,
            flags=["zyte-api"],
            headers=cls._prepare_headers(api_response.get("httpResponseHeaders")),
            raw_api_response=api_response,
        )


def _process_response(
    api_response: Dict[str, Union[List[Dict], str]], request: Request
) -> Optional[Union[ZyteAPITextResponse, ZyteAPIResponse]]:
    """Given a Zyte API Response and the ``scrapy.Request`` that asked for it,
    this returns a ``ZyteAPITextResponse`` if ``browserHtml`` is available or
    the HTTP body is detected as text, and a ``ZyteAPIResponse`` otherwise.
    """

    # NOTES: Currently, Zyte API does NOT allow both 'browserHtml' and
    # 'httpResponseBody' to be present at the same time. The support for both
    # will be addressed in the future. Reference:
    # - https://github.com/scrapy-plugins/scrapy-zyte-api/pull/10#issuecomment-1131406460
    # For now, at least one of them should be present.

    if api_response.get("browserHtml"):
        # Using TextResponse because browserHtml always returns a browser-rendered page
        # even when requesting files (like images)
        return ZyteAPITextResponse.from_api_response(api_response, request=request)

    if api_response.get("httpResponseHeaders") and api_response.get("httpResponseBody"):
        response_cls = responsetypes.from_args(
            headers=api_response["httpResponseHeaders"],
            url=api_response["url"],
            # FIXME: update this when python-zyte-api supports base64 decoding
            body=b64decode(api_response["httpResponseBody"]),  # type: ignore
        )
        if issubclass(response_cls, TextResponse):
            return ZyteAPITextResponse.from_api_response(api_response, request=request)

    return ZyteAPIResponse.from_api_response(api_response, request=request)
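
The decision flow in `_process_response` can be summarised in a small Scrapy-free sketch. Note the assumption: `looks_like_text` below is a deliberately simplified, hypothetical stand-in for `responsetypes.from_args()` that only inspects the `Content-Type` header. It is not Scrapy's actual detection logic, which also considers the URL and the body. The function names and the returned class names as strings are for illustration only.

```python
from typing import Dict, List


def looks_like_text(headers: List[Dict[str, str]]) -> bool:
    """Hypothetical stand-in for Scrapy's response type detection:
    treat a response as text when its Content-Type starts with "text/"."""
    for header in headers:
        if header["name"].lower() == "content-type":
            return header["value"].lower().startswith("text/")
    return False


def choose_response_class(api_response: Dict) -> str:
    """Return the name of the class the dispatch above would pick."""
    if api_response.get("browserHtml"):
        # Browser-rendered output is always handled as text.
        return "ZyteAPITextResponse"
    if api_response.get("httpResponseHeaders") and api_response.get("httpResponseBody"):
        if looks_like_text(api_response["httpResponseHeaders"]):
            return "ZyteAPITextResponse"
    # Fallback: binary bodies, missing headers, or an empty response.
    return "ZyteAPIResponse"


# Hypothetical sample payload, not real API output.
png = {
    "url": "https://example.com/logo.png",
    "httpResponseHeaders": [{"name": "Content-Type", "value": "image/png"}],
    "httpResponseBody": "iVBORw0KGgo=",
}
print(choose_response_class(png))
```

One consequence of this flow, visible in the original function as well, is that a response carrying `httpResponseBody` without `httpResponseHeaders` skips type detection entirely and falls through to the plain `ZyteAPIResponse`.
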