HTTP 302 Found responses with Cache Headers not being cached #13396

Open
1 task done
mklein0 opened this issue May 16, 2025 · 3 comments
Labels
S: needs triage, type: bug

Comments

@mklein0
mklein0 commented May 16, 2025

Description

pip's HTTP cache is not used for package requests when the web server answers with a 302 redirect to a signed URL. The 302 status code itself prevents the response from being cached.

In particular, this arises with the JFrog Artifactory PyPI proxy/cache, which uses 302 redirects to signed URLs pointing at cloud storage for requested Python packages and wheels. JFrog appears to use the 302 redirect for packages larger than 512K.

Expected behavior

Given cache headers in the 302 response, I would expect pip to cache the response according to the policy returned in the response. This is particularly convenient for CI processes which regularly download large packages like PyTorch.

pip version

25.1.1

Python version

3.11

OS

linux

How to Reproduce

  1. Set up a JFrog PyPI Artifactory instance. Alternatively, set up a pypi / pydev server which uses 302 redirects for all direct package requests.
  2. Create a virtual environment
  3. pip install ipython==7.32.0 (This has some transitive dependencies, but ignore that)
  4. pip uninstall -y ipython
  5. pip install ipython==7.32.0 (Ideally this would hit the HTTP Request cache)

If you want to test with a smaller package to see the cache being hit for JFrog Artifactory, you can use ptyprocess==0.7.0

A related issue has been filed with the cachecontrol library: psf/cachecontrol#384
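
For anyone without an Artifactory instance, the redirect behaviour in step 1 can be approximated locally. The following is a minimal sketch, not a full package index: it only answers wheel and sdist requests with a 302 carrying the same Cache-Control value seen in this issue. The names `REDIRECT_BASE`, `RedirectingIndexHandler`, and `probe`, the target host, and the `signature=PLACEHOLDER` query are all hypothetical, and a real test against pip would additionally need a simple index page and a reachable redirect target.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical redirect target; a real server would point at signed cloud storage URLs.
REDIRECT_BASE = "https://storage.example.invalid/filestore"


class RedirectingIndexHandler(BaseHTTPRequestHandler):
    """Answer every wheel/sdist request with a 302 that carries cache headers."""

    def do_GET(self):
        if self.path.endswith((".whl", ".tar.gz")):
            filename = self.path.rsplit("/", 1)[-1]
            self.send_response(302)
            self.send_header("Location", f"{REDIRECT_BASE}/{filename}?signature=PLACEHOLDER")
            self.send_header("Cache-Control", "public, max-age=31536000")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def probe(path):
    """Start the server on a free port, issue one GET, and return (status, headers)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), RedirectingIndexHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # http.client does not follow redirects, so the 302 itself is visible.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        headers = dict(response.getheaders())
        conn.close()
        return response.status, headers
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, headers = probe("/wheel/ipython-7.32.0-py3-none-any.whl")
    print(status, headers.get("Cache-Control"), headers.get("Location"))
```

Pointing pip at such a server with verbose logging should, under these assumptions, show the same "Status code 302 not in ..." message as in the output section.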

Output

This is the section of the logs which is most relevant to the issue

4:20,451   No cache entry available
  https://COMPANY.jfrog.io:443 "GET /COMPANY/api/pypi/python-dependencies/wheel/ipython-7.32.0-py3-none-any.whl HTTP/1.1" 302 0
2025-05-15T11:54:20,570   https://COMPANY.jfrog.io:443 "GET /COMPANY/api/pypi/python-dependencies/wheel/ipython-7.32.0-py3-none-any.whl HTTP/1.1" 302 0
  Status code 302 not in (200, 203, 300, 301, 308)
2025-05-15T11:54:20,571   Status code 302 not in (200, 203, 300, 301, 308)
  Looking up "https://storage.googleapis.com/art-COMPANY-filestore-ORGID/filestore/7e/7e977982984619f92ed9314d5375afe22e9fbda4?X-Artifactory-artifactPath=wheel%2Fipython-7.32.0-py3-none-any.whl&X-Artifactory-originPackageType=pypi&X-Artifactory-originProjectKey=default&X-Artifactory-originRepoType=local&X-Artifactory-originRepositoryKey=pypi-dependencies&X-Artifactory-packageType=pypi&X-Artifactory-projectKey=default&X-Artifactory-repoType=local&X-Artifactory-repositoryKey=pypi-dependencies&..." in the cache
2025-05-15T11:54:20,573   Looking up "https://storage.googleapis.com/art-COMPANY-filestore-ORGID/filestore/7e/7e977982984619f92ed9314d5375afe22e9fbda4?X-Artifactory-artifactPath=wheel%2Fipython-7.32.0-py3-none-any.whl&X-Artifactory-originPackageType=pypi&X-Artifactory-originProjectKey=default&X-Artifactory-originRepoType=local..." in the cache
  No cache entry available
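
The log line "Status code 302 not in (200, 203, 300, 301, 308)" suggests the status code is checked before any cache headers are considered. The sketch below illustrates that ordering only. It is a simplified stand-in written for this thread, not the actual code from pip or cachecontrol, and the function name `cache_decision` and the "store" / "no freshness information" outcomes are hypothetical. The tuple of status codes is copied from the log.

```python
# Status codes copied verbatim from the log message in this issue.
CACHEABLE_STATUS_CODES = (200, 203, 300, 301, 308)


def cache_decision(status, headers):
    """Return a short description of what a status-first cache check would do."""
    # The status check comes first, so Cache-Control is never read for a 302.
    if status not in CACHEABLE_STATUS_CODES:
        return f"Status code {status} not in {CACHEABLE_STATUS_CODES}"
    if "max-age" in headers.get("Cache-Control", ""):
        return "store"
    return "no freshness information"


if __name__ == "__main__":
    jfrog_headers = {"Cache-Control": "public, max-age=31536000"}
    print(cache_decision(302, jfrog_headers))
    print(cache_decision(200, jfrog_headers))
```

With the headers from this issue, the 302 is rejected on status alone, while the same headers on a 200 would be stored.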

The HTTP Response headers for the 302 Response were:

curl -D -  https://USERNAME:PASSWORD@COMPANY.jfrog.io/COMPANY/api/pypi/python-dependencies/wheel/ipython-7.32.0-py3-none-any.whl
HTTP/1.1 302 
Date: Thu, 15 May 2025 20:43:46 GMT
Content-Length: 0
Connection: keep-alive
X-JFrog-Version: Artifactory/7.113.0 81300900
X-Artifactory-Id: 98faa45ae10992ddb0024c56e402f11de7ed8e0e
X-Artifactory-Node-Id: COMPANY-artifactory-primary-0
Location: https://storage.googleapis.com/art-COMPANY-filestore-ORGID/filestore/7e/7e977982984619f92ed9314d5375afe22e9fbda4?X-Artifactory-artifactPath=wheel%2Fipython-7.32.0-py3-none-any.whl&...
Cache-Control: public, max-age=31536000
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Request-ID: ef93e4f961fc61edbd5d4752bb10c90c:ef93e4f961fc61edbd5d4752bb10c90c:ef93e4f961fc61edbd5d4752bb10c90c:0
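
To make the advertised policy concrete, here is a small sketch that parses the Cache-Control value from the response above and converts max-age into days. The helper name `parse_cache_control` is hypothetical, and the parser is deliberately naive (it splits on commas and does not handle quoted values that contain commas).

```python
from datetime import timedelta


def parse_cache_control(value):
    """Split a Cache-Control header into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


directives = parse_cache_control("public, max-age=31536000")
lifetime = timedelta(seconds=int(directives["max-age"]))

if __name__ == "__main__":
    print(directives)
    print(lifetime.days, "days")
```

The max-age of 31536000 seconds works out to 365 days, i.e. the redirect is advertised as publicly cacheable for one year.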

mklein0 added the type: bug and S: needs triage labels May 16, 2025
@gmargaritis
Contributor

gmargaritis commented May 17, 2025

I understand the issue is about supporting caching of 302 redirects in general, but caching presigned URLs is unintuitive since they are short-lived by design. In this specific case, the max-age is set to 1 year, which is most likely a misconfiguration and would likely cause errors down the line. Given that, not caching the redirect happens to be the correct behavior.
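
The mismatch described here can be sketched as follows: a cached redirect is only useful for as long as the signed URL it points to remains valid, whatever max-age says. The Location header in this issue is truncated, so the actual signing parameters are not visible. The sketch assumes Google Cloud Storage V4-style signed URLs, which carry `X-Goog-Date` and `X-Goog-Expires` query parameters. The function names and the example URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url):
    """Return the expiry time of a GCS V4-style signed URL, or None if absent."""
    query = parse_qs(urlsplit(url).query)
    try:
        issued = datetime.strptime(
            query["X-Goog-Date"][0], "%Y%m%dT%H%M%SZ"
        ).replace(tzinfo=timezone.utc)
        valid_for = timedelta(seconds=int(query["X-Goog-Expires"][0]))
    except (KeyError, ValueError):
        return None
    return issued + valid_for


def usable_redirect_lifetime(location, max_age, now):
    """Lifetime for which a cached redirect would still lead to a working URL."""
    advertised = timedelta(seconds=max_age)
    expiry = signed_url_expiry(location)
    if expiry is None:
        return advertised  # nothing to compare against
    remaining = max(expiry - now, timedelta(0))
    return min(advertised, remaining)


if __name__ == "__main__":
    # Hypothetical signed URL, valid for 15 minutes from its signing time.
    location = (
        "https://storage.example.invalid/obj"
        "?X-Goog-Date=20250515T204346Z&X-Goog-Expires=900"
    )
    now = datetime(2025, 5, 15, 20, 48, 46, tzinfo=timezone.utc)
    print(usable_redirect_lifetime(location, 31536000, now))
```

Under these assumptions, a redirect advertised as cacheable for a year would only be usable for the minutes left on the signature, and reusing it after that is where the errors would come from.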

@mklein0
Author

mklein0 commented May 19, 2025

Is there a best practice I can point JFrog to for the CDN/mirror deferral in their pip handshake?

I am caught in the middle. I have seen customers use a simpleindex to mask the 302 to pip. I could fork/patch pip for our CI use case, but why?

I will file/update a ticket with JFrog so they can advocate a position. Thank you.

FYI, the cache header for a package less than 512K also gives a max-age of the same duration. So I am not sure how to read into provider's Cache Header usage.

curl -D -  https://USERNAME:PASSWORD@COMPANY.jfrog.io/COMPANY/api/pypi/python-dependencies/packages/packages/4e/8c/f3147f5c4b73e7550fe5f9352eaa956ae838d5c51eb58e7a25b9f3e2643b/decorator-5.2.1-py3-none-any.whl
HTTP/1.1 200 
Date: Thu, 15 May 2025 20:52:42 GMT
Content-Type: application/octet-stream
Content-Length: 9190
Connection: keep-alive
X-JFrog-Version: Artifactory/7.113.0 81300900
X-Artifactory-Id: 98faa45ae10992ddb0024c56e402f11de7ed8e0e
X-Artifactory-Node-Id: COMPANY-artifactory-primary-0
Last-Modified: Mon, 24 Feb 2025 04:41:34 GMT
ETag: eac72a389dc19ccdf2822bc3e375f8f34f3d3fbd
X-Checksum-Sha1: eac72a389dc19ccdf2822bc3e375f8f34f3d3fbd
X-Checksum-Sha256: d316bb415a2d9e2d2b3abcc4084c6502fc09240e292cd76a76afc106a1c8e04a
X-Checksum-Md5: 8a7e69ac08bfea62a58e521308fba204
Accept-Ranges: bytes
X-Artifactory-Filename: decorator-5.2.1-py3-none-any.whl
Content-Disposition: attachment; filename="decorator-5.2.1-py3-none-any.whl"
Cache-Control: public, max-age=31536000
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Request-ID: bc226396c89954ad668ba7dbdc97418f:bc226396c89954ad668ba7dbdc97418f:bc226396c89954ad668ba7dbdc97418f:0

...

@gmargaritis
Contributor

I think the original issue about caching 302s might be coming from the wrong angle. As mentioned in psf/cachecontrol#384, the real question is, why does JFrog want to cache these short-lived presigned URLs at all, and is there a genuine benefit in doing so?

> I have seen customers use a simpleindex to mask the 302 to pip.

As far as I understand, the idea behind using simpleindex is to expose the presigned URLs directly (as 200 responses) instead of letting pip follow the 302. But that’s a bit misleading, since you’re essentially tricking pip into caching a short-lived presigned URL as if it were long-lived. If the response includes a long max-age, pip will happily cache it, and 403s are bound to happen.

> FYI, the cache header for a package less than 512K also gives a max-age of the same duration. So I am not sure how to read into provider’s Cache Header usage.

The Location header is not included in the following excerpt, so we don't know if they are also using presigned URLs for <512k packages.

All in all, caching 302s that come with appropriate cache headers might technically be okay, but I'm not sure what we'd be trying to solve. I guess that if there isn't a valid use case for pip to support, this might be better addressed on the JFrog side.

2 participants