-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Make pip cache purge
and pip cache remove
delete additional unneeded files.
#9058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Heads up: The MacOS workers on Azure Pipelines are failing right now -- #9030. |
Thanks for the heads up. I also just discovered that |
You could wait for a couple of weeks, and we'll drop Python 2 support from |
Making it work without using Do y'all support Python 3 versions before 3.5? If you don't then, screw it, I'll just wait for you to drop Python 2 support lmao |
Nope. Python 3.5 and Python 2 get dropped after 20.3. :) |
Okay, in that case let's leave this for after Py2 is dropped. 👍 |
b30e590
to
226d1c2
Compare
Rebased off of master (ab7ff0a). (It seems Py2 is still in the list of tests; just wanted to keep the PR from getting stale.) |
226d1c2
to
94d1069
Compare
Rebased off master (d108e49). I'm not sure what the reason for the failing CI tasks is. I do see this error in the failed tasks, however:
|
As a note, anything requiring more than rebasing this PR will probably have to wait a month or two at least. Pretty busy with trying to get a house and such. 😅 |
This seems to be the specific error:
|
94d1069
to
cc0a462
Compare
ah, thank you @uranusjr. Had trouble finding that in the CI output. 😅 I fixed that problem (or at least, that's what
Not sure what's up with that. |
@duckinator the flake8 error seems to be still present. |
cc0a462
to
784d293
Compare
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
Hiya @duckinator -- would you be interested in updating this PR? :) |
@pradyunsg I'll rebase this sometime in the next few days. If it's not done by Wednesday (September 22nd) feel free to ping me here again. 👍 |
Running a bit behind on things this week, but still hoping to get to it in the next few days. Sorry for the delay! |
No worries and thanks for the update! Please don’t feel pressured to update this ASAP. :) I’m gonna put this into the release milestone to remind myself to keep an eye out for updates here — it’s totally fine if this gets pushed down the road. |
784d293
to
1cbcc61
Compare
c75cae3
to
11db9a5
Compare
Rebased, and simplified(!) because some of the changes in the last ~year let me simplify my code. ^.^ |
pip cache purge
and pip cache remove
handle directories without .whl
filespip cache purge
and pip cache remove
handle directories that only have non-.whl
files
pip cache purge
and pip cache remove
handle directories that only have non-.whl
filespip cache purge
and pip cache remove
remove directories that only have non-.whl
files
8284080
to
50beb9b
Compare
c229662
to
a807060
Compare
Things sure did happen over the last 5 years. Thank you all so much for being patient with this and making sure I was kept in the loop on changing requirements! @pradyunsg this is finally ready for review, after nearly half a freaking decade. With these changes,
In addition, this PR:
|
pip cache purge
and pip cache remove
remove directories that only have non-.whl
filespip cache purge
and pip cache remove
delete additional unneeded files.
This comment was marked as outdated.
This comment was marked as outdated.
1816296
to
cd26c8d
Compare
EDIT: My concept for how to fix it worked, I just had the initial implementation wrong. I figured out a fix, but I'm not sure how to test it. In practice, both I fixed it by making the final returned value be a reverse-sorted set instead of a reverse-sorted list. This means that parent directories are only removed once, and they get removed after everything they contain. This is an example directory structure which reproduces the problem:
|
31e39ed
to
5eeed83
Compare
@duckinator glad to see that you aren't (too?) frustrated that we left this PR out to dry so long it probably fossilized. Just as a FYI, there is (surprise!) still limited review capacity so it may take some time for us to review your PR. However, I do want to see your work be merged at some point, so when I come back from my OSS "break", I'll make sure to review your PR! |
@ichard26 honestly, I was worried I was annoying y'all because I kept disappearing for months or years at a time then coming back and doing a barrage of work only to disappear again. 😅 Looking forward to getting this wrapped up whenever you get to it. Enjoy your break. 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally got around to doing a review. The idea looks good, but I think the implementation can be simplified. Sorry for making you do more work :)
Also, where are the unit tests? The discussion history seems to suggest that they were written at some point. Did they forgotten to be committed?
Thanks a lot for your persistence!
return format_size(directory_size(path)) | ||
|
||
|
||
def subdirs_without_files(path: str) -> Generator[Path]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate to break it to you, but this seems to be buggy still. The root of the problem is that when it discards a subtree because a file is present, we need to discard the entire chain of parent directories. This logic seems to instead discard (skip) the inner subdirectories of the non-empty directory.
Imagine we have this structure:
root
├── d
│ ├── d
│ │ └── 3
│ │ └── uh.txt
│ └── f
│ └── d
When the code reaches root/f/d/d
and discovers it's empty, it will return all of its parents, but we should only delete the root/d/f/d
and root/d/f
parents. root
and root/d
still contain a file (at root/d/d/3/uh.txt
).
I spent some working on an alternative implementation. I haven't thoroughly tested this, but it seems to be more robust:
def subdirs_without_files(path: str) -> Generator[Path]:
"""Yields every subdirectory of +path+ that has no files under it."""
directories = []
non_empty = set()
for root, _, filenames in os.walk(path, topdown=False):
root_path = Path(root)
if filenames:
# This directory contains a file, mark it and its parent
# directories (but not ".") as non empty.
non_empty.update(root_path.parents[:-1])
non_empty.add(root_path)
directories.append(root_path)
for d in directories:
if d not in non_empty:
yield d
The gist is that we walk the entire directory structure looking for directories which contain a file, all while building a list of all directories. If a file is found, then the containing directory and its parents are marked as non-empty. Afterwards, we go through the directories and yield any we haven't marked earlier. And sine we're walking bottom-up, they will be naturally yielded in reverse order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm honestly not very surprised. Your approach makes sense. I'm not sure why I didn't use os.walk()
.
I'll need to add tests (see my other comment) and then I should be able to test if your approach fixes the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at my code again, one thing you may need to fix is root_path.parents[:-1]
. path.parents
include .
as the final parent for relative paths. I was working solely with relative paths while working on this code. I'd imagine the actual pip code is passing absolute paths in which case the last parent should not be skipped.
In practice, this probably won't affect anything since the final parent is not going to be under the pip cache directory, but if you see weird behaviour that's why.
I suppose a better design would be a mix of your approach and my approach where we manually walk the tree, using a stack to keep track of the parents we care about, and then add those to the exclusion list, but it probably doesn't make a difference?
Thanks for the review! I've responded to the things you mentioned (& accepted your commit suggestion). I'll try to work on this over the next few days. Please feel free to ping me if you don't hear anything by August 17th — I don't want it to slip through the cracks again! 🙂
No worries! I could tell it was rough, and expected it to need a second pass.
I'm not sure what happened here. If I did write them, I suspect they got lost when I switched OSes at some point in the last 5 years.
Glad to help! I'm really hoping I can get it wrapped up before the half-decade mark rolls around in October. 😂 |
…files. These commands now remove: - wheel cache folders without `.whl` files. - empty folders in the HTTP cache. - `selfcheck.json`, which pip does not use anymore.
Co-authored-by: Richard Si <[email protected]>
Haven't forgotten about this, but it's taking me a bit longer than expected to get to it. For now, I went ahead and rebased off main (d52011f) so it at least won't get stale. |
I'm moving this to 26.0 but ping me this week if you think it's ready for review. |
TODO:
The
pip cache purge
/pip cache remove
commands now:.whl
filesselfcheck.json
, which pip does not use anymore.This PR does not account for having
pip cache purge
remove things in thepip/selfchecks/
directory when appropriate, which was also discussed in #7372.