Conversation

@laldoroty
Collaborator

Generate a unique folder with a UUID-based name within self.temp_dir so that there are no conflicts with deleting the contents of self.temp_dir when multiple jobs are launched.

@laldoroty laldoroty requested review from rknop and wmwv July 12, 2025 02:50
@laldoroty laldoroty self-assigned this Jul 12, 2025
@laldoroty laldoroty added the bug Something isn't working label Jul 12, 2025
@laldoroty
Collaborator Author

Currently, this mostly works. It creates a new folder with a UUID-based name, and properly puts all the temporary files for that pipeline run inside that folder, then deletes the contents of the folder at the end. However, it does not delete the folder itself, so you end up with a lot of empty folders in self.temp_dir and have to manually remove those.
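
For reference, a minimal sketch of the pattern described above, not the code in this PR. The helper names `make_run_dir` and `remove_run_dir` are hypothetical, and `temp_dir` stands in for `self.temp_dir`. Removing the run folder with `shutil.rmtree` deletes the folder itself along with its contents, so no empty UUID-named folders are left behind:

```python
import shutil
import uuid
from pathlib import Path


def make_run_dir(temp_dir):
    """Create a uniquely named subdirectory of temp_dir for one pipeline run.

    Hypothetical helper: the name is a random UUID, so parallel jobs
    never share a folder.
    """
    run_dir = Path(temp_dir) / uuid.uuid4().hex
    # exist_ok=False makes an (extremely unlikely) name collision fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def remove_run_dir(run_dir):
    """Delete the run folder and everything in it; sibling folders are untouched."""
    shutil.rmtree(run_dir, ignore_errors=True)
```

Each job would call `make_run_dir` once at startup, write its temporary files there, and call `remove_run_dir` on that same path when it finishes.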

@laldoroty
Collaborator Author

laldoroty commented Jul 12, 2025

@wmwv @rknop I think this is a reasonable time to return to the temp_dir/intermediate/scratch conversation. If we want to keep temp_dir as a truly temporary directory with throwaway files, I can leave this as-is (i.e. the folder created within self.temp_dir is named with an unintelligible string of characters). However, if we want to merge the functionality of all of these folders and choices into one directory, then maybe we want the new sub-folder to be named after the object ID, or something similarly human-readable. The problem with using the object ID is that if we ever choose to run the same SN in different bands in parallel, we run into the same conflict. Though I guess we could do objectID_band... Thoughts?
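
To make the naming trade-off concrete, here is a small sketch under stated assumptions. The function `run_dir_name` and its parameters are hypothetical and do not exist in the codebase. It builds an `objectID_band` name and optionally appends a short random suffix, so that two jobs for the same object and band still get distinct folders:

```python
import uuid


def run_dir_name(object_id, band, unique=True):
    """Build a human-readable folder name of the form objectID_band.

    Hypothetical helper. With unique=True, a short random suffix is
    appended so that repeated runs of the same object and band do not
    collide.
    """
    name = f"{object_id}_{band}"
    if unique:
        # First 8 hex characters of a random UUID keep the name readable.
        name += f"_{uuid.uuid4().hex[:8]}"
    return name
```

Without the suffix, two parallel jobs on the same object and band would map to the same folder, which is the original conflict again.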

@laldoroty
Collaborator Author

Alternatively, we merge this for now because it's functional and punt the discussion about temporary directories/files to later. Thoughts? @wmwv @rknop

@wmwv
Collaborator

wmwv commented Jul 22, 2025

Punt and merge.

@wmwv
Collaborator

wmwv commented Jul 22, 2025

Having human-readable, identifiable directories implies that we have some tracking of processing: a concept of "run", "collection", "campaign", or some other organized way of tracking what we're doing and whether things should overwrite each other, depend on each other, or be totally separate. I don't think we're ready to have this discussion yet.

Collaborator

@wmwv wmwv left a comment

LGTM

@laldoroty laldoroty marked this pull request as ready for review July 22, 2025 15:41
@laldoroty laldoroty merged commit 5d65320 into main Jul 22, 2025
3 checks passed
github-actions bot pushed a commit that referenced this pull request Jul 22, 2025
Issue #100: Delete contents of temp_dir without conflicts. 5d65320

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[Issue]: if temp_dir gets deleted, other jobs fail because they need the files in temp dir

3 participants