[DO NOT MERGE] TransferManager: DirectoryUploader & DirectoryDownloader #3288
Draft
jterapin wants to merge 49 commits into version-3 from tm-directory-features
Commits
6cb37fe Initial setup (jterapin)
d31ae4d Minor adjustments (jterapin)
74ed189 Directory downloader impl (jterapin)
4e8db17 Directory uploader impl (jterapin)
098049d Merge branch 'version-3' into tm-directory-features (jterapin)
8f387d2 Merge branch 'version-3' into tm-directory-features (jterapin)
7749ba5 Add default executor (jterapin)
99f0de6 Add running check to default executor (jterapin)
441fa82 Refactor MultipartFileUploader with executor (jterapin)
c792439 Fix typo in MultipartFileUploader (jterapin)
adce496 Update TM upload file with executor (jterapin)
012c2bc Merge branch 'version-3' into tm-directory-features (jterapin)
ee9c9da Merge branch 'version-3' into tm-directory-features (jterapin)
75df844 Merge from version-3 (jterapin)
e5d3245 Merge branch 'version-3' into tm-directory-features (jterapin)
173f5e4 Merge branch 'version-3' into tm-directory-features (jterapin)
cf88ff2 Merge branch 'version-3' into tm-directory-features (jterapin)
2758c4d Update to only spawn workers when needed (jterapin)
b92d3b3 Update directory uploader (jterapin)
6afb495 Update directory uploader (jterapin)
86b53e8 Update uploader (jterapin)
d587ae1 Merge branch 'version-3' into tm-directory-features (jterapin)
eae3814 Add minor improvements to directory uploader (jterapin)
14010ef Merge branch 'version-3' into tm-directory-features (jterapin)
8ab4edc Fix specs (jterapin)
face84d Minor updates to multipart file uploader (jterapin)
36a1e87 Minor refactors (jterapin)
7dd9f98 Fix options (jterapin)
77ab1ba Refactor DirectoryUploader (jterapin)
e843137 Merge version-3 into branch (jterapin)
009127d Update multipartfileuploader (jterapin)
39912fd Refactor FileDownloader (jterapin)
f9fb117 Implement Directory Downloader (jterapin)
d307555 Add TODO (jterapin)
a14649a Merge version-3 into branch (jterapin)
b9231e7 Feedback - update default executor (jterapin)
d991128 Refactor file downloader (jterapin)
bc533a0 Support FileDownloader changes (jterapin)
9efc77f Extra updates to FileDownloader (jterapin)
1cc3fcf Address feedback for FileUploader and MultipartFileUploader (jterapin)
7b6b220 Merge branch 'version-3' into tm-directory-features (jterapin)
45d2f5d Add improvements to directory uploader (jterapin)
64d481e Update DirectoryDownloader based on feedbacks (jterapin)
2ab63fb Minor feedback updates (jterapin)
7af3e32 Merge branch 'version-3' into tm-directory-features (jterapin)
747965f Update executor (jterapin)
2230478 Improve Directory Uploader (jterapin)
cb145a0 Handle failure cases correctly (jterapin)
0cb35cd Improve Executor (jterapin)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,8 @@ | ||
Unreleased Changes | ||
------------------ | ||
|
||
* Feature - TODO | ||
|
||
1.199.1 (2025-09-25) | ||
------------------ | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
# frozen_string_literal: true | ||
|
||
module Aws | ||
module S3 | ||
# @api private | ||
class DefaultExecutor | ||
RUNNING = :running | ||
SHUTTING_DOWN = :shutting_down | ||
SHUTDOWN = :shutdown | ||
|
||
def initialize(options = {}) | ||
@max_threads = options[:max_threads] || 10 | ||
@state = RUNNING | ||
@queue = Queue.new | ||
@pool = [] | ||
@mutex = Mutex.new | ||
end | ||
|
||
def post(*args, &block) | ||
@mutex.synchronize do | ||
raise 'Executor has been shutdown and is no longer accepting tasks' unless @state == RUNNING | ||
|
||
@queue << [args, block] | ||
ensure_worker_available | ||
end | ||
true | ||
end | ||
|
||
def kill | ||
@mutex.synchronize do | ||
@state = SHUTDOWN | ||
@pool.each(&:kill) | ||
@pool.clear | ||
@queue.clear | ||
end | ||
true | ||
end | ||
|
||
def shutdown(timeout = nil) | ||
@mutex.synchronize do | ||
return true if @state == SHUTDOWN | ||
|
||
@state = SHUTTING_DOWN | ||
@pool.size.times { @queue << :shutdown } | ||
end | ||
|
||
if timeout | ||
deadline = Time.now + timeout | ||
@pool.each do |thread| | ||
remaining = deadline - Time.now | ||
break if remaining <= 0 | ||
|
||
thread.join([remaining, 0].max) | ||
end | ||
@pool.select(&:alive?).each(&:kill) | ||
else | ||
@pool.each(&:join) | ||
end | ||
|
||
@pool.clear | ||
@state = SHUTDOWN | ||
true | ||
end | ||
|
||
def running? | ||
@state == RUNNING | ||
end | ||
|
||
def shutting_down? | ||
@state == SHUTTING_DOWN | ||
end | ||
|
||
def shutdown? | ||
@state == SHUTDOWN | ||
end | ||
|
||
private | ||
|
||
def ensure_worker_available | ||
return unless @state == RUNNING | ||
|
||
@pool.select!(&:alive?) | ||
@pool << spawn_worker if @pool.size < @max_threads | ||
end | ||
|
||
def spawn_worker | ||
Thread.new do | ||
while (job = @queue.shift) | ||
break if job == :shutdown | ||
|
||
args, block = job | ||
block.call(*args) | ||
end | ||
end | ||
end | ||
end | ||
end | ||
end |
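
For readers who want to try the post/shutdown pattern from the executor above without loading the SDK, here is a minimal standalone sketch. `SketchExecutor` is a hypothetical name, and the class is a simplified assumption rather than the PR's `DefaultExecutor`: it keeps the lazy worker spawning and the one-sentinel-per-worker shutdown, but drops `kill`, the timeout path, and the three-state tracking.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for the executor in the diff above; the name
# SketchExecutor and its simplifications are illustrative assumptions.
class SketchExecutor
  def initialize(max_threads: 2)
    @max_threads = max_threads
    @queue = Queue.new
    @pool = []
    @mutex = Mutex.new
    @running = true
  end

  # Enqueue a task; spawn a worker lazily, up to max_threads.
  def post(*args, &block)
    @mutex.synchronize do
      raise 'executor is shut down' unless @running

      @queue << [args, block]
      @pool << spawn_worker if @pool.size < @max_threads
    end
    true
  end

  # Graceful shutdown: queue one sentinel per worker, then wait for all.
  def shutdown
    workers = @mutex.synchronize do
      @running = false
      @pool.each { @queue << :shutdown }
      @pool.dup
    end
    workers.each(&:join)
    true
  end

  private

  def spawn_worker
    Thread.new do
      while (job = @queue.shift)
        break if job == :shutdown

        args, block = job
        block.call(*args)
      end
    end
  end
end

results = Queue.new # thread-safe collector for task output
executor = SketchExecutor.new(max_threads: 3)
5.times { |i| executor.post(i) { |n| results << n * n } }
executor.shutdown

squares = []
squares << results.pop until results.empty?
puts squares.sort.inspect # prints [0, 1, 4, 9, 16]
```

Because the sentinels are queued behind every pending task, each worker finishes its current job before it can see `:shutdown`, so `shutdown` returns only after all posted work has run.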
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,175 @@ | ||
# frozen_string_literal: true | ||
|
||
module Aws | ||
module S3 | ||
# Raised when DirectoryDownloader fails to download objects from S3 bucket | ||
class DirectoryDownloadError < StandardError | ||
def initialize(message, errors = []) | ||
@errors = errors | ||
super(message) | ||
end | ||
|
||
# @return [Array<StandardError>] The list of errors encountered when downloading objects | ||
attr_reader :errors | ||
end | ||
|
||
# @api private | ||
class DirectoryDownloader | ||
def initialize(options = {}) | ||
@client = options[:client] | ||
@executor = options[:executor] | ||
@abort_requested = false | ||
@mutex = Mutex.new | ||
end | ||
|
||
attr_reader :abort_requested | ||
|
||
def download(destination, bucket:, **options) | ||
if File.exist?(destination) | ||
raise ArgumentError, 'invalid destination, expected a directory' unless File.directory?(destination) | ||
else | ||
FileUtils.mkdir_p(destination) | ||
end | ||
|
||
download_opts = build_download_opts(destination, bucket, options) | ||
downloader = FileDownloader.new(client: @client, executor: @executor) | ||
producer = ObjectProducer.new(download_opts.merge(client: @client, directory_downloader: self)) | ||
downloads, errors = process_download_queue(producer, downloader, download_opts) | ||
build_result(downloads, errors) | ||
ensure | ||
@abort_requested = false | ||
end | ||
|
||
private | ||
|
||
def request_abort | ||
@mutex.synchronize { @abort_requested = true } | ||
end | ||
def build_download_opts(destination, bucket, opts) | ||
{ | ||
destination: destination, | ||
bucket: bucket, | ||
s3_prefix: opts.delete(:s3_prefix), | ||
ignore_failure: opts.delete(:ignore_failure) || false, | ||
filter_callback: opts.delete(:filter_callback), | ||
progress_callback: opts.delete(:progress_callback) | ||
} | ||
end | ||
|
||
def build_result(download_count, errors) | ||
if @abort_requested | ||
msg = "directory download failed: #{errors.map(&:message).join('; ')}" | ||
raise DirectoryDownloadError.new(msg, errors) | ||
else | ||
{ | ||
completed_downloads: [download_count - errors.count, 0].max, | ||
failed_downloads: errors.count, | ||
errors: errors.any? ? errors : nil | ||
}.compact | ||
end | ||
end | ||
|
||
def handle_error(executor, opts) | ||
return if opts[:ignore_failure] | ||
|
||
request_abort | ||
executor.kill | ||
end | ||
|
||
def process_download_queue(producer, downloader, opts) | ||
# Separate executor for lightweight queuing tasks, | ||
# avoiding interference with main @executor lifecycle | ||
queue_executor = DefaultExecutor.new | ||
|
||
progress = DirectoryProgress.new(opts[:progress_callback]) if opts[:progress_callback] | ||
download_attempts = 0 | ||
errors = [] | ||
begin | ||
producer.each do |object| | ||
break if @abort_requested | ||
|
||
download_attempts += 1 | ||
queue_executor.post(object) do |o| | ||
dir_path = File.dirname(o[:path]) | ||
FileUtils.mkdir_p(dir_path) unless dir_path == opts[:destination] || Dir.exist?(dir_path) | ||
|
||
downloader.download(o[:path], bucket: opts[:bucket], key: o[:key]) | ||
progress&.call(File.size(o[:path])) | ||
rescue StandardError => e | ||
errors << e | ||
handle_error(queue_executor, opts) | ||
end | ||
end | ||
rescue StandardError => e | ||
errors << e | ||
handle_error(queue_executor, opts) | ||
end | ||
queue_executor.shutdown | ||
[download_attempts, errors] | ||
end | ||
|
||
# @api private | ||
class ObjectProducer | ||
include Enumerable | ||
|
||
DEFAULT_QUEUE_SIZE = 100 | ||
|
||
def initialize(options = {}) | ||
@destination_dir = options[:destination] | ||
@client = options[:client] | ||
@bucket = options[:bucket] | ||
@s3_prefix = options[:s3_prefix] | ||
@filter_callback = options[:filter_callback] | ||
@directory_downloader = options[:directory_downloader] | ||
@object_queue = SizedQueue.new(DEFAULT_QUEUE_SIZE) | ||
end | ||
|
||
def each | ||
producer_thread = Thread.new do | ||
stream_objects | ||
ensure | ||
@object_queue << :done | ||
end | ||
|
||
# Yield objects from internal queue | ||
while (object = @object_queue.shift) != :done | ||
break if @directory_downloader.abort_requested | ||
|
||
yield object | ||
end | ||
ensure | ||
producer_thread.join | ||
end | ||
|
||
private | ||
|
||
def build_object_entry(key) | ||
{ path: File.join(@destination_dir, normalize_key(key)), key: key } | ||
end | ||
|
||
# TODO: double check handling of objects that end with / | ||
def stream_objects(continuation_token: nil) | ||
resp = @client.list_objects_v2(bucket: @bucket, prefix: @s3_prefix, continuation_token: continuation_token) | ||
resp.contents.each do |o| | ||
break if @directory_downloader.abort_requested | ||
next if o.key.end_with?('/') | ||
next unless include_object?(o.key) | ||
|
||
@object_queue << build_object_entry(o.key) | ||
end | ||
stream_objects(continuation_token: resp.next_continuation_token) if resp.next_continuation_token | ||
end | ||
|
||
def include_object?(key) | ||
return true unless @filter_callback | ||
|
||
@filter_callback.call(key) | ||
end | ||
|
||
def normalize_key(key) | ||
key = key.delete_prefix(@s3_prefix) if @s3_prefix | ||
File::SEPARATOR == '/' ? key : key.tr('/', File::SEPARATOR) | ||
end | ||
end | ||
end | ||
end | ||
end |
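
The bounded hand-off used by `ObjectProducer` above (a background thread fills a `SizedQueue`, the caller drains it until a `:done` sentinel) can be sketched without an S3 client. In this sketch `PagedProducer` is a hypothetical name, and the in-memory `pages` array is an assumption standing in for paginated `list_objects_v2` responses. It also assumes the consumer drains the queue to the end; it does not model the abort flag.

```ruby
# frozen_string_literal: true

# Hypothetical sketch of the bounded producer/consumer hand-off.
# PagedProducer and the in-memory pages are illustrative assumptions.
class PagedProducer
  include Enumerable

  def initialize(pages, prefix: nil, queue_size: 2, filter: nil)
    @pages = pages
    @prefix = prefix
    @filter = filter
    @queue = SizedQueue.new(queue_size) # blocks the producer when full
  end

  def each
    return enum_for(:each) unless block_given?

    producer = Thread.new do
      stream_keys
    ensure
      @queue << :done # always unblock the consumer, even on error
    end

    # Yield entries until the producer signals completion.
    while (entry = @queue.shift) != :done
      yield entry
    end
  ensure
    producer&.join
  end

  private

  def stream_keys
    @pages.each do |page|
      page.each do |key|
        next if key.end_with?('/') # skip "directory" marker keys
        next if @filter && !@filter.call(key)

        @queue << { key: key, path: relative_path(key) }
      end
    end
  end

  # Strip the listing prefix so paths are relative to the destination.
  def relative_path(key)
    @prefix ? key.delete_prefix(@prefix) : key
  end
end

pages = [['logs/a.txt', 'logs/sub/'], ['logs/sub/b.txt', 'logs/c.tmp']]
skip_tmp = ->(key) { !key.end_with?('.tmp') }
producer = PagedProducer.new(pages, prefix: 'logs/', filter: skip_tmp)
producer.each { |entry| puts "#{entry[:key]} -> #{entry[:path]}" }
```

The queue size caps how many listed keys sit in memory while downloads lag behind listing. A consumer that stops early would leave the producer blocked on a full queue, which is why an abort path needs its own handling.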
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# frozen_string_literal: true | ||
|
||
module Aws | ||
module S3 | ||
# @api private | ||
class DirectoryProgress | ||
def initialize(progress_callback) | ||
@transferred_bytes = 0 | ||
@transferred_files = 0 | ||
@progress_callback = progress_callback | ||
@mutex = Mutex.new | ||
end | ||
|
||
def call(bytes_transferred) | ||
@mutex.synchronize do | ||
@transferred_bytes += bytes_transferred | ||
@transferred_files += 1 | ||
|
||
@progress_callback.call(@transferred_bytes, @transferred_files) | ||
end | ||
end | ||
end | ||
end | ||
end |
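
As a rough illustration of how a caller might observe the aggregated progress above, the following sketch mirrors the mutex-guarded counter in a standalone class. `ProgressSketch` and `run_progress_demo` are hypothetical names introduced here, and the threads stand in for concurrent per-file downloads.

```ruby
# frozen_string_literal: true

# Hypothetical mirror of the aggregator in the diff above, renamed
# ProgressSketch so it does not pretend to be the SDK class.
class ProgressSketch
  def initialize(callback)
    @bytes = 0
    @files = 0
    @callback = callback
    @mutex = Mutex.new
  end

  # Called once per finished file; the callback sees running totals.
  def call(bytes_transferred)
    @mutex.synchronize do
      @bytes += bytes_transferred
      @files += 1
      @callback.call(@bytes, @files)
    end
  end
end

# Report each file size from its own thread and collect every update.
def run_progress_demo(file_sizes)
  updates = []
  progress = ProgressSketch.new(->(bytes, files) { updates << [bytes, files] })
  file_sizes.map { |size| Thread.new { progress.call(size) } }.each(&:join)
  updates
end

updates = run_progress_demo([100, 250, 50])
puts updates.last.inspect # prints [400, 3]
```

The callback runs while the mutex is held, so updates arrive in a consistent order, but a slow callback will also stall other workers that are waiting to report.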
By convention we were putting these in separate files right? If you want to promote the other two (multipart errors) to the files where they are used that's fine too, but let's stay consistent.
Yup, I'm planning to separate them out.
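
A minimal sketch of what the separated error file could look like, assuming it simply lifts the class out of the downloader file unchanged. The file name `directory_download_error.rb` is a guess for illustration and is not taken from the PR.

```ruby
# frozen_string_literal: true

# Hypothetical standalone file, e.g. directory_download_error.rb
# (the file name and location are assumptions, not taken from the PR).
module Aws
  module S3
    # Raised when DirectoryDownloader fails to download objects from S3 bucket
    class DirectoryDownloadError < StandardError
      def initialize(message, errors = [])
        @errors = errors
        super(message)
      end

      # @return [Array<StandardError>] errors encountered when downloading objects
      attr_reader :errors
    end
  end
end

# Usage: a caller can rescue the aggregate error and inspect each cause.
begin
  failures = [IOError.new('disk full'), RuntimeError.new('access denied')]
  msg = "directory download failed: #{failures.map(&:message).join('; ')}"
  raise Aws::S3::DirectoryDownloadError.new(msg, failures)
rescue Aws::S3::DirectoryDownloadError => e
  puts e.message
  puts e.errors.size
end
```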