Skip to content

Tickets/dm 51299 #939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: tickets/DM-43715
Choose a base branch
from
Open

Tickets/dm 51299 #939

wants to merge 13 commits into from

Conversation

jgates108
Copy link
Contributor

Added modes to transfering csv files from workers to the czar. "memory" stores the entire file in memory before merging, waiting before starting new transfers for memory to be available. "memdisk" is like "memory" except instead of waiting, it writes results (past a certain size) to disk.

@jgates108 jgates108 force-pushed the tickets/DM-51299 branch 3 times, most recently from 2cfec1a to f089d34 Compare July 3, 2025 20:03
@@ -272,6 +273,10 @@ services:
- type: volume
source: volume_czar_mariadb_run
target: /qserv/mariadb/run
- type: volume
source: volume_czar_tmp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming this functionally (e.g. volume_czar_transfer) instead of by location? It doesn't have to be /tmp and this name could be confusing in configurations where transferDir below is changed? This would also be consisted with the other volume names here which are function-oriented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -124,6 +124,13 @@ class CzarConfig {
/// Getters for result aggregation options.
int getMaxTableSizeMB() const { return _maxTableSizeMB->getVal(); }
int getMaxSqlConnectionAttempts() const { return _maxSqlConnectionAttempts->getVal(); }
unsigned int getMaxTransferMemMB() const { return _resultMaxTransferMemMB->getVal(); }
/// Return the transfer directory, which defaults to /tmp, which is bad for performance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say lose this comment? It is slightly misdirecting (it's not /tmp per-se which is bad for performance, it is anything which is in the Docker layer store).

If you'd like to keep the comment, I'd recommend changing to something like "allow customization to high performance volume" or some such?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    /// Return the transfer directory. This is customizable to allow for a 
    /// high performance volume.

Better?

@@ -295,9 +289,6 @@ qdisp::MergeEndStatus MergingHandler::flushHttp(string const& fileUrl, uint64_t
"MergingHandler::" << __func__ << " uberJob=" << uberJob->getIdStr() << " fileUrl=" << fileUrl);

qdisp::MergeEndStatus mergeStatus = _mergeHttp(uberJob, fileUrl, fileSize);
if (mergeStatus.success) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious: was this found redundant / vestigial / potentially problematic / now handled somewhere else?

Copy link
Contributor Author

@jgates108 jgates108 Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now handled directly in InfileMerger::mergeHttp as _infileMerger->mergeCompleteFor(...) just added bytes to total byte counts. Things moved around a bit, so the total can be added once the merge finishes in InfileMerger::mergeHttp

Look for _totalResultSize

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants