Use curl as optional client v1.4 #96

Merged

Tmonster
Contributor

@Tmonster Tmonster commented Aug 13, 2025

After duckdb/duckdb#18107 landed in duckdb/duckdb, and with the duckdb submodule moved to a recent commit on v1.4-andium, this PR allows switching the HTTP client implementation at runtime via the newly added httpfs config option httpfs_client_implementation:

D SET logging_storage=stdout;
D PRAGMA enable_logging('HTTP');
D SET httpfs_client_implementation='default';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:18.479, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537178.169255,VS0,VE1', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290029-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 1', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:06:18 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=806, Accept-Ranges=bytes}}}, CONNECTION, 2, 11, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='curl';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:06:30.247, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': '', 'headers': {content-type=application/octet-stream, x-ms-lease-state=available, last-modified='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, accept-ranges=bytes, x-ms-version=2025-05-05, server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', x-cache='HIT, HIT', __RESPONSE_STATUS__='HTTP/2 200 ', etag='"0x8DAF8D1CD43CA79"', x-ms-blob-type=BlockBlob, x-ms-server-encrypted=true, age=818, x-ms-lease-status=unlocked, x-served-by='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290021-RTM', fastly-restarts=1, via='1.1 varnish, 1.1 varnish', date='Thu, 03 Jul 2025 10:06:30 GMT', x-cache-hits='3730, 1', content-disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-timer='S1751537190.940711,VS0,VE1', content-length=21916382}}}, CONNECTION, 2, 13, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='httplib';
D select count(*)
  FROM 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet';
[LOG] 2025-07-03 10:07:45.552, HTTP, DEBUG, {'request': {'type': HEAD, 'url': 'https://github.com/duckdb/duckdb-data/releases/download/v1.0/event_baserunning_advance_attempt.parquet', 'headers': {}}, 'response': {'status': OK_200, 'reason': OK, 'headers': {X-Timer='S1751537265.144944,VS0,VE0', x-ms-request-id=91401b99-a01e-000c-7caa-dd765d000000, X-Served-By='cache-iad-kcgs7200132-IAD, cache-rtm-ehrd2290047-RTM', Fastly-Restarts=1, x-ms-lease-status=unlocked, x-ms-creation-time='Tue, 17 Jan 2023 21:28:40 GMT', x-ms-blob-type=BlockBlob, x-ms-blob-content-md5='0OgVgULsCWfa1mU4BlFEbg==', X-Cache-Hits='3730, 0', Via='1.1 varnish, 1.1 varnish', Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0, x-ms-version=2025-05-05, X-Cache='HIT, HIT', ETag='"0x8DAF8D1CD43CA79"', Connection=keep-alive, x-ms-lease-state=available, Last-Modified='Tue, 17 Jan 2023 21:28:40 GMT', Date='Thu, 03 Jul 2025 10:07:45 GMT', Content-Length=21916382, Content-Type=application/octet-stream, Content-Disposition='attachment; filename=event_baserunning_advance_attempt.parquet', x-ms-server-encrypted=true, Age=893, Accept-Ranges=bytes}}}, CONNECTION, 2, 15, NULL
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    9388600     │
│ (9.39 million) │
└────────────────┘
D SET httpfs_client_implementation='something_else';
Invalid Input Error:
Unsupported option for httpfs_client_implementation, only `curl`, `httplib` and `default` are currently supported
The logged headers confirm that a different implementation handles each request: for example, header names are cased differently (ETag vs etag), and the curl response has an empty reason plus an extra __RESPONSE_STATUS__ entry.
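
HTTP header names are case-insensitive, so the casing difference between the logs above (ETag vs etag) is cosmetic and should not change behavior. A minimal sketch of comparing header names regardless of which client produced them; `NormalizeHeaderName` and `SameHeaderName` are hypothetical helpers for illustration, not functions from duckdb-httpfs:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: lower-case an ASCII header name so that
// "ETag" (httplib log) and "etag" (curl log) map to the same key.
inline std::string NormalizeHeaderName(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return name;
}

// Hypothetical helper: case-insensitive comparison of two header names.
inline bool SameHeaderName(const std::string &a, const std::string &b) {
    return NormalizeHeaderName(a) == NormalizeHeaderName(b);
}
```

With these helpers, `SameHeaderName("X-Cache-Hits", "x-cache-hits")` is true, while two different headers such as `ETag` and `Age` still compare as different.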

hannes added a commit to duckdb/duckdb that referenced this pull request Aug 18, 2025
Yes this will fix my build errors at
duckdb/duckdb-httpfs#96. This CI link has a
passing build status
https://github.com/duckdb/duckdb-httpfs/actions/runs/16991311944/job/48171130825

~~[DO NOT MERGE]: 
I am testing to see if this is the correct fix with [this
PR](duckdb/duckdb-httpfs#96) first. I am just
updating the duckdb submodule pointer for the httpfs fork to the branch
here. If those tests pass then I know what the correct fix is. (don't
know how to trigger it otherwise yet)~~

This is prompted by this PR
duckdb/duckdb-httpfs#96. Related [CI
failure](https://github.com/duckdb/duckdb-httpfs/actions/runs/16932639981/job/47981684022?pr=96#step:26:597)

It seems that httplib conflicts with the max() function. I searched
httplib.hpp for other instances of `::max()` and `::min()` and didn't
find any.

The proper fix seems to be to use
`(std::numeric_limits<size_t>::max)()`, as seen on line [96 of
httplib.hpp](https://github.com/duckdb/duckdb/blob/1f0de28806a8915c8203dd060dad549f28f5539b/third_party/httplib/httplib.hpp#L96);
with that change the Windows build did not fail.
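
A minimal sketch of why the parenthesized form helps, assuming the conflict comes from a function-like `max` macro such as the one `<windows.h>` defines when `NOMINMAX` is not set (the commit message does not name the cause). The macro below is a stand-in so the sketch compiles on any platform, and `LargestSize` is a hypothetical name:

```cpp
#include <cstddef>
#include <limits>

// Stand-in for a function-like `max` macro (assumption: similar to the one
// <windows.h> provides). Defined after the standard headers on purpose.
#define max(a, b) (((a) > (b)) ? (a) : (b))

// A function-like macro is only expanded when its name is directly followed
// by `(`. Here `max` is followed by `)`, so the preprocessor leaves it alone
// and the real std::numeric_limits member function is called.
inline std::size_t LargestSize() {
    return (std::numeric_limits<std::size_t>::max)();
}

// Remove the stand-in so it does not leak into code that follows.
#undef max
```

Written without the extra parentheses, `std::numeric_limits<std::size_t>::max()` would be treated as an invocation of the two-argument macro with no arguments and fail to compile while the macro is active.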
@carlopi
Collaborator

carlopi commented Aug 19, 2025

I remember there was a problem when running:

SET httpfs_client_implementation='curl';
INSTALL non_existent_extension;

This threw an uncaught error, which likely signals a problem elsewhere.

Could you check whether that's still the case?

Apart from that, and considering this is not the default AND it's on the 1.4 branch, I think this is good to merge.

Collaborator

@carlopi carlopi left a comment

Thanks, looks great to have this!

@carlopi
Collaborator

carlopi commented Aug 19, 2025

Other things to consider, as follow ups:

  • running the whole duckdb test suite with a configuration that forces curl as the default (last time I did that, the only errors were with INSTALL of extensions)
  • testing on a physical Windows machine
  • considering whether the default should be curl (OSX + Linux?, likely 1.5)
  • documentation

@carlopi
Collaborator

carlopi commented Aug 21, 2025

Should we merge or is there any blocker?

I think merging + bumping the duckdb-httpfs hash in duckdb/duckdb would make this properly available; other steps such as docs can then be done before 1.4.0.

@samansmink samansmink merged commit c31f9e9 into duckdb:v1.4-andium Aug 21, 2025
23 checks passed