This project is being archived. A successor library, s3dlio (https://github.com/russfellows/s3dlio), has similar goals and offers more features and capabilities. Going forward, use that project rather than this repository (dlio_s3_rust).
This guide shows how to use the Rust library, which provides both a compiled command-line executable called s3Rust-cli and a Python library compiled into a wheel file. Once the wheel is installed, Python programs can import the library via import dlio_s3_rust as s3 and use it.
The purpose of this library is to enable testing of S3 storage via both a CLI and a Python library. The intent is to provide hooks for DLIO so that it can access S3 storage during its tests. The five primary S3 operations are included: Get, Put, List, Delete, and Create-Bucket. Note that bucket creation currently occurs only as part of a Put operation.
For ease of testing, both the CLI and the library may also be built into a container. The Rust source code is removed from this container to reduce its size.
In the near future, the Python library dlio_s3_rust will be published on PyPI for distribution. The compiled Rust executable and Python wheel files are already published on GitHub and are available under the release section of this project.
To build this project yourself and use the code and libraries directly, you need:
- Rust; see How to Install Rust (https://www.rust-lang.org/tools/install)
- The maturin command, which may be installed with pip, uv, or similar tools
- The patchelf executable, best installed with system packages, such as sudo apt-get install patchelf or similar
Building is a two-step process:
- First, build the Rust code and library (this presumes Rust is installed):
cargo build --release
- Next, build the Python library using maturin:
maturin build --release --features extension-module
To install the Python library, run the following, substituting the path of the .whl file just created:
pip install --force-reinstall ./path/to/my/manylinux.whl
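As a convenience, the build and install steps above can be scripted. The following is a minimal Python sketch, not part of this repository; it assumes cargo, maturin, and pip are available, and that maturin writes its wheels to its default target/wheels directory. The names BUILD_STEPS, newest_wheel, install_command, and build_and_install are hypothetical.

```python
# Hypothetical helper automating the two build steps and the wheel install.
# Assumes cargo and maturin are on PATH and that maturin writes wheels to
# its default target/wheels directory.
import subprocess
import sys
from pathlib import Path

BUILD_STEPS = [
    ["cargo", "build", "--release"],
    ["maturin", "build", "--release", "--features", "extension-module"],
]


def newest_wheel(wheel_dir="target/wheels"):
    """Return the most recently modified .whl file in wheel_dir."""
    directory = Path(wheel_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"wheel directory not found: {wheel_dir}")
    wheels = sorted(directory.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no .whl files found in {wheel_dir}")
    return wheels[-1]


def install_command(wheel):
    """pip command that (re)installs the wheel into the current interpreter."""
    return [sys.executable, "-m", "pip", "install", "--force-reinstall", str(wheel)]


def build_and_install(wheel_dir="target/wheels"):
    # check=True stops at the first failing step, so a stale wheel is never installed.
    for step in BUILD_STEPS:
        subprocess.run(step, check=True)
    subprocess.run(install_command(newest_wheel(wheel_dir)), check=True)
```

Calling build_and_install() from the repository root runs both builds and then installs the newest wheel. Invoking pip as sys.executable -m pip ensures the wheel lands in the same Python environment that runs the script.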
For testing purposes, you can build and run both the Rust CLI and the Python library on most Linux systems. For ease of use, a container may be used as well.
A container image may be built from the source code in this repository. The steps in the Dockerfile may also help users understand how to build the code outside of a container.
The container includes the executable along with the Python library, so you can run either the Rust CLI or a Python program that uses the Python library.
Either docker or podman may be used to create the image, typically with:
podman build -t xxx .
where xxx is the name you want to use for your container image. For those who simply want to pull and run a pre-built container, a relatively recent image is available on quay.io. However, the version on quay.io may be older than one built from the source code in this repository.
Note: A possibly OLD container is here: https://quay.io/repository/russfellows-sig65/dealio_s3_rust
To pull the container using docker, execute the following:
docker pull quay.io/russfellows-sig65/dealio_s3_rust
Here is an example of starting a container using podman and listing the contents of the container's /app directory.
Note: It is VERY important to use the "--net=host" option (or a similar networking option) when starting the container, since using S3 requires network connectivity to the S3 storage.
eval@loki-node3:~/real-dealio2$ podman run --net=host --rm -it dealio_s3-rust
root@loki-node3:/app# ls
README.md dlio_s3_test3.py target test_all_dlio_s3.py
root@loki-node3:/app# which python
/app/.venv/bin/python
root@loki-node3:/app# python --version
Python 3.13.2
This library and CLI currently support accessing S3 over plain HTTP without TLS, and over HTTPS with TLS using either publicly trusted or self-signed certificates. There is NO option to connect via TLS while ignoring certificate errors. There are several reasons for this; the two biggest are:
- It is VERY difficult to do this with the current Rust crates and AWS S3 SDK
- It is VERY dangerous, and strongly discouraged
Setting up the environment variables and other information needed to access S3 can be somewhat tricky. This code reads its settings either from a .env file OR from environment variables. The examples here include sample .env files; the values come from a private test environment, so exposing the ACCESS_KEY and SECRET_KEY values is not a concern. These settings may be set as environment variables if desired, but it is probably easier to put them in the .env file, which is read by default.
Note: The following values are required:
AWS_ACCESS_KEY_ID=some-id
AWS_SECRET_ACCESS_KEY=my-secret
AWS_ENDPOINT_URL=https://dns-name-or-ipaddr:port-number
AWS_REGION=region-if-needed
Note: The following value is OPTIONAL, but REQUIRED if using a self signed certificate with TLS access to S3:
AWS_CA_BUNDLE_PATH=/a/path/to-myfile.pem
Here are the contents of the sample .env file:
eval@loki-node3:~/real-dealio2$ cat .env
AWS_ACCESS_KEY_ID=BG0XPVXISBP41DCXOQR8
AWS_SECRET_ACCESS_KEY=kGSmlMHBl0ohc/nYRtGbBx4KCfpdPN1/fLbtjUyX
AWS_ENDPOINT_URL=http://10.9.0.21
AWS_REGION=us-east-1
S3_BUCKET=my-bucket2
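The required and optional values above can be checked before starting a test run. The following Python sketch only illustrates the idea and does not reproduce how the Rust library actually reads its settings. In particular, the simple KEY=VALUE parsing and the rule that real environment variables override .env entries are assumptions, and the function names are hypothetical.

```python
# Minimal sketch of loading S3 settings from a .env file plus the environment.
# Assumptions (not taken from dlio_s3_rust itself): simple KEY=VALUE lines,
# '#' comments, and real environment variables overriding .env values.
import os
from pathlib import Path

REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL", "AWS_REGION")
OPTIONAL = ("AWS_CA_BUNDLE_PATH",)


def parse_env_text(text):
    """Parse KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def merge_settings(file_values, environ):
    """Start from the .env values; let environment variables override known keys."""
    settings = dict(file_values)
    for key in REQUIRED + OPTIONAL:
        if environ.get(key):
            settings[key] = environ[key]
    return settings


def validate_settings(settings):
    missing = [key for key in REQUIRED if not settings.get(key)]
    if missing:
        raise ValueError("missing required S3 settings: " + ", ".join(missing))
    endpoint = settings["AWS_ENDPOINT_URL"]
    if not endpoint.startswith(("http://", "https://")):
        raise ValueError(f"AWS_ENDPOINT_URL needs an http:// or https:// scheme: {endpoint}")
    # The CA bundle is only needed for HTTPS endpoints using a self-signed certificate.
    ca_bundle = settings.get("AWS_CA_BUNDLE_PATH")
    if ca_bundle and not Path(ca_bundle).is_file():
        raise ValueError(f"AWS_CA_BUNDLE_PATH does not point to a file: {ca_bundle}")


def load_s3_settings(env_path=".env", environ=None):
    path = Path(env_path)
    file_values = parse_env_text(path.read_text()) if path.is_file() else {}
    settings = merge_settings(file_values, os.environ if environ is None else environ)
    validate_settings(settings)
    return settings
```

With a .env file like the sample, load_s3_settings() returns the four required values plus S3_BUCKET; if a required key such as AWS_SECRET_ACCESS_KEY is missing, it fails early with a message naming that key.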
The main purpose of this project is to provide a Python library for interacting with S3. The Rust CLI offers a way to exercise the underlying functions without writing a Python program or worrying about syntax errors. Because it is pure Rust, it also gives a more accurate picture of the attainable performance. Still, the Python library is the intended interface. The "docs" subdirectory contains a quick overview of the Python API, written by a chat agent. For those who prefer examples, there is a sample Python script that tests the APIs; its output is shown below.
Note: The following example runs the test_new_dlio_s3.py Python script after editing it to use the NUM_OBJECTS = 500 setting. The default is only 5 objects, which is too few to produce meaningful performance values.
root@loki-node3:/app# uv run test_new_dlio_s3.py
=== Sync LIST ===
Found 999 objects under s3://my-bucket2/my-data2/ in 0.02s
=== Sync PUT ===
Uploaded 500 objects (10000.00 MiB) in 3.34s -> 149.66 ops/s, 2993.14 MiB/s
=== Sync GET Many ===
Fetched 500 objects (10000.00 MiB) in 7.50s -> 66.67 ops/s, 1333.49 MiB/s
=== Sync GET Many Stats ===
Stats: 500 objects, 10485760000 bytes (10000.00 MiB) in 2.21s -> 226.25 ops/s, 4524.91 MiB/s
=== Sync DELETE ===
Deleted test objects in 0.37s
=== Async LIST ===
Found 999 objects under s3://my-bucket2/my-data2/ in 0.02s
=== Async PUT ===
Uploaded 500 objects (10000.00 MiB) in 3.10s -> 161.18 ops/s, 3223.68 MiB/s
=== Async GET Many ===
Fetched 500 objects (10000.00 MiB) in 7.64s -> 65.41 ops/s, 1308.29 MiB/s
=== Async GET Many Stats ===
Stats: 500 objects, 10485760000 bytes (10000.00 MiB) in 2.41s -> 207.79 ops/s, 4155.79 MiB/s
=== Async DELETE ===
Deleted test objects in 0.36s
root@loki-node3:/app#
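The figures in this output can be cross-checked by hand. The Stats lines report 10485760000 bytes for 500 objects, i.e. 20971520 bytes (20 MiB) per object, so each MiB/s value should equal the ops/s value multiplied by 20. The short Python check below performs that arithmetic on numbers copied from the output; it does not call the library.

```python
# Worked check of the sample output above: each test object is
# 10485760000 bytes / 500 objects = 20971520 bytes = 20 MiB, so every
# MiB/s figure should equal ops/s * 20, up to the two-decimal rounding
# of the printed ops/s value.
MIB = 1024 * 1024
object_size_mib = 10485760000 / 500 / MIB  # 20.0

# (operation, reported ops/s, reported MiB/s), copied from the output above
reported = [
    ("Sync PUT", 149.66, 2993.14),
    ("Sync GET Many", 66.67, 1333.49),
    ("Sync GET Many Stats", 226.25, 4524.91),
    ("Async PUT", 161.18, 3223.68),
    ("Async GET Many", 65.41, 1308.29),
    ("Async GET Many Stats", 207.79, 4155.79),
]

for name, ops_per_s, mib_per_s in reported:
    implied_ops = mib_per_s / object_size_mib
    # ops/s is printed with two decimals, so allow half a unit in the last place
    consistent = abs(implied_ops - ops_per_s) <= 0.005 + 1e-9
    print(f"{name:22s} {ops_per_s:8.2f} ops/s  implied {implied_ops:9.4f}  consistent={consistent}")
```

Recomputing rates from the printed seconds instead (for example 500 / 3.34 = 149.70 ops/s for Sync PUT) gives slightly different values, because the elapsed times are rounded to two decimals before printing.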
If you built this project locally, outside a container, you must either install the executable or provide the path to it; by default it is in the "./target/release" directory.
When running in the container, the Rust executable is already in your path; which s3Rust-cli shows its location:
root@loki-node3:/app# which s3Rust-cli
/usr/local/bin/s3Rust-cli
Running the command without any arguments shows the usage:
root@loki-node3:/app# s3Rust-cli
Usage: s3Rust-cli <COMMAND>
Commands:
list List keys that start with the given prefix
get Download one or many objects
delete Delete one object or every object that matches the prefix
put Upload one or more objects concurrently, uses ObjectType format filled with random data
help Print this message or the help of the given subcommand(s)
Options:
-v, --verbose... Increase log verbosity: -v = Info, -vv = Debug
-h, --help Print help
-V, --version Print version
root@loki-node3:/app#
First, here is the help output for the get subcommand:
root@loki-node3:/app# s3Rust-cli get --help
Download one or many objects
Usage: s3Rust-cli get [OPTIONS] <URI>
Arguments:
<URI> S3 URI – can be a full key or a prefix ending with `/`
Options:
-j, --jobs <JOBS> Maximum concurrent GET requests [default: 64]
-h, --help Print help
Next is the help output for the put subcommand:
root@loki-node3:/app# s3Rust-cli put --help
Upload one or more objects concurrently, uses ObjectType format filled with random data
Usage: s3Rust-cli put [OPTIONS] <URI_PREFIX>
Arguments:
<URI_PREFIX> S3 URI prefix (e.g. s3://bucket/prefix)
Options:
-c, --create-bucket Optionally create the bucket if it does not exist
-j, --jobs <JOBS> Maximum concurrent uploads (jobs), but is modified to be min(jobs, num) [default: 32]
-n, --num <NUM> Number of objects to create and upload [default: 1]
-o, --object-type <OBJECT_TYPE> What kind of object to generate: [default: RAW] [possible values: NPZ, TFRECORD, HDF5, RAW]
-s, --size <SIZE> Object size in bytes (default 20 MB) [default: 20971520]
-t, --template <TEMPLATE> Template for object names. Use '{}' as a placeholder [default: object_{}_of_{}.dat]
-h, --help Print help
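For scripted tests, the CLI can also be driven from Python. The sketch below is hypothetical: the helper names are invented for illustration and are not part of dlio_s3_rust, but every subcommand and flag it passes comes from the help output above. It looks for s3Rust-cli on the PATH first (as in the container) and falls back to ./target/release for a local build.

```python
# Hypothetical Python wrapper around s3Rust-cli. It uses only the subcommands
# and flags shown in the help output; the function names are illustrative.
import shutil
import subprocess
from pathlib import Path


def find_cli(name="s3Rust-cli", local_build="target/release"):
    """Prefer the binary on PATH (as in the container), else a local cargo build."""
    found = shutil.which(name)
    if found:
        return found
    candidate = Path(local_build) / name
    if candidate.is_file():
        return str(candidate)
    raise FileNotFoundError(f"{name} not found on PATH or in {local_build}")


def list_args(uri):
    return ["list", uri]


def delete_args(uri):
    return ["delete", uri]


def get_args(uri, jobs=None):
    args = ["get"]
    if jobs is not None:
        args += ["-j", str(jobs)]
    return args + [uri]


def put_args(uri_prefix, num=None, jobs=None, object_type=None, size=None,
             template=None, create_bucket=False):
    """Pass only flags that are explicitly set, so CLI defaults apply otherwise."""
    args = ["put"]
    if create_bucket:
        args.append("-c")
    for flag, value in (("-n", num), ("-j", jobs), ("-o", object_type),
                        ("-s", size), ("-t", template)):
        if value is not None:
            args += [flag, str(value)]
    return args + [uri_prefix]


def effective_put_jobs(jobs=32, num=1):
    """Per the put help text, the job count is reduced to min(jobs, num)."""
    return min(jobs, num)


def run_cli(args, cli=None):
    """Run one subcommand; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run([cli or find_cli(), *args], check=True,
                          capture_output=True, text=True)
```

For example, run_cli(put_args("s3://my-bucket4/", num=1200, template="russ-test3-{}.obj", create_bucket=True)) builds the same put command used in the examples that follow. The returned CompletedProcess holds the captured stdout and stderr; which stream the CLI writes its summary to is not shown here, so check both.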
Here are several examples of running the s3Rust-cli command:
root@loki-node3:/app# s3Rust-cli list s3://my-bucket4/
...
/russ-test3-989.obj
/russ-test3-99.obj
/russ-test3-990.obj
/russ-test3-991.obj
/russ-test3-992.obj
/russ-test3-993.obj
/russ-test3-994.obj
/russ-test3-995.obj
/russ-test3-996.obj
/russ-test3-997.obj
/russ-test3-998.obj
/russ-test3-999.obj
Total objects: 1200
root@loki-node3:/app# s3Rust-cli delete s3://my-bucket4/
Deleting 1200 objects…
Done.
root@loki-node3:/app# s3Rust-cli list s3://my-bucket4/
Total objects: 0
Note: This put command uses only a single pair of braces {} in its template, so each object name contains only the object number, not the total number of objects:
root@loki-node3:/app# s3Rust-cli put -c -n 1200 -t russ-test3-{}.obj s3://my-bucket4/
Uploaded 1200 objects (total 25165824000 bytes) in 7.607254643s (157.74 objects/s, 3154.88 MB/s)
root@loki-node3:/app#
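The naming rule can be illustrated with Python's str.format. This is a hypothetical re-creation, not the CLI's code: it assumes the first {} receives the object number and a second {} the total, and it assumes zero-based numbering, which the output above does not confirm. It also shows why the earlier listing places russ-test3-99.obj between russ-test3-989.obj and russ-test3-990.obj: AWS documents that ListObjects results are returned in UTF-8 binary order, although other S3 implementations may behave differently.

```python
# Hypothetical re-creation of how put appears to fill its name template:
# the first '{}' receives the object number and a second '{}' (if present)
# the total count. Zero-based numbering is an assumption.
def object_names(template, num, start=0):
    # str.format ignores surplus positional arguments, so a template with a
    # single '{}' simply drops the total.
    return [template.format(i, num) for i in range(start, start + num)]


default_names = object_names("object_{}_of_{}.dat", 3)
russ_names = object_names("russ-test3-{}.obj", 1200)

# Sorting ASCII keys as strings matches UTF-8 binary order, which is
# consistent with the order of the list output shown earlier.
listed = sorted(russ_names)
i = listed.index("russ-test3-99.obj")
print(default_names)
print(listed[i - 1], listed[i], listed[i + 1])
```

A template with no {} at all would give every object the same name, so each upload would overwrite the previous one.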
root@loki-node3:/app# s3Rust-cli get -j 32 s3://my-bucket4/
Fetching 1200 objects with 32 jobs…
downloaded 24000.00 MB in 4.345053775s (5523.52 MB/s)
root@loki-node3:/app# s3Rust-cli delete s3://my-bucket4/
Deleting 1200 objects…
Done.
root@loki-node3:/app#
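The put and get summaries label their rates "MB/s". Recomputing them from the exact byte count and elapsed times printed above suggests that the CLI's MB is a binary megabyte (MiB, 2^20 bytes), the same unit the Python test script labels MiB. The Python arithmetic below shows the check; it uses only numbers copied from the output.

```python
# Worked check of the units in the CLI summaries above, using the exact byte
# count and elapsed times the CLI printed.
MIB = 1024 * 1024

put_bytes = 25_165_824_000   # "total 25165824000 bytes"
put_secs = 7.607254643
get_secs = 4.345053775
put_objects = 1200

object_size = put_bytes // put_objects            # 20971520 bytes, the -s default
total_mib = put_bytes / MIB                       # 24000.0, the "24000.00 MB" fetched

put_ops = put_objects / put_secs                  # objects per second for the put
put_rate_mib = total_mib / put_secs               # binary interpretation
put_rate_decimal_mb = put_bytes / 1e6 / put_secs  # decimal interpretation
get_rate_mib = total_mib / get_secs               # binary interpretation for the get

print(f"object size {object_size} bytes, {put_ops:.2f} objects/s")
print(f"PUT {put_rate_mib:.2f} MiB/s vs {put_rate_decimal_mb:.2f} decimal MB/s")
print(f"GET {get_rate_mib:.2f} MiB/s")
```

The binary interpretation reproduces the printed 3154.88 and 5523.52 figures, while the decimal one gives roughly 3308 MB/s for the put, which does not match.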