
[EPIC] Iceberg Cache #1226

Open
@Xuanwo

Description


What feature are you trying to implement?

Caching is an essential part of working with Iceberg tables, and different types of cache are needed at various levels.

For example, for our table metadata, we will need a Manifest cache so that we don't have to read and deserialize the same manifest files repeatedly. For our Parquet files, we will need a FileMetadata cache to avoid parsing the metadata from the Parquet files each time. We could even implement a raw data cache to store portions of data files, eliminating the need to download them from S3 again.
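As a rough illustration of the Manifest cache idea, the sketch below memoizes a deserialized object by file path, so repeated reads of the same manifest skip the read-and-deserialize step. `Manifest`, `ManifestCache`, and `get_or_load` are hypothetical names, not the iceberg-rust API, and the closure stands in for whatever code actually reads and parses the file.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a deserialized manifest file.
pub struct Manifest {
    pub entries: Vec<String>,
}

/// Hypothetical cache of deserialized manifests, keyed by manifest file path.
/// Values live behind `Arc` so callers share one copy instead of cloning it.
#[derive(Default)]
pub struct ManifestCache {
    inner: Mutex<HashMap<String, Arc<Manifest>>>,
}

impl ManifestCache {
    /// Returns the cached manifest for `path`; on a miss, runs `load`
    /// (standing in for read + deserialize) and stores the result.
    /// Note: the lock is held while `load` runs, which keeps the sketch simple.
    pub fn get_or_load<F>(&self, path: &str, load: F) -> Arc<Manifest>
    where
        F: FnOnce() -> Manifest,
    {
        let mut map = self.inner.lock().unwrap();
        map.entry(path.to_string())
            .or_insert_with(|| Arc::new(load()))
            .clone()
    }
}

fn main() {
    let cache = ManifestCache::default();
    let mut loads = 0;

    for _ in 0..3 {
        let manifest = cache.get_or_load("metadata/snap-1-manifest.avro", || {
            loads += 1; // count how often the expensive path runs
            Manifest { entries: vec!["data/00000.parquet".to_string()] }
        });
        assert_eq!(manifest.entries.len(), 1);
    }

    // Only the first call loaded the file; the next two were cache hits.
    assert_eq!(loads, 1);
    println!("manifest loaded {loads} time(s)");
}
```

Sharing values through `Arc` lets many readers use one deserialized manifest without copying it. Holding the lock during `load` serializes concurrent misses; a production cache would also need an eviction policy and would likely avoid doing I/O under the lock.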

As the foundation for various query engines, iceberg-rust should be designed to simplify integration while still allowing each engine to fully optimize performance. This applies whether they are using iceberg-rust on a single machine or within a distributed cluster.
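One way to read this integration goal is that iceberg-rust would depend only on a cache trait and let each engine supply the implementation. The sketch assumes a hypothetical `BlobCache` trait and `Reader` type (neither is part of iceberg-rust): a single-machine engine might plug in a process-local map, while a distributed engine that manages caching elsewhere could pass a no-op.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical cache interface the library would depend on; engines provide the implementation.
pub trait BlobCache: Send + Sync {
    fn get(&self, key: &str) -> Option<Arc<Vec<u8>>>;
    fn insert(&self, key: &str, value: Arc<Vec<u8>>);
}

/// Caches nothing: for engines that handle caching themselves (e.g. across a cluster).
pub struct NoopCache;

impl BlobCache for NoopCache {
    fn get(&self, _key: &str) -> Option<Arc<Vec<u8>>> {
        None
    }
    fn insert(&self, _key: &str, _value: Arc<Vec<u8>>) {}
}

/// Process-local cache, e.g. for an engine running on a single machine.
#[derive(Default)]
pub struct InMemoryCache {
    map: Mutex<HashMap<String, Arc<Vec<u8>>>>,
}

impl BlobCache for InMemoryCache {
    fn get(&self, key: &str) -> Option<Arc<Vec<u8>>> {
        self.map.lock().unwrap().get(key).cloned()
    }
    fn insert(&self, key: &str, value: Arc<Vec<u8>>) {
        self.map.lock().unwrap().insert(key.to_string(), value);
    }
}

/// Hypothetical library-side reader: it only sees `dyn BlobCache`, so any engine's cache plugs in.
pub struct Reader {
    cache: Arc<dyn BlobCache>,
    /// Number of simulated remote fetches, to make cache hits observable.
    fetches: usize,
}

impl Reader {
    pub fn new(cache: Arc<dyn BlobCache>) -> Self {
        Reader { cache, fetches: 0 }
    }

    /// Consults the cache first and falls back to a (simulated) download on a miss.
    pub fn read(&mut self, path: &str) -> Arc<Vec<u8>> {
        if let Some(hit) = self.cache.get(path) {
            return hit;
        }
        self.fetches += 1;
        let bytes = Arc::new(path.as_bytes().to_vec()); // placeholder for real file bytes
        self.cache.insert(path, Arc::clone(&bytes));
        bytes
    }
}

fn main() {
    let mut single_node = Reader::new(Arc::new(InMemoryCache::default()));
    let mut delegated = Reader::new(Arc::new(NoopCache));

    for _ in 0..3 {
        single_node.read("metadata/v1.metadata.json");
        delegated.read("metadata/v1.metadata.json");
    }

    assert_eq!(single_node.fetches, 1); // two of three reads were cache hits
    assert_eq!(delegated.fetches, 3); // no-op cache: every read fetches
    println!("in-memory: {}, no-op: {}", single_node.fetches, delegated.fetches);
}
```

Because the reader holds an `Arc<dyn BlobCache>`, swapping one engine's cache for another requires no change to the reading code.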

I plan to add a set of cache APIs to meet all those needs. My current plan is:

  • ObjectCache: an object cache trait that can hold objects like Manifest or FileMetadata
  • BytesCache: a bytes cache that can hold the raw content of files, like table_metadata.json files.
  • An in-FileIO cache, similar to opendal's CacheLayer; the API is not decided yet.
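A minimal sketch of how the first two items might look, assuming hypothetical `ObjectCache` and `BytesCache` trait shapes (the issue does not fix any signatures). The object cache stores type-erased `Arc<dyn Any + Send + Sync>` values so one cache can hold both manifests and Parquet metadata, with a typed helper that downcasts on read; the bytes cache is bounded by total size and uses simple FIFO eviction purely for brevity.

```rust
use std::any::Any;
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Hypothetical object cache: holds already-deserialized values of any type
/// (e.g. a Manifest or Parquet FileMetadata) behind a type-erased `Arc`.
pub trait ObjectCache: Send + Sync {
    fn get_any(&self, key: &str) -> Option<Arc<dyn Any + Send + Sync>>;
    fn insert_any(&self, key: String, value: Arc<dyn Any + Send + Sync>);
}

/// Typed lookup: fetch, then downcast; a type mismatch is treated as a miss.
pub fn get_typed<T: Any + Send + Sync>(cache: &dyn ObjectCache, key: &str) -> Option<Arc<T>> {
    cache.get_any(key)?.downcast::<T>().ok()
}

/// Hypothetical bytes cache for raw file content such as table metadata JSON.
pub trait BytesCache: Send + Sync {
    fn get(&self, key: &str) -> Option<Arc<[u8]>>;
    fn insert(&self, key: String, value: Arc<[u8]>);
}

/// Unbounded in-memory object cache.
#[derive(Default)]
pub struct MemObjectCache {
    map: Mutex<HashMap<String, Arc<dyn Any + Send + Sync>>>,
}

impl ObjectCache for MemObjectCache {
    fn get_any(&self, key: &str) -> Option<Arc<dyn Any + Send + Sync>> {
        self.map.lock().unwrap().get(key).cloned()
    }
    fn insert_any(&self, key: String, value: Arc<dyn Any + Send + Sync>) {
        self.map.lock().unwrap().insert(key, value);
    }
}

/// In-memory bytes cache capped at `capacity` total bytes, evicting oldest inserts first (FIFO).
pub struct MemBytesCache {
    capacity: usize,
    state: Mutex<BytesState>,
}

#[derive(Default)]
struct BytesState {
    map: HashMap<String, Arc<[u8]>>,
    order: VecDeque<String>, // insertion order, oldest at the front
    used: usize,             // total bytes currently stored
}

impl MemBytesCache {
    pub fn new(capacity: usize) -> Self {
        MemBytesCache { capacity, state: Mutex::new(BytesState::default()) }
    }
}

impl BytesCache for MemBytesCache {
    fn get(&self, key: &str) -> Option<Arc<[u8]>> {
        self.state.lock().unwrap().map.get(key).cloned()
    }

    fn insert(&self, key: String, value: Arc<[u8]>) {
        if value.len() > self.capacity { return; } // too large to ever fit
        let mut guard = self.state.lock().unwrap();
        let s = &mut *guard;
        // Replacing an existing key: release its bytes and its queue slot first.
        if let Some(old) = s.map.remove(&key) {
            s.used -= old.len();
            s.order.retain(|k| k != &key);
        }
        // Evict oldest entries until the new value fits.
        while s.used + value.len() > self.capacity {
            let oldest = s.order.pop_front().expect("bytes in use imply a queued key");
            if let Some(evicted) = s.map.remove(&oldest) {
                s.used -= evicted.len();
            }
        }
        s.used += value.len();
        s.order.push_back(key.clone());
        s.map.insert(key, value);
    }
}

struct FileMetadata { num_rows: u64 } // hypothetical stand-in for parsed Parquet footer metadata

fn main() {
    let objects = MemObjectCache::default();
    objects.insert_any("data/00000.parquet".to_string(), Arc::new(FileMetadata { num_rows: 42 }));
    let meta: Option<Arc<FileMetadata>> = get_typed(&objects, "data/00000.parquet");
    assert_eq!(meta.map(|m| m.num_rows), Some(42));
    // Requesting the wrong type is a miss rather than a panic.
    assert!(get_typed::<String>(&objects, "data/00000.parquet").is_none());

    let bytes = MemBytesCache::new(8);
    bytes.insert("v1.metadata.json".to_string(), Arc::from(&b"12345"[..]));
    bytes.insert("v2.metadata.json".to_string(), Arc::from(&b"6789"[..])); // 5 + 4 > 8: evicts v1
    assert!(bytes.get("v1.metadata.json").is_none());
    assert!(bytes.get("v2.metadata.json").is_some());
}
```

Type erasure keeps a single trait usable for unrelated object types at the cost of a runtime downcast; an alternative design would be one generic cache per object type, such as a hypothetical `ObjectCache<K, V>`.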

Tasks

Willingness to contribute

I can contribute to this feature independently

Metadata

  • Assignees: No one assigned
  • Labels: epic (Epic issue)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
