
Block Database #4027

Open · wants to merge 11 commits into master

Conversation

Contributor

@DracoLi DracoLi commented Jun 23, 2025

Why this should be merged

This PR introduces BlockDB, a specialized database optimized for block storage.

Avalanche VMs currently store blocks in a key-value database (LevelDB or PebbleDB). This approach is not optimal for block storage: large blocks trigger frequent compactions, and the resulting write amplification degrades performance as the database grows. In addition, KV databases are designed for random key-value access rather than the sequential access patterns typical of blockchain operations.

For how BlockDB works, see README.md.
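To give a sense of the intended usage, a minimal hypothetical sketch follows; blockdb.New, DefaultConfig, WriteBlock, ReadBlock, and Close are illustrative names and may not match the final API:

// Open (or create) a BlockDB, write a block at a height, and read it back.
db, err := blockdb.New(blockdb.DefaultConfig(), dir)
if err != nil {
    return err
}
defer db.Close()

// Blocks are keyed by height rather than by hash.
if err := db.WriteBlock(height, blockBytes); err != nil {
    return err
}
got, err := db.ReadBlock(height)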

Changes

  • Added blockdb to x/.
  • Updated our LRU cache to support an onEvict callback. blockdb uses this to cache open file descriptors for the data files (see the sketch below).
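A rough illustration of the onEvict idea; the constructor name and generic cache type here are assumptions for illustration, not the repo's actual API:

// Hypothetical sketch: an LRU whose eviction callback closes the evicted
// file handle so descriptors are not leaked once an entry falls out of cache.
onEvict := func(_ int, f *os.File) {
    if f != nil {
        _ = f.Close()
    }
}
fileCache := lru.NewCacheWithOnEvict[int, *os.File](maxOpenDataFiles, onEvict)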

How this was tested

Unit tests for now.

Todos

  • Split data across multiple files when MaxDataFileSize is reached
  • Compress data files to reduce storage size (will be done in a follow-up PR)
  • Add performance benchmarks (will be done in a follow-up PR)

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces BlockDB, a specialized on-disk database optimized for blockchain block storage with improved write performance and automatic recovery.

  • Implements dedicated tests for writing, reading, concurrency, and error cases.
  • Introduces recovery logic and index management for efficient block lookups, along with detailed documentation in the README.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Summary per file:

  • x/blockdb/writeblock_test.go: Adds comprehensive tests verifying block writes, error conditions, and concurrency scenarios.
  • x/blockdb/recovery.go: Introduces recovery logic to reconcile the data and index file contents after crashes.
  • x/blockdb/readblock_test.go: Provides test coverage for reading full blocks, headers, and bodies in various conditions.
  • x/blockdb/index.go: Implements fixed-size index entries and header serialization/deserialization.
  • x/blockdb/database.go: Sets up file handling, header initialization, recovery trigger, and connection closure.
  • x/blockdb/block.go: Implements block header serialization, writing/reading blocks, and ensuring data integrity.
  • x/blockdb/config.go: Defines default and custom configuration options for the BlockDB.
  • x/blockdb/README.md: Documents design, file formats, recovery, and usage of BlockDB.

Contributor

@rkuris rkuris left a comment

Still unreviewed: recovery code and some of the block allocation logic, but there is enough here to get started with some changes.

│ Min Block Height      │ 8 bytes  │
│ Max Contiguous Height │ 8 bytes  │
│ Data File Size        │ 8 bytes  │
│ Reserved              │ 24 bytes │
Contributor

Why do we need a reserved area here?

Contributor Author

@DracoLi DracoLi Jun 25, 2025

I wanted to account for the possibility that future versions add features requiring more data in the header. If that happens, we can use this reserved space without needing to reindex.
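As a sketch of the layout quoted above (only the quoted fields are shown, and the Go types are assumed for illustration), the reserved bytes simply pad the fixed-size header so later versions can add fields in place:

// Fixed-size index file header fields from the quoted hunk. The trailing
// Reserved bytes keep the header size stable so future versions can add
// fields here without a reindex.
type indexFileHeader struct {
    MinBlockHeight      uint64   // 8 bytes
    MaxContiguousHeight uint64   // 8 bytes
    DataFileSize        uint64   // 8 bytes
    Reserved            [24]byte // unused today, available to future versions
}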

}
}

if s.nextDataWriteOffset.CompareAndSwap(currentOffset, newOffset) {
Contributor

Nice way of doing this! Presumably this is faster in the non-contention case than a mutex?

Contributor Author

@DracoLi DracoLi Jun 25, 2025

Yeah, this should be more lightweight.
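For readers unfamiliar with the pattern, here is a minimal sketch of the compare-and-swap reservation loop; the field name mirrors the hunk above, while the standalone function and its surroundings are assumed:

import "sync/atomic"

// reserveSpace atomically claims size bytes starting at the current write
// offset, retrying when another goroutine advances the offset first.
func reserveSpace(next *atomic.Uint64, size uint64) uint64 {
    for {
        cur := next.Load()
        if next.CompareAndSwap(cur, cur+size) {
            return cur // this goroutine now owns [cur, cur+size)
        }
        // CAS lost the race; reload the offset and try again.
    }
}

In the uncontended case this is a single atomic load and CAS, with no lock acquisition or goroutine parking, which is why it tends to beat a mutex there.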

Comment on lines 333 to 339
fileIndex := int(currentOffset / maxDataFileSize)
localOffset := currentOffset % maxDataFileSize

if localOffset+totalSize > maxDataFileSize {
writeOffset = (uint64(fileIndex) + 1) * maxDataFileSize
newOffset = writeOffset + totalSize
}
Contributor

I think this means that files other than the first one will not contain a header. Is this intentional? If so, it means the first file is always going to be opened and can never be deleted, which should be mentioned in the README.

Contributor Author

@DracoLi DracoLi Jun 25, 2025

Not every block will contain the header (blockSize includes the metadata header + block). We are only splitting the data files here, not the index file. This is just calculating the global next write offset if the current data file cannot fit the block.

Contributor Author

@DracoLi DracoLi Jun 26, 2025

The variable names could have been better. I updated this method to be clearer about what it's doing.
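For context, a sketch of the mapping from a global write offset to the data file and offset actually used, spilling to the next file when the entry does not fit; variable names follow the hunk above, and the enclosing method is assumed:

// placeWrite returns where the entry will be written (writeOffset) and the
// next global write offset (newOffset), starting a new data file when the
// entry would cross the MaxDataFileSize boundary.
func placeWrite(currentOffset, totalSize, maxDataFileSize uint64) (writeOffset, newOffset uint64) {
    fileIndex := currentOffset / maxDataFileSize
    localOffset := currentOffset % maxDataFileSize
    writeOffset = currentOffset
    if localOffset+totalSize > maxDataFileSize {
        // The current data file cannot fit this entry; start it at the
        // beginning of the next file.
        writeOffset = (fileIndex + 1) * maxDataFileSize
    }
    newOffset = writeOffset + totalSize
    return writeOffset, newOffset
}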

}

func (s *Database) getOrOpenDataFile(fileIndex int) (*os.File, error) {
if handle, ok := s.fileCache.Load(fileIndex); ok {
Contributor

I think we need some limit on the fileCache size, otherwise we could run out of file descriptors if the maxFileSize is pretty small and/or blocks are really big.

Contributor Author

@DracoLi DracoLi Jun 25, 2025

Good idea. I can set a 10k limit.
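A rough sketch of the bounded cache idea (the cache's Get/Put methods and the openDataFile helper are assumptions for illustration; only the 10k cap comes from this thread):

// Cap the number of cached descriptors so a small MaxDataFileSize or very
// large blocks cannot exhaust the process's file descriptor limit.
const maxCachedDataFiles = 10_000

func (s *Database) getOrOpenDataFile(fileIndex int) (*os.File, error) {
    if f, ok := s.fileCache.Get(fileIndex); ok {
        return f, nil
    }
    f, err := s.openDataFile(fileIndex) // hypothetical helper opening data file #fileIndex
    if err != nil {
        return nil, err
    }
    // Put may evict the least recently used entry; the cache's onEvict
    // callback closes that descriptor.
    s.fileCache.Put(fileIndex, f)
    return f, nil
}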

@DracoLi DracoLi requested a review from rkuris July 2, 2025 18:42
@DracoLi DracoLi changed the title from "[Draft] BlockDB" to "Block Database" Jul 2, 2025
@DracoLi DracoLi marked this pull request as ready for review July 2, 2025 21:12
@DracoLi DracoLi requested a review from yacovm July 2, 2025 21:17