Block Database #4027
base: master
Changes from all commits
260979a
64ca7f1
cf35473
15ae1d1
c6989b0
c1bcf97
4201549
9a90669
decbfe8
f08b7a7
cb900cf
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,190 @@ | ||
# BlockDB | ||
|
||
BlockDB is a specialized database optimized for blockchain blocks. | ||
|
||
## Key Functionalities | ||
|
||
- **O(1) Performance**: Locating a block for a read or a write takes constant time, regardless of how many blocks are stored | ||
- **Parallel Operations**: Multiple threads can read and write blocks concurrently without blocking | ||
- **Flexible Write Ordering**: Supports out-of-order block writes for bootstrapping | ||
- **Configurable Durability**: Optional `syncToDisk` mode guarantees immediate recoverability | ||
- **Automatic Recovery**: Detects and recovers unindexed blocks after unclean shutdowns | ||
|
||
## Design | ||
|
||
BlockDB uses a single index file and multiple data files. The index file maps block heights to locations in the data files, while data files store the actual block content. Data storage can be split across multiple data files based on the maximum data file size. | ||
|
||
``` | ||
┌─────────────────┐ ┌─────────────────┐ | ||
│ Index File │ │ Data File 1 │ | ||
│ (.idx) │ │ (.dat) │ | ||
├─────────────────┤ ├─────────────────┤ | ||
│ Header │ │ Block 0 │ | ||
│ - Version │ ┌─────>│ - Header │ | ||
│ - Min Height │ │ │ - Data │ | ||
│ - Max Height │ │ ├─────────────────┤ | ||
│ - Data Size │ │ │ Block 1 │ | ||
│ - ... │ │ ┌──>│ - Header │ | ||
├─────────────────┤ │ │ │ - Data │ | ||
│ Entry[0] │ │ │ ├─────────────────┤ | ||
│ - Offset ───────┼──┘ │ │ ... │ | ||
│ - Size │ │ └─────────────────┘ | ||
│ - Header Size │ │ | ||
├─────────────────┤ │ | ||
│ Entry[1] │ │ | ||
│ - Offset ───────┼─────┘ | ||
│ - Size │ | ||
│ - Header Size │ | ||
├─────────────────┤ | ||
│ ... │ | ||
└─────────────────┘ | ||
``` | ||
|
||
### File Formats | ||
|
||
#### Index File Structure | ||
|
||
The index file consists of a fixed-size header followed by fixed-size entries: | ||
|
||
``` | ||
Index File Header (80 bytes): | ||
┌────────────────────────────────┬─────────┐ | ||
│ Field │ Size │ | ||
├────────────────────────────────┼─────────┤ | ||
│ Version │ 8 bytes │ | ||
│ Max Data File Size │ 8 bytes │ | ||
│ Min Block Height │ 8 bytes │ | ||
│ Max Contiguous Height │ 8 bytes │ | ||
│ Max Block Height │ 8 bytes │ | ||
│ Next Write Offset │ 8 bytes │ | ||
│ Reserved │ 32 bytes│ | ||
└────────────────────────────────┴─────────┘ | ||
|
||
Index Entry (16 bytes): | ||
┌────────────────────────────────┬─────────┐ | ||
│ Field │ Size │ | ||
├────────────────────────────────┼─────────┤ | ||
│ Data File Offset │ 8 bytes │ | ||
│ Block Data Size │ 4 bytes │ | ||
│ Header Size │ 4 bytes │ | ||
└────────────────────────────────┴─────────┘ | ||
``` | ||
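To make the 16-byte entry layout concrete, here is a minimal Go sketch of encoding and decoding one index entry. The type and function names (`indexEntry`, `marshal`, `unmarshalIndexEntry`) are hypothetical, and little-endian byte order is an assumption: the README does not say which byte order is used on disk.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// indexEntrySize matches the 16-byte entry layout in the table above.
const indexEntrySize = 16

// indexEntry mirrors the three documented fields. The Go names are
// illustrative and are not taken from the BlockDB source.
type indexEntry struct {
	Offset     uint64 // Data File Offset (8 bytes)
	Size       uint32 // Block Data Size (4 bytes)
	HeaderSize uint32 // Header Size (4 bytes)
}

// marshal encodes the entry into exactly 16 bytes.
// ASSUMPTION: little-endian byte order.
func (e indexEntry) marshal() []byte {
	buf := make([]byte, indexEntrySize)
	binary.LittleEndian.PutUint64(buf[0:8], e.Offset)
	binary.LittleEndian.PutUint32(buf[8:12], e.Size)
	binary.LittleEndian.PutUint32(buf[12:16], e.HeaderSize)
	return buf
}

// unmarshalIndexEntry decodes a 16-byte buffer back into an entry.
func unmarshalIndexEntry(buf []byte) (indexEntry, error) {
	if len(buf) != indexEntrySize {
		return indexEntry{}, errors.New("index entry must be exactly 16 bytes")
	}
	return indexEntry{
		Offset:     binary.LittleEndian.Uint64(buf[0:8]),
		Size:       binary.LittleEndian.Uint32(buf[8:12]),
		HeaderSize: binary.LittleEndian.Uint32(buf[12:16]),
	}, nil
}

func main() {
	in := indexEntry{Offset: 4096, Size: 17, HeaderSize: 7}
	out, err := unmarshalIndexEntry(in.marshal())
	fmt.Println("round trip ok:", out == in, "err:", err)
}
```

Because every field has a fixed width, an entry never needs a length prefix or delimiter, which is what allows entries to be addressed by position alone.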
|
||
#### Data File Structure | ||
|
||
Each block in the data file is stored with a header followed by the raw block data: | ||
|
||
``` | ||
Block Header (24 bytes): | ||
Review comment: Block entry header? |
||
┌────────────────────────────────┬─────────┐ | ||
│ Field │ Size │ | ||
├────────────────────────────────┼─────────┤ | ||
│ Height │ 8 bytes │ | ||
│ Checksum │ 8 bytes │ | ||
│ Size │ 4 bytes │ | ||
│ Header Size │ 4 bytes │ | ||
└────────────────────────────────┴─────────┘ | ||
``` | ||
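The following sketch shows how a 24-byte header like the one above could be written in front of a block and checked on read. Several details are assumptions, not taken from the PR: the names are hypothetical, the byte order is assumed to be little-endian, and CRC-64 (ECMA) from the Go standard library stands in for the 8-byte checksum, whose actual algorithm the README does not name.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc64"
)

const blockHeaderSize = 24 // Height + Checksum + Size + Header Size

// ASSUMPTION: CRC-64/ECMA is only a stand-in for the real checksum.
var crcTable = crc64.MakeTable(crc64.ECMA)

type blockEntryHeader struct {
	Height     uint64
	Checksum   uint64
	Size       uint32
	HeaderSize uint32
}

// encodeBlock returns the 24-byte entry header followed by the raw block.
func encodeBlock(height uint64, block []byte, headerSize uint32) []byte {
	buf := make([]byte, blockHeaderSize, blockHeaderSize+len(block))
	binary.LittleEndian.PutUint64(buf[0:8], height)
	binary.LittleEndian.PutUint64(buf[8:16], crc64.Checksum(block, crcTable))
	binary.LittleEndian.PutUint32(buf[16:20], uint32(len(block)))
	binary.LittleEndian.PutUint32(buf[20:24], headerSize)
	return append(buf, block...)
}

// decodeBlock parses one entry and reports whether it is complete and
// whether the stored checksum matches the block bytes.
func decodeBlock(raw []byte) (blockEntryHeader, []byte, bool) {
	if len(raw) < blockHeaderSize {
		return blockEntryHeader{}, nil, false
	}
	h := blockEntryHeader{
		Height:     binary.LittleEndian.Uint64(raw[0:8]),
		Checksum:   binary.LittleEndian.Uint64(raw[8:16]),
		Size:       binary.LittleEndian.Uint32(raw[16:20]),
		HeaderSize: binary.LittleEndian.Uint32(raw[20:24]),
	}
	if uint64(h.Size) > uint64(len(raw)-blockHeaderSize) {
		return h, nil, false // block data is truncated
	}
	block := raw[blockHeaderSize : blockHeaderSize+int(h.Size)]
	return h, block, crc64.Checksum(block, crcTable) == h.Checksum
}

func main() {
	raw := encodeBlock(100, []byte("header:block data"), 7)
	h, block, ok := decodeBlock(raw)
	fmt.Println(h.Height, h.Size, h.HeaderSize, string(block), ok)
}
```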
|
||
### Block Overwrites | ||
|
||
BlockDB allows overwriting blocks at existing heights. When a block is overwritten, the new block is appended to the data file and the index entry is updated to point to the new location, leaving the old block data as unreferenced "dead" space. Because blocks are normally immutable and overwrites are rare (e.g., during reorgs), this trade-off should have minimal impact in practice. | ||
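As an illustration of the append-and-repoint behavior, the sketch below models the data file as a byte slice and the index as a map. It is a simplified in-memory model with hypothetical names, not the BlockDB implementation; it only shows how an overwrite grows the data while the old bytes become unreferenced.

```go
package main

import "fmt"

// location plays the role of an index entry in this model.
type location struct{ offset, size int }

type store struct {
	data  []byte              // stands in for the append-only data file
	index map[uint64]location // stands in for the index file
	dead  int                 // bytes no longer referenced by any index entry
}

func newStore() *store {
	return &store{index: make(map[uint64]location)}
}

// write always appends; if the height already exists, the old bytes stay
// in data and are merely counted as dead space.
func (s *store) write(height uint64, block []byte) {
	if old, ok := s.index[height]; ok {
		s.dead += old.size
	}
	s.index[height] = location{offset: len(s.data), size: len(block)}
	s.data = append(s.data, block...)
}

func (s *store) read(height uint64) ([]byte, bool) {
	loc, ok := s.index[height]
	if !ok {
		return nil, false
	}
	return s.data[loc.offset : loc.offset+loc.size], true
}

func main() {
	s := newStore()
	s.write(5, []byte("old block"))
	s.write(5, []byte("new block after reorg"))
	block, _ := s.read(5)
	fmt.Printf("read=%q total=%d dead=%d\n", block, len(s.data), s.dead)
}
```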
|
||
### Fixed-Size Index Entries | ||
|
||
Each index entry is exactly 16 bytes on disk, containing the offset, size, and header size. This fixed size enables direct calculation of where each block's index entry is located, providing O(1) lookups. The index also stays compact for blockchains with high block heights: at height 1 billion, the index file would be only ~16 GB (10^9 entries × 16 bytes, plus the 80-byte header). | ||
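The direct calculation can be sketched in a few lines of Go. This assumes that entries are stored contiguously right after the 80-byte header and are addressed relative to the minimum block height; the function name `entryOffset` is hypothetical.

```go
package main

import "fmt"

const (
	indexHeaderSize = 80 // fixed-size index file header
	indexEntrySize  = 16 // fixed-size index entry
)

// entryOffset returns the byte position of the index entry for height.
// ASSUMPTION: entry 0 belongs to minHeight and entries follow the header
// without gaps.
func entryOffset(height, minHeight uint64) (uint64, error) {
	if height < minHeight {
		return 0, fmt.Errorf("height %d is below min height %d", height, minHeight)
	}
	return indexHeaderSize + (height-minHeight)*indexEntrySize, nil
}

func main() {
	off, _ := entryOffset(100, 0)
	fmt.Println("entry for height 100 starts at byte", off)

	// Index size with one entry for each of 1 billion heights.
	end, _ := entryOffset(1_000_000_000, 0)
	fmt.Println("index size at 1 billion entries:", end, "bytes")
}
```

No search or tree traversal is involved: one subtraction, one multiplication, and one addition locate the entry, independent of how many blocks are stored.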
|
||
### Durability and Fsync Behavior | ||
|
||
BlockDB provides configurable durability through the `syncToDisk` parameter: | ||
|
||
**Data File Behavior:** | ||
|
||
- **When `syncToDisk=true`**: The data file is fsync'd after every block write, guaranteeing durability against both process failures and kernel/machine failures. | ||
- **When `syncToDisk=false`**: Data file writes are buffered, providing durability against process failures but not against kernel or machine failures. | ||
|
||
**Index File Behavior:** | ||
|
||
- **When `syncToDisk=true`**: The index file is fsync'd every `CheckpointInterval` blocks (when the header is written). | ||
- **When `syncToDisk=false`**: The index file relies on OS buffering and is not explicitly fsync'd. | ||
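The fsync policy described above can be sketched as follows. This is a simplified model under stated assumptions: the `writer` type, its fields, and the in-memory `countingFile` are hypothetical, and a real implementation would use `*os.File` (which also provides `Write` and `Sync`) and would rewrite the index header at each checkpoint.

```go
package main

import (
	"fmt"
	"io"
)

// syncFile is the subset of *os.File this sketch needs.
type syncFile interface {
	io.Writer
	Sync() error
}

// countingFile is an in-memory stand-in that records how often Sync runs.
type countingFile struct {
	bytes int
	syncs int
}

func (c *countingFile) Write(p []byte) (int, error) {
	c.bytes += len(p)
	return len(p), nil
}

func (c *countingFile) Sync() error {
	c.syncs++
	return nil
}

type writer struct {
	data, index        syncFile
	syncToDisk         bool
	checkpointInterval uint64
	written            uint64
}

// writeBlock syncs the data file after every block and the index file
// every checkpointInterval blocks, but only when syncToDisk is set.
func (w *writer) writeBlock(block, entry []byte) error {
	if _, err := w.data.Write(block); err != nil {
		return err
	}
	if w.syncToDisk {
		if err := w.data.Sync(); err != nil {
			return err
		}
	}
	if _, err := w.index.Write(entry); err != nil {
		return err
	}
	w.written++
	if w.syncToDisk && w.checkpointInterval > 0 && w.written%w.checkpointInterval == 0 {
		return w.index.Sync()
	}
	return nil
}

func main() {
	data, index := &countingFile{}, &countingFile{}
	w := &writer{data: data, index: index, syncToDisk: true, checkpointInterval: 4}
	for i := 0; i < 8; i++ {
		if err := w.writeBlock([]byte("block"), make([]byte, 16)); err != nil {
			fmt.Println("write failed:", err)
			return
		}
	}
	fmt.Println("data syncs:", data.syncs, "index syncs:", index.syncs)
}
```

With `syncToDisk` set to `false`, the same loop performs no explicit syncs at all and leaves flushing to the operating system.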
|
||
### Recovery Mechanism | ||
|
||
On startup, BlockDB checks for signs of an unclean shutdown by comparing the data file size on disk with the indexed data size stored in the index file header. If the data files are larger than what the index claims, it indicates that blocks were written but the index wasn't properly updated before shutdown. | ||
|
||
**Recovery Process:** | ||
|
||
1. Starts scanning from where the index left off (`NextWriteOffset`) | ||
2. For each unindexed block found: | ||
- Validates the block header and checksum | ||
- Writes the corresponding index entry | ||
3. Updates the max contiguous height and max block height | ||
4. Persists the updated index header | ||
Review comment: This isn't clear. Do you mean you add an index entry in the index file in the former and in the latter you update the header of the index file? Can you clarify? |
||
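Steps 1 and 2 of the recovery process can be sketched as a scan over the unindexed tail of a data file. The sketch works on an in-memory byte slice and makes several assumptions that are not confirmed by the PR: little-endian encoding, CRC-64 (ECMA) as a stand-in checksum, an index offset that points at the start of the 24-byte entry header, and scanning that stops at the first truncated or corrupt entry. All names are hypothetical. Updating the height fields and persisting the index header (steps 3 and 4) are omitted.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc64"
)

const blockHeaderSize = 24

// ASSUMPTION: CRC-64/ECMA stands in for the real checksum algorithm.
var crcTable = crc64.MakeTable(crc64.ECMA)

type indexEntry struct {
	Offset     uint64
	Size       uint32
	HeaderSize uint32
}

// appendEntry writes a 24-byte entry header plus the block to data.
func appendEntry(data []byte, height uint64, block []byte, headerSize uint32) []byte {
	var hdr [blockHeaderSize]byte
	binary.LittleEndian.PutUint64(hdr[0:8], height)
	binary.LittleEndian.PutUint64(hdr[8:16], crc64.Checksum(block, crcTable))
	binary.LittleEndian.PutUint32(hdr[16:20], uint32(len(block)))
	binary.LittleEndian.PutUint32(hdr[20:24], headerSize)
	data = append(data, hdr[:]...)
	return append(data, block...)
}

// recoverIndex scans data from nextWriteOffset and returns an index entry
// for every complete, checksum-valid block, plus the offset where the scan
// stopped (the new next write offset).
func recoverIndex(data []byte, nextWriteOffset uint64) (map[uint64]indexEntry, uint64) {
	recovered := make(map[uint64]indexEntry)
	off := nextWriteOffset
	for off+blockHeaderSize <= uint64(len(data)) {
		hdr := data[off : off+blockHeaderSize]
		height := binary.LittleEndian.Uint64(hdr[0:8])
		sum := binary.LittleEndian.Uint64(hdr[8:16])
		size := binary.LittleEndian.Uint32(hdr[16:20])
		headerSize := binary.LittleEndian.Uint32(hdr[20:24])

		end := off + blockHeaderSize + uint64(size)
		if end > uint64(len(data)) || headerSize > size {
			break // truncated or malformed entry
		}
		if crc64.Checksum(data[off+blockHeaderSize:end], crcTable) != sum {
			break // corrupt block data
		}
		recovered[height] = indexEntry{Offset: off, Size: size, HeaderSize: headerSize}
		off = end
	}
	return recovered, off
}

func main() {
	var data []byte
	data = appendEntry(data, 1, []byte("hdr:indexed"), 4)
	indexedUpTo := uint64(len(data)) // what the index header claims
	data = appendEntry(data, 2, []byte("hdr:unindexed"), 4)

	entries, next := recoverIndex(data, indexedUpTo)
	fmt.Println("recovered:", len(entries), "next write offset:", next)
}
```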
|
||
## Usage | ||
|
||
### Creating a Database | ||
|
||
```go | ||
import ( | ||
"fmt" | ||
|
||
"github.com/ava-labs/avalanchego/x/blockdb" | ||
) | ||
|
||
config := blockdb.DefaultDatabaseConfig() | ||
db, err := blockdb.New( | ||
"/path/to/index", // Index directory | ||
Review comment: I think that two parameters of the same type are usually confusing. Why not merge them into the config struct? That means fewer parameters and less chance of mixing up the order of the index and data directories. |
||
"/path/to/data", // Data directory | ||
config, | ||
logger, | ||
) | ||
if err != nil { | ||
fmt.Println("Error creating database:", err) | ||
return | ||
} | ||
defer db.Close() | ||
``` | ||
|
||
### Writing and Reading Blocks | ||
|
||
```go | ||
// Write a block with header size | ||
height := uint64(100) | ||
blockData := []byte("header:block data") | ||
headerSize := uint32(7) // First 7 bytes are the header | ||
err := db.WriteBlock(height, blockData, headerSize) | ||
if err != nil { | ||
fmt.Println("Error writing block:", err) | ||
return | ||
} | ||
|
||
// Read a block | ||
blockData, err = db.ReadBlock(height) // plain assignment: both variables are already declared above | ||
if err != nil { | ||
fmt.Println("Error reading block:", err) | ||
return | ||
} | ||
if blockData == nil { | ||
// Block doesn't exist at this height | ||
|
||
return | ||
} | ||
|
||
// Read block components separately | ||
headerData, err := db.ReadHeader(height) | ||
if err != nil { | ||
fmt.Println("Error reading header:", err) | ||
return | ||
} | ||
bodyData, err := db.ReadBody(height) | ||
if err != nil { | ||
fmt.Println("Error reading body:", err) | ||
return | ||
} | ||
``` | ||
|
||
## TODO | ||
|
||
- [ ] Compress data files to reduce storage size | ||
- [ ] Split data across multiple files when `MaxDataFileSize` is reached | ||
- [ ] Implement a block cache for recently accessed blocks | ||
- [ ] Use a buffered pool to avoid allocations on reads and writes | ||
- [ ] Add tests for core functionality | ||
- [ ] Add performance benchmarks | ||
- [ ] Consider supporting missing data files (currently we error if any data files are missing) |
Review comment: Would it not make sense to add a version field? Imagine that in the future we need to change the structure of the data file entry.
A node running the old version will want to migrate its data, but it is not clear where the header of the block entry starts and where the data (block header and body) starts. Maybe spare 2 bytes for a version?
Alternatively, we could use 2 bytes to record where the header of the block entry ends.