NOTE: This file is generated with the use of AI.
A high-performance, hash-based sampling library for Go that provides deterministic, stateless sampling without external dependencies.
sampling-go
is a generic sampling library that uses consistent hashing to make deterministic decisions about whether to include or exclude items from a sample.
It's particularly useful for:
- Rate limiting: Reduce load on downstream services by sampling a percentage of requests
- Logging and metrics: Sample log entries or metrics to reduce storage costs
- A/B testing: Consistently assign users to test groups
- Data processing: Sample large datasets for analysis
- Distributed systems: Make consistent sampling decisions across multiple service instances
Note: This library provides deterministic sampling based on hash values. The quality of distribution depends on the hash function and input characteristics. For cryptographic use cases, consider using cryptographically secure hash functions.
- Deterministic: Same input always produces the same sampling decision
- Stateless: No external storage or coordination required
- High performance: All operations are in-memory with minimal overhead
- Generic: Works with any Go type through generics
- Configurable: Flexible hash functions and sampling rates
- Zero dependencies: No third-party dependencies for core functionality
go get github.com/block/sampling-go
package main
import (
"fmt"
"github.com/block/sampling-go"
"github.com/block/sampling-go/hashing"
)
func main() {
// Create a sampler for strings with 30% sampling rate
sampler := sampling.NewSampler[string](
sampling.WithHashFunc(hashing.DefaultStringHasher),
sampling.WithStaticRate[string](0.3), // 30% sampling rate
)
messages := []string{
"user-123-login",
"user-456-purchase",
"user-789-logout",
"user-123-login", // Same message - will have same result
}
for _, msg := range messages {
if sampler.Sample(msg) {
fmt.Printf("✓ Sampled: %s\n", msg)
} else {
fmt.Printf("✗ Skipped: %s\n", msg)
}
}
}
The library uses a hash-based approach to sampling:
- Hash Generation: Each input is converted to a
uint32
hash value using a configurable hash function - Threshold Calculation: The sampling rate is converted to a "max hash" threshold
- Comparison: If
hash ≤ maxHash
, the item is included in the sample - Consistency: Same input always produces the same hash, ensuring deterministic behavior
For a 5% sampling rate: Max Hash = max(uint32) * 0.05 = 4,294,967,295 * 0.05 = 214,748,364
0 214,748,364 4,294,967,295
|-----------------|------------- ... ---------------------------|
| 1 | 2 | 3 | 4 | 5 | ... | 99 | 100 | <-- bucket (metrics, logging)
^^^^^^^^^^^^^^^^^^
Items in this area ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
are INCLUDED (5%) Items in this area
are EXCLUDED (95%)
^
|
Max Hash
sampler := sampling.NewSampler[string](
sampling.WithHashFunc(hashing.DefaultStringHasher),
sampling.WithStaticRate[string](0.1), // 10% sampling
)
result := sampler.Sample("user-request-123")
// Returns true/false consistently for the same input
sampler := sampling.NewSampler[string](
sampling.WithHashFunc(hashing.DefaultStringHasher),
sampling.WithStaticRate[string](0.2),
)
shouldTake, meta := sampler.SampleWithMeta("user-action-456")
fmt.Printf("Take: %v\n", shouldTake)
fmt.Printf("Hash: %d\n", meta.Hash)
fmt.Printf("Bucket: %d\n", meta.Bucket)
fmt.Printf("Valid: %v\n", meta.IsHashValid)
type Event struct {
ID string
UserID string
Type string
Priority int
}
// Custom hash function
func eventHasher(event Event) (uint32, bool) {
if event.ID == "" {
return 0, false // Invalid event
}
composite := fmt.Sprintf("%s:%s:%s", event.ID, event.UserID, event.Type)
return crc32.ChecksumIEEE([]byte(composite)), true
}
// Priority-based sampling rates
func priorityMaxHash(event Event) uint32 {
switch {
case event.Priority >= 9:
return hashing.ToMaxHash(1.0, 1.0) // 100% for critical
case event.Priority >= 7:
return hashing.ToMaxHash(0.8, 1.0) // 80% for high
default:
return hashing.ToMaxHash(0.1, 1.0) // 10% for normal
}
}
sampler := sampling.NewSampler[Event](
sampling.WithHashFunc(eventHasher),
sampling.WithMaxHashFunc(priorityMaxHash),
)
type Request struct {
RequestID string
Method string
Path string
UserAgent string
}
func requestHasher(req Request) (uint32, bool) {
if req.RequestID == "" {
return 0, false
}
return crc32.ChecksumIEEE([]byte(req.RequestID)), true
}
// Sample 30% of requests to reduce downstream load
sampler := sampling.NewSampler[Request](
sampling.WithHashFunc(requestHasher),
sampling.WithStaticRate[Request](0.3),
)
func handleRequest(req Request) {
if sampler.Sample(req) {
// Forward to expensive downstream service
processRequest(req)
} else {
// Log and skip
log.Printf("Skipped request %s", req.RequestID)
}
}
hashing.DefaultStringHasher
: CRC32 hash for stringshashing.DefaultBytesHasher
: CRC32 hash for byte sliceshashing.ZeroHash[T]()
: Always returns 0, false (for testing)- Custom functions implementing
hashing.HashFunc[T]
WithStaticRate[T](rate)
: Fixed sampling rate (0.0 to 1.0)WithMaxHashFunc[T](func)
: Dynamic sampling based on input- Default: 100% sampling rate
WithSkipInvalid[T]()
: Skip items that can't be hashed (default: include them)
type Sampler[T any] interface {
Sample(T) bool
SampleWithMeta(T) (bool, Meta)
}
type Meta struct {
IsHashValid bool // Whether hash was successfully generated
Bucket int32 // Hash bucket (1-100) for distribution analysis
Hash uint32 // Generated hash value
MaxHash uint32 // Sampling threshold
}
func NewSampler[T any](opts ...SamplerOption[T]) *sampler[T]
func WithHashFunc[T any](hasher hashing.HashFunc[T]) SamplerOption[T]
func WithMaxHashFunc[T any](maxHash hashing.MaxHashFunc[T]) SamplerOption[T]
func WithStaticRate[T any](rate float64) SamplerOption[T]
func WithSkipInvalid[T any]() SamplerOption[T]
See the examples/
directory for comprehensive examples:
- Basic usage: Simple string sampling
- Configuration options: Various sampling configurations
- Custom structs: Advanced patterns with business logic
- Request limiting: Real-world rate limiting scenario
Run examples:
cd examples
go run .
-
Clone the repository:
git clone https://github.com/block/sampling-go.git cd sampling-go
-
Install dependencies:
go mod download
-
Run tests:
go test ./...
- Built with ❤️ by the Block team
- Inspired by consistent hashing algorithms used in distributed systems
- Uses CRC32 hashing for fast, deterministic hash generation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: pkg.go.dev