NOTE: This file is generated with the use of AI.

sampling-go

A high-performance, hash-based sampling library for Go that provides deterministic, stateless sampling without external dependencies.

Overview

sampling-go is a generic sampling library that hashes each item and compares the hash against a threshold, so the decision to include or exclude an item from a sample is deterministic.

It's particularly useful for:

  • Rate limiting: Reduce load on downstream services by sampling a percentage of requests
  • Logging and metrics: Sample log entries or metrics to reduce storage costs
  • A/B testing: Consistently assign users to test groups
  • Data processing: Sample large datasets for analysis
  • Distributed systems: Make consistent sampling decisions across multiple service instances

Note: This library provides deterministic sampling based on hash values. The quality of the distribution depends on the hash function and the characteristics of the input. The built-in hashers use CRC32, which is not cryptographically secure; for cryptographic use cases, supply a cryptographically secure hash function instead.
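One way to check distribution quality for your own keys is to measure the observed sampling rate against the target. The sketch below is standalone and does not call sampling-go; `observedRate` and `makeKeys` are hypothetical helpers, and the `user-N` key format is only an assumption standing in for real input.

```go
package main

import (
    "fmt"
    "hash/crc32"
    "math"
)

// observedRate returns the fraction of keys whose CRC32 hash is at or below
// the threshold derived from target. The target is clamped to [0, 1].
func observedRate(keys []string, target float64) float64 {
    if len(keys) == 0 {
        return 0
    }
    target = math.Max(0, math.Min(1, target))
    maxHash := uint32(float64(math.MaxUint32) * target)

    kept := 0
    for _, k := range keys {
        if crc32.ChecksumIEEE([]byte(k)) <= maxHash {
            kept++
        }
    }
    return float64(kept) / float64(len(keys))
}

// makeKeys builds n synthetic keys of the form "user-0", "user-1", ...
func makeKeys(n int) []string {
    keys := make([]string, n)
    for i := range keys {
        keys[i] = fmt.Sprintf("user-%d", i)
    }
    return keys
}

func main() {
    keys := makeKeys(100000)
    for _, target := range []float64{0.05, 0.3, 0.8} {
        fmt.Printf("target=%.2f observed=%.4f\n", target, observedRate(keys, target))
    }
}
```

If the observed rate drifts noticeably from the target for a representative set of keys, that suggests the hash function is a poor fit for that input.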

Key Features

  • Deterministic: Same input always produces the same sampling decision
  • Stateless: No external storage or coordination required
  • High performance: All operations are in-memory with minimal overhead
  • Generic: Works with any Go type through generics
  • Configurable: Flexible hash functions and sampling rates
  • Zero dependencies: No third-party dependencies for core functionality

Installation

go get github.com/block/sampling-go

Quick Start

package main

import (
    "fmt"
    "github.com/block/sampling-go"
    "github.com/block/sampling-go/hashing"
)

func main() {
    // Create a sampler for strings with 30% sampling rate
    sampler := sampling.NewSampler[string](
        sampling.WithHashFunc(hashing.DefaultStringHasher),
        sampling.WithStaticRate[string](0.3), // 30% sampling rate
    )

    messages := []string{
        "user-123-login",
        "user-456-purchase", 
        "user-789-logout",
        "user-123-login", // Same message - will have same result
    }

    for _, msg := range messages {
        if sampler.Sample(msg) {
            fmt.Printf("✓ Sampled: %s\n", msg)
        } else {
            fmt.Printf("✗ Skipped: %s\n", msg)
        }
    }
}

How It Works

The library uses a hash-based approach to sampling:

  1. Hash Generation: Each input is converted to a uint32 hash value using a configurable hash function
  2. Threshold Calculation: The sampling rate is converted to a "max hash" threshold
  3. Comparison: If hash ≤ maxHash, the item is included in the sample
  4. Consistency: Same input always produces the same hash, ensuring deterministic behavior
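The four steps above can be sketched with the standard library alone. This is an illustration of the technique, not the library's source: `toMaxHash` and `shouldSample` are hypothetical names, and CRC32 is used because it is the hash the built-in hashers are documented to use.

```go
package main

import (
    "fmt"
    "hash/crc32"
    "math"
)

// toMaxHash converts a sampling rate in [0, 1] to a uint32 threshold.
// Hypothetical helper for illustration; it is not hashing.ToMaxHash.
func toMaxHash(rate float64) uint32 {
    if rate <= 0 || math.IsNaN(rate) {
        return 0
    }
    if rate >= 1 {
        return math.MaxUint32
    }
    // The conversion truncates the fractional part.
    return uint32(float64(math.MaxUint32) * rate)
}

// shouldSample hashes the key and keeps it when hash <= maxHash.
func shouldSample(key string, maxHash uint32) bool {
    return crc32.ChecksumIEEE([]byte(key)) <= maxHash
}

func main() {
    maxHash := toMaxHash(0.3)
    keys := []string{"user-123-login", "user-456-purchase", "user-123-login"}
    for _, key := range keys {
        // The first and third keys are identical, so their decisions match.
        fmt.Printf("%-20s sampled=%v\n", key, shouldSample(key, maxHash))
    }
}
```

Because the decision depends only on the key and the threshold, separate processes running this code reach the same result without sharing state.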

Visual Representation

For a 5% sampling rate: Max Hash = max(uint32) * 0.05 = 4,294,967,295 * 0.05 = 214,748,364 (214,748,364.75 truncated to an integer)

0             214,748,364                                 4,294,967,295
|-----------------|------------- ... ---------------------------|
| 1 | 2 | 3 | 4 | 5 |            ...                 | 99 | 100 |    <-- bucket (metrics, logging) 
^^^^^^^^^^^^^^^^^^                                               
Items in this area ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
are INCLUDED (5%)               Items in this area  
                                are EXCLUDED (95%)
                  ^
                  |
               Max Hash
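The bucket row in the diagram divides the hash space into 100 equal-width ranges. A minimal sketch of one such mapping follows; `bucketOf` is a hypothetical helper, and the library may compute its bucket value differently.

```go
package main

import "fmt"

// bucketOf maps a uint32 hash to one of 100 equal-width buckets numbered 1..100.
// Assumption: illustrative only, not the library's bucket calculation.
func bucketOf(hash uint32) int32 {
    // Scale the hash into [0, 100) using 64-bit arithmetic to avoid overflow.
    return int32((uint64(hash)*100)>>32) + 1
}

func main() {
    hashes := []uint32{0, 214_748_364, 214_748_365, 4_294_967_295}
    for _, h := range hashes {
        fmt.Printf("hash=%d bucket=%d\n", h, bucketOf(h))
    }
    // Prints buckets 1, 5, 6, and 100 respectively.
}
```

Under this mapping, the 5% threshold of 214,748,364 is the last hash in bucket 5, which matches the diagram: buckets 1 through 5 are included and the rest are excluded.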

Usage Examples

Basic String Sampling

sampler := sampling.NewSampler[string](
    sampling.WithHashFunc(hashing.DefaultStringHasher),
    sampling.WithStaticRate[string](0.1), // 10% sampling
)

result := sampler.Sample("user-request-123")
// Returns true/false consistently for the same input

Sampling with Metadata

sampler := sampling.NewSampler[string](
    sampling.WithHashFunc(hashing.DefaultStringHasher),
    sampling.WithStaticRate[string](0.2),
)

shouldTake, meta := sampler.SampleWithMeta("user-action-456")

fmt.Printf("Take: %v\n", shouldTake)
fmt.Printf("Hash: %d\n", meta.Hash)
fmt.Printf("Bucket: %d\n", meta.Bucket)
fmt.Printf("Valid: %v\n", meta.IsHashValid)

Custom Struct Sampling

type Event struct {
    ID       string
    UserID   string
    Type     string
    Priority int
}

// Custom hash function (requires the "fmt" and "hash/crc32" imports)
func eventHasher(event Event) (uint32, bool) {
    if event.ID == "" {
        return 0, false // Invalid event
    }
    composite := fmt.Sprintf("%s:%s:%s", event.ID, event.UserID, event.Type)
    return crc32.ChecksumIEEE([]byte(composite)), true
}

// Priority-based sampling rates
func priorityMaxHash(event Event) uint32 {
    switch {
    case event.Priority >= 9:
        return hashing.ToMaxHash(1.0, 1.0) // 100% for critical
    case event.Priority >= 7:
        return hashing.ToMaxHash(0.8, 1.0) // 80% for high
    default:
        return hashing.ToMaxHash(0.1, 1.0) // 10% for normal
    }
}

sampler := sampling.NewSampler[Event](
    sampling.WithHashFunc(eventHasher),
    sampling.WithMaxHashFunc(priorityMaxHash),
)

Request Rate Limiting

type Request struct {
    RequestID string
    Method    string
    Path      string
    UserAgent string
}

// requestHasher hashes the request ID (requires the "hash/crc32" import).
func requestHasher(req Request) (uint32, bool) {
    if req.RequestID == "" {
        return 0, false
    }
    return crc32.ChecksumIEEE([]byte(req.RequestID)), true
}

// Sample 30% of requests to reduce downstream load.
// Declared with var because := is not allowed at package level.
var sampler = sampling.NewSampler[Request](
    sampling.WithHashFunc(requestHasher),
    sampling.WithStaticRate[Request](0.3),
)

func handleRequest(req Request) {
    if sampler.Sample(req) {
        // Forward to expensive downstream service
        processRequest(req)
    } else {
        // Log and skip
        log.Printf("Skipped request %s", req.RequestID)
    }
}

Configuration Options

Hash Functions

  • hashing.DefaultStringHasher: CRC32 hash for strings
  • hashing.DefaultBytesHasher: CRC32 hash for byte slices
  • hashing.ZeroHash[T](): Always returns 0, false (for testing)
  • Custom functions implementing hashing.HashFunc[T]

Sampling Rates

  • WithStaticRate[T](rate): Fixed sampling rate (0.0 to 1.0)
  • WithMaxHashFunc[T](func): Dynamic sampling based on input
  • Default: 100% sampling rate

Additional Options

  • WithSkipInvalid[T](): Skip items that can't be hashed (default: include them)
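The effect of skipping invalid items can be sketched as a small decision function. This is a standalone illustration inferred from the option descriptions above, not the library's implementation; `decide`, `hashFunc`, and `nonEmptyHasher` are hypothetical names.

```go
package main

import (
    "fmt"
    "hash/crc32"
)

// hashFunc mirrors the (uint32, bool) shape of the hashers shown earlier;
// the bool reports whether a hash could be produced.
type hashFunc func(string) (uint32, bool)

// decide is a hypothetical helper. When the hasher reports an invalid hash,
// the item is included unless skipInvalid is set.
func decide(key string, hasher hashFunc, maxHash uint32, skipInvalid bool) bool {
    h, ok := hasher(key)
    if !ok {
        return !skipInvalid
    }
    return h <= maxHash
}

// nonEmptyHasher treats the empty string as unhashable.
func nonEmptyHasher(s string) (uint32, bool) {
    if s == "" {
        return 0, false
    }
    return crc32.ChecksumIEEE([]byte(s)), true
}

func main() {
    const maxHash = 1 << 31 // roughly a 50% threshold

    fmt.Println(decide("", nonEmptyHasher, maxHash, false)) // true: invalid items are included by default
    fmt.Println(decide("", nonEmptyHasher, maxHash, true))  // false: invalid items are skipped
}
```

Including invalid items by default errs on the side of not dropping data; skipping them is the stricter choice when an unhashable item signals bad input.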

API Reference

Core Types

type Sampler[T any] interface {
    Sample(T) bool
    SampleWithMeta(T) (bool, Meta)
}

type Meta struct {
    IsHashValid bool   // Whether hash was successfully generated
    Bucket      int32  // Hash bucket (1-100) for distribution analysis
    Hash        uint32 // Generated hash value
    MaxHash     uint32 // Sampling threshold
}

Constructor

func NewSampler[T any](opts ...SamplerOption[T]) *sampler[T]

Configuration Options

func WithHashFunc[T any](hasher hashing.HashFunc[T]) SamplerOption[T]
func WithMaxHashFunc[T any](maxHash hashing.MaxHashFunc[T]) SamplerOption[T]
func WithStaticRate[T any](rate float64) SamplerOption[T]
func WithSkipInvalid[T any]() SamplerOption[T]

Examples

See the examples/ directory for comprehensive examples:

  • Basic usage: Simple string sampling
  • Configuration options: Various sampling configurations
  • Custom structs: Advanced patterns with business logic
  • Request limiting: Real-world rate limiting scenario

Run examples:

cd examples
go run .

Development Setup

  1. Clone the repository:

    git clone https://github.com/block/sampling-go.git
    cd sampling-go
  2. Install dependencies:

    go mod download
  3. Run tests:

    go test ./...

License

See the LICENSE file in the repository.

Acknowledgments

  • Built with ❤️ by the Block team
  • Inspired by consistent hashing algorithms used in distributed systems
  • Uses CRC32 hashing for fast, deterministic hash generation
