Effect of quantization and alternate schemes #34

@lannelin

Description

Embeddings are quantized to int8 when written to clusters/. What effect does this quantization have on our:

  1. search metrics
  2. throughput/communication
  3. other?

Should we be considering any other quantization schemes?

Current approach:

Document:

Embeddings are scaled with reference to a set of documents. For us, this is done per cluster:

reduced_embeddings = drm.run_pca(self.pca_components, cluster_embeddings)

and again for the centroids:

reduced_centroids = drm.run_pca(self.pca_components, centroids)

Quantization method:

# Adaptive scaling to fit within int8 range
data_min = np.min(transformed)
data_max = np.max(transformed)
data_range = max(abs(data_min), abs(data_max))
# TODO: generalise quantisation
scale_factor = 127.0 / data_range
quantized = np.clip(np.round(transformed * scale_factor), -127, 127)

NB: before quantization, PCA is fitted with reference to the entire set of documents; it is applied per cluster, but with shared weights.
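
For reference, a minimal end-to-end sketch of the document-side path described above, assuming float32 numpy embeddings. sklearn's PCA is used here only as a stand-in for drm.run_pca, and all_document_embeddings, clusters, pca_components and quantize_int8 are hypothetical names, not from the codebase:

import numpy as np
from sklearn.decomposition import PCA

def quantize_int8(transformed):
    # Symmetric per-array scaling into the int8 range, as in the excerpt above.
    data_range = max(abs(np.min(transformed)), abs(np.max(transformed)))
    scale_factor = 127.0 / data_range
    quantized = np.clip(np.round(transformed * scale_factor), -127, 127).astype(np.int8)
    return quantized, scale_factor

# Fit PCA once, with reference to the entire document set (shared weights) ...
pca = PCA(n_components=pca_components).fit(all_document_embeddings)

# ... then reduce and quantize each cluster separately.
for cluster_embeddings in clusters:
    reduced = pca.transform(cluster_embeddings)
    quantized, scale = quantize_int8(reduced)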

Query:

Per-query scaling, with no reference to the document scaling.

# Quantize embedding
data_min = np.min(embedding_reduced)
data_max = np.max(embedding_reduced)
data_range = max(abs(data_min), abs(data_max))
scale = 127.0 / data_range
embedding_quantized = np.clip(
    np.round(embedding_reduced * scale), -127, 127
).astype(np.int8)
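
To put a rough number on item 1 (search metrics), one possible check, not from the issue, is to score a query against one cluster with the float embeddings and with the int8 versions and compare the rankings. reduced_embeddings, quantized, embedding_reduced and embedding_quantized refer to the snippets above; top_k is a hypothetical helper:

import numpy as np

def top_k(scores, k=10):
    return np.argsort(-scores)[:k]

# Scores in the original float space vs. the quantized space.
float_scores = reduced_embeddings @ embedding_reduced
int8_scores = quantized.astype(np.int32) @ embedding_quantized.astype(np.int32)

# Fraction of the top-10 results preserved after quantization.
overlap = len(set(top_k(float_scores)) & set(top_k(int8_scores))) / 10
print(f"top-10 overlap after quantization: {overlap:.2f}")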

Alternatives

  • Per-axis scaling?
  • Use the document reference when scaling queries (i.e. use the same weights). For example, if keeping per-cluster scaling, we could keep a reference set of scale factors per cluster and apply these to the query before sending it to each cluster (see the sketch below).
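
A sketch of what the two alternatives could look like, under the same assumptions as the snippets above; per_axis_quantize, quantize_query_with_reference and cluster_scale are hypothetical names:

import numpy as np

def per_axis_quantize(transformed):
    # One scale factor per PCA dimension instead of a single global factor.
    axis_range = np.max(np.abs(transformed), axis=0)
    scale = 127.0 / axis_range
    quantized = np.clip(np.round(transformed * scale), -127, 127).astype(np.int8)
    return quantized, scale

def quantize_query_with_reference(embedding_reduced, cluster_scale):
    # Reuse the per-cluster document scale factor(s) so the query and the
    # documents in that cluster share the same int8 grid.
    return np.clip(np.round(embedding_reduced * cluster_scale), -127, 127).astype(np.int8)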
