Description
System Info
I run Qwen/Qwen3-Embedding-0.6B using the following command:
docker run -d --name text-embeddings-inference_server -p 8089:80 -v "$VOLUME":/data -e HF_ENDPOINT=https://hf-mirror.com --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.4 --revision refs/pr/27 --pooling last-token --max-batch-tokens 512 --model-id Qwen/Qwen3-Embedding-0.6B
Calling the following request multiple times with identical input returns different embeddings: sometimes result 1, sometimes result 2.
curl --location 'http://localhost:8089/embed' --header 'Content-Type: application/json' --data '{ "inputs": [ "{\"name\": \"风场\", \"type\": \"model\", \"id\": \"EnOS_Wind_Farm\", \"desc\": \"风场模型\"}", "{\"name\": \"发电量\", \"type\": \"metric\", \"id\": \"ActiveProduction\", \"model\": \"EnOS_Wind_Farm\", \"model_name\": \"风场\", \"data_type\": \"double\", \"unit\": \"s\"}", "{\"name\": \"发电量\", \"type\": \"metric\", \"id\": \"ActiveProduction\", \"model\": \"EnOS_Solar_Site\", \"model_name\": \"光伏场站\", \"data_type\": \"double\", \"unit\": \"s\"}" ] }'
result 1:
[[-0.03803645, 0.010309944, ...], [...], [...]]
result 2:
[[-0.02697843, -0.005487643, ...], [...], [...]]
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
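- Start the server with the docker run command above
- Call the /embed API repeatedly with the curl request above and compare the returned embeddings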
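A minimal sketch for scripting the repeated calls; it is not part of the original report. The shell functions below send the same /embed request N times and count how many distinct responses come back. A count of 1 means every response was byte-identical. A higher count reproduces the problem described above. The function names `hash_responses` and `count_distinct` are hypothetical. The script assumes `curl`, `sha256sum` (GNU coreutils), `sort`, `wc` and `tr` are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original report) for checking whether
# repeated /embed calls with identical input return identical responses.
set -uo pipefail

# Send the same request $3 times and print one SHA-256 digest per response.
# Byte-identical responses produce identical digests.
hash_responses() {
  local url="$1" payload="$2" runs="$3"
  local i resp
  for ((i = 0; i < runs; i++)); do
    # --fail makes HTTP errors fail the request instead of hashing an error body.
    resp=$(curl -s --fail --location "$url" \
      --header 'Content-Type: application/json' \
      --data "$payload") || { echo "request $i failed" >&2; return 1; }
    printf '%s' "$resp" | sha256sum | cut -d' ' -f1
  done
}

# Count distinct lines on stdin (1 means every response was identical).
count_distinct() {
  sort -u | wc -l | tr -d ' '
}
```

Usage, with `PAYLOAD` holding the JSON body from the curl request above: `hash_responses 'http://localhost:8089/embed' "$PAYLOAD" 20 | count_distinct`. Hashing whole responses is a strict byte-level check. The differences reported here already appear in the second decimal place, so they are far larger than float-formatting noise.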
Expected behavior
Multiple requests with the same input should return the same result.