Describe the bug
I have a TEI embedding model endpoint created like this:
import huggingface_hub
from huggingface_hub import create_inference_endpoint
repository = "thenlper/gte-large" #"BAAI/bge-reranker-large-base"
endpoint_name = "gte-large-001"
namespace = "MoritzLaurer" # your user or organization name
# check if endpoint with this name already exists from previous tests
available_endpoints_names = [endpoint.name for endpoint in huggingface_hub.list_inference_endpoints()]
endpoint_exists = endpoint_name in available_endpoints_names
print("Does the endpoint already exist?", endpoint_exists)
# create new endpoint
if not endpoint_exists:
    endpoint = create_inference_endpoint(
        endpoint_name,
        repository=repository,
        namespace=namespace,
        framework="pytorch",
        task="sentence-similarity",
        # see the available hardware options here: https://huggingface.co/docs/inference-endpoints/pricing#pricing
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        instance_size="x1",
        instance_type="nvidia-a10g",
        min_replica=2,
        max_replica=4,
        type="protected",
        custom_image={
            "health_route": "/health",
            "env": {
                "MAX_BATCH_TOKENS": "16384",
                "MAX_CONCURRENT_REQUESTS": "512",
                "MAX_BATCH_REQUESTS": "124",
                "MODEL_ID": "/repository",
            },
            "url": "ghcr.io/huggingface/text-embeddings-inference:latest",
        },
    )
    print("Waiting for endpoint to be created")
    endpoint.wait()
    print("Endpoint ready")
# if endpoint with this name already exists, get existing endpoint
else:
    endpoint = huggingface_hub.get_inference_endpoint(name=endpoint_name, namespace=namespace)
    if endpoint.status in ["paused", "scaledToZero"]:
        print("Resuming endpoint")
        endpoint.resume()
    print("Waiting for endpoint to start")
    endpoint.wait()
    print("Endpoint ready")
Based on the docs here, I should be able to call it like this:
from huggingface_hub import InferenceClient
client = InferenceClient()
client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
        "I can't believe how much I struggled with this.",
    ],
    model=endpoint.url,
)
This results in the following error message, which is hard to interpret:
HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://c5hhcabur7dqwyj7.us-east-1.aws.endpoints.huggingface.cloud/ (Request ID: nEd4Xz) Make sure 'sentence-similarity' task is supported by the model.
It does work when the /similarity route from TEI is made explicit:
from huggingface_hub import InferenceClient
client = InferenceClient()
client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
        "I can't believe how much I struggled with this.",
    ],
    model=endpoint.url + "/similarity",
)
# output: [0.9319057, 0.81048536, 0.75192505]
It seems the client does not set the route correctly: the request goes to the endpoint root instead of /similarity.
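Until this is fixed, the workaround above can be wrapped in a small helper so the route is appended exactly once, regardless of trailing slashes. This is only a sketch: `with_similarity_route` is a hypothetical name, not part of huggingface_hub, and the example URL is a placeholder.

```python
def with_similarity_route(endpoint_url: str, route: str = "/similarity") -> str:
    """Return endpoint_url with the TEI route appended exactly once.

    Hypothetical helper for the workaround described in this issue;
    it only manipulates the URL string and makes no network calls.
    """
    base = endpoint_url.rstrip("/")  # tolerate a trailing slash on the endpoint URL
    if base.endswith(route):  # do not append the route a second time
        return base
    return base + route


# Placeholder URL for illustration only
print(with_similarity_route("https://example.endpoints.huggingface.cloud/"))
```

The result would then be passed as `model=with_similarity_route(endpoint.url)` in the `client.sentence_similarity(...)` call shown above.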
Reproduction
No response
Logs
No response
System info
{'huggingface_hub version': '0.24.6',
'Platform': 'Linux-5.10.205-195.807.amzn2.x86_64-x86_64-with-glibc2.31',
'Python version': '3.9.5',
'Running in iPython ?': 'Yes',
'iPython shell': 'ZMQInteractiveShell',
'Running in notebook ?': 'Yes',
'Running in Google Colab ?': 'No',
'Token path ?': '/home/user/.cache/huggingface/token',
'Has saved token ?': True,
'Who am I ?': 'MoritzLaurer',
'Configured git credential helpers': 'store',
'FastAI': 'N/A',
'Tensorflow': 'N/A',
'Torch': 'N/A',
'Jinja2': '3.1.4',
'Graphviz': 'N/A',
'keras': 'N/A',
'Pydot': 'N/A',
'Pillow': 'N/A',
'hf_transfer': 'N/A',
'gradio': 'N/A',
'tensorboard': 'N/A',
'numpy': 'N/A',
'pydantic': 'N/A',
'aiohttp': 'N/A',
'ENDPOINT': 'https://huggingface.co',
'HF_HUB_CACHE': '/home/user/.cache/huggingface/hub',
'HF_ASSETS_CACHE': '/home/user/.cache/huggingface/assets',
'HF_TOKEN_PATH': '/home/user/.cache/huggingface/token',
'HF_HUB_OFFLINE': False,
'HF_HUB_DISABLE_TELEMETRY': False,
'HF_HUB_DISABLE_PROGRESS_BARS': None,
'HF_HUB_DISABLE_SYMLINKS_WARNING': False,
'HF_HUB_DISABLE_EXPERIMENTAL_WARNING': False,
'HF_HUB_DISABLE_IMPLICIT_TOKEN': False,
'HF_HUB_ENABLE_HF_TRANSFER': False,
'HF_HUB_ETAG_TIMEOUT': 10,
'HF_HUB_DOWNLOAD_TIMEOUT': 10}