Embedding inferencing on TPU with PyTorch XLA
VISITOR: Why another inferencing server?
PROJECT: There are many good inferencing projects out there. This project mainly targets edge cases for which we could not find a good existing solution, chiefly embedding inferencing on TPU devices. We noticed that vLLM supports both TPU and embeddings, but we could not get it working. If you know of an open-source project that works well, let us know.
- Supported devices: Google TPU
- Supported models: Alibaba-NLP/gte-Qwen2-1.5B-instruct
- API: REST, compatible with the OpenAI embeddings format (see the example below)
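
Because the API follows the OpenAI embeddings format, requests can be sent with plain HTTP. A minimal sketch in Python using requests; the host, port, and /v1/embeddings route are assumptions based on the OpenAI convention and may differ in your deployment:

import requests

# Assumed endpoint: adjust host and port to match your deployment.
BASE_URL = "http://localhost:8000/v1"

response = requests.post(
    f"{BASE_URL}/embeddings",
    json={
        "model": "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
        "input": ["What is the capital of France?"],
    },
    timeout=60,
)
response.raise_for_status()

# OpenAI-format responses carry embeddings under data[i].embedding.
embedding = response.json()["data"][0]["embedding"]
print(f"Received an embedding with {len(embedding)} dimensions")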
# Clone the repository and start the services in the background
git clone https://github.com/ittia-research/inference
cd inference
docker compose up -d
# Follow the logs to confirm the server is ready
docker compose logs -f
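
Once the containers are up, the official OpenAI Python client should also work against the local server. A sketch under the same assumptions (localhost port 8000; the API key value is a placeholder, since local servers typically ignore it but the client requires a non-empty string):

from openai import OpenAI

# base_url and api_key are assumptions for a local deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

result = client.embeddings.create(
    model="Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    input="hello world",
)
print(len(result.data[0].embedding))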
Thanks to:
- TPU Research Cloud team at Google