Latency Benchmarking Tools for Amazon Bedrock

A collection of tools for measuring inference latency of foundation models in Amazon Bedrock and OpenAI. The tools report time to first token (TTFT) and total response time.
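As a rough illustration of the two reported metrics, the sketch below times any streamed response: time to first token is the delay until the first chunk arrives, and total time is the delay until the stream is exhausted. The names `measure_stream_latency` and `fake_stream` are hypothetical and are not taken from this repository; the fake stream simply stands in for a model's streaming response.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_stream_latency(chunks: Iterable) -> Tuple[Optional[float], float, int]:
    """Consume a stream and return (time_to_first_chunk, total_time, chunk_count).

    Times are in seconds. time_to_first_chunk is None if the stream is empty.
    """
    start = time.perf_counter()
    first_chunk_time = None
    count = 0
    for _ in chunks:
        if first_chunk_time is None:
            # The first chunk marks the time to first token.
            first_chunk_time = time.perf_counter() - start
        count += 1
    total_time = time.perf_counter() - start
    return first_chunk_time, total_time, count


def fake_stream(n_chunks: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    for i in range(n_chunks):
        time.sleep(delay_s)  # simulate generation delay per chunk
        yield f"token-{i}"


if __name__ == "__main__":
    ttft, total, n = measure_stream_latency(fake_stream(5, 0.02))
    print(f"chunks={n} ttft={ttft:.3f}s total={total:.3f}s")
```

With boto3, the same function could in principle be passed the event stream returned by a streaming call such as `invoke_model_with_response_stream` (the `body` field of the response), provided the timer starts before the request is sent. Whether the notebook measures latency this way is an assumption.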

bedrock-latency-benchmark.ipynb - Measures single-request LLM latency across scenarios such as:
- Different numbers of input and output tokens.
- Different models from Amazon Bedrock and OpenAI.
- The same model in different AWS Regions.
stress.test.bedrock.py - A utility for stress testing Claude 3 models on Bedrock with a high number of concurrent requests (e.g., launching 1000 requests/min).
clock-based-rolling-window-rpm-tpm-tester - Tests whether Bedrock TPM limits are based on absolute minutes (clock-based) or relative minutes (rolling window), that is, whether the quota resets at xx:00 or 60 seconds after first usage.
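To make the clock-based versus rolling-window distinction concrete, here is a minimal simulation of the two quota models. It is a simplified sketch, not the tester's actual logic and not a description of how Bedrock enforces limits: the function names are hypothetical, requests are counted instead of tokens, and the rolling window is modelled as a sliding 60-second log of accepted requests.

```python
from collections import deque
from typing import List


def allowed_clock_based(timestamps: List[float], limit: int) -> List[bool]:
    """Quota resets at every clock-minute boundary (xx:00)."""
    counts = {}
    results = []
    for t in timestamps:
        minute = int(t // 60)  # index of the clock minute containing t
        if counts.get(minute, 0) < limit:
            counts[minute] = counts.get(minute, 0) + 1
            results.append(True)
        else:
            results.append(False)
    return results


def allowed_rolling_window(
    timestamps: List[float], limit: int, window_s: float = 60.0
) -> List[bool]:
    """Quota counts accepted requests in the preceding window_s seconds."""
    accepted = deque()
    results = []
    for t in timestamps:
        # Drop accepted requests that have aged out of the window.
        while accepted and t - accepted[0] >= window_s:
            accepted.popleft()
        if len(accepted) < limit:
            accepted.append(t)
            results.append(True)
        else:
            results.append(False)
    return results


if __name__ == "__main__":
    # Seconds since the top of the hour: two requests just before the
    # minute boundary at t=60, two just after it.
    requests = [50.0, 55.0, 61.0, 65.0]
    print("clock-based:   ", allowed_clock_based(requests, limit=2))
    print("rolling window:", allowed_rolling_window(requests, limit=2))
```

A burst that straddles a minute boundary is what separates the two models: under a clock-based quota the requests after xx:00 succeed, while under a rolling window they are throttled until 60 seconds have passed since the earlier requests.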

Installing
