vllm
Efficient and high-throughput inference engine for large language models.
About vllm
A high-throughput and memory-efficient inference and serving engine for LLMs, built around PagedAttention for KV-cache management and continuous batching of incoming requests.
✅ Pros
- High throughput for LLM inference
- Memory-efficient serving engine
- Open source and customizable
⚠️ Cons
- Primarily supports Linux
- Requires technical expertise to deploy
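Deploying vllm means running your own server, typically launched with `vllm serve <model>`, which exposes an OpenAI-compatible HTTP API. As a minimal sketch of what a client looks like against such a self-hosted instance: the base URL and model name below are illustrative assumptions, not defaults you should rely on.

```python
import json
from urllib import request

# Assumed endpoint of a locally running vLLM server started with
# `vllm serve <model>` (the server listens on port 8000 by default).
BASE_URL = "http://localhost:8000/v1/completions"

def build_payload(prompt: str, model: str = "facebook/opt-125m",
                  max_tokens: int = 64, temperature: float = 0.0) -> dict:
    """Assemble an OpenAI-style completion request body.

    The model name is a placeholder; use whichever model the
    server was launched with.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt: str) -> str:
    """Send a completion request. Requires a running vLLM server."""
    data = json.dumps(build_payload(prompt)).encode()
    req = request.Request(
        BASE_URL, data=data,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the generated text here.
    return body["choices"][0]["text"]

if __name__ == "__main__":
    # Only builds and prints the request body; no server needed.
    print(json.dumps(build_payload("The capital of France is")))
```

Because the API mirrors OpenAI's, existing OpenAI client libraries can usually be pointed at a vllm deployment by changing the base URL.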
Quick Info
- Pricing: Free
- License: Apache-2.0
- GitHub Stars: 71,011
- Forks: 13,647
- Language: Python
- Platforms: Linux, self-hosted