A Configuration package that defines an LLMInference type for deploying LLMs with the vLLM Production Stack. Supports generative and embedding models with automatic GPU scaling, tensor parallelism, and Ingress routing.