Media Summary: Open-source LLMs are great for conversational applications, but they can be difficult to scale in production and deliver latency ... Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Amsterdam, The Netherlands ... The AI revolution demands a new kind of infrastructure — and the AI Lab video series is your technical deep dive, discussing key ...

Inference Office Hours with SGLang: Performance Optimizations for LLM Serving - Detailed Analysis & Overview

Inference Office Hours with SGLang: Performance Optimizations for LLM Serving

Join us to find out the latest

SGLang: An Efficient Open-Source Framework for Large-Scale LLM Serving | Ray Summit 2025

At Ray Summit 2025, Ying Sheng from

Introduction to LLM serving with SGLang - Philip Kiely and Yineng Zhang, Baseten

Do you want to learn how to

Deep Dive: Optimizing LLM inference

Open-source LLMs are great for conversational applications, but they can be difficult to scale in production and deliver latency ...

Optimizing LLM Inference Requests

Our new book club series is about

Tutorial: A Cross-Industry Benchmarking Tutorial for Distributed LLM Inference... Multiple Speakers

Don't miss out! Join us at our next Flagship Conference: KubeCon + CloudNativeCon events in Amsterdam, The Netherlands ...

AI Lab: Open-source inference with vLLM + SGLang | Optimizing KV cache with Crusoe Managed Inference

The AI revolution demands a new kind of infrastructure — and the AI Lab video series is your technical deep dive, discussing key ...

Faster LLMs: Accelerate Inference with Speculative Decoding

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...

Optimize LLM inference with vLLM

Ready to

What is vLLM? Efficient AI Inference for Large Language Models

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...

Inference Office Hours: Building Fault Tolerance in Systems of Scale for LLM inference

Curious about designing fault-tolerance for large-scale systems for

AI Agent Inference Performance Optimizations + vLLM vs. SGLang vs. TensorRT w/ Charles Frye (Modal)

Zoom link: https://us02web.zoom.us/j/82308186562 Talk #0: Introductions and Meetup Updates by Chris Fregly and Antje Barth ...

Lecture 100: InferenceX Continuous OSS Inference Benchmarking

InferenceX is an open-source (Apache 2.0) automated benchmark designed to keep pace with the rapidly evolving

SGLang Office Hour 04/22: Scaling LLM Serving with Ray and SGLang

Scaling

Boost LLM performance: New SGLang course is live 🚀

Learn more: https://bit.ly/4du2u69 Introducing Efficient

Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou

LLM inference