Speculative Speculative Decoding (Mar 2026)

Title:

Speculation is all you need: Intro to Speculative Decoding for High Performance Inference

Speculative decoding

Speculative Decoding in 2026: What Changed

Speculative Decoding

Ep 34: Qwen3.6-27B paired with llama.cpp speculative decoding delivers 10x token speedups in real...

HOOK: Qwen3.6-27B paired with llama.cpp

Accelerating LLM Inference on TPUs via Diffusion Speculative Decoding

... today we'll hit the autoregressive bottleneck

Speculative Decoding Guide

This video overview explores the mechanics and production performance of

Don't use speculative decoding until you watch this

In this video, I benchmark

Speeding Up LLMs: Speculative Decoding for Multi-Sample Inference

The paper introduces a novel twist on

Speculative Speculative Decoding

Richard and Pierce break down

Faster LLMs: Accelerate Inference with Speculative Decoding

Isaac Ke explains

ML Performance Reading Group 23: DFlash: Block Diffusion for Flash Speculative Decoding

Paper: https://arxiv.org/abs/2602.06036 Presenter: Shayan Shamsi.

Lossless LLM inference acceleration with Speculators

Red Hat's Mark Kurtz and Megan Flynn examine

Speculative Speculative Decoding

We unpack the SSD (

Speculative Decoding explained

written version: https://www.adaptive-ml.com/post/

What is Speculative Decoding? making LLMs faster

Speculative Decoding

EAGLE 3.1 Targets the Biggest Bug in Speculative Decoding

This video walks through what

Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss

Speculative decoding

What is Speculative Decoding ?

That's the mystery behind

Speculative Decoding: The Easiest Way to Speed Up LLMs

N-gram

How Medusa Works

Speculative