Lecture 45: Outperforming cuBLAS on H100 - Detailed Analysis & Overview

Lecture 45: Outperforming cuBLAS on H100

Speaker: pranjalssh.

Livestream: Outperforming cuBLAS on H100

Speaker: pranjalssh.

Nvidia CUDA in 100 Seconds

What is ...

Coalesce Memory Access - Intro to Parallel Programming

This video is part of an online course, Intro to Parallel Programming. Check out the course here: ...

SCALE—Ahead-of-time compilation of CUDA for HPC platforms

NHR PerfLab Seminar talk on April 28, 2026. Speaker: Chris Kitching, Founder and CTO, Spectral Compute. Slides: ...

Intro to CUDA (part 3): Parallelizing a For-Loop

CUDA

Only Guide You Need to Master CUDA MatMul Optimization

Dive into the step-by-step optimizations of a ...

Computer Architecture Performance: Part 2: Amdahl's Law and Gustafson's Law

Explanation of speedup and efficiency based on Amdahl's Law and Gustafson's Law.

Lecture 48: The Ultra Scale Playbook

Speaker: Nouamane Tazi
https://huggingface.co/spaces/nanotron/ultrascale-playbook
(00:00:00): High Level Overview ...