ViewTube


159,697 results

Caleb Writes Code
Why Inference is hard..

Inference requires efficient loading and quantization of the model. This video covers the depth and breadth of various methods ...

15:14 · 661 views · 1 hour ago

SambaNova
AI Agents Need Faster Inference — Why GPUs Fall Short (And What Replaces Them)

AI agents are changing everything. They don't just generate text — they plan, reason, call tools, and take action. And every one of ...

3:01 · 0 views · 55 minutes ago

Baseten
How to become an inference engineer

In this conversation, we sit down with Philip Kiely and Charlie O'Neill to talk about Philip's book Inference Engineering and why ...

27:59 · 2,258 views · 3 weeks ago

KodeKloud
Understanding vLLM with a Hands On Demo

vLLMs Labs for FREE — https://kode.wiki/4toLSl7 Most people can use an LLM. Very few know how to serve one at scale.

15:17 · 15,363 views · 2 weeks ago

DevOps & AI Toolkit
Building Inference-as-a-Service on Kubernetes

This video walks you through building a fully self-hosted AI inference platform on Kubernetes, giving your organization the ability ...

21:40 · 5,215 views · 4 weeks ago

Firebase
Implement hybrid inference in Android using Firebase AI Logic

AI in action: Adding AI-powered reviews → https://goo.gle/4chWYS6 Android Hybrid on Device Inference ...

3:52 · 755 views · 4 days ago

Bloomberg Technology
Google to Release New Inference-Focused Chips

Google plans to announce its new generation of custom-designed chips, known as tensor processing units, or TPUs, this week.

4:27 · 886 views · 2 hours ago

ScyllaDB
P99 CONF 2025 | LLM Inference Optimization by Chip Huyen

Go to https://www.p99conf.io/ for P99 CONF talks on demand and to learn more. This talk will discuss why LLM inference is ...

31:42 · 947 views · 3 weeks ago

San Diego Machine Learning
Introduction to LLM Inference

We are kicking off a short book club series called An Introduction to LLM Inference. Ted has done a deep dive on how LLM ...

1:30:16 · 422 views · 4 weeks ago

Johnathan Russell
Substack Deep Dive: How AI Inference Will Create the Next Millionaires 💰

Everyone is talking about AI… but almost no one understands inference—and that's where the real money is being made.

20:14 · 8 views · 2 weeks ago

Microsoft Reactor
EP 7 | Build Enterprise Worthy LLM Inference with Open Source and Kubernetes

Scaling LLMs to production introduces critical challenges: How do you orchestrate multi-node execution? Optimize GPU ...

49:58 · 429 views · Streamed 6 days ago

Vizuara
Master LLM Inference Engineering by MIT, Purdue PhDs | Get the Early Access

Register here: https://inference.vizuara.ai/

5:56 · 762,040 views · 2 weeks ago

ShowOffer - Tech Interview Coaching Platform
Design Batch Inference System - Anthropic & OpenAI System Design Question

Chapters 0:00 Introduction 4:46 Requirements 7:23 APIs and Entities 10:21 GPU Knowledge 18:34 High Level Design 29:42 ...

52:25 · 74,439 views · 4 weeks ago

Identity V
Identity V | Truth & Inference — The Herald Star of Fragrance

Dear Visitors, "Here I stand. I will not yield." Guided by a mysterious fragrance, the power of the unicorn awakens in her blood.

2:02 · 13,707 views · 11 days ago

Augmented Mind Podcast
A User-Centric Perspective on LLM Inference | AM Podcast #3

Woosuk Kwon is CTO of Inferact and creator of the vLLM inference library. Woosuk shares what it takes to build the most popular ...

49:42 · 254 views · 2 weeks ago

wecite
Inference Optimization: Making AI Faster & Cheaper (Latency, Throughput & GPUs)

How do we serve AI models in production without breaking the bank or keeping users waiting? In this lecture, based on Chapter 9 ...

6:29 · 56 views · 4 weeks ago

Firebase
March 2026: Firebase in AI Studio, Hybrid AI Inference for Android apps and more!

Hear the latest updates across Firebase, from Firebase App Hosting to Firestore Enterprise. Discover the newly available ...

6:14 · 1,809 views · 13 days ago

Michael Porinchak - AP Statistics & AP Precalculus
AP Statistics | How to Choose the Right Inference Procedure (Step-by-Step)

Are you struggling to figure out which inference procedure to use on the AP Statistics exam? This video is your complete guide to ...

31:27 · 1,234 views · 3 weeks ago

Andrej Baranovskij
How to Cache vLLM Model in FastAPI for Faster Inference

I show you how to keep your vLLM model loaded in FastAPI cache for much faster inference — without reloading it on every ...

7:47 · 278 views · 3 weeks ago

Modern AI Course
Lecture 13: Efficient LLM Inference

Intro to Modern AI online course. For more information and to enroll, please visit https://modernaicourse.org.

53:05 · 599 views · 4 weeks ago