inference

In this conversation, we sit down with Philip Kiely and Charlie O'Neill to talk about Philip's book Inference Engineering and why ...

27:59

How to become an inference engineer

2,249 views

3 weeks ago

KodeKloud

vLLMs Labs for FREE — https://kode.wiki/4toLSl7 Most people can use an LLM. Very few know how to serve one at scale.

15:17

Understanding vLLM with a Hands On Demo

15,188 views

2 weeks ago

DevOps & AI Toolkit

Building Inference-as-a-Service on Kubernetes

This video walks you through building a fully self-hosted AI inference platform on Kubernetes, giving your organization the ability ...

21:40

5,214 views

4 weeks ago

Johnathan Russell

Substack Deep Dive: How AI Inference Will Create the Next Millionaires 💰

Everyone is talking about AI… but almost no one understands inference—and that's where the real money is being made.

20:14

8 views

2 weeks ago

Martin Khristi

Training vs Inference — The Battle That Will Define AI's Future

Every time you open ChatGPT, Claude, or Gemini — something expensive is happening. But the hard part? It already happened ...

4:18

129 views

3 weeks ago

ScyllaDB

P99 CONF 2025 | LLM Inference Optimization by Chip Huyen

Go to https://www.p99conf.io/ for P99 CONF talks on demand and to learn more. This talk will discuss why LLM inference is ...

31:42

946 views

3 weeks ago

San Diego Machine Learning

We are kicking off a short book club series called An Introduction to LLM Inference. Ted has done a deep dive on how LLM ...

1:30:16

Introduction to LLM Inference

422 views

4 weeks ago

Vizuara

Master LLM Inference Engineering by MIT, Purdue PhDs | Get the Early Access

Register here: https://inference.vizuara.ai/

5:56

762,039 views

2 weeks ago

Microsoft Reactor

EP 7 | Build Enterprise Worthy LLM Inference with Open Source and Kubernetes

Scaling LLMs to production introduces critical challenges: How do you orchestrate multi-node execution? Optimize GPU ...

49:58

429 views

Streamed 5 days ago

Data Science Dojo

Tutorial: Powering Agentic Inference with @SambaNovaSystems | Agentic AI Conference

This hands-on lab by Kwasi Ankomeh, Director of AI Solutions at SambaNova, shows how the next frontier of AI isn't just about ...

49:33

206 views

8 days ago

Identity V

Dear Visitors, "Here I stand. I will not yield." Guided by a mysterious fragrance, the power of the unicorn awakens in her blood.

2:02

Identity V | Truth & Inference — The Herald Star of Fragrance

13,666 views

11 days ago

ShowOffer - Tech Interview Coaching Platform

Design Batch Inference System - Anthropic & OpenAI System Design Question

Chapters 0:00 Introduction 4:46 Requirements 7:23 APIs and Entities 10:21 GPU Knowledge 18:34 High Level Design 29:42 ...

52:25

74,424 views

4 weeks ago

Augmented Mind Podcast

A User-Centric Perspective on LLM Inference | AM Podcast #3

Woosuk Kwon is CTO of Inferact and creator of the vLLM inference library. Woosuk shares what it takes to build the most popular ...

49:42

253 views

2 weeks ago

Firebase

Hear the latest updates across Firebase, from Firebase App Hosting to Firestore Enterprise. Discover the newly available ...

6:14

March 2026: Firebase in AI Studio, Hybrid AI Inference for Android apps and more!

1,807 views

13 days ago

wecite

Inference Optimization: Making AI Faster & Cheaper (Latency, Throughput & GPUs)

How do we serve AI models in production without breaking the bank or keeping users waiting? In this lecture, based on Chapter 9 ...

6:29

56 views

4 weeks ago

Michael Porinchak - AP Statistics & AP Precalculus

AP Statistics | How to Choose the Right Inference Procedure (Step-by-Step)

Are you struggling to figure out which inference procedure to use on the AP Statistics exam? This video is your complete guide to ...

31:27

1,214 views

3 weeks ago

ProGuruGyan

Groq LLM with Python | Ultra-Fast AI Inference Using Jupyter Notebook

Groq is one of the fastest AI inference platforms available today. In this tutorial, we learn how to use the Groq API with Python ...

32:54

314 views

4 weeks ago

My Weird Prompts

The world of local AI is powered by a confusing alphabet soup of tools. This episode demystifies the open-source inference ...

23:44

The AI Inference Engine Rebellion

17 views

2 weeks ago

Andrej Baranovskij

How to Cache vLLM Model in FastAPI for Faster Inference

I show you how to keep your vLLM model loaded in FastAPI cache for much faster inference — without reloading it on every ...

7:47

277 views

3 weeks ago

Bert Chan (陳信宏)

6:41

The Architecture of Inference

13 views

2 weeks ago
