Quantizing SDXL for Inference
This talk explains quantization principles and demonstrates SVD quantization on Stable Diffusion XL, showing how to reduce GPU VRAM usage effectively for inference.
This presentation offers a concise and accessible introduction to the principles of quantization, a technique for reducing the memory and compute cost of model inference. It walks through a basic, straightforward implementation of Singular Value Decomposition (SVD) quantization applied to the Stable Diffusion XL (SDXL) model, demonstrating a practical way to cut GPU VRAM usage from 6.5 GB to 3.5 GB with minimal code. Designed for professionals and enthusiasts alike, the talk highlights the potential for resource optimization in machine learning workflows.
An accompanying Jupyter notebook demonstrates the quantization workflow in Python.
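The core idea behind SVD-based compression can be sketched in a few lines. The snippet below is a minimal illustration, not the talk's actual implementation: it factors a linear layer's weight matrix into two low-rank factors via NumPy's SVD, which is the building block that SVD-style quantization schemes combine with low-bit quantization. The matrix shape, rank, and function name are illustrative choices.

```python
# Minimal sketch of low-rank SVD weight compression (illustrative,
# not the talk's actual code). A weight matrix W (m x n) is replaced
# by two factors A (m x r) and B (r x n) with r << min(m, n).
import numpy as np

def svd_compress(W: np.ndarray, rank: int):
    """Return low-rank factors A, B such that A @ B approximates W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into A's columns
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
# Toy stand-in for an SDXL linear layer's weights.
W = rng.standard_normal((512, 512)).astype(np.float32)

A, B = svd_compress(W, rank=64)
W_hat = A @ B

# Storage drops from m*n values to r*(m+n); the reconstruction error
# depends on how fast the layer's singular values decay.
ratio = W.nbytes / (A.nbytes + B.nbytes)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"{ratio:.1f}x smaller, relative error {rel_err:.3f}")
```

On real model weights, whose singular values typically decay much faster than this random toy matrix's, the same factorization retains far more of the signal; schemes like the one presented pair such a low-rank branch with aggressive quantization of the residual.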
Related projects
AI in compliance
Pune
Learn how AI agents automatically analyze documents, identify compliance gaps, and provide real‑time monitoring for ISO‑27001/SOC, using LLMs,…
Building AI Workflows
Delhi
This talk covers building complex AI workflows using Julep, an open-source platform that simplifies creating agentic AI applications…
Quantization for Edge AI
Nairobi
A live walkthrough of building an Android AI app for community health workers, covering model quantization, edge deployment,…
AI Agents for Enterprises: Automating Workflows at Scale
Delhi
This talk demonstrates AI Agents automating complex enterprise workflows, showcasing real-world task automation and management within a dedicated…
Mobile Use Agent - Operator for apps
Delhi
This talk demonstrates an AI agent that uses computer vision and touch emulation to operate mobile apps, automating…
Big Models, Small Machines: Run Full-Precision LLMs on Low Memory
London
Learn how to run full‑precision LLMs on low‑memory devices using a custom inference strategy, demonstrated with a 1.7B…