Virtual Try-On Video Models
This talk explores how video-based models power virtual try-on for fashion: AI fitting algorithms combined with real-time rendering let remote shoppers try on clothing on-screen with high accuracy and ease.
- GPT-4: OpenAI's large multimodal model, accepting both text and image inputs and delivering human-level performance on complex professional and academic benchmarks. It scores in the top 10% on a simulated bar exam (GPT-3.5 scored in the bottom 10%), handles nuanced instructions and long-form content, and supports context windows up to 32,768 tokens (32K model), enough to process roughly 25,000 words in a single prompt. GPT-4 is engineered for improved reliability, steerability, and reasoning across diverse tasks. (Usage sketch after this list.)
- Claude-3: Anthropic's multimodal model family, comprising Opus, Sonnet, and Haiku. Opus, the flagship, excels at complex reasoning, outperforming peers on key benchmarks (MMLU, GPQA) and supporting a 200,000-token context window. Sonnet balances capability and cost for enterprise workloads, running 2x faster than its predecessor, Claude 2.1. Haiku, the fastest and most cost-effective, can process a 10,000-token research paper (including charts) in under three seconds. All three offer strong vision capabilities for analyzing charts, diagrams, and PDFs alongside text. (Usage sketch after this list.)
- Llama-2: Meta AI's openly accessible LLM family, free for research and commercial use, spanning pre-trained foundation models and instruction-tuned Chat variants from 7 billion (7B) to 70 billion (70B) parameters. Compared with Llama 1, it was trained on 2 trillion tokens (40% more data) and doubles the context length to 4,096 tokens; the chat models were aligned using Reinforcement Learning from Human Feedback (RLHF). (Loading sketch after this list.)
- LangChain: an open-source framework for building data-aware LLM applications, connecting models such as GPT-4 or Claude to external data, computation, and APIs. Its modular components (Chains, Agents, Tools, Memory) support workflows like Retrieval-Augmented Generation (RAG) pipelines and conversational agents, and its Python and JavaScript libraries, together with LangChain Expression Language (LCEL), provide a standardized interface from prototype to production. (LCEL sketch after this list.)
- ComfyUI: an open-source, node-based interface for building and executing customized generative AI workflows. Users visually wire samplers, models, and custom nodes (e.g., AnimateDiff, IPAdapter) into precise, reproducible pipelines covering advanced Stable Diffusion workflows, video generation, and 3D asset creation, with efficient hardware use and low-level parameter control. (API sketch after this list.)
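Below is a minimal sketch of calling GPT-4 with mixed text and image input, assuming the openai Python package (v1-style client) and an OPENAI_API_KEY in the environment; the model name and image URL are placeholders, not details from the talk.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send one user message containing both text and an image reference.
response = client.chat.completions.create(
    model="gpt-4-turbo",  # placeholder: any vision-capable GPT-4 variant
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the fit and fabric of this jacket."},
            {"type": "image_url", "image_url": {"url": "https://example.com/jacket.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```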
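A similar sketch for Claude-3, assuming the anthropic Python package and an ANTHROPIC_API_KEY in the environment; the Opus model ID is the publicly documented one, and the prompt is illustrative.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask the flagship Opus model for a short structured summary.
message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize the benefits of virtual try-on in three bullets."}],
)
print(message.content[0].text)  # responses arrive as a list of content blocks
```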
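Llama 2 weights are typically loaded through Hugging Face transformers; this sketch assumes the transformers and accelerate packages are installed and that you have accepted Meta's license for the gated meta-llama/Llama-2-7b-chat-hf repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated repo: requires an accepted license
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama-2-chat expects its instruction-tag prompt format.
prompt = "[INST] Suggest two demo ideas for virtual try-on. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```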
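An LCEL sketch of the pipe syntax the LangChain entry mentions, assuming the langchain-core and langchain-openai packages; the context string stands in for passages a real RAG pipeline would retrieve.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Prompt -> model -> parser, composed with LCEL's pipe operator.
prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
chain = prompt | ChatOpenAI(model="gpt-4") | StrOutputParser()

answer = chain.invoke({
    "context": "Example passage: the demo renders garments on live video.",  # stand-in for retrieved docs
    "question": "What does the demo render?",
})
print(answer)
```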
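ComfyUI is driven through its graph editor, but a running server also accepts workflow JSON over HTTP; this sketch assumes a local instance on the default port 8188 and a graph exported with the "Save (API Format)" option (workflow_api.json is a hypothetical filename).

```python
import json
import urllib.request

# Load a graph exported from the ComfyUI editor in API format.
with open("workflow_api.json") as f:
    workflow = json.load(f)

# Queue it on the local ComfyUI server; the response includes a prompt_id
# that can be used to poll for the finished outputs.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```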
Related projects
AI Architect
Hong Kong
This talk explores how AI transforms architectural design by streamlining processes, enhancing creativity, and delivering personalized, efficient solutions…
AI for Cultural Preservation: Bridging Generative AI and Classical Methods to Decode East Asian Archives
Hong Kong
Discover how we convert millions of East Asian archival texts into structured, searchable databases using layout extraction, domain‑specific…
Can you tell? Works by Human artist vs AI artist
Hong Kong
Exploring how traditional artists incorporate AI, comparing human‑created and AI‑enhanced works, and sharing hands‑on student projects that blend…
Tech Noir I-Ching
Hong Kong
Learn how to create short video loops using Sora and Midjourney, apply Jet Set Radio prompts, and avoid…
Building a vibe-coding community
Hong Kong
We’ll discuss building AI‑generated web apps, the technical hurdles we faced, and how a social platform enables creators…
Photoshop Anything on your Phone in 10 Seconds
Hong Kong
Use a Telegram wrapper for Gemini 2.0 Flash to edit images with text prompts, producing edited photos in about ten…