Better Prompts, Better Evals
Lessons from Content Moderation, Code Gen & Biochemical Research
The session explores practical strategies for designing effective prompts and evaluation methods, drawing examples from content moderation, code generation, and biochemical research.
Related projects
AI-Powered Prompt Engineering: Enhancing LLM Performance with PromptLab and IQ
Boston
A live demo shows PromptLab and IQ optimizing Apple Intelligence prompts, demonstrating real-time prompt refinement across multiple LLMs for…
Simon Says Prompts
San Diego
A live demo shows how modifying a single embedding can redirect an LLM’s output, illustrating the technique for…
YourBench - Benchmarking LLMs on your Data
Dublin
This talk demonstrates how to create a dataset and use YourBench to efficiently evaluate the performance of large…
A new CLI/Library for evals
Austin
A demonstration of a new CLI/library for configuration-based LLM evaluations, which can be run on prompts with datasets or on existing model output.
Model and Prompt Evals the Hard Way
New York City
Learn to evaluate models and prompts using simple Jupyter notebooks, applying manual reviews and automated metrics to compare…
Demo: Dendron - AI-Powered Analysis for Technology/AI Adoption
Miami
A live demo shows multi-agent LLMs mapping business processes and technology landscapes, maintaining context across analysis and quantifying confidence…