Noa Notes: Medical AI Evaluation
Learn how we built a two‑tier evaluation framework for a medical transcription AI, using LLM‑driven factual checks, style analysis, prompt engineering, and MLflow tracking.
How do you evaluate an AI system that assists doctors with medical documentation? In this talk, we’ll share practical insights from building an evaluation framework for Noa Notes @ Docplanner, a system that transcribes and summarizes doctor-patient conversations. We will discuss our two-tier evaluation approach combining detailed factual assessment with style analysis, show how we leverage LLMs in the evaluation pipeline, and share specific examples of how prompt engineering improved our metrics. We’ll also discuss challenges unique to the medical domain and how we addressed them.
Noa provides AI healthcare assistance, automating clinical note generation and 24/7 booking via AWS/Azure.
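The two-tier approach described above can be sketched in miniature. This is a hypothetical illustration, not the Noa Notes implementation: the `judge` callable stands in for an LLM-as-judge call, the claim splitting is deliberately naive, and the SOAP section names are assumed style criteria.

```python
def factual_check(note, transcript, judge):
    """Tier 1: ask a judge (in practice, an LLM call) whether each
    claim in the generated note is supported by the transcript."""
    claims = [c.strip() for c in note.split(".") if c.strip()]
    if not claims:
        return 1.0
    supported = [judge(claim, transcript) for claim in claims]
    return sum(supported) / len(claims)

def style_score(note, required_sections=("Subjective", "Objective", "Assessment", "Plan")):
    """Tier 2: a cheap deterministic style check — are the expected
    note sections present? (Assumed criterion for illustration.)"""
    present = sum(1 for section in required_sections if section in note)
    return present / len(required_sections)

def evaluate(note, transcript, judge, w_fact=0.7, w_style=0.3):
    """Combine both tiers into one record, ready to log as run metrics
    (e.g. with mlflow.log_metrics in a tracked evaluation run)."""
    fact = factual_check(note, transcript, judge)
    style = style_score(note)
    return {"factual": fact, "style": style,
            "overall": w_fact * fact + w_style * style}
```

In a real pipeline the judge would be a prompted LLM returning a structured verdict per claim, and each `evaluate` result would be logged to an MLflow run so prompt changes can be compared across evaluation sets.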
Related projects
Practical demo challenges in creating LLM-based consumer products
Poland
Explore real-world obstacles and solutions when integrating large language models into consumer products, covering design, deployment, testing, and…
Efficient data extraction from documents for data analytics and process automation
Poland
Explore Arctic‑TILT, a 0.8B‑parameter model that outperforms GPT‑4 in document processing, enabling efficient data extraction for analytics and…
How Not to Kill Anyone: Safety Layers in Medical Reasoning
Poland
Methods for extracting body composition and lab data from unstructured sources, building real‑time digital health twins, and ensuring…
Genaicode: programming on steroids
Poland
Live demo of Genaicode, an AI code generator, modifying a personal game in real time and covering latency,…
Unleash Your Voice, Unleash Your Agents
Poland
The talk demonstrates a locally run AI system that provides real‑time speech transcription, lets you control applications, and…
LLM Evaluations in Practice
Amsterdam
Learn about a practical setup for LLM evaluation in production, sharing hard-earned lessons for guiding prompt and code…