Self-modifying code
The session demonstrates constructing an AI agent that can edit its own source, spawn duplicates, and communicate, covering sandboxing, safety, and practical implementation details.
What if AI could reproduce? What if it could evolve?
In this talk we’ll pursue a single goal: building an AI agent that can modify its own implementation and spawn copies of itself.
Will it develop independent thought? Blow up my AWS bill? Brick my machine? Ascend to Skynet?
Join us February 21st to find out.
- AWS: Amazon Web Services is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services across 33 geographic Regions and 105 Availability Zones. Core services such as Amazon EC2 (virtual servers), Amazon S3 (scalable object storage), and AWS Lambda (serverless compute) provide the foundational building blocks for most workloads, letting customers from startups to Fortune 500s innovate faster, scale instantly, and shift costs from CapEx to OpEx. AWS also offers 300+ security, compliance, and governance services and features, meeting standards such as ISO 27001 and SOC 1/2/3.
- Anthropic: An AI safety and research company, founded in 2021 by former OpenAI executives Dario and Daniela Amodei and structured as a Public Benefit Corporation (PBC). Its core mission is building reliable, steerable AI systems, with a focus on interpretability and long-term alignment. Its flagship product is the Claude family of large language models (LLMs), designed for complex reasoning and coding tasks. A key technical innovation is Constitutional AI (CAI), a training method that aligns models with a set of ethical principles to produce helpful, harmless, and honest outputs. The company has secured significant backing, including up to $4 billion from Amazon and a $2 billion commitment from Google.
- SWE-agent benchmark: SWE-agent is an autonomous software-engineering system, developed at Princeton and Stanford, that enables LLMs (such as GPT-4o or Claude Sonnet) to resolve real-world GitHub issues. It gives the model a suite of developer tools (bash, file editing, search) with which to produce a correct code patch inside a containerized environment. Performance is benchmarked on the SWE-bench dataset of 2,294 real GitHub issues, with systems scored on their resolved rate (pass@1); this tests both the foundation model and the agentic harness, giving a rigorous, reproducible measure of AI capability in complex software development.
- SWE-bench-sonnet: The Claude Sonnet model line's performance on the SWE-bench Verified benchmark, which measures an AI's ability to solve real-world GitHub issues from open-source projects. Claude Sonnet 4.5 achieved 77.2% on SWE-bench Verified, demonstrating strong autonomous coding capability: it can sustain complex, multi-step reasoning and execute code over long-horizon tasks, making it a strong foundation for developer-focused AI agents.
- Sandboxing: A security mechanism that executes untrusted code within a tightly controlled, isolated environment to prevent system compromise. Untrusted programs (suspicious email attachments, web-browser processes) run in a disposable environment, often implemented via hypervisor-based virtualization (e.g., Windows Sandbox) or kernel-level user separation (e.g., the Android application sandbox), which strictly limits access to the file system, network, and memory. The core benefit is risk mitigation: any malware "detonated" inside the sandbox is contained, protecting the host operating system and network, and the environment is typically purged on session close, ensuring a pristine state for the next use.
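As a minimal, process-level sketch of the idea (POSIX-only, and far weaker than the hypervisor- or container-based isolation described above; the function name and the specific limits are illustrative assumptions):

```python
import resource
import subprocess
import sys

def run_sandboxed(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Run untrusted Python code in a child process with CPU and memory caps."""
    def limit_resources():
        # Runs in the child just before exec (POSIX only, via preexec_fn).
        # Cap CPU time at 2 seconds and address space at 512 MB.
        resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
        resource.setrlimit(resource.RLIMIT_AS, (512 * 1024**2, 512 * 1024**2))

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores env and site)
        capture_output=True,
        text=True,
        timeout=timeout,          # wall-clock limit enforced by the parent
        preexec_fn=limit_resources,
    )

result = run_sandboxed("print(sum(range(10)))")
print(result.stdout.strip())  # → 45
```

Note that resource limits only contain runaway CPU and memory; a real agent sandbox would add filesystem and network isolation (containers, VMs, or seccomp), which this sketch does not provide.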
Related projects
Coder
Seattle
This talk explains how GPT-4 and Roslyn are combined to generate reliable C# code with over 90% success…
EidOS, an Agent Operating System
Seattle
Learn how to simplify LLM data products using a multi‑agent approach, then watch a live coding session building…
TSK: Open Source Coding Agent Task Manager and Sandbox
Seattle
Learn how TSK automates coding tasks in isolated Docker containers, allowing AI agents to generate, test, and commit…
Building AI react apps
Seattle
This talk demonstrates building a simple AI app using the Tambo dev tool, enabling AI to control and…
Autonomous Web Agents with Planning and Self-correction
Palo Alto
The session explains how autonomous web agents combine advanced search, self‑critique, and reinforcement learning to plan actions, correct…
60min+ video -> automated multi-channel content
Seattle
Learn how we transform 60‑90‑minute AI workshops into GitHub README, blog post, email, and social‑media posts automatically, saving…