TensorFlow Agents: Maze Optimization
This talk demonstrates shortest path optimization in a maze using reinforcement learning with OpenAI Gym and TensorFlow Agents in a custom environment.
The demo optimizes the shortest path through a maze using reinforcement learning. OpenAI Gym is used to create a tailored environment for this problem, and TensorFlow Agents (a library of cutting-edge reinforcement learning algorithms) is then applied to that environment.
Proximal Policy Optimization trains agents in custom Gym and MuJoCo environments.
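The custom-environment idea described above can be sketched in a few lines. This is not the talk's code: the maze layout, the `MazeEnv` class, and the `train` and `greedy_path` helpers are hypothetical names chosen for illustration. The environment only mirrors the classic Gym `reset`/`step` convention (a 4-tuple of observation, reward, done, info) without importing Gym, and plain tabular Q-learning stands in for a TF-Agents agent so the example runs with the standard library alone.

```python
import random

# Hypothetical 4x4 maze for illustration: S = start, G = goal, # = wall.
MAZE = ["S.#.",
        ".#..",
        "...#",
        "#..G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


class MazeEnv:
    """Gym-style maze (reset/step) that does not depend on the gym package."""

    def __init__(self, grid=MAZE, max_steps=100):
        self.grid = grid
        self.max_steps = max_steps
        self.start = self._find("S")
        self.goal = self._find("G")
        self.pos = self.start
        self.steps = 0

    def _find(self, char):
        for r, row in enumerate(self.grid):
            if char in row:
                return (r, row.index(char))
        raise ValueError(f"maze has no {char!r} cell")

    def reset(self):
        self.pos, self.steps = self.start, 0
        return self.pos

    def step(self, action):
        dr, dc = ACTIONS[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        inside = 0 <= r < len(self.grid) and 0 <= c < len(self.grid[0])
        if inside and self.grid[r][c] != "#":
            self.pos = (r, c)  # moves into walls or off the grid leave the agent in place
        self.steps += 1
        done = self.pos == self.goal or self.steps >= self.max_steps
        return self.pos, -1.0, done, {}  # every step costs 1, so shorter paths score higher


def train(env, episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return; missing entries count as 0.0
    n = len(ACTIONS)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(n)
            else:
                action = max(range(n), key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done, _ = env.step(action)
            # Bootstrap from the next state unless it is the goal (a true terminal state).
            best_next = 0.0 if nxt == env.goal else max(q.get((nxt, a), 0.0) for a in range(n))
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def greedy_path(env, q):
    """Follow the highest-valued action from the start until the episode ends."""
    state, done = env.reset(), False
    path = [state]
    while not done:
        action = max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))
        state, _, done, _ = env.step(action)
        path.append(state)
    return path
```

With the defaults above, `greedy_path(env, train(env))` on a fresh `MazeEnv()` should trace a six-move route from `(0, 0)` to `(3, 3)`, which is the shortest route this particular layout allows. In a TF-Agents setup the same environment logic would typically live in a Gym or `PyEnvironment` subclass instead.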
- OpenAI Gym: The open-source Python toolkit establishing the standard API for developing and comparing reinforcement learning (RL) algorithms. OpenAI Gym is the foundational, open-source Python library for RL research: it provides a standardized API for communication between your learning algorithm and a diverse suite of environments. This toolkit offers a wide range of challenges, from classic control problems like CartPole-v1 to complex Atari games and robotic simulations. The consistent interface allows developers to easily test, benchmark, and reproduce results for new RL methods against a common set of tasks. Note: Gym provides the environment; the developer must implement the learning agent, often using frameworks like TensorFlow or PyTorch.
- TF-Agents: The reliable, modular TensorFlow library for building, training, and deploying Reinforcement Learning (RL) and Contextual Bandits agents. TF-Agents is a streamlined solution for RL development, built on TensorFlow 2.x and Keras. The library provides well-tested, modular components: Agents, Policies, and Environments. You can accelerate iteration by leveraging pre-implemented algorithms like DQN, PPO, and SAC, or quickly build custom solutions. It supports standard environment suites (e.g., OpenAI Gym) and integrates with DeepMind's Reverb for efficient replay buffers. This structure supports fast code development, robust testing, and scalable deployment for your RL projects.
- Deep Q-Network: DQN is the foundational deep reinforcement learning algorithm: it uses a deep convolutional network to approximate the optimal action-value function (Q-function), enabling agents to learn complex policies directly from raw pixel input. Deep Q-Network (DQN), introduced by DeepMind in 2013, was among the first successful combinations of deep learning and reinforcement learning. The core architecture employs a deep convolutional neural network to estimate the Q-value (the expected future reward for a state-action pair) in place of the table used by classic Q-learning. Two mechanisms stabilize training: Experience Replay, which stores and samples past transitions to break data correlation, and a separate Target Network, which provides stable optimization targets. In DeepMind's 2015 follow-up, a single agent architecture was evaluated on 49 distinct Atari 2600 games and reached performance comparable to a human tester on many of them, setting a benchmark for general-purpose AI agents.
- PPO: Proximal Policy Optimization (PPO) is a widely used, stable reinforcement learning (RL) algorithm: it uses a clipped objective function so that policy updates stay efficient without becoming catastrophically large. PPO is a core policy gradient method in deep reinforcement learning, introduced by OpenAI in 2017 (John Schulman et al.). It addresses the implementation complexity of earlier methods like Trust Region Policy Optimization (TRPO) by approximating the trust region constraint with a simpler, clipped surrogate objective function (L^CLIP). This design allows for multiple epochs of minibatch updates on sampled data, improving sample efficiency and implementation simplicity. PPO is a go-to algorithm for complex tasks: it has been successfully applied to simulated robotic locomotion, Atari game playing, and, critically, Reinforcement Learning from Human Feedback (RLHF) to align Large Language Models (LLMs) like ChatGPT with human preferences. Its balance of performance, stability, and ease of tuning makes it an industry standard.
- TensorFlow: Google's open-source, end-to-end platform for building, training, and deploying machine learning models across all environments. TensorFlow is the open-source, end-to-end machine learning platform developed by the Google Brain team. It provides a comprehensive ecosystem of tools for model development: Keras simplifies high-level neural network construction, and TensorBoard offers visualization and debugging. The framework is engineered for scalability, supporting distributed training on powerful hardware like Google's custom Tensor Processing Units (TPUs). Crucially, its ecosystem (including TensorFlow Lite for mobile/edge devices and TensorFlow.js for web browsers) gives deployment flexibility, allowing models to run on servers, microcontrollers, or directly in a browser.
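The mechanisms named in the DQN and PPO entries above can be written down compactly. The following is a minimal, dependency-free sketch rather than the TF-Agents implementation: `ReplayBuffer`, `dqn_td_target`, and `ppo_clip_objective` are hypothetical names, the default `gamma` and `epsilon` values are assumptions, and plain Python lists stand in for tensors.

```python
import random
from collections import deque


class ReplayBuffer:
    """Experience replay: a fixed-size store of past transitions, sampled uniformly."""

    def __init__(self, capacity, seed=0):
        self.items = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, transition):
        self.items.append(transition)

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive transitions.
        return self.rng.sample(list(self.items), batch_size)


def dqn_td_target(reward, next_q_target, done, gamma=0.99):
    """DQN target: r + gamma * max_a Q_target(s', a), with no bootstrap at terminal states.

    next_q_target holds the target network's Q-values for the next state.
    """
    return reward if done else reward + gamma * max(next_q_target)


def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """PPO clipped surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratios are new-policy / old-policy action probabilities; advantages are
    the matching advantage estimates.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

Taking the minimum of the clipped and unclipped terms means a ratio far from 1 earns no extra objective value when the move is in the favorable direction, while unfavorable moves are still penalized in full. That asymmetry is what keeps a single update from moving the policy too far.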
Related projects
Deep RL for User Experience
Chicago
Learn how to use Ray’s distributed tuning and parallel processing to scale reinforcement learning predictions and training, including…
Agents That Fix Their Own Mistakes: Multi-Agent Code Generation with Automated Iteration
Seattle
See a multi-agent system that automatically generates, tests, diagnoses, and fixes code using architectural patterns until it passes…
DeepSeek-ing a Needle in a Haystack
Toronto
Learn how to use DeepSeek R1 agentic workflows and temporal prompting to filter, rank, and retrieve the most relevant…
Agents on edge
Toronto
Examining deployment of TinyLlama on a 4 GB Jetson Nano, measuring memory, CPU, and GPU usage while assessing feasibility…
Teaching an AI Agent to Build a Data Warehouse
Denver
See an AI agent, Modlr, autonomously design and build a complex data warehouse model live, transforming weeks of…
AI agents for investment research
Los Angeles
Explore how distinct AI personas perform fundamental, technical, and growth analysis, using prompt engineering, function calling, and open‑source…