
April 17, 2025 · Vancouver

TensorFlow Agents: Maze Optimization

This talk demonstrates shortest path optimization in a maze using reinforcement learning with OpenAI Gym and TensorFlow Agents in a custom environment.

Tech stack
  • OpenAI Gym
    The open-source Python toolkit establishing the standard API for developing and comparing reinforcement learning (RL) algorithms.
    OpenAI Gym is the foundational, open-source Python library for RL research: it provides a standardized API for communication between your learning algorithm and a diverse suite of environments. This toolkit offers a wide range of challenges, from classic control problems like CartPole-v1 to complex Atari games and robotic simulations. The consistent interface allows developers to easily test, benchmark, and reproduce results for new RL methods against a common set of tasks. Note: Gym provides the environment; the developer must implement the learning agent, often using frameworks like TensorFlow or PyTorch.
  • TF-Agents
    The reliable, modular TensorFlow library for building, training, and deploying RL and contextual bandit agents.
    TF-Agents is your streamlined solution for RL development, built on TensorFlow 2.x and Keras. The library provides well-tested, modular components: Agents, Policies, and Environments. You can accelerate iteration by leveraging pre-implemented algorithms like DQN, PPO, and SAC, or quickly build custom solutions. It supports standard environment suites (e.g., OpenAI Gym) and integrates with DeepMind's Reverb for efficient replay buffers. This structure ensures fast code development, robust testing, and scalable deployment for your RL projects.
  • Deep Q-Network
    DQN is the foundational deep reinforcement learning algorithm: it uses a deep convolutional network to approximate the optimal action-value function (Q-function), enabling agents to learn complex policies directly from raw pixel input.
    Deep Q-Network (DQN), pioneered by DeepMind in 2013, was the first successful fusion of deep learning and reinforcement learning. The core architecture employs a deep convolutional neural network to estimate the Q-value—the expected future reward for a state-action pair—stabilizing the classic Q-learning algorithm. It introduced two critical mechanisms: Experience Replay, which stores and samples past transitions to break data correlation, and a separate Target Network, which provides stable optimization targets. This innovation allowed a single agent to achieve human-level performance on 49 distinct Atari 2600 games, setting the benchmark for general-purpose AI agents.
  • PPO
    Proximal Policy Optimization (PPO) is the default choice for many RL applications: a highly stable policy-gradient algorithm that uses a clipped objective function to keep policy updates effective without letting them become catastrophically large.
    PPO is a core policy gradient method in deep reinforcement learning, introduced by OpenAI in 2017 (John Schulman et al.). It addresses the instability of earlier methods like Trust Region Policy Optimization (TRPO) by approximating the trust region constraint with a simpler, clipped surrogate objective function (L_CLIP). This design allows for multiple epochs of minibatch updates on sampled data, significantly improving sample efficiency and implementation simplicity. PPO is the go-to algorithm for complex tasks: it has been successfully applied to simulated robotic locomotion, Atari game playing, and, critically, in Reinforcement Learning from Human Feedback (RLHF) to align Large Language Models (LLMs) like ChatGPT with human preferences. Its balance of performance, stability, and ease of tuning makes it an industry standard.
  • TensorFlow
    Google's open-source, end-to-end platform for building, training, and deploying machine learning models across all environments.
    TensorFlow is the open-source, end-to-end machine learning platform developed by the Google Brain team. It provides a comprehensive ecosystem of tools for model development: Keras simplifies high-level neural network construction, and TensorBoard offers visualization and debugging. The framework is engineered for scalability, supporting distributed training on powerful hardware like Google's custom Tensor Processing Units (TPUs). Crucially, its ecosystem—including TensorFlow Lite for mobile/edge devices and TensorFlow.js for web browsers—ensures deployment flexibility, allowing models to run on servers, microcontrollers, or directly in a browser.
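The Gym interface described above reduces to two methods, reset() and step(). A minimal sketch of a custom maze environment in that style (the grid layout and reward values here are hypothetical, not the talk's actual environment; this uses the classic Gym 4-tuple step signature, whereas the newer Gymnasium fork returns a 5-tuple):

```python
import numpy as np

class MazeEnv:
    """Minimal Gym-style maze environment (hypothetical layout).

    Observation: the agent's (row, col) position.
    Actions: 0=up, 1=down, 2=left, 3=right.
    Reward: -1 per step, +10 on reaching the goal, so the discounted
    return is maximized by the shortest path.
    """

    # 0 = free cell, 1 = wall
    GRID = np.array([
        [0, 0, 0, 1],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
    ])
    START, GOAL = (0, 0), (3, 3)
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def reset(self):
        self.pos = self.START
        return self.pos

    def step(self, action):
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        # Stay in place on hitting a wall or the grid boundary.
        if 0 <= r < 4 and 0 <= c < 4 and self.GRID[r, c] == 0:
            self.pos = (r, c)
        done = self.pos == self.GOAL
        reward = 10.0 if done else -1.0
        return self.pos, reward, done, {}

env = MazeEnv()
obs = env.reset()
# Walk the shortest path: right, right, down, down, right, down.
total = 0.0
for a in [3, 3, 1, 1, 3, 1]:
    obs, reward, done, _ = env.step(a)
    total += reward
print(obs, done, total)  # (3, 3) True 5.0
```

Any agent written against this interface (tabular Q-learning, DQN, PPO) can be swapped in without changing the environment, which is the point of the standardized API.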
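The two DQN mechanisms described above, experience replay and a separate target network, can be sketched in NumPy. This is an illustrative computation of one TD-target minibatch only (random transitions and a linear stand-in for the Q-network), not a full training loop:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99

# Experience replay: store transitions (s, a, r, s', done), then sample a
# random minibatch to break the correlation between consecutive steps.
replay = [(rng.standard_normal(4), int(rng.integers(2)), 1.0,
           rng.standard_normal(4), False) for _ in range(100)]
batch = [replay[i] for i in rng.choice(len(replay), size=8, replace=False)]

# Stand-ins for the online and target Q-networks: each maps a 4-dim state
# to Q-values for 2 actions via a fixed random linear layer.
W_online = rng.standard_normal((4, 2))
W_target = W_online.copy()   # target net: a lagged copy of the online net

def q_values(W, states):
    return states @ W

states      = np.stack([t[0] for t in batch])
actions     = np.array([t[1] for t in batch])
rewards     = np.array([t[2] for t in batch])
next_states = np.stack([t[3] for t in batch])
dones       = np.array([t[4] for t in batch], dtype=float)

# Stable TD target from the *frozen* target network:
#   y = r + gamma * max_a Q_target(s', a), with bootstrapping cut off
# at terminal states via the (1 - done) mask.
y = rewards + gamma * (1 - dones) * q_values(W_target, next_states).max(axis=1)

# The DQN loss regresses Q_online(s, a) toward y for the actions taken.
td_error = y - q_values(W_online, states)[np.arange(8), actions]
loss = float(np.mean(td_error ** 2))
print(y.shape, loss >= 0.0)
```

In a real implementation the gradient of this loss updates W_online each step, while W_target is only refreshed to a copy of W_online every few thousand steps, which is what keeps the optimization targets stable.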
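The clipped surrogate objective L_CLIP mentioned above is short enough to write out directly. A self-contained sketch of the per-batch objective (advantages and log-probabilities here are made-up inputs for illustration):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """L_CLIP from PPO: mean over the batch of min(r*A, clip(r, 1-eps, 1+eps)*A).

    r is the probability ratio pi_new(a|s) / pi_old(a|s); clipping removes
    the incentive to push the ratio outside [1 - eps, 1 + eps], which is
    what keeps repeated minibatch updates from moving the policy too far.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))

# With identical old and new policies the ratio is 1 everywhere,
# so the objective is just the mean advantage.
adv = np.array([1.0, -0.5, 2.0])
logp = np.log(np.array([0.3, 0.5, 0.2]))
same = ppo_clip_objective(logp, logp, adv)
print(same)  # mean of adv

# Doubling every action probability on positive advantages is clipped:
# the ratio is 2, but the objective caps its contribution at 1 + eps.
capped = ppo_clip_objective(logp + np.log(2.0), logp, np.ones(3))
print(capped)  # 1.2
```

In practice this objective is maximized (or its negation minimized) over several epochs of minibatches per rollout, which is the sample-efficiency advantage over single-update policy-gradient methods.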

Related projects