TextArena
Explore TextArena, an OpenAI‑Gym style suite of 57 text games for evaluating and training language models with reinforcement learning, plus performance insights on popular LLMs.
TextArena is an OpenAI Gym-style environment with 57 text-based games (single-player, two-player, and multi-player) that supports evaluation and RL-based training of language models' game-playing capabilities.
TextArena: Gym-style framework for LLM evaluation/RL in competitive text-based games.
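To make the Gym-style interface mentioned above concrete, here is a minimal, self-contained sketch of a two-player text game driven through a reset/step loop. Everything in it is hypothetical and written only for illustration: `ToyNimEnv`, `random_agent`, and `play_episode` are not part of TextArena, and TextArena's actual API may differ. The random agent stands in for a language model that reads a text observation and replies with a text action.

```python
import random


class ToyNimEnv:
    """Hypothetical two-player text game with a Gym-style reset/step interface.

    Players alternate removing 1-3 stones; whoever takes the last stone wins.
    """

    def __init__(self, stones=10):
        self.initial_stones = stones

    def reset(self):
        self.stones = self.initial_stones
        self.current_player = 0
        return self._observation()

    def _observation(self):
        return f"Player {self.current_player}: {self.stones} stones left. Take 1-3."

    def step(self, action):
        """Apply a text action; return (observation, rewards, done)."""
        try:
            take = int(action.strip())
        except ValueError:
            take = 0
        if not 1 <= take <= min(3, self.stones):
            # Invalid move: the current player forfeits the game.
            rewards = {self.current_player: -1, 1 - self.current_player: 1}
            return "Invalid move.", rewards, True
        self.stones -= take
        if self.stones == 0:
            # The player who took the last stone wins.
            rewards = {self.current_player: 1, 1 - self.current_player: -1}
            return "Game over.", rewards, True
        self.current_player = 1 - self.current_player
        return self._observation(), {0: 0, 1: 0}, False


def random_agent(observation, stones, rng):
    """Stand-in for a language model: returns a legal move as text."""
    return str(rng.randint(1, min(3, stones)))


def play_episode(seed=0):
    """Run one full game and return the final per-player rewards."""
    rng = random.Random(seed)
    env = ToyNimEnv(stones=10)
    observation = env.reset()
    done = False
    rewards = {0: 0, 1: 0}
    while not done:
        action = random_agent(observation, env.stones, rng)
        observation, rewards, done = env.step(action)
    return rewards
```

Calling `play_episode()` returns a reward dictionary keyed by player id, with the winner at 1 and the loser at -1. Swapping `random_agent` for a function that prompts a model with `observation` is the only change needed to evaluate an LLM in this toy setup.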
- OpenAI Gym: The open-source Python toolkit establishing the standard API for developing and comparing reinforcement learning (RL) algorithms. OpenAI Gym is the foundational, open-source Python library for RL research: it provides a standardized API for communication between your learning algorithm and a diverse suite of environments. This toolkit offers a wide range of challenges, from classic control problems like CartPole-v1 to complex Atari games and robotic simulations. The consistent interface allows developers to easily test, benchmark, and reproduce results for new RL methods against a common set of tasks. Note: Gym provides the environment; the developer must implement the learning agent, often using frameworks like TensorFlow or PyTorch.
- Language Models: Language Models (LMs) are deep learning systems, like GPT-4 and Llama 2, trained on massive text datasets to predict and generate human-quality text, code, and conversation. Language Models are sophisticated deep learning systems, primarily utilizing the Transformer architecture, designed to process and generate natural language. They function as probabilistic prediction engines, estimating the likelihood of a token (word or subword) sequence based on billions or even trillions of learned parameters (e.g., Llama 2 offers models from 7B to 70B parameters). Training involves self-supervised learning on massive, diverse datasets (Common Crawl, digitized books), enabling them to master syntax, semantics, and context. Key applications include advanced text generation, summarization, machine translation, and code generation, effectively powering modern conversational AI and developer tools.
- TextArena: TextArena is the open-source evaluation framework: 57+ competitive text-based games for rigorously testing Large Language Model (LLM) agentic behavior and dynamic social skills. TextArena is your comprehensive, open-source framework for LLM evaluation, focusing on agentic behavior and complex social skills. We bypass saturated traditional benchmarks by utilizing 57+ unique text-based environments (single-player, two-player, multi-player) to test capabilities like negotiation, theory of mind, and deception. The platform features a unified, Gym-like API for streamlined reinforcement learning (RL) integration and a dynamic online evaluation system. Performance tracking is handled via real-time TrueSkill™ scores, offering a precise, relative measurement against other models and the 'Humanity' baseline. This design ensures extensibility and provides granular soft-skill profiling across ten dimensions (e.g., Strategic Planning, Bluffing).
- OpenAI: OpenAI is an AI research and deployment company: We build safe artificial general intelligence (AGI) to benefit all of humanity. OpenAI is a premier AI research and deployment company, focused on developing safe Artificial General Intelligence (AGI) for global benefit. The organization operates under a unique structure: a non-profit Foundation governs a for-profit Group, which functions as a public benefit corporation. Its technology portfolio includes industry-defining models like the GPT series (e.g., GPT-4o, GPT-5.1), the conversational platform ChatGPT, and the text-to-video model Sora. These tools drive innovation across multiple sectors, providing powerful, accessible AI capabilities for developers, businesses, and consumers worldwide.
- Reinforcement Learning: Reinforcement Learning (RL) trains an autonomous agent to select optimal actions within an environment, maximizing its cumulative reward signal through continuous trial-and-error. RL is a decision-making framework: an agent learns an optimal policy by interacting with a dynamic environment (modeled as a Markov Decision Process or MDP). The agent executes an action, receives a new state, and gets a scalar reward (positive or negative). This trial-and-error loop drives the agent to maximize the total long-term reward. This core mechanism enabled DeepMind's AlphaGo to master the game of Go and is critical for autonomous vehicles navigating complex traffic scenarios. Key algorithms like Q-Learning, Policy Gradients, and Actor-Critic methods define the agent's strategy for balancing exploration (trying new actions) and exploitation (using known high-reward actions).
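
The agent-environment loop and the exploration/exploitation trade-off described in the list above can be illustrated with a small tabular Q-learning sketch. The five-state chain environment, the hyperparameter values, and all function names below are assumptions chosen for illustration; they do not come from TextArena or from any particular RL library.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic chain MDP: reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    # Break ties randomly so an untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table by trial and error on the toy chain."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next-state value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

After `q = train()`, `greedy_policy(q)` gives the learned action for each non-terminal state. Because the only reward sits at the right end of the chain, the learned policy should favor moving right. Raising `epsilon` makes the agent explore more at the cost of slower exploitation of what it has already learned.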
Related projects
Tackling ARC via LLM Program Search
Singapore
The talk presents current techniques using LLM‑driven program search to address the ARC challenge, detailing practical methods and…
Using LLMs for storytelling
Singapore
Explore using LLMs to create branching stories. Discover iterative generation techniques for complex narratives with multiple choices, a…
Onword AI
Singapore
Learn to coordinate applications using LLM agents, exploring LLMs as underlying infrastructure and the future of agent‑driven development.
alBERT
Singapore
Learn how alBERT monitors Chrome browsing, converts patterns into tool calls for AI agents, and enables interaction with…
NanoBrowser: Building Open-Source Alternative to OpenAI "Operator"
Singapore
This talk demonstrates NanoBrowser, an open-source AI-powered browser extension for web automation, highlighting community contributions, practical use cases,…
Run Local, open source AI
Singapore
Learn how to run open-source models like Llama3, Mistral, and Gemma locally using Jan.ai and Cortex.so, with practical…