Automated Terminal Session Dockerfiles
The demo shows how a pipeline of fine‑tuned LLMs converts a recorded terminal session into structured events and automatically produces a Dockerfile for reproducibility.
A lack of documentation is a common pain point for any developer recently onboarded to a team: without it, understanding a piece of software is harder, and errors often follow. Under the guidance of Julia Longtin of the Human Feedback Foundation, we built the program being demoed to fill this gap: it automates the generation of either documentation or a Dockerfile from a recorded terminal session, where the Dockerfile can then be run to replicate the user's commands. The program divides this work among three fine-tuned variants of DeepSeek R1, plus one model responsible for Markdown output, which a previous team had developed from a version of Gemma.
We will present the user flow: an individual inputs their recorded terminal session and receives a Dockerfile that can replicate the commands used during that session. The session is first fed to a model responsible for dividing it into relevant "events". A second model then annotates (i.e., summarizes) those events and assigns them a hierarchical structure (e.g., marking an event as a sub-event of, or as exiting, a previous event's hierarchy). Finally, the annotated session is fed to a third model that generates the Dockerfile, which can be run to replicate the user's commands from the recorded session. We will also discuss in detail the motivation for this approach over alternative solutions.
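The three-stage flow above can be sketched as plain Python. This is only an illustration of the data flow, not the demoed implementation: the real stages are LoRA fine-tuned DeepSeek R1 models, whereas the stand-in functions here (`segment`, `annotate`, `to_dockerfile`) use trivial hand-written rules.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    commands: list[str]          # raw terminal commands in this event
    summary: str = ""            # annotation added in stage 2
    children: list["Event"] = field(default_factory=list)  # hierarchy


def segment(session_text: str) -> list[Event]:
    # Stage 1 (stand-in for the segmentation model): split the session
    # into events; here, naively, one event per non-empty line.
    return [Event(commands=[ln]) for ln in session_text.splitlines() if ln]


def annotate(events: list[Event]) -> list[Event]:
    # Stage 2 (stand-in for the annotation model): summarize each event.
    for ev in events:
        ev.summary = f"runs: {ev.commands[0]}"
    return events


def to_dockerfile(events: list[Event]) -> str:
    # Stage 3 (stand-in for the generation model): emit a Dockerfile,
    # one RUN per command, with each event's summary as a comment.
    lines = ["FROM ubuntu:22.04"]
    for ev in events:
        lines.append(f"# {ev.summary}")
        lines += [f"RUN {cmd}" for cmd in ev.commands]
    return "\n".join(lines)


session = "apt-get update\npip install numpy"
dockerfile = to_dockerfile(annotate(segment(session)))
print(dockerfile)
```

The base image and per-line segmentation are arbitrary choices for the sketch; in the pipeline, each stage's output is instead produced by a fine-tuned model.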
LoRA fine-tuned DeepSeek models automate Asciinema conversion into structured documentation and Dockerfiles.
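For context on the input format: an Asciinema v2 `.cast` recording is newline-delimited JSON, with a header object on the first line followed by `[timestamp, event_type, data]` triples (`"o"` for terminal output, `"i"` for user input). A minimal parsing sketch, with an illustrative sample recording:

```python
import json

# Illustrative v2 cast: header line, then [time, type, data] event lines.
SAMPLE_CAST = (
    '{"version": 2, "width": 80, "height": 24}\n'
    '[0.10, "o", "$ pip install numpy\\r\\n"]\n'
    '[1.25, "o", "Successfully installed numpy\\r\\n"]\n'
)


def parse_cast(text: str):
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = json.loads(lines[0])                 # recording metadata
    events = [json.loads(ln) for ln in lines[1:]]  # [time, type, data]
    return header, events


def output_text(events) -> str:
    # Concatenate only "o" (output) events to reconstruct the screen text.
    return "".join(data for _, kind, data in events if kind == "o")


header, events = parse_cast(SAMPLE_CAST)
```

Text recovered this way is what the segmentation model consumes.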
- Python: A high-level, general-purpose language prioritizing clear, readable syntax (via significant indentation), powering everything from web backends to machine learning models. Its ecosystem is massive: robust web development with frameworks like Django and Flask, and data science with libraries such as Pandas and NumPy. The Python Package Index (PyPI) provides thousands of community-contributed modules for tasks from network programming to GUI creation. The language is actively maintained by the Python Software Foundation (PSF), with the stable release at Python 3.14.0 as of November 2025.
- PyTorch: An open-source deep learning framework developed by Meta AI and favored in both research and production. Its core is a Python-first tensor library (like NumPy) optimized for GPU acceleration, delivering 50x or greater speedups for complex computations. Its key differentiator is a 'Pythonic' design with a dynamic computation graph (eager execution), allowing rapid prototyping and simpler debugging than static-graph frameworks. Its Autograd system provides automatic differentiation for building and training models in computer vision and NLP; major companies like Tesla (Autopilot) and Microsoft use PyTorch for critical AI applications.
- Hugging Face: The central open-source platform and community for building AI applications, functioning as the 'GitHub for machine learning' with a collaborative Hub for models, datasets, and demos. Its core technology is the open-source **Transformers** Python library, which simplifies using state-of-the-art models (e.g., BERT, GPT) for natural language processing, computer vision, and audio. The Hub hosts over 300,000 models and thousands of datasets, and **Spaces** provides interactive demos, streamlining the ML workflow from research to deployment and making advanced AI accessible, efficient, and reproducible.
- Jupyter Notebook: An open-source, web-based environment that merges live code, narrative text, equations, and rich media (like visualizations) into a single, shareable computational document. A core tool for data science and AI development (rapid experimentation, model prototyping), it supports over 40 programming languages (including Julia, Python, and R, from which the name 'Jupyter' derives) via pluggable kernels. Notebooks, saved in the `.ipynb` format, organize work into executable code cells and Markdown text cells, making workflows transparent, reproducible, and easy to share with technical and non-technical teams alike.
- Weights & Biases: The AI developer platform for end-to-end MLOps, providing a centralized system of record for tracking, visualizing, and reproducing machine learning experiments. Key tools include W&B Experiments for logging metrics (like loss and accuracy) and W&B Sweeps for automated hyperparameter optimization. For modern GenAI development, W&B Weave offers a specialized toolkit for tracing, evaluating, and monitoring LLM applications across the entire AI lifecycle.
Related projects
DocStream: An Educational AI Agent
Toronto
Learn how DocStream parses live terminal commands, groups events, and generates clear summaries that detect errors and guide…
Working with AI: Code Conversion (Delivered by Dwayne Forde)
Toronto
Learn how an LLM‑driven workflow automates bulk code conversion, preserving context, and frees engineers to tackle critical project…
LLM Fingerprinting: Identifying AI Models by Their Responses
Toronto
This talk demonstrates a system to identify and classify large language models by analyzing their responses to benchmark…
60min+ video -> automated multi-channel content
Seattle
Learn how we transform 60‑90‑minute AI workshops into GitHub README, blog post, email, and social‑media posts automatically, saving…
MeetingBot Demo
Toronto
Demo of Meeting Bot: one-click Terraform deployment and Next.js dashboard to provision headless bots that join and record…
What would a Personal AI Tutor look like?
Toronto
Explore how Advisory uses Neo4j roadmaps and chat history to create a transparent AI tutoring system that cites…