VLM and Claude Web Agents

Explore building reliable web‑scraping agents using a vision-language model, Claude reasoning, Selenium automation, and prompt engineering, demonstrated with flight price extraction.

Overview

Web scraping is broken. Companies spend millions maintaining brittle scrapers while developers waste countless hours rebuilding the same solutions. The emergence of powerful vision-language models (VLMs) and LLMs creates an opportunity to revolutionize this space.

I’ll demonstrate a novel architecture that combines:

Microsoft VLM for visual understanding and DOM parsing
Claude for reasoning and task planning
Selenium for browser automation
Custom prompt engineering for reliable structured output
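
As a rough illustration of how these four pieces could fit together, here is a minimal perceive-plan-act loop. It is only a sketch under stated assumptions: `Action`, `AgentLoop`, and the three callables are hypothetical names, and plain Python stubs stand in for the VLM, Claude, and Selenium so the control flow can be run without any of them.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", or "done" (hypothetical action vocabulary)
    target: str = ""   # element label reported by the perception step
    text: str = ""     # text to type, or the final answer when kind == "done"


@dataclass
class AgentLoop:
    # Each callable is a placeholder for a real component (assumption):
    perceive: Callable[[], List[str]]                        # screenshot + VLM
    plan: Callable[[str, List[str], List[Action]], Action]   # LLM planner
    act: Callable[[Action], None]                            # browser driver
    max_steps: int = 10
    history: List[Action] = field(default_factory=list)

    def run(self, task: str) -> str:
        """Alternate perception, planning, and acting until the planner says done."""
        for _ in range(self.max_steps):
            elements = self.perceive()
            action = self.plan(task, elements, self.history)
            self.history.append(action)
            if action.kind == "done":
                return action.text
            self.act(action)
        raise RuntimeError("step budget exhausted before the task finished")


def demo() -> str:
    """Drive the loop against a fake two-state page (made-up data)."""
    page = {"elements": ["Search flights button"]}

    def perceive() -> List[str]:
        return list(page["elements"])

    def plan(task: str, elements: List[str], history: List[Action]) -> Action:
        if "Price: $420" in elements:
            return Action("done", text="420")
        return Action("click", target=elements[0])

    def act(action: Action) -> None:
        # Clicking the search button reveals a price on the fake page.
        page["elements"] = ["Price: $420"]

    return AgentLoop(perceive, plan, act).run("find the cheapest flight")
```

Keeping the three roles behind plain callables means each one can be swapped or mocked independently, and the step budget gives the loop a hard stop if the planner never reports completion.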

We’ll explore:

Why traditional scrapers fail
How VLMs understand web interfaces
Prompt engineering for reliable agents
Live demo: Flight price comparison
Challenges in hallucination prevention
Open source architecture decisions

Key technical innovations:

Vision-guided DOM traversal
RAG memory during browsing
Structured data extraction

This project started from personal frustration with repetitive research tasks. The goal: make web automation accessible to everyone while keeping it reliable enough for production use.

The live demo will showcase the agent finding flight prices and returning structured JSON, all without human intervention.
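
For a sense of what "structured JSON" could mean in practice, here is a small sketch that pulls a JSON array out of a model reply, validates each item, and checks that each value actually appears on the page as a simple guard against hallucinated prices. The schema (`airline`, `price`, `currency`), the names `FlightQuote`, `parse_quotes`, and `is_grounded`, and the grounding heuristic itself are all assumptions made for illustration, not the talk's actual method.

```python
import json
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FlightQuote:
    airline: str
    price: float
    currency: str


def parse_quotes(model_reply: str) -> List[FlightQuote]:
    """Extract the JSON array from a model reply and validate every item."""
    match = re.search(r"\[.*\]", model_reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in model reply")
    items = json.loads(match.group(0))  # JSONDecodeError is a ValueError
    quotes = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError(f"expected an object, got {item!r}")
        airline = item.get("airline")
        price = item.get("price")
        currency = item.get("currency")
        if not isinstance(airline, str) or not isinstance(currency, str):
            raise ValueError(f"airline and currency must be strings: {item!r}")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)) or price <= 0:
            raise ValueError(f"price must be a positive number: {item!r}")
        quotes.append(FlightQuote(airline, float(price), currency))
    return quotes


def is_grounded(quote: FlightQuote, page_text: str) -> bool:
    """True only if both the airline name and the price appear in the page text."""
    found = re.findall(r"\d[\d,]*(?:\.\d+)?", page_text)
    numbers = {float(n.replace(",", "")) for n in found}
    return quote.airline.lower() in page_text.lower() and quote.price in numbers
```

A quote that parses cleanly but fails `is_grounded` is a candidate hallucination; the agent could drop it or retry the extraction step rather than return it to the caller.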

Links

https://onequery.app/
OneQuery.app: API for structured, asynchronous web data, no manual scraping.
https://github.com/addy999/onequery
OneQuery: AI web agent extracts structured data via Playwright, LLMs.
