Qwen VLMs

Qwen-VL is Alibaba Cloud's Large Vision-Language Model (LVLM). It accepts image, text, and bounding box inputs and targets multimodal reasoning and high-resolution visual processing.

The Qwen-VL series is Alibaba Cloud's family of Large Vision-Language Models (LVLMs), built on the Qwen-LM foundation and open-sourced in August 2023. It takes image, text, and bounding box inputs and produces text and bounding box outputs, so it can both describe an image and localize the objects it mentions. The series supports high-resolution images (over one million pixels) and extreme aspect ratios, which improves fine-detail recognition and text extraction (OCR) in both English and Chinese. According to the project's published results, the latest version, Qwen-VL-Max, is competitive with proprietary models such as OpenAI's GPT-4V and Google's Gemini, and outperforms them on Chinese question-answering tasks.
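The bounding box outputs mentioned above can be turned into pixel coordinates with a small parser. This is a minimal sketch, assuming the model emits grounded phrases as <ref>label</ref><box>(x1,y1),(x2,y2)</box> with coordinates normalized to a 0-1000 grid, as described in the Qwen-VL repository. The function name parse_groundings, the Grounding type, and the sample string are hypothetical and exist only for illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class Grounding(NamedTuple):
    label: str                       # phrase the box refers to
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels


# Assumed output markup: <ref>label</ref><box>(x1,y1),(x2,y2)</box>
PATTERN = re.compile(
    r"<ref>(.*?)</ref>\s*<box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
)


def parse_groundings(text: str, width: int, height: int,
                     scale: int = 1000) -> List[Grounding]:
    """Extract labelled boxes and rescale them to the image size.

    `scale` is the assumed size of the normalized coordinate grid.
    """
    results = []
    for label, x1, y1, x2, y2 in PATTERN.findall(text):
        results.append(Grounding(
            label.strip(),
            (
                int(x1) * width // scale,
                int(y1) * height // scale,
                int(x2) * width // scale,
                int(y2) * height // scale,
            ),
        ))
    return results


# Hypothetical model output for a 1280x720 image.
sample = "<ref>the dog</ref><box>(100,250),(500,750)</box>"
print(parse_groundings(sample, width=1280, height=720))
# -> [Grounding(label='the dog', box=(128, 180, 640, 540))]
```

Keeping the coordinates normalized until this last step means the same parser works for any image size or aspect ratio.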

https://github.com/QwenLM/Qwen-VL

1 project · 1 city

Related technologies

Groq (23), Kubernetes (30), Meta Llama 4 Scout (1), Remix (2)

Recent Talks & Demos

Iris: Accurate Computer Agent

Singapore Jun 20

Qwen VLMs, Meta Llama 4 Scout