Qwen VLMs
Qwen-VL is Alibaba Cloud's Large Vision-Language Model (LVLM). It accepts image, text, and bounding box inputs for multimodal reasoning and high-resolution visual processing.
The Qwen-VL series is Alibaba Cloud's family of Large Vision-Language Models (LVLMs), built on the Qwen-LM foundation and open-sourced in August 2023. It accepts image, text, and bounding box inputs and produces text and bounding box outputs, so its answers can be tied to specific image regions. The upgraded Qwen-VL-Plus and Qwen-VL-Max versions support high-resolution images (over a million pixels) and extreme aspect ratios, which improves fine-detail recognition and text extraction (OCR) in both English and Chinese. Qwen-VL-Max competes directly with proprietary models such as OpenAI's GPT-4V and Google's Gemini, and it reportedly outperforms both on Chinese question-answering tasks.
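To make the bounding box output concrete, here is a minimal Python sketch that parses grounded text into pixel coordinates. It assumes the markup used in Qwen-VL's published examples, `<ref>label</ref><box>(x1,y1),(x2,y2)</box>`, with coordinates on a 0 to 1000 grid relative to the image size. The function name `parse_boxes` and the `grid` parameter are hypothetical helpers for illustration, not part of any Qwen library.

```python
import re

# Assumed markup: <ref>label</ref><box>(x1,y1),(x2,y2)</box>
# Coordinates are assumed to lie on a 0-1000 grid relative to the image.
BOX_PATTERN = re.compile(
    r"<ref>(?P<label>.*?)</ref>\s*"
    r"<box>\((?P<x1>\d+),(?P<y1>\d+)\),\((?P<x2>\d+),(?P<y2>\d+)\)</box>"
)


def parse_boxes(text, width, height, grid=1000):
    """Extract labeled boxes from model text and rescale them to pixels.

    `width` and `height` are the pixel dimensions of the input image.
    Returns a list of dicts with a label and an (x1, y1, x2, y2) tuple.
    """
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        x1, y1, x2, y2 = (int(match.group(k)) for k in ("x1", "y1", "x2", "y2"))
        boxes.append({
            "label": match.group("label").strip(),
            "box": (
                x1 * width / grid,
                y1 * height / grid,
                x2 * width / grid,
                y2 * height / grid,
            ),
        })
    return boxes


if __name__ == "__main__":
    # Hypothetical model output for a 2000x1000 pixel image.
    sample = "<ref>the dog</ref><box>(100,200),(500,800)</box>"
    print(parse_boxes(sample, width=2000, height=1000))
```

Rescaling from a fixed grid keeps the model's output independent of the input resolution, which matters when images vary widely in size and aspect ratio.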