Mask2Former
A unified Transformer-based framework that masters semantic, instance, and panoptic segmentation using localized masked attention.
Meta AI researchers (Bowen Cheng and colleagues) designed Mask2Former as a universal architecture for image segmentation. It uses a Transformer-based decoder with masked attention, a mechanism that restricts each query's cross-attention to the foreground region of its predicted mask, which speeds convergence and improves accuracy. At release it set new state-of-the-art results: 57.8 PQ on COCO panoptic segmentation and 56.1 mIoU on ADE20K semantic segmentation. This single framework can replace specialized models such as Mask R-CNN or DeepLab, handling semantic, instance, and panoptic segmentation with one pipeline.
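The core idea of masked attention can be illustrated with a minimal NumPy sketch (a simplification, not Mask2Former's actual implementation): attention scores for pixels outside a query's predicted mask are set to a large negative value before the softmax, so each query attends only within its own foreground region.

```python
import numpy as np

def masked_attention(Q, K, V, mask):
    """Single-head masked cross-attention (illustrative sketch).

    Q:    (num_queries, d)    object-query features
    K, V: (num_pixels, d)     flattened image features
    mask: (num_queries, num_pixels) boolean; True marks pixels inside
          the query's currently predicted foreground region
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Block attention to pixels outside the predicted mask.
    scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the pixel axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))    # 3 queries
K = rng.standard_normal((16, 8))   # 16 pixels
V = rng.standard_normal((16, 8))
mask = rng.random((3, 16)) > 0.5   # hypothetical predicted masks
out = masked_attention(Q, K, V, mask)
```

In the full model this mask comes from the previous decoder layer's mask predictions, so the attended region is refined layer by layer; with an all-True mask the function reduces to standard cross-attention.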