STCFormer
STCFormer (Spatio-Temporal Criss-cross Transformer) is an efficient model for 3D Human Pose Estimation (HPE) that uses a decomposed attention mechanism to avoid the quadratic computational cost of full spatio-temporal attention.
The cost of full spatio-temporal attention grows quadratically with the number of joint-frame tokens. STCFormer addresses this by introducing the STC block, which decomposes correlation learning into two parallel pathways: a spatial one (across joints within a frame) and a temporal one (across frames for a given joint). A Structure-enhanced Positional Embedding (SPE) factors in explicit human body structure to improve accuracy. Evaluated on major benchmarks, the model reports a state-of-the-art 40.5 mm P1 error on the Human3.6M dataset, with significantly fewer parameters than prior state-of-the-art techniques.
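To make the cost argument concrete, the sketch below counts pairwise attention scores for full spatio-temporal attention versus a decomposition into separate spatial and temporal pathways. It is a rough illustration, not the STCFormer implementation: the function names are hypothetical, the clip size (243 frames, 17 joints) is an assumed example rather than a figure from this page, and the count ignores channel width, attention heads, and projection layers.

```python
def full_attention_pairs(frames: int, joints: int) -> int:
    """Score pairs when every joint-frame token attends to every other token."""
    tokens = frames * joints
    return tokens * tokens


def decomposed_attention_pairs(frames: int, joints: int) -> int:
    """Score pairs when spatial and temporal attention run as separate pathways."""
    # Spatial pathway: within each frame, every joint attends to every joint.
    spatial = frames * joints * joints
    # Temporal pathway: for each joint, every frame attends to every frame.
    temporal = joints * frames * frames
    return spatial + temporal


if __name__ == "__main__":
    # Assumed example sizes, chosen only for illustration.
    frames, joints = 243, 17
    full = full_attention_pairs(frames, joints)
    decomposed = decomposed_attention_pairs(frames, joints)
    print(f"full attention pairs:       {full}")
    print(f"decomposed attention pairs: {decomposed}")
    print(f"reduction factor:           {full / decomposed:.1f}x")
```

Under this simplified count, the reduction factor works out to T·J / (T + J) for T frames and J joints, so the saving grows as the input sequence gets longer.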