SUNNet: A Novel Framework for Simultaneous Human Parsing and Pose Estimation


This paper presents a novel Separation-and-UnioN Network (SUNNet) for simultaneous human parsing and pose estimation. SUNNet consists of two stages: feature separation and feature union. In the feature separation stage, a common feature extractor implicitly encodes the correlation between human parsing and pose estimation, while task-specific feature extractors extract the features for each task. A feature consolidation module then combines the task-specific features with the common features in a coarse-to-fine manner to produce the initial predictions for parsing and 2D pose estimation. In the feature union stage, we refine the initial predictions by explicitly leveraging the features from the parallel task to predict the kernels’ receptive fields in a convolutional neural network. We further propose to leverage a 3D human body reconstructed from the image to facilitate these tasks. Extensive experiments demonstrate the effectiveness of our SUNNet model for human body configuration analysis.
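
As a rough illustration of the feature separation stage only, the following NumPy sketch shows one way a shared extractor and two task-specific extractors could feed a consolidation step. It is not the SUNNet implementation: the 1x1 projections, the concatenation-based consolidation, all function and parameter names (`conv1x1`, `init_params`, `separation_stage`), and the channel, part, and joint counts are assumptions made for illustration. The coarse-to-fine scheme, the feature union stage, and the 3D body cue are omitted.

```python
import numpy as np


def conv1x1(x, w):
    """Apply a 1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def init_params(c_in=8, c_feat=16, n_parts=20, n_joints=16, seed=0):
    """Random weights for the hypothetical extractors and heads (sizes are assumed)."""
    rng = np.random.default_rng(seed)
    return {
        "common": rng.standard_normal((c_feat, c_in)),    # shared extractor
        "parsing": rng.standard_normal((c_feat, c_in)),   # parsing-specific extractor
        "pose": rng.standard_normal((c_feat, c_in)),      # pose-specific extractor
        "parsing_head": rng.standard_normal((n_parts, 2 * c_feat)),
        "pose_head": rng.standard_normal((n_joints, 2 * c_feat)),
    }


def separation_stage(x, params):
    """Return initial parsing logits and pose heatmaps for input features x."""
    relu = lambda t: np.maximum(t, 0.0)
    common = relu(conv1x1(x, params["common"]))
    parsing_feats = relu(conv1x1(x, params["parsing"]))
    pose_feats = relu(conv1x1(x, params["pose"]))

    # Assumed consolidation: concatenate task-specific and common features
    # along the channel axis, then project to the task's output channels.
    parsing_in = np.concatenate([parsing_feats, common], axis=0)
    pose_in = np.concatenate([pose_feats, common], axis=0)
    parsing_logits = conv1x1(parsing_in, params["parsing_head"])
    pose_heatmaps = conv1x1(pose_in, params["pose_head"])
    return parsing_logits, pose_heatmaps


if __name__ == "__main__":
    params = init_params()
    x = np.ones((8, 32, 32))  # stand-in for backbone features of one image
    parsing_logits, pose_heatmaps = separation_stage(x, params)
    print(parsing_logits.shape, pose_heatmaps.shape)
```

In this sketch both heads see the same common features, which is where any shared structure between the two tasks would be captured; the task-specific branches remain free to specialize.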

Submitted to BMVC19