Published at ICCV 2015
1. Introduction
Motivation
Characterise and quantify social relation traits from a computer vision point of view.
Challenges
Relations between face images are tied to high-level facial factors.
No single dataset is presently available that encompasses all the facial attribute annotations required to learn such a rich representation
(some datasets contain only facial expression labels, while others contain only gender labels).
Properties of the proposed model
- dealing with missing attribute labels from different datasets
- bridging the gap of heterogeneous datasets by weak constraints derived from the association of face part appearances
- capable of jointly considering pairwise faces for relation reasoning, where each face serves as the mutual context to the other
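Contributions
- The first work to infer social relations from face images
- Builds a new social relation dataset annotated with social relation labels
- Learns social relations from multiple tasks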
2. Social Relation Prediction from Face Images
2.1 Definitions of Social Relation Traits
- Defines 16 relations, each with an opposite counterpart (e.g. “friendly and hostile”), i.e. 8 opposing pairs
- These relations are visually perceptible
2.2 Social Relation Dataset
- Describes some details of the manual annotation
2.3 Baseline Method
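Deep Convolutional Network (DCN)
Two separate DCNs extract features for the subject and the object, yielding 2048-dimensional feature vectors
The two vectors are concatenated and passed through one fully connected layer, producing a 256-dimensional vector
Finally, this vector is fed to 8 binary classifiers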
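The data flow of these three steps can be sketched in plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the two DCN feature extractors are replaced by given input vectors, the weights are random placeholders rather than trained parameters, the ReLU after the fully connected layer and the logistic form of the classifiers are assumptions, and names such as `baseline_head` and `init_layer` are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_out, n_in, rng):
    """Random placeholder weights (assumption: stands in for trained parameters)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def baseline_head(feat_subject, feat_object, fc, classifiers):
    """Concatenate two face features, apply one FC layer, then binary classifiers."""
    joint = feat_subject + feat_object                   # list concatenation
    hidden = [max(0.0, h) for h in linear(joint, *fc)]   # ReLU is an assumption
    return [sigmoid(z) for z in linear(hidden, *classifiers)]


rng = random.Random(0)
FEAT_DIM, HIDDEN_DIM, NUM_TRAITS = 2048, 256, 8  # sizes taken from the notes
fc = init_layer(HIDDEN_DIM, 2 * FEAT_DIM, rng)
classifiers = init_layer(NUM_TRAITS, HIDDEN_DIM, rng)

# Stand-ins for the outputs of the two DCNs.
feat_subject = [rng.random() for _ in range(FEAT_DIM)]
feat_object = [rng.random() for _ in range(FEAT_DIM)]

probs = baseline_head(feat_subject, feat_object, fc, classifiers)
print(len(probs))  # one probability per binary classifier
```

Each entry of `probs` plays the role of one binary classifier's output; with random weights the values carry no meaning beyond showing the shapes involved.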
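Position information is introduced as an additional input
- the two faces’ positions (8-D)
- the relative position of the faces (2-D)
- the ratio between the faces’ scales (1-D)
The three position vectors are concatenated into one input, which is then concatenated with the 256-dimensional vector above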
$$
\mathrm{spatial\_feature} = \left\{ x^l, y^l, w^l, h^l, x^r, y^r, w^r, h^r, \frac{x^l - x^r}{w^l}, \frac{y^l - y^r}{h^l}, \frac{w^l}{w^r} \right\}
$$
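As a worked example of the formula, the sketch below builds the 11-dimensional position vector from two face boxes and appends it to a 256-dimensional feature. It assumes that the superscripts `l` and `r` denote the left and right face and that each box is given as `(x, y, w, h)`; the function name `spatial_feature` and the box values are hypothetical.

```python
def spatial_feature(left_box, right_box):
    """Build the 11-D position vector from two face boxes given as (x, y, w, h)."""
    xl, yl, wl, hl = left_box
    xr, yr, wr, hr = right_box
    return [
        xl, yl, wl, hl,      # left face position (4-D)
        xr, yr, wr, hr,      # right face position (4-D)
        (xl - xr) / wl,      # relative horizontal offset
        (yl - yr) / hl,      # relative vertical offset
        wl / wr,             # ratio between the face scales
    ]


# Hypothetical boxes in pixels: (x, y, width, height).
left_face = (40.0, 60.0, 50.0, 50.0)
right_face = (140.0, 70.0, 100.0, 100.0)

position = spatial_feature(left_face, right_face)
hidden = [0.0] * 256                  # placeholder for the 256-D joint face feature
classifier_input = hidden + position  # 256 + 11 = 267 dimensions
print(position[8:], len(classifier_input))
```

The first eight entries are the raw box coordinates, so only the last three entries encode how the two faces relate to each other.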
2.4 A Cross-Dataset Approach
- Three datasets are introduced: AFLW, CelebFaces, and Kaggle
- AFLW only includes gender and pose labels; Kaggle only includes expression labels
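Because each dataset annotates only some attributes, a training loss has to skip the attributes a sample lacks. One common way to do this is to mask the per-attribute cross-entropy, sketched below. This is not taken from the paper: the attribute list, the use of `None` for a missing label and the function `masked_bce` are assumptions made for illustration, and pose and expression are simplified to binary values.

```python
import math

ATTRIBUTES = ["gender", "pose", "expression"]  # hypothetical, binarised attributes


def masked_bce(predictions, labels):
    """Mean binary cross-entropy over the attributes that are actually labelled.

    `labels` uses None (or omits the key) for an attribute that the source
    dataset does not annotate.
    """
    losses = []
    for name in ATTRIBUTES:
        target = labels.get(name)
        if target is None:  # missing label: contributes no loss
            continue
        p = min(max(predictions[name], 1e-7), 1.0 - 1e-7)  # avoid log(0)
        losses.append(-(target * math.log(p) + (1 - target) * math.log(1.0 - p)))
    return sum(losses) / len(losses) if losses else 0.0


predictions = {"gender": 0.9, "pose": 0.2, "expression": 0.6}

aflw_like = {"gender": 1, "pose": 0, "expression": None}       # no expression label
kaggle_like = {"gender": None, "pose": None, "expression": 1}  # expression only

loss_aflw = masked_bce(predictions, aflw_like)
loss_kaggle = masked_bce(predictions, kaggle_like)
print(round(loss_aflw, 4), round(loss_kaggle, 4))
```

With this masking, a sample from one dataset only updates the predictions for the attributes that dataset labels, so samples from several datasets can be mixed in the same training run.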