Published at EMNLP 2019
1. Introduction
- three encoders:
- an object relationship encoder
- a language encoder
- a cross-modality encoder
- pre-train the model with large amounts of image-and-sentence pairs, via five pre-training tasks (a loss-combination sketch follows this list):
- masked language modeling
- masked object prediction via RoI-feature regression
- masked object prediction via detected-label classification
- cross-modality matching
- image question answering
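
A minimal sketch of how the five pre-training losses could be combined into one training objective. The task names, equal weights, and function name are illustrative assumptions, not the paper's exact settings.

```python
import torch

def lxmert_pretraining_loss(losses: dict) -> torch.Tensor:
    """Combine the five pre-training losses into a single scalar.

    `losses` is assumed to map task names to already-computed scalar losses:
      'mlm'       - masked language modeling (cross-entropy)
      'feat_reg'  - masked object prediction via RoI-feature regression (L2)
      'label_cls' - masked object prediction via detected-label classification
      'matching'  - cross-modality (image-sentence) matching
      'qa'        - image question answering
    Equal weights below are an assumption for illustration.
    """
    weights = {'mlm': 1.0, 'feat_reg': 1.0, 'label_cls': 1.0,
               'matching': 1.0, 'qa': 1.0}
    return sum(weights[k] * v for k, v in losses.items())
```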
2. Model Architecture
- two inputs: an image and its related sentence (a caption or a question)
2.1 Input Embeddings
word-level sentence embeddings
$$
\text{sentence} = \{w_1, \ldots, w_n\} \\
\hat{w}_i = \mathrm{WordEmbed}(w_i) \\
\hat{u}_i = \mathrm{IdxEmbed}(i) \\
h_i = \mathrm{LayerNorm}(\hat{w}_i + \hat{u}_i)
$$

object-level image embeddings
- m objects: ${o_1, …, o_m}$
- each object $o_j$ is represented by its position feature $p_j$ and its 2048-dimensional RoI feature $f_j$
- learn a position-aware embedding $v_j$ by adding outputs of 2 fully-connected layers
$$
\hat{f}_j = \mathrm{LayerNorm}(W_F f_j + b_F) \\
\hat{p}_j = \mathrm{LayerNorm}(W_P p_j + b_P) \\
v_j = (\hat{f}_j + \hat{p}_j) / 2
$$
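
A minimal PyTorch-style sketch of both input embeddings defined above (word-level and object-level). The vocabulary size, maximum sentence length, hidden size, and 4-dimensional position feature are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LxmertInputEmbeddings(nn.Module):
    """Word-level sentence embeddings and object-level image embeddings
    (a sketch; the sizes below are assumptions, not the paper's exact config)."""

    def __init__(self, vocab_size=30522, max_len=512, hidden=768,
                 roi_dim=2048, pos_dim=4):
        super().__init__()
        # word-level: WordEmbed + IdxEmbed, then LayerNorm of their sum
        self.word_embed = nn.Embedding(vocab_size, hidden)
        self.idx_embed = nn.Embedding(max_len, hidden)
        self.word_ln = nn.LayerNorm(hidden)
        # object-level: two fully-connected layers (RoI feature and position),
        # each followed by its own LayerNorm, then averaged
        self.feat_fc = nn.Linear(roi_dim, hidden)
        self.feat_ln = nn.LayerNorm(hidden)
        self.pos_fc = nn.Linear(pos_dim, hidden)
        self.pos_ln = nn.LayerNorm(hidden)

    def embed_words(self, token_ids):               # (batch, n)
        idx = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.word_ln(self.word_embed(token_ids) + self.idx_embed(idx))

    def embed_objects(self, roi_feats, positions):  # (batch, m, 2048), (batch, m, 4)
        f_hat = self.feat_ln(self.feat_fc(roi_feats))
        p_hat = self.pos_ln(self.pos_fc(positions))
        return (f_hat + p_hat) / 2
```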
2.2 Encoders
single-modality encoders
‘self’ means a self-attention sub-layer: the query vectors come from the same input as the keys and values (each modality attends to itself)
the feed-forward sub-layer is further composed of two fully-connected sub-layers
‘+’ means a residual connection followed by layer normalization
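
A hedged PyTorch-style sketch of one single-modality encoder layer as described above: a self-attention sub-layer and a two-layer feed-forward sub-layer, each wrapped with a residual connection and layer normalization. The head count, feed-forward width, and GELU activation are assumptions.

```python
import torch.nn as nn

class SingleModalityLayer(nn.Module):
    """One 'self' + feed-forward layer; both sub-layers use residual + LayerNorm.
    Head count, feed-forward width, and activation are illustrative assumptions."""

    def __init__(self, hidden=768, heads=12, ff_dim=3072):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.attn_ln = nn.LayerNorm(hidden)
        # feed-forward sub-layer: two fully-connected sub-layers
        self.ff = nn.Sequential(nn.Linear(hidden, ff_dim), nn.GELU(),
                                nn.Linear(ff_dim, hidden))
        self.ff_ln = nn.LayerNorm(hidden)

    def forward(self, x):
        # self-attention: queries, keys, and values all come from x itself
        attn_out, _ = self.self_attn(x, x, x)
        x = self.attn_ln(x + attn_out)        # '+' = residual, then LayerNorm
        return self.ff_ln(x + self.ff(x))
```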
- Cross-Modality Encoder
- two unidirectional cross-attention sub-layers: one from language to vision and one from vision to language
- exchange the information and align the entities between the two modalities in order to learn joint cross-modality representations
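
A hedged sketch of the two unidirectional cross-attention sub-layers that exchange information between the modalities. The head count and which direction the paper labels "language to vision" are assumptions; the per-modality self-attention and feed-forward sub-layers that also appear in each cross-modality layer are omitted for brevity.

```python
import torch.nn as nn

class CrossAttentionSublayers(nn.Module):
    """Two unidirectional cross-attention sub-layers (a sketch).
    Each direction uses residual + LayerNorm, as in the other sub-layers."""

    def __init__(self, hidden=768, heads=12):
        super().__init__()
        self.lang_query_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.vis_query_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.lang_ln = nn.LayerNorm(hidden)
        self.vis_ln = nn.LayerNorm(hidden)

    def forward(self, lang, vis):
        # language queries attend over vision keys/values, and vice versa
        l_out, _ = self.lang_query_attn(lang, vis, vis)
        v_out, _ = self.vis_query_attn(vis, lang, lang)
        lang = self.lang_ln(lang + l_out)     # residual + LayerNorm
        vis = self.vis_ln(vis + v_out)
        return lang, vis
```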
2.3 Output Representations
the outputs:
- the vision output
- the language output
- the cross-modality output: append a special token [CLS] before the sentence words; the corresponding feature vector of this special token in the language feature sequence is used as the cross-modality output
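
A tiny sketch of taking the three outputs from the final feature sequences; the function name is hypothetical, and the [CLS] vector is read from index 0 because the token is prepended to the sentence.

```python
def split_outputs(lang_feats, vis_feats):
    """Return (language output, vision output, cross-modality output).

    lang_feats: (batch, 1 + n, hidden) - [CLS] prepended before the n words
    vis_feats:  (batch, m, hidden)
    """
    cross_modality_output = lang_feats[:, 0]  # feature vector of [CLS]
    return lang_feats, vis_feats, cross_modality_output
```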