Parameter-Efficient Transfer Learning


Note

Fine-tuning is parameter-inefficient: an entire new set of model weights is needed for every downstream task.

Proposal: transfer with adapter modules, which yield a compact and extensible model;
only a few trainable parameters are added per task.

Transfer learning in NLP:

  • feature-based transfer
    - pre-training real-valued embedding vectors; the embeddings are then fed to custom downstream models
  • fine-tuning (better performance)
    - copying the weights from a pre-trained network and tuning them on the downstream task (both styles are sketched below)
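A minimal PyTorch sketch contrasting the two styles; the module sizes and layer choices are hypothetical and purely illustrative:

```python
import torch
import torch.nn as nn

pretrained = nn.Embedding(30000, 300)   # stands in for a pre-trained representation

# Feature-based transfer: freeze the pre-trained vectors and train only a custom downstream model.
pretrained.weight.requires_grad = False
downstream = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 2))

# Fine-tuning: copy the pre-trained weights and update them together with a new top layer.
finetuned = nn.Embedding(30000, 300)
finetuned.load_state_dict(pretrained.state_dict())      # copy of the pre-trained weights
head = nn.Linear(300, 2)
optimizer = torch.optim.Adam(list(finetuned.parameters()) + list(head.parameters()), lr=1e-4)
```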

Adapters:

Adapter modules are new layers injected between the layers of the original network. The weights of the original network are left untouched.
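A minimal sketch of a bottleneck adapter in PyTorch; the bottleneck size and the GELU nonlinearity are assumptions for illustration, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: project down, apply a nonlinearity, project back up,
    and add a skip connection so the module can start close to the identity."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)   # d_model -> small bottleneck
        self.up = nn.Linear(bottleneck, d_model)     # bottleneck -> d_model
        self.act = nn.GELU()                         # nonlinearity (assumed choice)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))   # residual connection
```

Because only the down- and up-projections are trained, each adapter adds roughly 2 × d_model × bottleneck parameters per insertion point.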

Comparison:

  • Fine-tuning: the new top layer and the original weights are co-trained.
  • Adapter-tuning: the parameters of the original network are frozen and can therefore be shared across many tasks.
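A sketch of the freezing step in adapter-tuning, assuming a toy backbone whose adapter sub-modules carry "adapter" in their parameter names (a naming convention chosen here for illustration):

```python
import torch.nn as nn

# Toy backbone with injected adapters; only the "adapter*" modules should train.
model = nn.ModuleDict({
    "attention":    nn.Linear(768, 768),
    "adapter_attn": nn.Sequential(nn.Linear(768, 64), nn.GELU(), nn.Linear(64, 768)),
    "ffn":          nn.Linear(768, 768),
    "adapter_ffn":  nn.Sequential(nn.Linear(768, 64), nn.GELU(), nn.Linear(64, 768)),
})

# Freeze the original weights; keep only the adapter parameters trainable.
for name, param in model.named_parameters():
    param.requires_grad = "adapter" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters for this task: {trainable:,} / {total:,}")
```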

Figure (paper's overview figure):
Left: the adapter module is added twice to each Transformer layer.
Right: the adapter is a bottleneck that contains few parameters relative to the attention and feed-forward layers in the original model.
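A sketch of where the two adapters could sit inside a Transformer layer, reusing the `Adapter` class from the sketch above; the layer-norm placement, dimensions, and head count are illustrative assumptions:

```python
import torch.nn as nn

class TransformerLayerWithAdapters(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12, bottleneck: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.adapter_attn = Adapter(d_model, bottleneck)   # first insertion: after attention
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.adapter_ffn = Adapter(d_model, bottleneck)    # second insertion: after feed-forward
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h, _ = self.attn(x, x, x)                          # multi-head self-attention
        x = self.norm1(x + self.adapter_attn(h))           # adapter, skip connection, layer norm
        x = self.norm2(x + self.adapter_ffn(self.ffn(x)))  # same pattern after the feed-forward block
        return x
```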

