Low-Parameter Federated Learning with Large Language Models



  • Metadata

      • Tags: #LLM #Federated-Learning
      • Authors: [[Jingang Jiang]], [[Xiangyang Liu]], [[Chenyou Fan]]
  • Abstract

We study few-shot Natural Language Understanding (NLU) tasks with Large Language Models (LLMs) in federated learning (FL) scenarios. It is a challenging task due to limited labeled data and communication capacities in FL, especially with mobile devices. Recent studies show LLMs can be prompted to perform few-shot NLU tasks like sentiment analysis and arithmetic reasoning. However, the huge sizes of LLMs result in high computation and communication costs, making classical FL schemes impractical. To address these challenges, we propose Low-Parameter Federated Learning (LP-FL). LP-FL combines few-shot prompt learning from LLMs with efficient communication and federating techniques. Our approach enables federated clients to assign soft labels to unlabeled data using gradually learned knowledge from the global model. Through iterative soft-label assigning, we continually expand the labeled set during the FL process. Additionally, to reduce computation and communication costs, LP-FL utilizes the Low-Rank Adaptation (LoRA) technique for compact learnable parameter construction, efficient local model fine-tuning, and affordable global model federation. LP-FL consistently outperforms Full-Parameter Federated Learning (FP-FL) in sentiment analysis tasks across various FL settings. Its resistance to overfitting allows LP-FL to equal or surpass centralized training in few-shot scenarios.

Note

Challenges of LLMs in FL:

  • limited labeled data
  • limited communication capacity (especially on mobile devices)

Low-Parameter Federated Learning (LP-FL):

Combines few-shot prompt learning from LLMs with efficient communication and federation techniques:

  • Clients use knowledge from the global model to assign soft labels to unlabeled data, thereby expanding the labeled set.
  • Low-Rank Adaptation (LoRA) for efficient local model fine-tuning
    Goal: an efficient fine-tuning method that reaches the desired results while tuning as few parameters as possible.
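The soft-label bullet above can be sketched as follows. This is a minimal illustration, assuming the global model yields one score per label for each unlabeled example and that a soft label is simply the softmax of those scores. The function names and the `temperature` parameter are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def assign_soft_labels(label_scores, temperature=1.0):
    """Turn global-model label scores (n_examples x n_labels) into soft labels.

    `temperature` is a hypothetical knob: values < 1 sharpen, > 1 flatten.
    """
    return softmax(np.asarray(label_scores, dtype=float) / temperature, axis=-1)

def soft_cross_entropy(pred_scores, soft_targets):
    """Mean cross-entropy between the predicted label distribution and soft targets."""
    log_p = np.log(softmax(np.asarray(pred_scores, dtype=float), axis=-1))
    return float(-(soft_targets * log_p).sum(axis=-1).mean())
```

Keeping the full distribution, rather than a hard argmax label, lets a client train on uncertain examples without committing to a possibly wrong class.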

Use case: data is mostly unlabeled and only a few labeled examples are distributed across clients; the goal is to fine-tune a global model effectively in this distributed few-shot setting.

Contributions:

  1. Considers an under-studied task: fine-tuning LLMs on distributed devices with limited communication and local computational power.
  2. Fine-tunes the LLM for text sentiment classification by adding task descriptions to the input examples, and uses a semi-supervised method to augment the dataset.
  3. Fine-tunes only a small subset of the local model parameters, then performs federated averaging over all clients.

Workflow

  1. Fine-tune the global model using the small labeled set (LoRA is trained on the client side).
  2. The server performs FedAvg on the LoRA parameters and sends the updated LoRA back to the clients.
  3. Each client updates its local model with the received LoRA and selects a portion of the unlabeled data for annotation (gradually expanding the labeled set with unlabeled data).
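Step 3 can be sketched as below. The notes do not say how the "portion" of unlabeled data is chosen, so this sketch assumes a hypothetical rule: move the `fraction` of examples on which the current global model is most confident (highest maximum class probability) into the labeled set, together with their soft labels.

```python
import numpy as np

def expand_labeled_set(labeled, unlabeled, soft_labels, fraction=0.1):
    """Move the most confident `fraction` of unlabeled examples into the labeled set.

    labeled: list of (text, soft_label) pairs
    unlabeled: list of texts
    soft_labels: array (len(unlabeled), n_labels) from the current global model
    The confidence-based selection rule is an assumption, not the paper's.
    """
    n_select = int(len(unlabeled) * fraction)
    confidence = soft_labels.max(axis=1)            # max class probability per example
    order = np.argsort(-confidence, kind="stable")  # most confident first
    chosen = set(order[:n_select].tolist())
    new_labeled = labeled + [(unlabeled[i], soft_labels[i]) for i in sorted(chosen)]
    remaining = [u for i, u in enumerate(unlabeled) if i not in chosen]
    return new_labeled, remaining
```

Calling this once per FL round grows the labeled set as the global model improves, which matches the "gradually expanding" behavior described in step 3.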

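$M$ is the LLM.
$V$ is the vocabulary.
$T_k$ is client $k$'s labeled set and $U_k$ is its unlabeled set.
An input is a sequence $x=(s_1,\cdots,s_n)$ with $s_j\in V$.
$P(x)$ adds a task-description phrase containing a mask token to the input sequence $x$.
The LLM $M$ predicts, for every $v\in V$, the probability that $v$ fills the mask-token position in $P(x)$.
Given a sequence $x$, a score is computed for each label $l$ in the label set $L$.

A softmax over these scores gives a probability distribution over the labels; the predicted probability of the true label word is then evaluated with the standard cross-entropy loss.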
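A minimal sketch of this prompt-based scoring, assuming a PET-style verbalizer $v: L \to V$ that maps each label to one vocabulary word (the notes only mention "the true word", so the mapping and token ids here are hypothetical). The score is $s(l\mid x) = M(v(l)\mid P(x))$, the label distribution is $q(l\mid x) = \exp s(l\mid x) / \sum_{l'\in L}\exp s(l'\mid x)$, and the loss is $-\log q(y\mid x)$ for the true label $y$. Using raw logits or log-probabilities as $s$ gives the same $q$, since the two differ by a constant per example.

```python
import numpy as np

def label_scores(mask_logits, verbalizer):
    """s(l|x): the LLM's logit at the mask position for the word verbalizing label l.

    mask_logits: array of shape (|V|,), M's output at the mask token of P(x)
    verbalizer: dict label -> vocabulary index (hypothetical mapping)
    """
    labels = list(verbalizer)
    return labels, np.array([mask_logits[verbalizer[l]] for l in labels])

def label_distribution(scores):
    """Softmax over the label scores only, not over the whole vocabulary."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def prompt_ce_loss(mask_logits, verbalizer, true_label):
    """Standard cross-entropy on the label distribution for one example."""
    labels, scores = label_scores(mask_logits, verbalizer)
    q = label_distribution(scores)
    return float(-np.log(q[labels.index(true_label)]))
```

For example, with a 5-word toy vocabulary and `{"positive": 1, "negative": 2}`, only the two verbalizer logits enter the softmax; every other vocabulary entry is ignored.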

To address the communication burden arising from the LLM's large parameter count, the authors use the Parameter-Efficient Fine-Tuning (PEFT) technique known as Low-Rank Adaptation (LoRA).
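To see why LoRA cuts communication, here is a minimal sketch of a LoRA-adapted linear layer in its standard form $h = W_0x + \frac{\alpha}{r}BAx$, where $W_0$ is frozen and only $A$ and $B$ are trained and transmitted. The rank `r=8`, `alpha=16`, and the 4096 x 4096 matrix size are illustrative assumptions, not settings reported in these notes.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W0 (d x k) plus a trainable low-rank update B @ A of rank r."""

    def __init__(self, W0, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d, k = W0.shape
        self.W0 = W0                                 # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(r, k))  # trainable, small random init
        self.B = np.zeros((d, r))                    # trainable, zero init: no change at start
        self.scale = alpha / r

    def __call__(self, x):
        # h = W0 x + (alpha / r) * B A x ; x has shape (k,) or (batch, k)
        return x @ self.W0.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only these tensors would be sent to the server each round.
        return {"A": self.A, "B": self.B}

def param_counts(d, k, r):
    """Trainable parameters for one d x k matrix: full fine-tuning vs LoRA."""
    return d * k, r * (d + k)
```

For a 4096 x 4096 matrix with rank 8, `param_counts(4096, 4096, 8)` gives 16,777,216 vs 65,536 parameters, a 256x reduction in what each client uploads for that matrix.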

FedAvg is applied to the LoRA parameters trained by the clients.
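A minimal sketch of the server-side FedAvg step over LoRA tensors. It assumes each client uploads a dict of LoRA arrays and that clients are weighted by their number of training examples, as in standard FedAvg; whether the paper uses size weighting or a uniform average is not stated in these notes.

```python
import numpy as np

def fedavg_lora(client_updates, client_sizes):
    """Weighted average of each LoRA tensor across clients.

    client_updates: list of dicts {name: np.ndarray}, e.g. {"A": ..., "B": ...}
    client_sizes: training-example count per client, used as averaging weights
    """
    total = float(sum(client_sizes))
    names = client_updates[0].keys()
    return {
        name: sum((n / total) * upd[name] for upd, n in zip(client_updates, client_sizes))
        for name in names
    }
```

Because FedAvg here operates on the LoRA parameters, $A$ and $B$ are averaged separately. Note that the product of the averages is generally not the average of the products $B_iA_i$, so the aggregated update is an approximation of averaging the clients' full low-rank updates.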

Framework


Post author: lunan
Copyright: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit lunan as the source when reposting!