
Code: https://github.com/ml-research/self-expanding-neural-networks


Abstract (https://arxiv.org/pdf/2307.04526.pdf)

The results of training neural networks depend heavily on the architecture chosen; even modifications of only the network's size, however small, typically require restarting the training process. In contrast, we begin training with a small architecture, increase its capacity only as the problem requires, and avoid interfering with previous optimization. We therefore introduce a natural-gradient-based approach which intuitively expands both the width and depth of a neural network when this would likely reduce the hypothetical converged training loss substantially. We prove an upper bound on the "rate" at which neurons are added, together with a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks on classification and regression problems, including those where the appropriate architecture size is largely uncertain a priori.

Self-Expanding Neural Networks, contents:

How to add: expanding without changing the overall function

When to add: deciding whether more capacity is useful

What to add: determining the initial value of new neurons

Where to add: completing the algorithm

Bounds on convergence of expansion

Efficiently computing a lower bound on score increase

Introduction:

Correctly sizing a model's capacity for an arbitrary task is extremely challenging, particularly when the task itself is not yet well understood. This challenge can be sidestepped by choosing an architecture so large that a poor solution is unlikely [19], for instance due to the double descent phenomenon. However, since it is difficult to predict what size is large enough, in practice this usually requires heavily over-parameterized networks [22] [12] [11]. Could one not instead detect that a network's existing capacity is insufficient and add more neurons when and where they are needed? Indeed, biological neural networks grow by adding new neurons to the existing network through the process of neurogenesis. A popular review [9] discusses the comparatively recent finding that this process remains active in the adult mammalian brain [23], and [13] [5] identify it as a key capability underpinning lifelong learning. Inspired by this, we propose an analogous process that adds neurons and layers to an artificial neural network during training, based on a local notion of "sufficient capacity" derived from first principles closely related to the natural gradient [1] [17].

Any method for artificial neurogenesis must answer three questions in order to avoid the problem of locally insufficient capacity [6]. It must determine when the current capacity is insufficient and neurons must therefore be added. It must identify where these neurons should be introduced. Finally, it must choose what initialization is appropriate for these neurons. These questions, if they are addressed in the literature at all, are usually tackled piecemeal or in ad-hoc ways. For example, very few methods address the question of what [6] [26]. When is answered either by assuming a predetermined schedule [26] [21] or by waiting for the training loss to converge [27] [25], neither of which is informative about where.


From a mathematical perspective, these degrees of freedom available to the optimizer are given by the image of the parameter space under the Jacobian, and the derivative of the loss in function space will not in general lie in this subspace. It is however possible to project this derivative onto that subspace, and the natural gradient, F⁻¹g, is exactly the change in parameters which changes the function according to this projection. In order to measure the size of that projection for a given parameterization, we introduce the natural expansion score η = gᵀF⁻¹g. Specifically, the capacity of a neural network is locally insufficient when this score is small for the current parameterization. We therefore add neurons when this substantially increases η, where they will maximally increase η, and choose what initialization to use for the new parameters according to how it increases η.
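To make η concrete, here is a minimal numerical sketch (ours, not the authors' code) that evaluates η = gᵀF⁻¹g using an empirical Fisher assembled from per-example gradients; the damping term and the choice of empirical Fisher are our assumptions, as the paper may construct F differently.

```python
# Minimal sketch: natural expansion score eta = g^T F^{-1} g.
# Assumptions (ours): F is a damped empirical Fisher built from
# per-example gradients; the paper may use a different Fisher estimate.
import numpy as np

def natural_expansion_score(per_example_grads: np.ndarray, damping: float = 1e-3) -> float:
    """per_example_grads: shape (N, P), one flattened gradient per example."""
    n = per_example_grads.shape[0]
    g = per_example_grads.mean(axis=0)                     # average gradient, shape (P,)
    fisher = per_example_grads.T @ per_example_grads / n   # empirical Fisher, shape (P, P)
    fisher += damping * np.eye(fisher.shape[0])            # damping keeps F invertible
    x = np.linalg.solve(fisher, g)                         # solve F x = g instead of forming F^{-1}
    return float(g @ x)                                    # eta = g^T F^{-1} g

# Toy usage: 64 per-example gradients of a 10-parameter model.
rng = np.random.default_rng(0)
grads = rng.normal(size=(64, 10))
print(natural_expansion_score(grads))
```

A small η means the function-space gradient component reachable with the current parameters is short, which is exactly the local signal used to decide that more capacity would help.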

To summarize, our contributions are:

  1. We introduce the natural expansion score which measures the increase in rate of loss reduction under natural gradient descent when width or depth is added to a neural network.

  2. We show how such additions may be made during training without altering the function represented by the network. Our neurogenesis-inspired Self-Expanding Neural Networks (SENN) thus avoid interfering with previous optimization or requiring restarts of training (see the function-preserving widening sketch below).

  3. We prove that the number of neurons added simultaneously in SENN is bounded. We further introduce a computationally efficient approximation as a provable lower bound to increases in natural expansion score resulting from additions.

  4. We demonstrate SENN's effectiveness for regression and classification.

In the remainder of this paper, we proceed as follows: in section 2 we summarize existing growth methods, in section 3 we then describe SENN, and in section 4 we illustrate its operation in practice.
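Contribution 2 states that capacity is added without altering the function the network currently represents. The following sketch (ours, purely illustrative) shows the standard zero-fan-out way to widen a hidden layer function-preservingly; SENN additionally chooses the new parameters so as to maximize the natural expansion score, which this toy example does not attempt.

```python
# Sketch of function-preserving widening for a one-hidden-layer ReLU MLP.
# The new hidden unit receives arbitrary incoming weights but zero outgoing
# weights, so the network's output is unchanged by the expansion.
import numpy as np

def widen_hidden_layer(W1, b1, W2, new_in_weights, new_bias=0.0):
    """W1: (H, D), b1: (H,), W2: (O, H). Returns the widened (W1', b1', W2')."""
    W1_new = np.vstack([W1, new_in_weights[None, :]])      # new unit's fan-in row
    b1_new = np.append(b1, new_bias)
    W2_new = np.hstack([W2, np.zeros((W2.shape[0], 1))])   # zero fan-out column -> same function
    return W1_new, b1_new, W2_new

def forward(x, W1, b1, W2):
    h = np.maximum(0.0, W1 @ x + b1)                       # ReLU hidden layer
    return W2 @ h

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
x = rng.normal(size=3)
W1w, b1w, W2w = widen_hidden_layer(W1, b1, W2, rng.normal(size=3))
assert np.allclose(forward(x, W1, b1, W2), forward(x, W1w, b1w, W2w))  # output preserved
```

Because the current output is untouched, training can continue from where it left off, and subsequent (natural) gradient steps are what make the new unit contribute.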



We argue that by inspecting the degrees of freedom of the optimizer in function space, one may not only strike faster in answer to when, but answer where and what in the same stroke.

Please refer to the original paper for the full content.