
Code: https://github.com/ml-research/self-expanding-neural-networks


Abstract (https://arxiv.org/pdf/2307.04526.pdf)

The results of training neural networks depend heavily on the architecture chosen; even modifications of only the network's size, however small, typically require restarting the training process. In contrast, we begin training with a small architecture, increase its capacity only as the problem requires, and avoid interfering with previous optimization. We therefore introduce a natural-gradient-based approach which intuitively expands both the width and depth of a neural network when this would likely reduce the hypothetical converged training loss substantially. We prove an upper bound on the "rate" at which neurons are added, together with a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks on classification and regression problems, including those where the appropriate architecture size is largely uncertain a priori.

Self-Expanding Neural Networks, contents:

How to add: expanding without changing the overall function

When to add: deciding whether more capacity is useful

What to add: determining the initial value of new neurons

Where to add: completing the algorithm

Bounds on convergence of expansion

Efficiently computing a lower bound on score increase

Introduction:

Correctly sizing a model's capacity for an arbitrary task is extremely challenging, particularly when the task itself is not yet well understood. This challenge can be sidestepped by choosing an architecture so large that a poor solution is unlikely [19], for instance due to the double descent phenomenon. However, since it is difficult to predict what size is large enough, in practice this usually requires heavily over-parameterized networks [22] [12] [11]. Could one not instead detect that a network's existing capacity is insufficient and add more neurons when and where they are needed? Indeed, biological neural networks grow by adding new neurons to the existing network through the process of neurogenesis. A popular review [9] discusses the comparatively recent finding that this process remains active in the adult mammalian brain [23], and [13] [5] identify it as a key capability underpinning lifelong learning. Inspired by this, we propose an analogous process that adds neurons and layers to an artificial neural network during training, based on a local notion of "sufficient capacity" derived from first principles closely related to the natural gradient [1] [17].

Any method for artificial neurogenesis must answer three questions in order to avoid the problem of locally insufficient capacity [6]. It must determine when the current capacity is insufficient and neurons must therefore be added. It must identify where these neurons should be introduced. Finally, it must choose what initialization is appropriate for these neurons. These questions, if they are addressed in the literature at all, are usually tackled piecemeal or in ad-hoc ways. For example, very few methods address the question of what [6] [26]. When is answered either by assuming a predetermined schedule [26] [21] or by waiting for the training loss to converge [27] [25], neither of which is informative about where.


From a mathematical perspective, these degrees of freedom available to the optimizer are given by the image of the parameter space under the Jacobian, and the derivative of the loss in function space will not in general lie in this subspace. It is however possible to project this derivative onto that subspace, and the natural gradient, F⁻¹g, is exactly the change in parameters which changes the function according to this projection. In order to measure the size of that projection for a given parameterization, we introduce the natural expansion score η = gᵀF⁻¹g. Specifically, the capacity of a neural network is locally insufficient when this score is small for the current parameterization. We therefore add neurons when this substantially increases η, where they will maximally increase η, and choose what initialization to use for the new parameters according to how it increases η.
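To make η concrete, here is a minimal numerical sketch (ours, not the authors' code) that evaluates η = gᵀF⁻¹g using an empirical Fisher assembled from per-example gradients; the damping term and the choice of empirical Fisher are our assumptions, as the paper may construct F differently.

```python
# Minimal sketch: natural expansion score eta = g^T F^{-1} g.
# Assumptions (ours): F is a damped empirical Fisher built from
# per-example gradients; the paper may use a different Fisher estimate.
import numpy as np

def natural_expansion_score(per_example_grads: np.ndarray, damping: float = 1e-3) -> float:
    """per_example_grads: shape (N, P), one flattened gradient per example."""
    n = per_example_grads.shape[0]
    g = per_example_grads.mean(axis=0)                     # average gradient, shape (P,)
    fisher = per_example_grads.T @ per_example_grads / n   # empirical Fisher, shape (P, P)
    fisher += damping * np.eye(fisher.shape[0])            # damping keeps F invertible
    x = np.linalg.solve(fisher, g)                         # solve F x = g instead of forming F^{-1}
    return float(g @ x)                                    # eta = g^T F^{-1} g

# Toy usage: 64 per-example gradients of a 10-parameter model.
rng = np.random.default_rng(0)
grads = rng.normal(size=(64, 10))
print(natural_expansion_score(grads))
```

A small η means the function-space gradient component reachable with the current parameters is short, which is exactly the local signal used to decide that more capacity would help.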

To summarize, our contributions are:

  1. We introduce the natural expansion score which measures the increase in rate of loss reduction under natural gradient descent when width or depth is added to a neural network.

  2. We show how such additions may be made during training without altering the function represented by the network. Our neurogenesis-inspired Self-Expanding Neural Networks (SENN) thus avoid interfering with previous optimization or requiring restarts of training (see the function-preserving widening sketch below).

  3. We prove that the number of neurons added simultaneously in SENN is bounded. We further introduce a computationally efficient approximation as a provable lower bound to increases in natural expansion score resulting from additions.

  4. We demonstrate SENN's effectiveness for regression and classification.

In the remainder of this paper, we proceed as follows: in section 2 we summarize existing growth methods, in section 3 we then describe SENN, and in section 4 we illustrate its operation in practice.
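Contribution 2 states that capacity is added without altering the function the network currently represents. The following sketch (ours, purely illustrative) shows the standard zero-fan-out way to widen a hidden layer function-preservingly; SENN additionally chooses the new parameters so as to maximize the natural expansion score, which this toy example does not attempt.

```python
# Sketch of function-preserving widening for a one-hidden-layer ReLU MLP.
# The new hidden unit receives arbitrary incoming weights but zero outgoing
# weights, so the network's output is unchanged by the expansion.
import numpy as np

def widen_hidden_layer(W1, b1, W2, new_in_weights, new_bias=0.0):
    """W1: (H, D), b1: (H,), W2: (O, H). Returns the widened (W1', b1', W2')."""
    W1_new = np.vstack([W1, new_in_weights[None, :]])      # new unit's fan-in row
    b1_new = np.append(b1, new_bias)
    W2_new = np.hstack([W2, np.zeros((W2.shape[0], 1))])   # zero fan-out column -> same function
    return W1_new, b1_new, W2_new

def forward(x, W1, b1, W2):
    h = np.maximum(0.0, W1 @ x + b1)                       # ReLU hidden layer
    return W2 @ h

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
x = rng.normal(size=3)
W1w, b1w, W2w = widen_hidden_layer(W1, b1, W2, rng.normal(size=3))
assert np.allclose(forward(x, W1, b1, W2), forward(x, W1w, b1w, W2w))  # output preserved
```

Because the current output is untouched, training can continue from where it left off, and subsequent (natural) gradient steps are what make the new unit contribute.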



We argue that by inspecting the degrees of freedom of the optimizer in function space, one may not only strike faster in answer to when, but answer where and what in the same stroke.

Please refer to the original paper for the full content.