Plastic Neural Networks

Plastic NN

Abstract

How can we build agents that keep learning from experience after their initial training?

We take inspiration from the main mechanism of learning in biological brains: synaptic plasticity.

Synaptic plasticity is carefully tuned by evolution to produce efficient lifelong learning.

We show that plasticity, just like connection weights, can be optimized by gradient descent in large recurrent networks with Hebbian plastic connections.

First, recurrent plastic networks with more than two million parameters can be trained to memorize and reconstruct sets of novel, high-dimensional natural images not seen during training.

Crucially, traditional non-plastic recurrent networks fail to solve this task.

Furthermore, trained plastic networks can also solve generic meta-learning tasks such as the Omniglot task, with competitive results and little parameter overhead.

Finally, in reinforcement learning settings, plastic networks outperform a non-plastic equivalent in a maze exploration task.

We conclude that differentiable plasticity may provide a powerful novel approach to the learning-to-learn problem.

Introduction: the problem of "learning to learn"

Many of the recent spectacular successes in machine learning involve learning one complex task very well, through extensive training over thousands or millions of training examples.

After learning is complete, the agent's knowledge is fixed and unchanging;

if the agent is to be applied to a different task, it must be re-trained, again requiring a very large number of new training examples.

By contrast, biological agents exhibit a remarkable ability to learn quickly and efficiently from ongoing experience:

  • animals can learn to navigate and remember the location of food sources, discover and remember rewarding or aversive properties of novel objects and situations, etc. - often from a single exposure.

An additional benefit of autonomous learning abilities

  • In many tasks (e.g. object recognition, maze navigation, etc.)
  • the bulk of fixed, unchanging structure in the task can be stored in the fixed knowledge of the agent, leaving only the changing, contingent parameters of the specific situation to be learned from experience.

As a result, learning the actual specific instance of the task at hand (that is, the actual latent parameters that do vary across multiple instances of the general task) can be extremely fast, requiring only a few or even a single experience with the environment.

Several meta-learning methods have been proposed to train agents to learn autonomously.

However, unlike in current approaches, in biological brains long-term learning is thought to occur primarily through synaptic plasticity.

  • The strengthening and weakening of connections between neurons as a result of neural activity.
  • This plasticity has been carefully tuned by evolution over millions of years to enable efficient learning during the lifetime of each individual.
  • While multiple forms of synaptic plasticity exist, many of them build upon the general principle known as Hebb's rule.

Hebb’s rule

  • If a neuron repeatedly takes part in making another neuron fire, the connection between them is strengthened.

    • (often roughly summarized as "neurons that fire together, wire together")

Designing neural networks with plastic connections has long been explored with evolutionary algorithms, but has so far been relatively less studied in deep learning.

However, given the spectacular results of gradient descent in designing traditional non-plastic neural networks for complex tasks, it would be of great interest to expand backpropagation training to networks with plastic connections - optimizing through gradient descent not only the base weights, but also the amount of plasticity in each connection.

We previously demonstrated the theoretical feasibility and analytical derivability of this approach.

Here we show that this approach can train large networks for non-trivial tasks.

To demonstrate our approach, we apply it to three different types of tasks: complex pattern memorization (including natural images),

one-shot classification (on the Omniglot dataset), and reinforcement learning (in a maze exploration problem).

We show that plastic networks provide competitive results on Omniglot, improve performance in maze exploration, and outperform advanced non-plastic recurrent networks (LSTMs) by orders of magnitude in complex pattern memorization.

This result is interesting not only for opening up a new avenue of investigation in gradient-based neural network training, but also for showing that meta-properties of neural structures normally attributed to evolution or a priori design are in fact amenable to gradient descent, hinting at a whole class of heretofore unimagined meta-learning algorithms.

Differentiable plasticity

To train plastic networks with backpropagation, a plasticity rule must be specified.

Here we choose a flexible formulation that keeps separate plastic and non-plastic components for each connection

while allowing multiple Hebbian rules to be easily implemented within the framework.

A connection between any two neurons $i$ and $j$ has both a fixed component and a plastic component.

The fixed part is just a traditional connection weight $w_{i,j}$,

The plastic part is stored in a Hebbian trace $Hebb_{i,j}$, which varies during a lifetime according to ongoing inputs and outputs (note that we use "lifetime" and "episode" interchangeably).

In the simplest case studied here, the Hebbian trace is simply a running average of the product of pre- and post-synaptic activity.

The relative importance of plastic and fixed components in the connection is structurally determined by the plasticity coefficient $\alpha_{i,j}$, which multiplies the Hebbian trace to form the full plastic component of the connection.

Thus, at any time, the total effective weight of the connection between neurons $i$ and $j$ is the sum of the baseline (fixed) weight $w_{i,j}$, plus the Hebbian trace $Hebb_{i,j}$ multiplied by the plasticity coefficient $\alpha_{i,j}$.

The output $x_j(t)$ of neuron $j$, and the update of the Hebbian trace, are computed as follows (a code sketch follows this list):

  • $x_j(t) = \sigma\Big\{ \sum_{i \in \mathrm{inputs}} \big[ w_{i,j} + \alpha_{i,j}\,Hebb_{i,j}(t) \big]\, x_i(t-1) \Big\}$
  • $Hebb_{i,j}(t+1) = \eta\, x_i(t-1)\, x_j(t) + (1-\eta)\, Hebb_{i,j}(t)$
  • $\sigma$ is a nonlinear function
  • inputs denotes the set of all neurons providing input to neuron $j$.
  • In this way, depending on the values of $w_{i,j}$ and $\alpha_{i,j}$,

    • a connection can be fully fixed (if $\alpha = 0$),
    • fully plastic with no fixed component (if $w = 0$),
    • or have both a fixed and a plastic component.
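To make these equations concrete, below is a minimal PyTorch sketch of one step of a plastic recurrent layer. It assumes $\sigma = \tanh$ and the simple decaying Hebbian update above; the class name PlasticRNNCell, the initialization scales, and the initial value of eta are illustrative choices, not the authors' released code.

```python
import torch
import torch.nn as nn

class PlasticRNNCell(nn.Module):
    """One step of a recurrent layer with differentiable Hebbian plasticity.

    Effective weight of each connection = w + alpha * Hebb, where w and alpha
    are trained by gradient descent and Hebb is updated during the episode.
    """
    def __init__(self, n):
        super().__init__()
        self.w = nn.Parameter(0.01 * torch.randn(n, n))      # fixed (baseline) weights
        self.alpha = nn.Parameter(0.01 * torch.randn(n, n))  # per-connection plasticity coefficients
        self.eta = nn.Parameter(torch.tensor(0.01))          # single shared plasticity learning rate

    def forward(self, x_prev, hebb):
        # x_prev: (batch, n) activations at t-1; hebb: (batch, n, n) Hebbian traces
        eff_w = self.w + self.alpha * hebb                    # (batch, n, n) effective weights
        x = torch.tanh(torch.bmm(x_prev.unsqueeze(1), eff_w).squeeze(1))
        # Simple decaying Hebbian update: running average of pre * post activity
        hebb = (1 - self.eta) * hebb + self.eta * torch.bmm(
            x_prev.unsqueeze(2), x.unsqueeze(1))              # outer product x_i(t-1) * x_j(t)
        return x, hebb

    def initial_hebb(self, batch_size, device=None):
        n = self.w.shape[0]
        return torch.zeros(batch_size, n, n, device=device)   # Hebb is reset to zero each episode
```

Because the Hebbian traces are differentiable functions of the activations, backpropagating through a whole episode yields gradients for w, alpha, and eta, which is exactly what allows the amount of plasticity itself to be optimized.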

The Hebbian trace $Hebb_{i,j}$

  • It is initialized to 0 at the beginning of each lifetime.
  • The parameters $w_{i,j}$ and $\alpha_{i,j}$ are preserved over the whole lifetime; they are the structural parameters of the network, optimized by gradient descent to maximize expected performance over a lifetime.
  • $\eta$, the plasticity learning rate, is also an optimized parameter of the network. In this paper, all connections share the same $\eta$ value, so it is a single scalar learning parameter for the whole network.

    • Note that $\eta$ appears as a weight-decay term, which prevents runaway positive feedback in the Hebbian traces.
  • However, because of this weight decay, the Hebbian traces decay to zero in the absence of input.

  • Fortunately, other, more complex Hebbian rules can maintain stable weight values indefinitely in the absence of stimulation.
  • A well-known example is Oja's rule; we can therefore replace the Hebbian update (the second equation above) with Oja's rule.
  • This allows networks to be trained to form memories with arbitrary duration.
  • To illustrate the flexibility of our approach, we demonstrate both rules in the experiments reported below (both updates are sketched in code after this list).
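A minimal sketch of the two trace updates, written as standalone PyTorch functions. The decaying rule follows the equation above; the Oja variant is written in the commonly used form $Hebb_{i,j} \leftarrow Hebb_{i,j} + \eta\, x_j(t)\,\big(x_i(t-1) - x_j(t)\,Hebb_{i,j}\big)$, which is an assumption of this sketch rather than a formula quoted from the notes above.

```python
import torch

def hebb_update_decay(hebb, x_pre, x_post, eta):
    """Simple decaying Hebbian rule: a running average of pre * post activity.
    With this rule, traces fade towards zero when inputs are absent."""
    outer = torch.outer(x_pre, x_post)        # outer[i, j] = x_i(t-1) * x_j(t)
    return (1 - eta) * hebb + eta * outer

def hebb_update_oja(hebb, x_pre, x_post, eta):
    """Oja-style rule (assumed form): keeps traces bounded without decaying
    them to zero in the absence of stimulation, allowing longer-lasting memories."""
    return hebb + eta * x_post.unsqueeze(0) * (x_pre.unsqueeze(1) - x_post.unsqueeze(0) * hebb)

# Tiny usage example with illustrative sizes:
n = 4
hebb = torch.zeros(n, n)
x_pre, x_post = torch.randn(n), torch.randn(n)
hebb = hebb_update_decay(hebb, x_pre, x_post, eta=0.03)
```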

Experiments and Results

These experiments are designed to show that differentiable plasticity actually works within a meta-learning framework, and to demonstrate that it provides a decisive advantage.

Pattern memorization: binary patterns

  • To illustrate these different plasticity approaches, we first apply them to the task of quickly memorizing sets of arbitrary high-dimensional patterns and reconstructing those patterns when exposed to partial, degraded versions of them.
  • Networks that can perform this task are known as content-addressable memories, or auto-associative networks.
  • This task is a useful test because hand-designed recurrent networks with Hebbian plastic connections can successfully solve it for binary patterns.
  • Thus, if differentiable plasticity is to be of any help, it should be able to solve this problem automatically, that is, to automatically design networks that can perform a task which existing hand-designed networks can already perform.
  • Figure 1 depicts one lifetime (episode) of this task: the network is shown a set of 5 binary patterns in succession.
  • Each binary pattern is composed of 1,000 elements, each of which is either 1 or -1 (shown as red and blue, respectively).
  • Each pattern is shown for 10 time steps,

    • with three time steps of zero input between presentations.
    • The whole sequence of patterns is presented 3 times, in random order.
    • Then one of the patterns is chosen at random and degraded by setting half of its bits to zero.
    • This degraded pattern is then fed to the network as input.
  • The network's task is to output the reconstructed, correct full pattern, drawing on its memory to complete the missing bits of the degraded pattern (light blue and red at the bottom of Figure 1); a data-generation sketch for one such episode follows this list.
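As a concrete illustration of the episode structure just described, here is a minimal NumPy sketch of generating one episode. The function name make_episode, the default argument values, and the number of time steps for which the degraded probe is shown are illustrative assumptions.

```python
import numpy as np

def make_episode(n_patterns=5, pattern_size=1000, show_steps=10, gap_steps=3,
                 n_presentations=3, probe_steps=3, rng=None):
    """Build the input sequence and reconstruction target for one episode (lifetime)."""
    rng = np.random.default_rng() if rng is None else rng
    patterns = rng.choice([-1.0, 1.0], size=(n_patterns, pattern_size))

    steps = []
    for _ in range(n_presentations):                       # whole set presented 3 times
        for idx in rng.permutation(n_patterns):            # patterns in random order
            steps += [patterns[idx]] * show_steps          # each pattern shown for 10 steps
            steps += [np.zeros(pattern_size)] * gap_steps  # 3 steps of zero input in between

    target = patterns[rng.integers(n_patterns)]            # pattern to be reconstructed
    degraded = target.copy()
    degraded[rng.permutation(pattern_size)[:pattern_size // 2]] = 0.0  # zero out half the bits
    steps += [degraded] * probe_steps                      # degraded probe as final input

    return np.stack(steps), target                         # inputs: (T, 1000), target: (1000,)
```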

  • The architecture (Figure 1, bottom) is a fully recurrent network with one neuron per pattern element, plus one fixed-output (bias) neuron, for a total of 1,001 neurons. Input patterns are provided by clamping the value of each neuron to the value of the corresponding element in the pattern, if that value is not zero;
  • for zero-valued inputs in the degraded pattern, the corresponding neurons receive no pattern input and get their input solely from lateral connections, from which they must reconstruct the correct, expected output.
  • Outputs are read directly from the activations of the neurons.
  • The network's performance is evaluated only on the final time step,
  • by computing the summed squared error between the final network output and the correct pattern.
  • The gradient of this error with respect to the $w$ and $\alpha$ coefficients is then computed by backpropagation,
  • and these coefficients are optimized with an Adam solver with a learning rate of 0.001.
  • In this experiment, we use the simple decaying Hebbian formula to update the Hebbian traces.
  • Note that the network has two trainable parameter matrices ($w$ and $\alpha$), for a total of 1,001 × 1,001 × 2 = 2,004,002 trainable parameters.
  • After about 200 episodes, the error has decreased to the point where it no longer changes; a minimal training-loop sketch follows this list.
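Tying the pieces together, here is a minimal sketch of the training loop just described, reusing the hypothetical PlasticRNNCell and make_episode from the earlier sketches. The clamping scheme, the fixed bias value of 1.0, and the number of training episodes are assumptions of the sketch, not details confirmed by the notes.

```python
import torch

n = 1001                                        # 1,000 pattern elements + 1 bias neuron
cell = PlasticRNNCell(n)                        # plastic recurrent layer from the earlier sketch
optimizer = torch.optim.Adam(cell.parameters(), lr=0.001)   # Adam solver, learning rate 0.001

for episode in range(500):                      # error roughly plateaus after ~200 episodes
    inputs, target = make_episode()             # one lifetime of pattern presentations
    inputs = torch.from_numpy(inputs).float()
    target = torch.from_numpy(target).float()

    x = torch.zeros(1, n)
    hebb = cell.initial_hebb(batch_size=1)
    for t in range(inputs.shape[0]):
        x, hebb = cell(x, hebb)
        # Clamp neurons whose pattern input is non-zero; the bias neuron is fixed to 1.
        clamp = torch.cat([inputs[t], torch.ones(1)]).unsqueeze(0)
        x = torch.where(clamp != 0, clamp, x)

    # Performance is evaluated on the final time step only: summed squared error.
    loss = ((x[0, :1000] - target) ** 2).sum()
    optimizer.zero_grad()
    loss.backward()                              # gradients of w, alpha (and eta) via backprop
    optimizer.step()
    # Two trainable matrices (w, alpha): 2 * 1001 * 1001 = 2,004,002 parameters, plus scalar eta.
```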
