Kyle Daruwalla
NeuroAI Scholar at CSHL

I presented my work on a biologically plausible learning rule as a poster at the Spiking Neural networks as Universal Function Approximators (SNUFA '21) workshop. There were many great talks and posters, but I was most excited by the presentations on spiking neural networks (SNNs) being applied to solve real problems.

Our work

I presented our work on learning in biological networks by optimizing the information bottleneck. The success of deep learning has emphasized the importance of depth when training networks to solve complex problems. Depth isn't an issue for artificial neural networks (ANNs), because back-propagation provides a systematic way of assigning credit regardless of depth. Currently, there is no accepted, biologically plausible equivalent of back-propagation for SNNs, though surrogate gradients can be used when the plausibility constraint is relaxed.
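
As a brief aside on that last point, here is one common way a surrogate gradient is implemented for a spiking nonlinearity: a hard threshold in the forward pass, and a smooth "fast sigmoid" derivative in the backward pass. This PyTorch sketch is generic background (the sharpness `beta` is an assumed value), not part of our method.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()          # binary spike output

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        beta = 10.0                     # surrogate sharpness (assumed value)
        # derivative of a "fast sigmoid", used in place of the true (zero almost everywhere) gradient
        return grad_output / (beta * v.abs() + 1.0) ** 2

# usage: spikes = SurrogateSpike.apply(membrane_potential - threshold)
```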

Our work took inspiration from HSIC training for ANNs. Instead of computing a loss at the very end of the network, then propagating the loss backwards, HSIC training optimizes each layer independently. Every layer is updated to minimize

$$
\mathcal{L}_{\text{HSIC}} = \mathrm{HSIC}(\mathbf{z}^\ell, \mathbf{x}) - \gamma\, \mathrm{HSIC}(\mathbf{z}^\ell, \mathbf{y})
$$

where $\mathbf{z}^\ell$ is the output of layer $\ell$, and $\mathbf{x}$ and $\mathbf{y}$ are the input and output of the entire network, respectively. We show that the gradient descent update for this objective can be decomposed into two components: a local Hebbian component and a layer-wise global modulatory signal.
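
For a concrete reference, here is a minimal NumPy sketch of this layer-wise objective using the standard biased empirical HSIC estimator. The Gaussian kernels, bandwidth `sigma`, and value of `gamma` are illustrative assumptions, not the exact settings from our experiments.

```python
import numpy as np

def gaussian_kernel(A, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix for the rows of A."""
    sq = np.sum(A**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * A @ A.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(K, L):
    """Biased empirical HSIC estimate from two kernel matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1)**2

def layer_hsic_loss(z, x, y, gamma=2.0, sigma=1.0):
    """L_HSIC = HSIC(z, x) - gamma * HSIC(z, y) for one layer's outputs z."""
    Kz = gaussian_kernel(z, sigma)
    Kx = gaussian_kernel(x, sigma)
    Ky = gaussian_kernel(y, sigma)
    return hsic(Kz, Kx) - gamma * hsic(Kz, Ky)
```

Here `z`, `x`, and `y` are arrays whose rows are the samples in a batch: the layer outputs, the network inputs, and the targets.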

One challenge in applying this rule directly to SNNs is that the HSIC is computed over a batch of samples, whereas biological networks see samples sequentially, one at a time. To overcome this, we encode a batch as a window of samples over time (i.e. a batch size of $N$ corresponds to the last $N$ samples presented to the network). We show that the local component depends only on the current sample, while the global component depends on the prior samples. We then propose using an auxiliary reservoir network to compute the global component, as shown below.

Our learning rule is a [three-factor Hebbian rule](http://journal.frontiersin.org/Article/10.3389/fncir.2015.00085/abstract). It contains a local component that depends on the current sample, and a global component that depends on past samples. A reservoir is used to compute the global component.
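
To make the structure of the update concrete, here is a toy NumPy sketch of an online three-factor update over a sliding window. Everything in it (the layer, the window length, and especially the stand-in `global_signal` function) is invented for illustration; in our actual rule the global term is derived from the HSIC objective and computed by the reservoir.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
N, d_in, d_out = 32, 10, 5               # window length and toy layer sizes
W = rng.normal(scale=0.1, size=(d_out, d_in))
window = deque(maxlen=N)                 # the last N samples stand in for a batch

def global_signal(window):
    """Stand-in for the layer-wise modulatory term computed from past samples;
    in our rule this is the job of the auxiliary reservoir."""
    if len(window) < 2:
        return 0.0
    past = np.stack([x for x, _ in list(window)[:-1]])  # exclude the current sample
    return float(np.tanh(past.mean()))   # toy placeholder, not the real signal

for t in range(200):                     # toy online stream of samples
    x_t = rng.normal(size=d_in)
    y_t = rng.integers(0, 2)             # dummy label, kept only for the (x, y) pairing
    window.append((x_t, y_t))
    z_t = np.tanh(W @ x_t)               # current layer output (rate proxy)
    xi = global_signal(window)           # global component: prior samples only
    W += 1e-3 * xi * np.outer(z_t, x_t)  # local Hebbian term gated by xi
```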

Note that the reservoir can be trained a priori using random data, and it does not need to be trained during the main learning task (though it can be). Check out our poster or preprint for more details!
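
If you're curious what "trained a priori using random data" can look like in practice, here is a rough echo-state-style sketch: a fixed random recurrent pool whose linear readout is fit by ridge regression on random input streams, then frozen. The reservoir size, leak rate, and placeholder target signal are all assumptions for illustration, not the configuration from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, n_res = 10, 200

# Fixed random recurrent reservoir (echo-state style); only the linear
# readout is fit, which is why it can be trained ahead of time on random data.
W_in = rng.normal(scale=0.5, size=(n_res, d_in))
W_rec = rng.normal(scale=1.0, size=(n_res, n_res))
W_rec *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_rec)))  # spectral radius < 1

def run_reservoir(inputs, leak=0.3):
    """Drive the reservoir with a sequence of inputs and collect its states."""
    r = np.zeros(n_res)
    states = []
    for u in inputs:
        r = (1 - leak) * r + leak * np.tanh(W_in @ u + W_rec @ r)
        states.append(r.copy())
    return np.stack(states)

# Pre-train the readout on random inputs against a placeholder target signal
# (a made-up surrogate for the global component) via ridge regression.
X = rng.normal(size=(2000, d_in))
target = np.tanh(X.sum(axis=1, keepdims=True))   # hypothetical target signal
S = run_reservoir(X)
reg = 1e-3
W_out = np.linalg.solve(S.T @ S + reg * np.eye(n_res), S.T @ target)

# At "run time" the frozen reservoir plus readout produce the global signal.
new_stream = rng.normal(size=(50, d_in))
global_estimate = run_reservoir(new_stream) @ W_out
```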