Biophysics of nerve tissue

Jan-Hendrik Schleimer (AG Theoretische Neurophysiologie, Prof. Schreiber)

1 Preface

Location: House 6, Lecture Hall - Campus Nord
Time: Tue., 16:00-18:00
Contact: https://itb.biologie.hu-berlin.de/~schleimer
Practical: 1 week end of semester

1.1 Content Summary

Nerve tissue is composed of excitable cells, supporting cells and a surrounding bath. This seminar lecture reviews the biophysical basics of excitability, including the stochastic processes that govern voltage-gated ion channels in nerve membranes. The composition of ion channels, among other factors, determines the functional properties of action potential pulses, which are analysed using dynamical systems theory. The statistics of pulse trains are derived from the microscopic level of ion-channel fluctuations. This also serves the understanding of a self-consistent description of macroscopic neural-network dynamics.

1.2 (Dis)claimer

This scriptum shall evolve. It will have flaws (didactic ones and outright errors) but you, the reader, student, search bot, provide the selective pressure to improve it. Please write to the lecturer if you find fault of any kind with it.

Although written in a somewhat formal fashion, the book is by no means a text in applied mathematics. It completely lacks the mathematical rigour to meet such standards. The choice of wrapping statements in definitions, observations and propositions arose from the need to refer to them easily and to render the logical building blocks of some arguments easier to identify and follow through the text. Let's see how that worked out …

2 Overview

There are many different lectures on "the biophysics of computation", or "computational neuroscience". The topic is by now so vast that only a small, non-exhaustive subset of it can be discussed within the temporal confines of a lecture. This one is no different.

The idea here is to approach questions like “What is a nerve impulse?” or “Are there different types of excitability?” from a mathematical and a biophysical angle.

From the biophysical point of view, one has to address the many building blocks of nervous systems: one needs to understand the biophysics of lipid sheets and the chemical gradients across them. Voltage-gated ion channels swimming in the membrane should be understood, including their stochastic dynamics. What is going on in the extracellular medium might be a topic. And, of course, what are the consequences of wiring neurons into networks via synapses or gap junctions. The enormous energetic demands of nerve tissue are also important factors in understanding its structure and function.

Many of these components of nervous systems derive from preadaptations, meaning precursors have been around long before the first nervous system emerged some 600 Myr ago. This is certainly true for excitability, which is found in the single-celled Paramecium and even in bacteria. Synaptic components like the SNARE complex are present in fungi, and some of the transmitters are used in bacterial communication.

Naturally, there is a risk of getting overwhelmed by these beautiful details. To explain the complexity of nervous systems ab initio seems a mind-bogglingly difficult task. Yet there are already all-atom simulations of ion channels in the membrane. Still, before and while discussing the structural aspects, it is insightful to ask what the exigencies of a nervous system are. Not in any teleological sense, theoretical biology is past that, but from an evolutionary standpoint. One may even formulate certain desiderata, without claiming that brains were designed by anyone.

2.1 Levels of explanation

If, by attending the lecture or going through its notes, the amount of confusion becomes unbearable, Tinbergen's levels of analysis might help:

                   Diachronic view of historical sequence   Static view of current form
  Proximate cause  Ontogeny                                 Mechanism
  Ultimate cause   Phylogeny                                Function

While categories and schemes like the above are useful in structuring one's thoughts about nature, they should not be taken as dogmatic fences that forbid thinking or speculating outside of them. Nature is too beautiful to get involved in dogmatic fights over reified wordings; leave those to the humanities.

Roughly, one might associate certain disciplines of applied mathematics with the four categories from above.

                   Diachronic view of historical sequence   Static view of current form
  Proximate cause  Control theory                           Dynamical systems, Stochastic processes
  Ultimate cause   Information theory, Optimisation         Filter theory, Triggered ensembles

[Figure: Nerve cell in a bath and an action potential in the inset.]

3 The Floquet theory of the action potential

3.1 Periodic event

Of many natural systems we observe only some drastically obvious events, while the underlying complex evolution of all state variables is hidden. The melatonin level falling below threshold (in the suprachiasmatic nucleus) causes a "waking-up event". The complex regulatory network driving the gap phase 1 → synthesis → gap phase 2 → mitosis transitions results in clearly visible cell-division events. Sadly, extinction events can be the result of changes in complex food webs, the dynamics of which are not completely determined. Action potentials are no different. The dynamics involves charging the cell membrane and the nonlinear kinetics of several ion channels, but the electric impulse is a clearly detectable event (with a glass electrode, at least).

3.2 Pulse-based communication

Why do nervous systems use pulses, called action potentials or spikes, to communicate? In fact, not all do. The information picked up by your photoreceptors is processed by cells in the retina (amacrine cells, horizontal cells) still as graded potentials, which are then recoded into an impulse-based representation. Conversely, the visual system of flies starts out with a pulse-based representation and converts it to graded potentials upstream. Last but not least, the small nervous system of the nematode C. elegans (302 neurons) seemed to operate entirely on graded potentials; no spikes had been reported for a long time. But in 2018, Ca2+-based spikes were found in one of its neurons.

One of the differences between a passive graded potential and an active action potential is that the latter does not suffer from the same kind of signal degradation over long distances. There could be active, more or less graded potentials, but these are less temporally precise than a short pulse, and it turns out that many computations rely on exactly this temporal precision.

A neurobiological "dogma" states that action potentials follow the all-or-nothing principle (E.D. Adrian and Zotterman 1926). Fortunately, the jellyfish Aglantha digitale does not care much for dogmas and encodes its swimming patterns in different action potential shapes. So it is not really a dogma or even a principle. But most action potentials really are very stereotypical events, hence it is a common property. The all-or-nothing property can be construed to mean that the exact shape of the action potential does not matter1 and that information is instead stored in the exact spike times or the frequency of their occurrence.

This can be formalised to mean that the voltage evolution ought to be represented by a shifted pulse wave form

  1. $v(t) = \sum_k A(t-t_k) = A(t) * \sum_k\delta(t-t_k)$,

where $t_k$ are the spike times, which need to be defined later.

It also means that the action potential $A(t)$ should be stable under perturbations, but that inputs ought to be able to shift the occurrence of spikes in time. In mathematical terms, these desiderata read

  1. The action potential must be stable in amplitude,
  2. yet neutrally stable in phase.

To wit, (ii) just means that stimulus-induced phase shifts should neither decay nor blow up.

3.3 Tonic spikes are limit cycles

The mathematical objects that satisfy desiderata (i) and (ii) are limit cycles. They are closed orbits in phase space. Naturally, the state space needs to be at least two-dimensional for it to contain a limit cycle. What are the state variables? The membrane voltage, $v(t)$, typically in millivolts, sounds like a reasonable one; at least that is what electrophysiologists measure. The conformational states of the ion channels also change with time, $c_k(t)$, and so do ion concentrations, say $[\mathrm{K}^+]_o$. The exact governing equations are going to be derived in the following chapters. For now, denote a vector of state variables as

$\vec x=\begin{pmatrix}v\\c_1\\c_2\\ [\mathrm{K}^+]_o\\ \vdots\end{pmatrix}.$

A general differential equation could have the following structure

  1. $\dot{\vec x} = \vec F(\vec x,\vec\alpha) + \vec\eta(\vec x,t)$

$\vec F$ is the time-independent (homogeneous) flow field. $\vec\eta$ is an (often assumed small), possibly time-dependent (inhomogeneous) perturbation to the system. $\vec\alpha$ are the system's parameters. Parameters could be cell size and other fixed properties, but also a constant applied bias current. The change of solutions in response to variations in the system's parameters will be studied later.

In the absence of perturbations, $\vec\eta=0$, Eq. (2) becomes a homogeneous ordinary differential equation

  1. $\dot{\vec x} = \vec F(\vec x,\vec\alpha).$

We are interested in the cases in which, at least for certain parameter values $\vec\alpha$, there exists a $P$-periodic limit cycle solution, with $\vec x_{\mathrm{lc}}(t)=\vec x_{\mathrm{lc}}(t+P)$, also known as a periodic orbit. This solution is identified with the action potential if it satisfies the requirements of amplitude stability and time-shiftability, (i) and (ii).

3.4 Limit cycle stability

Stability is probed by studying small perturbations to an invariant-set solution. Our invariant set is the limit cycle (periodic orbit) $\vec x_{\mathrm{lc}}$. Assuming there was a small perturbation to the system, the solution can be decomposed as

  1. $\vec x(t)=\vec x_{\mathrm{lc}}(t)+\vec y(t)$

with $\forall t:\|\vec y(t)\|<\epsilon$ some "small" perturbation to the orbit. What small, i.e., $\epsilon$, means we do not want to say now; maybe later, let's see …

Assuming the perturbation was transient (only in the initial conditions) and the system is homogeneous again, we plug the Ansatz of Eq. (4) into Eq. (3) and get

$\frac{\mathrm{d}}{\mathrm{d}t}\vec x_{\mathrm{lc}}(t)+\dot{\vec y}(t)=\vec F(\vec x_{\mathrm{lc}})+\underbrace{\nabla\vec F(\vec x_{\mathrm{lc}})}_{\boldsymbol{J}(\vec x_{\mathrm{lc}}(t))} \cdot \vec y(t)$

The Jacobi matrix evaluated on the limit cycle can be written as a function of time, $\boldsymbol{J}(t)=\nabla\vec F(\vec x_{\mathrm{lc}}(t))$. Note that since the limit cycle solution is $P$-periodic, so is $\boldsymbol{J}(t)=\boldsymbol{J}(t+P)$.

Identifying the limit cycle solution above, we are left with the first variational equation of Eq. (3)

  1. $\dot{\vec y}(t)=\boldsymbol{J}(t)\vec y(t).$

Hence, one needs to study a linear system with periodic coefficients. One solution of Eq. (5) can be guessed; let us try the time derivative of the orbit, $\frac{\mathrm{d}\vec x_{\mathrm{lc}}}{\mathrm{d}t}$:

  1. $\frac{\mathrm{d}}{\mathrm{d}t}\Big(\frac{\mathrm{d}\vec x_{\mathrm{lc}}}{\mathrm{d}t}\Big)=\frac{\mathrm{d}}{\mathrm{d}t}\vec F(\vec x_{\mathrm{lc}})=\nabla\vec F(\vec x_{\mathrm{lc}})\frac{\mathrm{d}}{\mathrm{d}t}\vec x_{\mathrm{lc}}=\boldsymbol{J}(t)\frac{\mathrm{d}\vec x_{\mathrm{lc}}}{\mathrm{d}t}.$

So it is a solution alright, and it happens to be a $P$-periodic solution. This solution is called the Goldstone mode. But for arbitrary initial conditions not all solutions should be periodic.

Def: Floquet Ansatz
According to Floquet theory the solution to Eq. (5) can be written in the form of a $P$-periodic similarity matrix2 and a matrix exponential
  1. $\vec y(t)=\boldsymbol{R}(t)e^{t\boldsymbol{\Lambda}}.$

For the proof please consult (Chicone 2006); here we are just going to work with this as an Ansatz. The constant matrix $\boldsymbol{\Lambda}$ is called the Floquet matrix.

Recall the matrix exponential

Def: Matrix Exp
Let $\boldsymbol{A}\in{I\!\!\!\!C}^{n\times n}$, then $\exp\boldsymbol{A}=\sum_{k=0}^\infty\frac1{k!}\boldsymbol{A}^k$. A useful corollary of this definition is that the eigenvectors of an exponentiated matrix are the same as those of the original, and the eigenvalues become exponentiated. If $\lambda_i$, $\vec w_i$ are the eigenvalue-eigenvector pairs of the matrix $\boldsymbol{A}$, i.e., $\boldsymbol{A}\vec w_i=\lambda_i\vec w_i$, then by using this identity $k$ times
  1. $e^{\boldsymbol{A}}\vec w_i=\left(\sum_{k=0}^\infty\frac1{k!}\boldsymbol{A}^k\right)\vec w_i=\sum_{k=0}^\infty\frac1{k!}\lambda_i^k\vec w_i=e^{\lambda_i}\vec w_i$
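A minimal numerical check of this eigenvalue identity (assuming numpy and scipy are available; the matrix itself is an arbitrary illustration):

    import numpy as np
    from scipy.linalg import expm

    # arbitrary (non-symmetric) example matrix, purely illustrative
    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])

    lam, W = np.linalg.eig(A)            # eigenvalues lam_i and right eigenvectors W[:, i]

    for i in range(len(lam)):
        lhs = expm(A) @ W[:, i]          # exp(A) w_i
        rhs = np.exp(lam[i]) * W[:, i]   # e^{lam_i} w_i
        assert np.allclose(lhs, rhs)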

Inserting the Floquet Ansatz into Eq. (5) yields

$\dot{\boldsymbol{R}}e^{t\boldsymbol{\Lambda}}+\boldsymbol{R}(t)\boldsymbol{\Lambda}e^{t\boldsymbol{\Lambda}}=\boldsymbol{J}(t)\boldsymbol{R}(t)e^{t\boldsymbol{\Lambda}}.$

Multiplying with $e^{-t\boldsymbol{\Lambda}}$ from the right results in a dynamics equation for the similarity matrix $\boldsymbol{R}$:

  1. $\dot{\boldsymbol{R}}=\boldsymbol{J}(t)\boldsymbol{R}(t)-\boldsymbol{R}(t)\boldsymbol{\Lambda}.$

Remember that $\boldsymbol{R}$ is invertible, so one can also derive an equation for the inverse

  1. $\frac{\mathrm{d}}{\mathrm{d}t}\boldsymbol{R}^{-1}=-\boldsymbol{R}^{-1}\dot{\boldsymbol{R}}\,\boldsymbol{R}^{-1}=\boldsymbol{\Lambda}\,\boldsymbol{R}^{-1}(t)-\boldsymbol{R}^{-1}(t)\boldsymbol{J}(t).$

3.4.1 Eigensystem of the Floquet matrix

The Floquet matrix, $\boldsymbol{\Lambda}$, is constant, though not necessarily symmetric. Hence it has left and right eigensystems that can be chosen biorthonormal,

  1. $\Lambda\vec w_i=\mu_i\vec w_i$ and $\vec z_i\Lambda=\mu_i\vec z_i$ with $\vec z_i\cdot\vec w_j=\delta_{ij}$.

The $\mu_k$ are called Floquet exponents, for reasons becoming obvious below. One can now define a "rotated" eigensystem.

Def. ("Rotated" eigensystem and Floquet modes) #def:floqmode

The rotated eigensystem of a spike is $\vec W_i(t)=\boldsymbol{R}(t)\vec w_i$ and $\vec Z_i(t)=\vec z_i\boldsymbol{R}^{-1}(t)$, which ipso facto obeys

  1. $\forall t:\vec Z_i(t)\cdot\vec W_j(t)=\delta_{ij}$.

$\vec W_i(t)$ and $\vec Z_i(t)$ are also called the Floquet modes.

If we project Eqs. (9) and (10) onto the eigenvectors and use these definitions, we get

  1. $\frac{\mathrm{d}}{\mathrm{d}t}\vec W_k=(\boldsymbol{J}(t)-\mu_k\boldsymbol{I})\vec W_k(t)$

and

  1. $\frac{\mathrm{d}}{\mathrm{d}t}\vec Z_k=(\mu_k\boldsymbol{I}-\boldsymbol{J}^\dagger(t))\vec Z_k(t).$

As a final test of what has been achieved by introducing the new time-dependent coordinate system of the right and left Floquet modes, project the general solution $\vec y(t)$ from Eq. (7) onto the adjoint Floquet modes, $\vec Z_k(t)$:

$\vec Z_k(t)\cdot\vec y(t)=\vec z_k\boldsymbol{R}^{-1}(t)\boldsymbol{R}(t)e^{t\boldsymbol{\Lambda}}=\vec z_ke^{t\boldsymbol{\Lambda}}=\vec z_ke^{t\mu_k}.$

Along this rotating basis one observes a simple exponential contraction or expansion, depending on the sign of the Floquet exponents. The magnitude of a Floquet exponent is the rate of convergence in units of 1/time. What if a Floquet exponent is zero? In fact, by comparing Eq. (6) and (13) one may observe that the Goldstone mode is always such a solution, associated with a zero exponent.

Note that if $\mu_0=0$,

  1. $\vec W_0(t)=\frac{\mathrm{d}}{\mathrm{d}t}\vec x_{\mathrm{lc}}(t)$

is the Goldstone mode, and from Eq. (12)

  1. $\vec Z_0(\phi)\cdot\frac{\mathrm{d}}{\mathrm{d}t}\vec x_{\mathrm{lc}}(\phi)= \vec Z_0(\phi)\cdot\vec F(\vec x_{\mathrm{lc}}(\phi))=1$

Without loss of generality, let the Floquet exponents of a stable limit cycle be ordered such that

$0=\mu_0\geqslant\mu_1\geqslant...\geqslant\mu_{n-1}$
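The Floquet exponents can also be obtained numerically: integrate the fundamental solution of Eq. (5) over one period to obtain the monodromy matrix, whose eigenvalues are the Floquet multipliers $e^{\mu_k P}$. A minimal sketch, using the van der Pol oscillator purely as a placeholder limit-cycle model (all parameter and tolerance values are arbitrary choices):

    import numpy as np
    from scipy.integrate import solve_ivp

    mu_p = 1.0                                     # illustrative model parameter

    def F(t, x):                                   # van der Pol flow field
        return [x[1], mu_p*(1 - x[0]**2)*x[1] - x[0]]

    def J(x):                                      # Jacobian evaluated along the trajectory
        return np.array([[0.0, 1.0],
                         [-2*mu_p*x[0]*x[1] - 1.0, mu_p*(1 - x[0]**2)]])

    # relax onto the limit cycle, then estimate the period P from upward zero crossings of x[0]
    x0 = solve_ivp(F, [0, 100], [2.0, 0.0], rtol=1e-10, atol=1e-12).y[:, -1]
    def crossing(t, x): return x[0]
    crossing.direction = 1
    ev = solve_ivp(F, [0, 50], x0, events=crossing, rtol=1e-10, atol=1e-12)
    P = ev.t_events[0][-1] - ev.t_events[0][-2]
    x_lc = ev.y_events[0][-1]                      # a point on the limit cycle

    # integrate x together with the fundamental matrix dPhi/dt = J(x(t)) Phi, Phi(0) = I
    def augmented(t, z):
        x, Phi = z[:2], z[2:].reshape(2, 2)
        return np.concatenate([F(t, x), (J(x) @ Phi).ravel()])

    z0 = np.concatenate([x_lc, np.eye(2).ravel()])
    M = solve_ivp(augmented, [0, P], z0, rtol=1e-10, atol=1e-12).y[2:, -1].reshape(2, 2)

    multipliers = np.linalg.eigvals(M)             # one multiplier is ~1 (Goldstone mode)
    exponents = np.log(multipliers) / P            # Floquet exponents mu_k
    print(P, multipliers, exponents)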

What are the left Floquet modes useful for? Here is an example. First, rewrite Eq. (14) for $k=0$ in the form

$(\frac{\mathrm{d}}{\mathrm{d}t}+J^\dagger(t))\vec Z_0=0$

Say one is interested in the change of the period of a limit cycle oscillator w.r.t. a parameter change. By rescaling time such that the orbit has period one,

$\frac{d\vec x_{\mathrm{lc}}}{dt}-P(\alpha)\vec F(\vec x_{\mathrm{lc}},\alpha)=0$

Taking the total derivative (and chain rule) one finds

$\frac{\partial}{\partial\alpha}\frac{d\vec x_{\mathrm{lc}}}{dt}-\frac{\partial P}{\partial\alpha}\vec F-P\frac{\partial \vec F}{\partial\alpha}-P\nabla\vec F\frac{\partial\vec x_{\mathrm{lc}}}{\partial\alpha}=0$

The derivatives in the first term can be exchanged. Now project the equation from the left onto the zeroth Floquet mode:

$-\frac{\partial P}{\partial\alpha}(\vec Z_0\cdot\vec F)-P(\vec Z_0\cdot\frac{\partial \vec F}{\partial\alpha}) +\vec Z_0\cdot(\frac{d}{dt}-PJ(t))\frac{\partial\vec x_{\mathrm{lc}}}{\partial\alpha}=0$

The last term can be rewritten by transposing the operator

$\vec Z_0\cdot(\frac{d}{dt}-PJ(t))\frac{\partial\vec x_{\mathrm{lc}}}{\partial\alpha}= -\underbrace{((\frac{d}{dt}+PJ^\dagger(t))\vec Z_0)}_{=0}\cdot\frac{\partial\vec x_{\mathrm{lc}}}{\partial\alpha}$

Hence

  1. $\frac{\partial P}{\partial\alpha}=-P(\vec Z_0\cdot\frac{\partial \vec F}{\partial\alpha})$

4 Spikes event time series and isochrons

4.1 Spike times and Isochrons

There is still a lack of a formal definition of the spike time $t^{\text{sp}}$ in Eq. (1). Electrophysiologists typically envision a voltage threshold, but this would only exist if the voltage dynamics were a one-dimensional equation, which it is not. In fact, it cannot be if true limit cycles are to exist. In the general state space of action potential dynamics there may exist a generalised threshold manifold. A reasonable alternative definition would take the (local) peak/maximum of the voltage,

$v(t^{\text{sp}})=v_\mathrm{max}$.

Again, this is a definition based on properties of the state variables, and it also leaves all other dimensions undefined. The general problem seems to be to uniquely associate time with states. If the dynamics were restricted to the limit cycle, this would be easy. As the maximum voltage value is reached only once per spike, it also implies unique values for the other state variables, which can be called the spike boundary condition: $\vec x_\mathrm{sp}$. Most importantly, it also implies a unique time index at which such events occur, $\vec x_{\mathrm{lc}}(t^{\text{sp}})=\vec x_\mathrm{sp}$. The normalised time index of points on the limit cycle is also called the phase, $\phi\in[0,1)$, s.t. $\vec x_{\mathrm{lc}}(P\phi)=\vec x_{\mathrm{lc}}(P(\phi+1))$. Given synaptic bombardment, intrinsic noise and other perturbations, the dynamics may, however, temporarily not be exactly on the limit cycle. The simple idea is that the time phase of any point in the basin of attraction of a limit cycle can be defined as the phase it converges back to when asymptotically approaching the limit cycle. To make this precise, first define the flow:

Def. (Flow of an ODE) #def:flow

For a given initial condition $\vec x_0\in B\subset{I\!\!R}^n$, where $B$ is the basin of attraction of the limit cycle, the flow or trajectory induced by the ODE in Eq. (3) is denoted as $\vec x(t;\vec x_0)$.

Numerically, the flow can be evaluated by forward-integration of the ODE starting at a given initial value.
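A minimal sketch of such a forward evaluation (using scipy; the planar flow field used here is a placeholder, not a neuron model):

    import numpy as np
    from scipy.integrate import solve_ivp

    def flow(F, x0, t):
        """Evaluate the flow x(t; x0) of dx/dt = F(x) by forward integration."""
        sol = solve_ivp(lambda s, x: F(x), [0.0, t], np.asarray(x0, float),
                        dense_output=True, rtol=1e-8, atol=1e-10)
        return sol.sol(t)

    # usage with an illustrative damped-oscillator flow field
    F = lambda x: np.array([x[1], -x[0] - 0.1*x[1]])
    print(flow(F, [1.0, 0.0], 5.0))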

Def. (Asymptotic phase) #def:assphase

For a point $\vec y\in B$, in the basin of attraction of the limit cycle solution of Eq. (3), the asymptotic phase, $\psi$, is defined as

  1. $\psi(\vec y)=\argmin_\psi\lim_{t\to\infty}\|\vec x(t-\psi {P};\vec y)-\vec x_{\mathrm{lc}}(t;\vec x_\mathrm{sp})\|$.

Note that the limit cycle solution is initiated at time $t=0$ at the spike boundary condition. Based on the asymptotic phase, the concept of isochrons can be introduced (Winfree 1974; Osinga and Moehlis 2010) as the level sets of constant phase:

Def. (Isochrons) #def:isochron

The isochron manifolds, $\mathcal I(\phi)$, are the level sets of the asymptotic phase variable

$\mathcal I(\phi)=\{\vec y\in B:\psi(\vec y)=\phi\}$.

An isochron defines an $(n-1)$-dimensional manifold of points that converge to the same asymptotic phase on the limit cycle.

Based on the isochron, one can now define a spike as a crossing of the flow/trajectory with $\mathcal I(0)$.

Note: There are several other ways to define a phase (Hilbert transform, linear interpolation between spikes, …). In principle, there should be monotonic mappings between them.

4.2 Neutral dimension and phase shifts

An alternative to studying the high-dimensional set of coupled nonlinear biophysical equations is to find an (approximately) input/output (I/O)-equivalent equation for the phase evolution, as in Fig. #. In the figure the phase is plotted modulo 1. However, it is also convenient to consider the freely evolving phase, since this is related to the spike count process,

$N(t)=\lfloor\phi(t)\rfloor$.

The $k^\mathrm{th}$ spike time is then defined as

$\phi(t^{\text{sp}}_k)=k$.

A spike train can be written as

  1. $r(t)=\sum_k\delta(t-t^{\text{sp}}_k)=\sum_k\delta(\phi(t)-k)$.

Def. (I/O-equivalent phase oscillator) #def:io_phase

An I/O-equivalent phase oscillator produces, with its threshold-crossing process, the same spike times $\{t^{\text{sp}}_k\}_k$ as a particular conductance-based model (Lazar 2010).


 

Fig. #fig:phasered: The phase equation turns the spiking neuron into a one-dimensional level-crossing process with equivalent spike times. The phase is plotted modulo 1.


The evolution of the phase under Eq. (2) is given by the chain rule as

$\frac{\mathrm{d}\phi}{\mathrm{d}t}=\nabla\phi(\vec x)\cdot\frac{\mathrm{d}\vec x}{\mathrm{d}t}$.

To first order this is

  1. $\frac{\mathrm{d}\phi}{\mathrm{d}t}=\nabla\phi(\vec x_{\mathrm{lc}})\cdot\frac{\mathrm{d}\vec x_{\mathrm{lc}}}{\mathrm{d}t}=\nabla\phi\cdot\vec F(\vec x_{\mathrm{lc}})+\nabla\phi\cdot\vec\eta(\vec x_{\mathrm{lc}},t).$

The geometric interpretation of the phase gradient, $\nabla\phi$, is that it points perpendicular to the isochrons, towards the steepest change of the asymptotic phase variable $\phi$.

4.3 The empirical infinitesimal phase response curve

Def. (empirical PRC) #def:empiricalPRC

Given a short $\delta$-pulse of amplitude $\epsilon$, delivered at a certain phase, $\psi$ (time point), in the inter-spike interval of a tonically firing neuron, the linear phase response is defined as

$\prc(\psi)=\lim\limits_{\epsilon\to0}\frac1\epsilon\frac{{P}_0-{P}_\epsilon(\psi)}{{P}_0}$.

The underlying assumption in interpreting the PRC is that it is a function, $\prc(\psi):[0,1)\mapsto{I\!\!R}$, that maps the phase or time of occurrence of a perturbation to a persistent, measurable phase shift. (Please appreciate how idealised and "noise free" these definitions are.) The convention is that a prolongation of the interval corresponds to a phase delay, while a shortening means a phase advance.
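A minimal sketch of this measurement protocol, using the quadratic dynamics $\dot v = I + v^2$ from the later chapters as a stand-in tonically firing model (the model choice, reset values, pulse size and step size are illustrative assumptions):

    import numpy as np

    I, v_reset, v_max = 1.0, -50.0, 50.0          # placeholder model parameters
    dt, eps = 1e-4, 1e-2                          # step size and pulse amplitude

    def period(pulse_time=None):
        """Time from reset to threshold; optionally kick v by eps at pulse_time."""
        v, t, kicked = v_reset, 0.0, False
        while v < v_max:
            if pulse_time is not None and not kicked and t >= pulse_time:
                v += eps                          # short depolarising delta-pulse
                kicked = True
            v += dt * (I + v**2)                  # Euler step of dv/dt = I + v^2
            t += dt
        return t

    P0 = period()                                 # unperturbed period
    psis = np.linspace(0.05, 0.95, 19)            # phases at which the pulse is delivered
    prc = [(P0 - period(pulse_time=psi*P0)) / (eps*P0) for psi in psis]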

The perturbed period can be written as the original period minus the phase shift converted back to time (by multiplying with the unperturbed period)

${P}_\epsilon={P}_0-\Delta\phi\,{P}_0$.

Given that the time evolution of $\phi(t)$ has been perturbed at some time point $\psi P_0<P_0$, the phase shift can be measured at the spike time of the unperturbed system as

$\Delta\phi = \phi({P}_0)-1$.

The empirical PRC in Def. # can then be expressed as

$\prc(\psi)=\frac1\epsilon\frac{{P}_0-{P}_0(1-(\phi({P}_0)-1))}{{P}_0}=\frac1\epsilon(\phi({P}_0)-1)$.

Apply the identity operator on functions (convolution with a $\delta$-function) on the left-hand side and the fundamental theorem of calculus on the right-hand side; then, to first order,

$\epsilon{P}_0\int_0^{{P}_0}\prc(\phi(t))\delta(\psi-\phi(t)){\mathrm{d}}t=\int_0^{{P}_0}(\dot\phi-\frac1{{P}_0}){\mathrm{d}}t$

This step is not really allowed, but for well-behaved integrals we can identify the integrands to see

$\dot\phi=\frac1{{P}_0}+\epsilon\prc(\phi)\delta(\psi-\phi)$.

One can bring this equation into agreement with the phase evolution in Eq. (20), if one identifies

$\nabla\phi(\psi)=\prc(\psi)$ and $\nabla\phi(\psi)\cdot\vec F(\psi)=\frac1{{P}_0}={f}_0$.

4.4 Constant phase shift and adjoint equation

The assumption underlying the experimental procedure described in Def. # is that the phase shift stays constant in time, say $\Delta\phi$. Without perturbation there is no long-term shift of the phase. Hence it must be the perturbation off the limit cycle which causes the phase shift. The phase change must be due to $\Delta\vec x(t)$ and is given by

$\Delta\phi=\nabla\phi\cdot\Delta\vec x(t)=$ const. From $\frac{\mathrm{d}\Delta\phi}{\mathrm{d}t}=0$ one gets

$0=\frac{\mathrm{d}}{\mathrm{d}t}\nabla\phi\cdot\Delta\vec x+ \nabla\phi\cdot\frac{\mathrm{d}}{\mathrm{d}t}\Delta\vec x$

Again, the evolution of $\Delta\vec x(t)$ can be linearised if it is small:

$0=\frac{\mathrm{d}}{\mathrm{d}t}\nabla\phi\cdot\Delta\vec x+ \nabla\phi\cdot {\boldsymbol{J}}(t)\Delta\vec x= (\frac{\mathrm{d}}{\mathrm{d}t}\nabla\phi+{\boldsymbol{J}}^\dagger(t)\nabla\phi)\cdot\Delta\vec x$

Such that by Fredholm's alternative

$\frac{\mathrm{d}}{\mathrm{d}t}\nabla\phi=-{\boldsymbol{J}}^\dagger(t)\nabla\phi$

This is the same ODE as for the zeroth left Floquet mode, rendering them identical up to scaling.

5 The continuum limit of a membrane patch

Motivation and aim

In this lecture ion channels are introduced as stochastic devices floating in the membrane of a nerve cell. It should motivate why the analysis techniques and models introduced in this lecture need to deal with fluctuations and noise. (Nerve) cells produce stochastic processes on several levels:

  1. The ion channels in their membrane stochastically jump between conformations, governed by Master equations.

  2. On a more macroscopic level, their membrane voltage fluctuations show properties of coloured noise, well described by diffusion processes and stochastic differential equations.

  3. The trains of action potentials they emit form point processes.

In the stationary state these processes can be subjected to spectral analysis.

Note that in 1952, the first equations describing membrane voltage dynamics were the deterministic rate equations of Hodgkin & Huxley (Hodgkin and Huxley 1952). Only in 1994 did Fox and Lu derive these equations from the continuum limit of an ensemble of stochastic ion channels (Fox and Lu 1994). Essentially, this was achieved by a diffusion approximation in the thermodynamic limit of many ion channels.

Finally, in 1998 the structure of a K+ ion channel from Streptomyces lividans, which is nearly identical to Drosophila's Shaker channel, was discovered using X-ray analysis (Doyle 1998).

Since then, the nuances of applying diffusion approximations to neurons have been investigated (Linaro, Storace, and Giugliano 2011; Orio and Soudry 2012) and reviewed (Goldwyn et al. 2011; Goldwyn and Shea-Brown 2011; Pezo, Soudry, and Orio 2014).

5.1 Empirical assertions about ion channels

Observation (Concerning the ion channels of excitable membranes) #obs:chan

  1. Ion pores show different degrees of selectivity to ionic species. Understanding of the mechanisms of the ionic filter was achieved by structural analysis (Doyle 1998). Potassium channels conduct K+ 1000-fold more than Na+ and allow for conduction rates of 10^8 ions per second.

  2. In single-channel patch experiments, ion channels show discrete levels of conductivity, which look like a telegraph process.

  3. Ion channels are treated as independent (no cooperativity or spatial heterogeneity).

  4. Membrane-bound proteins, like many others, change conformation upon various triggering signals.

To relate measured voltage changes to ion flux across the membrane, note that according to Kirchhoff's law and electroneutrality

$C\frac{\mathrm{d}v}{\mathrm{d}t}=\frac{F\Omega}{A} \Big(\frac{\mathrm{d}[{\text{Na}^+}]_i}{\mathrm{d}t}+\frac{\mathrm{d}[{\text{K}^+}]_i}{\mathrm{d}t}+\frac{\mathrm{d}[{\text{Cl}^-}]_i}{\mathrm{d}t}\Big)$

The currents are assumed to follow Ohm's law:

$\frac{F\Omega}{A}\frac{\mathrm{d}[{\text{K}^+}]_i}{\mathrm{d}t}=\gamma N_K(E_K-v)$

where $\gamma$ is the unitary conductance and $N_K$ is the total number of open pores.

5.2 First-order rate kinetics

Start with a hypothetical ion pore of which there is a fixed total concentration, [Pore] = const. The pore has an open conformation, $O$, in which, say, only K+ ions can pass, and a closed state, $C$, which blocks ion flow completely:

Def. (Two-state ion-channel) #def:twochan

$\boxed{C}\underset{\beta}{\overset{\alpha}{\rightleftharpoons}}\boxed{O}$

Based on the law of mass action, the corresponding first-order rate equation

$\frac{\mathrm{d}[O]}{\mathrm{d}t}=\alpha[C]-\beta[O]=\alpha([\text{Pore}]-[O])-\beta[O]$

(the second equality is by conservation of mass, $[O]+[C]=[\text{Pore}]=$ const.)

has the typical structure of its stochastic counterpart, the Master equation of a birth-death process.

Hodgkin and Huxley normalised the equation by defining $n(t) = \frac{[O]}{[\text{Pore}]}$:

$\frac{\mathrm{d}n}{\mathrm{d}t}=\alpha(1-n) -\beta n=\left(\frac\alpha{\alpha+\beta}-n\right)(\alpha+\beta) =(n_\infty-n)/\tau$

In a voltage-clamp experiment, for each voltage level this equation predicts an exponential relaxation to a level $n_\infty$ with a time constant $\tau$, both of which can be fitted.

Hodgkin and Huxley discovered that their measured traces could not be fitted well by the above equation.

Def. (A tetrameric K+ pore) #def:fourchan

$\boxed{0} \underset{\beta}{\overset{4\alpha}{\rightleftharpoons}} \boxed{1} \underset{2\beta}{\overset{3\alpha}{\rightleftharpoons}} \boxed{2} \underset{3\beta}{\overset{2\alpha}{\rightleftharpoons}} \boxed{3} \underset{4\beta}{\overset{\alpha}{\rightleftharpoons}} \boxed{4}$

5.3 The ion channel as a Markov model

Instead of writing down heuristic macroscopic laws for channel concentrations, one can attempt to derive them from the microscopic dynamics of single pores. For this, the state diagram in Def. # is reinterpreted as the state of a single molecule.

Based on Obs. #, such single-channel dynamics can be described by a finite-state Markov model, which is mathematically captured by a Master equation.

A Markov model represented by a graph like Def. # is a special kind of chemical reaction network studied in systems biology. The connectivity/topology can be stored in a

Def. (Stoichiometry matrix) #def:stoichmat

The stoichiometry matrix, $\boldsymbol{N}\in{Z\!\!\!Z}^{m\times n}$, describes $n$ reactions (transitions) involving $m$ chemical species (states). It can be decomposed into the educt matrix $\boldsymbol{E}$ and the product matrix $\boldsymbol{P}$ by $\boldsymbol{N} = \boldsymbol{P} - \boldsymbol{E}$.

The stoichiometry matrix of Def. # would look like

$\boldsymbol{N}=\begin{pmatrix} -1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 \end{pmatrix}$

In addition one needs the rate vector

$\vec v = [4\alpha,3\alpha,2\alpha,\alpha,\beta,2\beta,3\beta,4\beta]^{\mathsf{T}}$

to define the complete reaction network for the channel.

To understand the dynamical evolution of the occupation probabilities of the different conformational states, take the two-state channel as an example. Define a vector of the probabilities of being closed and open, $\vec p=[p_C,p_O]^\dagger$, respectively.

In actuality, the channels are part of the membrane dynamical system, where $\alpha$ and $\beta$ depend at least on $v$ and hence are not constant during a spike. One needs an update rule for how to get from the probability of being open at time $t$ to the probability of being open at time $t+dt$. Given the rates, the probabilities of a change in an interval of time $dt$ are the rates times the interval length, e.g. $dt\,\beta$. For the two-state channel this can be summarised in matrix notation

$\begin{pmatrix}p_C(t+dt)\\p_O(t+dt)\end{pmatrix}=\begin{pmatrix}1-\alpha\,dt&\beta\,dt\\\alpha\,dt&1-\beta\,dt\end{pmatrix}\begin{pmatrix}p_C(t)\\p_O(t)\end{pmatrix}$

or in vector form

$\vec p(t+dt)=\left(\boldsymbol{I}+\boldsymbol{Q}\,dt\right)\vec p(t)$

The infinitesimal limit yields a kind of Kolmogorov forward equation

  1. $\frac{\mathrm{d}}{\mathrm{d}t}\vec p=\boldsymbol{Q}\vec p$.

Def. (Transition rate matrix) #def:transmat

The transition rate matrix (or infinitesimal generator) is $\boldsymbol{Q}\in{I\!\!R}^{m\times m}$.

In general the transition rate matrix can be obtained from the graph

$\boldsymbol{Q} = \boldsymbol{N}\diag(\vec v)\boldsymbol{E}^{\mathsf{T}}$.
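A minimal sketch of assembling $\boldsymbol{Q}$ for the tetrameric K+ scheme above from $\boldsymbol{N}$, $\diag(\vec v)$ and $\boldsymbol{E}$ (the numerical values of $\alpha$ and $\beta$ are placeholders), followed by a propagation of the occupation probabilities:

    import numpy as np
    from scipy.linalg import expm

    alpha, beta = 2.0, 1.0                       # placeholder (voltage-clamped) rates

    # educt/product matrices for the reactions 0->1, 1->2, 2->3, 3->4, 1->0, 2->1, 3->2, 4->3
    E = np.zeros((5, 8)); P = np.zeros((5, 8))
    for j, (src, dst) in enumerate([(0,1), (1,2), (2,3), (3,4), (1,0), (2,1), (3,2), (4,3)]):
        E[src, j] = 1
        P[dst, j] = 1
    N = P - E                                    # stoichiometry matrix, as written out above

    v = np.array([4*alpha, 3*alpha, 2*alpha, alpha, beta, 2*beta, 3*beta, 4*beta])
    Q = N @ np.diag(v) @ E.T                     # transition rate matrix
    assert np.allclose(Q.sum(axis=0), 0)         # columns sum to zero: probability is conserved

    p0 = np.array([1.0, 0, 0, 0, 0])             # start with all gates closed
    p_t = expm(Q * 1.0) @ p0                     # occupation probabilities at t = 1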

Exercise (Transition rate matrix) #ex:mc1

Derive $\boldsymbol{Q}$ for a channel with three identical subunits:

$\boxed{0} \underset{\beta}{\overset{3\alpha}{\rightleftharpoons}} \boxed{1} \underset{2\beta}{\overset{2\alpha}{\rightleftharpoons}} \boxed{2} \underset{3\beta}{\overset{\alpha}{\rightleftharpoons}} \boxed{3}$

Note that Eq. (21), though a consistent description of the occupation probabilities, does not give us realisations or sample paths comparable to the traces measured in single-channel patch-clamp experiments.

5.4 Sample path of a telegraph process (simulating a jump process)

Given a total number of ion channels $N$, the goal is to track the actual number of channels in a particular state. Based on reaction rate theory, and given a certain reaction rate $a$, the number $k$ of reaction events in a time window is

$k\sim\text{Poisson}(a)=\frac{a^ke^{-a}}{k!}.$

If several reactions can occur and several molecules are involved, then (because the Poisson distribution is closed under convolution) one has

$k\sim\text{Poisson}(\underbrace{\sum_jN_ja_j}_\lambda)$

So the rate of occurrence of any reaction is given by this linear combination of the rates of the individual reactions. The waiting-time distribution between reactions is exponential with the same rate:

$p(t)=\lambda e^{-\lambda t}$

Starting in a particular state $\vec N=[N_1,...,N_n]$ at time $t_0$, the probability of staying in that state until $t_0+t$ is therefore $e^{-\lambda t}$, where the escape rate of the state (any reaction occurring) is

$\lambda=\sum_{k=1}^nN_ka_k$

where $a_k$ are the rates of leaving state $k$. For example, $a_3$ in the K+ channel is

$a_3=2\beta+2\alpha$

But which reaction did occur? Let $j$ denote the reaction (not the state!). For example, there are 8 reactions in the K+ channel. The probability of occurrence associated with any one of them is

$p(j)=\frac{N_ja_j}{\sum_{k=1}^{N_{\mathrm{reac}}}N_ka_k}=N_ja_j/\lambda$

In a computer algorithm, one can draw the waiting time until the next reaction simply by generating a uniform random number $r_1\sim U(0,1)$ and setting

$\tau\leftarrow\ln(r_1^{-1})/\lambda$

The reaction is determined with a second random number $r_2\sim U(0,1)$ via the cumulative probabilities

$P(j)=\sum_{k=1}^jp(k)$

$j\leftarrow\min\{j: P(j)\geqslant r_2\}$

while t < t_end
  r1 = rand()                          # two uniform(0,1) numbers
  r2 = rand()
  tau = ln(1/r1) / lambda              # waiting time until the next reaction
  j   = smallest j with P(j) >= r2     # which reaction occurred
  t   = t + tau; apply the stoichiometry of reaction j to N
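Put together, a minimal Gillespie-type simulation of a population of tetrameric K+ channels under voltage clamp (channel number and rate values are placeholders; numpy's exponential and categorical samplers replace the two uniform numbers above):

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, beta = 2.0, 1.0                       # placeholder voltage-clamped rates
    src = [0, 1, 2, 3, 1, 2, 3, 4]               # educt state of each of the 8 reactions
    dst = [1, 2, 3, 4, 0, 1, 2, 3]               # product state of each reaction
    a   = np.array([4*alpha, 3*alpha, 2*alpha, alpha, beta, 2*beta, 3*beta, 4*beta])

    n = np.array([100, 0, 0, 0, 0])              # 100 channels, all in the fully closed state
    t, t_end, trace = 0.0, 10.0, []
    while t < t_end:
        props = n[src] * a                       # propensities N_j a_j of the reactions
        lam = props.sum()                        # total escape rate of the current state
        t += rng.exponential(1.0/lam)            # exponential waiting time
        j = rng.choice(8, p=props/lam)           # which reaction occurred
        n[src[j]] -= 1
        n[dst[j]] += 1
        trace.append((t, n[4]))                  # record the number of open (state-4) channels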

5.5 Statistical properties

With $p_C=1-p_O$ we can express one row of this equation as

$\dot p_O=\alpha(1-p_O) - \beta p_O$

or

$\tau\dot p_O=p_O^{(\infty)}-p_O$ with $\tau=\frac1{\alpha+\beta}$ and $p_O^{(\infty)}=\frac\alpha{\alpha+\beta}$

which, for $p_O(0)=0$, has the solution

$p_O(t)=p^{(\infty)}_O(1-e^{-t/\tau})$

Fourier transformation leads to

$\tau\mathrm{i}\omega\tilde p(\omega)=p^{(\infty)}_O-\tilde p(\omega)$

or

$\tilde p(\omega)=\frac{p^{(\infty)}}{1+\mathrm{i}\tau\omega}$

$|\tilde p(\omega)|^2=\tilde p(\omega)\tilde p^*(\omega)=\frac{(p^{(\infty)})^2}{1+(\tau\omega)^2}$

a Lorentzian spectrum. The inverse Fourier transform yields the covariance function

$c(\tau)=\langle p_O(t)p_O(t+\tau)\rangle$

An ionic current produced in the open state would be

$I^{\mathrm{ion}}=\gamma_{\mathrm{K}^+}N_O(E_{\mathrm{K}^+}-v)$

$\gamma_{\mathrm{K}^+}$ and $E_{\mathrm{K}^+}$ are the unitary conductance and the Nernst potential, respectively. The average of such a current would be

$\langle I^\mathrm{ion}\rangle=\gamma_{\mathrm K^+}Np_O(E_{\mathrm K^+}-v)$

where $N$ is the total number of channels in the membrane patch under consideration. But what about a particular stochastic realisation of the current, what about the fluctuations around the mean?

If we have $N$ channels, then the number $k$ of them being open is binomially distributed,

$P_O(k)={N\choose k}p_O^kp_C^{N-k}$
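Since the open-channel count is binomial, the current has mean $\gamma N p_O(E-v)$ and variance $\gamma^2(E-v)^2Np_Op_C$. A minimal numerical illustration (all parameter values are placeholders):

    import numpy as np

    rng = np.random.default_rng(1)
    N, pO = 1000, 0.3                            # placeholder channel number and open probability
    gamma, E, v = 0.02, -90.0, -60.0             # placeholder unitary conductance and potentials

    k = rng.binomial(N, pO, size=100_000)        # samples of the open-channel count
    I = gamma * k * (E - v)                      # corresponding instantaneous currents

    print(I.mean(), gamma*N*pO*(E - v))                         # empirical vs. analytical mean
    print(I.var(),  gamma**2*(E - v)**2*N*pO*(1 - pO))          # empirical vs. analytical variance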

5.5.1 The nn-state channel

Let us try to calculate the statistics of the current originating from an $n$-state channel (just like in the two-state case). Why would one do this? The idea, later on, is to be able to find a continuous stochastic process that we can simulate and analyse easily.

Let $K(t)\in[1,...,n]$ be the realisation of the $n$-state Markov channel, for example a K+ channel with four subunits

$\boxed{0} \underset{\beta}{\overset{4\alpha}{\rightleftharpoons}} \boxed{1} \underset{2\beta}{\overset{3\alpha}{\rightleftharpoons}} \boxed{2} \underset{3\beta}{\overset{2\alpha}{\rightleftharpoons}} \boxed{3} \underset{4\beta}{\overset{\alpha}{\rightleftharpoons}} \boxed{4}$

For this K+ channel, with conformational states labelled as above and only state 4 conducting, let us further define

$G(t)=\delta_{4K(t)}=\begin{cases}1&:K(t)=4\\0&:K(t)<4.\end{cases}$

If there are several conducting states (with possibly different conductance levels), one could have, e.g., $G=\delta_{2K(t)}+2\delta_{4K(t)}$.

The single-channel current at a voltage clamp $v(t)=v$ is then

  1. $I(t)=\gamma G(t)(E-v).$

How does it evolve? Define $p_{i}(t)=P(K(t)=i)$ and $\vec p(t)=[p_1,...,p_n]^\dagger$, then

  1. $\frac{\mathrm{d}}{\mathrm{d}t}\vec p=\boldsymbol{Q}\vec p$

With formal solution

$\vec p(t)=e^{\boldsymbol{Q}t}\vec p(0)={\underbrace{(e^{\boldsymbol{Q}})}_{\boldsymbol{M}}}^t\vec p(0)=\boldsymbol{M}^t\vec p(0)$

Using the spectral (eigen) decomposition $\boldsymbol{M}=\boldsymbol{U}\,\boldsymbol{\Sigma}\,\boldsymbol{V}^\dagger$ with $\boldsymbol{V}^\dagger=\boldsymbol{U}^{-1}$, the matrix power can be written as $\boldsymbol{M}^t=\boldsymbol{U}\,\boldsymbol{\Sigma}^t\boldsymbol{V}^\dagger$. Or (recall Eq. (8))

  1. $\vec p(t)=\sum_{k=1}^n\vec u_k\vec v_k^\dagger e^{\nu_kt}\vec p(0).$

If Eq. (23) has a stationary distribution in the $t\to\infty$ limit, then this must correspond to the eigenvalue $\nu_1=0$ (let us assume the eigenvalues are ordered). So $\frac{\mathrm{d}}{\mathrm{d}t}\vec p(\infty)=0\Longrightarrow$ for $\vec p(\infty)\propto\vec u_1$, $\exists\nu_1=0:\boldsymbol{Q}\vec u_1=\nu_1\vec u_1=0$. Therefore, the solution can be written as

  1. $\vec p(t)=\vec p(\infty)+\sum_{k=2}^n\vec u_k\vec v_k^\dagger e^{\nu_kt}\vec p(0).$

The average channel current of Eq. (22) is

$\langle I(t)\rangle=\gamma(E-v)\sum_{k=1}^n p_k(t)\delta_{1k}=\gamma(E-v)(p_1(\infty)+\sum_{k=2}^nu_{1k}v_{1k}p_1(0)e^{\nu_kt}),$

which, if the chain is stable ($\nu_k<0:\forall k>1$), has the steady state

$\langle I\rangle=\lim_{t\to\infty}\langle I(t)\rangle=\gamma(E-v)\,p_1(\infty)$.

The steady-state covariance, $C(\Delta)=\lim_{t\to\infty}C_t(\Delta)$, in this case follows from

$C_t(\Delta)=\langle I(t)I(t+\Delta)\rangle-\langle I\rangle^2=\gamma^2(E-v)^2\sum_{j,k=1}^n\delta_{1j}\delta_{1k}\,p_j(t)p_k(t+\Delta)-\langle I\rangle^2$

Hence

$C_t(\Delta)=p_1(t)p_1(t+\Delta)-\langle I\rangle^2=\sum_{i,j=2}^n[\vec u_i \underbrace{\vec v_i^\dagger\vec v_j}_{=\delta_{ij}}\vec u_i^\dagger]_{11}e^{\nu_it+\nu_j(t+\Delta)}$

If $\nu_k<0:\forall k>1$, then

  1. $C(\Delta)= \sum_{i=2}^nu_{1i} u_{1i}e^{\nu_i\Delta}$

is a sum of exponentials. The spectral density is a superposition of Lorentzians.

5.6 Statistically equivalent diffusion process (Ornstein-Uhlenbeck process)

The jump process discussed in the previous sections is a continuous-time Markov process on a discrete domain, $K(t)\in{I\!\!N}$ with $t\in{I\!\!R}$. A diffusion process is a continuous-time Markov process on a continuous domain, $\eta(t)\in{I\!\!R}$ with $t\in{I\!\!R}$.

We are in search of a diffusion process such that

$I=\gamma\left(p_1(\infty) + \sum_{k=2}^n\eta_k(t)\right)(E-v)$

has the same first- and second-order statistics (in voltage clamp) as Eq. (22). Let us try

  1. $\tau_k(v)\dot\eta_k=-\eta_k+\sigma(v)\,\xi(t)$ where $\langle\xi(0)\xi(\Delta)\rangle=\delta(\Delta)$

To solve it, use the Fourier transform

$\mathrm{i}\omega\tau\tilde\eta(\omega)=-\tilde\eta(\omega) + \sigma\,\chi(\omega)$

The placeholder symbol $\chi$ was introduced for the Fourier transform of the stochastic process $\xi(t)$. Rearranging yields

$\tilde\eta=\frac{\sigma\chi(\omega)}{1+\mathrm{i}\omega\tau}$

The spectrum is

$\tilde \eta(\omega)\tilde \eta^*(\omega)=\frac{\sigma^2\chi(\omega)\chi^*(\omega)}{1+(\tau\omega)^2}$

By definition, $\chi(\omega)\chi^*(\omega)$ is the Fourier transform of the covariance function $\delta(\Delta)$, and from Eq. (27) this is one. Hence,

$\tilde \eta(\omega)\tilde \eta^*(\omega)=\frac{\sigma^2}{1+(\tau\omega)^2}$

Applying the inverse Fourier transform results in the correlation function

$C(t)=\frac{\sigma^2}{2\tau} e^{-|t|/\tau}$.

A superposition of independent such OU processes,

$\sum_{i=2}^n\eta_i(t),$

leads to a correlation function with the same structure as in Eq. (26). We identify $\tau_i=1/|\nu_i|$ and $\sigma_i=u_{1i}$.
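A minimal Euler-Maruyama sketch of one such Ornstein-Uhlenbeck component, Eq. (27) (the values of the time constant, noise strength, step size and duration are placeholders):

    import numpy as np

    rng = np.random.default_rng(2)
    tau, sigma, dt, T = 5.0, 1.0, 1e-3, 200.0     # placeholder time constant, noise strength, step, duration
    n_steps = int(T/dt)

    eta = np.empty(n_steps)
    eta[0] = 0.0
    for i in range(n_steps - 1):
        # Euler-Maruyama step of tau*d(eta) = -eta*dt + sigma*dW
        eta[i+1] = eta[i] - dt*eta[i]/tau + (sigma/tau)*np.sqrt(dt)*rng.standard_normal()

    # for this scaling the stationary variance is sigma**2/(2*tau)
    print(eta.var(), sigma**2/(2*tau))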

The idea of matching the second-order statistics can be formulated in a far more abstract way in terms of the Kramers-Moyal-van Kampen expansion of the Master equation

$\dot p=\int w(x'\to x)p(x',t)-w(x\to x')p(x,t)\, dx'$

$\partial_t p(x,t)=\sum_{n=1}^\infty\frac{\partial^n}{\partial x^n} K_n(x,t)p(x,t)$

from

6 Continuation of fixpoints and orbits

Often a small change, say in an ion channel density, has only a small quantitative effect on a neuron's behaviour.

Often, numerical solutions to nonlinear ordinary differential equations are found by (forward) time-integration. An interesting alternative is to track a known solution through parameter space, for which the solution must persist in the parameter of interest. If it does not, a bifurcation occurs and one observes a qualitative change of the solution. The study of system solutions under parameter perturbations is also the subject of sensitivity analysis.

For a book on bifurcation theory consult (Izhikevich 2007), and on numerical bifurcation analysis (Kielhöfer 2011).

6.1 Continuation of fixed points

Assume that for

$\dot{\vec x}=\vec F(\vec x,\alpha)$

there is a steady state solution

  1. $\vec F(\vec x, \alpha)=0$

In biological systems like cells, parameters are rarely stable over longer periods, because there is a continuous turnover (synthesis and degradation) of all molecules. Hence, one should be interested in families of solutions for varying parameters, $\vec x(\alpha)$.

Def. (solution branch) #def:bifbranch

A branch, family or curve of solutions with respect to an index parameter $\alpha$ is denoted as $\vec x(\alpha)$.

The existence of the solution as a function of the parameter is governed by the implicit function theorem:

Theorem (Implicit function theorem) #thm:implicitefoo

Consider a system of equations

$\vec F(\vec x,\alpha)=0$, with $\vec F\in I\!\!R^n$, $\vec x\in I\!\!R^n$, $\alpha\in I\!\!R$ and $\nabla_{x,\alpha}\vec F\in I\!\!R^{n\times n+1}$.

Let $\vec F$ and $\nabla_{x,\alpha}\vec F$ be smooth near $\vec x$. Then, if the Jacobian $\nabla_{x}\vec F$ is nonsingular, there exists a unique, smooth solution family $\vec x(\alpha)$ such that

$\vec F(\vec x(\alpha),\alpha)=0$.

This establishes the existence of lines in a bifurcation diagram.
The method of continuation is a predictor-corrector method. In practice, assume the fixpoint is known for one particular parameter value $\alpha_0$; then, for a small change in the parameter, the solution is predicted.

Predictor step:
Taylor's linearisation
  1. $\vec x(p+\delta p)\approx\vec x(p)+\delta p\frac{\partial\vec x}{\partial p}.$

To predict the new solution, the old solution is required, as well as the derivative of the solution w.r.t. the parameter that is changed. How does one compute the latter? Take the total derivative of Eq. (28) w.r.t. the parameter,

$(\nabla_x\vec f)\frac{\partial\vec x}{\partial p}+\frac{\partial\vec f}{\partial p}=0,$

with formal solution

$\frac{\partial\vec x}{\partial p}=-(\nabla_x\vec f)^{-1} \frac{\partial\vec f}{\partial p}.$

If $\nabla_x\vec f$ is of full rank, one can use an efficient linear algebra library to find the vector $\frac{\partial\vec x}{\partial p}$ and insert it back into Eq. (29). For too large a $\delta p$ the predicted solution will be wrong. Yet it is a good initial guess from which to find the corrected solution.

Corrector step:
Newton iterations to find the root of $\vec f(\vec x,p)$
  1. $\vec x_{n+1}=\vec x_n-(\nabla_x\vec f)^{-1}\vec f(\vec x_n,p)$

Note that Newton's iterations are also obtained by linearisation,

$0=\vec f(\vec x_{n+1},p)=\vec f(\vec x_n)+(\nabla_x\vec f)(\vec x_{n+1}-\vec x_{n}),$

which, if solved for $\vec x_{n+1}$, yields Eq. (30).

Convergence analysis shows that with each Newton iteration the number of correct decimal places roughly doubles. Hence a low number of iterations (3-7) often suffices to achieve sufficient numerical accuracy.

Example (Continuation of the QIF model) #ex:contQIF

$\dot v = I + v^2 = F(v,I)$ with the solution branches $v(I)=\pm\sqrt{-I}$.

From $\nabla_vF=2v$ and $\frac{\partial F}{\partial I}=1$, one gets $\frac{\partial v}{\partial I}=-\frac{1}{2v}=\mp\frac{1}{2\sqrt{-I}}$. What happens at $I=v=0$?
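A minimal predictor-corrector sketch for this example, continuing the stable branch $v(I)=-\sqrt{-I}$ in the parameter $I$ (starting point, step size and number of Newton iterations are arbitrary choices):

    F    = lambda v, I: I + v**2                 # right-hand side of the QIF model
    dFdv = lambda v, I: 2*v                      # Jacobian w.r.t. the state
    dFdI = lambda v, I: 1.0                      # derivative w.r.t. the parameter

    I, v, dI = -1.0, -1.0, 0.02                  # start on the branch v = -sqrt(-I)
    branch = [(I, v)]
    while I + dI < 0:                            # stop before the fold at I = 0, where dFdv -> 0
        v += dI * (-dFdI(v, I) / dFdv(v, I))     # predictor: Taylor step along dv/dI
        I += dI
        for _ in range(5):                       # corrector: Newton iterations on F(v, I) = 0
            v -= F(v, I) / dFdv(v, I)
        branch.append((I, v))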

Example:
Say $f(x,p)=\sqrt{x^2+p^2}-1$; then the solution branches are $x(p)=\pm\sqrt{1-p^2}$. Linear stability analysis with $\partial_xf=\frac x{\sqrt{x^2+p^2}}$ shows that
$x>0\to\partial_xf>0\to$ unstable
$x<0\to\partial_xf<0\to$ stable

Also $\partial_pf=\frac p{\sqrt{x^2+p^2}}$ and thus $\partial_px=-\frac px=\mp\frac p{\sqrt{1-p^2}}$. What happens at $p=\pm1$ or $x=0$?

Numerical continuation

Or, more generally: what happens if we find that the condition number of $\nabla\vec f$ explodes?

In the example above the branch switches its stability and bends back in a fold or turning point. In general, folds can occur with and without a stability change.

6.2 Local bifurcations: What happens if $\nabla\vec f$ is singular?

Bifurcation analysis is one big "case discrimination" task. Luckily, and thanks to topology, the cases are, given certain constraints, finite.

6.2.1 Folds and one-dimensional nullspaces

Recall some definitions.

Def. (nullspace, kernel):
The kernel or nullspace of a matrix $\boldsymbol{J}$ is

$N(\boldsymbol{J})=\{\vec x\in I\!\!R^n|\boldsymbol{J}\vec x=0\}$

Def. (range, image):
The range of a matrix $\boldsymbol{J}$ is

$R(\boldsymbol{J})=\{\vec y|\exists \vec x\in I\!\!R^n:\boldsymbol{J}\vec x=\vec y\}$

Def:
Let $Q$ be the projection onto the range.
Def:
Eigensystem at the fold. Let the Jacobian matrix be $\boldsymbol{J}=\nabla\vec f(\vec x(0),p(0))$ with

$\boldsymbol{J}\vec r_k=\lambda_k\vec r_k$ and $\vec l_k\boldsymbol{J}=\lambda_k\vec l_k,$

so that the nullspace is spanned by $\vec r_0$ and the left nullspace by $\vec l_0$.

This section considers $\dim N(\boldsymbol{J})=1$. Hence, the implicit function theorem is not applicable. From the example above it is apparent that $\vec x'(p)=\infty$ at the bifurcation. The problem can be circumvented by defining a new "arclength parameter", $p(s)$. The bifurcation branch is then a parametric curve, $(\vec x(s),p(s))$. Without loss of generality, the bifurcation is taken to be at $s=0$.

If the Jacobian matrix $\boldsymbol{J}=\nabla\vec f$ is rank-deficient, the Lyapunov-Schmidt reduction can be applied. Intuitively, the problem is reduced from a high-dimensional, possibly infinite-dimensional, one to one that has as many dimensions as the defect of $\nabla\vec f$.

The nullspace is spanned by the eigenvector, $\vec r_0$, of $\boldsymbol{J}$ corresponding to the eigenvalue 0.

Assume that $\vec f$ is twice differentiable w.r.t. $\vec x$; then differentiate $\vec f(\vec x(s),p(s))=0$ w.r.t. $s$ and evaluate at $s=0$:

  1. $\frac{\mathrm{d}}{\mathrm{d}s}\vec f(\vec x(s),p(s))=\nabla_x\vec f\,\vec x'(s)+\partial_p\vec f\,p'(s)=\boldsymbol{J}\vec x'(s) + \partial_p\vec f\,p'(s)$

Let $\boldsymbol{H}=\nabla\nabla_x\vec f$ be the Hessian tensor. Differentiating once more,

$\frac{\mathrm{d}^2}{\mathrm{d}s^2}\vec f=\boldsymbol{H}\vec x'(s)\vec x'(s)+\boldsymbol{J}\vec x''(s)+\partial^2_p\vec f\,(p'(s))^2+\partial_p\vec f\,p''(s)+\ldots=0$

At $s=0$ one has $p'(0)=0$ and hence, from Eq. (31), $\vec x'(0)=\vec r_0$. Projecting onto the left eigenvector $\vec l_0$ of the eigenvalue 0 at $s=0$, one finds

$\vec l_0\boldsymbol{H}\vec r_0\vec r_0 + \vec l_0\partial_p\vec f\,p''(0)=0$

or, with $\partial_p\vec f\not\in R(\boldsymbol{J})$,

$p''(0)=-\frac{\vec l_0\boldsymbol{H}\vec r_0\vec r_0}{\vec l_0\partial_p\vec f}.$

This yields a test function for whether the bifurcation is

  1. subcritical ($p''(0)<0$),
  2. supercritical ($p''(0)>0$), or
  3. transcritical.
Folds
Def:
Let $P$ be the projection onto the nullspace.

There are several cases to be distinguished.

The fold:
If the rank deficiency is one-dimensional, $\dim N(\boldsymbol{J})=1$.
The Andronov-Hopf:
If the rank deficiency is two-dimensional.

6.3 Stability exchange

At a simple fold, one eigenvalue is zero. Study the eigensystem of $\boldsymbol{J}=\nabla\vec f(\vec x(s),p(s))$ near $s=0$:

$\boldsymbol{J}(\vec r_0+\vec w(s)) = \lambda(s)(\vec r_0+\vec w(s)).$

With a bit of analysis,

$\lambda'(0)=-\vec l_0\cdot\partial_p\vec f(\vec x(0),p(0))\,p''(0)$

6.3.1 Extended system

Continue the extended system (linear and nonlinear):

$\vec f(\vec x(p),p)=0$

$\nabla\vec f(\vec x,p)\vec w=0$

$(\nabla\vec f(\vec x,p))^{\mathsf{T}}\vec z=0$

6.4 Continuation of boundary value problems and periodic orbits

The same procedure as above can be applied to the continuation of periodic orbits and boundary value problems. Define the implicit function to encompass the time derivative:

$\vec g(\vec x,p)=\frac{\mathrm{d}}{\mathrm{d}t}\vec x-T\vec f(\vec x,p)=0$, with $t\in[0,1]$.

Then proceed as above. Note that the time derivative $\mathrm{d}/\mathrm{d}t$ is a linear operator which has a matrix representation, just like the Jacobian; in that sense

$\nabla_x\frac{\mathrm{d}}{\mathrm{d}t}\vec x=\frac{\mathrm{d}}{\mathrm{d}t}\boldsymbol{I}$

Def. (saddle-node/fold bifurcation) #def:bif-sn

The fixpoint defined by $\vec F({\boldsymbol{x}^\mathrm{sn}},\alpha_\mathrm{sn}) = 0$ is a fold of a saddle and a node if

  1. the Jacobian $J({\boldsymbol{x}^\mathrm{sn}},\alpha_\mathrm{sn})$ has one eigenvalue $\lambda_0({\boldsymbol{x}^\mathrm{sn}},\alpha_\mathrm{sn})=0$ with algebraic multiplicity one,
  2. $\frac{\partial^2\lambda_0}{\partial\alpha^2}\big|_{{\boldsymbol{x}^\mathrm{sn}},\alpha_\mathrm{sn}}\neq0$

Def. (Hopf bifurcation) #def:bif-hopf

Assume all but two eigenvalues have negative real parts, the remaining pair being purely imaginary and complex conjugate.

7 Normal forms, centre manifold and simple spiking models

7.1 Example: Dynamics in the centre manifold of a saddle-node

At arbitrary parameters, the periodic solution of the adjoint of the first variational equation on the limit cycle yields the PRC. It can be calculated numerically with the continuation method. Near a bifurcation, however, if major parts of the dynamics happen in the centre manifold, the PRC can be calculated analytically. As an example, take the saddle-node on limit cycle bifurcation (SNLC). The spike in this case is a homoclinic orbit to a saddle-node, which enters and leaves via the semi-stable (centre) manifold associated with the eigenvalue $\lambda_0=0$ of the Jacobian at the saddle.

Saddle-node on a limit cycle (SNLC). The dynamics on the stable manifold is fast, $\dot x=\lambda_1x$, while the dynamics along the centre subspace is slow, $\dot x=bx^2$.
  1. $\dot{\vec x}=\vec F(\vec x)+p\,\vec G(\vec x)$

${c\dot v\choose\tau\dot n}={I-I(n,v)\choose n_\infty(v)-n}$

Let there be a saddle-node (fold) bifurcation at some parameter value $p_0$, with the saddle-node located at $\vec x_0$. In the previous lecture we noted that if the bifurcation is not transcritical (degenerate),

$\frac{\mathrm{d}^2 p(s)}{\mathrm{d}s^2}=-\frac{\vec l_0\boldsymbol{H}\vec r_0\vec r_0}{\vec l_0\partial_p\vec F}\neq0$

The change of the solution w.r.t. the pseudo-arclength parameter was

$\frac{\mathrm{d}\vec x}{\mathrm{d}s}=\vec r_0.$

Given that the linear part of the dynamics is degenerate at the bifurcation, one might look at the leading nonlinear term.

Expanding the right-hand side around the saddle-node fixpoint, $\vec x_0$, yields (B. Ermentrout 1996)

$\vec F(\vec x_0+\delta\vec x)=\vec F(\vec x_0) + \boldsymbol{J}(\delta\vec x)+\boldsymbol{H}(\delta\vec x)(\delta\vec x)+...$

Establish the eigensystem at the saddle-node, $\boldsymbol{J}=\nabla\vec f(\vec x_0)$:

$\vec l_k\boldsymbol{J}=\lambda_k\vec l_k$ and $\boldsymbol{J}\vec r_k=\lambda_k\vec r_k$ with $\vec l_j\cdot\vec r_k=\delta_{jk}$.

By assumption of a saddle-node, the Jacobian has a simple zero eigenvalue with an associated eigenvector. All other eigendirections are exponentially contracting, $\lambda_k<0,\forall k>0$, while $\vec r_0$ is slow.

Def (centre subspace):
The subspace spanned by $\vec r_0$ is called the centre subspace or slow subspace.

Write the dynamics around the saddle-node as $\vec x(t) = \vec x_0 + y\vec r_0$, and also allow small changes in the bifurcation parameter, $p_0+p$. Then Eq. (32) is

$\frac{\mathrm{d}\vec x _0}{\mathrm{d}t}+\dot y\vec r_0=\vec F(\vec x_0,p_0)+y\boldsymbol{J}\vec r_0+y^2\boldsymbol{H}\vec r_0\vec r_0 +p\frac{\partial\vec F}{\partial p}.$

Projecting this equation onto the left eigenvector $\vec l_0$ yields the isolated dynamics along the centre manifold:

  1. $\dot y = ay^2+b$ with $a=\vec l_0\boldsymbol{H}\vec r_0\vec r_0$ and $b=p\,\vec l_0\cdot\frac{\partial\vec F}{\partial p}$.
Note:
In the literature, $y$ is often suggestively written as $v$, assuming the quadratic dynamics is along the voltage dimension. However, it can be shown that the centre manifold of a conductance-based neuron is never parallel to the voltage dimension.
Solution of Eq. (33)

The centre manifold is only tangential to the spiking limit cycle dynamics near the saddle. Although the proportion of time spend near the saddle is large at some point the trajectories depart so it is only locally valid.

The formal solution of Eq. (33) with initial condition y(0)=y(0)=-\infty is

  1. y(t)=batan(abtπ/2)y(t)=\sqrt{\frac ba}\tan(\sqrt{ab}\,t-\pi/2).

Away from the saddle-node the dynamics along the slow manifold accelerates again and the variable yy diverges when the argument of the tangent reaches \pi/2. This blowing up is like a spike. So for the SNLC centre manifold one can think of the spike at y=\infty and a reset to y=-\infty. The time it takes from y=-\infty to y=\infty is finite and given by

Tp=πab{T_\text{p}}=\frac\pi{\sqrt{ab}}.

Note that for y(0)=0y(0)=0 it is

  1. y(t)=abtan(abt)y(t)=\sqrt{\frac ab}\tan(\sqrt{ab}\,t)

and

Tp=π2ab{T_\text{p}}=\frac\pi{2\sqrt{ab}}.

The bifurcation parameter enters through b. If the perturbation acts only on the current-balance equation, p\,\vec G(\vec x)=[c^{-1}(I-I_0),0,0,...]^{\mathsf{T}}, then b=l_{0,v}\,c^{-1}(I-I_0), where l_{0,v} is the component of \vec l_0 along the current-balance (voltage) direction, and the firing rate has the typical square-root scaling

f_0=\frac1{T_\text{p}}=\pi^{-1}\sqrt{a\,l_{0,v}\,c^{-1}(I-I_0)}.

Let us check this numerically in the practical.
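To make the practical concrete, here is a minimal numerical sketch (with made-up values for a, the left-eigenvector component l_{0,v}, c and the distance to rheobase I-I_0) that integrates the normal form \dot y = ay^2+b and compares the passage time with \pi/\sqrt{ab}:

```python
# Sketch: forward-Euler passage time of dy/dt = a*y**2 + b between large
# negative and positive bounds (proxies for -inf and +inf), compared with the
# analytical result T_p = pi/sqrt(a*b).  All parameter values are illustrative.
import numpy as np

def passage_time(a, b, y0=-50.0, y1=50.0, dt=1e-4):
    y, t = y0, 0.0
    while y < y1:
        y += dt * (a * y**2 + b)
        t += dt
    return t

a, l0v, c = 1.0, 1.0, 1.0                  # hypothetical coefficients
for dI in (0.01, 0.04, 0.16):              # distance to rheobase, I - I_0
    b = l0v * dI / c
    print(f"I-I0={dI:.2f}  T_num={passage_time(a, b):6.2f}"
          f"  pi/sqrt(ab)={np.pi / np.sqrt(a * b):6.2f}")
```

Since T_\text{p}\propto(I-I_0)^{-1/2}, the printed passage times should halve for each fourfold increase in I-I_0, which is the square-root scaling of the firing rate.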

8 Fundamental organisation of neuronal excitability

The last chapter argued that conductance-based neuron models are a consistent continuum limit of the Markov description of finitely many ion channels. Hallmarks of the resulting ordinary differential equations include first-order kinetic equations for the ion channel components, i.e. gates, and certain polynomial nonlinearities. There is a rich literature on the dynamical systems analysis of single neurons, such that it is difficult to structure it for a lecture. Typically, the starting point is the different “simple” bifurcations that can be found in neuron models. One can also start with the “simplest” of the complicated bifurcations that summarises all possible simple ones. This is asking for the simplest chimera neuron with as many of the known features as possible (Kirst 2012; Kirst et al. 2015).

Start by restating some empirical observations on the phenomenology of excitable nerve cells. For pulse-based communication these equations need a stable resting potential and a limit-cycle-like transient pulsatile solution for certain input perturbations (Pereira, Coullet, and Tirapegui 2015).

Phenomenon                   Dynamics
resting potential            stable node
subthreshold oscillations    stable spiral
action potential             stable limit cycle
thresholds                   saddle, unstable limit cycle

Note that the focus is on the generation of single or tonic spikes, not bursts. Bursts require at least one additional time scale on top of the pulse-generating ones and are not minimal in that sense.

Observation (nerve cells) #obs:neuron

  1. stable resting potential, for Iin=0I_{\mathrm{in}}=0 in the convex hull of reversal potentials,
  2. stable pulses

Remark (dimensionality of state space) #rem:dim

The minimal dimensionality of the state space to get spikes (but not bursts) is n=2n=2.

Def. (Class-1 excitability) #def:excite1

Class-1 neurons can fire at arbitrarily low frequencies (long periods). This also implies they can show very long (and in the presence of noise, variable) latencies to the first spike.

Let us write the conductance-based model as follows

Def. (Conductance-based neuron model) #def:cbm

Cmv̇=IdcgLvIion(v,𝐚)C_{\mathrm{m}}\dot v=I_{\mathrm{dc}}- \bar g_\mathrm{L}v - I_\mathrm{ion}(v,\boldsymbol{a})

where

Iion(v,𝐚)=kGk(v,𝐚)(vEk)I_\mathrm{ion}(v,\boldsymbol{a}) = \sum_kG_k(v,\boldsymbol{a})(v-E_k)

and the components of the vector of gates 𝐚\boldsymbol{a} evolve according to

τk(v)ȧk=ak()(v)ak\tau_k(v)\dot a_k = a^{(\infty)}_k(v)-a_k.

For brevity, introduce 𝐱=[a1,...,an1,v]\boldsymbol{x}=[a_1,...,a_{n-1},v] and denote 1k<n:Fk=(ak()(v)ak)/τk(v)\forall1\leqslant k<n: F_k=(a^{(\infty)}_k(v)-a_k)/\tau_k(v) and Fn=(IdcgLvIion(v,𝐚))/CmF_n=(I_{\mathrm{dc}}-\bar g_\mathrm{L}v-I_\mathrm{ion}(v,\boldsymbol{a}))/C_{\mathrm{m}}, such that d𝐱/dt=𝐅(𝐱){\mathrm{d}}\boldsymbol{x}/{\mathrm{d}}t=\boldsymbol{F}(\boldsymbol{x}).
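A minimal code sketch of this structure may help; the single gated current and its activation curve below are illustrative placeholders, not the lecture's model:

```python
# Sketch of dx/dt = F(x) for a conductance-based model with one gate a and
# voltage v, i.e. x = [a, v]; channel functions are made up for illustration.
import numpy as np

C_m, I_dc, g_L = 1.0, 0.0, 0.1        # fundamental parameters
g_K, E_K = 2.0, -90.0                 # one illustrative gated conductance

def a_inf(v):                          # bounded, monotone activation curve
    return 1.0 / (1.0 + np.exp(-(v + 30.0) / 8.0))

def tau_a(v):                          # voltage-dependent time constant
    return 1.0 + 4.0 / np.cosh((v + 30.0) / 20.0)

def F(x):
    a, v = x
    I_ion = g_K * a * (v - E_K)                     # sum over channels (one here)
    dadt = (a_inf(v) - a) / tau_a(v)                # first-order gate kinetics
    dvdt = (I_dc - g_L * v - I_ion) / C_m           # current-balance equation
    return np.array([dadt, dvdt])

# usage: a few Euler steps from an arbitrary initial condition
x = np.array([a_inf(-65.0), -65.0])
for _ in range(1000):
    x = x + 0.01 * F(x)
print("state after 10 time units:", x)
```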

The goal is to find “the simplest” of the complicated bifurcations that allows for all known phenomena. What is meant by the simplest? Bifurcations can be ordered according to their codimension, i.e. how many parameters are required to tune into them. Codimension-one bifurcations are points in one-dimensional parameter spaces, lines in two-dimensional spaces and sheets in three-dimensional spaces, and so on. Which codimension should one allow for? It does not make sense to use more parameters than are biophysically present in all the neuron models from the model class of interest. These are three for conductance-based neuron models:

Def. (fundamental parameter) #def:params

Conductance-based neuron models may have a very diverse potpourri of ion channels with different parameters, but all share the following fundamental parameters: (C_{\mathrm{m}},I_{\mathrm{dc}},\bar g_\mathrm{L}). C_{\mathrm{m}} sets the time-scale separation between the kinetics and the current-balance equation. I_{\mathrm{dc}} and \bar g_\mathrm{L} are the constant and linear terms of the total membrane current. All other “kinetic parameters” depend on which exact channels are present in the nerve cell model and may vary substantially.

The simplest known bifurcation that can be brought into agreement with the phenomenology above is

Def. (Bogdanov-Takens cusp bifurcation) #def:bif-bt

A zero eigenvalue of algebraic multiplicity two, together with a degeneracy assumption that allows one to get rid of the fold via a cusp, which reads that the quadratic normal-form coefficient must be zero.

Assertion (BT point) #assert:btc

A BT point implies the emergence of a homoclinic orbit to a saddle. Importantly, due to the coincidence with the planar limit cycle of the Hopf bifurcation, the homoclinic orbit is planar and not as three dimensional as a homoclinic orbit to a saddle-focus (Shilnikov). It may turn into that further away from the unfolding.

Cusp bifurcation

Assumptions (Electrophysiology) #ass:cbm

The following assumptions can be motivated from biophysics of excitable membranes or electrophysiological measurements:

  1. All conductances are positive, k:Gk>0\forall k:G_k>0.
  2. The kinetics of the ion channel gates is of first order, with all activation curves bounded, monotonic, twice-differentiable functions that become sufficiently flat at v=\pm\infty. More specifically, the derivative of the activation curve has to vanish at \pm\infty and do so faster than linearly, i.e. \lim_{v\to\pm\infty}v\frac{\partial a(v)}{\partial v}=0.
  3. Channels are voltage-gated only.
  4. Class 1 \to some slow positive feedback exists, i.e., ṽ:dI()dv|v=ṽ>0\exists \tilde v: \frac{{\mathrm{d}}I^{(\infty)}}{{\mathrm{d}}v}\Big|_{v=\tilde v}>0

Remark (excitable system vs limit cycle) #rem:excitesys

Before proceeding to the actual argument, a word on “excitable systems”. Many neurons do not spike tonically but only if excited by a synaptic perturbation or noise. In more physiological terms this is tonic spiking vs Poisson-like spiking (low vs high CV). In such cases they cross a “potential barrier” or some threshold manifold and return to rest after a long excursion through state space. One way of producing a system with such a long excursion is to have it possess a stable limit cycle for “other parameters”. It turns out that the Bogdanov-Takens point plays a crucial role here too, because it guarantees there are homoclinic orbits attached to a saddle, which is one way to get such excursions.

8.1 Stable resting potential and how many fixpoints are expected

Proposition (fixpoints of CBM) #prop:fp

From v̇=0\dot v=0 and ȧk=0\dot a_k=0 one gets:

v*=Idc+kGk(v*,𝐚*)EkkGk(v*,𝐚*)v^*=\frac{I_{\mathrm{dc}}+ \sum_kG_k(v^*,\boldsymbol{a}^*)E_k}{\sum_kG_k(v^*,\boldsymbol{a}^*)}

ak*=ak()(v*)a_k^*=a_k^{(\infty)}(v^*).

Provided solutions to this implicit definition of the voltage fixpoint exist, then for I_{\mathrm{dc}}=0 they lie within the convex hull of the reversal potentials E_k.

Given that \forall v the fixpoints of the gates can readily be found, the steady states of the whole system are the zeros of the steady-state I-v curve:

Def. (steady-state II-vv curve) #def:ivcurve

Given that all gates have time to equilibrate to their steady state values (are in kinetic steady state), the steady-state II-vv curve is defined as

I()(v)=IdcgLvIion(v,𝐚*(v))I^{(\infty)}(v) = I_{\mathrm{dc}}-g_Lv-I_\mathrm{ion}(v,\boldsymbol{a}^*(v)).

It is a continuous and differentiable function just like \boldsymbol{F}. This allows some weak statements about the number of steady states expected in a conductance-based model.

Lemma (fixpoints of class 1 neurons) #lem:fp-class1

Depending on their I_{\mathrm{dc}}, class 1 neurons have either one or (at least) three steady states.

Proof #

The partial derivative of the gating equations w.r.t vv at the kinetic steady state is

Fkv|ak=ak()=a()v1τk(v)\frac{\partial F_k}{\partial v}|_{a_k=a^{(\infty)}_k}=\frac{\partial a^{(\infty)}}{\partial v}\frac1{\tau_k(v)}.

Hence, the total derivative of a kinetic equation w.r.t vv is in that case

\frac{{\mathrm{d}}F_k}{{\mathrm{d}}v}=\frac{\partial F_k}{\partial v}+\frac{\partial F_k}{\partial a_k}\frac{\partial a_k}{\partial v}=\frac{\partial a^{(\infty)}}{\partial v}\frac1{\tau_k(v)}-\frac{1}{\tau_k(v)}\frac{\partial a_k}{\partial v}=0,

which shows that in kinetic steady state a/v=a()/v\partial a/\partial v=\partial a^{(\infty)}/\partial v. Therefore, the total derivative of the steady-state II-vv relation is

1CmdI()dv=F0v+F0akak()v\frac{1}{C_{\mathrm{m}}}\frac{{\mathrm{d}}I^{(\infty)}}{{\mathrm{d}}v}=\frac{\partial F_0}{\partial v}+\frac{\partial F_0}{\partial a_k}\frac{\partial a^{(\infty)}_k}{\partial v}.

Inserting the definitions yields

\frac{{\mathrm{d}}I^{(\infty)}}{{\mathrm{d}}v}=-g_L-\sum_kG_k(v,\boldsymbol{a})+\sum_k\frac{\partial G_k(v,\boldsymbol{a})}{\partial v}(E_k-v)+\sum_k\frac{\partial G_k(v,\boldsymbol{a})}{\partial a_k}(E_k-v)\frac{\partial{a^{(\infty)}_k}}{\partial v}.

This can be used to inspect the asymptotic behaviour of I^{(\infty)} at large positive and negative voltages:

limv±dI()dv=gLkGk(v,𝐚*)gL\lim\limits_{v\to\pm\infty}\frac{{\mathrm{d}}I^{(\infty)}}{{\mathrm{d}}v}=-g_L-\sum_kG_k(v,\boldsymbol{a}^*)\leqslant-g_L,

where the last two terms have vanished due to Assumptions #a) and b). This means the slope of the I-v curve is negative at v=\pm\infty, and thus I^{(\infty)} should cross zero at least once. For the class 1 excitable system Assumption #d) also holds and consequently \frac{{\mathrm{d}}I^{(\infty)}}{{\mathrm{d}}v} must have at least one maximum. Consequently I_{\mathrm{dc}} can be tuned such that there are at least three fixpoints. \Box

Let us call the location of this maximum \hat v, i.e., \exists\hat v:\frac{{\mathrm{d}}^2 I^{(\infty)}}{{\mathrm{d}}v^2}|_{v=\hat v}=0.
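The lemma can be illustrated numerically: with made-up channel functions (not the lecture's model) one can tabulate I^{(\infty)}(v) and count its zero crossings for different I_{\mathrm{dc}}:

```python
# Sketch: locate steady states as zeros of I_inf(v) = I_dc - g_L v - I_ion(v, a_inf(v));
# the persistent inward current below is an illustrative placeholder.
import numpy as np
from scipy.optimize import brentq

g_L, g_Na, E_Na = 0.1, 1.5, 60.0
m_inf = lambda v: 1.0 / (1.0 + np.exp(-(v + 40.0) / 6.0))   # bounded, monotone

def I_inf(v, I_dc):
    return I_dc - g_L * v - g_Na * m_inf(v) * (v - E_Na)

v_grid = np.linspace(-100.0, 60.0, 2001)
for I_dc in (0.0, -4.0, -9.0):                               # illustrative inputs
    f = I_inf(v_grid, I_dc)
    brackets = np.where(np.sign(f[:-1]) != np.sign(f[1:]))[0]
    roots = [brentq(I_inf, v_grid[i], v_grid[i + 1], args=(I_dc,)) for i in brackets]
    print(f"I_dc={I_dc:5.1f}: {len(roots)} fixpoint(s) near", np.round(roots, 1))
```

With these made-up numbers the count changes from one to three fixpoints as I_{\mathrm{dc}} is lowered, in line with the lemma.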

8.2 Rank-one updated diagonal Jacobian

Next we make use of Assumption #c) to calculate the Jacobian of a conductance-based neuron model.

Observation (rank-1 updated diagonal Jacobian) #obs:rank1diag

The Jacobian of a conductance-based neuron model has a special structure: it is a rank-1 updated diagonal matrix,

𝐉=(F1a10F1v0Fn1an1Fn1vFna1Fnan1Fnv)\boldsymbol{J}= \begin{pmatrix} \frac{\partial F_1}{\partial a_1} & & 0 &\frac{\partial F_1}{\partial v} \\ & \ddots & & \vdots\\ 0 & & \frac{\partial F_{n-1}}{\partial a_{n-1}} & \frac{\partial F_{n-1}}{\partial v}\\ \frac{\partial F_n}{\partial a_1} & \cdots & \frac{\partial F_n}{\partial a_{n-1}} & \frac{\partial F_n}{\partial v} & \\ \end{pmatrix},

Proposition (Zero-eigenvectors of a CBM) #thm:cbnJacobian

Consider a conductance-based neuron model (Def. #) at a fold bifurcation, i.e., with a simple zero eigenvalue, λ1=0\lambda_1=0, at the saddle-node. Then the semi-stable (centre-)manifold is tangential to the right-eigenvector corresponding to λ1\lambda_1, which is given by

  1. 𝐫1=r1n(τkF1vτn1Fn1v1)=1κ(ddvak()(v)|v=𝐯sn1)\boldsymbol{r}_1=r_{1n}\begin{pmatrix}\tau_k\frac{\partial F_1}{\partial v}\\\vdots\\\tau_{n-1}\frac{\partial F_{n-1}}{\partial v}\\1\end{pmatrix}=\frac1\kappa\begin{pmatrix}\vdots\\\frac{\mathrm{d}}{{\mathrm{d}}v}a^{(\infty)}_{k}(v)|_{v={\boldsymbol{v}^\mathrm{sn}}}\\\vdots\\1\end{pmatrix},

where v=𝐯snv={\boldsymbol{v}^\mathrm{sn}} and ak=ak()(𝐯sn)a_k=a^{(\infty)}_k({\boldsymbol{v}^\mathrm{sn}}) was used. The normalisation constant κ\kappa is given at the end of Proof #.

The associated left-eigenvector is

  1. 𝐥1=(τkFnak1)\boldsymbol{l}_1=\begin{pmatrix}\tau_k\frac{\partial F_n}{\partial a_k}\\\vdots\\1\end{pmatrix},

where FnF_n denotes the r.h.s. of the current-balance equation and FkF_k for k=1,...,n1k=1,...,n-1 the r.h.s. of the kinetic equations. Note that the right eigenvector 𝐫1\boldsymbol{r}_1 gives the direction of the semi-stable manifold.

The determinant can be rewritten as

  1. |𝐉|=(1)n1kτkdI()dv|\boldsymbol{J}|= \frac{(-1)^{n-1}}{\prod_k\tau_k}\frac{{\mathrm{d}}I^{(\infty)}}{{\mathrm{d}}v}

Proof #

The rank-1 updated diagonal structure of the Jacobian allows for an explicit calculation of the left and right eigenvectors corresponding to the eigenvalue \lambda_1=0.

Choosing the zero eigenvector as 𝐫1=(𝛍1)\boldsymbol{r}_1={\boldsymbol{\mu}\choose1}, one obtains from 𝐉𝐫1=λ1𝐫1=0\boldsymbol{J} \boldsymbol{r}_1 = \lambda_1 \boldsymbol{r}_1 = 0 a set of nn equations:

  1. Fnv+j=1n1μjFnaj=0\frac{\partial F_n}{\partial v} + \sum_{j=1}^{n-1}\mu_j\frac{\partial F_n}{\partial a_j}=0, and

  2. Fkv+μkFkak=0\frac{\partial F_k}{\partial v}+\mu_k\frac{\partial F_k}{\partial a_k}=0 for k=1,...,n1k=1,...,n-1.

From Eqs. (40), it follows that \mu_j=-\left(\frac{\partial F_j}{\partial a_j}\right)^{-1}\frac{\partial F_j}{\partial v}. This solution also solves Eq. (39), because the required zero eigenvalue implies that \det\boldsymbol{J}=0. Note that according to the block-determinant (Schur complement) formula the determinant of a matrix

𝐉=(𝐀𝐁𝐂𝐃)\boldsymbol{J}=\begin{pmatrix}\boldsymbol{A} & \boldsymbol{B}\\\boldsymbol{C}&\boldsymbol{D}\end{pmatrix}

is |𝐀||𝐃𝐂𝐀1𝐁||\boldsymbol{A}|\,|\boldsymbol{D}-\boldsymbol{C}\boldsymbol{A}^{-1}\boldsymbol{B}|

  1. det𝐉=|F0vj=1n1(Fjaj)1FjvF0aj|k=1n1Fkak=0\det\boldsymbol{J}=\left|\frac{\partial F_0}{\partial v} - \sum_{j=1}^{n-1}\left(\frac{\partial F_j}{\partial a_j}\right)^{-1}\frac{\partial F_j}{\partial v}\frac{\partial F_0}{\partial a_j}\right|\prod_{k=1}^{n-1}\frac{\partial F_k}{\partial a_k}=0.

Since Fkak=1τk(vsn)<0\frac{\partial F_k}{\partial a_k}=-\frac{1}{\tau_k(v_\mathrm{sn})}<0, the first factor of the determinant must be zero. Hence, μj\mu_j also solves Eq. (39).

An analogous derivation for the left eigenvector, \boldsymbol{l}_0={1\choose\boldsymbol{\xi}}, results in the following set of equations,

  1. F0v+j=1n1ξjFjv=0\frac{\partial F_0}{\partial v} + \sum_{j=1}^{n-1}\xi_j\frac{\partial F_j}{\partial v}=0, and

  2. F0ak+ξkFkak=0\frac{\partial F_0}{\partial a_k}+\xi_k\frac{\partial F_k}{\partial a_k}=0 for k=1,...,n1k=1,...,n-1,

which yields ξj=(Fjaj)1F0aj\xi_j=-\left(\frac{\partial F_j}{\partial a_j}\right)^{-1}\frac{\partial F_0}{\partial a_j}. The normalisation constant to ensure 𝐥0𝐫0=1\boldsymbol{l}_0\cdot\boldsymbol{r}_0=1 is

κ=1+j=1n1τj2F0ajFjv|v=𝐯sn,ak=ak()(𝐯sn)\kappa=1+\sum_{j=1}^{n-1}\tau_j^2\frac{\partial F_0}{\partial a_j}\frac{\partial F_j}{\partial v}\big|_{v={\boldsymbol{v}^\mathrm{sn}},a_k=a^{(\infty)}_k({\boldsymbol{v}^\mathrm{sn}})}. Note that yy is in units of volt here. Other conventions are possible.

A long calculation shows that

α2=𝐥1𝐇𝐫1𝐫1=d2I()dv2\alpha_2=\boldsymbol{l}_1\boldsymbol{H}\boldsymbol{r}_1\boldsymbol{r}_1=\frac{{\mathrm{d}}^2I^{(\infty)}}{{\mathrm{d}}v^2}.

The conditions for the BTC bifurcation require the eigenvalue to be not simple, but of algebraic multiplicity two. This means there is a Jordan chain of rank two, such that for the left and right eigenvectors, \boldsymbol{l}_1 and \boldsymbol{r}_1, corresponding to \lambda_1=0 there is also a pair of generalised eigenvectors (Kuznetsov 2011), denoted here as \boldsymbol{l}_2 and \boldsymbol{r}_2.

(𝐉λ1𝐈)𝐫2=𝐫1(\boldsymbol{J}-\lambda_1\boldsymbol{I})\boldsymbol{r}_2=\boldsymbol{r}_1 and 𝐥2(𝐉λ1𝐈)=𝐥1\boldsymbol{l}_2(\boldsymbol{J}-\lambda_1\boldsymbol{I})=\boldsymbol{l}_1 with 𝐥1𝐫1=𝐥2𝐫2=0\boldsymbol{l}_1\cdot\boldsymbol{r}_1=\boldsymbol{l}_2\cdot\boldsymbol{r}_2=0 and 𝐥1𝐫2=𝐥2𝐫1=1\boldsymbol{l}_1\cdot\boldsymbol{r}_2=\boldsymbol{l}_2\cdot\boldsymbol{r}_1=1.

Note that the orthonormalisation is somewhat unintuitive, but useful. The BT case is \lambda_1=0, such that \boldsymbol{J}\boldsymbol{r}_2=\boldsymbol{r}_1 and \boldsymbol{l}_2\boldsymbol{J}=\boldsymbol{l}_1. Numerically, this is equivalent to solving an extended eigenvalue problem

(𝐉0𝐈𝐉)(𝐫1𝐫2)=0\begin{pmatrix}\boldsymbol{J} & \boldsymbol{0}\\-\boldsymbol{I} & \boldsymbol{J}\end{pmatrix} \begin{pmatrix}\boldsymbol{r}_1\\\boldsymbol{r}_2\end{pmatrix}=\boldsymbol{0}
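As a numerical sketch (with a made-up, exactly singular Jacobian rather than one computed from a neuron model), the zero eigenvector and the generalised eigenvector can be obtained as follows:

```python
# Construct a Jacobian with a rank-two Jordan chain at eigenvalue zero (a
# BT-like degeneracy), then recover r1 from the nullspace and r2 from J r2 = r1.
import numpy as np

P = np.array([[1.0, 0.3, 0.2],
              [0.4, 1.0, 0.1],
              [0.2, 0.5, 1.0]])
B = np.array([[0.0, 1.0, 0.0],        # 2x2 Jordan block for eigenvalue 0 ...
              [0.0, 0.0, 0.0],
              [0.0, 0.0, -1.5]])      # ... plus one stable eigenvalue
J = P @ B @ np.linalg.inv(P)

U, s, Vt = np.linalg.svd(J)
r1 = Vt[-1]                           # right null vector (smallest singular value)
r2, *_ = np.linalg.lstsq(J, r1, rcond=None)   # solves J r2 = r1 (up to adding r1)

print("|J r1|      =", np.linalg.norm(J @ r1))
print("|J r2 - r1| =", np.linalg.norm(J @ r2 - r1))
```

The remaining freedom (adding multiples of \boldsymbol{r}_1 to \boldsymbol{r}_2) is fixed by the orthonormalisation convention stated above.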

Proposition (Generalised zero-eigenvectors of a CBM) #prop:gen-eig

Given the eigenvectors \boldsymbol{r}_1 and \boldsymbol{l}_1 and the definition \boldsymbol{J}\boldsymbol{r}_2=\boldsymbol{r}_1 one finds

𝐫2=(τ1(r2nr1nτ1)F1vτn1(r2nr1nτn1)Fn1vr2n)\boldsymbol{r}_2=\begin{pmatrix}\tau_1(r_{2n}-r_{1n}\tau_1)\frac{\partial F_1}{\partial v}\\\vdots\\\tau_{n-1}(r_{2n}-r_{1n}\tau_{n-1})\frac{\partial F_{n-1}}{\partial v}\\ r_{2n}\end{pmatrix}

Proof #

As the normal eigenvectors have already been calculated above up to normalisation, the equations for the coefficients of the generalised eigenvectors are

k=1n1r2kFnak+r2nFnv=r1n\sum\limits_{k=1}^{n-1} r_{2k} \frac{\partial F_n}{\partial a_k} + r_{2n}\frac{\partial F_n}{\partial v} = r_{1n}

r2jτj+r2nFjv=r1nτjFjv,j=1,..,n1-\frac{r_{2j}}{\tau_j} + r_{2n}\frac{\partial F_j}{\partial v} = r_{1n}\tau_j\frac{\partial F_j}{\partial v},\qquad\forall j=1,..,n-1

r2j=τj(r2nr1nτj)Fjv\Longrightarrow r_{2j}=\tau_j(r_{2n}-r_{1n}\tau_j)\frac{\partial F_j}{\partial v}

k=1n1l2kFkv+l2nFnv=l1n\sum\limits_{k=1}^{n-1} l_{2k} \frac{\partial F_k}{\partial v} + l_{2n}\frac{\partial F_n}{\partial v} = l_{1n}

l2jτj+l2nFnaj=l1nτjFnaj,j=1,..,n1-\frac{l_{2j}}{\tau_j} + l_{2n}\frac{\partial F_n}{\partial a_j} = l_{1n}\tau_j\frac{\partial F_n}{\partial a_j},\qquad\forall j=1,..,n-1

l2j=τj(l2nl1nτj)Fnaj\Longrightarrow l_{2j}=\tau_j(l_{2n}-l_{1n}\tau_j)\frac{\partial F_n}{\partial a_j}

From 𝐥1𝐫1=0\boldsymbol{l}_1\cdot\boldsymbol{r}_1=0 one obtains

  1. 0=1+j=1n1τj2FnajFjv|v=𝐯sn,ak=ak()(𝐯sn)0=1+\sum_{j=1}^{n-1}\tau_j^2\frac{\partial F_n}{\partial a_j}\frac{\partial F_j}{\partial v}\big|_{v={\boldsymbol{v}^\mathrm{sn}},a_k=a^{(\infty)}_k({\boldsymbol{v}^\mathrm{sn}})}

From 𝐥1𝐫2=1\boldsymbol{l}_1\cdot\boldsymbol{r}_2=1 we obtain l11l_{11}

Once the generalised eigensystem spanning the nullspace is determined, the normal form of the BT bifurcation can be written down. After some longer calculation one obtains the conditions for the BTC point:

Conditions (BTC) #cond:btc

In addition to the previous (fixpoint) conditions:

  1. dI()dv=0\frac{{\mathrm{d}}I^{(\infty)}}{{\mathrm{d}}v}=0

  2. \frac{{\mathrm{d}}^2I^{(\infty)}}{{\mathrm{d}}v^2}=0

Proof #

The requirement of a zero determinant as defined in Eq. (38) leads to Eq. (45). Eq. (46) follows from the degeneracy requirement of the cusp. It was shown as a consequence of Lemma # that such a point actually exists if one starts with a class 1 neuron.

From these conditions the following statement can be derived.

Lemma (unique param) #lem:uniquepar

From the conditions of the BTC point a unique set of parameters (I_{\mathrm{dc}},g_l,C_{\mathrm{m}}) can be determined.

Proof #

Eq. (44) can be solved for

Cm=kIionakτkak()vC_{\mathrm{m}}=-\sum_k\frac{\partial I_\mathrm{ion}}{\partial a_k}\tau_k\frac{\partial a^{(\infty)}_k}{\partial v}

Eq. (45) depends only on glg_l and CmC_{\mathrm{m}}. It can be solved for

gl=1CmdIiondvg_l=-\frac{1}{C_{\mathrm{m}}}\frac{{\mathrm{d}}I_\mathrm{ion}}{{\mathrm{d}}v}

The input current can be obtained from the fixpoint condition of the current-balance equation

Idc=glv*+Iion(v*,𝐚(v*))I_{\mathrm{dc}}= g_lv^*+I_\mathrm{ion}(v^*,\boldsymbol{a}(v^*)),

only if the steady state is available. Luckily condition Eq. (46) is independent of the fundamental parameters and can be solved to find v*v^*.

Given the biophysical nature of the parameters, both the conductance and the capacitance must be positive to be realistic.

Lemma (Permissible) #lem:permissible

For a CBM as defined above the BTC point is biophysically permissible if \sum_k\frac{\partial I_\mathrm{ion}}{\partial a_k}\tau_k\frac{\partial{a^{(\infty)}_k}}{\partial v}<0

Proof #

From dI()dv=0\frac{{\mathrm{d}}I^{(\infty)}}{{\mathrm{d}}v}=0 \Longrightarrow dIiondv=gl\frac{{\mathrm{d}}I_\mathrm{ion}}{{\mathrm{d}}v}=-g_l. Hence if gl>0g_l>0 to begin with then

\mathrm{sign}(g_l)=\mathrm{sign}(C_{\mathrm{m}}).

All SNIC neurons can be tuned into Hopf neurons (not necessarily vice versa).

Proposition (Normal form of the BTC bifurcation) #prop:normal-BTC

The normal form of the degenerate BTC bifurcation is

ẏ=w\dot y = w

ẇ=α2y2+α1wy+β2y+β1\dot w = \alpha_2y^2 + \alpha_1wy + \beta_2y + \beta_1

with α2=𝐥1𝐇𝐫1𝐫1=d2I()dv2=0\alpha_2=\boldsymbol{l}_1\boldsymbol{H}\boldsymbol{r}_1\boldsymbol{r}_1=\frac{{\mathrm{d}}^2I^{(\infty)}}{{\mathrm{d}}v^2}=0.

Proof #

Expand the solution in small variations along the directions spanning the centre manifold: \boldsymbol{x}(t)=\boldsymbol{x}_0 + y(t)\boldsymbol{r}_1 + w(t)\boldsymbol{r}_2. Then, project the flow field onto the corresponding left eigenvectors and keep only the leading-order terms of the nonlinearity. The perturbation evolves according to

ẏ𝐫1+ẇ𝐫2=y𝐉𝐫1+w𝐉𝐫2+yw𝐇𝐫1𝐫2+12(y2𝐇𝐫1𝐫1+w2𝐇𝐫2𝐫2)\dot y\boldsymbol{r}_1+\dot w\boldsymbol{r}_2 = y\boldsymbol{J}\boldsymbol{r}_1 + w\boldsymbol{J}\boldsymbol{r}_2 + yw\boldsymbol{H}\boldsymbol{r}_1\boldsymbol{r}_2 + \frac12(y^2\boldsymbol{H}\boldsymbol{r}_1\boldsymbol{r}_1 + w^2\boldsymbol{H}\boldsymbol{r}_2\boldsymbol{r}_2).

Remember 𝐥1𝐫1=𝐥2𝐫2=0\boldsymbol{l}_1\cdot\boldsymbol{r}_1=\boldsymbol{l}_2\cdot\boldsymbol{r}_2=0 and 𝐥1𝐫2=𝐥2𝐫1=1\boldsymbol{l}_1\cdot\boldsymbol{r}_2=\boldsymbol{l}_2\cdot\boldsymbol{r}_1=1 and 𝐉𝐫2=𝐫1\boldsymbol{J}\boldsymbol{r}_2=\boldsymbol{r}_1, 𝐥2𝐉=𝐥1\boldsymbol{l}_2\boldsymbol{J}=\boldsymbol{l}_1.

Projecting onto 𝐥2\boldsymbol{l}_2

ẏ=y𝐥2𝐉𝐫1=0+w𝐥2𝐉𝐫2=1\dot y = y\,\underbrace{\boldsymbol{l}_2\boldsymbol{J}\boldsymbol{r}_1}_{=0} + w\,\underbrace{\boldsymbol{l}_2\boldsymbol{J}\boldsymbol{r}_2}_{=1}

Projecting onto 𝐥1\boldsymbol{l}_1

\dot w = \frac12y^2\,\boldsymbol{l}_1\boldsymbol{H}\boldsymbol{r}_1\boldsymbol{r}_1+yw\,\boldsymbol{l}_1\boldsymbol{H}\boldsymbol{r}_1\boldsymbol{r}_2 + \frac12w^2\,\boldsymbol{l}_1\boldsymbol{H}\boldsymbol{r}_2\boldsymbol{r}_2

9 Ultimate causation: nervous system function

Coming back to the type of questions biological theory attempts to answer (see §(???)), chapters like §(???) describe the mechanistic building blocks present in nerve tissue; their function and where they came from is bracketed off. This is the discussion of proximate causes and of static form. What about function?

Note that function is not to be understood in a teleological sense, as if a purpose were given to it by something. All attempts by the early German “theoretical biologists” (KE von Baer, JJ Uexküll, J Reinke or J Schaxel) to use the nonscientific (non-empirical) ideas of Hegel, Kant or Goethe as a theoretical basis for biology have proven useless today. Their theories (vitalism, semiotics, teleology) were either incorrect or had no utility, predictive power, empirical basis or falsifiability. They were romantic fancies of an introspective and ultimately authoritarian, self-serving view of the living.

But “why is it like this?” questions are nagging us. Ultimately, evolutionary considerations, e.g. the contribution of nervous systems to fitness, may come into play. Fitness, however, is an abstract concept that is difficult to relate to some specific mechanism of a complex nervous system. In part this is due to the general problem of fitness being defined as an expectation value of future offspring.

Surely, one first needs to understand more about the function of nerve tissues, what it does. But where to start?

Nerve tissues can roughly be categorised into

  1. afferent pathways doing sensory processing,
  2. central decision-making computations, and
  3. efferent motor pattern generators

There is no rule that nervous systems have to obey this modular design. These are to some extent arbitrary labels given by scientists to find their way around.

Let’s briefly go through the three classes and discuss which kind of signals are expected (not an exclusive list).

9.1 Signals and senses

A classical, ecological theory of sensory organs aims to predict the structural organisation of the peripheral sensory architecture (???; ???), and the coding scheme used to represent information early in the pathway (???), based on the properties of the adequate stimuli occurring in the natural habitat. An alternative approach is inspired by the engineering task of system identification, in which blackbox (technical) systems are probed to obtain a response which helps uncovering the exact transformation executed by them. In contrast to behavioural experiments that utilise abstractions of the actual stimuli perceived by the animal (???) (Cpt. 3), systems identification often relies on unnatural stimuli that, however, possess beneficial statistical properties to facilitate the reverse-engineering scheme. Information theory can be applied within both paradigms as long as there is sufficient data. Examples of such artificial stimuli are a short transient pulse, mathematically idealised by Dirac’s δ\delta-function (Green’s function method) or ,,white’’ unstructured noise (Wiener’s kernel method). Both signals have a flat, uniform spectrum that contains all frequencies in an unbiased manner and thus the system’s response will not lack structures simply because of missing input complexity. In reality, a uniform spectrum of a physical stimulus can not extend its frequency range to infinity as this would require infinite power. Instead, there will be a reasonable upper bound to the frequency content, fc=sup{f:P(f)>0}f_c=\sup\{f:P(f)>0\}. In addition, artificial stimuli are at times chosen from a Gaussian process, for reasons including the invocation of a central limit theorem, arguments based on maximum entropy, or easement of analytical treatments.

9.1.1 Natural signals and their neuronal representation

One of the first observations that environmental conditions are indeed mapped into nerve pulse activity was an experiment on the toad’s optic nerve (Edgar Adrian 1928).

The external signals from the natural world relevant to animals can often be categorised into three broad classes: (i) signals relevant to basic body kinematics and movement; (ii) signals for predator or prey detection; and (iii) communication signals from other (often conspecific) animals. Here are some examples from each category.

Example (Botoxic blowflies and careful crickets) #eg:blowfly-circket

Large monopolar cells, first-order interneurons of the insect compound eye, encode light intensity differences (contrast) I\in[0,I_\mathrm{max}] into the neuron's graded potential response r\in[0,r_\mathrm{max}). This was investigated in Calliphora stygia (Laughlin 1981). The cricket Acheta domesticus represents the wind direction in 4 interneurons of the cercal system, each encoding a different subinterval of the angular variable \theta\in [0,2\pi) into its firing rate (Theunissen and Miller 1991).

In the above examples the input is a real valued interval IR\subset{I\!\!R}, while the encoding neuronal activity is real or a discrete number, IR\subset{I\!\!R} or IN{I\!\!N}. For example, the cercal system maps [0,2π)[0,fmax)4[0,2\pi)\mapsto [0,f_\mathrm{max})^4. These are seemingly simple mappings, what could possibly go wrong?

A couple of questions arise:

  1. How exactly is this read-out by upstream parts of the nervous system? Will there be errors?

  2. A related question is: How acute is the sense? What difference in wind direction can still be resolved?

  3. How long does the animal have to observe a constant wind direction to be able to encode it? How constant or stationary is the signal in the environment?

  4. What is the relation between the function and the dynamics discussed in previous chapters?

If there is no noise and the coding variable is a real number, infinite precision could be possible. If the output is a firing rate, then upstream neurons have to estimate it based on a finite number of observed spikes. At the same time it needs to be understood how these spike numbers are mechanistically generated from the neuronal dynamics.

Def (spike count) #def:spikecount

N(t)=ϕ(t)N(t)=\lfloor\phi(t)\rfloor.

Note that here the statistical quantity was already connected to a dynamic phase variable ϕ(t)\phi(t) for which the governing evolution equations are known in some regimes. Hence, one has related neuronal dynamics and coding (???).

Some signals may not be constant in time, but at least wide-sense stationary, meaning they have temporal correlations, but these are shift invariant.

Praying paddlefish
The primary afferents in the American paddlefish's electrosensory organ show a tonic response to fluctuating stimuli. In their muddy habitat the electrosensory system is used to detect their planktonic prey, like members of the crustacean genus . The frequency spectrum of the bioelectric profile emitted by a single moving Daphnia shows that most power resides in a baseband below 8~Hz. For whole swarms the power is distributed over a larger interval, but still drops at high frequencies. In addition to naturalistic stimuli, the electrosensory system has been investigated with white-noise analysis in order to determine the transfer function of the sensory afferents. This has confirmed that information transmission is focused on a passband between 0.5–20~Hz with a steep roll-off. The stimulus-response coherence of afferents shows a passband between 0.5–20~Hz with a steeper roll-off than expected from a simple RC circuit. In part the steep roll-off is due to noise masking from epithelial oscillators at around 23~Hz. Yet, the notch at 23~Hz is, though less strong, also apparent in the filter gain, cf. Fig. 3(b) in Ref.~.
Hiving hoverflies
The Dipteran haltere organ is a biological gyroscope that enables hoverflies to excel in aeronautics. The organ evolved from the second pair of wings. According to behavioural studies halteres measure Coriolis forces during flight. The white noise analysis of crane fly halteres under direct mechanical stimulation in Ref.~ shows that the system is sensitive to stimuli movements on time scales around 10~ms, while very fast stimulus changes are ignored.
Greeting grasshoppers
Many gomphocerin grasshoppers like  use acoustic signalling to arrange mating behaviour. The detected signals are amplitude modulations of broad-band carrier waves. The important information about the song structure resides in a frequency band below 100~Hz, see Refs.~ and .

If the signals in the environment are time dependent functions x(t)x(t) from some function space, then the neuronal signals representing them could also be functions of time, y(t)y(t).

Again some questions:

  1. But how many degrees of freedom should the function have to represent the signal?

At one's disposal are the pulse trains emitted by a neuron: an event time series.

Def. (Spike train and empirical measure) #def:spiketrain

The event time series of action potentials \{t^{\text{sp}}_k\}_{k=1}^\infty can be defined in terms of the dynamics, with the k^\mathrm{th} spike obeying

ϕ(tksp)=k\phi(t^{\text{sp}}_k)=k.

Further, a spike-train can be written as the empirical measure

  1. y(t)=k=1δ(ttksp)=k=1δ(ϕ(t)k)y(t)=\sum_{k=1}^\infty\delta(t-t^{\text{sp}}_k)=\sum_{k=1}^\infty\delta(\phi(t)-k).

Def. (I/O-equivalent phase oscillator) #def:io_phase

An I/O-equivalent phase oscillator produces with its threshold crossing process the same spike times {tksp}k\{t^{\text{sp}}_k\}_k as a particular conductance-based model (Lazar 2010).

Def. (Instantaneous firing rate) #def:instfire

This idealised quantity does not exist in finite neuronal systems. It is the ensemble average

r(t)=\langle y(t)\rangle_{y|x}=J(\phi,t)\big|_{\phi=1},

and can be interpreted as an infinitely large population of statistically i.i.d. neurons experiencing the same stimulus, or as one neuron being stimulated time and again. The probability flux is defined as

J(ϕ,t)=a(ϕ,t)p(ϕ,t)12σ(ϕ)ϕσ(ϕ)p(ϕ,t)J(\phi,t)=a(\phi,t) p(\phi,t) - \frac12 \sigma(\phi)\frac{\partial}{\partial\phi}\sigma(\phi)p(\phi,t). The “advection”, aa, is due to active transport.

A simplifying observation is that there often (but not always) is a separation of time scales between stimulus-induced perturbations and those fluctuations that originate from internal biophysical noise in receptor neurons, see §(???). The three examples clearly fall into this category, where the stimulation cutoff renders the stimuli orders of magnitude slower than the intrinsic noise. Therefore, the fast (white) noise, \xi(t), in the phase equation can be averaged

ϕ̇=f+Z(ϕ)x(t)+σξ(t)\dot\phi = f + Z(\phi)x(t) + \bar\sigma\,\xi(t) with σ2=01σ2(ϕ)dϕ\bar\sigma^2=\int_0^1\sigma^2(\phi)\,{\mathrm{d}}\phi.
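A minimal Euler-Maruyama sketch of this reduced phase equation (with an illustrative class-1-like PRC, a slow sinusoidal stimulus and made-up parameter values) reads:

```python
# Simulate dphi/dt = f + Z(phi) x(t) + sigma*xi(t); a spike is registered
# whenever phi crosses the next integer, cf. the spike-train definition below.
import numpy as np

rng = np.random.default_rng(0)
f, sigma, dt, T = 5.0, 0.3, 1e-4, 2.0            # rate, noise strength, step, duration
Z = lambda phi: 1.0 - np.cos(2 * np.pi * phi)    # illustrative PRC
x = lambda t: 0.5 * np.sin(2 * np.pi * 3.0 * t)  # slow stimulus

phi, spikes = 0.0, []
for n in range(int(T / dt)):
    t = n * dt
    phi_new = phi + dt * (f + Z(phi) * x(t)) + np.sqrt(dt) * sigma * rng.standard_normal()
    if np.floor(phi_new) > np.floor(phi):        # integer crossing -> spike time
        spikes.append(t)
    phi = phi_new

print(f"{len(spikes)} spikes, mean rate ~ {len(spikes) / T:.1f} per unit time")
```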

Continuity equation

pt=Jϕ\frac{\partial p}{\partial t}=-\frac{\partial J}{\partial\phi}

9.1.2 Bandlimited signal

A time-continuous signal does not necessarily imply infinitely many degrees of freedom. A deterministic signal like a sinusoid is specified by only two degrees of freedom. In contrast to white noise, a bandlimited signal, with no power above f_c, has only a limited number of 2f_c degrees of freedom per second (???). In accordance with Whittaker-Shannon sampling it can be represented as a countable set of random numbers (finite degrees of freedom per unit time). Mutual information estimations can then be based on these instead of the continuous signal without losing information. The physical signals that announce themselves on sensory organs can have their power distributed over many octaves. However, signals emitted by other animals, either voluntarily like the communication songs of grasshoppers or involuntarily as a moth falling prey to a bat, are often constrained to a frequency range that is related to the body size of the sound-producing animal (see the next paragraph for three examples of relevant natural stimuli). On such occasions one can identify an upper frequency, f_c, above which no relevant information is expected. Additionally, all primary transduction processes have a time constant restricting the signal frequencies they can follow. This is sufficient reason to consider the stimulus x(t) to fall into the class of bandlimited signals (also called baseband signals).

More precisely, according to the sampling theorem a bandlimited signal with cutoff frequency f_c requires at least a sampling rate of 2f_c. With the help of the Whittaker-Shannon interpolation formula the signal can be written as (???)

  1. x(t)= \sum_{n=-\infty}^\infty x_n\;\mathrm{sinc}\big(2tf_c-n\big)

For an observation interval [0,T)[0,T) one needs a minimum of NT=2TfcN_T=\lceil2Tf_c\rceil samples to specify the continuous bandlimited process completely.

The coefficients x_n can be continuous Gaussian random variables such that one has a bandlimited Gaussian random process. In order to compare information theoretic bounds on the information transmitted through continuous channels (physical systems) to histogram-based methods for the estimation of mutual information, the Whittaker-Shannon sampling can be augmented by a discretisation of the (Gaussian) random variables x_n. The simplest strategy is to start with a discrete set of binomially distributed random variables, x_n\sim\mathrm{Binomial}(k,p) with p={\textstyle\frac 12}, and use the fact that for large k the statistics will be approximately Gaussian. With this there is a correspondence between a time-continuous process on an interval [0,T) and a discrete binomial vector, so that the histogram frequency p([x_1^1,\ldots,x_1^k,\ldots,x_n^1,\ldots,x_n^k]) can be estimated. For example, the blue time-continuous process in {+(???)} between 50 and 100 ms with f_c=100 Hz is described by the 11 samples marked as green circles. The continuous process is statistically similar to a Gaussian process.

A bandlimited and binomially distributed (k=14, p_{b}={\textstyle\frac 12}) signal as an approximation to a Gaussian process (\mu=kp_{bn}, \sigma^2=kp_{bn}(1-p_{bn})). The cutoff frequency was 100~Hz, implying that the entire stimulus over 250~ms is exactly defined by 50 regularly spaced samples, as according to the Nyquist theorem one needs at least two samples per period to define an oscillation. To check if the binomial amplitude distribution is an adequate approximation to a Gaussian, the Shapiro-Wilk test for normality is used. The samples are accepted as Gaussian with an \alpha-level of 0.4 (very small \alpha-levels would mean rejection of normality).
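A sketch of the construction in the caption (binomial samples at the Nyquist rate, sinc-interpolated to a continuous signal; the centring and the random seed are arbitrary choices) could look like this:

```python
# Draw N_T = ceil(2*T*f_c) binomial samples and Whittaker-Shannon interpolate
# them, x(t) = sum_n x_n sinc(2 t f_c - n); numbers follow the caption above.
import numpy as np

rng = np.random.default_rng(1)
f_c, T, k, p = 100.0, 0.25, 14, 0.5
n_samples = int(np.ceil(2 * T * f_c))                  # 50 samples
x_n = rng.binomial(k, p, n_samples) - k * p            # zero-mean binomial amplitudes

t = np.linspace(0.0, T, 2000, endpoint=False)
sinc_args = 2 * t[None, :] * f_c - np.arange(n_samples)[:, None]
x_t = np.sum(x_n[:, None] * np.sinc(sinc_args), axis=0)

print("sample variance k p (1-p) =", k * p * (1 - p),
      "  signal variance ~", round(float(x_t.var()), 2))
```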

9.2 Decision making, multimodality, and universal codes

9.3 Motor system

9.4 Stimulus representation in single neurons

In previous lectures the biophysical details of the action potential voltage dynamics in membrane patches were discussed, highlighting their key features: (i) the nonlinear feedback provided by voltage-gated channels; and (ii) the inherent stochasticity. These two combined make for a complex system that resists a complete analytical treatment. How can one answer the question ,,Which frequencies in a stimulus is the neuron sensitive to?''.

A particularly useful strategy has been to derive approximate filters from model equations that are easier to analyse (???). These models must omit many biophysical details to remain analytically tractable.

The calculation of response filters for neuronal models is a long standing exercise in theoretical biology (???).

9.4.1 Spike-triggered ensemble

A classic approach to investigate which stimulus characteristics are represented in the activity of sensory neurons is to estimate their receptive field or spike-triggered average . The analysis characterises the components of a complex stimulus that are relevant to elicit a spike response or increased firing rate. Hence, the concept is related to that of a visual receptive field and could ultimately cause behavioural reactions. It is, therefore, related to the antedating concept of sign stimuli in ethology (???; ???). The method has been applied to recordings from the visual (???) and auditory (???) system.

One can also turn the paradigm around by applying a simple stimulus such as a delta pulse and characterising the nature of the response, which is then called the transfer function or, in the visual system, the point spread function. The term transfer function stems from signal processing and stresses the fact that the neuron is viewed as a linear filter. It is surprising how complete a description a linear response filter can give of an inherently nonlinear neuron, despite the fact that neurons contain so many nonlinear elements. In a sense it parallels the success of linear response theory in complex physical problems (???). Of course, there is no one linear filter that describes the whole range of dynamical features of the neuron. But different aspects may be characterised by different filters. In particular, we will linearise around the suprathreshold periodic solution to derive a firing rate filter in . Linearising the voltage response around the resting potential yields quite a different filter that describes the subthreshold response. So essentially it depends on which output variable one analyses.

There are two different filters:

  1. the encoding filter maps input to output;
  2. the decoding filter reconstructs the input from the output.

This is related to the question “what causes a neuron to spike” or how does observing the neuronal spike response affect the conditional statistics of the stimulus p(x|y)p(x|y)? After observing the spike train some external stimuli should have been more likely their cause than others. An immediate idea is to look at the first moments of the conditional ensemble. For example, the mean stimulus before a spike in one trial is

\mathrm{STA}(\tau)=\frac{1}{{N_\text{sp}}}\sum_{k=1}^{N_\text{sp}}x(t_k-\tau)=\frac{1}{{N_\text{sp}}}\int_{-\infty}^\infty{\mathrm{d}}t\;y(t)x(t-\tau),

where we used the platonic spike train from Def. #.

To improve the signal to noise ratio, an experimenter might repeat the same stimulus, so that a trial average might smooth out the effect of intrinsic noise. Then, if the number of spikes, Nsp{N_\text{sp}}, per trial is large and almost constant (modest Fano factor) it may be approximated by

1Nspdty(t)x(tτ)y|x1Nspdtr(t)x(tτ)=1r0Rrx(τ),\left\langle\frac{1}{{N_\text{sp}}}\int{\mathrm{d}}t\;y(t)x(t-\tau)\right\rangle_{y|x} \approx\frac1{\langle{N_\text{sp}}\rangle} \int{\mathrm{d}}t\;r(t) x(t-\tau) = \frac1{r_0}R_{rx}(-\tau),

where r0r_0 is the mean firing rate. This gives rise to the term method, see Ref.~ and Ref.~.
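A sketch of this estimator on synthetic data (the stimulus, the toy encoder and all numbers below are placeholders, not a particular experiment):

```python
# Estimate STA(tau) = (1/N_sp) sum_k x(t_k - tau) from a binned white-noise
# stimulus and Bernoulli spikes generated by an arbitrary positive nonlinearity.
import numpy as np

rng = np.random.default_rng(2)
dt, T = 1e-3, 200.0
t = np.arange(0.0, T, dt)
x = rng.standard_normal(t.size)                        # white-noise stimulus

kernel = np.exp(-np.arange(0.0, 0.05, dt) / 0.01)      # toy causal filter (50 ms)
drive = np.convolve(x, kernel, mode="full")[:t.size]
rate = 5.0 * np.exp(drive - 2.0)                       # toy encoder, in 1/s
spikes = rng.random(t.size) < rate * dt                # Bernoulli spiking

lags = np.arange(0, int(0.1 / dt))                     # tau from 0 to 100 ms
idx = np.flatnonzero(spikes)
idx = idx[idx >= lags.max()]                           # ignore spikes near the start
sta = np.array([x[idx - lag].mean() for lag in lags])
print("N_sp =", idx.size, "  STA peak at tau =", lags[np.argmax(sta)] * dt * 1e3, "ms")
```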

Given that the rate causally depends on previous stimulus it can be expanded as a Volterra series in orders of the stimulus magnitude

r(t) = r_0 + \int_0^\infty h(s)\,x(t-s)\,{\mathrm{d}}s + ...

This implies that also p(ϕ,t)=p0(ϕ)+p1(ϕ,t)+...p(\phi,t) = p_0(\phi) + p_1(\phi,t) + ... and J(ϕ,t)=J0+J1(ϕ,t)+...J(\phi,t) = J_0 + J_1(\phi,t) + ..., as well as the Fokker-Planck operator

\dot p = (L_0 + L_1)p, where L_0(\phi) = \frac{\sigma^2}2\frac{\partial^2}{\partial\phi^2}-f\frac\partial{\partial\phi}, and L_1(\phi,t)=-x(t)\frac\partial{\partial\phi}Z(\phi).

The right eigensystem of the Fokker-Planck operator, L_0q_k(\phi)=\mu_kq_k(\phi), has the Fourier basis as solution

qk(ϕ)=ei2πkϕq_k(\phi)=e^{\mathrm{i}2\pi k\phi} with eigenvalues μk=12(2πkσ)2ik2πf\mu_k=-\frac12(2\pi k\sigma)^2-\mathrm{i}k2\pi f.

\forall k\neq0 the eigenvalues have negative real part and the contribution of the corresponding eigenfunction vanishes for t\to\infty; only the eigenmode k=0, i.e., q_0=1, survives.

Hence, the zeroth order equation (x=0x=0), ṗ0=L0p0\dot p_0=L_0p_0 has a stationary solution p0=q0=1p_0=q_0=1. Consequently the zeroth order flux is constant, too. Hence, J(1,t)=fJ(1,t)=f.

The first order equation is

ṗ1=L0p1+L1p0=σ222ϕ2p1fϕp1Z(ϕ)x(t)\dot p_1=L_0p_1+L_1p_0=\frac{\sigma^2}2\frac{\partial^2}{\partial\phi^2}p_1-f\frac\partial{\partial\phi}p_1-Z'(\phi)x(t).

The simplest time-dependent solution can be obtained by Laplace transforming and Fourier-series expanding this equation, p(\phi,t)\to P(\phi,s).

σ222ϕ2PfϕPsP=Z(ϕ)X(s)\frac{\sigma^2}2\frac{\partial^2}{\partial\phi^2}P-f\frac\partial{\partial\phi}P -sP=Z'(\phi)X(s)

With Z(ϕ)=k=ckei2πkϕZ(\phi) = \sum_{k=-\infty}^\infty c_ke^{\mathrm{i}2\pi k\phi} and in the orthogonal Fourier basis

P_1(k,s)=-\frac{\mathrm{i}2\pi kc_kX(s)}{s+\mathrm{i}2\pi kf+\frac12(2\pi k\sigma)^2}

or

P(ϕ,s)=k=i2πkckX(s)ei2πkϕsμkP(\phi,s)=-\sum_{k=-\infty}^\infty\frac{\mathrm{i}2\pi kc_kX(s)e^{\mathrm{i}2\pi k\phi}}{s-\mu_k}

From the definition of the flux J(ϕ,t)=(f+Z(ϕ)x(t))pσ22ϕpJ(\phi,t)=(f+Z(\phi)x(t))p-\frac{\sigma^2}2\frac\partial{\partial\phi}p, the first order equation for the flux is

J1(k,s)=(fσ22ϕ)P1(k,s)+ckX(s)J_1(k,s)=\Big(f-\frac{\sigma^2}2\frac\partial{\partial\phi}\Big)P_1(k,s)+c_kX(s)

J1(k,s)=(μksμk+1)ckX(s)=(ssμk)ckX(s)J_1(k,s)=(\frac{\mu_k}{s-\mu_k}+1)c_kX(s)=(\frac{s}{s-\mu_k})c_kX(s)

R1=J(ϕ=0,s)=H(s)X(s)R_1=J(\phi=0,s)=H(s)X(s)

with H(s)=k=ckssμkH(s)=\sum_{k=-\infty}^\infty\frac{c_ks}{s-\mu_k}

for t>0t>0 h(t)=kck(μkeμkt+δ(t))h(t)=\sum_kc_k(\mu_k e^{\mu_kt}+\delta(t))

9.5 Properties of the firing rate’s linear response filter

The low frequency limit of the transfer function HH is

limf0H(i2πf)=c0=01Z(ϕ)dϕ=Z(ϕ)\lim_{f\to0}H(\mathrm{i}2\pi f)=c_0=\int_0^1Z(\phi){\mathrm{d}}\phi=\langle Z(\phi)\rangle.

Consequently, PRCs with a nonzero mean can transfer arbitrarily low frequencies.

f0I=1P02P0I\frac{\partial f_0}{\partial I}=-\frac1{P_0^2}\frac{\partial P_0}{\partial I} =1P0Z(ϕ)=\frac{1}{P_0}\langle Z(\phi)\rangle.

The latter identity was already established in Eq. (17).

Proposition (DC limit of transfer function) #propo:dc-transfer

In the mean driven regime, the susceptibility of the instantaneous firing rate to very low frequencies is related to the mean component of the PRC and the derivative of the f0f_0-II curve

limf0H(i2πf)=Z(ϕ)=1f0f0I=lnf0I\lim_{f\to0}H(\mathrm{i}2\pi f)=\langle Z(\phi)\rangle=\frac1{f_0}\frac{\partial f_0}{\partial I}=\frac{\partial\ln f_0}{\partial I}.

Remark #rem:dc-transfer

As all known neuron models have \partial f_0/\partial I>0 above their rheobase, PRCs are guaranteed to have an – at least small – mean component and the transfer function is susceptible to low frequencies. There are, however, quantitative differences between the different onset bifurcations.

Using ck=ak+ibkc_k=a_k+\mathrm{i}b_k one obtains Z(ϕ)=a0+kakcos(2πkϕ)+bksin(2πkϕ)Z(\phi)=a_0 + \sum_k^\infty a_k\cos(2\pi k\phi) + b_k\sin(2\pi k\phi). With this

limfH(i2πf)=kak=Z(0)\lim_{f\to\infty}H(\mathrm{i}2\pi f)=\sum_k a_k = Z(0)
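Both limits can be checked by evaluating the filter directly; the PRC Fourier coefficients, firing rate and noise strength below are illustrative values only:

```python
# Evaluate H(i 2 pi f) = sum_k c_k s/(s - mu_k), mu_k = -(2 pi k sigma)^2/2 - i 2 pi k f0,
# for the illustrative PRC Z(phi) = 1 - cos(2 pi phi), i.e. c_0 = 1, c_{+-1} = -1/2.
import numpy as np

f0, sigma = 10.0, 0.2
c = {0: 1.0, 1: -0.5, -1: -0.5}

def H(f):
    s = 1j * 2 * np.pi * np.asarray(f, dtype=float)
    out = np.full(s.shape, c[0], dtype=complex)        # k = 0 contributes c_0
    for k in (-1, 1):
        mu = -0.5 * (2 * np.pi * k * sigma) ** 2 - 1j * 2 * np.pi * k * f0
        out += c[k] * s / (s - mu)
    return out

for f in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"f = {f:7.1f}   |H| = {abs(H(f).item()):.3f}")
```

The gain approaches \langle Z\rangle=c_0 at low frequencies and Z(0) (here zero) at high frequencies, with a resonance-like feature around the firing rate f_0 for these weak-noise values.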

Left: Poles-zeros plot of the transfer function. Right: transfer spectrum.

10 Information theory for the living

Back to function. But where to start?

During the discussion of action potential stability, “pulse-based communication” using spike events was mentioned as a motivation in §(???). In very general terms, nervous systems communicate information from the senses to some place where decisions are made and further to places where motor commands are produced and executed; a reflex would be the simplest form (Braitenbergian view [@]). In the literature one finds a theory of communication called information theory (Shannon 1948).

10.1 Disclaimer

The author of this script is uncertain about the utility of information theory in biology at large (this is always a good reason to give a lecture on it); it looks exciting but has some shortcomings: the theory was originally never intended to describe living systems in which information bears meaning and consequences, i.e., in which the content of a message actually matters for your survival. Information theory deals with optimal compression and lossless transmission of signals irrespective of whether it is relevant or a youtube video. Nonetheless a look at a quantitative science of communication may be insightful.

Information theory has long been in the discussion as an ecological theory upon which to judge the performance of sensory processing (Atick 1992). This led Joseph Atick, Horace Barlow and others to propose using this theory to study how nervous systems adapt to the environment. The goal is to make quantitative predictions about what the connectivity in the nervous system and the structure of receptive fields should look like, for instance. In this sense, information theory was hoped to become the basis for an ecological theory of adaptation to environmental statistics (Atick 1992; Barlow 1961).

It may not seem obvious how the mechanistic theories of dynamical systems and the functional theory of information should be brought together, but that is of course the goal. One would like to decide which dynamical features have what kind of function; this will be addressed after the introduction to information theory (G. B. Ermentrout, Galán, and Urban 2007; Schleimer and Stemmler 2009).

10.2 The communication process

The science of communication is concerned with at least two subtopics: (i)(i) the efficient representation of data (compression); and (ii)(ii) the safe transmission of data through unreliable channels (coding).

A source (the message) typically runs through the following processing sequence:

\to Compress \to Encode
\quad\downarrow
Channel
\quad\downarrow
\leftarrow Decompress \leftarrow Decode

One of the formal assertions of information theory is that these two problems can be addressed separately (without loss of generality or efficiency): Meaning first one compresses by removing redundancies. Then one again adds failsafe redundancy to combat the unreliability.

However convenient for engineering, this does not mean that biological systems have to make use of this fact (source coding and channel coding could be very intertwined).

Also note that many of the mathematical results in information theory are bounds, inequalities not achievable in real systems.

(An ergodic system has an invariant phase volume, which is a necessary condition for an organism to exist, in the sense that it would otherwise transgress phase boundaries and cease to exist.)

10.3 Channel coding

One may start by wrongly assuming that the function of nerves is to communicate information, irrespective of its semantic meaning. In reality, it is more likely that the function is to filter information, i.e., to reliably communicate the important parts and filter out the irrelevant information (or noise). But to introduce the classical quantities, dispense with the semantics.

If the function of nervous systems (in part) is to communicate information, then the repertoire and speed of biological computations are limited by thermodynamic or metabolic constraints (???-). The latter will be deferred to later chapters, but as an outlook: there is no free lunch. It follows from basic thermodynamics that to obtain information work needs to be performed.

Here we consider a ,,memoryless’’ channel:

WmessageencodeXxp(y|x)noisy channelyYdecodeŴest. Message\overset{\text{message}}{W}\overset{\text{encode}}{\to}X\ni x\to\overset{\text{noisy channel}}{p(y|x)}\to y\in Y\overset{\text{decode}}{\to}\overset{\text{est. Message}}{\hat W}

The rate R with which information can be transmitted over a channel without loss is measured in \frac{\text{Bits}}{\text{transmission}} for a discrete time channel or \frac{\text{Bits}}{\text{second}} for a continuous time channel. Operationally, we wish all bits that are transmitted to be recovered with negligible probability of error.

A measure of information could be:

The average reduction in the number of binary-questions needed to identify xXx\in X before and after observing yYy\in Y.

This would just be the difference in entropies:

${\mathcal I}(X;Y)=H(X)-H(X|Y)= -\sum\limits_{x\in X} p(x)\log_2 p(x) + \sum\limits_{x\in X\atop y\in Y}p(x,y)\log_2p(x|y)$.

This goes by the name of mutual information or transinformation. Remember marginalisation

p(x)=yYp(x,y)p(x)=\sum_{y\in Y}p(x,y).

So the mutual information is

${\mathcal I}=-\sum_{x\in X\atop y\in Y} p(x,y)\log_2 p(x) + \sum_{x\in X\atop y\in Y}p(x,y)\log_2\left(\frac{p(x|y)p(y)}{p(y)}\right)$

or

${\mathcal I}(X;Y)=\sum_{x\in X\atop y\in Y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$

From the complete symmetry of this quantity we can also write it as

(X;Y)=H(Y)H(Y|X){\mathcal I}(X;Y)=H(Y)-H(Y|X).
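For a discrete toy channel the mutual information can be computed directly from the joint distribution; the biased input and the binary symmetric channel below are arbitrary examples:

```python
# I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ] for a made-up channel.
import numpy as np

p_x = np.array([0.7, 0.3])                     # input distribution
eps = 0.1                                      # flip probability of the channel
p_y_given_x = np.array([[1 - eps, eps],
                        [eps, 1 - eps]])
p_xy = p_x[:, None] * p_y_given_x              # joint p(x, y)
p_y = p_xy.sum(axis=0)

mask = p_xy > 0
I = np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask]))
print(f"I(X;Y) = {I:.4f} bits per channel use")
```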

The following Figure illustrates how the mutual information is related to respective (conditional) entropies of the input and output ensemble.


We have heard that (X;Y){\mathcal I}(X;Y) quantifies the statistical dependence of XX and YY, but how is that related to error-free communication? What makes the error is the channel. (X;Y){\mathcal I}(X;Y), however, also depends on the input ensemble p(x)p(x). To focus on the properties of the channel one can simply take an ,,optimal’’ input ensemble and define the channel capacity

C=maxp(x)(X;Y).C=\max_{p(x)}{\mathcal I}(X;Y).

It will be left to the sender to actually find the optimal input statistics. Note that (X;Y){\mathcal I}(X;Y) is a concave function (\cap) of p(x)p(x) over a convex set of probabilities {p(xi)}\{p(x_i)\} (this is relevant for procedures like the Arimoto-Blahut algorithm for estimating CC) and hence a local maximum is a global maximum.

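A minimal Blahut-Arimoto sketch (for a made-up binary symmetric channel; the iteration count and the small regularisation constant are arbitrary choices) could read:

```python
# Estimate C = max_{p(x)} I(X;Y) for a channel matrix Q[x, y] = p(y|x) by the
# Blahut-Arimoto iteration p(x) <- p(x) exp(D(Q[x,:] || q_y)) / normalisation.
import numpy as np

def blahut_arimoto(Q, iters=500):
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])            # start from a uniform input
    for _ in range(iters):
        q_y = p @ Q                                       # output distribution
        d = np.sum(Q * np.log(Q / q_y[None, :] + 1e-300), axis=1)
        p *= np.exp(d)
        p /= p.sum()
    q_y = p @ Q
    C = np.sum(p[:, None] * Q * np.log2(Q / q_y[None, :] + 1e-300))
    return C, p

eps = 0.1
Q = np.array([[1 - eps, eps], [eps, 1 - eps]])            # binary symmetric channel
C, p_opt = blahut_arimoto(Q)
print(f"capacity ~ {C:.4f} bits/use; analytic 1 - H2(eps) = "
      f"{1 + eps * np.log2(eps) + (1 - eps) * np.log2(1 - eps):.4f}")
```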

Shannon’s channel coding theorem established that this capacity indeed measures the maximum amount of error-free information that can be transmitted.

10.4 Maximum entropy, discrete finite alphabet

A trivial upper bound on the channel capacity is

Cmin{log|X|,log|Y|}C\leqslant\min\{\log|X|,\log|Y|\}.

This is due to the maximum entropy property of the uniform distribution in the discrete case:

Example (Maximum entropy, discrete case):
For the derivative of the entropy from Eq. ((???)) one gets: \frac\partial{\partial p_i}H=-\log_2 p_i-1, which leads to p_i\propto e^{-1}\quad\forall i. After normalisation one has p_i=1/N, so the uniform distribution maximises the entropy in the discrete case.

10.5 Information transmission (continuous case)

Some of the signals in nature (and those applied in the laboratory) have continuous alphabets, e.g., light intensity values. Does it make sense to extend the definition of the entropy as

  1. H(x)=dxp(x)lnp(x)H(x) =-\int {\mathrm{d}}x\,p(x)\ln p(x)?

Maybe. Let us see how far one gets with this definition. It is called differential entropy by the way. Through quantisation this can be related back to the entropy of discrete alphabets.

If the p(x)p(x) is smooth then one associates the probability of being in iΔx(i+1)Δi\Delta\leqslant x\leqslant(i+1)\Delta with

pi=p(xi)Δ=iΔ(i+1)Δdxp(x)p_i=p(x_i)\Delta = \int_{i\Delta}^{(i+1)\Delta}{\mathrm{d}}x\,p(x)

The entropy of the quantised version is

HΔ=i=pilnpi=i=Δp(xi)ln(p(xi)Δ)H_\Delta=-\sum\limits_{i=-\infty}^\infty p_i\ln p_i=-\sum\limits_{i=-\infty}^\infty \Delta p(x_i)\ln(p(x_i)\Delta)

=i=Δp(xi)lnp(xi)lnΔ=-\sum\limits_{i=-\infty}^\infty \Delta p(x_i)\ln p(x_i)-\ln\Delta

This is problematic as the second term goes to infinity for small quantisations. Formally, if p(x)p(x) is Rieman integrable, then

\lim_{\Delta\to0}\big(H_\Delta+\ln\Delta\big)=-\int {\mathrm{d}}x\,p(x)\ln p(x)

Since the infinitesimal limit is taken, we can relate \Delta to the number of bits n used in the quantisation, so that in the limit

-\log_2\Delta\approx n.

So an n-bit quantisation of a continuous random variable x has entropy

H(x)+nH(x)+n.

With the mutual information being the difference of entropies the quantisation term vanishes.
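This cancellation can be illustrated numerically: the quantised entropy plus \ln\Delta approaches the differential entropy, here for a standard normal density (grid range and bin widths are arbitrary choices):

```python
# H_Delta + ln(Delta) -> -int p ln p = 0.5*ln(2*pi*e) for a unit-variance Gaussian.
import numpy as np

sigma = 1.0
h_true = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
for Delta in (0.5, 0.1, 0.02):
    x = np.arange(-8 * sigma, 8 * sigma, Delta)
    p_i = Delta * np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    p_i = p_i[p_i > 0]
    H_Delta = -np.sum(p_i * np.log(p_i))
    print(f"Delta={Delta:5.2f}   H_Delta + ln(Delta) = {H_Delta + np.log(Delta):.4f}"
          f"   (differential entropy {h_true:.4f})")
```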

10.6 Optimal coding

A first example can be taken from rate coding, using the instantaneous firing rate defined in terms of our spike trains from Eq. (52).

Example (Optimal coding in flies (Laughlin 1981)):
In short, Simon Laughlin measured the distribution of light intensities that would impinge on the fly's compound eye in a natural environment: it was more or less normally distributed. He asked: what is the optimal contrast-response function? He postulated that an efficient allocation of resources (of the possible responses) would be to spread more likely inputs over a broader range of firing rates. Why? It makes them easier to differentiate by upstream neurons. Via experiments he found that the fly's compound eye approximates the cumulative probability distribution of contrast levels in natural scenes.


Def. (Gaussian nonlinear channel) #def:infomax

The channel is defined by the following mapping:

y=f(x)+zy=f(x) + z

where z𝒩(0,σ2)z\sim \mathcal N(0,\sigma^2) is zero mean normally distributed .

Which encoding function (e.g., the contrast-response function from above) maximises the mutual information across the channel, given that we know the input statistics?

Assumptions:

  1. f is a continuous, monotonic function with a restricted response range, f(x_{\max})-f(x_{\min})=a.
  2. For convenience the noise is very small, i.e., \sigma\to0 so that p_z(z)\to\delta(z).

The mutual information is

I=\int p(y|x)p(x)\ln\frac{p(y|x)}{p(y)}\,{\mathrm{d}}x\,{\mathrm{d}}y

=H(y)-H(y|x)

H[y|x]=H(z)=\frac12\ln(2\pi e\sigma^2)=\frac12(1+\ln(2\pi\sigma^2))

The noise entropy depends neither on x nor on f(x) in this particular channel.

To calculate the entropy of the output random variable y, note that it is the sum of two random variables, y=f+z. This means

p_y(y) = \int p_z(y-w)p_f(w)\,dw = p_f(y).

The last identity used Assumption 2). From the transformation rules of random variables one has p(f)\frac{df}{dx}=p(x), or p(f)=\frac{p(x)}{f'(x)}; this requires the monotonicity from Assumption 1).

Hence,

H(y)=-\int dy\,p(y)\ln p(y)=-\int dy\, \frac{p(x)}{f'(x)}\ln\frac{p(x)}{f'(x)} =-\int dx\,p(x)\ln p(x) + \int dx\,p(x)\ln f'(x)

From assumption 1) we have the range constraint

J[f'(x)]=\int_{x_{\min}}^{x_{\max}} f'(x)\,dx = f(x_{\max})-f(x_{\min}),

which is enforced with a Lagrange multiplier \lambda:

C[f'(x)]=H[f'(x)] - \lambda (J[f'(x)]-a)

\frac{\delta C[f'(x)]}{\delta f'(x)}=\frac{p(x)}{f'(x)}-\lambda = 0

The mutual information given p(x) is therefore maximised by

f'(x)=p(x)/\lambda. The constraint J[f']=a fixes \lambda=1/a, so f(x)=a\,P(x), where P(x) is the cumulative distribution function of the input (\Phi(x) for a Gaussian input).

This means that under the assumptions above, the contrast-response function should be proportional to the cumulative distribution function of the input.
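A small sketch of this result under the assumptions above, taking a Gaussian contrast distribution as in Laughlin's measurement; the response range a and all names are illustrative:

```python
# Sketch of the infomax result: with vanishing noise the optimal contrast-response
# function is the scaled cumulative distribution of the input (histogram
# equalisation). Assumes Gaussian-distributed contrast values.
import numpy as np
from scipy.stats import norm

a = 1.0                                    # response range f(x_max) - f(x_min)
f = lambda x: a * norm.cdf(x)              # optimal encoder f(x) = a * P(x)

# The output y = f(x) is then uniform on [0, a]: every response level is used
# equally often, which maximises the output entropy for the given range.
rng = np.random.default_rng(1)
contrast = rng.standard_normal(100_000)    # contrast samples in units of their std
y = f(contrast)
print(np.histogram(y, bins=10, range=(0, a))[0])   # roughly equal counts per bin
```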

10.7 Maximum entropy, continuous alphabet

Let us pause and ask: What is the maximum entropy distribution for a continuous alphabet?

Example (Maxent, continuous with variance constraint):
Take a random variable x\in{I\!\!R}. The maximiser cannot be the uniform distribution as in the discrete case; in fact additional constraints are needed. For example, one may ask for the maximum entropy distribution, p(x), given a fixed mean, \mu, and variance, \sigma^2. Using Lagrange multipliers to formulate the constrained optimisation problem,

C=H+\lambda_0\int{\mathrm{d}}x\,p(x)+\lambda_1\left(\int{\mathrm{d}}x\,p(x)x-\mu\right)+\lambda_2\left(\int{\mathrm{d}}x\,p(x)(x-\mu)^2-\sigma^2\right)

one searches for p(x) that fulfils \delta C/\delta p(x)=0 and \partial C/\partial\lambda_i=0, where \delta/\delta p(x) is the variational derivative. One finds

p(x)=\exp(\lambda_0-1+\lambda_1x+\lambda_2(x-\mu)^2).

With the normalisation from \partial C/\partial\lambda_i=0 one obtains the normal distribution (e^{\lambda_0-1}=1/\sqrt{2\pi\sigma^2}, \lambda_1=0 and \lambda_2=-1/(2\sigma^2)).

10.8 Bounds on the estimation error

What did we learn? Well

  1. Among all distributions with a given variance, H(x)\leqslant H_\text{Gauss}(x).

If we consider the whole spike train y(t)=\sum_k\delta(t-t^{\mathrm{sp}}_k) (see Eq. (52)) as the output, not just the mean input intensity and the mean firing rate, we have a continuous, time-dependent stochastic process to deal with. Note that the spike train, if integrated, is related to the empirical distribution function.

If we average over infinitely many trials we get the instantaneous firing rate r(t)=\langle y(t)\rangle_{y|x}. We will give a mathematical definition of r(t) later. Our communication channel looks like this:

input signal x(t)\in{I\!\!R} \to pathway \to neural response y(t)\in{I\!\!R}

The entropy rate of an ergodic process can be defined as the entropy of the process at a given time conditioned on its past realisations, in the limit of large time

  1. H[x]=\lim_{t\to\infty}H[x_t|x_\tau:\tau<t].

The amount of information a neural pathway transmits about an input signal x(t) is measured by the mutual information rate,

  1. I[x,y]=H[y]-\underbrace{H[y|x]}_\text{encoding}= H[x]-\underbrace{H[x|y]}_\text{decoding},

between the stochastic process, x(t), and the stochastic response process, y(t). The entropy rate H measures the number of discriminable input or output states, either by themselves, or conditioned on other variables.

The mutual information rate, which is the difference between unconditional and conditional entropy rates, characterises the number of input states that can be distinguished upon observing the output. The response entropy rate H[y], for instance, quantifies the number of typical responses per unit time, while H[x|y] is a measure of the decoding noise in the model. If this noise is zero, then the mutual information rate is simply H[x], provided that this is finite.

The conditional entropy rates H[y|x] and H[x|y], characterising the noise in the encoding and the decoding model respectively, are non-negative. In information theory, these quantities are also called equivocation. Hence, both the stimulus and response entropy rates, H[x] and H[y], are upper bounds for the transmitted information.

Example (Info rate continuous discrete-time Gaussian process):
The OU-process from Eq. (27) is an example of a Gaussian process. Take a discretised version, \vec x=[x(t_1),x(t_2),...], of it such that

p(\vec x)=|2\pi\boldsymbol{K}|^{-1/2}\exp\left(-\frac12(\vec x-\vec\mu)^\dagger\boldsymbol{K}^{-1}(\vec x-\vec\mu)\right)

H_n=\frac12\ln|2\pi\boldsymbol{K}| + \frac n2

This is similar to the quantisation problem. It might be reasonable to drop the n/2 term (sometimes this is done, sometimes not). For the one-dimensional case we have

H=\frac12(1+\ln2\pi\sigma^2)

or, if we drop the n/2 term (here n=1),

  1. \sigma^2=\frac1{2\pi}e^{2H_\text{Gauss}(x)}

Any practical consequences?

Def. (Estimation error):
For a random variable x the estimation error of an estimator \hat x is

\langle(x-\hat x)^2\rangle

The best (constant) estimator is the mean, so the statisticians say. Therefore a lower bound to the estimation error is given by

  1. \langle(x-\hat x)^2\rangle\geqslant\langle(x-\langle x\rangle)^2\rangle=\sigma^2 =\frac1{2\pi}e^{2H_\text{Gauss}(x)}\geqslant\frac1{2\pi}e^{2H(x)}.

The last inequality followed from Eq. (55).

Example (Info rate continuous continuous-time Gaussian process):

Up to an additive constant the entropy of a multivariate Gaussian was67

$H=\frac12\ln|\boldsymbol{K}|=\frac12\tr\ln\boldsymbol{K}=\frac12\tr\ln\boldsymbol{\Lambda}=\frac12\sum_k\ln\lambda_k$.

First let us observe the process forever, \vec x=[x(-\infty),...,x(\infty)], a bi-infinite series with countably many elements. The elements of the covariance matrix are \boldsymbol{K}_{ij}=c((i-j)\,dt). The orthogonal eigenfunctions of the continuous covariance operator on t\in{I\!\!R} are the Fourier basis functions. It can be shown that in the continuum limit

H=\int df\,\ln\lambda(f)

The result is due to Kolmogorov; see also (???, Golshani and Pasha 2010).
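A numerical sanity check of this relation can be sketched as follows; it assumes a discretised OU process, and all parameters are made up:

```python
# For a stationary Gaussian process the eigenvalues of the covariance matrix
# approach the values of the power spectrum, so (1/n) sum_k ln(lambda_k) tends
# to the frequency average of ln(lambda) (Szegő / Kolmogorov).
import numpy as np

tau, s2, dt, n = 1.0, 1.0, 0.1, 1500
idx = np.arange(n)
K = s2 * np.exp(-np.abs(idx[:, None] - idx[None, :]) * dt / tau)  # Toeplitz covariance
lam = np.linalg.eigvalsh(K)
lhs = np.mean(np.log(lam))                     # (1/n) sum_k ln(lambda_k)

# Discrete-time power spectrum of the sampled OU process (an AR(1) process):
r = np.exp(-dt / tau)
theta = np.linspace(-np.pi, np.pi, 100_000, endpoint=False)
S = s2 * (1 - r**2) / (1 - 2 * r * np.cos(theta) + r**2)
rhs = np.mean(np.log(S))                       # frequency average of ln S

print(lhs, rhs)                                # the two approach each other as n grows
```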

10.9 Linear stimulus reconstruction and a lower bound on the information rate (decoding view)

Without a complete probabilistic description of the model the mutual information cannot be calculated. And even with a model the involved integrals may not be tractable. At least two strategies to estimate it exist, though: either create a statistical ensemble of inputs and outputs by stimulation, followed by (histogram based) estimation techniques for the mutual information; or find bounds on the information that can be evaluated more easily. In general, the estimation of mutual information from empirical data is difficult, as the sample size should be much larger than the size of the alphabet. Indeed, each element of the alphabet should be sampled multiple times so that the underlying statistical distribution can, in principle, be accurately estimated. But this prerequisite is often violated, so some techniques of estimating the information from data directly rely on extrapolation (???). The problem becomes particularly hairy when the alphabet is continuous or a temporal process has to be discretised, resulting in large alphabets.

Another approach, which will allow us to perform a theoretical analysis of phase dynamics, relies on a comparison of the neuronal ``channel’’ to the continuous Gaussian channel (???, Cpt. 13), which is analytically solvable (Cover and Thomas 2012). The approach can be used to estimate the information transmission of neuronal models (???). Experimental systems have also been analysed in this way, e.g.:

  1. the spider’s stretch receptor (???);
  2. the electric sense of weakly electric fish (???) and paddle fish (???);
  3. the posterior canal afferents in the turtle (???).

It has been proven that this method leads to a guaranteed lower bound on the actual information transmitted (???).

If one has experimental control of the stimulus ensemble, it can be chosen to be a Gaussian process with a flat spectrum up to a cutoff, so as not to introduce biases for certain frequency bands. The mutual information between stimulus x(t) and response y(t) can be bounded from below as

  1. I[x,y] = H[x] - H[x|y] \geqslant H[x] - H_\text{gauss}[x|y],

Here, H_\text{gauss}[x|y] is the equivocation of a process with the same mean and covariance structure as the original decoding noise, but with Gaussian statistics. The conditional entropy of the stimulus given the response is also called the reconstruction noise entropy. It reflects the uncertainty remaining about the stimulus when particular responses have been observed.

It turns out that the inequality in Eq. (59) also holds if the estimator is conditioned. Say from the output of the neuron we estimate its input

\hat x(t) = \hat x_t[y].

So if the process has a stationary variance

\langle(x(t)-\hat x(t))^2\rangle_{x|y}\geqslant\inf\limits_{\hat x}\langle(x(t)-\hat x(t))^2\rangle_{x|y}=\langle(x(t)-\langle x(t)\rangle_{x|y})^2\rangle_{x|y}=e^{2H_\text{gauss}[x|y]}.

The first equality uses the fact that in this case the optimal estimator is given by the conditional mean. We have the following bound on the equivocation

  1. H[x|y] \leqslant H_\text{gauss}[x|y] \leqslant {\textstyle\frac 12}\ln\langle(x(t)-\hat x(t))^2\rangle={\textstyle\frac 12}\ln\langle n^2(t)\rangle,

The deviation between the stimulus and its estimate, n(t)=x(t)-\hat x(t), is treated as the noise process.

In order to obtain a tight bound the estimator \hat x(t) should be as close to optimal as possible. For the case of additional information given by the response of the neural system y(t) to the process x(t), the estimator should make use of it, \hat x_t[y]. For simplicity one can assume it is carried out by a filtering operation, \hat x(t)= (f*y)(t), specified later (Gabbiani and Koch 1998). Like the whole system, the noise process is stationary; in terms of its power spectral density, P_{nn}(\omega), the equivocation is bounded by the entropy rate of a Gaussian process with that spectrum,

H_\text{gauss}[x|y]\leqslant {\textstyle\frac 12}\int_{-\infty}^\infty\frac{{\mathrm{d}}\omega}{2\pi}\ln P_{nn}(\omega).

Together

  1. I[x,y]\geqslant \frac12\int_{-\infty}^\infty\frac{{\mathrm{d}}\omega}{2\pi}\;\ln\left(\frac{P_{xx}(\omega)}{P_{nn}(\omega)}\right)

So as to render the inequality in Eq. (61) as tight a bound as possible one should use the optimal reconstruction filter from y to \hat x. In other words, it is necessary to extract as much information about x from the spike train y as possible.

The next step should be to find an expression for the noise spectrum, P_{nn}(\omega), based on the idea of ideal reconstruction of the stimulus. As opposed to the forward filter, the reconstruction filter depends on the stimulus statistics (even without effects such as adaptation). We seek the filter h that minimises the mean square error

  1. \langle n^2(t) \rangle= \langle(x(t) - \hat x(t))^2\rangle, with \hat x(t)=\int{\mathrm{d}}\tau\,h(\tau)\, y(t-\tau).

Taking the variational derivative of the error w.r.t. the filter (coefficients) h(\tau) and equating it to zero, one obtains the orthogonality condition for the optimal Wiener filter (???)

  1. \langle n(t) y(t-\tau)\rangle= 0\quad\forall\tau.

Inserting the definition of the error, n(t)=x(t)-\hat x(t), into Eq. (64) yields

\langle x(t)y(t-\tau)\rangle-\int{\mathrm{d}}\tau_1\,h(\tau_1) \langle y(t-\tau_1)y(t-\tau)\rangle= R_{xy}(\tau) - (h\ast R_{yy})(\tau) = 0

In order to obtain hh we need to deconvolve the equation, which amounts to a division in the Fourier domain

  1. P_{xy}(\omega)= H(\omega) P_{yy}(\omega)\Longrightarrow H^\text{opt}(\omega) = \frac{P_{xy}(\omega)}{P_{yy}(\omega)}.
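A sketch of how the optimal reconstruction filter could be estimated from data with standard spectral estimators (scipy's welch and csd); the "neuron" here is a made-up low-pass encoder, not one of the models treated in this lecture:

```python
# Estimating H_opt = P_xy / P_yy (Eq. (65)) from a simulated stimulus/response
# pair. The toy response y is a low-pass filtered copy of the stimulus x plus
# independent noise; all parameters are illustrative.
import numpy as np
from scipy import signal

fs = 1000.0                                  # sampling rate (Hz)
rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)             # white Gaussian stimulus
b, a = signal.butter(2, 50.0, fs=fs)         # toy encoding filter (50 Hz low-pass)
y = signal.lfilter(b, a, x) + 0.5 * rng.standard_normal(x.size)

f, Pyy = signal.welch(y, fs=fs, nperseg=4096)
_, Pxy = signal.csd(x, y, fs=fs, nperseg=4096)   # cross-spectral density estimate
H_opt = Pxy / Pyy                            # optimal (acausal) reconstruction filter
print(np.abs(H_opt[:5]))
```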

To compute the mutual information rate, we now calculate the full auto-correlation of the noise when the filter is given by Eq. (65). For an arbitrary filter h(t)h(t), we have

R_{nn}(\tau)=\langle n(t)n(t+\tau)\rangle=\langle n(t)x(t+\tau)\rangle-\int{\mathrm{d}}\tau_1\,h(\tau_1) \langle n(t)y(t+\tau-\tau_1)\rangle.

When the orthogonality condition of Eq. (64) holds, the right-most term vanishes. Proceeding by expanding the first term algebraically leads to an expression for the noise correlations

R_{nn}(\tau)=\langle n(t)x(t+\tau)\rangle=R_{xx}(\tau)-\int{\mathrm{d}}\tau_1\,h(\tau_1)R_{xy}(\tau-\tau_1).

This expression can be Fourier transformed in order to obtain the required noise spectrum

P_{nn}(\omega)=P_{xx}(\omega)-H(\omega)P_{xy}(\omega)=P_{xx}(\omega)-\frac{|P_{xy}(\omega)|^2}{P_{yy}(\omega)},

where the definition of the optimal filter, Eq. (65), was utilised. This result can then be inserted into Eq. (62) to obtain the following well known bound on the information rate (???,lindner2005j:mi,holden1976b,stein1972j:coherenceInfo)

  1. I[x,y]\geqslant-\frac12\int_{-\omega_c}^{\omega_c}\frac{{\mathrm{d}}\omega}{2\pi}\ln\left(1-\frac{|P_{xy}(\omega)|^2}{P_{xx}(\omega)P_{yy}(\omega)}\right).

This information bound involves only spectra and cross-spectra of the communication channel’s input and output processes, which are experimentally measurable in macroscopic recordings. The channel, in this case the neuron, can remain a black box. But since we can bridge the divide between microscopic, biophysical models and their filtering properties, we will, in the following section, derive the mutual information rates from such models.

Def. (spectral coherence):
The expression in Eq. (66) is termed the squared signal response coherence
  1. c^2(\omega)=\frac{|P_{xy}(\omega)|^2}{P_{xx}(\omega)P_{yy}(\omega)}.

It quantifies the linearity of the relation between x and y: it equals 1 if the output is related to the input by a noiseless linear filter. Both nonlinearities and noise reduce the coherence. The coherence can be estimated from data using the FFT algorithm and spectrum estimation; implementations are available in the free software packages scipy and matplotlib.
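For instance, a minimal estimate of the coherence and of the resulting lower bound on the information rate could look as follows (the toy stimulus/response pair is the same kind of made-up encoder as in the sketch above):

```python
# Coherence estimate and lower information bound, Eq. (66)/(67). Since c^2 is
# symmetric in frequency, the two-sided omega-integral reduces to a one-sided
# integral over f (in Hz), giving a rate in bits per second.
import numpy as np
from scipy import signal

fs = 1000.0
rng = np.random.default_rng(3)
x = rng.standard_normal(200_000)
b, a = signal.butter(2, 50.0, fs=fs)
y = signal.lfilter(b, a, x) + 0.5 * rng.standard_normal(x.size)

f, c2 = signal.coherence(x, y, fs=fs, nperseg=4096)   # squared coherence estimate
df = f[1] - f[0]
info_rate = -np.sum(np.log2(1.0 - c2)) * df            # bit/s, Eq. (67)
print(f"lower bound on the information rate: {info_rate:.1f} bit/s")
```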

What renders the coherence a useful quantity? While the cross-spectrum tells us when stimulus and output have correlated power in a spectral band, the normalisation with the output auto-spectrum can be crucial: if we find particularly high power in P_{xy}(\omega), this may not be related to the stimulus but could simply reflect the intrinsic frequency of the neuron itself.

The coherence makes no mention of an explicit decoding filter; in fact it is symmetric in input and output, just like the mutual information. This is beneficial because one can now take the encoding view in the next chapter.

11 Linear response filter

The stimulus spectral density is given by the environment or controlled by the experimental setup, while cross- and output spectra need to be measured or calculated from the model in question. In this lecture, cross-spectral and spike-train spectral densities are derived from phase oscillators, see Eq. ((???)), which are in turn derived from biophysical models. This means we do not treat the channel as a black box but assume a particular model.

The first quantity we need to calculate Eq. (67) is the cross-spectrum. On the one hand it is the Fourier transform of the cross-correlation, on the other it can be written as an average of Fourier transforms (FT and average are linear operations).

  1. P_{yx}(\omega)=\langle\langle{\tilde y}(\omega){\tilde x}^*(\omega)\rangle_{y|x}\rangle_x= \langle\langle{\tilde y}(\omega)\rangle_{y|x}{\tilde x}^*(\omega)\rangle_x.

What has happened here? The cross-spectrum can be obtained by averaging the Fourier transformed quantities over trials and over the stimulus ensemble. The average can be split into the conditional average over trials, \langle\cdot\rangle_{y|x}, given a fixed, frozen stimulus, and the average over the stimulus ensemble, \langle\cdot\rangle_x. The former is essentially an average over the encoding noise (Chacron, Lindner, and Longtin 2004; Lindner, Chacron, and Longtin 2005).
Observe that \langle\tilde y(\omega)\rangle_{y|x} is the Fourier transform of the trial-averaged firing rate conditioned on a frozen stimulus

r(t) = \langle y(t) \rangle_{y|x}.

Thus, it is sufficient to derive a filter that maps input x(t)x(t) to a firing rate, not an individual spike train.

Def. (forward, encoding filter):
Let g(t)g(t) be the filter kernel that maps stimulus into instantaneous firing rate
  1. r(t) = (g*x)(t) = \int_0^\infty{\mathrm{d}}\tau\, g(\tau)\,x(t-\tau)

The filter is causal, since it is implemented by a differential equation and the Laplace transform yields

  1. R(s) = G(s)X(s),

where G(s) denotes the transfer function of the encoding filter.

With this definition the cross-spectrum is written as

  1. P_{yx}(\omega)=\langle\langle{\tilde y}(\omega)\rangle_{y|x}{\tilde x}^*(\omega)\rangle_x = G(\mathrm{i}\omega)P_{xx}(\omega).

This shows us that although we are computing the cross-spectrum of stimulus and spike train, the response filter G(\mathrm{i}\omega) for the instantaneous firing rate suffices. This simple relation reminds us that the cross-spectrum is not really a second order quantity, but can be determined exactly by linear response theory. The spike train spectrum P_{yy}(\omega), on the other hand, is truly a second order quantity, viz. the Fourier transform of the auto-covariance, and cannot be related to the linear response filter without further approximations.

11.1 Instantaneous firing rate in the continuum limit

The instantaneous firing rate can be estimated via binning and trial averaging, as in the following toy example (four trials, bin width \Delta; the top row is the spike count per bin summed over trials):

4\Delta r(k\Delta): 0 3 1 0 0 0 0 3 1 0 0 0 1 2 1 0
Trials: 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0
0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0
0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0
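In code, the same estimate might look like this (spike times and bin width are made up):

```python
# Estimating the instantaneous firing rate by binning and trial averaging,
# mirroring the toy table above: r(k*Delta) ≈ summed spike count in bin k
# divided by (number of trials * Delta).
import numpy as np

delta = 0.005                                # bin width (s)
t_max = 0.08
trials = [
    [0.007, 0.037, 0.067],                   # spike times (s), one list per trial
    [0.012, 0.036, 0.072],
    [0.008, 0.041, 0.066],
    [0.009, 0.038, 0.061],
]
bins = np.arange(0.0, t_max + delta, delta)
counts = sum(np.histogram(tr, bins=bins)[0] for tr in trials)
rate = counts / (len(trials) * delta)        # instantaneous firing rate in Hz
print(rate)
```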

Two equivalent descriptions8 of Markov dynamics exist:

(i) the path view,

  1. \dot{x}= u(x,t) = f(x) + g(x)\xi(t);

(ii) the ensemble view,

\dot{p}(x,t)=\,?.

For (i) one can simulate many paths of a stochastic differential equation, with different initial conditions and noise realisations. Histograms can provide the ensemble statistics, as in the sketch below. But it is also possible to find an evolution equation for the whole ensemble.
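A minimal sketch of the path view, assuming an OU process as the example SDE (drift, noise amplitude and all parameters are illustrative):

```python
# Path view: integrate many realisations of dx = f(x) dt + g(x) dW with the
# Euler-Maruyama scheme and histogram them to approximate the ensemble density.
import numpy as np

def f(x): return -x          # drift of an OU process
def g(x): return 1.0         # additive noise amplitude

rng = np.random.default_rng(4)
n_paths, dt, n_steps = 20_000, 1e-3, 2000
x = rng.uniform(-3, 3, n_paths)              # broad ensemble of initial conditions
for _ in range(n_steps):
    x = x + f(x) * dt + g(x) * np.sqrt(dt) * rng.standard_normal(n_paths)

# After t = 2 the histogram is close to the stationary density N(0, 1/2).
hist, edges = np.histogram(x, bins=50, range=(-3, 3), density=True)
centres = 0.5 * (edges[1:] + edges[:-1])
print(np.max(np.abs(hist - np.exp(-centres**2) / np.sqrt(np.pi))))
```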

The relation between the two can be formally established by the

Def. (empirical measure):
Given a stochastic path realisation, x(t), from Eq. (72) the empirical measure is
  1. \varrho(y,t) = \delta(x(t)-y).

With all the lovely properties of a δ\delta-function.

The probability density

  1. p(y,t) = \langle\varrho(y,t)\rangle,

where the average is over all paths, x(t), and therefore over all realisations of the noise process \xi(t).

The chain rule yields

\frac\partial{\partial t}\varrho(y,t)=\dot x(t)\frac\partial{\partial x} \varrho(y,t) =-\frac\partial{\partial y}\varrho(y,t)\,u(x(t),t)

Solving such a PDE involves time integration or other integral transformations (Fourier and Laplace’s). Since

\int{\mathrm{d}}y\,\delta(x(t)-y)f(y)=f(x(t))=\int{\mathrm{d}}y\,\delta(x(t)-y)f(x(t))

Therefore

  1. \frac\partial{\partial t}\varrho(y,t)=-\frac\partial{\partial y}\varrho(y,t)u(y,t) =-\frac\partial{\partial y}f(y)\varrho(y,t)-\frac\partial{\partial y}g(y)\xi(t)\varrho(y,t)

Averaging on both sides results in

\frac\partial{\partial t}p(y,t)=-\frac\partial{\partial y}f(y)p(y,t)-\frac\partial{\partial y}g(y)\langle\xi(t)\varrho(y,t)\rangle.

The correlation between a stochastic process \xi(t) and a nonlinear functional of it is given by the Novikov-Furutsu-Donsker formula

  1. \langle\xi(t)\varrho(y,t)\rangle=\frac12\left\langle\frac{\delta\varrho}{\delta\xi(t)}\right\rangle=-\frac12\frac\partial{\partial y}g(y)p(y,t)

Altogether we have the

Def. (Fokker-Planck equation):
The FPE corresponding to Eq. (72) is
  1. \frac\partial{\partial t}p(y,t)=\frac12\frac\partial{\partial y}g(y)\frac\partial{\partial y}g(y)p(y,t)-\frac\partial{\partial y}f(y)p(y,t).

This is a diffusion equation. It can be written in the form

  1. \frac\partial{\partial t}p(y,t)=-\frac\partial{\partial y}J(y,t),

J(y,t)=f(y)p(y,t)-\frac12g(y)\frac\partial{\partial y}g(y)p(y,t)

One needs boundary conditions and initial conditions to solve this PDE.
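A correspondingly minimal sketch of the ensemble view: integrating the Fokker-Planck equation for the same OU example with no-flux boundaries (the scheme and all parameters are illustrative, not optimised):

```python
# Ensemble view: explicit finite-volume integration of the FPE for f(x) = -x,
# g = 1, with reflecting (no-flux) boundaries on [-3, 3].
import numpy as np

nx, dt, n_steps = 300, 2e-4, 10_000
x_edges = np.linspace(-3, 3, nx + 1)
dx = x_edges[1] - x_edges[0]
x = 0.5 * (x_edges[1:] + x_edges[:-1])           # cell centres
p = np.ones(nx) / (nx * dx)                       # uniform initial density

for _ in range(n_steps):
    # flux at interior cell faces: J = f(x) p - 0.5 dp/dx
    xf = x_edges[1:-1]
    p_face = 0.5 * (p[1:] + p[:-1])
    dpdx = (p[1:] - p[:-1]) / dx
    J = -xf * p_face - 0.5 * dpdx
    J = np.concatenate(([0.0], J, [0.0]))         # no-flux boundary conditions
    p = p - dt * (J[1:] - J[:-1]) / dx            # dp/dt = -dJ/dx (conservative)

print(np.max(np.abs(p - np.exp(-x**2) / np.sqrt(np.pi))))   # compare to N(0, 1/2)
```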

11.2 Phase flux = firing rate

For convenience rewrite the I/O-equivalent phase oscillator from Eq. ((???)) as

  1. \dot\phi=f_0+Z(\phi)x(t) + \sigma\,\xi(t).

Here, as opposed to Eq. ((???)), \vec Z(\phi)\cdot\vec\eta(\phi,t) was split into the part that results from the presented stimulus, now denoted x(t), and the part that originates from, e.g., intrinsic noise. From Eq. (27) the perturbation vector is

\vec\eta = \begin{pmatrix} x(t)\\\sigma_1(v(\phi))\xi_1(t)\\\vdots\end{pmatrix}.

As long as the intrinsic noise is fast compared to the stimulus an averaging can be applied9 to obtain an effective diffusion

\sigma^2 = \int_0^1{\mathrm{d}}\phi\sum_i\sigma_i^2(v_{\mathrm{LC}}(\phi)),

which enters Eq. (79). The benefit is that the corresponding Fokker-Planck equation

  1. \frac\partial{\partial t}p(\phi,t)=\frac{\sigma^2}2\frac{\partial^2}{\partial\phi^2}p(\phi,t)-\frac{\partial}{\partial\phi}(f_0+Z(\phi)x(t))p(\phi,t)=-\frac\partial{\partial\phi}J(\phi,t)

is tractable in a perturbation expansion. But first, remember the goal: identification of the forward filter g(t) in r(t)=\int_0^\infty{\mathrm{d}}\tau\,g(\tau)x(t-\tau).

The firing rate is the flux of the phase density through \phi=1:

r(t)=J(1,t)=\left[(f_0+Z(\phi)x(t))p(\phi,t)-\frac{\sigma^2}2\frac\partial{\partial\phi}p(\phi,t)\right]\Big|_{\phi=1}

The equation is solved with the following perturbation ansatz

  1. p(\phi,t)=p_0(\phi) + \sum_i p_i(\phi,t),

with the normalisation requirement

  1. \int{\mathrm{d}}\phi\,p_0(\phi)=1 and \forall i\in{I\!\!N},t\in{I\!\!R}:\int{\mathrm{d}}\phi\,p_i(\phi,t)=0.

The p_i(\phi,t) are assumed to be small correction terms, given that the stimulus x(t) is also small.

At the same time one gets

J(\phi,t) = J_0(\phi) + \sum_iJ_i(\phi,t)

In perturbation theory one solves iteratively the equations of the same order

O(0):
For the lowest order, where the stimulus is set to x(t)=0, the FPE is

\dot p_0 = \frac{\sigma^2}2\frac{\partial^2}{\partial\phi^2}p_0-f_0\frac\partial{\partial\phi}p_0.

This equation is homogeneous, so we find a time independent steady state solution

p_0(\phi)=1.

One may test this by back insertion. Here both the boundary conditions and Eq. (82) are enforced. The solution can be inserted into the definition of the flux to obtain the zeroth order flux

J_0 = f_0p_0 = f_0

The next order involves the time-dependent stimulus

O(x):
Note that multiplying two terms of order O(x) yields a term of order O(x^2), which is discarded. One is left with

\dot p_1= \frac{\sigma^2}2\frac{\partial^2}{\partial\phi^2}p_1 -f_0\frac{\partial}{\partial\phi}p_1 -p_0x(t)\frac{\partial}{\partial\phi}Z(\phi)

To turn the PDE into an algebraic equation one can apply both a Fourier series expansion and the Laplace transform. For this the Laplace transform of the stimulus is denoted X(s) and the periodic function is expanded as Z(\phi)=\sum_{k=-\infty}^\infty c_ke^{\mathrm{i}2\pi k\phi}:

sP_1(k,s)=-\frac{(2\pi k\sigma)^2}{2}P_1(k,s)-f_0\mathrm{i}2\pi kP_1(k,s)-X(s)\mathrm{i}2\pi kc_k

Solving for P_1(\phi,s)

P_1(\phi,s)=\sum_{k=-\infty}^\infty P_1(k,s)e^{\mathrm{i}2\pi k\phi}=-\sum_{k=-\infty}^\infty\frac{\mathrm{i}2\pi kc_kX(s)e^{\mathrm{i}2\pi k\phi}}{s+(2\pi k\sigma)^2/2 + \mathrm{i}2\pi kf_0}

For brevity define the pole \nu_k=-(2\pi k\sigma)^2/2-\mathrm{i}2\pi kf_0.

The first order flux is

J_1(k,s)=f_0P_1(k,s)+c_kX(s)-\frac{\mathrm{i}2\pi k\sigma^2}2P_1(k,s)

=f_0P_1(k,s)-\frac{s-\nu_k}{\mathrm{i}2\pi k}P_1(k,s)-\frac{\mathrm{i}2\pi k\sigma^2}2P_1(k,s)

and

\mathrm{i}2\pi kJ_1(k,s)=\mathrm{i}2\pi kf_0P_1(k,s)+\frac{(2\pi k\sigma)^2}2P_1(k,s)-(s-\nu_k)P_1(k,s)=-sP_1(k,s),

so that J_1(k,s)=\frac{sc_k}{s-\nu_k}X(s). Evaluating the flux at \phi=1 gives

J_1(1,s)=\sum_{k=-\infty}^\infty\frac{sc_k}{s-\nu_k}X(s)

Happily and consistently one finds

G(s)=\sum_{k=-\infty}^\infty\frac{sc_k}{s-\nu_k}

The frequency response of the filter corresponds to evaluating G(s) on the imaginary axis, G(\mathrm{i}\omega). The low frequency limit is

\lim_{\omega\to0}G(\mathrm{i}\omega)=c_0=\langle Z(\phi)\rangle.

With c_k=a_k+\mathrm{i}b_k, the high frequency limit is

\lim_{\omega\to\infty}G(\mathrm{i}\omega)=\sum_{k=-\infty}^\infty a_k=Z(0).
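These limits can be checked numerically; the following sketch assumes an illustrative phase response curve Z(\phi)=1-\cos(2\pi\phi) and made-up values for f_0 and \sigma:

```python
# Numerical check of the linear response filter G(i*omega) derived above for
# Z(phi) = 1 - cos(2*pi*phi), i.e. Fourier coefficients c_0 = 1, c_{+-1} = -1/2.
import numpy as np

f0, sigma = 10.0, 0.3                        # made-up baseline rate and noise level
ks = np.array([-1, 0, 1])
cs = np.array([-0.5, 1.0, -0.5])             # c_k of the assumed Z(phi)
nu = -(2 * np.pi * ks * sigma) ** 2 / 2 - 1j * 2 * np.pi * ks * f0   # poles

def G(omega):
    s = 1j * omega
    return np.sum(s * cs / (s - nu))

print(abs(G(1e-6)))             # -> c_0 = <Z> = 1      (low-frequency limit)
print(abs(G(1e6)))              # -> Z(0) = 0           (high-frequency limit)
print(abs(G(2 * np.pi * f0)))   # large: resonance near the firing rate
```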

12 No free lunch: energy and information processing

13 Networks

Bibliography

Adrian, E.D., and Y. Zotterman. 1926. “The Impulses Produced by Sensory Nerve-Endings. Part 2. the Reponse of a Single End-Organ.” J Physiol 61.

Adrian, Edgar. 1928. The Basis of Sensation: The Action of the Sense Organs. Christophers.

Atick, Joseph J. 1992. “Could Information Theory Provide an Ecological Theory of Sensory Processing?” Network: Computation in Neural Systems 3 (2): 213–51. http://www.tandfonline.com/doi/abs/10.1088/0954-898X_3_2_009.

Barlow, HB. 1961. “Possible Principles Underlying the Transformations of Sensory Messages.” In Sensory Communication, edited by WA Rosenblith, 217–34. MIT Press.

Chacron, Maurice J., Benjamin Lindner, and André Longtin. 2004. “Noise Shaping by Interval Correlations Increases Information Transfer.” Physical Review Letters 92 (8). doi:10.1103/PhysRevLett.92.080601.

Chicone, Carmen. 2006. Ordinary Differential Equations with Applications. Springer Science & Business Media.

Cover, Thomas M., and Joy A. Thomas. 2012. Elements of Information Theory. John Wiley & Sons.

Doyle, D. A. 1998. “The Structure of the Potassium Channel: Molecular Basis of K+ Conduction and Selectivity.” Article. Science 280 (5360): 69–77. doi:10.1126/science.280.5360.69.

Ermentrout, Bard. 1996. “Type I Membranes, Phase Resetting Curves, and Synchrony.” Neural Computation 8 (5): 979–1001. http://www.mitpressjournals.org/doi/abs/10.1162/neco.1996.8.5.979.

Ermentrout, G. Bard, Roberto F. Galán, and Nathaniel N. Urban. 2007. “Relating Neural Dynamics to Neural Coding.” Article. Physical Review Letters 99 (24). American Physical Society (APS). doi:10.1103/physrevlett.99.248103.

Fox, Ronald F., and Yan-nan Lu. 1994. “Emergent Collective Behavior in Large Numbers of Globally Coupled Independently Stochastic Ion Channels.” Physical Review E 49 (4): 3421–31. doi:10.1103/PhysRevE.49.3421.

Gabbiani, Fabrizio, and Christof Koch. 1998. “Principles of Spike Train Analysis.” Methods in Neuronal Modeling 12 (4): 313–60. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.383.7571&rep=rep1&type=pdf.

Goldwyn, Joshua H., and Eric Shea-Brown. 2011. “The What and Where of Adding Channel Noise to the Hodgkin-Huxley Equations.” PLoS Comput Biol 7 (11): e1002247. doi:10.1371/journal.pcbi.1002247.

Goldwyn, Joshua H., Nikita S. Imennov, Michael Famulare, and Eric Shea-Brown. 2011. “Stochastic Differential Equation Models for Ion Channel Noise in Hodgkin-Huxley Neurons.” Physical Review E 83 (4): 041908. doi:10.1103/PhysRevE.83.041908.

Golshani, Leila, and Einollah Pasha. 2010. “Rényi Entropy Rate for Gaussian Processes.” Information Sciences 180 (8): 1486–91. doi:10.1016/j.ins.2009.12.012.

Hodgkin, Alan L., and Andrew F. Huxley. 1952. “A Quantitative Description of Membrane Current and Its Application to Conduction and Excitation in Nerve.” The Journal of Physiology 117 (4): 500. http://www.ncbi.nlm.nih.gov/pmc/articles/pmc1392413/.

Izhikevich, Eugene M. 2007. Dynamical Systems in Neuroscience. MIT Press.

Kielhöfer, Hansjörg. 2011. Bifurcation Theory: An Introduction with Applications to Partial Differential Equations. Springer Science & Business Media.

Kirst, Christoph. 2012. “Synchronization, Neuronal Excitability, and Information Flow in Networks of Neuronal Oscillators.” PhD thesis, Niedersächsische Staats-und Universitätsbibliothek Göttingen. http://ediss.uni-goettingen.de/bitstream/handle/11858/00-1735-0000-000D-F08D-2/kirst.pdf?sequence=1.

Kirst, Christoph, Julian Ammer, Felix Felmy, Andreas Herz, and Martin Stemmler. 2015. “Fundamental Structure and Modulation of Neuronal Excitability: Synaptic Control of Coding, Resonance, and Network Synchronization.” biorxiv;022475v1. http://biorxiv.org/lookup/doi/10.1101/022475.

Kuznetsov, Yu. A. 2011. “Practical Computation of Normal Forms on Center Manifolds at Degenerate Bogdanov–Takens Bifurcations.” Article. International Journal of Bifurcation and Chaos 15 (11): 3535–46. doi:10.1142/S0218127405014209.

Laughlin, S. 1981. “A Simple Coding Procedure Enhances a Neuron’s Information Capacity.” Zeitschrift Fur Naturforschung. Section C, Biosciences 36 (9-10): 910–12.

Lazar, Aurel A. 2010. “Population Encoding with Hodgkin–Huxley Neurons.” IEEE Transactions on Information Theory 56 (2): 821–37. doi:10.1109/TIT.2009.2037040.

Linaro, Daniele, Marco Storace, and Michele Giugliano. 2011. “Accurate and Fast Simulation of Channel Noise in Conductance-Based Model Neurons by Diffusion Approximation.” PLoS Comput Biol 7 (3): e1001102. doi:10.1371/journal.pcbi.1001102.

Lindner, Benjamin, Maurice J. Chacron, and André Longtin. 2005. “Integrate-and-Fire Neurons with Threshold Noise: A Tractable Model of How Interspike Interval Correlations Affect Neuronal Signal Transmission.” Physical Review E 72 (2). doi:10.1103/PhysRevE.72.021911.

Orio, Patricio, and Daniel Soudry. 2012. “Simple, Fast and Accurate Implementation of the Diffusion Approximation Algorithm for Stochastic Ion Channels with Multiple States.” PLoS ONE 7 (5): e36670. doi:10.1371/journal.pone.0036670.

Osinga, Hinke M., and Jeff Moehlis. 2010. “Continuation-Based Computation of Global Isochrons.” SIAM Journal on Applied Dynamical Systems 9 (4): 1201–28. doi:10.1137/090777244.

Pereira, Ulises, Pierre Coullet, and Enrique Tirapegui. 2015. “The Bogdanov–Takens Normal Form: A Minimal Model for Single Neuron Dynamics.” Entropy 17 (12): 7859–74. doi:10.3390/e17127850.

Pezo, Danilo, Daniel Soudry, and Patricio Orio. 2014. “Diffusion Approximation-Based Simulation of Stochastic Ion Channels: Which Method to Use?” Frontiers in Computational Neuroscience 8: 139. doi:10.3389/fncom.2014.00139.

Schleimer, Jan-Hendrik, and Martin Stemmler. 2009. “Coding of Information in Limit Cycle Oscillators.” Article. Physical Review Letters 103 (24). American Physical Society (APS). doi:10.1103/physrevlett.103.248105.

Shannon, C. E. 1948. “A Mathematical Theory of Communication.” Article. Bell System Technical Journal 27 (3). Institute of Electrical; Electronics Engineers (IEEE): 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x.

Theunissen, F. E., and J. P. Miller. 1991. “Representation of Sensory Information in the Cricket Cercal Sensory System. Ii. Information Theoretic Calculation of System Accuracy and Optimal Tuning-Curve Widths of Four Primary Interneurons.” Article. Journal of Neurophysiology 66 (5). American Physiological Society: 1690–1703. doi:10.1152/jn.1991.66.5.1690.

Winfree, A. T. 1974. “Patterns of Phase Compromise in Biological Cycles.” Journal of Mathematical Biology 1 (1): 73–93. http://www.springerlink.com/index/0668278661N54486.pdf.


  1. Actually it does, if you bring energetic considerations to bear.

  2. For now this means that it is an invertible matrix; \forall t:\exists\boldsymbol{R}^{-1}(t)

  3. Estimators should be at least consistent, hopefully unbiased, maybe efficient.

  4. The biological literature often uses dynamical systems jargon to describe mechanistic explanation and resorts to engineering vocabulary in order to describe the function of nerve cells.

  5. Here, the alphabet is a firing rate f_0\in{I\!\!R}. It might be more reasonable to think about spike counts in a given time window, which is still a countable set.

  6. Remember the determinant is |\boldsymbol{K}|=\prod_{i=1}^n\lambda_i. So \ln|\boldsymbol{K}|=\sum_{i=1}^n\ln\lambda_i. In terms of the matrix logarithm and the trace the determinant can be expressed as $|\boldsymbol{K}|=\exp\tr\ln\boldsymbol{K}$.

  7. Because the trace is invariant under similarity transforms $\tr\boldsymbol{K}=\sum_{i=1}^n\lambda_i$.

  8. Similar to the particle and wave duality in quantum mechanics.

  9. This can be done formally using Chapman-Enskog or adiabatic elimination procedures. The derivation may be included in the future.