
IMMUNE COGNITION AND
PATHOGENIC CHALLENGE:
SUDDEN AND CHRONIC
INFECTION
Rodrick Wallace, PhD
The New York State Psychiatric Institute∗
January 10, 2002
Abstract
We continue to study the implications of IR Cohen’s theory of immune cognition, in the presence of both sudden and
chronic pathogenic challenge, through a mathematical model
derived from the Large Deviations Program of applied probability. The analysis makes explicit the linkage between an
individual’s ‘immunocultural condensation’ and embedding
social or historical structures and processes, in particular
power relations between groups. We use methods adapted
from the theory of ecosystem resilience to explore the consequences of the sudden ‘perturbation’ caused by infection
in the context of such embedding, and examine a ‘stage’
model for chronic infection involving multiple phase transitions analogous to ‘learning plateaus’ in neural networks or
‘punctuated equilibria’ in adaptive systems.
∗ Address correspondence to: R Wallace, PISCS Inc., 549 W. 123 St., Suite 16F, New York, NY, 10027. Telephone (212) 865-4766, email [email protected]. Affiliation is for identification only.

Introduction
A recent paper by R Wallace and RG Wallace (2001) broadly explored
the use of Atlan and Cohen’s (1998) information theory treatment of immune
cognition for reinterpreting two important population case histories of chronic
infection in the context of ‘cultural variation,’ namely malaria in Burkina
Faso and ‘heterosexual AIDS’ in the US. The reductionist biomedical paradigm had assigned differences of immune response and disease expression between populations in a manner so as to reify ‘race’. That is, different responses to
pathogenic challenge were attributed entirely to genetic differences between
ethnic groups, ignoring the considerable impact of ‘path dependent’ patterns
of past and current power relations between coresident communities.
The reductionist perspective, at least on this matter, thus appears to incorporate a systematic logical fallacy which could significantly, and possibly profoundly, distort attempts to develop effective vaccine strategies against a number of disorders.
Here we present details of the mathematical model described in R Wallace
and RG Wallace (2001) which allows the interactive condensation of cognitive systems with the ‘selection pressures’ of embedding social constraints.
The work relies on foundations supplied by the Large Deviations Program of
applied probability, which provides a unified vocabulary for addressing language structures, in a large sense, using mathematical techniques adapted
from statistical mechanics and the theory of fluctuations.
The essence of the approach is to express the cognitive pattern recognition-and-response described as characterizing immune cognition by Atlan and Cohen (1998) in terms of a ‘language,’ in a broad sense, and then to show how that language can interact and coalesce with similar cognitive languages at larger scales – the central nervous system (CNS) and the embedding local sociocultural network. The next step is to demonstrate that these may, in turn,
interact – and indeed coalesce – with a non-cognitive structured language
of externally imposed constraint. The process of developing the formalism
will make clear that change in such languages takes place ‘on the surface,’ in
a sense, of what has gone before, so that path dependence – the persisting
burden of history – becomes a natural outcome.
Next we describe the behavior of condensed cognitive systems under the
perturbation of a sudden pathogenic challenge, using a generalization of Ives’
(1995) and Holling’s (1973, 1992) ecosystem resilience analysis. Individuals
within more resilient social structures – typically those of the powerful rather
than of the subordinate – will usually suffer less ‘amplification’ of symptom
patterns caused by the infection.
Finally we examine chronic pathogenic challenge from this perspective,
and propose a ‘stage’ model formally analogous to learning plateaus in cognitive, and ‘punctuated equilibrium’ in adaptive, systems.
We begin with a summary of relevant theory.
Information theory preliminaries
Suppose we have an ordered set of random variables, Xk , at ‘times’ k =
1, 2, ... – which we call X – that emits sequences taken from some fixed
alphabet of possible outcomes. Thus an output sequence of length n, xn ,
termed a path, will have the form
xn = (α0 , α1 , ..., αn−1 )
where αk is the value at step k of the stochastic variate Xk ,
Xk = αk .
A particular sequence xn will have the probability
P (X0 = α0 , X1 = α1 , ..., Xn−1 = αn−1 ),
(1)
with associated conditional probabilities
P (Xn = αn |Xn−1 = αn−1 , ..., X0 = α0 ).
(2)
Thus substrings of xn are not, in general, stochastically independent.
That is, there may be powerful serial correlations along the xn . We call X
an information source, and are particularly interested in sources for which
the long run frequencies of strings converge stochastically to their time-independent probabilities, generalizing the law of large numbers. These we
call ergodic (Ash, 1990; Cover and Thomas, 1991; Khinchine, 1957, hereafter
known as ACTK). If the probabilities of strings do not change in time, the
source is called memoryless.
We shall be interested in sources which can be parametized and that
are, with respect to that parameter, piecewise adiabatically memoryless, i.e.
probabilities closely track parameter changes within a ‘piece,’ but may change
suddenly between pieces. This allows us to apply the simplest results from
information theory, and to use renormalization methods to examine transitions between ‘pieces.’ Learning plateaus represent regions where, with
respect to the parameter, the system is, to first approximation, adiabatically
memoryless in this sense, analogous to adiabatic physical systems in which
rapidly changing phenomena closely track slowly varying driving parameters.
In what follows we use the term ‘ergodic’ to mean ‘piecewise adiabatically memoryless ergodic.’
For any ergodic information source it is possible to divide all possible
sequences of output, in the limit of large n, into two sets, S1 and S2 , having,
respectively, very high and very low probabilities of occurrence. Sequences
in S1 we call meaningful.
The content of information theory’s Shannon-McMillan Theorem is twofold:
First, if there are N(n) meaningful sequences of length n, where N(n) is much smaller than the number of all possible sequences of length n, then, for each ergodic
information source X, there is a unique, path-independent number H[X]
such that
lim_{n→∞} log[N(n)]/n = H[X].
(3)
See ACTK for details.
Thus, for large n, the probability of any meaningful path of length n ≫ 1 – independent of path – is approximately

P(xn ∈ S1) ∝ exp(−nH[X]) ∝ 1/N(n).
This is the asymptotic equipartition property and the Shannon-McMillan
Theorem is often called the Asymptotic Equipartition Theorem (AEPT).
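As a concrete illustration of the asymptotic equipartition property – not part of the original development – the following short Python sketch estimates H[X] for a hypothetical memoryless source over a three-letter alphabet and checks that sampled paths have per-symbol log-probabilities clustering near H[X]; the alphabet, probabilities and sample sizes are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])        # hypothetical alphabet probabilities
H = -np.sum(p * np.log(p))           # source uncertainty in nats

n, trials = 1000, 2000
paths = rng.choice(len(p), size=(trials, n), p=p)    # sample paths x^n
per_symbol = -np.log(p[paths]).mean(axis=1)          # -log P(x^n) / n for each path

print(f"H[X] = {H:.4f} nats/symbol")
print(f"-log P(x^n)/n: mean {per_symbol.mean():.4f}, std {per_symbol.std():.4f}")
# The per-symbol log probabilities concentrate around H, so almost every
# observed path has probability close to exp(-n H[X]), as the AEPT asserts.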
H[X] is the splitting criterion between the two sets S1 and S2 , and the
second part of the Shannon-McMillan Theorem involves its calculation. This
requires introduction of some nomenclature.
Suppose we have stochastic variables X and Y which take the values xj
and yk with probability distributions
P (X = xj ) = Pj
P (Y = yk ) = Pk
Let the joint and conditional probability distributions of X and Y be
given, respectively, as
P (X = xj , Y = yk ) = Pj,k
P (Y = yk |X = xj ) = P (yk |xj )
The Shannon uncertainties of X and of Y are, respectively
H(X) = −∑_j Pj log(Pj),

H(Y) = −∑_k Pk log(Pk).
(4)
The joint uncertainty of X and Y is defined as
H(X, Y) = −∑_{j,k} Pj,k log(Pj,k).
(5)
The conditional uncertainty of Y given X is defined as
H(Y|X) = −∑_{j,k} Pj,k log[P(yk|xj)].
(6)
Note that by expanding P(yk|xj) = Pj,k/Pj we obtain H(Y|X) = H(X, Y) − H(X) and, symmetrically, H(X|Y) = H(X, Y) − H(Y).
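A minimal numerical sketch (Python, with an arbitrary illustrative joint distribution) of the uncertainties defined in equations (4)-(6) and the identity just noted:

import numpy as np

# hypothetical joint distribution P(X = x_j, Y = y_k) over a 3 x 2 alphabet
Pjk = np.array([[0.30, 0.10],
                [0.05, 0.25],
                [0.10, 0.20]])
Pj = Pjk.sum(axis=1)              # marginal distribution of X
Pk = Pjk.sum(axis=0)              # marginal distribution of Y

def H(dist):                      # Shannon uncertainty, natural logarithm
    d = np.asarray(dist).ravel()
    d = d[d > 0]
    return -np.sum(d * np.log(d))

H_X, H_Y, H_XY = H(Pj), H(Pk), H(Pjk)
# conditional uncertainty H(Y|X) = -sum_{j,k} P_{j,k} log P(y_k | x_j)
H_Y_given_X = -np.sum(Pjk * np.log(Pjk / Pj[:, None]))

print(H_Y_given_X, H_XY - H_X)    # the two numbers agree, as in the text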
The second part of the Shannon-McMillan Theorem states that the – path
independent – splitting criterion, H[X], of the ergodic information source
X, which divides high from low probability paths, is given in terms of the
sequence probabilities of equations (1) and (2) as
H[X] = lim_{n→∞} H(Xn|X0, X1, ..., Xn−1) = lim_{n→∞} H(X0, ..., Xn)/(n + 1).
(7)
The AEPT is one of the most unexpected and profound results of 20th
Century applied mathematics.
Ash (1990) describes the uncertainty of an ergodic information source as follows:
“...[W]e may regard a portion of text in a particular language as being produced by an information source. The probabilities P(Xn = αn|X0 = α0, ..., Xn−1 = αn−1) may be estimated from the
available data about the language. A large uncertainty means,
by the AEPT, a large number of ‘meaningful’ sequences. Thus
given two languages with uncertainties H1 and H2 respectively, if
H1 > H2 , then in the absence of noise it is easier to communicate
in the first language; more can be said in the same amount of time.
On the other hand, it will be easier to reconstruct a scrambled
portion of text in the second language, since fewer of the possible
sequences of length n are meaningful.”
Languages can affect each other, or, equivalently, systems can translate from one language to another, usually with error. The Rate Distortion Theorem (RDT), which is one generalization of the Shannon-McMillan Theorem (SMT), describes how this can take
place. As IR Cohen (2001) has put it, in the context of the cognitive immune
system (IR Cohen, 1992, 2000),
“An immune response is like a key to a particular lock; each
immune response amounts to a functional image of the stimulus
that elicited the response. Just as a key encodes a functional
image of its lock, an effective [immune] response encodes a functional image of its stimulus; the stimulus and the response fit each
other. The immune system, for example, has to deploy different
types of inflammation to heal a broken bone, repair an infarction, effect neuroprotection, cure hepatitis, or contain tuberculosis. Each aspect of the response is a functional representation of
the challenge.
Self-organization allows a system to adapt, to update itself in
the image of the world it must respond to... The immune system,
like the brain... aim[s] at representing a part of the world.”
These considerations suggest that the degree of possible back-translation
between the world and its image within a cognitive system represents the
profound and systematic coupling between a biological system and its environment, a coupling which may particularly express the way in which the
system has ‘learned’ the environment. We attempt a formal treatment, from
which it will appear that both cognition and response to systematic patterns
of selection pressure are – almost inevitably – highly punctuated by ‘learning
plateaus’ in which the two processes can become inextricably intertwined.
Suppose we have an ergodic information source Y, a generalized language
having grammar and syntax, with a source uncertainty H[Y] that ‘perturbs’
a system of interest. A chain of length n, a path of perturbations, has the
form
y n = y1 , ..., yn .
Suppose that chain elicits a corresponding chain of responses from the
system of interest, producing another path bn = (b1 , ..., bn ), which has some
‘natural’ translation into the language of the perturbations, although not,
generally, in a one-to-one manner. The image is of a continuous analog audio
signal which has been ‘digitized’ into a discrete set of voltage values. Thus,
there may well be several different y n corresponding to a given ‘digitized’ bn .
Consequently, in translating back from the b-language into the y-language,
there will generally be information loss.
Suppose, however, that with each path bn we specify an inverse code
which identifies exactly one path ŷ n . We assume further there is a measure
of distortion which compares the real path y n with the inferred inverse ŷ n .
Below we follow the nomenclature of Cover and Thomas (1991).
The Hamming distortion is defined as
d(y, ŷ) = 1, y ≠ ŷ;  d(y, ŷ) = 0, y = ŷ.
(8)
For continuous variates the squared error distortion is defined as

d(y, ŷ) = (y − ŷ)^2.
(9)
Possibilities abound.
The distortion between paths y^n and ŷ^n is defined as

d(y^n, ŷ^n) = (1/n) ∑_{j=1}^{n} d(yj, ŷj).
(10)
We suppose that with each path y^n and b^n-path translation into the y-language, denoted ŷ^n, there are associated individual, joint and conditional probability distributions p(y^n), p(ŷ^n), p(y^n, ŷ^n) and p(y^n|ŷ^n).
The average distortion is defined as

D = ∑_{y^n} p(y^n) d(y^n, ŷ^n).
(11)
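The following Python sketch, using invented paths and path probabilities purely for illustration, evaluates the Hamming distortion of equation (8), the path distortion of equation (10), and the average distortion of equation (11):

def hamming(y, yhat):                            # equation (8)
    return 0.0 if y == yhat else 1.0

def path_distortion(yn, yhatn, d=hamming):       # equation (10)
    return sum(d(a, b) for a, b in zip(yn, yhatn)) / len(yn)

# hypothetical ensemble of source paths y^n, their inverse translations
# yhat^n, and path probabilities p(y^n)
paths    = [(0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)]
inverses = [(0, 1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1)]
probs    = [0.5, 0.3, 0.2]

# average distortion D of equation (11)
D = sum(p * path_distortion(y, yh) for p, y, yh in zip(probs, paths, inverses))
print(D)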
It is possible, using the distributions given above, to define the information transmitted from the incoming Y to the outgoing Ŷ process in the usual
manner, using the appropriate Shannon uncertainties:
I(Y, Ŷ ) ≡ H(Y ) − H(Y |Ŷ ) = H(Y ) + H(Ŷ ) − H(Y, Ŷ )
(12)
If there is no uncertainty in Y given Ŷ , then no information is lost. In
general, this will not be true.
The information rate distortion function R(D) for a source Y with a
distortion measure d(y, ŷ) is defined as
R(D) = min_{p(y|ŷ): ∑_{(y,ŷ)} p(y)p(y|ŷ)d(y,ŷ) ≤ D} I(Y, Ŷ),
(13)
where the minimization is over all conditional distributions p(y|ŷ) for
which the joint distribution p(y, ŷ) = p(y)p(y|ŷ) satisfies the average distortion constraint.
The Rate Distortion Theorem states that R(D), as we have defined it, is the minimum necessary rate of information transmission which ensures that the average distortion does not exceed D. Note that the theorem holds regardless of the exact form of the distortion measure d(y, ŷ).
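For orientation, the special case of a memoryless binary source under Hamming distortion has the closed form R(D) = h(p) − h(D) for 0 ≤ D ≤ min(p, 1 − p), and R(D) = 0 beyond, where h is the binary entropy (Cover and Thomas, 1991). The Python sketch below simply evaluates that formula; the source probability p is an arbitrary choice.

import numpy as np

def h2(x):                       # binary entropy, in nats
    if x in (0.0, 1.0):
        return 0.0
    return -x * np.log(x) - (1.0 - x) * np.log(1.0 - x)

def R_bernoulli(p, D):
    """R(D) for a Bernoulli(p) source under Hamming distortion."""
    if D >= min(p, 1.0 - p):
        return 0.0
    return h2(p) - h2(D)

p = 0.3                          # hypothetical source probability
for D in (0.0, 0.05, 0.1, 0.2, 0.3):
    print(D, round(R_bernoulli(p, D), 4))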
More to the point, however, is the following: Pairs of sequences (y^n, ŷ^n) can be defined as distortion typical; that is, for a given average distortion D, pairs of sequences can be divided into two sets, a high probability one containing a relatively small number of (matched) pairs with d(y^n, ŷ^n) ≤ D, and a low probability one containing most pairs. As n → ∞ the smaller set approaches unit probability, and we have for those pairs the condition

p(ŷ^n) ≥ p(ŷ^n|y^n) exp[−nI(Y, Ŷ)].
(14)
Thus, roughly speaking, I(Y, Ŷ ) embodies the splitting criterion between
high and low probability pairs of paths. These pairs are, again, the input
‘training’ paths and corresponding output path.
Note that, in the absence of a distortion measure, this result remains true for two interacting information sources; this is the principal content of the Joint Asymptotic Equipartition Theorem (JAEPT) (Cover and Thomas, 1991, Theorem 8.6.1).
Thus the imposition of a distortion measure results in a limitation in the
number of possible jointly typical sequences to those satisfying the distortion
criterion.
For the theory we will explore later – of pairwise interacting information sources – I(Y, Ŷ) (or I(Y1, Y2) without the distortion restriction) can play the role of H in the critical development of the next section.
The RDT is a generalization of the Shannon-McMillan Theorem which
examines the interaction of two information sources under the constraint of
a fixed average distortion. For our development we will require one more
iteration, studying the interaction of three ‘languages’ under particular conditions, and require a similar generalization of the SMT in terms of the
splitting criterion for triplets as opposed to single or double stranded patterns. The tool for this is at the core of what is termed network information
theory (Cover and Thomas, 1991, Theorem 14.2.3). Suppose we have (piecewise memoryless) ergodic information sources Y1 , Y2 and Y3 . We assume Y3
constitutes a critical embedding context for Y1 and Y2 so that, given three
sequences of length n, the probability of a particular triplet of sequences is
determined by conditional probabilities with respect to Y3 :
P(Y1 = y1, Y2 = y2, Y3 = y3) = ∏_{i=1}^{n} p(y1i|y3i) p(y2i|y3i) p(y3i).
(15)
That is, Y1 and Y2 are, in some measure, driven by their interaction with Y3.
Then, in analogy with the previous two cases, triplets of sequences can
be divided by a splitting criterion into two sets, having high and low probabilities respectively. For large n the number of triplet sequences in the high
probability set will be determined by the relation (Cover and Thomas, 1991,
p. 387)
N (n) ∝ exp[nI(Y1 ; Y2 |Y3 )],
(16)
where the splitting criterion is given by

I(Y1; Y2|Y3) ≡ H(Y3) + H(Y1|Y3) + H(Y2|Y3) − H(Y1, Y2, Y3).
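A small Python sketch of this splitting criterion, computed from an arbitrary illustrative joint distribution over three binary variables; it simply evaluates the conditional mutual information from the uncertainties named above.

import numpy as np

# hypothetical joint distribution p(y1, y2, y3) over binary variables,
# indexed as P[y1, y2, y3]; entries sum to one
P = np.array([[[0.10, 0.05], [0.05, 0.10]],
              [[0.15, 0.05], [0.10, 0.40]]])

def H(q):
    q = np.asarray(q).ravel()
    q = q[q > 0]
    return -np.sum(q * np.log(q))

H3   = H(P.sum(axis=(0, 1)))     # H(Y3)
H13  = H(P.sum(axis=1))          # H(Y1, Y3)
H23  = H(P.sum(axis=0))          # H(Y2, Y3)
H123 = H(P)                      # H(Y1, Y2, Y3)

# splitting criterion: I(Y1;Y2|Y3) = H(Y3) + H(Y1|Y3) + H(Y2|Y3) - H(Y1,Y2,Y3),
# using H(Y1|Y3) = H(Y1,Y3) - H(Y3) and H(Y2|Y3) = H(Y2,Y3) - H(Y3)
I = H3 + (H13 - H3) + (H23 - H3) - H123
print(I)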
Below we examine phase transitions in the splitting criteria H, which we will generalize to both I(Y1, Y2) and I(Y1; Y2|Y3). The former will produce punctuated cognitive and non-cognitive learning plateaus, while the latter characterizes the interaction between selection pressure and sociocultural cognition.
Phase transition and coevolutionary condensation
The essential homology relating information theory to statistical mechanics and nonlinear dynamics is twofold (R Wallace and RG Wallace, 1998,
1999; Rojdestvenski and Cottam, 2000):
(1) A ‘linguistic’ equipartition of probable paths consistent with the
Shannon-McMillan and Rate Distortion Theorems serves as the formal connection with nonlinear mechanics and fluctuation theory – a matter we will
not fully explore here, and
(2) A correspondence between information source uncertainty and statistical mechanical free energy density, rather than entropy. See R Wallace and
RG Wallace (1998, 1999) for a fuller discussion of the formal justification for
this assumption, described by Bennett (1988) as follows:
“...[T]he value of a message is the amount of mathematical or
other work plausibly done by the originator, which the receiver
is saved from having to repeat.”
This is a central insight: In sum, we will generally impose invariance
under renormalization symmetry on the ‘splitting criterion’ between high
and low probability states from the Large Deviations Program of applied
probability (e.g. Dembo and Zeitouni, 1998). Free energy density (which
can be reexpressed as an ‘entropy’ in microscopic systems) is the splitting
criterion for statistical mechanics, and information source uncertainty is the
criterion for ‘language’ systems. Imposition of renormalization on free energy
density gives phase transition in a physical system. For information systems
it gives interactive condensation.
This analogy is indeed a mathematical homology:
The definition of the free energy density for a parametized physical system
is
F(K1, ..., Km) = lim_{V→∞} log[Z(K1, ..., Km)]/V,
(17)
where the Kj are parameters, V is the system volume and Z is the ‘partition function’ defined from the energy function, the Hamiltonian, of the
system.
For an ergodic information source the equivalent relation associates source
uncertainty with the number of ‘meaningful’ sequences N (n) of length n, in
the limit
H[X] = lim_{n→∞} log[N(n)]/n.
We will parametize the information source to obtain the crucial expression
on which our version of information dynamics will be constructed:

H[K1, ..., Km, X] = lim_{n→∞} log[N(K1, ..., Km, n)]/n.
(18)
The essential point is that while information systems do not have ‘Hamiltonians’ allowing definition of a ‘partition function’ and a free energy density,
they may have a source uncertainty obeying a limiting relation like that of
free energy density. Importing ‘renormalization’ symmetry gives phase transitions at critical points (or surfaces), and importing a Legendre transform
in a ‘natural’ manner gives dynamic behavior far from criticality. The first
of these will be addressed here. The second is the subject of the ecological
resilience analysis of the latter sections.
As neural networks demonstrate so well, it is possible to build larger
pattern recognition systems from assemblages of smaller ones. We abstract
this process in terms of a generalized linked array of subcomponents which
‘talk’ to each other in two different ways. These we take to be ‘strong’ and
‘weak’ ties between subassemblies. ‘Strong’ ties are, following arguments
from sociology (Granovetter, 1973), those which permit disjoint partition
of the system into equivalence classes. Thus the strong ties are associated
with some reflexive, symmetric, and transitive relation between components.
‘Weak’ ties do not permit such disjoint partition. In a physical system these
might be viewed, respectively, as ‘local’ and ‘mean field’ coupling.
We fix the magnitude of strong ties, but vary the index of weak ties
between components, which we call P , taking K = 1/P .
We assume the information source of interest depends on three parameters, two explicit and one implicit. The explicit parameters are K as above and an ‘external field strength’ analog J, which gives a ‘direction’ to the system.
We may, in the limit, set J = 0.
The implicit parameter, which we call r, is an inherent generalized ‘length’ on which the phenomena, including J and K, are defined. That is, we can write J and K as functions of averages of the parameter r, which may be quite complex, having nothing at all to do with conventional ideas of space – for example, the degree of niche partitioning in ecosystems.
Rather than specify complicated patterns of individual dependence or interaction between system subcomponents, we instead work entirely within
the domain of the uncertainty of the parametized information source associated with the system as a whole, which we write as
H[K, J, X]
Imposition of invariance of H under a renormalization transform in the
implicit parameter r leads to expectation of both a critical point in K, which
we call KC , reflecting a phase transition to or from collective behavior across
the entire structure, and of power laws for behavior near KC . Addition of
other parameters to the system, e.g. some parameter V , results in a ‘critical
line’ or surface KC (V ).
Let κ = (KC − K)/KC and take χ as the ‘correlation length’ defining the
average domain in r-space for which the dual information source is primarily
dominated by ‘strong’ ties. We begin by averaging across r-space in terms
of ‘clumps’ of length R, defining JR , KR as J, K for R = 1. Then, following
Wilson’s (1971) physical analog, we choose the renormalization relations as
H[KR, JR, X] = R^D H[K, J, X],

χ(KR, JR) = χ(K, J)/R,
(19)
where D is a non-negative real constant, possibly reflecting fractal network structure. The first of these equations states that ‘processing capacity,’
as indexed by the source uncertainty of the system which represents the ‘richness’ of the inherent language, grows as RD , while the second just states that
the correlation length simply scales as R.
Other, very subtle, symmetry relations – not necessarily based on elementary physical analogs – may well be possible. For example, McCauley (1993, p. 168) describes the counterintuitive renormalization relations needed to understand phase transition in simple ‘chaotic’ systems.
For K near KC , if J → 0, a simple series expansion and some clever
algebra (e.g. Wilson, 1971; R Wallace and RG Wallace, 1998) gives
H = H0 κ^{sD},

χ = χ0 κ^{−s},
(20)
where s is a positive constant. The series expansion argument which
generates our equations (20) from equations (19) is precisely the same as that
in K Wilson’s famous 1971 paper which produces his equations (22) and (23)
from his equations (4) and (5), setting ‘external field strength’ h → 0.
To reiterate, imposing renormalization symmetries different from equations (19) would give significantly different results.
Some rearrangement of equations (20) produces, near KC ,
H ∝ 1/χ^D.
(21)
This suggests that the ‘richness’ of the language is inversely related to the
domain dominated by disjointly partitioning strong ties near criticality. As
the nondisjunctive weak ties coupling declines, the efficiency of the coupled
system as an information channel declines precipitously near the transition
point: see ACTK for discussion of the relation between channel capacity and
information source uncertainty.
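A trivial numerical check (with arbitrary constants H0, χ0, s and the exponent D of equations (19)-(20)) that equations (20) jointly imply the scaling of equation (21):

import numpy as np

H0, chi0, s, D_exp = 1.0, 2.0, 0.7, 1.5      # arbitrary illustrative constants
K_C = 1.0
K = np.linspace(0.5, 0.99, 6)                # approach the critical point from below
kappa = (K_C - K) / K_C

H   = H0 * kappa**(s * D_exp)                # first of equations (20)
chi = chi0 * kappa**(-s)                     # second of equations (20)

# eliminating kappa gives H = H0 (chi/chi0)^(-D), i.e. H proportional to 1/chi^D
print(H / (H0 * (chi / chi0)**(-D_exp)))     # prints a vector of ones: equation (21)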
Far from the critical point matters are considerably more complicated,
apparently driven by appropriate generalizations of a physical system’s ‘Onsager relations’, a matter treated in some detail below.
The essential insight is that regardless of the particular renormalization
symmetries involved, sudden critical point transition is possible in the opposite direction for this model, that is, from a number of independent, isolated
and fragmented systems operating individually and more or less at random,
into a single large, interlocked, coherent system, once the parameter K, the
inverse strength of weak ties, falls below threshold, or, conversely, once the
strength of weak ties parameter P = 1/K becomes large enough.
Thus, increasing weak ties between them can bind several different ‘language’ processes into a single, embedding hierarchical metalanguage which
contains those languages as linked subdialects.
This heuristic insight can be made exact using a rate distortion argument:
Suppose that two ergodic information sources Y and B begin to interact,
to ‘talk’ to each other, i.e. to influence each other in some way so that it
is possible, for example, to look at the output of B – strings b – and infer
something about the behavior of Y from it – strings y. We suppose it possible
to define a retranslation from the B-language into the Y-language through
a deterministic code book, and call Ŷ the translated information source, as
mirrored by B.
Take some distortion measure d comparing paths y to paths ŷ, defining d(y, ŷ). We invoke the Rate Distortion Theorem’s mutual information
I(Y, Ŷ ), which is a splitting criterion between high and low probability pairs
of paths. Impose, now, a parametization by an inverse coupling strength K,
and a renormalization symmetry representing the global structure of the system coupling. This may be much different from the renormalization behavior
of the individual components. If K < KC , where KC is a critical point (or
surface), the two information sources will be closely coupled enough to be
characterized as condensed.
We will make much of this below; cultural and genetic heritages are generalized languages, as are neural, immune, and sociocultural pattern recognition.
Pattern recognition as language
The task of this section is to express Atlan and Cohen’s (1998) cognitive pattern recognition-and-response in terms of an ergodic information source
constrained by the AEPT. This general approach would then apply to the
immune system, the CNS and sociocultural networks. Pattern recognition,
as we will characterize it here, proceeds by convoluting an incoming ‘sensory’
signal with an internal ‘ongoing activity’ and, at some point, triggering an
appropriate action based on a decision that the pattern of the sensory input
requires a response. For the purposes of this work we do not need to model
in any particular detail the manner in which the pattern recognition system
is ‘trained,’ and thus adopt a ‘weak’ model which may have considerable
generality, regardless of the system’s particular learning paradigm, which
can be more formally described using the RDT.
We will, fulfilling Atlan and Cohen’s (1998) criterion of meaning-from-response, define a language’s contextual meaning entirely in terms of system
output.
The model is as follows: A pattern of sensory input is convoluted with a
pattern of internal ‘ongoing activity’ to create a path
x = (a0 , a1 , ..., an , ...).
This is fed into a (highly nonlinear) ‘decision oscillator’ which generates
an output h(x) that is an element of one of two (presumably) disjoint sets
B0 and B1 .
We take
B0 = {b0, ..., bk},  B1 = {bk+1, ..., bm}.
Thus we permit a graded response, supposing that if
h(x) ∈ B0
the pattern is not recognized, and
h(x) ∈ B1
that the pattern is recognized and some action bj , k + 1 ≤ j ≤ m takes
place.
We are interested in paths which trigger pattern recognition exactly once.
That is, given a fixed initial state a0 such that h(a0 ) ∈ B0 , we examine all
possible subsequent paths x beginning with a0 and leading exactly once to the
event h(x) ∈ B1 . Thus h(a0 , a1 , ...aj ) ∈ B0 for all j < m but h(a0 , ..., am ) ∈
B1 .
For each positive integer n, let N (n) be the number of paths of length
n which begin with some particular a0 having h(a0 ) ∈ B0 , and lead to the
condition h(x) ∈ B1 . We shall call such paths ‘meaningful’ and assume N (n)
to be considerably less than the number of all possible paths of length n –
pattern recognition is comparatively rare – and in particular assume that the
finite limit
H = lim_{n→∞} log[N(n)]/n
exists and is independent of the path x. We will – not surprisingly – call
such a pattern recognition process ergodic.
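As a toy illustration – not a model of any real cognitive system – the Python sketch below defines a hypothetical decision rule on binary paths (recognition once the subpattern (1, 1) has occurred), counts the ‘meaningful’ paths for which recognition first happens at the final step, and estimates the dual source uncertainty as log N(n)/n:

import math
from itertools import product

def recognized(path):
    """h(x) is in B1 once the subpattern (1, 1) has appeared somewhere in the path."""
    return any(path[i] == 1 and path[i + 1] == 1 for i in range(len(path) - 1))

def meaningful(path):
    """Recognition occurs for the first time exactly at the final step."""
    return recognized(path) and not recognized(path[:-1])

a0 = 0                                   # fixed initial state with h(a0) in B0
for n in range(4, 16):
    N = sum(meaningful((a0,) + rest) for rest in product((0, 1), repeat=n - 1))
    print(n, N, round(math.log(N) / n, 4))
# log N(n)/n increases slowly toward log((1 + 5**0.5) / 2) ~ 0.481 nats, a crude
# estimate of the uncertainty H dual to this toy recognition process.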
We may thus define an ergodic information source X associated with stochastic variates Xj having joint and conditional probabilities P(a0, ..., an) and P(an|a0, ..., an−1) such that appropriate joint and conditional Shannon uncertainties satisfy the relations

H[X] = lim_{n→∞} log[N(n)]/n = lim_{n→∞} H(Xn|X0, ..., Xn−1) = lim_{n→∞} H(X0, ..., Xn)/(n + 1).
We say this ergodic information source is dual to the pattern recognition
process.
Different ‘languages’ will, of course, be defined by different divisions of
the total universe of possible responses into different pairs of sets B0 and B1 ,
or perhaps even by requiring more than one response in B1 along a path.
Like the use of different distortion measures in the RDT, however, it seems
obvious that the underlying dynamics will all be qualitatively similar.
Meaningful paths – creating an inherent grammar and syntax – are defined entirely in terms of system response, as Atlan (1983, 1987, 1998) and Atlan and Cohen (1998) propose, quoting Atlan (1987):
“...[T]he perception of a pattern does not result from a two-step process with first perception of a pattern of signals and then
processing by application of a rule of representation. Rather, a
given pattern in the environment is perceived at the time when
signals are received by a kind of resonance between a given structure of the environment – not necessarily obvious to the eyes of
an observer – and an internal structure of the cognitive system.
It is the latter which defines a possible functional meaning – for
the system itself – of the environmental structure.”
Elsewhere (R Wallace, 2000a, b) we have termed this process an ‘information resonance.’
Although we do not pursue the matter here, the ‘space’ of the aj can be
partitioned into disjoint equivalence classes according to whether states can
be connected by meaningful paths. This is analogous to a partition into domains of attraction for a nonlinear or chaotic system, and imposes a ‘natural’
algebraic structure which can, among other things, enable multitasking (R
Wallace, 2000b).
We can apply this formalism to the stochastic neuron: A series of inputs
yij , i = 1...m from m nearby neurons at time j is convoluted with ‘weights’
wij , i = 1...m, using an inner product
aj = yj · wj = ∑_{i=1}^{m} yij wij,
(22)
in the context of a ‘transfer function’ f(yj · wj) such that the probability of the neuron firing and having a discrete output zj = 1 is P(zj = 1) = f(yj · wj). Thus the probability that the neuron does not fire at time j is 1 − f(yj · wj).
In the terminology of this section the m values yij constitute ‘sensory
activity’ and the m weights wij the ‘ongoing activity’ at time j, with aj =
yj · wj and x = a0 , a1 , ...an , ...
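A minimal simulation of the stochastic neuron just described; the weights, inputs, and logistic transfer function are illustrative assumptions rather than a claim about any particular network:

import numpy as np

rng = np.random.default_rng(1)
m, steps = 5, 10
w = rng.normal(size=m)                     # hypothetical 'ongoing activity' (weights)

def f(a):                                  # illustrative logistic transfer function
    return 1.0 / (1.0 + np.exp(-a))

path_a, path_z = [], []
for j in range(steps):
    y = rng.normal(size=m)                 # 'sensory activity' at time j
    a = float(y @ w)                       # inner product a_j = y_j . w_j, equation (22)
    z = int(rng.random() < f(a))           # fires (z_j = 1) with probability f(a_j)
    path_a.append(round(a, 3))
    path_z.append(z)

print("path x = (a_0, ..., a_9):", path_a)
print("firing outputs z_j:      ", path_z)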
A little more work, described below, leads to a fairly standard neural
network model in which the network is trained by appropriately varying the
w through least squares or other error minimization feedback. This can be
shown to, essentially, replicate rate distortion arguments, as we can use the
error definition to define a distortion function d(y, ŷ) which measures the
difference between the training pattern y and the network output ŷ as a
function of, for example, the inverse number of training cycles, K. As we
will discuss in some detail, ‘learning plateau’ behavior follows as a phase
transition on the parameter K in the mutual information I(Y, Ŷ ).
Park et al. (2000) treat the stochastic neural network in terms of a space
of related probability density functions [p(x, y; w)|w ∈ Rm ], where x is the
input, y the output and w the parameter vector. The goal of learning is
to find an optimum w∗ which maximizes the log likelihood function. They
define a loss function of learning as
L(x, y; w) ≡ − log p(x, y; w),
and one can take as a learning paradigm the gradient relation
wt+1 = wt − ηt ∂L(x, y; w)/∂w,
where ηt is a learning rate.
Park et al. (2000) attack this optimization problem by recognizing that
the space of p(x, y; w) is Riemannian with a metric given by the Fisher
information matrix
G(w) = ∫∫ [∂ log p/∂w][∂ log p/∂w]^T p(x, y; w) dy dx,
where T is the transpose operation. A Fisher-efficient on-line estimator
is then obtained by using the ‘natural’ gradient algorithm
wt+1 = wt − ηt G^{−1} ∂L(x, y; w)/∂w.
Again, through the synergistic family of probability distributions p(x, y; w),
this can be viewed as a special case – a ‘representation’, to use physics jargon
– of the general ‘convolution argument’ given above.
Again, it seems that a rate distortion argument between training language
and network response language will nonetheless produce learning plateaus,
even in this rather elegant special case.
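A sketch, under simplifying assumptions, of the natural gradient update for a simple logistic model with an empirical Fisher matrix; the data, model and learning rate are invented for illustration and are not those of Park et al. (2000):

import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 3
X = rng.normal(size=(n, d))                               # hypothetical inputs x
w_true = np.array([1.5, -2.0, 0.5])                       # hypothetical generating parameters
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def grad_L(w):
    """Gradient of the mean loss L(x, y; w) = -log p(x, y; w) for the logistic model."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / n

def fisher(w):
    """Empirical Fisher information matrix G(w) for the logistic model."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (X * (p * (1.0 - p))[:, None]).T @ X / n

w, eta = np.zeros(d), 0.5
for t in range(50):
    G = fisher(w) + 1e-6 * np.eye(d)                      # small ridge keeps G invertible
    w = w - eta * np.linalg.solve(G, grad_L(w))           # natural gradient step

print("estimated w:", np.round(w, 2), "  generating w:", w_true)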
The foundation of our mathematical modeling exercise is to assume that
both the immune system and a sociocultural network’s pattern recognition
behavior, like that of other pattern recognition systems, can also be represented by the language arguments given above, and is thus dual to an ergodic
information source, a context-defining language in Atlan and Cohen’s (1998)
sense, having a grammar and syntax such that meaning is explicitly defined
in terms of system response.
Sociogeographic or sociocultural networks – social networks embedded in place and embodying culture – serve a number of functions, including acting
as the local tools for teaching cultural norms and processes to individuals.
Thus, for the purposes of this work, a person’s social network – family and
friends, workgroup, church, etc. – becomes the immediate agency of cultural
dynamics, and provides the foundation for the brain/culture ‘condensation’
of Nisbett et al. (2001).
Sociocultural networks serve also, however, as instruments for collective
decision-making, a cognitive phenomenon. Such networks serve as hosts to
a political, in the large sense, process by which a community recognizes and
responds to patterns of threat and opportunity. To treat pattern recognition
on sociocultural networks we impose a version of the structure and general
formalism relating pattern recognition to a dual information source:
We envision problem recognition by a local sociocultural network as follows: A ‘real problem,’ in some sense, becomes convoluted with a community’s internal sociocultural ‘ongoing activity’ to create the path of a
‘perceived problem’ at times 0, 1, ..., producing a path of the usual form
x = a0 , a1 , ..., an , .... That serially correlated path is then subject to a decision process across the sociocultural network, designated h(x) which produces
output in two sets B0 and B1 , as before. The problem is officially recognized
and resources committed to it if and only if h(x) ∈ B1, a rare event made
even more rare if resources must then be diverted from previously recognized
problems.
For the purposes of this work, then, we will view ‘culture’ as, in fact,
a sociocultural cognitive process which can entrain individual cognition, a
matter on which there is considerable research (e.g. Richerson and Boyd,
1995, 1998).
Our next task is to apply phase transition dynamics to ergodic information sources dual to a pattern recognition language, using techniques of the
sections above. Similar considerations will apply to ‘non-cognitive’ interaction between structured selection pressures and the affected system.
To reiterate, we have, following the discussion in part 1 of Atlan and Cohen’s (1998) work, implicitly assumed that immune cognition can likewise
be expressed as a pattern recognition-and-response language characterized
by an information source uncertainty. The essential point – see figure 1 of
Atlan and Cohen (1998) – is that antigens do not themselves directly trigger inflammatory response, but rather through the convolutional medium of
antigen-presenting cells which communicate an immune sequence to a T-cell.
The sequence/sentence describes the nature of the antigen. The T-cell does not recognize antigens in their native form directly; rather, a processed peptide in the MHC cleft is the subject of the immune sequence recognized by the
somatically generated TCR. How the T-cell responds to the peptide-MHC
subject defines the immune ‘meaning’ of the antigen.
This is quite precisely, then, an example of the pattern-recognition-as-language mechanism we have just analyzed.
Generalized cognitive condensations
We suppose a cognitive system – more generally a linked, hierarchically
structured, and broadly coevolutionary condensation of several such systems
– is exposed to a structured pattern of sensory activity – the training pattern – to which it must learn an appropriate matching response. From that
response we can infer, in a direct manner, something of the form of the excitatory sensory activity. We suppose the training pattern to have sufficient grammar and syntax so as to itself constitute an ergodic information source Y.
The output of the cognitive system, B, is deterministically backtranslated
into the ‘language’ of Y , and we call that translation Ŷ . The rate distortion behavior relating Y and Ŷ , is, according to the RDT, determined by
the mutual information I(Y, Ŷ ). We take the index of coupling between the
sensory input and the cognitive system to be the number of training cycles
– an exposure measure – having an inverse K, and write
I(Y, Ŷ ) = I[K]
(23)
I[K] defines the splitting criterion between high and low probability pairs
of training and response paths for a specified average distortion D, and is
analogous to the parametized information source uncertainty upon which we
imposed renormalization symmetry to obtain phase transition.
We thus interpret the sudden changes in the measured average distortion D ≡ ∑ p(y)d(y, ŷ), which determines the ‘mean square error’ between training pattern and output pattern, e.g. the ending of a learning plateau, as representing onset of a phase transition in I[K] at some critical KC, consonant with our earlier developments.
Note that I[K] constitutes an interaction between the cognitive system
and the impinging sensory activity, so that its properties may be quite different from those of the cognitive condensation itself.
From this viewpoint learning plateaus are an inherently ‘natural’ phase
transition behavior of pattern recognition systems. While one may perhaps,
in the sense of Park et al. (2000), find more efficient gradient learning algorithms, our development suggests learning plateaus will be both ubiquitous
and highly characteristic of a cognitive system. Indeed, it seems likely that
proper analysis of learning plateaus will give deep insight into the structures
underlying that system.
This is not a new thought: Mathematical learning models of varying
complexity have been under constant development since the late 1940’s (e.g.
Luce, 1997), and learning plateau behavior has always been a focus of such
studies.
The particular contribution of our perspective to this debate is that
the distinct coevolutionary condensation of immune, CNS, and local sociocultural network cognition which distinguishes human biology must respond
as a composite in a coherent, unitary and coupled manner to sensory input.
Thus the ‘learning curves’ of the immune system, the CNS and the embedding sociocultural network are inevitably coupled and must reflect each other.
Such reflection or interaction will, of necessity, be complicated.
Our analysis, however, has a particular implication. Learned cultural
behavior – sociocultural cognition – is, from our viewpoint, a nested hierarchy
of phase transition learning plateaus which carries within it the history of
an individual’s embedding socioculture. Through the cognitive condensation
which distinguishes human biology, that punctuated history becomes part
of individual cognitive and immune function. Simply removing ‘constraints’
which have deformed individual and collective past is unlikely to have the
desired impact: one never, really, forgets how to ride a bicycle, and a social
group, in the absence of affirmative redress, will not ‘forget’ the punctuated
adaptations ‘learned’ from experiences of slavery or holocaust. Indeed, at the
individual level, sufficiently traumatic events may become encoded within the
CNS and immune systems to express themselves as Post Traumatic Stress
Disorder.
Noncognitive condensation in response to selection pressure
As discussed above, sociocultural networks serve multiple functions and
are not only decision making cognitive structures, but are cultural repositories which embody the history of a community. Sociocultural networks,
like human biology in the large, and the immune system in the small, have a
duality in that they make decisions based on recognizing patterns of opportunity and threat by comparison with an internalized picture of the world, and
they respond to selection pressure in the sense that cultural patterns which
cannot adapt to external selection pressures simply do not survive. This is
not learning in the traditional sense of neural networks. Thus the immune
system has both ‘innate’ genetically programmed and ‘learned’ components,
and human biology in the large is a convolution of genetic and cultural systems of information transmission.
We suggest that sociocultural networks – the instrumentalities of culture – likewise contain both cognitive and selective systems of information
transmission which are closely intertwined to create a composite whole.
We now examine processes of ‘punctuated evolution’ inherent to evolutionary systems of information transmission.
We suppose a self-reproducing cultural system – more specifically a linked,
and in the large sense coevolutionary, condensation of several such systems
– is exposed to a structured pattern of selective environmental pressures
to which it must adapt if it is to survive. From that adaptive selection
– changes in genotype and phenotype analogs – we can infer, in a direct
manner, something, but not everything, of the form of the structured system
of selection pressures. That is, the culture contains markers of past ‘selection
events’.
We suppose the system of selection pressures to have sufficient internal
structure – grammar and syntax – so as to itself constitute an ergodic information source Y whose probabilities are fixed on the timescale of analysis.
The output of that system, B, is backtranslated into the ‘language’ of Y ,
and we call that translation Ŷ . The rate distortion behavior relating Y and
Ŷ , is, according to the RDT, determined by the mutual information I(Y, Ŷ ).
We take there to be a measure of the ‘strength’ of the selection pressure,
P , which we use as an index of coupling with the culture of interest, having
an inverse K = 1/P , and write
I(Y, Ŷ ) = I[K].
(24)
P might be measured by the rate of attack by predatory colonizers, or
the response to extreme environmental perturbation, and so on.
I[K] thus defines the splitting criterion between high and low probability
pairs of input and output paths for a specified average distortion D, and is
analogous to the parametized information source uncertainty upon which we
imposed renormalization symmetry to obtain phase transition. The result is
robust in the absence of a distortion measure through the joint asymptotic
equipartition theorem, as discussed above.
We thus interpret the sudden changes in the measured average distortion D ≡ ∑ p(y)d(y, ŷ), which determines the ‘mean error’ between pressure and
response, i.e. the ending of a ‘learning plateau’, as representing onset of
a phase transition in I[K] at some critical KC , consonant with our earlier
developments. In the absence of a distortion measure, we may still expect
phase transition in I[K], according to the joint AEPT.
Note that I[K] constitutes an interaction between the self-reproducing
system of interest and the impinging ecosystem’s selection pressure, so that
its properties may be quite different from those of the individual or conjoined
subcomponents (R Wallace and RG Wallace, 1998, 1999).
From this viewpoint highly punctuated ‘non-cognitive condensations’ are
an inherently ‘natural’ phase transition behavior of evolutionary systems,
even in the absence of a distortion measure. Again, while there may exist,
in the sense of Park et al. (2000), more efficient convergence algorithms, our
development suggests plateaus will be both ubiquitous and highly characteristic of evolutionary process and path. Indeed, it seems likely that proper
analysis of non-cognitive evolutionary ‘learning’ plateaus – to the extent they
can be observed or reconstructed – will give deep insight into the mechanisms
underlying that system.
Convolution between selection pressure and sociocultural cognition
Selection pressure acting on sociocultural networks can be expected to
affect their cognitive function, their ability to recognize and respond to relatively immediate patterns of threat and opportunity. In fact, those patterns
themselves may in no small part represent factors of that selection pressure,
conditionally dependent on it. We assume, then, the linkage of three information sources, two of which are conditionally dependent on and may indeed be
dominated by, a highly structured embedding system of externally imposed
selection pressure which we call Y3 . Y2 we will characterize as the pattern
recognition-and-response language of the sociocultural network itself.
In IR Cohen’s (2000) sense, this involves comparison of sensory information with an internalized picture of the world, and choice of a response from
a repertory of possibilities. Y1 we take to be a more rapidly changing, but
nonetheless structured, pattern of immediate threat-and-opportunity which
demands appropriate response and resource allocation – the ‘training pattern’. We reiterate that Y1 is likely to be conditionally dependent on the
embedding selection pressure, Y3 , as is the hierarchically layered history expressed by Y2 .
According to the triplet version of the SMT which we discussed at the
end of the theoretical section above, then, for large n, triplets of paths in
Y1 , Y2 and Y3 may be divided into two sets, a smaller ‘meaningful’ one of
high probability – representing those paths consistent with the ‘grammar’
and ‘syntax’ of the interaction between the selection pressure, the cognitive
sociocultural process, and the pattern of immediate ‘sensory challenge’ it
faces – and a very large set of vanishingly small probability. The splitting
criterion is the conditional mutual information:
I(Y1 ; Y2 |Y3 ).
We parametize this splitting criterion by a variate K representing the
inverse of the strength of the coupling between the system of selection pressure and the linked complex of the sociocultural cognitive process and the
structured system of day-to-day problems it must address. I[K] will, according to the ‘phase transition’ developments above, be highly punctuated
by ‘mixed’ plateau behavior representing the synergistic and inextricably intertwined action of both externally imposed selection pressure and internal
sociocultural cognition.
Modeling a condensed cognitive system far from phase transition
We suppose a linked and broadly coevolutionary condensation of several cognitive systems is exposed to a sudden perturbation, specifically, an
episode of pathogenic challenge, and wish to estimate the response of that
system, particularly within the context of an embedding ‘selection pressure’.
Exploring this question requires some development. We will be interested,
somewhat loosely, in the ‘thermodynamics’ and ‘generalized Onsager relations’ of the splitting criteria associated with the AEPT, JAEPT and RDT,
and the network information theory variant. These are, respectively, the
source uncertainty, mutual information, and conditional mutual information.
Although we will couch the development in terms of the source uncertainty,
the other splitting criteria are presumed to be treated similarly.
Suppose the source uncertainty of a coevolutionarily condensed behavioral language, H[X], is a function of system-wide average variables K, J, M, which represent ensemble indices associated with the entire individual-and-group.
Thus we may write
H = H[K, J, M, X].
(25)
We assume that as K, J, M increase, H decreases monotonically.
We reiterate the essential trick that, in its definition,

H[K, J, M, X] = lim_{n→∞} log[N(K, J, M, n)]/n
(26)
is the analog of the free energy density of statistical mechanics: Take
a physical system of volume V which can be characterized by an inverse
temperature parameter K = 1/T . The ‘partition function’ for the system is
(K. Wilson, 1971)
Z(K) = ∑_j exp[−KEj],
(27)
where Ej represents the energy of the individual state j. The probability
of the state j is then
Pj = exp[−KEj]/Z(K).

Then the ‘free energy density’ of the entire system is defined as

F(K) = lim_{V→∞} log[Z(K)]/V.
(28)
As described, the relation between information and thermodynamic free
energy is long studied.
Equation (26) expresses one of the central theorems of information theory
in a form similar to equation (28). It is this similarity which suggests that,
for some systems under proper circumstances, there may be a ‘duality’ which
maps Shannon’s ergodic source uncertainty onto thermodynamic free energy
density, and perhaps vice versa.
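A toy numerical rendering of equations (27) and (28) – hypothetical state energies and a nominal finite ‘volume’, so only an illustration of the formal analogy rather than a physical model:

import numpy as np

E = np.array([0.0, 1.0, 1.0, 2.0, 3.0])      # hypothetical state energies E_j
V = 5.0                                      # nominal system 'volume'

def free_energy_density(K):
    Z = np.sum(np.exp(-K * E))               # partition function, equation (27)
    P = np.exp(-K * E) / Z                   # state probabilities P_j
    return np.log(Z) / V, P

for K in (0.5, 1.0, 2.0):
    F, P = free_energy_density(K)
    print(f"K = {K}: log Z / V = {F:.3f}, P_j = {np.round(P, 3)}")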
The method we propose here, based entirely on equation (26), the Shannon-McMillan Theorem for ergodic information sources, and similar extensions such as the Joint Asymptotic Equipartition Theorem, the Rate Distortion Theorem, and network information theory, may prove to be more generally applicable to information systems, not requiring the explicit identification of a ‘duality’ in each and every case.
The formal analogy – the homological duality – between free energy density for a physical system and ergodic source uncertainty, (or the mutual
information splitting criteria associated with the JAEPT, RDT, and network information theory), is based on equations (26) and (28). This suggests
we impose a thermodynamics on source uncertainty or the appropriate mutual information measures which provide splitting criteria between low and
high probability sets of paths.
By a thermodynamics we mean, in the sense of Feigenbaum (1988, p.
530), the deduction of variables and their relations to one another that in
some well-defined sense are averages of exponential quantities defined microscopically on a set.
In our context the relation between the number of meaningful sequences of length n, N(n), and the source uncertainty, i.e.

N(n) ≈ exp(nH[X])

for (fixed) large n, provides the exponential dependence exactly analogous to that used in statistical mechanics. We are, to reiterate, going to express H[X] in terms of a number of parameters which characterize the underlying community which carries the behavioral language.
We suppose that H, or the various I, representing the splitting criteria defined by our coevolutionary condensation of CNS, immune and sociocultural cognition, is allowed to depend on a number of observable parameters, which we will not fully specify here.
If source uncertainty H is the analog to free energy density in a physical
system, K is the analog to inverse temperature, the next ‘natural’ step is
to apply a Legendre transformation to H so as to define a generalized ‘entropy,’ S, and other (very) rough analogs to classical thermodynamic entities,
depending on the parameters.
Courant and Hilbert (1989, p.32) characterize the Legendre transformation as defining a surface as the envelope of its tangent planes, rather than
as the set of points satisfying a particular equation.
Their development shows the Legendre transformation of a well-behaved function f(Z1, Z2, ..., Zw), denoted g, is

g = f − ∑_{i=1}^{w} Zi ∂f/∂Zi ≡ f − ∑_{i=1}^{w} Zi Vi,
(29)
with, clearly, Vi ≡ ∂f /∂Zi .
This expression is assumed to be invertible, hence the ‘duality’:

f = g − ∑_{i=1}^{w} Vi ∂g/∂Vi.
Transformation from the ‘Lagrangian’ to the ‘Hamiltonian’ formulation
of classical mechanics (Landau and Lifshitz, 1959) is via a Legendre transformation.
The generalization when f is not well-behaved is via a variational principle (Beck and Schlogl, 1993) rather than a tangent plane argument. Then
g(V) = min_Z [f(Z) − V Z],

f(Z) = min_V [g(V) − V Z].
(30)
In the first expression the variation is taken with respect to Z, in the
second with respect to V .
For a badly behaved function it is usually possible to fix up a reasonable
invertible structure since the singularities of f or g will usually belong to ‘a
set of measure zero,’ for example a finite number of points on a line or lines
on a surface where we may designate inverse values by fiat.
We first consider a very simple system in which the ergodic source uncertainty H depends only on the inverse strength of weak ties K, giving an
analog to the ‘canonical ensemble’ of statistical mechanics which depends
only on temperature. We define S, an entropy-analog which we term the
‘disorder,’ as the Legendre transform of the Shannon uncertainty H[K, X]:
S = H − KdH/dK ≡ H − KU
(31)
where we take U = dH/dK as an analog to the ‘internal energy’ of a
system. Note that S and H have the same physical dimensionality.
Since
dS/dK = dH/dK − U − KdU/dK = −KdU/dK
we have
dS/dU = −K
and
dU ∝ P dS.
This is the analog to the classic thermodynamic relation dQ = T dS for
physical systems. Thus what we have defined here as the disorder S is indeed
a generalized entropy.
Note that since dS/dU = −K we have
S = H − KU = H + U dS/dU
or
H = S − U dS/dU
which explicitly shows the dual relation between H and S.
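A finite-difference sketch of this construction for a hypothetical monotonically decreasing H(K); the functional form is assumed purely for illustration, and the code checks numerically that dS/dU = −K:

import numpy as np

K = np.linspace(0.2, 3.0, 400)
H = 2.0 * np.exp(-K)                 # hypothetical source uncertainty, decreasing in K

U = np.gradient(H, K)                # 'internal energy' analog, U = dH/dK
S = H - K * U                        # disorder: Legendre transform of H, equation (31)
Q = S - H                            # instability Q = -K dH/dK

# numerical check of dS/dU = -K, away from the grid endpoints
dS_dU = np.gradient(S, K) / np.gradient(U, K)
print(np.allclose(dS_dU[5:-5], -K[5:-5], rtol=1e-2))
print(round(float(K[100]), 2), round(float(H[100]), 3),
      round(float(S[100]), 3), round(float(Q[100]), 3))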
Again let N (n) represent the number of meaningful sequences of length
n emitted by the source X. Since
H[X] = lim_{n→∞} log[N(n)]/n
for large n, we have
U = dH/dK = lim_{n→∞} (1/(nN)) dN/dK.
(32)
For fixed (large) n, U is thus the proportionate rate of change in number
of meaningful sequences of length n with change in K. This is something like
the rate of change of mass per unit mass for a person losing weight: A small
value will not be much noticed, while a large one may represent a rigorous
starvation causing considerable distress.
Some rearrangement gives
Q ≡ (S − H) = −KdH/dK = U dS/dU
(33)
We define Q = S − H as the instability of the system.
If −dH/dK is approximately constant – something like a heat capacity
in a physical system – then we have the approximate linear relation
Q ≈ bK K
with
bK ≡ −∂H/∂K.
We generalize this as follows:
Allow H to depend on a number of parameters – for example, the average probability of weak ties, the inverse level of community resources, and/or other factors – which we call Zi, i = 1, ..., s. Then, taking H = H[Zi, X], we obtain the equations of state
S = H − ∑_{j=1}^{s} Zj ∂H/∂Zj = H − ∑_{j=1}^{s} Zj Vj,

Vi ≡ ∂H/∂Zi,
(34)
and the instability relation
Q = S − H = −∑_{j=1}^{s} Zj ∂H/∂Zj = ∑_{j=1}^{s} Vj ∂S/∂Vj

= −∑_{j=1}^{s} Vj Zj = −Z · ∇|Z H,
(35)
taking Z = (Z1 , Z2 , ..., Zs ).
Q represents the degree of disorder above and beyond that which is inherent to the ergodic information source itself.
Instability, as we have defined it, is driven by the declining capacity to
convey messages across the system, since information theory tells us that
information source uncertainty must be less than or equal to the capacity of
the transmitting channel, C.
We suppose that the capacity, C, of the underlying communication channel declines with increasing K, so that C = C(K) is monotonically decreasing
in K. An ergodic information source can be transmitted without error by a
channel only if H[K, X] < C(K) – again see ACTK – so that declining C
will inevitably result in rising Q.
Q is, according to this development, driven by underlying parameters
characterizing physiological and social systems – the Zj and Vj .
For a social system, equation (35) is interpreted as stating that the rate of indices of distress is proportional to a systemic experience of instability.
It may be possible to generalize the development to include temporal
effects if we suppose that H depends on dZ/dt ≡ Ż as well as on Z. Note
that terms of the form ∂H/∂t would violate ergodicity. Then we would take
Q = −(Z, Ż) · ∇|(Z,Ż) H ≡ ∑_j [−Zj ∂H/∂Zj − Żj ∂H/∂Żj].
(36)
This suggests that both parameter gradients and their rates of change
can be globally destabilizing.
In linear approximation, assuming −∂H/∂Zi = −Vi = bZi ≈ constant,
equation (36) can be rewritten as
Q ≈ bK K + bJ J + bM M.
The use of environmental index variates for critical system parameters
will generally result in a nonzero intercept, giving the final equation
Q ≈ bK K + bJ J + bM M + b0.
(37)
Note that the intercept b_0 may, in fact, be quite complex, perhaps incorporating other parameters not explicitly included in the model. It may also include an ‘error term’ representing stochastic fluctuations not entirely damped by large population effects, or even some ‘nonlinear’ structure when the b_{Z_i} are not quite constant.
Most importantly for our analysis here, if the ‘potentials’ V_i = ∂H/∂Z_i cannot be approximated as constants, then simple linear regression will fail entirely, and equations (34) and (37) will represent an appropriate generic model – possibly with ‘error terms’ – but the system will be both nonlinear and nonmonotonic, hence resembling signal transduction in physiological systems.
In sum, we claim that the instability relation derived from a fairly simple quasi-thermodynamic argument, applied to an ergodic information source parametrized by various significant indices (as well as, perhaps, their rates of change), explains the high degree to which simple regression models based on those parameters account for observed patterns of physiological, psychological, psychosocial, or immune response to the perturbation of an infection.
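In the linear regime, equation (37) is just an ordinary least-squares problem. The following sketch is purely illustrative: the coefficient values and the synthetic data are hypothetical, and K, J and M stand in for whatever observed index variates are actually available.

import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical index variates across a sample
K = rng.normal(size=n)    # e.g. inverse strength of weak ties
J = rng.normal(size=n)    # e.g. inverse community resources
M = rng.normal(size=n)    # e.g. an index of imposed external stress

# Synthetic 'instability' outcome obeying equation (37) plus noise
Q = 1.5*K + 0.8*J + 0.4*M + 2.0 + rng.normal(scale=0.5, size=n)

# Ordinary least-squares fit of Q ~ b_K K + b_J J + b_M M + b_0
X = np.column_stack([K, J, M, np.ones(n)])
(b_K, b_J, b_M, b_0), *_ = np.linalg.lstsq(X, Q, rcond=None)
print(b_K, b_J, b_M, b_0)   # approximately recovers 1.5, 0.8, 0.4, 2.0

If the ‘potentials’ V_i drift, as discussed above, the recovered coefficients will not be stable across subsamples, and the nonlinear forms must be confronted directly.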
Sudden pathogenic challenge: ecological resilience of cognitive
condensations
In reality, matters are significantly more complex than we have described
so far: physiological, psychological and (locally) sociocultural responses to
infection may, in turn, affect the infection itself through feedback loops.
Thus the inherently nonlinear model for response as produced by increasing stimulation, Q = −Σ_j V_j Z_j, is replaced by an even more nonlinear structure:

Q(t) = −Σ_j V_j(Q(t)) Z_j(Q(t)).   (38)
In a first iteration using linear approximation, we can replace this equation with a set of s linear regressions in which each of the s variates – ‘independent’ as well as ‘dependent’ – is expressed in terms of all the others:

x_i(t) = Σ_{j≠i} b_{i,j} x_j(t) + b_{i,0} + ε(t, x_1(t), ..., x_s(t)).   (39)

Here the x_j, j = 1, ..., s, are both ‘independent’ and ‘dependent’ variates involved in the feedback, b_{i,0} is the intercept constant, and the ε are ‘error’ terms which may not be small in this approximation.
In matrix notation this set of equations becomes

X(t) = BX(t) + U(t)   (40)

where X(t) is an s-dimensional vector, B is an s×s matrix of regression coefficients having a zero diagonal, and U is an s-dimensional vector containing the constant and ‘error’ terms, which are not necessarily small.
We suggest that, on the timescale of the applied perturbation of infection,
and of initial responses, the B-matrix remains relatively constant.
Following the analysis of Ives (1995) this structure has a number of interesting properties which permit estimation of the effects of an infective
perturbation on an individual embedded in larger social structures, particularly those defined by unequal power relationships.
We begin by rewriting the matrix equation as

[I − B]X(t) = U(t)   (41)

where I is the s×s identity matrix and, to reiterate, B has a zero diagonal.
We re-express matters in terms of the eigenstructure of B.
Let Q be the matrix of eigenvectors which diagonalizes B. Take QY(t) = X(t) and QW(t) = U(t). Let J be the diagonal matrix of eigenvalues of B, so that B = QJQ^{-1}. In R Wallace, Y Huang, P Gould and D Wallace (1997) we show the eigenvalues of B are all real. Then, for the eigenvectors Y_k of B, corresponding to the eigenvalues λ_k,

Y_k(t) = JY_k(t) + W_k(t).   (42)

Using a term-by-term shorthand for the components of Y_k, this becomes

y_k(t) = λ_k y_k(t) + w_k(t).   (43)
Define the mean E[f] of a time-dependent function f(t) over the time interval 0 → ∆T as

E[f] = (1/∆T) ∫_0^{∆T} f(t) dt.   (44)
We assume an appropriately ‘rational’ structure as ∆T → ∞, probably
some kind of ‘ergodic’ hypothesis.
Note that this form of expectation does not include the effects of differing
timescales or lag times. Under such circumstances, increasing ∆T will begin
to ‘pick up’ new effects, in a path-dependent manner: The mathematics of
equation (44) suddenly becomes extremely complicated.
The variance V[f] over the same interval is defined as E[(f − E[f])^2].
Again taking matters term-by-term, we take the variance of the y_k(t) from the development above to obtain

V[(1 − λ_k) y_k(t)] = V[w_k(t)],

so that

V[y_k] = V[w_k]/(1 − λ_k)^2,   or   σ(y_k) = σ(w_k)/|1 − λ_k|.   (45)
The y_k are the components of the eigentransformed pathology, income, crowding, community size, etc., variates x_i, and the w_k are the similarly transformed variates of the driving externalities u_i.
The eigenvector components y_i are characteristic but non-orthogonal combinations of the original variates x_i whose standard deviation is that of the (transformed) externalities w_i, but synergistically amplified by the term 1/|1 − λ_i|, a function of the eigenvalues of the matrix of regression coefficients B. If λ_i → 1 then any change in external driving factors – here an infection – will cause great instability within the affected individual.
A simple example suffices. Suppose we have the two empirical regression equations

x_1 = b_1 x_2 + b_{01}

and

x_2 = b_2 x_1 + b_{02}

where x_1 is, for example, an index of violent crime and x_2 is an index of the ‘strength of weak ties.’ These equations say that weak ties affect violence and violence affects weak ties. Then, after normalizing x_1 and x_2 to zero mean and unit variance, the B-matrix becomes

B = [ 0  R ]
    [ R  0 ]

where R = b_1 = b_2 is simply the correlation between x_1 and x_2.
This matrix has eigenvalues ±|R| and eigenvectors [±1/√2, 1/√2]. As the variates become more closely correlated, R → 1 and the ratio of the standard deviation of the eigenvector with positive components to that of the external perturbations, 1/[1 − R], diverges.
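The divergence is easy to exhibit numerically. The sketch below is purely illustrative (the coupling values R and the unit-variance noise are synthetic) and checks the amplification factor 1/|1 − λ| of equation (45) against a direct simulation of x = Bx + u:

import numpy as np

rng = np.random.default_rng(1)

for R in (0.0, 0.5, 0.9, 0.99):
    B = np.array([[0.0, R],
                  [R, 0.0]])             # zero-diagonal regression matrix
    lam, Qmat = np.linalg.eig(B)         # eigenvalues are +R and -R

    # Simulate x = Bx + u, i.e. x = (I - B)^{-1} u, for unit-variance noise u
    u = rng.normal(size=(2, 20000))
    x = np.linalg.solve(np.eye(2) - B, u)

    # Transform to eigencoordinates: y = Q^{-1} x, w = Q^{-1} u
    y = np.linalg.solve(Qmat, x)
    w = np.linalg.solve(Qmat, u)

    predicted = 1.0 / np.abs(1.0 - lam)          # 1/|1 - lambda_k|
    observed = y.std(axis=1) / w.std(axis=1)     # sigma(y_k)/sigma(w_k)
    print(R, predicted.round(2), observed.round(2))

As R → 1 the mode with eigenvalue +R is amplified by 1/(1 − R), and the simulated ratios should track the predicted factors closely, in line with equation (45).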
There is a kind of physical picture for this model. Imagine a violin strung
with limp, wet cotton twine. Then R ≈ 0 and no amount of bowing – an
external perturbation – will excite any sound from the instrument. Now
restring that violin with finely tuned catgut (to be somewhat old fashioned).
Then R ≈ 1 and external perturbation – bowing – will now excite loud and
brisant eigenresonances.
Ives (1995) defines an ecosystem for which λ ≈ 0, so that 1/|1 − λ| ≈ 1,
as resilient in the sense that applied perturbations will have no more effect
than their own standard deviation.
Such treatments are now routine in population and community ecology,
but are still rare in epidemiological or physiological studies. The essential
point is that a nonlinear deterministic ‘backbone’ serves to amplify external
perturbations – in our case an infection. As Higgins et al. (1997) put it,
“...[R]elatively small environmental perturbations can markedly
alter the dynamics of deterministic biological mechanisms, producing very large fluctuations...”
This is a version of Holling’s (1973, 1992) mechanism for the loss of ecological resilience by which the small can influence the large. In our case the
‘fluctuation’ is the sudden sickening of an individual by a pathogen, within
the context of social structure and process.
To be more explicit, we propose that pathogenic challenge or infection is a
perturbative ‘event’ affecting an individual embedded in a cognitive socioculture which may itself be subject to highly structured external ‘selection pressures’. The resilience of the individual-in-context will determine the response
to that perturbation – the eigenpattern of infection process and symptom.
The response of the (former) masters will generally be significantly different from that of the (former) slaves. Characteristic forms of physiological,
psychological, and psychosocial burden – structured ‘selection pressures’ –
experienced by the powerful will generally be far less intrusive than those
experienced by the powerless. We propose, then, that socially dominant individuals tend to be ecologically resilient to infection in Ives’ sense, so that
eigenpatterns of symptoms excited by infection, and related processes, will
be relatively muted.
The powerless will generally find both infection process – virulence – and
symptom response to be greatly amplified.
Chronic pathogenic challenge: stages of disease as ‘punctuated
equilibria’
Chronic infectious disease presents a conundrum for immunology. To paraphrase remarks by Grossman (2002), the pathogenesis of HIV disease is not well understood, partly because of the limited accessibility of the sites in the patient’s body where essential processes take place, and partly because of the limitations of the non-human primate model of the disease. More fundamentally, however, according to Grossman, the root problem is an incomplete understanding both of the rules of immunity and of the reaction of the immune system to chronic infection. Here we attempt to broadly address these matters.
The first paper in this series (R Wallace and RG Wallace, 2002) examined cultural variation in malaria pathology and in rates of heterosexual HIV transmission. HIV is too simple to be cognitive, and responds to immunogenic challenge as an evolution machine which hides deep in tissues. P. falciparum engages in analogous rapid clonal antigenic variation, and in cytoadherence and sequestration in the deep vasculature, primary mechanisms of escape from the antibody-mediated defenses of the host’s immune system (e.g. Allred, 1998).
Recently Adami et al. (2000) applied an information theory approach
to conclude that genomic complexity resulting from evolutionary adaptation
can be identified with the amount of information a gene sequence stores about
its environment. Elsewhere (R Wallace, 2002) we have used a Rate Distortion argument in the context of imposed renormalization symmetry to obtain
‘punctuated equilibrium’ in evolution as a consequence of their mechanism.
Here we apply the more general Joint Asymptotic Equipartition Theorem
to conclude that analogous pathogenic adaptive response to immune challenge will generally be characterized by relatively rapid changes which can
be interpreted as phase transitions, in a large sense, suggesting a ‘punctuated
stage’ model for many chronic infections. The result is formally analogous
to learning plateau behavior in neural networks (R Wallace, 2002).
The last section examined host response to sudden pathogenic challenge.
Suppose the pathogen is not extirpated by that response, but, changing its
coat or hiding within spatial refugia, becomes an established invading population, having a particular genotype and phenotype which may, in fact,
change in time. For this analysis, the immune system is cognitive; the pathogen is not.
We suppose that the sequence of ‘states’ of the host, a path x ≡ x0 , x1 , ...
and the sequence of ‘states’ of the pathogen population, a path y ≡ y0 , y1 , ...,
are very highly structured and serially correlated and can, in fact, be represented by ‘information sources’ X and Y. Since the host and pathogen
population interact, these sequences of states are not independent, but are
jointly serially correlated. We define a path of sequential pairs as z ≡
(x0 , y0 ), (x1 , y1 ), .... The essential content of the Joint Asymptotic Equipartition Theorem is that the set of joint paths z can be partitioned into a
relatively small set of relatively high probability which is termed jointly typical, and a much larger set of vanishingly small probability, in much the
same manner as equation (3). Further, according to the JAEPT, the splitting criterion between high and low probability sets of pairs is the mutual
information
I(X, Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X, Y),
which represents the degree of interaction between X and Y.
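For concreteness, this quantity can be computed directly from any empirical joint distribution over (suitably coarse-grained) host and pathogen states. The sketch below uses a purely hypothetical 2×2 joint distribution standing in for such data:

import numpy as np

def mutual_information(p_xy):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, from a joint distribution
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    def H(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()
    return H(p_x) + H(p_y) - H(p_xy.ravel())

# Hypothetical joint distribution: rows = host states, columns = pathogen states
p_joint = np.array([[0.30, 0.05],
                    [0.05, 0.60]])
print(mutual_information(p_joint))   # roughly 0.47 bits: strongly coupled states

Strong coupling between the two sets of states drives I well above zero; weak coupling drives it toward zero.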
What we propose is that I can be further parametrized by some external measurable index, P, of coupling between the host and the pathogen population, or perhaps by some set of them. P can, for example, represent the selection pressure exerted by the host immune system on the pathogen population or, vice versa, the physiological impact of the pathogen on the host. We let K = 1/P. The basic argument is that, following the results of the previous sections, the effects of changing K on the mutual information I can be very highly punctuated, most likely multiply so. Such ‘learning plateaus’, in this model, represent different stages of disease.
In essence, then, the pathogen population writes itself onto the host’s cognitive immune system, which, in turn, adaptively writes itself onto the pathogen population’s genotype and phenotype, in an endless but highly punctuated cycle, an example of what Levins and Lewontin (Lewontin, 2000) have called ‘interpenetration’. The ‘punctuated equilibria’ of such interpenetration constitute, in this model, the different stages of the chronic infection.
This section, and the one above, represent two limiting cases of interaction between host and pathogen population, i.e. ‘instant’ and ‘adiabatic’. Clearly more work is needed on intermediate time scales, which will typically be complicated.
A criticism of mathematical models
Mathematical models of complex biological phenomena, like the one we
have presented, do not have a good history. From ‘mathematical ecology’ to
‘mathematical epidemiology’, and more recently ‘theoretical immunology’,
Erlenmeyer flask models based on systems of 19th Century differential or
difference equations have been touted as ‘the Next Big Thing’ by many engaged in a kind of slash-and-burn academic plantation farming. While our
adaptation of the Large Deviations Program is perhaps of more compelling
mathematical interest than this work, the whole enterprise is fraught with a
certain intellectual peril. Pielou (1977) has attempted to correct some of the
excesses of ‘mathematical ecology’, and her comments are significant for all
such studies:
“...[Mathematical] models are easy to devise; even though the
assumptions of which they are constructed may be hard to justify, the magic phrase ‘let us assume that...’ overrides objections
temporarily. One is then confronted with a much harder task:
How is such a model to be tested? The correspondence between
a model’s predictions and observed events is sometimes gratifyingly close but this cannot be taken to imply the model’s simplifying assumptions are reasonable in the sense that neglected
complications are indeed negligible in their effects...
In my opinion the usefulness of models is great... [however] it
consists not in answering questions but in raising them. Models
can be used to inspire new field observations and these are the
only source of new knowledge as opposed to new speculation.”
Here, then, we are hoping to encourage a new class of speculation regarding the effect of embedding cultural structures on the manifestations of
acute and chronic infectious disease, particularly as they relate to the development of vaccine strategy. In spite of the considerable formal overhead, this
must remain a relatively modest goal: Mathematical modeling is, at best, a
distinctly subordinate voice in a dialog with data.
Discussion: history and the immunocultural condensation
Individual CNS and immune cognition are embedded in, and interact
with, sociocultural processes of cognition. Sufficiently draconian external
‘social selection pressures’ – a euphemism for the burdens of history, patterns of Apartheid, or the systematic deprivation of a neoliberal ‘market
economy’ – should become manifest at the behavioral and cellular levels of
an individual, through the intermediate mechanism of the embedding sociocultural network. This manifestation will often amplify the perturbative or
chronic impact of exposure to pathogens, parasites, or chemical stressors.
This is not entirely a new observation. Frantz Fanon (1966) has described
an essentially similar phenomenon:
“[Under an Apartheid system the] world [is] divided into compartments, a motionless, Manicheistic world... The native is being hemmed in; apartheid is simply one form of the division into
compartments of the colonial world... his dreams are of action
and aggression... The colonized man will first manifest this aggressiveness which has been deposited in his bones against his
own people. This is the period when the niggers beat each other
up, and the police and magistrates do not know which way to
turn when faced with the astonishing waves of crime...
It would therefore seem that the colonial context is sufficiently
original to give grounds for reinterpretation of the causes of criminality... The [colonized individual], exposed to temptations to
commit murder every day – famine, eviction from his room because he has not paid the rent, the mother’s dried up breasts,
children like skeletons, the building-yard which has closed down,
the unemployed that hang about the foreman like crows – the native comes to see his neighbor as a relentless enemy. If he strikes
his bare foot against a big stone in the middle of the path, it is
a native who has placed it there... The [colonized individual’s]
criminality, his impulsivity and the violence of his murders are
therefore not of characterial originality, but the direct product of
the colonial system.”
One of our contributions to this debate is to suggest that patterns of immune function should become entrained into this process as well through the
immunocultural condensation, so that colonized man should have a vulnerability to sudden or chronic immune stressors – in particular microbiological
or chemical challenges – which should extend beyond, but will likely be synergistic with, the effects of deprivation alone. Differences in immune function
or response to infection heretofore attributed to genetic differences between
populations will, in our view, closely reflect this mechanism. These matters
have evident importance for development of vaccines against HIV, malaria,
tuberculosis, other infectious diseases, and certain classes of malignancy, for
example breast cancer.
A particular implication of our work is that the functioning of local sociocultural networks can become closely convoluted with past or present external
selection pressures – the burden of history, effects of continuing Apartheid,
or the depredations of ‘the market’. The mathematical model suggests that
increases of selection pressure itself, or of the coupling between selection pressure and sociocultural cognition, should manifest themselves through punctuated changes in the ability of sociocultural networks to meet the challenges
of changing patterns of threat or opportunity, and, ultimately, in the ability
of the immunocultural condensation of individuals within those sociocultural
networks to respond to patterns of microbiological, parasite, or mutagenic
challenge.
In general, according to our model, the powerful will be more resilient
than the powerless in the ecological sense of not resonantly amplifying the
impacts of such challenge, although the underlying ‘eigenmodes’ of symptoms and physiological response may often be similar. Sufficiently different
sociocultural systems, however, may, as Nisbett et al. (2001) found for CNS
cognition, impose significantly different response eigenpatterns. At the other
extreme, patterns of staging of chronic disease are likely to be similarly modulated.
These are all questions directly subject to empirical test.
Inherent to our approach is recognition of the burdens of history. By
this we mean the way in which the grammar and syntax of a culturally determined individual ‘behavioral language,’ and its embodiment at cellular
levels through the interaction of the CNS and the immunocultural condensation, encapsulate earlier adaptations to external selection pressures, even
in the sudden absence of those pressures. Selection pressures may involve
deliberate policy, ranging, in the US, from an unrelieved history of slavery,
to more contemporary depredations like the ‘urban renewal’ of the 1950’s,
its evolutionary successor the ‘planned shrinkage’ of the 1970’s, or the subsequent explicit counterreformation against the successes of the Civil Rights
Movement.
Our mathematical approach may seem excessive to some. We reply, as
did the master mathematical ecologist EC Pielou above, that the principal
value of mathematical models is in raising research questions for subsequent
empirical test, rather than in answering them directly. Empirical study, to reiterate Pielou’s words, is “the only source of new knowledge as opposed to new speculation.”
We have, then, created a mathematical model in the tradition of the Large
Deviations Program of applied probability which makes explicit the entrainment of environment and development into the expression of pathogenic or
chemical challenge. We have used that model to raise the speculation that naive and simplistic geneticism which, in effect, reifies ‘race’, and its parallel, a naive cross-sectional environmentalism which neglects the path-dependent burdens of history, significantly hinder current research.
Matters of historical, political, economic, and social justice are largely
and increasingly excised from academic discussions of health and illness in
the US, apparently for fear of angering funding agencies or those with power
over academic advancement. Our modeling exercise suggests, among other
things, the extraordinary degree to which that excision may limit the value
of such work in the design of corrective policy, particularly the creation of
effective vaccine strategies.
References
Adami C, C Ofria and T Collier, 2000, “Evolution of biological complexity,” Proceedings of the National Academy of Sciences, 97, 4463-4468.
Allred D, 1998, “Antigenic variation in Babesia bovis: how similar is it to that in Plasmodium falciparum?”, Annals of Tropical Medicine and Parasitology, 92, 461-472.
Ash R, 1990, Information Theory, Dover Publications, New York.
Atlan H, 1983, “Information theory,” pp. 9-41 in R Trappl (ed.), Cybernetics: Theory and Application, Springer-Verlag, New York.
Atlan H, 1987, “Self creation of meaning,” Physica Scripta, 36, 563-576.
Atlan H, 1998, “Intentional self-organization, emergence and reduction: towards a physical theory of intentionality,” Thesis Eleven, 52, 6-34.
Atlan H and IR Cohen, 1998, “Immune information, self-organization and
meaning,” International Immunology, 10, 711-717.
Bennett C, 1988, “Logical depth and physical complexity,” pp. 227-257 in R Herken (ed.), The Universal Turing Machine: A Half-Century Survey, Oxford University Press.
Cohen IR, 2001, “Immunity, set points, reactive systems, and allograft
rejection.” To appear.
Cohen IR, 1992, “The cognitive principle challenges clonal selection,”
Immunology Today, 13, 441-444.
Cohen IR, 2000, Tending Adam’s Garden: evolving the cognitive immune
self, Academic Press, New York.
Courant R and D Hilbert, 1989, Methods of Mathematical Physics, Volume II, John Wiley and Sons, New York.
Cover T and J Thomas, 1991, Elements of Information Theory, Wiley,
New York.
Dembo A and O Zeitouni, 1998, Large Deviations: Techniques and Applications, 2nd Ed., Springer-Verlag, New York.
Fanon F, 1966, The Wretched of the Earth, Grove Press, New York.
Feigenbaum M, 1988, “Presentation functions, fixed points and a theory
of scaling function dynamics”, Journal of Statistical Physics, 52, 527-569.
Granovetter M, 1973, “The strength of weak ties,” American Journal of
Sociology, 78, 1360-1380.
Grossman Z, 2002, personal communication.
Holling C, 1973, “Resilience and stability of ecological systems”, Annual Review of Ecology and Systematics, 4, 1-25.
Holling C, 1992, “Cross-scale morphology, geometry, and dynamics of
ecosystems”, Ecological Monographs, 62, 447-502.
Ives A, 1995, “Measuring resilience in stochastic systems”, Ecological
Monographs, 65, 217-233.
Khinchine A, 1957, The Mathematical Foundations of Information Theory, Dover Publications, New York.
Landau L and E Lifshitz, 1959, Classical Mechanics, Addison-Wesley,
Reading MA.
Lewontin R, 2000, The Triple Helix: Gene, Organism, and Environment,
Harvard University Press, Cambridge, MA.
Luce R, 1997, “Several unresolved conceptual problems of mathematical
psychology,” Journal of Mathematical Psychology, 41, 79-87.
McCauley L, 1993, Chaos, Dynamics and Fractals: an algorithmic approach to deterministic chaos, Cambridge University Press, Cambridge, UK.
Nisbett R, K Peng, I Choi and A Norenzayan, 2001, “Culture and systems of thought: holistic vs analytic cognition”, Psychological Review, 108, 291-310.
Park H, S Amari and K Fukumizu, 2000, “Adaptive natural gradient learning algorithms for various stochastic models,” Neural Networks, 13, 755-764.
Pielou E, 1977, Mathematical Ecology, John Wiley and Sons, New York.
Richerson P and R Boyd, 1995, “The evolution of human hypersociality.”
Paper for Ringberg Castle Symposium on Ideology, Warfare and Indoctrinability (January, 1995), and for HBES meeting, 1995.
Richerson P and R Boyd, 1998, “Complex societies: the evolutionary
origins of a crude superorganism,” to appear.
Rojdestvenski I and M Cottam, 2000, “Mapping of statistical physics
to information theory with applications to biological systems,” Journal of
Theoretical Biology, 202, 43-54.
Wallace R, Y Huang, P Gould and D Wallace, 1997, “The hierarchical diffusion of AIDS and violent crime among US metropolitan regions...”, Social Science and Medicine, 44, 935-947.
Wallace R and RG Wallace, 1998, “Information theory, scaling laws and the thermodynamics of evolution,” Journal of Theoretical Biology, 192, 545-559.
Wallace R and RG Wallace, 1999, “Organisms, organizations and interactions: an information theory approach to biocultural evolution,” BioSystems,
51, 101-119.
Wallace R and RG Wallace, 2002, “Immune cognition and vaccine strategy: beyond genomics,” in press, Microbes and Infection.
Wallace R, 2000a, “Language and coherent neural amplification in hierarchical systems: Renormalization and the dual information source of a generalized spatiotemporal stochastic resonance,” International Journal of Bifurcation and Chaos, 10, 493-502.
Wallace R, 2000b, “Information resonance and pattern recognition in classical and quantum systems: toward a ‘language model’ of hierarchical neural
structure and process,” available at
www.ma.utexas.edu/mp arc-bin/mpa?yn=00-190.
Wallace R, 2002, “Adaptation, punctuation, and information: A rate-distortion approach to non-cognitive ‘learning plateaus’ in evolutionary process.” Submitted. Available at
www.ma.utexas.edu/mp arc-bin/mpa?yn=01-163.
Wilson K, 1971, “Renormalization group and critical phenomena. I. Renormalization group and the Kadanoff scaling picture”, Physical Review B, 4, 3174-3183.