Investigating the Parameter Space of A T

Ian Wood
4/25/13
I690, Prof. Flammini
T-Cell Cross Regulation
$$\frac{dE}{dt} = p_E E_A - d_E E$$
$$\frac{dR}{dt} = p_R R_A - d_R R$$
Image from: J. Carneiro et al., “When three is not a crowd: a Crossregulation model of the dynamics and repertoire selection of regulatory CD4+ T cells,” Immunological Reviews, vol. 216, pp. 48–68, 2007.
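As a rough illustration of the two rate equations on this slide, the sketch below integrates them with a forward Euler step. It assumes that E_A and R_A (read here as the activated effector and regulator populations, following the cited model) are held constant, which is a simplification, and every numeric value is hypothetical rather than taken from the slides.

```python
def euler_step(E, R, E_A, R_A, p_E, p_R, d_E, d_R, dt):
    """Advance E and R by one forward Euler step of size dt."""
    dE = p_E * E_A - d_E * E  # proliferation of activated effectors minus death
    dR = p_R * R_A - d_R * R  # proliferation of activated regulators minus death
    return E + dt * dE, R + dt * dR


def simulate(E0, R0, E_A, R_A, p_E, p_R, d_E, d_R, dt=0.01, steps=1000):
    """Integrate both equations for `steps` steps, holding E_A and R_A fixed (assumption)."""
    E, R = E0, R0
    for _ in range(steps):
        E, R = euler_step(E, R, E_A, R_A, p_E, p_R, d_E, d_R, dt)
    return E, R


# Hypothetical values, chosen only to exercise the code.
E_end, R_end = simulate(E0=1.0, R0=1.0, E_A=2.0, R_A=1.0,
                        p_E=1.0, p_R=0.5, d_E=0.5, d_R=0.25,
                        dt=0.01, steps=10000)
print(E_end, R_end)
```

With E_A and R_A fixed, each population relaxes toward p E_A / d, so the run above should settle near 4 and 2. In the full model the activated populations change over time, so this only shows the shape of the update.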
T-Cell Cross Regulation for Machine Classification
$$R(d) = \sum_{f \in A_d} \frac{R_f}{\sqrt{R_f^2 + E_f^2}}$$
$$E(d) = \sum_{f \in A_d} \frac{E_f}{\sqrt{R_f^2 + E_f^2}}$$
Image from: A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.
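The document scores R(d) and E(d) on this slide can be sketched in a few lines. This assumes each feature's contribution is normalised by sqrt(R_f^2 + E_f^2), which is one reading of the slide's formulas, and that A_d is the set of features in document d. The function name, the dictionary layout and the sample populations are all hypothetical.

```python
import math


def document_scores(features, populations):
    """Return (E(d), R(d)) for one document.

    features    -- the features f in A_d
    populations -- maps each feature to a tuple (E_f, R_f)
    """
    e_score = 0.0
    r_score = 0.0
    for f in features:
        E_f, R_f = populations[f]
        norm = math.hypot(E_f, R_f)  # sqrt(E_f**2 + R_f**2)
        if norm == 0:
            continue  # a feature with no cells contributes nothing (assumption)
        e_score += E_f / norm
        r_score += R_f / norm
    return e_score, r_score


# Hypothetical populations for two features.
pops = {"a": (3.0, 4.0), "b": (0.0, 2.0)}
print(document_scores(["a", "b"], pops))
```

Each term is the cosine or sine of the angle of the point (R_f, E_f), so a feature contributes at most 1 to either score regardless of how large its populations are.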
Machine Classification Issues
• Benefits:
  • Temporal dynamics could allow the system to adapt to changes over time (concept drift)
  • Possibly useful for classifying unbalanced sets
• Problems:
  • Agent-based models take a long time to run
  • The large parameter space is difficult to explore
A Large Parameter Space
• Nslot: the number of antigens to produce for each feature
• DE: death rate for unbound effectors
• DR: death rate for unbound regulators
• E0-: initial effector population for Nonself documents
• E0+: initial effector population for Self documents
• E0u: initial effector population for Unlabeled documents
• R0-: initial regulator population for Nonself documents
• R0+: initial regulator population for Self documents
• R0u: initial regulator population for Unlabeled documents
• This doesn’t include variations in the algorithm!
Finished Work
Top Parameter Configurations So Far
nslot  eself  rself  enself  rnself  eunlab  runlab  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc   f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6       3       1       1       2     1      0.74       0.8       0.93    0.62  0.82  28    20    10    2
13     3      9      8       2       5       3       1       1       2     1      0.95       0.78      0.6     0.61  0.73  18    29    1     12
12     3      9      8       2       5       3       2       2       2     1      0.84       0.78      0.7     0.57  0.76  21    26    4     9
20     8      12     12      8       8       8       25      25      2     2      1          0.57      0.13    0.27  0.24  4     30    0     26
20     12     24     12      10      12      10      2       2       1     2      0.58       0.63      1       0.39  0.73  30    8     22    0
20     8      12     12      8       8       8       25      25      5     2      0.58       0.63      1       0.39  0.73  30    8     22    0
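The score columns in this table follow from the four confusion counts. As a check, the sketch below uses the standard definitions of precision, accuracy, recall, MCC and F1 and applies them to the first configuration (tpos 28, tneg 20, fpos 10, fneg 2). Returning None for an undefined ratio is a choice made here. The slides do not say how undefined values were reported.

```python
import math


def safe_div(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def classification_metrics(tpos, tneg, fpos, fneg):
    """Standard binary classification metrics from confusion counts."""
    total = tpos + tneg + fpos + fneg
    mcc_den = math.sqrt(
        (tpos + fpos) * (tpos + fneg) * (tneg + fpos) * (tneg + fneg)
    )
    return {
        "precision": safe_div(tpos, tpos + fpos),
        "accuracy": safe_div(tpos + tneg, total),
        "recall": safe_div(tpos, tpos + fneg),
        "mcc": safe_div(tpos * tneg - fpos * fneg, mcc_den),
        "f1": safe_div(2 * tpos, 2 * tpos + fpos + fneg),
    }


m = classification_metrics(tpos=28, tneg=20, fpos=10, fneg=2)
print({name: round(value, 2) for name, value in m.items()})
# {'precision': 0.74, 'accuracy': 0.8, 'recall': 0.93, 'mcc': 0.62, 'f1': 0.82}
```

The rounded values match the first row of the table, which suggests the reported scores use these definitions.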
Features Over Time
nslot  eself  rself  enself  rnself  eunlab  runlab  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc   f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6       3       1       1       2     1      0.74       0.8       0.93    0.62  0.82  28    20    10    2
Approach
• See how distributions of cosine scores correspond to parameters
• The system should be able to correct itself, so I want to see how the parameters affect its sensitivity to changes in co-occurrence frequency
• Investigate artificial datasets for simple cases
• Investigate mathematical relationships in simple cases
Distribution of T-Cells
nslot  eself  rself  enself  rnself  eunlab  runlab  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc   f1    tpos  tneg  fpos  fneg
12     3      9      8       2       6       3       1       1       2     1      0.74       0.8       0.93    0.62  0.82  28    20    10    2
Distribution of T-Cells, cont.
nslot  eself  rself  enself  rnself  eunlab  runlab  edrate  rdrate  cond  condi  precision  accuracy  recall  mcc    f1  tpos  tneg  fpos  fneg
14     3      8      3       7       3       7       1       2       5     1      0          0.38      0       -0.36  -1  0     23    7     30
20     4      6      6       4       4       4       1       1       6     1      0          0.5       0       0      0   0     30    0     30
Artificial Datasets
• 10 documents of 100 words each
• Words are randomly generated and unique to each document
• One word, “lambda”, is present in every document, but is initially biased incorrectly
• Set 1 – First document is labeled Self, the rest Nonself
• Set 2 – First document is labeled Nonself, the rest Self
• Set 3 – First 5 = Self, Last 5 = Nonself
• Set 4 – First 5 = Nonself, Last 5 = Self
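A minimal sketch of how such datasets could be generated is shown below. It assumes that “lambda” counts as one of the 100 words in each document and that the random words are 8 lowercase letters long. Neither detail is stated on the slide, and all function names are hypothetical.

```python
import random
import string


def random_word(rng, length=8):
    """Return a random lowercase word (the length is an arbitrary choice)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_documents(rng, n_docs=10, n_words=100, shared="lambda"):
    """Build documents whose words are unique to them, apart from the shared word."""
    used = {shared}
    docs = []
    for _ in range(n_docs):
        words = [shared]
        while len(words) < n_words:
            word = random_word(rng)
            if word not in used:  # keep every random word unique across documents
                used.add(word)
                words.append(word)
        rng.shuffle(words)
        docs.append(words)
    return docs


# Label sequences for the four sets described above.
LABELINGS = {
    1: ["Self"] + ["Nonself"] * 9,
    2: ["Nonself"] + ["Self"] * 9,
    3: ["Self"] * 5 + ["Nonself"] * 5,
    4: ["Nonself"] * 5 + ["Self"] * 5,
}


def make_set(set_id, seed=0):
    """Return a list of (label, words) pairs for the given set."""
    rng = random.Random(seed)
    return list(zip(LABELINGS[set_id], make_documents(rng)))


for label, words in make_set(1)[:2]:
    print(label, len(words), "lambda" in words)
```

Seeding the generator keeps runs repeatable, which helps when comparing parameter configurations on the same data.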
Parameter Configurations
Parameter  Values, Step
Nslot      [10, 13], 1
DE         0.1
DR         0.1
E0-        =E0+
E0+        [5, 14], 1
E0u        =E0+
R0-        [1, 6], 1
R0+        [6, 16], 1
R0u        =R0-
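To give a sense of the size of this search, the sketch below enumerates the grid implied by the table. It assumes the bracketed ranges include both endpoints. The dictionary keys mirror the parameter names in the table, and the function names are hypothetical.

```python
from itertools import product


def inclusive(lo, hi, step=1):
    """Integer range that includes both endpoints."""
    return range(lo, hi + 1, step)


def parameter_grid():
    """Yield one dict per configuration in the table above."""
    for nslot, e_self, r_nonself, r_self in product(
        inclusive(10, 13),  # Nslot
        inclusive(5, 14),   # E0+
        inclusive(1, 6),    # R0-
        inclusive(6, 16),   # R0+
    ):
        yield {
            "Nslot": nslot,
            "DE": 0.1,
            "DR": 0.1,
            "E0+": e_self,
            "E0-": e_self,     # tied to E0+
            "E0u": e_self,     # tied to E0+
            "R0+": r_self,
            "R0-": r_nonself,
            "R0u": r_nonself,  # tied to R0-
        }


configs = list(parameter_grid())
print(len(configs))  # 4 * 10 * 6 * 11 = 2640
```

Tying E0- and E0u to E0+, and R0u to R0-, leaves only four free parameters, so the grid stays at 2640 runs per dataset.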
Set 1
[Figure panels: Appropriate Behavior, Inappropriate Behavior]
Set 2
[Figure panels: Appropriate Behavior, Inappropriate Behavior]
Appropriate Behavior in Sets 1 & 2
Appropriate Configurations
Nslot  E0+  R0+  E0-  R0-  E0u  R0u  DE   DR
10     5    12   5    3    5    3    0.1  0.1
10     6    12   6    2    6    2    0.1  0.1
10     7    13   7    1    7    1    0.1  0.1
10     8    10   8    5    8    5    0.1  0.1
11     11   11   11   2    11   2    0.1  0.1
12     6    12   6    3    6    3    0.1  0.1
12     7    10   7    4    7    4    0.1  0.1
12     9    11   9    3    9    3    0.1  0.1
13     5    14   5    4    5    4    0.1  0.1
13     6    15   6    3    6    3    0.1  0.1
Future Directions
• Mathematical Analysis
  • I tried to write equations for the expected change in the lambda population between the first and second documents, but I either assumed too much or made errors.
• Larger Search
  • The simple artificial dataset runs much faster than an actual corpus
• Run on Sets 3 and 4
• More variation in the artificial data (lambda should not be the only common feature)
• More precision in the distribution data (the current analysis only looks at the mean and over-emphasizes features that appear only once)
References
• J. Carneiro et al., “When three is not a crowd: a Crossregulation model of the dynamics and repertoire selection of regulatory CD4+ T cells,” Immunological Reviews, vol. 216, pp. 48–68, 2007.
• A. Abi-Haidar and L. M. Rocha, “Collective Classification of Textual Documents by Guided Self-Organization in T-Cell Cross-Regulation Dynamics,” Evolutionary Intelligence, in press, 2011.