Computer simulation as a tool in teaching introductory plant breeding.

S. K. St. Martin and R. V. Skavaril
1 Contribution from the Dep. of Agronomy and Genetics, Ohio State Univ., Columbus, OH 43210.
2 Assistant professor, Dep. of Agronomy, and professor, Dep. of Genetics, Ohio State Univ., Columbus, OH 43210.
JOURNAL OF AGRONOMIC EDUCATION
ABSTRACT
Three computer simulation programs have been developed for use as laboratory exercises in an
introductory plant breeding course. The programs simulate development of pure lines by single
seed descent (SSD), early generation testing (EGT), and population improvement by recurrent
selection (RS). These simulations were developed because laboratory experience with actual
plant selection is difficult to provide in an undergraduate course. We believe these
simulations provide a satisfactory substitute because they permit students to gain
familiarity with the steps involved in breeding methods, and they provide examples of
important concepts, such as genetic drift, selection among vs. within heterogeneous lines,
and heritability. In addition, they stimulate student interest.
Additional index words: Concepts in plant breeding, Genetics, Laboratory vs. simulation.
Learning by doing is effective in many subjects, including plant breeding. Several laboratory
exercises have been devised to enable students to conduct experiments in Mendelian genetics
and heritability and to learn breeding methodology and hybridization techniques (Knauft,
1981). The primary activity of most plant breeders, however, is evaluation and selection,
and it is difficult to translate this activity into a laboratory experiment with actual
plant material. Computer simulation, which has been used as a research tool in plant
breeding (e.g., by Bailey and Comstock, 1976; Reddy and Comstock, 1976; and Muehlbauer et
al., 1981), can serve as a convenient means for giving students experience with testing and
selection. The purpose of this paper is to describe three computer simulation programs we
have used in an introductory plant breeding course at The Ohio State University and to
report on the usefulness of the programs.
DESCRIPTION OF THE PROGRAMS
The programs have been written in PL/I to run on the Amdahl 470 V/8 of the Instruction and
Research Computer Center of The Ohio State University. The programs have been designed to
run interactively from remote terminals under the time sharing option (TSO) of an
OS/MVS-SP1 operating system. Object modules of the programs are maintained on-line. Users
run a program by executing a corresponding on-line command list dataset. We have chosen to
use command lists to invoke execution of the object modules because this allows students to
run a program by entering a relatively short command, such as ex 'ts1359.ssd.clist', the
more complex required commands having been built into the corresponding command list
dataset.
The programs contain error-checking routines which inspect data values supplied to the
program by the user and direct the user to re-enter any data values found to be
inappropriate. These error-checking routines also allow the instructor to limit, where
desirable, such items as the number of runs to be performed by the program.
Free computer time was supplied by the Instruction and Research Computer Center of The Ohio
State University for the development, testing, and student use of the programs.
Complimentary listings of the programs are available upon request, and arrangements can be
made to supply, for the costs involved, a tape containing the source statements of the
programs.
Three programs, designated SSD, EGT, and RS, are used to simulate breeding methods employed
in the improvement of self- and cross-fertilizing species.
1. Program SSD--Single Seed Descent
Program SSD simulates the development and evaluation of lines by the method of single seed
descent (Goulden, 1939). The simulated quantitative trait under selection is controlled by
genes at 32 loci, two alleles per locus. The genotypic value of an individual plant is
determined on a locus-by-locus basis according to values of parameters supplied to the
program by the user (see below). The 32 loci are arranged into eight linkage groups of four
loci each, with 0.3 as the recombination frequency between adjacent loci in each of the
linkage groups. The simulated plants are diploid.
The simulation begins by the program calling for the number of runs to be made. Then, for
each run, the program calls for the values of four input parameters, which we have
designated as parameters 1, 2, 3, and 4. The value of parameter 1 becomes the standard
deviation of the environmental component of the phenotypic value of an individual.
Parameters 2, 3, and 4 determine the genotypic value of the homozygous favorable,
heterozygous, and homozygous unfavorable genotypes, respectively, at each locus. These
values apply to all loci, and the total genotypic value of an individual is determined by
summing the contributions of the individual loci. The
simulation program derives the phenotypic value by
adding to the genotypic value an environmental component taken at random from a normal distribution
having mean zero and standard deviation equal to
parameter 1.
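The value model just described can be sketched in a few lines. The following is a minimal illustration in Python, not the PL/I source of program SSD; the function names, the 0/1 coding of alleles, the example line, and the random seed are assumptions made for the sketch.

```python
import random

N_LOCI = 32  # number of loci in the SSD genetic model


def genotypic_value(strand_a, strand_b, hom_fav, het, hom_unfav):
    """Sum the per-locus contributions for a diploid plant.

    Each strand is a list of 0/1 alleles; 1 marks the favorable allele
    (this coding is an assumption of the sketch).
    """
    per_locus = {2: hom_fav, 1: het, 0: hom_unfav}
    return sum(per_locus[a + b] for a, b in zip(strand_a, strand_b))


def phenotypic_value(genotypic, env_sd, rng):
    """Add an environmental deviation drawn from N(0, env_sd)."""
    return genotypic + rng.gauss(0.0, env_sd)


rng = random.Random(1)  # arbitrary seed, used only for repeatability
# Hypothetical homozygous line carrying the favorable allele at 23 loci.
line = [1] * 23 + [0] * (N_LOCI - 23)
g = genotypic_value(line, line, hom_fav=2.0, het=1.0, hom_unfav=0.0)
p = phenotypic_value(g, env_sd=3.0, rng=rng)
print(f"genotypic = {g:.2f}, phenotypic = {p:.2f}")
```

With parameters 2, 3, and 4 set to 2.0, 1.0, and 0.0, a line homozygous favorable at 23 loci has a genotypic value of 46, and one homozygous favorable at 10 loci has a value of 20.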
The user of the program must select parents for crossing from a group of seven homozygous lines whose
genetic arrays have been defined in the program. The
seven parental lines have been designed to represent a
range in genotypic values from a line homozygous for
the favorable allele at 23 of the 32 loci to a line homozygous favorable at only 10 loci. The degree of relationship between lines varies widely so that, for example,
the two lines having the greatest genotypic values are
closely related (they have identical alleles at 26 of the 32
loci), while other pairs of parents are less closely related.
The parent with the lowest genotypic value is distantly
related to the other six as an attempt to simulate a wild
relative of the cultivated species. This wild relative line
carries favorable alleles that are rare or nonexistent in
the other lines. Users of the simulation program are
given the genotypic values and some information about
the interrelationships of the seven parental lines. Thus,
students are aware that merely crossing the two lines
having the greatest genotypic value may not produce the
best progeny, since such a cross might result in relatively
little transgressive segregation.
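The point about related parents can be made concrete with a short calculation. This is a sketch in Python under stated assumptions: the three parental arrays below are hypothetical (the actual arrays are defined inside program SSD and are not reproduced here), alleles are coded 0/1 with 1 favorable, and the homozygous values are 2.0 and 0.0.

```python
def identical_loci(line_a, line_b):
    """Count loci at which two homozygous lines carry the same allele."""
    return sum(a == b for a, b in zip(line_a, line_b))


def best_possible_inbred(line_a, line_b, hom_fav=2.0, hom_unfav=0.0):
    """Genotypic value of the best homozygous recombinant from A x B.

    A locus can be fixed for the favorable allele (1) only if at least
    one parent carries it.
    """
    return sum(hom_fav if (a or b) else hom_unfav
               for a, b in zip(line_a, line_b))


# Hypothetical 32-locus parents, invented for this illustration.
elite_1 = [1] * 23 + [0] * 9             # favorable at loci 0-22, value 46
elite_2 = [0] * 3 + [1] * 23 + [0] * 6   # favorable at loci 3-25, value 46
wild = [0] * 22 + [1] * 10               # favorable at loci 22-31, value 20

print(identical_loci(elite_1, elite_2))        # 26
print(best_possible_inbred(elite_1, elite_2))  # 52.0
print(best_possible_inbred(elite_1, wild))     # 64.0
```

In this invented example the two elite lines share alleles at 26 of 32 loci, so the best line they can produce exceeds either parent by only three loci, whereas the cross with the distantly related line has a much higher ceiling, although reaching it is far less likely.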
The initial cross allowed by the program may involve
two, three, or four parents chosen from the seven initial
lines. Examples of permissible initial crosses include
single (2 × 4), three-way (1 × (2 × 6)), and double
crosses ((1 × 3) × (2 ×
After the type of initial cross and choice of parents
have been specified, the program develops genotypic
arrays for the F1 and subsequent generations. The number of F1 plants per cross and the number of F2 plants to
be produced from each F1 plant are specified by the
user. Each cross is advanced to the F5 generation by producing a single, selfed progeny genotype from each F2
plant. Genotypic and phenotypic values are determined
for each F5 plant (or F4-derived line), and the user receives a printout of these values for the plants that fall
among the top 10 percent of the population with respect
to phenotypic value. The phenotypic value is interpreted
as the performance of the line in a preliminary trial,
while the genotypic value is considered the mean performance of the line in trials conducted in many locations and years.
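The generation advance can be sketched as follows. This is a minimal Python illustration for a single cross only (three-way and double crosses are omitted), not the PL/I code of program SSD; the gamete routine, the 0/1 allele coding, the two parental arrays, and the seed are assumptions. Linkage groups of four loci and a recombination frequency of 0.3 between adjacent loci follow the description of the genetic model.

```python
import random

GROUP_SIZE = 4   # loci per linkage group, as in the text
RECOMB = 0.3     # recombination frequency between adjacent linked loci


def gamete(plant, rng):
    """Form one gamete from a diploid plant given as a pair of 0/1 strands."""
    alleles = []
    strand = 0
    for locus in range(len(plant[0])):
        if locus % GROUP_SIZE == 0:
            strand = rng.randrange(2)      # linkage groups assort independently
        elif rng.random() < RECOMB:
            strand = 1 - strand            # crossover between adjacent loci
        alleles.append(plant[strand][locus])
    return alleles


def self_pollinate(plant, rng):
    """One selfed progeny: two independent gametes from the same plant."""
    return (gamete(plant, rng), gamete(plant, rng))


def single_seed_descent(parent_a, parent_b, n_f1, n_f2_per_f1, rng):
    """Advance the single cross A x B to F5, one selfed seed per plant."""
    f5_plants = []
    for _ in range(n_f1):
        f1 = (list(parent_a), list(parent_b))   # both parents are homozygous
        for _ in range(n_f2_per_f1):
            plant = self_pollinate(f1, rng)     # F2
            for _ in range(3):                  # F3, F4, F5
                plant = self_pollinate(plant, rng)
            f5_plants.append(plant)
    return f5_plants


rng = random.Random(7)                 # arbitrary seed
line_a = [1] * 23 + [0] * 9            # hypothetical parent, value 46
line_b = [0] * 22 + [1] * 10           # hypothetical parent, value 20
f5 = single_seed_descent(line_a, line_b, n_f1=10, n_f2_per_f1=5, rng=rng)
print(len(f5), "F5 plants produced")
```

Ranking the resulting plants by phenotypic value and printing the top 10 percent would complete the run; that step is left out to keep the sketch short.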
An example of the output produced by the program is
shown in Fig. 1. In this example, the user specifies one
run of the program to be performed, with 3.0, 2.0, 1.0,
and 0.0 as the values of parameters one, two, three,
and four, respectively. The cross specified is (2 × 7) ×
(1 × 3). Ten F1 plants are specified, and five F2 plants
per F1 plant are to be produced. The simulation concludes by presenting to the user the best 10% of the lines
produced by the simulation based on the resulting
phenotypic values.
Fig. 1. Sample run of computer simulation program SSD (Single Seed Descent).

ex 'ts1359.ssd.clist'
Enter the number of runs of the program to be made: 1
******************************* RUN 1 *******************************
Enter the value for parameter 1: 3.0
Enter the value for parameter 2: 2.0
Enter the value for parameter 3: 1.0
Enter the value for parameter 4: 0.0
These are the available lines:
    Line      Genotypic
    number    value
      1       46.00
      2       46.00
      3       44.00
      4       42.00
      5       40.00
      6       36.00
      7       20.00
Lines 1 and 2 are very closely related. Pairs of lines that are fairly closely related:
    3 and 4
    1 and 3
    2 and 5
    1 and 5
Line 7 is a poorly adapted line which, nevertheless, carries favorable alleles rare
or nonexistent in the other lines.
Three types of initial crosses are allowed:
    Type 1: A × B
    Type 2: A × (B × C)
    Type 3: (A × B) × (C × D)
Where A, B, C, and D correspond to any one of the available lines.
Enter the number of the type of initial cross to be made: 3
Enter the line numbers corresponding to A, B, C, and D: 2 7 1 3
Enter the number of F1 plants to be produced: 10
Enter the number of F2 plants per F1 plant to be produced: 5
The best 10% of the lines produced are:
              Genotypic   Phenotypic
              value       value
      1       42.00       47.98
      2       42.00       46.37
      3       41.00       46.18
      4       44.00       45.62
      5       43.00       45.17
READY

Underlined items are typed input provided by the student.
2. Program EGT--Early Generation Testing
Program EGT simulates the development of pure
lines by means of a preliminary evaluation of F2-derived
heterogeneous lines, followed by development of F5-derived lines from selected F2 progenitors. Program EGT
uses the same genetic model, set of seven parental lines,
and procedure for allowing the user to specify crosses
and population sizes as employed in program SSD.
Separate environmental standard deviations,
parameters 1 and 2, respectively, are specified for
evaluation of F2-derived lines and for evaluation of F5-derived lines. Ordinarily, the standard deviation used
for F5-derived line evaluation is the smaller of the two,
Fig. 2. Sample run of computer simulation program EGT (Early Generation Testing).

Enter the number of the type of initial cross to be made: 3
Enter the line numbers corresponding to A, B, C, and D: 2
Enter the number of F1 plants to be produced: 10
Enter the number of F2 plants per F1 plant to be produced:
The best half of the F2's produced are as follows:
    Identification   Phenotypic     Identification   Phenotypic
    number           value          number           value
       1             55.22            13             44.27
       2             53.77            14             44.27
       3             52.33            15             44.01
       4             52.29            16             43.53
       5             46.20            17             42.11
       6             45.38            18             41.22
       7             45.22            19             40.59
       8             44.94            20             40.45
       9             44.94            21             40.13
      10             44.85            22             39.92
      11             44.54            23             39.38
      12             44.28            24             38.80
                                      25             38.51
Enter the total number of F2's to be used: 2
Enter F2 ID 1 and the number of F5's to be produced from that F2: 1 20
Enter F2 ID 2 and the number of F5's to be produced from that F2: 2 10
The following are the best fifth of the F5's produced from the F2's indicated:
              Genotypic   Phenotypic   Produced from
              value       value        F2 line no.
      1       51.00       54.74        2
      2       48.00       53.80        1
      3       53.00       53.74        2
      4       49.00       53.54        1
      5       47.00       52.16        2
      6       48.00       51.96        1
READY

Initiation of the program, parameter entry, and description of lines and crosses
are similar to those of program SSD (Fig. 1).
Underlined items are typed input provided by the student.
reflecting the more extensive replication employable in
the latter generation, although this need not necessarily
be the case. Parameters 3, 4, and 5 determine the genotypic value of the homozygous favorable, heterozygous,
and homozygous unfavorable genotypes, respectively,
at each locus. Phenotypic, but not genotypic, values for
the best 50% of the F2-derived lines are printed. These
form the basis for selection by the user, who must
specify how many F5-derived lines are to be produced
from each F2 progenitor. Genotypic and phenotypic
values of the best 20% of the resulting F5-derived lines
are printed as in program SSD. An example of one run
of the program is shown in Fig. 2.
3. Program RS--Recurrent Selection
Program RS simulates cycles of recurrent selection for a quantitative trait. The trait is
controlled by 24 loci, which are divided into six linkage groups of four loci each, with 0.3
as the recombination frequency between adjacent loci. There are two alleles per locus. An
additive model or a model displaying any level of dominance may be selected by specifying
the genetic value of the homozygous favorable, heterozygous, and homozygous unfavorable
individuals (parameters 4, 5, and 6, respectively). All loci contribute equally and display
the same level of dominance. The total genotypic value is determined by summing the
contributions of individual loci. The phenotypic value of an individual is determined, as in
programs SSD and EGT, by adding an environmental component to the genotypic value.
Parameter 3 determines the standard deviation of this environmental component. Two
additional parameters, designated 1 and 2, are arbitrary five-digit odd integers used in
the program's random number generating subroutine.
An initial (cycle 0) population consisting of 50 plants is generated at random. Expected
allele frequencies vary, with two loci each having frequencies of 0.05, 0.10, 0.20, 0.30,
0.40, 0.60, 0.70, 0.80, 0.90, and 0.95 for the favorable allele and the remaining four loci
a frequency of 0.50.
Phenotypic values of the generation zero (or cycle 0) plants are printed, and the user then
indicates the plants that are to be selected. Random mating of selected plants is
simulated, and a genetic array of generation 1 plants is produced.
In generation 1 and subsequent generations, the population size falls between 20 and 50,
the program's random number generator determining the exact value. The resulting phenotypic
values of the progeny are printed and the selection process is repeated. The user may
continue the simulation for as many cycles as desired.
After the user terminates the simulation by specifying that no plants are to be selected,
the genetic arrays and genotypic and phenotypic values of the plants of the most recently
produced generation are printed. From this information, the user can calculate allele and
genotypic frequencies. An example of the output produced by the program is shown in Fig. 3.
The simulation can be used to represent either mass selection or S1 selection. For mass
selection, gene action parameters are defined to be the mean genetic values of the three
genotypes, e.g., a, d, and -a for the homozygous favorable, heterozygous, and homozygous
unfavorable genotypes, respectively. For S1 selection, these parameters are defined to be
the mean progeny values of the three genotypes, e.g., a, d/2, and -a.
USES OF THE PROGRAMS
We have used the programs to provide a simulated plant breeding experience and to generate
material for homework problems. Examples of such uses include:
1. Calculating expected genetic gain, and comparing expected gain with gain observed (programs SSD,
EGT, and RS);
2. Comparing the single seed descent and early generation testing methods for effectiveness (SSD and
EGT);
Fig. 3. Sample run of computer simulation program RS (Recurrent Selection).

ex 'ts1359.a.clist'
Enter the value for parameter 1: 26847
Enter the value for parameter 2: 26639
Enter the value for parameter 3: 3.0
Enter the value for parameter 4: 2.0
Enter the value for parameter 5: 1.0
Enter the value for parameter 6: 0.0
(Generation: 0)
    Plant     Phenotypic     Plant     Phenotypic     Plant     Phenotypic
    number    value          number    value          number    value
      1       27.52            18      30.55            35      23.63
      2       32.31            19      33.37            36      21.32
      3       21.04            20      28.18            37      16.79
      4       27.81            21      27.30            38      25.51
      5       25.99            22      17.34            39      13.72
      6       26.30            23      19.71            40      24.64
      7       21.36            24      32.87            41      22.09
      8       23.86            25      23.97            42      27.87
      9       26.27            26      16.17            43      23.53
     10       25.10            27      27.45            44      30.79
     11       29.78            28      30.03            45      28.52
     12       29.02            29      33.34            46      25.25
     13       24.81            30      16.53            47      19.27
     14       24.92            31      31.58            48      28.28
     15       21.01            32      24.91            49      25.59
     16       26.27            33      25.67            50      22.93
     17       20.63            34      20.45
Phenotypic mean = 25.06
Phenotypic variance = 21.64
Enter the number of plants to be selected: 5
Enter the numbers of the 5 plants to be mated: 2 5 17 31 50
(Generation: 1)
    Plant     Phenotypic     Plant     Phenotypic
    number    value          number    value
      1       16.76            12      29.02
      2       19.17            13      33.04
      3       20.39            14      22.31
      4       28.76            15      30.33
      5       18.34            16      21.95
      6       21.09            17      20.69
      7       23.55            18      21.67
      8       23.13            19      19.04
      9       25.94            20      22.02
     10       24.22            21      17.50
     11       23.28            22      32.18
Phenotypic mean = 23.38
Phenotypic variance = 20.77
Enter the number of plants to be selected: 0
These are the genotypes and values of the most recently produced progeny:
    Plant                                 Number of    Genotypic   Phenotypic
    number    Genotype                    1 alleles    value       value
      1       000000000001101110111111       19        19.00       16.76
              000000001100101101001010
      2       000000101100111111001110       21        21.00       19.17
              000000010000101011100111
      3       000000001100111011011110       20        20.00       20.39
              000000101100111001001010
      4       100010010010111110011111       24        24.00       28.76
              000000000001101110110111
      5       000001000001101110110111       23        23.00       18.34
              000000010010001111111111
      6       000000010100001011101111       21        21.00       21.09
              000000010101101001101111
      7       000000000001101110110111       22        22.00       23.55
              000001000001101111110111
      8       000000010001101011101111       25        25.00       23.13
              000000101101111111011110
      9       000000001101101011001011       23        23.00       25.94
              000000010110001111111111
     10       000000010100011001101111       26        26.00       24.22
              100010111111110110010111
     11       000000010001111101100111       22        22.00       23.28
              000000001100111101011011
     12       000000001100111101011111       24        24.00       29.02
              000000001100111111011110
     13       000000010110011111111111       27        27.00       33.04
              000000010010011111111111
     14       000000010010111111101111       25        25.00       22.31
              000000101101111001011011
     15       100010010011110111111111       33        33.00       30.33
              100000111111110111011111
     16       100010010011110110110111       29        29.00       21.95
              000010110111110111010111
     17       000000010100011011100111       21        21.00       20.69
              000000010010101110101111
     18       000001000001101111110111       25        25.00       21.67
              000000010010011111111111
     19       000000000001101110110111       22        22.00       19.04
              000000010101111101100111
     20       000001000001101110110111       23        23.00       22.02
              000000010101011111100111
     21       000000001100101011001010       17        17.00       17.50
              000000010001101001100111
     22       000001000001101111111111       30        30.00       32.18
              000010111111110111011111

Underlined items are typed input provided by the student.
3. Comparing wide and narrow crosses with respect to
the frequency of transgressive segregation (SSD and
EGT);
4. Studying the effectiveness of recurrent selection with
different levels of dominance (RS);
5. Graphing changes in population mean and genetic
variance over cycles of selection (RS);
6. Observing the effect of different effective population
sizes and selection intensities on allele fixation (RS);
7. Verifying that the expected Hardy-Weinberg genotypic frequencies are approximated by those in a
random-mated, finite population (RS);
8. Calculating genetic variances from allele frequencies
and mean values of genotypes (RS).
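Uses 7 and 8 amount to a few standard single-locus calculations, which can be sketched as follows. This is an illustration in Python, not part of the programs; the function names and the tiny three-locus example population are assumptions. The variance expressions are the usual single-locus results for a random-mated population, so summing them over loci assumes Hardy-Weinberg proportions and no linkage disequilibrium.

```python
def allele_frequency(plants):
    """Frequency of the favorable (1) allele at each locus.

    `plants` is a list of (strand, strand) pairs of 0/1 strings, in the
    style of the genotype printout of program RS.
    """
    n_loci = len(plants[0][0])
    total = 2 * len(plants)
    return [sum(int(s[i]) for plant in plants for s in plant) / total
            for i in range(n_loci)]


def hardy_weinberg(p):
    """Expected frequencies of the 11, 10, and 00 genotypes."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def genetic_variances(p, hom_fav, het, hom_unfav):
    """Additive and dominance variance at one locus under random mating."""
    q = 1.0 - p
    a = (hom_fav - hom_unfav) / 2.0        # half the homozygote difference
    d = het - (hom_fav + hom_unfav) / 2.0  # heterozygote deviation from midpoint
    alpha = a + d * (q - p)                # average effect of allele substitution
    return 2 * p * q * alpha ** 2, (2 * p * q * d) ** 2


# Hypothetical two-plant, three-locus population.
plants = [("110", "100"), ("010", "000")]
print(allele_frequency(plants))               # [0.5, 0.5, 0.0]
print(hardy_weinberg(0.5))                    # (0.25, 0.5, 0.25)
print(genetic_variances(0.5, 2.0, 1.0, 0.0))  # additive model
print(genetic_variances(0.5, 2.0, 2.0, 0.0))  # complete dominance
```

Students can compare the expected genotypic frequencies with those counted from the printed genetic arrays, and compare the summed variances with the variance of the printed genotypic values.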
BENEFITS
We believe that the use of computer simulation programs in an introductory plant breeding course has
several benefits:
1. Students become familiar with the steps involved in
the breeding methods that are simulated.
2. The programs provide a convenient alternative to the
use of data from actual selection experiments as material for problem sets. In 30 to 60 min, a student can
generate a unique data set representing an experiment that would have required 5 to 10 years in the
field. In addition, the simulations provide information, such as allele frequencies, that cannot be obtained in field experiments.
3. The programs provide realistic examples of
important concepts, such as genetic drift, selection
among vs. within heterogeneous lines, and heritability. For example, in using program RS, students experience first-hand the frustration of dealing with
traits having a heritability less than unity when they
repeatedly observe that the mean phenotypic value
of progeny is less than that of the parental plants
they selected.
4. The simulations require the student to make some of
the decisions, e.g., concerning selection and allocation of resources, that plant breeders must make.
5. The simulations stimulate interest in plant breeding.
We encourage this interest by holding an informal
contest each quarter to determine which student can
produce the greatest genetic gain with one or more of
the programs. We also find that, even after all assignments have been turned in, several students continue to "play" with the programs on their own.
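The observation in benefit 3, that progeny average less than their selected parents, is what the textbook relation R = h^2 S predicts whenever heritability is below one. The following Python sketch illustrates that relation; it is not taken from the programs, and the phenotypic values and the heritability of 0.4 are invented for the example.

```python
def selection_differential(population, selected):
    """Mean of the selected plants minus the mean of the whole population."""
    pop_mean = sum(population) / len(population)
    sel_mean = sum(selected) / len(selected)
    return sel_mean - pop_mean


def expected_response(heritability, differential):
    """Expected gain from one cycle of selection: R = h^2 * S."""
    return heritability * differential


# Hypothetical phenotypic values; the two best plants are selected.
population = [20.0, 22.0, 24.0, 26.0, 28.0]
selected = [26.0, 28.0]
s = selection_differential(population, selected)
r = expected_response(0.4, s)
print(f"S = {s:.2f}, expected gain R = {r:.2f}")
```

Here the selected parents exceed the population mean by 3 units, but the expected progeny mean rises by only 1.2 units. In a single run, the observed gain will also vary around this expectation because few parents are mated.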
Student evaluations have indicated a favorable reaction to the use of the simulations. It is important to assure students, particularly those who have no previous
experience with computers, that no knowledge of computer science is necessary to use these programs, but it is
also essential to brief students on the local computer
facilities available and how to use the programs.