ERGONOMICS, 2000, VOL. 43, NO. 1, 73–105

Development of physical selection procedures for the British Army. Phase 2: Relationship between physical performance tests and criterion tasks

M. RAYSON*, D. HOLLIMAN and A. BELYAVIN

Centre for Human Sciences, Defence and Evaluation Research Agency, Farnborough GU14 0LX, UK

Keywords: Personnel selection; Military personnel; Physical fitness; Employment standards; Manual material handling.

This paper is the second in a series of three to describe the development of physical selection standards for the British Army. The first paper defined criterion tasks (single lift, carry, repetitive lift and carry, and loaded march tasks) and set standards on the criterion tasks for all British Army trades. The principal objective was to determine which combination of physical performance tests could be best used to predict criterion task performance. Secondary objectives included developing so-called 'gender-free' and 'gender-unbiased' models. The objectives were met by analysing performance data on the criterion tasks and a large battery of physical performance tests collected from 379 trained soldiers (mean age 23.5 (SD 4.45) years, stature 1734 (SD 79.5) mm, body mass 71.4 (SD 10.58) kg). Objective 1 was met: the most predictive physical performance tests were identified for all criterion tasks. Both single lift tasks were successfully modelled using muscle strength and fat free mass scores. The carry model incorporated muscle endurance and body size data, but the errors of prediction were large. The repetitive lift models included measures of muscle strength and endurance, and body size, but errors of prediction were also large. The loaded march tasks were successfully modelled incorporating indices of aerobic fitness, supplemented by measures of strength, endurance or body size and composition. The secondary objectives were partially fulfilled, though limitations in the data hampered the process.
Only one model (a loaded march) was gender-free; three models were gender-related (i.e. contained 'gender' explicitly in the model). The remaining six were gender-specific (i.e. were appropriate for men or for women). Owing to both a lower accuracy of prediction in women's scores and a greater tendency for the women's scores to be distributed around the pass standards, a greater percentage of women than men were misclassified as passing or failing, resulting in indirect discrimination. A validation of the models in a separate sample of the user population of recruits is reported in the third paper in this series.

*Author for correspondence at: Optimal Performance Ltd, Old Chambers, 93–94 West Street, Farnham GU9 7EB, UK. e-mail: markrayson@cwcom.net

Ergonomics ISSN 0014-0139 print/ISSN 1366-5847 online © 2000 Taylor & Francis Ltd http://www.tandf.co.uk/journals/tf/00140139.html

1. Introduction

1.1. Background

This paper is the second in a series of three that describes the phases involved in developing physical selection standards for the British Army (Rayson 1997). The first phase described the methods and findings from a job analysis and identified criterion tasks that could be used as the basis for developing physical selection standards (Rayson 1998). The criterion tasks reflected the findings from a job analysis, overlaid by subject-matter expert opinion and logistical constraints, and comprised single lift, carry, repetitive lift and loaded march tasks. Soldiers in all trades were allocated to one of three levels (referred to as levels 1, 2 or 3), which represented the required standards or acceptance criteria. The criterion tasks and levels are summarized in table 1. For example, an Infantryman was required to achieve level 1 on the single lift and carry, level 2 on the repetitive lift, and level 1 on the loaded march.
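
The allocation of trades to levels can be sketched as a simple lookup. The snippet below is only an illustration: the numeric standards are those read from table 1, while the names `TradeLevels` and `standards_for` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Standards per level as read from table 1 (level 1 is the most demanding).
SINGLE_LIFT = {1: (44, 1.70), 2: (35, 1.45), 3: (20, 1.45)}   # (load kg, lift height m)
CARRY_DISTANCE_M = {1: 210, 2: 90, 3: 30}                      # 2 x 20 kg water cans
REPETITIVE_LIFT = {1: (44, 1, 20), 2: (22, 3, 15), 3: (10, 6, 10)}  # (load kg, lifts/min, minutes)
MARCH_LOAD_KG = {1: 25, 2: 20, 3: 15}                          # 12.8 km in 120 min


@dataclass(frozen=True)
class TradeLevels:
    """Hypothetical container: the level a trade must reach on each task."""
    single_lift: int
    carry: int
    repetitive_lift: int
    loaded_march: int


def standards_for(levels: TradeLevels) -> dict:
    """Translate a trade's levels into the concrete pass standards."""
    return {
        "single_lift": SINGLE_LIFT[levels.single_lift],
        "carry_m": CARRY_DISTANCE_M[levels.carry],
        "repetitive_lift": REPETITIVE_LIFT[levels.repetitive_lift],
        "march_load_kg": MARCH_LOAD_KG[levels.loaded_march],
    }


# The Infantryman example from the text: levels 1, 1, 2 and 1.
infantry = TradeLevels(single_lift=1, carry=1, repetitive_lift=2, loaded_march=1)
infantry_standards = standards_for(infantry)
```

Under this reading, the Infantryman standard is a 44 kg lift to 1.70 m, a 210 m carry, a 22 kg repetitive lift at 3 lifts per minute for 15 min, and a 25 kg march load.
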
Although ideally recruits would be directly tested on the criterion tasks, ensuring a high content validity, this is not usually practicable for reasons of safety (e.g. a 44 kg box lift to 1.70 m in novices may not be regarded as acceptable), skill requirements (e.g. a repetitive 10 kg box lift, six times per minute, requires skill as well as fitness) and logistics (a 2 h march is not viable as a selection test). Surrogate tests are therefore used to predict performance in many of the armed and unarmed services. This paper describes the relationships between physical performance tests and the criterion tasks, potentially enabling a battery of physical selection tests to be chosen and used for selecting personnel who have the physical prerequisites for the various British Army jobs.

For the purposes of selecting a battery of potential physical selection tests, measurements of physical fitness, body size and body composition were considered. In the context of physical selection, physical fitness may be defined as the physical capacity to meet the demands of the occupation. Thus, physical capacity will reflect both innate capability and training-induced enhancement of performance. Physical fitness may be classified into anaerobic and aerobic components. Anaerobic fitness refers to the muscle's ability to produce energy without the presence of oxygen via phosphate compounds stored in the muscle cells or by anaerobic glycolysis. The size and speed of contraction of the muscle will be the primary determinants of anaerobic fitness (Jones and Round 1992). Aerobic fitness refers to the ability of the heart, lungs and circulation to allow the muscle to work aerobically by supplying oxygen and carbohydrate and fat substrates (Astrand and Rodahl 1986).

Table 1. Criterion tasks and levels of performance.
Level | Single lift (ammunition box) | Carry (2 × 20 kg water cans) | Repetitive lift and 10 m carry (ammunition box) | Loaded march (12.8 km in 120 min)
1 | 44 kg to 1.70 m | 210 m | 44 kg, ground to 1.45 m, 1 min⁻¹ for 20 min | 25 kg load
2 | 35 kg to 1.45 m | 90 m | 22 kg, ground to 1.45 m, 3 min⁻¹ for 15 min | 20 kg load
3 | 20 kg to 1.45 m | 30 m | 10 kg, ground to 1.45 m, 6 min⁻¹ for 10 min | 15 kg load

1.2. Relationship between criterion task and physical performance test performance

The relative importance which different aspects of physical capability have on task performance depends upon the task in question, the individual performing the task, and the environment in which it is performed. There have been numerous studies which have reported on the association between lifting, carrying and marching tasks (e.g. Sharp et al. 1980, Teves et al. 1985, Beckett and Hodgdon 1987, Nottrodt and Celentano 1987, Mello et al. 1995, Stevenson et al. 1992, 1994), which are broadly similar to the criterion tasks identified for this project (Rayson 1998), and a variety of physical performance tests.

Many studies investigating material handling performance have employed psychophysical methodology (see Ayoub and Mital 1989 for a review). The majority of these studies has been omitted from this review as the psychophysical approach is invalid for assessing true maximal working capability (Dueker et al. 1994). Knowledge of maximal working capability is essential to the development of physical selection standards for occupations where physical performance is of paramount importance. Where more objective evidence of maximum task performance is lacking, for example for repetitive lift tasks, brief mention is made of selected psychophysical studies to add to the knowledge base.

Useful clues may be gleaned from published data that assist the process of assembling a battery of potentially predictive physical performance tests.
However, caution must be exercised in interpreting the applicability of the data from previous studies where differences exist between the task protocols, the populations under investigation, the composition of the test batteries and the manner in which gender was accounted for (i.e. whether the impact of gender was explored when developing the relationships).

1.3. Predictors of single lift performance

It is readily apparent from the literature that a number of different components of physical capability are related to brief or single lift performance (Poulsen 1970, Sharp et al. 1980, Pytel and Kamon 1981, Ayoub et al. 1982, Teves et al. 1985, Beckett and Hodgdon 1987, Nottrodt and Celentano 1987, Dueker et al. 1994). These include body size and composition, static and dynamic strength/power, and static and dynamic endurance.

Measures of body size and composition were not included frequently or comprehensively in these studies, perhaps because they are not among the best predictors of lifting capacity. However, fat free mass (FFM) (Sharp et al. 1980, Teves et al. 1985, Beckett and Hodgdon 1987, Nottrodt and Celentano 1987) consistently featured among the best predictors of lifting. Body mass and bi-iliac breadth (Nottrodt and Celentano 1987) were also cited among the better predictors.

Tests of static and dynamic strength have been used most frequently to predict material handling performance and there has been considerable debate in the scientific literature as to their relative merits. Some authors (e.g. Chaffin 1974) favour static tests for reasons of reliability, ease and safety. But material handling comprises dynamic as well as static muscle contractions involving inertial forces and this has encouraged a recent trend towards measurement of dynamic strength and simulated tasks. When these movements are taken into account (e.g.
during dynamic strength measurement) higher correlation coefficients usually result and fewer independent variables are required to predict performance (Mital et al. 1986b, Ayoub and Mital 1989). For these reasons, dynamic strength testing is favoured by many authors (Aghazadeh and Ayoub 1985, Pytel and Kamon 1981, Kamon et al. 1982, Kroemer 1983, 1985, Mital 1985, Mital et al. 1986a, b).

The studies reviewed for this paper indicate the value of both static and dynamic strength tests. Six of the studies included both types of strength test. Of these, three studies reported the importance of both types of test (Aghazadeh and Ayoub 1985, Teves et al. 1985, Nottrodt and Celentano 1987) while the remaining three studies reported dynamic strength tests to be superior predictors (Ayoub et al. 1982, Beckett and Hodgdon 1987, Mello et al. 1995). The relative superiority of dynamic versus static tests may depend upon the range and speed of movement and the joint angles involved in executing the task.

Static strength tests that were strongly associated with lift performance included upright pull (Sharp et al. 1980, Teves et al. 1985, Nottrodt and Celentano 1987), back (Poulsen 1970), arm and shoulder strength (Poulsen 1970, Aghazadeh and Ayoub 1985). Dynamic strength/power tests strongly associated with lift performance included the Incremental Lift Machine (ILM) test (Ayoub et al. 1982, Teves et al. 1985, Beckett and Hodgdon 1987, Nottrodt and Celentano 1987), vertical and broad jump (Beckett and Hodgdon 1987), and isokinetic back extension (Pytel and Kamon 1981) and isokinetic lift power (Pytel and Kamon 1981, Aghazadeh and Ayoub 1985). Two endurance tests stood out as being among the better predictors of the single lifting tasks: a 70 lb hold at elbow height (Ayoub et al. 1982), and push-ups (Beckett and Hodgdon 1987).
The strength of the relationships between test and task scores can be assessed, in part, by the magnitude of the correlation coefficients (r) which were cited in all of the reviewed studies. Caution must be exercised when comparing r between studies as it is a function of the range of data as well as the strength of the relation between the two variables. Some of the studies presented r for the genders separately, while others presented combined data. The highest r (0.87–0.96) for the genders separately were reported by Pytel and Kamon (1981) between isokinetic strength scores and lift scores. Nottrodt and Celentano (1987) also reported high single-gender r between a 1.33 m lift and lean body mass in females (0.76) and the ILM test in males (0.71). The highest pooled-gender r were between lift performance and ILM (0.77, Myers et al. 1984; 0.65, Teves et al. 1985), and between lift performance and FFM (0.88, Sharp et al. 1980; 0.74, Myers et al. 1984; 0.66, Teves et al. 1985). Upright pull (0.76, Sharp et al. 1980; 0.64, Teves et al. 1985), and hand grip strength (0.80, Sharp et al. 1980) also correlated highly. r dropped significantly (typically to 0.3–0.5) when the genders were analysed separately (e.g. Myers et al. 1984, Teves et al. 1985). Usually, r was higher for men than for women (e.g. Myers et al. 1984, Teves et al. 1985).

Several of the studies cited correlation coefficients squared (r²) indicating the amount of variability accounted for by the physical performance tests, derived from pooled-gender data in multiple regression models (Sharp et al. 1980, Teves et al. 1985, Beckett and Hodgdon 1987, Nottrodt and Celentano 1987). The latter produced the model accounting for the most variation, citing an r² = 0.88 for the ILM test and FFM. When the genders were separated, r² dropped to 0.68 for men and 0.65 for women. These values were still substantially higher than those derived from the data of Teves et al.
(1985), which produced r² = 0.47 for pooled gender data (also from the ILM test and FFM), 0.33 for men and 0.11 for women. The best model produced by Beckett and Hodgdon (1987) incorporated vertical jump and push-ups (r² = 0.82), though the ILM test alone produced an r² = 0.79, and FFM and push-ups an r² = 0.77. The model of Sharp et al. (1980) included FFM, upright pull and gender, producing an r² = 0.79. There was a reasonable consistency among the physical performance tests retained in the multiple regression models (primarily a combination of FFM and a strength test). The models with their coefficients are shown in table 2.

1.4. Predictors of repetitive lift performance

There have been few studies investigating repetitive or sustained lifting performance that have not used psychophysical methodology. Mello et al. (1995) investigated a repetitive lift task involving the maximum weight male and female participants could lift from the ground, carry 10 m and lift to a platform at 1.32 m at 1 and 4 lifts.min⁻¹ for 1 h. At 1 lift.min⁻¹, mean arm power from a Wingate test and bench press provided the largest correlation coefficients (0.90 and 0.88 respectively). At 4 lift.min⁻¹, FFM of the arm (r = 0.83), total FFM (0.82), sex (0.82) and stature (0.81) were the most highly correlated tests with maximum weight of lift. The best equation for the maximum weight at 1 lift.min⁻¹ incorporated stature and bench press and the maximum weight at 4 lift.min⁻¹ incorporated stature, FFM of the arm and total FFM (both r² = 0.88). The models are shown in table 2.

1.5. Predictors of carry performance

Measurements of anthropometry, body composition, static and dynamic strength, static and dynamic endurance and aerobic fitness (i.e. all aspects of physical fitness) were among the best predictor tests in the seven studies reviewed (Myers et al.
1984, Stevenson et al. 1985, 1988, 1992, 1994, Beckett and Hodgdon 1987, Rice and Sharp 1994).

Measurements of body size and composition were rarely investigated. Stature was the only measure that appeared among the best predictors, albeit in just one of the two studies in which it was measured (Rice and Sharp 1994). FFM was identified as among the best predictors in only one of the three studies in which it was measured (Myers et al. 1984).

Isometric hand grip and upright pull were the most predictive static strength tests. Hand grip appeared among the best predictors in five of the six studies in which it was measured (Stevenson et al. 1985, 1988, 1992, 1994, Rice and Sharp 1994). Upright pull appeared as a good predictor in one of the two studies in which it was measured (Rice and Sharp 1994). Dynamic lift strength measured on the ILM was a strong predictor in four of the five studies in which it was measured (Myers et al. 1984, Beckett and Hodgdon 1987, Stevenson et al. 1988, Rice and Sharp 1994). Broad jump was also among the best predictors in a single study (Beckett and Hodgdon 1987).

Among the endurance tests, isometric hand grip (Stevenson et al. 1985, 1988), flexed arm hang (Stevenson et al. 1985, 1988), push-ups (Stevenson et al. 1985, 1988, 1994) and sit-ups (Stevenson et al. 1992) appeared among the best predictors. Measures of aerobic fitness (V̇O2max and run time) were also consistently among the better predictors (Beckett and Hodgdon 1987, Stevenson et al. 1988, 1992, 1994, Rice and Sharp 1994).

r between test scores and carry task scores was generally lower than found for the lift tasks. The highest single-gender r (0.74 for hand grip strength, 0.76 for 2-mile run) were reported by Rice and Sharp (1994), which was promising as this task approximated most closely the carry criterion task under scrutiny in this study. The data of Stevenson et al.
(1988, 1994) also produced a few r > 0.6 for V̇O2max and push-ups in a sandbag carry task in older men, and hand grip strength and hand grip endurance in a stretcher carry in older women.

Table 2. Multiple regression models for predicting criterion tasks from the literature. Presented from the reviewed literature are: the single lift (MLC, maximum lift capacity), repetitive lift (MRLC, maximum repetitive lift capacity), carry and loaded march performance models from various population samples.

Criterion task | Author | Best multiple variable model | Sample | r²
MLC132 | Sharp et al. (1980) | 8.466 + 0.9933 * FFM + 0.006349 * upright pull ? 4.777 * gender | Pooled | 0.79
MLC132 | Teves et al. (1985) | 0.55 + 0.87 * FFM + 0.55 * ILM183 | Pooled | 0.47
MLC elbow | Beckett and Hodgdon (1987) | 5.762 + 0.029 * broad jump * body mass + 0.297 * push-ups | Pooled | 0.82
MRLC132, 1 lift/min | Mello et al. (1995) | 32.8 + 0.28 * stature + 0.22 * bench press | Pooled | 0.88
MRLC132, 4 lift/min | Mello et al. (1995) | 57.3 + 0.53 * stature + 4.1 * armFFM ? 0.7 * FFM | Pooled | 0.88
No. of 34 kg boxes carried 51.4 m in 2 × 5 min | Beckett and Hodgdon (1987) | 372 − 944 * 1.5 mile run + 0.697 * ILM152 | Pooled | 0.53
No. of 34 kg boxes carried 51.4 m in 2 × 5 min | Beckett and Hodgdon (1987) | 373 − 10.214 * 1.5 mile run + 0.029 * broad jump * body mass | Pooled | 0.53
Timed 750 m simulated stretcher carry | Stevenson et al. (1988) | 2923 − 16.55 * hand grip − 24.81 * V̇O2max | Females > 34 years | 0.57
No. of 20 kg sandbags carried 50 m in 10 min | Stevenson et al. (1988) | 0.28 + 0.24 * V̇O2max + 0.02 * hand grip | Males > 34 years | 0.54
No. of 20 kg sandbags carried 50 m in 10 min | Stevenson et al. (1988) | −2.18 + 0.23 * V̇O2max + 0.07 * ILM + 0.01 * hand grip endurance − 0.05 * flexed arm hang | Females > 34 years | 0.61
Maximum load at 6.4 km.h⁻¹ | Rayson et al. (1993) | 25.3 * V̇O2max (l.min⁻¹) + 0.11 * ankle plantar flexion torque + 0.89 * age − 1.22 * body fat | Females | 0.71

?: operator not legible in the source. The pairing of models with rows in the carry tasks follows the order in which the entries appear and is uncertain.
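
The recurring pattern in this review, where pooled-gender r is high but drops once the genders are analysed separately, follows from r depending on the range of the data. The sketch below illustrates that effect only; the FFM and lift values are invented for the illustration and are not taken from any of the cited studies.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented scores: the two groups differ in mean FFM and mean lift,
# but within each group the relationship is weak.
men_ffm, men_lift = [58, 62, 66, 70, 74], [52, 60, 50, 62, 56]
women_ffm, women_lift = [40, 43, 46, 49, 52], [30, 36, 28, 38, 33]

r_men = pearson_r(men_ffm, men_lift)
r_women = pearson_r(women_ffm, women_lift)
r_pooled = pearson_r(men_ffm + women_ffm, men_lift + women_lift)

print(f"men {r_men:.2f}, women {r_women:.2f}, pooled {r_pooled:.2f}")
```

Pooling widens the range on both variables, so the between-group difference dominates the pooled coefficient even though neither group shows a strong relationship on its own.
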
Similarly, the multiple regression carry models accounted for less of the variation than had been found in the lift models. Many of the single-gender models had such low r² that they were unusable (e.g. Myers et al. 1984, Stevenson et al. 1985, 1992). However, two of the studies produced models with moderate r² (> 0.5) (table 2). A measure of aerobic fitness combined with a strength- or muscular endurance-related measure featured in each equation.

1.6. Predictors of march performance

The test batteries were reasonably comprehensive in the five studies reviewed (Dziados et al. 1987, Mello et al. 1988, Knapik et al. 1990, Rayson et al. 1993, Frykman and Harman 1995), encompassing all aspects of physical capability. Interestingly, it appeared that all aspects of physical capability, including anthropometry and body composition, strength, endurance and aerobic power provided the best predictors of march performance.

Some of the studies included a limited number of measurements of body size. Shoulder diameter (Frykman and Harman 1995) and stature (Rayson et al. 1993) were the only two measurements to appear among the best predictors. Despite the widespread measurement of body composition, percentage body fat (Rayson et al. 1993) and FFM (Knapik et al. 1990) were among the best predictors in only two of the reviewed studies.

Among the strength tests, several isometric and isokinetic variables were among the best predictors, including isometric upper torso, hand grip and trunk flexion strength (Knapik et al. 1990), isokinetic knee flexion (Dziados et al. 1987, Mello et al. 1988), knee extension strength (Mello et al. 1988), and plantar flexion strength (Rayson et al. 1993).

Several of the studies included selective measurements of muscle endurance of the knee flexors and extensors. Isokinetic knee flexion endurance (Mello et al. 1988) and squat endurance (Frykman and Harman 1995) were among the best predictors.
Given the endurance nature of a sustained marching task it was surprising to find that measures of muscle strength were often superior predictors to measures of muscle endurance. This topic is addressed further in the Discussion.

Measurements of aerobic fitness were among the best predictors in four of the five studies (Dziados et al. 1987, Knapik et al. 1990, Rayson et al. 1993, Frykman and Harman 1995). The highest r was recorded between march performance and V̇O2max (Frykman and Harman 1995) with a value of 0.84. However, the distance of this march task was only 3 km. Three additional studies incorporating march distances of between 2 and 20 km produced moderate r between 0.42 and 0.59 between march time and the most highly correlated physical tests of strength (knee flexion and abdominal strength) and aerobic power.

Three of the studies (Dziados et al. 1987, Knapik et al. 1990, Rayson et al. 1993) attempted to produce multiple regression models to predict march performance. Dziados et al.'s model to predict march time over 16 km with an 18 kg load (1987) produced an r² = 0.21 incorporating knee flexion strength only. The model of Rayson et al. (1993) produced an r² = 0.71, incorporating V̇O2max, ankle plantar flexion, age and body fat (table 2), but the model is not directly relevant as it was used to predict the maximum tolerable load in women for short duration marching at 6.4 km.h⁻¹. Knapik et al.'s attempts to model a 20 km march with 46 kg load (1990) were hampered by missing data. Their best model incorporated abdominal strength only, but an r² is not provided. In summary, the best loaded march models included measures of aerobic fitness, strength and body composition.

1.7. Summary

Variable success has been achieved in relating performance on physical tests with performance on occupational tasks that approximate the criterion tasks identified for the British Army.
A considerable range in the amount of variation in the criterion tasks scores accounted for by the test scores has been reported. Some studies have shown strong relationships, and while others are weak, some of these weak relationships may be attributed to the fact that many potential predictor tests were not included in the reviewed studies. The absence of a measure must not be equated to the absence of a correlation. Indeed, many of these studies had completely different objectives from our own and the development of prediction equations was not of primary importance. Further, the study populations differed from the population of interest in this investigation, and the manner in which the gender issue had been dealt with in previous studies (ensuring that the relationships between criterion task and predictor tests were equally valid for men and women) was not always satisfactory.

2. Objective

The principal objective of this study was to determine which combination of physical performance tests (in multiple regression equations) could most accurately predict criterion task performance in trained British Army personnel. Secondary objectives included developing so-called 'gender-free' models, to provide common physical selection tests and standards for men and women, and 'gender-unbiased' models, to ensure the models did not disproportionately misclassify either gender. The physical performance tests that most successfully fulfilled these criteria would be retained for validation as selection tests in a separate sample of the user population (to be published in the third paper in this series).

3. Methods

3.1. Study design

The objectives were met by administering the criterion tasks and a short-listed battery of physical performance tests to a representative sample of trained soldiers in a cross-sectional study. The study took place between September and November 1994, at a British Army base in Wiltshire.

3.2. Participants

Each Arm and Service in the British Army was requested by the project client (the Directorate of Manning (Army)) to nominate approximately 20 male and 20 female soldiers representing the variety of trades and range of fitness contained within that Arm and Service. Only soldiers medically classified as fully deployable were nominated. The authors do not know how many potential participants were screened out by their units. Those nominated were then medically screened via questionnaire and any contra-indications were followed up by a consultation with a civilian physician on the first test occasion. This process screened out six participants. Finally, all participants provided informed consent to participate. Three hundred and four men and 75 women (mean age 23.5 (SD 4.45) years, stature 1734 (SD 79.5) mm, body mass 71.4 (SD 10.58) kg) took part.

The target sample size of 300 men and 300 women based upon a power analysis was not achieved for women for two reasons. First, female soldiers were not represented in all Arms and Services, and were sparsely represented in others, inevitably leading to a shortfall in numbers. Second, their relatively small numbers overall (approximately 5% of Army personnel) made recruitment to the study difficult even in those Arms and Services where female soldiers were reasonably well represented. For both genders army commitments inevitably took precedence over participation in the study and this affected both the numbers nominated to take part in the study and the compliance of the soldiers in attending for testing over a number of days.

3.3. Criterion tasks

The criterion tasks consisted of two single lift tasks, a carry task, three repetitive lift and carry tasks, and three loaded march tasks. The rationale for these criterion tasks has been published elsewhere (Rayson 1997, Rayson 1998) and is summarized in Section 1.1.
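
The shortfall in female participants described in Section 3.2 matters mainly for the precision of any single-gender estimate. As a rough illustration, and not the power analysis the authors ran (which the text does not describe), the sketch below compares 95% confidence intervals for a correlation at the achieved and target sample sizes using the Fisher z-transformation. The value r = 0.6 is a hypothetical example.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation (Fisher z)."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


r_example = 0.6                       # hypothetical correlation, for illustration only
ci_achieved = fisher_ci(r_example, 75)    # women who took part
ci_target = fisher_ci(r_example, 300)     # target sample size

for label, (low, high) in (("n = 75", ci_achieved), ("n = 300", ci_target)):
    print(f"{label}: {low:.2f} to {high:.2f}")
```

With these assumptions the interval at n = 75 is roughly twice as wide as at n = 300, which is one way to picture why the women's models were harder to estimate.
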
The participants were asked to perform the criterion tasks to their individual maximum without exposing themselves to undue risks. Army boots, lightweight trousers, and shirt and combat jacket were worn at all times. Typically, the single lift task was performed in the morning and the carry task in the afternoon of day 1. The repetitive lift task was performed on day 2 and the loaded march on day 3. Some flexibility in scheduling was required to minimize the conflict with other duties and to comply with availability of participants, facilities and staff.

3.3.1. Single lift tasks: The two single lift (SL) tasks involved lifting an ammunition box (dimensions 480 × 200 × 190 mm) with side handles, with increasing loads from the ground to 1.70 m (SL170) and to 1.45 m (SL145) respectively. Participants were advised on correct lifting techniques, but essentially the lift was freestyle. Participants first attempted a load of 10 kg, and after each successful lift, 5 kg (or 4 kg after 40 kg) was added to the box until the participant could not execute the lift at the first attempt, or until a maximum load of 72 kg had been achieved. An upper limit of 72 kg load was imposed by the capacity of the ammunition box. A 10 s limit was imposed for completion of each lift and a minimum of 1 min rest was enforced between attempts. The maximum load lifted successfully was recorded as the score (kg).

3.3.2. Carry: The carry (C) task required participants to walk continuously up and down a 30 m course at a prescribed pace of 1.5 m.s⁻¹, carrying by the handle one standard Army issue plastic water can of 20 kg in each hand, for as long as possible. The end point was defined by the participant's inability to maintain the prescribed pace or the inability to maintain a hold on the cans, and the task was scored as the duration in seconds. No maximum time limit was set.

3.3.3. Repetitive lift and carry: The three repetitive lift and carry (RLC) tasks required participants to lift a loaded (10 kg (RLC10), 22 kg (RLC22) or 44 kg (RLC44)) ammunition box (dimensions 480 × 200 × 190 mm) from the ground, and carry it 10 m to a platform of 1.45 m height and place it on the platform (one shuttle). The participant then picked up the box, carried it back to the start point and placed it down, flat on the ground under control (also one shuttle). Trades allocated to the RLC44 worked at 1 shuttle.min⁻¹, those allocated to RLC22 at 3 shuttles.min⁻¹ and those allocated to RLC10 at 6 shuttles.min⁻¹. Participants were advised on correct lifting techniques, but essentially the manoeuvre was freestyle. Participants continued until they were unable to lift the box or to sustain the work rate at the prescribed pace. The maximum duration that the participants achieved, up to a maximum of 60 min, constituted the score (seconds).

3.3.4. Loaded march: The loaded march (LM) required participants to complete, by walking and running, a 12.8 km flat bitumen course as quickly as possible, with a 15 kg (LM15), 20 kg (LM20) or 25 kg (LM25) Bergen rucksack. Participants started individually and were encouraged to adopt a pace that would result in the fastest time. Time to completion constituted the score (minutes).

3.4. Physical performance tests

A battery of physical performance tests was assembled on the basis of three criteria. First, the literature reviewed in Sections 1.2–1.6 identified a number of physical performance tests which had been shown to be predictive of similar criterion tasks. Second, discussions with subject-matter experts identified some additional tests which assessed aspects of body size or performance which might complement those determined from the literature review and account for additional variation in the criterion task scores. Third, pragmatic constraints (e.g.
envisaged limited time and technology at the Recruit Selection Centres where the tests would ultimately be conducted) impacted on the selection of the performance test battery. All the tests were scrutinized in a separate pilot study and found to be viable.

The participants were asked to perform all physical performance tests to their individual maximum without exposing themselves to undue risks. Army physical training (PT) clothing (T-shirt, shorts and training shoes) was worn for all of the tests.

3.4.1. Anthropometry: Stature (mm), arm span (mm), body mass (kg), bi-acromial and elbow diameters (mm), neck girth, chest girth (at nipple height and below the breasts), waist girth (at level of the umbilicus and at the narrowest girth), and gluteal girths (mm) were measured according to the methods described in Collins (1990).

3.4.2. Body composition: Percentage body fat (%) and FFM (kg) were estimated via skinfold measurements at the biceps, triceps, subscapular and supra-iliac sites (Durnin and Womersley 1974), and two electrical impedance devices used by the British Army (Bodystat, Bodystat Ltd, Isle of Man, UK; and ElectroLipoGraph, Biologics, Henley-on-Thames, UK) which employed different algorithms.

3.4.3. Static strength: During all of the static strength tests, one practice attempt at 50% effort was followed by three maximal efforts of 3–5 s, each separated by 30-s rest intervals. Upright pull (N) at 38 cm (Chaffin 1975, Knapik et al. 1981) and 85 cm height from the ground, and arm flexion strength (N) (Hermansen et al. 1972) with the arm flexed at 90° were measured using a dynamometer (Takei, Cranlea, Birmingham, UK). Hand grip strength of both hands was measured using a dynamometer (MIE, Medical Research Ltd, Leeds, UK) and the mean score recorded (Caldwell et al. 1974). Back extension strength (N) was measured following the method of Hermansen et al. (1972).
Plantar flexion strength (N) was measured using a purpose-built apparatus. The participant was seated with the dominant leg positioned so that the lower leg was at 90° to the floor and the angle of the knee was at 80°, with the calliper positioned around the thigh just proximal to the knee. The participant was instructed to exert a maximal upward push against the calliper by using the plantar flexor muscles to raise the heel.

3.4.4. Dynamic strength/power: Maximum lift performance was measured between a handle height of 0.3 m and overhead using a hydro-dynamometer (Pinder and Grieve 1997) according to the methods of Grieve and van der Linden (1986). Maximum lift performance on an ILM (kg) from a handle height of 0.3 m to 1.45 m (ILM145) and to 1.70 m (ILM170) was also assessed (McDaniel et al. 1983) using the modifications to the procedures proposed by Stevenson et al. (1996).

3.4.5. Muscular endurance: Six measures of muscular endurance were included.

Static arm flexion endurance with an ammunition box weighing 14 kg was measured. The participant stood with feet shoulder width apart and knees slightly flexed. The box was picked up and held just clear of the body with the elbows by the side and flexed at 90° and the lower arm parallel to the floor. The maximum duration that the participant could maintain this position was recorded (seconds).

Dynamic arm flexion endurance with a 15 kg barbell was measured. The test involved the participant standing with the back flat against a wall, the barbell grasped in an underhand grip, with the hands shoulder width apart and the arms straight. The participant curled and lowered the barbell in a controlled fashion in time with a metronome. The metronome commenced at 20 one-directional movements·min⁻¹ and increased by 2 movements·min⁻¹ thereafter.
The endpoint of the test was defined by the participant's inability to maintain the work rate while maintaining good 'form', up to a maximum duration of 5 min, and the test was scored (seconds).

Dynamic shoulder endurance was measured via an upright row manoeuvre with a 15 kg load on a stacked weight system. The participant knelt on a padded platform, facing the machine in an upright posture. The bar was grasped with an overhand grip. The test comprised performing a repetitive upright row manoeuvre at a prescribed cadence (as for the dynamic arm flexion endurance test), by bringing the bar to the level of the clavicles, by taking the elbows outwards and upwards, while the hips and back remained stationary. The endpoint of the test was defined by a failure to maintain a full range of movement or failure to lift in time with the cadence, and the test was scored (seconds).

The sit-up test used was the Abdominal Curl Conditioning Test as described by Brewer and Davis (1993), involving curling up and down in time with a prescribed cadence until failure. The test was scored (seconds).

The push-up test involved performing standardized push-ups at a prescribed cadence. The participant adopted a prone position on the floor with the hands positioned at shoulder width. The participant pushed up using the muscles of the arms and chest until the arms were extended and the lower body was pivoting on the toes, while maintaining the back straight. The participant then lowered the body until the elbows were flexed at 90° and then returned to the start position. This manoeuvre was conducted in time to a metronome at the rate described for the dynamic arm flexion endurance test. The endpoint of the test was defined as failure to maintain the required work rate or loss of correct 'form'. The duration was recorded (seconds).

Pull-ups were performed on a wooden gym beam with the hands shoulder width apart, and the beam grasped in an underhand grip.
From the start position with the arms straight the participant performed as many pull-ups as possible, by raising the chin above the bar and returning to the start position. No swinging of the body was permitted. The test was scored as the number (n) of complete pull-ups achieved without rest.

3.4.6. Aerobic fitness: Maximal aerobic power was estimated from performance of the Multistage Fitness Test as described by Ramsbottom et al. (1988). The test comprises a progressive shuttle run between marker cones placed 20 m apart in a gymnasium. Running speed was set by a bleep on a cassette tape and increased every 1 min. Participants continued until they could no longer sustain the prescribed pace and failed to reach two consecutive markers in the allocated time. The time to failure (seconds) was recorded as the score.

3.4.7. Performance indices: Indices were derived from the physical performance test results:

• calculation of ratios of girths (e.g. chest to waist and waist to gluteal girth ratio);
• estimates of percentage body fat and FFM by the three methods described;
• calculation of mean lift power (W) from the hydro-dynamometer test between different vertical heights (e.g. between the handle heights of 0.7 and 1.0, 0.4 and 1.45, and 0.4 and 1.70 m);
• normalization of some strength/power scores to body mass where it was physiologically appropriate (e.g. hydro-dynamic lift power, plantar flexion strength, etc. divided by body mass);
• conversion of the Multistage Fitness Test scores to estimated VO2max in ml·kg⁻¹·min⁻¹, ml·kg⁻⁰·⁶⁷·min⁻¹ and l·min⁻¹;
• calculation of work indices for pull-ups, sit-ups and push-ups by multiplication of raw scores (number of repetitions) by body mass.

3.5. Procedures

Ethics approval for the study was provided by the Defence Research Agency's Centre for Human Science Ethics Committee.
Medical cover was provided during the study by the Garrison Medical Centre. All participants received a detailed briefing and gave informed consent prior to participation. Participants were tested over 4–7 days, participating in some of the criterion tasks and all of the physical performance tests.

The physical performance tests were administered in a gymnasium, in three batteries, each with five stations, and the Multistage Fitness Test was conducted subsequently. The batteries are shown in table 3. Participants were divided into groups of 20 with four participants allocated to each of the five stations. Where possible, physically demanding tests were interspersed with anthropometric measurements to avoid fatigue. Participants began at different test stations, thereby randomizing any effects of cumulative fatigue on any particular physical performance test.

Table 3. Batteries of physical performance tests.

Station   Battery 1                           Battery 2                      Battery 3
1         Anthropometry                       Skinfolds                      Electrical impedance × 2
2         Upright pulls and back extension    Static arm flexion strength    Hand grip
3         Plantar flexion                     Hydro-dynamic lift             Incremental lift machine
4         Static arm flexion endurance        Shoulder endurance             Dynamic arm flexion endurance
5         Push-ups                            Pull-ups                       Sit-ups

3.6. Statistical analysis

For all statistical tests, significance was set at p < 0.05. A product moment correlation matrix was calculated on the physical performance tests and factor scores were derived using principal components and orthogonal varimax rotation (Harris 1975). The factor analysis was used to identify interdependencies between the performance test variables and to reduce the set of variables to a smaller number of independent factors. The factors were then named and assigned a meaning.
For those measures where the variance increased with the expected value (heteroscedastic), power transformations were investigated as a means of stabilizing the variance and it was concluded that the natural logarithms (ln) of some measures (carry, dynamic and static arm flexion endurance, hand grip, hydro-dynamic lift power) should be used.

The relationships between performance on the criterion tasks and the physical performance tests were calculated using Pearson product moment correlation and stepwise multiple linear regression. Where all of the criterion task scores were maximal, the standard least squares procedure was used. Where maximal criterion task scores could not be assured for a minority of individuals who achieved the limit imposed by the protocol, a maximum likelihood procedure was used to take account of the modified distributional form.

All of the relationships were estimated for men and women separately, and the hypothesis that the relationships were the same was tested in the standard manner by breaking the hypothesis down into successive tests of parallelism (equal slopes) and identity (coincident intercepts given equal slopes). If there was a significant relationship between dependent and independent measures determined by the standard F-test and a common relationship for the two genders could not be rejected, a single 'gender-free' relationship was derived. If the hypothesis of parallelism of the relationships for the two genders was not rejected, but that of identity was, then a 'gender-related' relationship was defined by including gender explicitly in the model. If the hypotheses of both parallelism and identity for the two genders were rejected, then 'gender-specific' relationships were defined, with separate models for men and women.
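The decision rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' software: the container `GenderComparison`, its field names and the function `classify_model` are hypothetical, and the p-values are assumed to come from F-tests computed elsewhere. The sketch also assumes that rejecting parallelism alone is enough to call a model gender-specific, since the identity test is only defined given equal slopes.

```python
from dataclasses import dataclass

ALPHA = 0.05  # significance level stated in Section 3.6


@dataclass
class GenderComparison:
    """p-values from the three tests named in the text (hypothetical container)."""
    p_regression: float   # overall F-test: is there any relationship at all?
    p_parallelism: float  # H0: men's and women's slopes are equal
    p_identity: float     # H0: intercepts coincide, given equal slopes


def classify_model(tests: GenderComparison, alpha: float = ALPHA) -> str:
    """Return the model type implied by the sequence of hypothesis tests."""
    if tests.p_regression >= alpha:
        # No significant relationship, so no model is derived.
        return "no model"
    if tests.p_parallelism < alpha:
        # Slopes differ between genders: separate models for men and women.
        # (Assumption: identity is not examined once parallelism is rejected.)
        return "gender-specific"
    if tests.p_identity < alpha:
        # Equal slopes but different intercepts: gender enters the model explicitly.
        return "gender-related"
    # A common relationship for both genders could not be rejected.
    return "gender-free"


if __name__ == "__main__":
    # Illustrative p-values only, not taken from the study.
    print(classify_model(GenderComparison(0.001, 0.40, 0.30)))
    print(classify_model(GenderComparison(0.001, 0.40, 0.01)))
    print(classify_model(GenderComparison(0.001, 0.01, 0.01)))
```

Ordering the checks this way mirrors the "successive tests" wording: each later test is only consulted when the earlier hypothesis has not been rejected.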
All multiple regression equations, referred to henceforth as 'models', were limited to a maximum of three test variables so as to restrict the most promising physical performance tests (for nine criterion tasks in total) to a practicable number. Gender was allowed access as an additional variable, coded as '1' for men and '2' for women.

The relationships between each criterion task and the most predictive physical performance tests were explored by plotting three graphs. The first showed the relationship between the single most highly correlated physical performance test and the criterion task. The second showed the relationship between the measured and predicted criterion task scores, illustrating the degree of fit. The third showed the difference between the measured and predicted criterion task scores (i.e. the bias) against their mean, according to the method proposed by Bland and Altman (1986). This third figure exposed any lack of agreement between the measured and predicted scores, which may not have been readily apparent from the second figure. The Bland and Altman plot also allowed an investigation of the relationship between the error in the predicted score and the measured value.

Type I errors (false-negatives) and type II errors (false-positives) identify the misclassification associated with a model. Type I errors occur when an individual passes the level or standard on the actual criterion task (e.g. a 35 kg single lift to 1.45 m, or a 12.8 km march with 20 kg in 120 min) but the predicted score derived from the model fails to achieve the standard. Type II errors occur when the predicted criterion task score derived from the model achieves the standard but the individual fails to achieve the standard on the actual criterion task. The combined type I and type II errors associated with the selected models are cited in the Results.
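As a minimal sketch of how the two error types defined above could be counted, the following Python function compares measured and predicted scores against a pass standard. The function name, its parameters and the example scores are hypothetical. It assumes percentages are expressed relative to the whole sample, which the text does not state, and it takes a flag for tasks where a lower score is better (march times) as opposed to higher (lifted loads).

```python
from typing import Sequence, Tuple


def misclassification_rates(
    measured: Sequence[float],
    predicted: Sequence[float],
    standard: float,
    higher_is_better: bool = True,
) -> Tuple[float, float]:
    """Return (type I %, type II %) using the paper's definitions.

    Type I  (false-negative): passes the actual task, predicted to fail.
    Type II (false-positive): predicted to pass, fails the actual task.
    Percentages are of the whole sample (an assumption of this sketch).
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("measured and predicted must be non-empty and equal length")

    def passes(score: float) -> bool:
        # Loads must reach the standard; march times must not exceed it.
        return score >= standard if higher_is_better else score <= standard

    pairs = list(zip(measured, predicted))
    type_1 = sum(1 for m, p in pairs if passes(m) and not passes(p))
    type_2 = sum(1 for m, p in pairs if passes(p) and not passes(m))
    n = len(pairs)
    return 100.0 * type_1 / n, 100.0 * type_2 / n


if __name__ == "__main__":
    # Made-up single-lift scores (kg) against the 35 kg standard quoted in the text.
    lift_measured = [40.0, 30.0, 36.0, 20.0]
    lift_predicted = [34.0, 37.0, 38.0, 22.0]
    print(misclassification_rates(lift_measured, lift_predicted, 35.0))

    # Made-up march times (min) against the 120 min example; lower is better.
    march_measured = [115.0, 125.0]
    march_predicted = [122.0, 118.0]
    print(misclassification_rates(march_measured, march_predicted, 120.0,
                                  higher_is_better=False))
```

Keeping the pass rule in one helper means the same counting logic serves both the lift and carry tasks and the loaded marches.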
The selection of a particular model for each of the nine criterion tasks was based on the following criteria, listed in order of importance:

• minimizing the standard deviation (SD) (i.e. the variation associated with the model);
• maximizing r² (i.e. the proportion of variance accounted for by the independent variables);
• minimizing the mean error (i.e. the mean difference between the measured and predicted criterion task score); and
• minimizing misclassification rates (i.e. the sum of type I and type II errors).

4. Results

4.1. Criterion tasks

The results for the criterion tasks are summarized in table 4. Of the men, 44% and 16.7% achieved the maximum permissible load of 72 kg on the SL145 and SL170 respectively, distorting the distributions and any descriptive statistics associated with the men's data. The RLC44 task was performed by men only, as no women belonged to those trades allocated to the RLC44, and 17% of these men achieved the maximum time stipulated in the protocol. On the RLC22, 99% of the men and 12% of the women, and on the RLC10, 98% of men and 48% of women, achieved the maximum time, thereby limiting the usefulness of the data for developing models.

Table 4. Mean (SD) results for the criterion tasks.

Criterion task (units)   Male mean (SD)    Female mean (SD)
SL145 (kg)               65.7 (7.00)*      36.3 (9.03)
SL170 (kg)               57.1 (10.26)      29.0 (6.76)
Carry (s)                288 (107.3)       117 (41.0)
RLC44 (s)                563 (448.6)       n/a
RLC22 (s)                3578 (226.8)*     1048 (1118)
RLC10 (s)                3574 (189)*       2311 (1340)
LM25 (min)               103 (10.6)        n/a
LM20 (min)               102 (11.1)        126 (11.0)
LM15 (min)               98 (12.4)         120 (15.6)

SL145 and SL170 are the maximum load (kg) lifted to 1.45 and 1.70 m respectively. RLC44, RLC22 and RLC10 are the maximum duration (s) sustained on repetitive lift and carry tasks, with 44, 22 and 10 kg respectively. LM25, LM20 and LM15 are the time (min) to complete the loaded march tasks with 25, 20 and 15 kg loads respectively.
*Distributions were distorted for these data, caused by a sizeable proportion of the sample achieving the maximum duration of the protocol and, therefore, not achieving maximum performance. RLC44 and LM25 tasks were not performed by female soldiers.

4.2. Factor analysis

Table 5 provides the output from the factor analysis. Several of the performance tests had to be excluded from the analysis (gluteal girth, % body fat and FFM derived from the ElectroLipoGraph device, plantar flexion strength) to retain a valid sample size (n = 272). Coefficients with r > 0.5 identify the performance tests with the strongest associations with these factors. Four factors (Small Size, Overweight, Muscular Strength and Endurance, and Aerobic Fitness) were identified which accounted for 79% of the variance in the test scores.

Table 5. Factor analysis.

Factor number                              1            2            3                             4
Factor name                                Small size   Overweight   Muscular strength/endurance   Aerobic fitness
Cum % of variance                          34.3         51.8         70.9                          79.4

Stature                                    0.929        0.122        0.062                         0.065
Arm span                                   0.912        0.096        0.084                         0.138
Body mass                                  0.706        0.647        0.217                         0.011
Body mass index                            0.161        0.921        0.222                         0.024
Biacromial diameter                        0.820        0.046        0.190                         0.135
Elbow diameter                             0.711        0.085        0.258                         0.087
Neck girth                                 0.668        0.281        0.432                         0.228
Chest girth (at nipple height)             0.478        0.734        0.252                         0.060
Chest girth (below breasts)                0.653        0.509        0.382                         0.202
Waist girth (at umbilicus)                 0.431        0.835        0.092                         0.095
Waist girth (at narrowest)                 0.523        0.728        0.273                         0.084
Skinfolds (sum)                            0.130        0.849        0.177                         0.288
% body fat (skinfolds)                     0.375        0.696        0.329                         0.384
% body fat (Bodystat)                      0.441        0.677        0.357                         0.304
FFM (skinfolds)                            0.843        0.209        0.373                         0.218
FFM (Bodystat)                             0.846        0.273        0.363                         0.147
38 cm upright pull                         0.599        0.110        0.596                         0.187
85 cm upright pull                         0.763        0.045        0.375                         0.119
Back extension strength                    0.580        0.240        0.516                         0.148
Static arm flexion strength                0.470        0.067        0.641                         0.177
Hand grip strength                         0.546        0.112        0.515                         0.045
Hydro-dynamic lift power (0.4–1.0 m)       0.681        0.096        0.586                         0.056
Hydro-dynamic lift power (0.7–1.45 m)      0.613        0.107        0.667                         0.054
Hydro-dynamic lift power (0.71–1.70 m)     0.639        0.096        0.646                         0.057
ILM145                                     0.600        0.164        0.596                         0.191
ILM170                                     0.641        0.181        0.587                         0.195
Static arm flexion endurance               0.378        0.117        0.388                         0.029
Dynamic arm flexion endurance              0.388        0.270        0.577                         0.378
Dynamic shoulder endurance                 0.515        0.178        0.672                         0.325
Sit-ups                                    0.134        0.163        0.153                         0.663
Push-ups                                   0.048        0.222        0.722                         0.373
Pull-ups                                   0.134        0.467        0.637                         0.318
Multistage Fitness Test                    0.390        0.380        0.240                         0.738
VO2max (l·min⁻¹)                           0.754        0.226        0.308                         0.483

Values are the magnitudes of the correlation coefficients; the signs of individual coefficients are not shown.

Inspection of the correlation coefficients between the physical performance tests and Factor 1 revealed that the most highly correlated variables concerned predominantly measures of body size. The measures of stature and arm span had the strongest (negative) correlation with this factor. The negative correlation coefficients indicate that the factor concerned 'Small Size'. Other tests of body size (e.g. elbow and bi-acromial width), FFM, muscle strength (e.g. 85 cm upright pull) and aerobic power (VO2max in l·min⁻¹) were also highly negatively correlated with the factor. Factor 1 accounted for 34% of the variance in the data.

Factor 2 accounted for a further 18% of the variance and included variables measuring aspects of 'Overweight': body mass index, skinfolds and body fat, body mass and body girths. Measures of muscular endurance (e.g. pull-ups and push-ups) and aerobic fitness (e.g. Multistage Fitness Test) were weakly negatively correlated with this factor.

Factor 3 accounted for a further 19% of variance, incorporating several muscular endurance measures (e.g. push-ups, dynamic shoulder endurance) and strength measures (e.g.
static arm flexion, lift power), many of which also had high negative loadings with factor 1 ('Small Size'). Factor 3 was named 'Muscular Strength and Endurance'.

Factor 4 accounted for the last 8% of the explained variance and included only two tests with r > 0.5: maximal aerobic power and sit-ups. This factor was named 'Aerobic Fitness'.

4.3. Relationships between performance on the criterion tasks and the physical performance tests

The relationships between performance on the criterion tasks and the physical performance tests were investigated using the techniques described in Section 3.6. One example is provided for the Single Lift to 1.45 m. Table 6 summarizes the physical performance tests that were most highly correlated with SL145 (p < 0.001 in all cases). The relationship between SL145 and FFM, the most highly correlated physical performance test, is shown in figure 1.

Table 6. Simple correlations for a single lift to 1.45 m.

        Pooled (n = 234)             Men (n = 181)                Women (n = 53)
Rank    Test                r        Test                r        Test               r
1       FFM (skinfolds)     0.859    FFM (Bodystat)      0.638    FFM (Bodystat)     0.618
2       FFM (Bodystat)      0.843    FFM (skinfolds)     0.623    ILM145             0.585
3       VO2max (l·min⁻¹)    0.825    VO2max (l·min⁻¹)    0.609    ILM170             0.573
4       FFM (ELG)           0.807    ILM170              0.595    FFM (skinfolds)    0.563
5       ILM170              0.806    Body mass           0.588    FFM (ELG)          0.546

Figure 1. Relationship between SL145 and FFM.

The best SL145 model included back extension strength, FFM, ILM145 divided by body mass, and gender, producing an r² = 0.88 and SD = 6.93 kg (table 7). The model is displayed in figures 2 and 3. Figure 2 shows the relationship between the directly measured SL145 scores and the predicted scores derived from the SL145 model. A difference in the slope of the men's and women's regression lines, which is clearly discernible in figure 2, necessitated the inclusion of the variable 'gender'. The cluster of data points at the measured load lifted to 1.45 m of 72 kg reflects the fact that 47% of males reached the maximum permissible load.

Figure 3 shows the degree of agreement between measured and predicted SL145 scores according to the methods of Bland and Altman (1986). The mean difference between measured and predicted scores was −2.7 kg and the SE of the difference was 0.38 kg. The limits of agreement were +9.8 to −15.3 kg. Among the women, the model tended to over-predict the lower scores and under-predict the higher scores.

Type I and type II errors expressed as a percentage are shown in table 8. For SL145, combined type I and type II errors totalled 0 and 22.2% for the men and women respectively at the level 2 standard (35 kg), and 0 and 4.8% for the men and women respectively at the level 3 standard (20 kg).

The preceding series of statistical procedures relating criterion task and physical performance test scores was repeated for each of the nine criterion tasks; the preferred models are shown in table 7 and the percentage of type I and type II errors in table 8. Not all of the criterion tasks have models for pooled, male and female samples, owing to the absence of women from certain trades and to the proportion of participants who achieved the maximum duration of some criterion tasks, as stated in Section 4.1. A natural logarithmic transformation (ln) of the carry data was performed to remove the heteroscedastic variation.

A summary of those physical performance tests that appear most frequently as the best predictors of criterion task performance is provided in table 9. Among the measures of anthropometry, stature (C, LM20), arm span (SL170, C), bi-acromial width (SL170, RLC22, LM25) and body mass (SL145, SL170) were among the best predictors in two or more criterion tasks. The body composition measures of percentage body fat (SL145, SL170, LM20) and FFM (SL145, SL170, C, LM15) both featured frequently.
The static strength tests of 38 cm upright pull (SL170, C, RLC22, RLC10), 85 cm upright pull (C, RLC22, LM20), back extension (SL145, SL170, RLC10) and hand grip (C, RLC22) were among the best predictors. Both dynamic lift tests of hydro-dynamic lift power (SL170, RLC44, LM20) and the ILM (SL145, SL170, C, RLC44, RLC10, LM20, LM15), and the muscular endurance measures of static arm flexion (LM25, LM15), dynamic arm flexion (C, RLC44, RLC22, RLC10, LM25), sit-ups (RLC44, LM15) and pull-ups (C, LM20), were among the best predictors in two or more criterion tasks. Indices of aerobic power derived from the Multistage Fitness Test (SL145, SL170, C, LM25, LM20, LM15) were also among the best predictors in many of the criterion tasks.

Table 7. Models for predicting performance on the criterion tasks.

Task and sample   Model                                             p          r²     SD     n
SL145 pooled      + 0.017 * back extension                          < 0.001    0.88   6.93   271
                  + 0.999 * FFM (skinfolds)                         < 0.001
                  + 6.706 * ILM145/body mass                        < 0.001
                  − 6.013 * gender                                  < 0.01
                  − 13.2
SL170 male        + 0.011 * 38 cm upright pull                      < 0.001    0.59   7.59   222
                  + 0.829 * FFM (Bodystat)                          < 0.001
                  + 0.014 * back extension strength                 < 0.01
                  − 22.5
SL170 female      + 0.930 * FFM (skinfolds)                         < 0.001    0.40   4.85   63
                  + 5.817 * ILM145/body mass                        < 0.05
                  − 19.1
ln C pooled       + 0.022 * pull-ups                                < 0.001    0.70   0.30   232
                  + 0.022 * arm span                                < 0.001
                  + 0.019 * ln dynamic arm flexion endurance        < 0.001
                  − 0.174 * gender                                  < 0.05
                  + 0.35
RLC44 male        + 1.527 * lift power (0.4–1.0 m)                  < 0.001    0.55   321    29
                  − 606.689 * ILM170/body mass                      < 0.05
                  + 0.027 * sit-ups × body mass                     < 0.05
                  + 406.0
RLC22 female      + 16.51 * dynamic arm flexion endurance           < 0.001    0.45   627    53
                  + 3.284 * hand grip strength                      < 0.05
                  − 1440.1
RLC10 female      + 2.608 * 38 cm upright pull                      < 0.001    0.38   565    25
                  − 801.1
LM25 male         − 19.765 * VO2max (l·min⁻¹)                       < 0.001    0.40   8.46   94
                  + 0.530 * body mass                               < 0.001
                  − 0.052 * static arm flexion endurance            < 0.01
                  + 142.7
LM20 pooled       − 0.072 * Multistage Fitness Test                 < 0.001    0.55   9.48   100
                  + 14.134 * gender                                 < 0.001
                  + 132.7
LM15 pooled       − 0.108 * Multistage Fitness Test                 < 0.001    0.75   8.96   82
                  − 11.661 * ln static arm flexion endurance        < 0.001
                  − 0.534 * % body fat (skinfolds)                  < 0.05
                  + 233.4

The models to predict criterion task performance and their associated statistics are shown. Column 1 lists the criterion tasks and the sample for which the model was valid (pooled, male or female). Column 2 lists the equation derived from the stepwise regression procedure. Column 3 gives the p-value associated with each variable in the equation. Columns 4–6 provide the correlation coefficient squared, SD and number of cases used in the equation respectively.

Figure 2. Relationship between measured and predicted SL145 scores. The relationship is shown between the directly measured SL145 scores (maximum load lifted to 1.45 m, up to 72 kg) and the predicted scores derived from the SL145 model displayed in table 7 (SL145 = 0.017 * back extension strength + 0.999 * FFM (skinfolds) + 6.706 * ILM145/body mass − 6.013 * gender − 13.2).

Figure 3. Degree of agreement between measured and predicted SL145 scores. The mean of the measured and predicted criterion task scores (x-axis) is shown against the difference of the scores (y-axis) (Bland and Altman 1986), exposing the degree of agreement between the measured and predicted scores, and the relationship between prediction error and measured value.
The mean difference between measured and predicted scores was −2.7 kg; the SE of the difference was 0.38 kg. The limits of agreement were +9.8 to −15.3 kg. Among the women, the model tended to over-predict the lower scores and under-predict the higher scores.

Table 8. Percentage of type I and type II errors associated with each model.

Type I errors (%) Type II errors (%) Task Sample Level Pooled Men Women Pooled Men Women SL145 Pooled 2.2 0 3.0 1.1 0 0 5.0 12.7 4.8 Men Women Pooled 0 0 2.7 9.5 0 SL170 SL170 C 4.2 0 0 1.3 2.9 0 Men Women Women Men Pooled Pooled 0 0 0 13.8 0 1.9 0 0 0 1.6 16.7 0 0 RLC44 RLC22 RLC10 LM25 LM20 LM15 2 3 1 1 1 2 3 1 2 3 1 2 3 8.9 4.3 3.2 5.1 29.4 11.4 3.0 3.7 1.1 0 0 3.4 0 0 5.3 2.4 2.1 3.2 3.0 5.0 4.9 0 1.7 11.7 0 8.9 10.9 5.9 5.7

Provided is the percentage of type I errors (false-negatives) and type II errors (false-positives) by gender and level, indicating the proportion of people who were misclassified when criterion task performance was predicted using the models shown in table 7.

In terms of the frequency with which the physical performance tests were among the best predictors of criterion task performance, the Multistage Fitness Test and the ILM145 were among the most related tests in six of the nine criterion tasks, dynamic arm flexion endurance in five, FFM and 38 cm upright pull in four, and bi-acromial width, body mass, 85 cm upright pull, ILM170 and back extension strength in three. The remaining physical performance tests were among the best predictors of criterion task performance in two or fewer of the criterion tasks.

Using the methods and criteria defined in Section 3.6, pooled gender models were formulated for SL145, C, LM20 and LM15. Only the LM15 model was gender-free. The SL145, C and LM20 models were gender-related in that they contained gender explicitly in the model.
Models for the remaining criterion tasks were gender-specific, either because the development of a gender-free or gender-related model proved elusive (e.g. SL170), or because usable data for both males and females on the criterion task were not available (e.g. RLC44, RLC22, RLC10, LM25).

Table 9. Summary of simple and multiple correlations between physical performance test and criterion task scores.

Test/task SL145 SL170 C Stature W Arm span W PM W3 Body mass M3 M Biacromial width W % body fat (skinfolds) % body fat (Bodystat) % Body fat (ELG) PW FFM (skinfolds) PM W3 PM W3 P FFM (Bodystat) PM W PM W3 FFM (ELG) PM 38 cm upright pull 3 MW 85 cm upright pull PM Back extension 3 2 strength Hand grip strength PM Plantar flexion strength Lift power (0.4–1.0 m) Lift power PM (0.7–1.45 m) Lift power (0.7–1.70 m) ILM145 W3 W1 W ILM170 PM W Static arm flexion endurance Dynamic arm 3 flexion endurance Dynamic shoulder endurance Sit-ups Push-ups Pull-ups 3 Multistage Fitness PM P PM Test RLC44 RLC22 RLC10 LM25 LM20 LM15 F 3 M W P P 1 W W 3 PM W 1 M M3 M PM W M W 1 M2 M W3 W W PM 3 M M 1 MW PM M3 W PMW 3 PM W3

The relations are summarized between the criterion task and physical performance test performance. P (pooled), M (men) and W (women) appear if the physical performance test score produced one of the five highest correlation coefficients. 1 (p < 0.05), 2 (p < 0.01) and 3 (p < 0.001) refer to the highest level of significance achieved by a physical performance test in any model (e.g. the code PMW3 indicates the strongest relation between the criterion task and physical performance test scores).

5. Discussion

5.1. Introduction

The objective was to identify a battery of physical performance tests that could be best used to predict performance on the nine identified criterion tasks in trained soldiers. The models should predict performance as accurately as possible, and ideally be 'gender-free' (to provide common tests and standards for men and women) and be 'gender-unbiased' in their outcome (not to misclassify disproportionately either gender).

The implication underpinning the validity of gender-free models is that the relationship between physical performance test and criterion task performance is fundamentally the same in both genders, varying in magnitude rather than in quality. The underlying physiological assumption is that men and women perform physical tasks in a comparable fashion, utilizing the same energy systems, muscle groups and ranges of movement; task performance could largely be explained by the same physical attributes and membership of one or other gender would, therefore, be immaterial. This preference for gender-free predictors resulted in the stepwise regression analysis being run on pooled gender data in the first instance. The validity of the derived models was then assessed by testing for parallelism and identity.

In selecting a specific model to predict each criterion task, the four criteria described in Section 3.6 were considered. Inevitably, compromises had to be reached to optimize the fulfilment of all criteria. For example, in selecting a preferred model, subjective judgement was used to determine if a model with a given SD and r² was superior to an alternative model with a marginally larger SD but substantially larger r².

5.2. Sample

The participant sample suffered from several limitations.
The 379 participants nominated by the British Army reflected a compromise between the representative sample of men and women from all trades in the Army requested by the authors and the availability of personnel who could be released from their duties for up to 7 days. Women were under-represented in the study in both overall numbers and in specific trades that were required to perform certain criterion tasks (e.g. RLC44, LM25). The commitment exhibited by some participants in providing maximum safe effort on all of the criterion tasks and performance tests was questionable at times. Further, data were lost due to injury and absence from testing appointments, which reduced the data pool and weakened the statistical analyses. In short, although the extent to which the final sample was representative of other trained personnel was not known, a validation study on a different population of potential users (recruits), which is described in a separate paper, will resolve these uncertainties and unravel any biases.

5.3. Predictors of single lift tasks

5.3.1. Single predictive tests: The highest correlation coefficients between the physical performance test and the lift task scores in this study (r = 0.86–0.87) were larger than many reported elsewhere (Sharp et al. 1980, Stevenson et al. 1985, 1988), though not as large as those reported by some authors (Beckett and Hodgdon 1987, Nottrodt and Celentano 1987). The finding that the coefficients were higher for men than for women, particularly for SL170, was consistent with previous studies (e.g. Myers et al. 1984, Teves et al. 1985).

The strength of the correlation analyses would have been weakened by two factors in this study. First, the imposition of a 72 kg maximum load prevented discrimination between men at the top end of the distribution and led to the authors using a maximum likelihood procedure to estimate scores in these men.
An alternative option, to discard these 44% of males from the data analysis, was explored and judged to be less attractive. The planned validation study will test the validity of these models and hence the appropriateness of using this statistical method in this circumstance. Second, the use of relatively large load increments reduced the sensitivity of the data, especially at the lower end of the distribution among the women. Both of these weaknesses will be overcome by modification of the protocols for the subsequent validation study.

FFM, an index of whole-body muscularity, was the single most important predictor test of performance on both single lift tasks, irrespective of the sampling method (pooled, men or women). Of the studies in the literature that considered both strength and body composition measurements, this finding is consistent with only one (Nottrodt and Celentano 1987), though a second study (Teves et al. 1985) showed both FFM and ILM test scores to be equally well correlated with maximum load lifted. Other studies have shown the ILM test to be a better predictor than FFM of maximum box lifting (Ayoub et al. 1982, Myers et al. 1984, Beckett and Hodgdon 1987).

Our study incorporated several strategies that should have enhanced the predictive power of the ILM test above that observed in previous studies. This study was the first to match the hand heights of lift during the ILM tests and the lift criterion tasks. Constrained ILM lifting techniques had been the norm in previous studies, but because of the adverse impact of such constraints on women (Stevenson et al. 1996) freestyle techniques were permitted in this study. Further, the recommendations of Stevenson et al. (1996) to coach participants in power lifting techniques, and to administer lighter starting loads and smaller increments for women, were also adopted.
However, despite these modifications, FFM remained the stronger predictor. This finding might be explained by the fact that FFM, as a measurement requiring no skill on the part of the participant, provided a more transferable indicator of lifting performance than do the skill-dependent dynamic lifting tests. The concept of a 'generic' lift task may be a misnomer and the extent to which skill may be transferred between a superficially similar lift task and a dynamic lift test may be smaller than anticipated. Whatever the underlying mechanism, these findings strongly support the contention of Vogel (1992) that minimum FFM standards have an important place in selecting military personnel.

Although FFM was the best predictor of single lift score in both men and women, the next best predictor was different between genders. For SL145, the ILM test was among the best five predictors for both men and women. In men, estimated V̇O2max (l.min⁻¹) and body mass were also important. For SL170, body mass was an important predictor in men, while ILM145 and measures of skeletal size, most notably biacromial width and arm span, were important in women. The greater importance of anthropometric tests to SL170 might be expected given that the task involved lifting to a greater height.

5.3.2. Single lift to 1.45 m model: A series of models affording an acceptable fit between performance on the physical performance tests and SL145 was produced. The best model (table 7) included tests of muscle strength and body size and composition (namely, back extension strength, FFM, ILM145/body mass and gender), which accounted for 88% of the variance. FFM and ILM145 were the most significant contributors to the model. This finding was in line with the work of Nottrodt and Celentano (1987), who recommended a model comprising FFM and ILM183 to predict maximal load lifted to 1.33 m.
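One way to see why a gender term may be needed in a pooled model is to fit a separate least-squares line for each gender and compare the slopes. The sketch below does this for a single predictor; the FFM and lift values are invented for illustration and `fit_line` is a hypothetical helper, not part of the authors' analysis.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical FFM (kg) and maximum single lift (kg) for each gender.
men_ffm, men_lift = [55, 60, 65, 70], [50, 58, 66, 74]
women_ffm, women_lift = [40, 44, 48, 52], [28, 32, 36, 40]

men_slope, men_intercept = fit_line(men_ffm, men_lift)
women_slope, women_intercept = fit_line(women_ffm, women_lift)
slopes_differ = abs(men_slope - women_slope) > 0.1  # arbitrary illustrative threshold
```

With these invented data the men's line is steeper than the women's, so a single gender-free line would systematically misfit one group; a formal comparison would test the gender-by-predictor interaction rather than use a fixed threshold.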
A difference in the slope of the men's and women's regression lines necessitated the inclusion of the variable 'gender' in the equation and made a gender-free model invalid. The SL145 model resulted in a small mean error of prediction (−2.7 kg) with a 95% CI = 13 kg, equating to 21% of the mean SL145 score. This model compares favourably with other lift models in the literature (see Section 1.3). Only the model devised by Nottrodt and Celentano (1987) accounted for a higher proportion of the variance. Further, proposed modifications to the single lift task protocols should improve the fit and reduce the variance of the model.

The relatively small mean misclassification rates of 5% for soldiers allocated to level 2, and 1% for soldiers allocated to level 3, masked considerable variation between genders. Misclassification rates for the men were zero at both levels (all men substantially surpassed both standards), while for the women, misclassification rates were 22% at level 2 and 5% at level 3. The greater misclassification in women was due partly to the larger error in prediction in women's scores and partly to a greater clustering of women's scores around the pass standard. The model was, therefore, gender-biased in its outcome (i.e. more women were misclassified than men), resulting in indirect discrimination against women.

5.3.3. Single lift to 1.70 m model: Differences in the slopes and intercepts of the lines resulted in the formulation of gender-specific models for SL170. The best men's model (table 7) included tests of muscle strength and body composition (38 cm upright pull, back extension strength and FFM), accounting for 59% of the variance. The mean error of the model was effectively zero and the 95% CI = ~14 kg, equating to 24% of the mean men's score. Misclassification rates for the men allocated to level 1 totalled 8%.
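The effect of score clustering on misclassification can be illustrated with a short calculation. The sketch below counts a participant as misclassified when the measured and predicted scores fall on opposite sides of the pass standard; the 30 kg standard, the scores and the function name are hypothetical and are chosen only to show the mechanism.

```python
def misclassification_rate(measured, predicted, standard):
    """Fraction of participants whose predicted pass/fail differs from measured."""
    wrong = sum(
        (m >= standard) != (p >= standard) for m, p in zip(measured, predicted)
    )
    return wrong / len(measured)


standard_kg = 30  # hypothetical pass standard
errors = [4, -3, 2, -4, 3, -2]  # identical prediction errors for both groups

# Group whose measured scores lie well above the standard.
far_measured = [60, 62, 65, 58, 70, 66]
far_predicted = [m + e for m, e in zip(far_measured, errors)]

# Group whose measured scores cluster around the standard.
near_measured = [28, 31, 29, 33, 27, 32]
near_predicted = [m + e for m, e in zip(near_measured, errors)]

rate_far = misclassification_rate(far_measured, far_predicted, standard_kg)
rate_near = misclassification_rate(near_measured, near_predicted, standard_kg)
```

Although both groups carry exactly the same prediction errors, no one in the first group is misclassified whereas five of the six participants in the second group are, which mirrors the pattern described for men and women.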
The best women's model (table 7) also included tests of muscle strength and body size and composition (FFM and ILM145/body mass), accounting for 40% of the variance. The mean error of the model was also zero and the 95% CI = ~10 kg, equating to 33% of the mean women's score. Misclassification rates for the women were very low at 1.6%, but the reason for this small misclassification was that only one woman achieved the 44 kg standard and most scored substantially below.

5.3.4. Static versus dynamic strength tests: In this study, the dynamic strength tests were superior to the static tests as predictors of maximum lifting performance (Mital et al. 1986b). The ILM test was a better single predictor of SL145, and the hydrodynamic lift power and ILM scores were better single predictors of SL170, than were the static lift tests (38 and 85 cm upright pull), irrespective of the method of sampling. The test of static upright pull did, however, feature prominently in the SL170 model for men, and the static tests were considerably simpler and quicker to administer.

5.4. Predictors of carry task

5.4.1. Single predictive tests: The repeatability of the carry task was found to be unacceptable (Rayson and Holliman 1995) when the reliability of the criterion tasks was assessed by administering them to a subsample of soldiers. The mean score on the second test occasion was 34 s (17%) lower than on the first attempt (p < 0.001), and this may partly account for the lower correlation coefficients produced between physical performance test scores and the carry task (r = 0.7). The reasons for the poor reliability are uncertain. When a work-load is sufficiently light to be maintained for long periods, the factors which lead to the decision to terminate the test vary between individuals. A high degree of motivation is key to ensuring maximal voluntary performance and this can never be assured, even in volunteer participants.
The fatigue mechanisms are not well understood, though there is evidence of both centrally and peripherally mediated fatigue (Jones and Round 1992). Tests involving higher loads that can be sustained only for short periods of time are more consistent and probably more accurately reflect true peripheral limitation rather than central factors (Evans et al. 1983). This may also explain the better predictive power of hand grip strength over hand grip endurance in predicting the carry task in the pilot study (Rayson et al. 1994b).

The poor reliability of the carry task failed to instil confidence in the robustness of any predictive model which could be derived and raised a question as to whether the carry should be eliminated or modified as a criterion task. This type of continuous, prolonged carry (e.g. stretcher carrying, water and fuel can carrying) was reported as a common activity among soldiers of many trades (Rayson 1998), so eliminating it as a criterion task was not a desirable option. Consequently, despite the poor reliability, the relationships with the physical performance tests were investigated to gain an insight into the best predictors of carry performance.

The physical performance test that best predicted the carry task varied by gender. Higher coefficients among women than men have been reported for carry tasks (Stevenson et al. 1988, Rice and Sharp 1994): this finding was substantiated in this study. For men, a measure of muscle strength (85 cm upright pull) was the strongest predictor, followed by a measure of body size (arm span) and a further measure of strength (hand grip). For women, there was little to differentiate between the predictive power of measures of strength (mean hydrodynamic lift power and 38 cm upright pull) and body size (arm span).
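Ranking single predictors within a subgroup, as described above, amounts to computing a correlation coefficient between each physical performance test and the criterion task and sorting by its magnitude. The following sketch assumes Pearson's r and uses invented scores for six participants; the test names are taken from the text, but the values and helper functions are hypothetical.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_predictors(tests, criterion, top=5):
    """Return the `top` tests ordered by absolute correlation with the criterion."""
    scored = [(name, pearson_r(values, criterion)) for name, values in tests.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top]


# Hypothetical carry scores (s) and test scores for one subgroup.
carry_s = [100, 120, 140, 160, 180, 200]
tests = {
    "sit_ups": [40, 30, 45, 35, 42, 38],
    "arm_span": [170, 168, 175, 178, 176, 182],
    "upright_pull_85cm": [50, 60, 70, 80, 90, 100],
}
ranked = rank_predictors(tests, carry_s)
ranked_names = [name for name, _ in ranked]
```

Running the same ranking separately on the pooled, men's and women's data would show whether the leading predictors differ by gender, which is the comparison drawn in the text.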
No directly comparable carry tasks were found in the literature, largely because the majority of carry tasks in previous studies involved intermittent rather than continuous work and in some respects were more akin to our RLC task than the carry (e.g. Ayoub et al. 1982, Beckett and Hodgdon 1987). A few tasks were broadly similar (Stevenson et al. 1985, 1988, 1992, 1994, Rice and Sharp 1994) but the ingredients of the test batteries were sometimes very restricted, limiting any conclusions that may be drawn. Tests of upright pull have been extensively cited as strong predictors of carry tasks and tests of lift strength/power have also been reported as important predictors in some studies, though not in others (see Section 1.4).

The importance of hand grip strength to carry task performance has considerable face validity and has been reported extensively. The most acute problem for participants appeared to be maintaining a grip on the cans, which is associated with local muscular fatigue of the hand and wrist flexors (Lind and McNicol 1968, Kearney and Stull 1981). Legg and Patton (1987) had found that 8 days of sustained manual work and sleep deprivation resulted in a reduction in hand grip strength in soldiers, though the inclusion of heavy material handling did not appear to impact on the reduction in hand grip performance. Time to fatigue is dependent upon the proportion of maximum tension exerted by the muscle, the relationship being hyperbolic (Evans et al. 1983). Fatigue is induced in sustained contractions > ~15% maximum voluntary contraction (MVC), above which the blood supply to the muscle is occluded. The 20 kg load in each hand during this carry task represented > 15% MVC for all participants. Despite the importance of hand grip strength as a predictor in the pooled and men's sample, the test score was noticeably absent as an important predictor for the women.
The reason for this absence of relation is uncertain but may concern the small range of MVC in women.

Indices of aerobic power have also been cited as contributing to carry performance in a number of studies (see Section 1.4), though the controlled pace of walking at 1.5 m.s⁻¹ in this study would have reduced the importance of aerobic fitness to performance. Interestingly, estimated V̇O2max (l.min⁻¹) was an important predictor in both the pooled and men's sample. However, the selection of an index of aerobic fitness recorded in l.min⁻¹, rather than a body mass related index (e.g. ml.kg⁻¹.min⁻¹), suggests that the relationship in men may be more a function of muscle mass than of aerobic fitness (Myers et al. 1984). Other studies, however, have found FFM to be unrelated to carry performance (Stevenson et al. 1985, Beckett and Hodgdon 1987).

5.4.2. Carry model: To eliminate the heteroscedastic variation in the carry data, a natural logarithmic (ln) transformation was applied. The best model (table 7) included tests of muscle endurance and body size (pull-ups, arm span and dynamic arm flexion endurance), accounting for 70% of the variation. Differences in the slopes for men and women necessitated the inclusion of 'gender' in the model. The two most significant physical performance tests were dynamic arm flexion endurance and arm span. Neither measure had been considered in previous studies, though related tests (e.g. stature and pull-ups/flexed arm hang) had been included and several studies had reported their relevance (Stevenson et al. 1985, 1988, Beckett and Hodgdon 1987). The mean error of the model was zero but the 95% CI = 0.57 (ln seconds), equating to ~57% of the mean carry score. Again, mean misclassification rates of < 5% for the three levels masked some gender differences.
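Because the carry model was fitted to ln-transformed times, the 95% CI of 0.57 ln seconds acts as a multiplicative band on the original time scale. The sketch below back-transforms that band; the function name is illustrative, and reading the quoted ~57% as a first-order approximation is our interpretation rather than a calculation reported by the authors.

```python
import math


def ln_band_to_ratios(half_width_ln):
    """Convert a +/- band on the ln scale into lower and upper ratios."""
    return math.exp(-half_width_ln), math.exp(half_width_ln)


lower_ratio, upper_ratio = ln_band_to_ratios(0.57)

# Express the band as percentage departures from a predicted carry time.
percent_below = (1.0 - lower_ratio) * 100.0
percent_above = (upper_ratio - 1.0) * 100.0
```

On this reading, a predicted carry time carries a band from roughly 43% below to roughly 77% above the prediction, so the interval is asymmetric in seconds even though it is symmetric in ln seconds.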
Misclassification rates were higher for women than for men allocated to level 1 (15 versus 1%) and level 2 (12 versus 0%), resulting in gender bias and indirect discrimination against women.

The proportion of variance accounted for by the best model (70%) failed to match that of the model derived in the pilot study (82%, Rayson et al. 1994b), but it was comparable or superior to the pooled gender models produced by Myers et al. (1984) and Beckett and Hodgdon (1987).

5.5. Predictors of repetitive lift tasks

The limitations of the repetitive lift data (see Section 4.1) made the analyses and interpretation of the data difficult. The absence of female data for the RLC44 and the finding that virtually all of the males achieved the maximum duration on the RLC22 and RLC10 resulted in the analyses being run on single gender data, thereby denying the possibility of developing gender-free or gender-related models.

5.5.1. Repetitive lift with 44 kg: The most predictive single physical performance test in the exclusively male sample was a power to body mass index (mean lift power from 0.7 to 1.0 m divided by body mass). The only other single predictive test worthy of mention was a test of muscle endurance, dynamic arm flexion. The best model (table 7) included tests of muscle strength and endurance (mean lift power, ILM170 divided by body mass, and sit-ups multiplied by body mass), accounting for 55% of the variance. Misclassification rates totalled 17% for the men. The model produced a relatively large SD, resulting in a 95% CI = ~600 s, equivalent to 100% of the mean score.

There were no tasks in the literature involving lifting similar loads to similar heights with which to make direct comparisons. The closest were probably the box carry tasks reported by Beckett and Hodgdon (1987) and Mello et al. (1995), involving objects of similar dimensions with loads of ~35 kg.
However, the task used by Beckett and Hodgdon did not involve lifting and was really a maximum effort interval carry. Not surprisingly, an index of aerobic fitness was the best predictor test, though upper body endurance measured via pull-ups and push-ups also correlated highly. Beckett and Hodgdon's (1987) model included aerobic fitness and a measure of explosive power (either broad jump or dynamic lifting). The task used by Mello et al. (1995) involved identifying the maximum load that could be sustained for 1 h of repetitive lifting and carrying. Mean arm power on a Wingate test provided the highest correlation in the pooled gender sample, followed by bench press. The best model included stature and bench press, and the model selected by the authors included arm power only. Unfortunately, the study by Mello et al. was published after completion of this study, and neither the Wingate nor bench press tests had been included (though push-ups, which are closely related to bench press, were included) as they appeared to have little functional relevance to our criterion tasks. However, in all three studies cited, the relevance of upper body strength and endurance was apparent.

5.5.2. Repetitive lift with 22 kg: The best single predictor tests included measures of muscle strength, endurance and body size (ILM145, 38 cm upright pull, dynamic arm flexion endurance and biacromial width) on the exclusively female sample. The best model (table 7) included one test of strength (hand grip) and one test of endurance (dynamic arm flexion), accounting for 45% of the variance. The 95% CI was an unsatisfactory 1800 s, equivalent to ~68% of the mean task score. The misclassification rates totalled 18% among the women.

Parallels may be drawn between the RLC22 task and the 20 kg sand bag carry used by Stevenson et al. (1988, 1992, 1994), and the repetitive box lifts used by Mello et al.
(1995) and by Aghazadeh and Ayoub (1985). Aghazadeh and Ayoub's paper lacks details of the relationships between the static and dynamic strength tests, and task performance. However, they reported isokinetic lift scores to be superior to static strength scores for predicting floor to shoulder height lifting for an 8 h shift. The best static strength tests involved shoulder and leg muscles. Mello et al. (1995) reported FFM of the arm, whole-body FFM, gender and stature to be the best single predictors of floor to shoulder height lift performance for an 8 h shift in a pooled-gender sample. The best model from the present study included none of these physical performance test scores. Neither FFM nor stature was among the most powerful predictors in the present study.

Stevenson et al. (1988, 1992) allow more detailed comparisons of findings. The 1988 study, using a sand bag carry in younger women, reported maximum lifting, flexed arm hang, and sit-ups to be the best predictor tests. Neither pull-ups nor sit-ups were good predictors in this study, but the ILM test was the single best predictor. Stevenson et al.'s model for older women (1988) included flexed arm hang, ILM and V̇O2max, which provided a lower r² than the model in this study. The 1992 study, also using a sand bag carry in younger women, reported V̇O2max to be the only significant predictor. Other tests of importance in men and older women included hand grip strength and endurance.

5.5.3. Repetitive lift with 10 kg: The best single predictor tests included measures of strength and endurance (dynamic arm flexion endurance, ILM145 and 38 cm upright pull) on the exclusively female sample. Even the best model produced an unacceptable prediction of performance, involving 38 cm upright pull only (table 7).
Although misclassification levels were moderate (8% overall, 16% for women), the equation could account for only 38% of the variance, had a mean prediction error of nearly 1000 s and a 95% CI > 2200 s, equivalent to 75% of the mean scores. No tasks comparable with RLC10 have been reported in the literature.

Data from this study provide mixed support for the contention of Mital et al. (1986a, b) that repetitive dynamic tests are superior to strength measures (either static or dynamic) in predicting sustained lifting performance. In this study, the test of lift power provided the highest single predictor of RLC44 by a considerable margin. However, the test of dynamic arm flexion endurance featured as a prominent predictor of all three repetitive lift tasks, supporting the importance of tests of dynamic muscular endurance, especially with moderate and lighter loads.

It is the proportion of MVC that the load represents which largely determines the relative contributions of muscle strength and endurance. The negative exponential relationship between percentage maximum load and sustainable duration will determine the score on the task (Legg and Pateman 1984). If the load reflects a high percentage of an individual's maximum strength, then maximum duration will be severely limited. Conversely, if the load reflects 50% or less of an individual's maximum capacity, then it is likely that that individual would be able to sustain the intermittent activity for some time.

5.6. Predictors of loaded march tasks

5.6.1. Single predictive tests: Only Infantry soldiers (exclusively male) were allocated to LM25 and consequently no women performed this criterion task. For LM20 and LM15, the best predictors of the march tasks differed somewhat by gender.
The largest correlation coefficients for the women (0.66) were significantly higher than for the men (0.5) for LM20, though this did not hold true for LM15, where coefficients were of a similar order for both genders (0.70). As expected from the literature (see Section 1.5), indices of aerobic fitness dominated the simple correlations and the models for the three loaded march tasks. Estimated maximal aerobic power (V̇O2max (l.min⁻¹)), derived from the Multistage Fitness Test and body mass, was substantially the best predictor for LM25 in men, while the Multistage Fitness Test score itself was the best predictor of LM20 in the pooled and men's samples and of LM15 regardless of how the participants were classified (pooled, male or female). Additional strong predictors of loaded march performance included measures of muscle strength (85 cm upright pull, lift power, ILM test) and endurance (dynamic and static arm flexion, push-ups). In women, stature was also important.

5.6.2. Loaded march models: The best march models for each of the three criterion tasks included an index of aerobic fitness plus one or more additional physical performance tests. The LM25 model (table 7) included estimated V̇O2max (l.min⁻¹), body mass and static arm flexion endurance, which accounted for 40% of the variance. The LM20 model (table 7) included the Multistage Fitness Test and gender, accounting for 55% of the variance. The LM15 model (table 7) included the Multistage Fitness Test, static arm flexion endurance and percentage body fat, accounting for 77% of the variance. The 95% CI were all between 16 and 18 min, which equated to ~16% of the mean march times.

The misclassification rates were variable between the march tasks and genders. The misclassification rates for men were low (2–5%) for all models, largely because the vast majority of men achieved the required standards of 120 min, many by a considerable margin.
However, the misclassification rates for women were high (35% for LM20 and 18% for LM15), resulting in gender bias and indirect discrimination against women. The relatively high percentage misclassification rate was mainly a result of the women's scores being distributed around the pass standard.

Recent studies by Rayson et al. (1993) on female soldiers and by Frykman and Harman (1995) on male soldiers identified stature as a good predictor of the ability to march with a load. Interestingly, stature appeared important for women in LM15, yet not in LM20, nor in any of the marches for men. Ankle plantar flexion strength had also been shown to be a useful predictor (Knapik et al. 1990, Rayson et al. 1993), but the test did not appear in any of the models with the highest predictive power.

The relationship of percentage body fat and FFM to loaded march performance is intriguing. Several authors have shown a negative relationship between fatness and march performance (Dziados et al. 1987, Rayson et al. 1993), though there is little consensus as to the extent of the impact of fatness. Only in the pooled-gender data for LM20 does fatness appear as a prominent predictor in this study, though there is also some evidence of fatness degrading performance in the men's data for LM20 and LM15. Strangely, in the pooled-gender LM15 model, a greater level of body fat appears to be associated with enhanced march performance. This finding might be related to the restricted range of values of percentage body fat in this sample and the absence of morbidly obese people.

5.7. Summary

The objectives of this study (to identify the most predictive physical performance tests of criterion task performance in trained British Army personnel and to develop gender-free and gender-unbiased procedures) have been partly met.
Prediction models have been generated for all criterion tasks and the most predictive physical performance tests have been identified. However, the accuracy with which criterion task scores could be predicted varied considerably and we were only partially successful in developing gender-free and gender-unbiased models. Only one model (LM15) was gender-free. Three further models were gender-related (i.e. contained 'gender' explicitly in the model) and the remaining six were gender-specific (i.e. were appropriate to men or to women only). Owing mainly to the larger errors associated with predicting the women's scores and the finding that women's scores had a greater tendency than the men's to be distributed around the pass standard, many of the models were gender-biased in their outcome, resulting in a greater proportion of misclassifications and, therefore, indirect discrimination against women.

Both single lift criterion tasks were successfully modelled using physical performance tests. A 'gender-related' model was derived for SL145, but 'gender-specific' models had to be derived for SL170. All single lift models involved measures of muscle strength (some relative to body mass) and FFM. The level of agreement between measured and predicted criterion task scores was acceptable and should be improved further by modifying the criterion task protocols.

The carry criterion task was modelled using a gender-related model but the errors were large, possibly due to the unsatisfactory reliability of the carry task. The best model included measures of muscle endurance and body size. The misclassification rates were gender-biased, resulting in indirect discrimination against women. Unless an improved carry model can be derived using more highly motivated participants, options other than using a prediction model will have to be considered for the carry task.
Alternative strategies include eliminating performance on the carry task as an entry criterion, or administering the criterion task itself.

The data generated for all three repetitive lift criterion tasks were weak due to inadequate task protocols. Models were derived on single-gender data but they were unsatisfactory in their current form, due mainly to the large CI. However, measures of muscle strength and endurance and body size were identified as predictors of repetitive lift and carry performance. Extending the maximum duration of the task protocols in the subsequent validation study should generate better data distributions and permit new, improved models to be derived.

All of the loaded march criterion tasks were successfully modelled incorporating indices of aerobic fitness, usually supplemented by measures of strength, endurance or body size and composition. An absence of female data inevitably resulted in the generation of a single-gender model for LM25. A gender-related model was developed for LM20, while a gender-free model was appropriate for LM15. The level of agreement between measured and predicted criterion task scores was acceptable.

Now that a series of models has been developed to predict performance on the nine criterion tasks, the models must be validated in a separate sample of the user population of recruits. The validation study is reported in a separate paper.

6. Conclusions

The principal objective of this study, to determine which combination of physical performance tests could be best used to predict performance in nine criterion tasks in trained British Army soldiers, has been met. However, the accuracy with which criterion task performance can be predicted was variable and unsatisfactory for some of the criterion tasks. The secondary objective of developing gender-free models was achieved in only one criterion task, the LM15.
Gender-related models were devised for three criterion tasks, the SL145, C and LM20, and gender-specific models were devised for the remaining five criterion tasks, SL170, RLC44, RLC22, RLC10 and LM25. The additional secondary objective of developing gender-unbiased models proved elusive. This was partly due to the limitations in the data which have been discussed and also the finding that the women's scores tended to be distributed around the pass standards, whereas the men's scores tended to exceed the pass standards. The result was a greater proportion of misclassifications and, therefore, indirect discrimination against females. A further validation study on recruits is necessary to address these concerns.

Acknowledgements

This work was funded by the Ministry of Defence and conducted at the Defence Research Agency's Centre for Human Sciences. The contribution of the following scientific staff is gratefully acknowledged: Mr Rene Nevola, Mrs Barbara Sage, Mr Dominic Lineham, Ms Clare Birch, Ms Anne Rothwell, Cpl Dave Willis, CSgt Gordon Gill, Dr Reg Withey and Dr Mike Stroud at DERA, Mr Andrew Pinder and Professor Don Grieve from the Royal Free Hospital, and Professor David Jones at the University of Birmingham. The authors acknowledge the client, the Director of Manning (Army), for their vision and commitment to this project, and also the British Army soldiers who took part. Finally, we thank the reviewers of the paper for constructive comments.

References

AGHAZADEH, F. and AYOUB, M. M. 1985, A comparison of dynamic and static-strength models for prediction of lifting capacity, Ergonomics, 28, 1409–1417.
ASTRAND, P. O. and RODAHL, K. 1986, Textbook of Work Physiology: Physiological Bases of Exercise, 3rd edn (Singapore: McGraw-Hill).
AYOUB, M. M. and MITAL, A. 1989, Manual Materials Handling (London: Taylor & Francis).
AYOUB, M. M., DENARDO, J. D., SMITH, J. L., BETHEA, N. J., LAMBERT, B.
A., ALLEY, L. R. and DURAN, B. S. 1982, Establishing Physical Criteria for Assigning Personnel to Air Force Jobs: Final Report. Air Force Office of Scientific Research, Contract no. F49620-79-C0006 (Lubbock: Institute for Ergonomics Research, Texas Tech University).
BECKETT, M. B. and HODGDON, J. A. 1987, Lifting and Carrying Capacities Relative to Physical Fitness Measures. Technical Report 87-26 (Bethesda: Naval Health Research Centre).
BLAND, J. M. and ALTMAN, D. G. 1986, Statistical methods for assessing agreement between two methods of clinical measurement, Lancet, i (8476), 307–310.
BREWER, J. and DAVIS, J. 1993, Abdominal Curl Conditioning Test (Headingley: National Coaching Foundation).
CALDWELL, L. S., CHAFFIN, D. B., DUKES-DOBOS, F. N., KROEMER, K. H. E., LAUBACH, L. L., SNOOK, S. H. and WASSERMAN, D. E. 1974, A proposed standard procedure for static muscle strength testing, American Industrial Hygiene Association Journal, 35, 201–205.
CHAFFIN, D. B. 1974, Human strength capability and low-back pain, Journal of Occupational Medicine, 16, 248–254.
CHAFFIN, D. B. 1975, Ergonomics guide for the assessment of human static strength, American Industrial Hygiene Association Journal, 36, 505–511.
COLLINS, K. J. (ed.) 1990, Handbook of Methods for the Measurement of Work Performance, Physical Fitness and Energy Expenditure in Tropical Populations (International Union of Biological Sciences).
DUEKER, J. A., RITCHIE, S. M., KNOX, T. J. and ROSE, S. J. 1994, Isokinetic strength testing and employment, Journal of Occupational Medicine, 36, 42–48.
DURNIN, J. V. G. A. and WOMERSLEY, J. 1974, Body fat assessed from total body density and its estimation from skinfold thickness: measurements on 481 men and women aged 16 to 72 years, British Journal of Nutrition, 32, 77–97.
DZIADOS, J. E., DAMAKOSH, A. I., MELLO, R.
P., Vogel, J. A., Kenneth, L. and Farmer, J. 1987, Physiological Determinants of Load Bearing Capacity. Technical Report T19-87 (Natick: US Army Research Institute of Environmental Medicine).
Evans, O. M., Zerbib, Y., Faria, M. H. and Monod, H. 1983, Physiological responses to load holding and load carriage, Ergonomics, 26, 161–171.
Frykman, P. N. and Harman, E. A. 1995, Anthropometric correlates of maximal locomotion speed under heavy backpack loads, Medicine and Science in Sports and Exercise, 27 (suppl.), 5.
Grieve, D. W. and van der Linden, J. 1986, Force, speed and power output of the human upper limb during horizontal pulls, European Journal of Applied Physiology, 55, 425–430.
Harris, R. J. 1975, A Primer of Multivariate Statistics (New York: Academic Press).
Jones, D. A. and Round, J. M. 1992, Skeletal Muscle in Health and Disease (Manchester: Manchester University Press).
Kamon, E., Kiser, D. and Pytel, J. 1982, Dynamic and static lifting capacity and muscular strength of steelmill workers, American Industrial Hygiene Association, 43, 853–857.
Kearney, J. T. and Stull, G. A. 1981, Effect of fatigue level on rate of force development by the grip–flexor muscles, Medicine and Science in Sports and Exercise, 13, 339–342.
Knapik, J. J., Staab, J., Bahrke, M., O’Connor, J., Sharp, M., Frykman, P., Mello, R., Reynolds, K. and Vogel, J. A. 1990, Relationship of Soldier Load Carriage to Physiological Factors Military Experience and Mood States. Technical Report T17-90 (Natick: US Army Research Institute of Environmental Medicine).
Knapik, J. J., Vogel, J. A. and Wright, J. E. 1981, Measurement of Isometric Strength in an Upright Pull at 38 cm. Technical Report T3/81 (Natick: US Army Research Institute of Environmental Medicine).
Kroemer, K. H. E. 1983, An isoinertial technique to assess individual lifting capability, Human Factors, 25, 493–506.
Kroemer, K. H. E. 1985, Testing individual capability to lift material: repeatability of a dynamic test compared with static testing, Journal of Safety Research, 16, 1–7.
Legg, S. J. and Pateman, C. M. 1984, A physiological study of the repetitive lifting capabilities of healthy young males, Ergonomics, 27, 259–272.
Legg, S. J. and Patton, J. F. 1987, Effects of sustained manual work and partial sleep deprivation on muscular strength and endurance, European Journal of Applied Physiology, 56, 64–68.
Lind, A. R. and McNichol, G. W. 1968, Cardiovascular responses to holding and carrying weights by hand and shoulder harness, Journal of Applied Physiology, 25, 261–267.
McDaniel, J. W., Skandis, R. J. and Madole, S. W. 1983, Weight Lift Capabilities of Air Force Basic Trainees. AFAMRL Report TR-83-0001 (Ohio: Air Force Aerospace Medical Research Laboratory, Wright–Patterson Air Force Base).
Mello, R. P., Damokosh, A. I., Reynolds, K. L., Witt, C. E. and Vogel, J. A. 1988, The Physiological Determinants of Load Bearing Performance at Different March Distances. USARIEM Technical Report T15/88 (Natick: US Army Research Institute of Environmental Medicine).
Mello, R. P., Nindl, B. C., Sharp, M. A., Rice, V. J., Bills, R. K. and Patton, J. F. 1995, Predicting lift and carry performance from muscular strength, anaerobic power and body composition variables, Medicine and Science in Sports and Exercise, 27 (suppl.), S152.
Mital, A. 1985, Use of anthropometry and dynamic strength in developing placement and screening procedures for workers, in H. J. Bullinger and H. J. Warnecke (eds), Towards the Factory of the Future (Heidelberg: Springer).
Mital, A., Aghazadeh, F. and Karwowski, W. 1986a, Importance of isometric and isokinetic lifting strengths in estimating maximum lifting capacities, Journal of Safety Research, 17, 65–71.
Mital, A., Karwowski, W., Mazouz, A. K. and Orsarh, E. 1986b, Prediction of maximum weight of lift in the horizontal and vertical planes using simulated job dynamic strengths, American Industrial Hygiene Association Journal, 47, 288–291.
Myers, D. C., Gebhardt, D. L., Crump, C. E. and Fleishman, E. A. 1984, Validation of the Military Entrance Physical Strength Capacity Test. Technical Report 610 (Natick: US Army Research Institute of Environmental Medicine).
Nottrodt, J. W. and Celentano, E. J. 1987, Development of predictive selection and placement tests for personnel evaluation, Applied Ergonomics, 18, 279–288.
Pinder, A. D. and Grieve, D. W. 1997, Hydro-resistive measurement of dynamic lifting strength, Journal of Biomechanics, 30, 399–402.
Poulsen, P. E. 1970, Prediction of maximum loads in lifting from measurements of muscular strength, Communications from the Danish National Association for Infantile Paralysis, no. 31 (Hellerup, Denmark).
Pytel, J. L. and Kamon, E. 1981, Dynamic strength test as a predictor for maximal and acceptable lifting, Ergonomics, 24, 663–672.
Ramsbottom, R., Brewer, J. and Williams, C. 1988, A Progressive Shuttle Run test to estimate maximal oxygen uptake, British Journal of Sports Medicine, 22, 141–144.
Rayson, M. P. 1997, Selection standards for physically-demanding occupations. PhD thesis, University of Birmingham.
Rayson, M. P. 1998, The development of physical selection procedures for the British Army. Phase 1: Job analysis and identification of criterion tasks, in M. A. Hanson (ed.), Contemporary Ergonomics 1998 (London: Taylor & Francis), 393–397.
Rayson, M. P. and Holliman, D. E. 1995, Physical Selection Standards for the British Army. Phase 4: Predictors of Task Performance in Trained Soldiers. Technical Report DRA/CHS/PHYS/CR95/017 (Farnborough: Defence and Evaluation Research Agency).
Rayson, M. P., Bell, D.
G., Holliman, D. E., Llewelyn, M., Nevola, V. R. and Bell, R. L. 1994a, Physical Selection Standards for the British Army: Phases 1 and 2. Technical Report 94R036 (Farnborough: Army Personnel Research Establishment).
Rayson, M. P., Davies, A. and Stroud, M. A. 1993, The physiological predictors of maximum load carriage capacity in trained females, in Proceedings, UK Sport: Partners in Performance (London: Sports Council), 212.
Rayson, M. P., Holliman, D. E. and Bell, D. G. 1994b, Physical Selection Standards for the British Army. Phase 3: Development of Physical Selection Tests and Pilot Study. Technical Report DRA/CHS/WP04006 (Farnborough: Defence and Evaluation Research Agency).
Rice, V. J. and Sharp, M. A. 1994, Prediction of performance on two stretcher-carry tasks, Work, 4, 201–210.
Sharp, D. S., Wright, J. E., Vogel, J. A., Patton, J. F., Daniel, W. L., Knapik, J. and Korval, D. M. 1980, Screening for Physical Capacity in the US Army. An Analysis of Measures Predictive of Strength and Stamina. Technical Report T8/80 (Natick: US Army Research Institute of Environmental Medicine).
Stevenson, J. M., Andrew, G. M., Bryant, J. T. and Thomson, J. M. 1985, Development of Minimum Physical Fitness Standards for the Canadian Armed Forces. Phase 1. DSS Contract 8SE85-00017 (Kingston, Ont.: Ergonomics Research Laboratory, Queen’s University).
Stevenson, J. M., Andrew, G. M., Bryant, J. T. and Thomson, J. M. 1988, Development of Minimum Physical Fitness Standards for the Canadian Armed Forces. Phase III. DSS/DND Contract W8477-7-SC02/01-ST (Kingston, Ont.: Ergonomics Research Laboratory, Queen’s University).
Stevenson, J. M., Bryant, J. T., Andrew, G. M., Smith, J. T., French, S. L., Thomson, J. M. and Deakin, J. M.
1992, Development of physical fitness standards for Canadian Armed Force’s younger personnel, Canadian Journal of Sport Science, 17, 214–221.
Stevenson, J. M., Bryant, J. T., Andrew, G. M., Smith, J. T., French, S. L., Thomson, J. M. and Deakin, J. M. 1994, Development of physical fitness standards for Canadian Armed Force’s older personnel, Canadian Journal of Applied Physiology, 19, 75–90.
Stevenson, J. M., Greenhorn, D. R., Bryant, J. T., Deakin, J. M. and Smith, J. T. 1996, Gender differences in performance of a selection test using the incremental lifting machine, Applied Ergonomics, 27, 45–52.
Teves, M. A., Wright, J. E. and Vogel, J. A. 1985, Performance on Selected Candidate Screening Test Procedures before and after Army Basic and Advanced Individual Training. Technical Report T13/85 (Natick: US Army Research Institute of Environmental Medicine).
Vogel, J. A. 1992, Obesity and its relation to physical fitness in the US military, Armed Forces and Society, 18, 497–513.