Running head: Behavioral and Emotional Rating Scale

Description, Evaluation, and Critique of the Behavioral and Emotional Rating Scale: A Strength-Based Approach to Assessment—Second Edition (BERS-2)

NAME

Chapman University

Description of the Test

The Behavioral and Emotional Rating Scale: A Strength-Based Approach to Assessment—Second Edition (BERS-2) was authored by Michael H. Epstein, Ed.D., and published by PRO-ED, Inc. in 2004 (the first edition was published in 1998). The BERS-2 takes a strength-based approach to assessment: the information it gathers is meant to give professionals a picture of children’s emotional and behavioral strengths. Buckley and Epstein (2004) state that standardized resources for formally exploring students’ strengths are limited, and because assessments often focus on a child’s deficits, the BERS-2 is a useful approach that explores what many other assessments do not. With the information collected, the BERS-2 is meant to create a positive context in which professionals can design individualized mental health and/or educational services and monitor an individual’s progress with, and the outcomes of, such services.

Content and Use

Practical Information and Features

The BERS-2 consists of three main components: the Examiner’s Manual; the three rating scales, namely the Teacher Rating Scale (TRS), Parent Rating Scale (PRS), and Youth Rating Scale (YRS); and a Summary Form. Although all three rating scales are to be used with most children, they may not all be applicable to certain individuals; for example, a student who is not attending school would not have a TRS completed.
Within the three rating scales are five core subscales: interpersonal strength, family involvement, intrapersonal strength, school functioning, and affective strength. In addition, the PRS and YRS include a supplemental career strength subscale. After completing their respective scales, raters are asked to answer several “key questions,” which appear on each of the three scales. These key questions are open-ended and are meant to gather information about the child’s strengths, preferences, and personal and family resources. An example of a key question is “this student’s favorite hobbies or activities are…” Aside from the three rating scales and the key questions, the BERS-2 also includes a summary form to be used when more than one rating scale is completed, which is often the case. The summary form allows the examiner to aggregate the data, plot scores from the rating scales, and compare these scores.

Characteristics of the Test

The TRS consists of 52 items, while the PRS and YRS consist of 57 items because they include the supplemental career strength subscale. All items are answered on a Likert-type scale ranging from zero to three (0 = not at all like the student, 1 = not much like the student, 2 = like the student, 3 = very much like the student). The anchors are clear and the response format is consistent across the rating scales. The items on each scale are generally similar, with wording adjusted to suit the person answering. For example, on the TRS, item three states “accepts a hug,” while on the YRS, item three states “it’s okay when people hug me.” The individual scales can be completed in a single session, with each generally taking about 10 minutes. The TRS can be completed by a teacher (or teachers), or by a nonteaching individual who has frequent contact with the child, such as a counselor or school psychologist.
The PRS should be completed by the child’s primary caregivers, and the YRS should be completed by the child if he or she is between 11 and 18 years of age, reads at or above the fifth-grade level, and has no significant developmental delays.

Although the three scales individually contain between 52 and 57 items, each of the five subscales comprises considerably fewer items; for example, the interpersonal strength subscale has 15 items and the affective strength subscale has seven. In addition, numerous statements on the rating scales can be described as vague or ambiguous. For example, different raters may interpret a statement such as “interacts positively with parents” differently. Having so few items to measure each of the five subscales also may skew the results. Rater errors such as “halo effects” and “pitchfork effects” also may occur in such a rating scale format. These features are important considerations because they can affect the results significantly. Because the BERS-2 relies heavily on these rating scales, examiners may want to follow up with raters regarding their answers before making any interpretations and/or drawing conclusions.

Aside from these potential issues, the use of information from multiple informants and the use of a Likert-type scale with several response options are two significant strengths of the BERS-2. Moreover, the test is user-friendly and easy to read. Each scale is presented well, with explicit directions, easy-to-understand statements with corresponding rating options, and clear key questions to answer. It should be noted, however, that more than two omitted or multiply-marked items on a subscale render that subscale unscorable. This may pose a problem since the scales have either 52 or 57 items and raters may accidentally miss some items.
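The scorability rule just described can be sketched in a few lines. This is a minimal illustration, not the manual’s scoring procedure: the function name and the use of `None` for an omitted or multiply-marked item are hypothetical, and simply summing the remaining valid ratings when one or two items are invalid is an assumption, since how the manual handles those cases is not described here.

```python
from typing import Optional, Sequence

# Rule as described in the text: more than two omitted or
# multiply-marked items make the subscale unscorable.
MAX_INVALID_ITEMS = 2


def subscale_raw_score(ratings: Sequence[Optional[int]]) -> Optional[int]:
    """Return the subscale raw score, or None if the subscale is unscorable.

    Each entry is a 0-3 rating; None marks an omitted or multiply-marked item.
    ASSUMPTION: with one or two invalid items, the valid ratings are summed as-is.
    """
    invalid = sum(1 for r in ratings if r is None)
    if invalid > MAX_INVALID_ITEMS:
        return None
    for r in ratings:
        if r is not None and r not in (0, 1, 2, 3):
            raise ValueError(f"rating out of range: {r!r}")
    return sum(r for r in ratings if r is not None)


# Hypothetical seven-item subscale with two omitted items: still scorable.
scorable = subscale_raw_score([3, 2, None, 1, None, 0, 2])
# Same subscale with three omitted items: unscorable.
unscorable = subscale_raw_score([3, None, None, None, 1, 2, 0])
```

A check like this, run while the rater is still present, would let the examiner ask for missing items before a subscale is lost.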
Since only two mistakes per subscale are permitted, the examiner should explain this rule carefully to the raters and/or watch the raters closely as they complete the scales.

The BERS-2 is to be administered, scored, and interpreted by an individual who understands its theoretical foundations (e.g., strength-based assessment) as well as its construction and statistical characteristics. Furthermore, the examiner should be experienced in administering the BERS-2 and be familiar with how to interpret the results against the normative scale data. The examiner’s manual recommends that the examiner conduct at least three practice administration and scoring sessions. Despite these somewhat rigorous standards, many different professionals (e.g., classroom teachers, psychologists) with sufficient knowledge and training may give the assessment. Each person completing a rating scale must be familiar with the child’s behavior in the respective environment and should have had regular, daily contact with the child for at least a few months.

Characteristics of the Test Manual

While the BERS-2 may have some potential problems, it is a user-friendly and well-organized measure. A description of the BERS-2 and its uses is clearly laid out in chapter one of the examiner’s manual. The reasons for and advantages of strength-based assessment also are discussed in chapter one, allowing the reader to obtain valuable knowledge about the BERS-2 before giving the assessment. The subsequent chapters are also well presented. The test manual flows from administration and scoring procedures to interpreting results, normative information, test reliability, and test validity, with subsections within the chapters. Each chapter builds upon the previous one and contains pertinent information for the examiner.
Moreover, the tables and figures within the test manual are useful and clearly organized. Lastly, the test manual is an appropriate length at 87 pages, with four additional short appendices containing normative score information. Based on these aspects, the test manual can be described as well designed and easily accessible for professionals who are administering the BERS-2.

Standardization Sample and Norms

The BERS-2 developers provide a total of four sets of normative data: two sets for the TRS, one set for the PRS, and one set for the YRS. Of the two TRS normative sets, the first comprises 2,176 students not identified with emotional or behavioral disorders (the NEBD sample), while the second comprises 861 children who were identified with emotional or behavioral disorders (the EBD sample). Both data sets were gathered for use with the original BERS from February to November 1996 in the same 31 states, and the children in these samples ranged in age from 5 to 18 years. Although the original BERS TRS data may have been acceptable, developers should have considered obtaining new TRS data for the BERS-2. The PRS and YRS data were collected from the fall of 2001 through the spring of 2002, with students between the ages of 5 and 18 for the PRS and between 11 and 18 for the YRS, since students below the age of 11 are not to complete the YRS. The PRS sample consisted of 927 students in 34 states, while the YRS sample comprised 1,301 students in 31 states. Although participants came from a large number of states, participants from each of the 50 states would have been ideal. Further, it is unclear whether the same students or entirely different students were rated across the three rating scales. Lastly, it should be noted that many of the normative samples had fewer participants per age group than the acceptable minimum.
In the TRS EBD sample, for example, only ages 14-16 had over 100 students.

Data for these norms were coordinated and collected by trained educators throughout the United States. BERS-2 developers selected these trained educators through three methods: (1) by contacting professionals in the field, (2) by searching customer files to locate professionals who had purchased the original BERS (or similar products) within the past two years, and (3) by asking professionals who had collected data in previous norming projects. During this process, developers also considered the prospective data coordinators’ access to children in terms of geographic region, Hispanic ethnicity, and race. The selected coordinators were trained in administration of the BERS-2 and collected data from both parents and students. No additional information is given about these trained educators, such as their sex, ethnicity, education level, or occupation, which makes it difficult to consider any possible biases or issues with data collection. Furthermore, because a convenience sample was used, random sampling did not occur, which also may introduce bias into the sample.

A breakdown of characteristics, and the percentage of the sample with each characteristic, is included for the four normative data sets in helpful tables in this chapter. The tables cover the following characteristics (geographic area, gender, race, Hispanic ethnicity, family income, educational attainment of parents, disability status, and age), comparing the percentage of the sample with each characteristic against the percentage of individuals with that characteristic in the U.S. population. The samples’ characteristics were compared with those of the school-age population reported by the U.S. Bureau of the Census (2001). Based on these comparisons, BERS-2 developers state that the samples were representative of the general school-age population.
Demographic information was later stratified by age for the TRS NEBD sample to further show representativeness. Interestingly, however, only four of the original eight demographic characteristics were stratified during this process: geographic region, gender, race, and ethnicity. In addition, the TRS NEBD sample was the only data set stratified. Although this stratification did “conform to national expectations at each age included in the test’s norms,” it would have been interesting and helpful to see data stratified for all four data sets (Epstein, 2004, p. 40).

Following stratification, developers weighted the samples for the PRS and the YRS on three sampling variables (race, Hispanic ethnicity, and geographic region) during later normative development. Weighting the data for the TRS (collected in 1995 and 1996) was deemed unnecessary because these data were already representative of the U.S. school-age population. It is interesting that developers chose to weight the data, since the unweighted sample data were already very similar to the U.S. population; however, developers state that weighted samples were used to achieve a more “proportional” distribution relative to the U.S. school-age population. The PRS and YRS sample data were weighted in a two-step process. First, an overall weight was determined for each category of race, Hispanic ethnicity, and geographic region by dividing the category’s percentage in the U.S. population by its percentage in the sample. In the second step, “a weight was obtained for each individual in the sample by multiplying the weight for his or her race by the weight for his or her geographic region by the weight for his or her ethnicity” (Epstein, 2004, p. 42). BERS-2 norms were derived from these weighted normative samples.

Scores and Interpretation

Chapter three of the test manual provides clear, detailed, and helpful information regarding the interpretation of BERS-2 results.
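For readers who want to see the arithmetic behind the two-step weighting of the PRS and YRS samples quoted above from Epstein (2004, p. 42), a minimal sketch follows. The function names, category labels, and all percentages are hypothetical and chosen only to make the calculation easy to follow; none of them are figures from the manual.

```python
from typing import Dict


def category_weights(population_pct: Dict[str, float],
                     sample_pct: Dict[str, float]) -> Dict[str, float]:
    """Step 1: weight per category = population percentage / sample percentage."""
    return {cat: population_pct[cat] / sample_pct[cat] for cat in population_pct}


def individual_weight(person: Dict[str, str],
                      race_w: Dict[str, float],
                      region_w: Dict[str, float],
                      ethnicity_w: Dict[str, float]) -> float:
    """Step 2: multiply the person's race, region, and ethnicity weights."""
    return (race_w[person["race"]]
            * region_w[person["region"]]
            * ethnicity_w[person["ethnicity"]])


# HYPOTHETICAL percentages (population vs. sample), for illustration only.
race_w = category_weights({"A": 60.0, "B": 40.0}, {"A": 75.0, "B": 25.0})
region_w = category_weights({"South": 50.0, "West": 50.0},
                            {"South": 40.0, "West": 60.0})
ethnicity_w = category_weights({"Hispanic": 20.0, "Non-Hispanic": 80.0},
                               {"Hispanic": 10.0, "Non-Hispanic": 90.0})

person = {"race": "B", "region": "South", "ethnicity": "Hispanic"}
w = individual_weight(person, race_w, region_w, ethnicity_w)
```

In this made-up case the person belongs to three categories that are all underrepresented in the sample, so each category weight exceeds 1 and their product counts that person’s ratings more heavily when norms are computed.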
The chapter is divided into five main sections: instructions for completing the BERS-2 summary form and the three rating scales; a discussion of the various kinds of scores and what they mean; useful information to understand before making a diagnostic decision; information for interpreting deviant scores; and, finally, suggestions for professionals when sharing a child’s BERS-2 results. Although all the information in this chapter is beneficial, for the purpose of this paper this section will focus largely on the scores themselves.

The summary form of the BERS-2 is to be used when more than one rating scale is completed, which likely will be the case with most students. Developers include a helpful example of what a completed summary form may look like using a hypothetical student. This example is used throughout the chapter as a consistent written and visual reference for readers. On the summary form, examiners record a student’s raw scores from the TRS, PRS, and YRS and three kinds of normative scores: percentile ranks, scaled scores (also known as standard scores), and the BERS-2 Strength Index (the composite score). In the subsection titled “Types of Test Scores,” developers provide clear and concise information regarding these types of scores. First, a student’s raw scores are described as the sum of the child’s ratings for the items of each subscale. Developers state that it is important to double-check raw score results because they are the basis for calculating standard scores. Second, a short yet critical discussion of percentile ranks is presented. The normative tables for converting the subscale raw scores into percentile ranks can be found in Appendix A of the test manual.
Percentile ranks are described as simple and easily understood because they provide a ranking from one to 100, but developers are quick to offer a candid discussion of their weaknesses. For example, it is stated that percentile ranks are not interval data and “equal differences in percentile ranks do not represent equal differences in the attributes or behaviors being measured” (Epstein, 2004, p. 26). Third, subscale scaled scores are discussed. As with percentile ranks, information for converting raw scores into scaled scores can be found in Appendix A. Scaled scores are described as more useful and more valuable than percentile ranks since they are expressed in standard deviation units, can be compared to the average rating of the normative sample, and can be compared to other standard scores. The discussion of scaled scores is thorough and useful, and it is evident that the developers advocate for the use of standard scores over percentile ranks. Despite their usefulness, readers and practitioners should be aware during interpretation that the BERS-2 gathers ordinal data (i.e., ratings on a Likert-type scale), which are then treated as interval data (the scaled scores). Furthermore, developers did not include the standard error of measurement (SEM) or the confidence intervals (CIs) for the scaled scores, which are needed for an accurate interpretation. Last in this subsection is an explanation of the BERS-2 Strength Index. It is a type of standard score derived by summing the scaled scores of the subscales and converting this sum to a standard score; the table needed for this conversion is found in Appendix C. This score is described as the most reliable and the most useful of all the scores because it provides an overall rating of the child’s strengths.
Additionally, this score can be compared with other tests’ standard scores, and developers provide an easy-to-understand explanation for these types of comparisons.

Following the discussion of these scores, developers move to additional valuable points. For example, it is mentioned that tests and/or standard scores alone do not diagnose anything, that disagreements among the raters should be identified and discussed, and that additional pertinent evidence from other tests, interviews, and observations should be considered before coming to any conclusions. Ultimately, this chapter is thorough and well presented. It provides a strong foundation for interpreting test results, and it does not appear to overstate the importance or use of certain scores.

Psychometric Properties

Reliability

The BERS-2 test reliability was assessed in three ways: content sampling, time sampling, and interrater reliability. Developers state that “error associated with content sampling reflects the degree of homogeneity among items within a test or subscale,” and items that are more related to one another contain smaller error (Epstein, 2004, p. 49). Content sampling is a type of internal consistency estimate, and it was investigated using Cronbach’s coefficient alpha method. This method is appropriate because Cronbach’s alpha is meant for a multiple-response format like the BERS-2 rating scales. Coefficient alphas were calculated for the five subscales and the composite (Strength Index) using the four normative sample data sets and for all age groups within the samples. In addition, Guilford’s formula was used to derive coefficient alphas for the composite, as this is the formula’s purpose. Lastly, the z-transformation technique was used to average the coefficients.
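The two computations named above, coefficient alpha and averaging coefficients through a z-transformation, can be illustrated with a short sketch. It assumes the standard textbook formula for Cronbach’s alpha and assumes that “z-transformation” refers to Fisher’s r-to-z transformation; the manual’s exact procedure (including Guilford’s formula for the composite) is not reproduced here, and the ratings are invented.

```python
import math
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(responses: Sequence[Sequence[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of ratings.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    Requires at least two respondents and non-constant total scores.
    """
    k = len(responses[0])
    item_vars = [variance([resp[i] for resp in responses]) for i in range(k)]
    total_var = variance([sum(resp) for resp in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def average_coefficients(coeffs: List[float]) -> float:
    """Average coefficients via Fisher's z (ASSUMED to be the technique meant).

    Each coefficient must lie strictly between -1 and 1.
    """
    zs = [math.atanh(r) for r in coeffs]
    return math.tanh(sum(zs) / len(zs))


# INVENTED 0-3 ratings: four respondents on a three-item subscale.
ratings = [[0, 1, 0], [1, 1, 1], [2, 2, 3], [3, 3, 3]]
alpha = cronbach_alpha(ratings)

# Averaging two hypothetical age-group alphas.
avg = average_coefficients([0.50, 0.90])
```

Averaging in the z metric pulls the result toward the larger coefficient compared with a plain arithmetic mean, which is why averaged coefficients can look somewhat more favorable than a quick scan of the individual values suggests.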
Coefficient alphas for the five subscales and composite scores within the four normative groups all fall within the .70 to high .90 range, and the average coefficients equal or exceed .80 with the exception of the Career Strength subscale on the YRS (.79). Further, coefficients for each of the composite scores and the average composite scores equal or exceed .95. Although the averaged alphas are acceptable in general, developers do not address the individual subscale alphas, some of which fall within the .70 range. Since values in that range are considered low for clusters, practitioners should be made aware of them. Despite this, alphas for the composites at each age and on each of the rating scales are highly acceptable, so the overall score a student receives can be considered reliable for educational and clinical decisions.

In addition to these coefficient alphas, developers are forthright in explaining that although an assessment may be reliable for a general population, this cannot be assumed for each subgroup within the population. Because of this, developers provide coefficient alphas for subgroups that represent a “broad spectrum of ‘mainstream’ and ‘minority’ populations” (Epstein, 2004, p. 53). Alphas are reported for six subgroups within the normative sample (male, female, White, Black, Hispanic, and students with emotional disturbance) for the five subscales and the Strength Index of each rating scale. The number and ages of students in each subgroup are provided and shown to be acceptable. In general, coefficient alphas equal or exceed .80 for the subscales, although some alphas fall within the .70 range. All alphas for the Strength Index, however, equal or exceed .95, and one can conclude that the BERS-2 is equally reliable for these subgroups.
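Reliability coefficients of this size translate directly into a standard error of measurement (SEM) and, from there, into a confidence band around an obtained score. The sketch below assumes the classical test theory formula SEM = SD × sqrt(1 − r) and builds the interval around the obtained score, which is a common simplification. The SD of 15 and the reliability of .96 are illustrative assumptions, not values taken from the manual’s tables.

```python
import math
from typing import Tuple


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(score: float, sd: float, reliability: float,
                        z: float = 1.96) -> Tuple[float, float]:
    """Approximate CI around an obtained score: score +/- z * SEM.

    z = 1.96 gives a 95% interval under a normal-error assumption.
    """
    half_width = z * standard_error_of_measurement(sd, reliability)
    return (score - half_width, score + half_width)


# ASSUMED values for illustration: composite SD of 15, reliability of .96.
sem = standard_error_of_measurement(sd=15.0, reliability=0.96)
low, high = confidence_interval(score=100.0, sd=15.0, reliability=0.96)
```

With these assumed inputs the SEM comes to about 3 standard-score points, so a 95% interval spans roughly 6 points on either side of the obtained score. A band of that width is worth reporting alongside any single score shared with parents or teachers.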
Following content sampling, developers discuss the standard error of measurement (SEM), which estimates the average amount of error around a test score and is used to build a confidence interval for that score. BERS-2 developers provide SEMs for all age groups in the four normative samples in a series of helpful tables, and every table shows SEMs below three. Because these SEMs are small, one can be more confident in a child’s results. Despite the valuable discussion and inclusion of SEMs, developers also should have provided explicit information regarding confidence intervals (CIs), a clearer formula for CIs, and possibly the CIs themselves for an easier and more precise interpretation.

Time sampling error was the next source of error discussed, and developers used the test-retest method to investigate temporal stability. Time sampling error is accurately described, and developers provide a rationale for the importance of testing for such error. Six studies were conducted to check for stability over time, and detailed demographic characteristics for the six studies are provided. Although it is commendable that six studies were conducted, each study has a limited age range, a limited geographic area, and, with the exception of study two, a small number of participants. In addition, only one study was conducted using EBD students, and students aged 10 and 13 were not included in any of the studies.

Despite these limitations, coefficients for each test-retest group showed strong results overall. Coefficients for study one and studies four, five, and six all exceed .80. Because guidelines for the test-retest method with a one- to two-week interval state that coefficients should be .80 or above, these results are considered acceptable. For studies two and three, with a six-month interval between ratings, coefficients generally exceed .60.
Although practitioners should be aware of the two lower coefficients at .53, these results also can be described as acceptable. Ultimately, practitioners can be confident that the BERS-2 is temporally stable.

Lastly, interrater reliability was investigated, and developers provide a brief discussion of interrater reliability and its potential problems. Three separate studies were used to investigate interrater reliability, and demographic characteristics for each study are presented in a table. In study one, nine pairs of special education teachers rated 96 students ages 14 to 18 with serious emotional disturbance (SED). In study two, teachers and parents rated 20 students ages 7 to 16 with SED who were receiving special education services. Lastly, in study three, parent and student ratings were assessed for 296 students from the normative sample ages 10 to 18. Numerous limitations can be noted for each of these studies. Aside from the limited age ranges and limited geographic area for each study, it is questionable why teacher-teacher ratings and teacher-parent ratings in the general population were not analyzed. Moreover, parent-parent ratings are excluded entirely. In addition, the teacher-parent study with SED students (study two) consists of only 20 students, and race and ethnic group data were not collected for study one or study two. Lastly, helpful demographic characteristics for the teachers and parents are not provided. One should consider such limitations when interpreting the results.

For study one (teacher-teacher ratings), coefficients are large, with each exceeding .80. For study two (parent-teacher ratings), most coefficients fall between .54 and .67, although the intrapersonal strength subscale had a coefficient of .20. Developers state that this small coefficient is “not unexpected because internalizing types of behavior problems typically demonstrate smaller relationships” (Epstein, 2004, pp. 59-60).
Aside from this, the coefficients in study two are acceptable. For study three (parent-youth), all coefficients equal or exceed .50. These can be described as generally strong results. Ultimately, practitioners should keep in mind the numerous limitations of the studies; however, a summary of reliability results is discussed and valuable tables are provided, portraying the BERS-2 as an overall reliable measure.

Validity

BERS-2 developers provide an extensive discussion of test validity in chapter six of the test manual. Three types of validity are examined: content validity, criterion-related validity, and construct validity. Content validity is described as an analysis of the test content to ensure the material is representative of the behavior(s) to be measured. Developers employed three methods to evaluate the BERS-2 content validity. First, a detailed and statistically supported rationale is provided for the content and format of the original BERS subscales plus the BERS-2 Career Strength subscale. Item discrimination and factor analyses were two additional methods used during this process. This discussion describes the thorough process developers used to create the 68 items and the five subscales of the BERS-2. Despite this process, developers fail to provide any important demographic characteristics for the professionals they contacted at multiple time points to review the BERS-2 items, aside from the number of professionals involved (sample sizes ranged from 250 to 400). Furthermore, for the item discrimination method, the sample sizes for the two groups of students (with and without SED) were acceptable, but only 26 teachers were asked to rate these students. In addition, developers do not provide the actual data when claiming that the ANOVAs were significant for each item and could accurately discriminate between students with and without SED.
Aside from this lack of information, the rationale given provides strong evidence for content validity.

Following this, developers use classical item analysis procedures (a method for choosing test content) to explore content validity. Classical item analysis is used to assess features such as item discrimination. In the step mentioned above, item discrimination was used more as a tool for selecting useful items and omitting those that did not discriminate. In this section, item discrimination was assessed to identify whether the BERS-2 items could actually discriminate between students. For this method, item analysis using the normative samples as subjects was conducted for all age levels within the three rating scales. Median item coefficients for the three rating scales indicate strong content validity, as all coefficients greatly exceed .35, the value identified as acceptable for item discrimination indexes (Ebel, 1972, and Pyrczak, 1973, as cited in Epstein, 2004). Uhing, Mooney, and Ryser (2005) conducted separate research to assess differences in scores for students with and without EBD across the YRS and PRS. In their two studies analyzing the discriminative ability of the BERS-2, t-tests and effect sizes similarly show the BERS-2 to be valid for differentiating between populations.

Lastly, differential functioning analyses are used to support a lack of bias in BERS-2 test items. The logistic regression procedure was used to detect differential item functioning (DIF). A significance level of .001 was selected, and three of the four normative samples were used in the analyses (the NEBD sample was used for the TRS and the EBD sample was not used). Comparisons were made for two dichotomous groupings (Black versus other and Hispanic versus other) on the five subscales in each of the rating scales; in total, 332 comparisons were made.
Sixteen of these comparisons were found to be statistically significant (i.e., containing bias) with moderate to large effect sizes, but developers ultimately claim that because fewer than five percent of the comparisons were significant, there is little bias in the BERS-2. Although this may be true, it would have been helpful for developers to provide more rationale for the conclusion of little bias. Moreover, comparisons beyond these two dichotomous groupings, such as between SES levels, also would have been a helpful addition. Across these three in-depth methods, however, one can say that the BERS-2 demonstrates appropriate content validity.

Criterion-related validity is the next type of validity examined in this chapter. Criterion-related validity can be either concurrent or predictive. For this measure, developers used concurrent validation, stating that this is the more appropriate type for behavioral assessments such as the BERS-2. Concurrent validity was analyzed for the three rating scales separately. For the TRS, six studies were conducted to evaluate this validity. A detailed table with each study’s demographic characteristics is provided, and sample sizes range from 91 to 382 participants. Despite the acceptable sample sizes and relatively good age ranges, the geographic area in each study is once again limited. Further, only one study involved students with SED and one study involved students with EBD. The BERS-2 was correlated with six measures, including the Systematic Screening for Behavior Disorders and the Social Skills Rating System. The correlations are displayed in another detailed table showing moderate correlations ranging from .44 to .62. These correlations indicate adequate validity for the TRS.

For the PRS, two studies were conducted to evaluate concurrent validity.
Both of these studies had smaller sample sizes (n=55; n=85), virtually no students identifying as Black, Hispanic, or “other,” no students between the ages of 14 and 18, and no students in special education or with an EBD or SED diagnosis. The BERS-2 was correlated with the Child Behavior Checklist and the Social Skills Rating System. Results show correlations ranging between .53 and .67. Although these are large correlations, these two studies have many limitations, and the results should be interpreted cautiously. More studies with more representative samples would have strengthened the evidence of concurrent validity for the PRS.

The final rating scale analyzed for concurrent validity was the YRS. As with the PRS, concurrent validity was evaluated with two studies. Age ranges are especially restricted in these two studies (13-14 and 12-13) given that the YRS is to be completed by students ages 11-18, and the sample sizes are small (n=42; n=49). Again, students identifying as White are predominant in these studies. The YRS was correlated with the Youth Self-Report and the Social Skills Rating System, and correlations show a wide range between .33 and .64; nevertheless, these values indicate adequate validity for the YRS.

Construct validity, used to assess the overall validity of a measure, was the third and final type of validity examined for the BERS-2. Developers describe construct validity as “the degree to which underlying traits of a test can be identified and the extent to which these traits reflect the theoretical model on which the test is based” (Epstein, 2004, p. 78). A three-step procedure was used in this process: (1) numerous constructs thought to account for test performance were identified, (2) hypotheses were generated based on these constructs, and (3) the hypotheses were verified.
Additionally, group differentiation, subscale interrelationships, and factor analysis are explored as part of this process.

Group differentiation, or analyzing the performance of different groups of people on an instrument, is one way to test for construct validity. Two validity studies were conducted to assess group differentiation. In the first, the same professionals who helped to norm the BERS-2 rated participants. Participants included the total sample used to norm the BERS-2 rating scales and six subgroups (male, female, White, Black, Hispanic, and students with EBD). It is unclear how the participants in the six subgroups were obtained. Ultimately, the mean standard scores of the total sample and of the six subgroups support the construct validity of the BERS-2, with average mean scores for the other subgroups and below-average scores for the EBD subgroup. It should be noted, however, that although the EBD group did score below average, its lowest subscale score was not far below an “average” score, and some scores were within the average range; those scores, however, were all on the YRS. The EBD group’s composite scores also fell below what is considered average, ranging from 74 to 88 (90 to 110 is average), although the highest composite score is not far from the average range.

For the second of these two studies, the developers were interested in determining whether the BERS-2 could discriminate among students without disabilities, students with learning disabilities, and students with behavioral disorders. Participants were 418 students matched on age and gender, with over 100 students in each of the three groups. General education teachers rated each of the participants, and significant differences among the groups were found, with students with behavioral disorders obtaining the lowest ratings on the five subscales and the Strength Index.
Despite these findings, it is notable that general education teachers were used for each of the groups. It may have been more appropriate to have special education teachers rate some of the students if they were primarily in special education.

Subscale interrelationships were the next method used to check for construct validity. The developers claim that subscales should be significantly correlated since they all measure some aspect of behavioral/emotional strength, but they should not be correlated too highly, as that would indicate the scales do not contribute anything unique. Correlations between the subscales were obtained, and all were shown to be significant beyond the .01 level. Coefficients range from .37 to .87, with a mean of .65. The mean coefficient is considered to be in the acceptable range, which shows that the BERS-2 subscales are related but measure different aspects of behavioral and emotional strengths. A longer discussion of why and how this conclusion was reached would be helpful; without one, readers without a statistical background must take the developers’ word at face value. Still, these results provide evidence for construct validity.

The last method used for construct validity was factor analysis, an advanced correlational technique for examining the relationships among multiple measures. Exploratory factor analysis with a Promax rotation was used to analyze the TRS, and confirmatory factor analyses were used for both the PRS and YRS. The exploratory factor analysis identified five factors (the five subscales). Further, results from the confirmatory factor analyses reveal that the BERS-2 Strength Index was identified as a valid underlying influence on the five subscales for the PRS and the YRS.
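As a rough illustration of the factor-analytic idea described above, the following minimal sketch shows how one underlying component can be extracted from a set of subscale correlations. An eigenvalue expresses how much of the total variance a component accounts for, and a loading is the correlation between a subscale and that component. Several things here are assumptions rather than anything taken from the manual: the correlation matrix is hypothetical (every pair of subscales is set to .65, the mean coefficient reported above, whereas the actual coefficients range from .37 to .87), the function name `first_component` is invented for this example, and the method is a simple one-component extraction by power iteration, not the Promax-rotated exploratory analysis or the confirmatory analyses the developers ran.

```python
import math

# Hypothetical 5 x 5 correlation matrix for the five core subscales.
# Assumption: every pair correlates at .65 (the mean reported above);
# the actual coefficients in the manual range from .37 to .87.
MEAN_R = 0.65
N_SUBSCALES = 5
corr = [[1.0 if i == j else MEAN_R for j in range(N_SUBSCALES)]
        for i in range(N_SUBSCALES)]


def first_component(matrix, iterations=100):
    """Return the largest eigenvalue and its loadings via power iteration."""
    n = len(matrix)
    vector = [1.0 / math.sqrt(n)] * n  # unit-length starting guess
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n))
                   for i in range(n)]
        length = math.sqrt(sum(x * x for x in product))
        vector = [x / length for x in product]
    # Rayleigh quotient: the eigenvalue for the converged unit vector.
    eigenvalue = sum(vector[i] * sum(matrix[i][j] * vector[j] for j in range(n))
                     for i in range(n))
    # Loading = correlation between each subscale and the component.
    loadings = [math.sqrt(eigenvalue) * x for x in vector]
    return eigenvalue, loadings


eigenvalue, loadings = first_component(corr)
share = eigenvalue / N_SUBSCALES  # proportion of total variance explained
print(f"eigenvalue = {eigenvalue:.2f}")
print(f"variance explained = {share:.0%}")
print("loadings =", [round(x, 2) for x in loadings])
```

Under this simplified assumption, the single component has an eigenvalue of 3.6 (that is, 1 + 4 × .65), accounts for 72% of the total variance, and each subscale loads on it at about .85. Real data with unequal correlations would yield unequal loadings, and the values in the manual should not be inferred from this sketch.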
Within this discussion of factor analysis, the developers mention “factor loadings,” “eigenvalues,” and “goodness of fit” multiple times, although they never provide a clear explanation of what these terms actually mean. For individuals without a strong background in statistics, this discussion is confusing and difficult to follow.

Although the aforementioned tests used to establish the validity of the BERS-2 have some limitations, multiple studies, including those in outside research (e.g., Benner, Beaudoin, Mooney, Uhing, & Pierce, 2008), support the overall validity of the BERS-2 in identifying the behavioral and emotional strengths of children.

Conclusion

The BERS-2 is a well-formatted and user-friendly assessment. It is one of the few assessments used to identify the strengths and abilities of students, and it is especially helpful for professionals in creating a positive context for individualized services. The BERS-2 gathers information from multiple informants and allows informants to make additional helpful comments about a student through the “key questions.” Norming procedures were thorough and led to representative samples, and scoring guidelines are clearly laid out and easily interpreted. Despite the limitations in some of the BERS-2 reliability and validity tests, the BERS-2 has proved to be a reliable and valid measure overall. Based on this, the BERS-2 is an appropriate measure to use when identifying students with emotional or behavioral problems; however, the BERS-2 developers were open in stating that it should be used alongside additional important data and that follow-ups with informants regarding their ratings should occur. Used in this way, the BERS-2 can be a helpful first step in an assessment process.

References

Benner, G.J., Beaudoin, K., Mooney, P., Uhing, B.M., & Pierce, C.D. (2008).
Convergent validity with the BERS-2 Teacher Rating Scale and the Achenbach Teacher’s Report Form: A replication and extension. Journal of Child and Family Studies, 17, 427-436. doi: 10.1007/s10826-007-9156-z

Buckley, J.A., & Epstein, M.H. (2004). The Behavioral and Emotional Rating Scale-2 (BERS-2): Providing a comprehensive approach to strength-based assessment. The California School Psychologist, 9, 21-27. Retrieved from http://education.ucsb.edu/schoolpsychology/CSP-Journal/index.html

Epstein, M.H. (2004). Behavioral and Emotional Rating Scale: A Strength-Based Approach to Assessment—Second Edition (BERS-2). Austin, TX: PRO-ED, Inc.

Uhing, B.M., Mooney, P., & Ryser, G.R. (2005). Differences in strength assessment scores for youth with and without ED across the youth and parent rating scales of the BERS-2. Journal of Emotional and Behavioral Disorders, 13, 181-187. Retrieved from http://www.ingentaconnect.com.libproxy.chapman.edu/content/proedcw/jebd