Evaluation of Research Quality 2011-2014 (VQR 2011-2014)

Appendix B
Imputation of missing bibliometric indicators and journal classification

B.1 CONSTRUCTION OF THE JOURNAL LIST
B.2 IMPUTATION METHODOLOGY
B.3 JOURNAL CLASSIFICATION
B.4 ITALIAN JOURNALS
B.5 REPLICATION

This appendix describes the construction of the journal list prepared by GEV13, the method used to impute missing values of the bibliometric indicators, and the journal classification into the five VQR merit classes.

B.1 Construction of the journal list

The journal list for Area 13 compiled for VQR 2004-2010 (hereafter, VQR1) has been expanded using three sources: (i) the list provided by CINECA containing the publication outlets of all Italian researchers in the Area for the period 2011-2014, (ii) new additions to the WoS database provided by Thomson Reuters, and (iii) the Scopus database provided by Elsevier. The WoS and Scopus databases have been limited to the categories that are relevant to the Area.

For WoS, the Subject Categories (SC) selected as relevant are: DI (Business), DK (Business, Finance), FU (Demography), GY (Economics), NM (Industrial Relations and Labor), PS (Social Sciences, Mathematical Methods), PE (Operations Research and Management Science), and XY (Statistics and Probability).

For Scopus, the selected All Science Journal Classifications (ASJC) are: 1400 (Business, Management and Accounting – all, i.e., from 1400 to 1410), 1800 (Decision Sciences – all, i.e., from 1800 to 1804), 2000 (Economics, Econometrics and Finance – all, i.e., from 2000 to 2003), 2613 (Statistics and Probability), and 3317 (Demography).

Additional journals not belonging to the above-listed categories have also been added if the GEV agreed on their relevance for the Area.
Starting from the above-mentioned three databases, journals were selected on the basis of their relevance for Area 13 and their scientific standards. Working paper series, newspapers, collections of reports, and the like were not included. The list was divided into five non-overlapping sub-lists: Business Administration and Management (from now on, Business), Economics, Economic History (also including History of Economic Thought and, from now on, denominated as History), General, and Statistics and Mathematical Methods for Decisions (from now on, Statistics).

For indexed journals included in the preliminary list published on November 20, 2015, the GEV collected 2014 values of IF, IF5Y and AIS from WoS and IPP, SNIP and SJR from Scopus, whenever available. For all journals the GEV also collected Google Scholar h-indices for the period 2010-2014. Journals with a missing or zero h-index were excluded. The list reported the source used to compute each h-index (Google Scholar Metrics when available, Publish or Perish in the other cases) and whether the journal is Italian (as defined in Section B.4 of this appendix).

The GEV analyzed over 200 comments and suggestions concerning over 1,000 journals, which led to additions, deletions, and re-allocations across sub-lists, as detailed in a public announcement (Appendix A) published together with the final journal list on December 14, 2015. Note that a precondition for adding a journal to the list was its presence in at least one of the three above-mentioned databases.

After removing a few minor errors detailed in subsequent public announcements, the final list published on January 14, 2016 included 2,731 journals, distributed as follows across the 5 sub-lists:

Business: 1,224 journals (44.82%)
Economics: 864 journals (31.64%)
General: 3 journals (0.11%)
History: 71 journals (2.60%)
Statistics: 569 journals (20.83%)
For the sake of comparison, the journal list employed for VQR1 included 1,906 journals, distributed as follows: 767 journals (40.24%) in Business, 643 (33.74%) in Economics, 3 (0.15%) in General, 48 (2.52%) in History, and 445 (23.35%) in Statistics. Because of the addition of the Scopus database and of the increased coverage by WoS, the number of journals included in the list has increased considerably (by 43.28%, from 1,906 to 2,731). The increase is larger than average for the Business (59.58%) and History (47.92%) sub-lists. These patterns can be explained by the larger coverage by Scopus and by the faster ongoing move toward indexation in certain sub-Areas within Area 13.

The following set of tables reports summary statistics for the journals in the list. WoS covers 36.65% of the entire list. The fraction of journals covered varies across sub-lists, ranging from 100% for General to 27.29% for Business (Table B.1). Coverage by Scopus is much higher overall (66.75%), and differences in coverage across sub-lists are more limited (Table B.2).

Table B.1: Distribution of journals by WoS coverage and by sub-list.

              Business  Economics  General  History  Statistics   Total
Non WoS            890        506        0       46         288    1730
%                72.71      58.56     0.00    64.79       50.62   63.35
WoS                334        358        3       25         281    1001
%                27.29      41.44   100.00    35.21       49.38   36.65
Total             1224        864        3       71         569    2731
%               100.00     100.00   100.00   100.00      100.00  100.00

Table B.2: Distribution of journals by Scopus coverage and by sub-list.

              Business  Economics  General  History  Statistics   Total
Non Scopus         456        273        0       21         158     908
%                37.25      31.60     0.00    29.58       27.77   33.25
Scopus             768        591        3       50         411    1823
%                62.75      68.40   100.00    70.42       72.23   66.75
Total             1224        864        3       71         569    2731
%               100.00     100.00   100.00   100.00      100.00  100.00

Tables B.3-B.7 report basic statistics (mean, standard deviation, 10th, 25th, 50th, 75th, and 90th percentiles, inter-quartile range) for the four indicators selected by GEV13 for the journal classification, that is, IF5, AIS, IPP, and SJR, and for the h-index to be used for the imputation.
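As an illustration of the statistics listed above, the following is a minimal Python sketch, not the GEV's Stata code. The function name `summarize` and the sample values are hypothetical, and the percentile convention (linear interpolation, `method="inclusive"`) is an assumption; percentile conventions differ across packages, so small differences from the published tables would not be surprising.

```python
import statistics

def summarize(values):
    """Return mean, sd, selected percentiles and IQR for one indicator."""
    # 99 cut points; cuts[k - 1] is the k-th percentile (needs >= 2 values).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    stats = {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }
    for k in (10, 25, 50, 75, 90):
        stats[f"p{k}"] = cuts[k - 1]
    stats["iqr"] = stats["p75"] - stats["p25"]
    return stats

# Hypothetical indicator values for a handful of journals in one sub-list.
example = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
```

In practice the function would be applied once per indicator and per sub-list, restricted to the journals for which the indicator is observed.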
IF5 and AIS are available for most of the WoS journals in the list (97%). IF5 has a mean of 1.82 and a standard deviation of 2.23, with variation across sub-lists. Apart from General, Business shows the highest mean and standard deviation, while History shows the lowest ones. AIS has a mean of 1.03 and a standard deviation of 1.63 with the largest values, after General, for Economics. A comparable pattern emerges for IPP and SJR (available for 99% of the Scopus journals in the list). The h-index also reveals differences in citation standards across sub-lists: apart from General, the lowest mean h-index is once again in History, while the highest is in Business.

Table B.3: Statistics for IF5 by sub-list.

          Business  Economics  General  History  Statistics   Total
mean          2.23       1.57    29.04     0.55        1.47    1.82
sd            1.73       1.37    16.28     0.34        1.03    2.23
p10           0.58       0.33    10.56     0.23        0.55    0.42
p25           1.00       0.64    10.56     0.30        0.78    0.77
p50           1.77       1.27    35.26     0.40        1.28    1.41
p75           2.88       2.04    41.30     0.82        1.88    2.26
p90           4.33       3.21    41.30     1.09        2.69    3.40
iqr           1.88       1.39    30.73     0.52        1.10    1.49

Table B.4: Statistics for AIS by sub-list.

          Business  Economics  General  History  Statistics   Total
mean          0.90       1.13    14.85     0.47        0.95    1.03
sd            1.13       1.86     9.00     0.37        0.85    1.63
p10           0.17       0.13     4.73     0.16        0.24    0.17
p25           0.29       0.31     4.73     0.20        0.44    0.33
p50           0.57       0.62    17.87     0.32        0.72    0.65
p75           0.98       1.10    21.95     0.63        1.09    1.07
p90           2.02       2.55    21.95     1.04        1.98    2.11
iqr           0.69       0.79    17.22     0.43        0.64    0.74

Table B.5: Statistics for IPP by sub-list.

          Business  Economics  General  History  Statistics   Total
mean          1.18       0.94    22.10     0.39        0.98    1.07
sd            1.21       0.98    11.36     0.31        0.78    1.41
p10           0.12       0.10     9.24     0.04        0.21    0.12
p25           0.36       0.26     9.24     0.20        0.42    0.33
p50           0.84       0.66    26.31     0.31        0.79    0.73
p75           1.60       1.30    30.76     0.56        1.35    1.39
p90           2.80       2.12    30.76     0.81        2.06    2.30
iqr           1.24       1.03    21.53     0.35        0.93    1.06

Table B.6: Statistics for SJR by sub-list.

          Business  Economics  General  History  Statistics   Total
mean          0.86       0.93    11.07     0.37        0.85    0.88
sd            1.41       1.81     5.83     0.35        0.85    1.51
p10           0.13       0.12     5.78     0.11        0.17    0.13
p25           0.21       0.20     5.78     0.16        0.29    0.22
p50           0.41       0.39    10.11     0.26        0.59    0.43
p75           0.89       0.95    17.31     0.42        1.11    0.96
p90           1.90       1.89    17.31     0.75        1.98    1.89
iqr           0.67       0.75    11.53     0.26        0.82    0.74

Table B.7: Statistics for h-index by sub-list.

          Business  Economics  General  History  Statistics   Total
mean         14.94      14.66   303.00     7.14       14.74   14.92
sd           14.24      13.79    81.28     4.73       12.22   16.74
p10           3.00       3.00   216.00     2.00        3.00    3.00
p25           6.00       5.00   216.00     3.00        6.00    6.00
p50          10.00      10.00   316.00     6.00       11.00   10.00
p75          19.50      19.00   377.00    11.00       19.00   19.00
p90          31.00      32.00   377.00    12.00       31.00   31.00
iqr          13.50      14.00   161.00     8.00       13.00   13.00

Tables B.8-B.12 report the correlation coefficients among the five variables above, separately for each sub-list. Correlation coefficients are reported after taking the logarithms of these variables because the imputation is based on the same transformation, which makes the distributions closer to normal and reduces heteroskedasticity (see Section B.2 of this appendix).

The correlation between the four bibliometric indicators is, not surprisingly, high: for instance, the correlation between log (IF5) and log (IPP) is above 0.9 in every sub-list except Statistics (0.89); the correlation between log (IF5) and log (AIS) is about 0.9 or higher in every sub-list except Statistics (0.80). The h-index is strongly and positively correlated with each of the four bibliometric indicators. In particular: in Business the correlation between the log of each indicator and log (h) always exceeds 0.7; in Economics it exceeds 0.8; in Statistics it ranges from 0.66 (AIS) to 0.77 (IF5); in History it ranges between 0.56 (IPP) and 0.76 (SJR). Such values support the use of the h-index as a predictor of the four bibliometric indices in the imputation step.

Table B.8: Correlation matrix of log of IF5, AIS, IPP, SJR, and h-index for Business.

              log (IF5)  log (AIS)  log (IPP)  log (SJR)    log (h)
log (IF5)        1.0000
log (AIS)        0.9032     1.0000
log (IPP)        0.9429     0.8261     1.0000
log (SJR)        0.8920     0.9158     0.8676     1.0000
log (h)          0.8407     0.7625     0.8500     0.7893     1.0000

Table B.9: Correlation matrix of log of IF5, AIS, IPP, SJR, and h-index for Economics.

              log (IF5)  log (AIS)  log (IPP)  log (SJR)    log (h)
log (IF5)        1.0000
log (AIS)        0.8987     1.0000
log (IPP)        0.9614     0.8496     1.0000
log (SJR)        0.8808     0.9108     0.8860     1.0000
log (h)          0.8549     0.8320     0.8523     0.8453     1.0000

Table B.10: Correlation matrix of log of IF5, AIS, IPP, SJR, and h-index for General.

              log (IF5)  log (AIS)  log (IPP)  log (SJR)    log (h)
log (IF5)        1.0000
log (AIS)        0.9998     1.0000
log (IPP)        0.9999     1.0000     1.0000
log (SJR)        0.9184     0.9252     0.9237     1.0000
log (h)          0.9782     0.9817     0.9810     0.9805     1.0000

Table B.11: Correlation matrix of log of IF5, AIS, IPP, SJR, and h-index for History.

              log (IF5)  log (AIS)  log (IPP)  log (SJR)    log (h)
log (IF5)        1.0000
log (AIS)        0.9436     1.0000
log (IPP)        0.9279     0.8921     1.0000
log (SJR)        0.8224     0.8118     0.8279     1.0000
log (h)          0.6914     0.6207     0.5563     0.7568     1.0000

Table B.12: Correlation matrix of log of IF5, AIS, IPP, SJR, and h-index for Statistics.

              log (IF5)  log (AIS)  log (IPP)  log (SJR)    log (h)
log (IF5)        1.0000
log (AIS)        0.7991     1.0000
log (IPP)        0.8928     0.6836     1.0000
log (SJR)        0.7573     0.8477     0.7874     1.0000
log (h)          0.7738     0.6621     0.7408     0.7442     1.0000

B.2 Imputation methodology

Despite the introduction of the Scopus database and the resulting expanded presence of indexed journals within the GEV journal list, a large fraction (32%) of the journals that are relevant to Area 13 is still non-indexed. Therefore, a preliminary step to the ranking was the imputation of the missing values of the bibliometric indicators.
Exploiting the substantial correlation between the h-index, which is available for the entire list of journals, and the bibliometric indicators, GEV13 applied the same imputation method developed for VQR1, based on a simple specification that uses the logarithm of the h-index. The Area Final Report for VQR 2004-2010 (in its Appendix C) contains a detailed discussion of the imputation methodology, which is briefly summarized as follows.

The imputation models are fitted on the logarithms of the values of the bibliometric indicators. The use of the log specification is motivated by the large skewness in the observed distributions of the indicators. In particular, the logarithm of each bibliometric indicator is regressed on log (h), including the intercept. The estimation is performed separately for each sub-list. After the regression model is estimated, the mean prediction from the regression is imputed for each observation exhibiting a missing value. A more elaborate multiple imputation model produces essentially equivalent results in terms of journal rankings. Therefore, the simpler model is selected for its ease of implementation.

In VQR 2011-2014, GEV13 explored sensitivity issues by comparing the above-described methodology with the following alternative approaches: (i) regression on log (h) and dummies for the source of the h-index (Google Scholar Metrics vs. Publish or Perish) and for Italian journals; (ii) local regression on log (h); (iii) regression tree. To check the extent to which alternative methods lead to different conclusions, results were compared using the Spearman correlation coefficient and a series of square tables that directly compare the different rankings produced.
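The baseline procedure summarized above (a regression of the log indicator on log (h) with an intercept, fitted separately by sub-list, with the fitted value imputed where the indicator is missing) can be sketched in Python as follows. This is an illustration under stated assumptions, not the GEV's Stata code: the function names, the dictionary keys and the sample journals are hypothetical, and exponentiating the fitted log value to return to the original scale is an assumption (the appendix only states that the mean prediction is imputed). Since the exponential is monotone, that choice does not affect rankings.

```python
import math

def fit_log_log(h_values, y_values):
    """OLS of log(y) on log(h) with an intercept; returns (intercept, slope).

    Needs at least two observations with distinct h values.
    """
    xs = [math.log(h) for h in h_values]
    ys = [math.log(y) for y in y_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def impute(journals, indicator):
    """Return copies of the journal records with missing values filled in."""
    # One regression per sub-list, fitted on journals with an observed value.
    coefs = {}
    for sublist in {j["sublist"] for j in journals}:
        observed = [j for j in journals
                    if j["sublist"] == sublist and j[indicator] is not None]
        coefs[sublist] = fit_log_log([j["h"] for j in observed],
                                     [j[indicator] for j in observed])
    result = []
    for j in journals:
        filled = dict(j)
        filled["imputed"] = j[indicator] is None
        if filled["imputed"]:
            intercept, slope = coefs[j["sublist"]]
            # Assumption: back-transform the predicted log value with exp().
            filled[indicator] = math.exp(intercept + slope * math.log(j["h"]))
        result.append(filled)
    return result

# Hypothetical data lying exactly on IF5 = 2 * h ** 0.5.
journals = [
    {"name": "A", "sublist": "Economics", "h": 4, "IF5": 4.0},
    {"name": "B", "sublist": "Economics", "h": 16, "IF5": 8.0},
    {"name": "C", "sublist": "Economics", "h": 9, "IF5": None},
]
filled = impute(journals, "IF5")
```

Observed values are left untouched; only journal C receives a fitted value, and the `imputed` flag records which entries were filled in.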
Since the results were robust across imputation methods, after considering some instability problems associated with the alternative methods, as well as the advantage of the original method in terms of continuity, reproducibility and communicability, the GEV decided to confirm the latter.

Obviously, the adopted method is not free of limitations. In particular, it is based on the assumption of a constant relation between the log of a bibliometric index and log (h), summarized by the constant slope of the regression line. This may not be true and, in this sense, local regression could represent a potentially interesting alternative. However, local regression produced unstable estimates in the lower tail of the distribution, which is of central interest in the imputation step because the large majority of non-indexed journals presumably lie within this portion of the distribution of h-index values.

In the evaluation criteria published on November 20, 2015, GEV13 announced its intention to develop an algorithm for journal ranking which would rule out the possibility, implied by the above-described imputation methodology, that an indexed journal could be ranked lower than a non-indexed journal with a lower h-index. However, because of subsequent considerations leading to two public announcements (published on January 14 and 22, 2016, and reproduced in Appendix A), the GEV decided to confirm the methodology even if the requirement is not fulfilled, for the following reasons.

First, it is important to point out that this effect is simply determined by the existence of some extreme outliers lying below (above) the regression line, that is, by the presence of indexed journals showing particularly low (high) values of the bibliometric indicators with respect to their h-index, even if the regression procedure has overall a good fit. In other words, the observed effect is in no way systematic across groups or types of journals.
Since the VQR aims at evaluating Institutions rather than individual authors or journals, it is reasonable to expect that, within each Institution, a journal with articles submitted to the VQR is about equally likely to lie below or above the regression line, so that the above-cases and the below-cases compensate each other in the aggregated results.

Second, despite the introduction of Scopus, the imputation was still deemed to be necessary, because of the incomplete coverage in WoS and Scopus of a significant fraction of journals which are relevant to the Area. Thus, the GEV established that alternative methods capable of ruling out the effect at issue were all leading to undesirable consequences. For instance, it would have been possible to employ the h-index directly for evaluation, as a fifth indicator, with the questionable consequence that an indexed journal could have been ranked lower than another indexed journal showing a higher value of the h-index but lower values of the bibliometric indicators. Alternatively, outliers could have been excluded from the journal ranking, which would have implied an undesirably incomplete classification. Another option would have been to modify the imputation procedure by assigning to each non-indexed journal the minimum between the imputed value and the minimum value of the corresponding indicator among all indexed journals with a larger or equal value of the h-index. This highly ad hoc procedure, however, would have produced questionable effects for non-indexed journals.

Therefore, the GEV decided to avoid any ad hoc alternative solution, especially after considering that the effect in question would be considerably attenuated by the intervening innovations in the evaluation criteria.
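To make the last of the rejected alternatives concrete, the capping rule described above (the minimum between the imputed value and the lowest observed value among indexed journals with an equal or larger h-index) can be sketched as follows. This illustrates an option the GEV considered and did not adopt; the function name, the argument layout and the sample values are hypothetical.

```python
def cap_imputed(imputed_value, h, indexed):
    """Cap an imputed value using indexed journals with h-index >= h.

    indexed: list of (h_index, observed_value) pairs for indexed journals
    in the same sub-list.
    """
    candidates = [value for (h_indexed, value) in indexed if h_indexed >= h]
    if not candidates:
        # No indexed journal has an equal or larger h-index: nothing to cap.
        return imputed_value
    return min(imputed_value, min(candidates))

# Hypothetical indexed journals: (h-index, indicator value).
indexed = [(5, 0.2), (10, 1.0), (20, 3.0)]
unchanged = cap_imputed(0.8, 8, indexed)   # bounded by 1.0, stays at 0.8
capped = cap_imputed(0.8, 4, indexed)      # pulled down to the outlier 0.2
```

The second call shows why the rule is questionable: a single indexed journal with an unusually low indicator drags down every non-indexed journal with a lower h-index.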
Indeed, the introduction of Scopus allows a much higher coverage of indexed journals and, as a consequence, also a more reliable imputation procedure; furthermore, the choice, left to the author, of one out of four proposed bibliometric indicators, from two distinct databases, has automatically reduced the incidence of the effect in question.

B.3 Journal classification

The GEV produced a classification into the five merit classes reflecting the VQR percentiles (10/20/20/30/20) for each of the four bibliometric indicators: IF5 and AIS for WoS, and IPP and SJR for Scopus. The classification was produced separately for each sub-list. Since the 3 journals in General presented uniformly high values of all indicators, they were all placed in the Excellent merit class. The shares of journals in each merit class reflect the VQR percentiles with slight variations due to the presence of ties and to the upgrading of 9 Italian journals, which is described in Section B.4 of this appendix.

The rankings were published on January 14, 2016, and further clarifications and corrections were provided on January 22 and 29. The final journal classification employed in the evaluation process was made available on the ANVUR website on January 22, 2016 and can be consulted at the following url: http://www.anvur.it/attachments/article/856/22_01_2016_riviste.xls. The final classification of each journal article will also depend on individual citations, as detailed in Section 2 of the Area Final Report.

B.4 Italian journals

GEV13 defines as Italian those journals that publish only papers in Italian or a mix of papers in Italian and other languages, and/or are published by Italian publishers or by international publishers on behalf of Italian scientific institutions or associations. Italian journals represent 6% of the total number of journals (Table B.13), while they represented 5.7% in the list employed for VQR1; their share is highest in History (17%) and Economics (10%).

Table B.13: Distribution of journals by nationality and by sub-list.

          Business  Economics  General  History  Statistics   Total
Italian         37         83        0       12          21     153
%                3         10        0       17           4       6
Other         1187        781        3       59         548    2578
%               97         90      100       83          96      94
Total         1224        864        3       71         569    2731
%              100        100      100      100         100     100

As specified in the evaluation criteria, the GEV decided to upgrade by one class a number of Italian journals, distributed across all sub-lists, equal to 20-25 minus the number of Italian journals already ranked in the top three merit classes defined by the VQR Call (that is, Excellent, Good, and Fair). Since 16 Italian journals were classified in the top three classes, the GEV decided to upgrade 9 Italian journals, selected on the basis of the analysis of all bibliometric indicators and taking into account their distribution across sub-lists. In terms of the h-index, the selected journals show a value equal to or greater than 6, even though the threshold varies across sub-lists. For all of them, the merit class was upgraded from Acceptable to Fair and highlighted in red in the journal list published on the ANVUR website.

Listed in alphabetical order, the upgraded journals (and the corresponding sub-list) are: Azienda Pubblica (Business), Decisions in Economics and Finance (Statistics), Financial Reporting (Business), Mercati e Competitività (Business), Metron (Statistics), QA-Rivista dell’Associazione Rossi Doria (Economics), Rivista di Storia Economica (History), Stato e Mercato (Economics), and Symphonya Emerging Issues in Management (Business).

B.5 Replication

Statistics presented in this appendix can be replicated by using the Stata “dta” data and “do” code files available on the ANVUR website at http://www.anvur.it/attachments/article/856/22_01_2016riviste.zip.
Note that the files do not include: the 3 General journals, which were classified separately; the upgrades of Italian journals (described in Section B.4 of this appendix); and the corrections to the journal list detailed in two public announcements published on February 16, 2016 and January 13, 2017 (see Appendix A). Such corrections were inserted in the Excel file containing the journal list without altering the classification of the journals which were not subject to corrections.
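
The Stata files remain the reference for replication. Purely as an illustration of the percentile rule of Section B.3 (10/20/20/30/20, applied separately to each sub-list and indicator), a minimal Python sketch follows. The function name and the sample journals are hypothetical; "Limited" as the name of the bottom class and the treatment of ties (tied journals share the class of the best-placed one) are assumptions, since the appendix only states that ties cause slight deviations from the nominal shares.

```python
# Merit classes from best to worst with their percentile shares.
# Assumption: "Limited" names the bottom class; the appendix names
# only Excellent, Good, Fair and Acceptable.
CLASS_SHARES = [("Excellent", 10), ("Good", 20), ("Fair", 20),
                ("Acceptable", 30), ("Limited", 20)]

def classify(values):
    """Map each journal to a merit class from its indicator value.

    values: dict journal -> indicator value (higher is better), for the
    journals of a single sub-list.
    """
    n = len(values)
    ranked = sorted(values.values(), reverse=True)
    classes = {}
    for journal, value in values.items():
        # Position of the first journal with this value, so ties share a class.
        position = ranked.index(value) + 1
        cumulative = 0
        for name, share in CLASS_SHARES:
            cumulative += share
            # Integer arithmetic avoids floating-point issues at the cut-offs.
            if position * 100 <= cumulative * n:
                classes[journal] = name
                break
    return classes

# Hypothetical sub-list of ten journals with distinct indicator values.
example = classify({f"J{k}": float(k) for k in range(1, 11)})
```

With ten distinct values the classes contain 1, 2, 2, 3 and 2 journals respectively; the upgrade of Italian journals described in Section B.4 is a separate, manual step and is not reproduced here.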