A New Feature Sampling Method in Random Forests for Predicting High-Dimensional Data

Thanh-Tung Nguyen(1), He Zhao(2), Joshua Zhexue Huang(3), Thuy Thi Nguyen(4), and Mark Junjie Li(3)(B)

1 Faculty of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
[email protected]
2 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, People's Republic of China
[email protected]
3 College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
{zx.huang,jj.li}@szu.edu.cn
4 Faculty of Information Technology, Vietnam National University of Agriculture, Hanoi, Vietnam
[email protected]

Abstract. Random Forest (RF) models have been proven to perform well in both classification and regression. However, because of the randomizing mechanism in both the bagging of samples and the feature selection, the performance of RF can deteriorate when applied to high-dimensional data. In this paper, we propose a new feature sampling approach for RF to deal with high-dimensional data. We first apply a p-value assessment of feature importance to find a cut-off between informative and less informative features. The set of informative features is then further partitioned into two groups, highly informative and informative features, using statistical measures. When sampling the feature subspace for learning RFs, features from all three groups are taken into account. The new subspace sampling method maintains the diversity and the randomness of the forest and enables one to generate trees with a lower prediction error. In addition, quantile regression is employed to obtain predictions in the regression problem, for robustness towards outliers. The experimental results demonstrate that the proposed approach for learning random forests significantly reduces prediction errors and outperforms most existing random forests when dealing with high-dimensional data.

Keywords: Subspace feature selection · Regression · Classification · Random forests · Data mining · High-dimensional data

© Springer International Publishing Switzerland 2015. T. Cao et al. (Eds.): PAKDD 2015, Part II, LNAI 9078, pp. 459–470, 2015. DOI: 10.1007/978-3-319-18032-8_36

1 Introduction

High-dimensional data has become common in today's applications. State-of-the-art machine learning methods can work well on data sets of moderate size, but they suffer when scaled to high-dimensional data. It is well known that in a high-dimensional data set only a small portion of the predictor features are relevant to the response feature; the irrelevant features may even degrade the performance of the model. This calls for methods that select good subsets of features for learning efficient prediction models.

Random forests (RF) [1], [2], an ensemble learning machine composed of decision trees for prediction, is defined as follows. Given a training data set L = {(X_i, Y_i)}_{i=1}^{N}, X_i ∈ R^M, Y_i ∈ Y, the X_i are the features (also called predictor variables) and Y is the target (also called the response feature), with Y ∈ R^1 for a regression problem and Y ∈ {1, 2, ..., c} for a classification problem (c ≥ 2); N and M are the numbers of training samples and features, respectively. A standard RF independently and uniformly resamples observations from the training data L to draw a bootstrap data set L*_k, from which a decision tree T*_k is grown. Repeating this process K times produces a series of bootstrap data sets L*_k and corresponding decision trees T*_k (k = 1, 2, ..., K) that form a RF.
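To make this construction concrete, the following Python sketch (our illustration, not the authors' implementation) grows K trees on bootstrap samples using scikit-learn decision trees; the random feature subspace of size mtry is obtained through the max_features argument, and the regression case is shown.

```python
# Minimal sketch of the standard RF construction described above (our
# illustration, not the authors' code): K bootstrap samples, one decision
# tree per sample, with a random feature subspace of size mtry at each split.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def grow_forest(X, Y, K=200, mtry=None, nmin=5, seed=0):
    rng = np.random.default_rng(seed)
    N, M = X.shape
    mtry = mtry if mtry is not None else int(np.log2(M)) + 1  # subspace size used in the paper
    trees, inbag = [], []
    for k in range(K):
        idx = rng.integers(0, N, size=N)          # bootstrap sample L*_k (with replacement)
        tree = DecisionTreeRegressor(max_features=mtry,
                                     min_samples_leaf=nmin,
                                     random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], Y[idx])                  # decision tree T*_k grown from L*_k
        trees.append(tree)
        inbag.append(idx)
    return trees, inbag
```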
Given an input X = x, the predicted value of the whole RF is obtained by aggregating the results of the individual trees. Let \hat{f}_k(x) denote the prediction of the unknown value y for an input x ∈ R^M by the kth tree. We have

\hat{f}(x) = \frac{1}{K} \sum_{k=1}^{K} \hat{f}_k(x)  for regression problems, and  (1)

\hat{f}(x) = \arg\max_{y \in Y} \sum_{k=1}^{K} I[\hat{f}_k(x) = y]  for classification problems,  (2)

where I(·) and \hat{f}(x) denote the indicator function and the RF prediction, respectively.

RFs have been shown to be a state-of-the-art tool in machine learning. An RF model can be used for both feature selection and prediction, and it performs well in both classification and regression problems. However, the performance of random forests suffers when applied to high-dimensional data, i.e., data with thousands to millions of features. The main cause is that, when growing a tree from the bagged sample data, the subspace of features randomly sampled from the thousands of features in the training data to split a node is often dominated by less important features. A tree grown from such a randomly sampled feature subspace has low prediction accuracy, which in turn degrades the final prediction of the random forest.

In this paper, we propose a new feature weighting subspace selection approach to improve the prediction accuracy of RF while maintaining the diversity and the randomness of the forest. Given a training data set L, we first use a feature permutation technique [3], [4] to measure the importance of features and produce raw feature importance scores. We then apply a p-value assessment to find the cut-off between informative and less informative features. For the informative features, the Spearman rank test is used in the regression problem and the χ^2 statistic in the classification problem to find the subset of highly informative features. This separation yields three subsets of features. When sampling the feature subspace for learning, features from these three groups of highly informative, informative and less informative features are all taken into account for splitting the data at a node. Since the subspace always contains highly informative features, it can guarantee a better split at a node, and therefore a better tree. This sampling method always provides enough highly informative features for the feature subspace at any level of the decision tree. By taking features from all three subsets into account, the diversity and the randomness of the forest in Breiman's framework [1] are maintained.

This feature subspace selection is used for building trees in our new random forest algorithm, called ssRF, which deals with both classification and regression problems. In the ssRF model, quantile regression is employed to produce both point and range predictions in regression problems. Our experimental results show that, with the proposed feature sampling method, the ssRF model outperforms existing random forests in reducing prediction errors, even though a small feature subspace size of log2(M) + 1 is used, and it performs especially well in range prediction on high-dimensional data.

2 Feature Weighting Subspace Selection

2.1 Importance Measure of Features from a Random Forest

The feature importance measure obtained from a random forest is described as follows [5], [6]. At each node t in a decision tree, a split on feature X_j is determined by the decrease in node impurity ΔR(X_j, t).
For a regression tree, the node impurity is R(t) = σ^2(t) p(t), where p(t) = N(t)/N is the probability that a sample drawn at random from the underlying distribution falls into node t, N(t) is the number of samples in node t, and σ^2(t) = \sum_{X_i \in t} (Y_i - \bar{Y}_t)^2 / N(t) is the sample variance of Y in node t. The decrease of impurity at node t after splitting it into t_L and t_R is

\Delta R(X_j, t) = R(t) - [R(t_L) + R(t_R)] = \sigma^2(t) p(t) - [\sigma^2(t_L) p_L + \sigma^2(t_R) p_R],  (3)

where p_L and p_R are the proportions of samples in t that go left and right, respectively.

For classification trees, the Gini index is used as the node impurity R(t). Suppose there are S class values in node t (s ∈ {1, ..., S}), and let π_t(s) be the proportion of samples from the sth class in node t. The node impurity is defined as R(t) = N(t) \sum_{s=1}^{S} \pi_t(s)[1 - \pi_t(s)].

The split on feature X_j chosen for a node t is the one that maximizes ΔR(X_j, t). Let IS_k(X_j) denote the importance score of feature X_j in a single decision tree T_k:

IS_k(X_j) = \sum_{t \in T_k} \Delta R(X_j, t).

The importance score IS_j of feature X_j is computed over all K trees in a random forest as

IS_j = \frac{1}{K} \sum_{k=1}^{K} IS_k(X_j).

It is worth noting that a random forest uses the in-bag samples (i.e., the bagged samples used to build the trees) to produce the importance scores IS_j. This is the main difference from an out-of-bag measure, whose OOB-permutation computation requires much more time [7], [3]. We normalize the scores into [0, 1] using min-max normalization:

VI_j = \frac{IS_j - \min_j(IS_j)}{\max_j(IS_j) - \min_j(IS_j)}.  (4)

Having the raw importance scores VI_j determined by Equation (4), we can evaluate the contributions of the features to predicting the response feature.

2.2 A New Feature Sampling Method for Subspace Selection

We first compute importance scores for all features according to Equation (4). Denote the feature set as L_X = {X_j}, j = 1, 2, ..., M. We randomly permute the values of each feature to obtain a corresponding shadow feature set, denoted L_A = {A_j}_{j=1}^{M}. The shadow features have no predictive power for the response feature. Following the feature permutation procedure recently presented in [3], we run RF R times on the extended data set {L_X ∪ L_A, Y} to obtain importance scores VI^r_{X_j} and VI^r_{A_j}, together with the comparison samples V*_r = max_j {VI^r_{A_j}}, r = 1, ..., R. The unequal-variance Welch two-sample t-test [8] is then used to compare the importance scores of each feature with the maximum importance scores of the generated shadows. This unequal-variance test is used because the importance scores across the replicates cannot be assumed to be normally distributed with equal variances. Having computed the t statistic, we compute a p-value for each feature and perform the hypothesis test VI_{X_j} > V*. This test confirms that a feature is important if it consistently scores higher than the shadow maximum over multiple permutations. Therefore, any feature whose importance score is smaller than the maximum importance score of the noisy shadow features is considered less important; otherwise, it is considered important.

The p-value of a feature indicates the importance of the feature for prediction. The smaller the p-value of a feature, the more strongly the predictor feature is correlated with the response feature, and the more powerful the feature is for prediction.
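The shadow-feature test can be sketched as follows (our illustration, not the authors' code; the rf_importance callback is a hypothetical helper that returns the normalized scores of Equation (4) for each column of its input, and SciPy's Welch test stands in for the test described above):

```python
# Sketch of the shadow-feature test described above (our illustration, not the
# authors' code). `rf_importance` is a hypothetical callback that returns the
# normalized importance scores of Equation (4) for each column of its input.
import numpy as np
from scipy import stats

def feature_pvalues(X, Y, rf_importance, R=30, seed=0):
    rng = np.random.default_rng(seed)
    N, M = X.shape
    real_scores = np.empty((R, M))                       # VI^r_{X_j}
    shadow_max = np.empty(R)                             # V*_r = max_j VI^r_{A_j}
    for r in range(R):
        A = np.apply_along_axis(rng.permutation, 0, X)   # shadow features A_j
        VI = rf_importance(np.hstack([X, A]), Y)         # scores on {L_X U L_A, Y}
        real_scores[r] = VI[:M]
        shadow_max[r] = VI[M:].max()
    # Welch's unequal-variance t-test of VI_{X_j} > V*, one p-value per feature.
    return np.array([
        stats.ttest_ind(real_scores[:, j], shadow_max,
                        equal_var=False, alternative="greater").pvalue
        for j in range(M)
    ])
```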
Given a statistical significance level, we can separate the informative features from the low-informative ones. Given the p-values of all features, we set the significance level as a threshold λ, for instance λ = 0.05. Any feature whose p-value is greater than λ is added to the low-informative feature subset, denoted X_l; otherwise, the direct relationship of the feature with the Y values is assessed further.

The non-parametric Spearman ρ test is used to measure the strength of the relationship between X_j and Y ∈ R^1 in regression problems. The value |ρ| ∈ [0, 1], where |ρ| = 1 means a perfect correlation and 0 means no correlation. The Spearman rank correlation coefficient performs well when the conditional distribution is not normal. Each pair (X_j, Y) is converted to ranks (R(x_i), R(y_i)), i = 1, ..., N, and ρ_j is taken as the absolute value of the rank correlation,

\rho_j = \left| \frac{\sum_{i=1}^{N} (R(x_i) - \bar{R}_x)(R(y_i) - \bar{R}_y)}{\sqrt{\sum_{i=1}^{N} (R(x_i) - \bar{R}_x)^2 \sum_{i=1}^{N} (R(y_i) - \bar{R}_y)^2}} \right|,  (5)

where \bar{R}_x and \bar{R}_y are the average ranks of feature X_j and of the response feature Y, respectively.

Given the ρ values of the remaining features {X \ X_l}, we take the mean of all ρ values as the threshold γ,

\gamma = \frac{1}{M_\lambda} \sum_{j=1}^{M_\lambda} \rho_j,  (6)

where M_λ is the number of numerical features in the important feature subset {X \ X_l}. Let X_h denote the subset of highly informative features; every feature X_j whose ρ_j value is greater than γ is added to X_h. The remaining features, including the categorical features, are added to the informative feature subset, denoted X_m.

For the classification problem, the χ^2(X_j, Y) statistic is used to test the association between the class label and each feature X_j. For this test of independence, a chi-squared probability of at most 0.05 is commonly interpreted as grounds for rejecting the hypothesis that the feature is independent of the response feature. All features X_j whose χ^2-test p-value is smaller than 0.05 are added to X_h; the remaining features are added to X_m.

Given X_h, X_m and X_l, at each node we randomly select mtry (mtry > 1) features drawn separately from the three groups. For a given subspace size, the proportions of highly informative, informative and less informative features are chosen according to the sizes of the three groups: mtry_high = mtry × (M_high / M), mtry_mid = mtry × (M_mid / M) and mtry_low = mtry − mtry_high − mtry_mid, where M_high and M_mid are the numbers of features in X_h and X_m, respectively. The three draws are merged to form the feature subspace used for splitting the node.
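For the regression case, the partition and the proportional subspace sampling can be sketched as follows (our illustration under the assumptions stated in the comments; pvals are the Welch-test p-values from the previous sketch, and the classification case would replace the Spearman step by the per-feature χ^2 test described above):

```python
# Sketch of the three-group partition and the proportional subspace sampling
# (our illustration for the regression case; `pvals` are the Welch-test
# p-values from the previous sketch).
import numpy as np
from scipy.stats import spearmanr

def partition_features(X, Y, pvals, lam=0.05):
    low = np.where(pvals > lam)[0]                       # X_l: low-informative
    informative = np.where(pvals <= lam)[0]
    # |Spearman rho| of each informative feature, compared to the mean threshold gamma (Eq. 6).
    rho = np.array([abs(spearmanr(X[:, j], Y)[0]) for j in informative])
    gamma = rho.mean()
    high = informative[rho > gamma]                      # X_h: highly informative
    mid = np.setdiff1d(informative, high)                # X_m: informative
    return high, mid, low

def sample_subspace(high, mid, low, mtry, rng):
    M = len(high) + len(mid) + len(low)
    m_h = int(mtry * len(high) / M)                      # mtry_high
    m_m = int(mtry * len(mid) / M)                       # mtry_mid
    m_l = mtry - m_h - m_m                               # mtry_low
    chosen = []
    for pool, m in ((high, m_h), (mid, m_m), (low, m_l)):
        m = min(m, len(pool))
        if m > 0:
            chosen.append(rng.choice(pool, size=m, replace=False))
    return np.concatenate(chosen) if chosen else np.array([], dtype=int)
```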
3 The Proposed ssRF Algorithm

The new feature subspace sampling method is now used to grow the decision trees of an RF. For the regression problem, we propose to use quantile regression to obtain both point and range predictions; this idea was introduced in [9]. Using the notation of [1], let θ_k be the random parameter vector that determines the growth of the kth tree and Θ = {θ_k}_{k=1}^{K} be the set of random parameter vectors of the forest generated from L. In each regression tree T_k grown from L_k, we compute a positive weight w_i(x, θ_k) for each case X_i ∈ L. Let l(x, θ_k, t) be the leaf node t of T_k into which x falls. The cases X_i ∈ l(x, θ_k, t) are all assigned the same weight w_i(x, θ_k) = 1/N(t), where N(t) is the number of cases in l(x, θ_k, t). In each classification tree, w_i(x, θ_k) = 1 if \sum_{n=1}^{N(t)} I(Y_n = Y_i) \ge \sum_{n=1}^{N(t)} I(Y_n = Y_j) for all Y_j ≠ Y_i. This means that the prediction for a regression problem is simply the average of the Y values in node t, and for a classification problem it is the category receiving the majority of votes among the Y values in node t. In this way, all cases in L_k are assigned positive weights and the cases not in L_k are assigned zero weight. For a single tree, given X = x, the prediction value is

\hat{Y}_k = \sum_{i=1}^{N} w_i(x, \theta_k) Y_i = \sum_{X_i \in l(x, \theta_k, t)} w_i(x, \theta_k) Y_i.  (7)

The new random forest algorithm ssRF is summarized as follows.

1. Given L, separate the highly informative and the informative features from the less informative ones to obtain the three feature subsets X_h, X_m and X_l, as described in Section 2.2.
2. Sample the training set L with replacement to generate the bagged samples L_k, k = 1, 2, ..., K.
3. For each L_k, grow a decision tree T_k as follows:
   (a) At each node, select a subspace of mtry (mtry > 1) features randomly and separately from X_l, X_m and X_h, and use these subspace features as candidates for splitting the node.
   (b) Each tree is grown nondeterministically, without pruning, until the minimum node size n_min is reached. At each leaf node, all Y ∈ R^1 values of the samples in the leaf node are kept.
   (c) Compute the weights w_i(x, θ_k) of each X_i for the individual tree T_k using the out-of-bag samples.
4. Compute the weights w_i(x) assigned by the RF as the average of the weights over all trees:

   w_i(x) = \frac{1}{K} \sum_{k=1}^{K} w_i(x, \theta_k)  (8)

5. Given an input X = x, use Equation (2) to predict the new sample in the classification problem. For the regression problem, find the leaf nodes l_k(x, θ_k) of all trees into which x falls and the set of Y_i in these leaf nodes. Given all Y_i and the corresponding weights w_i(x), the conditional distribution function of Y given X is estimated as \hat{F}(y | X = x) = \sum_{i=1}^{N} w_i(x) I(Y_i \le y), where I(·) is the indicator function, equal to 1 if Y_i ≤ y and 0 otherwise. Given a probability α, the quantile Q_α(X) is estimated as \hat{Q}_\alpha(X = x) = \inf\{ y : \hat{F}(y | X = x) \ge \alpha \}. Given a probability τ and quantile levels α_l, α_h with α_h − α_l = τ, τ is the probability that the prediction of Y falls in the range [Q_{α_l}(X), Q_{α_h}(X)], where

   [Q_{\alpha_l}(X), Q_{\alpha_h}(X)] = [\inf\{ y : \hat{F}(y | X = x) \ge \alpha_l \}, \inf\{ y : \hat{F}(y | X = x) \ge \alpha_h \}].  (9)

   For point regression, the median \hat{Q}_{0.5}, which lies within this range, can be chosen as the prediction of Y given the input X = x.
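The weighted quantile estimation in step 5 can be sketched as follows (our illustration, assuming the forest weights w_i(x) of Equation (8) and the training responses Y_i are already available as NumPy arrays; the symmetric choice α_l = (1 − τ)/2 is our assumption, since only α_h − α_l = τ is required):

```python
# Sketch of the weighted quantile estimation in step 5 (our illustration):
# Y holds the training responses Y_i and w the forest weights w_i(x) of Eq. (8).
import numpy as np

def weighted_quantile(Y, w, alpha):
    order = np.argsort(Y)
    y_sorted, w_sorted = Y[order], w[order]
    cdf = np.cumsum(w_sorted)                  # F_hat(y|x) evaluated at the sorted Y_i
    idx = np.searchsorted(cdf, alpha)          # inf{ y : F_hat(y|x) >= alpha }
    return y_sorted[min(idx, len(Y) - 1)]

def range_prediction(Y, w, tau=0.90):
    # Symmetric interval alpha_l = (1 - tau)/2 is our choice; the paper only
    # requires alpha_h - alpha_l = tau.
    alpha_l, alpha_h = (1 - tau) / 2, (1 + tau) / 2
    point = weighted_quantile(Y, w, 0.5)       # median Q_0.5 as the point prediction
    return point, (weighted_quantile(Y, w, alpha_l),
                   weighted_quantile(Y, w, alpha_h))
```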
4 Experiments and Evaluation

4.1 Data Sets

We conducted experiments to test the proposed method on high-dimensional data sets for both classification and regression problems. Table 1 lists the real data sets used to evaluate the performance of the random forest models. The Fbis data set was compiled from the archive of the Foreign Broadcast Information Service, and the La1s and La2s data sets were taken from the archive of the Los Angeles Times for TREC-5 (http://trec.nist.gov). The Rivers data set (http://www.usgs.gov) was used to predict the flow level of a river. It is based on a data set containing the discharge levels of 1,439 Californian rivers over a period of 12,054 days. This data set contains 48.6% missing values; all values were used to train the model. The level of the 1,440th river was predicted in our experiments, and the target values were rescaled from [0.062; 101,000] to [0; 1]. The LOG1P data set was used in [10]. The Stock data set, described in [11], was used for stock price prediction; it has about 8.35% missing values in the predictor features. The original Y values lie between 880 and 82,710; these target values were rescaled to [0; 1] by linear scaling. Each data set was split into separate training and test subsets, with the sizes given in Table 1.

Table 1. Description of the high-dimensional data sets, sorted by the number of features and grouped for the regression and classification problems.

Data set   #Train   #Test   #Features    #Classes
Stock       1,942     785         495         -
Rivers      8,345   3,709       1,439         -
LOG1P      16,087   3,308   4,272,227         -
Fbis        1,711     752       2,000        17
La2s        1,855     845      12,432         5
La1s        1,963     887      13,195         5

4.2 Experimental Setting

Evaluation Measure: We used Breiman's method of measurement as described in [1]. The prediction accuracy of the RF models was evaluated on the test set: for the regression problem the mean of squared residuals (MSR) was computed, and for the classification problem the test error was used.

The latest RF [12], QRF [13], cRF (cForest) [14] and GRRF [15] R packages on CRAN (http://cran.r-project.org/) were used in the R environment to conduct the experiments. For the GRRF model, we used a value of 0.1 for the coefficient γ because GRRF(0.1) has shown competitive prediction performance in [16]. The SRF model [17], which uses a stratified sampling method, is intended for the classification problem, while the QRF and eQRF [18] models were developed for regression problems only. The ssRF model with the new subspace sampling method is a new implementation, in which we call the corresponding R/C++ functions from the R environment.

From each training data set we built 10 random forest models and computed the averages of their MSRs and test errors; each RF model had 200 trees for regression and 500 trees for classification. The minimum node size n_min was 5 for regression and 1 for classification. The number of candidate features was set to the default mtry = log2(M) + 1. The parameters R, mtry and λ used in ssRF for the pre-computation of the feature partition were 30, √M and 0.05, respectively. To process the large-scale LOG1P data set, only 5% of the samples were used to train the eQRF and ssRF models for the feature partition and subspace selection, since the computational time required for all samples is too long.

To address missing values in the data sets, we separate all samples containing missing values and create an extra "missing" group for them; this "missing" class is then treated as a predictor of the response feature. If missing values occur in the response feature, those samples are omitted. After this separation, missing values are treated as if they had actually been observed.

All experiments were conducted on six 64-bit Linux machines, each equipped with an Intel Xeon E5620 CPU at 2.40 GHz, 16 cores, 4 MB cache, and 32 GB main memory. The ssRF and eQRF models were implemented as multi-thread processes, while the other models were run as single-thread processes.
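For reference, the two error measures reported in the next subsection reduce to the following (a small sketch of our own, not from the paper):

```python
# The two evaluation measures used in the next subsection (our own small
# sketch): mean of squared residuals (MSR) for regression, test error for
# classification.
import numpy as np

def msr(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def test_error(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))
```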
4.3 Results on Real Data Sets

The performance of the RF models is evaluated while varying the number of trees and the number of features, the two key parameters of an RF model. Figures 1(a) and (b) show the regression errors of the random forest models as the number of trees K varies, with mtry = log2(M) + 1. Figures 1(c) and (d) present the curves obtained when the number of random features mtry in the subspace increases while the number of trees is fixed (K = 200); the vertical line in each plot indicates the subspace size mtry = log2(M) + 1 suggested by Breiman [1] for applying RF to low-dimensional data sets. Table 2 shows the test errors on the classification data sets against the number of trees and the number of features.

The RF, QRF and eQRF models were unable to build models on the Stock and Rivers data sets, which contain missing values; the imputation function of the randomForest R package was used to recover the missing values in these two data sets. The eQRF model was not considered further in this experiment because its prediction accuracy ranked last on the imputed data sets. The cRF model handled the data sets containing missing values well, but it crashed when applied to the large data sets. The results of the RF models applied to the imputed data sets are denoted RF.i and QRF.i in the plots.

Fig. 1. The prediction performance of the regression random forest models against the number of trees and the number of features on real data sets: (a), (c) Stock data; (b), (d) Rivers data.

Table 2. The prediction test error of the RF models against the number of trees K and the number of features mtry on the classification data sets; the lowest error in each column is the best result.

Data set  Model    K=50    100     150     200     300     mtry=10  20      30      40      50
Fbis      RF      .2307   .2241   .2254   .2261   .2279    .2434   .2351   .2156   .2303   .2187
          GRRF    .2394   .2407   .2287   .2314   .2340    .2527   .2101   .1955   .1862   .1981
          SRF     .1689   .1649   .1622   .1569   .1618    .1569   .1702   .1636   .1715   .1715
          ssRF    .1676   .1676   .1543   .1689   .1569    .1822   .1556   .1503   .1503   .1522
La2s      RF      .2303   .2363   .2256   .2315   .2280    .2536   .1611   .1586   .1432   .1402
          GRRF    .2476   .2121   .2180   .2156   .2192    .2820   .1860   .1540   .1505   .1386
          SRF     .1327   .1517   .1493   .1445   .1410    .1244   .1315   .1374   .1386   .1434
          ssRF    .1078   .1066   .1102   .1185   .1090    .1149   .1102   .0995   .1002   .1014
La1s      RF      .6708   .6697   .6731   .6742   .6488    .6776   .6032   .4543   .3337   .2052
          GRRF    .1928   .1759   .2063   .1849   .1966    .1905   .1691   .1612   .1577   .1409
          SRF     .1308   .1353   .1330   .1353   .1488    .1330   .1375   .1387   .1364   .1398
          ssRF    .1354   .1321   .1322   .1321   .1264    .1477   .1432   .1443   .1319   .1387

We can see that ssRF consistently provided good results and achieved a lower prediction error in Figure 1 and Table 2 as K and mtry vary, on both kinds of data sets. In the few cases where the ssRF model did not obtain the best result, compared with SRF on the Fbis and La1s data sets, the differences from the best results were minor. These results suggest that, at the lower levels of a tree, the gain of a split is reduced by the effect of splits on other features at higher levels; the prediction errors of the other random forest models therefore increase, while the ssRF model continues to produce better results. This is because the selected feature subspace contains enough highly informative features at every level of the decision tree. The effect of the new sampling method is clearly demonstrated by this result. In Figures 1(c) and (d) and the right half of Table 2, the RF and QRF models require a larger number of features to reach a lower prediction error. This means that the RF and QRF models could achieve better prediction performance only if they were provided with a much larger feature subspace.
For the regression and the classification problem, the default subspace sizes in the RF and QRF R packages are mtry = M/3 and mtry = √M, respectively. With subspaces of this size, the computational time for building an RF is very high, especially for large high-dimensional data. These empirical results indicate that the ssRF model does not need many features in the subspace to achieve good prediction performance: when applied to high-dimensional data with a subspace of only mtry = log2(M) + 1 features, the results are already satisfactory. In general, when a feature subspace of the size suggested by Breiman is used, the ssRF model gives a lower prediction error with less computational time. We consider this one of the contributions of this work.

Figure 2 shows the point and 90% range prediction results of the eQRF and ssRF models on the large high-dimensional data set LOG1P. The green and red points mark predictions inside and outside the predicted ranges, respectively. Figure 2(a) shows the point and 90% range predictions of the eQRF model; its point predictions are more scattered than those of the ssRF model. A significant improvement by the ssRF model can be observed in Figure 2(b): the predicted points lie closer to the diagonal line, indicating that the predicted values are close to the true values, and there are fewer red points, indicating that a larger number of predictions fall within the predicted ranges. These results clearly demonstrate the advantage of the ssRF model over the recently proposed eQRF model.

Fig. 2. Comparison of range predictions by the regression eQRF and ssRF models on the large high-dimensional data set LOG1P: (a) range predictions by eQRF; (b) range predictions by ssRF.

5 Conclusions

We have presented a new approach to feature subspace selection for efficient node splitting when building decision trees in random forests. Based on it, a new random forest algorithm, ssRF, has been developed for predicting high-dimensional data. Quantile regression is employed to obtain predictions in the regression problem, which makes the RF more robust towards outliers. With the new subspace feature selection, the small subspace size mtry = log2(M) + 1 reported by Breiman can be used in our algorithm to obtain a lower prediction error. With ssRF, the performance on both classification and regression problems (point and range prediction) is preserved and improved. Experimental results have demonstrated the improvement of ssRF in reducing prediction errors in comparison with recently proposed random forests, including eQRF, GRRF and SRF, and it performs especially well on large high-dimensional data.

Acknowledgments. This research is supported in part by NSFC under Grant No. 61203294 and the Natural Science Foundation of SZU (Grant No. 201433). Joshua Huang was supported by the National Natural Science Foundation of China under Grant No. 61473194.

References

1. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
2. Breiman, L.: Manual on setting up, using, and understanding random forests v3.1 (2002), retrieved October 23, 2010
3. Nguyen, T.T., Huang, J., Nguyen, T.: Two-level quantile regression forests for bias correction in range prediction. Machine Learning, 1–19 (2014)
4. Tuv, E., Borisov, A., Runger, G., Torkkola, K.: Feature selection with ensembles, artificial variables, and redundancy elimination. The Journal of Machine Learning Research 10, 1341–1366 (2009)
5. Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC Press (1984)
6. Louppe, G., Wehenkel, L., Sutera, A., Geurts, P.: Understanding variable importances in forests of randomized trees. In: Advances in Neural Information Processing Systems, pp. 431–439 (2013)
7. Genuer, R., Poggi, J.M., Tuleau-Malot, C.: Variable selection using random forests. Pattern Recognition Letters 31(14), 2225–2236 (2010)
8. Welch, B.L.: The generalization of 'Student's' problem when several different population variances are involved. Biometrika 34, 28–35 (1947)
9. Meinshausen, N.: Quantile regression forests. The Journal of Machine Learning Research 7, 983–999 (2006)
10. Ho, C.H., Lin, C.J.: Large-scale linear support vector regression. The Journal of Machine Learning Research 13(1), 3323–3348 (2012)
11. Cai, Z., Jermaine, C., Vagena, Z., Logothetis, D., Perez, L.L.: The pairwise Gaussian random field for high-dimensional data imputation. In: IEEE International Conference on Data Mining (ICDM), pp. 61–70. IEEE (2013)
12. Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3), 18–22 (2002)
13. Meinshausen, N.: quantregForest: Quantile regression forests. R package version 0.2-3 (2012)
14. Hothorn, T., Hornik, K., Zeileis, A.: party: A laboratory for recursive part(y)itioning. R package version 0.9-9999 (2011), http://cran.r-project.org/package=party (last accessed November 28, 2013)
15. Deng, H.: Guided random forest in the RRF package. arXiv preprint arXiv:1306.0237 (2013)
16. Deng, H., Runger, G.: Gene selection with guided regularized random forest. Pattern Recognition 46(12), 3483–3489 (2013)
17. Ye, Y., Wu, Q., Zhexue Huang, J., Ng, M.K., Li, X.: Stratified sampling for feature subspace selection in random forests for high dimensional data. Pattern Recognition 46(3), 769–787 (2013)
18. Tung, N.T., Huang, J.Z., Khan, I., Li, M.J., Williams, G.: Extensions to quantile regression forests for very high-dimensional data. In: Tseng, V.S., Ho, T.B., Zhou, Z.-H., Chen, A.L.P., Kao, H.-Y. (eds.) PAKDD 2014, Part II. LNCS, vol. 8444, pp. 247–258. Springer, Heidelberg (2014)