SUPPORTING MATERIAL

Improving Performance of Mammalian MicroRNA Target Prediction

Hui Liu1, Dong Yue2, Yidong Chen4,5, Shou-Jiang Gao3,5 and Yufei Huang2,5*

1 SIEE, China University of Mining and Technology, Xuzhou, Jiangsu, China. 2 Department of ECE, University of Texas at San Antonio. 3 Department of Pediatrics, 4 Department of Epidemiology and Biostatistics, 5 Greehey Children’s Cancer Research Institute, University of Texas Health Science Center at San Antonio.

© Oxford University Press 2005

S.1. SENSITIVITIES OF PROPOSED POTENTIAL SITE FILTER

Table 1. Sensitivities of the proposed filter and of the rule based on 6-mer seed match, obtained on training data.

Seed match rule       Sensitivity of site detection   Sensitivity of UTR detection
6mer perfect match    79.8%                           77.1%
Proposed rules        96.2%                           95.8%

S.2. BRIEF SUMMARY OF SITE FEATURES

Table 2. Brief summary of all site features. Min and Max give the typical value range. Rows that share the same type, group and range are listed as an index range.

Index     Feature name                        Data type  Group                 Explanation                                   Min   Max
1         consv_3cntxt                        FLOAT      conservation          seed's 3' context conservation score          0     1
2         consv_seed                          FLOAT      conservation          seed conservation score                       0     1
3         consv_5cntxt                        FLOAT      conservation          seed's 5' context conservation score          0     1
4         sm_6mer                             INTEGER    seed match type       6mer seed match                               0     1
5         sm_7mer_A1                          INTEGER    seed match type       7mer_A1 seed match                            0     1
6         sm_7mer_m1                          INTEGER    seed match type       7mer_m1 seed match                            0     1
7         sm_7mer_m8                          INTEGER    seed match type       7mer_m8 seed match                            0     1
8         sm_8mer_A1                          INTEGER    seed match type       8mer_A1 seed match                            0     1
9         sm_8mer_m1                          INTEGER    seed match type       8mer_m1 seed match                            0     1
10        to_stop_codon                       INTEGER    position              distance to stop codon                        0     2000
11        to_ends                             INTEGER    position              distance to nearest end                       0     2000
12        ratio_to_ends                       FLOAT      position              ratio to nearest end                          0     1
13-32     nt1 ... nt20                        INTEGER    nt match status       p1 ... p20 match status                       1     4
33-51     2mer1 ... 2mer19                    INTEGER    2mer match status     2mer1 ... 2mer19 match status                 1     16
52        rgs_match                           INTEGER    region                number of matches in seed region              0     8
53        rgs_gu                              INTEGER    region                number of G:U pairs in seed region            0     8
54        rgs_mismatch                        INTEGER    region                number of mismatches in seed region           0     8
55        rgs_gap                             INTEGER    region                number of gaps in seed region                 0     8
56        rgs_bulge                           INTEGER    region                number of bulges in seed region               0     2
57        rgs_bulge_nt                        INTEGER    region                number of bulged nts in seed region           0     2
58        rgs_energy                          FLOAT      region                binding energy of seed region                 -10   5
59        rg3_match                           INTEGER    region                number of matches in 3' region                0     8
60        rg3_gu                              INTEGER    region                number of G:U pairs in 3' region              0     8
61        rg3_mismatch                        INTEGER    region                number of mismatches in 3' region             0     8
62        rg3_gap                             INTEGER    region                number of gaps in 3' region                   0     8
63        rg3_bulge                           INTEGER    region                number of bulges in 3' region                 0     2
64        rg3_bulge_nt                        INTEGER    region                number of bulged nts in 3' region             0     5
65        rg3_energy                          FLOAT      region                binding energy of 3' region                   -10   5
66        rgt_match                           INTEGER    region                number of matches in total region             0     15
67        rgt_gu                              INTEGER    region                number of G:U pairs in total region           0     15
68        rgt_mismatch                        INTEGER    region                number of mismatches in total region          0     15
69        rgt_gap                             INTEGER    region                number of gaps in total region                0     15
70        rgt_bulge                           INTEGER    region                number of bulges in total region              0     4
71        rgt_bulge_nt                        INTEGER    region                number of bulged nts in total region          0     12
72        rgt_energy                          FLOAT      region                binding energy of total region                -20   10
73        acc_energy                          FLOAT      accessibility energy  accessibility energy                          -20   10
74-77     cntxt_A_cntnt, cntxt_C_cntnt,       FLOAT      context               A, C, G, U content in context                 0     1
          cntxt_G_cntnt, cntxt_U_cntnt
78-93     cntxt_AA_cntnt ... cntxt_UU_cntnt   FLOAT      context               dinucleotide content in context, in the       0     1
                                                                               order AA, AC, AG, AU, CA, CC, CG, CU,
                                                                               GA, GC, GG, GU, UA, UC, UG, UU
94-102    cntxt_pos_n8 ... cntxt_pos_n0       INTEGER    context               nt type of -8 ... -0                          1     4
103       cntxt_pos_p1                        INTEGER    context               nt type of +1                                 1     4
104-113   cntxt_pos_r1 ... cntxt_pos_r10      INTEGER    context               nt type of r1 ... r10                         0     4

S.3. BRIEF SUMMARY OF UTR FEATURES

Table 3. Brief summary of all UTR features.
Index  Feature name               Data type  Group                    Explanation                                            Min  Max
1      utr_len                    INTEGER    UTR length               length of UTR                                          0    2000
2      psite_dens                 FLOAT      density                  density of potential sites in entire UTR               0    0.1
3      max_partial_psite_num      INTEGER    density                  max number of potential sites in 100 nt                0    5
4      pos_site_dens              FLOAT      density                  density of positive sites in entire UTR                0    0.01
5      max_partial_pos_site_num   INTEGER    density                  max number of positive sites in 100 nt                 0    2
6      total_pos_score            FLOAT      global site score        total score of positive sites                          -2   2
7      psite_num                  INTEGER    global site score        number of potential sites                              0    50
8      pos_site_num               INTEGER    global site score        number of positive sites                               0    5
9      top_score                  FLOAT      global site score        top score of all potential sites                       -2   2
10     psite_num_6mer             INTEGER    site score of seed type  number of potential sites with 6mer seed               0    2
11     pos_site_num_6mer          INTEGER    site score of seed type  number of positive sites with 6mer seed                0    2
12     top_score_6mer             FLOAT      site score of seed type  top score of all potential sites with 6mer seed        -2   2
13     psite_num_7mer_A1          INTEGER    site score of seed type  number of potential sites with 7mer_A1 seed            0    2
14     pos_site_num_7mer_A1       INTEGER    site score of seed type  number of positive sites with 7mer_A1 seed             0    2
15     top_score_7mer_A1          FLOAT      site score of seed type  top score of all potential sites with 7mer_A1 seed     -2   2
16     psite_num_7mer_m1          INTEGER    site score of seed type  number of potential sites with 7mer_m1 seed            0    2
17     pos_site_num_7mer_m1       INTEGER    site score of seed type  number of positive sites with 7mer_m1 seed             0    2
18     top_score_7mer_m1          FLOAT      site score of seed type  top score of all potential sites with 7mer_m1 seed     -2   2
19     psite_num_7mer_m8          INTEGER    site score of seed type  number of potential sites with 7mer_m8 seed            0    2
20     pos_site_num_7mer_m8       INTEGER    site score of seed type  number of positive sites with 7mer_m8 seed             0    2
21     top_score_7mer_m8          FLOAT      site score of seed type  top score of all potential sites with 7mer_m8 seed     -2   2
22     psite_num_8mer_A1          INTEGER    site score of seed type  number of potential sites with 8mer_A1 seed            0    2
23     pos_site_num_8mer_A1       INTEGER    site score of seed type  number of positive sites with 8mer_A1 seed             0    2
24     top_score_8mer_A1          FLOAT      site score of seed type  top score of all potential sites with 8mer_A1 seed     -2   2
25     psite_num_8mer_m1          INTEGER    site score of seed type  number of potential sites with 8mer_m1 seed            0    2
26     pos_site_num_8mer_m1       INTEGER    site score of seed type  number of positive sites with 8mer_m1 seed             0    2
27     top_score_8mer_m1          FLOAT      site score of seed type  top score of all potential sites with 8mer_m1 seed     -2   2
28     psite_num_other            INTEGER    site score of seed type  number of potential sites without perfect seed         0    2
29     pos_site_num_other         INTEGER    site score of seed type  number of positive sites without perfect seed          0    2
30     top_score_other            FLOAT      site score of seed type  top score of all potential sites without perfect seed  -2   2

S.4. DATA SOURCE OF NEGATIVE SAMPLES

Table 4. Data Source of Negative Samples.

Index  miRNA            GEO Dataset ID                                No. of negative samples
1      hsa-let-7c       GSM156557, GSM156558                          29
2      hsa-miR-15a      GSM156545, GSM156549                          613
3      hsa-miR-16       GSM156546, GSM156550                          587
4      hsa-miR-17       GSM156553, GSM156555                          115
5      hsa-miR-192      GSM156547, GSM156551                          77
6      hsa-miR-20a      GSM156554, GSM156556                          108
7      hsa-miR-215      GSM156548, GSM156552                          92
8      hsa-miR-192      GSM328290, GSM328287                          21
9      hsa-miR-215      GSM328291, GSM328288                          20
10     hsa-miR-122      GSM210900, GSM210901                          13
11     hsa-miR-128      GSM210902, GSM210903                          10
12     hsa-miR-132      GSM210904, GSM210905                          11
13     hsa-miR-133a     GSM210906, GSM210907                          203
14     hsa-miR-142-3p   GSM210908, GSM210909                          38
15     hsa-miR-148b     GSM210910, GSM210911                          42
16     hsa-miR-34a      GSM187633, GSM187634, GSM187631, GSM187632    676
17     hsa-miR-34b      GSM190765, GSM190757                          424
18     hsa-miR-34c-5p   GSM190758, GSM190766                          451
19     hsa-miR-7        GSM210896, GSM210897                          8
20     hsa-miR-9        GSM210898, GSM210899                          4

Total: 3542 pairs; 3492 pairs remain after removing duplicates.

S.5. HISTOGRAMS OF SITE FEATURES

The empirical distribution of each site feature was obtained separately from the positive and the negative data and is shown as a histogram. Although these per-feature distributions do not reveal the combinatorial discriminative power of the features, they do indicate how important each feature is for prediction. In particular, if a feature has similar distributions in the positive and negative target sites, the two classes cannot easily be separated by that feature; the feature therefore has low discriminative power and is unlikely to be a good feature.

[Figure: six histogram panels, one per perfect seed match feature (6mer, 7mer_A1, 7mer_m1, 7mer_m8, 8mer_A1, 8mer_m1 seed match); x-axis: 0 or 1; y-axis: probability.]

Fig. 1. Histograms of perfect seed match features.

[Figure: four histogram panels (negative vs. positive; y-axis: probability): binding energy of seed region, binding energy of 3' region, binding energy of total region, accessibility energy.]

Fig. 2. Histograms of energy features.

[Figure: three histogram panels (negative vs. positive; y-axis: probability; x-axis: 0 to 1): seed's 3' context conservation score, seed conservation score, seed's 5' context conservation score.]

Fig. 3. Histograms of conservation features.

[Figure: twenty histogram panels (negative vs. positive; y-axis: probability; x-axis: 1 to 4), one per position: p1 match status to p20 match status.]

Fig. 4. Histograms of nt match features.

[Figure: nineteen histogram panels (negative vs. positive; y-axis: probability; x-axis: 0 to 15), one per dinucleotide position: 2mer1 match status to 2mer19 match status.]

Fig. 5. Histograms of 2mer match features.

[Figure: eighteen histogram panels (negative vs. positive; y-axis: probability): number of matches, mismatches, G:U pairs, gaps, bulges and bulged nts in the seed region, in the 3' region and in the total region.]

Fig. 6. Histogram of regional binding structure features.
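The comparison described at the start of this section, judging a feature by how similar its positive and negative histograms are, can also be expressed as a number. The snippet below is a minimal sketch and not the procedure used to produce the figures: it computes the histogram overlap (the sum over bins of the smaller of the two probabilities) for a discrete feature. The function names and the toy values are hypothetical. An overlap near 1 suggests that the feature on its own has low discriminative power; an overlap near 0 suggests that it separates the two classes well.

```python
from collections import Counter


def empirical_distribution(values):
    """Return {value: probability} for a list of discrete feature values."""
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def histogram_overlap(pos_values, neg_values):
    """Sum of min(p_pos, p_neg) over all observed values.

    1.0 means the two histograms are identical, 0.0 means they are disjoint.
    """
    p = empirical_distribution(pos_values)
    q = empirical_distribution(neg_values)
    return sum(min(p.get(v, 0.0), q.get(v, 0.0)) for v in set(p) | set(q))


# Hypothetical toy data: a binary seed match feature (1 = match, 0 = no match)
# observed on four positive and four negative sites.
pos_feature = [1, 1, 1, 0]
neg_feature = [0, 0, 0, 1]
print(histogram_overlap(pos_feature, neg_feature))
```

The overlap only looks at one feature at a time, so, like the histograms themselves, it says nothing about how features combine.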
[Figure: three histogram panels (negative vs. positive; y-axis: probability): distance to stop codon, distance to nearest end, ratio to nearest end.]

Fig. 7. Histogram of Position Features.

S.6. ROC PERFORMANCE OF SITE-SVM

To investigate the performance of Site-SVM, the receiver operating characteristic (ROC) curve was obtained by cross-validation on the training dataset (Fig. 8). The ROC curve plots the true positive rate (TPR), or sensitivity, against the false positive rate (FPR), or 1-specificity. TPR is the fraction of true targets that are predicted as targets, while FPR is the fraction of non-targets that are falsely predicted as targets. A better algorithm has a smaller FPR at a given TPR. In Fig. 8, Site-SVM performs better than all six types of perfect seed match. Moreover, Site-SVM produces a continuous curve, because it assigns each potential site a confidence of being a positive site; this confidence is useful for the subsequent identification steps.

[Figure: ROC plot titled "Site-SVM Top 11 ROC: 95.33"; x-axis: false positive rate; y-axis: true positive rate; curves: Site-SVM and the 6mer, 7mer_A1, 7mer_m1, 7mer_m8, 8mer_A1 and 8mer_m1 seed matches.]

Fig. 8. ROC curve of Site-SVM and perfect seed match.

S.7. HISTOGRAMS OF UTR FEATURES

[Figure: histogram of UTR length (negative vs. positive; y-axis: probability; x-axis: 0 to 12000).]

Fig. 9. Histogram of UTR Length Feature.
[Figure: four histogram panels (negative vs. positive; y-axis: probability): density of potential sites in entire UTR, max number of potential sites in 100 nt, density of positive sites in entire UTR, max number of positive sites in 100 nt.]

Fig. 10. Histograms of sites density features.

[Figure: histogram panels (negative vs. positive; y-axis: probability): total score of positive sites, number of potential sites, number of positive sites, top score of all potential sites, and, for each seed type (6mer, 7mer_A1, 7mer_m1, 7mer_m8, 8mer_A1, 8mer_m1, other), the number of potential sites, the number of positive sites and the top score of potential sites.]

Fig. 11. Histograms of Sites Score Features.

S.8. EVALUATION BASED ON THE PROTEOMICS DATA

To demonstrate the robustness of the prediction, we carried out predictions for four more miRNAs (miR-155, hsa-let-7b, hsa-miR-16 and hsa-miR-30a) for which proteomic data are available (Selbach, et al., 2008). The cumulative fold changes for different numbers of top ranked predictions are summarized for each miRNA in Figs. 12-15, respectively. At the top 300 predictions, SVMicrO achieves the largest cumulative down-fold for three of the four miRNAs, indicating better sensitivity. At the top 200 predictions, SVMicrO is consistently among the methods with the largest cumulative down-fold, which suggests better precision.

[Figure: cumulative sum of protein fold change (y-axis) at Top 25, 50, 100, 200 and 300 predictions (x-axis) for SVMicrO, TargetScan, miRanda, MirTarget, PicTar and PITA.]

Fig. 12. Cumulative sum of protein fold change for different numbers of top ranked predictions of miR-155.

[Figure: same layout as Fig. 12.]

Fig. 13. Cumulative sum of protein fold change for different numbers of top ranked predictions of hsa-let-7b.

[Figure: same layout as Fig. 12.]

Fig. 14. Cumulative sum of protein fold change for different numbers of top ranked predictions of miR-16.

[Figure: same layout as Fig. 12.]

Fig. 15. Cumulative sum of protein fold change for different numbers of top ranked predictions of miR-30a.

S.14. EVALUATION FOR MIR-1 BASED ON THE IP PULL-DOWN DATA

Validation based on IP pull-down data was also carried out for miR-1. The 56 high-confidence targets identified by the IP experiment were treated as true targets. The ROC curves and the number of true positives among the top ranked predictions are shown in Figs. 16 and 17. SVMicrO has the best performance among the compared methods.

[Figure: ROC plot; x-axis: false positive rate; y-axis: true positive rate; legend with the value given in parentheses for each method: SVMicrO (0.75184), PicTar (0.56736), miRanda (0.63766), mirTarget (0.60963), PITA (0.74165), TargetScan (0.58089).]

Fig. 16. ROC curves for the predictions of miR-1 tested on IP pull-downs.

[Figure: number of true positives (y-axis, 0 to 20) among the top 25, 50, 75, 100, 150, 200, 250 and 300 predictions (x-axis) for SVMicrO, PicTar, miRanda, mirTarget, PITA and TargetScan.]

Fig. 17. Number of true positives among top ranked predictions of miR-1.
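The two summary statistics used in S.8 and in this section, the cumulative fold change and the number of true positives among the top ranked predictions, can be sketched as follows. This is a minimal illustration under stated assumptions and not the evaluation code behind the figures: predictions are assumed to be a list of gene identifiers sorted from most to least confident, fold changes are assumed to be signed values in which a negative value means down-regulation, and genes without a measurement are skipped. All function names, gene identifiers and values are hypothetical.

```python
def cumulative_fold_change(ranked_genes, fold_change, cutoffs):
    """For each cutoff k, sum the measured fold changes of the top-k predictions.

    Genes that have no entry in `fold_change` are skipped (an assumption of
    this sketch; the source does not say how unmeasured genes are handled).
    """
    return {
        k: sum(fold_change[g] for g in ranked_genes[:k] if g in fold_change)
        for k in cutoffs
    }


def true_positives_at(ranked_genes, true_targets, cutoffs):
    """For each cutoff k, count how many of the top-k predictions are true targets."""
    targets = set(true_targets)
    return {k: sum(1 for g in ranked_genes[:k] if g in targets) for k in cutoffs}


# Hypothetical toy ranking of four genes, most confident first.
ranked = ["geneA", "geneB", "geneC", "geneD"]

# Hypothetical signed fold changes; geneC has no measurement.
measured = {"geneA": -1.0, "geneB": -0.5, "geneD": 0.25}

# Hypothetical set of experimentally supported targets.
validated = {"geneB", "geneD"}

print(cumulative_fold_change(ranked, measured, [2, 4]))
print(true_positives_at(ranked, validated, [2, 4]))
```

A more negative cumulative sum at a given cutoff means that the top ranked predictions are, in aggregate, more strongly down-regulated, which is how the curves in Figs. 12-15 are read.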