Multi-Label Classification by Mining Label and Instance Correlations from Heterogeneous Information Networks

Xiangnan Kong, Bokai Cao, Philip S. Yu
University of Illinois at Chicago

1 Multi-label Classification

Given a multi-label dataset $\{x_1, \ldots, x_n\}$, which consists of $n$ instances, labeled by $\{Y_1, \ldots, Y_n\}$, where $Y_i \subseteq \{l_1, l_2, \ldots, l_m\}$ denotes the set of labels assigned to $x_i$.

Key challenge: the large space of all possible label sets, which is exponential in the number of candidate labels.

Previous research: exploits correlations among different labels to facilitate the learning process, by assuming that the label correlations are given or can be derived directly from data samples by counting their label co-occurrences.

We argue that in many real-world multi-label classification tasks, the label correlations are not given and can be hard to learn directly from data samples within a moderate-sized training set.

Solution: we can exploit heterogeneous information networks to extract abundant knowledge about relationships among data samples and class labels.

2 Heterogeneous Information Network

[Figure 1 image. Node types: Chemical Compound, Gene, Disease, Side Effect, Pathway, Tissue, Gene Family, Gene Ontology, Chemical Ontology, Substructure. Link types: PPI, bind, causeSideEffect, causeDisease, treatDisease, hasPathway, hasGeneOntology, hasChemicalOntology, hasTissue, hasGeneFamily, hasSubstructure. The node and link counts are not reliably attributable in this extraction.]

Figure 1: An example of heterogeneous information network (the data schema of SLAP network). Each box represents one type of nodes in the network, and each dashed line represents one type of links. Each number under the node/link represents the total number of nodes/links of the same type.

Figure 2: Multi-label classification by mining heterogeneous information networks.

3 Meta-path based Multi-label Classification

3.2 Meta-path-based Label Correlations

Heterogeneous information networks can involve various existing knowledge about the candidate label concepts, where the complex correlations among label concepts are embedded within the network structures. For example, in Figure 1, the label nodes (i.e., the gene nodes) are linked with each other directly through "PPI" links. This indicates one type of label correlation within the network structure: gene labels that are linked by "PPI" links can be more likely (or less likely) to appear together in the label set of a chemical compound than those without "PPI" links.

To generalize, the label concepts can also be linked by certain meta-paths, and thus be correlated. For example, the gene labels are also linked with each other through the meta-path "gene -hasGO-> GO -hasGO^{-1}-> gene". This indicates another type of label correlation: gene labels that share similar gene ontology terms can be more likely (or less likely) to appear together in the label set of a chemical compound than those without such meta-path links. Thus meta-paths among the label nodes, i.e., meta-paths starting and ending with the label node type, can effectively capture different types of label correlations embedded in heterogeneous information networks.

We propose to exploit meta-path-based label correlations for multi-label classification. Given a set of meta-paths among the label nodes, $S_\ell = \{P_1, \cdots, P_{c_\ell}\}$, the meta-path-based label correlations can be used as follows:

$$\forall i,\quad P(Y_i \mid x_i) \approx \prod_{k=1}^{q} P\left(Y_i^k \mid x_i, Y_i^{P_1(k)}, \cdots, Y_i^{P_{c_\ell}(k)}\right)$$

where $P_j(k)$ denotes the index set of labels that are linked to the $k$-th label through meta-path $P_j \in S_\ell$.

3.3 Meta-path-based Instance Correlations

Existing approaches for multi-label classification usually have i.i.d. assumptions, where the label set predictions on different instances are assumed to be independent:

$$P(Y \mid X) \approx \prod_{i} P(Y_i \mid x_i)$$

However, in heterogeneous information networks, there are complex correlations not only among different labels but also among different instances. Heterogeneous information networks can involve various existing knowledge about the instances, where the complex correlations among the label sets of different instances are embedded within the network structures.

Similar to the previous subsection, the instance nodes can also be linked by certain meta-paths, thus the label sets of the linked instances can be correlated. For example, chemical compound nodes are linked with each other through the meta-path "chemical compound -causeSideEffect-> Side Effect -causeSideEffect^{-1}-> chemical compound". This indicates a type of instance correlation: chemical drugs that share similar side effects can be more likely (or less likely) to have similar label sets than those without such meta-path links. Thus meta-paths among the instance nodes, i.e., meta-paths starting and ending with the instance node type, can effectively capture different types of instance correlations embedded in heterogeneous information networks.

We propose to exploit meta-path-based instance correlations for multi-label classification. Given a set of meta-paths among the instance nodes, $S_I = \{P'_1, \cdots, P'_{c_I}\}$, the meta-path-based instance correlations can be used as follows:

$$P(Y \mid X) \approx \prod_{i} P\left(Y_i \mid x_i, Y_{P'_1(i)}, \cdots, Y_{P'_{c_I}(i)}\right)$$

where $P'_j(i)$ denotes the index set of instances that are linked to the $i$-th instance through meta-path $P'_j \in S_I$.

3.4 The Unified Model

In order to perform multi-label collective classification more effectively in heterogeneous information networks, in this paper we explicitly consider both meta-path-based label correlations and meta-path-based instance correlations simultaneously:

$$P(Y \mid X) \approx \prod_{i} \prod_{k=1}^{q} P\left(Y_i^k \mid x_i, Y_i^{P_j(k)}, \cdots, Y_{P'_j(i)}, \cdots\right)$$

In Figure 3, we summarize the proposed multi-label collective classification algorithm, called PIPL. The algorithm includes the following steps:

Meta-path Construction: Given a heterogeneous information network, we first extract all non-redundant meta-paths for label correlations and instance correlations separately. A meta-path $P_j$ in $S_\ell$ (or $P'_j$ in $S_I$) is non-redundant if $P_j$ (or $P'_j$) cannot be reconstructed by combining any subset of the meta-paths in $S_\ell$ (or $S_I$). We only extract short meta-paths with a maximum path length $p_{max}$. It has been shown in [28] that long meta-paths are not quite useful in capturing the linkage structure of heterogeneous information networks.

Training Initialization: We construct $q$ extended training sets, $\forall 1 \le k \le q$, $D_k = \{(x_i^k, y_i^k)\}$, by converting each instance $x_i$ to $x_i^k$ using the functions in Figure 4. We train one classifier on each label, using the extended training sets.

Iterative Inference: Overall, it is an iterative classification algorithm [22] for the inference step. During the inference, we iteratively update the label set predictions of the testing instances, and the relational features corresponding to the label correlations and instance correlations.

Figure 3: The PIPL algorithm. Only the following part of the figure is recoverable:
  $x_i^k = (x_i, \mathrm{LabelPathFeature}(\ell_k, Y_i), \mathrm{InstancePathFeature}(i, Y_L \cup \{Y_i \mid i \in U\}))$
  2. Update the estimated value $\hat{Y}_i$ for $Y_i$ on each testing instance ($i \in U$) as follows: $\forall 1 \le k \le q$, $\hat{Y}_i^k = f_k(x_i^k)$.
  Output: $\hat{Y}_U = (\hat{Y}_1, \cdots, \hat{Y}_{n_u})$: the label sets of testing instances ($i \in U$).

Figure 4: The functions of constructing relational features for meta-path-based label correlations and meta-path-based instance correlations.
  $x_\ell$ = LabelPathFeature($\ell_k$, $Y_i$)
    For each meta-path $P_j \in S_\ell$:
      1. Get related labels for node $\ell_k$ through meta-path $P_j$, i.e., the related index set $C = P_j(k)$.
      2. $x_j = \mathrm{Aggregation}(\{Y_i^k \mid k \in C\})$
    Return $(\cdots, x_j^\top, \cdots)^\top$
  $x_I$ = InstancePathFeature($i$, $Y$)
    For each meta-path $P'_j \in S_I$:
      1. Get related instances for node $I_i$ through meta-path $P'_j$, i.e., the related index set $C = P'_j(i)$.
      2. $x_j = \mathrm{Aggregation}(\{Y_i \mid i \in C\})$
    Return $(\cdots, x_j^\top, \cdots)^\top$
We have the following evalua tion network 1 as Figure 1 that contains abundant i-th instance through meta-pathPublication Pj0 one 2 Sclassifier I. where Pj (k)knowledge denotes the4 index set of labels are linked Method Type ofthat Classification Type of Correlations Exploited Iterative Inference: Overall, it is an iterative classification erogeneous information networks.to the • Micro F1 [10, 14, 21]: is Experiments Table 1: Summary of compared methods. about the relationships among di↵erent types of entities into the k-th label through meta-path Pj 2 S` . algorithm [22] for the inference step. During the inference, average of Precision and Rec Bsvm Binary Classification all independent [2] we iteratively update the label set predictions of the testing cluding chemical compounds and gene targets, we can make that the score is first comput 3.4 The UnifiedType Model Method Type of Classification of Correlations Exploited Publication instances, and the relational features corresponding to the then averaged with equal im Ecc Multi-Label Classification ¨ label correlcation from data samples [27] use of the domain knowledge within this network to facilitate 3.3 Meta-path-basedreasons: Instance Correlations Bsvm all independent [2] 1) they belong to the Binary sameClassification gene 2) perform they In family; order to multi-label collective label classification and instance correlations. multi-label classification. PIsl Collective Classification ≠ instance correlations from heterogeneous network [18] 2 share similarclassification pathways; 3)usually theyMulti-Label are inter-connected through Existing approaches for multi-label more e↵ectively heterogeneous information networks, [27] in micro-F1(h, DU ) = Pn Ecc Classification ¨ in label correlcation from data samples First, the heterogeneous information network can provide ¨ label correlation from can data samples i= Compared 4. EXPERIMENTS PPI links, etc. information networks have i.i.d. 
assumptions, where the label set Heterogeneous predictions on this paper, we explicitly consider both meta-path-based laPIsl Collective ≠ instance correlations from heterogeneous [18] Icml Multi-Label Collective Classification Æ Classification instance correlation from homogeneous network [17] network abundant knowledge about the relationships among di↵erThe larger the value, the bet provide complex relationships among the label concepts, inMethods label correlation from data samples 4.1 Data Collections ent gene targets. In the network, gene targets are inter• Hamming loss [8, 37]: ev PIml ≠ instance correlations¨from network This paper volvingMulti-Label multiple Collective types ofClassification label correlations [28]. How toheterogeneous Icml Multi-Label Collective Classification Æ instance correlation from homogeneous network [17] between true labels and pred In order to evaluate the performances of multi-label collecconnected with many other types of nodes, such as diseases Pathway% ≠ meta-path-based instance correlation exploit the linkagePIml semantics is a very challenging problem Multi-Label Collective Classification ≠ instance correlations from heterogeneous network This paper tive classification in heterogeneous information networks, we PIPL Multi-Label Collective Classification Ø label correlations from heterogeneous network This paper and pathways. The gene targets, that are linked with similar HammingLoss(h, DU ) = [28], which has not yet been explored in this context. 
≠ meta-path-based instance correlationhad our algorithm tested on a bioinformatic dataset SLAP diseases or pathways, are more likely to appear together in [4], which heterogeneous network composed by over PIPL instance Multi-Label Collective label correlations from heterogeneous network paper Mining heterogeneous correlations: In(rank) multi-label Table 2: Classification performances “average score ±Classification std ” onØ gene-disease association prediction task.is a This 290Kbetter nodes and 720K edges. As shown in Figure 1, the the same label set than those without such connections. Sec- “#” indicates where stands for the symm Gene%Ontology% the smaller the value the better theinstances performance; indicates the value the classification, the label sets of di↵erent can “"” also be± std the larger Chemical%Ontology % SLAP dataset contains integrated data related to chemiTask 1: Gene-Disease Association Table 2: Classification performances “average score ” on gene-disease association prediction task. denotes the l1 -norm. The sm ond, the heterogeneous information network can also provide the performance. correlated with “#” each otherthe through types of the relaindicates smaller multiple the value the better performance; “"” indicates thecal larger the value the diseases, better side e↵ects, pathways etc. compounds, genes, performance. hasPathway0 methods abundant knowledge about the relationships among di↵erthe performance. Specifically, there are two di↵erent prediction tasks studied • Subset 0/1 Loss [8, 10]: ev tionships. For example, di↵erent chemical compounds can PIsl criteria #label Bsvm Ecc Icml PIml PIPL ent chemical compounds. In the network, chemical commethods in this section: set prediction. 
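Because the metric formulas are cut off in this copy, the following Python sketch uses the standard textbook definitions of micro-averaged F1, Hamming loss, and subset 0/1 loss over 0/1 label vectors. It is an assumption that these coincide exactly with the definitions used in the paper; the function names and the convention of returning 0.0 when micro-F1 has an empty denominator are illustrative choices.

```python
from typing import Sequence

LabelVectors = Sequence[Sequence[int]]  # one 0/1 vector of length q per instance


def micro_f1(y_true: LabelVectors, y_pred: LabelVectors) -> float:
    """Micro-averaged F1: pool true positives over all instances and labels."""
    tp = sum(t & p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    n_true = sum(sum(yt) for yt in y_true)
    n_pred = sum(sum(yp) for yp in y_pred)
    # Returning 0.0 on an empty denominator is a convention chosen here.
    return 2 * tp / (n_true + n_pred) if (n_true + n_pred) else 0.0


def hamming_loss(y_true: LabelVectors, y_pred: LabelVectors) -> float:
    """Fraction of instance-label pairs where truth and prediction differ."""
    q = len(y_true[0])
    diffs = sum(t != p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    return diffs / (len(y_true) * q)


def subset_01_loss(y_true: LabelVectors, y_pred: LabelVectors) -> float:
    """Fraction of instances whose predicted label set is not exactly right."""
    wrong = sum(list(yt) != list(yp) for yt, yp in zip(y_true, y_pred))
    return wrong / len(y_true)


# Toy example: two instances, three labels; one label is missed on instance 0.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]
print(micro_f1(y_true, y_pred))
print(hamming_loss(y_true, y_pred))
print(subset_01_loss(y_true, y_pred))
```

Subset 0/1 loss is the strictest of the three, since a single wrong label makes the whole prediction for that instance count as an error.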
Task 1: Gene-Disease Association

Table 2: Classification performances "average score ± std (rank)" on gene-disease association prediction task. "↓" indicates the smaller the value the better the performance; "↑" indicates the larger the value the better the performance.

Criteria           #label  Bsvm             Ecc              Icml             PIsl             PIml             PIPL
Micro-F1 ↑         10      0.360±0.082 (6)  0.387±0.073 (4)  0.366±0.079 (5)  0.390±0.115 (3)  0.399±0.107 (2)  0.400±0.106 (1)
                   20      0.385±0.046 (6)  0.406±0.043 (4)  0.389±0.066 (5)  0.417±0.055 (3)  0.426±0.055 (2)  0.433±0.066 (1)
                   30      0.317±0.035 (6)  0.359±0.027 (2)  0.343±0.039 (5)  0.342±0.037 (4)  0.355±0.013 (3)  0.360±0.007 (1)
                   40      0.342±0.045 (5)  0.386±0.032 (3)  0.339±0.042 (6)  0.382±0.034 (4)  0.387±0.030 (2)  0.391±0.030 (1)
                   50      0.303±0.055 (6)  0.346±0.059 (4)  0.321±0.063 (5)  0.348±0.064 (3)  0.360±0.075 (2)  0.366±0.078 (1)
Hamming Loss ↓     10      0.011±0.002 (1)  0.013±0.002 (6)  0.011±0.002 (1)  0.011±0.003 (1)  0.011±0.003 (1)  0.011±0.002 (1)
                   20      0.008±0.001 (1)  0.010±0.000 (6)  0.008±0.000 (1)  0.008±0.001 (1)  0.008±0.001 (1)  0.008±0.001 (1)
                   30      0.008±0.000 (5)  0.009±0.000 (6)  0.007±0.000 (1)  0.007±0.001 (1)  0.007±0.001 (1)  0.007±0.000 (1)
                   40      0.007±0.000 (4)  0.007±0.000 (4)  0.007±0.000 (4)  0.006±0.000 (1)  0.006±0.000 (1)  0.006±0.000 (1)
                   50      0.006±0.001 (1)  0.007±0.001 (6)  0.006±0.001 (1)  0.006±0.001 (1)  0.006±0.001 (1)  0.006±0.001 (1)
Subset 0/1 Loss ↓  10      0.108±0.023 (5)  0.125±0.020 (6)  0.107±0.020 (4)  0.103±0.024 (1)  0.103±0.024 (1)  0.103±0.023 (1)
                   20      0.153±0.008 (4)  0.180±0.011 (6)  0.154±0.006 (5)  0.148±0.005 (2)  0.148±0.009 (2)  0.147±0.007 (1)
                   30      0.197±0.009 (4)  0.212±0.006 (6)  0.191±0.010 (5)  0.195±0.011 (3)  0.192±0.010 (2)  0.191±0.009 (1)
                   40      0.227±0.018 (4)  0.238±0.016 (5)  0.228±0.017 (1)  0.219±0.018 (3)  0.218±0.014 (2)  0.216±0.015 (1)
                   50      0.255±0.026 (5)  0.275±0.029 (6)  0.250±0.030 (4)  0.244±0.028 (3)  0.243±0.028 (2)  0.241±0.028 (1)

PIPL outperforms others on both datasets, indicating that PIPL can exploit HIN to extract correlations among instances and labels for multi-label classification.

Table 4: Examples of meta-paths used in PIPL method.

Task                     Meta-path                                                                                   Correlation
Gene-Disease Prediction  Disease -treated-> Chemical compound -treat-> Disease                                       label correlation
                         Disease -treated-> Chemical compound -has-> Substructure -in-> Chemical compound -treat-> Disease   label correlation
                         Gene -cause-> Disease -treated-> Chemical compound -bind-> Gene                             instance correlation
                         Gene -binded-> Chemical compound -cause-> Side effect -caused-> Chemical compound -bind-> Gene      instance correlation
Drug-Target Prediction   Gene -PPI-> Gene                                                                            label correlation
                         Gene -has-> Pathway -has-> Gene                                                             label correlation
                         Chemical compound -bind-> Gene -PPI-> Gene -binded-> Chemical compound                      instance correlation
                         Chemical compound -bind-> Gene -has-> Tissue -has-> Gene -binded-> Chemical compound        instance correlation
The smaller the value, the better the classification method in heterogenous information netploit various types of dependencies among both of instances has has Task 2: Drug-Target Prediction performance. Gene% otherwise I(⇡) = 0. The smaller the value, the better the classification method in heterogenous information netDrug-Target Gene !pathway !Genecollective label correlation mining label and instance correlations from heterogeneous performance. works [18] to perform the binary classifica- collective classificaand labels basedperformance. upon di↵erent meta-paths in heterogeneous works [18] bind to perform P P I the binary binded Prediction Chemical compound !Genemeta-path !Gene based!Chemical compound instance correlation tion.methods This method can exploit information networks. The major research challenges are as Family tion. This the method can exploit the meta-path based networks. ByBsvm explicitly exploiting these metahas information has binded 4.3 information Compared criteria Methods #label Ecc PIsl PIml PIPL instance correlation within a bind heterogenous 4.3 Compared Methods Chemical compound !Gene !Tissue !Gene !Chemical compound instance correlation instance correlation within a heterogenous information follows: pathtobased dependencies, ourof our PIPL method can e↵ectively In order demonstrate the multi-label network. Ine↵ectiveness order to demonstrate e↵ectiveness of multi-label network. 
10 0.532±0.046 (5) the 0.576±0.053 (4)our 0.608±0.046 (3) 0.611±0.040 (2) 0.625±0.042 (1) Mining heterogeneous label correlations: Substructure % In multi-label classification approach, weclassification compared the following approach, we methods compared the following methods capture the diverse and complex relationships in- label 20 0.553±0.019 (5) 0.588±0.018 (4) among 0.696±0.016 (3) 0.714±0.011 (2) 0.724±0.011 (1) • Icml (simple correlation + instance correlation • Icml (simple label correlation + instance correlation in Table 1): C C classification, the multiple label can be correlated (summarized H H concepts in Table 1): Micro-F1 " labels. 30 (summarized 0.536±0.052 (5) 0.585±0.054 (4) 0.674±0.032 0.695±0.025 (2) 0.706±0.026 (1) stances and Empirical studies on real-world tasks (3) homogeneous network): This method was proposed in homogeneous network): This method was proposed O 40 0.523±0.018 (5) 0.568±0.022 (4) in 0.599±0.022 (3) 0.618±0.022 (2) 0.642±0.022 (1) N with each other through multiple types ofO relationships. For in [17] which exploit relational features interin [17] which(2)can exploitfor relational features for inter• show Bsvm (binary SVM): The first baselineSVM): method uses • Bsvm (binary The first baseline uses that the proposed method can0.571±0.036 significantly boost the can 50 0.521±0.028 (5) (4) method 0.603±0.031 (3) 0.635±0.028 0.653±0.026 (1) example, di↵erent gene targets can be correlated for various dependencies based netupon homogeneous netinstance dependenciesinstance based upon homogeneous binary method to solve multi-label clasbinary decomposition method to decomposition solve multi-label clasperformance of multi-label classification by incorporating inExample:! 10whichsification 0.021±0.003 0.020±0.003 (2) 0.020±0.002 (3) 0.018±0.002 (1) work for classification. multi-label collective classification. 
Since this which is similar to(4)[2].work The multifor multi-label collective Since this sification problems, is0.024±0.003 similarproblems, to [2].(5)The multi20 divided 0.019±0.001 (5) 0.017±0.000 (4) 0.012±0.001 (3) 0.012±0.001 (2) (1) formation method requires a0.011±0.001 homogeneous network among the label dataset isinformation first divided intonetworks. multiple single-label method requires a homogeneous network among the 1 label dataset from is first heterogeneous into multiple single-label relationships among labels instances of meta-paths used SLAPindataset this dataset is an information network Hamming Loss # 30 0.018±0.002 (5) 0.016±0.002 (4) 0.012±0.001 (3) 0.011±0.000 (2) (1) can 0.010±0.000 only run Genethis method on the Genedatasets by one-vs-all binary For eachwe can Tableand 4: Examples PIPL [4]: method instances, onlyinstances, run this we method on the datasets by one-vs-all decomposition. For eachdecomposition. 40 binary 0.017±0.001 (5) 0.015±0.001 (4) 0.014±0.001 (3) 0.013±0.001 (2) 0.012±0.001 (1) that integrates many datasets into a single framework using Disease association prediction task, where the PPI netbinary classification task, we use the SVM as the base Disease association prediction task, where the PPI netbinary classification we0.016±0.001 use the SVM(5) as the base 50 task, classifier. 0.014±0.001 (4) 0.013±0.001 (3) 0.012±0.001 0.011±0.001 (1) Task Meta-path Correlation work is used(2) as the homogeneous Semantic Web technologies for drug discovery. It includes Then the all is labels work used as the homogeneous network among the network among the classifier. Then the predictions of SVMs for predictions all labels of SVMs for 2. PROBLEM DEFINITION gene instances. combined to (5) make the asfinal prediction. Bsvm as10 theare 0.147±0.012 0.128±0.017 (4) 0.123±0.011 (2) 0.124±0.010 (3) 0.113±0.010 (1) public datasets related to systems chemical biology: such as gene instances. are combined to make final prediction. 
Bsvm treated treat sumes all introduce the labels all instances are independent. 20andwe 0.222±0.009 (5) and 0.193±0.006 (4) 0.165±0.011 (2) 0.148±0.004 (1) Disease !Chemical compound !Disease correlation In this first some related concepts (3) 0.163±0.010 PubChem, DrugBank, PPI, SIDER,label CTD diseases, KEGG sumes all thesection, labels all instances are independent. • PImlNumber (simple label correlation + meta-path based in- Number of features of features Subset 0/1 Loss # 30 0.265±0.019 (5) 0.223±0.029 (4)• PIml 0.214±0.007 (3) 0.207±0.004 (2) 0.182±0.003 (1) (simple label correlation + meta-path based intreated has in Pathways, etc. treat and notations,40then define the problem. stance correlation): This method is extended from PIsl • Ecc (multi-label classification + ensemble): This baseGene-Disease Disease !Chemical compound !Substructure !Chemical compound !Disease label correlation 0.305±0.008 (5) 0.250±0.004 (2) 0.268±0.010 (4) 0.257±0.010 (3) 0.223±0.010 (1) Side% Effect% Tissue% (rank) Gene% Chemical% Compound% % (6) (4) (5) (3) (2) (1) (6) (4) (5) (3) (2) (1) (6) (2) (5) (4) (3) (1) (5) (3) (6) (4) (2) (1) (6) (4) (5) (3) (2) (1) (1) (6) (1) (1) (1) (1) (1) (6) (1) (1) (1) (1) (5) (6) (1) (1) (1) (1) (4) (4) (4) (1) (1) (1) (1) (6) (1) (1) (1) (1) (5) (6) (4) (1) (1) (1) (4) (6) (5) (2) (2) (1) (4) (6) (5) (3) (2) (1) (4) (5) (1) (3) (2) (1) (5) (6) (4) (3) (2) (1) Disease% 0.38 0.70 Micro−F1 Micro−F1 0.65 0.34 0.60 0.55 BSVM ECC PISL PIML PIPL ICML 0.30 500 Prediction Gene Gene Gene Drug-Target Prediction Gene cause !Disease treated !Chemical compound binded !Chemical compound cause bind !Side e↵ect !Gene caused !Chemical compound instance correlation bind PPI !Gene has !pathway !Gene instance correlation label correlation has Chemical compound Chemical compound !Gene bind PPI binded bind has has !Gene !Gene label correlation !Gene !Tissue !Chemical compound !Gene binded !Chemical compound 0.38 0.70 0.65 instance correlation instance correlation 1000 
[Figure 5 plots. Panels: (a) Gene-Disease Association Prediction; (b) Drug-Target Binding Prediction. x-axis: Number of features; y-axis: Micro-F1. Curves: BSVM, ECC, PISL, PIML, PIPL, plus ICML in panel (a).]

Figure 5: Micro-F1 scores with different number of features.

4.4 Performances of Multi-Label Classification

We first study the effectiveness of the proposed PIPL [...] task, meta-path based methods designed for heterogeneous networks (i.e., PIsl, PIml, PIPL) can achieve better performances than Icml, which only exploits the homogeneous network among instances. We further observe that the proposed PIPL performs best among all compared methods. In particular, PIPL outperforms PIml and PIsl by taking meta-path based label correlation into consideration. In heterogeneous information networks, both instances and candidate labels can be correlated with each other via diverse semantic meanings. In Table 4, we show four examples of the meta-paths used by the PIPL method in both tasks, which correspond to label correlations and instance correlations separately. Such [...]

Table 5: Summary of related datasets used in network classification.

Dataset   Publication  Network type   Classification type  #types (node, link)  #Nodes  Node Types
Gene      [34]         homogeneous    single-label         (1,1)                1,243   gene
Citeseer  [22]         homogeneous    single-label         (1,1)                3,312   paper
WebKB     [7]          homogeneous    single-label         (1,1)                3,877   webpage
DBLP      [17]         homogeneous    multi-label          (1,1)                4,638   author
ACM       [18]         heterogeneous  single-label         (5,5)                12,499  paper, author, conference, proceeding, school
Cora      [23]         heterogeneous  single-label         (5,5)                4,330   paper, author, journal, publisher, editor
IMDB      [1]          heterogeneous  single-label         (5,5)                1,382   movie, actor, director, studio, producer
NASD      [24]         heterogeneous  single-label         (5,6)                22,000  broker, branch, firm, disclosure, regulator

5 Acknowledgements

This work is supported in part by NSF through grants IIS-0905215, CNS-1115234, IIS-0914934, DBI-0960443, and OISE-1129076, US Department of Army through grant W911NF-12-1-0066, and Huawei grant.