Towards the Use of Machine Learning Algorithms to Predict Human Immunodeficiency Virus Drug Resistance
Y. Singh¹, M. Hajek², D. Moodley², C. Seebregts³
¹Nelson R. Mandela School of Medicine, RSA; ²University of KwaZulu-Natal, RSA; ³Medical Research Council, South Africa

Several factors contribute to the success or failure of antiretroviral treatment (ART), but drug resistance is arguably the most critical. There are two laboratory methods for testing HIV drug resistance: genotypic and phenotypic assays. Genotypic assays are cheaper and faster than phenotypic assays and can yield multi-drug resistance profiles, so they are preferred. However, genotypic testing requires the interpretation of mutations and genotypic variation patterns, and the phenotype must then be inferred using complex rules. The aim of this study was to investigate the use of machine learning algorithms in a computer-based HIV resistance prediction tool for patients on highly active antiretroviral therapy (HAART). Each algorithm takes the pol region of the HIV genome from the infected patient as input and predicts that patient's phenotypic response to particular HAART drugs. In other words, it aims to aid the determination of HIV resistance by finding a correlation between genotypic and phenotypic data that supports both classification and regression prediction. Four machine learning algorithms were used to interpret the genetic data: Support Vector Machines (SVM), Gene Expression Programming (GEP), Particle Swarm Optimization (PSO) and Multi-Layer Perceptrons (MLP). We found that the best-performing algorithms were Support Vector Machines for reverse transcriptase inhibitors and a backpropagation Multi-Layer Perceptron for protease inhibitors. Used together, they are called the UKZN-implementation.
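The genotype-to-phenotype idea above can be illustrated with a small sketch. It is not the study's SVM or MLP: it swaps in a classic single-layer perceptron, a deliberately simpler linear classifier, so the whole pipeline fits in a few self-contained lines. The reference string `REFERENCE`, the helper names `encode`, `train_perceptron` and `predict`, and the training sequences and labels are all hypothetical toy values invented for illustration; real tools align full reverse transcriptase or protease sequences to a reference and use measured phenotypes.

```python
# Hypothetical toy reference: five residues standing in for an aligned stretch of a
# pol-encoded protein (an assumption; real tools align to a full reference sequence).
REFERENCE = "MKLVT"


def encode(sequence: str, reference: str = REFERENCE) -> list[int]:
    """Mutation-indicator features: 1 where the patient's residue differs from the reference."""
    if len(sequence) != len(reference):
        raise ValueError("sequence must already be aligned to the reference")
    return [int(s != r) for s, r in zip(sequence, reference)]


def train_perceptron(X: list[list[int]], y: list[int], max_epochs: int = 100):
    """Classic perceptron; labels are +1 (resistant) or -1 (susceptible).

    Stops after the first full pass with no mistakes. For linearly separable
    data the perceptron convergence theorem guarantees this happens after a
    finite number of updates (provided max_epochs is large enough).
    """
    w = [0] * len(X[0])
    b = 0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified or on the boundary: nudge towards yi
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def predict(w: list[int], b: int, sequence: str) -> int:
    """Return +1 (predicted resistant) or -1 (predicted susceptible)."""
    score = sum(wj * xj for wj, xj in zip(w, encode(sequence))) + b
    return 1 if score > 0 else -1


# Invented toy data: "resistance" here is driven only by the K->R change at position 1 (0-based).
train_seqs = ["MKLVT", "MRLVT", "MRLIT", "MKLIT", "MKLVS", "MRLVS"]
train_labels = [-1, 1, 1, -1, -1, 1]

weights, bias = train_perceptron([encode(s) for s in train_seqs], train_labels)
```

On this toy set the learned weight is concentrated on position 1, so any sequence carrying the R substitution there is called resistant. The same mutation-indicator features could instead be passed to an SVM or a backpropagation MLP from a machine learning library; the perceptron is used only because it is easy to verify by hand.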
When compared with actual phenotypic resistance profiles, the UKZN-implementation achieved an accuracy ranging from 85.4% to 96.3%, with a mean of 88.9% ± 6.8 (95% confidence interval, CI), and a correlation coefficient ranging from 0.673 to 0.947, with a mean of 0.745 ± 0.16 (95% CI). The UKZN-implementation achieved a statistically significantly higher accuracy than, or at least the same accuracy as, the Stanford HIV-db when compared directly, and relative to the results reported by various studies employing Retrogram, ANRS, Geno2Pheno and AntiRetroScan, which are all international gold standards. Gene expression programming and particle swarm optimisation are relatively new additions to the field of classification and regression, and this study marks one of the first attempts to apply them in this problem domain. The comparison of support vector machines, gene expression programming, multi-layer perceptrons and particle swarm optimisation with the Stanford HIV-db contributes to the domain's current knowledge and understanding of interpretation algorithms. By implementing such algorithms, we hope both to uncover new knowledge and to confirm existing knowledge in the HIV resistance domain.

Keywords: HIV drug resistance, interpretation algorithms, bioinformatics, medical artificial intelligence
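The accuracy, correlation coefficient and "mean ± half-width (95% CI)" figures reported above are standard summary statistics. The sketch below shows one way such numbers could be computed. It assumes binary resistant/susceptible calls per drug for accuracy, continuous predicted versus measured phenotype values for the correlation, and a normal-approximation interval for the mean across drugs. The abstract does not state how its intervals were constructed, so that choice, like the helper names `accuracy`, `pearson_r` and `mean_with_ci`, is an assumption.

```python
import math
from statistics import NormalDist, mean, stdev


def accuracy(predicted: list[int], actual: list[int]) -> float:
    """Fraction of resistance calls that match the measured phenotype."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between predicted and measured phenotype values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_with_ci(values: list[float], level: float = 0.95) -> tuple[float, float]:
    """Mean and half-width of a normal-approximation confidence interval for the mean.

    Assumption: the study's interval construction is not given; a t-based
    interval would be wider when only a few drugs are summarised.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * stdev(values) / math.sqrt(len(values))
    return mean(values), half_width
```

Applied per drug, `accuracy` and `pearson_r` would give one value each, and `mean_with_ci` would then summarise those per-drug values as a mean plus or minus a half-width.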