abstract - Med-e-Tel

Towards The Use of Machine Learning Algorithms to Predict Human
Immunodeficiency Virus Drug Resistance
Y. Singh1, M. Hajek2, D. Moodley2, C. Seebregts3
1 Nelson R. Mandela School of Medicine, RSA; 2 University of KwaZulu-Natal, RSA; 3 Medical
Research Council, South Africa, [email protected]
Several factors contribute to the success or failure of antiretroviral treatment (ART), but drug
resistance is arguably the most critical. There are two laboratory methods for testing HIV
resistance: genotypic and phenotypic assays. Compared with phenotypic assays, genotypic assays
are cheaper and faster and can yield multi-drug resistance profiles, and are thus preferred. However,
genotypic testing requires interpretation of mutations, genotypic variation patterns and
inference of the phenotype using complex rules.
The aim of this study was to investigate the use of machine learning algorithms in a
computer-based HIV resistance prediction tool for patients on highly active antiretroviral
therapy (HAART). Each algorithm takes the pol region of the HIV genome from the infected
patient as input and predicts the phenotypic response of that patient to a particular HAART
regimen. That is, it aims to aid in the determination of HIV resistance by finding a correlation
between genotypic and phenotypic data that supports both classification and regression.
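As an illustration of how a sequence can be turned into input for such algorithms, the following minimal sketch lists substitutions against a reference and builds a one-hot feature vector. The abstract does not describe the encoding actually used in the study, so the function names, the one-hot scheme and the toy sequences below are all assumptions made for illustration only.

```python
from typing import List

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutations(reference: str, sample: str) -> List[str]:
    """List substitutions as '<reference residue><1-based position><sample residue>'."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{obs}"
        for pos, (ref, obs) in enumerate(zip(reference, sample), start=1)
        if ref != obs
    ]


def one_hot(sample: str) -> List[int]:
    """Flatten a sequence into a binary vector with 20 slots per position."""
    vector: List[int] = []
    for residue in sample:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


# Toy, made-up fragments (not real pol sequences), already aligned.
reference = "MKTAYIAKQR"
sample = "MKTVYIAKQL"

found = mutations(reference, sample)   # substitutions at positions 4 and 10
features = one_hot(sample)             # 10 positions x 20 residues = 200 values
```

A vector such as `features`, paired with a measured phenotype, is the kind of input-output pair a classifier or regressor could be trained on.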
Four machine learning algorithms were employed in the interpretation of the genetic data:
Support Vector Machines (SVM), Gene Expression Programming (GEP), Particle Swarm
Optimization (PSO) and Multilayer Perceptrons (MLP). We found that the optimal
algorithms are Support Vector Machines for reverse transcriptase inhibitors and the
backpropagation Multilayer Perceptron for protease inhibitors. Used together, they are called
the UKZN-implementation.
When compared with actual phenotypic resistance profiles, the UKZN-implementation produced
an accuracy ranging from 85.4% to 96.3%, with an average of 88.9% ± 6.8 (95% confidence
interval, CI), and a correlation coefficient ranging from 0.673 to 0.947, with an average of
0.745 ± 0.16 (95% CI). The accuracy of the UKZN-implementation was statistically higher
than, or at least equal to, that of HIV-db when compared directly, and that reported by
various studies employing Retrogram, ANRS, Geno2Pheno and AntiRetroScan, all of which are
international gold standards.
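The two measures reported above, classification accuracy and the correlation coefficient, can be computed as in the following minimal sketch. It assumes that the correlation is Pearson's r and that accuracy is the fraction of matching resistance calls; the abstract does not state either definition, and the toy values are invented for illustration and are not data from the study.

```python
import math
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the measured phenotype labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured continuous values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


# Toy labels: "R" = resistant, "S" = susceptible (hypothetical values).
predicted_labels = ["R", "S", "R", "S"]
measured_labels = ["R", "S", "S", "S"]
toy_accuracy = accuracy(predicted_labels, measured_labels)

# Toy continuous values standing in for predicted and measured resistance.
toy_r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Accuracy applies to the classification output (resistant or susceptible), while the correlation coefficient applies to the regression output, which is why the abstract reports both.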
Gene expression programming and particle swarm optimization are relatively new additions
to the field of classification and regression. This study marks one of the first attempts at using
such machine learning algorithms in this problem domain. The comparison of support
vector machines, gene expression programming, multilayer perceptrons and particle swarm
optimization with the Stanford HIV-db contributes to the domain’s current knowledge and
understanding of interpretation algorithms. By implementing such algorithms, one hopes to
uncover new knowledge, and confirm existing knowledge, in the HIV resistance domain.
Keywords: HIV drug resistance, interpretation algorithms, bio-informatics, medical artificial
intelligence