Journal articles

A variable selection approach for highly correlated predictors in high-dimensional genomic data

Abstract: In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models. However, these methods can fail when the predictors are highly correlated. We propose a novel variable selection approach, called WLasso, that takes these correlations into account. It consists of rewriting the initial high-dimensional linear model so as to remove the correlation between the biomarkers (predictors), and then applying the generalized Lasso criterion. The performance of WLasso is assessed on synthetic data in several scenarios and compared with recent alternative approaches. The results show that, when the biomarkers are highly correlated, WLasso outperforms the other approaches in sparse high-dimensional settings. The method is also illustrated on publicly available gene expression data in breast cancer. Our method is implemented in the WLasso R package, which is available from the Comprehensive R Archive Network (CRAN).
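The abstract describes two steps: rewriting the linear model y = Xβ + ε so that the predictors are decorrelated, then applying the generalized Lasso criterion ½‖y − Xβ‖² + λ‖Dβ‖₁. The Python/NumPy sketch below illustrates only the general idea of the first step and is not the WLasso package's code. It assumes the predictor covariance Σ is known and uses its symmetric inverse square root Σ^{-1/2} as the decorrelating transform, so that X̃ = XΣ^{-1/2} and β̃ = Σ^{1/2}β leave the linear predictor unchanged. The exact construction used by WLasso is described in the paper. The helper names `sym_sqrt_and_inv_sqrt` and `generalized_lasso_objective`, the equicorrelated toy design and all numeric values are hypothetical. The penalty matrix `D` is left as an input because the abstract does not specify it.

```python
import numpy as np


def sym_sqrt_and_inv_sqrt(sigma):
    """Return (Sigma^{1/2}, Sigma^{-1/2}) for a symmetric positive-definite Sigma,
    computed from its eigendecomposition Sigma = V diag(w) V^T."""
    w, V = np.linalg.eigh(sigma)
    sqrt = V @ np.diag(np.sqrt(w)) @ V.T
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return sqrt, inv_sqrt


def generalized_lasso_objective(y, X, beta, D, lam):
    """Generalized Lasso criterion: 0.5 * ||y - X beta||_2^2 + lam * ||D beta||_1.
    D is a user-supplied penalty matrix (hypothetical input in this sketch)."""
    resid = y - X @ beta
    return 0.5 * resid @ resid + lam * np.abs(D @ beta).sum()


# Hypothetical toy setting: p = 5 predictors with equicorrelation rho = 0.9.
rng = np.random.default_rng(0)
n, p, rho = 20_000, 5, 0.9
sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
beta = np.array([1.0, 0.0, 0.0, 0.0, 2.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

# Rewrite y = X beta + eps as y = X_tilde beta_tilde + eps,
# with X_tilde = X Sigma^{-1/2} and beta_tilde = Sigma^{1/2} beta (assumed transform).
sigma_sqrt, sigma_inv_sqrt = sym_sqrt_and_inv_sqrt(sigma)
X_tilde = X @ sigma_inv_sqrt
beta_tilde = sigma_sqrt @ beta

# The linear predictor is unchanged by the rewrite...
print(np.allclose(X @ beta, X_tilde @ beta_tilde))  # True
# ...while the columns of X_tilde are empirically uncorrelated (covariance close to I).
print(np.round(np.cov(X_tilde, rowvar=False), 2))

# Evaluate the criterion on the rewritten model; D = identity is used here only
# to show the call and is not the penalty matrix chosen by WLasso.
print(generalized_lasso_objective(y, X_tilde, beta_tilde, np.eye(p), lam=0.1))
```

In this toy design the raw columns have pairwise correlation 0.9, which is the regime where the abstract reports that standard regularized approaches can fail. After the transform, the empirical covariance of the columns is close to the identity. In practice Σ is unknown and would have to be estimated from the data.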

Contributor: Céline Lévy-Leduc
Submitted on: Thursday, April 14, 2022, 5:58:48 PM
Last modification on: Friday, April 15, 2022, 11:22:53 AM





Wencan Zhu, Céline Lévy-Leduc, Nils Ternes. A variable selection approach for highly correlated predictors in high-dimensional genomic data. Bioinformatics, Oxford University Press (OUP), 2021, 37 (16), pp.2238-2244. ⟨10.1093/bioinformatics/btab114⟩. ⟨hal-02904344⟩




