Including scale when analyzing high throughput sequencing datasets
Learning objectives
- Data normalizations make strong, but usually inappropriate, assumptions about the scale of the data
- Explicitly incorporating scale into the normalizations solves common problems in analysis
- ALDEx2 can be used to build posterior models of both measurement and scale error (see the sketch after this list)
- The posterior models greatly aid interpretation of problematic datasets such as meta-transcriptomes
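For orientation, the sketch below shows one way a scale-aware ALDEx2 run might look. It is illustrative only, not the speaker's analysis: the `selex` example data ships with the package, while the `gamma = 0.5` setting and the choice of tests are assumptions; `gamma` is the scale-simulation argument added in recent ALDEx2 releases.

```r
## A minimal sketch, assuming the `gamma` scale-simulation argument
## available in recent ALDEx2 Bioconductor releases.
library(ALDEx2)

data(selex)                            # example count table bundled with ALDEx2
conds <- c(rep("NS", 7), rep("S", 7))  # two-condition design for the selex data

# Dirichlet Monte Carlo instances of the CLR-transformed counts; gamma > 0
# layers uncertainty in the scale model on top of the measurement model
x <- aldex.clr(selex, conds, mc.samples = 128, denom = "all", gamma = 0.5)

tt  <- aldex.ttest(x)   # Welch's t and Wilcoxon tests, averaged over instances
eff <- aldex.effect(x)  # standardized effect sizes, averaged over instances
res <- data.frame(tt, eff)

head(res[order(res$we.eBH), ])  # features ranked by BH-adjusted Welch's t p-value
```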
Speaker bio
Greg Gloor is Professor and Chair of the Department of Biochemistry. He has had a long-standing interest in the analysis of problematic high throughput sequencing (HTS) datasets that are typically asymmetric, such as metagenomes and metatranscriptomes. He has been a leading advocate for applying compositional data (CoDa) analysis methods to HTS data, and is the author and maintainer of the most widely used compositional analysis tool on Bioconductor. This presentation describes a collaboration between Gloor and Drs. Justin Silverman and Michelle Pistner Nixon, both in the College of Information Science at Penn State.