Analyzing genes with outlier values in lymphomas
Status: Project selected by K. Kotzbauer
Thesis: Bachelor (A shortened version can be used for Practical Bioinformatics II)
Field: Genomics, Data Analysis
Advisors: Spang, Kohler
Required course: Practical Bioinformatics II
Objective: In expression profiles from lymphomas, most genes show continuous distributions across tumors; in fact, they are typically approximately normally distributed. However, some genes, e.g. immunoglobulins, show outlier values in a subset of cases. Detect these outliers (both the genes and the cases) and analyze their association with clinical phenotypes (such as survival).
Data: We have more than 900 expression profiles of lymphomas.
First Steps:
- Get familiar with the lymphoma data sets including the phenotype data tables
- Read up on outlier detection algorithms, test them, and compare the results of multiple algorithms on the lymphoma data
- Generate a discrete data matrix with entries +1: outlier (high), 0: no outlier value, -1: outlier (low)
- Mine for associations between this discrete data and the phenotype data
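As a starting point for the discrete matrix, one possible sketch is shown here. It assumes a robust z-score per gene (median and MAD) with a cutoff of 3.5; both the method and the cutoff are illustrative assumptions, not the project's prescribed algorithm, and the function name `outlier_matrix` is hypothetical.

```python
import numpy as np

def outlier_matrix(expr, threshold=3.5):
    """Return a genes x tumors matrix of +1 (high), 0 (none), -1 (low).

    expr: 2-D array, rows = genes, columns = tumors.
    threshold: cutoff on the robust z-score (assumed value, tune as needed).
    """
    expr = np.asarray(expr, dtype=float)
    # Per-gene median and median absolute deviation (MAD)
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True)
    # 1.4826 makes the MAD comparable to the standard deviation
    # when the bulk of the data is normally distributed
    scale = 1.4826 * mad
    scale[scale == 0] = np.nan  # genes without spread get no outlier calls
    with np.errstate(invalid="ignore"):
        z = (expr - med) / scale
        calls = np.zeros(expr.shape, dtype=int)
        calls[z > threshold] = 1
        calls[z < -threshold] = -1
    return calls
```

The median and MAD are used instead of mean and standard deviation because the outliers themselves would otherwise inflate the scale estimate. Other detectors from the reading list can be swapped in as long as they produce the same +1/0/-1 encoding.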
Questions: Which genes show outlier values? Do these genes have something in common (function)? Which lymphomas show outlier values? Do these lymphomas have something in common? Do certain genes show outliers in the same cases? How do genes and patients cluster on the outlier data? Are outlier values predictive of survival? Do they correlate with other variables from the phenotype tables?
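A first look at several of these questions can be taken directly from a +1/0/-1 matrix of the kind described in the First Steps. The sketch below is only an illustration under that assumption; the function name `summarize_outliers` is hypothetical, and high and low outliers are deliberately pooled.

```python
import numpy as np

def summarize_outliers(calls):
    """Summarize a genes x tumors matrix with entries +1, 0, -1.

    Returns the number of outlier calls per gene, the number per tumor,
    and a genes x genes matrix counting tumors in which both genes
    are flagged (high or low).
    """
    calls = np.asarray(calls)
    flagged = (calls != 0).astype(int)   # pool high and low outliers
    per_gene = flagged.sum(axis=1)       # which genes show outliers
    per_sample = flagged.sum(axis=0)     # which tumors show outliers
    co_occurrence = flagged @ flagged.T  # shared outlier cases per gene pair
    return per_gene, per_sample, co_occurrence
```

Large off-diagonal entries in the co-occurrence matrix point to gene pairs that are outliers in the same cases. The counts alone do not establish significance, so a statistical test would still be needed before drawing conclusions.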
Start reading:
Hummel et al. NEJM 2005
Kriegel et al. Outlier detection techniques
www.dbs.ifi.lmu.de/~zimek/publications/KDD2010/kdd10-outlier-tutorial.pdf