Development of a Matrix CGH analysis pipeline
Markus Kreuz, Hilmar Berger, Dirk Hasenclever, Markus Löffler
Institute for Medical Informatics, Statistics and Epidemiology
University of Leipzig
Interdisciplinary Centre for Bioinformatics
University of Leipzig
Comparative genomic hybridisation (CGH) is a technique for detecting local copy number differences between DNA samples. In short, differently colour-labelled DNA samples (e.g. normal versus tumour) are competitively hybridised to a relevant clone probe. The ratio of the two colour intensities corresponds to the ratio of copy numbers. The high-throughput version of this technique, called matrix CGH, allows several thousand clones to be measured simultaneously.
Analysis of matrix CGH data requires multiple steps, which until now have often been performed manually. Our working group analysed the processes involved from a biometrical perspective and established a fully automatic analysis pipeline for this kind of data, implemented in R.
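As a minimal illustration of how an intensity ratio relates to a copy number ratio, the sketch below computes the log2 ratio commonly used for two-colour array data. It is written in Python for illustration only (the pipeline itself is implemented in R); the function name and the intensity values are hypothetical. In the idealised case of a diploid reference, a single-copy gain (3 versus 2 copies) gives log2(3/2) ≈ 0.58 and a single-copy loss (1 versus 2 copies) gives log2(1/2) = -1.

```python
import math


def log2_ratio(test_intensity, reference_intensity):
    """Return log2(test / reference) for one clone."""
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / reference_intensity)


# Hypothetical (tumour, normal) intensities for three idealised clones.
examples = {
    "balanced": (1000.0, 1000.0),
    "single-copy gain": (1500.0, 1000.0),
    "single-copy loss": (500.0, 1000.0),
}

for label, (tumour, normal) in examples.items():
    print(f"{label}: {log2_ratio(tumour, normal):+.2f}")
```

Real measurements deviate from these ideal values, for example because tumour samples are mixed with normal cells.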
The analysis includes the following steps:
We developed extensive routine quality checks to detect data problems.
In addition to standard normalisation, we developed a new normalisation method to deal with clone-specific biases (Figure 1):
Figure 1. Clone-specific bias is visible in the raw data of N=103 chips on the left (blue arrows). The stripes disappear after normalisation.
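The normalisation method itself is not described here, so the following Python sketch is not that method. It only illustrates the general idea of a clone-specific bias with a simple stand-in: if a clone is shifted by a similar amount on every chip, subtracting that clone's median across chips removes the shift. The function name and the data are hypothetical. Note that such naive centring would also remove a true aberration shared by most samples, which a real method has to guard against.

```python
from statistics import median


def centre_clones(log_ratios):
    """Subtract each clone's median across chips.

    log_ratios: list of chips, each a list of log2 ratios
    given in the same clone order.
    """
    n_clones = len(log_ratios[0])
    if any(len(chip) != n_clones for chip in log_ratios):
        raise ValueError("all chips must contain the same clones")
    clone_medians = [
        median(chip[j] for chip in log_ratios) for j in range(n_clones)
    ]
    return [
        [value - clone_medians[j] for j, value in enumerate(chip)]
        for chip in log_ratios
    ]


# Hypothetical data: 3 chips x 4 clones; the third clone is shifted
# upwards by roughly 0.3 on every chip (a "stripe" across chips).
chips = [
    [0.0, 0.1, 0.3, -0.1],
    [0.1, 0.0, 0.4, 0.0],
    [-0.1, -0.1, 0.2, 0.1],
]
centred = centre_clones(chips)
for chip in centred:
    print([round(value, 2) for value in chip])
```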
Chip-wise segmentation of DNA-copy number changes
Copy number changes typically involve larger segments of the chromosomes. Information on the genomic location of the clones is therefore used when interpreting the normalised data. Having compared several methods, we currently recommend the circular binary segmentation method of Olshen (2003).
In a further step, the segments found have to be classified as loss, normal or gain in copy number (see Figure 2).
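The two steps, segmentation followed by classification, can be sketched as below. This Python sketch is a deliberately simplified illustration and not the circular binary segmentation method: it uses plain recursive binary segmentation with a fixed gain cut-off, whereas circular binary segmentation searches for pairs of change points and assesses them with a permutation-based test. The function names, the `min_gain` and `threshold` parameters and the data are all hypothetical.

```python
from statistics import mean


def best_split(values):
    """Return (gain, index) of the split that most reduces the
    residual sum of squares; index is None if no split helps."""
    n = len(values)
    total = sum(values)
    best_gain, best_index = 0.0, None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        right_sum = total - left_sum
        gain = left_sum**2 / i + right_sum**2 / (n - i) - total**2 / n
        if gain > best_gain:
            best_gain, best_index = gain, i
    return best_gain, best_index


def segment(values, start=0, min_gain=0.1):
    """Recursively split values ordered by genomic position.

    Returns a list of (start, end, mean) tuples, end exclusive.
    min_gain is an assumed, hand-picked cut-off.
    """
    gain, index = best_split(values)
    if index is None or gain < min_gain:
        return [(start, start + len(values), mean(values))]
    return (segment(values[:index], start, min_gain)
            + segment(values[index:], start + index, min_gain))


def classify(segment_mean, threshold=0.2):
    """Label a segment by its mean log2 ratio (assumed threshold)."""
    if segment_mean > threshold:
        return "gain"
    if segment_mean < -threshold:
        return "loss"
    return "normal"


# Hypothetical log2 ratios of 12 clones ordered along one chromosome,
# with a gained region in the middle.
values = [0.0, 0.05, -0.05, 0.0, 0.6, 0.55, 0.65, 0.6,
          0.0, -0.05, 0.05, 0.0]
segments = segment(values)
for seg_start, seg_end, seg_mean in segments:
    print(seg_start, seg_end, round(seg_mean, 2), classify(seg_mean))
```

In this example the gained region lies in the middle of the chromosome, so a single split separates it only weakly from its surroundings. This is the kind of situation that motivates searching for two change points at once.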
Figure 2. Recurrent regions on chromosomes: green and red bars indicate the frequency of gains and losses on chromosome 1 obtained from 230 matrix CGH chips. The vertical dashed line indicates the position of the centromere; the thin horizontal lines show the recurrence threshold (black) and the smoothed frequency data (blue). The gain region in the right part is characterised by significant recurrence (i.e. similar behaviour) over nearly the whole length of the q-arm of the chromosome.
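The frequencies shown in such a plot can be derived from the per-chip classifications by counting, for each clone, the fraction of chips with a gain or a loss. The Python sketch below illustrates this under simple assumptions: the function names and the data are hypothetical, and the fixed threshold of 0.5 merely stands in for the recurrence threshold, whose derivation is not described here. Smoothing of the frequencies is omitted.

```python
def aberration_frequencies(calls):
    """Return per-clone gain and loss frequencies.

    calls: list of chips, each a list of "gain", "loss" or "normal"
    labels given in the same clone order.
    """
    n_chips = len(calls)
    n_clones = len(calls[0])
    gains = [
        sum(chip[j] == "gain" for chip in calls) / n_chips
        for j in range(n_clones)
    ]
    losses = [
        sum(chip[j] == "loss" for chip in calls) / n_chips
        for j in range(n_clones)
    ]
    return gains, losses


def recurrent(frequencies, threshold):
    """Flag clones whose frequency reaches the assumed threshold."""
    return [frequency >= threshold for frequency in frequencies]


# Hypothetical classifications for 4 chips x 3 clones.
calls = [
    ["normal", "gain", "loss"],
    ["normal", "gain", "normal"],
    ["gain", "gain", "normal"],
    ["normal", "normal", "loss"],
]
gain_freq, loss_freq = aberration_frequencies(calls)
print("gain frequencies:", gain_freq)
print("loss frequencies:", loss_freq)
print("recurrent gains:", recurrent(gain_freq, threshold=0.5))
print("recurrent losses:", recurrent(loss_freq, threshold=0.5))
```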