| Project | Rearrangements in genomes with unequal content |
|---|---|
| Date | 30/12/05 |
| Version | 1.0 |
| Purpose of Meeting | Discuss experimental results |
| Supervisor present | Leong Hon Wai |
In order to assess the characteristics of the best rearrangements algorithm, I carried out a number of experiments which are similar to the ones presented in Bourque and Pevzner 2002.
The first set of experiments concerns the solution to the median problem, the input data is a set of 3 genomes which are generated using k rearrangements from the ancestor. Various values of k was used to vary the ratio of the (number of reversals)/(number of markers). For each ratio, the score of the recovered tree as well as the rearrangement distance between the recovered median and the true median was computed by both the best rearrangements algorithm and the original MGR algorithm. The choice of rearrangements is determined by the percentage of reversals and deletions. The following set of percentages was used in the experiment: 10% deletions, 20% deletions, 30% deletions, 40% deletions (the percentage of reversals is simply 100 - % deletions).
The second set of experiments was to investigate how the number of genomes affect the score. To generate a tree for n genomes, we first generate a complete binary tree which have at least n leave node. Excess leave nodes are randomly removed to give as a tree with exactly n leave nodes. Each edge in this tree is assigned a weight of k, which represents the number of rearrangement events along this edge. Then a simulation is carried out along the tree starting with the identity permutation at the root to generate n genomes. These n genomes act as the input to the reconstruction algorithms. Similar to the first set of experiments, we vary the percentage of deletion events.
The final results show that the score of the best rearrangements algorithm was always higher than the original MGR algorithm and thus closer to the expected score. However the difference was not very significant. Hopefully more conclusive results can be seen when comparing the RF distance of the trees recovered by the algorithm and the true truee.
I will also be looking into incorporating the rearrangement distance between the genomes into rearrangement score function. One possibility is to use (min dis)/(dist). How to combine this to the existing function which uses the percentage of common genes remains unsolved.
As I have pointed out in the the experimental results, the new algorithm actually recovers trees with a higher score than the orignal MGR algorithm. This means that the original problem formulation of finding the tree with smallest score may not be a good criteria in the case of unequal content.