Minutes 30/12/05

Project Rearrangements in genomes with unequal content
Date 30/12/05
Version 1.0
Purpose of Meeting Discuss experimental results
Supervisor present Leong Hon Wai

Current Task/Sub-Task

  • Running experiments to compare the best rearrangements algorithm with the original MGR
  • Integrating Hoang's tree distance algorithms into the program
  • Looking at using information about the rearrangement distance in the scoring function for rearrangements
  • Reformulating the computational problem for the case of unequal content
  • Writing up the interim report

Reported On

  • Some results obtained when comparing the best rearrangements algorithm with the original MGR

Discussed

Experimental results

In order to assess the characteristics of the best rearrangements algorithm, I carried out a number of experiments which are similar to the ones presented in Bourque and Pevzner 2002.

The first set of experiments concerns the solution to the median problem, the input data is a set of 3 genomes which are generated using k rearrangements from the ancestor. Various values of k was used to vary the ratio of the (number of reversals)/(number of markers). For each ratio, the score of the recovered tree as well as the rearrangement distance between the recovered median and the true median was computed by both the best rearrangements algorithm and the original MGR algorithm. The choice of rearrangements is determined by the percentage of reversals and deletions. The following set of percentages was used in the experiment: 10% deletions, 20% deletions, 30% deletions, 40% deletions (the percentage of reversals is simply 100 - % deletions).

The second set of experiments was to investigate how the number of genomes affect the score. To generate a tree for n genomes, we first generate a complete binary tree which have at least n leave node. Excess leave nodes are randomly removed to give as a tree with exactly n leave nodes. Each edge in this tree is assigned a weight of k, which represents the number of rearrangement events along this edge. Then a simulation is carried out along the tree starting with the identity permutation at the root to generate n genomes. These n genomes act as the input to the reconstruction algorithms. Similar to the first set of experiments, we vary the percentage of deletion events.

The final results show that the score of the best rearrangements algorithm was always higher than the original MGR algorithm and thus closer to the expected score. However the difference was not very significant. Hopefully more conclusive results can be seen when comparing the RF distance of the trees recovered by the algorithm and the true truee.

Rearrangement score function

I will also be looking into incorporating the rearrangement distance between the genomes into rearrangement score function. One possibility is to use (min dis)/(dist). How to combine this to the existing function which uses the percentage of common genes remains unsolved.

Problems Raised

Revise problem formulation

As I have pointed out in the the experimental results, the new algorithm actually recovers trees with a higher score than the orignal MGR algorithm. This means that the original problem formulation of finding the tree with smallest score may not be a good criteria in the case of unequal content.

Things to report on in next meeting

  • Results for comparing the accuracy of the algorithms using tree distance
 
mgr/mtg_0016.txt · Last modified: 2007/12/29 09:37 (external edit)
 
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki