Haplotype Determination and Recovery

1. Introduction

An O(nm α(m)) time algorithm is given for inferring haplotypes from genotypes of non-recombinant pedigree data, where n is the number of members, m is the number of sites, and α(m) is the inverse Ackermann function. The algorithm works on both tree pedigrees and general pedigree structures with cycles. Constraints between pairs of heterozygous sites are used to resolve unresolved sites across the pedigree, which lets the algorithm avoid problems previously encountered on non-tree pedigrees.

The algorithm is implemented in C++ and was tested on more than 10,000 cases. The results for 640 cases were analyzed in detail. The remaining test cases were used to check the conjecture that the algorithm always works.

2. Download PHI and data sets

Data set containing 640 pedigrees

Data set (input pedigrees: *.txt; single haplotype solutions: *.ous; multiple haplotype solutions: *.out)
Executable files (to generate a single solution: nrhd4ss.exe; to generate multiple solutions: nrhd4ms.exe)

Data set containing more than 10,000 pedigrees

Data set (input pedigrees: *.txt; single haplotype solutions: *.ous)
Executable file (to generate a single solution: nrhd4ss.exe)

3. Usage

Extract these zip files into one directory and run the appropriate executable files.

Note: Before running nrhd4ms.exe to generate multiple solutions, set the stack size to 1000M. On Unix, this can be done in csh or tcsh as follows:

limit stacksize 1000M

Users do not need to increase the stack size when running nrhd4ss.exe.
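
For shells without the limit builtin, such as bash, the usual counterpart is ulimit -s, which takes its value in kilobytes, so 1000M corresponds to 1024000. The following is a minimal sketch under that assumption. The helper names mb_to_kb and run_with_stack are illustrative and are not part of the PHI distribution.

```shell
# Convert a size in megabytes to the kilobyte units that `ulimit -s` expects.
mb_to_kb() {
  echo $(( $1 * 1024 ))
}

# Run a command with the given stack size (in MB).
# The function body is a subshell, so the caller's own limits stay unchanged.
run_with_stack() (
  mb=$1
  shift
  # This fails if the requested size exceeds the system's hard limit.
  ulimit -s "$(mb_to_kb "$mb")" || exit 1
  "$@"
)
```

For example, run_with_stack 1000 ./nrhd4ms.exe, followed by whatever arguments the program expects, would raise the limit only for that one run.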

4. Contact us

Duong D. Doan, Patricia A. Evans, Joseph D. Horton
Faculty of Computer Science, University of New Brunswick
Fredericton, N.B., Canada, E3B 5A3
b89ct@unb.ca, pevans@unb.ca, jdh@unb.ca