A Multi-Fragment Heuristic Algorithm of Marker Ordering in Building Genetic Linkage Maps
|Course||Crop Genetics and Breeding|
|Keywords||Linkage map Marked Sort Traveling Salesman Problem (TSP) Approximation algorithm Multi- fragment heuristic algorithm|
Accurate linkage map premise is to carry out genetic research, gene mapping, fine mapping and cloning. With the continuous development of the theory and experimental techniques of molecular biology, there are a lot of high-throughput molecular markers, may be constructed to provide a high-density genetic linkage map. However, the marked increase in the number of the increasingly high demand for building a linkage map algorithm. Construct a linkage map process sort of mark with classic postman problem (traveling salesman problem, abbreviated TSP) similar, this study based on the approximation algorithm for TSP solving problems greedy algorithm based on the the matrix embryos theory (matroid theory) (hereinafter referred to MF), sort of mark after clustering, to simulate different tag number, tag density, population types, missing rate linkage map, analyze the impact of these factors build linkage map, described by comparison with existing methods MF in constructing the effectiveness of high-density linkage map, obtained the following results. 1 MF mark sort algorithm: TSP approximation algorithm is divided into the loop construction algorithm and loop transformation algorithm, the the loop construction algorithm applied greedy algorithm based on matrix germ theory, the algorithm for the TSP as a complete graph; a TSP path is the complete graph on a Hamilton path, that is, a collection of a set of edges, wherein in addition to both ends of the two marked points is 1, all remaining marked points is 2; during operation, the algorithm will generate a series of piece of the loop, so that the algorithm can also be referred to as a multi-fragment heuristic algorithm. In order to make the algorithm more efficient, we combine the the loop construction algorithm and loop transformation algorithm using local search 2-OPT algorithm transformation loop algorithm, the new solution of 2-OPT at a time, will be used as the initial solution re start local search. 2.MF algorithm with other build linkage map algorithm contrast: the fixed linkage map mark the number and density, using the QTL IciMapping software (www.isbreeding.net) to simulate the 20 kinds of groups of the amphiphilic derivative type are represented P1BC1F1 P2BC1F1 , F1DH F1RIL, P1BC1RIL P2BC1RIL, F2, F3, P1BC2F1, P2BC2F1, P1BC2RIL, P2BC2RIL, P1BC1F2, P2BC1F2, P1BC2F2, P2BC2F2, P1BC1DH, P2BC1DH, P1BC2DH and P2BC2DH, population size of 200. These groups cover the vast majority of groups in the study of plant genetic type, such as backcross generation backcross second generation, since the cross-F2 and F3 generation, and generations derived doubled haploid (DH) and recombinant inbred pedigree ( RIL) population. Population types in 20 different markers, comparing MF and accuracy of existing methods build a map of the SER, RCD, RECORD and UG, respectively, estimated the total length of the linkage map, Euclidean distance, correlation coefficient, the correct rate and the most optimum solution to the proportion of the mean and standard deviation. The results show that in all five of the evaluation criteria, the MF are obviously superior to the existing four algorithms. 3 the missing and singular separation tags on the map: In order to verify the effectiveness of the multi-fragment heuristic algorithm chosen for this study real groups of two crops of rice and corn as the object of the test algorithm to real linkage map of the two crops RIL population derived by the parents and F2 populations using QuLine (www.isbreeding.net) based analog and set three levels of deletion rates were 0%, 10%, 20%, three levels singular separation, 3 fitness genotypes were 1:1:1,1:0.6:0.2,1:0.4:0, each simulation are missing and singular separation of different groups of type 200, the population size was 200. MF algorithm and SER, RCD, RECORD and UG algorithm in various simulation groups estimated accuracy, the mean and standard deviation of the Euclidean distance, correlation coefficient, the correct rate and the optimal solution proportion comprehensive standards without missing and singular separation conditions MF construct a linkage map of the quality of the best, and missing the MF algorithm significantly smaller than the other four algorithms, Singular separation almost no effect on the construction of linkage maps. 4. The MF algorithm of ultra-high density validity: In order to test the efficiency of the algorithm in the ultra-high-density linkage map, this study simulates 200,500,1000,1500 and 2000 marked the linkage group, mark all the linkage groups are equally spaced, and the distance between adjacent markers are set O.1cM population size of 200, Intel Core T7200 clocked at 2.0GHz of CPU to run all the algorithms in the C # language environment. With the growth of the number of markers, only SER algorithm time increases in the size of the linkage group than MF time increase the amount of small, remaining three algorithms RCD, RECORD and UG are significantly higher than the increase in MF algorithm time large, although the MF the growth rate of the algorithm running time the SER low, but has a qualitative improvement in accuracy can be said MF algorithm is more suitable to build high-density linkage map of the large sample size.