                                 frestdist

Function

   Distance matrix from restriction sites or fragments

Description

   Distances calculated from restriction sites data or restriction
   fragments data. The restriction sites option is also the one to use
   to compute distances for RAPDs or AFLPs.

Algorithm

   Restdist reads the same restriction sites format as RESTML and
   computes a restriction sites distance. It can also compute a
   restriction fragments distance. The original restriction fragments and
   restriction sites distance methods were introduced by Nei and Li
   (1979). Their original method for restriction fragments is also
   available in this program, although its default methods are my
   modifications of the original Nei and Li methods.

   These two distances assume that the restriction sites are accidental
   byproducts of random change of nucleotide sequences. For my
   restriction sites distance the DNA sequences are assumed to be
   changing according to the Kimura 2-parameter model of DNA change
   (Kimura, 1980). The user can set the transition/transversion ratio
   for the model. For my restriction fragments distance there is an
   implicit assumption of a Jukes-Cantor (1969) model of change. The user
   can also set the parameter of a correction for unequal rates of
   evolution between sites in the DNA sequences, using a Gamma
   distribution of rates among sites. The Jukes-Cantor model is also
   implicit in the restriction fragments distance of Nei and Li (1979).
   Their fragments distance does not allow a correction for a Gamma
   distribution of rates among sites.

  Restriction Sites Distance

   The restriction sites distances use data coded for the presence or
   absence of individual restriction sites (usually as + and - or 0 and
   1). My distance is based on the proportion, out of all sites observed
   in one species or the other, which are present in both species.
   This is done to correct for the ascertainment of sites, for the fact
   that we are not aware of many sites because they do not appear in any
   species.

   My distance starts by computing from the particular pair of species
   the fraction

                 n++
   f =  ---------------------
         n++ + 1/2 (n+- + n-+)

   where n++ is the number of sites contained in both species, n+- is the
   number of sites contained in the first of the two species but not in
   the second, and n-+ is the number of sites contained in the second of
   the two species but not in the first. This is the fraction of sites
   that are present in one species which are present in both. Since the
   number of sites present in the two species will often differ, the
   denominator is the average of the number of sites found in the two
   species.

   If each restriction site is s nucleotides long, the probability that a
   restriction site is present in the other species, given that it is
   present in a species, is

      Q^s,

   where Q is the probability that a nucleotide has no net change as one
   goes from the one species to the other. It may have changed in
   between; we are interested in the probability that that nucleotide
   site is in the same base in both species, irrespective of what has
   happened in between. The distance is then computed by finding the
   branch length of a two-species tree (connecting these two species with
   a single branch) such that Q equals the s-th root of f. For this the
   program computes Q for various values of branch length, iterating them
   by a Newton-Raphson algorithm until the two quantities are equal.

   The resulting distance should be numerically close to the original
   restriction sites distance of Nei and Li (1979) when divergence is
   small. Theirs computes the probability of retention of a site in a way
   that assumes that the site is present in the common ancestor of the
   two species. Ours does not make this assumption.
   It is inspired by theirs, but differs in this detail. Their distance
   also assumes a Jukes-Cantor (1969) model of base change, and does not
   allow for transitions being more frequent than transversions. In this
   sense mine generalizes theirs somewhat. Their distance does include,
   as mine does as well, a correction for Gamma distribution of rate of
   change among nucleotide sites.

   I have made their original distance available here.

  Restriction Fragments Distance

   For restriction fragments data we use a different distance. If we
   average over all restriction fragment lengths, each at its own
   expected frequency, the probability that the fragment will still be in
   existence after a certain amount of branch length, we must take into
   account the probability that the two restriction sites at the ends of
   the fragment do not mutate, and the probability that no new
   restriction site occurs within the fragment in that amount of branch
   length. The result for a restriction site length of s is:

                Q^{2s}
          f = ---------
               2 - Q^s

   (The details of the derivation will be given in my forthcoming book
   Inferring Phylogenies, to be published by Sinauer Associates in 2001.)

   Given the observed fraction of restriction sites retained, f, we can
   solve a quadratic equation from the above expression for Q^s. That
   makes it easy to obtain a value of Q, and the branch length can then
   be estimated by adjusting it so the probability of a base not changing
   is equal to that value. Alternatively, if we use the Nei and Li (1979)
   restriction fragments distance, this involves solving for g in the
   nonlinear equation

       g  =  [ f (3 - 2g) ]^{1/4}

   and then the distance is given by

       d  =  - (2/r) log_e(g)

   where r is the length of the restriction site.
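
   The two restriction fragments calculations above can be sketched in a
   few lines of Python. This is only an illustration under stated
   assumptions, not the program's actual code: the function names are
   hypothetical, the branch length is recovered with the plain
   Jukes-Cantor relation Q = 1/4 + 3/4 exp(-4d/3) (no Gamma correction),
   and the Nei and Li equation for g is solved by simple fixed-point
   iteration, which is one convenient choice among several.

```python
import math


def q_from_fragment_fraction(f, s):
    """Solve f = Q**(2s) / (2 - Q**s) for Q, given 0 < f <= 1."""
    if not 0.0 < f <= 1.0:
        raise ValueError("f must lie in (0, 1]")
    # With x = Q**s the expression becomes x**2 + f*x - 2*f = 0;
    # keep the positive root of that quadratic.
    x = (-f + math.sqrt(f * f + 8.0 * f)) / 2.0
    return x ** (1.0 / s)


def jukes_cantor_branch_length(q):
    """Branch length d at which a base is unchanged with probability q.

    Assumption: Jukes-Cantor, q = 1/4 + 3/4 * exp(-4d/3).
    """
    if q <= 0.25:
        raise ValueError("Q is too small for a finite distance")
    return -0.75 * math.log((4.0 * q - 1.0) / 3.0)


def modified_fragment_distance(f, s):
    """Modified fragments distance (sketch, Jukes-Cantor assumed)."""
    return jukes_cantor_branch_length(q_from_fragment_fraction(f, s))


def nei_li_g(f, tol=1e-12, max_iter=200):
    """Solve g = [f (3 - 2g)]**(1/4) by fixed-point iteration."""
    if not 0.0 < f <= 1.0:
        raise ValueError("f must lie in (0, 1]")
    g = f ** 0.25  # starting guess
    for _ in range(max_iter):
        g_next = (f * (3.0 - 2.0 * g)) ** 0.25
        if abs(g_next - g) < tol:
            return g_next
        g = g_next
    raise RuntimeError("fixed-point iteration did not converge")


def nei_li_fragment_distance(f, r):
    """Nei and Li fragments distance d = -(2/r) log_e(g)."""
    return -(2.0 / r) * math.log(nei_li_g(f))


if __name__ == "__main__":
    f, site_length = 0.5, 6  # illustrative values only
    print(modified_fragment_distance(f, site_length))
    print(nei_li_fragment_distance(f, site_length))
```

   Both functions return zero when f = 1, that is, when every fragment is
   shared, and both reject values of f outside the interval (0, 1].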

   Comparing these two restriction fragments distances in a case where
   their underlying DNA model is the same (which is when the
   transition/transversion ratio of the modified model is set to 0.5),
   you will find that they are very close to each other, differing very
   little at small distances, with the modified distance becoming smaller
   than the Nei/Li distance at larger distances. It will therefore matter
   very little which one you use.

  A Comment About RAPDs and AFLPs

   Although these distances are designed for restriction sites and
   restriction fragments data, they can be applied to RAPD and AFLP data
   as well. RAPD (Randomly Amplified Polymorphic DNA) and AFLP (Amplified
   Fragment Length Polymorphism) data consist of presence or absence of
   individual bands on a gel. The bands are segments of DNA with PCR
   primers at each end. These primers are defined sequences of known
   length (often about 10 nucleotides each). For AFLPs the relevant
   length is the primer length, plus three nucleotides. Mutation in these
   sequences makes them no longer be primers, just as in the case of
   restriction sites. Thus a pair of 10-nucleotide primers will behave
   much the same as a 20-nucleotide restriction site, for RAPDs (26 for
   AFLPs). You can use the restriction sites distance as the distance
   between RAPD or AFLP patterns if you set the proper value for the
   total length of the site to the total length of the primers (plus 6 in
   the case of AFLPs).
   Of course there are many possible sources of noise in these data,
   including confusing fragments of similar length for each other and
   having primers near each other in the genome, and these are not taken
   into account in the statistical model used here.

Usage

   Here is a sample session with frestdist

% frestdist
Distance matrix from restriction sites or fragments
Input file: restdist.dat
Output file [restdist.frestdist]:

Restriction site or fragment distances, version 3.6b

Distances calculated for species
    Alpha        ....
    Beta         ...
    Gamma        ..
    Delta        .
    Epsilon

Distances written to file "restdist.frestdist"

Done.

   Go to the input files for this example
   Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-data]              discretestates File containing one or more sets of
                                  restriction data
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers (* if not always prompted):
   -[no]restsites      boolean    Restriction sites (put N if you want
                                  restriction fragments)
   -neili              boolean    Use original Nei/Li model (default uses
                                  modified Nei/Li model)
*  -gamma              boolean    Gamma distributed rates among sites
*  -gammacoefficient   float      Coefficient of variation of substitution
                                  rate among sites
   -ttratio            float      Transition/transversion ratio
   -sitelength         integer    Site length
   -lower              boolean    Lower triangular distance matrix
   -printdata          boolean    Print data at start of run
   -[no]progress       boolean    Print indications of progress of run

   Advanced (Unprompted) qualifiers: (none)

   Associated qualifiers:

   "-outfile" associated qualifiers