   protein Mw of 50 kDaltons to reduce the influence of random score
   accumulation in large proteins (>200 kDaltons). The final score is
   thus calculated as:

Score = 50/(Pn x H)

   Where Pn is the product of n distribution scores and H the 'hit'
   protein molecular weight in kD.

   Important consequences of this type of scoring scheme are that matches
   with peptides of higher Mw carry more scoring weight, and that the
   non-random distribution of fragment molecular weights in proteins of
   different sizes is compensated for.

  Simulation studies

   In a simulation of scoring properties, 100 test proteins with masses
   between 10 kD and 100 kD were randomly selected from the OWL sequence
   database. The sets of all possible tryptic peptide masses for each
   protein were randomized and database searches performed with
   increasing numbers of fragments (default search parameters) until the
   test protein reached the top of the ranked scoring list. 99% of the
   test proteins were correctly identified using only five peptides or
   less (mean=3.6 peptides), with one example requiring six. These
   figures were surprisingly small considering that some of the proteins
   in the test sample generated more than 100 possible tryptic fragments.
   All 100 test examples were identified using 30% or less of the maximum
   number of available peptides.

   This distribution was somewhat dependent on protein size, as smaller
   proteins generally yield fewer peptide fragments. Thus, all proteins
   of 30 kD and over were identified using 13% or less of possible
   fragments (1 in 8), with all proteins of 40 kD and above requiring
   less than 10% (1 in 10). In this latter group, therefore, more than
   90% of the potential information (all possible peptides) was
   redundant. For the simulation a 'unique' identification required
   matching not only of protein type (e.g. globin) but correct
   discrimination of type, species, and isoform or isozyme.
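   As an illustration of the scoring formula given earlier in this
   section, Score = 50/(Pn x H), the following Python sketch computes a
   score from a list of distribution scores and a hit protein mass. The
   function name mowse_score and its arguments are hypothetical and are
   not part of emowse. How the distribution scores themselves are derived
   is not covered here, so the example values are arbitrary.

```python
from math import prod


def mowse_score(distribution_scores, hit_mass_kd):
    """Return 50 / (Pn * H).

    distribution_scores: the n per-peptide distribution scores (assumed
        to be positive numbers; their derivation is not modelled here).
    hit_mass_kd: molecular weight H of the 'hit' protein, in kD.
    """
    if hit_mass_kd <= 0:
        raise ValueError("hit protein mass must be positive")
    pn = prod(distribution_scores)  # Pn: product of the n scores
    if pn <= 0:
        raise ValueError("distribution scores must be positive")
    return 50.0 / (pn * hit_mass_kd)


# Arbitrary illustrative values, not taken from a real search.
print(mowse_score([0.5, 0.25], 50.0))   # 8.0
print(mowse_score([0.5, 0.25], 100.0))  # 4.0
```

   With the same distribution scores, doubling H halves the score, which
   is how the normalisation to 50 kD damps random score accumulation in
   large proteins.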
   Discrimination could be further improved by reducing the error
   tolerance to only +/- 1 Dalton (mean=2.7 peptides). Such accuracies
   are easily bettered using more sophisticated ESI/quadrupole or
   high-field FAB spectrometers, but the default search value (+/- 2
   Daltons) compensates for the reduced accuracy obtainable from the
   smaller time-of-flight (TOF) instruments. Mass accuracies better than
   +/- 1 Dalton were not essential, and in fact the error tolerance could
   be relaxed to +/- 5 Daltons in many cases with little degradation in
   performance. The simulation thus clearly demonstrated the high degree
   of discrimination afforded by relatively few peptide masses, even with
   generous allowance for error.

Usage

   Here is a sample session with emowse

% emowse
Protein identification by mass spectrometry
Input protein sequence(s): tsw:*
Peptide molecular weight values file: test.mowse
Whole sequence molwt [0]:
Output file [100k_rat.emowse]:

   Go to the input files for this example
   Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Protein sequence(s) filename and optional
                                  format, or reference (input USA)
  [-infile]            infile     Peptide molecular weight values file
   -weight             integer    [0] Whole sequence molwt (Any integer value)
  [-outfile]           outfile    [*.emowse] Output file name

   Additional (Optional) qualifiers: (none)

   Advanced (Unprompted) qualifiers:
   -aadata             datafile   [Eamino.dat] Amino acids properties and
                                  molecular weight data file
   -frequencies        datafile   [Efreqs.dat] Amino acid frequencies data
                                  file
   -enzyme             menu       [1] Enzyme or reagent (Values: 1 (Trypsin);
                                  2 (Lys-C); 3 (Arg-C); 4 (Asp-N); 5
                                  (V8-bicarb); 6 (V8-phosph); 7
                                  (Chymotrypsin); 8 (CNBr))
   -pcrange            integer    [25] Allowed whole sequence weight
                                  variability (Integer from 0 to 75)
   -tolerance          float      [0.1] Tolerance (Number from 0.100 to 1.000)
   -partials           float      [0.4] Partials factor (Number from 0.100 to
                                  1.000)

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory3        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

  Input files for usage example

   'tsw:*' is a sequence entry in the example protein database 'tsw'

  File: test.mowse

6082.8
5423.0
3086.3
2930.3
2424.7
2030.2
1399.6
1086.2

   The input file is a list of molecular weights of the peptide
   fragments. One weight is allowed per line. The example file above is a
   Trypsin digest of the protein sw:100K_rat (produced by using the
   program digest).

   Each molecular weight must be on a line of its own. Masses (M not
   M[H+]) are accepted in any order (ascending, descending or mixed).
   Peptide masses can be entered as integers or floating-point values,
   the latter being rounded to the nearest integer value for the search.

   It is suggested that peptide masses should be selected from the range
   700-4000 Daltons. This range balances the fact that very small
   peptides give little discrimination and minimizes the frequency of
   partially-cleaved peptides.

   As a general rule, users are advised to identify and remove peptide
   masses resulting from autodigestion of the cleavage enzyme (e.g.
   tryptic fragments of trypsin), best obtained by MS analysis of control
   digests containing only the enzyme.

   Further information on the partial sequence and/or composition of the
   peptides can be given after the weight with a 'seq()' or 'comp()'
   specification. This should be formatted like:

mw seq(...) comp(...)

   where mw is the molecular mass of the fragment, seq(...) is sequence
   information and comp(...) is composition information. A line may
   contain more than one sequence information qualifier.
   For example:
     _________________________________________________________________

7176 seq(b-t[pqt]ln)
1744
1490
1433   comp(3[ed]1[p]) seq(gmde)
     _________________________________________________________________

  Sequence information

   The sequence information should be given in standard one-character
   code. It should be preceded by a prefix as outlined in the table
   below, to indicate what type of sequence it is.

   CAPTION: Prefixes to use with sequence information for emowse

   Prefix   Meaning                Example
   b-       N->C sequence          seq(b-DEFG)
   y-       C->N sequence          seq(y-GFED)
   *-       Orientation unknown    seq(*-DEFG) seq(*-GFED)
   n-       N terminal sequence    seq(n-ACDE)
   c-       C terminal sequence    seq(c-FGHI)

   The examples are all correct data for a peptide with a sequence
   ACDEFGHI. Note that *-DEFG will search for both DEFG and GFED.

   Both lower and upper case characters may be used for amino-acids.
   An unknown amino acid may be indicated by an 'X'.
   More than one amino acid may be specified for a position by putting
   them between square brackets.

   A line may contain several sequence information qualifiers. An example
   for a peptide with the actual sequence ACDEFGHI might look like:

12345 seq(n-AC[DE]) seq(c-HI)

  Composition Information

   Composition should consist of a number, followed by the corresponding
   amino acid between square brackets. For example

comp(2[H]0[M]3[DE]*[K])

   indicates a peptide which contains 2 histidines, no methionines, 3
   acidic residues (glutamic or aspartic acid) and at least 1 lysine.

Output file format

  Output files for usage example

  File: 100k_rat.emowse

Using data fragments of:
          1086.2
          1399.6
          2030.2