                                  emowse

Function

   Protein identification by mass spectrometry

Description

   Peptide mass information can provide a 'fingerprint' signature
   sufficiently discriminating to allow for the unique and rapid
   identification of unknown sample proteins, independent of other
   analytical methods such as protein sequence analysis. Practical
   experience has shown that sample proteins can be uniquely identified
   using as few as 3-4 experimentally determined peptide masses when
   screened against a fragment database derived from over 50,000
   proteins.

   Given a file listing, one per line, the molecular weights of peptides
   produced by cleavage with an enzyme or reagent, emowse searches a
   protein database for matches with the mass spectrometry data.

   One of eight cutting enzymes/reagents can be specified, along with an
   optional whole sequence molecular weight.

   Determination of molecular weight has always been an important aspect
   of the characterization of biological molecules. Protein molecular
   weight data, historically obtained by SDS gel electrophoresis or gel
   permeation chromatography, has been used to establish purity, detect
   post-translational modification (such as phosphorylation or
   glycosylation) and aid identification. Until just over a decade ago,
   mass spectrometric techniques were typically limited to relatively
   small biomolecules, as proteins and nucleic acids were too large and
   fragile to withstand the harsh physical processes required to induce
   ionization. This began to change with the development of 'soft'
   ionization methods such as fast atom bombardment (FAB) [1],
   electrospray ionisation (ESI) [2,3] and matrix-assisted laser
   desorption ionisation (MALDI) [4], which can effect the efficient
   transition of large macromolecules from solution or solid crystalline
   state into intact, naked molecular ions in the gas phase.
   As an added bonus to the protein chemist, sample handling
   requirements are minimal and the amounts required for MS analysis are
   in the same range as, or less than, those of existing analytical
   methods.

   As well as providing accurate mass information for intact proteins,
   such techniques have been routinely used to produce accurate peptide
   molecular weight 'fingerprint' maps following digestion of known
   proteins with specific proteases. Such maps have been used to confirm
   protein sequences (allowing the detection of errors of translation,
   mutation or insertion), characterise post-translational modifications
   or processing events and assign disulphide bonds [5,6].

   Less well appreciated, however, is the extent to which such peptide
   mass information can provide a 'fingerprint' signature sufficiently
   discriminating to allow for the unique and rapid identification of
   unknown sample proteins, independent of other analytical methods such
   as protein sequence analysis.

   Practical experience has shown that sample proteins can be uniquely
   identified using as few as 3-4 experimentally determined peptide
   masses when screened against a fragment database derived from over
   50,000 proteins. Experimental errors of a few Daltons are tolerated by
   the scoring algorithms, permitting the use of inexpensive
   time-of-flight mass spectrometers. As with other types of physical
   data, such as amino acid composition or linear sequence, peptide
   masses can clearly provide a set of determinants sufficiently unique
   to identify or match unknown sample proteins. Peptide mass
   fingerprints can prove as discriminating as linear peptide sequence,
   but can be obtained in a fraction of the time using less material. In
   many cases, this allows for a rapid identification of a sample protein
   before committing to protein sequence analysis.
   Fragment masses also provide structural information, at the protein
   level, fully complementary to large-scale DNA sequencing or mapping
   projects [7,8,9].

   For each entry in the specified set of sequences to search, emowse
   derives both whole sequence molecular weight and calculated peptide
   molecular weights for complete digests using the range of cleavage
   reagents and rules detailed in Table 1. Cleavage is disallowed if the
   target residue is followed by proline (except for CNBr or Asp-N).
   Glu-C (S. aureus V8 protease) cleavages are also inhibited if the
   adjacent residue is glutamic acid. Peptide mass calculations are based
   entirely on the linear sequence and use the average isotopic masses of
   amide-bonded amino acid residues (IUPAC 1987 relative atomic masses).
   To allow for N-terminal hydrogen and C-terminal hydroxyl, the final
   calculated molecular weight of a peptide of N residues is given by the
   equation:

        N
        __
        \
        /  Residue mass + 18.0153
        --
        n=1

   The 18.0153 term (one water) is added once per peptide, not once per
   residue. Molecular weights are rounded to the nearest integer value
   before being used. Cysteine residues are calculated as the free thiol,
   anticipating that samples are reduced prior to mass analysis. CNBr
   fragments are calculated as the homoserine lactone form. Information
   relating to post-translational modification (phosphorylation,
   glycosylation etc.) is not incorporated into calculation of peptide
   masses.

  Table 1: Cleavage reagents modelled by emowse.

   Reagent no.     Reagent                 Cleavage rule
        1          Trypsin                 C-term to K/R
        2          Lys-C                   C-term to K
        3          Arg-C                   C-term to R
        4          Asp-N                   N-term to D
        5          V8-bicarb               C-term to E
        6          V8-phosph               C-term to E/D
        7          Chymotrypsin            C-term to F/W/Y/L/M
        8          CNBr                    C-term to M

   Current versions of emowse also incorporate calculated peptide Mw
   values resulting from incomplete or partial cleavages. At present,
   this is achieved by computing all nearest-neighbour pairs for each
   enzyme or reagent detailed in Table 1.

  Tolerance

   The supplied number specifies the error allowed for mass accuracy of
   experimental mass determination. If no figure is specified, a default
   tolerance of 2 Daltons will be assumed. If you wish to specify a
   different tolerance then follow the qualifier '-tolerance' with the
   required number of Daltons, e.g. '-tolerance 1'. In this case,
   supplied peptide masses will be matched to +/- 1 Dalton. Values of 2-4
   are suggested for data obtained by laser-desorption TOF instruments.
   Accuracies of +/- 2 Daltons or better are generally only possible
   using an appropriate internal standard (e.g. oxidised insulin B chain)
   with TOF instruments. For electrospray or FAB data, a value of 1 can
   be selected in most cases. If you have real confidence in mass
   determination, specify '0' (zero) to limit matches to the nearest
   integer value (effectively +/- 0.5 Daltons). Discrimination is
   significantly improved by the selection of a small error tolerance.

  Whole sequence molecular weight

   This option allows you to give the molecular weight of the whole
   protein (if known). This allows you to limit the search to proteins of
   this molecular weight plus/minus a 'limit' (see below).
   If unspecified, a whole protein molecular weight of 0 is assumed,
   which emowse interprets as "search the whole database". This will
   include all proteins up to the maximum size of just under 700,000
   Daltons. You can specify any molecular weight in Daltons with this
   option, e.g. '-weight 90000'.

  Allowed whole sequence weight variability

   This option is used in conjunction with the '-weight' option and is
   meaningless without it. It specifies a percentage. Only proteins of
   the given sequence molecular weight +/- this percentage will be
   searched. If a sequence molecular weight is specified but '-pcrange'
   is unspecified then '-pcrange' will default to 25%. To specify a
   percentage of 30% use: '-pcrange 30'. For example, with a molecular
   weight of 90,000 Daltons, a '-pcrange' of 30 restricts the search to
   those proteins with masses from 63,000 to 117,000 Daltons. A value of
   25 is suggested for initial searches, which can be progressively
   widened for subsequent search attempts if no matches are found.
   Discrimination is best when the filter percentage is narrow, but some
   Mw estimates (particularly from SDS gels) should be given considerable
   allowance for error.

  Partials factor

   This specifies the weighting given to partially-cleaved peptide
   fragments, with a range from 0.1 to 1.0. If not specified, the default
   value is 0.4. The factor effectively down-weights the score awarded to
   a partial fragment by the specified amount. For example, a '-partials'
   of 0.25 will reduce the score of partial fragments to 25% (one
   quarter) of the score of a complete ('perfect') peptide cleavage
   fragment of equal mass.

   Computing all possible nearest-neighbour partial fragments adds
   significantly to the number of peptides entered in the database (by a
   factor of two). The major effect of this is to increase the background
   score by increasing the number of random Mw matches, which can
   significantly reduce discrimination.
   The use of a low '-partials' factor (e.g. 0.1 - 0.3) is a useful way
   of limiting this effect: partial peptide matches will add a little to
   the cumulative frequency score, but without compromising
   discrimination.

   More experienced users can utilise the '-partials' factor to optimize
   searches where the peptide Mw data contain a significant proportion of
   partial cleavage fragments (e.g. > 30%). In such cases, setting the
   '-partials' factor within the range 0.4 - 0.6 can help to improve
   discrimination. Conversely, if the digestion is perfect, with no
   partial fragments present, the lowest '-partials' factor of 0.1 will
   give maximum discrimination.

  Program requirements

   The emowse search program accepts a single text file containing a list
   of experimentally-determined masses, generally selected from the range
   700-4,000 Daltons to reduce the influence of partial cleavage
   products. The program outputs a ranked hit list comprising the top 30
   scores, with information including the protein entry name, text
   identifiers, final accumulated scores, matching peptide sequences and
   hit versus miss tallies. User-selectable search parameters include an
   error tolerance (default +/- 2 Daltons), selection of the enzyme or
   reagent used and an intact protein Mw (optional, if known).

   For each peptide Mw entry in the data file, emowse matches individual
   fragment molecular weights (FMWs) with database entry molecular
   weights (DBMWs). A 'hit' is scored when the following criterion is
   met:

        DBMW-tolerance-1 < FMW < DBMW+tolerance+1

   If an intact protein Mw is specified (SMW) then the program prompts
   for a molecular weight filter percentage (MWFP). emowse then restricts
   the search to those entries which match the following criteria:

        R = SMW x MWFP / 100
        0 < SMW-R < emowse entry Mol.wt. < SMW+R

   Default search parameters are a tolerance of +/- 2 Daltons, intact Mw
   specified and the MWFP set to 25.
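   The hit criterion and the molecular weight filter above can be
   sketched in a few lines of Python. This is only an illustration of
   the two inequalities as written, not the emowse source code: the
   function names (is_hit, weight_window, passes_filter) are hypothetical,
   and treating an SMW of 0 as "no filter" is an assumption taken from
   the '-weight' description.

```python
def is_hit(fmw, dbmw, tolerance=2):
    """Hit criterion as stated: DBMW-tolerance-1 < FMW < DBMW+tolerance+1."""
    return dbmw - tolerance - 1 < fmw < dbmw + tolerance + 1


def weight_window(smw, mwfp=25):
    """Return (SMW-R, SMW+R) where R = SMW x MWFP / 100."""
    r = smw * mwfp / 100
    return smw - r, smw + r


def passes_filter(entry_mw, smw, mwfp=25):
    """Apply 0 < SMW-R < entry Mol.wt. < SMW+R to one database entry."""
    if smw == 0:
        # Assumption: an unspecified weight (0) means no filtering.
        return True
    low, high = weight_window(smw, mwfp)
    return 0 < low < entry_mw < high


if __name__ == "__main__":
    print(is_hit(1000, 1002))                 # within 2 Daltons
    print(weight_window(90000, 30))           # the '-pcrange 30' example
    print(passes_filter(60000, 90000, 30))    # below the window
```

   Because masses are rounded to integers before use, the strict
   inequalities with the extra 1 Dalton behave like an inclusive
   +/- tolerance window on integer masses.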
  emowse Scoring scheme

   The final scoring scheme is based on the frequency of a fragment
   molecular weight being found in a protein of a given range of
   molecular weight. OWL database sequence entries were initially grouped
   into 10 kDalton intact molecular weight intervals. For each 10 kDalton
   protein interval, peptide fragment molecular weights were assigned to
   cells of 100 Dalton intervals. The cells therefore contained the
   number of times a particular fragment molecular weight occurred in a
   protein of any given size. This operation was performed for each
   enzyme. Cell frequency values were calculated by dividing each cell
   value by the total number of peptides in each 10 kDalton protein
   interval. Cell frequency values for each 10 kDalton interval were then
   normalised to the largest cell value (Fmax), with all the cell values
   recalculated as:

        Cell value = Old value / Fmax

   to yield floating point numbers between 0 and 1. These distribution
   frequency values, calculated for each cleavage reagent, were then
   built into the emowse search program. For every database entry
   scanned, all matching fragments contribute to the final score. In the
   current implementation, non-matching fragments are ignored (neutral).

   For each matching peptide Mw a score is assigned by looking up the
   appropriate normalised distribution frequency value. In the case of
   multiple 'hits' in any one target protein (i.e. more than one matching
   peptide Mw), the distribution frequency scores are multiplied. The
   final product score is inverted and then normalised to an 'average'