prophecy

Function

Creates matrices/profiles from multiple alignments

Description

prophecy produces a simple frequency matrix for use by profit or a position specific weighted profile using either the Gribskov (1) or Henikoff (2) method for use by prophet.

Profile analysis is a method for detecting distantly related proteins by sequence comparison. The basis for comparison is not only the customary Dayhoff mutational-distance matrix but also the results of structural studies and information implicit in the alignments of the sequences of families of similar proteins. This information is expressed in a position-specific scoring table (profile), which is created from a group of sequences previously aligned by structural or sequence similarity. The similarity of any other target sequence to the group of aligned probe sequences can be tested by comparing the target to the profile using dynamic programming algorithms.

The profile method differs in two major respects from methods of sequence comparison in common use:

(i) Any number of known sequences can be used to construct the profile, allowing more information to be used in the testing of the target than is possible with pairwise alignment methods.

(ii) The profile includes the penalties for insertion or deletion at each position, which allow one to include the probe secondary structure in the testing scheme.

Algorithm

For Gribskov the scoring scheme is based on a notion of distance between a sequence and an ancestral or generalized sequence.
For Henikoff it is based on weights of the diversity observed at each position in the alignment, rather than on a sequence distance measure.

Usage

Here is a sample session with prophecy

    % prophecy
    Creates matrices/profiles from multiple alignments
    Input (aligned) sequence set: globins.msf
    Profile type
             F : Frequency
             G : Gribskov
             H : Henikoff
    Select type [F]:
    Enter a name for the profile [mymatrix]: globins
    Enter threshold reporting percentage [75]:
    Output file [globins.prophecy]:

Example 2

    % prophecy
    Creates matrices/profiles from multiple alignments
    Input (aligned) sequence set: globins.msf
    Profile type
             F : Frequency
             G : Gribskov
             H : Henikoff
    Select type [F]: g
    Scoring matrix [Epprofile]:
    Enter a name for the profile [mymatrix]: globins
    Gap opening penalty [3.0]:
    Gap extension penalty [0.3]:
    Output file [globins.prophecy]:

Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-sequence]          seqset     (Aligned) sequence set filename and optional
                                  format, or reference (input USA)
   -type               menu       [F] Select type (Values: F (Frequency); G
                                  (Gribskov); H (Henikoff))
*  -datafile           matrixf    ['Epprofile' for Gribskov type, or
                                  EBLOSUM62] Scoring matrix
   -name               string     [mymatrix] Enter a name for the profile (Any
                                  string is accepted)
*  -threshold          integer    [75] Enter threshold reporting percentage
                                  (Integer from 1 to 100)
*  -open               float      [3.0] Gap opening penalty (Any numeric value)
*  -extension          float      [0.3] Gap extension penalty (Any numeric
                                  value)
  [-outfile]           outfile    [*.prophecy] Output file name

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Input file format

prophecy reads a protein or a nucleic sequence alignment USA.

Input files for usage example

File: globins.msf

    !!AA_MULTIPLE_ALIGNMENT 1.0

      ../data/globins.msf MSF:  164 Type: P 25/06/01 CompCheck: 4278 ..

      Name: HBB_HUMAN    Len: 164  Check: 6914  Weight: 0.61
      Name: HBB_HORSE    Len: 164  Check: 6007  Weight: 0.65
      Name: HBA_HUMAN    Len: 164  Check: 3921  Weight: 0.65
      Name: HBA_HORSE    Len: 164  Check: 4770  Weight: 0.83
      Name: MYG_PHYCA    Len: 164  Check: 7930  Weight: 1.00
      Name: GLB5_PETMA   Len: 164  Check: 1857  Weight: 0.91
      Name: LGB2_LUPLU   Len: 164  Check: 2879  Weight: 0.43

    //

                1                                               50
    HBB_HUMAN   ~~~~~~~~VHLTPEEKSAVTALWGKVN.VDEVGGEALGR.LLVVYPWTQR
    HBB_HORSE   ~~~~~~~~VQLSGEEKAAVLALWDKVN.EEEVGGEALGR.LLVVYPWTQR
    HBA_HUMAN   ~~~~~~~~~~~~~~VLSPADKTNVKAA.WGKVGAHAGEYGAEALERMFLS
    HBA_HORSE   ~~~~~~~~~~~~~~VLSAADKTNVKAA.WSKVGGHAGEYGAEALERMFLG
    MYG_PHYCA   ~~~~~~~VLSEGEWQLVLHVWAKVEAD.VAGHGQDILIR.LFKSHPETLE
    GLB5_PETMA  PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQE
    LGB2_LUPLU  ~~~~~~~~GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKD

                51                                             100
    HBB_HUMAN   FFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSE
    HBB_HORSE   FFDSFGDLSNPGAVMGNPKVKAHGKKVLHSFGEGVHHLDNLKGTFAALSE
    HBA_HUMAN   FPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD
    HBA_HORSE   FPTTKTYFPHFDLSHGSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSD
    MYG_PHYCA   KFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQ
    GLB5_PETMA  FFPKFKGLTTADQLKKSADVRWHAERIINAVNDAVASMDDTEKMSMKLRD
    LGB2_LUPLU  LFSFLKGTSEVPQNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKN

                101                                            150
    HBB_HUMAN   LHCDKLH..VDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVA
    HBB_HORSE   LHCDKLH..VDPENFRLLGNVLVVVLARHFGKDFTPELQASYQKVVAGVA
    HBA_HUMAN   LHAHKLR..VDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVS
    HBA_HORSE   LHAHKLR..VDPVNFKLLSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVS
    MYG_PHYCA   SHATKHK..IPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFR
    GLB5_PETMA  LSGKHAK..SFQVDPQYFKVLAAVIADTVAAGDAGFEKLMSMICILLRSA
    LGB2_LUPLU  LGSVHVSKGVADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELA

                151        164
    HBB_HUMAN   NALAHKYH~~~~~~
    HBB_HORSE   NALAHKYH~~~~~~
    HBA_HUMAN   TVLTSKYR~~~~~~
    HBA_HORSE   TVLTSKYR~~~~~~
    MYG_PHYCA   KDIAAKYKELGYQG
    GLB5_PETMA  Y~~~~~~~~~~~~~
    LGB2_LUPLU  IVIKKEMNDAA~~~

Output file format

The output is a profile file.
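As a rough illustration of what a frequency (type F) profile records, the Python sketch below counts the residues found in each alignment column. It is not prophecy's own code: the names column_counts, as_row and GAP_CHARS are hypothetical, skipping '.', '~' and '-' as gaps is an assumption, and the sequence weights in the MSF header are ignored. The input is columns 1 to 10 of the seven globins.msf sequences.

```python
from collections import Counter
import string

# Assumed gap symbols: the MSF example uses '.' and '~'; '-' is added for other formats.
GAP_CHARS = frozenset(".~-")


def column_counts(aligned):
    """Return one Counter of residue letters per alignment column (gaps skipped)."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [
        Counter(ch.upper() for ch in column if ch not in GAP_CHARS)
        for column in zip(*aligned)
    ]


def as_row(counts):
    """Flatten one column Counter into a list of 26 counts ordered A..Z."""
    return [counts.get(letter, 0) for letter in string.ascii_uppercase]


# Columns 1-10 of the seven sequences in globins.msf.
EXCERPT = [
    "~~~~~~~~VH",  # HBB_HUMAN
    "~~~~~~~~VQ",  # HBB_HORSE
    "~~~~~~~~~~",  # HBA_HUMAN
    "~~~~~~~~~~",  # HBA_HORSE
    "~~~~~~~VLS",  # MYG_PHYCA
    "PIVDTGSVAP",  # GLB5_PETMA
    "~~~~~~~~GA",  # LGB2_LUPLU
]

profile = column_counts(EXCERPT)
for position, counts in enumerate(profile, start=1):
    print(position, dict(sorted(counts.items())))
```

For this excerpt, position 1 holds a single P, position 8 holds two V, and position 9 holds A, G and L once each plus two V.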
Output files for usage example

File: globins.prophecy

    # Pure Frequency Matrix
    # Columns are amino acid counts A->Z
    # Rows are alignment positions 1->n
    Simple
    Name            globins
    Length          164
    Maximum score   496
    Thresh          75
    Consensus       PIVDTGSVVALSEEEKSAVDAAWVKANAVAEVGGHALERGLLALEPATLEFFDSFKDLSTFDASHGSAQVKAHGKKVLDALGAAVAHLDDLEGTLAALSDLHADKLHKGVDPVNFKLLSEALLVTLAAHFGADFTPEVQASLDKALAGVANVLAHKYHDAAYQG
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
    0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0
    1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0
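A frequency profile laid out like the example above can be read back with a few lines of Python. The sketch below is only a guess at a workable reader, based on the layout visible in this one example: lines starting with '#' are treated as comments, all-integer lines as matrix rows, and anything else as a header entry whose last token is the value. The function name parse_frequency_profile and the "Type" key are hypothetical, and the sample text is abbreviated (shortened consensus, first two rows only).

```python
def parse_frequency_profile(text):
    """Split a frequency profile into (header dict, list of integer rows).

    Assumed layout, taken from the single example file: '#' comment lines,
    a one-word profile type line, 'key value' header lines, then count rows.
    """
    header = {}
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = line.split()
        if all(tok.isdigit() for tok in tokens):
            rows.append([int(tok) for tok in tokens])  # one alignment position
        elif len(tokens) == 1:
            header["Type"] = tokens[0]  # e.g. "Simple" (key name is our own choice)
        else:
            # Keys may contain spaces ("Maximum score"); the value is the last token.
            header[" ".join(tokens[:-1])] = tokens[-1]
    return header, rows


# Abbreviated copy of the example output: consensus shortened, two rows kept.
SAMPLE = """\
# Pure Frequency Matrix
# Columns are amino acid counts A->Z
# Rows are alignment positions 1->n
Simple
Name            globins
Length          164
Maximum score   496
Thresh          75
Consensus       PIVDTGSVV
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"""

header, rows = parse_frequency_profile(SAMPLE)
print(header["Name"], header["Thresh"], len(rows))
```

Header values are kept as strings so that the caller decides which ones to convert to numbers.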