homologysearch.txt

***  GENOME EXPLORER HELP FILE  ***

To provide help on using the GUI, and information about how the programs run

Contents
1)  Outline of Function
2)  Parameters loaded from the .inf file (settings menu)
3)  The User Interface
4)  Underlying Method

---  HOMOLOGY SEARCH  ---

---  1)  Outline of Function ---

To search a phylogeny for homologous sequences.

Uses blast to search for homologues, applying the evalue and score criteria
set by the user.

Produces an output file describing the input data and listing the homologues.
Also produces a namelist file so that the user can refer back from the
numerical naming system in the homology file to the sequence descriptions in
the input fasta files.

REQUIRES chromosome fasta files to have sequences in the CORRECT ORDER

---  2)  Parameters loaded from the .inf file (settings menu) ---

blastExePath          path to the blastall executable (blastall.exe)
formatdbExePath       path to the formatdb executable (formatdb.exe)
fastacmdExePath       path to the fastacmd executable (fastacmd.exe)
blastOutfileExt       file extension for the blast report (.blst)
blastParserFilename   default outfile name (not path) for the summary file of parsed blast
                      reports for an entire fasta file of sequences (blastParsed.sum).
                      Will be written to blastParserOutDir

blastTempSeqFilename  filename (not path) for the single sequence file to be overwritten each
                      time a sequence from a fasta file gets blasted.  Will be written to
                      blastTempSeqDir ("seq.temp")

blastTempSeqDir       directory in which to write single sequences pulled from fasta files so
                      that they can be blasted individually

blastTempDbFilename   filename (not path) for a temporary database (if one needs to be
                      created from a fasta file).
                      Will refer to several files with different
                      extensions, so does NOT need an extension here (dbtemp)

blastTempDbDir        directory in which to write a temporary blastable database (if one
                      needs to be created from a fasta file)
homolExt              file extension for files produced by this program (.homol)
homolOutDirPath       default directory in which to write the final outfile
                      (text representing a Homology object - infile for geneOrderMatrix program)
outDirPath            default directory in which to write all files produced in generating the
                      final homology file
genomeDirPath         directory in which to open file chooser when browsing for phylogenies
                      / chromosome.fasta files

---  3)  The User Interface  ---

phylogenies generated by evolve / user defined phylogenies (evolve directory structure) / user defined phylogenies
    radio buttons to switch between layouts for data entry in the "select phylogenies"
    tab.  'phylogenies generated by evolve' provides a simple entry form
    because the data is output in a consistent format.  This format is a directory for
    each phylogeny, within that, a directory for each species, within that, a fasta file
    for each chromosome - with sequences listed in the order in which they appear on the
    chromosome.  The 'user defined phylogenies (evolve directory structure)' option allows
    the user to take advantage of this directory structure to easily enter their own data.

BASIC PARAMETERS TAB

protein sequences / dna sequences
    radio button switch indicating the type of sequences being used.
    This affects the blast program used to look for homologues (blastp or blastn)

cut-off evalue
    the maximum evalue of a hit that is considered a homologue
    (must also satisfy score criteria)

cut-off score
    the minimum score of a hit that is considered a homologue
    (must also satisfy evalue criteria)

only include genes with homologues in output
    If the homology file is going to be used as a basis for a phylogeny, it simplifies
    things if only the genes with at least one homologue (somewhere amongst all
    other chromosomes) are included in the final file.

evolve input - use only genes
    Evolve has several different types of sequence - genes, introns, transposons and
    footprints.  Ticking this box allows the program to ignore anything that isn't a
    gene when building the homology file.

evolve input - parse by seq names (no blast)
    If the evolve files have full evolve names that enable the ancestry of each gene to
    be calculated from the name, this option uses that information and will run a lot
    quicker than the blast method.

select directory to output final homology file
    select the directory to which the final homology file will be written

select directory for all output files
    An inner directory will be created within this one, to hold only files output by
    this run of the program.  This includes all blast reports, parsed blast reports,
    AND the TRANSLATION files to enable gene ids in the final homology file to be
    related back to actual sequence descriptions.

enter name for final homology file
    the name (not path) of the final homology file.  An extension (set below)
    will be added.
    If you are writing all homology files to the same directory over
    several runs of this program, ensure you name this file so as not to overwrite
    any others.

enter extension for final homology file
    the extension that will be added to the final homology file (defaults to
    homolExt in settings)

SELECT PHYLOGENIES (ANY) TAB

add / delete species
    enter / remove species name in species list box.  Edit name to something
    meaningful!

add chromosome
    add path to a fasta file containing all sequences in a chromosome, in the order
    in which they occur.  The ORDERING is VERY IMPORTANT.
    Edit the name of the chromosome to something meaningful!
    Will not allow the same chromosome file (i.e. an identical path) to be
    added twice - even in different species.

delete selected chromosome
    removes the selected chromosome from the list

SELECT PHYLOGENIES (EVOLVE STRUCTURE) TAB

select / remove directories
    these are directories with evolved genomes in.  If unaltered evolve output, the
    directories will have names like phy_1, phy_2 etc.
    Each phylogeny is analysed independently, and a homology file will be produced
    for each in turn.  In this case final homology files will be named with the name
    of the phylogeny directory with the homology extension added (e.g. phy_2.homol).

---  4)  Underlying Method   ---

Class to search a phylogeny for homologous sequences.

The search is conducted using RunBlast and RunBlastParser objects.  Homologous
sequences are determined using cut off evalues and scores.

Every chromosome to be included in the phylogeny must be in a separate file,
and all sequences must appear in the same order in the fasta file as they do on
the chromosome.

A single fasta file (allSeq.fasta) is constructed from all the chromosomes in all
the species in the phylogeny.  Each sequence is renamed according to its species,
chromosome and chromosome position in a way that the program can easily decipher.
A translation file is written to enable the user to relate a program generated sequence
id back to the original sequence name (seqNameList.txt).

Every sequence in allSeq.fasta is then blasted against a database consisting
only of the sequences in allSeq.fasta.  The top hit of every sequence will be to
itself in the database.

Each blast report is parsed to produce a new file listing only the program
assigned name of significant hits, the hit start position and the hit stop position.
A hit is judged to be significant if it meets the e-value and score criteria specified
in the HomologySearchParameters object.

The parsed blast reports are used to construct a homology file.  Each parsed report
in turn is read by the program.  All sequences in a single parsed file are assigned
the same homologueId, and a direction value based on the hit start position and hit
stop position (if start position is less than stop position the hit sequence ran in the
same direction as the query sequence and is assigned the value 'true' for its direction.
Otherwise it is assigned the value 'false').

If a sequence has appeared before in a previous parsed file, all sequences in the
current file are assigned the same homologueId as that sequence.  All direction
values are adjusted accordingly (if sequence A already appears in the homologue
list with a homologueId of 5 and a direction of false, but here has a
provisional homologueId of 12 and a direction of true, all sequences in this
file are reassigned the homologueId 5 and their directions are reversed).  This
ensures the homologue list is always internally consistent.

The final output file is a list of all sequences in the phylogeny detailing
speciesNumber, chromosomeNumber, positionOnChromosome, homologueId and direction.
A translation section is provided to record the species name corresponding to each
speciesNumber and the chromosome name corresponding to each chromosomeNumber.
To identify an individual sequence by name, the user must refer to the seqNameList.txt
file produced when the sequences are originally renamed.

This class creates a new directory based on date and time in which to write all
temporary and in-process files.
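
The homologueId and direction bookkeeping described in the Underlying Method
section can be sketched in Python as below.  This is an illustration only, not
the program's own code: the names Hit, significant, direction and
build_homologue_list are hypothetical, the parsed blast reports are assumed to
be already loaded as lists of hits, and the cut-offs are assumed to be
inclusive.  The text above does not say what happens when one parsed file
contains sequences from two earlier groups; this sketch simply follows the
first previously seen sequence and leaves earlier assignments untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One line of a parsed blast report (hypothetical structure)."""
    seq_id: str    # program-assigned sequence name
    evalue: float
    score: float
    start: int     # hit start position
    stop: int      # hit stop position


def significant(hit, max_evalue, min_score):
    # A hit must satisfy BOTH criteria (assumed inclusive here).
    return hit.evalue <= max_evalue and hit.score >= min_score


def direction(hit):
    # True if the hit runs in the same direction as the query.
    return hit.start < hit.stop


def build_homologue_list(parsed_reports, max_evalue, min_score):
    """Return {seq_id: (homologue_id, direction)}.

    parsed_reports is an iterable of hit lists, one list per query sequence.
    """
    assigned = {}
    next_id = 1
    for hits in parsed_reports:
        group = {h.seq_id: direction(h)
                 for h in hits if significant(h, max_evalue, min_score)}
        if not group:
            continue
        # Look for a sequence that already has a homologueId.
        known = next((s for s in group if s in assigned), None)
        if known is None:
            homologue_id, flip = next_id, False
            next_id += 1
        else:
            homologue_id, earlier_direction = assigned[known]
            # Reverse every direction in this file if the known sequence
            # disagrees with the direction it was given earlier.
            flip = earlier_direction != group[known]
        for seq_id, same_direction in group.items():
            # Assumption: earlier assignments are kept, never overwritten.
            assigned.setdefault(seq_id, (homologue_id, same_direction != flip))
    return assigned


# Example with made-up hits: B was recorded as reversed in the first report,
# so the second report (where B is the forward self-hit) is flipped.
reports = [
    [Hit("A", 0.0, 200, 1, 100), Hit("B", 1e-20, 150, 90, 10)],
    [Hit("B", 0.0, 200, 1, 100), Hit("C", 1e-30, 180, 5, 95),
     Hit("D", 1.0, 20, 1, 50)],   # D fails both cut-offs
    [Hit("E", 0.0, 300, 1, 10)],
]
homologues = build_homologue_list(reports, max_evalue=1e-5, min_score=50)
```

Because every sequence's top hit is itself, each sequence that is blasted ends
up in the list with at least its own homologueId.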