*** GENOME EXPLORER HELP FILE ***
To provide help on using the GUI, and information about how the programs run

Contents
1) Outline of Function
2) Parameters loaded from the .inf file (settings menu)
3) The User Interface
4) Underlying Method

--- HOMOLOGY SEARCH ---

--- 1) Outline of Function ---
To search a phylogeny for homologous sequences.
Uses blast to search for homologues, using the evalue and score criteria set by the user.
Produces an output file describing the input data and listing the homologues. Also produces a namelist file so that the user can refer back from the numerical naming system in the homology file to the sequence descriptions in the input fasta files.
REQUIRES chromosome fasta files to have sequences in the CORRECT ORDER

--- 2) Parameters loaded from the .inf file (settings menu) ---
blastExePath
    path to the blastall executable (blastall.exe)
formatdbExePath
    path to the formatdb executable (formatdb.exe)
fastacmdExePath
    path to the fastacmd executable (fastacmd.exe)
blastOutfileExt
    file extension for the blast report (.blst)
blastParserFilename
    default outfile name (not path) for the summary file of parsed blast reports for an entire fasta file of sequences (blastParsed.sum). Will be written to blastParserOutDir
blastTempSeqFilename
    filename (not path) for the single-sequence file that is overwritten each time a sequence from a fasta file is blasted (seq.temp). Will be written to blastTempSeqDir
blastTempSeqDir
    directory in which to write single sequences pulled from fasta files so that they can be blasted individually
blastTempDbFilename
    filename (not path) for a temporary database (if one needs to be created from a fasta file).
    Will refer to several files with different extensions, so does NOT need an extension here (dbtemp)
blastTempDbDir
    directory in which to write a temporary blastable database (if one needs to be created from a fasta file)
homolExt
    file extension for files produced by this program (.homol)
homolOutDirPath
    default directory in which to write the final outfile (text representing a Homology object - infile for geneOrderMatrix program)
outDirPath
    default directory in which to write all files produced in generating the final homology file
genomeDirPath
    directory in which to open file chooser when browsing for phylogenies / chromosome.fasta files

--- 3) The User Interface ---
phylogenies generated by evolve / user defined phylogenies (evolve directory structure) / user defined phylogenies
    radio buttons to switch between layouts for data entry in the "select phylogenies" tab. 'phylogenies generated by evolve' provides a simple entry form because the data is output in a consistent format. This format is a directory for each phylogeny, within that, a directory for each species, within that, a fasta file for each chromosome, with sequences listed in the order in which they appear on the chromosome. 'user defined phylogenies (evolve directory structure)' allows the user to take advantage of this directory structure to easily enter their own data.

BASIC PARAMETERS TAB
protein sequences / dna sequences
    radio button switch indicating the type of sequences being used.
    This affects the blast program used to look for homologues (blastp or blastn)
cut-off evalue
    the maximum evalue of a hit that is considered a homologue (must also satisfy score criteria)
cut-off score
    the minimum score of a hit that is considered a homologue (must also satisfy evalue criteria)
only include genes with homologues in output
    If the homology file is going to be used as a basis for a phylogeny, it simplifies things if only the genes with at least one homologue (somewhere amongst all other chromosomes) are included in the final file.
evolve input - use only genes
    Evolve has several different types of sequence: genes, introns, transposons and footprints. Ticking this box makes the program ignore anything that isn't a gene when building the homology file.
evolve input - parse by seq names (no blast)
    If the evolve files have full evolve names that enable the ancestry of each gene to be calculated from the name, this option uses that information and runs much faster than the blast method.
select directory to output final homology file
    select the directory to which the final homology file will be written
select directory for all output files
    An inner directory will be created within this one, to hold only files output by this run of the program. This includes all blast reports, parsed blast reports, AND the TRANSLATION files that enable gene ids in the final homology file to be related back to actual sequence descriptions.
enter name for final homology file
    the name (not path) of the final homology file. An extension (set below) will be added. If you are writing all homology files to the same directory over several runs of this program, ensure you name this file so as not to overwrite any others.
enter extension for final homology file
    the extension that will be added to the final homology file (defaults to homolExt in settings)

SELECT PHYLOGENIES (ANY) TAB
add / delete species
    enter / remove species name in species list box.
    Edit name to something meaningful!
add chromosome
    add path to a fasta file containing all sequences in a chromosome, in the order in which they occur. The ORDERING is VERY IMPORTANT. Edit the name of the chromosome to something meaningful! The same file (identical path) cannot be added twice, even in different species.
delete selected chromosome
    removes the selected chromosome from the list

SELECT PHYLOGENIES (EVOLVE STRUCTURE) TAB
select / remove directories
    these are directories containing evolved genomes. If the evolve output is unaltered, the directories will have names like phy_1, phy_2 etc. Each phylogeny is analysed independently, and a homology file will be produced for each in turn. In this case final homology files will be named with the name of the phylogeny directory with the homology extension added (e.g. phy_2.homol).

--- 4) Underlying Method ---
Class to search a phylogeny for homologous sequences. The search is conducted using RunBlast and RunBlastParser objects. Homologous sequences are determined using cut-off evalues and scores.

Every chromosome to be included in the phylogeny must be in a separate file, and all sequences must appear in the same order in the fasta file as they do on the chromosome.

A single fasta file (allSeq.fasta) is constructed from all the chromosomes in all the species in the phylogeny. Each sequence is renamed according to its species, chromosome and chromosome position in a way that the program can easily decipher. A translation file (seqNameList.txt) is written so that the user can relate a program-generated sequence id back to the original sequence name.

Every sequence in allSeq.fasta is then blasted against a database consisting only of the sequences in allSeq.fasta. The top hit of every sequence will be to itself in the database. Each blast report is parsed to produce a new file listing only the program-assigned name of each significant hit, the hit start position and the hit stop position.
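
The renaming of sequences into allSeq.fasta and the seqNameList.txt translation file, described above, can be sketched roughly as follows. This is an illustration in Python, not the program's own code. The id format (s<species>_c<chromosome>_p<position>, numbered from 1), the tab-separated layout of the name list, and the function names make_id, parse_id and rename_phylogeny are all assumptions made for the example; the help file does not specify them.

```python
import re

# Hypothetical id format: s<speciesNumber>_c<chromosomeNumber>_p<position>.
# The real program's format is not documented here.
ID_PATTERN = re.compile(r"^s(\d+)_c(\d+)_p(\d+)$")


def make_id(species_no, chrom_no, position):
    """Build a program-style id from species, chromosome and position."""
    return f"s{species_no}_c{chrom_no}_p{position}"


def parse_id(seq_id):
    """Recover (species_no, chrom_no, position) from an id made by make_id."""
    match = ID_PATTERN.match(seq_id)
    if match is None:
        raise ValueError(f"not a program-generated id: {seq_id!r}")
    return tuple(int(group) for group in match.groups())


def rename_phylogeny(phylogeny):
    """Rename every sequence in a phylogeny.

    phylogeny is a list of (species_name, chromosomes); chromosomes is a
    list of (chromosome_name, records); records is a list of
    (description, sequence) pairs in chromosome order.

    Returns (fasta_text, name_list_text): the combined fasta content and a
    translation table mapping each new id back to its original description.
    """
    fasta_lines = []
    name_lines = []
    for species_no, (species, chromosomes) in enumerate(phylogeny, start=1):
        for chrom_no, (chrom, records) in enumerate(chromosomes, start=1):
            # Position is taken from the order of records in the file,
            # which is why the input order matters.
            for position, (description, sequence) in enumerate(records, start=1):
                seq_id = make_id(species_no, chrom_no, position)
                fasta_lines.append(f">{seq_id}")
                fasta_lines.append(sequence)
                name_lines.append(f"{seq_id}\t{species}\t{chrom}\t{description}")
    return "\n".join(fasta_lines) + "\n", "\n".join(name_lines) + "\n"
```

Because the position in the id comes directly from the order of records in each chromosome file, a wrongly ordered fasta file would silently produce wrong positions, which is why the ordering requirement is stressed throughout this help file.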
A hit is judged to be significant if it meets the e-value and score criteria specified in the HomologySearchParameters object.

The parsed blast reports are used to construct a homology file. Each parsed report is read in turn. All sequences in a single parsed file are assigned the same homologueId, and a direction value based on the hit start position and hit stop position: if the start position is less than the stop position, the hit sequence ran in the same direction as the query sequence and is assigned the direction 'true'; otherwise it is assigned 'false'.

If a sequence has already appeared in a previous parsed file, all sequences in the current file are assigned the same homologueId as that sequence, and all direction values are adjusted accordingly. For example, if sequence A already appears in the homologue list with a homologueId of 5 and a direction of false, but here has a provisional homologueId of 12 and a direction of true, all sequences in this file are reassigned the homologueId 5 and their directions are reversed. This ensures the homologue list is always internally consistent.

The final output file is a list of all sequences in the phylogeny detailing speciesNumber, chromosomeNumber, positionOnChromosome, homologueId and direction. A translation section records the species name corresponding to each speciesNumber and the chromosome name corresponding to each chromosomeNumber. To identify an individual sequence by name, the user must refer to the seqNameList.txt file produced when the sequences are originally renamed.

This class creates a new directory, named from the date and time, in which to write all temporary and in-process files.
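
The significance test and the homologueId / direction bookkeeping described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the program's own code. The names Hit, is_significant and build_homologue_table are invented for the example. The source does not say what happens when one parsed file contains several previously seen sequences with different homologueIds, so the sketch simply uses the first one it finds and leaves existing entries unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One hit from a parsed blast report (hypothetical record layout)."""
    seq_id: str    # program-assigned name of the hit sequence
    evalue: float
    score: float
    start: int     # hit start position
    stop: int      # hit stop position


def is_significant(hit, max_evalue, min_score):
    """A hit must satisfy BOTH the evalue and the score cut-off."""
    return hit.evalue <= max_evalue and hit.score >= min_score


def build_homologue_table(reports, max_evalue, min_score):
    """Assign a homologueId and direction to every significant sequence.

    reports is an iterable of hit lists, one list per query sequence.
    Returns {seq_id: (homologue_id, direction)}.
    """
    table = {}
    next_id = 1
    for hits in reports:
        # Provisional directions for this report: True when start < stop.
        # Assumption: if a sequence is hit more than once, keep the first.
        provisional = {}
        for hit in hits:
            if is_significant(hit, max_evalue, min_score):
                provisional.setdefault(hit.seq_id, hit.start < hit.stop)
        if not provisional:
            continue
        # Look for a sequence that already appeared in an earlier report.
        anchor = next((s for s in provisional if s in table), None)
        if anchor is None:
            homologue_id, flip = next_id, False
            next_id += 1
        else:
            homologue_id, stored_direction = table[anchor]
            # Reverse every direction in this report if the anchor's stored
            # direction disagrees with its provisional one.
            flip = stored_direction != provisional[anchor]
        for seq_id, direction in provisional.items():
            # Assumption: sequences already in the table keep their entry.
            table.setdefault(seq_id, (homologue_id, direction != flip))
    return table
```

For instance, if the report for query A lists B in the reverse direction, and the later report for query B lists a new sequence C in the same direction as B, then C joins A's homologue group with its direction reversed, because B is already stored as reversed relative to A.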