*** GENOME EXPLORER HELP FILE ***

To provide help on using the GUI, and information about how the programs run.

Contents
1) Outline of Function
2) Parameters loaded from the .inf file (settings menu)
3) The User Interface
4) Underlying Method

--- BLAST ---

--- 1) Outline of Function ---

Blast an entire fasta file of sequences vs either a blastable database or a fasta file (which will be converted to a temporary database using formatdb). Parse the blast reports to extract all information and produce a single summary file suitable for loading into a spreadsheet. Retrieve the sequences of all hits that meet the e-value and score criteria, and produce a fasta file for each query sequence and all its hits.

Every query sequence that has at least one hit is written to a "fasta file of query sequences with one or more hits".

** WARNINGS **

Formatdb (the program that converts a fasta file to a blastable database) will struggle and hang if the sequences in the fasta file it is trying to convert do not begin with a unique identifier.

Fastacmd (the program that pulls sequences from a blastable database) will fail if the sequence IDs have commas (and possibly other punctuation characters) in them.

If the progress report screen stops for a long time (more than 30 seconds) then a problem has occurred - go back and check the naming format of the files you're using.

--- 2) Parameters loaded from the .inf file (settings menu) ---

blastExePath
    path to the blastall executable (blastall.exe)

formatdbExePath
    path to the formatdb executable (formatdb.exe)

fastacmdExePath
    path to the fastacmd executable (fastacmd.exe)

blastFastaFileDir
    directory in which to open the file chooser when browsing for fasta files

blastDatabaseDir
    directory in which to open the file chooser when browsing for databases

blastOutDir
    default directory in which to write blast reports

blastOutfileExt
    file extension for the blast report (".blst")

blastParserOutDir
    default directory in which to write parsed blast reports

blastParserFilename
    default outfile name (not path) for the summary file of parsed blast reports for an entire fasta file of sequences ("blastParsed.sum"). Will be written to blastParserOutDir.

blastTempSeqFilename
    filename (not path) for the single sequence file that is overwritten each time a sequence from a fasta file gets blasted ("seq.temp"). Will be written to blastTempSeqDir.

blastTempSeqDir
    directory in which to write single sequences pulled from fasta files so that they can be blasted individually

blastTempDbFilename
    filename (not path) for a temporary database, if one needs to be created from a fasta file ("dbtemp"). Refers to several files with different extensions, so does NOT need an extension here.

blastTempDbDir
    directory in which to write a temporary blastable database (if one needs to be created from a fasta file)

--- 3) The User Interface ---

INPUT OPTIONS TAB

select program to run
    Select one of the programs from the drop down list - please ensure it is the right one!!
        blastp  - protein query vs protein database
        blastn  - nucleotide query vs nucleotide database
        blastx  - nucleotide query vs protein database (the query dna will be translated in all 6 reading frames)
        tblastn - protein query vs nucleotide database (the database dna will be translated in all 6 reading frames)
        tblastx - nucleotide query vs nucleotide database (both will be translated in all 6 reading frames)
    For more help, see the readme files distributed with blast.

select whether to blast vs a database or a fasta file
    The program can happily handle either - but you have to tell it which one it's dealing with. A copy of the fasta file will be converted to a temporary database (using formatdb) and properly indexed for sequence retrieval.

select fasta files to blast
    Use the "select query files" button to enter paths to the fasta files of sequences you want to blast (QUERY sequences).

select database / fasta file (multiple will be searched as a single virtual db)
    Use the "select databases" or "select fasta files" button to locate the database or fasta file you want to blast queries against. If your desired file isn't showing up in the file chooser:
        a) check you have selected the right radio button above
        b) check file extensions
        c) if you're looking for a database, it probably isn't indexed, so fastacmd won't work, and no sequences can be retrieved for query+hits fasta files

select cutoff evalue
    Select a cutoff evalue from the drop down list, OR tick the "enter evalue power" box and write in an evalue power. For 1e-3 the power would be -3.

OUTPUT OPTIONS TAB

select blast output directory
    Defaults to blastOutDir - edit (or use the file chooser) as required.

write in basename for outfiles or tick checkbox
    Blast reports can be named sequentially as basename_number.extension (e.g. report_1.blst, report_2.blst etc) or using the sequence id taken from the fasta description line (all characters after the '>' and before the first space; any illegal filename characters will be replaced with '_').

create fasta file of all qry seqs that had one or more hits
    Tick the checkbox if you want to write a fasta file of all query sequences that have one or more hits.

name fasta file of qry seqs based on qry file and db
    Option to have the fasta file of all query sequences that have one or more hits named automatically. Alternatively - see next option.

enter name for a fasta file of query sequences with one or more hits
    Type in a name for the fasta file to be written. It will list all query sequences with one or more hits.

create fasta file of hits for each query sequence
    Tick the checkbox if you want to write a fasta file of hits for each query sequence that has any. This will produce a fasta file of homologues that can then be aligned (using e.g. clustalw). If you want to append the query sequence to this file, tick the "append the query sequence" checkbox.

append the query sequence
    Tick the checkbox to append the query sequence to the fasta file of its hits.

filter results by score before writing fasta file
    Tick the checkbox if you want to use score as a 'hit' criterion. E-value varies depending on database size, but score *should* be the same for a given query and hit pair, regardless of the size of the database. If the checkbox is ticked, only hits with a score greater than or equal to the selected score will be included in summary output files. The blast report will still list all hits (filtered only by evalue).

--- 4) Underlying Method ---

Runs BLAST independently on each sequence in a fasta file, using the same command line parameters each time. Will convert a fasta file to a BLASTable database using formatdb if required. Uses RunBlastParser and RunRetrieveSeq Objects.

RunBlastParser parses native BLAST output files to a list of hitIds that meet the evalue and score criteria in the BlastParameters object. This list is then used as input for fastacmd - which is run from within a RunRetrieveSeq Object (so that it can take care of indexing etc).

The possible outfiles are (according to criteria specified in the BlastParameters Object):
    a) A single fasta file of all query sequences which had one or more hits (according to the score and evalue criteria in BlastParameters).
    b) A fasta file for each query sequence which had one or more hits (according to the score and evalue criteria in BlastParameters), comprising all hit sequences and, if the "append the query sequence" checkbox is ticked, the query sequence itself.
    c) A list file for each query sequence which had one or more hits (according to the score and evalue criteria in BlastParameters), detailing the names/hitId of all hit sequences.
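
The hit selection described in this section can be illustrated with a short sketch. This is not the program's own code: the Hit class and the function names evalue_from_power and filter_hit_ids are made up for illustration, and it assumes that a hit passes the evalue test when its evalue is less than or equal to the cutoff. The score test (greater than or equal to the selected score, and only when the score checkbox is ticked) follows the description of the "filter results by score" option.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One parsed BLAST hit (hypothetical structure, not the tool's own class)."""
    hit_id: str
    evalue: float
    score: float


def evalue_from_power(power: int) -> float:
    """Turn an 'evalue power' such as -3 into the cutoff 1e-3."""
    return float(f"1e{power}")


def filter_hit_ids(hits: List[Hit], max_evalue: float,
                   min_score: Optional[float] = None) -> List[str]:
    """Return the ids of hits that meet the evalue and score criteria.

    Assumption: a hit passes when evalue <= max_evalue. The score test is
    applied only when min_score is given (i.e. the score checkbox is ticked).
    """
    kept = []
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # fails the evalue cutoff
        if min_score is not None and hit.score < min_score:
            continue  # fails the optional score filter
        kept.append(hit.hit_id)
    return kept


# Example: three parsed hits for one query sequence.
hits = [
    Hit("seqA", 1e-20, 250.0),
    Hit("seqB", 1e-4, 40.0),
    Hit("seqC", 0.5, 30.0),
]
cutoff = evalue_from_power(-3)
ids_by_evalue = filter_hit_ids(hits, cutoff)
ids_by_evalue_and_score = filter_hit_ids(hits, cutoff, min_score=50.0)
print(ids_by_evalue)
print(ids_by_evalue_and_score)
```

In this sketch the resulting id list plays the role of the hitId list that is handed on to fastacmd for sequence retrieval. A query with an empty list would get no hits file and would be left out of the "query sequences with one or more hits" fasta file.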