diana.txt

***  GENOME EXPLORER HELP FILE  ***

To provide help on using the GUI, and information about how the programs run

Contents
1)  Outline of Function
2)  Parameters loaded from the .inf file (settings menu)
3)  The User Interface
4)  Underlying Method

---  DIANA  ---

---  1)  Outline of Function ---

Defined Interval Amino-acid Numerating Algorithm.

Allows user to search for a minimum number of the search element within a particular
sequence window (length of amino acids / nucleotides within a sequence).

Produces an output file for each input file, of all sequences which meet the entered
criteria.  The output file will be in fasta format.

The search element can be:

 a) a number of different nucleotides / amino acids to be searched for individually
    but counted cumulatively
    (e.g. search for Q and N:  in 10 residue window yNabggQNyN: cumulative score: 4)

 b) a number of different nucleotides / amino acids to be searched for individually
    but counted independently
    (e.g. search for Q and N:  in 10 residue window yNabggQNyN: Q score: 1, N score: 3,
     highest overall score: 3)

 c) a number of strings searched for individually and counted cumulatively
    e.g. search for GGY and YQQ: in 10 residue window aaaGGYQQgg:
         GGY score: 1, YQQ score: 1, cumulative score 2
    however
         search for GG and QQ in 10 residue window aaGGGygQQg
         GG score: 1, QQ score: 1, cumulative score 2.
         because when GG is found, the search continues from where that 'hit' ends...
 d) a number of strings searched for individually and counted independently
    As for c) but only the highest score will count

---  2)  Parameters loaded from the .inf file (settings menu) ---

fastaLineLength  number of characters per line of fasta output (80)
dianaInDir       directory in which to open the infile file chooser
dianaOutDir      default directory to which all output files will be written
dianaOutfileExt  file extension for all outfiles (.di_out.fasta)

---  3)  The User Interface  ---

fasta files to search
     Use the add / remove buttons to add / remove files to / from the list box.
     All files displayed in the list box will be searched (independently) when
     the program is run.

select output directory
     select the directory to which all output files will be written

use infile name as outfile basename
     tick this check box to use the name of the file being searched as the basename
     for the outfile of sequences that meet criteria.  The file extension
     dianaOutfileExt (as set in the settings panel) will be added.

select a name for output fasta file
     enter the basename for output files - this will be incremented for each infile
     and the file extension dianaOutfileExt will be added.
       e.g.  outfile_1.diOut, outfile_2.diOut etc

match min quantity
     match a minimum number of residues in a window of a specific size.

match min percentage
     match a minimum percentage of residues in a range of window sizes.

min number of matches in window
     number of matches per window - if cumulative search, total number of all matches
     must equal or exceed this number if sequence is to be a 'hit'.  If NOT cumulative
     one individual search must equal or exceed this score for sequence to be a hit.
     e.g. search for Q and N in 10 residue window, min number matches = 5
          yNQbggQNyN
          Q score: 2, N score: 3
          cumulative score: 5 therefore hit
          OR
          individual max score: 3, therefore not a hit

min window size (residues)
     the minimum number of residues in the search window (only applicable when a percentage
     search is being performed)

max window size (residues)
     the maximum number of residues in the search window (for a percentage
     search) or the fixed window size (for a minimum quantity search).

report progress every x sequences
     reads the fasta files to be searched in blocks of this many sequences to prevent
     memory problems.  Reports progress to the screen every time a batch finishes.

count all residue string hits cumulatively
     tick this box to count hits cumulatively, as described in the 'outline of function'
     section of this document

strings to search for (; separated)
     enter the strings to search for if you do not want to search for individual
     nucleotides or amino acids.  Upper/lower case is irrelevant as all data is
     converted to lower case before comparisons are made

dna sequence / protein sequence / enter text
     radio buttons enable the different components for entering searches

nucleotide / amino acid check boxes
     tick the boxes of ALL residues you want to search for

---  4)  Underlying Method   ---

Code for DIANA (Defined Interval Amino acid Numerating Algorithm).  This class has
additional functions that allow it to search for single elements, or strings of letters.
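
The counting rules for single elements and strings (cases a to d in the outline of
function) can be sketched in Python as below.  This is an illustration only, not the
program's own code: the names score_window and is_hit are hypothetical.  Python's
str.count is used because it counts non-overlapping occurrences, which matches the
"search continues from where that 'hit' ends" behaviour described for strings.

```python
def score_window(window, elements, cumulative):
    """Score one window for the given search elements.

    elements may be single residues ("Q") or strings ("GGY").
    Hypothetical helper: the real DIANA class may be organised differently.
    """
    window = window.lower()  # case is irrelevant: compare in lower case
    # str.count counts non-overlapping occurrences, so "ggg" holds one "gg".
    # Empty search strings are skipped because they would match everywhere.
    counts = {e.lower(): window.count(e.lower()) for e in elements if e}
    if cumulative:
        return sum(counts.values())        # cases a) and c)
    return max(counts.values(), default=0)  # cases b) and d)


def is_hit(window, elements, min_matches, cumulative):
    """A window is a hit when its score equals or exceeds min_matches."""
    return score_window(window, elements, cumulative) >= min_matches


# Worked examples taken from this help file.
print(score_window("yNabggQNyN", ["Q", "N"], cumulative=True))    # 4
print(score_window("yNabggQNyN", ["Q", "N"], cumulative=False))   # 3
print(score_window("aaGGGygQQg", ["GG", "QQ"], cumulative=True))  # 2
print(is_hit("yNQbggQNyN", ["Q", "N"], 5, cumulative=True))       # True
print(is_hit("yNQbggQNyN", ["Q", "N"], 5, cumulative=False))      # False
```

Each search string is counted on its own pass over the window, so two different
strings may share residues (GGY and YQQ both score 1 in aaaGGYQQgg), while repeated
hits of the same string may not overlap.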

***  FUNCTION

 read in a fasta file of protein sequences
 set a 'window size' in which to scan for a set of amino acids
 set the amino acids to scan for
 set a cut off number of key amino acids that must be found in the window to be interesting
 scan each possible window of each sequence for the key amino acids

 fasta file is read in 'batches' of sequences - to avoid hitting the memory ceiling

***  ASSUMPTIONS

   1) that the input file is in FASTA format where '>' character indicates the start
      of a protein name
   2) that ALL characters in a sequence (except blank spaces) represent residues
      and should therefore be counted in length-of-protein counts, even if they
      do not represent a single specific residue

***  ALGORITHM

 overall form of the diana part of this program

   1) get array of names and corresponding array of protein sequences
   2) determine average key aa per window (string of windowLength consecutive amino acids
      in a protein)
   3) look at all possible windows and find those rich in key aas
   4) output list of all sequences with key aa rich regions, and max key-aa-per-window value

***
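
The FUNCTION and ALGORITHM notes above can be sketched in Python for a fixed-size
("match min quantity") search.  This is a minimal sketch under stated assumptions, not
the program's actual implementation: all function names are hypothetical, a sequence
shorter than the window is assumed to be scored as one short window, and step 2 (the
average key aa per window) is left out because this help file does not say how that
value is used.

```python
import io
from itertools import islice


def read_fasta(handle):
    """Yield (name, sequence) pairs; '>' marks the start of a protein name."""
    name, parts = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            name, parts = line[1:], []
        elif name is not None:
            # Blank spaces are not residues; every other character is counted.
            parts.append(line.replace(" ", ""))
    if name is not None:
        yield name, "".join(parts)


def in_batches(records, size):
    """Group records into lists of at most `size` to limit memory use."""
    records = iter(records)
    while True:
        batch = list(islice(records, size))
        if not batch:
            return
        yield batch


def max_window_score(seq, elements, window_size, cumulative=True):
    """Return the best score over every possible window of the sequence."""
    seq = seq.lower()
    elements = [e.lower() for e in elements if e]
    # Assumption: a sequence shorter than the window is one short window.
    last_start = max(len(seq) - window_size, 0)
    best = 0
    for start in range(last_start + 1):
        window = seq[start:start + window_size]
        counts = [window.count(e) for e in elements]
        score = sum(counts) if cumulative else max(counts, default=0)
        best = max(best, score)
    return best


def format_fasta(name, seq, line_length=80):
    """Format one record, wrapping the sequence as fastaLineLength does."""
    lines = [seq[i:i + line_length] for i in range(0, len(seq), line_length)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def find_hits(handle, elements, window_size, min_matches,
              cumulative=True, batch_size=1000):
    """Return (name, sequence, max score) for each sequence that is a hit."""
    hits = []
    for batch in in_batches(read_fasta(handle), batch_size):
        for name, seq in batch:
            best = max_window_score(seq, elements, window_size, cumulative)
            if best >= min_matches:
                hits.append((name, seq, best))
    return hits


# Made-up input for illustration: only p1 has a 10-residue window with 5 Q/N.
demo = io.StringIO(">p1\naaaaaaaaaayNQbggQNyNaaaa\n>p2\nQNaaaaaaaaaaaaaaaaaaQN\n")
for name, seq, best in find_hits(demo, ["Q", "N"], window_size=10, min_matches=5):
    print(format_fasta(name, seq), end="")
```

Recounting every window from scratch keeps the sketch short; a running count updated
as the window slides would be faster for single residues but is harder to get right
for multi-character search strings.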
