📄 read_ap_txt.m

📁 The uploaded files are example routines for Gaussian hidden Markov models in MATLAB.
function [data, textlines, Nalphabet] = read_ap_txt(filename, max_lines)
% READ_AP_TXT Read AP news file and convert to numeric format
% [data, textlines, Nalphabet] = read_ap_txt(filename, max_lines)
%
% If max_lines is omitted, the whole file is read.
% data{m} is an integer representation of the m'th sentence.
% textlines{m} is the text of line m, converted to lower case.
% Nalphabet is the number of unique symbols used in the encoding.

% Default to reading every line (the original assigned max_num_lines here,
% which left max_lines undefined when the argument was omitted)
if nargin < 2, max_lines = inf; end

% textlines{m} is the m'th line of text
textlines = textread(filename, '%s', 'delimiter', '\n', 'whitespace', '');
textlines = lower(textlines); % convert to lower case
nlines = length(textlines);

% convert each character to a number
% asciimap(a) = i means ascii code a will get mapped to integer i
% inverse_asciimap(i) = a means integer i is letter a
asciimap = zeros(1,128);
punctuation = [32:47 58:64 91:96 123:127];
digits = 48:57;
uppercase = 65:90; % unused: the text has already been lower-cased
lowercase = 97:122;

i = 1;
for a=lowercase
  asciimap(a) = i;
  inverse_asciimap(i) = char(a);
  i = i + 1;
end

if 0
  % disabled alternative: give each digit its own symbol
  for a=digits
    asciimap(a) = i;
    inverse_asciimap(i) = char(a);
    i = i + 1;
  end
else
  % map all numbers to a single number (which will print as 0)
  for a=digits
    asciimap(a) = i;
    inverse_asciimap(i) = '0';
  end
  i = i + 1;
end

% map all punctuation to a single number (which will print as *)
for a=punctuation
  asciimap(a) = i;
  inverse_asciimap(i) = '*';
end

Nalphabet = i;
%assert(Nalphabet == length(lowercase) + length(digits) + 1) % 37
assert(Nalphabet == length(lowercase) + 1 + 1) % 28

nlines = min(nlines, max_lines);
data = cell(1, nlines);
for l=1:nlines
  data{l} = asciimap(double(textlines{l}(1:end-1))); % strip off final \n
  if any(data{l} < 1) || any(data{l} > Nalphabet)
    error(['problem with line ' num2str(l) ': ' textlines{l}]);
  end
end
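The encoding built above can be tried without an input file. The following is a minimal sketch, not part of the original routine: it rebuilds the same 28-symbol map with vectorized assignments and applies it to one string. The sample text and the variable names `sample`, `codes`, and `decoded` are illustrative assumptions, and unlike `read_ap_txt` the sketch does not drop the last character of the line.

```matlab
% Sketch: rebuild the symbol map used by read_ap_txt and encode one string.
% The sample text and the names sample/codes/decoded are assumptions.
lowercase = 97:122;
digits = 48:57;
punctuation = [32:47 58:64 91:96 123:127];

asciimap = zeros(1, 128);
inverse_asciimap = blanks(28);

asciimap(lowercase) = 1:26;                % 'a'..'z' -> 1..26
inverse_asciimap(1:26) = char(lowercase);
asciimap(digits) = 27;                     % every digit -> 27, printed as 0
inverse_asciimap(27) = '0';
asciimap(punctuation) = 28;                % space and punctuation -> 28, printed as *
inverse_asciimap(28) = '*';

sample = lower('AP News, 1989');
codes = asciimap(double(sample));          % integer sequence for the string
decoded = inverse_asciimap(codes);         % lossy round trip back to text

disp(codes);    % 1 16 28 14 5 23 19 28 28 27 27 27 27
disp(decoded);  % ap*news**0000
```

The round trip is lossy by design: all digits collapse to one symbol and all punctuation, including spaces, to another, which keeps the alphabet at 28 symbols.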
