read_ap_txt.m
```matlab
function [data, textlines, Nalphabet] = read_ap_txt(filename, max_lines)
% READ_AP_TXT Read AP news file and convert to numeric format
% [data, textlines, Nalphabet] = read_ap_txt(filename, max_lines)
%
% If max_lines is omitted, the whole file is read.
% data{m} is an integer representation of the m'th sentence.
% textlines{m} is the original text of line m.
% Nalphabet is the number of unique symbols used in the encoding.

if nargin < 2, max_lines = inf; end   % default: read the whole file

% textlines{m} is the m'th line of text
textlines = textread(filename, '%s', 'delimiter', '\n', 'whitespace', '');
textlines = lower(textlines); % convert to lower case
nlines = length(textlines);

% convert each character to a number
% asciimap(a) = i means ascii code a will get mapped to integer i
% inverse_asciimap(i) = a means integer i is letter a
asciimap = zeros(1, 128);
punctuation = [32:47 58:64 91:96 123:127];
digits = 48:57;
uppercase = 65:90;
lowercase = 97:122;

i = 1;
for a = lowercase
  asciimap(a) = i;
  inverse_asciimap(i) = char(a);
  i = i + 1;
end

if 0
  % give each digit its own symbol (Nalphabet would then be 37)
  for a = digits
    asciimap(a) = i;
    inverse_asciimap(i) = char(a);
    i = i + 1;
  end
else
  % map all numbers to a single number (which will print as 0)
  for a = digits
    asciimap(a) = i;
    inverse_asciimap(i) = '0';
  end
  i = i + 1;
end

% map all punctuation to a single number (which will print as *)
for a = punctuation
  asciimap(a) = i;
  inverse_asciimap(i) = '*';
end
Nalphabet = i;
%assert(Nalphabet == length(lowercase) + length(digits) + 1) % 37
assert(Nalphabet == length(lowercase) + 1 + 1) % 28

nlines = min(nlines, max_lines);
data = cell(1, nlines);
for l = 1:nlines
  data{l} = asciimap(double(textlines{l}(1:end-1))); % strip off final \n
  % unmapped characters (e.g. tabs) encode as 0 and are rejected here
  if any(data{l} < 1) || any(data{l} > Nalphabet)
    error(['problem with line ' num2str(l) ': ' textlines{l}]);
  end
end
```
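The encoding is easiest to see on a short string. The script below is not part of the original file: it is a minimal sketch, assuming the default setting in which all digits share one symbol, that rebuilds the same 28-symbol table with vectorized assignments and decodes the result with a lookup string. The names `alphabet`, `encode` and `decode` are hypothetical; `read_ap_txt` builds an equivalent `inverse_asciimap` internally but does not return it.

```matlab
% Sketch (assumption): rebuild the 28-symbol encoding used by read_ap_txt
% for a single string, then decode it again.
% `alphabet`, `encode` and `decode` are hypothetical names, not part of read_ap_txt.

asciimap = zeros(1, 128);                     % unmapped ASCII codes stay 0
asciimap(97:122) = 1:26;                      % 'a'..'z' -> 1..26
asciimap(48:57) = 27;                         % every digit -> 27 (prints as '0')
asciimap([32:47 58:64 91:96 123:127]) = 28;   % every punctuation mark -> 28 (prints as '*')
alphabet = ['a':'z' '0' '*'];                 % integer -> printed character (28 entries)

encode = @(s) asciimap(double(lower(s)));     % lower-case first, as read_ap_txt does
decode = @(v) alphabet(v);                    % table lookup inverts the mapping

codes = encode('Hi, 42!');                    % codes is [8 9 28 28 27 27 28]
text = decode(codes);                         % text is 'hi**00*'
disp(codes);
disp(text);
```

Decoding is lossy by design: every digit comes back as `0` and every punctuation mark or space as `*`. Characters outside the mapped ranges (a tab, code 9, for instance) encode as 0, which is what the error check in `read_ap_txt` catches; a character with a code above 128 would instead raise an indexing error before that check is reached.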