📄 sqlfun10.sas
字号:
/****************************************************************/ /* S A S S A M P L E L I B R A R Y */ /* */ /* NAME: SQLFUN10 */ /* TITLE: fun/interesting applications of PROC SQL. (fun10) */ /* PRODUCT: BASE */ /* SYSTEM: ALL */ /* KEYS: SQL DATMAN WHERE TRANSLATE LIKE CREATE TABLE */ /* PROCS: SQL */ /* DATA: */ /* */ /* SUPPORT: pmk UPDATE: */ /* REF: */ /* MISC: this example was contributed by Howard Schreier */ /* of the US Dept. of Commerce, via BITNET */ /* */ /* you can contribute your interesting samples. */ /* send internet email to KENT@UNX.SAS.COM or */ /* USmail to SAS Institute. */ /* */ /****************************************************************/ title1 'SAS SAMPLE LIBRARY, MEMBER(SQLFUN10)'; /* * A User asks: * * I have two sas data sets, which need to be merged by social * security number (SSN). The variable SSN in DATA 1 is clean * and non-missing. But some SSN in DATA 2 have some digits * missing, but the remaining ones ARE still in the RIGHT columns. * For instance, the correct one in DATA 1 is 123456789. But the * missing one in DATA 2 is 1 345 789. By the way, both variables * are character variables. How can I still merge the two data sets * based on the remaining figures, which are in the RIGHT columns? * */ /* * I think PROC SQL is the tool of choice here, for two * reasons. First of all, it is necessary to compare each of * the corrupted SSN values with all of the clean ones in the * other data set; this is essentially a "join" operation, * which is pretty much characteristic of SQL. Secondly, PROC * SQL offers the LIKE operator (pattern matching *with* wild * cards), which is exactly what is needed here. * * Here is a sample test. After the two data sets are created, * a special data set containing *only* the corrupted values is * created (in actual applications, this should enormously * reduce the volume of comparisons to be made); the * single-character wildcard, an underscore, is substituted for * each blank. Note that this preparation could readily be * done by a statement within PROC SQL. The LIKE condition * serves as a screen, so that the resulting data set lists all * possible matches found for each of the corrupted SSN values. * */ data clean; input ssn $ 1-9; cards;123456789234567890345678901345678902; data dirty; input ssn $ 1-9; cards;12 4567 9456 89012 no match34567890 ambiguous234567890 clean ; data withwild; set dirty; length withwild $ 9; * Substitute underscores for blanks; withwild = translate(ssn,'_',' '); * Get rid of observations having all nine digits; if index(withwild,'_'); run; proc sql; create table possible as select withwild.ssn, clean.ssn as possible from withwild, clean where clean.ssn like withwild.withwild; title2 'Possible Matches'; select * from possible; quit;
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -