📄 sqlfun10.sas

📁 SAS是功能強大的統計軟體
💻 SAS
字号:
 /****************************************************************/ /*          S A S   S A M P L E   L I B R A R Y                 */ /*                                                              */ /*    NAME: SQLFUN10                                            */ /*   TITLE: fun/interesting applications of PROC SQL. (fun10)   */ /* PRODUCT: BASE                                                */ /*  SYSTEM: ALL                                                 */ /*    KEYS: SQL DATMAN WHERE TRANSLATE LIKE CREATE TABLE        */ /*   PROCS: SQL                                                 */ /*    DATA:                                                     */ /*                                                              */ /* SUPPORT: pmk                         UPDATE:                 */ /*     REF:                                                     */ /*    MISC: this example was contributed by Howard Schreier     */ /*          of the US Dept. of Commerce, via BITNET             */ /*                                                              */ /*          you can contribute your interesting samples.        */ /*          send internet email to KENT@UNX.SAS.COM or          */ /*          USmail to SAS Institute.                            */ /*                                                              */ /****************************************************************/ title1 'SAS SAMPLE LIBRARY, MEMBER(SQLFUN10)'; /*  * A User asks:  *  * I have two sas data sets, which need to be merged by social  * security number (SSN). The variable SSN in DATA 1 is clean  * and non-missing. But some SSN in DATA 2 have some digits  * missing, but the remaining ones ARE still in the RIGHT columns.  * For instance, the correct one in DATA 1 is 123456789. But the  * missing one in DATA 2 is 1 345 789. By the way, both variables  * are character variables. How can I still merge the two data sets  * based on the remaining figures, which are in the RIGHT columns?  *  */ /*  * I think PROC SQL  is  the  tool  of  choice  here,  for  two  * reasons.   First  of all, it is necessary to compare each of  * the corrupted SSN values with all of the clean ones  in  the  * other  data  set;  this  is  essentially a "join" operation,  * which is pretty much characteristic of SQL.  Secondly,  PROC  * SQL  offers  the LIKE operator (pattern matching *with* wild  * cards), which is exactly what is needed here.  *  * Here is a sample test.  After the two data sets are created,  * a special data set containing *only* the corrupted values is  * created (in  actual  applications,  this  should  enormously  * reduce   the   volume   of  comparisons  to  be  made);  the  * single-character wildcard, an underscore, is substituted for  * each  blank.   Note  that  this preparation could readily be  * done by a statement within PROC  SQL.   The  LIKE  condition  * serves as a screen, so that the resulting data set lists all  * possible matches found for each of the corrupted SSN values.  *  */ data clean;   input ssn $ 1-9;   cards;123456789234567890345678901345678902; data dirty;   input ssn $ 1-9;   cards;12 4567 9456 89012 no match34567890  ambiguous234567890 clean   ; data withwild;   set dirty;   length withwild $ 9;   * Substitute underscores for blanks;   withwild = translate(ssn,'_',' ');   * Get rid of observations having all nine digits;   if index(withwild,'_');   run; proc sql;   create table possible as    select withwild.ssn, clean.ssn as possible     from withwild, clean     where clean.ssn like withwild.withwild;   title2 'Possible Matches';   select * from possible;   quit;
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -