@' The contents of this file are subject to the MonetDB Public License
@' Version 1.1 (the "License"); you may not use this file except in
@' compliance with the License. You may obtain a copy of the License at
@' http://monetdb.cwi.nl/Legal/MonetDBLicense-1.1.html
@'
@' Software distributed under the License is distributed on an "AS IS"
@' basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
@' License for the specific language governing rights and limitations
@' under the License.
@'
@' The Original Code is the MonetDB Database System.
@'
@' The Initial Developer of the Original Code is CWI.
@' Portions created by CWI are Copyright (C) 1997-2007 CWI.
@' All Rights Reserved.

@f str
@a N.J. Nes, M.L. Kersten
@v 1.1
@+ The String Module
@T
Strings can be created in many ways. Already in the built-in operations,
each atom can be cast to a string using the str(atom) MIL command.
The string module makes it possible to construct a string as a
substring of a given string (s). There are two such construction functions.
The first returns the substring from some position (offset) until the end of
the string. The second also starts at the given offset position but only
copies count characters. The functions fail when the position and
count fall out of bounds. A negative position indicates that the position is
computed from the end of the source string.

The strings can be compared using the "=" and "!=" operators.

The operator "+" concatenates a string and an atom. The atom is
converted to a string using the atom-to-string C function. The
string and the result of the conversion are concatenated to form a new
string, which is returned.

The length function returns the length of the string, i.e. the
number of characters (not bytes) in the string.

chrAt() returns the character at position index in the string s. The
function fails when the index is out of range.
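The character-based semantics described above (length counts characters rather than bytes, a negative offset counts from the end, and an out-of-range position or count makes the call fail) can be illustrated with a small, self-contained C sketch over raw UTF-8 bytes. The helpers utf8_length, utf8_skip, utf8_substring and utf8_code_point_at are hypothetical names chosen for this example; they are not the module's STRLength, STRSubString or STRWChrAt implementations. The sketch assumes well-formed UTF-8 input, assumes that offset -1 denotes the last character, and models the nil result as a NULL pointer or -1.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of characters (code points) in a
 * well-formed UTF-8 string. Continuation bytes look like 10xxxxxx,
 * so every byte that is not a continuation byte starts a character. */
static int utf8_length(const char *s)
{
	int n = 0;
	for (; *s; s++)
		if (((unsigned char) *s & 0xC0) != 0x80)
			n++;
	return n;
}

/* Hypothetical helper: advance over `count` characters of s. */
static const char *utf8_skip(const char *s, int count)
{
	while (*s && count-- > 0) {
		s++;	/* lead byte */
		while (((unsigned char) *s & 0xC0) == 0x80)
			s++;	/* continuation bytes */
	}
	return s;
}

/* Hypothetical sketch of string(s, offset, count): copy `count`
 * characters starting at character `offset`. A negative offset is
 * taken from the end of s. Returns NULL (playing the role of nil)
 * when offset or count is out of bounds; the caller frees the result. */
static char *utf8_substring(const char *s, int offset, int count)
{
	int len = utf8_length(s);

	if (offset < 0)
		offset += len;
	if (offset < 0 || offset > len || count < 0 || count > len - offset)
		return NULL;

	const char *begin = utf8_skip(s, offset);
	const char *end = utf8_skip(begin, count);
	size_t nbytes = (size_t) (end - begin);
	char *res = malloc(nbytes + 1);

	if (res == NULL)
		return NULL;
	memcpy(res, begin, nbytes);
	res[nbytes] = '\0';
	return res;
}

/* Hypothetical sketch in the spirit of the unicodeAt command: decode
 * the code point at character position index, whose valid range is
 * 0..length(s)-1. Returns -1 when the index is out of range. */
static int utf8_code_point_at(const char *s, int index)
{
	if (index < 0 || index >= utf8_length(s))
		return -1;

	const unsigned char *p = (const unsigned char *) utf8_skip(s, index);

	if (p[0] < 0x80)		/* 0xxxxxxx: 1 byte */
		return p[0];
	if ((p[0] & 0xE0) == 0xC0)	/* 110xxxxx: 2 bytes */
		return ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
	if ((p[0] & 0xF0) == 0xE0)	/* 1110xxxx: 3 bytes */
		return ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
	/* 11110xxx: 4 bytes */
	return ((p[0] & 0x07) << 18) | ((p[1] & 0x3F) << 12) |
	       ((p[2] & 0x3F) << 6) | (p[3] & 0x3F);
}
```

For example, with s = "h\xC3\xA9llo" ("héllo", six bytes), utf8_length(s) is 5 and utf8_substring(s, -2, 2) yields "lo". Counting by lead bytes keeps the byte representation untouched, which matches the module's choice of storing UTF-8 directly rather than converting to wide characters.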
For chrAt(), the valid index range is from 0 to length(s)-1.

The startsWith and endsWith functions test whether the string s starts
with the given prefix or ends with the given suffix.

The toLower and toUpper functions convert the string to lower- or
upper-case characters.

The search(str,chr) function searches for the first occurrence of a
character from the beginning of the string. The r_search(str,chr) function
searches for the last occurrence (the first one from the end of the string).
The search(str,str) function locates the position of the first occurrence
of the string s2 in string s. All search functions return -1 when the search
fails; otherwise the position is returned.

All string functions fail when an incorrect string (NULL pointer) is given.
In the current implementation, a failure is signaled by returning nil, since
this facilitates the use of the string module in bulk operations.

All functions in the module have now been converted to Unicode. Internally,
we use UTF-8 to store strings as Unicode in zero-terminated byte sequences.
@* Module Definition
@mal
module str;

command str(s:str):str
address STRtostr
comment "Noop routine.";
command string(s:str,offset:int) :str
address STRTail
comment "Return the tail s[offset..n]
	of a string s[0..n].";
command string(s:str,offset:int,count:int):str
address STRSubString
comment "Return substring s[offset..offset+count] of a string s[0..n]";
command +( l:str, r:str) :str
address STRConcat
comment "Concatenate two strings.";
command length( s:str ) :int
address STRLength
comment "Return the length of a string.";
command stringlength( s:str ) :int
address STRstringLength
comment "Return the length of a right-trimmed string (SQL semantics).";
command nbytes( s:str ) :int
address STRBytes
comment "Return the string length in bytes.";
command chrAt( s:str, index:int) :chr
address STRChrAt
comment "String array lookup operation.";
command unicodeAt(s:str, index:int) :int
address STRWChrAt
comment "Get a Unicode character
	(as an int) from a string position.";
command unicode(wchar:int) :str
address STRFromWChr
comment "Convert a Unicode code point (int) to a character.";
command codeset() :str
address STRcodeset
comment "Return the locale's codeset";
command iconv(org:str,fromCs:str,toCs:str):str
address STRIconv
comment "String codeset conversion";
command startsWith(s:str,prefix:str):bit
address STRPrefix
comment "Prefix check.";
command endsWith( s:str, suffix:str ) :bit
address STRSuffix
comment "Suffix check.";
command toLower( s:str ) :str
address STRLower
comment "Convert a string to lower case.";
command toUpper( s:str ) :str
address STRUpper
comment "Convert a string to upper case.";
command search( s:str, c:str ) :int
address STRstrSearch
comment "Search for a substring. Returns
	the position, -1 if not found.";
command search( s:str, c:chr ) :int
address STRchrSearch
comment "Search for a character. Returns
	the position, -1 if not found.";
command r_search( s:str, c:str ) :int
address STRReverseStrSearch
comment "Reverse search for a substring. Returns
	the position, -1 if not found.";
command r_search( s:str, c:chr ) :int
address STRReverseChrSearch
comment "Reverse search for a char. Returns
	the position, -1 if not found.";
command trim( s:str ) :str
address STRStrip
comment "Strip whitespace around a string.";
command ltrim( s:str ) :str
address STRLtrim
comment "Strip whitespace from the start of a string.";
command rtrim( s:str ) :str
address STRRtrim
comment "Strip whitespace from the end of a string.";
command substitute(s:str,src:str,dst:str,rep:bit)	:str
address STRSubstitute
comment "Substitute the first occurrence of 'src' by
	'dst'.  If rep = true, this is
	repeated while 'src' can be found in the
	result string. In order to prevent
	recursion and result strings of unlimited
	size, repeating is only done iff src is
	not a substring of dst.";
command like(s:str,pat:str):bit
address STRlikewrap2
comment "SQL pattern match function";
command like(s:str,pat:str,esc:str):bit
address STRlikewrap
comment "SQL pattern match function";
command ascii(s:str):int
address STRascii
comment "Return the Unicode code point of the head of the string";
command substring(s:str, start:int):str
address STRsubstringTail
comment "Extract the tail of a string";
command substring(s:str, start:int, len:int):str
address STRsubstring
comment "Extract a substring from str starting at start, for length len";
command prefix(s:str,l:int):str
address STRprefix
comment "Extract the prefix of a given length";
command suffix(s:str,l:int):str
address STRsuffix
comment "Extract the suffix of a given length";
command stringleft(s:str,l:int):str
address STRprefix;
command stringright(s:str,l:int):str
address STRsuffix;
command locate(s1:str,s2:str):int
address STRlocate
comment "Locate the start position of a string";
command locate(s1:str,s2:str,start:int):int
address STRlocate2
comment "Locate the start position of a string";
command insert(s:str,start:int,l:int,s2:str):str
address STRinsert
comment "Insert a string into another";
command replace(s:str,pat:str,s2:str):str
address STRreplace
comment "Replace pat in s by s2";
command repeat(s2:str,c:int):str
address STRrepeat;
command space(l:int):str
address STRspace;
command STRprelude() :void
address strPrelude;
command STRepilogue() :void
address strEpilogue;

str.STRprelude();
@-
@{
@* Implementation Code
@h
#ifndef __string_H__
#define __string_H__

#include <gdk.h>
#include "mal.h"
#include "mal_exception.h"
#include <ctype.h>

#ifdef WIN32
#ifndef LIBSTR
#define str_export extern __declspec(dllimport)
#else
#define str_export extern __declspec(dllexport)
#endif
#else
#define str_export extern
#endif

str_export bat *strPrelude(void);
str_export str strEpilogue(int *ret);
str_export str STRtostr(str *res, str *src);
str_export str STRConcat(str *res, str *val1, str *val2);
str_export str STRLength(int *res, str *arg1);
/* length of right-trimmed string, needed for SQL */
str_export str STRstringLength(int *res, str *s);
str_export str STRBytes(int *res, str *arg1);
str_export str STRTail(str *res, str *arg1, int *offset);
str_export str STRSubString(str *res, str *arg1, int *offset, int *length);
str_export str STRFromWChr(str *res, int *at);
str_export str STRWChrAt(int *res, str *arg1, int *at);
str_export str STRcodeset(str *res);
str_export str STRIconv(str *res, str *o, str *fp, str *tp);
str_export str STRChrAt(chr *res, str *arg1, int *at);
str_export str STRPrefix(bit *res, str *arg1, str *arg2);
str_export str STRSuffix(bit *res, str *arg1, str *arg2);
str_export str STRLower(str *res, str *arg1);
str_export str STRUpper(str *res, str *arg1);
str_export str STRChrSearch(int *res, str *arg1, chr *c);
str_export str STRstrSearch(int *res, str *arg1, str *arg2);
str_export str STRReverseStrSearch(int *res, str *arg1, str *arg2);
str_export str STRchrSearch(int *res, str *arg1, chr *c);
str_export str STRReverseChrSearch(int *res, str *arg1, chr *c);
str_export str STRStrip(str *res, str *arg1);
str_export str STRLtrim(str *res, str *arg1);
str_export str STRRtrim(str *res, str *arg1);
str_export str STRSubstitute(str *res, str *arg1, str *arg2, str *arg3, bit *g);

str_export int strConcat(str *res, str s, ptr val, int t);
str_export int strLength(int *res, str s);
str_export int strBytes(int *res, str s);
str_export int strTail(str *res, str s, int *offset);
str_export int strSubString(str *res, str s, int *offset, int *length);
str_export int strPrefix(bit *res, str s, str prefix);
str_export int strLower(str *res, str s);
str_export int strUpper(str *res, str s);
str_export int strSuffix(bit *res, str s, str suffix);
str_export int strStrSearch(int *res, str s, str s2);
str_export int strReverseStrSearch(int *res, str s, str s2);
str_export int strStrip(str *res, str s);
str_export int strLtrim(str *res, str s);
str_export int strRtrim(str *res, str s);
str_export int strChrSearch(int *res, str s, chr *c);
str_export int strReverseChrSearch(int *res, str s, chr *c);
str_export int strFromWChr(str *res, int *c);
str_export int strWChrAt(int *res, str val, int *at);
str_export int strChrAt(chr *res, str val, int *at);
str_export int codeset(str *res);
str_export int strIconv(str *res, str org, str f, str t);
str_export int strSubstitute(str *res, str s, str src, str dst, bit *g);

str_export str STRfindUnescapedOccurrence(str b, str c, str esc);
str_export int STRlike(str s, str pat, str esc);
str_export str STRsubstringTail(str *ret, str *s, int *start);
str_export str STRsubstring(str *ret, str *s, int *start, int *l);
str_export str STRlikewrap2(bit *ret, str *s, str *pat);
str_export str STRlikewrap(bit *ret, str *s, str *pat, str *esc);
str_export str STRascii(int *ret, str *s);
str_export str STRprefix(str *ret, str *s, int *l);
str_export str STRsuffix(str *ret, str *s, int *l);
str_export str STRlocate(int *ret, str *s1, str *s2);
str_export str STRlocate2(int *ret, str *s1, str *s2, int *start);
str_export str STRinsert(str *ret, str *s, int *start, int *l, str *s2);
str_export str STRreplace(str *ret, str *s1, str *s2, str *s3);
str_export str STRrepeat(str *ret, str *s, int *c);
str_export str STRspace(str *ret, int *l);

#endif /* __string_H__ */
@}
@{
@c
#include "mal_config.h"
#include "str.h"
#include <string.h>
#ifdef HAVE_LANGINFO_H
#include <langinfo.h>
#endif
#ifdef HAVE_ICONV_H
#include <iconv.h>
#endif

@+ UTF-8 Handling
@T
UTF-8 is a way to store Unicode strings in zero-terminated byte sequences
that can be compared with the old 8-bit strcmp() function; strcmp() then
orders them the same way as it orders the equivalent Latin-1 or ASCII
character strings stored in simple one-byte sequences.
These characteristics make UTF-8 an attractive format for upgrading an
ASCII-oriented computer program towards one that supports Unicode.
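The strcmp() property described above can be checked directly: UTF-8 was designed so that a byte-wise comparison of two encoded strings orders them exactly as their code point sequences would be ordered, and Latin-1 bytes are precisely the code points U+0000..U+00FF. The sketch below is a hypothetical illustration, not code from this module: it encodes Latin-1 text as UTF-8 with ad hoc helpers (utf8_encode, latin1_to_utf8, cmp_sign are names invented here) so the sign of strcmp() can be compared on both representations. It relies on the C standard rule that strcmp() compares bytes as unsigned char.

```c
#include <string.h>

/* Hypothetical helper: write the UTF-8 encoding of code point cp
 * (at most 0x10FFFF) into buf, NUL-terminate it, and return the
 * number of bytes written (1 to 4). buf must hold 5 bytes. */
static int utf8_encode(unsigned int cp, char *buf)
{
	int n;

	if (cp < 0x80) {
		buf[0] = (char) cp;
		n = 1;
	} else if (cp < 0x800) {
		buf[0] = (char) (0xC0 | (cp >> 6));
		buf[1] = (char) (0x80 | (cp & 0x3F));
		n = 2;
	} else if (cp < 0x10000) {
		buf[0] = (char) (0xE0 | (cp >> 12));
		buf[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
		buf[2] = (char) (0x80 | (cp & 0x3F));
		n = 3;
	} else {
		buf[0] = (char) (0xF0 | (cp >> 18));
		buf[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
		buf[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
		buf[3] = (char) (0x80 | (cp & 0x3F));
		n = 4;
	}
	buf[n] = '\0';
	return n;
}

/* Hypothetical helper: convert a Latin-1 string (one byte per code
 * point U+0001..U+00FF) to UTF-8. out needs 2 * strlen(in) + 1 bytes. */
static void latin1_to_utf8(const char *in, char *out)
{
	for (; *in; in++)
		out += utf8_encode((unsigned char) *in, out);
	*out = '\0';
}

/* -1, 0 or 1: strcmp() only guarantees the sign of its result. */
static int cmp_sign(int r)
{
	return (r > 0) - (r < 0);
}
```

Note that this is code-point order, not a locale's collation order: "caf\xE9" (Latin-1 "café") sorts after "cafe" in both encodings simply because U+00E9 is greater than U+0065.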
That is why we use UTF-8 in Monet.

For Monet, UTF-8 mostly has no consequences, as strings stored in BATs are regarded as data,
and it does not matter for the database kernel whether the zero-terminated byte sequence it is
processing has UTF-8 or Latin-1 semantics. This module is the only place where explicit string
functionality is located. We {\bf do} have to adapt the behavior of the MIL length(), search(),
substring() and similar commands to the fact that one (Unicode) character is now stored in
a variable number of bytes (possibly more than one).

One of the things that becomes more complex in Unicode is uppercase/lowercase conversion. The
tables below contain the simple one-to-one Unicode case mappings. We do not support the special
casing mappings (e.g. from one letter to two letters).

References:
\begin{verbatim}
simple casing:	http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
complex casing: http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt
\end{verbatim}

The Unicode case conversion implementation in Monet fills a mapping BAT of int,int combinations,
on which we perform high-performance hash lookups (all code inlined).
@c
/* This table was generated from the Unicode 3.2.0 spec.
   The table is generated by using the codes for conversion to lower
   case and for conversion to title case (note: not to upper case).
   Title case is used since the interface to convert to upper case
   converts the whole string.
   A few code points have been moved in order to get reasonable
   conversions (if two code points are converted to the same value,
   the first one in this table wins).  The code points that have
   been interchanged are:
   U+0345 (COMBINING GREEK YPOGEGRAMMENI) / U+03B9 (GREEK SMALL LETTER IOTA) <-> U+0399 (GREEK CAPITAL LETTER IOTA)
   U+00B5 (MICRO SIGN) / U+03BC (GREEK SMALL LETTER MU) <-> U+039C (GREEK CAPITAL LETTER MU)
   U+03C2 (GREEK SMALL LETTER FINAL SIGMA) / U+03C3 (GREEK SMALL LETTER SIGMA) <-> U+03A3 (GREEK CAPITAL LETTER SIGMA)
   In addition, there are a few code points where there are different
   versions for upper case and title case.  These had to be switched
   around a little so that the mappings are done sensibly.
   The following combinations are included in this order:
   lower case <-> title case
   lower case <-  upper case
   upper case  -> title case
   The relevant code points are:
   U+01C4 (LATIN CAPITAL LETTER DZ WITH CARON)
   U+01C5 (LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON)
   U+01C6 (LATIN SMALL LETTER DZ WITH CARON)
   U+01C7 (LATIN CAPITAL LETTER LJ)
   U+01C8 (LATIN CAPITAL LETTER L WITH SMALL LETTER J)
   U+01C9 (LATIN SMALL LETTER LJ)
   U+01CA (LATIN CAPITAL LETTER NJ)
   U+01CB (LATIN CAPITAL LETTER N WITH SMALL LETTER J)
   U+01CC (LATIN SMALL LETTER NJ)
   U+01F1 (LATIN CAPITAL LETTER DZ)
   U+01F2 (LATIN CAPITAL LETTER D WITH SMALL LETTER Z)
   U+01F3 (LATIN SMALL LETTER DZ)
   The core awk script used is:
	$15 != "" && $15 != $1 {printf "{0x%s,0x%s,},\n",$1,$15}
	$14 != "" && $14 != $1 {printf "{0x%s,0x%s,},\n",$14,$1}
   with some hand munging afterward.  The data file is UnicodeData.txt
   from http://www.unicode.org/.
 */
struct UTF8_lower_upper {
	unsigned short lower, upper;
} UTF8_lower_upper[] = {
	{ 0x0061, 0x0041, },
	{ 0x0062, 0x0042, },
	{ 0x0063, 0x0043, },
	{ 0x0064, 0x0044, },
	{ 0x0065, 0x0045, },
	{ 0x0066, 0x0046, },
	{ 0x0067, 0x0047, },
	{ 0x0068, 0x0048, },
	{ 0x0069, 0x0049, },
	{ 0x0069, 0x0130, },
	{ 0x006A, 0x004A, },
	{ 0x006B, 0x004B, },
	{ 0x006B, 0x212A, },
	{ 0x006C, 0x004C, },
	{ 0x006D, 0x004D, },
	{ 0x006E, 0x004E, },
	{ 0x006F, 0x004F, },
	{ 0x0070, 0x0050, },
	{ 0x0071, 0x0051, },
	{ 0x0072, 0x0052, },
	{ 0x0073, 0x0053, },
	{ 0x0074, 0x0054, },
	{ 0x0075, 0x0055, },
	{ 0x0076, 0x0056, },
	{ 0x0077, 0x0057, },
	{ 0x0078, 0x0058, },
	{ 0x0079, 0x0059, },
	{ 0x007A, 0x005A, },
	{ 0x03BC, 0x039C, },
	{ 0x00E0, 0x00C0, },
	{ 0x00E1, 0x00C1, },
	{ 0x00E2, 0x00C2, },
	{ 0x00E3, 0x00C3, },
	{ 0x00E4, 0x00C4, },
	{ 0x00E5, 0x00C5, },
	{ 0x00E5, 0x212B, },
	{ 0x00E6, 0x00C6, },
	{ 0x00E7, 0x00C7, },
	{ 0x00E8, 0x00C8, },
	{ 0x00E9, 0x00C9, },
	{ 0x00EA, 0x00CA, },
	{ 0x00EB, 0x00CB, },
	{ 0x00EC, 0x00CC, },
	{ 0x00ED, 0x00CD, },
	{ 0x00EE, 0x00CE, },
	{ 0x00EF, 0x00CF, },
	{ 0x00F0, 0x00D0, },
	{ 0x00F1, 0x00D1, },
	{ 0x00F2, 0x00D2, },
	{ 0x00F3, 0x00D3, },
	{ 0x00F4, 0x00D4, },
	{ 0x00F5, 0x00D5, },
	{ 0x00F6, 0x00D6, },
	{ 0x00F8, 0x00D8, },
	{ 0x00F9, 0x00D9, },
	{ 0x00FA, 0x00DA, },
	{ 0x00FB, 0x00DB, },
	{ 0x00FC, 0x00DC, },
	{ 0x00FD, 0x00DD, },
	{ 0x00FE, 0x00DE, },
	{ 0x00FF, 0x0178, },
	{ 0x0101, 0x0100, },
	{ 0x0103, 0x0102, },
	{ 0x0105, 0x0104, },
	{ 0x0107, 0x0106, },
	{ 0x0109, 0x0108, },
	{ 0x010B, 0x010A, },
	{ 0x010D, 0x010C, },
	{ 0x010F, 0x010E, },
	{ 0x0111, 0x0110, },
	{ 0x0113, 0x0112, },
	{ 0x0115, 0x0114, },
	{ 0x0117, 0x0116, },
	{ 0x0119, 0x0118, },
	{ 0x011B, 0x011A, },
	{ 0x011D, 0x011C, },
	{ 0x011F, 0x011E, },
	{ 0x0121, 0x0120, },
	{ 0x0123, 0x0122, },
	{ 0x0125, 0x0124, },
	{ 0x0127, 0x0126, },
	{ 0x0129, 0x0128, },
	{ 0x012B, 0x012A, },
	{ 0x012D, 0x012C, },
	{ 0x012F, 0x012E, },
	{ 0x0131, 0x0049, },
	{ 0x0133, 0x0132, },
	{ 0x0135, 0x0134, },
	{ 0x0137, 0x0136, },
	{ 0x013A, 0x0139, },
	{ 0x013C, 0x013B, },
	{ 0x013E, 0x013D, },
	{ 0x0140, 0x013F, },
	{ 0x0142, 0x0141, },
	{ 0x0144, 0x0143, },
	{ 0x0146, 0x0145, },
	{ 0x0148, 0x0147, },
	{ 0x014B, 0x014A, },
	{ 0x014D, 0x014C, },
	{ 0x014F, 0x014E, },
	{ 0x0151, 0x0150, },
	{ 0x0153, 0x0152, },
	{ 0x0155, 0x0154, },
	{ 0x0157, 0x0156, },
	{ 0x0159, 0x0158, },
	{ 0x015B, 0x015A, },
	{ 0x015D, 0x015C, },
	{ 0x015F, 0x015E, },
	{ 0x0161, 0x0160, },
	{ 0x0163, 0x0162, },
	{ 0x0165, 0x0164, },
	{ 0x0167, 0x0166, },
	{ 0x0169, 0x0168, },
	{ 0x016B, 0x016A, },
	{ 0x016D, 0x016C, },
	{ 0x016F, 0x016E, },
	{ 0x0171, 0x0170, },
	{ 0x0173, 0x0172, },
	{ 0x0175, 0x0174, },
	{ 0x0177, 0x0176, },
	{ 0x017A, 0x0179, },
	{ 0x017C, 0x017B, },
	{ 0x017E, 0x017D, },
	{ 0x017F, 0x0053, },
	{ 0x0183, 0x0182, },
	{ 0x0185, 0x0184, },
	{ 0x0188, 0x0187, },
	{ 0x018C, 0x018B, },
	{ 0x0192, 0x0191, },
	{ 0x0195, 0x01F6, },
	{ 0x0199, 0x0198, },
	{ 0x019E, 0x0220, },
	{ 0x01A1, 0x01A0, },
	{ 0x01A3, 0x01A2, },
	{ 0x01A5, 0x01A4, },
	{ 0x01A8, 0x01A7, },
	{ 0x01AD, 0x01AC, },
	{ 0x01B0, 0x01AF, },
	{ 0x01B4, 0x01B3, },
	{ 0x01B6, 0x01B5, },
	{ 0x01B9, 0x01B8, },
	{ 0x01BD, 0x01BC, },
	{ 0x01BF, 0x01F7, },
	{ 0x01C6, 0x01C4, },
	{ 0x01C6, 0x01C5, },
	{ 0x01C5, 0x01C4, },
	{ 0x01C9, 0x01C7, },
	{ 0x01C9, 0x01C8, },
	{ 0x01C8, 0x01C7, },
	{ 0x01CC, 0x01CA, },
	{ 0x01CC, 0x01CB, },
	{ 0x01CB, 0x01CA, },
	{ 0x01CE, 0x01CD, },
	{ 0x01D0, 0x01CF, },
	{ 0x01D2, 0x01D1, },
	{ 0x01D4, 0x01D3, },
	{ 0x01D6, 0x01D5, },
	{ 0x01D8, 0x01D7, },
	{ 0x01DA, 0x01D9, },
	{ 0x01DC, 0x01DB, },
	{ 0x01DD, 0x018E, },
	{ 0x01DF, 0x01DE, },
	{ 0x01E1, 0x01E0, },
	{ 0x01E3, 0x01E2, },
	{ 0x01E5, 0x01E4, },
	{ 0x01E7, 0x01E6, },
	{ 0x01E9, 0x01E8, },
	{ 0x01EB, 0x01EA, },
	{ 0x01ED, 0x01EC, },
	{ 0x01EF, 0x01EE, },
	{ 0x01F3, 0x01F1, },
	{ 0x01F3, 0x01F2, },
	{ 0x01F2, 0x01F1, },
	{ 0x01F5, 0x01F4, },
	{ 0x01F9, 0x01F8, },
	{ 0x01FB, 0x01FA, },
	{ 0x01FD, 0x01FC, },
	{ 0x01FF, 0x01FE, },
	{ 0x0201, 0x0200, },
	{ 0x0203, 0x0202, },
	{ 0x0205, 0x0204, },
	{ 0x0207, 0x0206, },
	{ 0x0209, 0x0208, },
	{ 0x020B, 0x020A, },
	{ 0x020D, 0x020C, },
	{ 0x020F, 0x020E, },
	{ 0x0211, 0x0210, },
	{ 0x0213, 0x0212, },
	{ 0x0215, 0x0214, },