global variable. By default it is unset, which disables all calling out. To get
the function called, the regex must include (?C) at appropriate points. This
is, in fact, equivalent to (?C0), and any number <= 255 may be given with (?C).
This provides a means of identifying different callout points. When PCRE
reaches such a point in the regex, if pcre_callout has been set, the external
function is called. It is provided with data in a structure called
pcre_callout_block, which is defined in pcre.h. If the function returns 0,
matching continues; if it returns a non-zero value, the match at the current
point fails. However, backtracking will occur if possible. [This was changed
later and other features added - see item 49 below.]

29. pcretest is upgraded to test the callout functionality. It provides a
callout function that displays information. By default, it shows the start of
the match and the current position in the text. There are some new data escapes
to vary what happens:

    \C+         in addition, show current contents of captured substrings
    \C-         do not supply a callout function
    \C!n        return 1 when callout number n is reached
    \C!n!m      return 1 when callout number n is reached for the mth time

30. If pcregrep was called with the -l option and just a single file name, it
output "<stdin>" if a match was found, instead of the file name.

31. Improve the efficiency of the POSIX API to PCRE. If the number of capturing
slots is less than POSIX_MALLOC_THRESHOLD, use a block on the stack to pass to
pcre_exec(). This saves a malloc/free per call. The default value of
POSIX_MALLOC_THRESHOLD is 10; it can be changed by --with-posix-malloc-threshold
when configuring.

32. The default maximum size of a compiled pattern is 64K. There have been a
few cases of people hitting this limit. The code now uses macros to handle the
storing of links as offsets within the compiled pattern. It defaults to 2-byte
links, but this can be changed to 3 or 4 bytes by --with-link-size when
configuring.
Tests 2 and 5 work only with 2-byte links because they output
debugging information about compiled patterns.

33. Internal code re-arrangements:

(a) Moved the debugging function for printing out a compiled regex into
    its own source file (printint.c) and used #include to pull it into
    pcretest.c and, when DEBUG is defined, into pcre.c, instead of having two
    separate copies.

(b) Defined the list of op-code names for debugging as a macro in
    internal.h so that it is next to the definition of the opcodes.

(c) Defined a table of op-code lengths for simpler skipping along compiled
    code. This is again a macro in internal.h so that it is next to the
    definition of the opcodes.

34. Added support for recursive calls to individual subpatterns, along the
lines of Robin Houston's patch (but implemented somewhat differently).

35. Further mods to the Makefile to help Win32. Also, added code to pcregrep to
allow it to read and process whole directories in Win32. This code was
contributed by Lionel Fourquaux; it has not been tested by me.

36. Added support for named subpatterns. The Python syntax (?P<name>...) is
used to name a group. Names consist of alphanumerics and underscores, and must
be unique. Back references use the syntax (?P=name) and recursive calls use
(?P>name) which is a PCRE extension to the Python extension. Groups still have
numbers. The function pcre_fullinfo() can be used after compilation to extract
a name/number map. There are three relevant calls:

  PCRE_INFO_NAMEENTRYSIZE        yields the size of each entry in the map
  PCRE_INFO_NAMECOUNT            yields the number of entries
  PCRE_INFO_NAMETABLE            yields a pointer to the map.

The map is a vector of fixed-size entries. The size of each entry depends on
the length of the longest name used. The first two bytes of each entry are the
group number, most significant byte first. There follows the corresponding
name, zero terminated. The names are in alphabetical order.

37. Make the maximum literal string in the compiled code 250 for the non-UTF-8
case instead of 255. Making it the same both with and without UTF-8 support
means that the same test output works with both.

38. There was a case of malloc(0) in the POSIX testing code in pcretest. Avoid
calling malloc() with a zero argument.

39. Change 25 above had to resort to a heavy-handed test for the .* anchoring
optimization. I've improved things by keeping a bitmap of backreferences with
numbers 1-31 so that if .* occurs inside capturing brackets that are not in
fact referenced, the optimization can be applied. It is unlikely that a
relevant occurrence of .* (i.e. one which might indicate anchoring or forcing
the match to follow \n) will appear inside brackets with a number greater than
31, but if it does, any back reference > 31 suppresses the optimization.

40. Added a new compile-time option PCRE_NO_AUTO_CAPTURE. This has the effect
of disabling numbered capturing parentheses. Any opening parenthesis that is
not followed by ? behaves as if it were followed by ?: but named parentheses
can still be used for capturing (and they will acquire numbers in the usual
way).

41. Redesigned the return codes from the match() function into yes/no/error so
that errors can be passed back from deep inside the nested calls. A malloc
failure while inside a recursive subpattern call now causes the
PCRE_ERROR_NOMEMORY return instead of quietly going wrong.

42. It is now possible to set a limit on the number of times the match()
function is called in a call to pcre_exec(). This facility makes it possible to
limit the amount of recursion and backtracking, though not in a directly
obvious way, because the match() function is used in a number of different
circumstances. The count starts from zero for each position in the subject
string (for non-anchored patterns). The default limit is, for compatibility, a
large number, namely 10 000 000.
You can change this in two ways:

(a) When configuring PCRE before making, you can use --with-match-limit=n
    to set a default value for the compiled library.

(b) For each call to pcre_exec(), you can pass a pcre_extra block in which
    a different value is set. See 45 below.

If the limit is exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.

43. Added a new function pcre_config(int, void *) to enable run-time extraction
of things that can be changed at compile time. The first argument specifies
what is wanted and the second points to where the information is to be placed.
The current list of available information is:

  PCRE_CONFIG_UTF8

The output is an integer that is set to one if UTF-8 support is available;
otherwise it is set to zero.

  PCRE_CONFIG_NEWLINE

The output is an integer that is set to the value of the code that is used for
newline. It is either LF (10) or CR (13).

  PCRE_CONFIG_LINK_SIZE

The output is an integer that contains the number of bytes used for internal
linkage in compiled expressions. The value is 2, 3, or 4. See item 32 above.

  PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

The output is an integer that contains the threshold above which the POSIX
interface uses malloc() for output vectors. See item 31 above.

  PCRE_CONFIG_MATCH_LIMIT

The output is an unsigned integer that contains the default limit of the number
of match() calls in a pcre_exec() execution. See 42 above.

44. pcretest has been upgraded by the addition of the -C option. This causes it
to extract all the available output from the new pcre_config() function, and to
output it. The program then exits immediately.

45. A need has arisen to pass over additional data with calls to pcre_exec() in
order to support additional features. One way would have been to define
pcre_exec2() (for example) with extra arguments, but this would not have been
extensible, and would also have required all calls to the original function to
be mapped to the new one.
Instead, I have chosen to extend the mechanism that
is used for passing in "extra" data from pcre_study().

The pcre_extra structure is now exposed and defined in pcre.h. It currently
contains the following fields:

  flags         a bitmap indicating which of the following fields are set
  study_data    opaque data from pcre_study()
  match_limit   a way of specifying a limit on match() calls for a specific
                  call to pcre_exec()
  callout_data  data for callouts (see 49 below)

The flag bits are also defined in pcre.h, and are

  PCRE_EXTRA_STUDY_DATA
  PCRE_EXTRA_MATCH_LIMIT
  PCRE_EXTRA_CALLOUT_DATA

The pcre_study() function now returns one of these new pcre_extra blocks, with
the actual study data pointed to by the study_data field, and the
PCRE_EXTRA_STUDY_DATA flag set. This can be passed directly to pcre_exec() as
before. That is, this change is entirely upwards-compatible and requires no
change to existing code.

If you want to pass in additional data to pcre_exec(), you can either place it
in a pcre_extra block provided by pcre_study(), or create your own pcre_extra
block.

46. pcretest has been extended to test the PCRE_EXTRA_MATCH_LIMIT feature. If a
data string contains the escape sequence \M, pcretest calls pcre_exec() several
times with different match limits, until it finds the minimum value needed for
pcre_exec() to complete. The value is then output. This can be instructive; for
most simple matches the number is quite small, but for pathological cases it
gets very large very quickly.

47. There's a new option for pcre_fullinfo() called PCRE_INFO_STUDYSIZE. It
returns the size of the data block pointed to by the study_data field in a
pcre_extra block, that is, the value that was passed as the argument to
pcre_malloc() when PCRE was getting memory in which to place the information
created by pcre_study().
The fourth argument should point to a size_t variable.
pcretest has been extended so that this information is shown after a successful
pcre_study() call when information about the compiled regex is being displayed.

48. Cosmetic change to Makefile: there's no need to have / after $(DESTDIR)
because what follows is always an absolute path. (Later: it turns out that this
is more than cosmetic for MinGW, because it doesn't like empty path
components.)

49. Some changes have been made to the callout feature (see 28 above):

(i)  A callout function now has three choices for what it returns:

       0  =>  success, carry on matching
     > 0  =>  failure at this point, but backtrack if possible
     < 0  =>  serious error, return this value from pcre_exec()

     Negative values should normally be chosen from the set of PCRE_ERROR_xxx
     values. In particular, returning PCRE_ERROR_NOMATCH forces a standard
     "match failed" error. The error number PCRE_ERROR_CALLOUT is reserved for
     use by callout functions. It will never be used by PCRE itself.

(ii) The pcre_extra structure (see 45 above) has a void * field called
     callout_data, with corresponding flag bit PCRE_EXTRA_CALLOUT_DATA. The
     pcre_callout_block structure has a field of the same name. The contents of
     the field passed in the pcre_extra structure are passed to the callout
     function in the corresponding field in the callout block. This makes it
     easier to use the same callout-containing regex from multiple threads. For
     testing, the pcretest program has a new data escape

       \C*n        pass the number n (may be negative) as callout_data

     If the callout function in pcretest receives a non-zero value as
     callout_data, it returns that value.

50. Makefile wasn't handling CFLAGS properly when compiling dftables. Also,
there were some redundant $(CFLAGS) in commands that are now specified as
$(LINK), which already includes $(CFLAGS).

51. Extensions to UTF-8 support are listed below.
These all apply when (a) PCRE
has been compiled with UTF-8 support *and* (b) pcre_compile() has been called
with the PCRE_UTF8 flag. Patterns that are compiled without that flag assume
one-byte characters throughout. Note that case-insensitive matching applies
only to characters whose values are less than 256. PCRE doesn't support the
notion of cases for higher-valued characters.

(i)   A character class whose characters are all within 0-255 is handled as
      a bit map, and the map is inverted for negative classes. Previously, a
      character > 255 always failed to match such a class; however it should
      match if the class was a negative one (e.g. [^ab]). This has been fixed.

(ii)  A negated character class with a single character < 255 is coded as
      "not this character" (OP_NOT). This wasn't working properly when the test
      character was multibyte, either singly or repeated.

(iii) Repeats of multibyte characters are now handled correctly in UTF-8
      mode, for example: \x{100}{2,3}.

(iv)  The character escapes \b, \B, \d, \D, \s, \S, \w, and \W (either
      singly or repeated) now correctly test multibyte characters. However,
      PCRE doesn't recognize any characters with values greater than 255 as
      digits, spaces, or word characters. Such characters always match \D, \S,
      and \W, and never match \d, \s, or \w.

(v)   Classes may now contain characters and character ranges with values
      greater than 255. For example: [ab\x{100}-\x{400}].

(vi)  pcregrep now has a --utf-8 option (synonym -u) which makes it call
      PCRE in UTF-8 mode.

52. The info request value PCRE_INFO_FIRSTCHAR has been renamed
PCRE_INFO_FIRSTBYTE because it is a byte value. However, the old name is
retained for backwards compatibility. (Note that LASTLITERAL is also a byte
value.)

53. The single man page has become too large. I have therefore split it up into
a number of separate man pages. These also give rise to individual HTML pages;
these are now put in a separate directory, and there is an index.html page that
lists them all.
Some hyperlinking between the pages has been installed.

54. Added convenience functions for handling named capturing parentheses.

55. Unknown escapes inside character classes (e.g. [\M]) and escapes that
aren't interpreted therein (e.g. [\C]) are literals in Perl. This is now also
true in PCRE, except when the PCRE_EXTENDED option is set, in which case they
are faulted.

56. Introduced HOST_CC and HOST_CFLAGS which can be set in the environment when
calling configure. These values are used when compiling the dftables.c program
which is run to generate the source of the default character tables. They
default to the values of CC and CFLAGS. If you are cross-compiling PCRE,
you will need to set these values.

57. Updated the building process for Windows DLL, as provided by Fred Cox.
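
Illustration for item 36: the name/number map layout described there (fixed-size
entries, a two-byte group number with the most significant byte first, then a
zero-terminated name, entries in alphabetical order) can be walked with a few
lines of C. The following is only a sketch under stated assumptions, not code
from PCRE: the function name_to_number() and the table sample_table are
hypothetical, the table is built by hand rather than obtained through
pcre_fullinfo(), and "alphabetical order" is assumed to mean plain strcmp()
byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up a group number by name in a table laid out as item 36 describes:
   fixed-size entries, two bytes of group number (most significant byte
   first), then the zero-terminated name. Returns -1 if the name is absent.
   This function is hypothetical; it is not part of the PCRE API. */
static int name_to_number(const unsigned char *table, int entry_size,
                          int count, const char *name)
{
    int lo = 0, hi = count - 1;
    assert(table != NULL && entry_size > 2);

    /* Entries are sorted by name (assumed strcmp order), so binary search. */
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        const unsigned char *entry = table + (size_t)mid * (size_t)entry_size;
        int c = strcmp(name, (const char *)(entry + 2));
        if (c == 0)
            return (entry[0] << 8) | entry[1];   /* big-endian group number */
        if (c < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Hand-built stand-in for what PCRE_INFO_NAMETABLE would point to:
   3 entries (PCRE_INFO_NAMECOUNT) of 8 bytes each (PCRE_INFO_NAMEENTRYSIZE),
   that is, 2 number bytes + the longest name "month" + its terminating zero. */
static const unsigned char sample_table[3 * 8] = {
    0, 2, 'd', 'a', 'y', 0,   0,   0,
    0, 1, 'm', 'o', 'n', 't', 'h', 0,
    1, 4, 'y', 'e', 'a', 'r', 0,   0
};
```

With this sample table, looking up "month" gives group 1, and "year" gives 260,
because its two number bytes are 1 and 4 (1*256 + 4). In a real program the
three values would come from the pcre_fullinfo() calls listed in item 36.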