Karl Rixon

PHP Regular Expression Fails Silently on Long Strings


I had an odd bug today which took me a while to track down. I was using preg_replace_callback to match blocks of code in a string and hand them off to GeSHi for syntax highlighting. However, some blocks which should have matched were not being matched, and I couldn't find any explanation for it at all. I was using the following non-greedy, match-anything sub-pattern:

(.*?)
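A minimal sketch of the kind of call described above, assuming PHP 7 or later. The [code]...[/code] delimiters and the highlight_block() function are hypothetical, and htmlspecialchars() stands in for GeSHi so the example runs without it installed.

```php
<?php
// Sketch only: [code]...[/code] and highlight_block() are hypothetical
// stand-ins; the original post passes each block to GeSHi instead.

function highlight_block(array $m): string
{
    // $m[1] holds the text captured by the lazy (.*?) group.
    return '<pre>' . htmlspecialchars($m[1]) . '</pre>';
}

// The /s modifier lets "." match newlines, so a block can span lines.
// The lazy (.*?) stops at the first closing tag, keeping blocks separate.
$pattern = '/\[code\](.*?)\[\/code\]/s';

$text = 'Intro [code]echo 1 < 2;[/code] middle [code]$x = 1 & 2;[/code] end';
$html = preg_replace_callback($pattern, 'highlight_block', $text);

echo $html, "\n";
```

Because the quantifier is lazy, each block is replaced on its own; a greedy (.*) would swallow everything from the first opening tag to the last closing tag.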

There really shouldn't be any reason for that to fail to match. I gradually removed chunks of text from my string to find the cause, and after a few chunks were gone the pattern suddenly matched. Nothing in the removed text could plausibly have caused the problem, so I suspected the length of the string itself, and that suspicion proved correct.

PHP 5.2 introduced a new ini setting called pcre.backtrack_limit. The documentation for it is very sparse, but it sets an upper limit on how many backtracking steps the PCRE engine may take while trying to match a pattern. This affects constructs such as non-greedy quantifiers, and I assume lookahead and lookbehind assertions too (though I have not tested this). The default value is a meagre 100000. Strictly that is a count of steps rather than bytes, but a lazy (.*?) takes roughly one step for every character it skips over, so in practice the match gives up once the text between the delimiters reaches about 100,000 characters (roughly 97KB). Prior to 5.2 this setting did not exist, and the same pattern matched long strings without problem. The really annoying thing about all this is that the regex function just fails silently: preg_replace_callback returns NULL without raising any error, leaving you madly pulling your hair out while you try to see what could be preventing your pattern from matching. A notice or warning would have saved me a couple of hours!
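The failure can be reproduced on a short string by lowering the limit. This sketch assumes PHP 7 or later (for the pcre.jit setting); the limit of 100 and the 2000-character subject are arbitrary values chosen for illustration. preg_last_error() and the PREG_BACKTRACK_LIMIT_ERROR constant, both available since PHP 5.2, report why the match failed.

```php
<?php
// Reproduction sketch (assumes PHP 7+). The limit of 100 and the
// 2000-character subject are arbitrary illustration values.

// PCRE's JIT enforces the match limit in a different way, so it is switched
// off here to get the interpreter's behaviour, where the lazy (.*?) uses
// roughly one counted step per character it skips.
ini_set('pcre.jit', '0');

$pattern = '/\[code\](.*?)\[\/code\]/s';
$subject = '[code]' . str_repeat('x', 2000) . '[/code]';

// With a tiny limit the engine gives up part-way through the block.
ini_set('pcre.backtrack_limit', '100');
$failed = preg_match($pattern, $subject);   // false (an error), not 0
$error  = preg_last_error();

// Raising the limit lets the same pattern match the same string.
ini_set('pcre.backtrack_limit', '1048576');
$matched = preg_match($pattern, $subject, $m);

var_dump($failed, $error === PREG_BACKTRACK_LIMIT_ERROR, $matched, strlen($m[1]));
```

Note that preg_match() distinguishes an engine error (false) from a genuine non-match (0), so a strict === comparison is needed to tell them apart.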

The pcre.backtrack_limit setting can be changed either in php.ini or at runtime. I set mine to 1048576 (about ten times the default) and have not had any issues since.

ini_set('pcre.backtrack_limit', '1048576');
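Since the failure is silent, one option is a wrapper that raises the limit only around a single call and turns a NULL result into an exception. preg_replace_callback_checked() is a hypothetical helper, not part of PHP, and the sketch assumes PHP 7 or later for the scalar type declarations and try/finally.

```php
<?php
// Hypothetical helper, not part of PHP (assumes PHP 7+): raises
// pcre.backtrack_limit for one call and throws instead of failing silently.
function preg_replace_callback_checked(
    string $pattern,
    callable $callback,
    string $subject,
    string $limit = '1048576'
): string {
    // ini_set() returns the previous value (or false on failure).
    $previous = ini_set('pcre.backtrack_limit', $limit);
    try {
        $result = preg_replace_callback($pattern, $callback, $subject);
        $error  = preg_last_error();
    } finally {
        // Put the old limit back so other regexes are unaffected.
        if ($previous !== false) {
            ini_set('pcre.backtrack_limit', $previous);
        }
    }

    // On an engine error preg_replace_callback() returns null, no warning.
    if ($result === null) {
        if ($error === PREG_BACKTRACK_LIMIT_ERROR) {
            throw new RuntimeException('PCRE backtrack limit exhausted');
        }
        throw new RuntimeException('PCRE error code ' . $error);
    }
    return $result;
}

$out = preg_replace_callback_checked(
    '/\[code\](.*?)\[\/code\]/s',
    function (array $m): string {
        return strtoupper($m[1]);
    },
    'a [code]b[/code] c'
);
echo $out, "\n"; // a B c
```

Restoring the previous value keeps the higher limit scoped to the one call that needs it, rather than raising it for every regex in the script.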

Written by Karl

February 11th, 2010 at 3:01 pm

Posted in PHP
