This is tough to change because a for loop would have to be fixed at the plugin level. Fundamentally, even at a base level, you're absolutely correct that this is wrong. But plugins looping with < length would have to be changed to <=. I'm sure there are other cases, but to me this is technical debt that can't be changed at this point.
My understanding is still that this is currently working exactly as expected and documented, the documentation just confuses people more familiar with modern regex implementations than Perl. Global matching behaviour should be behind a new native, maybe |
…ffset to MatchRegex. Clarify documentation.
After additional discussion, it appears MatchRegex was working correctly; the documentation was causing a lot of confusion, and various features were lacking. MatchRegex has been reverted to how it originally was, but now also accepts an additional param for the offset (defaulting to 0). GetRegexSubString now also takes an additional param identifying which match to get substrings for (defaulting to 0). MatchAll has been added. MatchCount, MatchOffset (named Offset in the test plugin, since I decided to rename it afterwards), and CaptureCount have also been added to provide info when using MatchAll or when manually looping with MatchRegex.
Test plugin and output.
I tried my best to clarify the documentation, but it's probably still not great...
asherkin
left a comment
Functionality seems good, see inlines.
extensions/regex/CRegEx.cpp
Outdated
rc = pcre_exec(re, NULL, subject, (int)strlen(subject), 0, 0, ovector, 30);
unsigned int len = strlen(subject);

rc = pcre_exec(re, NULL, subject, len, offset, 0, mMatches[0].mVector, sizeof(mMatches[0].mVector));
This sizeof is incorrect; it needs to be the element count, not the byte size.
The PCRE documentation points this out about 100 times 😛
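To make the distinction in this comment concrete, here is a minimal sketch that needs no PCRE at all. The `RegexMatch` struct mirrors the one in the diff; `ArrayLength` and `ArrayBytes` are hypothetical helpers introduced only for illustration and are not part of the extension.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the struct from the diff; used here only for illustration.
struct RegexMatch
{
    int mSubStringCount;
    int mVector[30];
};

// Hypothetical helper: element count of a fixed-size array,
// deduced from the array type at compile time.
template <typename T, std::size_t N>
constexpr std::size_t ArrayLength(const T (&)[N])
{
    return N;
}

// Hypothetical helper: byte size of the same array, which is
// what a plain sizeof on the array yields.
template <typename T, std::size_t N>
constexpr std::size_t ArrayBytes(const T (&)[N])
{
    return sizeof(T) * N;
}
```

With `int mVector[30]`, `ArrayLength` gives 30 while `sizeof(mVector)` gives `30 * sizeof(int)`, so passing the byte size as the vector size would tell the callee the buffer holds more ints than it really does.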
extensions/regex/CRegEx.cpp
Outdated
unsigned int len = strlen(subject);
unsigned int matches = 0;

while (offset < len && (rc = pcre_exec(re, 0, subject, len, offset, 0, mMatches[matches].mVector, sizeof(mMatches[matches].mVector))) >= 0)
extensions/regex/CRegEx.h
Outdated
struct RegexMatch
{
    int mSubStringCount;
    int mVector[30];
Let's double this; 10 captures seems limiting, and we have the RAM.
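For context on where the figure of 10 comes from: PCRE's `pcre_exec` uses only the first two-thirds of the output vector for substring offset pairs and keeps the last third as workspace, so a vector of N ints holds N / 3 pairs, the first of which is the whole match. A small sketch of that arithmetic follows; the helper names are hypothetical and not taken from the extension.

```cpp
#include <cassert>

// Hypothetical helper: number of (start, end) offset pairs that fit
// in a pcre_exec output vector of the given size. Only two-thirds of
// the vector carries pairs; each pair takes two ints.
constexpr int MaxSubstringPairs(int ovecsize)
{
    return ovecsize / 3;
}

// Hypothetical helper: pairs left for capture groups once pair 0
// (the whole match) is accounted for.
constexpr int MaxCaptureGroups(int ovecsize)
{
    return MaxSubstringPairs(ovecsize) > 0 ? MaxSubstringPairs(ovecsize) - 1 : 0;
}

static_assert(MaxSubstringPairs(30) == 10, "current size: 10 pairs");
static_assert(MaxSubstringPairs(60) == 20, "doubled size: 20 pairs");
```

Under this reading, the current `mVector[30]` covers the whole match plus 9 capture groups, and doubling it to 60 would cover the whole match plus 19.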
…er of matches and captures.
 */
native bool GetRegexSubString(Handle regex, int str_id, char[] buffer, int maxlen, int match = 0);

stock int SimpleRegexMatch(const char[] str, const char[] pattern, int flags = 0, char[] error="", int maxLen = 0)
It looks like the documentation got deleted for SimpleRegexMatch?
Oops, good catch. I must have deleted it when I deleted the new natives from the non-methodmap part.
I'm not sure how long this has been broken, but MatchRegex currently doesn't return the number of matches. It instead returns 1 + the number of capture group matches. This simply fixes it to work the way the docs currently say it should. Capture group support could be added at some point, but it would require some effort along with new natives.
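The difference between the two counts can be illustrated with a rough analogy. The sketch below uses the C++ standard library's `std::regex` rather than PCRE, so it is not the extension's code, and the function names are hypothetical. It contrasts the sub-match count of a single match (whole match plus capture groups) with the number of times the pattern matches across the subject.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: size of the first match result, i.e. the whole
// match plus one entry per capture group. Returns 0 if nothing matches.
inline std::size_t FirstMatchGroupCount(const std::string &subject, const std::regex &re)
{
    std::smatch m;
    return std::regex_search(subject, m, re) ? m.size() : 0;
}

// Hypothetical helper: how many non-overlapping times the pattern
// matches anywhere in the subject.
inline std::size_t CountAllMatches(const std::string &subject, const std::regex &re)
{
    auto first = std::sregex_iterator(subject.begin(), subject.end(), re);
    auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}
```

For the pattern `([a-z])([0-9])` and the subject `a1 b2 c3 d4`, the first helper yields 3 (one match with two capture groups) while the second yields 4, which is the kind of discrepancy described above.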
Some test outputs.
Test plugin
I'm not exactly sure if this is the best approach. But I am open to all feedback.