[PATCH] regular expression support (was Re: [PATCH xfree86] Signed-off-by: Oleh Nykyforchyn)
Dan Nicholson
dbn.lists at gmail.com
Fri May 27 05:51:13 PDT 2011
On Thu, May 26, 2011 at 10:56 PM, Peter Hutterer
<peter.hutterer at who-t.net> wrote:
> On Wed, May 25, 2011 at 03:08:32PM +0300, Oleh Nykyforchyn wrote:
>> On Wed, 25 May 2011 16:14:42 +1000
>> Peter Hutterer <peter.hutterer at who-t.net> wrote:
>>
>> > I believe | is used frequently in regular expressions, so using it as
>> > decision over regex or simple is dangerous. we can't use slash either, there
>> > are likely a few configurations out there already.
>> > My suggestion is simply prefixing with "re:". I guess there's very few
>> > configurations out there that need exactly that match and it makes it very
>> > clear that this match is a regular expression
>> >
>> > MatchProduct "re:[^ a-z]{4}|(x|y)"
>>
>> Nice, short and intuitive. I modified my patch to use this idea, and the code
>> is simpler and clearer now. A piece of Xorg.0.log and the result is below:
>>
>> [ 2959.284] (II) config/udev: Adding input device Power Button (/dev/input/event2)
>> [ 2959.285] (**) Applying regex ".implicit." to attribute "Radeon"
>> [ 2959.285] (**) Applying regex "^R.d[a-z]*$" to attribute "Radeon"
>> [ 2959.285] (**) Applying regex ".implicit." to attribute "Radeon"
>> [ 2959.285] (**) Applying regex "^R.d[a-z]*$" to attribute "Radeon"
>> [ 2959.285] (**) Power Button: Applying InputClass "Keyboard0"
>>
>> Statements were:
>> MatchLayout "re:.implicit."
>> MatchLayout "re:^R.d[a-z]*$"
>> To tell the truth, I have no idea why each match is repeated twice, but it works.
>>
>> The patch itself:
>> ---------------------------------------
>>
>> Add option to use extended regex expresssions in Match* statements
>>
>> If a pattern list is of the form "re:str", then str is treated
>> as a single extended regular expression
>>
>> Signed-off-by: Oleh Nykyforchyn <oleh.nyk at gmail.com>
>> ---
>> hw/xfree86/common/xf86Xinput.c | 25 +++++++++++++++++++
>> hw/xfree86/parser/InputClass.c | 53 +++++++++++++++++++++++++++++++++-------
>> 2 files changed, 69 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/xfree86/common/xf86Xinput.c b/hw/xfree86/common/xf86Xinput.c
>> index 88cf292..0b90b5b 100644
>> --- a/hw/xfree86/common/xf86Xinput.c
>> +++ b/hw/xfree86/common/xf86Xinput.c
>> @@ -85,6 +85,9 @@
>> #include <ptrveloc.h> /* dix pointer acceleration */
>> #include <xserver-properties.h>
>>
>> +#include <sys/types.h> /* MatchAttrToken */
>> +#include <regex.h>
>> +
>> #ifdef XFreeXDGA
>> #include "dgaproc.h"
>> #endif
>> @@ -509,6 +512,28 @@ MatchAttrToken(const char *attr, struct list *patterns,
>> char * const *cur;
>> Bool match = FALSE;
>>
>> + cur = group->values;
>> + if ((cur[0]) && (!*cur[0]) && /* cur[0] == "", can never come out of strtok() */
>> + (cur[1]) && /* cur[1] == regex */
>> + (cur[2] == NULL)) {
>
> urgh, this seems a bit hacked on. any chance we can have a define, enum,
> special return value so we don't rely on magic value combinations?
>
> patterns is a xf86MatchGroup struct, we could extend this with a enum type {
> MATCH_TYPE_NORMAL, MATCH_TYPE_REGEX } or somesuch.
>
>> + /* Treat this statement as a regex match */
>> + regex_t buf;
>> + int r;
>> + r = regcomp(&buf, cur[1], REG_EXTENDED | REG_NOSUB);
>> + if (r) { /* bad regex */
>> + regfree(&buf);
>> + xf86Msg(X_ERROR, "Wrong regex: \"%s\"\n", cur[1]);
>> + return FALSE;
>> + }
>> + xf86Msg(X_CONFIG, "Applying regex \"%s\" to attribute \"%s\"\n", cur[1], attr);
>> + r = regexec(&buf, attr, 0, NULL, 0);
>> + regfree(&buf);
>> + if (r)
>> + return FALSE;
>> + else
>> + return TRUE;
>> + }
>> +
>> for (cur = group->values; *cur; cur++) {
>> if (**cur == '!') {
>> /*
>> diff --git a/hw/xfree86/parser/InputClass.c b/hw/xfree86/parser/InputClass.c
>> index 3f80170..78a9446 100644
>> --- a/hw/xfree86/parser/InputClass.c
>> +++ b/hw/xfree86/parser/InputClass.c
>> @@ -66,6 +66,39 @@ xf86ConfigSymTabRec InputClassTab[] =
>>
>> #define TOKEN_SEP "|"
>>
>> +/*
>> + * Tokenize a string into a NULL terminated array of strings. The same that
>> + * xstrtokenize unless the string starts with "re:", then the rest of
>> + * the string is treated as a single extended regex, and
>> + * {"", regex, NULL} is returned.
>> + */
>> +static char**
>> +rstrtokenize(const char *str, const char *separators)
>> +{
>> + if (!str)
>> + return NULL;
>> + if (!strncmp(str,"re:",3)){
>> + char **list;
>> +
>> + list = calloc(3, sizeof(*list));
>> + if (!list)
>> + return NULL;
>> + list[0] = strdup("");
>> + list[1] = calloc(strlen(str)-2, sizeof(char));
>> + if (!list[0] || !list[1])
>> + goto error;
>> + strcpy(list[1], str+3);
>
> list[1] = xasprintf("%s", str + 3)
> would be simpler here.
>
> Dan, any comments?
I think at this point I've made myself pretty clear what I think about
this idea. I don't think I need to repeat myself again.
For the actual code, I would agree that extending the xf86MatchGroup
with a type member would be better than strangely manipulating the
array. It's also going to be a big performance drain to compile the
regex every time though the matching code. It would require more
surgery, but it would be better to compile the regexes once at parsing
time. By adding a type member to xf86MatchGroup and making the values
member void**, you can just cast to what's necessary in
MatchAttrToken.
--
Dan
More information about the xorg-devel
mailing list