Thread: regexp character class locale awareness patch
Attached is a pacth against 7.2 which adds locale awareness to the character classes of the regular expression engine. Please consider including this feature to postgreSQL. Regards, Manuel.
Can someone who is multbyte-aware comment on this patch? Thanks. --------------------------------------------------------------------------- Manuel Sugawara wrote: > Attached is a pacth against 7.2 which adds locale awareness to > the character classes of the regular expression engine. Please > consider including this feature to postgreSQL. > > Regards, Manuel. Content-Description: regexp character class locale awareness patch [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 *** src/backend/regex/regcomp.c.org Sun Mar 17 16:39:13 2002 --- src/backend/regex/regcomp.c Sun Mar 17 16:53:43 2002 *************** *** 47,53 **** --- 47,64 ---- #include "regex/regex.h" #include "regex/utils.h" #include "regex/regex2.h" + #ifdef USE_LOCALE + struct cclass + { + char *name; + char *chars; + char *multis; + }; + static struct cclass* cclasses = NULL; + static struct cclass* cclass_init(void); + #else #include "regex/cclass.h" + #endif /* USE_LOCALE */ #include "regex/cname.h" /* *************** *** 174,179 **** --- 185,195 ---- pg_wchar *wcp; #endif + #ifdef USE_LOCALE + if ( cclasses == NULL ) + cclasses = cclass_init(); + #endif /* USE_LOCALE */ + #ifdef REDEBUG #define GOODFLAGS(f) (f) #else *************** *** 884,890 **** struct cclass *cp; size_t len; char *u; ! char c; while (MORE() && pg_isalpha(PEEK())) NEXT(); --- 900,906 ---- struct cclass *cp; size_t len; char *u; ! unsigned char c; while (MORE() && pg_isalpha(PEEK())) NEXT(); *************** *** 905,911 **** u = cp->chars; while ((c = *u++) != '\0') ! CHadd(cs, c); for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) MCadd(p, cs, u); } --- 921,927 ---- u = cp->chars; while ((c = *u++) != '\0') ! CHadd(cs, c); for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) MCadd(p, cs, u); } *************** *** 1716,1718 **** --- 1732,1796 ---- return (islower((unsigned char) c)); #endif } + + #ifdef USE_LOCALE + static struct cclass * + cclass_init(void) + { + struct cclass *cp = NULL; + struct cclass *classes = NULL; + struct cclass_factory + { + char *name; + int (*func)(int); + char *chars; + } cclass_factories [] = + { + { "alnum", isalnum, NULL }, + { "alpha", isalpha, NULL }, + { "blank", NULL, " \t" }, + { "cntrl", iscntrl, NULL }, + { "digit", NULL, "0123456789" }, + { "graph", isgraph, NULL }, + { "lower", islower, NULL }, + { "print", isprint, NULL }, + { "punct", ispunct, NULL }, + { "space", NULL, "\t\n\v\f\r " }, + { "upper", isupper, NULL }, + { "xdigit", isxdigit, NULL }, + { NULL, NULL, NULL } + }; + struct cclass_factory *cf = NULL; + + classes = malloc(sizeof(struct cclass) * (sizeof(cclass_factories) / sizeof(struct cclass_factory))); + if (classes == NULL) + elog(ERROR,"cclass_init: out of memory"); + + cp = classes; + for(cf = cclass_factories; cf->name != NULL; cf++) + { + cp->name = strdup(cf->name); + if ( cf->chars ) + cp->chars = strdup(cf->chars); + else + { + int x = 0, y = 0; + cp->chars = malloc(sizeof(char) * 256); + if (cp->chars == NULL) + elog(ERROR,"cclass_init: out of memory"); + for (x = 0; x < 256; x++) + { + if((cf->func)(x)) + *(cp->chars + y++) = x; + } + *(cp->chars + y) = '\0'; + } + cp->multis = ""; + cp++; + } + cp->name = cp->chars = NULL; + cp->multis = ""; + + return classes; + } + #endif /* USE_LOCALE */
> Can someone who is multbyte-aware comment on this patch? Thanks. I thought the patch is not relevant to multibyte support? -- Tatsuo Ishii
Tatsuo Ishii wrote: > > Can someone who is multbyte-aware comment on this patch? Thanks. > > I thought the patch is not relevant to multibyte support? Sorry, yes, it is for locale. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Whatever you do with this patch, remember that the USE_LOCALE symbol is gone. Bruce Momjian writes: > > Can someone who is multbyte-aware comment on this patch? Thanks. > > --------------------------------------------------------------------------- > > Manuel Sugawara wrote: > > Attached is a pacth against 7.2 which adds locale awareness to > > the character classes of the regular expression engine. Please > > consider including this feature to postgreSQL. -- Peter Eisentraut peter_e@gmx.net
> Whatever you do with this patch, remember that the USE_LOCALE symbol is > gone. I thought we have some way to tern off locale support at the configure time. -- Tatsuo Ishii
Tatsuo Ishii writes: > > Whatever you do with this patch, remember that the USE_LOCALE symbol is > > gone. > > I thought we have some way to tern off locale support at the configure > time. You do it at initdb time now. -- Peter Eisentraut peter_e@gmx.net
> Whatever you do with this patch, remember that the USE_LOCALE symbol is > gone. Then the patches should be modified. -- Tatsuo Ishii
Tatsuo Ishii wrote: > > Whatever you do with this patch, remember that the USE_LOCALE symbol is > > gone. > > Then the patches should be modified. Yes, I am not quite sure how to do that. I will research it unless someone else lends a hand. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian writes: > Tatsuo Ishii wrote: > > > Whatever you do with this patch, remember that the USE_LOCALE symbol is > > > gone. > > > > Then the patches should be modified. > > Yes, I am not quite sure how to do that. I will research it unless > someone else lends a hand. Basically, you manually preprocess the patch to include the USE_LOCALE branch and remove the not USE_LOCALE branch. However, if the no-locale branches have significant performance benefits then it might be worth pondering setting up some optimizations. -- Peter Eisentraut peter_e@gmx.net
According to POSIX -regex (7)-, standard character class are: alnum digit punct alpha graph space blank lower upper cntrl print xdigi Many of that classes are different in different locales, and currently all work as if the localization were C. Many of those tests have multibyte issues, however with the patch postgres will work for one-byte encondings, which is better than nothing. If someone (Tatsuo?) gives some advice I will work in the multibyte version. Peter Eisentraut <peter_e@gmx.net> writes: > > Basically, you manually preprocess the patch to include the > USE_LOCALE branch and remove the not USE_LOCALE branch. Yeah, that should work. You may also remove include/regex/cclass.h since it will not be used any more. > However, if the no-locale branches have significant performance > benefits then it might be worth pondering setting up some > optimizations. This is not the case. Regards, Manuel.
> According to POSIX -regex (7)-, standard character class are: > > alnum digit punct > alpha graph space > blank lower upper > cntrl print xdigi > > Many of that classes are different in different locales, and currently > all work as if the localization were C. Many of those tests have > multibyte issues, however with the patch postgres will work for > one-byte encondings, which is better than nothing. If someone > (Tatsuo?) gives some advice I will work in the multibyte version. I don't think character classes are applicable for most mutibyte encodings. Maybe only the exeception is Unicode? > Peter Eisentraut <peter_e@gmx.net> writes: > > > > Basically, you manually preprocess the patch to include the > > USE_LOCALE branch and remove the not USE_LOCALE branch. > > Yeah, that should work. You may also remove include/regex/cclass.h > since it will not be used any more. But I don't like cclass_init() routine runs every time when reg_comp called. In my understanding the result of cclass_init() is always same. What about running cclass_init() in postmaster, not postgres? Or even better in initdb time? -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > I don't think character classes are applicable for most mutibyte > encodings. Maybe only the exeception is Unicode? Maybe, and is the only one I need ;-) > > > Peter Eisentraut <peter_e@gmx.net> writes: > > > > > > Basically, you manually preprocess the patch to include the > > > USE_LOCALE branch and remove the not USE_LOCALE branch. > > > > Yeah, that should work. You may also remove include/regex/cclass.h > > since it will not be used any more. > > But I don't like cclass_init() routine runs every time when reg_comp > called. Actually it is called once per backend and only if it uses the regular expression engine. > In my understanding the result of cclass_init() is always > same. Yes, if localization does not change. Karel once talked about the possibility of being able to have different locales in the same DB. > What about running cclass_init() in postmaster, not postgres? Or > even better in initdb time? It might be, but ... I think that it would be nice if we leave the door open to the possibility of having mixed locale configurations, across data bases or even across columns of the same table. Regards, Manuel.
Manuel Sugawara wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > > > Basically, you manually preprocess the patch to include the > > USE_LOCALE branch and remove the not USE_LOCALE branch. > > Yeah, that should work. You may also remove include/regex/cclass.h > since it will not be used any more. > > > However, if the no-locale branches have significant performance > > benefits then it might be worth pondering setting up some > > optimizations. > > This is not the case. Here is a patch based on this discussion. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 Index: src/backend/regex/regcomp.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/regex/regcomp.c,v retrieving revision 1.28 diff -c -r1.28 regcomp.c *** src/backend/regex/regcomp.c 28 Oct 2001 06:25:49 -0000 1.28 --- src/backend/regex/regcomp.c 16 Apr 2002 23:12:38 -0000 *************** *** 47,53 **** #include "regex/regex.h" #include "regex/utils.h" #include "regex/regex2.h" ! #include "regex/cclass.h" #include "regex/cname.h" /* --- 47,60 ---- #include "regex/regex.h" #include "regex/utils.h" #include "regex/regex2.h" ! struct cclass ! { ! char *name; ! char *chars; ! char *multis; ! }; ! static struct cclass* cclasses = NULL; ! static struct cclass* cclass_init(void); #include "regex/cname.h" /* *************** *** 174,179 **** --- 181,189 ---- pg_wchar *wcp; #endif + if ( cclasses == NULL ) + cclasses = cclass_init(); + #ifdef REDEBUG #define GOODFLAGS(f) (f) #else *************** *** 884,890 **** struct cclass *cp; size_t len; char *u; ! char c; while (MORE() && pg_isalpha(PEEK())) NEXT(); --- 894,900 ---- struct cclass *cp; size_t len; char *u; ! unsigned char c; while (MORE() && pg_isalpha(PEEK())) NEXT(); *************** *** 905,911 **** u = cp->chars; while ((c = *u++) != '\0') ! CHadd(cs, c); for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) MCadd(p, cs, u); } --- 915,921 ---- u = cp->chars; while ((c = *u++) != '\0') ! CHadd(cs, c); for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) MCadd(p, cs, u); } *************** *** 1715,1718 **** --- 1725,1788 ---- #else return (islower((unsigned char) c)); #endif + } + + static struct cclass * + cclass_init(void) + { + struct cclass *cp = NULL; + struct cclass *classes = NULL; + struct cclass_factory + { + char *name; + int (*func)(int); + char *chars; + } cclass_factories [] = + { + { "alnum", isalnum, NULL }, + { "alpha", isalpha, NULL }, + { "blank", NULL, " \t" }, + { "cntrl", iscntrl, NULL }, + { "digit", NULL, "0123456789" }, + { "graph", isgraph, NULL }, + { "lower", islower, NULL }, + { "print", isprint, NULL }, + { "punct", ispunct, NULL }, + { "space", NULL, "\t\n\v\f\r " }, + { "upper", isupper, NULL }, + { "xdigit", isxdigit, NULL }, + { NULL, NULL, NULL } + }; + struct cclass_factory *cf = NULL; + + classes = malloc(sizeof(struct cclass) * (sizeof(cclass_factories) / sizeof(struct cclass_factory))); + if (classes == NULL) + elog(ERROR,"cclass_init: out of memory"); + + cp = classes; + for(cf = cclass_factories; cf->name != NULL; cf++) + { + cp->name = strdup(cf->name); + if ( cf->chars ) + cp->chars = strdup(cf->chars); + else + { + int x = 0, y = 0; + cp->chars = malloc(sizeof(char) * 256); + if (cp->chars == NULL) + elog(ERROR,"cclass_init: out of memory"); + for (x = 0; x < 256; x++) + { + if((cf->func)(x)) + *(cp->chars + y++) = x; + } + *(cp->chars + y) = '\0'; + } + cp->multis = ""; + cp++; + } + cp->name = cp->chars = NULL; + cp->multis = ""; + + return classes; }
En Tue, 16 Apr 2002 19:21:50 -0400 (EDT) Bruce Momjian <pgman@candle.pha.pa.us> escribió: > Here is a patch based on this discussion. I still think the xdigit class could be handled the same way the digit class is (by enumeration rather than using the isxdigit function). That saves you a cicle, and I don't think there's any loss. -- Alvaro Herrera (<alvherre[a]atentus.com>) "The ability to monopolize a planet is insignificant next to the power of the source"
Your patch has been added to the PostgreSQL unapplied patches list at: http://candle.pha.pa.us/cgi-bin/pgpatches I will try to apply it within the next 48 hours. --------------------------------------------------------------------------- Bruce Momjian wrote: > Manuel Sugawara wrote: > > Peter Eisentraut <peter_e@gmx.net> writes: > > > > > > Basically, you manually preprocess the patch to include the > > > USE_LOCALE branch and remove the not USE_LOCALE branch. > > > > Yeah, that should work. You may also remove include/regex/cclass.h > > since it will not be used any more. > > > > > However, if the no-locale branches have significant performance > > > benefits then it might be worth pondering setting up some > > > optimizations. > > > > This is not the case. > > Here is a patch based on this discussion. > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > Index: src/backend/regex/regcomp.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/regex/regcomp.c,v > retrieving revision 1.28 > diff -c -r1.28 regcomp.c > *** src/backend/regex/regcomp.c 28 Oct 2001 06:25:49 -0000 1.28 > --- src/backend/regex/regcomp.c 16 Apr 2002 23:12:38 -0000 > *************** > *** 47,53 **** > #include "regex/regex.h" > #include "regex/utils.h" > #include "regex/regex2.h" > ! #include "regex/cclass.h" > #include "regex/cname.h" > > /* > --- 47,60 ---- > #include "regex/regex.h" > #include "regex/utils.h" > #include "regex/regex2.h" > ! struct cclass > ! { > ! char *name; > ! char *chars; > ! char *multis; > ! }; > ! static struct cclass* cclasses = NULL; > ! static struct cclass* cclass_init(void); > #include "regex/cname.h" > > /* > *************** > *** 174,179 **** > --- 181,189 ---- > pg_wchar *wcp; > #endif > > + if ( cclasses == NULL ) > + cclasses = cclass_init(); > + > #ifdef REDEBUG > #define GOODFLAGS(f) (f) > #else > *************** > *** 884,890 **** > struct cclass *cp; > size_t len; > char *u; > ! char c; > > while (MORE() && pg_isalpha(PEEK())) > NEXT(); > --- 894,900 ---- > struct cclass *cp; > size_t len; > char *u; > ! unsigned char c; > > while (MORE() && pg_isalpha(PEEK())) > NEXT(); > *************** > *** 905,911 **** > > u = cp->chars; > while ((c = *u++) != '\0') > ! CHadd(cs, c); > for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) > MCadd(p, cs, u); > } > --- 915,921 ---- > > u = cp->chars; > while ((c = *u++) != '\0') > ! CHadd(cs, c); > for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) > MCadd(p, cs, u); > } > *************** > *** 1715,1718 **** > --- 1725,1788 ---- > #else > return (islower((unsigned char) c)); > #endif > + } > + > + static struct cclass * > + cclass_init(void) > + { > + struct cclass *cp = NULL; > + struct cclass *classes = NULL; > + struct cclass_factory > + { > + char *name; > + int (*func)(int); > + char *chars; > + } cclass_factories [] = > + { > + { "alnum", isalnum, NULL }, > + { "alpha", isalpha, NULL }, > + { "blank", NULL, " \t" }, > + { "cntrl", iscntrl, NULL }, > + { "digit", NULL, "0123456789" }, > + { "graph", isgraph, NULL }, > + { "lower", islower, NULL }, > + { "print", isprint, NULL }, > + { "punct", ispunct, NULL }, > + { "space", NULL, "\t\n\v\f\r " }, > + { "upper", isupper, NULL }, > + { "xdigit", isxdigit, NULL }, > + { NULL, NULL, NULL } > + }; > + struct cclass_factory *cf = NULL; > + > + classes = malloc(sizeof(struct cclass) * (sizeof(cclass_factories) / sizeof(struct cclass_factory))); > + if (classes == NULL) > + elog(ERROR,"cclass_init: out of memory"); > + > + cp = classes; > + for(cf = cclass_factories; cf->name != NULL; cf++) > + { > + cp->name = strdup(cf->name); > + if ( cf->chars ) > + cp->chars = strdup(cf->chars); > + else > + { > + int x = 0, y = 0; > + cp->chars = malloc(sizeof(char) * 256); > + if (cp->chars == NULL) > + elog(ERROR,"cclass_init: out of memory"); > + for (x = 0; x < 256; x++) > + { > + if((cf->func)(x)) > + *(cp->chars + y++) = x; > + } > + *(cp->chars + y) = '\0'; > + } > + cp->multis = ""; > + cp++; > + } > + cp->name = cp->chars = NULL; > + cp->multis = ""; > + > + return classes; > } > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
OK, once I apply the original patch, please submit a patch for this and people can comment on it. Thanks. --------------------------------------------------------------------------- Alvaro Herrera wrote: > En Tue, 16 Apr 2002 19:21:50 -0400 (EDT) > Bruce Momjian <pgman@candle.pha.pa.us> escribi?: > > > Here is a patch based on this discussion. > > I still think the xdigit class could be handled the same way the digit > class is (by enumeration rather than using the isxdigit function). That > saves you a cicle, and I don't think there's any loss. > > -- > Alvaro Herrera (<alvherre[a]atentus.com>) > "The ability to monopolize a planet is insignificant > next to the power of the source" > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Alvaro Herrera wrote: > En Tue, 16 Apr 2002 19:21:50 -0400 (EDT) > Bruce Momjian <pgman@candle.pha.pa.us> escribi?: > > > Here is a patch based on this discussion. > > I still think the xdigit class could be handled the same way the digit > class is (by enumeration rather than using the isxdigit function). That > saves you a cicle, and I don't think there's any loss. In fact, I will email you when I apply the original patch. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Alvaro Herrera wrote: > > En Tue, 16 Apr 2002 19:21:50 -0400 (EDT) > > Bruce Momjian <pgman@candle.pha.pa.us> escribi?: > > > > > Here is a patch based on this discussion. > > > > I still think the xdigit class could be handled the same way the digit > > class is (by enumeration rather than using the isxdigit function). That > > saves you a cicle, and I don't think there's any loss. > > In fact, I will email you when I apply the original patch. I miss that case :-(. Here is the pached patch. Regards, Manuel. Index: src/backend/regex/regcomp.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/regex/regcomp.c,v retrieving revision 1.28 diff -c -r1.28 regcomp.c *** src/backend/regex/regcomp.c 28 Oct 2001 06:25:49 -0000 1.28 --- src/backend/regex/regcomp.c 16 Apr 2002 23:12:38 -0000 *************** *** 47,53 **** #include "regex/regex.h" #include "regex/utils.h" #include "regex/regex2.h" ! #include "regex/cclass.h" #include "regex/cname.h" /* --- 47,60 ---- #include "regex/regex.h" #include "regex/utils.h" #include "regex/regex2.h" ! struct cclass ! { ! char *name; ! char *chars; ! char *multis; ! }; ! static struct cclass* cclasses = NULL; ! static struct cclass* cclass_init(void); #include "regex/cname.h" /* *************** *** 174,179 **** --- 181,189 ---- pg_wchar *wcp; #endif + if ( cclasses == NULL ) + cclasses = cclass_init(); + #ifdef REDEBUG #define GOODFLAGS(f) (f) #else *************** *** 884,890 **** struct cclass *cp; size_t len; char *u; ! char c; while (MORE() && pg_isalpha(PEEK())) NEXT(); --- 894,900 ---- struct cclass *cp; size_t len; char *u; ! unsigned char c; while (MORE() && pg_isalpha(PEEK())) NEXT(); *************** *** 905,911 **** u = cp->chars; while ((c = *u++) != '\0') ! CHadd(cs, c); for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) MCadd(p, cs, u); } --- 915,921 ---- u = cp->chars; while ((c = *u++) != '\0') ! CHadd(cs, c); for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) MCadd(p, cs, u); } *************** *** 1715,1718 **** --- 1725,1788 ---- #else return (islower((unsigned char) c)); #endif + } + + static struct cclass * + cclass_init(void) + { + struct cclass *cp = NULL; + struct cclass *classes = NULL; + struct cclass_factory + { + char *name; + int (*func)(int); + char *chars; + } cclass_factories [] = + { + { "alnum", isalnum, NULL }, + { "alpha", isalpha, NULL }, + { "blank", NULL, " \t" }, + { "cntrl", iscntrl, NULL }, + { "digit", NULL, "0123456789" }, + { "graph", isgraph, NULL }, + { "lower", islower, NULL }, + { "print", isprint, NULL }, + { "punct", ispunct, NULL }, + { "space", NULL, "\t\n\v\f\r " }, + { "upper", isupper, NULL }, + { "xdigit",NULL, "abcdefABCDEF0123456789" }, + { NULL, NULL, NULL } + }; + struct cclass_factory *cf = NULL; + + classes = malloc(sizeof(struct cclass) * (sizeof(cclass_factories) / sizeof(struct cclass_factory))); + if (classes == NULL) + elog(ERROR,"cclass_init: out of memory"); + + cp = classes; + for(cf = cclass_factories; cf->name != NULL; cf++) + { + cp->name = strdup(cf->name); + if ( cf->chars ) + cp->chars = strdup(cf->chars); + else + { + int x = 0, y = 0; + cp->chars = malloc(sizeof(char) * 256); + if (cp->chars == NULL) + elog(ERROR,"cclass_init: out of memory"); + for (x = 0; x < 256; x++) + { + if((cf->func)(x)) + *(cp->chars + y++) = x; + } + *(cp->chars + y) = '\0'; + } + cp->multis = ""; + cp++; + } + cp->name = cp->chars = NULL; + cp->multis = ""; + + return classes; }
OK, previous patch removed. This patch has been added to the PostgreSQL unapplied patches list at: http://candle.pha.pa.us/cgi-bin/pgpatches I will try to apply it within the next 48 hours. --------------------------------------------------------------------------- Manuel Sugawara wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Alvaro Herrera wrote: > > > En Tue, 16 Apr 2002 19:21:50 -0400 (EDT) > > > Bruce Momjian <pgman@candle.pha.pa.us> escribi?: > > > > > > > Here is a patch based on this discussion. > > > > > > I still think the xdigit class could be handled the same way the digit > > > class is (by enumeration rather than using the isxdigit function). That > > > saves you a cicle, and I don't think there's any loss. > > > > In fact, I will email you when I apply the original patch. > > I miss that case :-(. Here is the pached patch. > > Regards, > Manuel. > > Index: src/backend/regex/regcomp.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/regex/regcomp.c,v > retrieving revision 1.28 > diff -c -r1.28 regcomp.c > *** src/backend/regex/regcomp.c 28 Oct 2001 06:25:49 -0000 1.28 > --- src/backend/regex/regcomp.c 16 Apr 2002 23:12:38 -0000 > *************** > *** 47,53 **** > #include "regex/regex.h" > #include "regex/utils.h" > #include "regex/regex2.h" > ! #include "regex/cclass.h" > #include "regex/cname.h" > > /* > --- 47,60 ---- > #include "regex/regex.h" > #include "regex/utils.h" > #include "regex/regex2.h" > ! struct cclass > ! { > ! char *name; > ! char *chars; > ! char *multis; > ! }; > ! static struct cclass* cclasses = NULL; > ! static struct cclass* cclass_init(void); > #include "regex/cname.h" > > /* > *************** > *** 174,179 **** > --- 181,189 ---- > pg_wchar *wcp; > #endif > > + if ( cclasses == NULL ) > + cclasses = cclass_init(); > + > #ifdef REDEBUG > #define GOODFLAGS(f) (f) > #else > *************** > *** 884,890 **** > struct cclass *cp; > size_t len; > char *u; > ! char c; > > while (MORE() && pg_isalpha(PEEK())) > NEXT(); > --- 894,900 ---- > struct cclass *cp; > size_t len; > char *u; > ! unsigned char c; > > while (MORE() && pg_isalpha(PEEK())) > NEXT(); > *************** > *** 905,911 **** > > u = cp->chars; > while ((c = *u++) != '\0') > ! CHadd(cs, c); > for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) > MCadd(p, cs, u); > } > --- 915,921 ---- > > u = cp->chars; > while ((c = *u++) != '\0') > ! CHadd(cs, c); > for (u = cp->multis; *u != '\0'; u += strlen(u) + 1) > MCadd(p, cs, u); > } > *************** > *** 1715,1718 **** > --- 1725,1788 ---- > #else > return (islower((unsigned char) c)); > #endif > + } > + > + static struct cclass * > + cclass_init(void) > + { > + struct cclass *cp = NULL; > + struct cclass *classes = NULL; > + struct cclass_factory > + { > + char *name; > + int (*func)(int); > + char *chars; > + } cclass_factories [] = > + { > + { "alnum", isalnum, NULL }, > + { "alpha", isalpha, NULL }, > + { "blank", NULL, " \t" }, > + { "cntrl", iscntrl, NULL }, > + { "digit", NULL, "0123456789" }, > + { "graph", isgraph, NULL }, > + { "lower", islower, NULL }, > + { "print", isprint, NULL }, > + { "punct", ispunct, NULL }, > + { "space", NULL, "\t\n\v\f\r " }, > + { "upper", isupper, NULL }, > + { "xdigit",NULL, "abcdefABCDEF0123456789" }, > + { NULL, NULL, NULL } > + }; > + struct cclass_factory *cf = NULL; > + > + classes = malloc(sizeof(struct cclass) * (sizeof(cclass_factories) / sizeof(struct cclass_factory))); > + if (classes == NULL) > + elog(ERROR,"cclass_init: out of memory"); > + > + cp = classes; > + for(cf = cclass_factories; cf->name != NULL; cf++) > + { > + cp->name = strdup(cf->name); > + if ( cf->chars ) > + cp->chars = strdup(cf->chars); > + else > + { > + int x = 0, y = 0; > + cp->chars = malloc(sizeof(char) * 256); > + if (cp->chars == NULL) > + elog(ERROR,"cclass_init: out of memory"); > + for (x = 0; x < 256; x++) > + { > + if((cf->func)(x)) > + *(cp->chars + y++) = x; > + } > + *(cp->chars + y) = '\0'; > + } > + cp->multis = ""; > + cp++; > + } > + cp->name = cp->chars = NULL; > + cp->multis = ""; > + > + return classes; > } > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> I miss that case :-(. Here is the pached patch. > > Regards, > Manuel. I also suggest that cclass_init() is called only if the locale is not "C". -- Tatsuo Ishii
Tatsuo Ishii wrote: > > I miss that case :-(. Here is the pached patch. > > > > Regards, > > Manuel. > > I also suggest that cclass_init() is called only if the locale is not > "C". OK, patch on hold while this is addressed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tatsuo Ishii wrote: > > > I miss that case :-(. Here is the pached patch. > > > > > > Regards, > > > Manuel. > > > > I also suggest that cclass_init() is called only if the locale is not > > "C". > > OK, patch on hold while this is addressed. Here is a patch which addresses Tatsuo's concerns (it does return an static struct instead of constructing it). Regards, Manuel.
Attachment
Your patch has been added to the PostgreSQL unapplied patches list at: http://candle.pha.pa.us/cgi-bin/pgpatches I will try to apply it within the next 48 hours. --------------------------------------------------------------------------- Manuel Sugawara wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Tatsuo Ishii wrote: > > > > I miss that case :-(. Here is the pached patch. > > > > > > > > Regards, > > > > Manuel. > > > > > > I also suggest that cclass_init() is called only if the locale is not > > > "C". > > > > OK, patch on hold while this is addressed. > > Here is a patch which addresses Tatsuo's concerns (it does return an > static struct instead of constructing it). > > Regards, > Manuel. > [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
En 17 Apr 2002 22:53:32 -0600 Manuel Sugawara <masm@fciencias.unam.mx> escribió: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Tatsuo Ishii wrote: > > > > I miss that case :-(. Here is the pached patch. > > > > > > > > Regards, > > > > Manuel. > > > > > > I also suggest that cclass_init() is called only if the locale is not > > > "C". > > > > OK, patch on hold while this is addressed. > > Here is a patch which addresses Tatsuo's concerns (it does return an > static struct instead of constructing it). Is there a reason to use "" instead of NULL in the "multis" member of that static struct? -- Alvaro Herrera (<alvherre[a]atentus.com>) "La virtud es el justo medio entre dos defectos" (Aristoteles)
Alvaro Herrera <alvherre@atentus.com> writes: > En 17 Apr 2002 22:53:32 -0600 > Manuel Sugawara <masm@fciencias.unam.mx> escribió: > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > > > Tatsuo Ishii wrote: > > > > > I miss that case :-(. Here is the pached patch. > > > > > > > > > > Regards, > > > > > Manuel. > > > > > > > > I also suggest that cclass_init() is called only if the locale is not > > > > "C". > > > > > > OK, patch on hold while this is addressed. > > > > Here is a patch which addresses Tatsuo's concerns (it does return an > > static struct instead of constructing it). > > Is there a reason to use "" instead of NULL in the "multis" member of > that static struct? Yes, read the code. Regards, Manuel.