Thread: bytea, index and like operator again and detailed report
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, while changing a column from base255 encoded text (all except null byte) to bytea, I found the following bug in Postgresql's LIKE operator with indexes (it follows a more detailed description then my old mails in -bugs and - -general, including the proof of the bug): The index condition in the query plan for "where bytea_column like 'a%'" is: Index Cond: (bytea_col >= 'a'::bytea) AND (bytea_col < 'b'::bytea)) Filter: (bcol ~~ 'a%'::bytea) This is correct. The index condition in the query plan for "bytea_column like '\\141%'" ("a" in octal is 141) is exaclty the same, including filter condition. Index Cond: ((bcol >= 'a'::bytea) AND (bcol < 'b'::bytea)) Filter: (bcol ~~ 'a%'::bytea) This is also correct. The index condition in the query plan for "bytea_column like '\\001%'" is: Index Cond: (bcol = '0'::bytea) Filter: (bcol ~~ '\\001%'::bytea) THIS IS WRONG! Isn't it? If the byte is displayable in ASCII, then all is OK. If not, it seems that Postgres takes the first character of the octal number and uses this as comparison parameter. With "ä" (344) it takes "3" ... When index scan is disabled or from other reasons seqscan is used, the query plan and the result is correct. The result differs, if index is used or not used. I guess there is too much conversion between different character sets etc. A piece of test SQL and the results are attached. My Version is: PostgreSQL 7.4 on i386-portbld-freebsd4.8, compiled by GCC 2.95.4 The same was with 7.3.4 Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/z6YbOndlH63J86wRAr+qAKCo6yi3/0HGO13IkKP2KbyH147kMACeKq7T WEKPu3dNKnesLqQUd9puyh0= =Sivh -----END PGP SIGNATURE-----
Attachment
Alvar Freude wrote: > while changing a column from base255 encoded text (all except null byte) to > bytea, I found the following bug in Postgresql's LIKE operator with indexes > (it follows a more detailed description then my old mails in -bugs and > - -general, including the proof of the bug): > Apparently you never read my reply to you all the way to the bottom. I said: Joe Conway wrote:> Alvar Freude wrote:>> select * from test where b like '\001%';> This is weird. I'm sure it worked atone time -- will research.>> Joe I'm actively working on your issue. Please quit spamming all the lists with it. Joe
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, - -- Joe Conway <mail@joeconway.com> wrote: > Apparently you never read my reply to you all the way to the bottom. I > said: oh, sorry, I understand your mail wrong! I understand it in this way, that you are not sure that there is something buggy, so I looked more deeply in this issue. > I'm actively working on your issue. Please quit spamming all the lists > with it. I'm sorry, but after reading the desciption of "bugs" again I realised that according to the list desciption "bugs" is the wrong one und "hackers" the right; so, after debugging my code two days I'm now sure that there is something wrong with Postgres. Normally such Problems are in my Application ;-) If there is no answer like "Ah, it really seems that there is a bug, thanks for the pointer, I'll look for it" I every time guess that the issue depends on a mistake by myself or it is forgotten ... ;-) Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/z7Q5OndlH63J86wRAvrHAKDLAEz7X8ZeGah0CvL9QmglVMZrfwCdHAGr H17Kp6qy65jj32lBvsC/9zY= =b1l7 -----END PGP SIGNATURE-----
Alvar Freude wrote: > I'm sorry, but after reading the desciption of "bugs" again I realised that > according to the list desciption "bugs" is the wrong one und "hackers" the > right; so, after debugging my code two days I'm now sure that there is > something wrong with Postgres. Normally such Problems are in my Application > ;-) Actually "bugs" is the best place to post bug reports if you think they might be in Postgres and not your app. > If there is no answer like "Ah, it really seems that there is a bug, thanks > for the pointer, I'll look for it" I every time guess that the issue > depends on a mistake by myself or it is forgotten ... ;-) OK -- I guess I could have been more clear myself. Anyway I've been working on this on-and-off for a day now. I understand the root cause, but am still trying to figure out what the "right" solution is. It may take another day or so due to other things I have going on (like my job ;-)). Joe
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, - -- Joe Conway <mail@joeconway.com> wrote: > I understand the root cause, > but am still trying to figure out what the "right" solution is. It may > take another day or so due to other things I have going on (like my job > ;-)). :-) If you need some testing data, I can give you a dump ... > Actually "bugs" is the best place to post bug reports if you think they > might be in Postgres and not your app. perhaps someone should change the list descriptions at http://www.postgresql.org/lists.html pgsql-bugs If you a find a bug, please fill out the form typically located in the PostgreSQL source code at src/pgsql/doc/bug.templateand mail it to this mailing list. [...] pgsql-hackers (Developers List) [...] This list is for the discussion of current development issues, problems and bugsand the discussion of proposed new features. [...] ;-) So, my idea was: Uups, it was the wrong group, therefore no real answer ;) Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/z7giOndlH63J86wRAmBwAJ46KjKlahE8vYcVG33lOsmi2sGqiACgluPn 8CE/HqK+OqYuGQONXnBszP0= =5+rU -----END PGP SIGNATURE-----
Alvar Freude wrote: > while changing a column from base255 encoded text (all except null byte) to > bytea, I found the following bug in Postgresql's LIKE operator with indexes > (it follows a more detailed description then my old mails in -bugs and > - -general, including the proof of the bug): Please try the attached patch and let me know how it works for you. It is against cvs HEAD, but should apply OK to 7.4. Thanks, Joe Index: src/backend/utils/adt/selfuncs.c =================================================================== RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/selfuncs.c,v retrieving revision 1.149 diff -c -r1.149 selfuncs.c *** src/backend/utils/adt/selfuncs.c 29 Nov 2003 19:51:59 -0000 1.149 --- src/backend/utils/adt/selfuncs.c 5 Dec 2003 19:42:39 -0000 *************** *** 184,189 **** --- 184,190 ---- static Selectivity pattern_selectivity(Const *patt, Pattern_Type ptype); static Datum string_to_datum(const char *str, Oid datatype); static Const *string_to_const(const char *str, Oid datatype); + static Const *string_to_bytea_const(const char *str, size_t str_len); /* *************** *** 3135,3154 **** } else { ! patt = DatumGetCString(DirectFunctionCall1(byteaout, patt_const->constvalue)); ! pattlen = toast_raw_datum_size(patt_const->constvalue) - VARHDRSZ; } match = palloc(pattlen + 1); match_pos = 0; - for (pos = 0; pos < pattlen; pos++) { /* % and _ are wildcard characters in LIKE */ if (patt[pos] == '%' || patt[pos] == '_') break; ! /* Backslash quotes the next character */ if (patt[pos] == '\\') { pos++; --- 3136,3166 ---- } else { ! bytea *bstr = DatumGetByteaP(patt_const->constvalue); ! ! pattlen = VARSIZE(bstr) - VARHDRSZ; ! if (pattlen > 0) ! { ! patt = (char *) palloc(pattlen); ! memcpy(patt, VARDATA(bstr), pattlen); ! } ! else ! patt = NULL; ! ! if ((Pointer) bstr != DatumGetPointer(patt_const->constvalue)) ! pfree(bstr); } match = palloc(pattlen + 1); match_pos = 0; for (pos = 0; pos < pattlen; pos++) { /* % and _ are wildcard characters in LIKE */ if (patt[pos] == '%' || patt[pos] == '_') break; ! ! /* Backslash escapes the next character */ if (patt[pos] == '\\') { pos++; *************** *** 3174,3183 **** match[match_pos] = '\0'; rest = &patt[pos]; ! *prefix_const = string_to_const(match, typeid); ! *rest_const = string_to_const(rest, typeid); ! pfree(patt); pfree(match); /* in LIKE, an empty pattern is an exact match! */ --- 3186,3204 ---- match[match_pos] = '\0'; rest = &patt[pos]; ! if (typeid != BYTEAOID) ! { ! *prefix_const = string_to_const(match, typeid); ! *rest_const = string_to_const(rest, typeid); ! } ! else ! { ! *prefix_const = string_to_bytea_const(match, match_pos); ! *rest_const = string_to_bytea_const(rest, pattlen - match_pos); ! } ! if (patt != NULL) ! pfree(patt); pfree(match); /* in LIKE, an empty pattern is an exact match! */ *************** *** 3500,3508 **** } else { ! patt = DatumGetCString(DirectFunctionCall1(byteaout, patt_const->constvalue)); ! pattlen = toast_raw_datum_size(patt_const->constvalue) - VARHDRSZ; } /* Skip any leading %; it's already factored into initial sel */ start = (*patt == '%') ? 1 : 0; --- 3521,3542 ---- } else { ! bytea *bstr = DatumGetByteaP(patt_const->constvalue); ! ! pattlen = VARSIZE(bstr) - VARHDRSZ; ! if (pattlen > 0) ! { ! patt = (char *) palloc(pattlen); ! memcpy(patt, VARDATA(bstr), pattlen); ! } ! else ! patt = NULL; ! ! if ((Pointer) bstr != DatumGetPointer(patt_const->constvalue)) ! pfree(bstr); } + /* patt should never be NULL in practice */ + Assert(patt != NULL); /* Skip any leading %; it's already factored into initial sel */ start = (*patt == '%') ? 1 : 0; *************** *** 3693,3700 **** /* * Try to generate a string greater than the given string or any ! * string it is a prefix of. If successful, return a palloc'd string; ! * else return NULL. * * The key requirement here is that given a prefix string, say "foo", * we must be able to generate another string "fop" that is greater --- 3727,3734 ---- /* * Try to generate a string greater than the given string or any ! * string it is a prefix of. If successful, return a palloc'd string ! * in the form of a Const pointer; else return NULL. * * The key requirement here is that given a prefix string, say "foo", * we must be able to generate another string "fop" that is greater *************** *** 3712,3738 **** make_greater_string(const Const *str_const) { Oid datatype = str_const->consttype; - char *str; char *workstr; int len; /* Get the string and a modifiable copy */ if (datatype == NAMEOID) { ! str = DatumGetCString(DirectFunctionCall1(nameout, str_const->constvalue)); ! len = strlen(str); } else if (datatype == BYTEAOID) { ! str = DatumGetCString(DirectFunctionCall1(byteaout, str_const->constvalue)); ! len = toast_raw_datum_size(str_const->constvalue) - VARHDRSZ; } else { ! str = DatumGetCString(DirectFunctionCall1(textout, str_const->constvalue)); ! len = strlen(str); } - workstr = pstrdup(str); while (len > 0) { --- 3746,3783 ---- make_greater_string(const Const *str_const) { Oid datatype = str_const->consttype; char *workstr; int len; /* Get the string and a modifiable copy */ if (datatype == NAMEOID) { ! workstr = DatumGetCString(DirectFunctionCall1(nameout, ! str_const->constvalue)); ! len = strlen(workstr); } else if (datatype == BYTEAOID) { ! bytea *bstr = DatumGetByteaP(str_const->constvalue); ! ! len = VARSIZE(bstr) - VARHDRSZ; ! if (len > 0) ! { ! workstr = (char *) palloc(len); ! memcpy(workstr, VARDATA(bstr), len); ! } ! else ! workstr = NULL; ! ! if ((Pointer) bstr != DatumGetPointer(str_const->constvalue)) ! pfree(bstr); } else { ! workstr = DatumGetCString(DirectFunctionCall1(textout, ! str_const->constvalue)); ! len = strlen(workstr); } while (len > 0) { *************** *** 3747,3755 **** Const *workstr_const; (*lastchar)++; ! workstr_const = string_to_const(workstr, datatype); - pfree(str); pfree(workstr); return workstr_const; } --- 3792,3802 ---- Const *workstr_const; (*lastchar)++; ! if (datatype != BYTEAOID) ! workstr_const = string_to_const(workstr, datatype); ! else ! workstr_const = string_to_bytea_const(workstr, len); pfree(workstr); return workstr_const; } *************** *** 3771,3778 **** } /* Failed... */ ! pfree(str); ! pfree(workstr); return (Const *) NULL; } --- 3818,3825 ---- } /* Failed... */ ! if (workstr != NULL) ! pfree(workstr); return (Const *) NULL; } *************** *** 3809,3814 **** --- 3856,3877 ---- return makeConst(datatype, ((datatype == NAMEOID) ? NAMEDATALEN : -1), conval, false, false); + } + + /* + * Generate a Const node of bytea type from a binary C string and a length. + */ + static Const * + string_to_bytea_const(const char *str, size_t str_len) + { + bytea *bstr = palloc(VARHDRSZ + str_len); + Datum conval; + + memcpy(VARDATA(bstr), str, str_len); + VARATT_SIZEP(bstr) = VARHDRSZ + str_len; + conval = PointerGetDatum(bstr); + + return makeConst(BYTEAOID, -1, conval, false, false); } /*-------------------------------------------------------------------------
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Joe, - -- Joe Conway <mail@joeconway.com> wrote: > Please try the attached patch and let me know how it works for you. It is > against cvs HEAD, but should apply OK to 7.4. has this about one week time? I travel on monday to Geneva (World Summit on the Information Society) and have a lot to prepare ... :-( I'll make a report when I have some minutes (or hours ;) ) ... Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/0RU8OndlH63J86wRAkKIAKDA6iRUyFg3qm7GRrUs9OnlhozVKwCfSVmk jI5lGQ0vQLj22qsMCTLpn4I= =au3Z -----END PGP SIGNATURE-----
Alvar Freude wrote: > has this about one week time? I travel on monday to Geneva (World Summit on > the Information Society) and have a lot to prepare ... :-( > > I'll make a report when I have some minutes (or hours ;) ) ... Well, 7.4.1 will be bundled up for release on Sunday, so it would be ideal to get some feedback sooner if possible. Joe
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - -- Joe Conway <mail@joeconway.com> wrote: > Well, 7.4.1 will be bundled up for release on Sunday, so it would be > ideal to get some feedback sooner if possible. this is a good argument ... ;) I'll try to check it ... Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/0S1jOndlH63J86wRAkMiAJ4xa9t4OR+Oif/IvUEMgZ5Yl52lpwCgnboU bIyeU+IwfK3C34JltRmBuoU= =zHKC -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, - -- Joe Conway <mail@joeconway.com> wrote: > Please try the attached patch and let me know how it works for you. It is > against cvs HEAD, but should apply OK to 7.4. so, I checked it with my database. It looks good, all checks I made are OK. I realised, that converting my text column to bytea is much faster then before: UPDATE forum_gtree SET gid2 = (decode(replace(gid, '\\', '\\\\'), 'escape')); This may be because some other side effects (but it's the same machine and same disc, same partition), but also perhaps some changes in the patch ... MANY THANKS! When I'm back from WSIS I make the next try to change my application from Base255-text to bytea ;-) Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/0Tg6OndlH63J86wRAiLfAJ98wLJeeyEuFXG07Pa7ba/jM7LzxQCgzWa1 RGv3FtQ8JP/MZjCWuqbdnsY= =U9Ng -----END PGP SIGNATURE-----