Thread: bytea, index and like operator again and detailed report

bytea, index and like operator again and detailed report

From
Alvar Freude
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

while changing a column from base255 encoded text (all except null byte) to
bytea, I found the following bug in Postgresql's LIKE operator with indexes
(it follows a more detailed description then my old mails in -bugs and
- -general, including the proof of the bug):


The index condition in the query plan for "where bytea_column like 'a%'" is:

   Index Cond: (bytea_col >= 'a'::bytea) AND (bytea_col < 'b'::bytea))
   Filter: (bcol ~~ 'a%'::bytea)

This is correct.


The index condition in the query plan for "bytea_column like '\\141%'" ("a"
in octal is 141) is exaclty the same, including filter condition.

   Index Cond: ((bcol >= 'a'::bytea) AND (bcol < 'b'::bytea))
   Filter: (bcol ~~ 'a%'::bytea)

This is also correct.


The index condition in the query plan for "bytea_column like '\\001%'" is:

   Index Cond: (bcol = '0'::bytea)
   Filter: (bcol ~~ '\\001%'::bytea)


THIS IS WRONG! Isn't it?


If the byte is displayable in ASCII, then all is OK. If not, it seems that
Postgres takes the first character of the octal number and uses this as
comparison parameter.
With "ä" (344) it takes "3" ...


When index scan is disabled or from other reasons seqscan is used, the
query plan and the result is correct.

The result differs, if index is used or not used.

I guess there is too much conversion between different character sets etc.


A piece of test SQL and the results are attached.

My Version is:
PostgreSQL 7.4 on i386-portbld-freebsd4.8, compiled by GCC 2.95.4

The same was with 7.3.4


Ciao
  Alvar


- --
** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/
**   Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
**   ODEM.org-Tour: http://tour.odem.org/
**   Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/z6YbOndlH63J86wRAr+qAKCo6yi3/0HGO13IkKP2KbyH147kMACeKq7T
WEKPu3dNKnesLqQUd9puyh0=
=Sivh
-----END PGP SIGNATURE-----

Attachment

Re: bytea, index and like operator again and detailed report

From
Joe Conway
Date:
Alvar Freude wrote:
> while changing a column from base255 encoded text (all except null byte) to
> bytea, I found the following bug in Postgresql's LIKE operator with indexes
> (it follows a more detailed description then my old mails in -bugs and
> - -general, including the proof of the bug):
> 

Apparently you never read my reply to you all the way to the bottom. I said:

Joe Conway wrote:> Alvar Freude wrote:>>   select * from test where b like '\001%';> This is weird. I'm sure it worked
atone time -- will research.>> Joe
 

I'm actively working on your issue. Please quit spamming all the lists 
with it.

Joe




Re: bytea, index and like operator again and detailed

From
Alvar Freude
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

- -- Joe Conway <mail@joeconway.com> wrote:

> Apparently you never read my reply to you all the way to the bottom. I
> said:

oh, sorry, I understand your mail wrong!
I understand it in this way, that you are not sure that there is something
buggy, so I looked more deeply in this issue. 

> I'm actively working on your issue. Please quit spamming all the lists
> with it.

I'm sorry, but after reading the desciption of "bugs" again I realised that
according to the list desciption "bugs" is the wrong one und "hackers" the
right; so, after debugging my code two days I'm now sure that there is
something wrong with Postgres. Normally such Problems are in my Application
;-)

If there is no answer like "Ah, it really seems that there is a bug, thanks
for the pointer, I'll look for it" I every time guess that the issue
depends on a mistake by myself or it is forgotten ... ;-)


Ciao Alvar

- -- 
** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/
**   Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
**   ODEM.org-Tour: http://tour.odem.org/
**   Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/z7Q5OndlH63J86wRAvrHAKDLAEz7X8ZeGah0CvL9QmglVMZrfwCdHAGr
H17Kp6qy65jj32lBvsC/9zY=
=b1l7
-----END PGP SIGNATURE-----



Re: bytea, index and like operator again and detailed report

From
Joe Conway
Date:
Alvar Freude wrote:
> I'm sorry, but after reading the desciption of "bugs" again I realised that
> according to the list desciption "bugs" is the wrong one und "hackers" the
> right; so, after debugging my code two days I'm now sure that there is
> something wrong with Postgres. Normally such Problems are in my Application
> ;-)

Actually "bugs" is the best place to post bug reports if you think they 
might be in Postgres and not your app.

> If there is no answer like "Ah, it really seems that there is a bug, thanks
> for the pointer, I'll look for it" I every time guess that the issue
> depends on a mistake by myself or it is forgotten ... ;-)

OK -- I guess I could have been more clear myself. Anyway I've been 
working on this on-and-off for a day now. I understand the root cause, 
but am still trying to figure out what the "right" solution is. It may 
take another day or so due to other things I have going on (like my job 
;-)).

Joe




Re: bytea, index and like operator again and detailed

From
Alvar Freude
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

- -- Joe Conway <mail@joeconway.com> wrote:

> I understand the root cause,
> but am still trying to figure out what the "right" solution is. It may
> take another day or so due to other things I have going on (like my job
> ;-)).

:-)

If you need some testing data, I can give you a dump ...


> Actually "bugs" is the best place to post bug reports if you think they
> might be in Postgres and not your app.

perhaps someone should change the list descriptions at
 http://www.postgresql.org/lists.html

 pgsql-bugs If you a find a bug, please fill out the form typically located  in the PostgreSQL source code at
src/pgsql/doc/bug.templateand  mail it to this mailing list.
 
 [...]
 pgsql-hackers (Developers List) [...] This list is for the discussion of current development issues,  problems and
bugsand the discussion of proposed new features.
 
 [...]

;-)


So, my idea was: Uups, it was the wrong group, therefore no real answer ;)

Ciao Alvar

- -- 
** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/
**   Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
**   ODEM.org-Tour: http://tour.odem.org/
**   Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/z7giOndlH63J86wRAmBwAJ46KjKlahE8vYcVG33lOsmi2sGqiACgluPn
8CE/HqK+OqYuGQONXnBszP0=
=5+rU
-----END PGP SIGNATURE-----



Re: bytea, index and like operator again and detailed report

From
Joe Conway
Date:
Alvar Freude wrote:
> while changing a column from base255 encoded text (all except null byte) to
> bytea, I found the following bug in Postgresql's LIKE operator with indexes
> (it follows a more detailed description then my old mails in -bugs and
> - -general, including the proof of the bug):

Please try the attached patch and let me know how it works for you. It
is against cvs HEAD, but should apply OK to 7.4.

Thanks,

Joe

Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.149
diff -c -r1.149 selfuncs.c
*** src/backend/utils/adt/selfuncs.c    29 Nov 2003 19:51:59 -0000    1.149
--- src/backend/utils/adt/selfuncs.c    5 Dec 2003 19:42:39 -0000
***************
*** 184,189 ****
--- 184,190 ----
  static Selectivity pattern_selectivity(Const *patt, Pattern_Type ptype);
  static Datum string_to_datum(const char *str, Oid datatype);
  static Const *string_to_const(const char *str, Oid datatype);
+ static Const *string_to_bytea_const(const char *str, size_t str_len);


  /*
***************
*** 3135,3154 ****
      }
      else
      {
!         patt = DatumGetCString(DirectFunctionCall1(byteaout, patt_const->constvalue));
!         pattlen = toast_raw_datum_size(patt_const->constvalue) - VARHDRSZ;
      }

      match = palloc(pattlen + 1);
      match_pos = 0;
-
      for (pos = 0; pos < pattlen; pos++)
      {
          /* % and _ are wildcard characters in LIKE */
          if (patt[pos] == '%' ||
              patt[pos] == '_')
              break;
!         /* Backslash quotes the next character */
          if (patt[pos] == '\\')
          {
              pos++;
--- 3136,3166 ----
      }
      else
      {
!         bytea   *bstr = DatumGetByteaP(patt_const->constvalue);
!
!         pattlen = VARSIZE(bstr) - VARHDRSZ;
!         if (pattlen > 0)
!         {
!             patt = (char *) palloc(pattlen);
!             memcpy(patt, VARDATA(bstr), pattlen);
!         }
!         else
!             patt = NULL;
!
!         if ((Pointer) bstr != DatumGetPointer(patt_const->constvalue))
!             pfree(bstr);
      }

      match = palloc(pattlen + 1);
      match_pos = 0;
      for (pos = 0; pos < pattlen; pos++)
      {
          /* % and _ are wildcard characters in LIKE */
          if (patt[pos] == '%' ||
              patt[pos] == '_')
              break;
!
!         /* Backslash escapes the next character */
          if (patt[pos] == '\\')
          {
              pos++;
***************
*** 3174,3183 ****
      match[match_pos] = '\0';
      rest = &patt[pos];

!     *prefix_const = string_to_const(match, typeid);
!     *rest_const = string_to_const(rest, typeid);

!     pfree(patt);
      pfree(match);

      /* in LIKE, an empty pattern is an exact match! */
--- 3186,3204 ----
      match[match_pos] = '\0';
      rest = &patt[pos];

!     if (typeid != BYTEAOID)
!     {
!         *prefix_const = string_to_const(match, typeid);
!         *rest_const = string_to_const(rest, typeid);
!     }
!     else
!     {
!         *prefix_const = string_to_bytea_const(match, match_pos);
!         *rest_const = string_to_bytea_const(rest, pattlen - match_pos);
!     }

!     if (patt != NULL)
!         pfree(patt);
      pfree(match);

      /* in LIKE, an empty pattern is an exact match! */
***************
*** 3500,3508 ****
      }
      else
      {
!         patt = DatumGetCString(DirectFunctionCall1(byteaout, patt_const->constvalue));
!         pattlen = toast_raw_datum_size(patt_const->constvalue) - VARHDRSZ;
      }

      /* Skip any leading %; it's already factored into initial sel */
      start = (*patt == '%') ? 1 : 0;
--- 3521,3542 ----
      }
      else
      {
!         bytea   *bstr = DatumGetByteaP(patt_const->constvalue);
!
!         pattlen = VARSIZE(bstr) - VARHDRSZ;
!         if (pattlen > 0)
!         {
!             patt = (char *) palloc(pattlen);
!             memcpy(patt, VARDATA(bstr), pattlen);
!         }
!         else
!             patt = NULL;
!
!         if ((Pointer) bstr != DatumGetPointer(patt_const->constvalue))
!             pfree(bstr);
      }
+     /* patt should never be NULL in practice */
+     Assert(patt != NULL);

      /* Skip any leading %; it's already factored into initial sel */
      start = (*patt == '%') ? 1 : 0;
***************
*** 3693,3700 ****

  /*
   * Try to generate a string greater than the given string or any
!  * string it is a prefix of.  If successful, return a palloc'd string;
!  * else return NULL.
   *
   * The key requirement here is that given a prefix string, say "foo",
   * we must be able to generate another string "fop" that is greater
--- 3727,3734 ----

  /*
   * Try to generate a string greater than the given string or any
!  * string it is a prefix of.  If successful, return a palloc'd string
!  * in the form of a Const pointer; else return NULL.
   *
   * The key requirement here is that given a prefix string, say "foo",
   * we must be able to generate another string "fop" that is greater
***************
*** 3712,3738 ****
  make_greater_string(const Const *str_const)
  {
      Oid            datatype = str_const->consttype;
-     char       *str;
      char       *workstr;
      int            len;

      /* Get the string and a modifiable copy */
      if (datatype == NAMEOID)
      {
!         str = DatumGetCString(DirectFunctionCall1(nameout, str_const->constvalue));
!         len = strlen(str);
      }
      else if (datatype == BYTEAOID)
      {
!         str = DatumGetCString(DirectFunctionCall1(byteaout, str_const->constvalue));
!         len = toast_raw_datum_size(str_const->constvalue) - VARHDRSZ;
      }
      else
      {
!         str = DatumGetCString(DirectFunctionCall1(textout, str_const->constvalue));
!         len = strlen(str);
      }
-     workstr = pstrdup(str);

      while (len > 0)
      {
--- 3746,3783 ----
  make_greater_string(const Const *str_const)
  {
      Oid            datatype = str_const->consttype;
      char       *workstr;
      int            len;

      /* Get the string and a modifiable copy */
      if (datatype == NAMEOID)
      {
!         workstr = DatumGetCString(DirectFunctionCall1(nameout,
!                                                       str_const->constvalue));
!         len = strlen(workstr);
      }
      else if (datatype == BYTEAOID)
      {
!         bytea   *bstr = DatumGetByteaP(str_const->constvalue);
!
!         len = VARSIZE(bstr) - VARHDRSZ;
!         if (len > 0)
!         {
!             workstr = (char *) palloc(len);
!             memcpy(workstr, VARDATA(bstr), len);
!         }
!         else
!             workstr = NULL;
!
!         if ((Pointer) bstr != DatumGetPointer(str_const->constvalue))
!             pfree(bstr);
      }
      else
      {
!         workstr = DatumGetCString(DirectFunctionCall1(textout,
!                                                       str_const->constvalue));
!         len = strlen(workstr);
      }

      while (len > 0)
      {
***************
*** 3747,3755 ****
              Const       *workstr_const;

              (*lastchar)++;
!             workstr_const = string_to_const(workstr, datatype);

-             pfree(str);
              pfree(workstr);
              return workstr_const;
          }
--- 3792,3802 ----
              Const       *workstr_const;

              (*lastchar)++;
!             if (datatype != BYTEAOID)
!                 workstr_const = string_to_const(workstr, datatype);
!             else
!                 workstr_const = string_to_bytea_const(workstr, len);

              pfree(workstr);
              return workstr_const;
          }
***************
*** 3771,3778 ****
      }

      /* Failed... */
!     pfree(str);
!     pfree(workstr);

      return (Const *) NULL;
  }
--- 3818,3825 ----
      }

      /* Failed... */
!     if (workstr != NULL)
!         pfree(workstr);

      return (Const *) NULL;
  }
***************
*** 3809,3814 ****
--- 3856,3877 ----

      return makeConst(datatype, ((datatype == NAMEOID) ? NAMEDATALEN : -1),
                       conval, false, false);
+ }
+
+ /*
+  * Generate a Const node of bytea type from a binary C string and a length.
+  */
+ static Const *
+ string_to_bytea_const(const char *str, size_t str_len)
+ {
+     bytea  *bstr = palloc(VARHDRSZ + str_len);
+     Datum    conval;
+
+     memcpy(VARDATA(bstr), str, str_len);
+     VARATT_SIZEP(bstr) = VARHDRSZ + str_len;
+     conval = PointerGetDatum(bstr);
+
+     return makeConst(BYTEAOID, -1, conval, false, false);
  }

  /*-------------------------------------------------------------------------

Re: bytea, index and like operator again and detailed

From
Alvar Freude
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Joe,

- -- Joe Conway <mail@joeconway.com> wrote:

> Please try the attached patch and let me know how it works for you. It is
> against cvs HEAD, but should apply OK to 7.4.

has this about one week time? I travel on monday to Geneva (World Summit on
the Information Society) and have a lot to prepare ... :-(


I'll make a report when I have some minutes (or hours ;) ) ...


Ciao Alvar


- -- 
** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/
**   Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
**   ODEM.org-Tour: http://tour.odem.org/
**   Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/0RU8OndlH63J86wRAkKIAKDA6iRUyFg3qm7GRrUs9OnlhozVKwCfSVmk
jI5lGQ0vQLj22qsMCTLpn4I=
=au3Z
-----END PGP SIGNATURE-----



Re: bytea, index and like operator again and detailed report

From
Joe Conway
Date:
Alvar Freude wrote:
> has this about one week time? I travel on monday to Geneva (World Summit on
> the Information Society) and have a lot to prepare ... :-(
> 
> I'll make a report when I have some minutes (or hours ;) ) ...

Well, 7.4.1 will be bundled up for release on Sunday, so it would be 
ideal to get some feedback sooner if possible.

Joe



Re: bytea, index and like operator again and detailed

From
Alvar Freude
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



- -- Joe Conway <mail@joeconway.com> wrote:

> Well, 7.4.1 will be bundled up for release on Sunday, so it would be
> ideal to get some feedback sooner if possible.

this is a good argument ... ;)

I'll try to check it ...


Ciao Alvar

- -- 
** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/
**   Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
**   ODEM.org-Tour: http://tour.odem.org/
**   Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/0S1jOndlH63J86wRAkMiAJ4xa9t4OR+Oif/IvUEMgZ5Yl52lpwCgnboU
bIyeU+IwfK3C34JltRmBuoU=
=zHKC
-----END PGP SIGNATURE-----



Re: bytea, index and like operator again and detailed

From
Alvar Freude
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

- -- Joe Conway <mail@joeconway.com> wrote:

> Please try the attached patch and let me know how it works for you. It is
> against cvs HEAD, but should apply OK to 7.4.

so, I checked it with my database. 
It looks good, all checks I made are OK.


I realised, that converting my text column to bytea is much faster then
before:

UPDATE forum_gtree SET gid2 = (decode(replace(gid, '\\', '\\\\'),
'escape'));


This may be because some other side effects (but it's the same machine and
same disc, same partition), but also perhaps some changes in the patch ...


MANY THANKS!


When I'm back from WSIS I make the next try to change my application from
Base255-text to bytea ;-)


Ciao Alvar



- -- 
** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/
**   Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
**   ODEM.org-Tour: http://tour.odem.org/
**   Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/0Tg6OndlH63J86wRAiLfAJ98wLJeeyEuFXG07Pa7ba/jM7LzxQCgzWa1
RGv3FtQ8JP/MZjCWuqbdnsY=
=U9Ng
-----END PGP SIGNATURE-----