Thread: bytea, index and like operator
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, there is a bug in handling bytea columns with index and the like-operator. At least in 7.3.4. When the FreeBSD Port for 7.4 is ready, I'll test this ... ;-) When an index scan is active, a query dosn't give the correct result: select version(); version --------------------------------------------------------------------- PostgreSQL 7.3.4 on i386-portbld-freebsd4.8, compiled by GCC 2.95.4 begin; create table test (b bytea); create index tst_idx on test(b); insert into test values ('\001abc\006'); insert into test values ('\001xabc\006'); insert into test values ('\001\002abc\006'); insert into test values ('\000\001\002abc\006'); insert into test values ('\002\003abc\006'); select * from test where b like '\001%'; Result: b --- (0 Zeilen) [0 rows] explain analyze select * from test where b like '\001%'; QUERY PLAN - --------------------------------------------------------------------------- - ------------------- Seq Scan on test (cost=0.00..22.50 rows=5 width=32) (actual time=0.05..0.08 rows=3 loops=1) Filter: (b ~~ '\\001%'::bytea) Total runtime: 0.16 msec (3 Zeilen) explain analyze select * from test where b like '\001%'; QUERY PLAN ------------------------------------------------------------------------ Index Scan using tst_idx on test (cost=0.00..17.07 rows=5 width=32) (actual time=0.04..0.04 rows=0 loops=1) Index Cond: (b = '0'::bytea) Filter: (b ~~ '\\001%'::bytea) Total runtime: 0.14 msec But with seq scan (after vacuuming, creating index later, ...) it works as expected. drop index tst_idx; online_demo=> select * from test where b like '\001%'; b ----------------- \001abc\006 \001xabc\006 \001\002abc\006 (3 Zeilen) explain analyze select * from test where b like '\001%'; QUERY PLAN ------------------------------------------------------------------------ Seq Scan on test (cost=0.00..22.50 rows=5 width=32) (actual time=0.05..0.08 rows=3 loops=1) Filter: (b ~~ '\\001%'::bytea) Total runtime: 0.16 msec hmmm ... It seems, that bytea is no good idea for production use? Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/zhtwOndlH63J86wRAh9VAJ9tjx/MrvbMPjlqhQqvhbXLaIG5owCfRbAn S65xELFQ6I9ObdzAXOTjIWM= =7DuO -----END PGP SIGNATURE-----
Alvar Freude wrote: > PostgreSQL 7.3.4 on i386-portbld-freebsd4.8, compiled by GCC 2.95.4 > > begin; > create table test (b bytea); > create index tst_idx on test(b); > insert into test values ('\001abc\006'); > insert into test values ('\001xabc\006'); > insert into test values ('\001\002abc\006'); > insert into test values ('\000\001\002abc\006'); > insert into test values ('\002\003abc\006'); Note that bytea input strings should be escaped with doubled backslashes, because the string literal parser consumes 1 layer, and the byteain function consumes another. See: http://www.postgresql.org/docs/7.3/static/datatype-binary.html In the strings above, the string literal parser will turn, e.g., "\001" into the single octet '\1' anyway, and byteain will accept it just fine. However "\000" will become '\0', and since byteain requires a null byte terminator, you are actually inserting an empty string into test.b for that row: regression=# select b, b = '' from test; b | ?column? -----------------+---------- \001abc\006 | f \001xabc\006 | f \001\002abc\006 | f | t \002\003abc\006 | f (5 rows) > select * from test where b like '\001%'; This is weird. I'm sure it worked at one time -- will research. Joe
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, - -- Joe Conway <mail@joeconway.com> wrote: > > Note that bytea input strings should be escaped with doubled backslashes, > because the string literal parser consumes 1 layer, and the byteain > function consumes another. See: > http://www.postgresql.org/docs/7.3/static/datatype-binary.html oh, shi.... -- Yes, I read this, but forgot it ... ;-( So, yes, you are right. Hmmm, there are a lot of pitfalls with bytea, *puh*! Sorry for confusion! Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/zlUROndlH63J86wRArzBAKDC8hXj0GLZrIFlaaIQBt8pk4yi3gCcCDxy kVnHzc3mklt6/8IcI6ZvD38= =yUYU -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, - -- Alvar Freude <alvar@a-blast.org> wrote: > there is a bug in handling bytea columns with index and the like-operator. > At least in 7.3.4. When the FreeBSD Port for 7.4 is ready, I'll test this > ... ;-) sorry, shame on me: I have to put two \ in the query (and inserts) for bytea. But it's interesting that without and with index search the handling of wrong data is different ... :) Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/zmq2OndlH63J86wRAsUvAJ9pTFwNNQAsG5cNXGdUcTwmkpIjeQCeNldH aJuVsiY2juZqrG73VeSERdc= =NJ6+ -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, - -- Joe Conway <mail@joeconway.com> wrote: > Note that bytea input strings should be escaped with doubled backslashes, > because the string literal parser consumes 1 layer, and the byteain > function consumes another. See: > http://www.postgresql.org/docs/7.3/static/datatype-binary.html hmmm, but there remains some confusing stuff. Perhaps I made some other mistake, but: I've a filled table with bytea strings (converted from former text column), and want to make a "like" comparison on it. So, now I'll make double backslashes for each byte: like doesn't select the rows, when a index exists. With "=" one row is selected. When the index is deleted, the correct three rows are returned. See below. Whats going wrong? The same happens in my application, when the strings are given unescaped. db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010%'; bytea_col - ----- (0 rows) db=> SELECT bytea_col FROM table WHERE bytea_col = '\\001\\012\\010'; bytea_col - -------------- \001\012\010 (1 row) db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010%'; bytea_col - ----- (0 rows) db=> begin; BEGIN db=> drop index table_bytea_col_idx; DROP INDEX db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010%'; bytea_col - -------------------------- \001\012\010\001\001\001 \001\012\010 \001\012\010\001\001\002 (3 rows) db=> rollback; ROLLBACK db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010%'; bytea_col - ----- (0 rows) db=> SELECT bytea_col FROM table WHERE bytea_col like '\\001\\012\\010'; bytea_col - ----- (0 rows) Ciao Alvar - -- ** Alvar C.H. Freude -- http://alvar.a-blast.org/ ** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html ** ODEM.org-Tour: http://tour.odem.org/ ** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (FreeBSD) iD8DBQE/znDNOndlH63J86wRAtAyAKDJYq/KPSH7W4rJvO+VJQGe0OQi3wCfQOfr HmWZiQdc4MW5JecTG0dqwSg= =h3ag -----END PGP SIGNATURE-----