Thread: Zero-length character breaking query?
G'day, I believe I've got some bad data in a table, but I'm not sure how it got there, or how this scenario is possible. The table is called tdt_unsent. The field is str_name_l. For demonstration purposes, the value is "SMITH". "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows. "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows. "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows. "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5". So, it's as if there is a zero-length character at the end of the value that is preventing a match. Is this possible? Ifso, how could this data have been created? Thanks, Doug Gorley dgorley@aihs.ca
On Mar 15, 2012, at 19:09, Doug Gorley <dgorley@aihs.ca> wrote: > G'day, > > I believe I've got some bad data in a table, but I'm not sure how it got there, or how this scenario is possible. > > The table is called tdt_unsent. The field is str_name_l. For demonstration purposes, the value is "SMITH". > > "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows. > "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5". > > So, it's as if there is a zero-length character at the end of the value that is preventing a match. Is this possible? If so, how could this data have been created? > > Thanks, > > Doug Gorley > dgorley@aihs.ca > Try (in the regexp) adding '\r?\n' after SMITH and see what happens. How did you enter the SMITH record into the table in the first place? David J.
Doug Gorley <dgorley@aihs.ca> writes: > The table is called tdt_unsent. The field is str_name_l. For demonstration purposes, the value is "SMITH". > "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows. > "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5". I'd check EXPLAIN (with the actual problematic string, not SMITH). The planner is probably trying to build an index range condition from the regex pattern --- is it doing the right thing given your locale? If the plan looks okay, maybe you need to reindex whatever index it's using. regards, tom lane
On Thu, Mar 15, 2012 at 05:09:32PM -0600, Doug Gorley wrote: > G'day, > > "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows. > "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5". What does octet_length return? Perhaps this is some nonprintable control character. Unicode is full of those. If it differs from the string length, then you can be pretty sure that's the case. Cheers, Peter -- http://sjamaan.ath.cx -- "The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music." -- Donald Knuth
Doug Gorley wrote: > > "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows. > "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5". what does the following return? select str_name_l::bytea from tdt_unsent where str_name_l ~ '^SMITH'; It should show you any chars that don't usually display in your locale as hex chars (i.e. \xNN). Bosco.
Thanks Tom; this is at a client site, so I have limited access, but it looks like a REINDEX resolves the issue. Doug -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: March 16, 2012 6:33 AM To: Doug Gorley Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Zero-length character breaking query? Doug Gorley <dgorley@aihs.ca> writes: > The table is called tdt_unsent. The field is str_name_l. For demonstration purposes, the value is "SMITH". > "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows. > "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows. > "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5". I'd check EXPLAIN (with the actual problematic string, not SMITH). The planner is probably trying to build an index range condition from the regex pattern --- is it doing the right thing givenyour locale? If the plan looks okay, maybe you need to reindex whatever index it's using. regards, tom lane