Thread: Zero-length character breaking query?

Zero-length character breaking query?

From
Doug Gorley
Date:
G'day,

I believe I've got some bad data in a table, but I'm not sure how it got there, or how this scenario is possible.

The table is called tdt_unsent.  The field is str_name_l.  For demonstration purposes,  the value is "SMITH".

"select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
"select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
"select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
"select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

So, it's as if there is a zero-length character at the end of the value that is preventing a match.  Is this possible?
Ifso, how could this data have been created? 

Thanks,

Doug Gorley
dgorley@aihs.ca


Re: Zero-length character breaking query?

From
David Johnston
Date:
On Mar 15, 2012, at 19:09, Doug Gorley <dgorley@aihs.ca> wrote:

> G'day,
>
> I believe I've got some bad data in a table, but I'm not sure how it got there, or how this scenario is possible.
>
> The table is called tdt_unsent.  The field is str_name_l.  For demonstration purposes,  the value is "SMITH".
>
> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".
>
> So, it's as if there is a zero-length character at the end of the value that is preventing a match.  Is this
possible? If so, how could this data have been created? 
>
> Thanks,
>
> Doug Gorley
> dgorley@aihs.ca
>

Try (in the regexp) adding '\r?\n' after SMITH and see what happens.

How did you enter the SMITH record into the table in the first place?

David J.

Re: Zero-length character breaking query?

From
Tom Lane
Date:
Doug Gorley <dgorley@aihs.ca> writes:
> The table is called tdt_unsent.  The field is str_name_l.  For demonstration purposes,  the value is "SMITH".

> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

I'd check EXPLAIN (with the actual problematic string, not SMITH).
The planner is probably trying to build an index range condition from
the regex pattern --- is it doing the right thing given your locale?

If the plan looks okay, maybe you need to reindex whatever index it's
using.

            regards, tom lane

Re: Zero-length character breaking query?

From
Peter Bex
Date:
On Thu, Mar 15, 2012 at 05:09:32PM -0600, Doug Gorley wrote:
> G'day,
>
> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

What does octet_length return?  Perhaps this is some nonprintable
control character.  Unicode is full of those.  If it differs from
the string length, then you can be pretty sure that's the case.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                            -- Donald Knuth

Re: Zero-length character breaking query?

From
Bosco Rama
Date:
Doug Gorley wrote:
>
> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

what does the following return?
  select str_name_l::bytea from tdt_unsent where str_name_l ~ '^SMITH';

It should show you any chars that don't usually display in your locale as
hex chars (i.e. \xNN).

Bosco.

Re: Zero-length character breaking query?

From
Doug Gorley
Date:
Thanks Tom; this is at a client site, so I have limited access, but it looks like a REINDEX resolves the issue.

Doug

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: March 16, 2012 6:33 AM
To: Doug Gorley
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Zero-length character breaking query?

Doug Gorley <dgorley@aihs.ca> writes:
> The table is called tdt_unsent.  The field is str_name_l.  For demonstration purposes,  the value is "SMITH".

> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" returns "5".

I'd check EXPLAIN (with the actual problematic string, not SMITH).
The planner is probably trying to build an index range condition from the regex pattern --- is it doing the right thing
givenyour locale? 

If the plan looks okay, maybe you need to reindex whatever index it's using.

            regards, tom lane