Thread: Identifying Reason for Column Name Returned by SELECT

Identifying Reason for Column Name Returned by SELECT

From
Rich Shepard
Date:
   I run this SELECT statement on a table:

select distinct(site_id) from chemistry order by site_id;

and in the returned set I see:

  GW-21
  GW-22
  GW-22         +

  GW-24

   I want to find that row returning 'GW-22      +' because I believe it
should be 'GW-23'. However, my attempts to retrieve that row keep failing.
I've tried these statements:

select count(*) from chemistry where site_id = 'GW-22         +';
  count
-------
      0
(1 row)

yet,

select count(*) from chemistry where site_id = 'GW-22';
  count
-------
    803
(1 row)

   Looking for the blank row also fails when I try to specify site_id as is
null, = ' ', or =''.

   Please point me to the proper way of finding this rogue row so I can
correct the value in the site_id column.

TIA,

Rich

Re: Identifying Reason for Column Name Returned by SELECT

From
Steve Crawford
Date:
I suspect you have a multi-line entry and the '+' is just indicating
that the field continues.

Try ...where site_id ~ 'GW-22'... (this may take a while if the table is
very large).

Cheers,
Steve



On 09/14/2011 09:35 AM, Rich Shepard wrote:
>   I run this SELECT statement on a table:
>
> select distinct(site_id) from chemistry order by site_id;
>
> and in the returned set I see:
>
>  GW-21
>  GW-22
>  GW-22         +
>
>  GW-24
>
>   I want to find that row returning 'GW-22      +' because I believe it
> should be 'GW-23'. However, my attempts to retrieve that row keep
> failing.
> I've tried these statements:
>
> select count(*) from chemistry where site_id = 'GW-22         +';
>  count -------
>      0
> (1 row)
>
> yet,
>
> select count(*) from chemistry where site_id = 'GW-22';
>  count -------
>    803
> (1 row)
>
>   Looking for the blank row also fails when I try to specify site_id
> as is
> null, = ' ', or =''.
>
>   Please point me to the proper way of finding this rogue row so I can
> correct the value in the site_id column.
>
> TIA,
>
> Rich
>


Re: Identifying Reason for Column Name Returned by SELECT

From
Darren Duncan
Date:
Your example suggests that the "GW-22" is a substring of the field followed by
trailing spaces so you'll want something that searches substrings, whereas "="
will always just test on matching the entire field.

Try "like" by default, such as "where site_id like 'GW-22 %'".  I added the
space between the 22 and the wildcard % so that the field containing just
'GW-22' isn't also matched.

If you need something more specific than simple substring match, you can use a
regular expression, which I think is spelled like "where site_id ~ '...'" but
check the manual to be sure about regexp syntax.  But "like" will probably do
you here.

-- Darren Duncan

Rich Shepard wrote:
>   I run this SELECT statement on a table:
>
> select distinct(site_id) from chemistry order by site_id;
>
> and in the returned set I see:
>
>  GW-21
>  GW-22
>  GW-22         +
>
>  GW-24
>
>   I want to find that row returning 'GW-22      +' because I believe it
> should be 'GW-23'. However, my attempts to retrieve that row keep failing.
> I've tried these statements:
>
> select count(*) from chemistry where site_id = 'GW-22         +';
>  count -------
>      0
> (1 row)
>
> yet,
>
> select count(*) from chemistry where site_id = 'GW-22';
>  count -------
>    803
> (1 row)
>
>   Looking for the blank row also fails when I try to specify site_id as is
> null, = ' ', or =''.
>
>   Please point me to the proper way of finding this rogue row so I can
> correct the value in the site_id column.
>
> TIA,
>
> Rich
>


Re: Identifying Reason for Column Name Returned by SELECT

From
Richard Broersma
Date:
On Wed, Sep 14, 2011 at 9:42 AM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

> I suspect you have a multi-line entry and the '+' is just indicating that
> the field continues.
>
> Try ...where site_id ~ 'GW-22'... (this may take a while if the table is
> very large).

You might be able to get an index scan if you include a few more criteria:

AND site_id BETWEEN 'GW-22' AND 'GW-23'

Also WHERE site_id '^GW-22' may use and index scan also.



--
Regards,
Richard Broersma Jr.

Re: Identifying Reason for Column Name Returned by SELECT

From
Darren Duncan
Date:
Darren Duncan wrote:
> Try "like" by default, such as "where site_id like 'GW-22 %'".  I added
> the space between the 22 and the wildcard % so that the field containing
> just 'GW-22' isn't also matched.

Sorry, I should have said "where site_id like 'GW-22%' and site_id != 'GW-22'"
(no explicit space) as a better way to exclude 'GW-22' from the results. --
Darren Duncan

Re: Identifying Reason for Column Name Returned by SELECT

From
Rich Shepard
Date:
On Wed, 14 Sep 2011, Steve Crawford wrote:

> I suspect you have a multi-line entry and the '+' is just indicating that the
> field continues.

Steve, et al.:

   It's not multi-line, but malformed.

> Try ...where site_id ~ 'GW-22'... (this may take a while if the table is very
> large).

   This found the appropriate rows. Now, my question is DDL-related:

   What is the appropriate syntax to change 'GW-22  +' to GW-22? Can I use
'like' or '~' in an ALTER TABLE RENAME <column> ... statement?

Thanks guys,

Rich

Re: Identifying Reason for Column Name Returned by SELECT

From
Richard Broersma
Date:
On Wed, Sep 14, 2011 at 10:04 AM, Rich Shepard <rshepard@appl-ecosys.com> wrote:

>  This found the appropriate rows. Now, my question is DDL-related:
>
>  What is the appropriate syntax to change 'GW-22  +' to GW-22? Can I use
> 'like' or '~' in an ALTER TABLE RENAME <column> ... statement?

I'm confused.  Do you want to UPDATE the affected records to GW-22. Or
do you want to ALTER the table to add a column constraint to prevent
malformed site_id's in the future?


--
Regards,
Richard Broersma Jr.

Re: Identifying Reason for Column Name Returned by SELECT

From
Rich Shepard
Date:
On Wed, 14 Sep 2011, Richard Broersma wrote:

> I'm confused.

Richard,

   Apparently, I am also confused. Doing too many things simultaneoulsy.

> Do you want to UPDATE the affected records to GW-22. Or do you want to
> ALTER the table to add a column constraint to prevent malformed site_id's
> in the future?

   Update. That makes my question moot.

Thanks,

Rich

Re: Identifying Reason for Column Name Returned by SELECT

From
Merlin Moncure
Date:
On Wed, Sep 14, 2011 at 12:04 PM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
> On Wed, 14 Sep 2011, Steve Crawford wrote:
>
>> I suspect you have a multi-line entry and the '+' is just indicating that
>> the field continues.
>
> Steve, et al.:
>
>  It's not multi-line, but malformed.

It *is* mult-line.  psql uses a '+ to show line breaks:

postgres=# select E'a\nb';
 ?column?
----------
 a       +
 b
(1 row)

To fix your data, i'd consider using the replace() function to knock
out newlines:
postgres=# select replace(E'a\nb', E'\n', '');
 replace
---------
 ab
(1 row)

Also consider adjusting the app and/or the database to block them in the future.

merlin

Re: Identifying Reason for Column Name Returned by SELECT

From
Rich Shepard
Date:
On Wed, 14 Sep 2011, Merlin Moncure wrote:

> It *is* mult-line.  psql uses a '+ to show line breaks:

Merlin,

   Yep. I discovered this when I dumped the table as an ASCII text file and
saw the '\n' after the site_id string on some rows. I've no idea how it got
there.

Thanks,

Rich