Thread: Strange inconsistency with UPDATE

Strange inconsistency with UPDATE

From

"Phoenix Kiula"

Date:

16 August 2007, 23:58:20

I am trying to force a column to have lowercase because Postgresql is
case-sensitive in queries. For the time being I've made an expression
index on lower(KEY). But I would like to have just lower case data and
then drop this expression index.

However, I see some inconsisent behavior from Postgresql. When I issue
an UPDATE command , it shows me a duplicate violation (which could be
correct) --

    -# update TABLE set ACOLUMN = lower(ACOLUMN);
    ERROR:  duplicate key violates unique constraint "TABLE_ACOLUMN_key"

So I try to find out the offending values of this ACOLUMN that become
duplicated when lower(ACOLUMN) is issued:

    -# SELECT lower(ACOLUMN), count(*)  FROM TABLE
         GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ;
        -------+-------
        lower | count
        -------+-------
        (0 rows)

But this doesn't make sense! If there are no columns that get
repeated, how can it violate the UNIQUE constraint?

I am not sure if the following helps, but I'm including the EXPLAIN on
this table. Penny for your thoughts!

-PK.


-# EXPLAIN SELECT lower(ACOLUMN), count(*)  FROM TABLE GROUP BY
lower(ACOLUMN) HAVING count(*) > 1 ;
QUERY PLAN
------------------------------------------------------------------------
GroupAggregate  (cost=1031470.35..1171326.48 rows=4661871 width=10)
 Filter: (count(*) > 1)
 -> Sort  (cost=1031470.35..1043125.03 rows=4661871 width=10)
     Sort Key: lower((ACOLUMN)::text)
     ->  Seq Scan on TABLE  (cost=0.00..228292.39 rows=4661871 width=10)
(5 rows)

Re: Strange inconsistency with UPDATE

From

Ron Johnson

Date:

17 August 2007, 00:23:08

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/16/07 21:58, Phoenix Kiula wrote:
> I am trying to force a column to have lowercase because Postgresql is
> case-sensitive in queries. For the time being I've made an expression
> index on lower(KEY). But I would like to have just lower case data and
> then drop this expression index.
>
> However, I see some inconsisent behavior from Postgresql. When I issue
> an UPDATE command , it shows me a duplicate violation (which could be
> correct) --
>
>     -# update TABLE set ACOLUMN = lower(ACOLUMN);
>     ERROR:  duplicate key violates unique constraint "TABLE_ACOLUMN_key"
>
> So I try to find out the offending values of this ACOLUMN that become
> duplicated when lower(ACOLUMN) is issued:
>
>     -# SELECT lower(ACOLUMN), count(*)  FROM TABLE
>          GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ;
>         -------+-------
>         lower | count
>         -------+-------
>         (0 rows)
>
> But this doesn't make sense! If there are no columns that get
> repeated, how can it violate the UNIQUE constraint?
>
> I am not sure if the following helps, but I'm including the EXPLAIN on
> this table. Penny for your thoughts!

Whatever the issue, you can bet your car that it's not a bug in
PostgreSQL, but you who is misunderstanding how PG works.

Write a script that loops thru the records one by one, updating only
one record per loop iteration.  That will find the problem record.

- --
Ron Johnson, Jr.
Jefferson LA  USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFGxRSUS9HxQb37XmcRApMyAKCGOmpm5xKkfuWR19OnbXLVZMMbkwCcCHmu
4OOXMnRnaixpp8lSjbrA/5w=
=M3jw
-----END PGP SIGNATURE-----

Re: Strange inconsistency with UPDATE

From

Michael Glaesemann

Date:

17 August 2007, 00:27:27

On Aug 16, 2007, at 21:58 , Phoenix Kiula wrote:

> However, I see some inconsisent behavior from Postgresql. When I issue
> an UPDATE command , it shows me a duplicate violation (which could be
> correct) --
>
>     -# update TABLE set ACOLUMN = lower(ACOLUMN);
>     ERROR:  duplicate key violates unique constraint
> "TABLE_ACOLUMN_key"
>
> So I try to find out the offending values of this ACOLUMN that become
> duplicated when lower(ACOLUMN) is issued:
>
>     -# SELECT lower(ACOLUMN), count(*)  FROM TABLE
>          GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ;
>         -------+-------
>         lower | count
>         -------+-------
>         (0 rows)
>
> But this doesn't make sense! If there are no columns that get
> repeated, how can it violate the UNIQUE constraint?

I suspect you're not showing us the exact queries you're running. For
one, you can't have a table named TABLE (without quotes) in PostgreSQL.

# create table TABLE (ACOLUMN text not null unique);
ERROR:  syntax error at or near "TABLE"
LINE 1: create table TABLE (ACOLUMN text not null unique);

Perhaps something else you changed when changing the table name,
maybe to simplify the appearance of the query, affects the results
you're seeing. Things appear to work as expected on my end.

test=# select version();

version
------------------------------------------------------------------------
----------------------------------------------------------------------
PostgreSQL 8.2.4 on powerpc-apple-darwin8.9.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.
build 5367)
(1 row)

test=# create table strings (a_string text primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"strings_pkey" for table "strings"
CREATE TABLE
test=# insert into strings (a_string) values ('string'), ('STRING'),
('String'), ('number');
INSERT 0 4
test=# select * from strings;
a_string
----------
string
STRING
String
number
(4 rows)

test=# select lower(a_string), count(*) from strings group by lower
(a_string);
lower  | count
--------+-------
string |     3
number |     1
(2 rows)

test=# select lower(a_string), count(*) from strings group by lower
(a_string) having count(*) > 1;;
lower  | count
--------+-------
string |     3
(1 row)

test=# update strings set a_string = lower(a_string);
ERROR:  duplicate key violates unique constraint "strings_pkey"

Another possibility is the setting of LC_COLLATE used during initdb,
but I would think that lower() would be self-consistent under all
collations. You might want to check if you're using a collation other
than C though.

test=# show lc_collate;
lc_collate
------------
C
(1 row)

Hope this gives additional information to help you debug this.

Michael Glaesemann
grzm seespotcode net

Re: Strange inconsistency with UPDATE

From

Tom Lane

Date:

17 August 2007, 00:30:23

"Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
> However, I see some inconsisent behavior from Postgresql. When I issue
> an UPDATE command , it shows me a duplicate violation (which could be
> correct) --

>     -# update TABLE set ACOLUMN = lower(ACOLUMN);
>     ERROR:  duplicate key violates unique constraint "TABLE_ACOLUMN_key"

> So I try to find out the offending values of this ACOLUMN that become
> duplicated when lower(ACOLUMN) is issued:

>     -# SELECT lower(ACOLUMN), count(*)  FROM TABLE
>          GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ;
>         -------+-------
>         lower | count
>         -------+-------
>         (0 rows)

Yeah, that *is* pretty bizarre.

We have seen some cases where strcoll() yields inconsistent answers
(leading to arbitrarily silly behavior on Postgres' part) if it is
expecting a character set encoding different from what Postgres is
using.  What is your lc_collate setting, and are you sure it matches
the database encoding?

Another possibility is that there's something corrupt about the
TABLE_ACOLUMN_key index ... does reindexing it change the outcome?

            regards, tom lane

Re: Strange inconsistency with UPDATE

From

"Phoenix Kiula"

Date:

17 August 2007, 01:17:41

> I suspect you're not showing us the exact queries you're running. For
> one, you can't have a table named TABLE (without quotes) in PostgreSQL.


Of course. The data is a tad private, hence the simple table and
column names represented in uppercase.


> Perhaps something else you changed when changing the table name,
> maybe to simplify the appearance of the query, affects the results
> you're seeing. Things appear to work as expected on my end.


Except for the name of the table and the column, everything is exactly
the same as the queries I'm running. I just pasted those things into a
text editor and find-replaced mytable to "TABLE" and mycolumn to
"COLUMN". (Note: mytable and mycolumn are also aliases for posting to
a public forum).

The problem remains. The other tip -- to loop through each record and
update it -- is precisely what I was trying to escape with a simple
SQL query that seems fairly standard.

Re: Strange inconsistency with UPDATE

From

"Phoenix Kiula"

Date:

17 August 2007, 01:26:52

On 17/08/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Phoenix Kiula" <phoenix.kiula@gmail.com> writes:
> > However, I see some inconsisent behavior from Postgresql. When I issue
> > an UPDATE command , it shows me a duplicate violation (which could be
> > correct) --
>
> >     -# update TABLE set ACOLUMN = lower(ACOLUMN);
> >     ERROR:  duplicate key violates unique constraint "TABLE_ACOLUMN_key"
>
> > So I try to find out the offending values of this ACOLUMN that become
> > duplicated when lower(ACOLUMN) is issued:
>
> >     -# SELECT lower(ACOLUMN), count(*)  FROM TABLE
> >          GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ;
> >         -------+-------
> >         lower | count
> >         -------+-------
> >         (0 rows)
>
> Yeah, that *is* pretty bizarre.
>
> We have seen some cases where strcoll() yields inconsistent answers
> (leading to arbitrarily silly behavior on Postgres' part) if it is
> expecting a character set encoding different from what Postgres is
> using.  What is your lc_collate setting, and are you sure it matches
> the database encoding?


- lc_collate is "en_US.UTF-8"
- database encoding is "utf-8"


> Another possibility is that there's something corrupt about the
> TABLE_ACOLUMN_key index ... does reindexing it change the outcome?


Hmm, I can check. It's just a unique index. Should I drop it and recreate?

Last time I ran a massive "UPDATE mytable SET mycol = lower(mycol)"
query on about 6 millions records, the database seemed to be locked
for eternity. Nothing could insert into it, nor update something else.
Is this what is discussed on this list as "deadlock"? How can I avoid
this if I were to reindex and such?

Thanks!