Thread: BUG #6393: cluster sometimes fails under heavy concurrent write load
The following bug has been logged on the website:

Bug reference:      6393
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 9.0.6
Operating system:   Linux Ubuntu
Description:

I have a heavily write-loaded table under PostgreSQL 9.0.6, and sometimes
(not always, but with more than a 50% chance) I get the following error
during CLUSTER:

db=# cluster public.enqueued_mail;
ERROR:  duplicate key value violates unique constraint "pg_toast_119685646_index"
DETAIL:  Key (chunk_id, chunk_seq)=(119685590, 0) already exists.

The chunk_id is different each time.

There are no uncommon datatypes in the table.

I am currently working on a reproducible test case (it seems to require
2-3 open write transactions on the table).
Excerpts from maxim.boguk's message of mar ene 10 23:00:59 -0300 2012:

> I have a heavily write-loaded table under PostgreSQL 9.0.6, and sometimes
> (not always, but with more than a 50% chance) I get the following error
> during CLUSTER:
>
> db=# cluster public.enqueued_mail;
> ERROR:  duplicate key value violates unique constraint
> "pg_toast_119685646_index"
> DETAIL:  Key (chunk_id, chunk_seq)=(119685590, 0) already exists.
>
> The chunk_id is different each time.
>
> There are no uncommon datatypes in the table.
>
> I am currently working on a reproducible test case (it seems to require
> 2-3 open write transactions on the table).

I don't see how this can be done at all, given that CLUSTER grabs an
exclusive lock on the table in question. A better example illustrating
what you're really doing would be useful.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi.

The test case.

The first stage - preparation:

CREATE TABLE test (id serial primary key, flag integer, value text);
CLUSTER "test_pkey" ON test;
INSERT INTO test
  SELECT nextval('test_id_seq') AS id,
         0 AS flag,
         (SELECT array(SELECT random() FROM generate_series(1,1000)))::text AS value
  FROM generate_series(1,1000) AS gs(id);
CLUSTER test;

The second stage - test.

Console one:

postgres@db10:~$ while true; do psql -d test_db -c "begin; update test set flag=(random()*1000)::integer where id IN (select (random()*1000)::integer from generate_series(1,30)); select pg_sleep(1); commit;"; done
COMMIT
COMMIT
COMMIT
...

Console two:

postgres@db10:~$ while true; do psql -d test_db -c "cluster test;"; done
CLUSTER
ERROR:  duplicate key value violates unique constraint "pg_toast_119812558_index"
DETAIL:  Key (chunk_id, chunk_seq)=(119811577, 0) already exists.
ERROR:  duplicate key value violates unique constraint "pg_toast_119812564_index"
DETAIL:  Key (chunk_id, chunk_seq)=(119811601, 0) already exists.
ERROR:  duplicate key value violates unique constraint "pg_toast_119812570_index"
DETAIL:  Key (chunk_id, chunk_seq)=(119811494, 0) already exists.
ERROR:  duplicate key value violates unique constraint "pg_toast_119812576_index"
DETAIL:  Key (chunk_id, chunk_seq)=(119811552, 0) already exists.
ERROR:  duplicate key value violates unique constraint "pg_toast_119812582_index"
DETAIL:  Key (chunk_id, chunk_seq)=(119811525, 0) already exists.
PS: No such effect happens on 9.0.4. I think it can be related to
TOAST-related changes in 9.0.6.

On Thu, Jan 12, 2012 at 2:24 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:

> I don't see how this can be done at all, given that CLUSTER grabs an
> exclusive lock on the table in question. A better example illustrating
> what you're really doing would be useful.

--
Maxim Boguk
Senior PostgreSQL DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
LinkedIn profile: http://nz.linkedin.com/in/maximboguk
"If they can send one man to the moon... why can't they send them all?"

МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage."
Maxim Boguk <maxim.boguk@gmail.com> writes:
> PS: No such effect happens on 9.0.4.
> I think it can be related to TOAST-related changes in 9.0.6.

Yes, this is a consequence of commit
44b6d53b467bfe848c34c7a8a174779bb2f43c39, which tried to preserve TOAST
OIDs when doing a table rewrite. The problem is that CLUSTER needs to
copy both the old and new versions of an updated row, since the old one
is only "recently dead", and both of those versions are pointing at the
same TOAST item because the update didn't affect the toasted column.
So when we save those versions into the new table separately, we try to
write the same TOAST item twice.

We could fix this by having toast_save_datum, in the place where it
decides to adopt an old TOAST OID for the value (line 1345 in HEAD),
check to see if that OID already exists in the new table. If so, it
could either just assume that the value matches, or we could add enough
code to verify that there's a match. The latter seems like overkill,
and yet I'm not 100% comfortable with just assuming a collision is OK.

Comments?

			regards, tom lane
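[Editor's note: the failure mode described above can be sketched in a few lines. This is an illustrative Python simulation, not PostgreSQL source; the names `rewrite`, `UniqueViolation`, and `check_existing` are invented for the sketch. It models the toast table's unique index as a dict keyed by (chunk_id, chunk_seq), and shows both the 9.0.6 behavior and the proposed "skip if the OID already exists" fix.]

```python
# Illustrative sketch only (not PostgreSQL code): a rewrite that preserves
# TOAST OIDs inserts the same (chunk_id, chunk_seq) key twice when two
# versions of an updated row share one TOAST item.

class UniqueViolation(Exception):
    pass

def rewrite(row_versions, check_existing):
    """Copy each row version's TOAST chunk into a fresh toast table,
    preserving chunk_id (the TOAST value OID) across the rewrite."""
    new_toast = {}                      # (chunk_id, chunk_seq) -> data
    for version in row_versions:
        key = (version["chunk_id"], 0)  # chunk_seq 0: value fits one chunk
        if check_existing and key in new_toast:
            continue                    # proposed fix: assume values match
        if key in new_toast:
            raise UniqueViolation("duplicate key %r" % (key,))
        new_toast[key] = version["data"]
    return new_toast

# One UPDATE that did not touch the toasted column leaves two row
# versions (recently dead + live) pointing at the same TOAST item.
versions = [{"chunk_id": 119685590, "data": "big value"},
            {"chunk_id": 119685590, "data": "big value"}]

try:
    rewrite(versions, check_existing=False)    # mimics the 9.0.6 bug
except UniqueViolation as e:
    print("ERROR:", e)                         # duplicate key, as reported

print(len(rewrite(versions, check_existing=True)))  # with the fix: 1 item
```

Under this model, "just assume the value matches" is safe precisely because both row versions carry the same TOAST pointer, so the value being skipped is byte-identical to the one already written.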
I wrote:
> Yes, this is a consequence of commit
> 44b6d53b467bfe848c34c7a8a174779bb2f43c39, which tried to preserve TOAST
> OIDs when doing a table rewrite. The problem is that CLUSTER needs to
> copy both the old and new versions of an updated row, since the old one
> is only "recently dead", and both of those versions are pointing at the
> same TOAST item because the update didn't affect the toasted column.
> So when we save those versions into the new table separately, we try to
> write the same TOAST item twice.

BTW, it strikes me that the preceding implementation had its own bug in
this area: in the same situation of having to copy multiple versions of
a row, it would create independent copies of the referenced TOAST value.
Not only is this space-wasteful, but the copies attached to the
already-dead row versions would never be reclaimed, at least not short
of another CLUSTER, because they would be inserted into the toast table
as live not deleted rows. So preserving TOAST item OIDs and relying on
them for duplicate detection actually will improve matters in a way
unrelated to the original purpose of commit
44b6d53b467bfe848c34c7a8a174779bb2f43c39.

			regards, tom lane
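[Editor's note: the pre-commit behavior described above can be sketched the same way. Again an illustrative Python simulation, not PostgreSQL source; `rewrite_with_fresh_oids` and `oid_source` are invented names. It shows how allocating a fresh TOAST OID for every copied row version turns one shared value into two live toast entries, one of which becomes unreachable once the dead row version is gone.]

```python
# Illustrative sketch only (not PostgreSQL code): without OID preservation,
# each copied row version gets a freshly allocated TOAST OID, duplicating
# the shared value. The copy referenced only by the recently-dead version
# is orphaned once that version is vacuumed away, yet stays "live" in the
# toast table.

from itertools import count

def rewrite_with_fresh_oids(row_versions, oid_source):
    new_toast = {}          # new chunk_id -> value
    pointers = []           # per-version TOAST pointer in the new table
    for version in row_versions:
        oid = next(oid_source)          # new OID for every copy
        new_toast[oid] = version["data"]
        pointers.append(oid)
    return new_toast, pointers

versions = [{"chunk_id": 1000, "data": "big value"},   # recently dead
            {"chunk_id": 1000, "data": "big value"}]   # live
toast, ptrs = rewrite_with_fresh_oids(versions, count(2000))

print(len(toast))          # 2: the same value stored twice
print(ptrs[0] != ptrs[1])  # True: dead version's copy (ptrs[0]) is orphaned
```

This is the space leak the message describes: duplicate-detection via preserved OIDs (the fix above) stores the shared value exactly once instead.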