Thread: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
nickr@mirth.com
Date:
The following bug has been logged on the website:

Bug reference:      10189
Logged by:          Nick Rupley
Email address:      nickr@mirth.com
PostgreSQL version: 9.3.4
Operating system:   CentOS
Description:

It would appear that when you order on a column such that the query plan scans across a composite, multi-type index (say, bigint and integer), and the plan runs through an Index Cond, doing all that AND a LIMIT clause no longer works. By "no longer works" I mean the query does not return any results when it should.

Assuming that the column being ordered on falls within the index composite, no Sort or Seq Scan is done, because the result set is already ordered by virtue of the btree; no sorting is needed. So it's just a simple Limit -> Index Scan, with an Index Cond and Filter. In 9.0.13 this works perfectly, but in 9.3.4 no rows are returned by the query (though a COUNT still works as expected). The data distribution also seems to matter, because it only appears to happen when the expected result set is small (that is, fewer than a couple hundred records). Let me see if I can provide enough information to "prove" that it's a bug...
Here's the table description:

mirthdb=# \d d_mm100
                    Table "public.d_mm100"
     Column     |           Type           |     Modifiers
----------------+--------------------------+--------------------
 id             | integer                  | not null
 message_id     | bigint                   | not null
 server_id      | character varying(36)    | not null
 received_date  | timestamp with time zone |
 status         | character(1)             | not null
 connector_name | text                     |
 send_attempts  | integer                  | not null default 0
 send_date      | timestamp with time zone |
 response_date  | timestamp with time zone |
 error_code     | integer                  | not null default 0
 chain_id       | integer                  | not null
 order_id       | integer                  | not null
Indexes:
    "d_mm100_pkey" PRIMARY KEY, btree (message_id, id)
    "d_mm100_fki" btree (message_id)
    "d_mm100_index1" btree (message_id, id, status)
    "d_mm100_index2" btree (message_id, server_id, id)
Foreign-key constraints:
    "d_mm100_fkey" FOREIGN KEY (message_id) REFERENCES d_m100(id) ON DELETE CASCADE
Referenced by:
    TABLE "d_mc100" CONSTRAINT "d_mc100_fkey" FOREIGN KEY (message_id, metadata_id) REFERENCES d_mm100(message_id, id) ON DELETE CASCADE
    TABLE "d_mcm100" CONSTRAINT "d_mcm100_fkey" FOREIGN KEY (message_id, metadata_id) REFERENCES d_mm100(message_id, id) ON DELETE CASCADE

mirthdb=# select version();
                                                      version
--------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.0.13 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44), 64-bit
(1 row)

Here's the explain/analyze and the actual queries being done in 9.0.13:

# /opt/pgsql/bin/psql -U postgres -d mirthdb
psql (9.0.13)
Type "help" for help.
mirthdb=# explain analyze select * from d_mm100 where ID = 0 AND status = 'R' AND server_id = '2f9d9557-e260-45ee-92a3-b3069422cce9' order by (message_id) limit 7 offset 0;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..54319.46 rows=1 width=100) (actual time=193.292..193.292 rows=0 loops=1)
   ->  Index Scan using d_mm100_index1 on d_mm100  (cost=0.00..54319.46 rows=1 width=100) (actual time=193.281..193.281 rows=0 loops=1)
         Index Cond: ((id = 0) AND (status = 'R'::bpchar))
         Filter: ((server_id)::text = '2f9d9557-e260-45ee-92a3-b3069422cce9'::text)
 Total runtime: 193.370 ms
(5 rows)

mirthdb=# select * from d_mm100 where ID = 0 AND status = 'R' AND server_id = '2f9d9557-e260-45ee-92a3-b3069422cce9' order by (message_id) limit 7 offset 0;
 id | message_id |              server_id               |       received_date       | status | connector_name | send_attempts | send_date | response_date | error_code | chain_id | order_id
----+------------+--------------------------------------+---------------------------+--------+----------------+---------------+-----------+---------------+------------+----------+----------
  0 |    2439985 | 2f9d9557-e260-45ee-92a3-b3069422cce9 | 2014-04-30 15:00:11.13-06 | R      | Source         |             0 |           |               |          0 |        0 |        0
(1 row)

mirthdb=# select count(*) from d_mm100 where ID = 0 AND status = 'R' AND server_id = '2f9d9557-e260-45ee-92a3-b3069422cce9';
 count
-------
     1
(1 row)

Now in 9.3.4, here are the same explain/analyze and queries:

# /opt/pgsql/bin/psql -U postgres -d mirthdb
psql (9.3.4)
Type "help" for help.
mirthdb=# explain analyze select * from d_mm65 where ID = 0 AND status = 'R' AND server_id = '4e2ae77d-480f-4df0-af32-7d140982813a' order by (message_id) limit 7 offset 0;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..504.59 rows=7 width=115) (actual time=273.808..273.808 rows=0 loops=1)
   ->  Index Scan using d_mm65_index1 on d_mm65  (cost=0.42..38100.94 rows=529 width=115) (actual time=273.801..273.801 rows=0 loops=1)
         Index Cond: ((id = 0) AND (status = 'R'::bpchar))
         Filter: ((server_id)::text = '4e2ae77d-480f-4df0-af32-7d140982813a'::text)
 Total runtime: 273.939 ms
(5 rows)

mirthdb=# select * from d_mm65 where ID = 0 AND status = 'R' AND server_id = '4e2ae77d-480f-4df0-af32-7d140982813a' order by (message_id) limit 7 offset 0;
 id | message_id | server_id | received_date | status | connector_name | send_attempts | send_date | response_date | error_code | chain_id | order_id
----+------------+-----------+---------------+--------+----------------+---------------+-----------+---------------+------------+----------+----------
(0 rows)

mirthdb=# select count(*) from d_mm65 where ID = 0 AND status = 'R' AND server_id = '4e2ae77d-480f-4df0-af32-7d140982813a';
 count
-------
     8
(1 row)

And just to confirm, here's the table description, which is exactly the same:

mirthdb=# \d d_mm65
                    Table "public.d_mm65"
     Column     |           Type           |     Modifiers
----------------+--------------------------+--------------------
 id             | integer                  | not null
 message_id     | bigint                   | not null
 server_id      | character varying(36)    | not null
 received_date  | timestamp with time zone |
 status         | character(1)             | not null
 connector_name | text                     |
 send_attempts  | integer                  | not null default 0
 send_date      | timestamp with time zone |
 response_date  | timestamp with time zone |
 error_code     | integer                  | not null default 0
 chain_id       | integer                  | not null
 order_id       | integer                  | not null
Indexes:
    "d_mm65_pkey" PRIMARY KEY, btree (message_id, id)
    "d_mm65_fki" btree (message_id)
    "d_mm65_index1" btree (message_id, id, status)
    "d_mm65_index2" btree (message_id, server_id, id)
Foreign-key constraints:
    "d_mm65_fkey" FOREIGN KEY (message_id) REFERENCES d_m65(id) ON DELETE CASCADE
Referenced by:
    TABLE "d_mc65" CONSTRAINT "d_mc65_fkey" FOREIGN KEY (message_id, metadata_id) REFERENCES d_mm65(message_id, id) ON DELETE CASCADE
    TABLE "d_mcm65" CONSTRAINT "d_mcm65_fkey" FOREIGN KEY (message_id, metadata_id) REFERENCES d_mm65(message_id, id) ON DELETE CASCADE

mirthdb=# select version();
                                                    version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44), 64-bit
(1 row)

At one point I was even able to isolate the LIMIT clause to a "tipping point". For example, this returned 0 records:

select * from d_mm65 where id = 0 and d_mm65.status = 'R' and server_id = '4e2ae77d-480f-4df0-af32-7d140982813a' order by message_id limit 519 offset 0

However, this returned the 8 rows as expected:

select * from d_mm65 where id = 0 and d_mm65.status = 'R' and server_id = '4e2ae77d-480f-4df0-af32-7d140982813a' order by message_id limit 520 offset 0

The query plan goes from this:

 Limit  (cost=0.42..37386.88 rows=519 width=115)
   ->  Index Scan using d_mm65_index1 on d_mm65  (cost=0.42..38107.23 rows=529 width=115)
         Index Cond: ((id = 0) AND (status = 'R'::bpchar))
         Filter: ((server_id)::text = '4e2ae77d-480f-4df0-af32-7d140982813a'::text)

To this:

 Limit  (cost=37417.10..37418.40 rows=520 width=115)
   ->  Sort  (cost=37417.10..37418.42 rows=529 width=115)
         Sort Key: message_id
         ->  Seq Scan on d_mm65  (cost=0.00..37393.17 rows=529 width=115)
               Filter: ((id = 0) AND (status = 'R'::bpchar) AND ((server_id)::text = '4e2ae77d-480f-4df0-af32-7d140982813a'::text))
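One way to see that the two plan shapes disagree about the data (a diagnostic sketch, not part of the original report; the enable_* settings are standard PostgreSQL planner-method GUCs) is to force each plan and compare results on the same predicate:

```sql
-- Force the heap-based plan: should count the 8 rows
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT count(*) FROM d_mm65
 WHERE id = 0 AND status = 'R'
   AND server_id = '4e2ae77d-480f-4df0-af32-7d140982813a';
RESET enable_indexscan;
RESET enable_bitmapscan;

-- Force the index-based plan: counts 0 rows when d_mm65_index1 is corrupt
SET enable_seqscan = off;
SELECT count(*) FROM d_mm65
 WHERE id = 0 AND status = 'R'
   AND server_id = '4e2ae77d-480f-4df0-af32-7d140982813a';
RESET enable_seqscan;
```

If the two counts differ, the index and the heap have diverged, which points at index or heap corruption rather than a planner bug.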
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Tom Lane
Date:
nickr@mirth.com writes:
> It would appear that when you order on a column such that the query plan
> scans across a composite, multi-type index (say, bigint and integer), and
> the plan runs through an Index Cond, doing all that AND a LIMIT clause no
> longer works. And by "no longer works" I mean the query does not return any
> results when it should.

I failed to reproduce any such problem in a little bit of trying. Can you create a self-contained test case?

The symptoms seem consistent with the theory that that index is corrupt ... have you tried REINDEXing it?

			regards, tom lane
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Tom Lane
Date:
Nick Rupley <nickr@mirthcorp.com> writes:
> Thanks Tom. We attempted to reindex:
> mirthdb=# reindex index d_mm65_index1;
> ERROR:  failed to find parent tuple for heap-only tuple at (14808,2) in
> table "d_mm65"

Ugh :-(. That matches the symptoms of one of the known data-corruption bugs in 9.3.x, specifically the business about mis-replay of the WAL entry for locking a tuple. (I suppose this database has either suffered some crashes, or been promoted from a replication slave?) The bug is fixed as of 9.3.4, but the fix doesn't magically cure existing corruption :-(

You can look at the specific row involved here with

	select * from d_mm65 where ctid = '(14808,2)';

Probably the best recovery strategy is to DELETE that row by ctid, then reinsert the same data. Lather, rinse, repeat until you can reindex successfully. Better check your other tables too.

			regards, tom lane
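The delete-by-ctid recovery described above can be sketched as follows (a hedged sketch, not Tom's literal commands; the ctid comes from the error message earlier in the thread, and the temp-table round trip is one way to preserve the row's data):

```sql
-- Inspect the tuple the REINDEX complained about
SELECT * FROM d_mm65 WHERE ctid = '(14808,2)';

-- Save a copy of the row, delete the corrupted version by ctid,
-- then reinsert the saved data
BEGIN;
CREATE TEMP TABLE broken_row AS
    SELECT * FROM d_mm65 WHERE ctid = '(14808,2)';
DELETE FROM d_mm65 WHERE ctid = '(14808,2)';
INSERT INTO d_mm65 SELECT * FROM broken_row;
DROP TABLE broken_row;
COMMIT;

-- Repeat for each ctid reported, until this succeeds:
REINDEX INDEX d_mm65_index1;
```

Note that a ctid identifies a physical tuple location, not a logical row, so each iteration targets exactly the tuple the reindex error names.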
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Nick Rupley
Date:
Thanks Tom. We attempted to reindex:

mirthdb=# reindex index d_mm65_index1;
ERROR:  failed to find parent tuple for heap-only tuple at (14808,2) in table "d_mm65"

Then attempted to drop and recreate:

mirthdb=> create index d_mm65_index1 on public.d_mm65(message_id, id, status);
ERROR:  failed to find parent tuple for heap-only tuple at (14808,2) in table "d_mm65"

Possibly related to:
http://www.postgresql.org/message-id/5279FCC9-03B8-4BBA-8FF0-5B00B3F1332C@autouncle.com
http://dba.stackexchange.com/questions/62819/what-does-error-failed-to-find-parent-tuple-for-heap-only-tuple-at-1192248-5

On Wed, Apr 30, 2014 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> nickr@mirth.com writes:
> > It would appear that when you order on a column such that the query plan
> > scans across a composite, multi-type index (say, bigint and integer), and
> > the plan runs through an Index Cond, doing all that AND a LIMIT clause no
> > longer works. And by "no longer works" I mean the query does not return any
> > results when it should.
>
> I failed to reproduce any such problem in a little bit of trying. Can you
> create a self-contained test case?
>
> The symptoms seem consistent with the theory that that index is corrupt
> ... have you tried REINDEXing it?
>
> regards, tom lane

--
CONFIDENTIALITY NOTICE: The information contained in this electronic transmission may be confidential. If you are not an intended recipient, be aware that any disclosure, copying, distribution or use of the information contained in this transmission is prohibited and may be unlawful. If you have received this transmission in error, please notify us by email reply and then erase it from your computer system.
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Nick Rupley
Date:
Thanks again. So as far as recovery goes, that appears to be working. We actually wrote a PL/pgSQL script to drop constraints/indices on the table, create a copy of the table, and insert from the old table into the copy. Then we could programmatically delete all duplicates, rename the copy back to the original ("_old" / "_new" pattern), and then add the indices/constraints back. Works like a charm. You were right, this affected other tables as well.

We actually did suffer from some crashes fairly recently. At first, since we thought it was a general corruption issue, we expected to find such problems scattered across all our tables. However, it was confined to a very specific set of tables (the d_mm*, having the table description I posted earlier). Not only that, but when we went to find duplicates after copying the table, we found that all of them showed the same pattern:

 message_id | id |         received_date         | status | send_attempts |       response_date        |
------------+----+-------------------------------+--------+---------------+----------------------------+
     268752 |  0 | 2014-04-17 15:01:12.786786000 | T      |             0 |                            |
     268752 |  0 | 2014-04-17 15:01:12.786786000 | R      |             1 | 2014-04-17 15:01:12.795+00 |

As a reminder, the primary key on that table is (message_id, id). Just to give a little background: that record with the status "R" is inserted first, and there's no send attempts or response date yet. Then asynchronously / concurrently, two things can happen:

- The row is updated, setting the send attempts and response date
- The row is updated, setting the status to T, and at the same time a FK-referencing row is inserted in the associated d_mc table

We also tracked down every single duplicate entry and noted the time it got inserted. What we found is that this issue only ever started happening after we upgraded from 9.0.13 to 9.3.4. The upgrade itself doesn't seem to be the culprit, but messages received *after* the upgrade encountered this issue.
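The copy-and-dedup recovery described above might look roughly like this (a sketch under assumptions: the real procedure was a PL/pgSQL script, and the DISTINCT ON tie-break shown here is illustrative, not the team's actual rule for choosing which duplicate to keep):

```sql
BEGIN;
-- Rebuild the table without the duplicate (message_id, id) pairs
CREATE TABLE d_mm65_new (LIKE d_mm65 INCLUDING DEFAULTS);

-- Keep one row per primary-key pair; preferring the 'T' (terminal-status)
-- version over 'R' is an assumption for illustration
INSERT INTO d_mm65_new
    SELECT DISTINCT ON (message_id, id) *
    FROM d_mm65
    ORDER BY message_id, id, status DESC;

-- The "_old" / "_new" rename pattern
ALTER TABLE d_mm65 RENAME TO d_mm65_old;
ALTER TABLE d_mm65_new RENAME TO d_mm65;
-- ...then recreate the primary key, other indexes, and FK constraints,
-- and drop d_mm65_old once the new table is verified
COMMIT;
```

Because the new table is populated by a sequential scan of the old heap, it sees the rows the corrupt index was missing, and the fresh indexes built afterward are consistent with it.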
Looking at these articles:

http://www.postgresql.org/about/news/1511/
http://www.postgresql.org/docs/9.3/static/release-9-3-4.html

Those describe pretty much *exactly* what we've been seeing. To recap:

- Our server recovered from a system crash recently (a few of them, actually).
- We were getting inconsistent query results (no rows returned versus rows returned), and the difference in results matched the query plans, since one used an index scan and the other used a sequential scan with a Sort instead. The query plans I posted earlier show this.
- Subsequent processing did indeed result in constraint violations, since conflicting rows were inserted. In our case, message_id and id should comprise the primary key, but we were actually seeing duplicate rows with that same pair, but different statuses (R vs. T).
- The row in the d_mm table (status R, id=0) is referenced by a FK in the d_mc table, and it can indeed be updated concurrently with the creation of the referencing row in d_mc.

Now from the news post it sounds like this issue should have been fixed in 9.3.4. We were on 9.0.13 with the same table structure and the same mid-to-high-volume inserts/updates, and never once encountered this until we upgraded to 9.3.4.

The update press release is a bit confusing; it says "The PostgreSQL Global Development Group has released an update to all supported version of the database system, including versions 9.3.4, 9.2.8, 9.1.13, 9.0.17, and 8.4.21." So one may interpret that as either "9.3.4 is the update" or "this is an update TO 9.3.4". I figure it's the former, but correct me if I'm wrong.

Thanks in advance for any input; we're trying to make a self-contained test case in the meantime...

-Nick

On Wed, Apr 30, 2014 at 4:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Nick Rupley <nickr@mirthcorp.com> writes:
> > Thanks Tom. We attempted to reindex:
> > mirthdb=# reindex index d_mm65_index1;
> > ERROR:  failed to find parent tuple for heap-only tuple at (14808,2) in
> > table "d_mm65"
>
> Ugh :-(. That matches the symptoms of one of the known data-corruption
> bugs in 9.3.x, specifically the business about mis-replay of the WAL entry
> for locking a tuple. (I suppose this database has either suffered some
> crashes, or been promoted from a replication slave?) The bug is fixed as
> of 9.3.4, but the fix doesn't magically cure existing corruption :-(
>
> You can look at the specific row involved here with
>
> select * from d_mm65 where ctid = '(14808,2)';
>
> Probably the best recovery strategy is to DELETE that row by ctid,
> then reinsert the same data. Lather, rinse, repeat until you can reindex
> successfully. Better check your other tables too.
>
> regards, tom lane
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Tom Lane
Date:
Nick Rupley <nickr@mirthcorp.com> writes:
> We also tracked down every single duplicate entry and noted the time it got
> inserted. What we found is that this issue only ever started happening
> after we upgraded from 9.0.13 to 9.3.4. The upgrade itself doesn't seem to
> be the culprit, but messages received *after* the upgrade encountered this
> issue.

Oh dear. You were never on any earlier 9.3.x release? If not, then there's still some unfixed bug in this area in 9.3.4.

There's been one post-9.3.4 fix in this same general area:

http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=c0bd128c8

but according to the commit message, at least, that bug would not have led to the symptom you're seeing, namely rows disappearing from indexes while they're still visible to seqscans.

> Thanks in advance for any input, we're trying to make a self-contained test
> case in the meantime...

A test case would definitely help.

			regards, tom lane
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Andres Freund
Date:
On 2014-05-02 14:23:50 -0400, Tom Lane wrote:
> Nick Rupley <nickr@mirthcorp.com> writes:
> > We also tracked down every single duplicate entry and noted the time it got
> > inserted. What we found is that this issue only ever started happening
> > after we upgraded from 9.0.13 to 9.3.4. The upgrade itself doesn't seem to
> > be the culprit, but messages received *after* the upgrade encountered this
> > issue.
>
> Oh dear. You were never on any earlier 9.3.x release? If not, then
> there's still some unfixed bug in this area in 9.3.4.
>
> There's been one post-9.3.4 fix in this same general area:
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=c0bd128c8
> but according to the commit message, at least, that bug would not have led
> to the symptom you're seeing, namely rows disappearing from indexes while
> they're still visible to seqscans.

Hm. With a bit of bad luck it might. The bug essentially has the consequence that two updates might succeed for the same row. Consider what happens if the row gets HOT-updated and then a second HOT update, due to the bug, also succeeds. The second update will change t_ctid of the old tuple to point to the second version. If the transaction that did the second update then aborts, an index search starting at the root of the HOT chain won't find any surviving tuple, but a seqscan will. :(

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-05-02 14:23:50 -0400, Tom Lane wrote:
>> There's been one post-9.3.4 fix in this same general area:
>> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=c0bd128c8
>> but according to the commit message, at least, that bug would not have led
>> to the symptom you're seeing, namely rows disappearing from indexes while
>> they're still visible to seqscans.

> Hm. With a bit of bad luck it might. The bug essentially has the
> consequence that two updates might succeed for the same row. Consider
> what happens if the row gets hot updated and then a second hot update,
> due to the bug, also succeeds. The second update will change t_ctid of
> the old tuple to point to the second version. If the transaction that
> did the second update then aborts a index search starting at the root of
> the hot change won't find any surviving tuple. But a seqscan will. :(.

Hm, good point. Nick, if you're up for applying a hotfix you could try grabbing the aforesaid patch and seeing if it makes things better. If not, we're probably gonna need that test case to figure out where things are still going wrong.

			regards, tom lane
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Alvaro Herrera
Date:
Nick Rupley wrote:
> Hey guys, so we applied that patch, and it *appears* to have fixed the
> issue! Through our application, we basically have it to the point where we
> are able to reliably reproduce the issue within 5 minutes or so. However we
> applied the patch, ran the same tests, and it no longer happened at all,
> even after an hour of testing.
>
> We attempted to reproduce the issue in a standalone way, doing all the same
> inserts/updates in all the same transactions, but unfortunately we haven't
> yet been able to reproduce it there. I'm thinking it's likely a very
> timing-sensitive issue, and it just happens to manifest for our application
> because of race conditions, etc.

Yes, it's extremely timing-sensitive.

> Not sure if this is relevant or not, but it looks like the duplicate rows
> continue to be inserted here and there on our production box (to which we
> haven't yet applied the hotfix). As I stated before that production box did
> have some server crashes before, but actually it hasn't had any recently
> (in the past week), and yet the duplicate rows continue to happen.

This bug is not dependent on a crash; the corruption occurs to the live data. Only the previous bug mentioned by Tom manifested itself during crash recovery.

> At one point we did identify and reindex the tables that were needed,
> which worked great. But then *after* that, new duplicate rows cropped
> up, even without the server having crashed. Does that still make sense
> within the context of this bug?

Yes. Upgrading to a fixed binary is, of course, strongly recommended.

> If we're able to create that self-contained test case (we're trying) we'll
> be sure to let you know.

Be sure to let us know if you find other bugs, too!

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Nick Rupley wrote:
>> Not sure if this is relevant or not, but it looks like the duplicate rows
>> continue to be inserted here and there on our production box (to which we
>> haven't yet applied the hotfix). As I stated before that production box did
>> have some server crashes before, but actually it hasn't had any recently
>> (in the past week), and yet the duplicate rows continue to happen.

> This bug is not dependent on a crash; the corruption occurs to the live
> data. Only the previous bug mentioned by Tom manifested itself during
> crash recovery.

Hm. If people are hitting this in the field, do we need to think about yet another fairly-high-priority update release?

			regards, tom lane
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Nick Rupley wrote:
> >> Not sure if this is relevant or not, but it looks like the duplicate rows
> >> continue to be inserted here and there on our production box (to which we
> >> haven't yet applied the hotfix). As I stated before that production box did
> >> have some server crashes before, but actually it hasn't had any recently
> >> (in the past week), and yet the duplicate rows continue to happen.
>
> > This bug is not dependent on a crash; the corruption occurs to the live
> > data. Only the previous bug mentioned by Tom manifested itself during
> > crash recovery.
>
> Hm. If people are hitting this in the field, do we need to think about
> yet another fairly-high-priority update release?

It appears so :-(

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Nick Rupley
Date:
Hey guys, so we applied that patch, and it *appears* to have fixed the issue! Through our application, we basically have it to the point where we are able to reliably reproduce the issue within 5 minutes or so. However, we applied the patch, ran the same tests, and it no longer happened at all, even after an hour of testing.

We attempted to reproduce the issue in a standalone way, doing all the same inserts/updates in all the same transactions, but unfortunately we haven't yet been able to reproduce it there. I'm thinking it's likely a very timing-sensitive issue, and it just happens to manifest for our application because of race conditions, etc.

Not sure if this is relevant or not, but it looks like the duplicate rows continue to be inserted here and there on our production box (to which we haven't yet applied the hotfix). As I stated before, that production box did have some server crashes before, but actually it hasn't had any recently (in the past week), and yet the duplicate rows continue to happen. At one point we did identify and reindex the tables that were needed, which worked great. But then *after* that, new duplicate rows cropped up, even without the server having crashed. Does that still make sense within the context of this bug?

If we're able to create that self-contained test case (we're trying) we'll be sure to let you know.

-Nick

On Fri, May 2, 2014 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-05-02 14:23:50 -0400, Tom Lane wrote:
> >> There's been one post-9.3.4 fix in this same general area:
> >> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=c0bd128c8
> >> but according to the commit message, at least, that bug would not have led
> >> to the symptom you're seeing, namely rows disappearing from indexes while
> >> they're still visible to seqscans.
>
> > Hm. With a bit of bad luck it might. The bug essentially has the
> > consequence that two updates might succeed for the same row. Consider
> > what happens if the row gets hot updated and then a second hot update,
> > due to the bug, also succeeds. The second update will change t_ctid of
> > the old tuple to point to the second version. If the transaction that
> > did the second update then aborts a index search starting at the root of
> > the hot change won't find any surviving tuple. But a seqscan will. :(.
>
> Hm, good point. Nick, if you're up for applying a hotfix you could try
> grabbing the aforesaid patch and seeing if it makes things better.
> If not, we're probably gonna need that test case to figure out where
> things are still going wrong.
>
> regards, tom lane
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Nick Rupley
Date:
Thanks for the input, Alvaro. Actually, we do have another question as well. Are there any implications we should be aware of if we decide to take 9.3.4 and the aforementioned patch, and use the patched version of 9.3.4 on all future production boxes? To be honest, the commit log of that bug is mostly over my head, so I'm not sure if the commit itself is dependent on any other post-9.3.4 commits.

On Mon, May 5, 2014 at 11:24 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Nick Rupley wrote:
> > Hey guys, so we applied that patch, and it *appears* to have fixed the
> > issue! Through our application, we basically have it to the point where we
> > are able to reliably reproduce the issue within 5 minutes or so. However we
> > applied the patch, ran the same tests, and it no longer happened at all,
> > even after an hour of testing.
> >
> > We attempted to reproduce the issue in a standalone way, doing all the same
> > inserts/updates in all the same transactions, but unfortunately we haven't
> > yet been able to reproduce it there. I'm thinking it's likely a very
> > timing-sensitive issue, and it just happens to manifest for our application
> > because of race conditions, etc.
>
> Yes, it's extremely timing-sensitive.
>
> > Not sure if this is relevant or not, but it looks like the duplicate rows
> > continue to be inserted here and there on our production box (to which we
> > haven't yet applied the hotfix). As I stated before that production box did
> > have some server crashes before, but actually it hasn't had any recently
> > (in the past week), and yet the duplicate rows continue to happen.
>
> This bug is not dependent on a crash; the corruption occurs to the live
> data. Only the previous bug mentioned by Tom manifested itself during
> crash recovery.
>
> > At one point we did identify and reindex the tables that were needed,
> > which worked great. But then *after* that, new duplicate rows cropped
> > up, even without the server having crashed. Does that still make sense
> > within the context of this bug?
>
> Yes. Upgrading to a fixed binary is, of course, strongly recommended.
>
> > If we're able to create that self-contained test case (we're trying)
> > we'll be sure to let you know.
>
> Be sure to let us know if you find other bugs, too!
>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
wayneh
Date:
Hi, I'm a coworker of Nick's and I was finally able to create a standalone test case. I've attached a Java file, Test.java (http://postgresql.1045698.n5.nabble.com/file/n5802595/Test.java). All you have to do is change the first 3 lines of the main method to set your url, username, and password, and make sure to include a JDBC jar.

This application will do the following:

1. Create tables T1 and T2.
2. Begin populating rows into T1 and T2 and updating them in a "specific" order.
3. Every 1000 messages, a reindex will be attempted on T1's primary key. This will fail eventually.

I've attached a screenshot of what you'll likely see as the console output: http://postgresql.1045698.n5.nabble.com/file/n5802595/error.png

Here are some additional notes I have on the matter:

1. I was able to reproduce the problem on both Windows and Linux with Postgres 9.3.4.
2. I was able to reproduce the problem with the latest 9.3 JDBC jars (although we are still using 9.1 for our software).
3. After applying the patch that you mentioned, I can no longer reproduce the problem at all.

It seems very likely that http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=c0bd128c8 is in fact the problem that we were seeing, but I'm attaching this test case anyway just in case you can infer anything else from it.

How safe would it be if we took the 9.3.4 source code and just applied the patch above? Like Nick said, we aren't aware of exactly what this patch does and whether there are any commit dependencies it relies on.

Thanks for all your replies,
Wayne Huang
Re: Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Tom Lane
Date:
wayneh <wayneh@mirthcorp.com> writes:
> How safe would it be if we took the 9.3.4 source code and just applied the
> patch above. Like Nick said, we aren't aware of exactly what this patch does
> and whether there are any commit dependencies it relies on.

It should be pretty safe -- the patch looks self-contained to me, and there are no other significant changes in those files since 9.3.4. Alvaro might have a different opinion though.

If you're concerned about running a build that doesn't correspond to anything that's been tested anywhere, then an alternative would be to grab the REL9_3_STABLE branch tip from our git repository, or perhaps the state of the tree as it stood at commit c0bd128c8. This would just be pulling in some other patches that will also be in 9.3.5.

			regards, tom lane
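The "branch tip or exact commit" option above boils down to standard git operations. A sketch (the clone line is commented out since it needs network access; the rest is a self-contained local demonstration of the same checkout mechanics, with hypothetical file and directory names):

```shell
# Against the real repository this would be roughly:
#   git clone git://git.postgresql.org/git/postgresql.git
#   cd postgresql
#   git checkout REL9_3_STABLE    # branch tip
#   git checkout c0bd128c8        # or pin the tree to the exact commit state

# Local demo: pin a working tree to an earlier commit.
rm -rf /tmp/git_demo
mkdir -p /tmp/git_demo && cd /tmp/git_demo
git init -q repo && cd repo
git config user.email "demo@example.com"
git config user.name "demo"

echo "version 1" > version.txt
git add version.txt && git commit -qm "first"
PINNED=$(git rev-parse HEAD)      # remember the commit we want to build from

echo "version 2" > version.txt
git commit -qam "second"

git checkout -q "$PINNED"         # detached HEAD at the pinned commit
cat version.txt                   # -> version 1
```

Checking out a commit hash leaves the repository in a detached-HEAD state, which is fine for a one-off build from that exact tree.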
Re: Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Alvaro Herrera
Date:
Tom Lane wrote:
> wayneh <wayneh@mirthcorp.com> writes:
> > How safe would it be if we took the 9.3.4 source code and just applied the
> > patch above. Like Nick said, we aren't aware of exactly what this patch does
> > and whether there are any commit dependencies it relies on.
>
> It should be pretty safe -- the patch looks self-contained to me, and
> there are no other significant changes in those files since 9.3.4.
> Alvaro might have a different opinion though.

No, I agree with that assessment.

> If you're concerned about running a build that doesn't correspond to
> anything that's been tested anywhere, then an alternative would be
> to grab the REL9_3_STABLE branch tip from our git repository, or
> perhaps the state of the tree as it stood at commit c0bd128c8.
> This would just be pulling in some other patches that will also be in
> 9.3.5.

There's also the nightly snapshots, ftp://ftp.postgresql.org/pub/snapshot/9.3 which are pretty much equivalent to the release tarballs in that they don't require you to have bison, flex and other development tools.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
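The snapshot route follows the usual tarball build procedure. A sketch (download and build steps are commented out since they need network access and a toolchain; the executable part is only a self-contained demonstration of the unpack step, using a stand-in tarball and hypothetical paths):

```shell
# Against the real snapshot this would be roughly:
#   wget ftp://ftp.postgresql.org/pub/snapshot/9.3/postgresql-snapshot.tar.gz
#   tar xzf postgresql-snapshot.tar.gz
#   cd postgresql-9.3*
#   ./configure --prefix=/opt/pgsql && make && make install

# Local demo of the unpack step with a stand-in tarball:
rm -rf /tmp/snap_demo
mkdir -p /tmp/snap_demo/postgresql-9.3-snapshot && cd /tmp/snap_demo
echo '#!/bin/sh' > postgresql-9.3-snapshot/configure

tar czf postgresql-snapshot.tar.gz postgresql-9.3-snapshot
rm -r postgresql-9.3-snapshot

tar xzf postgresql-snapshot.tar.gz
ls postgresql-9.3-snapshot        # -> configure
```

The point of the snapshot tarballs, per the message above, is that the bison/flex-generated files are already included, so a plain C toolchain suffices.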
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Devrim GÜNDÜZ
Date:
Hi,

On Mon, 2014-05-05 at 15:56 -0400, Alvaro Herrera wrote:
> > Hm. If people are hitting this in the field, do we need to think about
> > yet another fairly-high-priority update release?
>
> It appears so :-(

Apologies if I missed an email in -committers: Is this fixed in git? If so, should we plan a release around PGCon?

Regards,
-- 
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Andres Freund
Date:
On 2014-05-06 14:56:46 +0300, Devrim GÜNDÜZ wrote:
>
> Hi,
>
> On Mon, 2014-05-05 at 15:56 -0400, Alvaro Herrera wrote:
> > > Hm. If people are hitting this in the field, do we need to think about
> > > yet another fairly-high-priority update release?
> >
> > It appears so :-(
>
> Apologies if I missed an email in -committers: Is this fixed in git?

Yes: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1a917ae8610d44985fd2027da0cfe60ccece9104

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Devrim GÜNDÜZ
Date:
Hi,

On Tue, 2014-05-06 at 14:00 +0200, Andres Freund wrote:
> > Apologies if I missed an email in -committers: Is this fixed in git?
>
> Yes:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1a917ae8610d44985fd2027da0cfe60ccece9104

Great, thanks!

Regards,
-- 
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
narupley__
Date:
Alvaro Herrera-9 wrote:
> There's also the nightly snapshots, ftp://ftp.postgresql.org/pub/snapshot/9.3
> which are pretty much equivalent to the release tarballs in that they
> don't require you to have bison, flex and other development tools.

We do always build from source anyway, so the nightly snapshot should work out just fine. Thanks again for all the help, guys!

--
View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-10189-Limit-in-9-3-4-no-longer-works-when-ordering-using-a-composite-multi-type-index-tp5802079p5802704.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Re: Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
"Burgess, Freddie"
Date:
We will likely run into this bug at our shop -- any idea when the official patch for this condition will be released?

Thanks

Freddie

________________________________
From: pgsql-bugs-owner@postgresql.org [pgsql-bugs-owner@postgresql.org] on behalf of narupley__ [nickr@mirthcorp.com]
Sent: Tuesday, May 06, 2014 10:41 AM
To: pgsql-bugs@postgresql.org
Subject: [BUGS] Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index

Alvaro Herrera-9 wrote:
> There's also the nightly snapshots, ftp://ftp.postgresql.org/pub/snapshot/9.3
> which are pretty much equivalent to the release tarballs in that they
> don't require you to have bison, flex and other development tools.

We do always build from source anyway, so the nightly snapshot should work out just fine. Thanks again for all the help guys!
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Andres Freund
Date:
Hi,

On 2014-05-05 14:56:27 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Nick Rupley wrote:
> >> Not sure if this is relevant or not, but it looks like the duplicate rows
> >> continue to be inserted here and there on our production box (to which we
> >> haven't yet applied the hotfix). As I stated before that production box did
> >> have some server crashes before, but actually it hasn't had any recently
> >> (in the past week), and yet the duplicate rows continue to happen.
>
> > This bug is not dependent on a crash; the corruption occurs to the live
> > data. Only the previous bug mentioned by Tom manifested itself during
> > crash recovery.
>
> Hm. If people are hitting this in the field, do we need to think about
> yet another fairly-high-priority update release?

So, how about 9.3.5?

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-05-05 14:56:27 -0400, Tom Lane wrote:
>> Hm. If people are hitting this in the field, do we need to think about
>> yet another fairly-high-priority update release?

> So, how about 9.3.5?

Nothing's happening till after PGCon, for sure.

			regards, tom lane
Re: BUG #10189: Limit in 9.3.4 no longer works when ordering using a composite multi-type index
From
Andres Freund
Date:
On 2014-05-20 09:12:04 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-05-05 14:56:27 -0400, Tom Lane wrote:
> >> Hm. If people are hitting this in the field, do we need to think about
> >> yet another fairly-high-priority update release?
>
> > So, how about 9.3.5?
>
> Nothing's happening till after PGCon, for sure.

Heh, sure. But I think we shouldn't wait too long after that.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services