Re: Non-deterministic 100% CPU hang on postgres 9.3 - Mailing list pgsql-general
From | Fenn Bailey |
---|---|
Subject | Re: Non-deterministic 100% CPU hang on postgres 9.3 |
Date | |
Msg-id | CAKnSMKmXrbHkSkFhbszf9FbG8MDvpFbbHXfPjjgZLaxxK23CDA@mail.gmail.com Whole thread Raw |
In response to | Re: Non-deterministic 100% CPU hang on postgres 9.3 (Fenn Bailey <fenn.bailey@gmail.com>) |
List | pgsql-general |
Hi all,
Is it possible for database statistics to change after a query has been run (independent of transactions)?
I was wondering if anyone had any further insight as to why the query plan would have changed so dramatically between queries, even when isolated by a transaction?
Is it possible for database statistics to change after a query has been run (independent of transactions)?
Thanks in advance,
Fenn.
On Wed, Apr 16, 2014 at 2:25 PM, Fenn Bailey <fenn.bailey@gmail.com> wrote:
Some more information on this:But does the plan change?
> I say "hang", where really I've given up after ~12 hours execution. The
> exact same query can then be terminated and run in <90 seconds, with none
> of the underlying data changing.As it turns out it does. Some further examination shows that the plan changes, at least partially due to the fact that I now see this query attempts to modify the data that it's querying (ie: if any updates have occurred, the subsequent query is not the same).That said, all queries were performed in a transaction, so when they're aborted, no modification should be made to the underlying data.This makes it even more odd that on a subsequent run of the same query (after the rollback) it will complete in ~140 seconds (vs the several hours it was taking on the first run).Is it possible that the initial run is changing the statistics available on the table, hence providing a different query plan (even on the same data)?Query plan on first/second run follows -First run (failure):---------------------------Update on ad_events e (cost=1426714795.64..1426749795.64 rows=1000000 width=237)-> Hash Join (cost=1426714795.64..1426749795.64 rows=1000000 width=237)Hash Cond: (x.ad_event_id = e.ad_event_id)-> Subquery Scan on x (cost=1426638856.33..1426651356.33 rows=1000000 width=144)-> Sort (cost=1426638856.33..1426641356.33 rows=1000000 width=108)Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"-> Nested Loop Left Join (cost=12046.09..1426539198.49 rows=1000000 width=108)Join Filter: ((t."timestamp" >= s.session_start) AND (t."timestamp" <= s.session_end) AND ((s.ad_user_id = u.ad_user_id) OR (s.ad_user_id = u.merged_id)))-> Hash Join (cost=12046.09..203878.09 rows=1000000 width=92)Hash Cond: (t.user_id = u.orig_distinct_id)-> Seq Scan on ad_events_mv t (cost=0.00..173082.00 rows=1000000 width=112)Filter: (NOT processed)-> Hash (cost=7932.15..7932.15 rows=329115 width=67)-> Seq Scan on ad_users u (cost=0.00..7932.15 rows=329115 width=67)-> Materialize (cost=0.00..5478.88 rows=63392 width=32)-> Seq Scan on ad_sessions s (cost=0.00..5161.92 rows=63392 width=32)-> Hash (cost=50938.58..50938.58 rows=2000058 width=101)-> Seq Scan on ad_events e (cost=0.00..50938.58 rows=2000058 width=101)Second run (completed in ~148 seconds):------------------------------Update on ad_events e (cost=39730372.76..39765372.76 rows=1000000 width=237)-> Hash Join (cost=39730372.76..39765372.76 rows=1000000 width=237)Hash Cond: (x.ad_event_id = e.ad_event_id)-> Subquery Scan on x (cost=39654433.45..39666933.45 rows=1000000 width=144)-> Sort (cost=39654433.45..39656933.45 rows=1000000 width=108)Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"-> Nested Loop Left Join (cost=12054.20..39554775.61 rows=1000000 width=108)Join Filter: ((t."timestamp" >= s.session_start) AND (t."timestamp" <= s.session_end))-> Hash Join (cost=12046.09..203878.09 rows=1000000 width=92)Hash Cond: (t.user_id = u.orig_distinct_id)-> Seq Scan on ad_events_mv t (cost=0.00..173082.00 rows=1000000 width=112)Filter: (NOT processed)-> Hash (cost=7932.15..7932.15 rows=329115 width=67)-> Seq Scan on ad_users u (cost=0.00..7932.15 rows=329115 width=67)-> Bitmap Heap Scan on ad_sessions s (cost=8.11..39.22 rows=8 width=32)Recheck Cond: ((ad_user_id = u.ad_user_id) OR (ad_user_id = u.merged_id))-> BitmapOr (cost=8.11..8.11 rows=8 width=0)-> Bitmap Index Scan on ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0)Index Cond: (ad_user_id = u.ad_user_id)-> Bitmap Index Scan on ad_sessions_ad_user_id_idx (cost=0.00..4.05 rows=4 width=0)Index Cond: (ad_user_id = u.merged_id)-> Hash (cost=50938.58..50938.58 rows=2000058 width=101)-> Seq Scan on ad_events e (cost=0.00..50938.58 rows=2000058 width=101)---------------From a quick look, the plans diverge at:-> Seq Scan on ad_sessions s (cost=0.00..5161.92 rows=63392 width=32)vs-> Bitmap Heap Scan on ad_sessions s (cost=8.11..39.22 rows=8 width=32)I will continue to look further into this in an effort to "fix" the queries but it would be great if someone could give me insight as to what could be causing the inconsistency in plan.Cheers!
pgsql-general by date: