Thread: Non-deterministic 100% CPU hang on postgres 9.3

Non-deterministic 100% CPU hang on postgres 9.3

From
Fenn Bailey
Date:
Hi all,

I'm experiencing an issue where certain queries appear to non-deterministically "hang", with a CPU pinned at 100%.

I say "hang", where really I've given up after ~12 hours execution. The exact same query can then be terminated and run in <90 seconds, with none of the underlying data changing.

I can completely reset the DB (drop tables/recreate) and re-run and sometimes certain queries will appear to hang forever, sometimes they will execute in minutes.

I've tried official debian (amd64) packages of postgres 9.3.2 - 9.3.4 (same issues).

The workload is ostensibly analytics - As a disclaimer I absolutely understand that these queries may not be non-optimized (I didn't build them/have not reviewed fully). The part I'm trying to work out is why they sometimes finish easily in minutes and other times hang.

I've straced the process when it's pinned and there appears to be no system calls (ie: no IO) just a pure CPU loop.

If anyone could explain this behaviour, or even how I might go about diagnosing, that would be wonderful.

Query/analyze details follow.

Thanks!

Query:
----
UPDATE ad_events e
  set ad_user_id = x.ad_user_id, ad_session_id = x.ad_session_id
FROM 
  (SELECT t.ad_event_id, COALESCE (s.ad_user_id, u.merged_id,u.ad_user_id) ad_user_id,
     case when (name = 'Clickthrough from Email' or properties->('mp_lib') = 'web' ) then s.ad_session_id 
          else null 
       end as ad_session_id, 
     t.timestamp, name , properties 
  from ad_events_mv t 
  INNER JOIN ad_users u ON (t.user_id = u.orig_distinct_id)
  LEFT OUTER JOIN  ad_sessions s ON (t.timestamp between s.session_start and s.session_end  and (s.ad_user_id = u.ad_user_id or s.ad_user_id = u.merged_id))
  where t.processed = false
  order by s.ad_user_id, s.ad_session_id, timestamp) x
WHERE
  x.ad_event_id = e.ad_event_id;
----

EXPLAIN ANALYZE follows:                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on ad_events e  (cost=39730372.92..39765372.92 rows=1000000 width=237) (actual time=130140.450..130140.450 rows=0 loops=1)
   ->  Hash Join  (cost=39730372.92..39765372.92 rows=1000000 width=237) (actual time=54243.877..66848.448 rows=2000000 loops=1)
         Hash Cond: (x.ad_event_id = e.ad_event_id)
         ->  Subquery Scan on x  (cost=39654433.45..39666933.45 rows=1000000 width=144) (actual time=52682.740..57668.998 rows=2000000 loops=1)
               ->  Sort  (cost=39654433.45..39656933.45 rows=1000000 width=108) (actual time=52682.693..55003.467 rows=2000000 loops=1)
                     Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
                     Sort Method: external merge  Disk: 1078104kB
                     ->  Nested Loop Left Join  (cost=12054.20..39554775.61 rows=1000000 width=108) (actual time=204.872..43742.396 rows=2000000 loops=1)
                           Join Filter: ((t."timestamp" >= s.session_start) AND (t."timestamp" <= s.session_end))
                           Rows Removed by Join Filter: 18792613
                           ->  Hash Join  (cost=12046.09..203878.09 rows=1000000 width=92) (actual time=204.402..4293.175 rows=2000000 loops=1)
                                 Hash Cond: (t.user_id = u.orig_distinct_id)
                                 ->  Seq Scan on ad_events_mv t  (cost=0.00..173082.00 rows=1000000 width=112) (actual time=0.022..1813.272 rows=2000000 loops=1)
                                       Filter: (NOT processed)
                                 ->  Hash  (cost=7932.15..7932.15 rows=329115 width=67) (actual time=204.166..204.166 rows=329115 loops=1)
                                       Buckets: 65536  Batches: 1  Memory Usage: 30590kB
                                       ->  Seq Scan on ad_users u  (cost=0.00..7932.15 rows=329115 width=67) (actual time=0.009..85.231 rows=329115 loops=1)
                           ->  Bitmap Heap Scan on ad_sessions s  (cost=8.11..39.22 rows=8 width=32) (actual time=0.007..0.015 rows=10 loops=2000000)
                                 Recheck Cond: ((ad_user_id = u.ad_user_id) OR (ad_user_id = u.merged_id))
                                 ->  BitmapOr  (cost=8.11..8.11 rows=8 width=0) (actual time=0.005..0.005 rows=0 loops=2000000)
                                       ->  Bitmap Index Scan on ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0) (actual time=0.003..0.003 rows=8 loops=2000000)
                                             Index Cond: (ad_user_id = u.ad_user_id)
                                       ->  Bitmap Index Scan on ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0) (actual time=0.001..0.001 rows=3 loops=2000000)
:                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on ad_events e  (cost=39730372.92..39765372.92 rows=1000000 width=237) (actual time=130140.450..130140.450 rows=0 loops=1)
   ->  Hash Join  (cost=39730372.92..39765372.92 rows=1000000 width=237) (actual time=54243.877..66848.448 rows=2000000 loops=1)
         Hash Cond: (x.ad_event_id = e.ad_event_id)
         ->  Subquery Scan on x  (cost=39654433.45..39666933.45 rows=1000000 width=144) (actual time=52682.740..57668.998 rows=2000000 loops=1)
               ->  Sort  (cost=39654433.45..39656933.45 rows=1000000 width=108) (actual time=52682.693..55003.467 rows=2000000 loops=1)
                     Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
                     Sort Method: external merge  Disk: 1078104kB
                     ->  Nested Loop Left Join  (cost=12054.20..39554775.61 rows=1000000 width=108) (actual time=204.872..43742.396 rows=2000000 loops=1)
                           Join Filter: ((t."timestamp" >= s.session_start) AND (t."timestamp" <= s.session_end))
                           Rows Removed by Join Filter: 18792613
                           ->  Hash Join  (cost=12046.09..203878.09 rows=1000000 width=92) (actual time=204.402..4293.175 rows=2000000 loops=1)
                                 Hash Cond: (t.user_id = u.orig_distinct_id)
                                 ->  Seq Scan on ad_events_mv t  (cost=0.00..173082.00 rows=1000000 width=112) (actual time=0.022..1813.272 rows=2000000 loops=1)
                                       Filter: (NOT processed)
                                 ->  Hash  (cost=7932.15..7932.15 rows=329115 width=67) (actual time=204.166..204.166 rows=329115 loops=1)
                                       Buckets: 65536  Batches: 1  Memory Usage: 30590kB
                                       ->  Seq Scan on ad_users u  (cost=0.00..7932.15 rows=329115 width=67) (actual time=0.009..85.231 rows=329115 loops=1)
                           ->  Bitmap Heap Scan on ad_sessions s  (cost=8.11..39.22 rows=8 width=32) (actual time=0.007..0.015 rows=10 loops=2000000)
                                 Recheck Cond: ((ad_user_id = u.ad_user_id) OR (ad_user_id = u.merged_id))
                                 ->  BitmapOr  (cost=8.11..8.11 rows=8 width=0) (actual time=0.005..0.005 rows=0 loops=2000000)
                                       ->  Bitmap Index Scan on ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0) (actual time=0.003..0.003 rows=8 loops=2000000)
                                             Index Cond: (ad_user_id = u.ad_user_id)
                                       ->  Bitmap Index Scan on ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0) (actual time=0.001..0.001 rows=3 loops=2000000)
                                             Index Cond: (ad_user_id = u.merged_id)
         ->  Hash  (cost=50938.65..50938.65 rows=2000065 width=101) (actual time=1560.037..1560.037 rows=2000000 loops=1)
               Buckets: 262144  Batches: 1  Memory Usage: 253047kB
               ->  Seq Scan on ad_events e  (cost=0.00..50938.65 rows=2000065 width=101) (actual time=0.014..633.555 rows=2000000 loops=1)

Re: Non-deterministic 100% CPU hang on postgres 9.3

From
Tom Lane
Date:
Fenn Bailey <fenn.bailey@gmail.com> writes:
> I'm experiencing an issue where certain queries appear to
> non-deterministically "hang", with a CPU pinned at 100%.

> I say "hang", where really I've given up after ~12 hours execution. The
> exact same query can then be terminated and run in <90 seconds, with none
> of the underlying data changing.

But does the plan change?

> If anyone could explain this behaviour, or even how I might go about
> diagnosing, that would be wonderful.

"perf" or "oprofile" or local equivalent would help identify where the
looping is happening.

            regards, tom lane


Re: Non-deterministic 100% CPU hang on postgres 9.3

From
Fenn Bailey
Date:
Some more information on this:

> I say "hang", where really I've given up after ~12 hours execution. The
> exact same query can then be terminated and run in <90 seconds, with none
> of the underlying data changing.

But does the plan change?


As it turns out it does. Some further examination shows that the plan changes, at least partially due to the fact that I now see this query attempts to modify the data that it's querying (ie: if any updates have occurred, the subsequent query is not the same). 

That said, all queries were performed in a transaction, so when they're aborted, no modification should be made to the underlying data. 

This makes it even more odd that on a subsequent run of the same query (after the rollback) it will complete in ~140 seconds (vs the several hours it was taking on the first run).

Is it possible that the initial run is changing the statistics available on the table, hence providing a different query plan (even on the same data)?

Query plan on first/second run follows - 

First run (failure):
---------------------------
 Update on ad_events e  (cost=1426714795.64..1426749795.64 rows=1000000 width=237)
   ->  Hash Join  (cost=1426714795.64..1426749795.64 rows=1000000 width=237)
         Hash Cond: (x.ad_event_id = e.ad_event_id)
         ->  Subquery Scan on x  (cost=1426638856.33..1426651356.33 rows=1000000 width=144)
               ->  Sort  (cost=1426638856.33..1426641356.33 rows=1000000 width=108)
                     Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
                     ->  Nested Loop Left Join  (cost=12046.09..1426539198.49 rows=1000000 width=108)
                           Join Filter: ((t."timestamp" >= s.session_start) AND (t."timestamp" <= s.session_end) AND ((s.ad_user_id = u.ad_user_id) OR (s.ad_user_id = u.merged_id)))
                           ->  Hash Join  (cost=12046.09..203878.09 rows=1000000 width=92)
                                 Hash Cond: (t.user_id = u.orig_distinct_id)
                                 ->  Seq Scan on ad_events_mv t  (cost=0.00..173082.00 rows=1000000 width=112)
                                       Filter: (NOT processed)
                                 ->  Hash  (cost=7932.15..7932.15 rows=329115 width=67)
                                       ->  Seq Scan on ad_users u  (cost=0.00..7932.15 rows=329115 width=67)
                           ->  Materialize  (cost=0.00..5478.88 rows=63392 width=32)
                                 ->  Seq Scan on ad_sessions s  (cost=0.00..5161.92 rows=63392 width=32)
         ->  Hash  (cost=50938.58..50938.58 rows=2000058 width=101)
               ->  Seq Scan on ad_events e  (cost=0.00..50938.58 rows=2000058 width=101)

Second run (completed in ~148 seconds):
------------------------------
 Update on ad_events e  (cost=39730372.76..39765372.76 rows=1000000 width=237)
   ->  Hash Join  (cost=39730372.76..39765372.76 rows=1000000 width=237)
         Hash Cond: (x.ad_event_id = e.ad_event_id)
         ->  Subquery Scan on x  (cost=39654433.45..39666933.45 rows=1000000 width=144)
               ->  Sort  (cost=39654433.45..39656933.45 rows=1000000 width=108)
                     Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
                     ->  Nested Loop Left Join  (cost=12054.20..39554775.61 rows=1000000 width=108)
                           Join Filter: ((t."timestamp" >= s.session_start) AND (t."timestamp" <= s.session_end))
                           ->  Hash Join  (cost=12046.09..203878.09 rows=1000000 width=92)
                                 Hash Cond: (t.user_id = u.orig_distinct_id)
                                 ->  Seq Scan on ad_events_mv t  (cost=0.00..173082.00 rows=1000000 width=112)
                                       Filter: (NOT processed)
                                 ->  Hash  (cost=7932.15..7932.15 rows=329115 width=67)
                                       ->  Seq Scan on ad_users u  (cost=0.00..7932.15 rows=329115 width=67)
                           ->  Bitmap Heap Scan on ad_sessions s  (cost=8.11..39.22 rows=8 width=32)
                                 Recheck Cond: ((ad_user_id = u.ad_user_id) OR (ad_user_id = u.merged_id))
                                 ->  BitmapOr  (cost=8.11..8.11 rows=8 width=0)
                                       ->  Bitmap Index Scan on ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0)
                                             Index Cond: (ad_user_id = u.ad_user_id)
                                       ->  Bitmap Index Scan on ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0)
                                             Index Cond: (ad_user_id = u.merged_id)
         ->  Hash  (cost=50938.58..50938.58 rows=2000058 width=101)
               ->  Seq Scan on ad_events e  (cost=0.00..50938.58 rows=2000058 width=101)
---------------

From a quick look, the plans diverge at:
->  Seq Scan on ad_sessions s  (cost=0.00..5161.92 rows=63392 width=32)
vs
->  Bitmap Heap Scan on ad_sessions s  (cost=8.11..39.22 rows=8 width=32)

I will continue to look further into this in an effort to "fix" the queries but it would be great if someone could give me insight as to what could be causing the inconsistency in plan.

Cheers!

Re: Non-deterministic 100% CPU hang on postgres 9.3

From
Fenn Bailey
Date:
Hi all,

I was wondering if anyone had any further insight as to why the query plan would have changed so dramatically between queries, even when isolated by a transaction?

Is it possible for database statistics to change after a query has been run (independent of transactions)?

Thanks in advance,

 Fenn.


On Wed, Apr 16, 2014 at 2:25 PM, Fenn Bailey <fenn.bailey@gmail.com> wrote:
Some more information on this:

> I say "hang", where really I've given up after ~12 hours execution. The
> exact same query can then be terminated and run in <90 seconds, with none
> of the underlying data changing.

But does the plan change?


As it turns out it does. Some further examination shows that the plan changes, at least partially due to the fact that I now see this query attempts to modify the data that it's querying (ie: if any updates have occurred, the subsequent query is not the same). 

That said, all queries were performed in a transaction, so when they're aborted, no modification should be made to the underlying data. 

This makes it even more odd that on a subsequent run of the same query (after the rollback) it will complete in ~140 seconds (vs the several hours it was taking on the first run).

Is it possible that the initial run is changing the statistics available on the table, hence providing a different query plan (even on the same data)?

Query plan on first/second run follows - 

First run (failure):
---------------------------
 Update on ad_events e  (cost=1426714795.64..1426749795.64 rows=1000000 width=237)
   ->  Hash Join  (cost=1426714795.64..1426749795.64 rows=1000000 width=237)
         Hash Cond: (x.ad_event_id = e.ad_event_id)
         ->  Subquery Scan on x  (cost=1426638856.33..1426651356.33 rows=1000000 width=144)
               ->  Sort  (cost=1426638856.33..1426641356.33 rows=1000000 width=108)
                     Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
                     ->  Nested Loop Left Join  (cost=12046.09..1426539198.49 rows=1000000 width=108)
                           Join Filter: ((t."timestamp" >= s.session_start) AND (t."timestamp" <= s.session_end) AND ((s.ad_user_id = u.ad_user_id) OR (s.ad_user_id = u.merged_id)))
                           ->  Hash Join  (cost=12046.09..203878.09 rows=1000000 width=92)
                                 Hash Cond: (t.user_id = u.orig_distinct_id)
                                 ->  Seq Scan on ad_events_mv t  (cost=0.00..173082.00 rows=1000000 width=112)
                                       Filter: (NOT processed)
                                 ->  Hash  (cost=7932.15..7932.15 rows=329115 width=67)
                                       ->  Seq Scan on ad_users u  (cost=0.00..7932.15 rows=329115 width=67)
                           ->  Materialize  (cost=0.00..5478.88 rows=63392 width=32)
                                 ->  Seq Scan on ad_sessions s  (cost=0.00..5161.92 rows=63392 width=32)
         ->  Hash  (cost=50938.58..50938.58 rows=2000058 width=101)
               ->  Seq Scan on ad_events e  (cost=0.00..50938.58 rows=2000058 width=101)

Second run (completed in ~148 seconds):
------------------------------
 Update on ad_events e  (cost=39730372.76..39765372.76 rows=1000000 width=237)
   ->  Hash Join  (cost=39730372.76..39765372.76 rows=1000000 width=237)
         Hash Cond: (x.ad_event_id = e.ad_event_id)
         ->  Subquery Scan on x  (cost=39654433.45..39666933.45 rows=1000000 width=144)
               ->  Sort  (cost=39654433.45..39656933.45 rows=1000000 width=108)
                     Sort Key: s.ad_user_id, s.ad_session_id, t."timestamp"
                     ->  Nested Loop Left Join  (cost=12054.20..39554775.61 rows=1000000 width=108)
                           Join Filter: ((t."timestamp" >= s.session_start) AND (t."timestamp" <= s.session_end))
                           ->  Hash Join  (cost=12046.09..203878.09 rows=1000000 width=92)
                                 Hash Cond: (t.user_id = u.orig_distinct_id)
                                 ->  Seq Scan on ad_events_mv t  (cost=0.00..173082.00 rows=1000000 width=112)
                                       Filter: (NOT processed)
                                 ->  Hash  (cost=7932.15..7932.15 rows=329115 width=67)
                                       ->  Seq Scan on ad_users u  (cost=0.00..7932.15 rows=329115 width=67)
                           ->  Bitmap Heap Scan on ad_sessions s  (cost=8.11..39.22 rows=8 width=32)
                                 Recheck Cond: ((ad_user_id = u.ad_user_id) OR (ad_user_id = u.merged_id))
                                 ->  BitmapOr  (cost=8.11..8.11 rows=8 width=0)
                                       ->  Bitmap Index Scan on ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0)
                                             Index Cond: (ad_user_id = u.ad_user_id)
                                       ->  Bitmap Index Scan on ad_sessions_ad_user_id_idx  (cost=0.00..4.05 rows=4 width=0)
                                             Index Cond: (ad_user_id = u.merged_id)
         ->  Hash  (cost=50938.58..50938.58 rows=2000058 width=101)
               ->  Seq Scan on ad_events e  (cost=0.00..50938.58 rows=2000058 width=101)
---------------

From a quick look, the plans diverge at:
->  Seq Scan on ad_sessions s  (cost=0.00..5161.92 rows=63392 width=32)
vs
->  Bitmap Heap Scan on ad_sessions s  (cost=8.11..39.22 rows=8 width=32)

I will continue to look further into this in an effort to "fix" the queries but it would be great if someone could give me insight as to what could be causing the inconsistency in plan.

Cheers!