Thread: Parallel query only when EXPLAIN ANALYZEd

Parallel query only when EXPLAIN ANALYZEd

From
Jay Knight
Date:
Hello,

I installed 9.6 on Windows 7, and am experimenting with the new parallel query feature. I've found a behavior that seems inconsistent to me. Consider these two tables:

create table t1 as (
  with r as (
    SELECT generate_series(1,300000) AS id, ceil(random()*25)::int as item, ceil(random()*100000)::int AS low
  )
  select item, low, ceil(low + random()*100)::int as high from r
);
create index on t1(item, low, high);

create table t2 as (
  with r as (
    SELECT generate_series(1,300000) AS id, ceil(random()*25)::int as item, ceil(random()*100000)::int AS low
  )
  select item, low, ceil(low + random()*100)::int as high from r
);
create index on t2(item, low, high);

And this query:

explain analyze
select count(*) 
  from t1 
  join t2 on t1.item = t2.item and t1.low <= t2.high and t1.high >= t2.low;

If max_parallel_workers_per_gather is set to zero, this query take around 54 seconds on my machine (with and without the explain analyze).  But if I set max_parallel_workers_per_gather=4, The query with explain analyze takes 25 seconds, and the query plan shows that it is using some workers to parallelize the work.  But when I run it with max_parallel_workers_per_gather=4 without the explain analyze, it takes 54 seconds again.  Watching my CPU graphs, it looks the same as it does with max_parallel_workers_per_gather=0 (just one core spikes).  Query plan with zero workers: https://explain.depesz.com/s/RUO, and with 4 workers: https://explain.depesz.com/s/AB97.

So, why might postgres parallelize the query when I explain analyze it, but not when I just run it by itself?

Thanks,
Jay K

Re: Parallel query only when EXPLAIN ANALYZEd

From
David Rowley
Date:
On 30 September 2016 at 08:52, Jay Knight <jay@jayknight.com> wrote:
> So, why might postgres parallelize the query when I explain analyze it, but
> not when I just run it by itself?

One theory would be that, the worker might not have been available
when you performed the query execution, but it just happened to be
when you did the EXPLAIN ANALYZE. If the executor can't get a free
worker process, then it'll just do all the work in the main process.
The plan parallel plan that you've shown, given no extra worker
processes, would most likely perform the same as the serial plan you
showed, since the extra Finalize Aggregate node is only handling 1 row
anyway.

What's max_worker_processes set to?

If this is just a test machine, you should be able to see what's going
on if you install auto_explain, and enable auto_explain.log_analyze
(https://www.postgresql.org/docs/current/static/auto-explain.html)
Setting this up will log the EXPLAIN ANALYZE to the PostgreSQL logs
when you execute the query as normal.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Parallel query only when EXPLAIN ANALYZEd

From
Jay Knight
Date:
>What's max_worker_processes set to?

8

>One theory would be that, the worker might not have been available

>when you performed the query execution, but it just happened to be
>when you did the EXPLAIN ANALYZE  

This happens consistently this way.  Every time I run it with explain analyze it uses parallel workers, and every time I run it without it doesn't.  I just enabled auto_explain, and see that it is "planning" to use a worker, but doesn't launch it.  Is there a way to know why a planned worker doesn't get launched?

2016-09-29 16:45:44 CDT LOG:  duration: 50703.595 ms  plan:
    Query Text: select count(*) from t1
    join t2 on t1.item = t2.item and t1.low <= t2.high and t1.high >= t2.low
    Finalize Aggregate  (cost=14609058.99..14609059.00 rows=1 width=8) (actual time=50703.584..50703.584 rows=1 loops=1)
      ->  Gather  (cost=14609058.88..14609058.99 rows=1 width=8) (actual time=50703.581..50703.581 rows=1 loops=1)
            Workers Planned: 1
            Workers Launched: 0
            ->  Partial Aggregate  (cost=14608958.88..14608958.89 rows=1 width=8) (actual time=50703.579..50703.579 rows=1 loops=1)
                  ->  Nested Loop  (cost=0.42..13608937.28 rows=400008641 width=0) (actual time=0.534..50577.673 rows=3669891 loops=1)
                        ->  Parallel Seq Scan on t1  (cost=0.00..3386.71 rows=176471 width=12) (actual time=0.041..18.351 rows=300000 loops=1)
                        ->  Index Only Scan using t2_item_low_high_idx on t2  (cost=0.42..63.77 rows=1333 width=12) (actual time=0.167..0.168 rows=12 loops=300000)
                              Index Cond: ((item = t1.item) AND (low <= t1.high) AND (high >= t1.low))
                              Heap Fetches: 0


Thanks,
Jay K

On Thu, Sep 29, 2016 at 4:29 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On 30 September 2016 at 08:52, Jay Knight <jay@jayknight.com> wrote:
> So, why might postgres parallelize the query when I explain analyze it, but
> not when I just run it by itself?

One theory would be that, the worker might not have been available
when you performed the query execution, but it just happened to be
when you did the EXPLAIN ANALYZE. If the executor can't get a free
worker process, then it'll just do all the work in the main process.
The plan parallel plan that you've shown, given no extra worker
processes, would most likely perform the same as the serial plan you
showed, since the extra Finalize Aggregate node is only handling 1 row
anyway.

What's max_worker_processes set to?

If this is just a test machine, you should be able to see what's going
on if you install auto_explain, and enable auto_explain.log_analyze
(https://www.postgresql.org/docs/current/static/auto-explain.html)
Setting this up will log the EXPLAIN ANALYZE to the PostgreSQL logs
when you execute the query as normal.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel query only when EXPLAIN ANALYZEd

From
David Rowley
Date:
On 30 September 2016 at 10:47, Jay Knight <jay@jayknight.com> wrote:
>>What's max_worker_processes set to?
>
> 8
>
>>One theory would be that, the worker might not have been available
>>when you performed the query execution, but it just happened to be
>>when you did the EXPLAIN ANALYZE
>
> This happens consistently this way.  Every time I run it with explain
> analyze it uses parallel workers, and every time I run it without it
> doesn't.  I just enabled auto_explain, and see that it is "planning" to use
> a worker, but doesn't launch it.  Is there a way to know why a planned
> worker doesn't get launched?

It seems not, at least not unless you're willing to attach a debugger.

> 2016-09-29 16:45:44 CDT LOG:  duration: 50703.595 ms  plan:
>     Query Text: select count(*) from t1
>     join t2 on t1.item = t2.item and t1.low <= t2.high and t1.high >= t2.low
>     Finalize Aggregate  (cost=14609058.99..14609059.00 rows=1 width=8)
> (actual time=50703.584..50703.584 rows=1 loops=1)
>       ->  Gather  (cost=14609058.88..14609058.99 rows=1 width=8) (actual
> time=50703.581..50703.581 rows=1 loops=1)
>             Workers Planned: 1
>             Workers Launched: 0
>             ->  Partial Aggregate  (cost=14608958.88..14608958.89 rows=1
> width=8) (actual time=50703.579..50703.579 rows=1 loops=1)
>                   ->  Nested Loop  (cost=0.42..13608937.28 rows=400008641
> width=0) (actual time=0.534..50577.673 rows=3669891 loops=1)
>                         ->  Parallel Seq Scan on t1  (cost=0.00..3386.71
> rows=176471 width=12) (actual time=0.041..18.351 rows=300000 loops=1)
>                         ->  Index Only Scan using t2_item_low_high_idx on t2
> (cost=0.42..63.77 rows=1333 width=12) (actual time=0.167..0.168 rows=12
> loops=300000)
>                               Index Cond: ((item = t1.item) AND (low <=
> t1.high) AND (high >= t1.low))
>                               Heap Fetches: 0

That's interesting.

How about trying to increase max_worker_processes to say, 16. I know
you've said you've tried multiple times and it seems consistent, but
increasing this seems to be worth a try, if anything, to rule that
out.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Parallel query only when EXPLAIN ANALYZEd

From
Jay Knight
Date:
I've upped max_worker_processes to 16, but I still can't get it to launch workers unless I use EXPLAIN ANALYZE.  I've also found that this simplified setup exhibits the same behavior:

create table big as (
  SELECT generate_series(1,30000000) AS id
);

explain analyze SELECT avg(id) from big where id % 17 = 0;

With explain analyze, the plan includes:

Workers Planned: 4
Workers Launched: 4

But without the explain analyze, it never launches workers:

Workers Planned: 4
Workers Launched: 0

I've tried this on a CentOS VM (VirtualBox on Windows) that I gave 2 cores, and it worked as expected (it launched workers with and without explain analyze), so I've only been able to reproduce this on Windows.

Thanks,
Jay K

On Thu, Sep 29, 2016 at 9:07 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On 30 September 2016 at 10:47, Jay Knight <jay@jayknight.com> wrote:
>>What's max_worker_processes set to?
>
> 8
>
>>One theory would be that, the worker might not have been available
>>when you performed the query execution, but it just happened to be
>>when you did the EXPLAIN ANALYZE
>
> This happens consistently this way.  Every time I run it with explain
> analyze it uses parallel workers, and every time I run it without it
> doesn't.  I just enabled auto_explain, and see that it is "planning" to use
> a worker, but doesn't launch it.  Is there a way to know why a planned
> worker doesn't get launched?

It seems not, at least not unless you're willing to attach a debugger.

> 2016-09-29 16:45:44 CDT LOG:  duration: 50703.595 ms  plan:
>     Query Text: select count(*) from t1
>     join t2 on t1.item = t2.item and t1.low <= t2.high and t1.high >= t2.low
>     Finalize Aggregate  (cost=14609058.99..14609059.00 rows=1 width=8)
> (actual time=50703.584..50703.584 rows=1 loops=1)
>       ->  Gather  (cost=14609058.88..14609058.99 rows=1 width=8) (actual
> time=50703.581..50703.581 rows=1 loops=1)
>             Workers Planned: 1
>             Workers Launched: 0
>             ->  Partial Aggregate  (cost=14608958.88..14608958.89 rows=1
> width=8) (actual time=50703.579..50703.579 rows=1 loops=1)
>                   ->  Nested Loop  (cost=0.42..13608937.28 rows=400008641
> width=0) (actual time=0.534..50577.673 rows=3669891 loops=1)
>                         ->  Parallel Seq Scan on t1  (cost=0.00..3386.71
> rows=176471 width=12) (actual time=0.041..18.351 rows=300000 loops=1)
>                         ->  Index Only Scan using t2_item_low_high_idx on t2
> (cost=0.42..63.77 rows=1333 width=12) (actual time=0.167..0.168 rows=12
> loops=300000)
>                               Index Cond: ((item = t1.item) AND (low <=
> t1.high) AND (high >= t1.low))
>                               Heap Fetches: 0

That's interesting.

How about trying to increase max_worker_processes to say, 16. I know
you've said you've tried multiple times and it seems consistent, but
increasing this seems to be worth a try, if anything, to rule that
out.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Parallel query only when EXPLAIN ANALYZEd

From
Tom Lane
Date:
Jay Knight <jay@jayknight.com> writes:
> I've tried this on a CentOS VM (VirtualBox on Windows) that I gave 2 cores,
> and it worked as expected (it launched workers with and without explain
> analyze), so I've only been able to reproduce this on Windows.

FWIW, I tried your original example on Linux yesterday, and it seemed to
work fine --- it only wanted to use 1 worker, but the speedup was just
about exactly 2X with or without "explain analyze".  So this is somehow
Windows specific.  That's darn odd; in a quick look at the relevant code
I see nothing that looks platform dependent.

Somebody will need to trace through this on Windows and see where it's
going off the rails.

            regards, tom lane


Re: Parallel query only when EXPLAIN ANALYZEd

From
David Rowley
Date:
On 1 October 2016 at 05:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I tried your original example on Linux yesterday, and it seemed to
> work fine --- it only wanted to use 1 worker, but the speedup was just
> about exactly 2X with or without "explain analyze".  So this is somehow
> Windows specific.  That's darn odd; in a quick look at the relevant code
> I see nothing that looks platform dependent.
>
> Somebody will need to trace through this on Windows and see where it's
> going off the rails.

I had assumed this was something very specific to Jay's machine, and
tracing through the code I didn't really see why EXPLAIN ANALYZE would
behave any differently from just the SELECT.

I tried the test case on 9.6.0 on a Windows 8.1 machine, and it works
fine for me.

Full test case below, if anyone else would like to try.

postgres=# create table big as (
postgres(#   SELECT generate_series(1,30000000) AS id
postgres(# );
SELECT 30000000

postgres=# set max_parallel_workers_per_gather=8;
SET
postgres=# load 'auto_explain';
LOAD
postgres=# set auto_explain.log_min_duration=0;
SET
postgres=# set auto_explain.log_analyze=1;
SET
postgres=# \timing
Timing is on.
postgres=# explain analyze SELECT avg(id) from big where id % 17 = 0;
                                                               QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=223819.96..223819.97 rows=1 width=32)
(actual time=1030.778..1030.778 rows=1 loops=1)
   ->  Gather  (cost=223819.43..223819.94 rows=5 width=32) (actual
time=1029.691..1030.762 rows=6 loops=1)
         Workers Planned: 5
         Workers Launched: 5
         ->  Partial Aggregate  (cost=222819.43..222819.44 rows=1
width=32) (actual time=1004.021..1004.021 rows=1 loops=6)
               ->  Parallel Seq Scan on big  (cost=0.00..222744.43
rows=30000 width=4) (actual time=0.132..974.408 rows=294118 loops=6)
                     Filter: ((id % 17) = 0)
                     Rows Removed by Filter: 4705883
 Planning time: 0.066 ms
 Execution time: 1045.064 ms
(10 rows)

Time: 1045.486 ms (00:01.045)

postgres=# SELECT avg(id) from big where id % 17 = 0;
          avg
-----------------------
 15000001.000000000000
(1 row)


Time: 1061.304 ms (00:01.061)

The logs show:

2016-10-01 09:39:30 NZDT [7084]: [11-1]
user=postgres,db=postgres,app=psql,client=::1 LOG:  duration: 1060.801
ms  plan:
Query Text: SELECT avg(id) from big where id % 17 = 0;
Finalize Aggregate  (cost=223819.96..223819.97 rows=1 width=32)
(actual time=1046.424..1046.424 rows=1 loops=1)
 ->  Gather  (cost=223819.43..223819.94 rows=5 width=32) (actual
time=1045.358..1046.414 rows=6 loops=1)
       Workers Planned: 5
       Workers Launched: 5
       ->  Partial Aggregate  (cost=222819.43..222819.44 rows=1
width=32) (actual time=1023.197..1023.197 rows=1 loops=6)
             ->  Parallel Seq Scan on big  (cost=0.00..222744.43
rows=30000 width=4) (actual time=0.143..992.890 rows=294118 loops=6)
                   Filter: ((id % 17) = 0)
                   Rows Removed by Filter: 4705883



--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Parallel query only when EXPLAIN ANALYZEd

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 1 October 2016 at 05:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Somebody will need to trace through this on Windows and see where it's
>> going off the rails.

> I tried the test case on 9.6.0 on a Windows 8.1 machine, and it works
> fine for me.

Hm, but if we assume that EXPLAIN ANALYZE is somehow affecting the
behavior, it's entirely plausible that having auto_explain active
would have the same effect.

Please try it without auto_explain, and just rely on noting the runtime
with psql's \timing to infer whether workers were active or not.

            regards, tom lane


Re: Parallel query only when EXPLAIN ANALYZEd

From
David Rowley
Date:
On 1 October 2016 at 10:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 1 October 2016 at 05:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Somebody will need to trace through this on Windows and see where it's
>>> going off the rails.
>
>> I tried the test case on 9.6.0 on a Windows 8.1 machine, and it works
>> fine for me.
>
> Hm, but if we assume that EXPLAIN ANALYZE is somehow affecting the
> behavior, it's entirely plausible that having auto_explain active
> would have the same effect.
>
> Please try it without auto_explain, and just rely on noting the runtime
> with psql's \timing to infer whether workers were active or not.

It's certainly worth trying that, but in Jay's case auto_explain made
no difference.

Results without auto_explain:

postgres=# set max_parallel_workers_per_gather=8;
SET
Time: 0.257 ms
postgres=# SELECT avg(id) from big where id % 17 = 0;
          avg
-----------------------
 15000001.000000000000
(1 row)


Time: 1027.599 ms (00:01.028)
postgres=# set max_parallel_workers_per_gather=0;
SET
Time: 0.163 ms
postgres=# SELECT avg(id) from big where id % 17 = 0;
          avg
-----------------------
 15000001.000000000000
(1 row)


Time: 3262.519 ms (00:03.263)

postgres=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.0 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5,
Built by MSYS2 project) 4.9.2, 64-bit
(1 row)


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services