Thread: Sequential parallel scan / agg

Sequential parallel scan / agg

From
James Sewell
Date:
Hi all,

I have a customer who is making use of the new parallel features on a 64 core box - this is working really well for them. However one issue (which I've since realised I've never seen an example of) is getting the data from these queries into a table.

I can't seem to create a plan which does this and still uses multiple workers, probably best to just show by example.

Below you can see:
  1. INSERT ... SELECT doesn't work
  2. WITH a AS (SELECT ...) INSERT FROM a doesn't work
  3. SELECT .. INTO doesn't work
  4. SELECT with no insert does work as expected


test=# explain
insert into jirotech.test
(select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00' )
;
                                                                                         QUERY PLAN                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on test  (cost=13336866.41..13336866.44 rows=1 width=438)
   ->  Subquery Scan on "*SELECT*"  (cost=13336866.41..13336866.44 rows=1 width=438)
         ->  Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
               ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
                     ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(10 rows)

test=# explain
with a as
(select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'  )
INSERT INTO jirotech.test SELECT * FROM A
;
      
                                                                                QUERY PLAN                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on test  (cost=13336866.42..13336866.45 rows=1 width=438)
   CTE a
     ->  Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
           ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
                 ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                 ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                 ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
   ->  CTE Scan on a  (cost=0.00..0.03 rows=1 width=438)
(11 rows)

test=# explain
select count(*) into jirotech.jamestest FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;
                                                                                   QUERY PLAN                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
   ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
         ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
         ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
         ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(8 rows)

Time: 29.229 ms

test=# explain
select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;
                                                                                         QUERY PLAN                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=7631566.71..7631566.72 rows=1 width=8)
   ->  Gather  (cost=7631566.68..7631566.69 rows=8 width=8)
         Workers Planned: 8
         ->  Partial Aggregate  (cost=7630566.68..7630566.69 rows=1 width=8)
               ->  Append  (cost=0.00..7514121.14 rows=46578216 width=0)
                     ->  Parallel Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Parallel Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..3756417.28 rows=23280094 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Parallel Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..3757703.86 rows=23298121 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(11 rows)

Time: 29.819 ms


Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect 

 

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Sequential parallel scan / agg

From
melvin6925
Date:
Have you tried:
CREATE TABLE blah AS SELECT ...  ?



Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone

-------- Original message --------
From: James Sewell <james.sewell@jirotech.com>
Date: 11/13/16 18:04 (GMT-05:00)
To: pgsql-general <pgsql-general@postgresql.org>
Subject: [GENERAL] Sequential parallel scan / agg

Hi all,

I have a customer who is making use of the new parallel features on a 64 core box - this is working really well for them. However one issue (which I've since realised I've never seen an example of) is getting the data from these queries into a table.

I can't seem to create a plan which does this and still uses multiple workers, probably best to just show by example.

Below you can see:
  1. INSERT ... SELECT doesn't work
  2. WITH a AS (SELECT ...) INSERT FROM a doesn't work
  3. SELECT .. INTO doesn't work
  4. SELECT with no insert does work as expected


test=# explain
insert into jirotech.test
(select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00' )
;
                                                                                         QUERY PLAN                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on test  (cost=13336866.41..13336866.44 rows=1 width=438)
   ->  Subquery Scan on "*SELECT*"  (cost=13336866.41..13336866.44 rows=1 width=438)
         ->  Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
               ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
                     ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(10 rows)

test=# explain
with a as
(select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'  )
INSERT INTO jirotech.test SELECT * FROM A
;
      
                                                                                QUERY PLAN                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on test  (cost=13336866.42..13336866.45 rows=1 width=438)
   CTE a
     ->  Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
           ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
                 ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                 ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                 ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
   ->  CTE Scan on a  (cost=0.00..0.03 rows=1 width=438)
(11 rows)

test=# explain
select count(*) into jirotech.jamestest FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;
                                                                                   QUERY PLAN                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
   ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
         ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
         ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
         ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(8 rows)

Time: 29.229 ms

test=# explain
select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;
                                                                                         QUERY PLAN                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=7631566.71..7631566.72 rows=1 width=8)
   ->  Gather  (cost=7631566.68..7631566.69 rows=8 width=8)
         Workers Planned: 8
         ->  Partial Aggregate  (cost=7630566.68..7630566.69 rows=1 width=8)
               ->  Append  (cost=0.00..7514121.14 rows=46578216 width=0)
                     ->  Parallel Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Parallel Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..3756417.28 rows=23280094 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Parallel Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..3757703.86 rows=23298121 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(11 rows)

Time: 29.819 ms


Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect 

 

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Sequential parallel scan / agg

From
James Sewell
Date:
Hi,

Yes, same result (non-parallel seq scan)

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect 

 

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009

On Mon, Nov 14, 2016 at 10:43 AM, melvin6925 <melvin6925@gmail.com> wrote:
Have you tried:
CREATE TABLE blah AS SELECT ...  ?



Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone

-------- Original message --------
From: James Sewell <james.sewell@jirotech.com>
Date: 11/13/16 18:04 (GMT-05:00)
To: pgsql-general <pgsql-general@postgresql.org>
Subject: [GENERAL] Sequential parallel scan / agg

Hi all,

I have a customer who is making use of the new parallel features on a 64 core box - this is working really well for them. However one issue (which I've since realised I've never seen an example of) is getting the data from these queries into a table.

I can't seem to create a plan which does this and still uses multiple workers, probably best to just show by example.

Below you can see:
  1. INSERT ... SELECT doesn't work
  2. WITH a AS (SELECT ...) INSERT FROM a doesn't work
  3. SELECT .. INTO doesn't work
  4. SELECT with no insert does work as expected


test=# explain
insert into jirotech.test
(select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00' )
;
                                                                                         QUERY PLAN                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on test  (cost=13336866.41..13336866.44 rows=1 width=438)
   ->  Subquery Scan on "*SELECT*"  (cost=13336866.41..13336866.44 rows=1 width=438)
         ->  Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
               ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
                     ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(10 rows)

test=# explain
with a as
(select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'  )
INSERT INTO jirotech.test SELECT * FROM A
;
      
                                                                                QUERY PLAN                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on test  (cost=13336866.42..13336866.45 rows=1 width=438)
   CTE a
     ->  Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
           ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
                 ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                 ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                 ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
   ->  CTE Scan on a  (cost=0.00..0.03 rows=1 width=438)
(11 rows)

test=# explain
select count(*) into jirotech.jamestest FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;
                                                                                   QUERY PLAN                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
   ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
         ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
         ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
         ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(8 rows)

Time: 29.229 ms

test=# explain
select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;
                                                                                         QUERY PLAN                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=7631566.71..7631566.72 rows=1 width=8)
   ->  Gather  (cost=7631566.68..7631566.69 rows=8 width=8)
         Workers Planned: 8
         ->  Partial Aggregate  (cost=7630566.68..7630566.69 rows=1 width=8)
               ->  Append  (cost=0.00..7514121.14 rows=46578216 width=0)
                     ->  Parallel Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Parallel Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..3756417.28 rows=23280094 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Parallel Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..3757703.86 rows=23298121 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(11 rows)

Time: 29.819 ms


Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect 

 

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.



The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Sequential parallel scan / agg

From
James Sewell
Date:
OK, looks like this question is answered by the wiki:


Even when parallel query is enabled in general, the query planner will never generate a parallel plan if any of the following are true:
  • The query writes data. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. This is a limitation of the current implementation which could be lifted in a future release.

Sad, but looks to hold from this testing!

James Sewell,
PostgreSQL Team Lead / Solutions Architect 

 

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009

On Mon, Nov 14, 2016 at 11:13 AM, James Sewell <james.sewell@jirotech.com> wrote:
Hi,

Yes, same result (non-parallel seq scan)

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect 

 

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009

On Mon, Nov 14, 2016 at 10:43 AM, melvin6925 <melvin6925@gmail.com> wrote:
Have you tried:
CREATE TABLE blah AS SELECT ...  ?



Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone

-------- Original message --------
From: James Sewell <james.sewell@jirotech.com>
Date: 11/13/16 18:04 (GMT-05:00)
To: pgsql-general <pgsql-general@postgresql.org>
Subject: [GENERAL] Sequential parallel scan / agg

Hi all,

I have a customer who is making use of the new parallel features on a 64 core box - this is working really well for them. However one issue (which I've since realised I've never seen an example of) is getting the data from these queries into a table.

I can't seem to create a plan which does this and still uses multiple workers, probably best to just show by example.

Below you can see:
  1. INSERT ... SELECT doesn't work
  2. WITH a AS (SELECT ...) INSERT FROM a doesn't work
  3. SELECT .. INTO doesn't work
  4. SELECT with no insert does work as expected


test=# explain
insert into jirotech.test
(select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00' )
;
                                                                                         QUERY PLAN                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on test  (cost=13336866.41..13336866.44 rows=1 width=438)
   ->  Subquery Scan on "*SELECT*"  (cost=13336866.41..13336866.44 rows=1 width=438)
         ->  Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
               ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
                     ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(10 rows)

test=# explain
with a as
(select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'  )
INSERT INTO jirotech.test SELECT * FROM A
;
      
                                                                                QUERY PLAN                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on test  (cost=13336866.42..13336866.45 rows=1 width=438)
   CTE a
     ->  Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
           ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
                 ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                 ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                 ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
                       Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
   ->  CTE Scan on a  (cost=0.00..0.03 rows=1 width=438)
(11 rows)

test=# explain
select count(*) into jirotech.jamestest FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;
                                                                                   QUERY PLAN                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13336866.41..13336866.42 rows=1 width=8)
   ->  Append  (cost=0.00..12405302.12 rows=372625715 width=0)
         ->  Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
         ->  Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..6201295.24 rows=186240748 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
         ->  Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..6204006.88 rows=186384966 width=0)
               Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(8 rows)

Time: 29.229 ms

test=# explain
select count(*)  FROM  meter_read mr

where TIMESTAMP_ROUNDED >= (timestamp'2016-03-30 00:00:00' - interval '2' day)

and TIMESTAMP_ROUNDED < timestamp'2016-03-30 00:00:00'
;
                                                                                         QUERY PLAN                        
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=7631566.71..7631566.72 rows=1 width=8)
   ->  Gather  (cost=7631566.68..7631566.69 rows=8 width=8)
         Workers Planned: 8
         ->  Partial Aggregate  (cost=7630566.68..7630566.69 rows=1 width=8)
               ->  Append  (cost=0.00..7514121.14 rows=46578216 width=0)
                     ->  Parallel Seq Scan on meter_read mr  (cost=0.00..0.00 rows=1 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Parallel Seq Scan on meter_read_2016_03_29 mr_1  (cost=0.00..3756417.28 rows=23280094 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
                     ->  Parallel Seq Scan on meter_read_2016_03_28 mr_2  (cost=0.00..3757703.86 rows=23298121 width=0)
                           Filter: ((timestamp_rounded >= '2016-03-28 00:00:00'::timestamp without time zone) AND (timestamp_rounded < '2016-03-30 00:00:00'::timestamp without time zone))
(11 rows)

Time: 29.819 ms


Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect 

 

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009


The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.




The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Sequential parallel scan / agg

From
Tobias Bussmann
Date:
Hi,

On 2016-11-13 23:04:44 +0000, James Sewell said:
>     1.    INSERT ... SELECT doesn't work
>     2.    WITH a AS (SELECT ...) INSERT FROM a doesn't work
>     3.    SELECT .. INTO doesn't work
>     4.    SELECT with no insert does work as expected
>>
>> CREATE TABLE blah AS SELECT ...  ?

Unfortunately, in these cases the current parallelism features cannot be used as all of these queries do write data.
Seein the doc [1] at "When Can Parallel Query Be Used?"  

> Even when it is in general possible for parallel query plans to be generated, the planner will not generate them for
agiven query if any of the following are true: 
>     •    The query writes any data or locks any database rows. If a query contains a data-modifying operation either
atthe top level or within a CTE, no parallel plans for that query will be generated. This is a limitation of the
currentimplementation which could be lifted in a future release. 

Best
Tobias

[1]: https://www.postgresql.org/docs/current/static/when-can-parallel-query-be-used.html