Thread: Multi-level hierarchy with parallel append can lead to an extra subplan.

Multi-level hierarchy with parallel append can lead to an extra subplan.

From
Rajkumar Raghuwanshi
Date:
Hi,

I am getting extra subplan when using parallel append with multi-level hierarchy,  leading to data corruption.
Please see below test case.

-- set below parameters to encourage use of parallel plans
SET parallel_setup_cost=0;
SET parallel_tuple_cost=0;
SET min_parallel_table_scan_size=0;
SET max_parallel_workers_per_gather=4;

--create below data set
CREATE TABLE RM38941_inherit (c1 INTEGER PRIMARY KEY,c2 INTEGER,c3 CHAR(10));
INSERT INTO RM38941_inherit SELECT i, i % 125, to_char(i % 4, 'FM0000') FROM generate_series(0, 499,2) i;

CREATE TABLE RM38941_inherit_t1 () INHERITS (RM38941_inherit);
INSERT INTO RM38941_inherit_t1 SELECT i, i % 125, to_char(i % 4, 'FM0000') FROM generate_series(0, 499,3) i;

CREATE TABLE RM38941_inherit_t2 () INHERITS (RM38941_inherit);
INSERT INTO RM38941_inherit_t2 SELECT i, i % 125, to_char(i % 4, 'FM0000') FROM generate_series(0, 499,5) i;

CREATE TABLE RM38941_union_t1 (c1 INTEGER PRIMARY KEY,c2 INTEGER,c3 CHAR(10));
INSERT INTO RM38941_union_t1 SELECT i, i % 125, to_char(i % 4, 'FM0000') FROM generate_series(0, 499,2) i;

CREATE TABLE RM38941_union_t2 (c1 INTEGER PRIMARY KEY,c2 INTEGER,c3 CHAR(10));
INSERT INTO RM38941_union_t2 SELECT i, i % 125, to_char(i % 4, 'FM0000') FROM generate_series(0, 499,4) i;

ALTER TABLE RM38941_union_t1 SET (parallel_workers = 0);
ALTER TABLE RM38941_inherit_t1 SET (parallel_workers = 0);

--with parallel_append
SET enable_parallel_append=on;
postgres=# EXPLAIN (COSTS OFF)
postgres-# SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM RM38941_union_t2 UNION ALL SELECT c1,c2 FROM RM38941_inherit UNION ALL SELECT c1,c2 FROM RM38941_union_t1)UA;
                              QUERY PLAN                              
-----------------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 3
         ->  Partial Aggregate
               ->  Parallel Append
                     ->  Seq Scan on rm38941_inherit_t1
                     ->  Seq Scan on rm38941_union_t1
                     ->  Parallel Seq Scan on rm38941_union_t2
                     ->  Parallel Seq Scan on rm38941_inherit
                     ->  Parallel Seq Scan on rm38941_inherit_t2
                     ->  Parallel Append
                           ->  Seq Scan on rm38941_inherit_t1
                           ->  Parallel Seq Scan on rm38941_inherit
                           ->  Parallel Seq Scan on rm38941_inherit_t2
(14 rows)

postgres=# SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM RM38941_union_t2 UNION ALL SELECT c1,c2 FROM RM38941_inherit UNION ALL SELECT c1,c2 FROM RM38941_union_t1)UA;
         avg          |  sum 
----------------------+-------
 248.6983676366217175 | 86916
(1 row)


--without parallel_append
SET enable_parallel_append=off;
postgres=# EXPLAIN (COSTS OFF)
postgres-# SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM RM38941_union_t2 UNION ALL SELECT c1,c2 FROM RM38941_inherit UNION ALL SELECT c1,c2 FROM RM38941_union_t1)UA;
                 QUERY PLAN                
--------------------------------------------
 Aggregate
   ->  Append
         ->  Seq Scan on rm38941_union_t2
         ->  Seq Scan on rm38941_inherit
         ->  Seq Scan on rm38941_inherit_t1
         ->  Seq Scan on rm38941_inherit_t2
         ->  Seq Scan on rm38941_union_t1
(7 rows)

postgres=# SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM RM38941_union_t2 UNION ALL SELECT c1,c2 FROM RM38941_inherit UNION ALL SELECT c1,c2 FROM RM38941_union_t1)UA;
         avg          |  sum 
----------------------+-------
 248.6917040358744395 | 55083
(1 row)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

Re: Multi-level hierarchy with parallel append can lead to an extra subplan.

From
Amit Khandekar
Date:
Thanks Rajkumar for catching this.

There is indeed is an extra child subplan showing up in the Parallel Append :

                     ->  Parallel Append
                           ->  Seq Scan on rm38941_inherit_t1
                           ->  Parallel Seq Scan on rm38941_inherit
                           ->  Parallel Seq Scan on rm38941_inherit_t2

This subplan should not be there. It looks like that's because when
we add up subpaths while preparing Parallel Append path containing mix of
partial and non-partial child paths, in accumulate_append_subpath(),
the subpath being added is an Append path because it's a multi-level
partition. So in the condition "else if (special_subpaths != NULL)",
both *subpaths and *special_paths get updated. But then it is not
returning from there when it should. Instead, the control passes to
the end of function where *subpaths is again populated :
*subpaths = lappend(*subpaths, path);
which leads to an extra child with partial subpath getting added.

I think the fix should be that we should return from the above if
condition block, just like we return from the other if blocks. I
quickly tried to do this and it seems to work :

@@ -1926,6 +1926,7 @@ accumulate_append_subpath(Path *path, List
**subpaths, List **special_subpaths)

apath->first_partial_path);
                        *special_subpaths = list_concat(*special_subpaths,

                 new_special_subpaths);
+                       return;
                }
        }
        else if (IsA(path, MergeAppendPath))

But yet to test it sufficiently.

Actually I had not included UNION ALL testcase since the code flow for
Parallel Append is exactly the same for partitions and UNION ALL case.
But with the patch for this issue, I will also include a UNION ALL
testcase in the regression test. But, I think there must also be some
way to reproduce this same issue with an aggregate query with
multi-level partitioned tables without having to use UNION ALL.

Thanks
-Amit



On 9 January 2018 at 15:39, Rajkumar Raghuwanshi
<rajkumar.raghuwanshi@enterprisedb.com> wrote:
> Hi,
>
> I am getting extra subplan when using parallel append with multi-level
> hierarchy,  leading to data corruption.
> Please see below test case.
>
> -- set below parameters to encourage use of parallel plans
> SET parallel_setup_cost=0;
> SET parallel_tuple_cost=0;
> SET min_parallel_table_scan_size=0;
> SET max_parallel_workers_per_gather=4;
>
> --create below data set
> CREATE TABLE RM38941_inherit (c1 INTEGER PRIMARY KEY,c2 INTEGER,c3
> CHAR(10));
> INSERT INTO RM38941_inherit SELECT i, i % 125, to_char(i % 4, 'FM0000') FROM
> generate_series(0, 499,2) i;
>
> CREATE TABLE RM38941_inherit_t1 () INHERITS (RM38941_inherit);
> INSERT INTO RM38941_inherit_t1 SELECT i, i % 125, to_char(i % 4, 'FM0000')
> FROM generate_series(0, 499,3) i;
>
> CREATE TABLE RM38941_inherit_t2 () INHERITS (RM38941_inherit);
> INSERT INTO RM38941_inherit_t2 SELECT i, i % 125, to_char(i % 4, 'FM0000')
> FROM generate_series(0, 499,5) i;
>
> CREATE TABLE RM38941_union_t1 (c1 INTEGER PRIMARY KEY,c2 INTEGER,c3
> CHAR(10));
> INSERT INTO RM38941_union_t1 SELECT i, i % 125, to_char(i % 4, 'FM0000')
> FROM generate_series(0, 499,2) i;
>
> CREATE TABLE RM38941_union_t2 (c1 INTEGER PRIMARY KEY,c2 INTEGER,c3
> CHAR(10));
> INSERT INTO RM38941_union_t2 SELECT i, i % 125, to_char(i % 4, 'FM0000')
> FROM generate_series(0, 499,4) i;
>
> ALTER TABLE RM38941_union_t1 SET (parallel_workers = 0);
> ALTER TABLE RM38941_inherit_t1 SET (parallel_workers = 0);
>
> --with parallel_append
> SET enable_parallel_append=on;
> postgres=# EXPLAIN (COSTS OFF)
> postgres-# SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM RM38941_union_t2
> UNION ALL SELECT c1,c2 FROM RM38941_inherit UNION ALL SELECT c1,c2 FROM
> RM38941_union_t1)UA;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Finalize Aggregate
>    ->  Gather
>          Workers Planned: 3
>          ->  Partial Aggregate
>                ->  Parallel Append
>                      ->  Seq Scan on rm38941_inherit_t1
>                      ->  Seq Scan on rm38941_union_t1
>                      ->  Parallel Seq Scan on rm38941_union_t2
>                      ->  Parallel Seq Scan on rm38941_inherit
>                      ->  Parallel Seq Scan on rm38941_inherit_t2
>                      ->  Parallel Append
>                            ->  Seq Scan on rm38941_inherit_t1
>                            ->  Parallel Seq Scan on rm38941_inherit
>                            ->  Parallel Seq Scan on rm38941_inherit_t2
> (14 rows)
>
> postgres=# SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM RM38941_union_t2
> UNION ALL SELECT c1,c2 FROM RM38941_inherit UNION ALL SELECT c1,c2 FROM
> RM38941_union_t1)UA;
>          avg          |  sum
> ----------------------+-------
>  248.6983676366217175 | 86916
> (1 row)
>
>
> --without parallel_append
> SET enable_parallel_append=off;
> postgres=# EXPLAIN (COSTS OFF)
> postgres-# SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM RM38941_union_t2
> UNION ALL SELECT c1,c2 FROM RM38941_inherit UNION ALL SELECT c1,c2 FROM
> RM38941_union_t1)UA;
>                  QUERY PLAN
> --------------------------------------------
>  Aggregate
>    ->  Append
>          ->  Seq Scan on rm38941_union_t2
>          ->  Seq Scan on rm38941_inherit
>          ->  Seq Scan on rm38941_inherit_t1
>          ->  Seq Scan on rm38941_inherit_t2
>          ->  Seq Scan on rm38941_union_t1
> (7 rows)
>
> postgres=# SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM RM38941_union_t2
> UNION ALL SELECT c1,c2 FROM RM38941_inherit UNION ALL SELECT c1,c2 FROM
> RM38941_union_t1)UA;
>          avg          |  sum
> ----------------------+-------
>  248.6917040358744395 | 55083
> (1 row)
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


Re: Multi-level hierarchy with parallel append can lead to an extra subplan.

From
Robert Haas
Date:
On Tue, Jan 9, 2018 at 5:41 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> This subplan should not be there. It looks like that's because when
> we add up subpaths while preparing Parallel Append path containing mix of
> partial and non-partial child paths, in accumulate_append_subpath(),
> the subpath being added is an Append path because it's a multi-level
> partition. So in the condition "else if (special_subpaths != NULL)",
> both *subpaths and *special_paths get updated. But then it is not
> returning from there when it should. Instead, the control passes to
> the end of function where *subpaths is again populated :
> *subpaths = lappend(*subpaths, path);
> which leads to an extra child with partial subpath getting added.

Boy, what a dumb mistake on my part.  Thanks for finding the bug,
Rajkumar.  Thanks for analyzing it, Amit.  I have committed the fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company