Re: Parallel append plan instability/randomness - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel append plan instability/randomness
Date
Msg-id CAA4eK1LegxN9of0sydNM5n4up=7oCj=-E94edKDOsYFKyb1R0Q@mail.gmail.com
Whole thread Raw
In response to Parallel append plan instability/randomness  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Parallel append plan instability/randomness
Re: Parallel append plan instability/randomness
List pgsql-hackers
On Sun, Jan 7, 2018 at 5:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> According to buildfarm member silverfish,
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=silverfish&dt=2018-01-06%2008%3A53%3A38
>
> it's possible to sometimes get this failure in the regression tests:
>
> *** /mnt/buildfarm/buildroot/HEAD/pgsql.build/../pgsql/src/test/regress/expected/select_parallel.out    Tue Dec 19
20:24:022017
 
> --- /mnt/buildfarm/buildroot/HEAD/pgsql.build/src/test/regress/results/select_parallel.out      Sat Jan  6 09:21:39
2018
> ***************
> *** 75,84 ****
>            Workers Planned: 3
>            ->  Partial Aggregate
>                  ->  Parallel Append
>                        ->  Seq Scan on d_star
>                        ->  Seq Scan on f_star
>                        ->  Seq Scan on e_star
> -                      ->  Seq Scan on b_star
>                        ->  Seq Scan on c_star
>                        ->  Seq Scan on a_star
>   (11 rows)
> --- 75,84 ----
>            Workers Planned: 3
>            ->  Partial Aggregate
>                  ->  Parallel Append
> +                      ->  Seq Scan on b_star
>                        ->  Seq Scan on d_star
>                        ->  Seq Scan on f_star
>                        ->  Seq Scan on e_star
>                        ->  Seq Scan on c_star
>                        ->  Seq Scan on a_star
>   (11 rows)
>
> Irreproducible failures in the regression tests are not very acceptable.
> Furthermore, considering that the query being tested is
>
> explain (costs off)
>   select round(avg(aa)), sum(aa) from a_star;
>
> it seems to me that the "expected" order of the sub-scans is mighty
> random to begin with.
>

I think order of sub-scans can be random if the number of rows in
child relations can vary across runs.  For the above case, the
subpaths (non-partial-paths) are always in the descending order of
their cost and I can see that by running it locally.  On my local m/c,
output is as below:

regression=# explain select round(avg(aa)), sum(aa) from a_star;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2.30..2.31 rows=1 width=40)
   ->  Gather  (cost=2.28..2.29 rows=3 width=40)
         Workers Planned: 3
         ->  Partial Aggregate  (cost=2.28..2.29 rows=1 width=40)
               ->  Parallel Append  (cost=0.00..2.20 rows=15 width=4)
                     ->  Seq Scan on d_star  (cost=0.00..1.16 rows=16 width=4)
                     ->  Seq Scan on f_star  (cost=0.00..1.16 rows=16 width=4)
                     ->  Seq Scan on e_star  (cost=0.00..1.07 rows=7 width=4)
                     ->  Seq Scan on b_star  (cost=0.00..1.04 rows=4 width=4)
                     ->  Seq Scan on c_star  (cost=0.00..1.04 rows=4 width=4)
                     ->  Seq Scan on a_star  (cost=0.00..1.03 rows=3 width=4)
(11 rows)

The above indicates that paths are listed in the order as expected.
What makes you think that the order of sub-scans can be random?  Is it
possible that the number of rows in child relations can vary across
runs?

One theory that can explain above failure is that the costs of
scanning some of the sub-paths is very close due to which sometimes
the results can vary.  If that is the case, then probably using
fuzz_factor in costs comparison (as is done in attached patch) can
improve the situation, may be we have to consider some other factors
like number of rows in each subpath.  However, it might be better to
first somehow reproduce this case and see what is going wrong, any
ideas?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: heads up: Fix for intel hardware bug will lead to performance regressions
Next
From: Michael Paquier
Date:
Subject: Re: heads up: Fix for intel hardware bug will lead to performance regressions