Re: WIP: Upper planner pathification - Mailing list pgsql-hackers

From Andres Freund
Subject Re: WIP: Upper planner pathification
Date
Msg-id 20160313010354.v7gygdse34vylp5a@alap3.anarazel.de
Whole thread Raw
In response to Re: WIP: Upper planner pathification  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WIP: Upper planner pathification
List pgsql-hackers
On 2016-03-12 12:22:01 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-03-10 23:38:14 -0500, Tom Lane wrote:
> >> I'll do it ... just send me the list.
> 
> > After exporting make_agg, make_limit, make_sort_from_sortclauses and
> > making some trivial adjustments due to the pull_var_clause changes
> > change, Citus' tests pass on 9.6, bar some noise.
> 
> OK, done.

Thanks.

> > Pathification made
> > some plans switch from hash-agg to sort-agg, and the other way round;
> > but that's obviously ok.
> 
> I wonder whether that's pathification per se.

If you're interested enough, I've uploaded a dump of the schema relevant
table to http://anarazel.de/t/lineitem_95_96_plan.dump.gz

the affected query is (after ANALYZE lineitem_102009)

EXPLAIN SELECT l_quantity, count(*) AS count, sum(l_extendedprice) AS avg, count(l_extendedprice) AS avg,
array_agg(l_orderkey)AS array_agg
 
FROM lineitem_102009 lineitem
WHERE ((l_quantity < '5'::numeric) AND (l_orderkey > 5500) AND (l_orderkey < 9500)) GROUP BY l_quantity;

=# SELECT version();
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                 version                                                  │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc-5.real (Debian 5.3.1-10) 5.3.1 20160224, 64-bit │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)

=# SELECT l_quantity, count(*) AS count, sum(l_extendedprice) AS avg, count(l_extendedprice) AS avg,
array_agg(l_orderkey)AS array_agg FROM lineitem_102009 lineitem WHERE ((l_quantity < '5'::numeric) AND (l_orderkey >
5500)AND (l_orderkey < 9500)) GROUP BY l_quantity;
 
┌────────────┬───────┬──────────┬─────┬────────────────────────────────────────────────┐
│ l_quantity │ count │   avg    │ avg │                   array_agg                    │
├────────────┼───────┼──────────┼─────┼────────────────────────────────────────────────┤
│       1.00 │     9 │ 13044.06 │   9 │ {8997,9026,9158,9184,9220,9222,9348,9383,9476} │
│       4.00 │     7 │ 40868.84 │   7 │ {9091,9120,9281,9347,9382,9440,9473}           │
│       2.00 │     8 │ 26072.02 │   8 │ {9030,9058,9123,9124,9188,9344,9441,9476}      │
│       3.00 │     9 │ 39925.32 │   9 │ {9124,9157,9184,9223,9254,9349,9414,9475,9477} │
└────────────┴───────┴──────────┴─────┴────────────────────────────────────────────────┘
(4 rows)

Time: 0.906 ms
=# EXPLAIN SELECT l_quantity, count(*) AS count, sum(l_extendedprice) AS avg, count(l_extendedprice) AS avg,
array_agg(l_orderkey)AS array_agg FROM lineitem_102009 lineitem WHERE ((l_quantity < '5'::numeric) AND (l_orderkey >
5500)AND (l_orderkey < 9500)) GROUP BY l_quantity;
 
┌────────────────────────────────────────────────────────────────────────────────────────────┐
│                                         QUERY PLAN                                         │
├────────────────────────────────────────────────────────────────────────────────────────────┤
│ HashAggregate  (cost=137.91..137.93 rows=1 width=21)                                       │
│   Group Key: l_quantity                                                                    │
│   ->  Bitmap Heap Scan on lineitem_102009 lineitem  (cost=13.07..137.44 rows=38 width=21)  │
│         Recheck Cond: ((l_orderkey > 5500) AND (l_orderkey < 9500))                        │
│         Filter: (l_quantity < '5'::numeric)                                                │
│         ->  Bitmap Index Scan on lineitem_pkey_102009  (cost=0.00..13.06 rows=478 width=0) │
│               Index Cond: ((l_orderkey > 5500) AND (l_orderkey < 9500))                    │
└────────────────────────────────────────────────────────────────────────────────────────

vs.


=# SELECT version();

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│                                                                        version
               
 

├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ PostgreSQL 9.6devel on x86_64-pc-linux-gnu, compiled by gcc-6.real (Debian 6-20160228-1) 6.0.0 20160228
(experimental)[trunk revision
 

└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
(1 row)

=# SELECT l_quantity, count(*) AS count, sum(l_extendedprice) AS avg, count(l_extendedprice) AS avg,
array_agg(l_orderkey)AS array_agg FROM lineitem_102009 lineitem WHERE ((l_quantity < '5'::numeric) AND (l_orderkey >
5500)AND (l_orderkey < 9500)) GROUP BY l_quantity;
 
┌────────────┬───────┬──────────┬─────┬────────────────────────────────────────────────┐
│ l_quantity │ count │   avg    │ avg │                   array_agg                    │
├────────────┼───────┼──────────┼─────┼────────────────────────────────────────────────┤
│       1.00 │     9 │ 13044.06 │   9 │ {9476,9158,9184,9383,9026,9220,9222,8997,9348} │
│       2.00 │     8 │ 26072.02 │   8 │ {9124,9344,9441,9123,9476,9030,9058,9188}      │
│       3.00 │     9 │ 39925.32 │   9 │ {9477,9124,9157,9184,9223,9254,9349,9414,9475} │
│       4.00 │     7 │ 40868.84 │   7 │ {9440,9347,9473,9091,9281,9382,9120}           │
└────────────┴───────┴──────────┴─────┴────────────────────────────────────────────────┘
(4 rows)

=#  EXPLAIN SELECT l_quantity, count(*) AS count, sum(l_extendedprice) AS avg, count(l_extendedprice) AS avg,
array_agg(l_orderkey)AS array_agg FROM lineitem_102009 lineitem WHERE ((l_quantity < '5'::numeric) AND (l_orderkey >
5500)AND (l_orderkey < 9500)) GROUP BY l_quantity;
 
┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            QUERY PLAN                                            │
├──────────────────────────────────────────────────────────────────────────────────────────────────┤
│ GroupAggregate  (cost=138.43..139.02 rows=1 width=85)                                            │
│   Group Key: l_quantity                                                                          │
│   ->  Sort  (cost=138.43..138.53 rows=38 width=21)                                               │
│         Sort Key: l_quantity                                                                     │
│         ->  Bitmap Heap Scan on lineitem_102009 lineitem  (cost=13.07..137.44 rows=38 width=21)  │
│               Recheck Cond: ((l_orderkey > 5500) AND (l_orderkey < 9500))                        │
│               Filter: (l_quantity < '5'::numeric)                                                │
│               ->  Bitmap Index Scan on lineitem_pkey_102009  (cost=0.00..13.06 rows=478 width=0) │
│                     Index Cond: ((l_orderkey > 5500) AND (l_orderkey < 9500))                    │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)


As you probably can guess, what made me notice this wsa the difference
in the array_agg output.


> Of the three core regression test EXPLAINs that changed in the
> pathification commit, two actually were a case of finding better
> plans.  The other one was a random-looking swap between two plans with
> near-identical costs.  When I looked into it, I found that the reason
> the planner liked the new plan better was that it was parallel-safe;
> add_path() saw the costs as fuzzily equal and allowed parallel-safe to
> be the determining factor in the choice.  The old code hadn't done
> that because the hard-wired cost comparisons in grouping_planner()
> never took parallel-safety into account.  But I'd call that a
> parallelism change, not a pathification change; it would certainly
> have appeared to be that if the patches had gone in in the opposite
> order.

I've not yet looked deep enough to determine the root cause; I did
however notice that set enable_sort = false; yields a cheaper plan than
the default one, within the fuzz range (137.91..137.93 vs 138.43..139.02).

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Refactoring speculative insertion with unique indexes a little
Next
From: Dave Cramer
Date:
Subject: building on windows using VC 2008