Re: Our trial to TPC-DS but optimizer made unreasonable plan - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Our trial to TPC-DS but optimizer made unreasonable plan
Msg-id 9A28C8860F777E439AA12E8AEA7694F801138AF0@BPXM15GP.gisp.nec.co.jp
In response to Re: Our trial to TPC-DS but optimizer made unreasonable plan  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Our trial to TPC-DS but optimizer made unreasonable plan
List pgsql-hackers
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Peter Geoghegan
> Sent: Thursday, August 27, 2015 8:31 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Greg Stark; PostgreSQL-development
> Subject: Re: [HACKERS] Our trial to TPC-DS but optimizer made unreasonable plan
> 
> On Mon, Aug 17, 2015 at 6:40 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > I think SortSupport logic provides a reasonable way to solve this
> > kind of problem. For example, btint4sortsupport() informs a function
> > pointer of the fast version of comparator (btint4fastcmp) which takes
> > two Datum argument without indirect memory reference.
> > This mechanism will also make sense for HashAggregate logic, to reduce
> > the cost of function invocations.
> >
> > Please comment on the idea I noticed here.
> 
> Is this a 9.5-based system? If so, then you'd benefit from the
> memcmp() pre-check within varstr_cmp() by being on 9.5, since the
> pre-check is not limited to cases that use text/varchar SortSupport --
> this could make a big difference here. If not, then it might be
> somewhat helpful to add a pre-check that considers total binary
> equality only before bcTruelen() is ever called. Not so sure about the
> latter idea, though.
> 
My measurement was done on a v9.5-based system. So it also seems to me
that replacing CHAR(n) with VARCHAR(n) would make sense.

> I'm not sure if it would help with hash aggregates to use something
> like SortSupport to avoid fmgr overhead. It might make enough of a
> difference to matter, but maybe the easier win would come from
> considering simple binary equality first, and only then using an
> equality operator (think HOT style checks). That would have the
> advantage of requiring no per-type/operator class support at all,
> since it's safe to assume that binary equality is a proxy for
> "equivalence" of sort order (or whatever we call the case where
> 5.00::numeric and 5.000::numeric are considered equal).
>
According to the perf result, my presumption was wrong: fmgr overhead
is at least not the major portion of the cost. So I don't think
eliminating fmgr overhead has the first priority. However, a shortcut
path for equality checks seems to me a big improvement, because it
avoids the strict equality checks implemented per data type, which
often involve complicated logic.
It is probably smarter to apply this binary-equality proxy only to
problematic data types, like bpchar(n); it would be less effective
for simple data types, like int4.
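
To make the shortcut concrete, here is a minimal standalone sketch in C. It is not PostgreSQL code: true_len(), padded_eq_slow() and padded_eq() are hypothetical names, and true_len() only stands in for what bcTruelen() does for blank-padded strings. The point is that identical bytes imply equal values, so the type-specific check is needed only when the binary images differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for bcTruelen(): length without trailing blanks. */
static size_t true_len(const char *s, size_t len)
{
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}

/* Type-specific slow path: trailing blanks are not significant. */
static bool padded_eq_slow(const char *a, size_t alen,
                           const char *b, size_t blen)
{
    size_t ta = true_len(a, alen);
    size_t tb = true_len(b, blen);

    return ta == tb && memcmp(a, b, ta) == 0;
}

/* Binary equality first; fall back to the slow path only on mismatch. */
static bool padded_eq(const char *a, size_t alen,
                      const char *b, size_t blen)
{
    if (alen == blen && memcmp(a, b, alen) == 0)
        return true;            /* same bytes, so the values are equal */
    return padded_eq_slow(a, alen, b, blen);
}
```

The pre-check can only answer "equal"; a byte mismatch proves nothing, which is why the fallback has to stay. For a fixed-width type like int4 the regular comparison is already this cheap, so the extra branch buys nothing there.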

On the other hand, another big portion of the HashAggregate cost is
the calculation of the hash value over all the grouping keys.
It may be beneficial to have an option to reference a result
attribute of the underlying plan. That would potentially allow a
co-processor to compute the hash value instead of the CPU.
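
A rough sketch of what such an option could look like, again as standalone C rather than executor code. The Row struct, its has_hash/hash fields, and the functions hash_u32(), hash_keys() and row_hash() are all hypothetical and exist only for illustration; the mixing function and the rotate-and-XOR combination are arbitrary choices for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row from the underlying plan, with two int4 grouping keys. */
typedef struct Row
{
    int32_t  key1;
    int32_t  key2;
    bool     has_hash;  /* true if the lower node already supplied a hash */
    uint32_t hash;      /* precomputed hash, valid only if has_hash */
} Row;

/* Simple 32-bit integer mixer, used here only as an example hash. */
static uint32_t hash_u32(uint32_t v)
{
    v ^= v >> 16;
    v *= 0x85ebca6bu;
    v ^= v >> 13;
    v *= 0xc2b2ae35u;
    v ^= v >> 16;
    return v;
}

/* CPU path: combine the per-key hashes by rotate-left-1 and XOR. */
static uint32_t hash_keys(const Row *r)
{
    uint32_t h = 0;

    h = ((h << 1) | (h >> 31)) ^ hash_u32((uint32_t) r->key1);
    h = ((h << 1) | (h >> 31)) ^ hash_u32((uint32_t) r->key2);
    return h;
}

/* Use the attribute from the underlying plan when it is present. */
static uint32_t row_hash(const Row *r)
{
    return r->has_hash ? r->hash : hash_keys(r);
}
```

This only works if whatever fills in the hash attribute uses exactly the same function as the CPU path; otherwise rows of the same group would land in different buckets.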

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

