Re: [HACKERS] path toward faster partition pruning - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] path toward faster partition pruning
Date
Msg-id 20171116.115423.140952026.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: [HACKERS] path toward faster partition pruning  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] path toward faster partition pruning  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
Thanks for the interesting test, David.

At Tue, 14 Nov 2017 17:00:12 +1300, David Rowley <david.rowley@2ndquadrant.com> wrote in
<CAKJS1f-1Lc_b=Y1iicPQzvUgSn1keHSgmRqLuOGq_VR6M==zbw@mail.gmail.com>
> On 13 November 2017 at 22:46, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> > On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
> >> Without some aggressive optimization, the loop takes most of the
> >> time to check-and-jump for nothing especially with many
> >> partitions and somewhat unintuitive.
> >>
> >> The following uses a bit tricky bitmap operation but
> >> is straightforward as a whole.
> >>
> >> =====
> >> /* fill the bits upper from BITNUM(lower) (0-based) of the first word */
> >> a->workds[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);

'+='.. ^^;

> >> /* fill up intermediate words */
> >> while (wordnum < uwordnum)
> >>    a->words[wordnum++] = ~(bitmapword) 0;
> >>
> >> /* fill up to BITNUM(upper) bit (0-based) of the last word */
> >> a->workds[wordnum++] |=
> >>      (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
> >> =====
> >
> > Considering also the David's comment downthread, I will try to incorporate
> > this into bms_add_range().
> 
> I've attached an implementation of the patch using this method.
> 
> I've also attached bitset.c which runs each through their paces. I'd
> have expected Kyotaro's method to be faster, but gcc 7.2 with -O2
> generates very slightly slower code. I didn't really check why. clang
> seems to do a better job with it.
..
> $ gcc -O2 bitset.c -o bitset && ./bitset
> bms_add_range in 0.694254 (6.94254 ns per loop)
> bms_add_range2 in 0.726643 (7.26643 ns per loop)
..
> $ gcc --version
> gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
> Copyright (C) 2017 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Hmm bms_add_range doesn't seem getting so aggressive optimization
but I had a similar result.

Looking the output of gcc -S, I found that bms_add_range() is
embedded in main(). (gcc 7.1.0) It's not surprizing after finding
that but.. Anyway I added __attribute((noinline)) to the two
functions and got the following result.

> bms_add_range in 1.24 (12.4 ns per loop)
> bms_add_range2 in 0.8 (8 ns per loop)

It seems reasonable.

> $ clang -O2 bitset.c -o bitset && ./bitset
> bms_add_range in 0.866554 (8.66554 ns per loop)
> bms_add_range2 in 0.467138 (4.67138 ns per loop)
..
> $ clang --version
> clang version 4.0.1-6 (tags/RELEASE_401/final)
> Target: x86_64-pc-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
> 
> Probably just go with Kyotaro's idea (v2). I don't think this is worth
> debating, I just wanted to show it's not that clear-cut.

I agree that it's not so clear-cut.

regard,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Further simplification of c.h's #include section
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] Timeline ID in backup_label file