Re: Patch: ResourceOwner optimization for tables with many partitions - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: Patch: ResourceOwner optimization for tables with many partitions
Date
Msg-id 20151209163052.48d2d43a@fujitsu
Whole thread Raw
In response to Re: Patch: ResourceOwner optimization for tables with many partitions  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Patch: ResourceOwner optimization for tables with many partitions
List pgsql-hackers
Hello, Robert

Thanks for your review. I believe I fixed items 1, 2 and 3 (see
attachment). Also I would like to clarify item 4.

> 4. It mixes together multiple ideas in a single patch, not only
> introducing a hashing concept but also striping a brand-new layer of
> abstraction across the resource-owner mechanism.  I am not sure that
> layer of abstraction is a very good idea, but if it needs to be done,
> I think it should be a separate patch.

Do I right understand that you suggest following?

Current patch should be split in two parts. In first patch we create
and use ResourceArray with array-based implementation (abstraction
layer). Then we apply second patch which change ResourceArray
implementation to hashing based (optimization).

Best regards,
Aleksander

On Tue, 8 Dec 2015 17:30:33 -0500
Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Dec 4, 2015 at 7:15 AM, Aleksander Alekseev
> <a.alekseev@postgrespro.ru> wrote:
> > Current implementation of ResourceOwner uses arrays to store
> > resources like TupleDescs, Snapshots, etc. When we want to release
> > one of these resources ResourceOwner finds it with linear scan.
> > Granted, resource array are usually small so its not a problem most
> > of the time. But it appears to be a bottleneck when we are working
> > with tables which have a lot of partitions.
> >
> > To reproduce this issue:
> > 1. run `./gen.pl 10000 | psql my_database postgres`
> > 2. run `pgbench -j 8 -c 8 -f q.sql -T 100 my_database`
> > 3. in second terminal run `sudo perf top -u postgres`
> >
> > Both gen.pl and q.sql are attached to this message.
> >
> > You will see that postgres spends a lot of time in
> > ResourceOwnerForget* procedures:
> >
> >  32.80%  postgres               [.] list_nth
> >  20.29%  postgres               [.] ResourceOwnerForgetRelationRef
> >  12.87%  postgres               [.] find_all_inheritors
> >   7.90%  postgres               [.] get_tabstat_entry
> >   6.68%  postgres               [.] ResourceOwnerForgetTupleDesc
> >   1.17%  postgres               [.] hash_search_with_hash_value
> >  ... < 1% ...
> >
> > I would like to suggest a patch (see attachment) witch fixes this
> > bottleneck. Also I discovered that there is a lot of code
> > duplication in ResourceOwner. Patch fixes this too. The idea is
> > quite simple. I just replaced arrays with something that could be
> > considered hash tables, but with some specific optimizations.
> >
> > After applying this patch we can see that bottleneck is gone:
> >
> >  42.89%  postgres               [.] list_nth
> >  18.30%  postgres               [.] find_all_inheritors
> >  10.97%  postgres               [.] get_tabstat_entry
> >   1.82%  postgres               [.] hash_search_with_hash_value
> >   1.21%  postgres               [.] SearchCatCache
> >  ... < 1% ...
> >
> > For tables with thousands partitions we gain in average 32.5% more
> > TPS.
> >
> > As far as I can see in the same time patch doesn't make anything
> > worse. `make check` passes with asserts enabled and disabled. There
> > is no performance degradation according to both standard pgbench
> > benchmark and benchmark described above for tables with 10 and 100
> > partitions.
>
> I do think it's interesting to try to speed up the ResourceOwner
> mechanism when there are many resources owned.  We've had some forays
> in that direction in the past, and I think it's useful to continue
> that work.  But I also think that this patch, while interesting, is
> not something we can seriously consider committing in its current
> form, because:
>
> 1. It lacks adequate comments and submission notes to explain the
> general theory of operation of the patch.
>
> 2. It seems to use uint8 * rather freely as a substitute for char * or
> void *, which doesn't seem like a great idea.
>
> 3. It doesn't follow PostgreSQL formatting conventions (uncuddled
> curly braces, space after if and before open paren, etc.).
>
> 4. It mixes together multiple ideas in a single patch, not only
> introducing a hashing concept but also striping a brand-new layer of
> abstraction across the resource-owner mechanism.  I am not sure that
> layer of abstraction is a very good idea, but if it needs to be done,
> I think it should be a separate patch.
>


Attachment

pgsql-hackers by date:

Previous
From: Ants Aasma
Date:
Subject: Re: W-TinyLfu for cache eviction
Next
From: Alexander Korotkov
Date:
Subject: Re: WIP: Rework access method interface