Thread: Patch: ResourceOwner optimization for tables with many partitions

Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

Hello all,

The current implementation of ResourceOwner uses arrays to store resources
like TupleDescs, Snapshots, etc. When we want to release one of these
resources, ResourceOwner finds it with a linear scan. Granted, resource
arrays are usually small, so it's not a problem most of the time. But it
becomes a bottleneck when we are working with tables that have a lot of
partitions.

To reproduce this issue:
1. run `./gen.pl 10000 | psql my_database postgres`
2. run `pgbench -j 8 -c 8 -f q.sql -T 100 my_database`
3. in second terminal run `sudo perf top -u postgres`

Both gen.pl and q.sql are attached to this message.

You will see that postgres spends a lot of time in ResourceOwnerForget*
procedures:

 32.80%  postgres               [.] list_nth
 20.29%  postgres               [.] ResourceOwnerForgetRelationRef
 12.87%  postgres               [.] find_all_inheritors
  7.90%  postgres               [.] get_tabstat_entry
  6.68%  postgres               [.] ResourceOwnerForgetTupleDesc
  1.17%  postgres               [.] hash_search_with_hash_value
 ... < 1% ...

I would like to suggest a patch (see attachment) which fixes this
bottleneck. I also discovered that there is a lot of code duplication in
ResourceOwner; the patch fixes this too. The idea is quite simple: I just
replaced the arrays with something that could be considered hash tables,
but with some specific optimizations.
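
In rough outline the new structure looks like this. This is only a
simplified sketch, not the code from the attached patch (it stores items
as plain Datums here for brevity, while the patch itself handles items of
different sizes, and the names are illustrative):

```
/* Simplified sketch of the idea; not the code from the attached patch. */
#define RESOURCE_ARRAY_ZERO_ELEMENT ((Datum) 0)    /* marks an empty bucket */

typedef struct ResourceArray
{
    Datum      *itemsarr;       /* buckets; all-zeroes means "empty" */
    uint32      capacity;       /* number of buckets, a power of two */
    uint32      nitems;         /* number of occupied buckets */
} ResourceArray;

static void
ResourceArrayAdd(ResourceArray *resarr, Datum value)
{
    uint32      mask = resarr->capacity - 1;
    uint32      idx;

    /* the caller is expected to have enlarged the array beforehand */
    Assert(resarr->nitems < resarr->capacity);

    idx = DatumGetUInt32(hash_any((const unsigned char *) &value,
                                  sizeof(value))) & mask;
    while (resarr->itemsarr[idx] != RESOURCE_ARRAY_ZERO_ELEMENT)
        idx = (idx + 1) & mask; /* linear probing */

    resarr->itemsarr[idx] = value;
    resarr->nitems++;
}
```

Removal uses the same hash-and-probe lookup, so the ResourceOwnerForget*
procedures no longer have to scan the whole array.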

After applying this patch we can see that the bottleneck is gone:

 42.89%  postgres               [.] list_nth
 18.30%  postgres               [.] find_all_inheritors
 10.97%  postgres               [.] get_tabstat_entry
  1.82%  postgres               [.] hash_search_with_hash_value
  1.21%  postgres               [.] SearchCatCache
 ... < 1% ...

For tables with thousands of partitions we gain on average 32.5% more TPS.

At the same time, as far as I can see, the patch doesn't make anything
worse. `make check` passes with asserts enabled and disabled. There is no
performance degradation according to either the standard pgbench benchmark
or the benchmark described above for tables with 10 and 100 partitions.

Best regards,
Aleksander
Attachment

Re: Patch: ResourceOwner optimization for tables with many partitions

From: Robert Haas

On Fri, Dec 4, 2015 at 7:15 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> The current implementation of ResourceOwner uses arrays to store resources
> like TupleDescs, Snapshots, etc. When we want to release one of these
> resources, ResourceOwner finds it with a linear scan. Granted, resource
> arrays are usually small, so it's not a problem most of the time. But it
> becomes a bottleneck when we are working with tables that have a lot of
> partitions.
>
> To reproduce this issue:
> 1. run `./gen.pl 10000 | psql my_database postgres`
> 2. run `pgbench -j 8 -c 8 -f q.sql -T 100 my_database`
> 3. in second terminal run `sudo perf top -u postgres`
>
> Both gen.pl and q.sql are attached to this message.
>
> You will see that postgres spends a lot of time in ResourceOwnerForget*
> procedures:
>
>  32.80%  postgres               [.] list_nth
>  20.29%  postgres               [.] ResourceOwnerForgetRelationRef
>  12.87%  postgres               [.] find_all_inheritors
>   7.90%  postgres               [.] get_tabstat_entry
>   6.68%  postgres               [.] ResourceOwnerForgetTupleDesc
>   1.17%  postgres               [.] hash_search_with_hash_value
>  ... < 1% ...
>
> I would like to suggest a patch (see attachment) which fixes this
> bottleneck. I also discovered that there is a lot of code duplication in
> ResourceOwner; the patch fixes this too. The idea is quite simple: I just
> replaced the arrays with something that could be considered hash tables,
> but with some specific optimizations.
>
> After applying this patch we can see that the bottleneck is gone:
>
>  42.89%  postgres               [.] list_nth
>  18.30%  postgres               [.] find_all_inheritors
>  10.97%  postgres               [.] get_tabstat_entry
>   1.82%  postgres               [.] hash_search_with_hash_value
>   1.21%  postgres               [.] SearchCatCache
>  ... < 1% ...
>
> For tables with thousands of partitions we gain on average 32.5% more TPS.
>
> At the same time, as far as I can see, the patch doesn't make anything
> worse. `make check` passes with asserts enabled and disabled. There is no
> performance degradation according to either the standard pgbench benchmark
> or the benchmark described above for tables with 10 and 100 partitions.

I do think it's interesting to try to speed up the ResourceOwner
mechanism when there are many resources owned.  We've had some forays
in that direction in the past, and I think it's useful to continue
that work.  But I also think that this patch, while interesting, is
not something we can seriously consider committing in its current
form, because:

1. It lacks adequate comments and submission notes to explain the
general theory of operation of the patch.

2. It seems to use uint8 * rather freely as a substitute for char * or
void *, which doesn't seem like a great idea.

3. It doesn't follow PostgreSQL formatting conventions (uncuddled
curly braces, space after if and before open paren, etc.).

4. It mixes together multiple ideas in a single patch, not only
introducing a hashing concept but also striping a brand-new layer of
abstraction across the resource-owner mechanism.  I am not sure that
layer of abstraction is a very good idea, but if it needs to be done,
I think it should be a separate patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

Hello, Robert

Thanks for your review. I believe I fixed items 1, 2 and 3 (see
attachment). Also I would like to clarify item 4.

> 4. It mixes together multiple ideas in a single patch, not only
> introducing a hashing concept but also striping a brand-new layer of
> abstraction across the resource-owner mechanism.  I am not sure that
> layer of abstraction is a very good idea, but if it needs to be done,
> I think it should be a separate patch.

Do I understand correctly that you are suggesting the following?

The current patch should be split into two parts. The first patch creates
and uses ResourceArray with an array-based implementation (the abstraction
layer). Then we apply a second patch which changes the ResourceArray
implementation to a hashing-based one (the optimization).
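
If so, the first patch would be mostly mechanical refactoring of the
per-resource code, roughly like this (just a sketch; the catrefarr field
and the ResourceArray function names are only for illustration):

```
void
ResourceOwnerRememberCatCacheRef(ResourceOwner owner, HeapTuple tuple)
{
    /* space was reserved by the corresponding Enlarge call */
    ResourceArrayAdd(&owner->catrefarr, PointerGetDatum(tuple));
}

void
ResourceOwnerForgetCatCacheRef(ResourceOwner owner, HeapTuple tuple)
{
    if (!ResourceArrayRemove(&owner->catrefarr, PointerGetDatum(tuple)))
        elog(ERROR, "catcache reference %p is not owned by resource owner %s",
             tuple, owner->name);
}
```

The second patch would then change only the internals of
ResourceArrayAdd/ResourceArrayRemove.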

Best regards,
Aleksander

On Tue, 8 Dec 2015 17:30:33 -0500
Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Dec 4, 2015 at 7:15 AM, Aleksander Alekseev
> <a.alekseev@postgrespro.ru> wrote:
> > The current implementation of ResourceOwner uses arrays to store
> > resources like TupleDescs, Snapshots, etc. When we want to release
> > one of these resources, ResourceOwner finds it with a linear scan.
> > Granted, resource arrays are usually small, so it's not a problem most
> > of the time. But it becomes a bottleneck when we are working
> > with tables that have a lot of partitions.
> >
> > To reproduce this issue:
> > 1. run `./gen.pl 10000 | psql my_database postgres`
> > 2. run `pgbench -j 8 -c 8 -f q.sql -T 100 my_database`
> > 3. in second terminal run `sudo perf top -u postgres`
> >
> > Both gen.pl and q.sql are attached to this message.
> >
> > You will see that postgres spends a lot of time in
> > ResourceOwnerForget* procedures:
> >
> >  32.80%  postgres               [.] list_nth
> >  20.29%  postgres               [.] ResourceOwnerForgetRelationRef
> >  12.87%  postgres               [.] find_all_inheritors
> >   7.90%  postgres               [.] get_tabstat_entry
> >   6.68%  postgres               [.] ResourceOwnerForgetTupleDesc
> >   1.17%  postgres               [.] hash_search_with_hash_value
> >  ... < 1% ...
> >
> > I would like to suggest a patch (see attachment) which fixes this
> > bottleneck. I also discovered that there is a lot of code
> > duplication in ResourceOwner; the patch fixes this too. The idea is
> > quite simple: I just replaced the arrays with something that could be
> > considered hash tables, but with some specific optimizations.
> >
> > After applying this patch we can see that the bottleneck is gone:
> >
> >  42.89%  postgres               [.] list_nth
> >  18.30%  postgres               [.] find_all_inheritors
> >  10.97%  postgres               [.] get_tabstat_entry
> >   1.82%  postgres               [.] hash_search_with_hash_value
> >   1.21%  postgres               [.] SearchCatCache
> >  ... < 1% ...
> >
> > For tables with thousands of partitions we gain on average 32.5% more
> > TPS.
> >
> > At the same time, as far as I can see, the patch doesn't make anything
> > worse. `make check` passes with asserts enabled and disabled. There
> > is no performance degradation according to either the standard pgbench
> > benchmark or the benchmark described above for tables with 10 and 100
> > partitions.
>
> I do think it's interesting to try to speed up the ResourceOwner
> mechanism when there are many resources owned.  We've had some forays
> in that direction in the past, and I think it's useful to continue
> that work.  But I also think that this patch, while interesting, is
> not something we can seriously consider committing in its current
> form, because:
>
> 1. It lacks adequate comments and submission notes to explain the
> general theory of operation of the patch.
>
> 2. It seems to use uint8 * rather freely as a substitute for char * or
> void *, which doesn't seem like a great idea.
>
> 3. It doesn't follow PostgreSQL formatting conventions (uncuddled
> curly braces, space after if and before open paren, etc.).
>
> 4. It mixes together multiple ideas in a single patch, not only
> introducing a hashing concept but also striping a brand-new layer of
> abstraction across the resource-owner mechanism.  I am not sure that
> layer of abstraction is a very good idea, but if it needs to be done,
> I think it should be a separate patch.
>


Attachment

Re: Patch: ResourceOwner optimization for tables with many partitions

From: Robert Haas

On Wed, Dec 9, 2015 at 8:30 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> Hello, Robert
>
> Thanks for your review. I believe I fixed items 1, 2 and 3 (see
> attachment). Also I would like to clarify item 4.
>
>> 4. It mixes together multiple ideas in a single patch, not only
>> introducing a hashing concept but also striping a brand-new layer of
>> abstraction across the resource-owner mechanism.  I am not sure that
>> layer of abstraction is a very good idea, but if it needs to be done,
>> I think it should be a separate patch.
>
> Do I understand correctly that you are suggesting the following?
>
> The current patch should be split into two parts. The first patch creates
> and uses ResourceArray with an array-based implementation (the abstraction
> layer). Then we apply a second patch which changes the ResourceArray
> implementation to a hashing-based one (the optimization).

Well, sorta.  To be honest, I think this patch is really ugly.  If we
were going to do this then, yes, I would want to split the patch into
two parts along those lines.  But actually I don't really want to do
it this way at all.  It's not that I don't want the performance
benefits: I do.  But the current code is really easy to read and
extremely simple, and this changes it into something that is a heck of
a lot harder to read and understand.  I'm not sure exactly what to do
about that, but it seems like a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> Well, sorta.  To be honest, I think this patch is really ugly.  If we
> were going to do this then, yes, I would want to split the patch into
> two parts along those lines.  But actually I don't really want to do
> it this way at all.  It's not that I don't want the performance
> benefits: I do.  But the current code is really easy to read and
> extremely simple, and this changes it into something that is a heck of
> a lot harder to read and understand.  I'm not sure exactly what to do
> about that, but it seems like a problem.

I haven't read the patch in any detail, but keep in mind that part of the
API contract for ResourceOwners is that ResourceOwnerRememberFoo() cannot
fail.  Period.  So any patch that makes those functions less than trivial
is going to be really suspect from a reliability standpoint.  It would be
advisable for example that hash_any not suddenly become covered by the
"must not fail" requirement.

BTW, I do not think you can get away with the requirement that all-zeroes
isn't a valid resource representation.  It might be okay today but it's
hardly future-proof.
        regards, tom lane



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Robert Haas

On Thu, Dec 10, 2015 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Well, sorta.  To be honest, I think this patch is really ugly.  If we
>> were going to do this then, yes, I would want to split the patch into
>> two parts along those lines.  But actually I don't really want to do
>> it this way at all.  It's not that I don't want the performance
>> benefits: I do.  But the current code is really easy to read and
>> extremely simple, and this changes it into something that is a heck of
>> a lot harder to read and understand.  I'm not sure exactly what to do
>> about that, but it seems like a problem.
>
> I haven't read the patch in any detail, but keep in mind that part of the
> API contract for ResourceOwners is that ResourceOwnerRememberFoo() cannot
> fail.  Period.  So any patch that makes those functions less than trivial
> is going to be really suspect from a reliability standpoint.  It would be
> advisable for example that hash_any not suddenly become covered by the
> "must not fail" requirement.

Hmm.  I hadn't thought about that.

I wonder if there's a simpler approach to this whole problem.  What
the patch aims to do is allow resources to be freed in arbitrary order
without slowdown.  But do we really need that?  Resource owners are
currently optimized for freeing resources in the order opposite to the
order of allocation.  I bet in this case the order in which they are
freed isn't random, but is exactly in order of allocation.  If so, we
might be able to either (a) jigger things so that freeing in order of
allocation is efficient or (b) jigger things so that the resources are
released in reverse order of allocation as the resource owner code
expects.  That might be simpler and less invasive than this fix.
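
For reference, the reason LIFO release is cheap today is that the existing
Forget functions scan the array from the end, roughly like this:

```
void
ResourceOwnerForgetBuffer(ResourceOwner owner, Buffer buffer)
{
    Buffer     *buffers = owner->buffers;
    int         nb1 = owner->nbuffers - 1;
    int         i;

    /* scan backwards: the most recently remembered buffer is found first */
    for (i = nb1; i >= 0; i--)
    {
        if (buffers[i] == buffer)
        {
            /* compact the array and shrink it by one */
            while (i < nb1)
            {
                buffers[i] = buffers[i + 1];
                i++;
            }
            owner->nbuffers = nb1;
            return;
        }
    }
    elog(ERROR, "buffer %d is not owned by resource owner %s",
         buffer, owner->name);
}
```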

Then again, it's somehow appealing for higher-level code not to have
to care about this...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

>> To be honest, I think this patch is really ugly. [...] I'm not sure
>> exactly what to do about that, but it seems like a problem.

I have an idea. There are actually two types of resources: int-like
(buffers, files) and void*-like (RelationRef, TupleDesc, ...). What if
I split ResourceArray into IntResourceArray and PointerResourceArray? That
would definitely solve the ugliness problem: no more memcpy's, char[]
buffers, etc.
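
Something like this (only a sketch of the idea; the field names are
invented):

```
/* int-like resources: Buffers, Files, ... */
typedef struct IntResourceArray
{
    int        *itemsarr;
    int         invalidval;     /* e.g. InvalidBuffer for buffers, -1 for files */
    uint32      capacity;
    uint32      nitems;
} IntResourceArray;

/* pointer-like resources: RelationRefs, TupleDescs, Snapshots, ... */
typedef struct PointerResourceArray
{
    void      **itemsarr;       /* NULL marks an empty slot */
    uint32      capacity;
    uint32      nitems;
} PointerResourceArray;
```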

>> It would be advisable for example that hash_any not suddenly become
>> covered by the "must not fail" requirement.

Frankly, I can't think of any case in which hash_any could or should fail.
Maybe we should just add a "must not fail" note to the hash_any
description?

Alternatively, I could use some other hash implementation. That may be
reasonable in this case, since the size of the data I would like to hash
is small and known in advance.

>> BTW, I do not think you can get away with the requirement that
>> all-zeroes isn't a valid resource representation. It might be okay
>> today but it's hardly future-proof.

Agreed. I could store a value that should be considered "zero" in each
ResourceArray. It would be InvalidBuffer for buffers, -1 for files and
NULL for all void*-types. Does such a solution sound OK?

Best regards,
Aleksander



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

> It would be InvalidBuffer for buffers, -1 for files and NULL for all
> void*-types. Does such a solution sound OK?

On second thought, I believe there is no reason to store anything for the
void*-like types. I could just hardcode NULL in PointerResourceArray.

Best regards,
Aleksander




Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

Hello, Robert

Here is my fix for item 4.

Best regards,
Aleksander

On Thu, 10 Dec 2015 11:37:23 -0500
Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Dec 9, 2015 at 8:30 AM, Aleksander Alekseev
> <a.alekseev@postgrespro.ru> wrote:
> > Hello, Robert
> >
> > Thanks for your review. I believe I fixed items 1, 2 and 3 (see
> > attachment). Also I would like to clarify item 4.
> >
> >> 4. It mixes together multiple ideas in a single patch, not only
> >> introducing a hashing concept but also striping a brand-new layer
> >> of abstraction across the resource-owner mechanism.  I am not sure
> >> that layer of abstraction is a very good idea, but if it needs to
> >> be done, I think it should be a separate patch.
> >
> > Do I understand correctly that you are suggesting the following?
> >
> > The current patch should be split into two parts. The first patch
> > creates and uses ResourceArray with an array-based implementation (the
> > abstraction layer). Then we apply a second patch which changes the
> > ResourceArray implementation to a hashing-based one (the optimization).
>
> Well, sorta.  To be honest, I think this patch is really ugly.  If we
> were going to do this then, yes, I would want to split the patch into
> two parts along those lines.  But actually I don't really want to do
> it this way at all.  It's not that I don't want the performance
> benefits: I do.  But the current code is really easy to read and
> extremely simple, and this changes it into something that is a heck of
> a lot harder to read and understand.  I'm not sure exactly what to do
> about that, but it seems like a problem.
>


Attachment

Re: Patch: ResourceOwner optimization for tables with many partitions

From: Robert Haas

On Mon, Dec 14, 2015 at 6:47 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> Here is my fix for item 4.

I don't know, I'm still not very comfortable with this.  And Tom
didn't like dictating that hash_any() must be no-fail, though I'm not
sure why.

Let's wait to see what others think.  I kind of hope there's a way of
getting the benefits we want here without so much code churn.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> I don't know, I'm still not very comfortable with this.  And Tom
> didn't like dictating that hash_any() must be no-fail, though I'm not
> sure why.

What I definitely didn't like was assuming at a distance that it would
be no-fail.  If we're to depend on that, the patch had better attach
a comment saying so to the header comments of the function(s) it's
assuming that about.  Otherwise, somebody could hack up hashfunc.c
in a way that breaks the assumption, without any clue that some code
in a very-far-away module is critically reliant on it.

> Let's wait to see what others think.

A few observations:

* This bit is too cute by half, if not three-quarters:

+    uint32        itemsizelg:2;    /* sizeof one item log 2 */
+    uint32        capacity:30;    /* capacity of array */

Is there a good reason to assume that the only things we'll ever store
in these arrays are of size no more than 8 bytes?  Are we so desperate
to save space that we cannot spare two separate words for itemsize and
capacity?  (ISTM it's a good bet that the extra code for accessing these
bitfields occupies more space than would be saved, considering how few
ResourceOwners typically exist at one time.)  Let's just make it a couple
of ints and be done.  Actually, maybe nitems and capacity should be
size_t, just in case.

* An alternative design would be to forget itemsizelg altogether and insist
that everything stored in the resource arrays be a Datum, which could then
be coerced to/from some form of integer or some form of pointer as
appropriate.  That would waste some space in the int case, but it would
considerably simplify both the ResourceArray code and the APIs to it,
which might be worth the price of assuming we'll never store anything
bigger than 8 bytes.  It also would make this look more like some
existing APIs such as the on_exit callbacks.
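
Sketched out, that API might look about like this (names illustrative):

```
/* Everything is stored as a Datum; callers coerce with the usual macros. */
typedef struct ResourceArray
{
    Datum      *itemsarr;
    uint32      capacity;
    uint32      nitems;
} ResourceArray;

extern void ResourceArrayInit(ResourceArray *resarr);
extern void ResourceArrayEnlarge(ResourceArray *resarr);
extern void ResourceArrayAdd(ResourceArray *resarr, Datum value);
extern bool ResourceArrayRemove(ResourceArray *resarr, Datum value);

/*
 * Call sites would then look like, for example:
 *      ResourceArrayAdd(&owner->snapshotarr, PointerGetDatum(snapshot));
 *      ResourceArrayRemove(&owner->filearr, Int32GetDatum(file));
 */
```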

* A lot of the code churn comes from the insistence on defining callbacks,
which I'm dubious that we need.  We could instead have a function that is
"get any convenient one of the array elements" and revise the loops in
ResourceOwnerReleaseInternal to be like
    while ((item = getconvenientitem(resourcearray)))
    {
        drop item in exactly the same way as before
    }

I find that preferable to the proposed ResourceArrayRemoveAll

+    while (resarr->nitems > 0)
+    {
+        releasecb(resarr->itemsarr, isCommit);
+    }

which certainly looks like it's an infinite loop; it's assuming (again
with no documentation) that the callback function will cause the array
to get smaller somehow.  With the existing coding, it's much more clear
why we think the loops will terminate.

* The reason that ResourceOwnerReleaseInternal was not horribly
inefficient was that its notion of "any convenient one" of the items
to be deleted next was in fact the one that the corresponding Forget
function would examine first, thus avoiding an O(N^2) cost to
re-identify the item to be dropped.  I think we should make an effort
to be more explicit about that connection in any rewrite.  In particular,
it looks to me like when a hash array is in use, things will get slower
not faster because we'll be adding a hash lookup step to each forget
operation.  Maybe we should consider adjusting the APIs so that that
can be avoided.  Or possibly we could have internal state in the
ResourceArrays that says "we expect this item to be dropped in a moment,
check that before going to the trouble of a hash lookup".
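
That is, something along these lines (field and helper names invented):

```
static bool
ResourceArrayRemove(ResourceArray *resarr, Datum value)
{
    /* fast path: the item we expect to be dropped next */
    if (resarr->lastidx < resarr->capacity &&
        resarr->itemsarr[resarr->lastidx] == value)
    {
        resarr->itemsarr[resarr->lastidx] = resarr->invalidval;
        resarr->nitems--;
        return true;
    }
    /* slow path: full hash lookup */
    return ResourceArrayRemoveHashed(resarr, value);
}
```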

* Actually, I'm not convinced that the proposed reimplementation of
ResourceArrayRemove isn't horribly slow much of the time.  It sure
looks like it could degrade to a linear search very easily.

* I still say that the assumption embodied as RESOURCE_ARRAY_ZERO_ELEMENT
(ie that no valid entry is all-zero-bits) is pretty unacceptable.  It
might work for pointers, but I don't like it for resources represented
by integer indexes.
        regards, tom lane



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

I believe I fixed all flaws mentioned so far (see attachment).

I also ran a new benchmark to make sure that the new patch makes PostgreSQL
faster and doesn't cause a performance degradation in other cases.

"usual pgbench" row corresponds to `pgbench -j 8 -c 8 -T 30 pgbench`
performed on a 4-core PC.

"N partitions" rows correspond to a benchmark described in a first
message of this thread performed on the same PC. N is and argument
given to gen.pl script i.e. number of partitions in generated
partitioned table.

Here are the results (3 runs each, TPS excluding connections establishing):

Test            |  Before   |   After   |  Delta avg
----------------|-----------|-----------|-------------
                |    295.7  |    295.0  |
usual pgbench   |    303.1  |    299.6  |   ~ 0%
                |    297.7  |    302.7  |
----------------|-----------|-----------|-------------
                |  28022.3  |  27956.1  |
10 partitions   |  27550.1  |  28916.9  |   ~ 0%
                |  28617.0  |  28022.9  |
----------------|-----------|-----------|-------------
                |   3021.4  |   3184.0  |
100 partitions  |   2949.1  |   3120.1  | 3% more TPS
                |   2870.6  |   2825.2  |
----------------|-----------|-----------|-------------
                |    106.7  |    158.6  |
1000 partitions |    105.2  |    168.4  | 53% more TPS
                |    105.9  |    162.0  |


On Fri, 18 Dec 2015 13:39:01 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
> > I don't know, I'm still not very comfortable with this.  And Tom
> > didn't like dictating that hash_any() must be no-fail, though I'm
> > not sure why.
>
> What I definitely didn't like was assuming at a distance that it would
> be no-fail.  If we're to depend on that, the patch had better attach
> a comment saying so to the header comments of the function(s) it's
> assuming that about.  Otherwise, somebody could hack up hashfunc.c
> in a way that breaks the assumption, without any clue that some code
> in a very-far-away module is critically reliant on it.
>
> > Let's wait to see what others think.
>
> A few observations:
>
> * This bit is too cute by half, if not three-quarters:
>
> +    uint32        itemsizelg:2;    /* sizeof one item log 2 */
> +    uint32        capacity:30;    /* capacity of array */
>
> Is there a good reason to assume that the only things we'll ever store
> in these arrays are of size no more than 8 bytes?  Are we so desperate
> to save space that we cannot spare two separate words for itemsize and
> capacity?  (ISTM it's a good bet that the extra code for accessing
> these bitfields occupies more space than would be saved, considering
> how few ResourceOwners typically exist at one time.)  Let's just make
> it a couple of ints and be done.  Actually, maybe nitems and capacity
> should be size_t, just in case.
>
> * An alternative design would be to forget itemsizelg altogether and
> insist that everything stored in the resource arrays be a Datum,
> which could then be coerced to/from some form of integer or some form
> of pointer as appropriate.  That would waste some space in the int
> case, but it would considerably simplify both the ResourceArray code
> and the APIs to it, which might be worth the price of assuming we'll
> never store anything bigger than 8 bytes.  It also would make this
> look more like some existing APIs such as the on_exit callbacks.
>
> * A lot of the code churn comes from the insistence on defining
> callbacks, which I'm dubious that we need.  We could instead have a
> function that is "get any convenient one of the array elements" and
> revise the loops in ResourceOwnerReleaseInternal to be like
>
>     while ((item = getconvenientitem(resourcearray)))
>     {
>         drop item in exactly the same way as before
>     }
>
> I find that preferable to the proposed ResourceArrayRemoveAll
>
> +    while (resarr->nitems > 0)
> +    {
> +        releasecb(resarr->itemsarr, isCommit);
> +    }
>
> which certainly looks like it's an infinite loop; it's assuming (again
> with no documentation) that the callback function will cause the array
> to get smaller somehow.  With the existing coding, it's much more
> clear why we think the loops will terminate.
>
> * The reason that ResourceOwnerReleaseInternal was not horribly
> inefficient was that its notion of "any convenient one" of the items
> to be deleted next was in fact the one that the corresponding Forget
> function would examine first, thus avoiding an O(N^2) cost to
> re-identify the item to be dropped.  I think we should make an effort
> to be more explicit about that connection in any rewrite.  In
> particular, it looks to me like when a hash array is in use, things
> will get slower not faster because we'll be adding a hash lookup step
> to each forget operation.  Maybe we should consider adjusting the
> APIs so that that can be avoided.  Or possibly we could have internal
> state in the ResourceArrays that says "we expect this item to be
> dropped in a moment, check that before going to the trouble of a hash
> lookup".
>
> * Actually, I'm not convinced that the proposed reimplementation of
> ResourceArrayRemove isn't horribly slow much of the time.  It sure
> looks like it could degrade to a linear search very easily.
>
> * I still say that the assumption embodied as
> RESOURCE_ARRAY_ZERO_ELEMENT (ie that no valid entry is all-zero-bits)
> is pretty unacceptable.  It might work for pointers, but I don't like
> it for resources represented by integer indexes.
>
>             regards, tom lane
>
>


Attachment

Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

Oops, wrong patches - here are the correct ones.
Attachment

Re: Patch: ResourceOwner optimization for tables with many partitions

From: Stas Kelvich

Hello.

I have applied this patch and can confirm a ~10% speedup in the presence of
a large number of inherited tables.

The test case was as suggested by Aleksander: create 1000 inherited tables
with no constraints, insert a row into each one, and issue single-row
queries over this table.

Xeon-based server 12C/24T, 50 connections, 30-min average:

TPS, no patch: 393 tps
TPS, with patch: 441 tps

The same setup but with a single table with 1000 rows gives about 188,000
tps, so the overall speed of requests over an inherited table in this
scenario is quite pathological (probably this is expected, because the
database just executes 1000 selects, one against each inherited table).
I've also tried setting range constraints for all inherited tables, so
that only one table was affected by the query, but planning time increased
a lot and total tps was again about 500.

I'm also attaching two flame graphs measured during the tests. It's clearly
visible that PortalDrop takes 4x less time after the patch.

Stas.


> On 24 Dec 2015, at 12:24, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
>
> Oops, wrong patches - here are the correct ones.
> <resource-owner-optimization-v4-step1.patch><resource-owner-optimization-v4-step2.patch>

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Attachment

Re: Patch: ResourceOwner optimization for tables with many partitions

From: Tom Lane

Aleksander Alekseev <a.alekseev@postgrespro.ru> writes:
> Oops, wrong patches - here are the correct ones.

This is certainly not doing what I had in mind for communication
between ResourceOwnerGetAny and ResourceOwnerRelease, because the
latter pays zero attention to "lastidx", but instead does a blind
hash search regardless.

In general, I'm suspicious of the impact of this patch on "normal"
cases with not-very-large resource arrays.  It might be hard to
measure that because in such cases resowner.c is not a bottleneck;
but that doesn't mean I want to give up performance in cases that
perform well today.  You could probably set up a test harness that
exercises ResourceOwnerAdd/Release/etc in isolation and get good
timings for them that way, to confirm or disprove whether there's
a performance loss for small arrays.

I still suspect it would be advantageous to have two different operating
modes depending on the size of an individual resource array, and not
bother with hashing until you got to more than a couple dozen entries.
Given that you're rehashing anyway when you enlarge the array, this
wouldn't cost anything except a few more lines of code, ISTM --- and
you already have a net code savings because of the refactoring.
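
That is, something like this (cutoff value and helper names purely
illustrative):

```
#define RESARRAY_MAX_LINEAR 32      /* illustrative cutoff */

static bool
ResourceArrayRemove(ResourceArray *resarr, Datum value)
{
    if (resarr->capacity <= RESARRAY_MAX_LINEAR)
        return ResourceArrayRemoveLinear(resarr, value);    /* backwards scan */
    else
        return ResourceArrayRemoveHashed(resarr, value);    /* hash lookup */
}
```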

Also, there are defined ways to convert between Datum format and
other formats (DatumGetPointer() etc).  This code isn't using 'em.
        regards, tom lane



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

Hello, Tom

> Also, there are defined ways to convert between Datum format and
> other formats (DatumGetPointer() etc).  This code isn't using 'em.

Fixed. But I was not sure how to deal with the File and Buffer types, since
they are ints (see fd.h and buf.h) and there is no DatumGetInt macro,
only DatumGetInt32/Int64. I don't know if there is a good reason for
that. So for now I just added these definitions right into resowner.c:

```
/* Convert File to Datum */
#define FileGetDatum(file) ((Datum) (file))

/* Convert Datum to File */
#define DatumGetFile(datum) ((File) (datum))

/* Convert Buffer to Datum */
#define BufferGetDatum(buffer) ((Datum) (buffer))

/* Convert Datum to Buffer */
#define DatumGetBuffer(datum) ((Buffer) (datum))
```

... to make all the code look similar and all intentions clear.
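
With these, the File and Buffer call sites look just like the pointer-based
ones, for example (the array field names here are illustrative):

```
void
ResourceOwnerRememberFile(ResourceOwner owner, File file)
{
    ResourceArrayAdd(&owner->filearr, FileGetDatum(file));
}

void
ResourceOwnerForgetFile(ResourceOwner owner, File file)
{
    if (!ResourceArrayRemove(&owner->filearr, FileGetDatum(file)))
        elog(ERROR, "temporary file %d is not owned by resource owner %s",
             file, owner->name);
}
```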

I have a feeling you could suggest a better solution :)

> This is certainly not doing what I had in mind for communication
> between ResourceOwnerGetAny and ResourceOwnerRelease, because the
> latter pays zero attention to "lastidx", but instead does a blind
> hash search regardless.
>
> In general, I'm suspicious of the impact of this patch on "normal"
> cases with not-very-large resource arrays.  It might be hard to
> measure that because in such cases resowner.c is not a bottleneck;
> but that doesn't mean I want to give up performance in cases that
> perform well today.  You could probably set up a test harness that
> exercises ResourceOwnerAdd/Release/etc in isolation and get good
> timings for them that way, to confirm or disprove whether there's
> a performance loss for small arrays.
>
> I still suspect it would be advantageous to have two different
> operating modes depending on the size of an individual resource
> array, and not bother with hashing until you got to more than a
> couple dozen entries. Given that you're rehashing anyway when you
> enlarge the array, this wouldn't cost anything except a few more
> lines of code, ISTM --- and you already have a net code savings
> because of the refactoring.

To be honest, I don't have much faith in such microbenchmarks. They
don't reflect how the code will be executed under real load. Therefore any
results of such a benchmark wouldn't give us a clear answer as to which
solution is preferable. Besides, different users can execute the same
code in different ways.

I compared two implementations: "always use hashing" (step2a.patch) and
"use hashing only for large arrays" (step2b.patch). Both patches give
the same performance according to the benchmark I described in the first
message of this thread.

Since neither of these patches is better with respect to the problem I'm
trying to solve, I suggest using step2b.patch. It seems to be a good idea
not to use hashing when searching in small arrays. Besides, in this case
the behaviour of PostgreSQL changes less after applying the patch. Users
will probably appreciate this.

Best regards,
Aleksander
Attachment

Re: Patch: ResourceOwner optimization for tables with many partitions

From: Tom Lane

Aleksander Alekseev <a.alekseev@postgrespro.ru> writes:
> I compared two implementations: "always use hashing" (step2a.patch) and
> "use hashing only for large arrays" (step2b.patch). Both patches give
> the same performance according to the benchmark I described in the first
> message of this thread.

Um, that's not too surprising, because they're exactly the same patch?
        regards, tom lane



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

> Um, that's not too surprising, because they're exactly the same patch?

Wrong diff. Here is the correct one.

Everything else is right. I just re-checked :)

step2a:

number of transactions actually processed: 16325
latency average: 49.008 ms
latency stddev: 8.780 ms
tps = 163.182594 (including connections establishing)
tps = 163.189818 (excluding connections establishing)

step2b-fixed:

number of transactions actually processed: 16374
latency average: 48.867 ms
latency stddev: 8.223 ms
tps = 163.658269 (including connections establishing)
tps = 163.666318 (excluding connections establishing)


Attachment

Re: Patch: ResourceOwner optimization for tables with many partitions

From: Tom Lane

Aleksander Alekseev <a.alekseev@postgrespro.ru> writes:
>> Um, that's not too surprising, because they're exactly the same patch?

> Wrong diff. Here is the correct one.

This still had quite a few bugs, but I fixed them (hope I caught
everything) and pushed it.

I did some performance testing of the ResourceArray code in isolation.
It appears that you need 100 or so elements before the hash code path
is competitive with the array path, so I increased the cutover point
quite a bit from what you had.
        regards, tom lane



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev

Hello, Tom.

I'm a bit concerned regarding the assumption that sizeof(int) never exceeds
4 bytes. While this may be true today for most C compilers, the standard
[1][2] doesn't guarantee it. Perhaps we should add something like:

StaticAssertStmt(sizeof(int) <= sizeof(int32),
                 "int size exceeds int32 size");

It costs nothing but could save some unlucky user a lot of time (not to
mention data loss).

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf



Re: Patch: ResourceOwner optimization for tables with many partitions

From: Robert Haas

On Wed, Jan 27, 2016 at 3:57 AM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> I'm a bit concerned regarding the assumption that sizeof(int) never
> exceeds 4 bytes. While this may be true today for most C compilers, the
> standard [1][2] doesn't guarantee it. Perhaps we should add something like:
>
> StaticAssertStmt(sizeof(int) <= sizeof(int32),
>     "int size exceeds int32 size");

I suspect that if this ever failed to be true, resowner.c would not be
the only thing having problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company