Re: Patch: ResourceOwner optimization for tables with many partitions - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: Patch: ResourceOwner optimization for tables with many partitions
Date
Msg-id 20151224111925.56e267e9@fujitsu
Whole thread Raw
In response to Re: Patch: ResourceOwner optimization for tables with many partitions  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Patch: ResourceOwner optimization for tables with many partitions
List pgsql-hackers
I believe I fixed all flaws mentioned so far (see attachment).

Also I did a new benchmark to make sure that new patch makes PostgreSQL
work faster and doesn't cause performance degradation in some cases.

"usual pgbench" row corresponds to `pgbench -j 8 -c 8 -T 30 pgbench`
performed on a 4-core PC.

"N partitions" rows correspond to a benchmark described in a first
message of this thread performed on the same PC. N is and argument
given to gen.pl script i.e. number of partitions in generated
partitioned table.

Here are results (3 tests, TPS excluding connections establishing):

Test            |  Before   |   After   |  Delta avg
----------------|-----------|-----------|-------------
                |    295.7  |    295.0  |
usual pgbench   |    303.1  |    299.6  |   ~ 0%
                |    297.7  |    302.7  |
----------------|-----------|-----------|-------------
                |  28022.3  |  27956.1  |
10 partitions   |  27550.1  |  28916.9  |   ~ 0%
                |  28617.0  |  28022.9  |
----------------|-----------|-----------|-------------
                |   3021.4  |   3184.0  |
100 partitions  |   2949.1  |   3120.1  | 3% more TPS
                |   2870.6  |   2825.2  |
----------------|-----------|-----------|-------------
                |    106.7  |    158.6  |
1000 partitions |    105.2  |    168.4  | 53% more TPS
                |    105.9  |    162.0  |


On Fri, 18 Dec 2015 13:39:01 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
> > I don't know, I'm still not very comfortable with this.  And Tom
> > didn't like dictating that hash_any() must be no-fail, though I'm
> > not sure why.
>
> What I definitely didn't like was assuming at a distance that it would
> be no-fail.  If we're to depend on that, the patch had better attach
> a comment saying so to the header comments of the function(s) it's
> assuming that about.  Otherwise, somebody could hack up hashfunc.c
> in a way that breaks the assumption, without any clue that some code
> in a very-far-away module is critically reliant on it.
>
> > Let's wait to see what others think.
>
> A few observations:
>
> * This bit is too cute by half, if not three-quarters:
>
> +    uint32        itemsizelg:2;    /* sizeof one
> item log 2 */
> +    uint32        capacity:30;    /* capacity of
> array */
>
> Is there a good reason to assume that the only things we'll ever store
> in these arrays are of size no more than 8 bytes?  Are we so desperate
> to save space that we cannot spare two separate words for itemsize and
> capacity?  (ISTM it's a good bet that the extra code for accessing
> these bitfields occupies more space than would be saved, considering
> how few ResourceOwners typically exist at one time.)  Let's just make
> it a couple of ints and be done.  Actually, maybe nitems and capacity
> should be size_t, just in case.
>
> * An alternative design would be to forget itemsizelg altogether and
> insist that everything stored in the resource arrays be a Datum,
> which could then be coerced to/from some form of integer or some form
> of pointer as appropriate.  That would waste some space in the int
> case, but it would considerably simplify both the ResourceArray code
> and the APIs to it, which might be worth the price of assuming we'll
> never store anything bigger than 8 bytes.  It also would make this
> look more like some existing APIs such as the on_exit callbacks.
>
> * A lot of the code churn comes from the insistence on defining
> callbacks, which I'm dubious that we need.  We could instead have a
> function that is "get any convenient one of the array elements" and
> revise the loops in ResourceOwnerReleaseInternal to be like
>
>     while ((item = getconvenientitem(resourcearray)))
>     {
>         drop item in exactly the same way as before
>     }
>
> I find that preferable to the proposed ResourceArrayRemoveAll
>
> +    while (resarr->nitems > 0)
> +    {
> +        releasecb(resarr->itemsarr, isCommit);
> +    }
>
> which certainly looks like it's an infinite loop; it's assuming (again
> with no documentation) that the callback function will cause the array
> to get smaller somehow.  With the existing coding, it's much more
> clear why we think the loops will terminate.
>
> * The reason that ResourceOwnerReleaseInternal was not horribly
> inefficient was that its notion of "any convenient one" of the items
> to be deleted next was in fact the one that the corresponding Forget
> function would examine first, thus avoiding an O(N^2) cost to
> re-identify the item to be dropped.  I think we should make an effort
> to be more explicit about that connection in any rewrite.  In
> particular, it looks to me like when a hash array is in use, things
> will get slower not faster because we'll be adding a hash lookup step
> to each forget operation.  Maybe we should consider adjusting the
> APIs so that that can be avoided.  Or possibly we could have internal
> state in the ResourceArrays that says "we expect this item to be
> dropped in a moment, check that before going to the trouble of a hash
> lookup".
>
> * Actually, I'm not convinced that the proposed reimplementation of
> ResourceArrayRemove isn't horribly slow much of the time.  It sure
> looks like it could degrade to a linear search very easily.
>
> * I still say that the assumption embodied as
> RESOURCE_ARRAY_ZERO_ELEMENT (ie that no valid entry is all-zero-bits)
> is pretty unacceptable.  It might work for pointers, but I don't like
> it for resources represented by integer indexes.
>
>             regards, tom lane
>
>


Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Typo in pg_rewind.sgml
Next
From: Dave Page
Date:
Subject: Re: Remove Windows crash dump support?