Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)
Date
Msg-id 5d4e500b-8697-e474-b507-93319566a00c@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)  (Yura Sokolov <funny.falcon@gmail.com>)
Responses Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)
List pgsql-hackers
On 03/17/2018 12:03 AM, Yura Sokolov wrote:
> 16.03.2018 04:23, Tomas Vondra пишет:
>>
>> ...
>>
>> OK, a few more comments.
>>
>> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
>> my_log2 (that is, we could just call the existing function).
> 
> Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
> more call be more expensive than loop itself?
> 

I very much doubt it there would be a measurable difference. Firstly,
function calls are not that expensive, otherwise we'd be running around
and removing function calls from the hot paths. Secondly, the call only
happens with many XIDs, and in that case the cost should be out-weighted
by faster lookups. And finally, the function does stuff that seems far
more expensive than a single function call (for example allocation,
which may easily trigger a malloc).

In fact, in the interesting cases it's pretty much guaranteed to hit a
malloc, because the number of backend processes needs to be high enough
(say, 256 or more), which means

    GetMaxSnapshotSubxidCount()

will translate to something like

    ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
    = (64 + 1) * 256
    = 16640

and because XIDs are 4B each, that's ~65kB of memory (even ignoring the
ExtendXipSizeForHash business). And aset.c treats chunks larger than 8kB
as separate blocks, that are always malloc-ed independently.

But I may be missing something, and the extra call actually makes a
difference. But I think the onus of proving that is on you, and the
default should be not to duplicate code.

>> 2) Apparently xhlog/subxhlog fields are used for two purposes - to
>> store log2 of the two XID counts, and to remember if the hash table
>> is built. That's a bit confusing (at least it should be explained
>> in a comment) but most importantly it seems a bit unnecessary.
> 
> Ok, I'll add comment.
> 
>> I assume it was done to save space, but I very much doubt that
>> makes any difference. So we could easily keep a separate flag. I'm
>> pretty sure there are holes in the SnapshotData struct, so we could
>> squeeze it the flags in one of those.
> 
> There's hole just right after xhlog. But it will be a bit strange to 
> move fields around.
> 

Is SnapshotData really that sensitive to size increase? I have my doubts
about that, TBH. The default stance should be to make the code easy to
understand. That is, we should only move fields around if it actually
makes a difference.

>> But do we even need a separate flag? We could just as easily keep 
>> the log2 fields set to 0 and only set them after actually building 
>> the hash table.
> 
> There is a need to signal that there is space for hash. It is not
> always allocated. iirc, I didn't cover the case where snapshot were
> restored from file, and some other place or two.
> Only if all places where snapshot is allocated are properly modified
> to allocate space for hash, then flag could be omitted, and log2
> itself used as a flag.
> 

Hmmm, that makes it a bit inconsistent, I guess ... why not to do the
same thing on all those places?

>> Or even better, why not to store the mask so that XidInXip does not
>> need to compute it over and over (granted, that's uint32 instead
>> of just uint8, but I don't think SnapshotData is particularly
>> sensitive to this, especially considering how much larger the xid
>> hash table is).
> 
> I don't like unnecessary sizeof struct increase. And I doubt that 
> computation matters. I could be mistaken though, because it is hot
> place. Do you think it will significantly improve readability?
> 

IMHO the primary goal is to make the code easy to read and understand,
and only optimize struct size if it actually makes a difference. We have
no such proof here, and I very much doubt you'll be able to show any
difference because even with separate flags pahole says this:

struct SnapshotData {
    SnapshotSatisfiesFunc      satisfies;            /*     0     8 */
    TransactionId              xmin;                 /*     8     4 */
    TransactionId              xmax;                 /*    12     4 */
    TransactionId *            xip;                  /*    16     8 */
    uint32                     xcnt;                 /*    24     4 */
    uint8                      xhlog;                /*    28     1 */

    /* XXX 3 bytes hole, try to pack */

    TransactionId *            subxip;               /*    32     8 */
    int32                      subxcnt;              /*    40     4 */
    uint8                      subxhlog;             /*    44     1 */
    bool                       suboverflowed;        /*    45     1 */
    bool                       takenDuringRecovery;  /*    46     1 */
    bool                       copied;               /*    47     1 */
    CommandId                  curcid;               /*    48     4 */
    uint32                     speculativeToken;     /*    52     4 */
    uint32                     active_count;         /*    56     4 */
    uint32                     regd_count;           /*    60     4 */
    /* --- cacheline 1 boundary (64 bytes) --- */
    pairingheap_node           ph_node;              /*    64    24 */
    TimestampTz                whenTaken;            /*    88     8 */
    XLogRecPtr                 lsn;                  /*    96     8 */

    /* size: 104, cachelines: 2, members: 19 */
    /* sum members: 101, holes: 1, sum holes: 3 */
    /* last cacheline: 40 bytes */
};

That is, the whole structure is only 104B - had it got over 128B, it
might have made a difference due to extra cacheline. In fact, even if
you make the xhlog and subxhlog uint32 (which would be enough to store
the mask, which is what simplehash does), it'd be only 120B.

Based on this I'd dare to say neither of those changes would have
negative impact.

So I suggest to split each of the "combined" fields into a simple bool
flag ("hash table built") and a uint32 mask, and see if it simplifies
the code (I believe it will) and what impact it has (I believe there
will be none).

Sorry if this seems like a bikeshedding ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Yura Sokolov
Date:
Subject: Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)
Next
From: Tomas Vondra
Date:
Subject: Re: [PROPOSAL] Shared Ispell dictionaries