Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
Date
Msg-id 057a9a95-19d2-05f0-17e2-f46ff20e9b3e@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)  (Yura Sokolov <funny.falcon@gmail.com>)
Responses Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)  (Yura Sokolov <funny.falcon@gmail.com>)
List pgsql-hackers
On 03/06/2018 06:23 AM, Yura Sokolov wrote:
> 05.03.2018 18:00, Tom Lane пишет:
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>> Snapshots are static (we don't really add new XIDs into existing ones,
>>> right?), so why don't we simply sort the XIDs and then use bsearch to
>>> lookup values? That should fix the linear search, without need for any
>>> local hash table.
>>
>> +1 for looking into that, since it would avoid adding any complication
>> to snapshot copying.  In principle, with enough XIDs in the snap, an
>> O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
>> dubious that we are often in the range where that would matter.
>> We do need to worry about the cost of snapshot copying, too.
> 
> Sorting - is the first thing I've tried. Unfortunately, sorting
> itself eats too much cpu. Filling hash table is much faster.
> 

I've been interested in the sort-based approach, so I've spent a bit of
time hacking on it (patch attached). It's much less invasive compared to
the hash-table, but Yura is right the hashtable gives better results.

I've tried to measure the benefits using the same script I shared on
Tuesday, but I kept getting really strange numbers. In fact, I've been
unable to even reproduce the results I shared back then. And after a bit
of head-scratching I think the script is useless - it can't possibly
generate snapshots with many XIDs because all the update threads sleep
for exactly the same time. And first one to sleep is first one to wake
up and commit, so most of the other XIDs are above xmax (and thus not
included in the snapshot). I have no idea why I got the numbers :-/

But with this insight, I quickly modified the script to also consume
XIDs by another thread (which simply calls txid_current). With that I'm
getting snapshots with as many XIDs as there are UPDATE clients (this
time I actually checked that using gdb).

And for a 60-second run the tps results look like this (see the attached
chart, showing the same data):

    writers     master      hash       sort
   -----------------------------------------
    16           1068       1057       1070
    32           1005        995       1033
    64           1064       1042       1117
    128          1058       1021       1051
    256           977       1004        928
    512           773        935        808
    768           576        815        670
    1000          413        752        573

The sort certainly does improve performance compared to master, but it's
only about half of the hashtable improvement.

I don't know how much this simple workload resembles the YCSB benchmark,
but I seem to recall it's touching individual rows. In which case it's
likely worse due to the pg_qsort being more expensive than building the
hash table.

So I agree with Yura the sort is not a viable alternative to the hash
table, in this case.

> Can I rely on snapshots being static? May be then there is no need
> in separate raw representation and hash table. I may try to fill hash
> table directly in GetSnapshotData instead of lazy filling. Though it
> will increase time under lock, so it is debatable and should be
> carefully measured.
> 

Yes, I think you can rely on snapshots not being modified later. That's
pretty much the definition of a snapshot.

You may do that in GetSnapshotData, but you certainly can't do that in
the for loop there. Not only we don't want to add more work there, but
you don't know the number of XIDs in that loop. And we don't want to
build hash tables for small number of XIDs.

One reason against building the hash table in GetSnapshotData is that
we'd build it even when the snapshot is never queried. Or when it is
queried, but we only need to check xmin/xmax.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [HACKERS] Subscription code improvements
Next
From: Peter Eisentraut
Date:
Subject: Re: INOUT parameters in procedures