Re: optimize lookups in snapshot [sub]xip arrays - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: optimize lookups in snapshot [sub]xip arrays
Date 2022-07-14 18:09:38
Msg-id 20220714180938.GA3125784@nathanxps13
In response to Re: optimize lookups in snapshot [sub]xip arrays  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi Bharath,

Thanks for taking a look.

On Thu, Jul 14, 2022 at 03:10:56PM +0530, Bharath Rupireddy wrote:
> Aren't these snapshot arrays always sorted? I see the following code:
> 
> /* sort so we can bsearch() */
> qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
> 
> /* sort so we can bsearch() later */
> qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);

AFAICT these arrays are sorted in limited cases, such as
pg_current_snapshot() and logical replication.  GetSnapshotData() does not
appear to sort them, so I don't think we can always assume they are sorted.
In the previous thread, Tomas analyzed simply sorting the arrays [0] and
found that it provided much less improvement compared to the hash table
approach, so I have not seriously considered it here.
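
For context, the pattern in those limited call sites looks roughly like this
(a sketch, not an exact excerpt from the tree):

    /* sort once so that later membership checks can bsearch() */
    qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);

    /* ...and a later membership check... */
    if (bsearch(&xid, snap->subxip, snap->subxcnt,
                sizeof(TransactionId), xidComparator) != NULL)
        return true;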

> If the ordering isn't an invariant of these snapshot arrays, can we
> also use the hash table mechanism for all of the snapshot arrays
> infrastructure rather than qsort+bsearch in a few places and hash
> table for others?

Unless there is demonstrable benefit in doing so for the few places that
sort the arrays, I'm skeptical it's worth the complexity.  This patch is
targeted to XidInMVCCSnapshot(), which we can demonstrate has clear impact
on TPS for some workloads.

> + * The current value worked well in testing, but it's still mostly a guessed-at
> + * number that might need updating in the future.
> + */
> +#define XIP_HASH_MIN_ELEMENTS (128)
> +
> 
> Do you see a regression with a hash table for all the cases? Why can't
> we just build a hash table irrespective of these limits and use it for
> all the purposes instead of making it complex with different
> approaches if we don't have measurable differences in the performance
> or throughput?

I performed the same tests as before with a variety of values for
XIP_HASH_MIN_ELEMENTS.  Here are the results (TPS for each number of writers):

     writers  HEAD  1   16  32  64  128
    ------------------------------------
     16       659   698 678 659 665 664
     32       645   661 688 657 649 663
     64       659   656 653 649 663 692
     128      641   636 639 679 643 716
     256      619   641 619 643 653 610
     512      530   609 582 602 605 702
     768      469   610 608 551 571 582
     1000     367   610 538 557 556 577

I was surprised to see that there really wasn't a regression at the low end,
but keep in mind that this is a rather large machine and a specialized
workload for generating snapshots with long [sub]xip arrays.  That being
said, there really wasn't any improvement at the low end, either.  If we
always built a hash table, we'd be introducing more overhead and memory
usage in return for approximately zero benefit.  My intent was to only take
on the overhead in cases where we believe it might have a positive impact,
which is why I picked the somewhat conservative value of 128.  If the
overhead isn't a concern, it might be feasible to always make [sub]xip a
hash table.
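
For illustration, the gating I have in mind looks roughly like this (a
simplified sketch, not the patch verbatim):

    if (xcnt < XIP_HASH_MIN_ELEMENTS)
    {
        /* small array: stick with the existing linear scan */
        for (int i = 0; i < xcnt; i++)
        {
            if (TransactionIdEquals(xip[i], xid))
                return true;
        }
        return false;
    }

    /* large array: build and probe the hash table instead (see XidInXip) */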

> +static inline bool
> +XidInXip(TransactionId xid, TransactionId *xip, uint32 xcnt,
> + xip_hash_hash **xiph)
> 
> + /* Make sure the hash table is built. */
> + if (*xiph == NULL)
> + {
> + *xiph = xip_hash_create(TopTransactionContext, xcnt, NULL);
> +
> + for (int i = 0; i < xcnt; i++)
> 
> Why create a hash table on the first search? Why can't it be built
> while inserting or creating these snapshots? Basically, instead of the
> array, can these snapshot structures be hash tables by themselves? I
> know this requires a good amount of code refactoring, but worth
> considering IMO as it removes bsearch thus might improve the
> performance further.

The idea is to avoid the overhead unless something actually needs to
inspect these arrays.
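
To make the lazy construction concrete, the helper would look roughly like
this (a sketch built around the snippet quoted above; the simplehash-generated
function names and the entry layout are assumptions on my part):

    static inline bool
    XidInXip(TransactionId xid, TransactionId *xip, uint32 xcnt,
             xip_hash_hash **xiph)
    {
        /* Build the hash table lazily, on the first lookup. */
        if (*xiph == NULL)
        {
            *xiph = xip_hash_create(TopTransactionContext, xcnt, NULL);

            for (int i = 0; i < xcnt; i++)
            {
                bool        found;

                xip_hash_insert(*xiph, xip[i], &found);
            }
        }

        /* simplehash lookup returns NULL when the key isn't present */
        return xip_hash_lookup(*xiph, xid) != NULL;
    }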

[0] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e%402ndquadrant.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


