Re: Speed up transaction completion faster after many relations are accessed in a transaction - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Speed up transaction completion faster after many relations are accessed in a transaction
Date:
Msg-id: 3709.1563898420@sss.pgh.pa.us
In response to: Re: Speed up transaction completion faster after many relations are accessed in a transaction (David Rowley <david.rowley@2ndquadrant.com>)
Responses: Re: Speed up transaction completion faster after many relations are accessed in a transaction
List: pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached v7, which really is v6 with some comments adjusted and
> the order of the hash_get_num_entries and hash_get_max_bucket function
> calls swapped.  I think hash_get_num_entries() will return 0 most of
> the time where we're calling it, so it makes sense to put the
> condition that's less likely to be true first in the if condition.

> I'm pretty happy with v7 now. If anyone has any objections to it,
> please speak up very soon.  Otherwise, I plan to push it about this
> time tomorrow.

I dunno, this seems close to useless in this form.  As it stands,
once hash_get_max_bucket has exceeded the threshold, you will
arbitrarily reset the table 1000 transactions later (since the
max bucket is certainly not gonna decrease otherwise).  So that's
got two big problems:

1. In the assumed-common case where most of the transactions take
few locks, you wasted cycles for 999 transactions.

2. You'll reset the table independently of subsequent history,
even if the session's usage is pretty much always over the
threshold.  Admittedly, if you do this only once per 1K
transactions, it's probably not a horrible overhead --- but you
can't improve point 1 without making it a bigger overhead.

I did complain about the cost of tracking the stats proposed by
some earlier patches, but I don't think the answer to that is to
not track any stats at all.  The proposed hash_get_max_bucket()
function is quite cheap, so I think it wouldn't be out of line to
track the average value of that at transaction end over the
session's lifespan, and reset if the current value is more than
some-multiple of the average.
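A minimal sketch of that idea, run at end-of-transaction once the local
lock table is empty.  The names avg_max_bucket, LOCKHASH_RESET_FACTOR,
and ResetLocalLockHash() are invented here; the last just stands for
whatever rebuilds the table at its initial size:

    static double avg_max_bucket = 0.0;     /* running average so far */
    static long   n_xacts = 0;              /* transactions observed */

    {
        long    max_bucket = hash_get_max_bucket(LockMethodLocalHash);

        if (n_xacts > 0 &&
            max_bucket > LOCKHASH_RESET_FACTOR * avg_max_bucket)
        {
            /* Table has grown well past this session's norm; rebuild it. */
            ResetLocalLockHash();
            max_bucket = hash_get_max_bucket(LockMethodLocalHash);
        }

        /* Fold this transaction's reading into the plain running average. */
        n_xacts++;
        avg_max_bucket += (max_bucket - avg_max_bucket) / n_xacts;
    }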

The tricky part here is that if some xact kicks that up to
say 64 entries, and we don't choose to reset, then the reading
for subsequent transactions will be 64 even if they use very
few locks.  So we probably need to not use a plain average,
but account for that effect somehow.  Maybe we could look at
how quickly the number goes up after we reset?

[ thinks for awhile... ]  As a straw-man proposal, I suggest
the following (completely untested) idea:

* Make the table size threshold variable, not constant.

* If, at end-of-transaction when the table is empty,
the table bucket count exceeds the threshold, reset
immediately; but if it's been less than K transactions
since the last reset, increase the threshold (by say 10%).

I think K can be a constant; somewhere between 10 and 100 would
probably work.  At process start, we should act as though the last
reset were more than K transactions ago (i.e., don't increase the
threshold at the first reset).
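In code, that might look about like this (one reading of the above,
anyway; the names and the starting threshold are invented, and
ResetLocalLockHash() again stands in for the rebuild step).  It would
run at end-of-transaction, when the local lock table is known empty:

    #define LOCKHASH_RESET_MIN_XACTS    50      /* "K", somewhere in 10..100 */

    static long lockhash_threshold = 16;        /* starting threshold */
    static long xacts_since_reset = LOCKHASH_RESET_MIN_XACTS;
                                                /* pretend the last reset
                                                 * was long ago */
    {
        xacts_since_reset++;

        if (hash_get_max_bucket(LockMethodLocalHash) > lockhash_threshold)
        {
            /* Resetting again within K xacts?  Then the threshold is
             * evidently too low for this session; raise it by ~10%. */
            if (xacts_since_reset < LOCKHASH_RESET_MIN_XACTS)
                lockhash_threshold += lockhash_threshold / 10 + 1;

            ResetLocalLockHash();
            xacts_since_reset = 0;
        }
    }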

The main advantage this has over v7 is that we don't have the
1000-transaction delay before reset, which ISTM is giving up
much of the benefit of the whole idea.  Also, if the session
is consistently using lots of locks, we'll adapt to that after
a while and not do useless table resets.

            regards, tom lane


