Re: Speed up transaction completion faster after many relations are accessed in a transaction - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: Speed up transaction completion faster after many relations are accessed in a transaction
Date:
Msg-id: 3709.1563898420@sss.pgh.pa.us
In response to: Re: Speed up transaction completion faster after many relations are accessed in a transaction (David Rowley <david.rowley@2ndquadrant.com>)
Responses: Re: Speed up transaction completion faster after many relations are accessed in a transaction
List: pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached v7, which really is v6 with some comments adjusted and
> the order of the hash_get_num_entries and hash_get_max_bucket function
> calls swapped. I think hash_get_num_entries() will return 0 most of
> the time where we're calling it, so it makes sense to put the
> condition that's less likely to be true first in the if condition.

> I'm pretty happy with v7 now. If anyone has any objections to it,
> please speak up very soon. Otherwise, I plan to push it about this
> time tomorrow.

I dunno, this seems close to useless in this form.  As it stands, once
hash_get_max_bucket has exceeded the threshold, you will arbitrarily reset
the table 1000 transactions later (since the max bucket is certainly not
gonna decrease otherwise).  So that's got two big problems:

1. In the assumed-common case where most of the transactions take few
locks, you wasted cycles for 999 transactions.

2. You'll reset the table independently of subsequent history, even if the
session's usage is pretty much always over the threshold.

Admittedly, if you do this only once per 1K transactions, it's probably
not a horrible overhead --- but you can't improve point 1 without making
it a bigger overhead.

I did complain about the cost of tracking the stats proposed by some
earlier patches, but I don't think the answer to that is to not track any
stats at all.  The proposed hash_get_max_bucket() function is quite cheap,
so I think it wouldn't be out of line to track the average value of that
at transaction end over the session's lifespan, and reset if the current
value is more than some-multiple of the average.

The tricky part here is that if some xact kicks that up to say 64 entries,
and we don't choose to reset, then the reading for subsequent transactions
will be 64 even if they use very few locks.  So we probably need to not
use a plain average, but account for that effect somehow.  Maybe we could
look at how quickly the number goes up after we reset?

[ thinks for awhile... ]

As a straw-man proposal, I suggest the following (completely untested)
idea:

* Make the table size threshold variable, not constant.

* If, at end-of-transaction when the table is empty, the table bucket
count exceeds the threshold, reset immediately; but if it's been less than
K transactions since the last reset, increase the threshold (by say 10%).

I think K can be a constant; somewhere between 10 and 100 would probably
work.  At process start, we should act as though the last reset were more
than K transactions ago (i.e., don't increase the threshold at the first
reset).

The main advantage this has over v7 is that we don't have the
1000-transaction delay before reset, which ISTM is giving up much of the
benefit of the whole idea.  Also, if the session is consistently using
lots of locks, we'll adapt to that after awhile and not do useless table
resets.

			regards, tom lane
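[Editor's note: a rough, untested C sketch of the straw-man idea above, to make the control flow concrete. hash_get_num_entries() and hash_destroy() are existing dynahash routines; hash_get_max_bucket() is the function added by the patch under discussion; LockMethodLocalHash is the backend-local lock hash in lock.c. The constants and the CreateLocalLockHash() helper are placeholders, not real PostgreSQL code.]

/*
 * Sketch only: reset the local lock hash at end of transaction if it has
 * grown past a variable threshold; if resets come too close together,
 * raise the threshold instead of thrashing the table.
 */
#define LOCALLOCK_INIT_THRESHOLD	16	/* starting bucket-count threshold (placeholder) */
#define LOCALLOCK_RESET_MIN_XACTS	50	/* "K": somewhere between 10 and 100 */

static uint32 locallock_threshold = LOCALLOCK_INIT_THRESHOLD;
/* Pretend the last reset was long ago, so the first reset doesn't bump the threshold. */
static uint64 xacts_since_reset = LOCALLOCK_RESET_MIN_XACTS;

static void
MaybeResetLocalLockHash(void)
{
	xacts_since_reset++;

	/* Only consider a reset once the table has been emptied at end of xact. */
	if (hash_get_num_entries(LockMethodLocalHash) != 0)
		return;

	/* Table hasn't grown past the (variable) threshold: nothing to do. */
	if (hash_get_max_bucket(LockMethodLocalHash) <= locallock_threshold)
		return;

	/*
	 * Needing another reset within K transactions of the last one suggests
	 * this session routinely takes many locks, so raise the threshold by
	 * about 10% to adapt to that.
	 */
	if (xacts_since_reset < LOCALLOCK_RESET_MIN_XACTS)
		locallock_threshold += locallock_threshold / 10;

	/* Rebuild the local lock table at its initial size. */
	hash_destroy(LockMethodLocalHash);
	CreateLocalLockHash();		/* placeholder for the hash_create() re-init */

	xacts_since_reset = 0;
}

Called once per transaction at the point where the local lock table has been emptied (e.g. from the end-of-transaction lock release path), something like this would reset with no fixed 1000-transaction delay, while a consistently lock-heavy session ratchets the threshold upward until resets stop happening.]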