Re: MultiXact\SLRU buffers configuration - Mailing list pgsql-hackers
| From | Thomas Munro |
|---|---|
| Subject | Re: MultiXact\SLRU buffers configuration |
| Date | |
| Msg-id | CA+hUKGKEdeVKpOqBxzRF+Z4=j0T+CA7ERrXni5De71Mm6-dBWA@mail.gmail.com |
| In response to | Re: MultiXact\SLRU buffers configuration (Andrey Borodin <x4mmm@yandex-team.ru>) |
| Responses | Re: MultiXact\SLRU buffers configuration |
| List | pgsql-hackers |
On Thu, Apr 1, 2021 at 10:09 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > On 29 Mar 2021, at 11:26, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > My TODO list:
> > 1. Try to break patch set v13-[0001-0004]
> > 2. Think how to measure performance of linear search versus hash search in SLRU buffer mapping.
>
> Hi Thomas!
> I'm still doing my homework. And to this moment all my catch is that "utils/dynahash.h" is not necessary.

Thanks. Here's a new patch with that fixed, and also:

1. New names ("... Mapping Table" -> "... Lookup Table" in the pg_shmem_allocations view, "clog_buffers" -> "xact_buffers") and a couple of typos fixed here and there.

2. Removed the cap of 128 buffers for xact_buffers as agreed. We still need a cap, though, to avoid a couple of kinds of overflow inside slru.c, both when computing the default value and when accepting a user-provided number. I introduced SLRU_MAX_ALLOWED_BUFFERS to keep it <= 1GB, and tested this on a 32 bit build with extreme block sizes.

Likewise, I removed the cap of 16 buffers for commit_ts_buffers, but only if you have track_commit_timestamp enabled. It seems silly to waste 1MB per 1GB of shared_buffers on a feature you're not using. So the default is capped at 16 in that case to preserve existing behaviour, but otherwise it can be set very large if you want.

I think it's plausible that we'll want to make the multixact sizes adaptive too, but that might have to be a job for later. Likewise, I am sure that subtransaction-heavy workloads might be slower than they need to be due to the current default, but I have not done the research. With these new GUCs, people are free to experiment and develop theories about what the defaults should be in v15.

3. In the special case of xact_buffers, there is a lower cap of 512MB, because there's no point trying to cache more xids than there can be in existence, and that is computed by working backwards from CLOG_XACTS_PER_PAGE etc (a back-of-the-envelope version of that arithmetic is sketched after this list). It's not possible to do the same sort of thing for the other SLRUs without overflow problems, and it doesn't seem worth trying to fix that right now (1GB of cached commit timestamps ought to be enough for anyone™, while the theoretical maximum is 10 bytes for 2b xids = 20GB).

To make this more explicit for people not following our discussion in detail: with shared_buffers=0...512MB, the behaviour doesn't change. But for shared_buffers=1GB you'll get twice as many xact_buffers as today (2MB instead of being capped at 1MB), and it keeps scaling linearly from there at 0.2%. In other words, all real-world databases will get a boost in this department.

4. Changed the default for commit_ts_buffers back to shared_buffers / 1024 (with a minimum of 4), because I think you might have changed it by a copy-and-paste error -- or did you intend to make the default higher?

5. Improved the descriptions for the GUCs, visible in the pg_settings view, to match the documentation for related wait events. So "for commit log SLRU" -> "for the transaction status SLRU cache", and similar corrections elsewhere. (I am tempted to try to find a better word than "SLRU", which doesn't seem relevant to users, but for now consistency is good.)

6. Added a callback so that SHOW xact_buffers and SHOW commit_ts_buffers display the real size in effect (instead of "0" for the default). I tried running with xact_buffers=1 and soon saw why you changed it to interpret 1 the same as 0; with 1 you hit buffer starvation and get stuck.
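To make the 512MB figure in point 3 concrete, here is a rough back-of-the-envelope sketch, assuming the default BLCKSZ of 8192, two status bits per transaction, and that only about 2^31 xids can be "in existence" at once because of wraparound; the EXAMPLE_* names are illustrative stand-ins, not identifiers from the patch:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative arithmetic only; the real cap is derived by working
 * backwards from CLOG_XACTS_PER_PAGE in the tree.
 */
#define EXAMPLE_BLCKSZ              8192
#define EXAMPLE_CLOG_BITS_PER_XACT  2
#define EXAMPLE_CLOG_XACTS_PER_BYTE (8 / EXAMPLE_CLOG_BITS_PER_XACT)
#define EXAMPLE_CLOG_XACTS_PER_PAGE (EXAMPLE_BLCKSZ * EXAMPLE_CLOG_XACTS_PER_BYTE)

int
main(void)
{
	uint64_t	max_xids = UINT64_C(1) << 31;	/* ~2 billion live xids */
	uint64_t	max_pages = max_xids / EXAMPLE_CLOG_XACTS_PER_PAGE;
	uint64_t	max_bytes = max_pages * EXAMPLE_BLCKSZ;

	/* 2^31 / 32768 = 65536 pages; 65536 * 8192 bytes = 512MB */
	printf("clog pages worth caching: %llu (%llu MB)\n",
		   (unsigned long long) max_pages,
		   (unsigned long long) (max_bytes / (1024 * 1024)));
	return 0;
}
```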
I wish there were some way to say "the default for this GUC is 0, but if it's not zero then it's got to be at least 4". I didn't study the theoretical basis for the previous minimum value of 4, so I think we should keep it that way, so that if you say 3 you get 4. I thought it was better to express that like so:

    /* Use configured value if provided. */
    if (xact_buffers > 0)
        return Max(4, xact_buffers);

    return Min(CLOG_MAX_ALLOWED_BUFFERS, Max(4, NBuffers / 512));

> I'm thinking about hashtables, and measuring performance near the optimum of linear search does not seem a good idea now.
> It's impossible to prove that the difference is statistically significant on all platforms. But even on one platform measurements are just too noisy.
>
> The shared buffers lookup table is indeed very similar to this SLRU lookup table. And it does not try to use more memory than needed. I could not find any pgbench-visible impact of growing the shared buffer lookup table. Obviously, because it's not a bottleneck on a regular workload. And it's hard to guess a representative pathological workload.

Thanks for testing. I agree that it's a good idea to follow the main buffer pool's approach for our first version of this.

One small difference is that the SLRU patch performs the hash computation while it holds the lock. If we computed the hash first and used hash_search_with_hash_value(), we could compute it before we obtain the lock, like the main buffer pool does. If we computed the hash value first, we could also ignore the rule in the documentation for hash_search_with_hash_value() that says you must calculate it with get_hash_value(), and just call the hash function ourselves, so that it's fully inlinable. The same opportunity exists for the main buffer pool. That'd get you one of the micro-optimisations that simplehash.h offers. Whether it's worth bothering with, I don't know. (A rough sketch of the idea is at the end of this mail.)

> In fact, this thread started with a proposal to use a reader-writer lock for multis (instead of an exclusive lock), and this proposal encountered the same problem. It's very hard to create a stable reproduction of a pathological workload when this lock is heavily contended. Many people observed the problem, but still there is no open repro.
>
> I bet it will be hard to prove that simplehash is any better than HTAB. But if it is really better, shared buffers could benefit from the same technique.

Agreed, there are a lot of interesting future projects in this area, when you compare the main buffer pool, these special buffer pools, and maybe also the "shared relfilenode" pool patch I have proposed for v15 (CF entry 2933). All have mapping tables and buffer replacement algorithms (why should they be different?), one has partitions, some have atomic-based header locks, they interlock with WAL differently (on page LSN, FPIs and checksum support), ... etc etc.

> I think it's just fine to use HTAB with a normal size, as long as shared buffers do so. But there we allocate slightly more space: InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS). Don't we need to allocate nslots + 1? It seems that we always do SlruMappingRemove() before SlruMappingAdd(), so it is not necessary.

Yeah, we never try to add more elements than allowed, because we have a big lock around the mapping. The main buffer mapping table has a more concurrent design and might temporarily have one extra entry per partition.
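As for the hash-before-lock idea mentioned above, here is a minimal sketch of the ordering I have in mind, assuming a dynahash-based mapping protected by a single lock; SlruMappingTable, SlruMappingTableEntry, mapping_lock and slru_mapping_lookup are illustrative names of mine, not identifiers from the patch:

```c
#include "postgres.h"

#include "storage/lwlock.h"
#include "utils/hsearch.h"

/* Illustrative shared state; the real patch's names and layout may differ. */
typedef struct SlruMappingTableEntry
{
	int			pageno;			/* hash key */
	int			slotno;			/* cached buffer slot */
} SlruMappingTableEntry;

static HTAB *SlruMappingTable;
static LWLock *mapping_lock;

static int
slru_mapping_lookup(int pageno)
{
	uint32		hash;
	SlruMappingTableEntry *entry;
	bool		found;
	int			result = -1;

	/*
	 * Compute the hash before acquiring the lock, as the main buffer pool
	 * does.  (Going further, one could call the hash function directly
	 * instead of get_hash_value(), so that it can be inlined.)
	 */
	hash = get_hash_value(SlruMappingTable, &pageno);

	LWLockAcquire(mapping_lock, LW_EXCLUSIVE);
	entry = hash_search_with_hash_value(SlruMappingTable, &pageno,
										hash, HASH_FIND, &found);
	if (found)
		result = entry->slotno;
	LWLockRelease(mapping_lock);

	return result;
}
```

The only behavioural difference versus hashing under the lock is that the computation moves out of the critical section, so the lock hold time shrinks slightly; whether that is measurable is exactly the kind of thing the pgbench experiments above struggled to show.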