Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date:
Msg-id: 20200806213334.3bzadeirly3mdtzl@development
In response to: RE: [Patch] Optimize dropping of relation buffers using dlist ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
Responses: Re: [Patch] Optimize dropping of relation buffers using dlist
           Re: [Patch] Optimize dropping of relation buffers using dlist
List: pgsql-hackers
On Thu, Aug 06, 2020 at 01:23:31AM +0000, k.jamison@fujitsu.com wrote:
>On Saturday, August 1, 2020 5:24 AM, Andres Freund wrote:
>
>Hi,
>Thank you for your constructive review and comments.
>Sorry for the late reply.
>
>> Hi,
>>
>> On 2020-07-31 15:50:04 -0400, Tom Lane wrote:
>> > Andres Freund <andres@anarazel.de> writes:
>> > > Indeed. The buffer mapping hashtable already is visible as a major
>> > > bottleneck in a number of workloads. Even in readonly pgbench if s_b
>> > > is large enough (so the hashtable is larger than the cache). Not to
>> > > speak of things like a cached sequential scan with a cheap qual and
>> > > wide rows.
>> >
>> > To be fair, the added overhead is in buffer allocation not buffer lookup.
>> > So it shouldn't add cost to fully-cached cases. As Tomas noted
>> > upthread, the potential trouble spot is where the working set is
>> > bigger than shared buffers but still fits in RAM (so there's no actual
>> > I/O needed, but we do still have to shuffle buffers a lot).
>>
>> Oh, right, not sure what I was thinking.
>>
>> > > Wonder if the temporary fix is just to do explicit hashtable probes
>> > > for all pages iff the size of the relation is < s_b / 500 or so.
>> > > That'll address the case where small tables are frequently dropped -
>> > > and dropping large relations is more expensive from the OS and data
>> > > loading perspective, so it's not gonna happen as often.
>> >
>> > Oooh, interesting idea. We'd need a reliable idea of how long the
>> > relation had been (preferably without adding an lseek call), but maybe
>> > that's do-able.
>>
>> IIRC we already do smgrnblocks nearby, when doing the truncation (to
>> figure out which segments we need to remove). Perhaps we can arrange to
>> combine the two? The layering probably makes that somewhat ugly :(
>>
>> We could also just use pg_class.relpages. It'll probably mostly be
>> accurate enough?
>>
>> Or we could just cache the result of the last smgrnblocks call...
>>
>> One of the cases where this type of strategy is most interesting to me
>> is the partial truncations that autovacuum does... There we even know
>> the range of tables ahead of time.
>
>Konstantin tested it on various workloads and saw no regression.

Unfortunately, Konstantin did not share any details about what workloads
he tested, what config, etc. But I find the "no regression" hypothesis
rather hard to believe, because we're adding a non-trivial amount of code
to a place that can be quite hot. And I can trivially reproduce a
measurable (and significant) regression using a very simple pgbench
read-only test, with an amount of data that exceeds shared buffers but
fits into RAM.

The following numbers are from an x86_64 machine with 16 cores (32 with
HT), 64GB of RAM, and 8GB shared buffers, using pgbench scale 1000 (so
16GB, i.e. twice the shared-buffers size). With simple "pgbench -S" tests
(warmup and then 15 x 1-minute runs with 1, 8 and 16 clients - see the
attached script for details) I see this:

            1 client    8 clients    16 clients
 -----------------------------------------------
  master       38249       236336        368591
  patched      35853       217259        349248
                 -6%          -8%           -5%

This is the average of the runs, but the conclusions for the medians are
almost exactly the same.

>But I understand the sentiment on the added overhead on BufferAlloc.
>Regarding the case where the patch would potentially affect workloads
>that fit into RAM but not into shared buffers, could one of Andres'
>suggested idea/s above address that, in addition to this patch's
>possible shared invalidation fix? Could that settle the added overhead
>in BufferAlloc() as temporary fix?

Not sure.

>Thomas Munro is also working on caching relation sizes [1], maybe that
>way we could get the latest known relation size. Currently, it's
>possible only during recovery in smgrnblocks.

It's not clear to me how knowing the relation size would help to reduce
the overhead of this patch.

Can't we somehow identify cases when this optimization might help and
only actually enable it in those cases? Like in recovery, with a lot of
truncates, or something like that.
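For illustration, Andres's "explicit hashtable probes for small relations"
idea could look roughly like the sketch below. This is purely illustrative,
not code from the patch: the function name, the exact locking dance, and
gating it on the fork size are assumptions, modelled on the existing
routines in bufmgr.c (it would have to live there, next to
InvalidateBuffer()).

/*
 * Sketch only: drop cached buffers of one small relation fork by probing
 * the buffer mapping hash table once per block, instead of scanning all
 * of shared_buffers.  Assumes it lives in bufmgr.c.
 */
static void
DropSmallRelationForkBuffers(RelFileNode rnode, ForkNumber forkNum,
                             BlockNumber nblocks)
{
    BlockNumber blk;

    for (blk = 0; blk < nblocks; blk++)
    {
        BufferTag   tag;            /* identity of the block */
        uint32      hash;           /* hash of the buffer tag */
        LWLock     *partitionLock;  /* mapping partition for the tag */
        int         buf_id;
        BufferDesc *bufHdr;
        uint32      buf_state;

        /* Build the lookup key and probe the mapping hash table. */
        INIT_BUFFERTAG(tag, rnode, forkNum, blk);
        hash = BufTableHashCode(&tag);
        partitionLock = BufMappingPartitionLock(hash);

        LWLockAcquire(partitionLock, LW_SHARED);
        buf_id = BufTableLookup(&tag, hash);
        LWLockRelease(partitionLock);

        if (buf_id < 0)
            continue;           /* block not cached, nothing to drop */

        /*
         * Re-check the tag under the buffer header spinlock - the buffer
         * may have been evicted and reused for another page after we
         * released the partition lock.  InvalidateBuffer() releases the
         * spinlock for us.
         */
        bufHdr = GetBufferDescriptor(buf_id);
        buf_state = LockBufHdr(bufHdr);

        if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
            bufHdr->tag.forkNum == forkNum &&
            bufHdr->tag.blockNum == blk)
            InvalidateBuffer(bufHdr);   /* releases spinlock */
        else
            UnlockBufHdr(bufHdr, buf_state);
    }
}

The caller in DropRelFileNodeBuffers() would take this path only when the
fork is known to be small (say, nblocks < NBuffers / 500, with nblocks
coming from smgrnblocks() or a cached copy of its last result), and fall
back to the existing full scan of shared buffers otherwise.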
regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services