Re: Making all nbtree entries unique by having heap TIDs participate in comparisons - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date
Msg-id 20181018204408.tk3km7zwusbbt5gd@alap3.anarazel.de
In response to Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Peter Geoghegan <pg@bowt.ie>)
Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Peter Geoghegan <pg@bowt.ie>)
Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,

On 2018-10-18 12:54:27 -0700, Peter Geoghegan wrote:
> I can show a nice improvement in latency on a slightly-rate-limited
> TPC-C workload when backend_flush_after=0 (something like a 40%
> reduction on average), but that doesn't hold up when oltpbench isn't
> rate-limited and/or has backend_flush_after set. Usually, there is a
> 1% - 2% regression, despite the big improvements in index size, and
> despite the big reduction in the amount of buffers that backends must
> write out themselves.

What kind of backend_flush_after values were you trying?
backend_flush_after=0 obviously is the default, so I'm not clear on
that. How large is the database here, and how high is shared_buffers?


> The obvious explanation is that throughput is decreased due to our
> doing extra work (truncation) while under an exclusive buffer lock.
> However, I've worked hard on that, and, as I said, I can sometimes
> observe a nice improvement in latency. This makes me doubt the obvious
> explanation. My working theory is that this has something to do with
> shared_buffers eviction. Maybe we're making worse decisions about
> which buffer to evict, or maybe the scalability of eviction is hurt.
> Perhaps both.

Is it possible that there are new / prolonged cases where a buffer is read
from disk after the patch? Because that might require doing *write* IO
when evicting the previous contents of the victim buffer, and obviously
that can take longer if you're running with backend_flush_after > 0.
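For reference, the path I'm thinking of looks roughly like this (paraphrasing
BufferAlloc() in bufmgr.c from memory, simplified, not the exact code):

/* in BufferAlloc(): the chosen victim buffer turned out to be dirty */
if (oldFlags & BM_DIRTY)
{
    /*
     * The backend has to write out the old contents itself before it can
     * reuse the buffer for the page it actually wants.
     */
    FlushBuffer(buf, NULL);

    /*
     * With backend_flush_after > 0 the just-written page is also queued in
     * BackendWritebackContext; once enough pages accumulate, the backend
     * issues the writeback (sync_file_range() on Linux) itself, which is
     * where additional stalls could come from.
     */
    ScheduleBufferTagForWriteback(&BackendWritebackContext, &buf->tag);
}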

I wonder if it'd make sense to hack up a patch that logs when evicting a
buffer while already holding another lwlock. That shouldn't be too hard.
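Something like this, perhaps (just a sketch; LWLockHeldCount() doesn't exist
today, it'd be a trivial accessor for lwlock.c's file-static num_held_lwlocks
counter, and the exact call site would need a bit more thought):

/* lwlock.c: new accessor for the file-static counter */
int
LWLockHeldCount(void)
{
    return num_held_lwlocks;
}

/* bufmgr.c, just before flushing a dirty victim buffer in BufferAlloc() */
if (LWLockHeldCount() > 0)
    elog(LOG, "evicting dirty buffer %d while holding %d lwlock(s)",
         buf->buf_id, LWLockHeldCount());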


> You can download results from a recent benchmark to get some sense of
> this. It includes latency and throughput graphs, plus detailed
> statistics collector stats:
> 
> https://drive.google.com/file/d/1oIjJ3YpSPiyRV_KF6cAfAi4gSm7JdPK1/view?usp=sharing

I'm unclear on which runs are which here. I assume "public" is your
patchset, and master is master? Do you reset the stats in between runs?

Greetings,

Andres Freund

