Re: Scaling shared buffer eviction - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Scaling shared buffer eviction
Date
Msg-id 20140925144025.GF9633@alap3.anarazel.de
Whole thread Raw
In response to Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Scaling shared buffer eviction
List pgsql-hackers
On 2014-09-25 10:09:30 -0400, Robert Haas wrote:
> I think the long-term solution here is that we need a lock-free hash
> table implementation for our buffer mapping tables, because I'm pretty
> sure that just cranking the number of locks up and up is going to
> start to have unpleasant side effects at some point.  We may be able
> to buy a few more years by just cranking it up, though.

I think mid to long term we actually need something else than a
hashtable. Capable of efficiently looking for the existance of
'neighboring' buffers so we can intelligently prefetch far enough that
the read actually completes when we get there. Also I'm pretty sure that
we'll need a way to efficiently remove all buffers for a relfilenode
from shared buffers - linearly scanning for that isn't a good
solution. So I think we need a different data structure.

I've played a bit around with just replacing buf_table.c with a custom
handrolled hashtable because I've seen more than one production workload
where hash_search_with_hash_value() is both cpu and cache miss wise
top#1 of profiles. With most calls coming from the buffer mapping and
then from the lock manager.

There's two reasons for that: a) dynahash just isn't very good and it
does a lot of things that will never be necessary for these hashes. b)
the key into the hash table is *far* too wide. A significant portion of
the time is spent comparing buffer/lock tags.

The aforementioned replacement hash table was a good bit faster for
fully cached workloads - but at the time I wrote I could still make it
crash in very high cache pressure workloads, so that should be taken
with a fair bit of salt.

I think we can comparatively easily get rid of the tablespace in buffer
tags. Getting rid of the database already would be a fair bit harder. I
haven't really managed to get an idea how to remove the fork number
without making the catalog much more complicated.  I don't think we can
go too long without at least some of these steps :(.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Next
From: Robert Haas
Date:
Subject: Re: Scaling shared buffer eviction