Re: Scaling shared buffer eviction - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Scaling shared buffer eviction
Date	September 25, 2014 14:40:32
Msg-id	20140925144025.GF9633@alap3.anarazel.de
In response to	Re: Scaling shared buffer eviction (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Scaling shared buffer eviction
List	pgsql-hackers

On 2014-09-25 10:09:30 -0400, Robert Haas wrote:
> I think the long-term solution here is that we need a lock-free hash
> table implementation for our buffer mapping tables, because I'm pretty
> sure that just cranking the number of locks up and up is going to
> start to have unpleasant side effects at some point.  We may be able
> to buy a few more years by just cranking it up, though.

I think mid to long term we actually need something other than a
hashtable: a structure capable of efficiently looking for the existence
of 'neighboring' buffers, so we can intelligently prefetch far enough
ahead that the read has actually completed when we get there. Also, I'm
pretty sure that we'll need a way to efficiently remove all buffers for
a relfilenode from shared buffers - linearly scanning for that isn't a
good solution. So I think we need a different data structure.
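
To illustrate why an ordered structure helps with both points, here is a
minimal sketch. It is not a proposal for the actual data structure (the
message does not name one); it uses a plain sorted array purely for
illustration, and PageKey, cached_ahead() and relation_range() are
hypothetical names. Ordering keys by relation and then by block makes
neighbors adjacent and turns "all buffers of a relfilenode" into one
contiguous range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ordered key: relation first, then block number, so all
 * cached blocks of one relation sit next to each other. */
typedef struct PageKey {
    uint32_t rel_node;
    uint32_t block_num;
} PageKey;

static int page_key_cmp(PageKey a, PageKey b)
{
    if (a.rel_node != b.rel_node)
        return a.rel_node < b.rel_node ? -1 : 1;
    if (a.block_num != b.block_num)
        return a.block_num < b.block_num ? -1 : 1;
    return 0;
}

/* Index of the first entry >= key in a sorted array (n if there is none). */
static size_t lower_bound(const PageKey *keys, size_t n, PageKey key)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (page_key_cmp(keys[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Count how many of the `window` blocks following `block` are already
 * cached. Assumes block + 1 does not overflow. */
static size_t cached_ahead(const PageKey *keys, size_t n,
                           uint32_t rel, uint32_t block, uint32_t window)
{
    PageKey from = { rel, block + 1 };
    size_t i = lower_bound(keys, n, from);
    size_t count = 0;

    while (i < n && keys[i].rel_node == rel &&
           keys[i].block_num - block <= window) {
        count++;
        i++;
    }
    return count;
}

/* Half-open index range [*first, *last) holding every cached block of
 * `rel`; the cost depends on that relation's entries, not the whole pool. */
static void relation_range(const PageKey *keys, size_t n, uint32_t rel,
                           size_t *first, size_t *last)
{
    PageKey start = { rel, 0 };

    *first = lower_bound(keys, n, start);
    *last = *first;
    while (*last < n && keys[*last].rel_node == rel)
        (*last)++;
}
```

A hash table offers neither operation without probing every candidate
block or scanning every entry, which is the limitation described above.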

I've played around a bit with just replacing buf_table.c with a custom
handrolled hashtable, because I've seen more than one production workload
where hash_search_with_hash_value() is the #1 entry in profiles, both in
CPU time and in cache misses. Most calls come from the buffer mapping,
followed by the lock manager.

There are two reasons for that: a) dynahash just isn't very good, and it
does a lot of things that will never be necessary for these hashes. b)
The key into the hash table is *far* too wide. A significant portion of
the time is spent comparing buffer/lock tags.

The aforementioned replacement hash table was a good bit faster for
fully cached workloads - but at the time I wrote it I could still make
it crash in very high cache pressure workloads, so that should be taken
with a fair bit of salt.

I think we can comparatively easily get rid of the tablespace in buffer
tags. Getting rid of the database would already be a fair bit harder. I
haven't really managed to come up with an idea for how to remove the
fork number without making the catalog much more complicated.  I don't
think we can go too long without at least some of these steps :(.
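
As a rough sketch of what narrowing the key buys: the buffer tag today
is five 32-bit fields (tablespace, database, relfilenode, fork, block),
i.e. 20 bytes. The structs below are simplified stand-ins, not the real
definitions, and NarrowTag plus its helpers are hypothetical. The sketch
assumes the remaining fields have been made sufficient to identify a
page, which is exactly the part this message does not work out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the current buffer tag: five 32-bit fields. */
typedef struct WideTag {
    uint32_t spc_node;   /* tablespace */
    uint32_t db_node;    /* database */
    uint32_t rel_node;   /* relfilenode */
    uint32_t fork_num;   /* fork number */
    uint32_t block_num;  /* block within the fork */
} WideTag;

/* Hypothetical tag with the tablespace dropped. */
typedef struct NarrowTag {
    uint32_t db_node;
    uint32_t rel_node;
    uint32_t fork_num;
    uint32_t block_num;
} NarrowTag;

static_assert(sizeof(WideTag) == 20, "five 32-bit fields");
static_assert(sizeof(NarrowTag) == 16, "four 32-bit fields");

/* Copy everything except the tablespace. */
static NarrowTag narrow_from_wide(const WideTag *w)
{
    NarrowTag n = { w->db_node, w->rel_node, w->fork_num, w->block_num };

    return n;
}

/* 16 bytes compare as exactly two 64-bit words; memcpy keeps the loads
 * free of alignment and aliasing problems. */
static int narrow_tag_equal(const NarrowTag *a, const NarrowTag *b)
{
    uint64_t x[2], y[2];

    memcpy(x, a, sizeof x);
    memcpy(y, b, sizeof y);
    return ((x[0] ^ y[0]) | (x[1] ^ y[1])) == 0;
}
```

Dropping one field takes the key from 20 to 16 bytes, so each comparison
covers two full 64-bit words instead of two and a half, and each hash
entry gets smaller as well.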

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
