Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers

From Atri Sharma
Subject Re: Clock sweep not caching enough B-Tree leaf pages?
Date
Msg-id CAOeZVicEUNUk-YC0KaZvR37kZWL=otm0TcaV_gzPPEfAqWf9aQ@mail.gmail.com
In response to Re: Clock sweep not caching enough B-Tree leaf pages?  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Clock sweep not caching enough B-Tree leaf pages?
List pgsql-hackers

On Fri, Apr 18, 2014 at 7:27 AM, Peter Geoghegan <pg@heroku.com> wrote:

One idea I have for the eviction policy is to keep an ageing factor in each buffer and take that factor into account when choosing a buffer to evict.

Consider a case where a table is huge and spread across many pages. The query pattern resembles a time series, i.e. one set of rows is queried very frequently for some time, making the corresponding page hot. Then the next set of rows is queried frequently, making that page hot, and so on.

Consider a new page entering shared buffers with refcount 1 and usage_count 1. If that page is part of the workload described above, it is likely that it will not be used for a considerable time after it enters the buffers, but will be used eventually.

Now, the current hypothetical situation is that we have three pages:

1) The page that was hot in the previous time window but is no longer hot, and is actually the correct candidate for eviction.
2) The current hot page (it won't be evicted for now anyway).
3) The new page which just got in and should not be evicted, since it can be hot soon (for this workload it will be hot in the next time window).

When the clock sweep algorithm next runs, it will pick the new buffer page as the one to be evicted, since page (1) may still have usage_count > 0, i.e. it may be 'cooling' but not 'cool' yet.
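To make the scenario concrete, here is a minimal Python sketch of a clock sweep over the three pages. It is a toy model, not the PostgreSQL code: the Buffer class, the clock_sweep function, and the specific usage_count values are assumptions chosen for illustration, and the new page is assumed to have been unpinned (refcount 0) by the time the sweep runs.

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    name: str
    refcount: int      # pins held; a pinned buffer is skipped by the sweep
    usage_count: int   # decremented each time the sweep hand passes

def clock_sweep(buffers, hand=0):
    """Toy clock sweep: return (victim, next_hand).

    Skips pinned buffers, decrements usage_count on unpinned ones, and
    evicts the first unpinned buffer found with usage_count == 0.
    """
    if all(b.refcount > 0 for b in buffers):
        raise RuntimeError("no unpinned buffers available")
    n = len(buffers)
    while True:
        buf = buffers[hand]
        hand = (hand + 1) % n
        if buf.refcount > 0:
            continue
        if buf.usage_count == 0:
            return buf, hand
        buf.usage_count -= 1

# Assumed state: page (1) is cooling, page (2) is pinned and hot,
# page (3) has just been loaded and released.
pool = [
    Buffer("old_hot", refcount=0, usage_count=3),
    Buffer("current_hot", refcount=1, usage_count=5),
    Buffer("new_page", refcount=0, usage_count=1),
]
victim, _ = clock_sweep(pool)
print(victim.name)
```

With these assumed values the new page reaches usage_count 0 before the cooling page does, so it is the one chosen.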

This can be changed by introducing an ageing factor that tracks how much time the buffer has spent in shared buffers. If that time is (relatively) large and the buffer is not currently hot, it has had its chance and can be evicted. This would save the new page (3) from being evicted, since its time in shared buffers would not be high enough to mandate eviction, and it would be given more chances.

Since gettimeofday() is an expensive call and cannot be made in a tight loop, we can instead count the number of clock sweeps the buffer has seen (or rather, survived). This gives a rough estimate of the buffer's relative age.

When an eviction happens, all the candidates with refcount = 0 would be considered. Then, among them, the one with the highest ageing factor would be evicted.
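A rough Python sketch of that selection rule follows. The names Buffer, sweeps_survived and pick_victim are hypothetical, and breaking ties by lowest usage_count is an added assumption that the proposal does not specify.

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    name: str
    refcount: int
    usage_count: int
    sweeps_survived: int = 0   # hypothetical ageing counter

def pick_victim(buffers):
    """Among unpinned buffers, pick the one that has survived the most sweeps.

    Ties are broken in favour of the lower usage_count (an assumption).
    Returns None when every buffer is pinned.
    """
    candidates = [b for b in buffers if b.refcount == 0]
    if not candidates:
        return None
    return max(candidates, key=lambda b: (b.sweeps_survived, -b.usage_count))

# Same three pages as in the scenario, with assumed ages.
pool = [
    Buffer("old_hot", refcount=0, usage_count=2, sweeps_survived=5),
    Buffer("current_hot", refcount=1, usage_count=5, sweeps_survived=3),
    Buffer("new_page", refcount=0, usage_count=1, sweeps_survived=0),
]
victim = pick_victim(pool)
print(victim.name)
```

Under these assumed values the old page is selected even though its usage_count is still above zero, and the new page survives. Read literally, the rule ignores usage_count, so a real policy would probably also need the "not hot currently" check mentioned earlier.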

Of course, there may be better ways of doing the same, but I want to highlight the point (or possibility) of introducing an ageing factor to prevent eviction of relatively younger pages early in the eviction process.

The overhead isn't too big. We just need to add another attribute to the buffer header for the number of clock sweeps seen (or rather, survived) and check it when an eviction takes place. The existing buffer header spinlock should be sufficient to protect access to it. The access rules can be similar to those for usage_count.
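As an illustration only, the extra attribute and its access rules might look like the following Python model. BufferHeader, sweeps_survived, on_sweep_pass and on_reuse are hypothetical names, a threading.Lock stands in for the buffer header spinlock, and resetting the counter when the buffer is reused is an assumption.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class BufferHeader:
    refcount: int = 0
    usage_count: int = 0
    sweeps_survived: int = 0   # hypothetical new attribute
    # Stand-in for the per-buffer header spinlock.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def on_sweep_pass(self):
        """The sweep hand passed this buffer without evicting it."""
        with self.lock:
            if self.usage_count > 0:
                self.usage_count -= 1
            self.sweeps_survived += 1

    def on_reuse(self):
        """The buffer was recycled for a new page (assumed to reset its age)."""
        with self.lock:
            self.refcount = 1
            self.usage_count = 1
            self.sweeps_survived = 0

hdr = BufferHeader(usage_count=2)
for _ in range(3):
    hdr.on_sweep_pass()
print(hdr.usage_count, hdr.sweeps_survived)
```

Both counters are only touched while the lock is held, mirroring how usage_count is handled.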

Thoughts and comments?

Regards,

Atri

