Re: Scaling shared buffer eviction - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Scaling shared buffer eviction
Date
Msg-id CAA4eK1+mXQCOS6DZfcXnR7itfDHeOgh3eGkD0Om=qOv-h43Utg@mail.gmail.com
Whole thread Raw
In response to Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Sep 3, 2014 at 1:45 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Aug 28, 2014 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I have updated the patch to address the feedback.  Main changes are:
> >
> > 1. For populating freelist, have a separate process (bgreclaimer)
> > instead of doing it by bgwriter.
> > 2. Autotune the low and high threshold values for buffers
> > in freelist. I have used the formula as suggested by you upthread.
> > 3. Cleanup of locking regimen as discussed upthread (completely
> > eliminated BufFreelist Lock).
> > 4. Improved comments and general code cleanup.
>
> +Background Reclaimer's Processing
> +---------------------------------
>
> I suggest titling this section "Background Reclaim".

I don't mind changing it, but currently used title is based on similar
title "Background Writer's Processing".  It is used in previous
paragraph.  Is there a reason to title this differently?

> +The background reclaimer is designed to move buffers to freelist that are
>
> I suggest replacing the first three words of this sentence with "bgreclaimer".

Again what I have used is matching with BgWriter's explanation. I thought
it would be better if wording used is similar.

>
> +    while (tmp_num_to_free > 0)
>
> I am not sure it's a good idea for this value to be fixed at loop
> start and then just decremented.

It is based on the idea what bgwriter does for num_to_scan and
calling it once has advantage that we need to take freelist_lck
just once.

> Shouldn't we loop and do the whole
> thing over once we reach the high watermark, only stopping when
> StrategySyncStartAndEnd() says num_to_free is 0?

Do you mean to say that for high water mark check, we should
always refer StrategySyncStartAndEnd() rather than getting the
value in begining?

Are you thinking that somebody else would have already put some
buffers onto freelist which would change initially identified high water
mark?  I think that can be done only during very few and less used
operations.  Do you think for that we start referring
StrategySyncStartAndEnd() in loop?
>
> In freelist.c, it seems like a poor idea to have two spinlocks as
> consecutive structure members; they'll be in the same cache line,
> leading to false sharing.  If we merge them into a single spinlock,
> does that hurt performance?

I have kept them separate so that backends searching for a buffer
in freelist doesn't contend with bgreclaimer (while doing clock sweep)
or clock sweep being done by other backends.  I think it will be bit
tricky to devise a test where this can hurt, however it doesn't seem
too bad to have two separate locks in this case.

> If we put them further apart, e.g. by
> moving the freelist_lck to the start of the structure, followed by the
> latches, and leaving victimbuf_lck where it is, does that help
> performance?

I can investigate.

> +            /*
> +             * If the buffer is pinned or has a nonzero usage_count,
> we cannot use
> +             * it; discard it and retry.  (This can only happen if VACUUM put a
> +             * valid buffer in the freelist and then someone else
> used it before
> +             * we got to it.  It's probably impossible altogether as
> of 8.3, but
> +             * we'd better check anyway.)
> +             */
> +
>
> This comment is clearly obsolete.

Okay, but this patch hasn't changed anything w.r.t above comment,
so I haven't changed it. Do you want me to remove second part of
comment starting with "(This can only happen"?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Commitfest status
Next
From: Pavel Stehule
Date:
Subject: Re: Commitfest status