Re: CLOG extension - Mailing list pgsql-hackers

From Robert Haas
Subject Re: CLOG extension
Msg-id CA+TgmoYZxHnMAyGcRd130V1EOw=sDYGybGXZ5F3m0-jrxn4YJQ@mail.gmail.com
In response to Re: CLOG extension  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Fri, May 4, 2012 at 9:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 4 May 2012 13:59, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, May 4, 2012 at 3:35 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On Thu, May 3, 2012 at 9:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Thu, May 3, 2012 at 3:20 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>>> Your two paragraphs have roughly opposite arguments...
>>>>>
>>>>> Doing it every 32 pages would give you 30 seconds to complete the
>>>>> fsync, if you kicked it off when half way through the previous file -
>>>>> at current maximum rates. So there is utility in doing it in larger
>>>>> chunks.
>>>>
>>>> Maybe, but I'd like to try changing one thing at a time.  If we change
>>>> too much at once, it's likely to be hard to figure out where the
>>>> improvement is coming from.  Moving the task to a background process
>>>> is one improvement; doing it in larger chunks is another.  Those
>>>> deserve independent testing.
>>>
>>> You gave a good argument why background pre-allocation wouldn't work
>>> very well if we do it a page at a time. I believe you.
>>
>> Your confidence is sort of gratifying, but in this case I believe it's
>> misplaced.  On more careful analysis, it seems that ExtendCLOG() does
>> just two things: (1) evict a CLOG buffer and replace it with a zero'd
>> page representing the new page and (2)  write an XLOG record for the
>> change.  Apparently, "extending" CLOG doesn't actually involve
>> extending anything on disk at all.  We rely on the future buffer
>> eviction to do that, which is surprisingly different from the way
>> relation extension is handled.
>>
>> So CLOG extension is normally fast, but occasionally something goes
>> wrong.
>
> I don't agree it's normally fast.
>
> WALInsert contention is high, so there is usually a long queue. As
> we've discussed, this can be done offline, and so (2) can be completely
> avoided in the main line. Considering that all new xids wait for this
> action, any wait at all is bad and takes time to drain once it clears.
>
> Evicting a CLOG buffer has a cost because the tail is almost always
> dirty when we switch pages.
>
> Doing both of those will ensure the switch to a new page requires zero
> wait time.
>
> So you have the solution. Not sure what else you're looking for.

Nothing, really.  I was just mooting some ideas before I went and
started coding, to see what people thought.  I've got your opinion and
Tom's, and of course my own, so now I'm off to test some different
approaches.  At the moment I'm running a battery of tests on
background-writing CLOG, which I will post about when they are
complete, and I intend to play around with some of the ideas from this
thread as well.
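
To make the ExtendCLOG() behavior described in the quoted text concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not PostgreSQL code: the class ToyClog, its fields, the LRU eviction policy, and the record tag are all hypothetical names chosen for illustration. It shows only the two steps mentioned above (replace a buffer with a zeroed page, then log the change) and that the new page reaches "disk" only when a later eviction writes it out.

```python
from collections import OrderedDict

PAGE_SIZE = 8192  # assumed page size in bytes, for illustration only


class ToyClog:
    """Toy model of a small CLOG buffer pool (hypothetical, not PostgreSQL code)."""

    def __init__(self, num_buffers):
        self.num_buffers = num_buffers
        # pageno -> [page bytes, dirty flag]; oldest entry first (simple LRU)
        self.buffers = OrderedDict()
        self.disk = {}   # pageno -> bytes actually written out
        self.wal = []    # stand-in for XLOG records

    def _evict_one(self):
        # Drop the least recently added buffer; a dirty victim must be
        # written out first, which is where an extension can stall.
        pageno, (data, dirty) = self.buffers.popitem(last=False)
        if dirty:
            self.disk[pageno] = bytes(data)

    def extend(self, pageno):
        # Step (1): make room, then install a zeroed page in a buffer.
        if len(self.buffers) >= self.num_buffers:
            self._evict_one()
        self.buffers[pageno] = [bytearray(PAGE_SIZE), True]
        # Step (2): log the change. Nothing is written to disk here.
        self.wal.append(("zero_page", pageno))


clog = ToyClog(num_buffers=2)
clog.extend(0)
clog.extend(1)
clog.extend(2)   # evicts dirty page 0, which is only now written out
print(sorted(clog.disk))  # pages present on "disk"
print(len(clog.wal))      # one log record per extension
```

In this model the cost of an extension depends on whether the evicted buffer is dirty, which matches the point under discussion: the extension itself touches no file, but it can end up waiting on someone else's page write.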

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company