Re: Speed up Clog Access by increasing CLOG buffers - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Speed up Clog Access by increasing CLOG buffers
Msg-id CAA4eK1KpXReQcFL-qKw6T7buYqQAmAEPwYgwCzmRtS+9J4dq0Q@mail.gmail.com
In response to Re: Speed up Clog Access by increasing CLOG buffers  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Mon, Sep 7, 2015 at 7:04 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Andres Freund wrote:
>
> > The buffer replacement algorithm for clog is rather stupid - I do wonder
> > where the cutoff is that it hurts.
> >
> > Could you perhaps try to create a testcase where xids are accessed that
> > are so far apart on average that they're unlikely to be in memory?
> >

Yes, I am working on it.  What I have in mind is to create a table with
a large number of rows (say 50000000), with each row having a different
transaction id.  Each transaction should then try to update rows whose
transaction ids are at least 1048576 apart (the number of transactions
whose status can be held in 32 CLog buffers), so that each update tries to
access a Clog page that is not in memory.  Let me know if you can think of
any better or simpler way.
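
For reference, here is a rough sketch of the arithmetic behind the 1048576
figure and of the access pattern described above.  It assumes the default
8 kB block size and 2 status bits per transaction, and it ignores xid
wraparound and the reserved low xids.  The function names are made up for
illustration only and are not taken from the PostgreSQL source.

```python
# Assumed defaults: 8 kB pages, 2 status bits per transaction.
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT          # 4 transactions per byte
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE     # 32768 transactions per page
CLOG_BUFFERS = 32                                      # buffer count discussed in this mail


def xid_to_clog_page(xid):
    """Return the clog page number that holds the status of `xid`."""
    return xid // CLOG_XACTS_PER_PAGE


def xids_covered(num_buffers=CLOG_BUFFERS):
    """Number of transactions whose status fits in `num_buffers` clog pages."""
    return num_buffers * CLOG_XACTS_PER_PAGE


def strided_xids(start_xid, count, stride=None):
    """Hypothetical helper: xids spaced `stride` apart, starting at `start_xid`.

    With the default stride, consecutive xids are a full buffer pool's worth
    of clog pages apart, so no two of them share a page.
    """
    if stride is None:
        stride = xids_covered()
    return [start_xid + i * stride for i in range(count)]


if __name__ == "__main__":
    print(xids_covered())
    print([xid_to_clog_page(x) for x in strided_xids(1000, 4)])
```

Passing a different buffer count to xids_covered() gives the stride that
would be needed if the number of clog buffers were raised.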


> > There's two reasons that I'd like to see that: First I'd like to avoid
> > regression, second I'd like to avoid having to bump the maximum number
> > of buffers by small buffers after every hardware generation...
>
> I wonder if it would make sense to explore an idea that has been floated
> for years now -- to have pg_clog pages be allocated as part of shared
> buffers rather than have their own separate pool.
>

There could be some benefits to it, but I think we would still have to
acquire an Exclusive lock while committing a transaction or while extending
the Clog, which are also major sources of contention in this area.  I think
the benefits of moving it to shared_buffers could be that the upper limit on
the number of pages that can be retained in memory could be increased, and
even if we have to replace a page, the responsibility to flush it could be
delegated to the checkpoint.  So yes, there could be benefits with this idea,
but I am not sure they are worth investigating.  One thing we could try, if
you think it would be beneficial, is to just skip the fsync during writes of
clog pages; if that's beneficial, then we can think of pushing it to the
checkpoint (something similar to what Andres has mentioned on a nearby
thread).

Yet another way could be to have a configuration variable for clog buffers
(Clog_Buffers).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
