Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: checkpointer continuous flushing
Date	January 12, 2016 13:47:54
Msg-id	CAA4eK1LxUPRyvY4SYN2T6s00v60pvEbVN+YZkfpkSAEbapbDYg@mail.gmail.com
In response to	Re: checkpointer continuous flushing (Andres Freund <andres@anarazel.de>)
Responses	Re: checkpointer continuous flushing
List	pgsql-hackers

On Tue, Jan 12, 2016 at 5:52 PM, Andres Freund <andres@anarazel.de> wrote:
>
> On 2016-01-12 17:50:36 +0530, Amit Kapila wrote:
> > On Tue, Jan 12, 2016 at 12:57 AM, Andres Freund <andres@anarazel.de> wrote:
> > >
> > > My theory is that this happens due to the sorting: pgbench is an update
> > > heavy workload, the first few pages are always going to be used if
> > > there's free space as freespacemap.c essentially prefers those. Due to
> > > the sorting all a relation's early pages are going to be "in a row".
> > >
> >
> > Not sure what the best way to tackle this problem is, but I think one way
> > could be to perform sorting at the flush-request level rather than before
> > writing to OS buffers.
>
> I'm not following. If you just sort a couple hundred more or less random
> buffers - which is what you get if you look in buf_id order through
> shared_buffers - the likelihood of actually finding neighbouring writes
> is pretty low.

Why can't we do it at larger intervals (relative to the total amount of
writes)?

To explain what I have in mind, let us assume that the checkpoint interval
is longer (10 mins) and in the meantime all the writes are being done by
bgwriter, which registers them in shared memory so that the checkpoint can
later perform the corresponding fsyncs. When the request queue reaches a
threshold size (let us say 1/3rd full), we can perform sorting and merging
and issue flush hints. The checkpointer can follow a somewhat similar
technique: once it has written 1/3rd or so of the buffers (which we need
to track), it can issue flush hints after sort+merge. I think we could
also do this in the checkpointer alone rather than in both bgwriter and
checkpointer.

Basically, I think this can lead to less merging of neighbouring writes
than sorting everything up front, but that might not hurt if the
sync_file_range() API is cheap.
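
To make the sort+merge step concrete, here is a rough sketch in C. It is
not taken from any patch: FlushRequest, FlushRange, sort_and_merge() and
should_flush() are hypothetical names, file_id stands in for a relation
segment file, and the 1/3rd threshold is just the example figure from
above. Each resulting range would presumably be handed to
sync_file_range() on Linux; that call is left out so the sketch stays
portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one written block that still needs a flush hint. */
typedef struct FlushRequest
{
    int  file_id;   /* stand-in for a relation segment file */
    long block;     /* block number within that file */
} FlushRequest;

/* Hypothetical merged run of nblocks consecutive blocks in one file. */
typedef struct FlushRange
{
    int  file_id;
    long start;
    long nblocks;
} FlushRange;

/* Order requests by file first, then by block number. */
static int
cmp_request(const void *a, const void *b)
{
    const FlushRequest *x = a;
    const FlushRequest *y = b;

    if (x->file_id != y->file_id)
        return (x->file_id < y->file_id) ? -1 : 1;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    return 0;
}

/* True once the queue is at least one third full (example threshold). */
static int
should_flush(size_t queued, size_t capacity)
{
    return queued * 3 >= capacity;
}

/*
 * Sort reqs in place, then collapse neighbouring blocks of the same file
 * into ranges.  out must have room for n entries.  Returns the number of
 * ranges written; each one would correspond to a single flush hint.
 */
static size_t
sort_and_merge(FlushRequest *reqs, size_t n, FlushRange *out)
{
    size_t nranges = 0;

    assert(out != NULL);
    qsort(reqs, n, sizeof(FlushRequest), cmp_request);

    for (size_t i = 0; i < n; i++)
    {
        if (nranges > 0 && out[nranges - 1].file_id == reqs[i].file_id)
        {
            FlushRange *last = &out[nranges - 1];
            long        end = last->start + last->nblocks;

            if (reqs[i].block < end)
                continue;           /* duplicate request, already covered */
            if (reqs[i].block == end)
            {
                last->nblocks++;    /* directly adjacent: extend the range */
                continue;
            }
        }

        /* Different file or a gap: start a new range. */
        out[nranges].file_id = reqs[i].file_id;
        out[nranges].start = reqs[i].block;
        out[nranges].nblocks = 1;
        nranges++;
    }
    return nranges;
}
```

With a queue holding blocks 1, 2, 3 and 10 of one file and blocks 7 and 8
of another, this yields three ranges instead of six separate hints. How
much merging actually happens depends on how many requests have
accumulated when the threshold is hit, which is the trade-off mentioned
above.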

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
