Re: checkpointer continuous flushing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: checkpointer continuous flushing
Date
Msg-id 20150906213958.GE19425@alap3.anarazel.de
Whole thread Raw
In response to Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: checkpointer continuous flushing  (Amit Kapila <amit.kapila16@gmail.com>)
Re: checkpointer continuous flushing  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On 2015-09-06 16:05:01 +0200, Fabien COELHO wrote:
> >Wouldn't it be just as easy to put this logic into the checkpointing code?
> 
> Not sure it would simplify anything, because the checkpointer currently
> knows about buffers but flushing is about files, which are hidden from
> view.

It'd not really simplify things, but it'd keep it local.

> >* Wouldn't a binary heap over the tablespaces + progress be nicer?
> 
> I'm not sure where it would fit exactly.

Imagine a binaryheap.h style heap over a structure like (tablespaceid,
progress, progress_inc, nextbuf) where the comparator compares the progress.

> Anyway, I think it would complicate the code significantly (compared to the
> straightforward array)

I doubt it. I mean instead of your GetNext you'd just do:   next_tblspc = DatumGetPointer(binaryheap_first(heap));   if
(next_tblspc== 0)       return 0;   next_tblspc.progress += next_tblspc.progress_slice;
binaryheap_replace_first(PointerGetDatum(next_tblspc));
   return next_tblspc.nextbuf++;


progress_slice is the number of buffers in the tablespace divided by the
number of total buffers, to avoid doing any sort of expensive math in
the more frequently executed path.

> Moreover such a data structure would probably require some kind of pointer
> (probably 8 bytes added per node, maybe more), and the amount of memory is
> already a concern, at least to me, and moreover it has to reside in shared
> memory which does not simplify allocation of tree data structures.

I'm not seing where you'd need an extra pointer? Maybe the
misunderstanding is that I'm proposing to do a heap over the
*tablespaces* not the actual buffers.

> >If you make the sorting criterion include the tablespace id you wouldn't
> >need the lookahead loop in NextBufferToWrite().
> 
> Yep, I thought of it. It would mean 4 more bytes per buffer, and bsearch to
> find the boundaries, so significantly less simple code.

What for would you need to bsearch?


> I think that the current approach is ok as the number of tablespace
> should be small.

Right that's often the case.

> >Isn't the current approach O(NBuffers^2) in the worst case?
> 
> ISTM that the overall lookahead complexity is Nbuffers * Ntablespace:
> buffers are scanned once for each tablespace.

Which in the worst case is NBuffers * 2...

> ISTM that using a tablespace in the sorting would reduce the complexity to
> ln(NBuffers) * Ntablespace for finding the boundaries, and then Nbuffers *
> (Ntablespace/Ntablespace) = NBuffers for scanning, at the expense of more
> memory and code complexity.

Afaics finding the boundaries can be done as part of the enumeration of
tablespaces in BufferSync(). That code needs to be moved, but that's not
too bad. I don't see the code be that much more complicated?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: "andres@anarazel.de"
Date:
Subject: Re: [PATCH] Refactoring of LWLock tranches
Next
From: Jeff Janes
Date:
Subject: Re: Too many duplicated condition query return wrong value