Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Refactoring the checkpointer's fsync request queue
Date
Msg-id CA+TgmoYVEAUxNGwdsBJ8BXh5UwnsnixuDvZ_tugunkgF-AG+NA@mail.gmail.com
In response to Re: Refactoring the checkpointer's fsync request queue  (Andres Freund <andres@anarazel.de>)
Responses Re: Refactoring the checkpointer's fsync request queue  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Tue, Nov 13, 2018 at 1:07 PM Andres Freund <andres@anarazel.de> wrote:
> On 2018-11-13 12:04:23 -0500, Robert Haas wrote:
> > I still feel like this whole pass-the-fds-to-the-checkpointer thing is
> > a bit of a fool's errand, though.  I mean, there's no guarantee that
> > the first FD that gets passed to the checkpointer is the first one
> > opened, or even the first one written, is there?
>
> I'm not sure I understand the danger you're seeing here. It doesn't have
> to be the first fd opened, it has to be an fd that's older than all the
> writes that we need to ensure made it to disk. And that ought to be
> guaranteed by the logic?  Between the FileWrite() and the
> register_dirty_segment() (and other relevant paths) the FD cannot be
> closed.

Suppose backend A and backend B open a segment around the same time.
Is it possible that backend A does a write before backend B, but
backend B's copy of the fd reaches the checkpointer before backend A's
copy?  If you send the FD to the checkpointer before writing anything
then I think it's fine, but if you write first and then send the FD to
the checkpointer I don't see what guarantees the ordering.
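
To make the interleaving concrete, here is a minimal sketch in Python (not PostgreSQL code). It models two backends as short sequences of steps on a logical clock and assumes, purely for illustration, that the checkpointer keeps whichever fd it receives first for a segment. The names `interleavings`, `kept_fd_predates_all_writes`, and `always_safe` are hypothetical helpers; the safety condition is the one quoted above, namely that the kept fd must be older than every write.

```python
from itertools import combinations

# Per-backend step orders being compared (labels are illustrative only).
WRITE_THEN_SEND = ("open", "write", "send")
SEND_THEN_WRITE = ("open", "send", "write")


def program(backend, steps):
    """Tag each step with the backend that performs it."""
    return [(backend, step) for step in steps]


def interleavings(seq_a, seq_b):
    """Yield every merge of seq_a and seq_b that preserves each one's order."""
    total = len(seq_a) + len(seq_b)
    for slots in combinations(range(total), len(seq_a)):
        chosen = set(slots)
        a, b = iter(seq_a), iter(seq_b)
        yield [next(a) if i in chosen else next(b) for i in range(total)]


def kept_fd_predates_all_writes(schedule):
    """Assume the checkpointer keeps the first fd it receives.

    Return True if that fd was opened before the earliest write.
    """
    opened_at = {}
    kept_open_time = None
    first_write = None
    for tick, (backend, kind) in enumerate(schedule):
        if kind == "open":
            opened_at[backend] = tick
        elif kind == "send" and kept_open_time is None:
            kept_open_time = opened_at[backend]
        elif kind == "write" and first_write is None:
            first_write = tick
    return kept_open_time < first_write


def always_safe(steps):
    """Check every interleaving of two backends running the same steps."""
    schedules = interleavings(program("A", steps), program("B", steps))
    return all(kept_fd_predates_all_writes(s) for s in schedules)
```

Under these assumptions, `always_safe(SEND_THEN_WRITE)` holds for all 20 interleavings, while `always_safe(WRITE_THEN_SEND)` does not: the schedule A open, A write, B open, B write, B send, A send leaves the checkpointer holding B's fd, which was opened after A's write.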

> > It seems like if you wanted to make this work reliably, you'd need to
> > do it the other way around: have the checkpointer (or some other
> > background process) open all the FDs, and anybody else who wants to
> > have one open get it from the checkpointer.
>
> That'd require a process context switch for each FD opened, which seems
> clearly like a no-go?

I don't know how bad that would be.  But hey, no cost is too great to
pay as a workaround for insane kernel semantics, right?
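
For reference, the "get it from the checkpointer" direction can be sketched with ordinary fd passing over a Unix-domain socket. This is only an illustration in Python (3.9 or later, Unix only), not a proposal for how PostgreSQL would do it: both roles run in one process here, the function names are hypothetical, and it says nothing about the context-switch cost raised above. It only shows that the received fd shares one open file description with the fd the owner kept, which is visible through the shared file offset.

```python
import os
import socket
import tempfile


def checkpointer_open_and_share(path, channel):
    """Open the file, keep the fd, and pass a copy of it over the socket."""
    fd = os.open(path, os.O_RDWR)
    # send_fds wraps sendmsg() with an SCM_RIGHTS control message.
    socket.send_fds(channel, [b"F"], [fd])
    return fd


def backend_receive_fd(channel):
    """Receive one fd that refers to the same open file description."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    return fds[0]


def demo():
    """Return (fds_are_distinct, offset_seen_by_owner_after_backend_write)."""
    checkpointer_end, backend_end = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_STREAM
    )
    with tempfile.NamedTemporaryFile() as segment:
        owner_fd = checkpointer_open_and_share(segment.name, checkpointer_end)
        backend_fd = backend_receive_fd(backend_end)
        try:
            # The "backend" writes through the fd it was handed.
            os.write(backend_fd, b"dirty page")
            # The owner's fd sees the advanced offset: same file description.
            offset_seen_by_owner = os.lseek(owner_fd, 0, os.SEEK_CUR)
            # The owner can fsync through an fd that predates the write.
            os.fsync(owner_fd)
            return backend_fd != owner_fd, offset_seen_by_owner
        finally:
            os.close(backend_fd)
            os.close(owner_fd)
            checkpointer_end.close()
            backend_end.close()
```

Calling `demo()` returns a pair: whether the two descriptor numbers differ, and the file offset the owner observes after the other side's 10-byte write.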

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

