Re: [pgsql-hackers-win32] Sync vs. fsync during checkpoint - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [pgsql-hackers-win32] Sync vs. fsync during checkpoint
Date
Msg-id 22762.1075937339@sss.pgh.pa.us
In response to Re: [pgsql-hackers-win32] Sync vs. fsync during checkpoint  (Kevin Brown <kevin@sysexperts.com>)
Responses Re: [pgsql-hackers-win32] Sync vs. fsync during checkpoint  (Kevin Brown <kevin@sysexperts.com>)
List pgsql-hackers
Kevin Brown <kevin@sysexperts.com> writes:
> Tom Lane wrote:
>> The more finely you slice your workspace, the more likely it becomes
>> that one particular part will run out of space.  So the inefficient case
>> where a backend isn't able to insert something into the appropriate list
>> will become considerably more of a factor.

> Well, running out of space in the list isn't that much of a problem.  If
> the backends run out of list space (and the max size of the list could
> be a configurable thing, either as a percentage of shared memory or as
> an absolute size), then all that happens is that the background writer
> might end up fsync()ing some files that have already been fsync()ed.
> But that's not that big of a deal -- the fact they've already been
> fsync()ed means that there shouldn't be any data in the kernel buffers
> left to write to disk, so subsequent fsync()s should return quickly.

Yes, it's a big deal.  You're arguing as though the bgwriter is the
thing that needs to be fast, when actually what we care about is the
backends being fast.  If the bgwriter isn't doing the vast bulk of the
writing (and especially the fsync waits) then we are wasting our time
having one at all.  So we need a scheme that makes it as unlikely as
possible that backends will have to do their own fsyncs.  Small
per-backend fsync lists aren't the way to do that.
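A minimal sketch of the slicing argument. It is a toy model and not PostgreSQL code; the function names and MAX_BACKENDS are hypothetical. A fixed number of request slots is either split evenly into per-backend lists or kept as one shared list. We count how many requests do not fit, since each of those is an fsync that a backend would have to do itself. Duplicate detection is ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on backends in this toy model. */
#define MAX_BACKENDS 64

/* Workspace split into nbackends equal private lists.
 * backend_of_req[i] is the backend issuing request i.
 * Returns the number of requests that overflowed their backend's list. */
static size_t overflow_partitioned(const int *backend_of_req, size_t nreq,
                                   int nbackends, size_t total_slots)
{
    size_t used[MAX_BACKENDS] = {0};
    size_t per_backend;
    size_t overflow = 0;

    assert(nbackends > 0 && nbackends <= MAX_BACKENDS);
    per_backend = total_slots / (size_t) nbackends;

    for (size_t i = 0; i < nreq; i++) {
        int b = backend_of_req[i];

        assert(b >= 0 && b < nbackends);
        if (used[b] < per_backend)
            used[b]++;          /* fits in this backend's slice */
        else
            overflow++;         /* slice full: backend must fsync itself */
    }
    return overflow;
}

/* Same total workspace kept as one list shared by all backends. */
static size_t overflow_shared(size_t nreq, size_t total_slots)
{
    return nreq > total_slots ? nreq - total_slots : 0;
}
```

With 12 slots and 4 backends, one busy backend issuing 10 requests overflows its 3-slot slice 7 times. The shared list absorbs all 10 requests, even though the total memory is the same.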

> Perhaps a better way to do it would be to store the list of all the
> relfilenodes of everything in pg_class, with a flag for each indicating
> whether or not an fsync() of the file needs to take place.

You're forgetting that we have a fixed-size workspace to do this in ...
and no way to know at postmaster start how many relations there are in
any of our databases, let alone predict how many there might be later on.
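A fixed-size workspace cannot hold a flag for every relation, because the number of relations is unbounded. One way to respect that limit is to queue only files with a pending fsync. The sketch below rests on several assumptions. FileId, QUEUE_CAPACITY, and both functions are hypothetical names, and the code is not PostgreSQL's implementation. The capacity is fixed when shared memory is sized at startup, and duplicate requests are folded together. When the queue is full, the caller is told to do the fsync itself, which is exactly the case to make rare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a relation file identifier. */
typedef struct {
    uint32_t db_oid;
    uint32_t rel_oid;
} FileId;

/* Fixed when shared memory is sized at startup; it cannot grow later
 * as relations are created. Kept tiny here for illustration. */
#define QUEUE_CAPACITY 4

typedef struct {
    FileId entries[QUEUE_CAPACITY];
    size_t count;
} FsyncQueue;

/* Backend side. Returns true if the request is now pending (newly added
 * or already queued), false if the queue is full and the caller must
 * fsync the file itself. Linear scan is fine for a sketch. */
static bool queue_fsync_request(FsyncQueue *q, FileId f)
{
    assert(q->count <= QUEUE_CAPACITY);
    for (size_t i = 0; i < q->count; i++)
        if (q->entries[i].db_oid == f.db_oid &&
            q->entries[i].rel_oid == f.rel_oid)
            return true;        /* already pending: nothing to add */
    if (q->count == QUEUE_CAPACITY)
        return false;           /* workspace exhausted */
    q->entries[q->count++] = f;
    return true;
}

/* Background writer side: would fsync each pending file; here it only
 * reports how many there were and empties the queue. */
static size_t drain_queue(FsyncQueue *q)
{
    size_t n = q->count;

    q->count = 0;
    return n;
}
```

Folding duplicates means the queue tracks files dirtied since the last drain, not all relations. Even so, it can still fill up under load, so its size and drain frequency decide how often backends fall back to their own fsync.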
        regards, tom lane

