Thread: get_controlfile() can leak fds in the backend

get_controlfile() can leak fds in the backend

From
Michael Paquier
Date:
Hi all,
(CC-ing Joe as of dc7d70e)

I was just looking at the offline checksum patch, and noticed some
sloppy coding in controldata_utils.c.  The control file gets opened in
get_controlfile(), and if it generates an error then the file
descriptor remains open.  As basic code rule in the backend we should
only use BasicOpenFile() when opening files, so I think that the issue
should be fixed as attached, even if this requires including fd.h for
the backend compilation which is kind of ugly.

Another possibility would be to just close the file descriptor before
any error, saving appropriately errno before issuing any %m portion,
but that still does not respect the backend policy regarding open().

We also do not have a wait event for the read() call, maybe we should
have one, but knowing that this gets called only for the SQL-level
functions accessing the control file, I don't really think that's
worth it.

Thoughts?
--
Michael

Attachment

Re: get_controlfile() can leak fds in the backend

From
Fabien COELHO
Date:
Bonjour Michaël,

> Thoughts?

None.

However, while at it, there is also the question of whether the control 
file should be locked when updated, eg with flock(2) to avoid race 
conditions between concurrent commands. ISTM that there is currently not 
such thing in the code, but that it would be desirable.

-- 
Fabie.

Re: get_controlfile() can leak fds in the backend

From
Andres Freund
Date:
Hi,

On 2019-02-27 10:23:45 +0100, Fabien COELHO wrote:
> However, while at it, there is also the question of whether the control file
> should be locked when updated, eg with flock(2) to avoid race conditions
> between concurrent commands. ISTM that there is currently not such thing in
> the code, but that it would be desirable.

Shouldn't be necessary - the control file fits into a single page, and
writes of that size ought to always be atomic. And I also think
introducing flock usage for this would be quite disproportional.

Greetings,

Andres Freund


Re: get_controlfile() can leak fds in the backend

From
Fabien COELHO
Date:
>> However, while at it, there is also the question of whether the control file
>> should be locked when updated, eg with flock(2) to avoid race conditions
>> between concurrent commands. ISTM that there is currently not such thing in
>> the code, but that it would be desirable.
>
> Shouldn't be necessary - the control file fits into a single page, and
> writes of that size ought to always be atomic. And I also think
> introducing flock usage for this would be quite disproportional.

Ok, fine.

Note that my concern is not about the page size, but rather that as more 
commands may change the cluster status by editing the control file, it 
would be better that a postmaster does not start while a pg_rewind or 
enable checksum or whatever is in progress, and currently there is a 
possible race condition between the read and write that can induce an 
issue, at least theoretically.

-- 
Fabien.


Re: get_controlfile() can leak fds in the backend

From
Joe Conway
Date:
On 2/27/19 2:47 AM, Michael Paquier wrote:
> Hi all,
> (CC-ing Joe as of dc7d70e)
>
> I was just looking at the offline checksum patch, and noticed some
> sloppy coding in controldata_utils.c.  The control file gets opened in
> get_controlfile(), and if it generates an error then the file
> descriptor remains open.  As basic code rule in the backend we should
> only use BasicOpenFile() when opening files, so I think that the issue
> should be fixed as attached, even if this requires including fd.h for
> the backend compilation which is kind of ugly.
>
> Another possibility would be to just close the file descriptor before
> any error, saving appropriately errno before issuing any %m portion,
> but that still does not respect the backend policy regarding open().

In fd.c I see:
8<--------------------
* AllocateFile, AllocateDir, OpenPipeStream and OpenTransientFile are
* wrappers around fopen(3), opendir(3), popen(3) and open(2),
* respectively. They behave like the corresponding native functions,
* except that the handle is registered with the current subtransaction,
* and will be automatically closed at abort. These are intended mainly
* for short operations like reading a configuration file; there is a
* limit on the number of files that can be opened using these functions
* at any one time.
*
* Finally, BasicOpenFile is just a thin wrapper around open() that can
* release file descriptors in use by the virtual file descriptors if
* necessary. There is no automatic cleanup of file descriptors returned
* by BasicOpenFile, it is solely the caller's responsibility to close
* the file descriptor by calling close(2).
8<--------------------

According to that comment BasicOpenFile does not seem to solve the issue
you are pointing out (leaking of file descriptor on ERROR). Perhaps
OpenTransientFile() is more appropriate? I am on the road at the moment
so have not looked very deeply at this yet though.

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: get_controlfile() can leak fds in the backend

From
Joe Conway
Date:
On 2/27/19 10:26 AM, Joe Conway wrote:
> On 2/27/19 2:47 AM, Michael Paquier wrote:
>> Hi all,
>> (CC-ing Joe as of dc7d70e)

> According to that comment BasicOpenFile does not seem to solve the issue
> you are pointing out (leaking of file descriptor on ERROR). Perhaps
> OpenTransientFile() is more appropriate? I am on the road at the moment
> so have not looked very deeply at this yet though.


It seems to me that OpenTransientFile() is more appropriate. Patch done
that way attached.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment

Re: get_controlfile() can leak fds in the backend

From
Michael Paquier
Date:
On Wed, Feb 27, 2019 at 11:50:17AM +0100, Fabien COELHO wrote:
>> Shouldn't be necessary - the control file fits into a single page, and
>> writes of that size ought to always be atomic. And I also think
>> introducing flock usage for this would be quite disproportional.

There are static assertions to make sure that the side of control file
data never gets higher than 512 bytes for this purpose.

> Note that my concern is not about the page size, but rather that as more
> commands may change the cluster status by editing the control file, it would
> be better that a postmaster does not start while a pg_rewind or enable
> checksum or whatever is in progress, and currently there is a possible race
> condition between the read and write that can induce an issue, at least
> theoretically.

Something that I think we could live instead is a special flag in the
control file to mark the postmaster as in maintenance mode.  This
would be useful to prevent the postmaster to start if seeing this flag
in the control file, as well to find out that a host has crashed in
the middle of a maintenance operation.  We don't give this insurance
now when running pg_rewind, which is bad.  That's also separate from
the checksum-related patches and pg_rewind.

flock() can be something hard to live with for cross-platform
compatibility like Windows (LockFileEx) or fancy platforms.  And note
that we don't use it yet in the tree.  And flock() would help in the
first case I am mentioning, but not in the second.
--
Michael

Attachment

Re: get_controlfile() can leak fds in the backend

From
Andres Freund
Date:
Hi,

On 2019-02-27 11:50:17 +0100, Fabien COELHO wrote:
> Note that my concern is not about the page size, but rather that as more
> commands may change the cluster status by editing the control file, it would
> be better that a postmaster does not start while a pg_rewind or enable
> checksum or whatever is in progress, and currently there is a possible race
> condition between the read and write that can induce an issue, at least
> theoretically.

Seems odd to bring this up in this thread, it really has nothing to do
with the topic.  If we were to want to do more here, ISTM the right
approach would use the postmaster pid file, not the control file.

Greetings,

Andres Freund


Re: get_controlfile() can leak fds in the backend

From
Michael Paquier
Date:
On Wed, Feb 27, 2019 at 07:45:11PM -0500, Joe Conway wrote:
> It seems to me that OpenTransientFile() is more appropriate. Patch done
> that way attached.

Works for me, thanks for sending a patch!  While on it, could you
clean up the comment on top of get_controlfile()?  "char" is mentioned
instead of "const char" for DataDir which is incorrect.  I would
remove the complete set of arguments from the description and just
keep the routine name.
--
Michael

Attachment

Re: get_controlfile() can leak fds in the backend

From
Fabien COELHO
Date:
Hello Andres,

>> Note that my concern is not about the page size, but rather that as more
>> commands may change the cluster status by editing the control file, it would
>> be better that a postmaster does not start while a pg_rewind or enable
>> checksum or whatever is in progress, and currently there is a possible race
>> condition between the read and write that can induce an issue, at least
>> theoretically.
>
> Seems odd to bring this up in this thread, it really has nothing to do
> with the topic.

Indeed. I raised it here because it is in the same area of code and 
Michaël was looking at it.

> If we were to want to do more here, ISTM the right approach would use 
> the postmaster pid file, not the control file.

ISTM that this just means re-inventing a manual poor-featured 
race-condition-prone lock API around another file, which seems to be 
created more or less only by "pg_ctl", while some other commands use the 
control file (eg pg_rewind, AFAICS).

-- 
Fabien.

Re: get_controlfile() can leak fds in the backend

From
Joe Conway
Date:
On 2/27/19 7:54 PM, Michael Paquier wrote:
> On Wed, Feb 27, 2019 at 07:45:11PM -0500, Joe Conway wrote:
>> It seems to me that OpenTransientFile() is more appropriate. Patch done
>> that way attached.
>
> Works for me, thanks for sending a patch!  While on it, could you
> clean up the comment on top of get_controlfile()?  "char" is mentioned
> instead of "const char" for DataDir which is incorrect.  I would
> remove the complete set of arguments from the description and just
> keep the routine name.

Sure, will do. What are your thoughts on backpatching? This seems
unlikely to be a practical concern in the field, so my inclination is a
master only fix.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: get_controlfile() can leak fds in the backend

From
Michael Paquier
Date:
On Thu, Feb 28, 2019 at 07:11:04AM -0500, Joe Conway wrote:
> Sure, will do. What are your thoughts on backpatching? This seems
> unlikely to be a practical concern in the field, so my inclination is a
> master only fix.

I agree that this would unlikely become an issue as an error on the
control file would most likely be a PANIC when updating it, so fixing
only HEAD sounds fine to me.  Thanks for asking!
--
Michael

Attachment

Re: get_controlfile() can leak fds in the backend

From
Andres Freund
Date:
Hi,

On 2019-02-28 09:54:48 +0100, Fabien COELHO wrote:
> > If we were to want to do more here, ISTM the right approach would use
> > the postmaster pid file, not the control file.
> 
> ISTM that this just means re-inventing a manual poor-featured
> race-condition-prone lock API around another file, which seems to be created
> more or less only by "pg_ctl", while some other commands use the control
> file (eg pg_rewind, AFAICS).

Huh? Postmaster.pid is written by the backend, pg_ctl just checks it to
see if the backend has finished starting up etc. It's precisely what the
backend uses to prevent two postmasters to start etc. It's also what say
pg_resetwal checks to protect against a concurrently running lcuster
(albeit in a racy way). If we want to make things more bulletproof,
that's the place.  The control file is constantly written to, sometimes
by different processes, it'd just not be a good file for such lockout
mechanisms.

Greetings,

Andres Freund


Re: get_controlfile() can leak fds in the backend

From
Joe Conway
Date:
On 2/28/19 7:20 AM, Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 07:11:04AM -0500, Joe Conway wrote:
>> Sure, will do. What are your thoughts on backpatching? This seems
>> unlikely to be a practical concern in the field, so my inclination is a
>> master only fix.
>
> I agree that this would unlikely become an issue as an error on the
> control file would most likely be a PANIC when updating it, so fixing
> only HEAD sounds fine to me.  Thanks for asking!


Committed and push that way.

By the way, while looking at this, I noted at least a couple of places
where OpenTransientFile() is being passed O_RDWR when the usage is
pretty clearly intended to be read-only. For example at least two
instances in slru.c -- SlruPhysicalReadPage() and
SimpleLruDoesPhysicalPageExist(). Is it worth while searching for and
fixing those instances?

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: get_controlfile() can leak fds in the backend

From
Michael Paquier
Date:
On Thu, Feb 28, 2019 at 04:09:32PM -0500, Joe Conway wrote:
> Committed and push that way.

Thanks for committing a fix.

> By the way, while looking at this, I noted at least a couple of places
> where OpenTransientFile() is being passed O_RDWR when the usage is
> pretty clearly intended to be read-only. For example at least two
> instances in slru.c -- SlruPhysicalReadPage() and
> SimpleLruDoesPhysicalPageExist(). Is it worth while searching for and
> fixing those instances?

There are roughly 40~42 callers of OpenTransientFile().  Looking at
them I can see that RestoreSlotFromDisk() could also switch to RDONLY
instead of RDWR.  I am also a bit tired of the lack error handling
around CloseTransientFile().  While in some code paths the file
descriptors are closed for an error, in some others we should report
something.  I am going to send a patch after a lookup.  Let's see all
that on a separate thread.
--
Michael

Attachment

Re: get_controlfile() can leak fds in the backend

From
Michael Paquier
Date:
On Thu, Feb 28, 2019 at 01:07:23PM -0800, Andres Freund wrote:
> Huh? Postmaster.pid is written by the backend, pg_ctl just checks it to
> see if the backend has finished starting up etc. It's precisely what the
> backend uses to prevent two postmasters to start etc. It's also what say
> pg_resetwal checks to protect against a concurrently running lcuster
> (albeit in a racy way). If we want to make things more bulletproof,
> that's the place.  The control file is constantly written to, sometimes
> by different processes, it'd just not be a good file for such lockout
> mechanisms.

Hijacking more my own thread...  I can do that, right?  Or not.

One thing is that we don't protect a data folder to be started when it
is in the process of being treated by an external tool, like
pg_rewind, or pg_checksums.  So having an extra flag in the control
file, which can be used by external tools to tell that the data folder
is being treated for something does not sound that crazy to me.
Having a tool write a fake postmaster.pid for this kind of task does
not sound right from a correctness point of view, because the instance
is not started.

And a lock API won't protect much if the host is unplugged as well if
its state is made durable...

Let's keep the discussions where they are by the way.  Joe has just
closed the report of this thread with 4598a99, so I am moving on to
the correct places.

My apologies for the digressions.
--
Michael

Attachment

Re: get_controlfile() can leak fds in the backend

From
Andres Freund
Date:
Hi,

On 2019-03-01 10:11:53 +0900, Michael Paquier wrote:
> One thing is that we don't protect a data folder to be started when it
> is in the process of being treated by an external tool, like
> pg_rewind, or pg_checksums.  So having an extra flag in the control
> file, which can be used by external tools to tell that the data folder
> is being treated for something does not sound that crazy to me.
> Having a tool write a fake postmaster.pid for this kind of task does
> not sound right from a correctness point of view, because the instance
> is not started.

I think putting this into the control file is a seriously bad
idea. Postmaster interlocks against other postmasters running via
postmaster.pid.  Having a second interlock mechanism, in a different
file, doesn't make any sort of sense. Nor does it seem sane to have
external tool write over data as INTENSELY critical as the control file,
when they then have to understand CRCs etc.


> Let's keep the discussions where they are by the way.  Joe has just
> closed the report of this thread with 4598a99, so I am moving on to
> the correct places.

I don't know what that means, given you replied to the above in this
thread?

Greetings,

Andres Freund


Re: get_controlfile() can leak fds in the backend

From
Fabien COELHO
Date:
Hello Andres,

> I think putting this into the control file is a seriously bad
> idea. Postmaster interlocks against other postmasters running via
> postmaster.pid.

> Having a second interlock mechanism, in a different file, doesn't make 
> any sort of sense. Nor does it seem sane to have external tool write 
> over data as INTENSELY critical as the control file, when they then have 
> to understand CRCs etc.

On this point, there are functions for that, get/update_controlfile, so 
this should be factored out.

The initial insentive for raising the issue, probably in the wrong thread 
and without a clear understanding of the matter, is that I've been 
reviewing a patch to enable/disable checksums on a stopped cluster.

The patch updates all the checksums in all the files, and changes the 
control file to tell that there are checksums. Currently it checks and 
creates a fake "posmaster.pid" file to attempt to prevent another tool to 
run concurrently to this operation, with ISTM a procedure prone to race 
conditions thus does not warrant that it would be the only tool running on 
the cluster. This looked to me as a bad hack. Given that other command 
that take on a cluster seemed to use the controlfile to signal that they 
are doing something, I'd thought that it would be the way to go, but then 
I noticed that the control file read/write procedure looks as bad as the 
postmaster.pid hack to ensure that only one command is active.

Nevertheless, I'm ranting in the wrong thread and it seems that these is 
no real problem to solve, so I'll say fine to the to-me-unconvincing 
"postmaster.pid" hack proposed in the patch.

-- 
Fabien.