Thread: [HACKERS] Error while creating subscription when server is running in singleuser mode

Hi,

There is an error while creating a subscription when the server is running in
single-user mode:

[centos@centos-cpula bin]$ ./postgres --single postgres -D m1data
PostgreSQL stand-alone backend 10beta1
backend> create subscription sub connection 'dbname=postgres port=5433 
user=centos' publication p with (create_slot=0,enabled=off);
2017-05-31 12:53:09.318 BST [10469] LOG:  statement: create subscription 
sub connection 'dbname=postgres port=5433 user=centos' publication p 
with (create_slot=0,enabled=off);

2017-05-31 12:53:09.326 BST [10469] ERROR:  epoll_ctl() failed: Bad file 
descriptor

-- 
regards, tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




On Wed, May 31, 2017 at 7:54 AM, tushar <tushar.ahuja@enterprisedb.com> wrote:
> [centos@centos-cpula bin]$ ./postgres --single postgres -D m1data
> PostgreSQL stand-alone backend 10beta1
> backend> create subscription sub connection 'dbname=postgres port=5433
> user=centos' publication p with (create_slot=0,enabled=off);
> 2017-05-31 12:53:09.318 BST [10469] LOG:  statement: create subscription sub
> connection 'dbname=postgres port=5433 user=centos' publication p with
> (create_slot=0,enabled=off);
>
> 2017-05-31 12:53:09.326 BST [10469] ERROR:  epoll_ctl() failed: Bad file
> descriptor

IMHO, single-user mode cannot support replication (it cannot have a
background WAL receiver task). However, I believe there should be a
proper error if the above statement is correct.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Wed, May 31, 2017 at 7:01 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, May 31, 2017 at 7:54 AM, tushar <tushar.ahuja@enterprisedb.com> wrote:
>> [centos@centos-cpula bin]$ ./postgres --single postgres -D m1data
>> PostgreSQL stand-alone backend 10beta1
>> backend> create subscription sub connection 'dbname=postgres port=5433
>> user=centos' publication p with (create_slot=0,enabled=off);
>> 2017-05-31 12:53:09.318 BST [10469] LOG:  statement: create subscription sub
>> connection 'dbname=postgres port=5433 user=centos' publication p with
>> (create_slot=0,enabled=off);
>>
>> 2017-05-31 12:53:09.326 BST [10469] ERROR:  epoll_ctl() failed: Bad file
>> descriptor
>
> IMHO, single-user mode cannot support replication (it cannot have a
> background WAL receiver task). However, I believe there should be a
> proper error if the above statement is correct.

Yeah, see 0e0f43d6 for example. A simple fix is to look at
IsUnderPostmaster when creating, altering or dropping a subscription
in subscriptioncmds.c.
-- 
Michael



On Wed, May 31, 2017 at 2:20 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Yeah, see 0e0f43d6 for example. A simple fix is to look at
> IsUnderPostmaster when creating, altering or dropping a subscription
> in subscriptioncmds.c.

Yeah, below patch fixes that.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Wed, May 31, 2017 at 10:49 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, May 31, 2017 at 2:20 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Yeah, see 0e0f43d6 for example. A simple fix is to look at
>> IsUnderPostmaster when creating, altering or dropping a subscription
>> in subscriptioncmds.c.
>
> Yeah, below patch fixes that.

Thanks, this looks correct to me at a quick glance.

+    if (!IsUnderPostmaster)
+        ereport(FATAL,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("subscription commands are not supported by single-user servers")));

The messages could be more detailed, for example naming the specific
operation (CREATE/ALTER/DROP SUBSCRIPTION) in each error message. But
that's a nit.
-- 
Michael



On Thu, Jun 1, 2017 at 1:02 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Thanks, this looks correct to me at a quick glance.
>
> +    if (!IsUnderPostmaster)
> +        ereport(FATAL,
> +                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                 errmsg("subscription commands are not supported by single-user servers")));
>
> The messages could be more detailed, for example naming the specific
> operation (CREATE/ALTER/DROP SUBSCRIPTION) in each error message. But
> that's a nit.

Thanks for looking into it.  Yeah, I think it's better to give a
specific message instead of a generic one, because we still support some
of the subscription commands even in single-user mode, i.e. ALTER
SUBSCRIPTION OWNER.  My patch doesn't block this command, and there is
no harm in supporting it in single-user mode, but does it make any
sense?  We could construct a use case such as creating a subscription in
normal mode and then running ALTER OWNER in single-user mode, but it
makes little sense to me.
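
To make the per-command idea concrete, here is a minimal stand-alone
sketch. It is not the attached patch and not PostgreSQL source: the
SubCommand enum, the single_user_rejection() helper, and the message
wording are all hypothetical, and IsUnderPostmaster is a local stand-in
for the real global. It only shows how a guard could name the specific
command while letting ALTER SUBSCRIPTION OWNER through.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's global: false in a --single backend. */
static bool IsUnderPostmaster = false;

/* Hypothetical command tags, for illustration only. */
typedef enum
{
    SUB_CREATE,
    SUB_ALTER,
    SUB_ALTER_OWNER,
    SUB_DROP
} SubCommand;

/*
 * Hypothetical helper: returns NULL when the command may proceed, or the
 * text a real guard might pass to errmsg() when it must be rejected.
 */
static const char *
single_user_rejection(SubCommand cmd)
{
    if (IsUnderPostmaster)
        return NULL;            /* normal multi-process server */

    switch (cmd)
    {
        case SUB_CREATE:
            return "CREATE SUBSCRIPTION is not supported by single-user servers";
        case SUB_ALTER:
            return "ALTER SUBSCRIPTION is not supported by single-user servers";
        case SUB_DROP:
            return "DROP SUBSCRIPTION is not supported by single-user servers";
        case SUB_ALTER_OWNER:
            return NULL;        /* catalog-only change, left allowed */
    }
    return NULL;
}
```

A caller would check the result before doing any work and raise an
error with the returned text when it is not NULL.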

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

On 6/1/17 04:49, Dilip Kumar wrote:
> On Thu, Jun 1, 2017 at 1:02 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Thanks, this looks correct to me at a quick glance.
>>
>> +    if (!IsUnderPostmaster)
>> +        ereport(FATAL,
>> +                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> +                 errmsg("subscription commands are not supported by single-user servers")));
>>
>> The messages could be more detailed, for example naming the specific
>> operation (CREATE/ALTER/DROP SUBSCRIPTION) in each error message. But
>> that's a nit.
> 
> Thanks for looking into it.  Yeah, I think it's better to give a
> specific message instead of a generic one, because we still support some
> of the subscription commands even in single-user mode, i.e. ALTER
> SUBSCRIPTION OWNER.  My patch doesn't block this command, and there is
> no harm in supporting it in single-user mode, but does it make any
> sense?  We could construct a use case such as creating a subscription in
> normal mode and then running ALTER OWNER in single-user mode, but it
> makes little sense to me.

We should look at what the underlying problem is before we prohibit
anything at a high level.

When I try it, I get a

TRAP: FailedAssertion("!(event->fd != (-1))", File: "latch.c", Line: 861)

which might indicate that there is a more general problem with latch use
in single-user mode.

If I remove that assertion, things work fine after that.  The originally
reported error "epoll_ctl() failed: Bad file descriptor" might indicate
that there is platform-dependent behavior.

I think the general problem is that the latch code that checks for
postmaster death does not handle single-user mode well.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 2017-06-01 21:42:41 -0400, Peter Eisentraut wrote:
> We should look at what the underlying problem is before we prohibit
> anything at a high level.

I'm not sure there's any underlying issue here, except being in single
user mode.


> When I try it, I get a
> 
> TRAP: FailedAssertion("!(event->fd != (-1))", File: "latch.c", Line: 861)
> 
> which might indicate that there is a more general problem with latch use
> in single-user mode.

That just means that the latch isn't initialized.  Which makes:

> If I remove that assertion, things work fine after that.  The originally
> reported error "epoll_ctl() failed: Bad file descriptor" might indicate
> that there is platform-dependent behavior.

quite unsurprising.  I'm not sure how this hints at platform dependent
behaviour?

libpqrcv_connect() uses MyProc->procLatch, which doesn't exist/isn't
initialized in single user mode.  I'm very unclear why that code uses
MyProc->procLatch rather than MyLatch, but that'd not change anything -
the tablesync stuff etc would still not work.

- Andres



On 6/1/17 21:55, Andres Freund wrote:
> On 2017-06-01 21:42:41 -0400, Peter Eisentraut wrote:
>> We should look at what the underlying problem is before we prohibit
>> anything at a high level.
> 
> I'm not sure there's any underlying issue here, except being in single
> user mode.

My point is that we shouldn't be putting checks into DDL commands about
single-user mode if the actual cause of the issue is in a lower-level
system.  Not all uses of a particular DDL command necessarily use a latch,
for example.  Also, there could be other things that hit a latch that
are reachable in single-user mode that we haven't found yet.

So I think the check should either go somewhere in the latch code, or
possibly in the libpqwalreceiver code.  Or we make the latch code work
so that the check-for-postmaster-death code becomes a noop in
single-user mode.  Suggestions?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On 2017-06-02 15:00:21 -0400, Peter Eisentraut wrote:
> On 6/1/17 21:55, Andres Freund wrote:
> > On 2017-06-01 21:42:41 -0400, Peter Eisentraut wrote:
> >> We should look at what the underlying problem is before we prohibit
> >> anything at a high level.
> > 
> > I'm not sure there's any underlying issue here, except being in single
> > user mode.
> 
> My point is that we shouldn't be putting checks into DDL commands about
> single-user mode if the actual cause of the issue is in a lower-level
> system.

But it's not really.


> Not all uses of a particular DDL command necessarily use a latch,
> for example.  Also, there could be other things that hit a latch that
> are reachable in single-user mode that we haven't found yet.

Latches work in single user mode, it's just that the new code for some
reason uses uninitialized memory as the latch.  As I pointed out above,
the new code really should just use MyLatch instead of
MyProc->procLatch.
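
As a rough illustration of the MyLatch versus MyProc->procLatch
distinction, here is a toy model. It is not PostgreSQL source: the
struct fields are simplified, and the helpers switch_to_shared_latch()
and latch_usable() are hypothetical names. The assumption it encodes is
only that MyLatch is a pointer that refers to a process-local latch
first and is re-pointed at the shared one once that exists, so callers
that go through MyLatch never need to know which phase they are in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the real structures. */
typedef struct Latch
{
    bool        initialized;
    bool        is_set;
} Latch;

typedef struct PGPROC
{
    Latch       procLatch;
} PGPROC;

/* Process-local latch, usable before any shared state is attached. */
static Latch LocalLatchData = {true, false};

static Latch *MyLatch = &LocalLatchData;
static PGPROC *MyProc = NULL;   /* not attached yet */

/* Hypothetical helper: attach a PGPROC and re-point MyLatch at its latch. */
static void
switch_to_shared_latch(PGPROC *proc)
{
    assert(proc != NULL);
    proc->procLatch.initialized = true;
    proc->procLatch.is_set = false;
    MyProc = proc;
    MyLatch = &proc->procLatch;
}

/* Hypothetical helper: a latch can be waited on only once initialized. */
static bool
latch_usable(const Latch *latch)
{
    return latch != NULL && latch->initialized;
}
```

In this model, code that dereferences MyProc->procLatch directly has to
be sure MyProc is attached and its latch set up, whereas code that uses
MyLatch works both before and after the switch.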


> So I think the check should either go somewhere in the latch code, or
> possibly in the libpqwalreceiver code.  Or we make the latch code work
> so that the check-for-postmaster-death code becomes a noop in
> single-user mode.  Suggestions?

I don't think the postmaster death code is really the issue here.  Nor
is libpqwalreceiver really the issue.  We can put ERRORs in a bunch of
unrelated subsystems, sure, but that doesn't really solve the issue that
logical rep pretty essentially requires multiple processes.  We've
prevented parallelism from being used in general (cf. standard_planner),
we've not put checks in all the subsystems it uses.

Greetings,

Andres Freund



Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> My point is that we shouldn't be putting checks into DDL commands about
> single-user mode if the actual cause of the issue is in a lower-level
> system.  Not all uses of a particular DDL command necessarily use a latch,
> for example.  Also, there could be other things that hit a latch that
> are reachable in single-user mode that we haven't found yet.

> So I think the check should either go somewhere in the latch code, or
> possibly in the libpqwalreceiver code.  Or we make the latch code work
> so that the check-for-postmaster-death code becomes a noop in
> single-user mode.  Suggestions?

It's certainly plausible that we could have the latch code just ignore
WL_POSTMASTER_DEATH if not IsUnderPostmaster.  I think that the original
reasoning for not doing that was that the calling code should know which
environment it's in, and not pass an unimplementable wait-exit reason;
so silently ignoring the bit could mask a bug.  Perhaps that argument is
no longer attractive.  Alternatively, we could fix the relevant call sites
to do "(IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0)", and keep the strict
behavior for the majority of call sites.
        regards, tom lane
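
The two alternatives described above can be sketched side by side. This
is a stand-alone illustration under stated assumptions, not the latch
code itself: the flag values are chosen for the example,
IsUnderPostmaster is a local stand-in for the real global, and
effective_wake_events() and call_site_wake_events() are hypothetical
helper names.

```c
#include <stdbool.h>

/* Illustrative bit values; only their distinctness matters here. */
#define WL_LATCH_SET        (1 << 0)
#define WL_TIMEOUT          (1 << 3)
#define WL_POSTMASTER_DEATH (1 << 4)

/* Stand-in for PostgreSQL's global: false in a --single backend. */
static bool IsUnderPostmaster = false;

/*
 * Alternative 1 (lenient): the wait code silently drops the
 * postmaster-death bit when there is no postmaster to watch.
 */
static int
effective_wake_events(int wakeEvents)
{
    if (!IsUnderPostmaster)
        wakeEvents &= ~WL_POSTMASTER_DEATH;
    return wakeEvents;
}

/*
 * Alternative 2 (strict): each affected call site builds its own mask
 * and only requests the bit when it can be implemented.
 */
static int
call_site_wake_events(void)
{
    return WL_LATCH_SET | WL_TIMEOUT |
        (IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0);
}
```

The first form fixes every caller at once but can hide a caller that
passes the bit by mistake; the second keeps the strict check and costs
one edit per call site.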



On 6/2/17 15:41, Tom Lane wrote:
> It's certainly plausible that we could have the latch code just ignore
> WL_POSTMASTER_DEATH if not IsUnderPostmaster.  I think that the original
> reasoning for not doing that was that the calling code should know which
> environment it's in, and not pass an unimplementable wait-exit reason;
> so silently ignoring the bit could mask a bug.  Perhaps that argument is
> no longer attractive.  Alternatively, we could fix the relevant call sites
> to do "(IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0)", and keep the strict
> behavior for the majority of call sites.

There are a lot of those call sites.  (And a lot of duplicate code for
what to do if postmaster death actually happens.)  I doubt we want to
check them all.

The attached patch fixes the reported issue for me.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

On 6/2/17 15:24, Andres Freund wrote:
> but that doesn't really solve the issue that
> logical rep pretty essentially requires multiple processes.

But it may be sensible to execute certain DDL commands for repair, which
is why I'm arguing for a finer-grained approach than just prohibiting
everything.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Fri, Jun 02, 2017 at 11:06:52PM -0400, Peter Eisentraut wrote:
> On 6/2/17 15:41, Tom Lane wrote:
> > It's certainly plausible that we could have the latch code just ignore
> > WL_POSTMASTER_DEATH if not IsUnderPostmaster.  I think that the original
> > reasoning for not doing that was that the calling code should know which
> > environment it's in, and not pass an unimplementable wait-exit reason;
> > so silently ignoring the bit could mask a bug.  Perhaps that argument is
> > no longer attractive.  Alternatively, we could fix the relevant call sites
> > to do "(IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0)", and keep the strict
> > behavior for the majority of call sites.
> 
> There are a lot of those call sites.  (And a lot of duplicate code for
> what to do if postmaster death actually happens.)  I doubt we want to
> check them all.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com



On 6/2/17 23:06, Peter Eisentraut wrote:
> On 6/2/17 15:41, Tom Lane wrote:
>> It's certainly plausible that we could have the latch code just ignore
>> WL_POSTMASTER_DEATH if not IsUnderPostmaster.  I think that the original
>> reasoning for not doing that was that the calling code should know which
>> environment it's in, and not pass an unimplementable wait-exit reason;
>> so silently ignoring the bit could mask a bug.  Perhaps that argument is
>> no longer attractive.  Alternatively, we could fix the relevant call sites
>> to do "(IsUnderPostmaster ? WL_POSTMASTER_DEATH : 0)", and keep the strict
>> behavior for the majority of call sites.
> 
> There are a lot of those call sites.  (And a lot of duplicate code for
> what to do if postmaster death actually happens.)  I doubt we want to
> check them all.
> 
> The attached patch fixes the reported issue for me.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Fri, Jun 2, 2017 at 3:24 PM, Andres Freund <andres@anarazel.de> wrote:
> Latches work in single user mode, it's just that the new code for some
> reason uses uninitialized memory as the latch.  As I pointed out above,
> the new code really should just use MyLatch instead of
> MyProc->procLatch.

We seem to have accumulated quite a few instances of that.

[rhaas pgsql]$ git grep MyLatch | wc -l
     116
[rhaas pgsql]$ git grep 'MyProc->procLatch' | wc -l
      33

Most of the offenders are in src/backend/replication, but there are
some that are related to parallelism as well (bgworker.c, pqmq.c,
parallel.c, condition_variable.c).  Maybe we (you?) should just go and
change them all.  I don't think using MyLatch instead of
MyProc->procLatch has become automatic for everyone yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 2017-06-06 15:48:42 -0400, Robert Haas wrote:
> On Fri, Jun 2, 2017 at 3:24 PM, Andres Freund <andres@anarazel.de> wrote:
> > Latches work in single user mode, it's just that the new code for some
> > reason uses uninitialized memory as the latch.  As I pointed out above,
> > the new code really should just use MyLatch instead of
> > MyProc->procLatch.

FWIW, I'd misremembered some code here, and we actually reach the
function initializing the shared latch, even in single user mode.


> We seem to have accumulated quite a few instances of that.
> 
> [rhaas pgsql]$ git grep MyLatch | wc -l
>      116
> [rhaas pgsql]$ git grep 'MyProc->procLatch' | wc -l
>       33
> 
> Most of the offenders are in src/backend/replication, but there are
> some that are related to parallelism as well (bgworker.c, pqmq.c,
> parallel.c, condition_variable.c).  Maybe we (you?) should just go and
> change them all.  I don't think using MyLatch instead of
> MyProc->procLatch has become automatic for everyone yet.

Nevertheless this should be changed.  Will do.


- Andres



On 2017-06-06 12:53:21 -0700, Andres Freund wrote:
> On 2017-06-06 15:48:42 -0400, Robert Haas wrote:
> > On Fri, Jun 2, 2017 at 3:24 PM, Andres Freund <andres@anarazel.de> wrote:
> > > Latches work in single user mode, it's just that the new code for some
> > > reason uses uninitialized memory as the latch.  As I pointed out above,
> > > the new code really should just use MyLatch instead of
> > > MyProc->procLatch.
> 
> FWIW, I'd misremembered some code here, and we actually reach the
> function initializing the shared latch, even in single user mode.
> 
> 
> > We seem to have accumulated quite a few instances of that.
> > 
> > [rhaas pgsql]$ git grep MyLatch | wc -l
> >      116
> > [rhaas pgsql]$ git grep 'MyProc->procLatch' | wc -l
> >       33
> > 
> > Most of the offenders are in src/backend/replication, but there are
> > some that are related to parallelism as well (bgworker.c, pqmq.c,
> > parallel.c, condition_variable.c).  Maybe we (you?) should just go and
> > change them all.  I don't think using MyLatch instead of
> > MyProc->procLatch has become automatic for everyone yet.
> 
> Nevertheless this should be changed.  Will do.

Here's the patch for that, also addressing some issues I found while
updating those callsites (separate thread started, too).

- Andres
