Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication)
Date
Msg-id 506DA6BD.9020809@vmware.com
Whole thread Raw
In response to Re: Switching timeline over streaming replication  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication)  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On 03.10.2012 18:15, Amit Kapila wrote:
> 35.WalSenderMain(void)
> {
> ..
> +                if (walsender_shutdown_requested)
> +                        ereport(FATAL,
> +                                        (errcode(ERRCODE_ADMIN_SHUTDOWN),
> +                                         errmsg("terminating replication
> connection due to administrator command")));
> +
> +                /* Tell the client that we are ready to receive commands */
>
> +                ReadyForQuery(DestRemote);
> +
> ..
>
> +                if (walsender_shutdown_requested)
> +                        ereport(FATAL,
> +                                        (errcode(ERRCODE_ADMIN_SHUTDOWN),
> +                                         errmsg("terminating replication
> connection due to administrator command")));
> +
>
> is it necessary to check walsender_shutdown_requested 2 times in a
> loop, if yes, then can we write comment why it is important to check
> it again.

The idea was to check for shutdown request before and after the
pq_getbyte() call, because that can block for a long time.

Looking closer, we don't currently (ie. without this patch) make any
effort to react to SIGTERM in a walsender, while it's waiting for a
command from the client. After starting replication, it does check
walsender_shutdown_requested in the loop, and it's also checked during a
base backup (although only when switching to send next file, which seems
too seldom). This issue is orthogonal to handling timeline changes over
streaming replication, although that patch will make it more important
to handle SIGTERM quickly while waiting for a command, because you stay
in that mode for longer and more often.

I think walsender needs to share more infrastructure with regular
backends to handle this better. When we first implemented streaming
replication in 9.0, it made sense to implement just the bare minimum
needed to accept the handshake commands before entering the Copy state,
but now that the replication command set has grown to cover base
backups, and fetching timelines with the patch being discussed, we
should bite the bullet and make the command loop more feature-complete
and robust.

In a regular backend, the command loop reacts to SIGTERM immediately,
setting ImmediateInterruptOK at the right places, and calling
CHECK_FOR_INTERRUPTS() at strategic places. I propose that we let
PostgresMain handle the main command loop for walsender processes too,
like it does for regular backends, and use ProcDiePending and the
regular die() signal handler for SIGTERM in walsender as well.

So I propose the attached patch. I made small changes to postgres.c to
make it call exec_replication_command() instead of exec_simple_query(),
and reject extend query protocol, in a WAL sender process. A lot of code
related to handling the main command loop and signals is removed from
walsender.c.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] Make pg_basebackup configure and start standby [Review]
Next
From: Gaetano Mendola
Date:
Subject: hackers newsgroup hacked ?