Re: Proposal: "Causal reads" mode for load balancing reads without stale data - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Date
Msg-id CAEepm=2cvEJveJhNqsNZgWN=S1ZAhMXyATr6fYhibFtpQRrDKA@mail.gmail.com
In response to Re: Proposal: "Causal reads" mode for load balancing reads without stale data  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Proposal: "Causal reads" mode for load balancing reads without stale data
Re: Proposal: "Causal reads" mode for load balancing reads without stale data
List pgsql-hackers
On Wed, Mar 23, 2016 at 12:37 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 14, 2016 at 2:38 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>>> What's with the extra block?
>>
>> Yeah, that's silly, thanks.  Tidied up for the next version.
>
> Some more comments on 0001:
>
> +        <literal>remote_write</>, <literal> remote_apply</>,
> <literal>local</>, and <literal>off</>.
>
> Extra space.

Fixed.

> +         * apply when the transaction eventuallly commits or aborts.
>
> Spelling.

Fixed.

> +        if (synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY)
> +            assign_synchronous_commit(SYNCHRONOUS_COMMIT_REMOTE_FLUSH, NULL);
> +
> +        SyncRepWaitForLSN(gxact->prepare_end_lsn);
> +
> +        if (synchronous_commit == SYNCHRONOUS_COMMIT_REMOTE_APPLY)
> +            assign_synchronous_commit(SYNCHRONOUS_COMMIT_REMOTE_APPLY, NULL);
>
> You can't do this.  Directly changing the value of a variable that is
> backing a GUC is verboten, and doing it through the thin veneer of
> calling the assign-hook will not avoid the terrible wrath of the
> powers that dwell in the outer dark, and/or Pittsburgh.  You probably
> need a dance here similar to the way forcePageWrites/fullPageWrites
> work.

Yeah, that was terrible.  Instead, I have now made this interface change:

-SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
+SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)

If you want to wait for non-commit records (prepare being the only
case of that), you pass false as the second argument, and the wait
level is then capped at remote flush inside that function.  There is
no way to wait for non-commit records to be applied, and there is no
point in adding one, since they don't have any user-visible effect.
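
To illustrate the capping rule described above, here is a minimal,
self-contained sketch.  It is not the code from the patch: the enum
DemoWaitLevel and the function demo_effective_wait_level() are
hypothetical names invented for this example, and the only thing it
models is "non-commit records never wait beyond remote flush".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wait levels, ordered from weakest to strongest. */
typedef enum DemoWaitLevel
{
    DEMO_WAIT_NONE = 0,
    DEMO_WAIT_REMOTE_WRITE,
    DEMO_WAIT_REMOTE_FLUSH,
    DEMO_WAIT_REMOTE_APPLY
} DemoWaitLevel;

/*
 * Return the level a backend would actually wait for.  When the record is
 * not a commit (commit == false), anything stronger than remote flush is
 * capped at remote flush; weaker settings are left unchanged.
 */
DemoWaitLevel
demo_effective_wait_level(DemoWaitLevel configured, bool commit)
{
    assert(configured <= DEMO_WAIT_REMOTE_APPLY);

    if (!commit && configured > DEMO_WAIT_REMOTE_FLUSH)
        return DEMO_WAIT_REMOTE_FLUSH;

    return configured;
}
```

With this shape the caller never has to touch the GUC-backed variable:
a prepare passes false and gets at most remote flush, while a commit
passes true and gets whatever level is configured.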

>      /*
> +     * Check if the caller would like to ask standbys for immediate feedback
> +     * once this commit is applied.
> +    */
>
> Whitespace.

Fixed.

> +    /*
> +     * Check if the caller would like to ask standbys for immediate feedback
> +     * once this abort is applied.
> +    */
>
> Whitespace again.

Fixed.

>  /*
> + * doRequestWalReceiverReply is used by recovery code to ask the main recovery
> + * loop to trigger a walreceiver reply.
> + */
> +static bool doRequestWalReceiverReply;
>
> This is the sort of comment that leads me to ask "why even bother
> writing a comment?".  Try to say something that's not obvious from the
> variable name. The comment for XLogRequestWalReceiverReply has a
> similar issue.

Changed.

> +static void WalRcvBlockSigUsr2(void)
>
> Style - newline after void.

Fixed.

> +static void WalRcvUnblockSigUsr2(void)
>
> And again here.

Fixed.

> +                WalRcvUnblockSigUsr2();
>                  len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
> +                WalRcvBlockSigUsr2();
>
> This does not seem like it will be cheap on all operating systems.  I
> think you should try to rejigger this somehow so that it can just set
> the process latch and the wal receiver figures it out from looking at
> shared memory.  Like maybe a flag in WalRcvData?  An advantage of this
> is that it should cut down on the number of signals significantly,
> because it won't need to send SIGUSR1 when the latch is already set.

Still experimenting with a latch here.  I will come back to this point soon.
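
For reference, the general shape of the suggestion (a flag in shared
memory plus a latch, with no wakeup sent when the latch is already set)
can be sketched as below.  This is only a toy model using C11 atomics:
DemoLatch, DemoWalRcvShared and the demo_* functions are hypothetical
names, nothing here is PostgreSQL's latch API or the real WalRcvData
layout, and the "wakeup" is just a counter rather than a signal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a process latch. */
typedef struct DemoLatch
{
    atomic_bool is_set;
    atomic_int  wakeups_sent;   /* counts simulated wakeup signals */
} DemoLatch;

/* Hypothetical stand-in for the receiver's shared-memory state. */
typedef struct DemoWalRcvShared
{
    atomic_bool force_reply;    /* "please send a reply now" flag */
    DemoLatch   latch;
} DemoWalRcvShared;

void
demo_init(DemoWalRcvShared *shared)
{
    assert(shared != NULL);
    atomic_init(&shared->force_reply, false);
    atomic_init(&shared->latch.is_set, false);
    atomic_init(&shared->latch.wakeups_sent, 0);
}

/* Set the latch; only count a wakeup if it was not already set. */
void
demo_set_latch(DemoLatch *latch)
{
    if (!atomic_exchange(&latch->is_set, true))
        atomic_fetch_add(&latch->wakeups_sent, 1);
}

/* Requester side: publish the flag first, then wake the receiver. */
void
demo_request_reply(DemoWalRcvShared *shared)
{
    atomic_store(&shared->force_reply, true);
    demo_set_latch(&shared->latch);
}

/*
 * Receiver side, called after waking up: reset the latch, then consume the
 * flag.  Returns true if a reply was requested since the last call.
 */
bool
demo_consume_reply_request(DemoWalRcvShared *shared)
{
    atomic_store(&shared->latch.is_set, false);
    return atomic_exchange(&shared->force_reply, false);
}
```

In this model, two back-to-back requests before the receiver wakes
produce only one wakeup, which is the reduction in signal traffic
mentioned in the review comment.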

> + * Although only "on", "off", "remote_apply", "remote_write", and "local" are
> + * documented, we accept all the likely variants of "on" and "off".
>
> Maybe switch to listing the undocumented values.

It follows a pattern used by several nearby bits of code, so it
doesn't look like it should be different; besides, you can see what
the undocumented values are: they're right below.

Here are some test results from a set of Amazon EC2 "m3.large"
instances running Ubuntu Trusty in the Oregon region, all in the same
subnet.  All settings are defaults except shared_buffers, which is 1GB.

1.  Simple sequential updates (using 'test-causal-reads.c', already
posted up-thread):

synchronous_commit    TPS
==================== ====
off                  9234
local                1223
remote_write          907
on                    587
remote_apply          555

causal_reads          TPS
==================== ====
0 cr standbys        1112
1 cr standbys         541
2 cr standbys         487
3 cr standbys         467

2. Some pgbench -c4 -j2 -N bench2 runs:

synchronous_commit    TPS
==================== ====
off                  3937
local                1984
remote_write         1701
on                   1373
remote_apply         1349

causal_reads          TPS
==================== ====
0 cr standbys        1973
1 cr standbys        1413
2 cr standbys        1282
3 cr standbys        1163

--
Thomas Munro
http://www.enterprisedb.com
