Re: Replication slot stats misgivings - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Replication slot stats misgivings |
Date | |
Msg-id | CALDaNm1ZkBc=ycQyi=mavZK0CUg-iQPXUc7mxszNvk7L5+7TKA@mail.gmail.com Whole thread Raw |
In response to | Re: Replication slot stats misgivings (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Tue, Apr 13, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Apr 12, 2021 at 9:16 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Mon, Apr 12, 2021 at 4:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sat, Apr 10, 2021 at 6:51 PM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > >
> > > Thanks, 0001 and 0002 look good to me. I have a minor comment for 0002.
> > >
> > > <entry role="catalog_table_entry"><para role="column_definition">
> > > + <structfield>total_bytes</structfield><type>bigint</type>
> > > + </para>
> > > + <para>
> > > + Amount of decoded transactions data sent to the decoding output plugin
> > > + while decoding the changes from WAL for this slot. This can be used to
> > > + gauge the total amount of data sent during logical decoding.
> > >
> > > Can we slightly extend it to say something like: Note that this
> > > includes the bytes streamed and or spilled. Similarly, we can extend
> > > it for total_txns.
> > >
> >
> > Thanks for the comments, the comments are fixed in the v8 patch attached.
> > Thoughts?
>
> Here are review comments on new TAP tests:
Thanks for the comments.
> +# Create replication slots.
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot1',
> 'test_decoding')");
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot2',
> 'test_decoding')");
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot3',
> 'test_decoding')");
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot4',
> 'test_decoding')");
>
> and
>
> +
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
>
> I think we can do those similar queries in a single psql connection
> like follows:
>
> # Create replication slots.
> $node->safe_psql('postgres',
> qq[
> SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');
> SELECT pg_create_logical_replication_slot('regression_slot2', 'test_decoding');
> SELECT pg_create_logical_replication_slot('regression_slot3', 'test_decoding');
> SELECT pg_create_logical_replication_slot('regression_slot4', 'test_decoding');
> ]);
>
> and
>
> $node->safe_psql('postgres',
> qq[
> SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> ]);
>
Modified.
> ---
> +# Wait for the statistics to be updated.
> +my $slot1_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot1' AND total_txns > 0 AND total_bytes > 0;";
> +my $slot2_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot2' AND total_txns > 0 AND total_bytes > 0;";
> +my $slot3_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot3' AND total_txns > 0 AND total_bytes > 0;";
> +my $slot4_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot4' AND total_txns > 0 AND total_bytes > 0;";
> +
> +# Verify that the statistics have been updated.
> +$node->poll_query_until('postgres', $slot1_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
> +$node->poll_query_until('postgres', $slot2_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
> +$node->poll_query_until('postgres', $slot3_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
> +$node->poll_query_until('postgres', $slot4_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
>
> We can simplify the above code to something like:
>
> $node->poll_query_until(
> 'postgres', qq[
> SELECT count(slot_name) >= 4
> FROM pg_stat_replication_slots
> WHERE slot_name ~ 'regression_slot'
> AND total_txns > 0
> AND total_bytes > 0;
> ]) or die "Timed out while waiting for statistics to be updated";
>
Modified.
> ---
> +# Test to remove one of the replication slots and adjust max_replication_slots
> +# accordingly to the number of slots and verify replication statistics data is
> +# fine after restart.
>
> I think it's better if we explain in detail what cases we're trying to
> test. How about the following description?
>
> Test to remove one of the replication slots and adjust
> max_replication_slots accordingly to the number of slots. This leads
> to a mismatch of the number of slots between in the stats file and on
> shared memory, simulating the message for dropping a slot got lost. We
> verify replication statistics data is fine after restart.
Slightly reworded and modified it.
These comments are fixed as part of the v9 patch posted at [1].
[1] - https://www.postgresql.org/message-id/CALDaNm3CtPUYkFjPhzX0AcuRiK2MzdCR%2B_w8ok1kCcykveuL2Q%40mail.gmail.com
>
> On Mon, Apr 12, 2021 at 9:16 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Mon, Apr 12, 2021 at 4:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sat, Apr 10, 2021 at 6:51 PM vignesh C <vignesh21@gmail.com> wrote:
> > > >
> > >
> > > Thanks, 0001 and 0002 look good to me. I have a minor comment for 0002.
> > >
> > > <entry role="catalog_table_entry"><para role="column_definition">
> > > + <structfield>total_bytes</structfield><type>bigint</type>
> > > + </para>
> > > + <para>
> > > + Amount of decoded transactions data sent to the decoding output plugin
> > > + while decoding the changes from WAL for this slot. This can be used to
> > > + gauge the total amount of data sent during logical decoding.
> > >
> > > Can we slightly extend it to say something like: Note that this
> > > includes the bytes streamed and or spilled. Similarly, we can extend
> > > it for total_txns.
> > >
> >
> > Thanks for the comments, the comments are fixed in the v8 patch attached.
> > Thoughts?
>
> Here are review comments on new TAP tests:
Thanks for the comments.
> +# Create replication slots.
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot1',
> 'test_decoding')");
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot2',
> 'test_decoding')");
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot3',
> 'test_decoding')");
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot4',
> 'test_decoding')");
>
> and
>
> +
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
>
> I think we can do those similar queries in a single psql connection
> like follows:
>
> # Create replication slots.
> $node->safe_psql('postgres',
> qq[
> SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding');
> SELECT pg_create_logical_replication_slot('regression_slot2', 'test_decoding');
> SELECT pg_create_logical_replication_slot('regression_slot3', 'test_decoding');
> SELECT pg_create_logical_replication_slot('regression_slot4', 'test_decoding');
> ]);
>
> and
>
> $node->safe_psql('postgres',
> qq[
> SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> ]);
>
Modified.
> ---
> +# Wait for the statistics to be updated.
> +my $slot1_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot1' AND total_txns > 0 AND total_bytes > 0;";
> +my $slot2_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot2' AND total_txns > 0 AND total_bytes > 0;";
> +my $slot3_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot3' AND total_txns > 0 AND total_bytes > 0;";
> +my $slot4_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot4' AND total_txns > 0 AND total_bytes > 0;";
> +
> +# Verify that the statistics have been updated.
> +$node->poll_query_until('postgres', $slot1_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
> +$node->poll_query_until('postgres', $slot2_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
> +$node->poll_query_until('postgres', $slot3_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
> +$node->poll_query_until('postgres', $slot4_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
>
> We can simplify the above code to something like:
>
> $node->poll_query_until(
> 'postgres', qq[
> SELECT count(slot_name) >= 4
> FROM pg_stat_replication_slots
> WHERE slot_name ~ 'regression_slot'
> AND total_txns > 0
> AND total_bytes > 0;
> ]) or die "Timed out while waiting for statistics to be updated";
>
Modified.
> ---
> +# Test to remove one of the replication slots and adjust max_replication_slots
> +# accordingly to the number of slots and verify replication statistics data is
> +# fine after restart.
>
> I think it's better if we explain in detail what cases we're trying to
> test. How about the following description?
>
> Test to remove one of the replication slots and adjust
> max_replication_slots accordingly to the number of slots. This leads
> to a mismatch of the number of slots between in the stats file and on
> shared memory, simulating the message for dropping a slot got lost. We
> verify replication statistics data is fine after restart.
Slightly reworded and modified it.
These comments are fixed as part of the v9 patch posted at [1].
[1] - https://www.postgresql.org/message-id/CALDaNm3CtPUYkFjPhzX0AcuRiK2MzdCR%2B_w8ok1kCcykveuL2Q%40mail.gmail.com
Regards,
Vignesh
Vignesh
pgsql-hackers by date: