Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From vignesh C
Subject Re: Replication slot stats misgivings
Date
Msg-id CALDaNm3jhjd5P8-w=AKRORa0Q_A1sprG8Qk7A5-0GaT1J7dZSw@mail.gmail.com
Whole thread Raw
In response to Re: Replication slot stats misgivings  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Apr 6, 2021 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 5, 2021 at 8:51 PM vignesh C <vignesh21@gmail.com> wrote:
> >
>
> Few comments on the latest patches:
> Comments on 0001
> --------------------------------
> 1.
> @@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb,
> TransactionId xid, bool create,
>   dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
>   AssertTXNLsnOrder(rb);
>   }
> +
> + rb->totalTxns++;
>   }
>   else
>   txn = NULL; /* not found and not asked to create */
> @@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
>   {
>   txn->size += sz;
>   rb->size += sz;
> + rb->totalBytes += sz;
>
> I think this will include the txns that are aborted and for which we
> don't send anything. It might be better to update these stats in
> ReorderBufferProcessTXN or ReorderBufferReplay where we are sure we
> have sent the data. We can probably use size/total_size in txn. We
> need to be careful to not double include the totaltxn or totalBytes
> for streaming xacts as we might process the same txn multiple times.
>
> 2.
> +        Amount of decoded transactions data sent to the decoding output plugin
> +        while decoding the changes from WAL for this slot. This and total_txns
> +        for this slot can be used to gauge the total amount of data during
> +        logical decoding.
>
> I think we can slightly modify the second line here: "This can be used
> to gauge the total amount of data sent during logical decoding.". Why
> we need to include total_txns along with it.
>
> 0002
> ----------
> 3.
> +  -- we don't want to wait forever; loop will exit after 30 seconds
> +  FOR i IN 1 .. 5 LOOP
> +
> ...
> ...
> +
> +    -- wait a little
> +    perform pg_sleep_for('100 milliseconds');
>
> I think this loop needs to be executed 300 times instead of 5 times,
> if the above comments and code needs to do what is expected here?
>
>
> 4.
> +# Test to drop one of the subscribers and verify replication statistics data is
> +# fine after publisher is restarted.
> +$node->safe_psql('postgres', "SELECT
> pg_drop_replication_slot('regression_slot4')");
> +
> +$node->stop;
> +$node->start;
> +
> +# Verify statistics data present in pg_stat_replication_slots are sane after
> +# publisher is restarted
> +$result = $node->safe_psql('postgres',
> + "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
> + FROM pg_stat_replication_slots ORDER BY slot_name"
>
> Various comments in the 0002 refer to publisher/subscriber which is
> not what we are using here.
>
> 5.
> +# Create table.
> +$node->safe_psql('postgres',
> +        "CREATE TABLE test_repl_stat(col1 int)");
> +$node->safe_psql('postgres',
> +        "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> +        "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> +        "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> +        "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
>
> I think we can save the above calls to pg_logical_slot_get_changes if
> we create table before creating the slots in this test.
>
> 0003
> ---------
> 6. In the tests/code, publisher is used at multiple places. I think
> that is not required because this can happen via plugin as well.
> 7.
> + if (max_replication_slots == nReplSlotStats)
> + {
> + ereport(pgStatRunningInCollector ? LOG : WARNING,
> + (errmsg("skipping \"%s\" replication slot statistics as
> pg_stat_replication_slots does not have enough slots",
> + NameStr(replSlotStats[nReplSlotStats].slotname))));
> + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
>
> Do we need memset here? Isn't this location is past the max location?

Thanks for the comments, I will fix and post a patch for this soon.

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Stronger safeguard for archive recovery not to miss data
Next
From: Kazutaka Onishi
Date:
Subject: Re: TRUNCATE on foreign table