Re: wal stats questions - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: wal stats questions
Date
Msg-id 2724b8e5-9eed-b0aa-4cc7-75504b90a738@oss.nttdata.com
Whole thread Raw
In response to Re: wal stats questions  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: wal stats questions
List pgsql-hackers

On 2021/04/13 9:33, Fujii Masao wrote:
> 
> 
> On 2021/03/30 20:37, Masahiro Ikeda wrote:
>> OK, I added the condition to the fast-return check. I noticed that I
>> misunderstood that the purpose is to avoid expanding a clock check using WAL
>> stats counters. But, the purpose is to make the conditions stricter, right?
> 
> Yes. Currently if the following condition is false even when the WAL counters
> are updated, nothing is sent to the stats collector. But with your patch,
> in this case the WAL stats are sent.
> 
>     if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>         pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
>         !have_function_stats && !disconnect)
> 
> Thanks for the patch! It now fails to be applied to the master cleanly.
> So could you rebase the patch?

Thanks for your comments!
I rebased it.


> pgstat_initialize() should initialize pgWalUsage with zero?

Thanks. But, I didn't handle it because I undo the logic to calculate the diff
as you mentioned below.


> +    /*
> +     * set the counters related to generated WAL data.
> +     */
> +    WalStats.m_wal_records = pgWalUsage.wal_records;
> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi;
> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes;
> 
> This should be skipped if pgWalUsage.wal_records is zero?

Yes, fixed it.


> + * Be careful that the counters are cleared after reporting them to
> + * the stats collector although you can use WalUsageAccumDiff()
> + * to computing differences to previous values. For backends,
> + * the counters may be reset after a transaction is finished and
> + * pgstat_send_wal() is invoked, so you can compute the difference
> + * in the same transaction only.
> 
> On the second thought, I'm afraid that this can be likely to be a foot-gun
> in the future. So I'm now wondering if the current approach (i.e., calculate
> the diff between the current and previous pgWalUsage and don't reset it
> to zero) is better. Thought? Since the similar data structure pgBufferUsage
> doesn't have this kind of restriction, I'm afraid that the developer can
> easily miss that only pgWalUsage has the restriction.
> 
> But currently the diff is calculated (i.e., WalUsageAccumDiff() is called)
> even when the WAL counters are not updated. Then if that calculated diff is
> zero, we skip sending the WAL stats. This logic should be improved. That is,
> probably we should be able to check whether the WAL counters are updated
> or not, without calculating the diff, because the calculation is not free.
> We can implement this by introducing new flag variable that we discussed
> upthread. This flag is set to true whenever the WAL counters are incremented
> and used to determine whether the WAL stats need to be sent.

Sound reasonable. I agreed that the restriction has a risk to lead mistakes.
I made the patch introducing a new flag.

- v4-0001-performance-improvements-of-reporting-wal-stats.patch


I think introducing a new flag is not necessary because we can know if the WAL
stats are updated or not using the counters of the generated wal record, wal
writes and wal syncs. It's fast compared to get timestamp. The attached patch
is to check if the counters are updated or not using them.

-
v4-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch


> If we do this, another issue is that the data types for wal_records and wal_fpi
> in pgWalUsage are long. Which may lead to overflow? If yes, it's should be
> replaced with uint64.

Yes, I fixed. BufferUsage's counters have same issue, so I fixed them too.

BTW, is it better to change PgStat_Counter from int64 to uint64 because there
aren't any counters with negative number?

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION


Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows
Next
From: Tomas Vondra
Date:
Subject: Re: "could not find pathkey item to sort" for TPC-DS queries 94-96