Re: New statistics for tuning WAL buffer size - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: New statistics for tuning WAL buffer size
Date
Msg-id 25c55c57ca9af60707bff6667307a3c7@oss.nttdata.com
Whole thread Raw
In response to Re: New statistics for tuning WAL buffer size  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: New statistics for tuning WAL buffer size  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On 2020-09-15 17:10, Fujii Masao wrote:
> On 2020/09/15 15:52, Masahiro Ikeda wrote:
>> On 2020-09-11 01:40, Fujii Masao wrote:
>>> On 2020/09/09 13:57, Masahiro Ikeda wrote:
>>>> On 2020-09-07 16:19, Fujii Masao wrote:
>>>>> On 2020/09/07 9:58, Masahiro Ikeda wrote:
>>>>>> Thanks for the review and advice!
>>>>>> 
>>>>>> On 2020-09-03 16:05, Fujii Masao wrote:
>>>>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote:
>>>>>>>>> +/* ----------
>>>>>>>>> + * Backend types
>>>>>>>>> + * ----------
>>>>>>>>> 
>>>>>>>>> You seem to forget to add "*/" into the above comment.
>>>>>>>>> This issue could cause the following compiler warning.
>>>>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block 
>>>>>>>>> comment [-Wcomment]
>>>>>>>> 
>>>>>>>> Thanks for the comment. I fixed.
>>>>>>> 
>>>>>>> Thanks for the fix! But why are those comments necessary?
>>>>>> 
>>>>>> Sorry about that. This comment is not necessary.
>>>>>> I removed it.
>>>>>> 
>>>>>>>> The pg_stat_walwriter is not security restricted now, so 
>>>>>>>> ordinary users can access it.
>>>>>>>> It has the same security level as pg_stat_archiver. If you have 
>>>>>>>> any comments, please let me know.
>>>>>>> 
>>>>>>> +       <structfield>dirty_writes</structfield> 
>>>>>>> <type>bigint</type>
>>>>>>> 
>>>>>>> I guess that the column name "dirty_writes" derived from
>>>>>>> the DTrace probe name. Isn't this name confusing? We should
>>>>>>> rename it to "wal_buffers_full" or something?
>>>>>> 
>>>>>> I agree and rename it to "wal_buffers_full".
>>>>>> 
>>>>>>> +/* ----------
>>>>>>> + * PgStat_MsgWalWriter            Sent by the walwriter to 
>>>>>>> update statistics.
>>>>>>> 
>>>>>>> This comment seems not accurate because backends also send it.
>>>>>>> 
>>>>>>> +/*
>>>>>>> + * WAL writes statistics counter is updated in XLogWrite 
>>>>>>> function
>>>>>>> + */
>>>>>>> +extern PgStat_MsgWalWriter WalWriterStats;
>>>>>>> 
>>>>>>> This comment seems not right because the counter is not updated 
>>>>>>> in XLogWrite().
>>>>>> 
>>>>>> Right. I fixed it to "Sent by each backend and background workers 
>>>>>> to update WAL statistics."
>>>>>> In the future, other statistics will be included so I remove the 
>>>>>> function's name.
>>>>>> 
>>>>>> 
>>>>>>> +-- There will surely and maximum one record
>>>>>>> +select count(*) = 1 as ok from pg_stat_walwriter;
>>>>>>> 
>>>>>>> What about changing this comment to "There must be only one 
>>>>>>> record"?
>>>>>> 
>>>>>> Thanks, I fixed.
>>>>>> 
>>>>>>> +                    WalWriterStats.m_xlog_dirty_writes++;
>>>>>>>                      LWLockRelease(WALWriteLock);
>>>>>>> 
>>>>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be 
>>>>>>> protected
>>>>>>> with WALWriteLock, isn't it better to increment that after 
>>>>>>> releasing the lock?
>>>>>> 
>>>>>> Thanks, I fixed.
>>>>>> 
>>>>>>> +CREATE VIEW pg_stat_walwriter AS
>>>>>>> +    SELECT
>>>>>>> +        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
>>>>>>> +        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
>>>>>>> +
>>>>>>>  CREATE VIEW pg_stat_progress_vacuum AS
>>>>>>> 
>>>>>>> In system_views.sql, the definition of pg_stat_walwriter should 
>>>>>>> be
>>>>>>> placed just after that of pg_stat_bgwriter not 
>>>>>>> pg_stat_progress_analyze.
>>>>>> 
>>>>>> OK, I fixed it.
>>>>>> 
>>>>>>>      }
>>>>>>> -
>>>>>>>      /*
>>>>>>>       * We found an existing collector stats file. Read it and 
>>>>>>> put all the
>>>>>>> 
>>>>>>> You seem to accidentally have removed the empty line here.
>>>>>> 
>>>>>> Sorry about that. I fixed it.
>>>>>> 
>>>>>>> -                 errhint("Target must be \"archiver\" or 
>>>>>>> \"bgwriter\".")));
>>>>>>> +                 errhint("Target must be \"archiver\" or 
>>>>>>> \"bgwriter\" or
>>>>>>> \"walwriter\".")));
>>>>>>> 
>>>>>>> There are two "or" in the message, but the former should be 
>>>>>>> replaced with ","?
>>>>>> 
>>>>>> Thanks, I fixed.
>>>>>> 
>>>>>> 
>>>>>> On 2020-09-05 18:40, Magnus Hagander wrote:
>>>>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
>>>>>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>>>> 
>>>>>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:
>>>>>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com>
>>>>>>>>>>> I changed the view name from pg_stat_walwrites to
>>>>>>>> pg_stat_walwriter.
>>>>>>>>>>> I think it is better to match naming scheme with other views
>>>>>>>> like
>>>>>>>>>> pg_stat_bgwriter,
>>>>>>>>>>> which is for bgwriter statistics but it has the statistics
>>>>>>>> related to backend.
>>>>>>>>>> 
>>>>>>>>>> I prefer the view name pg_stat_walwriter for the consistency 
>>>>>>>>>> with
>>>>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which
>>>>>>>>>> makes me think that maybe pg_stat_wal_writer is better for
>>>>>>>>>> the consistency. Thought? IMO either of them works for me.
>>>>>>>>>> I'd like to hear more opinons about this.
>>>>>>>>> 
>>>>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains
>>>>>>>> the backends' activity.  Likewise, pg_stat_walwriter leads to
>>>>>>>> misunderstanding because its information is not limited to WAL
>>>>>>>> writer.
>>>>>>>>> 
>>>>>>>>> How about simply pg_stat_wal?  In the future, we may want to
>>>>>>>> include WAL reads in this view, e.g. reading undo logs in zheap.
>>>>>>>> 
>>>>>>>> Sounds reasonable.
>>>>>>> 
>>>>>>> +1.
>>>>>>> 
>>>>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now 
>>>>>>> --
>>>>>>> it became even more apparent when the checkpointer was split out 
>>>>>>> to
>>>>>>> it's own process, and that's not exactly a recent change. And it 
>>>>>>> had
>>>>>>> allocs in it from day one...
>>>>>>> 
>>>>>>> I think naming it for what the data in it is ("wal") rather than 
>>>>>>> which
>>>>>>> process deals with it ("walwriter") is correct, unless the 
>>>>>>> statistics
>>>>>>> can be known to only *ever* affect one type of process. (And then
>>>>>>> different processes can affect different columns in the view). As 
>>>>>>> a
>>>>>>> general rule -- and that's from what I can tell exactly what's 
>>>>>>> being
>>>>>>> proposed.
>>>>>> 
>>>>>> Thanks for your comments. I agree with your opinions.
>>>>>> I changed the view name to "pg_stat_wal".
>>>>>> 
>>>>>> 
>>>>>> I fixed the code to send the WAL statistics from not only backend 
>>>>>> and walwriter
>>>>>> but also checkpointer, walsender and autovacuum worker.
>>>>> 
>>>>> Good point! Thanks for updating the patch!
>>>>> 
>>>>> 
>>>>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams 
>>>>> *params,
>>>>>                           onerel->rd_rel->relisshared,
>>>>>                           Max(new_live_tuples, 0),
>>>>>                           vacrelstats->new_dead_tuples);
>>>>> +    pgstat_send_wal();
>>>>> 
>>>>> I guess that you changed heap_vacuum_rel() as above so that 
>>>>> autovacuum
>>>>> workers can send WAL stats. But heap_vacuum_rel() can be called by
>>>>> the processes (e.g., backends) other than autovacuum workers? Also
>>>>> what happens if autovacuum workers just do ANALYZE only? In that 
>>>>> case,
>>>>> heap_vacuum_rel() may not be called.
>>>>> 
>>>>> Currently autovacuum worker reports the stats at the exit via
>>>>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
>>>>> is not the process that basically keeps running during the service. 
>>>>> It exits
>>>>> after it does vacuum or analyze. So ISTM that it's not bad to 
>>>>> report the stats
>>>>> only at the exit, in autovacuum worker case. There is no need to 
>>>>> add extra
>>>>> code for WAL stats report by autovacuum worker. Thought?
>>>> 
>>>> Thanks, I understood. I removed this code.
>>>> 
>>>>> 
>>>>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
>>>>>          else
>>>>>              RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>>>>>  +        /* Send wal statistics */
>>>>> +        pgstat_send_wal();
>>>>> 
>>>>> AFAIR logical walsender uses three loops in WalSndLoop(), 
>>>>> WalSndWriteData()
>>>>> and WalSndWaitForWal(). But could you tell me why added 
>>>>> pgstat_send_wal()
>>>>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is 
>>>>> the best
>>>>> for that purpose.
>>>> 
>>>> I checked what function calls XLogBackgroundFlush() which calls
>>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
>>>> 
>>>> I found that WalSndWaitForWal() calls it, so I added it.
>>> 
>>> Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the
>>> second argument opportunistic=true, so in this case WAL write by
>>> wal_buffers full seems to never happen. Right? If this understanding
>>> is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal().
>>> Probably also walwriter doesn't need to do that.
> 
> Thanks for updating the patch! This patch adds pgstat_send_wal() in
> walwriter main loop. But isn't this unnecessary because of the above 
> reason?
> That is, since walwriter calls AdvanceXLInsertBuffer() with
> the second argument "opportunistic" = true via XLogBackgroundFlush(),
> the event of full wal_buffers will never happen. No?

Right, I fixed it.

>>> 
>>> The logical rep walsender can generate WAL and call
>>> AdvanceXLInsertBuffer() when it executes the replication commands 
>>> like
>>> CREATE_REPLICATION_SLOT. But this case is already covered by
>>> pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(),
>>> with your patch. So no more calls to pgstat_send_wal() seems 
>>> necessary
>>> for logical rep walsender.
>> 
>> Thanks for your reviews. I didn't notice that.
>> I updated the patches.
> 
> Sorry, the above my analysis might be incorrect. During logical 
> replication,
> walsender may access to the system table. Which may cause HOT pruning
> or killing of dead index tuple. Also which can cause WAL and
> full wal_buffers event. Thought?

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

WalSndLoop()
  -> XLogSendLogical()
   -> LogicalDecodingProcessRecord()
    -> DecodeXactOp()
     -> DecodeCommit()
      -> ReorderBufferCommit()
       -> ReorderBufferProcessTXN()
        -> RelidByRelfilenode()
         -> systable_getnext()

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

Is it necessary to introduce a mechanism to send in bulk?
But I worried about how to implement is best. Is it good to send wal 
statistics per X recoreds?


I think there are other background processes that access the system 
tables,
so I organized which process must send wal metrics and added 
pgstat_send_wal() to the main loop of some background processes
for example, autovacuum launcher, logical replication launcher, and 
logical replication worker's one.

(*) [x]: it needs to send it
     [ ]: it don't need to send it

* [ ] postmaster
* [ ] background writer
* [x] checkpointer: it generates wal for checkpoint.
* [ ] walwriter
* [x] autovacuum launcher: it accesses to the system tables to get the 
database list.
* [x] autovacuum worker: it generates wal for vacuum.
* [ ] stats collector
* [x] backend: it generate wal for query execution.
* [ ] startup
* [ ] archiver
* [x] walsender: it accesses to the system tables if logical replication 
is performed.
* [ ] walreceiver
* [x] logical replication launcher: it accesses to the system tables to 
get the subscription list.
* [x] logical replication worker: it accesses to the system tables to 
get oid from relname.
* [x] parallel worker: it generates wal for query execution.

If my understanding is wrong, please let me know.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION
Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: New statistics for tuning WAL buffer size