Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers

From Nitin Jadhav
Subject Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date
Msg-id CAMm1aWa_3N1Z2-_YwdNxfogs_rxzpu90VZeKjqwjerCVq9rQAw@mail.gmail.com
Whole thread Raw
In response to Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)  (Andres Freund <andres@anarazel.de>)
Responses Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
List pgsql-hackers
> Have you measured the performance effects of this? On fast storage with large
> shared_buffers I've seen these loops in profiles. It's probably fine, but it'd
> be good to verify that.

To understand the performance effects of the above, I have taken the
average of five checkpoints with the patch and without the patch in my
environment. Here are the results.
With patch: 269.65 s
Without patch: 269.60 s

It looks fine. Please share your views.

> This view is depressingly complicated. Added up the view definitions for
> the already existing pg_stat_progress* views add up to a measurable part of
> the size of an empty database:

Thank you so much for sharing the detailed analysis. We can remove a
few fields which are not so important to make it simple.

Thanks & Regards,
Nitin Jadhav

On Sat, Mar 19, 2022 at 5:45 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> This is a long thread, sorry for asking if this has been asked before.
>
> On 2022-03-08 20:25:28 +0530, Nitin Jadhav wrote:
> >        * Sort buffers that need to be written to reduce the likelihood of random
> > @@ -2129,6 +2132,8 @@ BufferSync(int flags)
> >               bufHdr = GetBufferDescriptor(buf_id);
> >
> >               num_processed++;
> > +             pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_PROCESSED,
> > +                                                                      num_processed);
> >
> >               /*
> >                * We don't need to acquire the lock here, because we're only looking
> > @@ -2149,6 +2154,8 @@ BufferSync(int flags)
> >                               TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
> >                               PendingCheckpointerStats.m_buf_written_checkpoints++;
> >                               num_written++;
> > +                             pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_WRITTEN,
> > +                                                                                      num_written);
> >                       }
> >               }
>
> Have you measured the performance effects of this? On fast storage with large
> shared_buffers I've seen these loops in profiles. It's probably fine, but it'd
> be good to verify that.
>
>
> > @@ -1897,6 +1897,112 @@ pg_stat_progress_basebackup| SELECT s.pid,
> >      s.param4 AS tablespaces_total,
> >      s.param5 AS tablespaces_streamed
> >     FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5,
param6,param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18,
param19,param20); 
> > +pg_stat_progress_checkpoint| SELECT s.pid,
> > +        CASE s.param1
> > +            WHEN 1 THEN 'checkpoint'::text
> > +            WHEN 2 THEN 'restartpoint'::text
> > +            ELSE NULL::text
> > +        END AS type,
> > +    (((((((
> > +        CASE
> > +            WHEN ((s.param2 & (1)::bigint) > 0) THEN 'shutdown '::text
> > +            ELSE ''::text
> > +        END ||
> > +        CASE
> > +            WHEN ((s.param2 & (2)::bigint) > 0) THEN 'end-of-recovery '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param2 & (4)::bigint) > 0) THEN 'immediate '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param2 & (8)::bigint) > 0) THEN 'force '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param2 & (16)::bigint) > 0) THEN 'flush-all '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param2 & (32)::bigint) > 0) THEN 'wait '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param2 & (128)::bigint) > 0) THEN 'wal '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param2 & (256)::bigint) > 0) THEN 'time '::text
> > +            ELSE ''::text
> > +        END) AS flags,
> > +    (((((((
> > +        CASE
> > +            WHEN ((s.param3 & (1)::bigint) > 0) THEN 'shutdown '::text
> > +            ELSE ''::text
> > +        END ||
> > +        CASE
> > +            WHEN ((s.param3 & (2)::bigint) > 0) THEN 'end-of-recovery '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param3 & (4)::bigint) > 0) THEN 'immediate '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param3 & (8)::bigint) > 0) THEN 'force '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param3 & (16)::bigint) > 0) THEN 'flush-all '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param3 & (32)::bigint) > 0) THEN 'wait '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param3 & (128)::bigint) > 0) THEN 'wal '::text
> > +            ELSE ''::text
> > +        END) ||
> > +        CASE
> > +            WHEN ((s.param3 & (256)::bigint) > 0) THEN 'time '::text
> > +            ELSE ''::text
> > +        END) AS next_flags,
> > +    ('0/0'::pg_lsn + (
> > +        CASE
> > +            WHEN (s.param4 < 0) THEN pow((2)::numeric, (64)::numeric)
> > +            ELSE (0)::numeric
> > +        END + (s.param4)::numeric)) AS start_lsn,
> > +    to_timestamp(((946684800)::double precision + ((s.param5)::double precision / (1000000)::double precision)))
ASstart_time, 
> > +        CASE s.param6
> > +            WHEN 1 THEN 'initializing'::text
> > +            WHEN 2 THEN 'getting virtual transaction IDs'::text
> > +            WHEN 3 THEN 'checkpointing replication slots'::text
> > +            WHEN 4 THEN 'checkpointing logical replication snapshot files'::text
> > +            WHEN 5 THEN 'checkpointing logical rewrite mapping files'::text
> > +            WHEN 6 THEN 'checkpointing replication origin'::text
> > +            WHEN 7 THEN 'checkpointing commit log pages'::text
> > +            WHEN 8 THEN 'checkpointing commit time stamp pages'::text
> > +            WHEN 9 THEN 'checkpointing subtransaction pages'::text
> > +            WHEN 10 THEN 'checkpointing multixact pages'::text
> > +            WHEN 11 THEN 'checkpointing predicate lock pages'::text
> > +            WHEN 12 THEN 'checkpointing buffers'::text
> > +            WHEN 13 THEN 'processing file sync requests'::text
> > +            WHEN 14 THEN 'performing two phase checkpoint'::text
> > +            WHEN 15 THEN 'performing post checkpoint cleanup'::text
> > +            WHEN 16 THEN 'invalidating replication slots'::text
> > +            WHEN 17 THEN 'recycling old WAL files'::text
> > +            WHEN 18 THEN 'truncating subtransactions'::text
> > +            WHEN 19 THEN 'finalizing'::text
> > +            ELSE NULL::text
> > +        END AS phase,
> > +    s.param7 AS buffers_total,
> > +    s.param8 AS buffers_processed,
> > +    s.param9 AS buffers_written,
> > +    s.param10 AS files_total,
> > +    s.param11 AS files_synced
> > +   FROM pg_stat_get_progress_info('CHECKPOINT'::text) s(pid, datid, relid, param1, param2, param3, param4, param5,
param6,param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18,
param19,param20); 
> >  pg_stat_progress_cluster| SELECT s.pid,
> >      s.datid,
> >      d.datname,
>
> This view is depressingly complicated. Added up the view definitions for
> the already existing pg_stat_progress* views add up to a measurable part of
> the size of an empty database:
>
> postgres[1160866][1]=# SELECT sum(octet_length(ev_action)), SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE
ev_class::regclass::textLIKE '%progress%'; 
> ┌───────┬───────┐
> │  sum  │  sum  │
> ├───────┼───────┤
> │ 97410 │ 19786 │
> └───────┴───────┘
> (1 row)
>
> and this view looks to be a good bit more complicated than the existing
> pg_stat_progress* views.
>
> Indeed:
> template1[1165473][1]=# SELECT ev_class::regclass, length(ev_action), pg_column_size(ev_action) FROM pg_rewrite WHERE
ev_class::regclass::textLIKE '%progress%' ORDER BY length(ev_action) DESC; 
> ┌───────────────────────────────┬────────┬────────────────┐
> │           ev_class            │ length │ pg_column_size │
> ├───────────────────────────────┼────────┼────────────────┤
> │ pg_stat_progress_checkpoint   │  43290 │           5409 │
> │ pg_stat_progress_create_index │  23293 │           4177 │
> │ pg_stat_progress_cluster      │  18390 │           3704 │
> │ pg_stat_progress_analyze      │  16121 │           3339 │
> │ pg_stat_progress_vacuum       │  16076 │           3392 │
> │ pg_stat_progress_copy         │  15124 │           3080 │
> │ pg_stat_progress_basebackup   │   8406 │           2094 │
> └───────────────────────────────┴────────┴────────────────┘
> (7 rows)
>
> pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664
>
>
> pg_rewrite is the second biggest relation in an empty database already...
>
> template1[1164827][1]=# SELECT relname, pg_total_relation_size(oid) FROM pg_class WHERE relkind = 'r' ORDER BY 2 DESC
LIMIT5; 
> ┌────────────────┬────────────────────────┐
> │    relname     │ pg_total_relation_size │
> ├────────────────┼────────────────────────┤
> │ pg_proc        │                1212416 │
> │ pg_rewrite     │                 745472 │
> │ pg_attribute   │                 704512 │
> │ pg_description │                 630784 │
> │ pg_collation   │                 409600 │
> └────────────────┴────────────────────────┘
> (5 rows)
>
> Greetings,
>
> Andres Freund



pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: better page-level checksums
Next
From: Nitin Jadhav
Date:
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)