Re: Use pgBufferUsage for block reporting in analyze - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Use pgBufferUsage for block reporting in analyze
Date:
Msg-id: CAD21AoAqmKpPZHZTibLk08XWcec61TZ3eJoWymK8kSA94ffuHA@mail.gmail.com
In response to: Re: Use pgBufferUsage for block reporting in analyze (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>)
Responses: Re: Use pgBufferUsage for block reporting in analyze
List: pgsql-hackers
Hi,

On Mon, Jul 29, 2024 at 12:12 AM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
>
> On Sat, Jul 27, 2024 at 12:35 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > An alternative idea would be to pass StringInfo to
> > AcquireSampleRowsFunc() so that the callback can write its messages
> > there. This is somewhat similar to what we do in the EXPLAIN command
> > (cf. ExplainPropertyText() etc.). It could be too much, but I think
> > it could be better than writing logs in a single format.

I have one comment on the 0001 patch:

     /*
      * Calculate the difference in the Page Hit/Miss/Dirty that
      * happened as part of the analyze by subtracting out the
      * pre-analyze values which we saved above.
      */
-    AnalyzePageHit = VacuumPageHit - AnalyzePageHit;
-    AnalyzePageMiss = VacuumPageMiss - AnalyzePageMiss;
-    AnalyzePageDirty = VacuumPageDirty - AnalyzePageDirty;
+    memset(&bufferusage, 0, sizeof(BufferUsage));
+    BufferUsageAccumDiff(&bufferusage, &pgBufferUsage, &startbufferusage);
+
+    total_blks_hit = bufferusage.shared_blks_hit +
+                     bufferusage.local_blks_hit;
+    total_blks_read = bufferusage.shared_blks_read +
+                      bufferusage.local_blks_read;
+    total_blks_dirtied = bufferusage.shared_blks_dirtied +
+                         bufferusage.local_blks_dirtied;

The comment should also be updated or removed.

And here are some comments on the 0002 patch:

-    TimestampDifference(starttime, endtime, &secs_dur, &usecs_dur);
+    delay_in_ms = TimestampDifferenceMilliseconds(starttime, endtime);

I think this change is to make the vacuum code consistent with the
analyze code, particularly the following part:

    /*
     * We do not expect an analyze to take > 25 days and it simplifies
     * things a bit to use TimestampDifferenceMilliseconds.
     */
    delay_in_ms = TimestampDifferenceMilliseconds(starttime, endtime);

However, as the above comment says, delay_in_ms can only represent a
duration of up to 25 days.
I guess it would not be a problem for analyze cases, but it could be in
vacuum cases, as vacuum can sometimes run for a very long time. I've
seen vacuums running for almost 1 month. So I think we should keep this
part.

---

     /* measure elapsed time iff autovacuum logging requires it */
-    if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
+    if (instrument)

The comment should also be updated.

---

Could you split the 0002 patch into two patches? One would have the
ANALYZE command (with the VERBOSE option) write the buffer usage, and
the second would add WAL usage to both the ANALYZE command and
autoanalyze. I think adding WAL usage to ANALYZE could be
controversial, as it should not be a WAL-intensive operation, so I'd
like to keep them separate.

> I've tested this approach, it definitely looks better. I've added a
> logbuf StringInfo to AcquireSampleRowsFunc which will receive the
> logs. elevel was removed as it is not used anymore. Since everything
> is in the same log line, I've removed the relation name in the acquire
> sample functions.
>
> For partitioned tables, I've also added the processed partition table
> being sampled. The output will look like:
>
> INFO:  analyze of table "postgres.public.test_partition"
> Sampling rows from child "public.test_partition_1"
> pages: 5 of 5 scanned
> tuples: 999 live tuples, 0 are dead; 999 tuples in sample, 999
> estimated total tuples
> Sampling rows from child "public.test_partition_2"
> pages: 5 of 5 scanned
> tuples: 1000 live tuples, 0 are dead; 1000 tuples in sample, 1000
> estimated total tuples
> avg read rate: 2.604 MB/s, avg write rate: 0.000 MB/s
> ...
>
> For a file_fdw, the output will be:
>
> INFO:  analyze of table "postgres.public.pglog"
> tuples: 60043 tuples; 30000 tuples in sample
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> ...

Thank you for updating the patch.
With your patch, I got the following logs when executing ANALYZE
VERBOSE on a partitioned table:

postgres(1:3971560)=# analyze (verbose) p;
INFO:  analyzing "public.p" inheritance tree
INFO:  analyze of table "postgres.public.p"
Sampling rows from child "public.c1"
pages: 10000 of 14750 scanned
tuples: 2259833 live tuples, 0 are dead; 10000 tuples in sample,
3333254 estimated total tuples
Sampling rows from child "public.c2"
pages: 10000 of 14750 scanned
tuples: 2260000 live tuples, 0 are dead; 10000 tuples in sample,
3333500 estimated total tuples
Sampling rows from child "public.c3"
pages: 10000 of 14750 scanned
tuples: 2259833 live tuples, 0 are dead; 10000 tuples in sample,
3333254 estimated total tuples
avg read rate: 335.184 MB/s, avg write rate: 0.031 MB/s
buffer usage: 8249 hits, 21795 reads, 2 dirtied
WAL usage: 6 records, 1 full page images, 8825 bytes
system usage: CPU: user: 0.46 s, system: 0.03 s, elapsed: 0.50 s
:

Whereas the current log messages look like this:

INFO:  analyzing "public.p" inheritance tree
INFO:  "c1": scanned 10000 of 14750 pages, containing 2259833 live
rows and 0 dead rows; 10000 rows in sample, 3333254 estimated total
rows
INFO:  "c2": scanned 10000 of 14750 pages, containing 2259834 live
rows and 0 dead rows; 10000 rows in sample, 3333255 estimated total
rows
INFO:  "c3": scanned 10000 of 14750 pages, containing 2259833 live
rows and 0 dead rows; 10000 rows in sample, 3333254 estimated total
rows
:

It seems to me that the current style is more concise and readable (3
rows per table with the patch vs. 1 row per table currently). We might
need to consider a better output format for partitioned tables, as the
number of partitions could be high. I don't have a good idea right
now, though.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com