Thread: Showing I/O timings spent reading/writing temp buffers in EXPLAIN
Hi all,

It's been reported on pgsql-bugs[1] that the I/O timings shown in EXPLAIN don't include the time spent on temp files. I don't think it's a bug, but it could be an item for PG15. As mentioned on that thread, this would be useful in cases where temp buffer I/O accounts for most of the execution time. So I've written a patch for that. Please note that the patch only adds temp buffer I/O timing to EXPLAIN; it does not yet include other possibly related changes such as pg_stat_statements improvements.

Before (w/o patch):

postgres(1:14101)=# explain (analyze, buffers) select count(*) from generate_series(1,100000);
                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1250.00..1250.01 rows=1 width=8) (actual time=59.025..59.026 rows=1 loops=1)
   Buffers: temp read=171 written=171
   ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000 width=0) (actual time=21.695..45.524 rows=100000 loops=1)
         Buffers: temp read=171 written=171
 Planning Time: 0.041 ms
 Execution Time: 70.867 ms
(6 rows)

After (w/ patch):

postgres(1:28754)=# explain (analyze, buffers) select count(*) from generate_series(1,100000);
                                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1250.00..1250.01 rows=1 width=8) (actual time=56.189..56.190 rows=1 loops=1)
   Buffers: temp read=171 written=171
   I/O Timings: temp read=0.487 write=2.073
   ->  Function Scan on generate_series  (cost=0.00..1000.00 rows=100000 width=0) (actual time=21.072..42.886 rows=100000 loops=1)
         Buffers: temp read=171 written=171
         I/O Timings: temp read=0.487 write=2.073
 Planning Time: 0.041 ms
 Execution Time: 59.928 ms
(8 rows)

Feedback is very welcome.

Regards,

[1] https://www.postgresql.org/message-id/20210709084355.GA6251%40depesz.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachment
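For readers skimming the thread, the kind of instrumentation described above can be sketched roughly as follows for the BufFile read path. This is only an illustrative fragment, assuming BufferUsage gains counters named temp_blk_read_time/temp_blk_write_time; the attached patch may differ in detail.

/* Sketch: timing a temp-file block read, guarded by track_io_timing */
instr_time  io_start, io_time;

if (track_io_timing)
    INSTR_TIME_SET_CURRENT(io_start);

/* read the next block of the temp file into the BufFile's buffer */
file->nbytes = FileRead(thisfile, file->buffer.data, sizeof(file->buffer),
                        file->curOffset, WAIT_EVENT_BUFFILE_READ);

if (track_io_timing)
{
    INSTR_TIME_SET_CURRENT(io_time);
    INSTR_TIME_SUBTRACT(io_time, io_start);
    INSTR_TIME_ADD(pgBufferUsage.temp_blk_read_time, io_time);
}

if (file->nbytes > 0)
    pgBufferUsage.temp_blks_read++;

The write path (BufFileDumpBuffer) would accumulate into a matching temp_blk_write_time counter in the same way.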
On Thu, Aug 19, 2021 at 09:21, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Hi all,
>
> It's been reported on pgsql-bugs[1] that the I/O timings shown in EXPLAIN
> don't include the time spent on temp files. [...]
>
> After (w/ patch):
> [...]
>    Buffers: temp read=171 written=171
>    I/O Timings: temp read=0.487 write=2.073
> [...]
>
> Feedback is very welcome.
The presentation seems a little confusing; wouldn't this be better?
I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073
I think these lines can be removed:
+        if (has_temp_timing)
+            appendStringInfoChar(es->str, ',');
regards,
Ranier Vilela
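For context on the comma being discussed, a rough sketch (not the actual patch) of how the text-format output in EXPLAIN's show_buffer_usage() could assemble the line. The helper flags has_timing ("any shared/local timing present") and has_temp_timing are assumed names; the committed code may differ.

/* Sketch: building the "I/O Timings:" line in EXPLAIN's text format */
if (has_timing || has_temp_timing)
{
    ExplainIndentText(es);
    appendStringInfoString(es->str, "I/O Timings:");
    if (has_timing)
    {
        appendStringInfoString(es->str, " shared/local");
        if (!INSTR_TIME_IS_ZERO(usage->blk_read_time))
            appendStringInfo(es->str, " read=%0.3f",
                             INSTR_TIME_GET_MILLISEC(usage->blk_read_time));
        if (!INSTR_TIME_IS_ZERO(usage->blk_write_time))
            appendStringInfo(es->str, " write=%0.3f",
                             INSTR_TIME_GET_MILLISEC(usage->blk_write_time));
        if (has_temp_timing)
            appendStringInfoChar(es->str, ',');   /* separates the two groups */
    }
    if (has_temp_timing)
    {
        appendStringInfoString(es->str, " temp");
        if (!INSTR_TIME_IS_ZERO(usage->temp_blk_read_time))
            appendStringInfo(es->str, " read=%0.3f",
                             INSTR_TIME_GET_MILLISEC(usage->temp_blk_read_time));
        if (!INSTR_TIME_IS_ZERO(usage->temp_blk_write_time))
            appendStringInfo(es->str, " write=%0.3f",
                             INSTR_TIME_GET_MILLISEC(usage->temp_blk_write_time));
    }
    appendStringInfoChar(es->str, '\n');
}

The comma is only emitted when both groups are present, which is the point of the if (has_temp_timing) guard mentioned above.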
On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> On Thu, Aug 19, 2021 at 09:21, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> [...]
>>
>> Feedback is very welcome.
>

Thank you for the comments!

> The presentation seems a little confusing; wouldn't this be better?
>
> I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073

Yeah, it looks better to add "shared/local".

> I think these lines can be removed:
> +        if (has_temp_timing)
> +            appendStringInfoChar(es->str, ',');

But I think it's consistent with the buffer statistics in the EXPLAIN output. For example, the "Buffers" line in the EXPLAIN output can look like:

Buffers: shared hit=1398, temp read=526 written=526

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
> >
> > [...]
> > The presentation seems a little confusing; wouldn't this be better?
> >
> > I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073
>
> Yeah, it looks better to add "shared/local".

I've attached the updated patch that incorporates the above comment.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachment
On Tue, Aug 24, 2021 at 03:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Aug 23, 2021 at 10:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
> >
> > On Thu, Aug 19, 2021 at 09:21, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >> [...]
> >
>
> Thank you for the comments!
>
> > The presentation seems a little confusing; wouldn't this be better?
> >
> > I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073
>
> Yeah, it looks better to add "shared/local".
I've attached the updated patch that incorporates the above comment.
The patch looks fine to me.
regards,
Ranier Vilela
Hi,

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, August 24th, 2021 at 13:20, Ranier Vilela <ranier.vf@gmail.com> wrote:

> On Tue, Aug 24, 2021 at 03:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > [...]
> > I've attached the updated patch that incorporates the above comment.
>
> The patch looks fine to me.

The patch looks good to me too. However, I do wonder why the timing is added only in the

if (es->format == EXPLAIN_FORMAT_TEXT)

block and is not added when, for example, the format is json. The instrumentation has clearly recorded the timings regardless of the output format.

Also, it might be worthwhile to consider adding some regression tests. To my understanding, explain.sql provides a function, explain_filter, which helps create a stable result. For example, such a test case could be:

set track_io_timing = 'on';
select explain_filter('explain (analyze, buffers) select count(*) from generate_series(1,100000)');

then it would be enough to verify that the line:

I/O Timings: temp read=N.N write=N.N

is present. The above would apply to the json output via `explain_filter_to_json`, of course.

Thoughts?

Cheers,
//Georgios
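For the non-text formats mentioned here, EXPLAIN's structured output usually goes through ExplainPropertyFloat. A rough sketch of what adding the temp timings there could look like, assuming the same temp_blk_* counters; the property labels are hypothetical and may not match the committed code:

/* Sketch: emitting I/O timings for JSON/XML/YAML as well as text */
if (track_io_timing)
{
    ExplainPropertyFloat("I/O Read Time", "ms",
                         INSTR_TIME_GET_MILLISEC(usage->blk_read_time),
                         3, es);
    ExplainPropertyFloat("I/O Write Time", "ms",
                         INSTR_TIME_GET_MILLISEC(usage->blk_write_time),
                         3, es);
    /* hypothetical labels for the new temp-file counters */
    ExplainPropertyFloat("Temp I/O Read Time", "ms",
                         INSTR_TIME_GET_MILLISEC(usage->temp_blk_read_time),
                         3, es);
    ExplainPropertyFloat("Temp I/O Write Time", "ms",
                         INSTR_TIME_GET_MILLISEC(usage->temp_blk_write_time),
                         3, es);
}

Because the structured formats always emit every property, even when zero, their output is also the easier one to test in a stable way, which comes up later in the thread.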
On Tue, Sep 7, 2021 at 12:02 AM <gkokolatos@pm.me> wrote:
>
> [...]
>

Thank you for the comments!

> The patch looks good to me too. However, I do wonder why the timing is added only in the
>
> if (es->format == EXPLAIN_FORMAT_TEXT)
>
> block and is not added when, for example, the format is json. The instrumentation has
> clearly recorded the timings regardless of the output format.

Good point. Fixed.

> Also, it might be worthwhile to consider adding some regression tests. To my
> understanding, explain.sql provides a function, explain_filter, which helps create
> a stable result. [...]

Agreed. I've added regression tests.

I've attached an updated patch. Please review it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachment
On Sun, Aug 22, 2021 at 9:47 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Aug 19, 2021 at 10:52 PM Ranier Vilela <ranier.vf@gmail.com> wrote:
> >
> > The presentation seems a little confusing; wouldn't this be better?
> >
> > I/O Timings: shared/local read=xxxx write=xxx temp read=0.487 write=2.073
>
> Yeah, it looks better to add "shared/local".

Using the patch, I do feel like the EXPLAIN format of

shared/local xxx, temp xxx

is a bit confusing. If temp is going to be its own EXPLAIN I/O timing category (as opposed to being summed with relation data block I/O from local and shared buffers), then it seems like local and shared should be separated as well:

shared xxx, local xxx, temp xxx

With your patch applied, below is the top of the EXPLAIN output for a query joining a temporary table (so using local buffers) to a regular table (using shared buffers) and spilling the hash join (temp files).

Aggregate (actual rows=1 loops=1)
  Buffers: shared read=4425, local read=4425 dirtied=4425 written=4423, temp read=5963 written=5963
  I/O Timings: shared/local read=23.546, temp read=13.309 write=74.198

I found that using the same terminology as the EXPLAIN BUFFERS output but not using the same categories was kind of confusing.

If it is only meaningful to distinguish between relation data I/O and query temp file I/O, then maybe the words used in I/O Timings in EXPLAIN should be "rel data" and "temp" or something like that.

- Melanie
Hi,

On Tue, Nov 16, 2021 at 04:37:44PM +0900, Masahiko Sawada wrote:
>
> I've attached an updated patch. Please review it.

It seems that the regression tests aren't entirely stable, per cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.

The failures look like:

diff -U3 /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out
--- /tmp/cirrus-ci-build/src/test/recovery/../regress/expected/explain.out 2022-01-19 03:50:37.087931908 +0000
+++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/explain.out 2022-01-19 03:57:41.013616137 +0000
@@ -512,9 +512,10 @@
          I/O Timings: temp read=N.N write=N.N
    ->  Function Scan on generate_series (cost=N.N..N.N rows=N width=N) (actual time=N.N..N.N rows=N loops=N)
          I/O Timings: temp read=N.N write=N.N
+   I/O Timings: shared/local read=N.N
  Planning Time: N.N ms
  Execution Time: N.N ms
-(6 rows)
+(7 rows)

 select explain_filter('explain (analyze, buffers, format json) select count(*) from generate_series(1, 100000)');
 explain_filter

I don't see any obvious error in the code, so I'm wondering if it's simply the result of populating the cache with generate_series() info.
Hi,

Sorry for the late reply.

On Fri, Nov 19, 2021 at 7:24 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
>
> Using the patch, I do feel like the EXPLAIN format of
>
> shared/local xxx, temp xxx
>
> is a bit confusing. If temp is going to be its own EXPLAIN I/O timing
> category (as opposed to being summed with relation data block I/O from
> local and shared buffers), then it seems like local and shared should
> be separated as well:
>
> shared xxx, local xxx, temp xxx

I think the I/O timing shown as shared/local is the time spent on disk I/O, so it doesn't matter whether that disk I/O was done for shared buffers or for local buffers.

> With your patch applied, below is the top of the EXPLAIN output for a
> query joining a temporary table (so using local buffers) to a regular
> table (using shared buffers) and spilling the hash join (temp files).
>
> Aggregate (actual rows=1 loops=1)
>   Buffers: shared read=4425, local read=4425 dirtied=4425 written=4423, temp read=5963 written=5963
>   I/O Timings: shared/local read=23.546, temp read=13.309 write=74.198
>
> I found that using the same terminology as the EXPLAIN BUFFERS output
> but not using the same categories was kind of confusing.
>
> If it is only meaningful to distinguish between relation data I/O and
> query temp file I/O, then maybe the words used in I/O Timings in EXPLAIN
> should be "rel data" and "temp" or something like that.

But if we did that, we would end up using different terminology in "I/O Timings" and "Buffers". I think it's better to keep the words used in them consistent.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
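For reference, the existing shared/local counter is driven by a pattern like the following around the physical read in bufmgr.c (paraphrased from memory, not verbatim source); the same blk_read_time counter is bumped whether the page is destined for a shared or a local buffer, which is the point being made above:

/* Sketch: the track_io_timing pattern around a data-file block read */
instr_time  io_start, io_time;

if (track_io_timing)
    INSTR_TIME_SET_CURRENT(io_start);

/* physically read the block from the relation's data file */
smgrread(smgr, forkNum, blockNum, (char *) bufBlock);

if (track_io_timing)
{
    INSTR_TIME_SET_CURRENT(io_time);
    INSTR_TIME_SUBTRACT(io_time, io_start);
    pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
    /* single counter for relation data I/O, shared and local buffers alike */
    INSTR_TIME_ADD(pgBufferUsage.blk_read_time, io_time);
}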
On Wed, Jan 19, 2022 at 5:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> It seems that the regression tests aren't entirely stable, per cfbot:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/36/3298.
>
> The failures look like:
>
> [...]
> +   I/O Timings: shared/local read=N.N
>   Planning Time: N.N ms
>   Execution Time: N.N ms
> -(6 rows)
> +(7 rows)
>
> I don't see any obvious error in the code, so I'm wondering if it's simply
> the result of populating the cache with generate_series() info.

Thank you for reporting.

You're right, the regression test with track_io_timing = on is not entirely stable, because we may read catalog data during planning time, resulting in an additional line in the EXPLAIN output. I've removed the regression tests.

I've attached updated patches. I've included pg_stat_statements support for temp I/O timing.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachment
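For the pg_stat_statements side, the change can be sketched as follows, assuming the BufferUsage counters discussed above; the field and function names mirror the existing blk_read_time/blk_write_time handling and may differ slightly from the actual patch:

/* Sketch: carrying temp I/O timings through pg_stat_statements' Counters */
typedef struct Counters
{
    /* ... existing fields ... */
    double      blk_read_time;        /* time spent reading data file blocks, in msec */
    double      blk_write_time;       /* time spent writing data file blocks, in msec */
    double      temp_blk_read_time;   /* time spent reading temp file blocks, in msec */
    double      temp_blk_write_time;  /* time spent writing temp file blocks, in msec */
} Counters;

/* In pgss_store(), adding this query's BufferUsage to the stored entry */
e->counters.temp_blk_read_time +=
    INSTR_TIME_GET_MILLISEC(bufusage->temp_blk_read_time);
e->counters.temp_blk_write_time +=
    INSTR_TIME_GET_MILLISEC(bufusage->temp_blk_write_time);

The two new columns would then be exposed through the pg_stat_statements view in the same way as the existing timing columns.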
Hi,

On Tue, Mar 01, 2022 at 11:35:32AM +0900, Masahiko Sawada wrote:
> On Wed, Jan 19, 2022 at 5:52 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > [...]
> > I don't see any obvious error in the code, so I'm wondering if it's simply
> > the result of populating the cache with generate_series() info.
>
> Thank you for reporting.
>
> You're right, the regression test with track_io_timing = on is not
> entirely stable, because we may read catalog data during planning time,
> resulting in an additional line in the EXPLAIN output. I've removed
> the regression tests.

Hmm, but AFAICS the json format would be stable, as the counters are always shown even if zero. So doing the json format first and then the text format should also work. Although if you're really unlucky there could be a cache invalidation in between, so we could just ignore the text format. But I think we should at least keep a regression test with the json format, with a comment explaining why only this one is tested.

> I've attached updated patches. I've included pg_stat_statements support
> for temp I/O timing.

Great!

Some other considerations:

- should we update the pg_stat_statements documentation (and comments) for blk_(read|write)_time to mention that it's for *data file* blocks rather than simply blocks? It seems obvious now that we'd have temp_blk_(read|write)_time, but still. This part should probably be backpatched.

- not really your patch's fault I guess, but I see that extendBufFile() isn't handled. There shouldn't be much activity there so maybe it's ok. This is likely because smgr_extend is also not handled, but that one seems much more likely to take quite some time, and therefore should bump the timing counters.
On Mon, Apr 4, 2022 at 1:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hmm, but AFAICS the json format would be stable, as the counters are always
> shown even if zero. So doing the json format first and then the text
> format should also work. Although if you're really unlucky there could be a
> cache invalidation in between, so we could just ignore the text format. But I
> think we should at least keep a regression test with the json format, with a
> comment explaining why only this one is tested.

Fair point. Commit 7e12256b478 disabled track_io_timing there, but we can probably enable it temporarily and run one query with the "buffers" and "format json" options.

> - should we update the pg_stat_statements documentation (and comments) for
>   blk_(read|write)_time to mention that it's for *data file* blocks rather than
>   simply blocks? It seems obvious now that we'd have
>   temp_blk_(read|write)_time, but still. This part should probably be
>   backpatched.

Agreed.

> - not really your patch's fault I guess, but I see that extendBufFile() isn't
>   handled. There shouldn't be much activity there so maybe it's ok.
>   This is likely because smgr_extend is also not handled, but that one seems
>   much more likely to take quite some time, and therefore should bump the
>   timing counters.

You mean we should include the time spent opening files as write time? IIUC, smgrextend() writes data while extending a file, whereas extendBufFile() doesn't do that but just opens a new file.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Apr 05, 2022 at 12:51:12AM +0900, Masahiko Sawada wrote:
> On Mon, Apr 4, 2022 at 1:30 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > [...] But I think we should at least keep a regression test with the json
> > format, with a comment explaining why only this one is tested.
>
> Fair point. Commit 7e12256b478 disabled track_io_timing there, but we can
> probably enable it temporarily and run one query with the "buffers" and
> "format json" options.

Yes, enabling it for just this query. It can't really find any problem with the values themselves, but at least the new code path would be partially executed.

> > - not really your patch's fault I guess, but I see that extendBufFile() isn't
> >   handled. [...]
>
> You mean we should include the time spent opening files as write time?

Yes. In normal circumstances it shouldn't take a lot of time to do that, but I'm not so sure with e.g. network filesystems. I'm not strongly in favor of counting it, especially since smgrextend doesn't either.

> IIUC, smgrextend() writes data while extending a file, whereas
> extendBufFile() doesn't do that but just opens a new file.

Note that smgrextend can also call register_dirty_segment(), which can also take some time, so we can't just assume that all the time there is I/O time.
On Tue, Apr 5, 2022 at 1:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Apr 05, 2022 at 12:51:12AM +0900, Masahiko Sawada wrote:
> >
> > You mean we should include the time spent opening files as write time?
>
> Yes. In normal circumstances it shouldn't take a lot of time to do that, but
> I'm not so sure with e.g. network filesystems. I'm not strongly in favor of
> counting it, especially since smgrextend doesn't either.

Good point. I think that adding new places to track I/O timing can be a separate patch, so we can probably work on it for PG16 or later.

I've attached updated patches; please review them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachment
On Tue, Apr 05, 2022 at 10:40:04AM +0900, Masahiko Sawada wrote:
> On Tue, Apr 5, 2022 at 1:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Yes. In normal circumstances it shouldn't take a lot of time to do that, but
> > I'm not so sure with e.g. network filesystems. I'm not strongly in favor of
> > counting it, especially since smgrextend doesn't either.
>
> Good point. I think that adding new places to track I/O timing can be
> a separate patch, so we can probably work on it for PG16 or later.

Agreed.

> I've attached updated patches; please review them.

It looks good to me, just one minor thing in 002:

@@ -183,8 +184,10 @@ typedef struct Counters
     int64       local_blks_written; /* # of local disk blocks written */
     int64       temp_blks_read;     /* # of temp blocks read */
     int64       temp_blks_written;  /* # of temp blocks written */
-    double      blk_read_time;      /* time spent reading, in msec */
-    double      blk_write_time;     /* time spent writing, in msec */
+    double      blk_read_time;      /* time spent reading blocks, in msec */
+    double      blk_write_time;     /* time spent writing blocks, in msec */
+    double      temp_blk_read_time; /* time spent reading temp blocks, in msec */
+    double      temp_blk_write_time;    /* time spent writing temp blocks, in msec */

maybe the comments should respectively say data file blocks and temp file blocks.

This is a minor detail and the rest of the patch looks good to me, so I'm marking the patch as Ready for Committer!
On Tue, Apr 05, 2022 at 11:57:14AM +0800, Julien Rouhaud wrote:
> This is a minor detail and the rest of the patch looks good to me, so I'm
> marking the patch as Ready for Committer!

@@ -440,10 +442,14 @@ BufFileLoadBuffer(BufFile *file)
+    if (track_io_timing)
+        INSTR_TIME_SET_CURRENT(io_start);

In places where we don't have clock_gettime(), this means using gettimeofday(). I would not underestimate the performance impact of such a change, even if track_io_timing is already known to have a certain overhead on some platforms.
--
Michael
Attachment
On Thu, Apr 07, 2022 at 03:58:46PM +0900, Michael Paquier wrote:
>
> @@ -440,10 +442,14 @@ BufFileLoadBuffer(BufFile *file)
> +    if (track_io_timing)
> +        INSTR_TIME_SET_CURRENT(io_start);
>
> In places where we don't have clock_gettime(), this means using
> gettimeofday(). I would not underestimate the performance impact of
> such a change, even if track_io_timing is already known to have a
> certain overhead on some platforms.

Sure, but gettimeofday() has been implemented in the vDSO for quite some time on most platforms, so it shouldn't hurt that much on mainstream platforms, especially compared to the cost of whatever operation is actually using that temporary file.

I don't think that having an extra GUC for temp I/O is sensible, if that's what you're suggesting. Or are you just asking for some benchmarking on a platform where getting the time is known to be slow (Windows?)?
On Thu, Apr 07, 2022 at 03:14:01PM +0800, Julien Rouhaud wrote:
> Sure, but gettimeofday() has been implemented in the vDSO for quite some time on
> most platforms, so it shouldn't hurt that much on mainstream platforms,
> especially compared to the cost of whatever operation is actually using that
> temporary file.
>
> I don't think that having an extra GUC for temp I/O is sensible, if that's what
> you're suggesting. Or are you just asking for some benchmarking on a platform
> where getting the time is known to be slow (Windows?)?

I am asking about the latter, but the former could be one solution if the latter proves to be a problem, and this has not been discussed on the thread yet. So, with some kind of worst-case scenario, how much worse does the performance get once you add those extra calls, compared to HEAD? I think that we'd better be careful with any additions of INSTR_TIME_SET_CURRENT().
--
Michael
Attachment
On Thu, Apr 07, 2022 at 04:24:54PM +0900, Michael Paquier wrote:
>
> I am asking about the latter, but the former could be one solution if
> the latter proves to be a problem, and this has not been discussed on
> the thread yet. So, with some kind of worst-case scenario, how much
> worse does the performance get once you add those extra calls, compared
> to HEAD? I think that we'd better be careful with any additions of
> INSTR_TIME_SET_CURRENT().

I just did a quick test on my linux box, using this data:

CREATE TABLE tt AS select generate_series(1, 10000) id;
VACUUM ANALYZE tt;

and this scenario:

SET work_mem TO '64kB';
SELECT count(*) FROM (SELECT id FROM tt ORDER BY id) s;

which yields this plan:

                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1349.39..1349.40 rows=1 width=8) (actual time=5.417..5.417 rows=1 loops=1)
   ->  Sort  (cost=1199.39..1224.39 rows=10000 width=4) (actual time=2.910..4.422 rows=10000 loops=1)
         Sort Key: tt.id
         Sort Method: external merge  Disk: 144kB
         ->  Seq Scan on tt  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.008..1.239 rows=10000 loops=1)
 Planning Time: 0.405 ms
 Execution Time: 5.524 ms

So maybe not the worst that could be tested, but probably bad enough for this patch.

I ran that with pgbench, 4 clients (I have 4 cores) for 30 seconds, 3 times.

Comparing master and this patch with track_io_timing activated, I see a 0.95% overhead, with a 2.6% noise level.
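As a complement to this kind of end-to-end benchmark, a tiny standalone micro-benchmark (not part of the patch, purely illustrative) can give a rough idea of the raw cost of the clock read that INSTR_TIME_SET_CURRENT() boils down to on POSIX systems:

/* Standalone sketch: rough per-call cost of clock_gettime() on this machine */
#include <stdio.h>
#include <time.h>

int
main(void)
{
    const long  iters = 10 * 1000 * 1000;
    struct timespec start, end, tmp;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        clock_gettime(CLOCK_MONOTONIC, &tmp);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) +
                     (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.1f ns per clock_gettime() call\n", elapsed / iters * 1e9);
    return 0;
}

On a platform with a vDSO-backed clock this is typically in the tens of nanoseconds, which is why the observed overhead stays within the noise of a sort that actually spills to disk.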
On Thu, Apr 7, 2022 at 4:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> I just did a quick test on my linux box, using this data:
>
> CREATE TABLE tt AS select generate_series(1, 10000) id;
> VACUUM ANALYZE tt;
>
> and this scenario:
>
> SET work_mem TO '64kB';
> SELECT count(*) FROM (SELECT id FROM tt ORDER BY id) s;
>
> [...]
>
> I ran that with pgbench, 4 clients (I have 4 cores) for 30 seconds, 3 times.
>
> Comparing master and this patch with track_io_timing activated, I see a 0.95%
> overhead, with a 2.6% noise level.

I've done the same test with a larger data set (10M tuples) on my machine (macOS):

HEAD: 5418.869 ms
Patched: 5367.234 ms

I can see about 1% overhead.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Apr 7, 2022 at 5:52 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've done the same test with a larger data set (10M tuples) on my
> machine (macOS):
>
> HEAD: 5418.869 ms
> Patched: 5367.234 ms

Oops, the results are the other way around:

HEAD: 5367.234 ms
Patched: 5418.869 ms

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Apr 07, 2022 at 09:48:02PM +0900, Masahiko Sawada wrote:
> Oops, the results are the other way around:
>
> HEAD: 5367.234 ms
> Patched: 5418.869 ms

I have been playing with external sorts & friends after running an instance on scissors (fsync=off, PGDATA on tmpfs, etc.), even forcing a compilation of the code with gettimeofday(), and I can see a tendency of a ~1% impact with the patch in this configuration. So there is a tendency, while this is also rather close to the usual noise range. Anyway, this is going to be helpful for debugging when temp file I/O is a bottleneck, so..

Applied 0001 with some tweaks to the docs and some indentation fixes. Now, 0002 is straightforward, but I need more coffee and lunch..
--
Michael
Attachment
On Fri, Apr 08, 2022 at 11:34:17AM +0900, Michael Paquier wrote:
> Now, 0002 is straightforward, but I need more coffee and lunch..

Done this one as well, as of 76cbf7e, with a few tweaks. 1.9 and 1.10 changed the definition of pg_stat_statements, so I have added two extra queries for those upgrade paths in oldextversions.sql.
--
Michael
Attachment
On Fri, Apr 8, 2022 at 1:37 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> Done this one as well, as of 76cbf7e, with a few tweaks. 1.9 and 1.10
> changed the definition of pg_stat_statements, so I have added two
> extra queries for those upgrade paths in oldextversions.sql.

Thank you for committing both patches. I agree with these changes.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/