Thread: Buffer pool statistics in Explain Analyze
Hi,

I believe it makes sense to add some more detail to the EXPLAIN ANALYZE output, like the number of pages read/written. This would let one understand the workload a query puts on the server, making it easier to tune queries, choose the best indices, etc.

As far as I understand, this patch is rather trivial, as backend/storage/buffer already collects the required information. The only things to do are to add several fields to struct Instrumentation (include/executor/instrument.h), fill them in during the Instr* calls (backend/executor/instrument.c), and update backend/commands/explain.c to report the collected data.

I think this change would split page access statistics by each and every execution node (much like 'actual time' is split now).

Will this patch ever be accepted?

Regards,
Vladimir Sitnikov
"Vladimir Sitnikov" <sitnikov.vladimir@gmail.com> writes:
> I believe it makes sense adding some more details to explain analyze output
> like the number of pages read/written. This will allow one to understand the
> workload the query puts on the server making it easier to tune queries,
> choose the best indices, etc.

Hah, funny you should bring this up -- I was just experimenting here with just such an idea.

The main problem I ran into was that the instrumentation nodes currently are nested. That is, all the time for your children counts against you as well. Is that what we want for I/O costs?

If it is then I think it's fairly simple: have a global set of counters for various I/O events which are zeroed when the executor starts. Every time an instrumentation node starts it notes the starting point for all those counters; whenever it ends it takes the difference and adds that to its personal counts.

If it isn't then things are a bit trickier. We could store a pointer to the "current" instrumentation node in a global variable, but the trick is what to restore when a node finishes. We could keep a stack of instrumentation nodes, but I wonder whether that would really match the executor stack perfectly. I don't see why not, but it seems fragile.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
> The main problem I ran into was that the instrumentation nodes currently are
> nested. That is, all the time for your children counts against you as well. Is
> that what we want for I/O costs?
As for me, I see nothing wrong with such a cost model. I think it is good to know things like "the whole query took 3244 I/Os" and "this join takes 34 I/Os", so individual instrumentation nodes need not try to figure out their intrinsic I/O counters.
Another point is that the "time" and "I/O" metrics should match; I do not see a reason to change the current "actual time" behaviour.
> If it is then I think it's fairly simple, have a global set of counters for
> various I/O events which are zeroed when the executor starts. Every time an
> instrumentation node starts it notes the starting point for all those
> counters, whenever it ends take the difference and add that to its personal
> counts.
I've tried to use ReadBufferCount and friends from storage/buffer/buf_init.c, however they show zeroes for some as-yet-unknown reason. I hope there is no fundamental problem behind it.
One more problem with the current counters in buf_init.c is ResetBufferUsage, which zeroes those counters unpredictably (e.g. if log_executor_stats or log_statement_stats is enabled). I am not sure whether that could interfere with instrumentation.
Regards,
Vladimir Sitnikov
"Vladimir Sitnikov" <sitnikov.vladimir@gmail.com> wrote:
> I've tried to use "ReadBufferCount and friends" from
> storage/buffer/buf_init.c, however it is showing zeroes for some unknown
> yet reason. Hope, there is no fundamental problem behind.

I think those variables are hard to use and unreliable for your purpose. How about adding buffer usage counters to HeapScanDesc and IndexScanDesc? It requires only changes in ReadBufferWithStrategy and lower-layer functions.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Hi all,
Here is a patch that adds "buffer pool statistics" to the EXPLAIN ANALYZE output, revealing the number of buffer pages hit at each and every execution step.
It uses counters from storage/buffer/bufmgr.c (I believe all those counters are relevant for investigating query performance).
Here is the sample output:
create table test as
select i/10 as a, round(random()*10000) as b
from generate_series(1,100000) as x(i)
order by 1;
create index ix_a on test(a);
create index ix_b on test(b);
vacuum analyze test;
explain analyze
select count(*) from test x, test y
where x.a = 5
and y.b = x.b;
Aggregate  (cost=413.88..413.89 rows=1 width=0) (actual time=1.380..1.382 rows=1 loops=1 read_shared=119(111) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
  ->  Nested Loop  (cost=4.35..413.59 rows=118 width=0) (actual time=0.088..1.230 rows=96 loops=1 read_shared=119(111) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
        ->  Index Scan using ix_a on test x  (cost=0.00..8.44 rows=10 width=8) (actual time=0.010..0.028 rows=10 loops=1 read_shared=3(3) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
              Index Cond: (a = 5)
        ->  Bitmap Heap Scan on test y  (cost=4.35..40.38 rows=11 width=8) (actual time=0.034..0.080 rows=10 loops=10 read_shared=116(108) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
              Recheck Cond: (y.b = x.b)
              ->  Bitmap Index Scan on ix_b  (cost=0.00..4.34 rows=11 width=0) (actual time=0.028..0.028 rows=10 loops=10 read_shared=20(12) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
                    Index Cond: (y.b = x.b)
Total runtime: 1.438 ms
read_shared=116(108) for the "Bitmap Heap Scan" means the operation fetched 116 pages into shared buffers, and 108 of those 116 were buffer hits.
Sincerely yours,
Vladimir Sitnikov
Attachment
What did you want done with this patch? It is unlikely we want to see those counters by default, and we have had little demand for them.

---------------------------------------------------------------------------

Vladimir Sitnikov wrote:
> Here is a patch that adds "buffer pool statistics" to the explain analyze
> output revealing the number of buffer pages hit at each and every execution
> step.
>
> It uses counters from storage/buffer/bufmgr.c (I believe all that counters
> are relevant for investigation of query performance).
>
> [ sample session and EXPLAIN ANALYZE output quoted in full upthread ]
>
> read_shared=116(108) for "Bitmap Heap Scan" means the operation fetched 116
> pages into shared buffers and 108 of those 116 were buffer hits.

[ Attachment, skipping... ]

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
> What did you want done with this patch? It is unlikely we want to see
> those counters by default, and we have had little demand for them.

>> Here is a patch that adds "buffer pool statistics" to the explain analyze
>> output revealing the number of buffer pages hit at each and every execution
>> step.

This was already rejected in connection with pg_stat_statements, no?

			regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> What did you want done with this patch? It is unlikely we want to see
>> those counters by default, and we have had little demand for them.
>
>>> Here is a patch that adds "buffer pool statistics" to the explain analyze
>>> output revealing the number of buffer pages hit at each and every execution
>>> step.
>
> This was already rejected in connection with pg_stat_statements, no?

You know, I thought we discussed it as part of that patch, but the author was Vladimir Sitnikov, not the pg_stat_statement author, Alex Hunsaker, so I am just checking.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> What did you want done with this patch? It is unlikely we want to see
> those counters by default, and we have had little demand for them.

If there are two people who need them badly enough to have written a patch for them, that seems to say there is a certain demand for this feature. (I'm not advocating this patch in any way, but surely your "little demand" argument is not entirely true.)

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Thu, Jan 8, 2009 at 17:30, Bruce Momjian <bruce@momjian.us> wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> What did you want done with this patch? It is unlikely we want to see
>>> those counters by default, and we have had little demand for them.
>>
>>>> Here is a patch that adds "buffer pool statistics" to the explain analyze
>>>> output revealing the number of buffer pages hit at each and every execution
>>>> step.
>>
>> This was already rejected in connection with pg_stat_statements, no?
>
> You know, I thought we discussed it as part of that patch, but the
> author was Vladimir Sitnikov, not the pg_stat_statement author, Alex
> Hunsaker, so I am just checking.

I assume you mean Itagaki-san not me :)
"Alex Hunsaker" <badalex@gmail.com> wrote:
>>>> What did you want done with this patch? It is unlikely we want to see
>>>> those counters by default, and we have had little demand for them.
>>>
>>> This was already rejected in connection with pg_stat_statements, no?
>>
>> You know, I thought we discussed it as part of that patch, but the
>> author was Vladimir Sitnikov, not the pg_stat_statement author, Alex
>> Hunsaker, so I am just checking.
>
> I assume you mean Itagaki-san not me :)

I think there are two independent items here:
[1] Should we add those statistics to pg_stat_statements or not?
[2] Should we add those statistics to EXPLAIN ANALYZE or not?

I wanted to have [1] and proposed it, but it was rejected from 8.4. However, the reason is not that we have little demand for it, but that [1] and [2] were mixed in the patch and badly designed.

[2] is requested by Vladimir (not me), but I think it is reasonable. Expert users, including core members ;-), can easily find why a plan node is slow when they have output from EXPLAIN ANALYZE. However, that is not true for beginners. They will be able to find the reason if there is CPU and I/O information for each plan node.

Personally I think both of them are TODO items, but there are some additional requirements for them. [1] should be independent from [2], and ideally implemented only in contrib modules, apart from the core. [2] has at least two requirements:
- Avoid overhead. If checking CPU usage is slow, we had better have only buffer counters.
- Output plans with statistics in a more readable way. It would be better to use XML or tables to show the information.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> I think there are two independent items here:
> [1] Should we add those statistics to pg_stat_statements or not?
> [2] Should we add those statistics to EXPLAIN ANALYZE or not?
>
> I wanted to have [1] and proposed it, but it is rejected from 8.4.
> However, the reason is not because we have little demand for it,
> but [1] and [2] are mixed in the patch and they are bad designed.

No, I think you misunderstood me entirely. The reason that I rejected those parts of the patch is that I think the statistics that are available are wrong/useless. If the bufmgr.c counters were really good for something they'd have been exposed long since (and we'd probably never have built a lot of the other stats collection infrastructure).

The EXPLAIN ANALYZE code you submitted is actually kinda cute, and I'd have had no problem with it if I thought it were displaying numbers that were useful and unlikely to be obsoleted in future releases. The work that needs to be done is on collecting the numbers more than displaying them.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> No, I think you misunderstood me entirely. The reason that I rejected
> those parts of the patch is that I think the statistics that are
> available are wrong/useless. If the bufmgr.c counters were really good
> for something they'd have been exposed long since (and we'd probably
> never have built a lot of the other stats collection infrastructure).

The collective stats across the whole cluster and the individual stats for a specific query broken down by plan node are complementary. Depending on the circumstance people sometimes need each.

I actually also wrote a patch exposing this same data. I think the bufmgr counters are flawed but still useful. Just as an example, think of how often you have to explain why a sequential scan of a small table can be faster than an index scan. Seeing the index scan actually require more logical buffer fetches than a sequential scan would go a long way to clearing up that confusion. Better yet, users would be in a position to see whether the planner is actually estimating i/o costs accurately.

> The EXPLAIN ANALYZE code you submitted is actually kinda cute, and
> I'd have had no problem with it if I thought it were displaying
> numbers that were useful and unlikely to be obsoleted in future
> releases. The work that needs to be done is on collecting the
> numbers more than displaying them.

I agree that we need more data -- my favourite direction is to use a programmatic interface to dtrace to find out how many buffer reads are satisfied from filesystem cache and how many from physical reads. But doing that wouldn't obviate the need for these stats; it would enhance them. You would get a clear view of how many buffer fetches were satisfied from shared buffers, filesystem cache, and physical reads.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!