Thread: Buffer pool statistics in Explain Analyze

Buffer pool statistics in Explain Analyze

From
"Vladimir Sitnikov"
Date:
Hi,

I believe it makes sense adding some more details to the explain analyze
output, like the number of pages read/written. This will allow one to
understand the workload the query puts on the server, making it easier
to tune queries, choose the best indices, etc.

As far as I understand, this patch is rather trivial, as
backend\storage\buffer collects the required information. The only thing
needed is to add several fields to the struct Instrumentation
(include\executor\instrument.h), fill them in during the Instr* calls
(in backend\executor\instrument.c), and update backend\commands\explain.c
to report the collected data.

I think this change is going to split page access statistics by each and
every execution node (much like 'actual time' is split now).

Will this patch ever be accepted?

Regards,
Vladimir Sitnikov

Re: Buffer pool statistics in Explain Analyze

From
Gregory Stark
Date:
"Vladimir Sitnikov" <sitnikov.vladimir@gmail.com> writes:

> Hi,
>
> I believe it makes sense adding some more details to explain analyze output
> like the number of pages read/written. This will allow one to understand the
> workload the query puts on the server making it easier to tune queries,
> choose the best indices, etc.

Hah, funny you should bring this up -- I was just experimenting here with just
such an idea.

The main problem I ran into was that the instrumentation nodes currently are
nested. That is, all the time for your children counts against you as well. Is
that what we want for I/O costs?

If it is then I think it's fairly simple, have a global set of counters for
various I/O events which are zeroed when the executor starts. Every time an
instrumentation node starts it notes the starting point for all those
counters, whenever it ends take the difference and add that to its personal
counts.

If it isn't then things are a bit trickier. We could store a pointer to the
"current" instrumentation node in a global variable but the trick is what to
restore when a node finishes. We could keep a stack of instrumentation nodes
but I wonder if that would really match the executor stack perfectly. I don't
see why not but it seems fragile.
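The simpler nested scheme described above can be sketched roughly as follows. This is a hedged mock-up, not the actual PostgreSQL code: the globals stand in for bufmgr.c's counters, and the field names added to the Instrumentation struct are illustrative assumptions, not the real include/executor/instrument.h definitions.

```c
#include <assert.h>

/* Mock stand-ins for the bufmgr.c globals (ReadBufferCount and friends);
 * a real patch would use the existing counters. */
static long MockReadBufferCount = 0;
static long MockBufferHitCount  = 0;

/* Hypothetical extra fields for struct Instrumentation: a snapshot of
 * the global counters taken when the node starts, plus per-node totals
 * accumulated when it stops. */
typedef struct Instrumentation
{
    long bufread_start;   /* snapshot taken in InstrStartNode */
    long bufhit_start;
    long bufread;         /* accumulated pages read (children included) */
    long bufhit;          /* accumulated buffer hits (children included) */
} Instrumentation;

static void
InstrStartNode(Instrumentation *instr)
{
    /* note the starting point for all the counters */
    instr->bufread_start = MockReadBufferCount;
    instr->bufhit_start  = MockBufferHitCount;
}

static void
InstrStopNode(Instrumentation *instr)
{
    /* take the difference and add it to the node's personal counts */
    instr->bufread += MockReadBufferCount - instr->bufread_start;
    instr->bufhit  += MockBufferHitCount  - instr->bufhit_start;
}
```

Because the globals keep running while child nodes execute, each node's totals naturally include its children, matching how 'actual time' nests today.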

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


Re: Buffer pool statistics in Explain Analyze

From
"Vladimir Sitnikov"
Date:
> The main problem I ran into was that the instrumentation nodes currently are
> nested. That is, all the time for your children counts against you as well. Is
> that what we want for I/O costs?
As for me, I see nothing wrong with such a cost model. I think it is good
to know stuff like "the whole query took 3244 I/Os" and "this join takes
34 I/Os". So no instrumentation node needs to try to figure out its
intrinsic I/O counters.

Another point is that the "time" and "I/O" metrics should match each
other. I do not see a reason to change the current "actual time"
behaviour.

 
> If it is then I think it's fairly simple, have a global set of counters for
> various I/O events which are zeroed when the executor starts. Every time an
> instrumentation node starts it notes the starting point for all those
> counters, whenever it ends take the difference and add that to its personal
> counts.
I've tried to use "ReadBufferCount and friends" from
storage\buffer\buf_init.c, however they show zeroes for some as yet
unknown reason. I hope there is no fundamental problem behind that.

One more problem with the current counters in buf_init.c is
ResetBufferUsage, which zeroes those counters unpredictably (e.g. if
log_executor_stats or log_statement_stats is enabled). I am not sure
whether it could interfere with instrumentation.


Regards,
Vladimir Sitnikov

Re: Buffer pool statistics in Explain Analyze

From
ITAGAKI Takahiro
Date:
"Vladimir Sitnikov" <sitnikov.vladimir@gmail.com> wrote:

> I've tried to use "ReadBufferCount and friends" from
> storage\buffer\buf_init.c, however it is showing zeroes for some unknown yet
> reason. Hope, there is no fundamental problem behind.

I think those variables are hard to use and are not reliable
for your purpose.

How about adding buffer usage counters into HeapScanDesc and IndexScanDesc?
It requires only changes in ReadBufferWithStrategy and lower layer functions.
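That per-scan alternative might look roughly like this. Again a hedged mock-up: the struct and function below are simplified stand-ins under stated assumptions, not the real HeapScanDescData or ReadBufferWithStrategy.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for HeapScanDescData, with hypothetical
 * buffer-usage counters attached to the scan descriptor itself. */
typedef struct HeapScanDescData
{
    long rs_pages_fetched;   /* hypothetical: pages requested by this scan */
    long rs_pages_hit;       /* hypothetical: requests served from shared buffers */
} HeapScanDescData;

/* Stand-in for ReadBufferWithStrategy: the real change would bump the
 * scan's counters inside the buffer manager's lower layers.  'cached'
 * simulates whether the page was already in shared buffers. */
static void
mock_read_buffer_with_strategy(HeapScanDescData *scan, bool cached)
{
    scan->rs_pages_fetched++;
    if (cached)
        scan->rs_pages_hit++;
}
```

Since the counters belong to each scan descriptor, per-node attribution falls out without global snapshots, at the cost of touching every scan type that should be counted.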

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Buffer pool statistics in Explain Analyze

From
"Vladimir Sitnikov"
Date:
Hi all,

Here is a patch that adds "buffer pool statistics" to the explain analyze output revealing the number of buffer pages hit at each and every execution step.

It uses counters from storage/buffer/bufmgr.c (I believe all those counters are relevant for investigating query performance).


Here is the sample output:

create table test as
   select i/10 as a, round(random()*10000) as b
     from generate_series(1,100000) as x(i)
    order by 1;

create index ix_a on test(a);
create index ix_b on test(b);

vacuum analyze test;

explain analyze
select count(*) from test x, test y
 where x.a = 5
   and y.b = x.b;

 Aggregate  (cost=413.88..413.89 rows=1 width=0) (actual time=1.380..1.382 rows=1 loops=1 read_shared=119(111) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
   ->  Nested Loop  (cost=4.35..413.59 rows=118 width=0) (actual time=0.088..1.230 rows=96 loops=1 read_shared=119(111) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
         ->  Index Scan using ix_a on test x  (cost=0.00..8.44 rows=10 width=8) (actual time=0.010..0.028 rows=10 loops=1 read_shared=3(3) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
               Index Cond: (a = 5)
         ->  Bitmap Heap Scan on test y  (cost=4.35..40.38 rows=11 width=8) (actual time=0.034..0.080 rows=10 loops=10 read_shared=116(108) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
               Recheck Cond: (y.b = x.b)
               ->  Bitmap Index Scan on ix_b  (cost=0.00..4.34 rows=11 width=0) (actual time=0.028..0.028 rows=10 loops=10 read_shared=20(12) read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
                     Index Cond: (y.b = x.b)
 Total runtime: 1.438 ms

read_shared=116(108) for "Bitmap Heap Scan" means the operation fetched 116 pages into shared buffers, and 108 of those 116 fetches were buffer hits (so only 116 - 108 = 8 pages required a physical read).


Sincerely yours,
Vladimir Sitnikov
Attachment

Re: Buffer pool statistics in Explain Analyze

From
Bruce Momjian
Date:
What did you want done with this patch?  It is unlikely we want to see
those counters by default, and we have had little demand for them.

---------------------------------------------------------------------------

Vladimir Sitnikov wrote:
> Hi all,
> 
> Here is a patch that adds "buffer pool statistics" to the explain analyze
> output revealing the number of buffer pages hit at each and every execution
> step.
> 
> It uses counters from storage/buffer/bufmgr.c (I believe all that counters
> are relevant for investigation of query performance).
> 
> 
> Here is the sample output:
> 
> create table test as
>    select i/10 as a, round(random()*10000) as b
>      from generate_series(1,100000) as x(i)
>     order by 1;
> 
> create index ix_a on test(a);
> create index ix_b on test(b);
> 
> vacuum analyze test;
> 
> explain analyze
> select count(*) from test x, test y
>  where x.a = 5
>    and y.b = x.b;
> 
>  Aggregate  (cost=413.88..413.89 rows=1 width=0) (actual time=1.380..1.382
> rows=1 loops=1 read_shared=119(111) read_local=0(0) flush=0 local_flush=0
> file_read=0 file_write=0)
>    ->  Nested Loop  (cost=4.35..413.59 rows=118 width=0) (actual
> time=0.088..1.230 rows=96 loops=1 read_shared=119(111) read_local=0(0)
> flush=0 local_flush=0 file_read=0 file_write=0)
>          ->  Index Scan using ix_a on test x  (cost=0.00..8.44 rows=10
> width=8) (actual time=0.010..0.028 rows=10 loops=1 read_shared=3(3)
> read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
>                Index Cond: (a = 5)
>          ->  Bitmap Heap Scan on test y  (cost=4.35..40.38 rows=11 width=8)
> (actual time=0.034..0.080 rows=10 loops=10 read_shared=116(108)
> read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
>                Recheck Cond: (y.b = x.b)
>                ->  Bitmap Index Scan on ix_b  (cost=0.00..4.34 rows=11
> width=0) (actual time=0.028..0.028 rows=10 loops=10 read_shared=20(12)
> read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
>                      Index Cond: (y.b = x.b)
>  Total runtime: 1.438 ms
> 
> read_shared=116(108)  for "Bitmap Heap Scan" means the operation fetched 116
> pages into shared buffers and 108 of those 116 were buffer hits.
> 
> 
> Sincerely yours,
> Vladimir Sitnikov

[ Attachment, skipping... ]

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: Buffer pool statistics in Explain Analyze

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> What did you want done with this patch?  It is unlikely we want to see
> those counters by default, and we have had little demand for them.

>> Here is a patch that adds "buffer pool statistics" to the explain analyze
>> output revealing the number of buffer pages hit at each and every execution
>> step.

This was already rejected in connection with pg_stat_statements, no?
        regards, tom lane


Re: Buffer pool statistics in Explain Analyze

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > What did you want done with this patch?  It is unlikely we want to see
> > those counters by default, and we have had little demand for them.
> 
> >> Here is a patch that adds "buffer pool statistics" to the explain analyze
> >> output revealing the number of buffer pages hit at each and every execution
> >> step.
> 
> This was already rejected in connection with pg_stat_statements, no?

You know, I thought we discussed it as part of that patch, but the
author was Vladimir Sitnikov, not the pg_stat_statement author, Alex
Hunsaker, so I am just checking.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: Buffer pool statistics in Explain Analyze

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
> 
> What did you want done with this patch?  It is unlikely we want to see
> those counters by default, and we have had little demand for them.

If there are two people who needed them badly enough to have written a
patch for them, that seems to say that there is a certain demand for this
feature.  (I'm not advocating this patch in any way, but surely your
"little demand" argument is not entirely true.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Buffer pool statistics in Explain Analyze

From
"Alex Hunsaker"
Date:
On Thu, Jan 8, 2009 at 17:30, Bruce Momjian <bruce@momjian.us> wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>> > What did you want done with this patch?  It is unlikely we want to see
>> > those counters by default, and we have had little demand for them.
>>
>> >> Here is a patch that adds "buffer pool statistics" to the explain analyze
>> >> output revealing the number of buffer pages hit at each and every execution
>> >> step.
>>
>> This was already rejected in connection with pg_stat_statements, no?
>
> You know, I thought we discussed it as part of that patch, but the
> author was Vladimir Sitnikov, not the pg_stat_statement author, Alex
> Hunsaker, so I am just checking.

I assume you mean Itagaki-san not me :)


Re: Buffer pool statistics in Explain Analyze

From
ITAGAKI Takahiro
Date:
"Alex Hunsaker" <badalex@gmail.com> wrote:

> >> > What did you want done with this patch?  It is unlikely we want to see
> >> > those counters by default, and we have had little demand for them.
> >>
> >> This was already rejected in connection with pg_stat_statements, no?
> >
> > You know, I thought we discussed it as part of that patch, but the
> > author was Vladimir Sitnikov, not the pg_stat_statement author, Alex
> > Hunsaker, so I am just checking.
> 
> I assume you mean Itagaki-san not me :)

I think there are two independent items here:
  [1] Should we add those statistics to pg_stat_statements or not?
  [2] Should we add those statistics to EXPLAIN ANALYZE or not?

I wanted to have [1] and proposed it, but it was rejected for 8.4.
However, the reason is not that we have little demand for it, but that
[1] and [2] were mixed in the patch and badly designed.

[2] was requested by Vladimir (not me), but I think it is reasonable.
Expert users, including core members ;-), can easily find why a plan
node is slow when they have output from EXPLAIN ANALYZE. However, that
is not true for beginners. They would be able to find the reason if
there were CPU and I/O information for each plan node.


Personally I think both of them are TODO items,
but there are some additional requirements for them.

[1] should be independent from [2] and ideally be implemented
only in a contrib module, apart from the core.

[2] has at least two requirements:
 - Avoid overheads. If checking CPU usage is slow, we'd better have
   only buffer counters.
 - Output plans with statistics in a more readable way. It would be
   better to use XML or tables to show the information.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Buffer pool statistics in Explain Analyze

From
Tom Lane
Date:
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> I think there two independent items here:
>   [1] Should we add those statistics to pg_stat_statements or not?
>   [2] Should we add those statistics to EXPLAIN ANALYZE or not?

> I wanted to have [1] and proposed it, but it was rejected for 8.4.
> However, the reason is not that we have little demand for it, but that
> [1] and [2] were mixed in the patch and badly designed.

No, I think you misunderstood me entirely.  The reason that I rejected
those parts of the patch is that I think the statistics that are
available are wrong/useless.  If the bufmgr.c counters were really good
for something they'd have been exposed long since (and we'd probably
never have built a lot of the other stats collection infrastructure).

The EXPLAIN ANALYZE code you submitted is actually kinda cute, and
I'd have had no problem with it if I thought it were displaying
numbers that were useful and unlikely to be obsoleted in future
releases.  The work that needs to be done is on collecting the
numbers more than displaying them.
        regards, tom lane


Re: Buffer pool statistics in Explain Analyze

From
Gregory Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> No, I think you misunderstood me entirely.  The reason that I rejected
> those parts of the patch is that I think the statistics that are
> available are wrong/useless.  If the bufmgr.c counters were really good
> for something they'd have been exposed long since (and we'd probably
> never have built a lot of the other stats collection infrastructure).

The collective stats across the whole cluster and the individual stats for a
specific query broken down by plan node are complementary. Depending on the
circumstances, people sometimes need each.

I actually also wrote a patch exposing this same data. I think the bufmgr
counters are flawed but still useful. Just as an example think of how often
you have to explain why a sequential scan of a small table can be faster than
an index scan. Seeing the index scan actually require more logical buffer
fetches than a sequential scan would go a long way to clearing up that
confusion. Better yet, users would be in a position to see whether the planner
is actually estimating i/o costs accurately.

> The EXPLAIN ANALYZE code you submitted is actually kinda cute, and
> I'd have had no problem with it if I thought it were displaying
> numbers that were useful and unlikely to be obsoleted in future
> releases.  The work that needs to be done is on collecting the
> numbers more than displaying them.

I agree that we need more data -- my favourite direction is to use a
programmatic interface to dtrace to find out how many buffer reads are
satisfied from filesystem cache and how many from physical reads. But
doing that wouldn't obviate the need for these stats; it would enhance
them. You would get a clear view of how many buffer fetches were
satisfied from shared buffers, filesystem cache, and physical reads.


-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!