Thread: Explain buffers display units.

Explain buffers display units.

From
Greg Stark
Date:
So this is what I did about my two complaints earlier about the
explain buffer patch.

a) Changed the line description to "Total Buffer Usage" which at least
hints that it's something more akin to the "Total runtime" listed at
the bottom than the "actual time".

b) Used units of memory -- I formatted them with 3 significant digits
(unless the unit is bytes or kB where that would be silly). It's just
what looked best to my eye.

I'm finding "hit" and "read" kind of confusing myself but don't really
have any better idea. It's not entirely clear whether read is the
total accesses out of which some are cache hits or if they're two
disjoint sets.

postgres=# explain (analyze,buffers) select * from x limit 10000;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..266.68 rows=10000 width=105) (actual
time=0.023..53.964 rows=10000 loops=1)
   Total Buffer Usage: shared hit=8kB read=1.30MB
   ->  Seq Scan on x  (cost=0.00..10667.00 rows=400000 width=105)
(actual time=0.019..20.311 rows=10000 loops=1)
         Total Buffer Usage: shared hit=8kB read=1.30MB
 Total runtime: 71.074 ms
(5 rows)
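
For illustration, here is a minimal sketch of the scaling described in
(b). The function name and digit cutoffs are only an approximation of
the idea, not the committed explain.c code, and it assumes the default
8kB block size:

#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192             /* assumes the default 8kB block size */

/*
 * Illustrative only: scale a block count into a memory figure with
 * roughly three significant digits, e.g. 1 block -> "8kB" and
 * 166 blocks -> "1.30MB".
 */
static void
format_buffer_usage(long blocks, char *buf, size_t buflen)
{
    double      val = (double) blocks * (BLCKSZ / 1024.0);  /* in kB */
    const char *unit = "kB";

    if (val >= 1024.0)
    {
        val /= 1024.0;
        unit = "MB";
    }
    if (val >= 1024.0)
    {
        val /= 1024.0;
        unit = "GB";
    }

    if (strcmp(unit, "kB") == 0)
        snprintf(buf, buflen, "%.0f%s", val, unit);   /* "8kB"    */
    else if (val < 10.0)
        snprintf(buf, buflen, "%.2f%s", val, unit);   /* "1.30MB" */
    else if (val < 100.0)
        snprintf(buf, buflen, "%.1f%s", val, unit);   /* "13.0MB" */
    else
        snprintf(buf, buflen, "%.0f%s", val, unit);   /* "130MB"  */
}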

--
greg


Re: Explain buffers display units.

From
Greg Smith
Date:
Greg Stark wrote:
> b) Used units of memory -- I formatted them with 3 significant digits
> (unless the unit is bytes or kB where that would be silly). It's just
> what looked best to my eye.
>   

How does this compare with what comes out of pg_size_pretty 
(src/backend/utils/adt/dbsize.c)? I already have code floating around 
that parses the output from pg_size_pretty when I'm slurping in things 
from PostgreSQL, and it's not immediately obvious to me what having a 
format that's similar to but not quite the same as that one is buying here.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Explain buffers display units.

From
Robert Haas
Date:
On Sun, Feb 14, 2010 at 8:25 PM, Greg Stark <stark@mit.edu> wrote:
> So this is what I did about my two complaints earlier about the
> explain buffer patch.
>
> a) Changed the line description to "Total Buffer Usage" which at least
> hints that it's something more akin to the "Total runtime" listed at
> the bottom than the "actual time".
>
> b) Used units of memory -- I formatted them with 3 significant digits
> (unless the unit is bytes or kB where that would be silly). It's just
> what looked best to my eye.

I wasn't aware we had consensus on making this change, which I see you
committed less than an hour after posting this.

> I'm finding "hit" and "read" kind of confusing myself but don't really
> have any better idea. It's not entirely clear whether read is the
> total accesses out of which some are cache hits or if they're two
> disjoint sets.

Keep in mind these terms are taken from other parts of the system
where they existed prior to this patch.  We probably want to stick
with them at this point for consistency, but in any case it's
certainly a separate discussion.

...Robert


Re: Explain buffers display units.

From
Greg Stark
Date:
On Mon, Feb 15, 2010 at 2:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> a) Changed the line description to "Total Buffer Usage" which at least
>> hints that it's something more akin to the "Total runtime" listed at
>> the bottom than the "actual time".
>>
>> b) Used units of memory -- I formatted them with 3 significant digits
>> (unless the unit is bytes or kB where that would be silly). It's just
>> what looked best to my eye.
>
> I wasn't aware we had consensus on making this change, which I see you
> committed less than an hour after posting this.

Well there was a 30+ message thread almost a week ago where there
seemed to be some contention over the issue of whether the numbers
should be averages or totals. But there was no dispute over the idea
of printing in memory units instead of blocks.

Given the controversy over whether to display averages or totals and
given the issues raised towards the end of the thread that there are
no comparable estimated values printed so there's no particular need
to average them, I opted for the minimal change of just labelling it
"Total Buffer Usage". It didn't seem there was consensus to change it
to averages per loop or to change the whole plan output to display
totals. And I didn't see anyone argue that calling out that it was a
total was a bad idea.

We can always continue to tweak the details of the format such as adding
spaces before the units to make it similar to pg_size_pretty().
I'm not sure I like the idea of making it exactly equivalent because
pg_size_pretty() doesn't print any decimals so it's pretty imprecise
for smaller values.


-- 
greg


Re: Explain buffers display units.

From
Greg Smith
Date:
Greg Stark wrote:
> We can always continue to tweak the details of the format such as adding
> spaces before the units to make it similar to pg_size_pretty().
> I'm not sure I like the idea of making it exactly equivalent because
> pg_size_pretty() doesn't print any decimals so it's pretty imprecise
> for smaller values.
>   

That's a reasonable position; I'd be fine with upgrading the 
requirements for a text scraping app to handle either "8 kB" or "1.356 
kB" if it wanted to share some code to consume either type of info, if 
all you did was throw a space in there.  I'd suggest either removing the 
PB units support from your implementation, or adding it to 
pg_size_pretty, just to keep those two routines more like one another in 
terms of what they might produce as output given the same scale of input.
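
For what it's worth, a rough sketch of what such a scraper ends up
doing either way -- hypothetical code, not taken from any existing
tool, and assuming only bytes/kB/MB/GB suffixes with an optional space
before the unit:

#include <stdio.h>
#include <string.h>

/*
 * Hypothetical scraper helper: convert strings such as "8kB", "8 kB",
 * "1.30MB" or "1.356 kB" into bytes.  Returns -1 on unrecognized input.
 */
static double
pretty_to_bytes(const char *s)
{
    double val;
    char   unit[8];

    /* "%lf" stops at the unit; the space directive tolerates an
     * optional space before it */
    if (sscanf(s, "%lf %7s", &val, unit) != 2)
        return -1;

    if (strcmp(unit, "bytes") == 0)
        return val;
    if (strcmp(unit, "kB") == 0)
        return val * 1024.0;
    if (strcmp(unit, "MB") == 0)
        return val * 1024.0 * 1024.0;
    if (strcmp(unit, "GB") == 0)
        return val * 1024.0 * 1024.0 * 1024.0;
    return -1;
}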

Also, a quick comment in the new code explaining what you just said 
above might be helpful, just to preempt a similar "how is this different 
from pg_size_pretty?" question from popping up again one day.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us



Re: Explain buffers display units.

From
Robert Haas
Date:
On Mon, Feb 15, 2010 at 9:55 AM, Greg Stark <stark@mit.edu> wrote:
> On Mon, Feb 15, 2010 at 2:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> a) Changed the line description to "Total Buffer Usage" which at least
>>> hints that it's something more akin to the "Total runtime" listed at
>>> the bottom than the "actual time".
>>>
>>> b) Used units of memory -- I formatted them with 3 significant digits
>>> (unless the unit is bytes or kB where that would be silly). It's just
>>> what looked best to my eye.
>>
>> I wasn't aware we had consensus on making this change, which I see you
>> committed less than an hour after posting this.
>
> Well there was a 30+ message thread almost a week ago where there
> seemed to be some contention over the issue of whether the numbers
> should be averages or totals. But there was no dispute over the
> idea of printing in memory units instead of blocks.

Hmm.... yeah, I guess it wasn't discussed.  I'm still not sure it's an
improvement.  If a query hit one buffer, is that really the same as
saying it hit 8kB?  To me, buffers seem like discrete (and unitless)
entities, and we handle them that way elsewhere in the system (see,
e.g. pg_stat_database, pg_statio_all_tables).  I don't know that it's
a good idea to display that same information here in a different
format.

> We can always continue to tweak the details of the format such as adding
> spaces before the units to make it similar to pg_size_pretty().
> I'm not sure I like the idea of making it exactly equivalent because
> pg_size_pretty() doesn't print any decimals so it's pretty imprecise
> for smaller values.

I definitely do not want to do anything that loses accuracy.  This is
probably accurate enough for most uses, but it's still not as accurate
as just printing the raw numbers.

...Robert


Re: Explain buffers display units.

From
Greg Stark
Date:
On Mon, Feb 15, 2010 at 6:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Well there was a 30+ message thread almost a week ago where there
>> seemed to be some contention over the issue of whether the numbers
>> should be averages or totals. But there was no dispute over the
>> idea of printing in memory units instead of blocks.
>
> Hmm.... yeah, I guess it wasn't discussed.  I'm still not sure it's an
> improvement.  If a query hit one buffer, is that really the same as
> saying it hit 8kB?

Well you can always convert between them. The only time it would make
a difference is if you're sure it's random i/o and you're concerned
with the number of iops. However it's impossible to tell from this
output how many of these buffers are read sequentially and how many
randomly. Even if it's sequential you don't know how much it read
between interruptions to handle the inner side of a join or whether
the cached blocks were interspersed throughout the file or were all at
the beginning or end.

I think we should provide better tools to measure these things
directly rather than force users to make deductions from buffer
counts. I'm still excited about using dtrace to get real counts of
iops, seeks, etc.

>  To me, buffers seem like discrete (and unitless)
> entities, and we handle them that way elsewhere in the system (see,
> e.g. pg_stat_database, pg_statio_all_tables).  I don't know that it's
> a good idea to display that same information here in a different
> format.
>...
> I definitely do not want to do anything that loses accuracy.  This is
> probably accurate enough for most uses, but it's still not as accurate
> as just printing the raw numbers.

I left the XML/JSON output in terms of blocks on the theory that tools
reading this data can look up the block size and convert all it wants.
Likewise the pg_stat* functions are for extracting raw data. Any tool
or query that extracts this data can present it in any friendly form
it wants.

Incidentally looking at the pg_size_pretty() functions reminds me that
these counters are all 32-bit. That means they'll do funny things if
you have a query which accesses over 16TB of data... I suspect this
should probably be changed though I'm feeling lazy about it unless
someone else wants to push me to do it now.
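
For reference, the 16TB figure is just the signed 32-bit limit times
the default 8kB block size; a quick back-of-the-envelope check (a
hypothetical snippet, not anything from the tree):

#include <stdio.h>
#include <stdint.h>

/* Back-of-the-envelope check of the 16TB remark above, assuming a
 * signed 32-bit counter and the default 8kB block size. */
int
main(void)
{
    int64_t max_blocks = INT32_MAX;             /* 2^31 - 1 blocks */
    int64_t bytes      = max_blocks * 8192;     /* blocks -> bytes */

    /* prints "16.0 TB": past this a 32-bit counter wraps around */
    printf("%.1f TB\n", (double) bytes / (1024.0 * 1024 * 1024 * 1024));
    return 0;
}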



--
greg


Re: Explain buffers display units.

From
Robert Haas
Date:
On Mon, Feb 15, 2010 at 1:29 PM, Greg Stark <stark@mit.edu> wrote:
> On Mon, Feb 15, 2010 at 6:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Well there was a 30+ message thread almost a week ago where there
>>> seemed to be some contention over the issue of whether the numbers
>>> should be averages or totals. But there was no dispute over the
>>> idea of printing in memory units instead of blocks.
>>
>> Hmm.... yeah, I guess it wasn't discussed.  I'm still not sure it's an
>> improvement.  If a query hit one buffer, is that really the same as
>> saying it hit 8kB?
>
> Well you can always convert between them. The only time it would make
> a difference is if you're sure it's random i/o and you're concerned
> with the number of iops. However it's impossible to tell from this
> output how many of these buffers are read sequentially and how many
> randomly. Even if it's sequential you don't know how much it read
> between interruptions to handle the inner side of a join or whether
> the cached blocks were interspersed throughout the file or were all at
> the beginning or end.

All true, although "you can always convert between them" assumes you
know the block size.  I don't imagine many people change that, but...

> I think we should provide better tools to measure these things
> directly rather than force users to make deductions from buffer
> counts. I'm still excited about using dtrace to get real counts of
> iops, seeks, etc.

Sure.

>>  To me, buffers seem like discrete (and unitless)
>> entities, and we handle them that way elsewhere in the system (see,
>> e.g. pg_stat_database, pg_statio_all_tables).  I don't know that it's
>> a good idea to display that same information here in a different
>> format.

This seems like an important point that you need to respond to.  Why
should we print out this information in kB here when we display it as
raw numbers elsewhere?  I can't see any reason at all.

>> I definitely do not want to do anything that loses accuracy.  This is
>> probably accurate enough for most uses, but it's still not as accurate
>> as just printing the raw numbers.
>
> I left the XML/JSON output in terms of blocks on the theory that tools
> reading this data can look up the block size and convert all it wants.

I think this is a really terrible idea.  You've got a lot of very
specific formatting code in explain.c which anyone who wants to use
the JSON and XML output will very possibly need to reimplement.  I
have worked really hard to keep the text format in sync with all the
others, and up until now they have been.

> Incidentally looking at the pg_size_pretty() functions reminds me that
> these counters are all 32-bit. That means they'll do funny things if
> you have a query which accesses over 16TB of data... I suspect this
> should probably be changed though I'm feeling lazy about it unless
> someone else wants to push me to do it now.

Well that will require fixing a whole lot of bits in the stats
infrastructure that are only minimally related to this patch.  That is
certainly 9.1 material.

Basically, I think this whole change is a bad idea and should be
reverted.  You've made the text format EXPLAIN inconsistent with both
the non-text formats and with the rest of the buffer statistics stuff
for absolutely no benefit that I can see.

...Robert


Re: Explain buffers display units.

From
Greg Stark
Date:
On Mon, Feb 15, 2010 at 7:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>  To me, buffers seem like discrete (and unitless)
>>> entities, and we handle them that way elsewhere in the system (see,
>>> e.g. pg_stat_database, pg_statio_all_tables).  I don't know that it's
>>> a good idea to display that same information here in a different
>>> format.
>
> This seems like an important point that you need to respond to.  Why
> should we print out this information in kB here when we display it as
> raw numbers elsewhere?  I can't see any reason at all.

I did respond to it. The whole point is that the text output is for a
human to read. It should be printed in human-readable units. Not some
arbitrary internal unit of accounting that they then have to do
arithmetic on to make sense of.

We do *not* display raw block numbers anywhere else. Generally I think
we should have a policy of outputting human-readable standard units of
memory whenever displaying a memory quantity. Actually I thought we
already had that policy, hence things like:

postgres=# show shared_buffers;
 shared_buffers
----------------
 28MB
(1 row)

postgres=# show checkpoint_timeout;
 checkpoint_timeout
--------------------
 5min
(1 row)

The other examples you name are all internal or machine-readable
formats which have to be formatted somehow using SQL queries or tools
if you want to inspect the values directly. The user is free to format
the output of the pg_stat* functions using pg_size_pretty(), though
it's annoying that it's not in the same base unit that
pg_relation_size() outputs, but these are the only interface to these
internal counters, so there's no way to know if they're being used for
human-readable output or for gathering raw data for statistics or
other purposes.

>>> I definitely do not want to do anything that loses accuracy.  This is
>>> probably accurate enough for most uses, but it's still not as accurate
>>> as just printing the raw numbers.
>>
>> I left the XML/JSON output in terms of blocks on the theory that tools
>> reading this data can look up the block size and convert all it wants.
>
> I think this is a really terrible idea.  You've got a lot of very
> specific formatting code in explain.c which anyone who wants to use
> the JSON and XML output will very possibly need to reimplement.  I
> have worked really hard to keep the text format in sync with all the
> others, and up until now they have been.

You're assuming the JSON and XML program is planning to display the
measurements? They might not be. They might be gathering them for
charting or for alerts or all kinds of other things. Even if they do
plan to output them they'll want to format it in a way that makes sense
for the context it's used in, which might include more or fewer digits
or plug into some widget which requires raw values and does the
formatting automatically.

Whereas the human-readable format should display values in a form
humans can parse, the machine-readable output should include the raw
measurements with enough information for the tool to make sense of it.
Probably the XML schema should include the units as an attribute for
each tag so tools don't have to hard-code knowledge about what unit
each tag is in.

--
greg


Re: Explain buffers display units.

From
Greg Smith
Date:
Greg Stark wrote:
> We do *not* display raw block numbers anywhere else. Generally I think
> we should have a policy of outputting human-readable standard units of
> memory whenever displaying a memory quantity. Actually I thought we
> already had that policy, hence things like...

The first counter example I thought of is log_checkpoints which looks
like this:

LOG: checkpoint complete: wrote 133795 buffers (25.5%); 0 transaction
log file(s) added, 0 removed, 98 recycled; write=112.281 s,
sync=108.809 s, total=221.166 s

> Probably the XML schema should include the units as an attribute for
> each tag so tools don't have to hard-code knowledge about what unit
> each tag is in.

I don't know if it's practical at this point, but it might be helpful
for the truly machine-targeted output formats to include specifically
BLCKSZ somewhere in their header--just so there's a universal way to
interpret the output even if the user tuned that.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Explain buffers display units.

From
Robert Haas
Date:
On Mon, Feb 15, 2010 at 6:44 PM, Greg Stark <stark@mit.edu> wrote:
> I did respond to it. The whole point is that the text output is for a
> human to read. It should be printed in human-readable units. Not some
> arbitrary internal unit of accounting that they then have to do
> arithmetic on to make sense of.

Well, I disagree with your statement that the previous output was not
printed in human-readable units: it was printed in blocks, which I
find to be a perfectly good unit.  It's true that the basic unit of
blocks can be converted into kilobytes, but so what?  We aren't really
measuring kilobytes; we're measuring blocks.  We could equally well
convert the sort and hash output from kilobytes into blocks, but it
would be equally wrong: the sort and hash statistics are measuring
memory usage by adding up actual memory allocations.  The buffer
statistics are simply counting the number of blocks that are read or
written.  Multiplying by the block size makes it sound as if all the
memory was read or used, which is simply not the case - especially for
things like buffer hits, which don't actually read or allocate any
memory at all.

> We do *not* display raw block numbers anywhere else. Generally I think
> we should have a policy of outputting human-readable standard units of
> memory whenever displaying a memory quantity. Actually I thought we
> already had that policy, hence things like:
>
> postgres=# show shared_buffers;
>  shared_buffers
> ----------------
>  28MB
> (1 row)
>
> postgres=# show checkpoint_timeout;
>  checkpoint_timeout
> --------------------
>  5min
> (1 row)

We certainly do that for GUCs, and in that context it seems to me to
make sense.  If you set your shared buffers to a gigabyte, PG will use
an additional GB of memory.  But if you hit a "gigabyte" of shared
buffers, you may be examining anywhere from one 8K block over and over
again all the way up to a full GB of memory.  Block hits and reads
just don't add in the same way that actual memory allocations do.

And at any rate, what we DON'T do for GUCs is produce differing output
formats for the same parameter based on the magnitude of the output
value, as you've done here.  We accept input in several different
formats, but there is only one canonical output format for any
particular GUC, which is furthermore always chosen in such a way that
the exact value of the setting is preserved (again, unlike what you've
done here).

> The other examples you name are all internal or machine-readable
> formats which have to be formatted somehow using SQL queries or tools
> if you want to inspect the values directly. The user is free to format
> the output of the pg_stat* functions using pg_size_pretty(), though
> it's annoying that it's not in the same base unit that
> pg_relation_size() outputs, but these are the only interface to these
> internal counters, so there's no way to know if they're being used for
> human-readable output or for gathering raw data for statistics or
> other purposes.

So, you're saying we shouldn't look at the way that the pg_stat
functions format the output because somebody might write a view over
it that formats it in some different way that may or may not match
what you've done for the EXPLAIN output?  What makes you think that
people don't just look at the raw numbers?  I certainly have, and
there's no suggestion in the documentation that users should do
anything else.

pg_stat_statements doesn't do what you're suggesting either; it, too,
presents raw numbers, and lets the user make of it what they will.
They might, for example, want to compute a hit ratio, as in the
example provided in the docs.  In the case of EXPLAIN of an index
scan, they might want to estimate the number of seeks, on the theory
that an inner-indexscan is going to be all random IO.
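
To make that concrete, the sort of derived figure meant here falls
straight out of the raw counts. A hypothetical sketch, not taken from
pg_stat_statements or the docs, and assuming "hit" and "read" are
disjoint counts:

/*
 * Hypothetical helper: the kind of derived metric a tool might compute
 * from the raw counters, assuming "hit" and "read" are disjoint (a
 * block found in shared_buffers counts as a hit, otherwise as a read).
 */
static double
buffer_hit_ratio(long hit, long read)
{
    long total = hit + read;

    return (total == 0) ? 0.0 : (double) hit / (double) total;
}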

>> I think this is a really terrible idea.  You've got a lot of very
>> specific formatting code in explain.c which anyone who wants to use
>> the JSON and XML output will very possibly need to reimplement.  I
>> have worked really hard to keep the text format in sync with all the
>> others, and up until now they have been.
>
> You're assuming the JSON and XML program is planning to display the
> measurements? They might not be. They might be gathering them for
> charting or for alerts or all kinds of other things.  Even if they do
> plan to output them they'll want to format it in way that makes sense
> for the context it's used in which might include more or fewer digits
> or plug into some widget which requires raw values and does the
> formatting automatically.

Yes, they might want to write their own formatting code, but they also
might not.  They might want to calculate hit ratios, or they might
want to alter the number of decimal places, or they might just want to
output the exact same information as the text format, but in a GUI
format rather than using ASCII art.

> Whereas the human-readable format should display values in a form
> humans can parse, the machine-readable output should include the raw
> measurements with enough information for the tool to make sense of it.
>  Probably the XML schema should include the units as an attribute for
> each tag so tools don't have to hard-code knowledge about what unit
> each tag is in.

This doesn't seem to be a very carefully thought out proposal, because
you haven't explained how it would work for JSON or YAML output.  A
format-neutral solution which we've already used for sort and hash
information (and for GUCs) is to include the unit designator in the
output.  But I generally think that trying to make the EXPLAIN output
self-documenting to the point where programs don't need any specific
knowledge of the meanings of the various properties is rather
hopeless.  If we want to provide that knowledge, it should be encoded
in some sort of separate XML schema or data dictionary rather than
encoded in the output of every run of EXPLAIN.  I think Andres Freund
actually wrote a relaxNG schema or something of that sort, which might
or might not be a useful starting point, but nobody felt motivated to
do anything with it.

Upon further review, I also notice that this patch seems to have
falsified the EXPLAIN documentation - both the description of the
BUFFERS option and the description of the FORMAT option are no longer
accurate.

...Robert


Re: Explain buffers display units.

From
Greg Stark
Date:
On Tue, Feb 16, 2010 at 2:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Multiplying by the block size makes it sound as if all the
> memory was read or used, which is simply not the case - especially for
> things like buffer hits, which don't actually read or allocate any
> memory at all.

In which case it represents how much data would have had to be read
if it hadn't been in the buffer cache, which is a perfectly reasonable
measurement. It's exactly what a cache profiler should be measuring.
These are figures that users have to compare with their buffer cache
size and with the output of iostat or other tools. Presenting them in
arbitrary internal units makes that difficult.


> We certainly do that for GUCs, and in that context it seems to me to
> make sense.  If you set your shared buffers to a gigabyte, PG will use
> an additional GB of memory.  But if you hit a "gigabyte" of shared
> buffers, you may be examining anywhere from one 8K block over and over
> again all the way up to a full GB of memory.  Block hits and reads
> just don't add in the same way that actual memory allocations do.

Accessing the same 8kB of memory 100,000 times is on the order of 1GB
of memory bandwidth. The output of explain doesn't give you enough
information to distinguish that from accessing 1GB of different data,
which is too bad, but there's a limit to how much information we can
fit in a reasonable amount of space. But 1GB of memory bandwidth is
still an interesting figure even if it's the same 8kB a hundred
thousand times. I think it's a lot more meaningful for a human reader
than "131072", the number of 8kB blocks in 1GB.

> And at any rate, what we DON'T do for GUCs is produce differing output
> formats for the same parameter based on the magnitude of the output
> value, as you've done here.

No, that's *exactly* what we do:

postgres=# set work_mem = 64;
SET
postgres=# show work_mem;
 work_mem
----------
 64kB
(1 row)

postgres=# set work_mem = 1024;
SET
postgres=# show work_mem;
 work_mem
----------
 1MB
(1 row)

postgres=# set work_mem = 1048576;
SET
postgres=# show work_mem;
 work_mem
----------
 1GB
(1 row)


> We accept input in several different
> formats, but there is only one canonical output formal for any
> particular GUC, which is furthermore always chosen in such a way that
> the exact value of the setting is preserved (again, unlike what you've
> done here).

I don't think the use case for GUCs is the same as for empirical
measurements. Empirical results are never going to come out as a round
number of megabytes, so switching to a larger unit only when the value
is exact would be useless here. In the case of GUCs I assume the
argument was that someone
should be able to copy the output into another postgresql.conf and get
the same value, something which is irrelevant for empirical
measurements.

In any case the machine-readable form of GUC settings is not this one
canonical format you describe for SHOW:

postgres=# select name,setting,unit,min_val,max_val,boot_val,reset_val
from pg_settings where name = 'work_mem';
   name   | setting | unit | min_val | max_val | boot_val | reset_val
----------+---------+------+---------+---------+----------+-----------
 work_mem | 1048576 | kB   | 64      | 2097151 | 1024     | 1024
(1 row)

This is similar to how I think the XML output should work. It should
have the raw internal values with enough meta data in it that a tool
can figure out how to display it or work with it.

> So, you're saying we shouldn't look at the way that the pg_stat
> functions format the output because somebody might write a view over
> it that formats it in some different way that may or may not match
> what you've done for the EXPLAIN output?  What makes you think that
> people don't just look at the raw numbers?  I certainly have, and
> there's no suggestion in the documentation that users should do
> anything else.

I'm not sure users need suggestions that they should format the data
in whatever way they want. We still have to document the programmatic
interface they use to get the raw data.

> pg_stat_statements doesn't do what you're suggesting either; it, too,
> presents raw numbers, and lets the user make of it what they will.
> They might, for example, want to compute a hit ratio, as in the
> example provided in the docs.  In the case of EXPLAIN of an index
> scan, they might want to estimate the number of seeks, on the theory
> that an inner-indexscan is going to be all random IO.

You can compute the hit ratio just fine from measurements with units.
And if you're doing it in an automated way you'll want to use
machine-readable output, rather than parsing the formatted text.

> This doesn't seem to be a very carefully thought out proposal, because
> you haven't explained how it would work for JSON or YAML output.  A
> format-neutral solution which we've already used for sort and hash
> information (and for GUCs) is to include the unit designator in the
> output.  But I generally think that trying to make the EXPLAIN output
> self-documenting to the point where programs don't need any specific
> knowledge of the meanings of the various properties is rather
> hopeless.

I think there's a lot of room between "provide the units for the
measurement" and "100% self-documenting". Most tools are just going to
display the measurement and want to know how to format the result
nicely anyways.

A fixed out-of-band schema isn't going to be able to handle more
flexible cases either. Say we add dtrace support for counting
arbitrary probes and including them in the explain output. The schema
can't possibly list every possible probe and some of them might well
be quantities of memory or time or other units.

I *do* think we should have a schema description for the structural
tags -- e.g. <join>, <append>, <sort> and the like. Tools should be
able to know which tags they should expect to have subnodes and how
many to expect. And they should be able to know which ones they should
expect to have measurements.

> Upon further review, I also notice that this patch seems to have
> falsified the EXPLAIN documentation - both the description of the
> BUFFERS option and the description of the FORMAT option are no longer
> accurate.

Oops. Well, I would like to know if I'm in the minority and have to
roll this back before I fix that.

--
greg


Re: Explain buffers display units.

From
Alvaro Herrera
Date:
Greg Stark escribió:
> On Tue, Feb 16, 2010 at 2:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> > Upon further review, I also notice that this patch seems to have
> > falsified the EXPLAIN documentation - both the description of the
> > BUFFERS option and the description of the FORMAT option are no longer
> > accurate.
> 
> Oops. Well, I would like to know if I'm in the minority and have to
> roll this back before I fix that.

My personal opinion is that displaying number of blocks in all EXPLAIN
formats is more consistent.  What are you going to do with YAML output
anyway, which is machine-readable yet some people prefer it over our
legacy text format?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Explain buffers display units.

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Greg Stark escribió:
>> Oops. Well, I would like to know if I'm in the minority and have to
>> roll this back before I fix that.

> My personal opinion is that displaying number of blocks in all EXPLAIN
> formats is more consistent.

FWIW, I vote for number of blocks too.  I tend to see those numbers as
more indicative of number of I/O requests than amount of memory used.
        regards, tom lane


Re: Explain buffers display units.

From
Greg Stark
Date:
On Tue, Feb 16, 2010 at 3:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Greg Stark escribió:
>>> Oops. Well, I would like to know if I'm in the minority and have to
>>> roll this back before I fix that.
>
>> My personal opinion is that displaying number of blocks in all EXPLAIN
>> formats is more consistent.
>
> FWIW, I vote for number of blocks too.  I tend to see those numbers as
> more indicative of number of I/O requests than amount of memory used.

Ok, that's 3:1 against.

I suspect we'll revisit this once you see all the other
instrumentation I plan for 9.1. It will be much easier to make sense
of all the numbers in consistent units. But we'll see then.

I won't be able to do the rollback until about 11pm EST again today.


--
greg