Thread: Improve output of BitmapAnd EXPLAIN ANALYZE

Improve output of BitmapAnd EXPLAIN ANALYZE

From: Jim Nasby
A customer just pinged me wondering how it was that a BitmapAnd node was 
reporting 0 tuples when the Bitmap Heap Scan above it showed it had in 
fact generated tuples.

While this is mentioned in the docs, I think it would be very helpful to 
have ANALYZE spit out "N/A" instead of 0 for these nodes. AFAICT that 
would just require adding a special case to the "if (es->costs)" block 
at line ~1204 in explain.c?
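
Roughly what I have in mind (an untested sketch meant to drop into explain.c,
so it leans on what that file already includes; the helper name is invented):

/*
 * Hypothetical helper: BitmapAnd and BitmapOr hand a TIDBitmap, not tuples,
 * up the tree, so their instrumented ntuples counter never moves.
 */
static bool
node_reports_real_rows(Plan *plan)
{
    return !(IsA(plan, BitmapAnd) || IsA(plan, BitmapOr));
}

    /* ... then, where the actual-rows figure is printed in text format ... */
    if (node_reports_real_rows(plan))
        appendStringInfo(es->str, " rows=%.0f loops=%.0f)", rows, nloops);
    else
        appendStringInfo(es->str, " rows=N/A loops=%.0f)", nloops);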

BTW, it looks like it would actually be possible to return a real 
row-count if none of the TIDBitmap pages are chunks, but I'm not sure if 
it's worth the extra effort.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Tom Lane
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> A customer just pinged me wondering how it was that a BitmapAnd node was 
> reporting 0 tuples when the Bitmap Heap Scan above it showed it had in 
> fact generated tuples.

> While this is mentioned in the docs, I think it would be very helpful to 
> have ANALYZE spit out "N/A" instead of 0 for these nodes.

That would break code that tries to parse that stuff, eg depesz.com.
        regards, tom lane



Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Stephen Frost
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> > A customer just pinged me wondering how it was that a BitmapAnd node was
> > reporting 0 tuples when the Bitmap Heap Scan above it showed it had in
> > fact generated tuples.
>
> > While this is mentioned in the docs, I think it would be very helpful to
> > have ANALYZE spit out "N/A" instead of 0 for these nodes.
>
> That would break code that tries to parse that stuff, eg depesz.com.

I don't believe Jim was suggesting that we back-patch such a change.

Changing it in a new major release seems entirely reasonable.

Thanks!

Stephen

Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> That would break code that tries to parse that stuff, eg depesz.com.

> I don't believe Jim was suggesting that we back-patch such a change.

I don't either.

> Changing it in a new major release seems entirely reasonable.

It's still a crock though.  I wonder whether it wouldn't be better to
change the nodeBitmap code so that when EXPLAIN ANALYZE is active,
it expends extra effort to try to produce a rowcount number.

We could certainly run through the result bitmap and count the number
of exact-TID bits.  I don't see a practical way of doing something
with lossy page bits, but maybe those occur infrequently enough
that we could ignore them?  Or we could arbitrarily decide that
a lossy page should be counted as MaxHeapTuplesPerPage, or a bit
less arbitrarily, count it as the relation's average number
of tuples per page.
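
For illustration, the counting pass could be as simple as this (an untested
sketch; the tidbitmap iterator API is quoted from memory, and
avg_tuples_per_page would have to come from the relation's statistics, with
MaxHeapTuplesPerPage as the cruder fallback):

#include "postgres.h"
#include "nodes/tidbitmap.h"

/*
 * Walk a finished TIDBitmap and estimate how many heap tuples it represents.
 * Exact pages carry a real TID count; lossy (chunk) pages report
 * ntuples = -1, so charge them the caller-supplied per-page average.
 */
static double
tbm_estimate_tuples(TIDBitmap *tbm, double avg_tuples_per_page)
{
    TBMIterator *it = tbm_begin_iterate(tbm);
    TBMIterateResult *page;
    double      estimate = 0;

    while ((page = tbm_iterate(it)) != NULL)
    {
        if (page->ntuples >= 0)
            estimate += page->ntuples;          /* exact page */
        else
            estimate += avg_tuples_per_page;    /* lossy page */
    }
    tbm_end_iterate(it);

    return estimate;
}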
        regards, tom lane



Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Stephen Frost
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Changing it in a new major release seems entirely reasonable.
>
> It's still a crock though.  I wonder whether it wouldn't be better to
> change the nodeBitmap code so that when EXPLAIN ANALYZE is active,
> it expends extra effort to try to produce a rowcount number.

I'm certainly all for doing something better, just didn't think that we
should be worried about making a change to the EXPLAIN ANALYZE output in
a major release because Depesz might have to update the explain site.

> We could certainly run through the result bitmap and count the number
> of exact-TID bits.  I don't see a practical way of doing something
> with lossy page bits, but maybe those occur infrequently enough
> that we could ignore them?  Or we could arbitrarily decide that
> a lossy page should be counted as MaxHeapTuplesPerPage, or a bit
> less arbitrarily, count it as the relation's average number
> of tuples per page.

Counting each page as the relation's average number of tuples per page
seems entirely reasonable to me, for what that is trying to report.

That said, I'm a big fan of how we have more detail for things like a
HashJoin (buckets, batches, memory usage) and it might be nice to have
more information like that for a BitmapAnd (and friends).  In
particular, I'm thinking of memory usage, exact vs. lossy pages, etc.
Knowing that the bitmap has gotten to the point of being lossy might
indicate that a user could up work_mem, for example, and possibly avoid
recheck costs.
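
As a rough illustration of what that extra detail line could look like in
text format (untested; exact_entries, lossy_pages and memory_kb are
hypothetical counters the bitmap node would have to track during execution):

    /* modeled loosely on how the Hash node reports buckets/batches/memory */
    appendStringInfoSpaces(es->str, es->indent * 2);
    appendStringInfo(es->str,
                     "Exact Entries: " INT64_FORMAT "  Lossy Pages: " INT64_FORMAT "  Memory Usage: %ldkB\n",
                     exact_entries, lossy_pages, memory_kb);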

Thanks!

Stephen

Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Jim Nasby
On 10/21/16 8:21 AM, Stephen Frost wrote:
> Counting each page as the relation's average number of tuples per page
> seems entirely reasonable to me, for what that is trying to report.

My concern is that this still leaves a lot of room for confusion when 
interpreting EXPLAIN ANALYZE. Every other node will tell you exactly 
what happened and it's pretty easy to reason about whether rows should 
have gone up or down based on the type of node. You can't do that for 
Bitmap(And|Or) unless you know the details of how TIDBitmaps work. 
Reporting N/A makes it crystal clear that these nodes operate very 
differently than all the others.

(On a related note, it would also be nice if we reported fractional rows 
when the row count is low and loops is high.)
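
Roughly, in the spot where explain.c averages the count over loops
(untested, and the cutoff of 10 is arbitrary, just to show the shape of
the change):

    double      rows = planstate->instrument->ntuples / nloops;

    /* show decimals when averaging over many loops would hide small counts */
    if (nloops > 1 && rows > 0 && rows < 10)
        appendStringInfo(es->str, " rows=%.2f loops=%.0f)", rows, nloops);
    else
        appendStringInfo(es->str, " rows=%.0f loops=%.0f)", rows, nloops);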

> That said, I'm a big fan of how we have more detail for things like a
> HashJoin (buckets, batches, memory usage) and it might be nice to have
> more information like that for a BitmapAnd (and friends).  In
> particular, I'm thinking of memory usage, exact vs. lossy pages, etc.
> Knowing that the bitmap has gotten to the point of being lossy might
> indicate that a user could up work_mem, for example, and possibly avoid
> recheck costs.

I think that's the best way to handle this: report N/A in the header and 
then provide details on exact vs lossy. That provides a clear indication 
to users that these kinds of nodes are special, as well as a reminder as 
to why they're special. Certainly the node could report an exact 
rowcount in the header if there were no lossy pages too.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Stephen Frost
* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:
> On 10/21/16 8:21 AM, Stephen Frost wrote:
> >Counting each page as the relation's average number of tuples per page
> >seems entirely reasonable to me, for what that is trying to report.
>
> My concern is that this still leaves a lot of room for confusion when
> interpreting EXPLAIN ANALYZE. Every other node will tell you exactly
> what happened and it's pretty easy to reason about whether rows
> should have gone up or down based on the type of node. You can't do
> that for Bitmap(And|Or) unless you know the details of how
> TIDBitmaps work. Reporting N/A makes it crystal clear that these
> nodes operate very differently than all the others.

I don't see why you think the numbers reported by BitmapAnd based on
this approach wouldn't go up and down in a similar manner to what you
would expect to get, based on that node type.  Reporting N/A is entirely
punting on it when we have perfectly useful information that can be
reported.

> (On a related note, it would also be nice if we reported fractional
> rows when the row count is low and loops is high.)

I can certainly understand that, though I think I'd rather have an
actual 'total' value or similar instead, but that's really a different
discussion.

> >That said, I'm a big fan of how we have more detail for things like a
> >HashJoin (buckets, batches, memory usage) and it might be nice to have
> >more information like that for a BitmapAnd (and friends).  In
> >particular, I'm thinking of memory usage, exact vs. lossy pages, etc.
> >Knowing that the bitmap has gotten to the point of being lossy might
> >indicate that a user could up work_mem, for example, and possibly avoid
> >recheck costs.
>
> I think that's the best way to handle this: report N/A in the header
> and then provide details on exact vs lossy. That provides a clear
> indication to users that these kinds of nodes are special, as well
> as a reminder as to why they're special. Certainly the node could
> report an exact rowcount in the header if there were no lossy pages
> too.

I don't see why we would want to stick 'N/A' in for the header, even if
we are reporting the details, when we can provide a pretty reasonable
number.  In particular, I certainly don't think we would want to report
N/A sometimes (lossy case) and then an actual number other times (all
exact case).  That strikes me as much more likely to be confusing.

Thanks!

Stephen

Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Jim Nasby
On 10/21/16 12:30 PM, Stephen Frost wrote:
> I don't see why we would want to stick 'N/A' in for the header, even if
> we are reporting the details, when we can provide a pretty reasonable
> number.

Because then it's absolutely clear that we don't have a valid rowcount, 
only a guess (and a guess that's potentially off by a lot).

No one is used to seeing "N/A" in EXPLAIN output, so when they do see it 
they'll immediately realize they don't know what's going on and hit up 
Google or the docs. Otherwise they'll just think it's an accurate 
rowcount like for any other node...

> In particular, I certainly don't think we would want to report
> N/A sometimes (lossy case) and then an actual number other times (all
> exact case).  That strikes me as much more likely to be confusing.

Fair enough. I'd certainly rather have a constant N/A than a guess at 
the rowcount.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Emre Hasegeli
The BRIN Bitmap Index Scan has the same problem.  I have seen people
confused by this.  I think N/A would clearly improve the situation.



Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Robert Haas
On Mon, Oct 31, 2016 at 6:56 AM, Emre Hasegeli <emre@hasegeli.com> wrote:
> The BRIN Bitmap Index Scan has the same problem.  I have seen people
> confused by this.  I think N/A would clearly improve the situation.

I agree.  Or perhaps better still, leave rows=%.0f out altogether when
we don't have a meaningful value to report.  If it were OK to use some
unimportant-looking value as a proxy for "undefined", the SQL standard
wouldn't include nulls.
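
Something along these lines in the text-format path, perhaps (untested
sketch; is_bitmap_logical_node is an invented predicate for
BitmapAnd/BitmapOr, and the structured formats would presumably just omit
the field or emit a null):

    if (is_bitmap_logical_node)
        appendStringInfo(es->str,
                         " (actual time=%.3f..%.3f loops=%.0f)",
                         startup_ms, total_ms, nloops);
    else
        appendStringInfo(es->str,
                         " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
                         startup_ms, total_ms, rows, nloops);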

I don't like Tom's proposal of trying to fake up a value here when
EXPLAIN ANALYZE is in use.  Reporting "exact" and "lossy" values for
BitmapAnd would be a fine enhancement, but artificially trying to
flatten that back into a row count is going to be confusing, not
helpful.  (Just last week I saw a case where the fact that many pages
were being lossified caused a performance problem ... so treating
lossy pages as if they don't exist would have led to a lot of
head-scratching, because under Tom's proposal the row count would have
been way off.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Tom Lane
Robert Haas <robertmhaas@gmail.com> writes:
> I don't like Tom's proposal of trying to fake up a value here when
> EXPLAIN ANALYZE is in use.  Reporting "exact" and "lossy" values for
> BitmapAnd would be a fine enhancement, but artificially trying to
> flatten that back into a row count is going to be confusing, not
> helpful.  (Just last week I saw a case where the fact that many pages
> were being lossified caused a performance problem ... so treating
> lossy pages as if they don't exist would have led to a lot of
> head-scratching, because under Tom's proposal the row count would have
> been way off.)

It would very often be the case that the value I suggested would be exact,
so this complaint seems off-base to me.

If we were willing to add an additional output line, we could also report
the number of lossy pages in the result bitmap, and people would then
know not to trust the reported rowcount as gospel.  But it's still useful
to have it.  I'm envisioning output like
  ->  BitmapOr  (cost=... rows=2000 width=0) (actual time=... rows=1942 loops=1)

in the no-lossy-pages case, otherwise
  ->  BitmapOr  (cost=... rows=4000 width=0) (actual time=... rows=3945 loops=1)
        Lossy Bitmap: exact entries=2469, lossy pages=123

There's nothing misleading about that, IMO.  (Exercise for the reader:
what rows/page estimate did I assume?)
        regards, tom lane



Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Robert Haas
On Tue, Nov 1, 2016 at 9:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I don't like Tom's proposal of trying to fake up a value here when
>> EXPLAIN ANALYZE is in use.  Reporting "exact" and "lossy" values for
>> BitmapAnd would be a fine enhancement, but artificially trying to
>> flatten that back into a row count is going to be confusing, not
>> helpful.  (Just last week I saw a case where the fact that many pages
>> were being lossified caused a performance problem ... so treating
>> lossy pages as if they don't exist would have led to a lot of
>> head-scratching, because under Tom's proposal the row count would have
>> been way off.)
>
> It would very often be the case that the value I suggested would be exact,
> so this complaint seems off-base to me.

From my point of view, something that very often gives the right
answers isn't acceptable.  We certainly wouldn't accept a query
optimization that very often gives the right answers.  It's gotta
always give the right answer.

> If we were willing to add an additional output line, we could also report
> the number of lossy pages in the result bitmap, and people would then
> know not to trust the reported rowcount as gospel.  But it's still useful
> to have it.  I'm envisioning output like
>
>    ->  BitmapOr  (cost=... rows=2000 width=0) (actual time=... rows=1942 loops=1)
>
> in the no-lossy-pages case, otherwise
>
>    ->  BitmapOr  (cost=... rows=4000 width=0) (actual time=... rows=3945 loops=1)
>          Lossy Bitmap: exact entries=2469, lossy pages=123
>
> There's nothing misleading about that, IMO.  (Exercise for the reader:
> what rows/page estimate did I assume?)

(4000-2469)/123 = 12.44715 ?

I think it's inherently misleading to report values that were
concocted specifically for EXPLAIN ANALYZE.  Things that we report
there should have some underlying reality or relevance.  People -
including me - tend to assume they do, and you don't want to spend
time chasing down something that's PURELY an EXPLAIN ANALYZE artifact
with no actual relevance to the runtime behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company