Re: Improve output of BitmapAnd EXPLAIN ANALYZE - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Improve output of BitmapAnd EXPLAIN ANALYZE
Date
Msg-id 20161021173010.GC13284@tamriel.snowman.net
Whole thread Raw
In response to Re: Improve output of BitmapAnd EXPLAIN ANALYZE  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: Improve output of BitmapAnd EXPLAIN ANALYZE  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:
> On 10/21/16 8:21 AM, Stephen Frost wrote:
> >Counting each page as the relation's average number of tuples per page
> >seems entirely reasonable to me, for what that is trying to report.
>
> My concern is that still leaves a lot of room for confusion when
> interpreting EXPLAIN ANALYZE. Every other node will tell you exactly
> what happened and it's pretty easy to reason about whether rows
> should have gone up or down based on the type of node. You can't do
> that for Bitmap(And|Or) unless you know the details of how
> TIDBitmaps work. Reporting N/A makes it crystal clear that these
> nodes operate very differently than all the others.

I don't see why you think the numbers reported by BitmapAnd based on
this approach wouldn't go up and down in a similar manner to what you
would expect to get, based on that node type.  Reporting N/A is entirely
punting on it when we have perfectly useful information that can be
reported.

> (On a related note, it would also be nice if we reported fractional
> rows when the row count low and loops is high.)

I can certainly understand that, though I think I'd rather have an
actual 'total' value or similar instead, but that's really a different
discussion.

> >That said, I'm a big fan of how we have more detail for things like a
> >HashJoin (buckets, batches, memory usage) and it might be nice to have
> >more information like that for a BitmapAnd (and friends).  In
> >particular, I'm thinking of memory usage, exact vs. lossy pages, etc.
> >Knowing that the bitmap has gotten to the point of being lossy might
> >indicate that a user could up work_mem, for example, and possibly avoid
> >recheck costs.
>
> I think that's the best way to handle this: report N/A in the header
> and then provide details on exact vs lossy. That provides a clear
> indication to users that these kinds of nodes are special, as well
> as a reminder as to why they're special. Certainly the node could
> report an exact rowcount in the header if there were no lossy pages
> too.

I don't see why we would want to stick 'N/A' in for the header, even if
we are reporting the details, when we can provide a pretty reasonable
number.  In particular, I certainly don't think we would want to report
N/A sometimes (lossy case) and then an actual number other times (all
exact case).  That strikes me as much more likely to be confusing.

Thanks!

Stephen

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Default setting for autovacuum_freeze_max_age
Next
From: Robert Haas
Date:
Subject: Re: PSA: Systemd will kill PostgreSQL