Thread: Re: Incorrect EXPLAIN ANALYZE output in bloom index docs

Re: Incorrect EXPLAIN ANALYZE output in bloom index docs

From

Yan Chengpeng

Date:

10 December 2024, 04:33:44


> On Dec 9, 2024, at 15:53, David Rowley <dgrowleyml@gmail.com> wrote:
> 
> I was fixing up the patch in [1] with the intention of committing it
> when I noticed that there are a few outdated EXPLAIN ANALYZE examples
> in the documents for the bloom contrib module.
> 
> The example outputs look like they've been created with a 100 thousand
> row table, but the commands given are to insert 10 million rows into
> that table. I suspect someone did a 100x on the example row count at
> some point during development and forgot to update the EXPLAIN output.
> 
> The patch I want to commit adds buffer outputs to these EXPLAINs, so I
> kinda need to fix these before adding that, otherwise what I want to
> add does not make any sense.
> 
> Patch attached. I propose to backpatch this fix.
> 
> David
> <v1-0001-Doc-fix-incorrect-EXPLAIN-ANALYZE-output-for-bloo.patch>

Most changes look good to me. Only two small comments here:

1. Why did the following part change from ‘never executed’ to execute. Why the previous state is `never executed`?
```
-         ->  Bitmap Index Scan on btreeidx2  (cost=0.00..12.04 rows=500 width=0) (never executed)
+         ->  Bitmap Index Scan on btreeidx2  (cost=0.00..4.52 rows=11 width=0) (actual time=0.007..0.007 rows=8
loops=1)
```

2. There is one sentence in the old one that says, `Although this query runs much faster than with either of the single
indexes,we pay a penalty in index size. Each of the single-column btree indexes occupies 2 MB, so the total space
neededis 12 MB, eight times the space used by the bloom index.` I think the size also needs to be changed.
 

- Yan

Re: Incorrect EXPLAIN ANALYZE output in bloom index docs

From

David Rowley

Date:

10 December 2024, 04:56:40

On Tue, 10 Dec 2024 at 14:33, Yan Chengpeng <chengpeng_yan@outlook.com> wrote:
> Most changes look good to me. Only two small comments here:
>
> 1. Why did the following part change from ‘never executed’ to execute. Why the previous state is `never executed`?
> ```
> -         ->  Bitmap Index Scan on btreeidx2  (cost=0.00..12.04 rows=500 width=0) (never executed)
> +         ->  Bitmap Index Scan on btreeidx2  (cost=0.00..4.52 rows=11 width=0) (actual time=0.007..0.007 rows=8
loops=1)
> ```

This was me just aligning the output I observed with the documents.  I
expect it previously stated "(never executed)" because the smaller
number of rows made it less likely that a matching row was found.
Since I ran the queries with 100x more rows than what the previous
output had shown, it's much more likely that one of the random numbers
generated during the INSERT matched the WHERE clause.

As for why the short-circuit was previously hit and execution was
skipped for that node, that's because of the following nodeBitmapAnd.c
code:

/*
* If at any stage we have a completely empty bitmap, we can fall out
* without evaluating the remaining subplans, since ANDing them can no
* longer change the result.  (Note: the fact that indxpath.c orders
* the subplans by selectivity should make this case more likely to
* occur.)
*/
if (tbm_is_empty(result))
   break;

i.e, if we find a BitmapAnd child with an empty bitmap result, there's
no point in executing any more children as (logically) it's guaranteed
that a bitwise-AND with the empty set results in an empty set.

> 2. There is one sentence in the old one that says, `Although this query runs much faster than with either of the
singleindexes, we pay a penalty in index size. Each of the single-column btree indexes occupies 2 MB, so the total
spaceneeded is 12 MB, eight times the space used by the bloom index.` I think the size also needs to be changed. 

Thanks for checking. I adjusted this earlier and pushed the result
[1]. I should have mentioned that here, but I forgot, sorry.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=36d0229b8ff5907