Thread: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
Hi,

I think that lossy-heap-block information for a bitmap heap scan, not just "Rows
Removed by Index Recheck" information, would also be a clue used to tune
work_mem for better performance especially when the bitmap heap scan uses an
index such as gin or gist, not btree.

So here's a patch that adds the information to the EXPLAIN ANALYZE output.  The
following shows an example.  The number of lossy-heap-block fetches (ie
tbmres->ntuples = -1) as well as that of exact-heap-block fetches (ie
tbmres->ntuples >= 0) are shown in the "Heap Blocks" line.

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
                QUERY PLAN
 
--------------------------------------------------------------------------------
------------------------------------------------Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766 width=34)
(actual
time=24.907..1119.961 rows=100047 loops=1)  Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double
precision))  Rows Removed by Index Recheck: 5484114  Heap Blocks: exact=11975 lossy=46388  ->  Bitmap Index Scan on
demo_idx (cost=0.00..2690.09 rows=105766 width=0)
 
(actual time=22.821..22.821 rows=100047 loops=1)        Index Cond: ((col2 >= 0.01::double precision) AND (col2 <=
0.02::double
precision))Total runtime: 1129.334 ms
(7 rows)

Comments welcome.

Thanks,

Best regards,
Etsuro Fujita

Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
Fujii Masao
Date:
On Thu, Oct 31, 2013 at 7:54 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Hi,
>
> I think that lossy-heap-block information for a bitmap heap scan, not just "Rows
> Removed by Index Recheck" information, would also be a clue used to tune
> work_mem for better performance especially when the bitmap heap scan uses an
> index such as gin or gist, not btree.
>
> So here's a patch that adds the information to the EXPLAIN ANALYZE output.  The
> following shows an example.  The number of lossy-heap-block fetches (ie
> tbmres->ntuples = -1) as well as that of exact-heap-block fetches (ie
> tbmres->ntuples >= 0) are shown in the "Heap Blocks" line.
>
> postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
>                                                            QUERY PLAN
> --------------------------------------------------------------------------------
> ------------------------------------------------
>  Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766 width=34) (actual
> time=24.907..1119.961 rows=100047 loops=1)
>    Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double
> precision))
>    Rows Removed by Index Recheck: 5484114
>    Heap Blocks: exact=11975 lossy=46388
>    ->  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766 width=0)
> (actual time=22.821..22.821 rows=100047 loops=1)
>          Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double
> precision))
>  Total runtime: 1129.334 ms
> (7 rows)
>
> Comments welcome.

This is what I'm looking for! This feature is really useful for tuning work_mem
when using full text search with pg_trgm.

I'm not sure if it's good idea to show the number of the fetches because
it seems difficult to tune work_mem from that number. How can we calculate
how much to increase work_mem to avoid lossy bitmap from the number of
the fetches in EXPLAIN output?

Anyway, could you add the patch into next CF?

Regards,

-- 
Fujii Masao



Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
> From: Fujii Masao [mailto:masao.fujii@gmail.com]

> This is what I'm looking for! This feature is really useful for tuning
work_mem
> when using full text search with pg_trgm.
> 
> I'm not sure if it's good idea to show the number of the fetches because it
> seems difficult to tune work_mem from that number. How can we calculate how
> much to increase work_mem to avoid lossy bitmap from the number of the fetches
> in EXPLAIN output?

We can calculate that from the following equation in tbm_create():
 nbuckets = maxbytes /   (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))   + sizeof(Pointer) +
sizeof(Pointer)),

where maxbytes is the size of memory used for the hashtable in a TIDBitmap,
designated by work_mem, and nbuckets is the estimated number of hashtable
entries we can have within maxbytes.  From this, the size of work_mem within
which we can have every hashtable entry as an exact bitmap is calculated as
follows:
 work_mem = (the number of exact pages + the number of lossy pages) *   (MAXALIGN(sizeof(HASHELEMENT)) +
MAXALIGN(sizeof(PagetableEntry))  + sizeof(Pointer) + sizeof(Pointer)) /   (1024 * 1024).
 

I'll show you an example.  The following is the result for work_mem = 1MB:

> > postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and
> 0.02;
> >                                                            QUERY PLAN
> > ----------------------------------------------------------------------
> > ----------
> > ------------------------------------------------
> >  Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766
> > width=34) (actual
> > time=24.907..1119.961 rows=100047 loops=1)
> >    Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <=
> > 0.02::double
> > precision))
> >    Rows Removed by Index Recheck: 5484114
> >    Heap Blocks: exact=11975 lossy=46388
> >    ->  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766
> > width=0) (actual time=22.821..22.821 rows=100047 loops=1)
> >          Index Cond: ((col2 >= 0.01::double precision) AND (col2 <=
> > 0.02::double
> > precision))
> >  Total runtime: 1129.334 ms
> > (7 rows)

So, by setting work_mem to
 work_mem = (11975 + 46388) *   (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))   + sizeof(Pointer) +
sizeof(Pointer))/   (1024 * 1024),
 

which is about 5MB, we have the following (Note that no lossy heap pages!):

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
                QUERY PLAN
 

--------------------------------------------------------------------------------
----------------------------
--------------------Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766 width=34) (actual
time=42.981..120.252 rows=1
00047 loops=1)  Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double
precision))  Heap Blocks: exact=58363  ->  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766 width=0)
(actual time=26.023..26.023 r
ows=100047 loops=1)        Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double
precision))Total runtime: 129.304 ms
(6 rows)

BTW, as the EXPLAIN ANALYZE output, the number of exact/lossy heap pages would
be fine with me.

> Anyway, could you add the patch into next CF?

Done.

Thanks,

Best regards,
Etsuro Fujita




Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
Amit Khandekar
Date:



On 1 November 2013 16:32, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
> From: Fujii Masao [mailto:masao.fujii@gmail.com]

> This is what I'm looking for! This feature is really useful for tuning
work_mem
> when using full text search with pg_trgm.
>
> I'm not sure if it's good idea to show the number of the fetches because it
> seems difficult to tune work_mem from that number. How can we calculate how
> much to increase work_mem to avoid lossy bitmap from the number of the fetches
> in EXPLAIN output?

We can calculate that from the following equation in tbm_create():

  nbuckets = maxbytes /
    (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
    + sizeof(Pointer) + sizeof(Pointer)),

where maxbytes is the size of memory used for the hashtable in a TIDBitmap,
designated by work_mem, and nbuckets is the estimated number of hashtable
entries we can have within maxbytes.  From this, the size of work_mem within
which we can have every hashtable entry as an exact bitmap is calculated as
follows:

  work_mem = (the number of exact pages + the number of lossy pages) *
    (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
    + sizeof(Pointer) + sizeof(Pointer)) /
    (1024 * 1024).

I am yet to give more thought on the above formula (particularly exact_pages + lossy_pages), but  I was also wondering if the user would indeed be able to figure out the above way to estimate the memory, or the explain itself should show the estimated memory  required for the bitmap. For hash joins we do show the memory taken by the hash table in show_hash_info(). We can show the memory requirement in addition to the number of exact/lossy pages. 


I'll show you an example.  The following is the result for work_mem = 1MB:

> > postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and
> 0.02;
> >                                                            QUERY PLAN
> > ----------------------------------------------------------------------
> > ----------
> > ------------------------------------------------
> >  Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766
> > width=34) (actual
> > time=24.907..1119.961 rows=100047 loops=1)
> >    Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <=
> > 0.02::double
> > precision))
> >    Rows Removed by Index Recheck: 5484114
> >    Heap Blocks: exact=11975 lossy=46388
> >    ->  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766
> > width=0) (actual time=22.821..22.821 rows=100047 loops=1)
> >          Index Cond: ((col2 >= 0.01::double precision) AND (col2 <=
> > 0.02::double
> > precision))
> >  Total runtime: 1129.334 ms
> > (7 rows)

So, by setting work_mem to

  work_mem = (11975 + 46388) *
    (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
    + sizeof(Pointer) + sizeof(Pointer)) /
    (1024 * 1024),

which is about 5MB, we have the following (Note that no lossy heap pages!):

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
                                                           QUERY PLAN

--------------------------------------------------------------------------------
----------------------------
--------------------
 Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766 width=34) (actual
time=42.981..120.252 rows=1
00047 loops=1)
   Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double
precision))
   Heap Blocks: exact=58363
   ->  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766 width=0)
(actual time=26.023..26.023 r
ows=100047 loops=1)
         Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double
precision))
 Total runtime: 129.304 ms
(6 rows)

BTW, as the EXPLAIN ANALYZE output, the number of exact/lossy heap pages would
be fine with me.

> Anyway, could you add the patch into next CF?

Done.

Thanks,

Best regards,
Etsuro Fujita



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
From: Amit Khandekar [mailto:amit.khandekar@enterprisedb.com]
> On 1 November 2013 16:32, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:
>> From: Fujii Masao [mailto:masao.fujii@gmail.com]

>>> I'm not sure if it's good idea to show the number of the fetches because it
>>> seems difficult to tune work_mem from that number. How can we calculate how
>>> much to increase work_mem to avoid lossy bitmap from the number of the fetches
>>> in EXPLAIN output?

>> We can calculate that from the following equation in tbm_create():

>>   nbuckets = maxbytes /
>>     (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
>>     + sizeof(Pointer) + sizeof(Pointer)),

>> where maxbytes is the size of memory used for the hashtable in a TIDBitmap,
>> designated by work_mem, and nbuckets is the estimated number of hashtable
>> entries we can have within maxbytes.  From this, the size of work_mem within
>> which we can have every hashtable entry as an exact bitmap is calculated as
>> follows:

>>   work_mem = (the number of exact pages + the number of lossy pages) *
>>     (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
>>     + sizeof(Pointer) + sizeof(Pointer)) /
>>     (1024 * 1024).

> I am yet to give more thought on the above formula
> (particularly exact_pages + lossy_pages), but  I was also wondering if the user
> would indeed be able to figure out the above way to estimate the memory, or the
> explain itself should show the estimated memory  required for the bitmap. For
> hash joins we do show the memory taken by the hash table in show_hash_info(). We
> can show the memory requirement in addition to the number of exact/lossy pages.

Thank you for the review!

Reconsidering that, I wish to know your opinion.  The patch shows the number of exact/lossy pages that has been fetched
ina bitmap heap scan.  But the number varies with the fraction of tuples to be retrieved like the following. 

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
                  QUERY PLAN 

------------------------------------------------------------------------------------------------------------------------------------Bitmap
HeapScan on demo  (cost=2187.35..101419.96 rows=102919 width=42) (actual time=23.684..1302.382 rows=99803 loops=1)
RecheckCond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))  Rows Removed by Index Recheck:
6279502 Heap Blocks: exact=1990 lossy=59593  ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919
width=0)(actual time=23.330..23.330 rows=99803 loops=1)        Index Cond: ((col2 >= 0.01::double precision) AND (col2
<=0.02::double precision))Total runtime: 1311.949 ms 
(7 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02 LIMIT 5000;
                                QUERY PLAN 

------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=2187.35..7008.26 rows=5000 width=42) (actual time=23.543..86.093 rows=5000 loops=1)  ->  Bitmap Heap Scan on demo
(cost=2187.35..101419.96 rows=102919 width=42) (actual time=23.542..85.196 rows=5000 loops=1)        Recheck Cond:
((col2>= 0.01::double precision) AND (col2 <= 0.02::double precision))        Rows Removed by Index Recheck: 312179
  Heap Blocks: exact=99 lossy=2963        ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919
width=0)(actual time=23.189..23.189 rows=99803 loops=1)              Index Cond: ((col2 >= 0.01::double precision) AND
(col2<= 0.02::double precision))Total runtime: 86.626 ms 
(8 rows)

So, my question is, we should show the number of exact/lossy pages in a TIDBitmap, not the number of these pages that
hasbeen fetched in the bitmap heap scan? 

Thanks,

Best regards,
Etsuro Fujita




Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
Amit Khandekar
Date:



On 25 November 2013 13:37, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:

Reconsidering that, I wish to know your opinion.  The patch shows the number of exact/lossy pages that has been fetched in a bitmap heap scan.  But the number varies with the fraction of tuples to be retrieved like the following.

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919 width=42) (actual time=23.684..1302.382 rows=99803 loops=1)
   Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
   Rows Removed by Index Recheck: 6279502
   Heap Blocks: exact=1990 lossy=59593
   ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919 width=0) (actual time=23.330..23.330 rows=99803 loops=1)
         Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
 Total runtime: 1311.949 ms
(7 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02 LIMIT 5000;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2187.35..7008.26 rows=5000 width=42) (actual time=23.543..86.093 rows=5000 loops=1)
   ->  Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919 width=42) (actual time=23.542..85.196 rows=5000 loops=1)
         Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
         Rows Removed by Index Recheck: 312179
         Heap Blocks: exact=99 lossy=2963
         ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919 width=0) (actual time=23.189..23.189 rows=99803 loops=1)
               Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
 Total runtime: 86.626 ms
(8 rows)

So, my question is, we should show the number of exact/lossy pages in a TIDBitmap, not the number of these pages that has been fetched in the bitmap heap scan?

Yes, I agree that rather than looking at the bitmap heap scan to track the number of pages, we should look somewhere in the underlying index scan. Yes, we should get a constant number of index pages regardless of the actual parent table rows. I can see that btgetbitmap() adds all the tuples into the bitmap, so somewhere below under btgetbitmap() might be the right place to track.  Somewhere in tbm_create_pagetable(), but not sure.


Thanks,

Best regards,
Etsuro Fujita


Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
Amit Khandekar wrote:
> On 25 November 2013 13:37, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote:

>> So, my question is, we should show the number of exact/lossy pages in a
>> TIDBitmap, not the number of these pages that has been fetched in the bitmap
>> heap scan?

> Yes, I agree that rather than looking at the bitmap heap scan to track the
> number of pages, we should look somewhere in the underlying index scan. Yes,
> we should get a constant number of index pages regardless of the actual
> parent table rows. I can see that btgetbitmap() adds all the tuples into the
> bitmap, so somewhere below under btgetbitmap() might be the right place to
> track.  Somewhere in tbm_create_pagetable(), but not sure.

Thank you for the comment!

I agree with you.  I'll modify the patch to show 1) the number of the exact/lossy pages in a TIDBitmap by examining the
underlyingindex scan, not the number of these pages that have been fetched in the bitmap heap scan, and 2) the memory
requirement.

Thanks,

Best regards,
Etsuro Fujita




Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
I wrote:
> Amit Khandekar wrote:
> > Yes, I agree that rather than looking at the bitmap heap scan to track
> > the number of pages, we should look somewhere in the underlying index
> > scan. Yes, we should get a constant number of index pages regardless
> > of the actual parent table rows.

> I agree with you.  I'll modify the patch to show 1) the number of the
> exact/lossy pages in a TIDBitmap by examining the underlying index scan,
> not the number of these pages that have been fetched in the bitmap heap
> scan, and 2) the memory requirement.

Though at first I agreed on this, while working on this I start to think information about (2) is enough for tuning
work_mem. Here are examples using a version under development, where "Bitmap Memory Usage" means (peak) memory space
usedby a TIDBitmap, and "Desired" means the memory required to guarantee non-lossy storage of a TID set, which is shown
onlywhen the TIDBitmap has been lossified.  (work_mem = 1MB.) 

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.0001 and 0.0005 ;
                   QUERY PLAN 

-----------------------------------------------------------------------------------------------------------------------------Bitmap
HeapScan on demo  (cost=77.14..12142.69 rows=3581 width=42) (actual time=1.748..53.203 rows=4112 loops=1)  Recheck
Cond:((col2 >= 0.0001::double precision) AND (col2 <= 0.0005::double precision))  Bitmap Memory Usage: 315kB  ->
BitmapIndex Scan on demo_col2_idx  (cost=0.00..76.25 rows=3581 width=0) (actual time=1.113..1.113 rows=4112 loops=1)
   Index Cond: ((col2 >= 0.0001::double precision) AND (col2 <= 0.0005::double precision))Total runtime: 53.804 ms 
(6 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.05 ;
                   QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------------------------Bitmap
HeapScan on demo  (cost=8307.41..107635.14 rows=391315 width=42) (actual time=84.818..2709.015 rows=400172 loops=1)
RecheckCond: ((col2 >= 0.01::double precision) AND (col2 <= 0.05::double precision))  Rows Removed by Index Recheck:
8815752 Bitmap Memory Usage: 1025kB (desired 20573kB)  ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..8209.58
rows=391315width=0) (actual time=83.664..83.664 rows=400172 loops=1)        Index Cond: ((col2 >= 0.01::double
precision)AND (col2 <= 0.05::double precision))Total runtime: 2747.088 ms 
(7 rows)

We should look at (1) as well?  (Honestly, I don't know what to show about (1) when using a bitmap scan on the inside
ofa nestloop join.  For memory usage and desired memory I think the maximum values would be fine.)  I re-wish to know
youropinion. 

Thanks,

Best regards,
Etsuro Fujita




Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
Robert Haas
Date:
On Fri, Dec 6, 2013 at 5:02 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Though at first I agreed on this, while working on this I start to think information about (2) is enough for tuning
work_mem. Here are examples using a version under development, where "Bitmap Memory Usage" means (peak) memory space
usedby a TIDBitmap, and "Desired" means the memory required to guarantee non-lossy storage of a TID set, which is shown
onlywhen the TIDBitmap has been lossified.  (work_mem = 1MB.) 

I'd be wary of showing a desired value unless it's highly likely to be accurate.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
Robert Haas wrote:
> On Fri, Dec 6, 2013 at 5:02 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>
> wrote:
> > Though at first I agreed on this, while working on this I start to
> > think information about (2) is enough for tuning work_mem.  Here are
> > examples using a version under development, where "Bitmap Memory
> > Usage" means (peak) memory space used by a TIDBitmap, and "Desired"
> > means the memory required to guarantee non-lossy storage of a TID set,
> > which is shown only when the TIDBitmap has been lossified.  (work_mem
> > = 1MB.)

> I'd be wary of showing a desired value unless it's highly likely to be
> accurate.

Thank you for the comments!

The desired value is accurately estimated based on (a) the total number of
exact/lossy pages stored in the TIDBitmap and (b) the following equation in
tbm_create(), except for the GIN case where lossy pages are added to the
TIDBitmap by tbm_add_page().
   /*    * Estimate number of hashtable entries we can have within maxbytes. ...    */   nbuckets = maxbytes /
(MAXALIGN(sizeof(HASHELEMENT))+ MAXALIGN(sizeof(PagetableEntry))        + sizeof(Pointer) + sizeof(Pointer));
 

In the GIN case, however, the version under development has a risk of the
overestimation.  (And in that case, in my understanding, we can't guarantee
non-lossy storage of the TIDBitmap any more.)  So, for that case, I think to
change the message for the desired value a bit.  I'll submit the patch
later.

Thanks,

Best regards,
Etsuro Fujita




Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
I wrote:
> Robert Haas wrote:
> > I'd be wary of showing a desired value unless it's highly likely to be
> > accurate.

> The desired value is accurately estimated based on (a) the total number
> of exact/lossy pages stored in the TIDBitmap and (b) the following
equation
> in tbm_create(), except for the GIN case where lossy pages are added to
> the TIDBitmap by tbm_add_page().

>     /*
>      * Estimate number of hashtable entries we can have within maxbytes.
...
>      */
>     nbuckets = maxbytes /
>         (MAXALIGN(sizeof(HASHELEMENT)) +
> MAXALIGN(sizeof(PagetableEntry))
>          + sizeof(Pointer) + sizeof(Pointer));

> In the GIN case, however, the version under development has a risk of the
> overestimation.  (And in that case, in my understanding, we can't
guarantee
> non-lossy storage of the TIDBitmap any more.)  So, for that case, I think
> to change the message for the desired value a bit.  I'll submit the patch
> later.

On second thoughts, I've modified the patch so that the EXPLAIN ANALYZE
command shows not only the desired value but the total number of exact/lossy
heap blocks that have been fetched in query execution because ISTM the
latter is also useful for tuning work_mem, when an available memory capacity
is not so large as the desired value, or when non-lossy storage of the
TIDBitmap can't be guaranteed as mentioned above.  Here is an example.
Attached is an updated version of the patch, though a sufficient test hasn't
been performed.

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and
0.02 ;                                                           QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------------Bitmap Heap Scan on demo  (cost=2072.10..100674.45 rows=97528
width=42)
(actual time=27.387..1677.511 rows=99833 loops=1)  Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <=
0.02::double
precision))  Rows Removed by Index Recheck: 5581690  Heap Blocks: exact=8585 lossy=52980  Bitmap Memory Usage: 1025kB
(4810kBdesired)  ->  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2047.71 rows=97528
 
width=0) (actual time=25.884..25.884 rows=99833 loops=1)        Index Cond: ((col2 >= 0.01::double precision) AND (col2
<=
0.02::double precision))Total runtime: 1687.047 ms
(8 rows)

Thanks,

Best regards,
Etsuro Fujita

Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
> I wrote:
> > Robert Haas wrote:
> > > I'd be wary of showing a desired value unless it's highly likely to
> > > be accurate.

> > The desired value is accurately estimated based on (a) the total
> > number of exact/lossy pages stored in the TIDBitmap and (b) the
> > following equation in tbm_create(), except for the GIN case where
> > lossy pages are added to the TIDBitmap by tbm_add_page().

I've found there is another risk of overestimating the desired memory space
for a BitmapAnded TIDBitmap.  I'm inclined to get rid of the estimation
functionality from the patch completely, and leave it for future work.
Attached is a new version of the patch, which shows only fetch block
information and memory usage information.  I'll add this to the upcoming CF.

Thanks,

Best regards,
Etsuro Fujita

Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
Robert Haas
Date:
On Fri, Dec 27, 2013 at 1:47 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
>> I wrote:
>> > Robert Haas wrote:
>> > > I'd be wary of showing a desired value unless it's highly likely to
>> > > be accurate.
>
>> > The desired value is accurately estimated based on (a) the total
>> > number of exact/lossy pages stored in the TIDBitmap and (b) the
>> > following equation in tbm_create(), except for the GIN case where
>> > lossy pages are added to the TIDBitmap by tbm_add_page().
>
> I've found there is another risk of overestimating the desired memory space
> for a BitmapAnded TIDBitmap.  I'm inclined to get rid of the estimation
> functionality from the patch completely, and leave it for future work.
> Attached is a new version of the patch, which shows only fetch block
> information and memory usage information.  I'll add this to the upcoming CF.

I spent some time looking at this tonight.  I don't think the value
that is displayed for the bitmap memory tracking will be accurate in
complex cases.  The bitmap heap scan may sit on top of one or more
bitmap-and or bitmap-or nodes.  When a bitmap-and operation happens,
one of the two bitmaps being combined will be thrown out and the
number of entries in the other map will, perhaps, be decreased.  The
peak memory usage for the surviving bitmap will be reflected in the
number displayed for the bitmap heap scan, but the peak memory usage
for the discarded bitmap will not.  This is wholly arbitrary because
both bitmaps existed at the same time, side by side, and which one we
keep and which one we throw out is essentially random.

I think we could report the results in a more principled way if we
reported the value for each bitmap *index* scan node rather than each
bitmap *heap* scan node.  However, I'm not sure it's really worth it.
I think what people really care about is knowing whether the bitmap
lossified or not, and generally how much got lossified.  The counts of
exact and lossy pages are sufficient for that, without anything
additional - so I'm inclined to think that the best course of action
might be to remove from the patch everything that's concerned with
trying to measure memory usage and just keep the exact/lossy page
counts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
Andres Freund
Date:
On 2014-01-01 21:15:46 -0500, Robert Haas wrote:
> [ sensible reasoning ] However, I'm not sure it's really worth it.
> I think what people really care about is knowing whether the bitmap
> lossified or not, and generally how much got lossified.  The counts of
> exact and lossy pages are sufficient for that, without anything
> additional

Showing the amount of memory currently required could tell you how soon
accurate bitmap scans will turn into lossy scans though. Which is not a
bad thing to know, some kinds of scans (e.g. tsearch over expression
indexes, postgis) can get ridiculously slow once lossy.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
Robert Haas
Date:
On Thu, Jan 2, 2014 at 4:27 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-01-01 21:15:46 -0500, Robert Haas wrote:
>> [ sensible reasoning ] However, I'm not sure it's really worth it.
>> I think what people really care about is knowing whether the bitmap
>> lossified or not, and generally how much got lossified.  The counts of
>> exact and lossy pages are sufficient for that, without anything
>> additional
>
> Showing the amount of memory currently required could tell you how soon
> accurate bitmap scans will turn into lossy scans though. Which is not a
> bad thing to know, some kinds of scans (e.g. tsearch over expression
> indexes, postgis) can get ridiculously slow once lossy.

Hmm, interesting.  I have not encountered that myself.  If we want
that, I'm tempted to think that we should display statistics for each
bitmap index scan - but I'd be somewhat inclined to see if we could
get by with the values that are already stored in a TIDBitmap rather
than adding new ones - e.g. show npages (the number of exact entries),
nchunks (the number of lossy entries), and maxentries.  From that, you
can work out the percentage of available entries that were actually
used.  The only thing that's a bit annoying about that is that we'd
probably have to copy those values out of the tid bitmap and into an
executor state node, because the tid bitmap will subsequently get
modified destructively.  But I think that's probably OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
Robert Haas wrote:
> I spent some time looking at this tonight.  I don't think the value that
> is displayed for the bitmap memory tracking will be accurate in complex
> cases.  The bitmap heap scan may sit on top of one or more bitmap-and or
> bitmap-or nodes.  When a bitmap-and operation happens, one of the two
> bitmaps being combined will be thrown out and the number of entries in the
> other map will, perhaps, be decreased.  The peak memory usage for the
> surviving bitmap will be reflected in the number displayed for the bitmap
> heap scan, but the peak memory usage for the discarded bitmap will not.
> This is wholly arbitrary because both bitmaps existed at the same time,
> side by side, and which one we keep and which one we throw out is
essentially
> random.

Thank you for taking time to look at this patch.  The peak memory usage for
the discarded bitmap *can* be reflected in the number displayed for the
bitmap heap scan by the following code in tbm_union() or tbm_intersect():
 tbm_union(TIDBitmap *a, const TIDBitmap *b) {       Assert(!a->iterating);
+       if (a->nentriesPeak < b->nentriesPeak)
+               a->nentriesPeak = b->nentriesPeak;       /* Nothing to do if b is empty */       if (b->nentries == 0)
            return;
 
***************
 tbm_intersect(TIDBitmap *a, const TIDBitmap *b) {       Assert(!a->iterating);
+       if (a->nentriesPeak < b->nentriesPeak)
+               a->nentriesPeak = b->nentriesPeak;       /* Nothing to do if a is empty */       if (a->nentries == 0)
            return;
 
***************

Sorry for the delay.

Best regards,
Etsuro Fujita




Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
Robert Haas
Date:
On Mon, Jan 6, 2014 at 9:40 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> Robert Haas wrote:
>> I spent some time looking at this tonight.  I don't think the value that
>> is displayed for the bitmap memory tracking will be accurate in complex
>> cases.  The bitmap heap scan may sit on top of one or more bitmap-and or
>> bitmap-or nodes.  When a bitmap-and operation happens, one of the two
>> bitmaps being combined will be thrown out and the number of entries in the
>> other map will, perhaps, be decreased.  The peak memory usage for the
>> surviving bitmap will be reflected in the number displayed for the bitmap
>> heap scan, but the peak memory usage for the discarded bitmap will not.
>> This is wholly arbitrary because both bitmaps existed at the same time,
>> side by side, and which one we keep and which one we throw out is
> essentially
>> random.
>
> Thank you for taking time to look at this patch.  The peak memory usage for
> the discarded bitmap *can* be reflected in the number displayed for the
> bitmap heap scan by the following code in tbm_union() or tbm_intersect():
>
>   tbm_union(TIDBitmap *a, const TIDBitmap *b)
>   {
>         Assert(!a->iterating);
> +       if (a->nentriesPeak < b->nentriesPeak)
> +               a->nentriesPeak = b->nentriesPeak;
>         /* Nothing to do if b is empty */
>         if (b->nentries == 0)
>                 return;
> ***************
>
>   tbm_intersect(TIDBitmap *a, const TIDBitmap *b)
>   {
>         Assert(!a->iterating);
> +       if (a->nentriesPeak < b->nentriesPeak)
> +               a->nentriesPeak = b->nentriesPeak;
>         /* Nothing to do if a is empty */
>         if (a->nentries == 0)
>                 return;
> ***************
>
> Sorry for the delay.

Hmm, fair point.  But I'm still not convinced that we really need to
add extra accounting for this.  What's wrong with just reporting the
number of exact and lossy pages?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
Robert Haas wrote:
> On Mon, Jan 6, 2014 at 9:40 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>
> wrote:
> > Thank you for taking time to look at this patch.  The peak memory
> > usage for the discarded bitmap *can* be reflected in the number
> > displayed for the bitmap heap scan by the following code in tbm_union()
> or tbm_intersect():

> Hmm, fair point.  But I'm still not convinced that we really need to add
> extra accounting for this.  What's wrong with just reporting the number
> of exact and lossy pages?

No.  I intended to show the desired memory space for a TIDBitmap rather than
the peak memory usage for that TIDBitmap.  And I thought it'd be better for
the latter to be displayed as additional information.  However, I've removed
the functionality for showing the desired memory space due to technical
problems.  Now I should probably remove the functionality for showing the
peak memory usage too.

Yes, as Andres mentioned, showing the peak memory usage is not a bad idea, I
think.  But I start to think it's not necessarily worth complicating the
code ...

If there are no objections of others, I'll remove extra accounting for
showing the peak memory usage.

Thanks,

Best regards,
Etsuro Fujita




Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
"Etsuro Fujita"
Date:
I wrote:
> Robert Haas wrote:
> > Hmm, fair point.  But I'm still not convinced that we really need to
> > add extra accounting for this.  What's wrong with just reporting the
> > number of exact and lossy pages?

> No.  I intended to show the desired memory space for a TIDBitmap rather
> than the peak memory usage for that TIDBitmap.  And I thought it'd be
better
> for the latter to be displayed as additional information.  However, I've
> removed the functionality for showing the desired memory space due to
> technical problems.  Now I should probably remove the functionality for
> showing the peak memory usage too.

> Yes, as Andres mentioned, showing the peak memory usage is not a bad idea,
> I think.  But I start to think it's not necessarily worth complicating the
> code ...

> If there are no objections of others, I'll remove extra accounting for
> showing the peak memory usage.

Done.  Please find attached a patch.

Thanks,

Best regards,
Etsuro Fujita

Re: Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

From
Robert Haas
Date:
On Thu, Jan 9, 2014 at 10:57 PM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> I wrote:
>> Robert Haas wrote:
>> > Hmm, fair point.  But I'm still not convinced that we really need to
>> > add extra accounting for this.  What's wrong with just reporting the
>> > number of exact and lossy pages?
>
>> No.  I intended to show the desired memory space for a TIDBitmap rather
>> than the peak memory usage for that TIDBitmap.  And I thought it'd be
> better
>> for the latter to be displayed as additional information.  However, I've
>> removed the functionality for showing the desired memory space due to
>> technical problems.  Now I should probably remove the functionality for
>> showing the peak memory usage too.
>
>> Yes, as Andres mentioned, showing the peak memory usage is not a bad idea,
>> I think.  But I start to think it's not necessarily worth complicating the
>> code ...
>
>> If there are no objections of others, I'll remove extra accounting for
>> showing the peak memory usage.
>
> Done.  Please find attached a patch.

Looks good to me, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company