Thread: Extra cost of "lossy mode" Bitmap Scan plan

Extra cost of "lossy mode" Bitmap Scan plan

From
higepon
Date:
Hi.

I found that the current planner doesn't take "lossy mode" into account for Bitmap Scans.
I think the costs of the following two plans are estimated as equal now.

(a) Having enough work_mem => normal Bitmap Scan.

(b) Having less work_mem than the estimated rows require => Bitmap Scan in "lossy mode". Slower than (a).

So, sometimes the planner makes the wrong choice.
For example, on some queries the planner doesn't choose a faster Index Scan plan
but a much slower Bitmap Scan (which actually runs in lossy mode).

My understanding is that we can know whether the plan will be lossy or not
with something like the following:

    int tbm_maxentries = work_mem * 1024L;

    if (estimatedNumRows < tbm_maxentries)
    {
        /* not lossy */
    }
    else
    {
        /* lossy: we may add some extra cost to the total cost */
    }
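For example, here is a very rough sketch of where such a check and an extra
cost term could go in cost_bitmap_heap_scan() (costsize.c). The variable names
roughly follow that file, but the lossy threshold and the recheck charge below
are just guesses on my part:

    /* rough sketch only, not a patch */
    double  maxentries = work_mem * 1024L;  /* crude upper bound on exact TID entries */

    if (tuples_fetched > maxentries)
    {
        /*
         * Expect a lossy bitmap: every tuple on a fetched page would have to
         * be rechecked against the quals, so charge some extra CPU per tuple.
         * The multiplier here is arbitrary.
         */
        run_cost += cpu_tuple_cost * tuples_fetched;
    }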
 

Any ideas how to do this?

Best regards,

-----
MINOWA Taro (Higepon)

Cybozu Labs, Inc.

http://www.monaos.org/
http://code.google.com/p/mosh-scheme/


Re: Extra cost of "lossy mode" Bitmap Scan plan

From
Itagaki Takahiro
Date:
higepon <higepon@gmail.com> wrote:

> I found the current planner doesn't care about "lossy mode" on Bitmap Scan.

Good point. I saw this bad behavior on the DBT-3 (TPC-H) benchmark before.
A loss-less bitmap scan was faster than a Seq Scan,
but a lossy bitmap scan was slower than a Seq Scan:
    EXPLAIN ANALYZE SELECT * FROM test WHERE v < 0.2;

    -- default
    Bitmap Heap Scan on test  (cost=3948.42..11005.77 rows=210588 width=8)
                              (actual time=47.550..202.925 rows=200142)

    -- SET work_mem = 64  (NOTICE: the cost is the same as above!)
    Bitmap Heap Scan on test  (cost=3948.42..11005.77 rows=210588 width=8)
                              (actual time=52.057..358.145 rows=200142)

    -- SET enable_bitmapscan = off
    Seq Scan on test  (cost=0.00..16924.70 rows=210588 width=8)
                      (actual time=0.182..280.450 rows=200142)

> My understanding is that we can know whether the plan is lossy or not
> like following.

Sure, we need it! I'd also like some way to determine whether a bitmap scan
was actually lossy, and how much work_mem would be required to do a loss-less
bitmap scan. For example, a new GUC variable trace_bitmapscan could print
information about bitmap scans, like trace_sort does for sorting.
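As a very rough illustration (nothing like this exists yet; the GUC, the
variable names, and the message text below are all hypothetical), the reporting
could be as simple as an elog() call at the point where the bitmap goes lossy:

    /* hypothetical sketch: emitted when the tid bitmap has to be lossified */
    if (trace_bitmapscan)
        elog(LOG,
             "bitmap scan went lossy: work_mem = %d kB allows about %ld exact entries",
             work_mem, (long) maxentries);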

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Extra cost of "lossy mode" Bitmap Scan plan

From
higepon
Date:
Hi.

> Good point. I saw this bad behavior on the DBT-3 (TPC-H) benchmark before.
> A loss-less bitmap scan was faster than a Seq Scan,
> but a lossy bitmap scan was slower than a Seq Scan:

Thank you.

> Sure, we need it! I'd also like some way to determine whether a bitmap scan
> was actually lossy, and how much work_mem would be required to do a loss-less
> bitmap scan. For example, a new GUC variable trace_bitmapscan could print
> information about bitmap scans, like trace_sort does for sorting.

I think it would not be hard to implement these methods and the trace function.

The problem that remains to be solved is
"How much extra cost should we add for lossy mode?".

Any ideas?

Best regards,

-----
MINOWA Taro (Higepon)

Cybozu Labs, Inc.

http://www.monaos.org/
http://code.google.com/p/mosh-scheme/


Re: Extra cost of "lossy mode" Bitmap Scan plan

From
Greg Stark
Date:
On Tue, Apr 28, 2009 at 7:45 AM, higepon <higepon@gmail.com> wrote:
> "How much extra cost should we add for lossy mode?".

There is something odd in this concern. Normally people aren't raising
and lowering their work_mem so the comparison would be between a plan
where the planner expects to see n records and a plan where the
planner expects to see n+1 records where n would fit and n+1 wouldn't.

It seems like an awfully narrow corner case where n records would be
faster as a bitmap index scan but n+1 records would be faster as an
index scan because the bitmap becomes lossy. The whole point of bitmap
scans is that they're faster for large scans than index scans.

If the logic you're suggesting would kick in at all it would be for a
narrow range of scan sizes, so the net effect would be to use an index
scan for small scans, then switch to a bitmap scan, then switch back
to an index scan when the bitmap scan becomes lossy, then switch back
to a lossy bitmap scan for large scans. I'm thinking that even if it's
slightly faster when the planner has perfect inputs the downsides of
switching back and forth might not be worth it. Especially when you
consider that the planner is often going on approximate estimates and
it is probably not switching in precisely the right spot.

-- 
greg


Re: Extra cost of "lossy mode" Bitmap Scan plan

From
higepon
Date:
Hi.

> There is something odd in this concern. Normally people aren't raising
> and lowering their work_mem so the comparison would be between a plan
> where the planner expects to see n records and a plan where the
> planner expects to see n+1 records where n would fit and n+1 wouldn't.

I see.

> It seems like an awfully narrow corner case where n records would be
> faster as a bitmap index scan but n+1 records would be faster as an
> index scan because the bitmap becomes lossy. The whole point of bitmap
> scans is that they're faster for large scans than index scans.

Hmm. Is this really such a narrow corner case?
At least ITAGAKI-san and I have both run into it.
I think the default work_mem value (1MB) may cause these issues.

Do you think there's no need for the planner to know
whether the plan will be lossy or not?
If the planner knew the cost of lossy mode, it might choose
better plans than it does now.


Or, as a second option, we could show some hints to people who are doing
performance tuning:

(a) Write a trace log entry when a bitmap scan falls into "lossy" mode.

(b) Show whether the scan was "lossy" or not in EXPLAIN results (a rough
    mock-up follows below).


Best regards,

-----
MINOWA Taro (Higepon)

Cybozu Labs, Inc.

http://www.monaos.org/
http://code.google.com/p/mosh-scheme/



Re: Extra cost of "lossy mode" Bitmap Scan plan

From
Robert Haas
Date:
On Tue, Apr 28, 2009 at 3:51 AM, Greg Stark <stark@enterprisedb.com> wrote:
> On Tue, Apr 28, 2009 at 7:45 AM, higepon <higepon@gmail.com> wrote:
>> "How much extra cost should we add for lossy mode?".
>
> There is something odd in this concern. Normally people aren't raising
> and lowering their work_mem so the comparison would be between a plan
> where the planner expects to see n records and a plan where the
> planner expects to see n+1 records where n would fit and n+1 wouldn't.
>
> It seems like an awfully narrow corner case where n records would be
> faster as a bitmap index scan but n+1 records would be faster as an
> index scan because the bitmap becomes lossy. The whole point of bitmap
> scans is that they're faster for large scans than index scans.
>
> If the logic you're suggesting would kick in at all it would be for a
> narrow range of scan sizes, so the net effect would be to use an index
> scan for small scans, then switch to a bitmap scan, then switch back
> to an index scan when the bitmap scan becomes lossy, then switch back
> to a lossy bitmap scan for large scans. I'm thinking that even if it's
> slightly faster when the planner has perfect inputs the downsides of
> switching back and forth might not be worth it. Especially when you
> consider that the planner is often going on approximate estimates and
> it is probably not switching in precisely the right spot.

You may be right, but on the other hand, I'm not sure there's any
sense in NOT trying to model the impact of the additional heap
fetches.  Has anyone done a detailed analysis of the actual
performance characteristics of bitmap scans versus index scans, as
opposed to what the optimizer thinks?  We've had long discussions on
this topic in relation to both the posix_fadvise patch and the gin
fast update patch, but I haven't seen a lot of hard numbers.

...Robert


Re: Extra cost of "lossy mode" Bitmap Scan plan

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Apr 28, 2009 at 3:51 AM, Greg Stark <stark@enterprisedb.com> wrote:
>> If the logic you're suggesting would kick in at all it would be for a
>> narrow range of scan sizes,

> You may be right, but on the other hand, I'm not sure there's any
> sense in NOT trying to model the impact of the additional heap
> fetches.

I think it's probably useless.  In the first place, at reasonable values
of work_mem the effect is going to be negligible (in the sense that a
plain indexscan would never win).  In the second place, there isn't any
way to guess the extent of lossiness at plan time --- it depends on how
much the target rows are "clumped" on particular pages.  The planner
hasn't got any stats that would let it guess that, and even if we tried
to collect such stats they'd probably be too unstable to be useful.

There are boatloads of effects that the planner doesn't model.  This
one seems very far down the list of what we should worry about.
        regards, tom lane


Re: Extra cost of "lossy mode" Bitmap Scan plan

From
Greg Stark
Date:
On Tue, Apr 28, 2009 at 3:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> You may be right, but on the other hand, I'm not sure there's any
> sense in NOT trying to model the impact of the additional heap
> fetches.

Yeah, the flip side of the argument is that we generally try to do the
best job we can modeling costs and let the arithmetic work out however it
does, because you never know what kind of wacky situations will arise when
planning queries, and the better the estimates, the better your chance of
coming up with a good plan.

For example the planner may have other join orders which allow it to
avoid accessing those records entirely. So the comparison with a
nested loop might not be the only comparison that matters. It might be
a case of whether to run a bitmap scan against this table or some scan
against another table to drive the join.

I have been running benchmarks comparing bitmap heap scans against
index scans, among other comparisons. I haven't done CVS head yet, but
on an older version, with effective_io_concurrency set to 0, scanning
1000 random tuples throughout a 130G table (one searched tuple per page)
on a machine with 64G of RAM, index scans settle down after repeated
executions to about 245s vs 205s for bitmap scans (over 100 iterations).
So bitmap scans are about 16% faster for this use case.

Incidentally, with effective_io_concurrency set to 30 on this 30-drive
RAID, the bitmap scans go down to 17s :)

-- 
greg


Re: Extra cost of "lossy mode" Bitmap Scan plan

From
higepon
Date:
Hi.
I apologize for this late reply.

Tom Lane wrote:
> I think it's probably useless.  In the first place, at reasonable values
> of work_mem the effect is going to be negligible (in the sense that a
> plain indexscan would never win).  In the second place, there isn't any
> way to guess the extent of lossiness at plan time --- it depends on how
> much the target rows are "clumped" on particular pages.  The planner
> hasn't got any stats that would let it guess that, and even if we tried
> to collect such stats they'd probably be too unstable to be useful.

Thank you. I understand your two points.

But in this context, I don't understand why a Bitmap Scan is faster than
an Index Scan.

If there are many WHERE conditions, like the following, I can understand
why a Bitmap Scan is faster than an Index Scan:

    select * from emp where emp_no > 10000 and sex = 1 and salary > 5000;

A Bitmap Scan merges the result bitmaps and reads the tuples after the merge.
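Roughly, I imagine the plan shape like this (just a mock-up, assuming indexes
on emp_no and salary but not on sex; the index names are made up):

    Bitmap Heap Scan on emp
      Recheck Cond: (emp_no > 10000) AND (salary > 5000)
      Filter: (sex = 1)
      ->  BitmapAnd
            ->  Bitmap Index Scan on idx_emp_no
                  Index Cond: (emp_no > 10000)
            ->  Bitmap Index Scan on idx_salary
                  Index Cond: (salary > 5000)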


But I don't understand this case:

    select * from emp where emp_no > 10000;

Is a Bitmap Scan faster than an Index Scan in this case?
If so, why?

My understanding is that both an Index Scan and a Bitmap Scan scan the index on
emp_no, so they should cost almost the same.
In addition, a Bitmap Scan builds a bitmap, so shouldn't it be a little bit slower?

Best regards,

-----
MINOWA Taro (Higepon)

Cybozu Labs, Inc.

http://www.monaos.org/
http://code.google.com/p/mosh-scheme/    




Re: Extra cost of "lossy mode" Bitmap Scan plan

From
Tom Lane
Date:
higepon <higepon@gmail.com> writes:
> But I don't understand this case.
>  select * from emp where emp_no > 10000;
> Is a Bitmap Scan faster than an Index Scan in this case?

Yes, very probably, if a lot of tuples are being retrieved.  A bitmap
scan will fetch the tuples from the heap in a more or less optimal
fashion --- for instance, each page is read only once.  Index scan will
result in a random sequence of accesses to the heap.  (Of course, it
might have some order if the index is well correlated with the heap
order, but most of the time that's not true.)
        regards, tom lane


Re: Extra cost of "lossy mode" Bitmap Scan plan

From
higepon
Date:
Hi.

> fashion --- for instance, each page is read only once.  Index scan will
> result in a random sequence of accesses to the heap.  (Of course, it
> might have some order if the index is well correlated with the heap
> order, but most of the time that's not true.)

Thank you. I finally understand the difference.

Cheers.