Thread: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)

While testing bitmap scan performance, I have observed that some of the
TPCH queries spend a huge amount of time in the Bitmap Index Scan node
when there are lossy pages.

I suspected commit 75ae538bc3168bf44475240d4e0487ee2f3bb376, because
prior to that commit these queries took much less time. So I tested by
reverting the suspected commit, and the problem went away.

Here is the EXPLAIN ANALYZE result for TPCH query 6 (scale factor 10):

work_mem=10MB
shared_buffers=20GB
Machine under test: 4-socket POWER machine

->On Head:

postgres=# \i 6.explain.sql
                                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1585507.74..1585507.75 rows=1 width=32) (actual time=21626.467..21626.467 rows=1 loops=1)
   ->  Aggregate  (cost=1585507.74..1585507.75 rows=1 width=32) (actual time=21626.466..21626.466 rows=1 loops=1)
         ->  Bitmap Heap Scan on lineitem  (cost=299632.60..1579529.48 rows=1195652 width=12) (actual time=9204.770..20910.089 rows=1190658 loops=1)
               Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.07) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
               Rows Removed by Index Recheck: 27584798
               Heap Blocks: exact=101349 lossy=580141
               ->  Bitmap Index Scan on idx_lineitem_shipdate  (cost=0.00..299333.68 rows=1195652 width=0) (actual time=9185.490..9185.490 rows=1190658 loops=1)
                     Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.07) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
 Planning time: 0.675 ms
 Execution time: 21626.838 ms
(10 rows)


->After reverting Commit: 75ae538bc3168bf44475240d4e0487ee2f3bb376
postgres=# \i 6.explain.sql
                                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1585507.74..1585507.75 rows=1 width=32) (actual time=12807.293..12807.294 rows=1 loops=1)
   ->  Aggregate  (cost=1585507.74..1585507.75 rows=1 width=32) (actual time=12807.291..12807.291 rows=1 loops=1)
         ->  Bitmap Heap Scan on lineitem  (cost=299632.60..1579529.48 rows=1195652 width=12) (actual time=1632.351..12131.552 rows=1190658 loops=1)
               Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.07) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
               Rows Removed by Index Recheck: 28401743
               Heap Blocks: exact=84860 lossy=596630
               ->  Bitmap Index Scan on idx_lineitem_shipdate  (cost=0.00..299333.68 rows=1195652 width=0) (actual time=1613.166..1613.166 rows=1190658 loops=1)
                     Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.07) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
 Planning time: 0.173 ms
 Execution time: 12807.380 ms
(10 rows)


From the above EXPLAIN ANALYZE results we can see that with commit
75ae538bc3168bf44475240d4e0487ee2f3bb376 the Bitmap Index Scan node is
way slower than without this commit.

Perf result:
On head:
+   13.12%     0.01%  postgres  postgres            [.] tbm_lossify
+   13.10%    13.10%  postgres  postgres            [.] tbm_mark_page_lossy
+   12.84%    12.82%  postgres  postgres            [.] slot_deform_tuple
+    6.94%     0.00%  postgres  postgres            [.] _bt_next
+    6.94%     0.02%  postgres  postgres            [.] _bt_steppage
+    6.55%     0.05%  postgres  postgres            [.] _bt_readpage
+    6.41%     1.00%  postgres  postgres            [.] _bt_checkkeys

After reverting 75ae538bc3168bf44475240d4e0487ee2f3bb376:
+    0.71%     0.71%  postgres  postgres            [.] cmp_var_common
+    0.62%     0.02%  postgres  postgres            [.] tbm_lossify
+    0.62%     0.62%  postgres  postgres            [.] AllocSetReset
+    0.60%     0.11%  postgres  [kernel.kallsyms]   [k] sys_read
+    0.59%     0.10%  postgres  postgres            [.] advance_transition_function

I think that in the new hash implementation, deleting entries from the
pagetable has a severe performance issue.
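
For reference, my mental picture of why deletes could be expensive there
is something like the following standalone sketch (hypothetical,
simplified structures and names -- not the actual simplehash.h code):
every delete in a linear-probing table has to shift part of the probe
chain back, and tbm_lossify()/tbm_mark_page_lossy() do a lot of deletes.

/*
 * Illustration only: backward-shift deletion in a linear-probing hash
 * table.  Hypothetical, simplified structures -- not the simplehash.h code.
 */
#include <stdbool.h>
#include <stdint.h>

#define TBL_SIZE 1024               /* power of two, so we can mask */

typedef struct PageEntry
{
    uint64_t    blockno;            /* key: heap block number */
    bool        used;
    /* ... the per-page bitmap words would live here ... */
} PageEntry;

static PageEntry table[TBL_SIZE];

static inline uint32_t
bucket_for(uint64_t key)
{
    return (uint32_t) (key * 2654435761u) & (TBL_SIZE - 1);
}

static void
pagetable_delete(uint64_t blockno)
{
    uint32_t    cur = bucket_for(blockno);

    /* find the entry; assume for brevity that it is present */
    while (!table[cur].used || table[cur].blockno != blockno)
        cur = (cur + 1) & (TBL_SIZE - 1);

    /*
     * We can't simply clear the slot: that would break the probe chain for
     * entries stored after it.  So keep pulling displaced entries backwards
     * until we hit an empty slot or an entry already sitting in its home
     * bucket.  When tbm_lossify() deletes many exact page entries in a row,
     * this shifting is where I suspect the time goes.
     */
    for (;;)
    {
        uint32_t    next = (cur + 1) & (TBL_SIZE - 1);

        if (!table[next].used || bucket_for(table[next].blockno) == next)
        {
            table[cur].used = false;
            break;
        }
        table[cur] = table[next];
        cur = next;
    }
}

Whether that is really the root cause I am not sure, but it matches the
profile above.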

Note: if I set work_mem=100MB (so that there are no lossy pages), then
performance is fine.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Hi Dilip,

thanks for noticing that one.

On 2016-11-09 21:17:22 +0530, Dilip Kumar wrote:
> While testing bitmap scan performance, I have observed that some of
> the TPCH queries spend a huge amount of time in the Bitmap Index Scan
> node when there are lossy pages.

It's not really related to lossy pages, it's just that due to deletions
/ insertions a lot more "shapes" of the hashtable are hit.

I suspect that this is with parallelism disabled? Without that the query
ends up using a parallel sequential scan for me.


I've a working fix for this, and for a similar issue Robert found. I'm
still playing around with it, but basically the fix is to make the
growth policy a bit more adaptive.
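
Roughly, the idea is to not only grow based on the fill factor, but also
when an individual insertion has to move too many already-stored entries
around. A sketch with made-up names and thresholds (not the patch
itself):

/* Sketch only: hypothetical names and thresholds, not the actual patch. */
#include <stdbool.h>
#include <stdint.h>

typedef struct HashState
{
    uint32_t    size;               /* current number of buckets */
    uint32_t    members;            /* entries currently stored */
    uint32_t    grow_threshold;     /* member count that forces growth */
} HashState;

/*
 * Decide whether the table has to grow as part of an insertion.  The plain
 * policy only looks at the fill factor; the "adaptive" variant additionally
 * reacts when a single insertion had to move too many existing entries,
 * which is the symptom of the clustering the bitmap pagetable runs into
 * here even though the overall fill factor still looks fine.
 */
static bool
should_grow(const HashState *st, uint32_t entries_moved_by_this_insert)
{
    /* ordinary fillfactor-based growth */
    if (st->members >= st->grow_threshold)
        return true;

    /* adaptive part: react to pathologically long displacement chains */
    if (entries_moved_by_this_insert > 50)  /* purely illustrative limit */
        return true;

    return false;
}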

Greetings,

Andres Freund



On Wed, Nov 16, 2016 at 12:58 AM, Andres Freund <andres@anarazel.de> wrote:
> It's not really related to lossy pages, it's just that due to deletions
> / insertions a lot more "shapes" of the hashtable are hit.
Okay..
>
> I suspect that this is with parallelism disabled? Without that the query
> ends up using a parallel sequential scan for me.
>

It's with max_parallel_workers_per_gather=2. I have always noticed that
Q6 uses a parallel seq scan only with
max_parallel_workers_per_gather=4 or more.
>
> I've a working fix for this, and for a similar issue Robert found. I'm
> still playing around with it, but basically the fix is to make the
> growth policy a bit more adaptive.

Okay.. Thanks.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
> I've a working fix for this, and for a similar issue Robert found. I'm
> still playing around with it, but basically the fix is to make the
> growth policy a bit more adaptive.

Any chance you can post a patch soon?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
> > I've a working fix for this, and for a similar issue Robert found. I'm
> > still playing around with it, but basically the fix is to make the
> > growth policy a bit more adaptive.
>
> Any chance you can post a patch soon?

Here's my WIP series addressing this and related problems. With this
we're again noticeably faster than the dynahash implementation, in both
the case here, and the query you brought up over IM.

This definitely needs some more TLC, but the general approach seems
good. I particularly like that it apparently allows us to increase the
default fillfactor without much downside according to my measurements.

Greetings,

Andres Freund

Attachment
On Wed, Nov 23, 2016 at 2:03 PM, Andres Freund <andres@anarazel.de> wrote:
> Here's my WIP series addressing this and related problems. With this
> we're again noticeably faster than the dynahash implementation, in both
> the case here, and the query you brought up over IM.
>
> This definitely needs some more TLC, but the general approach seems
> good. I particularly like that it apparently allows us to increase the
> default fillfactor without much downside according to my measurements.

This patch fixes the performance problem which I reported upthread. The
time taken in the Bitmap Index Scan node, which had regressed
drastically with commit 75ae538bc3168bf44475240d4e0487ee2f3bb376, is
back to normal.


---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1583130.59..1583130.60 rows=1 width=32) (actual time=14041.922..14041.923 rows=1 loops=1)
   ->  Aggregate  (cost=1583130.59..1583130.60 rows=1 width=32) (actual time=14041.921..14041.921 rows=1 loops=1)
         ->  Bitmap Heap Scan on lineitem  (cost=296012.30..1577117.95 rows=1202528 width=12) (actual time=1711.899..13332.892 rows=1190658 loops=1)
               Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.07) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
               Rows Removed by Index Recheck: 27585320
               Heap Blocks: exact=101349 lossy=580141
               ->  Bitmap Index Scan on idx_lineitem_shipdate  (cost=0.00..295711.67 rows=1202528 width=0) (actual time=1689.478..1689.478 rows=1190658 loops=1)
                     Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate < '1996-01-01 00:00:00'::timestamp without time zone) AND (l_discount >= 0.07) AND (l_discount <= 0.09) AND (l_quantity < '25'::numeric))
 Planning time: 0.292 ms
 Execution time: 14041.968 ms
(10 rows)

     revenue
-----------------
 1784930119.2454
(1 row)


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Wed, Nov 23, 2016 at 3:33 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
>> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
>> > I've a working fix for this, and for a similar issue Robert found. I'm
>> > still playing around with it, but basically the fix is to make the
>> > growth policy a bit more adaptive.
>>
>> Any chance you can post a patch soon?
>
> Here's my WIP series addressing this and related problems. With this
> we're again noticeably faster than the dynahash implementation, in both
> the case here, and the query you brought up over IM.
>
> This definitely needs some more TLC, but the general approach seems
> good. I particularly like that it apparently allows us to increase the
> default fillfactor without much downside according to my measurements.

Are you going to commit something here?  At least enough to make
Finalize HashAgg -> Gather -> Partial HashAgg terminate in finite
time?  Because the fact that it doesn't really sucks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Thu, Dec 8, 2016 at 5:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Nov 23, 2016 at 3:33 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-11-18 08:00:40 -0500, Robert Haas wrote:
>>> On Tue, Nov 15, 2016 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:
>>> > I've a working fix for this, and for a similar issue Robert found. I'm
>>> > still playing around with it, but basically the fix is to make the
>>> > growth policy a bit more adaptive.
>>>
>>> Any chance you can post a patch soon?
>>
>> Here's my WIP series addressing this and related problems. With this
>> we're again noticeably faster than the dynahash implementation, in both
>> the case here, and the query you brought up over IM.
>>
>> This definitely needs some more TLC, but the general approach seems
>> good. I particularly like that it apparently allows us to increase the
>> default fillfactor without much downside according to my measurements.
>
> Are you going to commit something here?  At least enough to make
> Finalize HashAgg -> Gather -> Partial HashAgg terminate in finite
> time?  Because the fact that it doesn't really sucks.

I took a look at Andres's patches today and saw that they can't really
be applied as-is, because they "temporarily" change the master's
ParallelWorkerNumber but have no provision for restoring it after an
ERROR.  I think that approach isn't right anyway; it seems to me that
what we want to do is set hash_iv based on whether we're in a Partial
HashAgg, not on whether we're in a parallel worker.  For instance, if
we're somehow in nodeSubplan.c there's no need for this just because we
happen to be in a worker -- I think.  That led me to develop the
attached patch.
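
To spell out the shape of that idea (this is only a standalone sketch
with made-up names, not the attached patch, so don't take the details
literally):

/*
 * Sketch with made-up names: seed the hash IV only when the node computes
 * partial aggregates, not merely because we're running inside a worker.
 */
#include <stdbool.h>
#include <stdint.h>

typedef struct AggNodeState
{
    bool        is_partial_agg;     /* does this node emit partial results? */
    uint32_t    hash_iv;            /* IV mixed into every group-key hash */
} AggNodeState;

/* trivial 32-bit integer mixer, a stand-in for a real hash function */
static uint32_t
mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6b;
    x ^= x >> 13;
    x *= 0xc2b2ae35;
    x ^= x >> 16;
    return x;
}

static void
init_hash_iv(AggNodeState *state, int parallel_worker_number)
{
    /*
     * Only a partial aggregate needs a per-worker IV: its output reaches
     * the combining aggregate above the Gather roughly in hash order, and
     * if leader and workers all hash with the same IV that insertion order
     * is about the worst case for the leader's hash table.  Any other hash
     * table that merely happens to be built in a worker (say, by
     * nodeSubplan.c) can keep an IV of zero.
     */
    if (state->is_partial_agg && parallel_worker_number >= 0)
        state->hash_iv = mix32((uint32_t) (parallel_worker_number + 1));
    else
        state->hash_iv = 0;
}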

This may not be perfect, but it causes TPC-H Q18 to complete instead
of hanging forever, so I'm going to commit it RSN unless there are
loud objections combined with promising steps toward some alternative
fix.  It's been over a month since these problems were reported, and
it is not good that the tree has been in a broken state for that
entire time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment
On 2016-12-14 22:00:45 -0500, Robert Haas wrote:
> I took a look at Andres's patches today and saw that they can't really
> be applied as-is, because they "temporarily" change the master's
> ParallelWorkerNumber but have no provision for restoring it after an
> ERROR.

Yea, that's not quite right. Although I'm not sure it really matters
that much, given that we're not continuing execution after an error. We
could just reset it at appropriate points - but that seems ugly.


> I think that approach isn't right anyway; it seems to me that
> what we want to do is set hash_iv based on whether we're in a Partial
> HashAgg, not on whether we're in a parallel worker.  For instance, if
> we're somehow in nodeSubplan.c there's no need for this just because we
> happen to be in a worker -- I think.  That led me to develop the
> attached patch.

Hm, I'd rather have this be a more general solution - it doesn't seem
unlikely that other parts of the system want to know whether they're
currently executing a parallel portion of the plan. E.g. it really
should be forbidden to do modifications even in the parallel portion of
the plan the master executes.  Right now that'd escape notice, which I
don't think is good.


> This may not be perfect, but it causes TPC-H Q18 to complete instead
> of hanging forever, so I'm going to commit it RSN unless there are
> loud objections combined with promising steps toward some alternative
> fix.  It's been over a month since these problems were reported, and
> it is not good that the tree has been in a broken state for that
> entire time.

Yea, sorry for that :(. Unfortunately the JIT stuff currently occupies a
large portion of my brain.

I really hope to come up with a new version of the patch that does the
"smarter" expansion soon.

Greetings,

Andres Freund



On Wed, Dec 14, 2016 at 10:08 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-12-14 22:00:45 -0500, Robert Haas wrote:
>> I took a look at Andres's patches today and saw that they can't really
>> be applied as-is, because they "temporarily" change the master's
>> ParallelWorkerNumber but have no provision for restoring it after an
>> ERROR.
>
> Yea, that's not quite right. Although I'm not sure it really matters
> that much, given that we're not continuing execution after an error. We
> could just reset it at appropriate points - but that seems ugly.

Yes.

>> I think that approach isn't right anyway; it seems to me that
>> what we want to do is set hash_iv based on whether we're in a Partial
>> HashAgg, not on whether we're in a parallel worker.  For instance, if
>> we're somehow in nodeSubplan.c there's no need for this just because we
>> happen to be in a worker -- I think.  That led me to develop the
>> attached patch.
>
> Hm, I'd rather have this be a more general solution - it doesn't seem
> unlikely that other parts of the system want to know whether they're
> currently executing a parallel portion of the plan. E.g. it really
> should be forbidden to do modifications even in the parallel portion of
> the plan the master executes.  Right now that'd escape notice, which I
> don't think is good.

Actually, that wouldn't escape notice.  You can test with
IsInParallelMode().  That's entered before beginning execution of the
plan at the outermost level - see ExecutePlan().
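
In other words, a modification path can already be guarded with
something along these lines (sketch only; it assumes backend context and
uses just the existing API):

/*
 * Sketch: guarding a modification path with the existing API.  Assumes
 * backend context (postgres.h & friends); not a standalone program.
 */
#include "postgres.h"

#include "access/xact.h"

static void
check_not_in_parallel_plan(const char *what)
{
    /*
     * IsInParallelMode() is true from the EnterParallelMode() call that
     * ExecutePlan() makes before running a parallel plan until execution
     * finishes, in the leader as well as in the workers, so a check like
     * this would not escape notice.
     */
    if (IsInParallelMode())
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("cannot %s during a parallel operation", what)));
}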

>> This may not be perfect, but it causes TPC-H Q18 to complete instead
>> of hanging forever, so I'm going to commit it RSN unless there are
>> loud objections combined with promising steps toward some alternative
>> fix.  It's been over a month since these problems were reported, and
>> it is not good that the tree has been in a broken state for that
>> entire time.
>
> Yea, sorry for that :(. Unfortunately the JIT stuff currently occupies a
> large portion of my brain.
>
> I really hope to come up with a new version of the patch that does the
> "smarter" expansion soon.

I've got no problem with that at all, but I want to unbreak things
more or less immediately and then you/we can further improve it later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Wed, Dec 14, 2016 at 11:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I've got no problem with that at all, but I want to unbreak things
> more or less immediately and then you/we can further improve it later.

Committed, although I realize now that doesn't fix Dilip's problem,
only my (somewhat different) problem.  To fix his issue, we need
something like your 0001.  Are you going to polish that up soon here?
I kinda want to get the regressions we've introduced fixed here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 2016-12-16 10:12:42 -0500, Robert Haas wrote:
> On Wed, Dec 14, 2016 at 11:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > I've got no problem with that at all, but I want to unbreak things
> > more or less immediately and then you/we can further improve it later.
> 
> Committed

Thanks. Although "Fix drastic slowdown when ..." or so might have been a
less confusing commit message ;)

> , although I realize now that doesn't fix Dilip's problem,
> only my (somewhat different) problem.

Indeed. And the fix for Dilip's thing also fixes your issue, albeit what
you committed still helps noticeably (even with the old hashtable).

> To fix his issue, we need something like your 0001.  Are you going to
> polish that up soon here?

Yes.


Andres



On 2016-12-16 09:34:31 -0800, Andres Freund wrote:
> > To fix his issue, we need something like your 0001.  Are you going to
> > polish that up soon here?
>
> Yes.

I've two versions of a fix for this. One of them basically increases the
"spread" of buckets when the density goes up too much. It does so by
basically shifting the bucket number to the left (e.g. only every second
bucket can be the "primary" bucket for a hash value).  The other
basically just replaces the magic constants in my previous POC patch
with slightly better documented constants.  For me the latter works just
as well as the former, even though aesthetically/theoretically the
former sounds better.  I'm inclined to commit the latter, at least for
now.
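
To illustrate what I mean by the former -- purely a sketch with invented
field names, not either of the actual patches:

/*
 * Illustration only, with hypothetical fields: shifting the bucket number
 * left spreads out the "primary" buckets.
 */
#include <stdint.h>

typedef struct HashLayout
{
    uint32_t    nbuckets;       /* total slots, power of two */
    uint32_t    spread_shift;   /* 0 = every slot can be a home bucket,
                                 * 1 = only every second slot, ... */
} HashLayout;

static inline uint32_t
primary_bucket(const HashLayout *h, uint32_t hashvalue)
{
    uint32_t    mask = h->nbuckets - 1;

    /*
     * Take the usual hash-derived bucket number, then shift it left: with
     * spread_shift = 1 only even-numbered slots are primary buckets, and
     * the odd slots in between are reserved for entries displaced by
     * collisions, which keeps probe chains short when density goes up.
     */
    return (hashvalue & (mask >> h->spread_shift)) << h->spread_shift;
}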

Andres



On Fri, Jan 6, 2017 at 10:43 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-12-16 09:34:31 -0800, Andres Freund wrote:
>> > To fix his issue, we need something like your 0001.  Are you going to
>> > polish that up soon here?
>>
>> Yes.
>
> I've two versions of a fix for this. One of them basically increases the
> "spread" of buckets when the density goes up too much. It does so by
> basically shifting the bucket number to the left (e.g. only every second
> bucket can be the "primary" bucket for a hash value).  The other
> basically just replaces the magic constants in my previous POC patch
> with slightly better documented constants.  For me the latter works just
> as well as the former, even though aesthetically/theoretically the
> former sounds better.  I'm inclined to commit the latter, at least for
> now.

Did you intend to attach the patches?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 2017-01-06 11:01:32 -0500, Robert Haas wrote:
> On Fri, Jan 6, 2017 at 10:43 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-12-16 09:34:31 -0800, Andres Freund wrote:
> >> > To fix his issue, we need something like your 0001.  Are you going to
> >> > polish that up soon here?
> >>
> >> Yes.
> >
> > I've two versions of a fix for this. One of them basically increases the
> > "spread" of buckets when the density goes up too much. It does so by
> > basically shifting the bucket number to the left (e.g. only every second
> > bucket can be the "primary" bucket for a hash value).  The other
> > basically just replaces the magic constants in my previous POC patch
> > with slightly better documented constants.  For me the latter works just
> > as well as the former, even though aesthetically/theoretically the
> > former sounds better.  I'm inclined to commit the latter, at least for
> > now.
> 
> Did you intend to attach the patches?

No, I hadn't. You're interested in the "spreading" version?

Andres



On Fri, Jan 6, 2017 at 11:06 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-01-06 11:01:32 -0500, Robert Haas wrote:
>> On Fri, Jan 6, 2017 at 10:43 AM, Andres Freund <andres@anarazel.de> wrote:
>> > On 2016-12-16 09:34:31 -0800, Andres Freund wrote:
>> >> > To fix his issue, we need something like your 0001.  Are you going to
>> >> > polish that up soon here?
>> >>
>> >> Yes.
>> >
>> > I've two versions of a fix for this. One of them basically increases the
>> > "spread" of buckets when the density goes up too much. It does so by
>> > basically shifting the bucket number to the left (e.g. only every second
>> > bucket can be the "primary" bucket for a hash value).  The other
>> > basically just replaces the magic constants in my previous POC patch
>> > with slightly better documented constants.  For me the latter works just
>> > as well as the former, even though aesthetically/theoretically the
>> > former sounds better.  I'm inclined to commit the latter, at least for
>> > now.
>>
>> Did you intend to attach the patches?
>
> No, I hadn't. You're interested in the "spreading" version?

I honestly have no opinion.  If you're confident that you know what
you want to do, it's fine with me if you just do it.  If you want my
opinion I probably need to see both patches and compare.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company