Thread: Extremely slow when query uses GIST exclusion index

Extremely slow when query uses GIST exclusion index

From

David

Date:

29 August 2018, 06:31:10

Hi. My databases make heavy use of timestamp ranges, and they rely on GIST exclusion constraints to ensure that the ranges are disjoint. I've noticed that queries that hit the GIST indexes are EXTREMELY slow, and the queries run much faster if I make trivial changes to avoid the GIST indexes.

Here's the setup for a test case. (Timings were collected on PostgreSQL 9.5.4 on x86_64-pc-linux-gnu, Intel Xeon E5 in a VM with 3.5 GB RAM):

CREATE TABLE app (
pk SERIAL PRIMARY KEY,
group_id TEXT NOT NULL,
app_time TIMESTAMPTZ NOT NULL
);

CREATE TABLE group_span (
pk SERIAL PRIMARY KEY,
group_id TEXT NOT NULL,
valid_period TSTZRANGE NOT NULL,
EXCLUDE USING GIST (group_id WITH =, valid_period WITH &&)
);

CREATE TABLE member_span (
pk SERIAL PRIMARY KEY,
member_id TEXT NOT NULL,
group_id TEXT NOT NULL,
valid_period TSTZRANGE NOT NULL,
EXCLUDE USING GIST
(member_id WITH =, group_id WITH =, valid_period WITH &&)
);

-- Fill tables with some random data

INSERT INTO app (group_id, app_time)
SELECT
MD5(CONCAT(GENERATE_SERIES(1, 10000), RANDOM())),
DATE_TRUNC('month', TIMESTAMPTZ '2000-01-01' +
INTERVAL '3 years' * RANDOM());

-- Give groups a 1-year span, and give some groups a 2nd-year span:
INSERT INTO group_span (group_id, valid_period)
(SELECT
group_id,
TSTZRANGE(app_time, app_time + INTERVAL '1 year')
FROM app)
UNION ALL
(SELECT
group_id,
TSTZRANGE(app_time + INTERVAL '1 year',
app_time + INTERVAL '2 year')
FROM app LIMIT 2000);

-- Create members with a random span within their group_span:
INSERT INTO member_span (member_id, group_id, valid_period)
SELECT
MD5(RANDOM()::TEXT),
group_id,
TSTZRANGE(
LOWER(valid_period),
UPPER(valid_period) - DATE_TRUNC(
'days',
(UPPER(valid_period) - LOWER(valid_period)) * RANDOM()
)
)
FROM group_span;

Given this setup, here's a query that hits the GIST exclusion index on the "member_span" table. It takes 38 sec on my machine:

SELECT *

FROM app

JOIN group_span ON

app.group_id = group_span.group_id AND

app.app_time <@ group_span.valid_period

JOIN member_span ON

group_span.group_id = member_span.group_id AND

group_span.valid_period && member_span.valid_period;

Here's the query plan for that query:

Nested Loop (cost=319.27..776.39 rows=1 width=196) (actual time=15.370..38406.466 rows=10000 loops=1)

Join Filter: (app.group_id = member_span.group_id)

-> Hash Join (cost=319.00..771.00 rows=12 width=104) (actual time=5.790..130.613 rows=10000 loops=1)

Hash Cond: (group_span.group_id = app.group_id)

Join Filter: (app.app_time <@ group_span.valid_period)

Rows Removed by Join Filter: 2000

-> Seq Scan on group_span (cost=0.00..257.00 rows=12000 width=59) (actual time=0.005..16.282 rows=12000 loops=1)

-> Hash (cost=194.00..194.00 rows=10000 width=45) (actual time=5.758..5.758 rows=10000 loops=1)

Buckets: 16384 Batches: 1 Memory Usage: 910kB

-> Seq Scan on app (cost=0.00..194.00 rows=10000 width=45) (actual time=0.002..2.426 rows=10000 loops=1)

-> Index Scan using member_span_member_id_group_id_valid_period_excl on member_span (cost=0.28..0.44 rows=1 width=92) (actual time=1.988..3.817 rows=1 loops=10000)

Index Cond: ((group_id = group_span.group_id) AND (group_span.valid_period && valid_period))

Planning time: 0.784 ms

Execution time: 38410.227 ms

We can make a small tweak to the query to make it complicated enough that the execution planner avoids the GIST index. In this particular case, we can replace "app.app_time <@ group_span.valid_period" with the equivalent "app.app_time >= LOWER(group_span.valid_period) AND app.app_time < UPPER(group_span.valid_period)". This equivalent query is MUCH faster:

SELECT *

FROM app

JOIN group_span ON

app.group_id = group_span.group_id AND

app.app_time >= LOWER(group_span.valid_period) AND

app.app_time < UPPER(group_span.valid_period)

JOIN member_span ON

group_span.group_id = member_span.group_id AND

group_span.valid_period && member_span.valid_period;

It only takes 86 ms, even though it's doing 3 seq scans instead of using the index:

Hash Join (cost=953.71..1186.65 rows=8 width=196) (actual time=58.364..84.706 rows=10000 loops=1)

Hash Cond: (app.group_id = group_span.group_id)

Join Filter: ((app.app_time >= lower(group_span.valid_period)) AND (app.app_time < upper(group_span.valid_period)))

Rows Removed by Join Filter: 2000

-> Seq Scan on app (cost=0.00..194.00 rows=10000 width=45) (actual time=0.007..2.391 rows=10000 loops=1)

-> Hash (cost=952.81..952.81 rows=72 width=151) (actual time=58.343..58.343 rows=12000 loops=1)

Buckets: 16384 (originally 1024) Batches: 1 (originally 1) Memory Usage: 2285kB

-> Hash Join (cost=407.00..952.81 rows=72 width=151) (actual time=15.048..44.103 rows=12000 loops=1)

Hash Cond: (member_span.group_id = group_span.group_id)

Join Filter: (group_span.valid_period && member_span.valid_period)

Rows Removed by Join Filter: 4000

-> Seq Scan on member_span (cost=0.00..305.00 rows=12000 width=92) (actual time=0.001..3.865 rows=12000 loops=1)

-> Hash (cost=257.00..257.00 rows=12000 width=59) (actual time=15.020..15.020 rows=12000 loops=1)

Buckets: 16384 Batches: 1 Memory Usage: 1195kB

-> Seq Scan on group_span (cost=0.00..257.00 rows=12000 width=59) (actual time=0.003..2.863 rows=12000 loops=1)

Planning time: 0.651 ms

Execution time: 86.721 ms

For now, I can bypass the GIST index by avoiding range operators in my queries. But why is the GIST index so slow?

Re: Extremely slow when query uses GIST exclusion index

From

Andreas Kretschmer

Date:

29 August 2018, 13:50:43


Am 29.08.2018 um 05:31 schrieb David:
> For now, I can bypass the GIST index by avoiding range operators in my 
> queries. But why is the GIST index so slow?

your GiST-Index contains (member_id,group_id,valid_period), but your 
query is only on the latter 2 fields.


test=*# create index test_index on member_span using gist 
(group_id,valid_period);
CREATE INDEX
test=*# commit;
COMMIT
test=# explain analyse SELECT *
FROM app
JOIN group_span ON
   app.group_id = group_span.group_id AND
   app.app_time <@ group_span.valid_period
JOIN member_span ON
   group_span.group_id = member_span.group_id AND
   group_span.valid_period && member_span.valid_period;
                                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=319.27..776.18 rows=1 width=196) (actual 
time=3.156..334.963 rows=10000 loops=1)
    Join Filter: (app.group_id = member_span.group_id)
    ->  Hash Join  (cost=319.00..771.00 rows=12 width=104) (actual 
time=3.100..14.040 rows=10000 loops=1)
          Hash Cond: (group_span.group_id = app.group_id)
          Join Filter: (app.app_time <@ group_span.valid_period)
          Rows Removed by Join Filter: 2000
          ->  Seq Scan on group_span  (cost=0.00..257.00 rows=12000 
width=59) (actual time=0.013..1.865 rows=12000 loops=1)
          ->  Hash  (cost=194.00..194.00 rows=10000 width=45) (actual 
time=3.037..3.037 rows=10000 loops=1)
                Buckets: 16384  Batches: 1  Memory Usage: 910kB
                ->  Seq Scan on app  (cost=0.00..194.00 rows=10000 
width=45) (actual time=0.010..1.201 rows=10000 loops=1)
    ->  Index Scan using test_index on member_span (cost=0.28..0.42 
rows=1 width=92) (actual time=0.027..0.031 rows=1 loops=10000)
          Index Cond: ((group_id = group_span.group_id) AND 
(group_span.valid_period && valid_period))
  Planning time: 2.160 ms
  Execution time: 335.820 ms
(14 rows)

test=*#


better?

Okay, other solution. The problem is the nested loop, we can disable that:


test=*# set enable_nestloop to false;
SET
test=*# explain analyse SELECT *
FROM app
JOIN group_span ON
   app.group_id = group_span.group_id AND
   app.app_time <@ group_span.valid_period
JOIN member_span ON
   group_span.group_id = member_span.group_id AND
   group_span.valid_period && member_span.valid_period;
                                                            QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=771.15..1121.33 rows=1 width=196) (actual 
time=23.291..32.028 rows=10000 loops=1)
    Hash Cond: (member_span.group_id = app.group_id)
    Join Filter: (group_span.valid_period && member_span.valid_period)
    Rows Removed by Join Filter: 2000
    ->  Seq Scan on member_span  (cost=0.00..305.00 rows=12000 width=92) 
(actual time=0.019..1.577 rows=12000 loops=1)
    ->  Hash  (cost=771.00..771.00 rows=12 width=104) (actual 
time=23.254..23.254 rows=10000 loops=1)
          Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  
Memory Usage: 1486kB
          ->  Hash Join  (cost=319.00..771.00 rows=12 width=104) (actual 
time=7.968..18.951 rows=10000 loops=1)
                Hash Cond: (group_span.group_id = app.group_id)
                Join Filter: (app.app_time <@ group_span.valid_period)
                Rows Removed by Join Filter: 2000
                ->  Seq Scan on group_span  (cost=0.00..257.00 
rows=12000 width=59) (actual time=0.010..2.068 rows=12000 loops=1)
                ->  Hash  (cost=194.00..194.00 rows=10000 width=45) 
(actual time=7.900..7.900 rows=10000 loops=1)
                      Buckets: 16384  Batches: 1  Memory Usage: 910kB
                      ->  Seq Scan on app  (cost=0.00..194.00 rows=10000 
width=45) (actual time=0.011..3.165 rows=10000 loops=1)
  Planning time: 1.241 ms
  Execution time: 32.676 ms
(17 rows)

test=*#






Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

Re: Extremely slow when query uses GIST exclusion index

From

Andreas Kretschmer

Date:

29 August 2018, 14:25:55


Am 29.08.2018 um 12:50 schrieb Andreas Kretschmer:
> Okay, other solution. The problem is the nested loop, we can disable 
> that: 

oh, i used PG 10, this time 9.5:

test=# explain analyse SELECT *
FROM app
JOIN group_span ON
   app.group_id = group_span.group_id AND
   app.app_time <@ group_span.valid_period
JOIN member_span ON
   group_span.group_id = member_span.group_id AND
   group_span.valid_period && member_span.valid_period;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.55..4740.90 rows=180 width=212) (actual 
time=2.915..17624.676 rows=10000 loops=1)
    Join Filter: (app.group_id = member_span.group_id)
    ->  Nested Loop  (cost=0.28..4472.00 rows=600 width=112) (actual 
time=0.292..347.838 rows=10000 loops=1)
          ->  Seq Scan on app  (cost=0.00..194.00 rows=10000 width=44) 
(actual time=0.012..2.689 rows=10000 loops=1)
          ->  Index Scan using group_span_group_id_valid_period_excl on 
group_span (cost=0.28..0.42 rows=1 width=68) (actual time=0.029..0.033 
rows=1 loops=10000)
                Index Cond: ((group_id = app.group_id) AND (app.app_time 
<@ valid_period))
    ->  Index Scan using 
member_span_member_id_group_id_valid_period_excl on member_span 
(cost=0.28..0.44 rows=1 width=100) (actual time=0.912..1.726 rows=1 
loops=10000)
          Index Cond: ((group_id = group_span.group_id) AND 
(group_span.valid_period && valid_period))
  Planning time: 1.554 ms
  Execution time: 17627.266 ms
(10 rows)

test=*# set enable_nestloop to false;
SET
test=*# explain analyse SELECT *
FROM app
JOIN group_span ON
   app.group_id = group_span.group_id AND
   app.app_time <@ group_span.valid_period
JOIN member_span ON
   group_span.group_id = member_span.group_id AND
   group_span.valid_period && member_span.valid_period;
                                                            QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
  Hash Join  (cost=2383.43..14284.93 rows=180 width=212) (actual 
time=42.440..63.834 rows=10000 loops=1)
    Hash Cond: (app.group_id = member_span.group_id)
    Join Filter: (group_span.valid_period && member_span.valid_period)
    Rows Removed by Join Filter: 2000
    ->  Merge Join  (cost=1928.43..12478.43 rows=600 width=112) (actual 
time=34.068..47.954 rows=10000 loops=1)
          Merge Cond: (app.group_id = group_span.group_id)
          Join Filter: (app.app_time <@ group_span.valid_period)
          Rows Removed by Join Filter: 2000
          ->  Sort  (cost=858.39..883.39 rows=10000 width=44) (actual 
time=15.331..17.104 rows=10000 loops=1)
                Sort Key: app.group_id
                Sort Method: quicksort  Memory: 1166kB
                ->  Seq Scan on app  (cost=0.00..194.00 rows=10000 
width=44) (actual time=0.004..1.070 rows=10000 loops=1)
          ->  Sort  (cost=1070.04..1100.04 rows=12000 width=68) (actual 
time=18.720..20.712 rows=12000 loops=1)
                Sort Key: group_span.group_id
                Sort Method: quicksort  Memory: 2072kB
                ->  Seq Scan on group_span  (cost=0.00..257.00 
rows=12000 width=68) (actual time=0.007..1.396 rows=12000 loops=1)
    ->  Hash  (cost=305.00..305.00 rows=12000 width=100) (actual 
time=8.198..8.198 rows=12000 loops=1)
          Buckets: 16384  Batches: 1  Memory Usage: 1582kB
          ->  Seq Scan on member_span  (cost=0.00..305.00 rows=12000 
width=100) (actual time=0.011..2.783 rows=12000 loops=1)
  Planning time: 0.468 ms
  Execution time: 64.694 ms
(21 rows)

test=*#


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

Re: Extremely slow when query uses GIST exclusion index

From

David

Date:

29 August 2018, 21:10:43

Thanks for your help investigating this! Follow-up below:

On Wed, Aug 29, 2018 at 7:25 AM, Andreas Kretschmer <andreas@a-kretschmer.de> wrote:

Okay, other solution. The problem is the nested loop, we can disable that:
test=*# set enable_nestloop to false;

Is it OK to keep this off permanently in production? I thought these settings were just for debugging, and once we've identified the culprit, we're supposed to take other steps (?) to avoid the suboptimal execution plan.

your GiST-Index contains (member_id,group_id,valid_period), but your query is only on the latter 2 fields.

Yeah, I didn't really want GIST index in the first place -- PostgreSQL created it automatically as a side effect of the exclusion constraint that I need.

Your suggestion to create *another* GIST index is an interesting workaround. But we've seen that the query runs even faster if we didn't have the GIST index(es) at all. So is there any way to tell the planner to avoid the GIST index altogether?

(Alternatively, could there be a bug that's causing PostgreSQL to underestimate the cost of using the GIST index?)

Nested Loop (cost=319.27..776.18 rows=1 width=196) (actual time=3.156..334.963 rows=10000 loops=1)
Join Filter: (app.group_id = member_span.group_id)
-> Hash Join (cost=319.00..771.00 rows=12 width=104) (actual time=3.100..14.040 rows=10000 loops=1)

Hm, also, it looks like one of the oddities of this query is that PostgreSQL is severely underestimating the cardinality of the join. It seems to think that the join will result in only 1 row, when the join actually produces 10,000 rows. Maybe that's why the planner thinks that using the GIST index is cheap? (I.e., the planner thought that it would only need to do 1 GIST index lookup, which is cheaper than a sequential scan; but in reality it has to do 10,000 GIST index lookups, which is much more expensive than a scan.) Is there any way to help the planner better estimate how big the join output going to be?

Re: Extremely slow when query uses GIST exclusion index

From

Andreas Kretschmer

Date:

29 August 2018, 21:48:15


Am 29.08.2018 um 20:10 schrieb David:
>
> On Wed, Aug 29, 2018 at 7:25 AM, Andreas Kretschmer 
> <andreas@a-kretschmer.de <mailto:andreas@a-kretschmer.de>> wrote:
>
>         Okay, other solution. The problem is the nested loop, we can
>         disable that:
>
>     test=*# set enable_nestloop to false;
>
>
> Is it OK to keep this off permanently in production?

no, but you can switch off/on per session, for instance. and you can it 
set to on after that query.


>
>      Nested Loop  (cost=319.27..776.18 rows=1 width=196) (actual
>     time=3.156..334.963 rows=10000 loops=1)
>        Join Filter: (app.group_id = member_span.group_id)
>     ->  Hash Join  (cost=319.00..771.00 rows=12 width=104) (actual
>     time=3.100..14.040 rows=10000 loops=1)
>
>
> Hm, also, it looks like one of the oddities of this query is that 
> PostgreSQL is severely underestimating the cardinality of the join.

ack, that's the main problem here, i think. It leads to the expensive 
nested loop. Tbh, i don't have a better suggestion now besides the 
workaround with setting nestloop to off.


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com