Thread: [External] Join queries slow with predicate, limit, and ordering

[External] Join queries slow with predicate, limit, and ordering

From: Aufar Gilbran
Hello,

I'm trying to figure out how to optimise 3-table (many-to-many relation) joins
with predicate, limit, and ordering, where one of the tables returns at most one
row.

This is the query that I have right now:

SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.key = 'status'
      AND tag.value = 'SUCCEEDED'
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

It runs very slowly when many rows match on the entity table (about 90K rows
in my case), even though at most one row is returned by tag. Here is the
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) output:

Limit  (cost=723.39..723.40 rows=1 width=4) (actual time=6189.015..6189.242 rows=10 loops=1)
  Output: entity.id
  Buffers: shared hit=411886 read=31282
  ->  Sort  (cost=723.39..723.40 rows=1 width=4) (actual time=6188.999..6189.059 rows=10 loops=1)
        Output: entity.id
        Sort Key: entity.id DESC
        Sort Method: top-N heapsort  Memory: 25kB
        Buffers: shared hit=411886 read=31282
        ->  Nested Loop  (cost=1.28..723.38 rows=1 width=4) (actual time=0.153..5590.717 rows=89222 loops=1)
              Output: entity.id
              Buffers: shared hit=411886 read=31282
              ->  Nested Loop  (cost=0.86..721.98 rows=3 width=8) (actual time=0.108..1851.707 rows=89222 loops=1)
                    Output: entity_tag.entity_id
                    Buffers: shared hit=65146 read=20646
                    ->  Index Scan using tag_key_value_key on public.tag  (cost=0.43..8.45 rows=1 width=4) (actual time=0.043..0.061 rows=1 loops=1)
                          Output: tag.id, tag.key, tag.value, tag.created_at
                          Index Cond: (((tag.key)::text = 'status'::text) AND ((tag.value)::text = 'SUCCEEDED'::text))
                          Buffers: shared hit=1 read=3
                    ->  Index Only Scan using entity_tag_tag_id_entity_id_idx on public.entity_tag  (cost=0.43..711.53 rows=201 width=16) (actual time=0.035..756.829 rows=89222 loops=1)
                          Output: entity_tag.tag_id, entity_tag.entity_id
                          Index Cond: (entity_tag.tag_id = tag.id)
                          Heap Fetches: 89222
                          Buffers: shared hit=65145 read=20643
              ->  Index Scan using entity_pkey on public.entity  (cost=0.42..0.46 rows=1 width=4) (actual time=0.010..0.017 rows=1 loops=89222)
                    Output: entity.id, entity.entity_id, entity.type, entity.created_at
                    Index Cond: (entity.id = entity_tag.entity_id)
                    Filter: ((entity.type)::text = 'execution'::text)
                    Buffers: shared hit=346740 read=10636
Planning time: 0.817 ms
Execution time: 6189.419 ms

Both tag_key_value_key and entity_tag_tag_id_entity_id_idx are UNIQUE
constraints, on tag(key, value) and entity_tag(tag_id, entity_id) respectively.
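
For reference, a simplified sketch of the relevant schema (not the exact DDL;
the column types and foreign keys here are approximate, reconstructed from the
plan output above):

CREATE TABLE tag (
    id         serial PRIMARY KEY,       -- tag_pkey
    key        varchar NOT NULL,
    value      varchar NOT NULL,
    created_at timestamptz,
    CONSTRAINT tag_key_value_key UNIQUE (key, value)
);

CREATE TABLE entity (
    id         serial PRIMARY KEY,       -- entity_pkey
    entity_id  varchar,
    type       varchar,
    created_at timestamptz
);

CREATE TABLE entity_tag (
    -- foreign keys assumed; the UNIQUE constraint backs entity_tag_tag_id_entity_id_idx
    tag_id    integer NOT NULL REFERENCES tag (id),
    entity_id integer NOT NULL REFERENCES entity (id),
    CONSTRAINT entity_tag_tag_id_entity_id_idx UNIQUE (tag_id, entity_id)
);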

It seems to me that PostgreSQL runs the nested loop against all of the 90K
records because it wants to sort the result before limiting the result. It
doesn't take the UNIQUE constraint on the table into account and thinks that
the join being done inside the subquery will change the ordering of entity_id
returned by the subquery, thus prompting the sort.

I believe that, given how the index is sorted, it should be able to just scan
the index backwards because at most one tag_id will be returned. When I tried
changing the predicate to filter by ID with the following query:

-- This runs very fast
SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.id = 24
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

it is blazing fast. This time PostgreSQL seems to know that the join inside
the subquery won't change the ordering of entity_id returned by the subquery,
as seen in the following plan:

Limit  (cost=1.28..1025.56 rows=10 width=4) (actual time=0.144..0.276 rows=1 loops=1)
  Output: entity.id
  Buffers: shared hit=12
  ->  Nested Loop  (cost=1.28..1537.70 rows=15 width=4) (actual time=0.125..0.238 rows=1 loops=1)
        Output: entity.id
        Buffers: shared hit=12
        ->  Nested Loop  (cost=0.86..1529.06 rows=15 width=12) (actual time=0.057..0.116 rows=1 loops=1)
              Output: entity_tag.tag_id, entity.id
              Buffers: shared hit=8
              ->  Index Only Scan Backward using entity_tag_tag_id_entity_id_idx on public.entity_tag  (cost=0.43..454.82 rows=128 width=16) (actual time=0.018..0.038 rows=1 loops=1)
                    Output: entity_tag.tag_id, entity_tag.entity_id
                    Index Cond: (entity_tag.tag_id = 24)
                    Heap Fetches: 1
                    Buffers: shared hit=4
              ->  Index Scan using entity_pkey on public.entity  (cost=0.42..8.38 rows=1 width=4) (actual time=0.011..0.030 rows=1 loops=1)
                    Output: entity.id, entity.entity_id, entity.type, entity.created_at
                    Index Cond: (entity.id = entity_tag.entity_id)
                    Filter: ((entity.type)::text = 'execution'::text)
                    Buffers: shared hit=4
        ->  Materialize  (cost=0.43..8.45 rows=1 width=4) (actual time=0.040..0.078 rows=1 loops=1)
              Output: tag.id
              Buffers: shared hit=4
              ->  Index Only Scan using tag_pkey on public.tag  (cost=0.43..8.45 rows=1 width=4) (actual time=0.021..0.040 rows=1 loops=1)
                    Output: tag.id
                    Index Cond: (tag.id = 24)
                    Heap Fetches: 1
                    Buffers: shared hit=4
Planning time: 0.362 ms
Execution time: 0.458 ms

which shows that it can indeed just scan the index backwards. I can't figure
out why PostgreSQL doesn't take the UNIQUE index into account in the first
query.

Is there any way to do this with one query? Or is it because the database
design is flawed?

Environment:
- PostgreSQL version: 9.6
- OS: Linux (Amazon RDS) / macOS / Docker

Installation:
- macOS: PostgreSQL installed via Homebrew.
- Docker: PostgreSQL image from https://hub.docker.com/_/postgres.

I use the default configuration provided in all of these environments.

Thanks!

--
Best regards,

Aufar Gilbran





Re: [External] Join queries slow with predicate, limit, and ordering

From: Jeff Janes
On Mon, Dec 2, 2019 at 8:29 AM Aufar Gilbran <aufar.gilbran@grab.com> wrote:
> Hello,
>
> I'm trying to figure out how to optimise 3-table (many-to-many relation) joins
> with predicate, limit, and ordering, where one of the tables returns at most one
> row.
>
> This is the query that I have right now:
>
> SELECT entity.id
> FROM (
>     SELECT entity_tag.entity_id
>     FROM tag
>     JOIN entity_tag ON tag.id = entity_tag.tag_id
>     WHERE tag.key = 'status'
>       AND tag.value = 'SUCCEEDED'
> ) matched
> JOIN entity ON matched.entity_id = entity.id
> WHERE entity.type = 'execution'
> ORDER BY entity.id DESC
> LIMIT 10;

What happens if you set enable_sort to off before running it?
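
Something along these lines, in the same session (SET only affects the current session, so there is nothing to undo globally):

-- disable the explicit sort path for this session and re-run the same query
SET enable_sort = off;

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.key = 'status'
      AND tag.value = 'SUCCEEDED'
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

RESET enable_sort;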
 
>         ->  Nested Loop  (cost=1.28..723.38 rows=1 width=4) (actual time=0.153..5590.717 rows=89222 loops=1)

It thinks it will find 1 row, and actually finds 89,222.  I don't know exactly why that would be, I suppose tag_id has an extremely skewed distribution.  But yeah, that is going to cause some problems.  For one thing, if there was actually just one qualifying row, then it wouldn't get to stop early, as the LIMIT would never be satisfied.  So it thinks that if it chose to walk the index backwards, it would have to walk the **entire** index.


>                     ->  Index Only Scan using entity_tag_tag_id_entity_id_idx on public.entity_tag  (cost=0.43..711.53 rows=201 width=16) (actual time=0.035..756.829 rows=89222 loops=1)
>                           Heap Fetches: 89222

You should vacuum this table.  Doing that (and only that) probably won't make a great deal of difference to this particular query, but still, it will help some.  And it might help other queries you haven't noticed yet as well.
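
For example:

-- vacuuming refreshes the visibility map, so the index-only scan above can
-- skip most of those 89222 heap fetches on subsequent runs
VACUUM entity_tag;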
 

> Both tag_key_value_key and entity_tag_tag_id_entity_id_idx are UNIQUE
> constraints, on tag(key, value) and entity_tag(tag_id, entity_id) respectively.
>
> It seems to me that PostgreSQL runs the nested loop against all of the 90K
> records because it wants to sort the result before limiting the result.

It doesn't **know** there are going to be 90000 records.  It cannot plan queries based on knowledge it doesn't possess.
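
You can check what it does know about the tag_id distribution with something like:

-- per-column statistics the planner consults for entity_tag.tag_id
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'entity_tag'
  AND attname = 'tag_id';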
 
> It doesn't take the UNIQUE constraint on the table into account and thinks
> that the join being done inside the subquery will change the ordering of
> entity_id returned by the subquery, thus prompting the sort.

This seems like rather adventurous speculation.  It does the sort because the horrible estimation makes it think it will be faster that way, not because it thinks it is the only possible way.  If you set enable_sort = off and it still does a sort, then you know it thinks there is no other way.

 

> I believe that, given how the index is sorted, it should be able to just scan
> the index backwards because at most one tag_id will be returned. When I tried
> changing the predicate to filter by ID with the following query:
>
> -- This runs very fast
> SELECT entity.id
> FROM (
>     SELECT entity_tag.entity_id
>     FROM tag
>     JOIN entity_tag ON tag.id = entity_tag.tag_id
>     WHERE tag.id = 24
> ) matched
> JOIN entity ON matched.entity_id = entity.id
> WHERE entity.type = 'execution'
> ORDER BY entity.id DESC
> LIMIT 10;

With this query, it can use the join condition to transfer the knowledge of tag.id=24 to become entity_tag.tag_id=24, and then look up stats on entity_tag.tag_id for the value 24.  When you specify the single row of tag indirectly, it can't do that as it doesn't know what specific value of tag.id is going to be the one it finds (until after the query is done being planned and starts executing, at which point it is too late).  But the row with id=24 doesn't seem to be the same one with "tag.key = 'status' AND tag.value = 'SUCCEEDED'", so you have basically changed the query entirely on us.
 
If you replanned this query with ORDER BY entity.id+0 DESC (and with the true value of tag_id), that might give you some more insight into the hidden "thought process" behind the planner.
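
That is, something along these lines (first looking up which tag id the key/value pair really maps to, then plugging it in directly):

-- find the real tag id behind the key/value pair
SELECT id FROM tag WHERE key = 'status' AND value = 'SUCCEEDED';

-- then re-plan the "fast" form of the query with that id; the +0 keeps the
-- planner from using the index order to satisfy the ORDER BY
SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.id = 24   -- substitute the id returned by the query above
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id + 0 DESC
LIMIT 10;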

Cheers,

Jeff

Re: [External] Join queries slow with predicate, limit, and ordering

From: Aufar Gilbran
Thanks for the answer!

On Tue, Dec 3, 2019 at 8:39 AM Jeff Janes <jeff.janes@gmail.com> wrote:
> What happens if you set enable_sort to off before running it?

Turning enable_sort off makes the first query skip the sort [1]. It runs much
slower than the original query [2], though. This time I ran VACUUM ANALYZE
first, so even the slow query is much faster than before, but it is still much
slower than the fast query [3].

> It thinks it will find 1 row, and actually finds 89,222.  I don't know exactly why that would be, I suppose tag_id
> has an extremely skewed distribution.  But yeah, that is going to cause some problems.  For one thing, if there was
> actually just one qualifying row, then it wouldn't get to stop early, as the LIMIT would never be satisfied.  So it
> thinks that if it chose to walk the index backwards, it would have to walk the **entire** index.

I'm not really sure what a skewed distribution means here. If by skewed you
mean that one particular tag_id maps to many entities while other tag_ids map
to only a few, then yes: this particular key value covers about 80% of the
entities. For this kind of dataset, is there anything I can do to improve it,
or is it just impossible?

> With this query, it can use the join condition to transfer the knowledge of tag.id=24 to become entity_tag.tag_id=24,
> and then look up stats on entity_tag.tag_id for the value 24.  When you specify the single row of tag indirectly, it
> can't do that as it doesn't know what specific value of tag.id is going to be the one it finds (until after the query is
> done being planned and starts executing, at which point it is too late).  But the row with id=24 doesn't seem to be the
> same one with "tag.key = 'status' AND tag.value = 'SUCCEEDED'", so you have basically changed the query entirely on us.

Apologies, I previously used the query from a database in another environment.
The correct one uses tag.id = 18 [3], so the query becomes:

SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.id = 18
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

It's still very fast and the query plan looks similar to me.

> If you replanned this query with ORDER BY entity.id+0 DESC (and with the true value of tag_id), that might give you
> some more insight into the hidden "thought process" behind the planner.

I tried this on the fast query and it becomes very slow [4]. I guess that's
because it can no longer use the index for the ordering, so it can't stop
early at the LIMIT? I'm not so sure.

[1] https://explain.depesz.com/s/aEmR
[2] https://explain.depesz.com/s/kmNY
[3] https://explain.depesz.com/s/pD5v
[4] https://explain.depesz.com/s/4s7Q

--
Best regards,

Aufar Gilbran
