Thread: [External] Join queries slow with predicate, limit, and ordering
Hello,

I'm trying to figure out how to optimise 3-table (many-to-many relation) joins with predicate, limit, and ordering, where one of the tables returns at most one row.

This is the query that I have right now:

SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.key = 'status'
      AND tag.value = 'SUCCEEDED'
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

It runs very slowly when many rows match on the entity table (in my case, about 90K rows), even though tag returns at most one row:

Limit  (cost=723.39..723.40 rows=1 width=4) (actual time=6189.015..6189.242 rows=10 loops=1)
  Output: entity.id
  Buffers: shared hit=411886 read=31282
  ->  Sort  (cost=723.39..723.40 rows=1 width=4) (actual time=6188.999..6189.059 rows=10 loops=1)
        Output: entity.id
        Sort Key: entity.id DESC
        Sort Method: top-N heapsort  Memory: 25kB
        Buffers: shared hit=411886 read=31282
        ->  Nested Loop  (cost=1.28..723.38 rows=1 width=4) (actual time=0.153..5590.717 rows=89222 loops=1)
              Output: entity.id
              Buffers: shared hit=411886 read=31282
              ->  Nested Loop  (cost=0.86..721.98 rows=3 width=8) (actual time=0.108..1851.707 rows=89222 loops=1)
                    Output: entity_tag.entity_id
                    Buffers: shared hit=65146 read=20646
                    ->  Index Scan using tag_key_value_key on public.tag  (cost=0.43..8.45 rows=1 width=4) (actual time=0.043..0.061 rows=1 loops=1)
                          Output: tag.id, tag.key, tag.value, tag.created_at
                          Index Cond: (((tag.key)::text = 'status'::text) AND ((tag.value)::text = 'SUCCEEDED'::text))
                          Buffers: shared hit=1 read=3
                    ->  Index Only Scan using entity_tag_tag_id_entity_id_idx on public.entity_tag  (cost=0.43..711.53 rows=201 width=16) (actual time=0.035..756.829 rows=89222 loops=1)
                          Output: entity_tag.tag_id, entity_tag.entity_id
                          Index Cond: (entity_tag.tag_id = tag.id)
                          Heap Fetches: 89222
                          Buffers: shared hit=65145 read=20643
              ->  Index Scan using entity_pkey on public.entity  (cost=0.42..0.46 rows=1 width=4) (actual time=0.010..0.017 rows=1 loops=89222)
                    Output: entity.id, entity.entity_id, entity.type, entity.created_at
                    Index Cond: (entity.id = entity_tag.entity_id)
                    Filter: ((entity.type)::text = 'execution'::text)
                    Buffers: shared hit=346740 read=10636
Planning time: 0.817 ms
Execution time: 6189.419 ms

Both tag_key_value_key and entity_tag_tag_id_entity_id_idx are UNIQUE constraints, on tag(key, value) and entity_tag(tag_id, entity_id) respectively.

It seems to me that PostgreSQL runs the nested loop against all of the 90K records because it wants to sort the result before applying the limit. It doesn't take the UNIQUE constraint imposed on the table into account and thinks that the join being done inside the subquery will change the ordering of entity_id returned by the subquery, thus prompting the sort. I believe that, given how the index is sorted, it should be able to just scan the index backwards, because at most one tag_id will be returned.

When I tried changing the predicate to filter by ID with the following query:

-- This runs very fast
SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.id = 24
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

it's blazing fast.
This time PostgreSQL seems to know that the join inside the subquery won't change the ordering of entity_id returned by the subquery, as seen in the following query plan:

Limit  (cost=1.28..1025.56 rows=10 width=4) (actual time=0.144..0.276 rows=1 loops=1)
  Output: entity.id
  Buffers: shared hit=12
  ->  Nested Loop  (cost=1.28..1537.70 rows=15 width=4) (actual time=0.125..0.238 rows=1 loops=1)
        Output: entity.id
        Buffers: shared hit=12
        ->  Nested Loop  (cost=0.86..1529.06 rows=15 width=12) (actual time=0.057..0.116 rows=1 loops=1)
              Output: entity_tag.tag_id, entity.id
              Buffers: shared hit=8
              ->  Index Only Scan Backward using entity_tag_tag_id_entity_id_idx on public.entity_tag  (cost=0.43..454.82 rows=128 width=16) (actual time=0.018..0.038 rows=1 loops=1)
                    Output: entity_tag.tag_id, entity_tag.entity_id
                    Index Cond: (entity_tag.tag_id = 24)
                    Heap Fetches: 1
                    Buffers: shared hit=4
              ->  Index Scan using entity_pkey on public.entity  (cost=0.42..8.38 rows=1 width=4) (actual time=0.011..0.030 rows=1 loops=1)
                    Output: entity.id, entity.entity_id, entity.type, entity.created_at
                    Index Cond: (entity.id = entity_tag.entity_id)
                    Filter: ((entity.type)::text = 'execution'::text)
                    Buffers: shared hit=4
        ->  Materialize  (cost=0.43..8.45 rows=1 width=4) (actual time=0.040..0.078 rows=1 loops=1)
              Output: tag.id
              Buffers: shared hit=4
              ->  Index Only Scan using tag_pkey on public.tag  (cost=0.43..8.45 rows=1 width=4) (actual time=0.021..0.040 rows=1 loops=1)
                    Output: tag.id
                    Index Cond: (tag.id = 24)
                    Heap Fetches: 1
                    Buffers: shared hit=4
Planning time: 0.362 ms
Execution time: 0.458 ms

which shows that it can just scan the index backward. I can't figure out why PostgreSQL doesn't take the UNIQUE index into account there. Is there any way to do this with one query? Or is the database design flawed?

Environment:
- PostgreSQL version: 9.6
- OS: Linux (Amazon RDS) / MacOSX / Docker

Installation:
- MacOSX: PostgreSQL is installed using brew.
- Docker: PostgreSQL image from https://hub.docker.com/_/postgres

I use the default configuration provided in all of these environments.

Thanks!

--
Best regards,
Aufar Gilbran

Grab is hiring. Learn more at https://grab.careers

By communicating with Grab Inc and/or its subsidiaries, associate companies and jointly controlled entities (“Grab Group”), you are deemed to have consented to the processing of your personal data as set out in the Privacy Notice, which can be viewed at https://grab.com/privacy/. This email contains confidential information and is only for the intended recipient(s). If you are not the intended recipient(s), please do not disseminate, distribute or copy this email. Please notify Grab Group immediately if you have received this by mistake and delete this email from your system. Email transmission cannot be guaranteed to be secure or error-free, as any information therein could be intercepted, corrupted, lost, destroyed, delayed or incomplete, or contain viruses. Grab Group does not accept liability for any errors or omissions in the contents of this email that arise as a result of email transmission. All intellectual property rights in this email and attachments therein shall remain vested in Grab Group, unless otherwise provided by law.
On Mon, Dec 2, 2019 at 8:29 AM Aufar Gilbran <aufar.gilbran@grab.com> wrote:
> Hello,
>
> I'm trying to figure out how to optimise 3-table (many-to-many relation) joins
> with predicate, limit, and ordering, where one of the tables returns at most one
> row.
>
> This is the query that I have right now:
>
> SELECT entity.id
> FROM (
>     SELECT entity_tag.entity_id
>     FROM tag
>     JOIN entity_tag ON tag.id = entity_tag.tag_id
>     WHERE tag.key = 'status'
>       AND tag.value = 'SUCCEEDED'
> ) matched
> JOIN entity ON matched.entity_id = entity.id
> WHERE entity.type = 'execution'
> ORDER BY entity.id DESC
> LIMIT 10;
What happens if you set enable_sort to off before running it?
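To make the experiment concrete, here is a minimal sketch of that diagnostic (session-local only; the EXPLAIN options are standard PostgreSQL and available on 9.6):

```sql
-- Diagnostic only: discourage sort-based plans for this session.
SET enable_sort = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.key = 'status' AND tag.value = 'SUCCEEDED'
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

-- Restore the planner default afterwards.
RESET enable_sort;
```

If the plan still contains a Sort node after this, the planner believes no sort-free plan exists for the query.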
> ->  Nested Loop  (cost=1.28..723.38 rows=1 width=4) (actual time=0.153..5590.717 rows=89222 loops=1)
It thinks it will find 1 row, and actually finds 89,222. I don't know exactly why that would be; I suppose tag_id has an extremely skewed distribution. But yeah, that is going to cause some problems. For one thing, if there were actually just one qualifying row, then it wouldn't get to stop early, as the LIMIT would never be satisfied. So it thinks that if it chose to walk the index backwards, it would have to walk the **entire** index.
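A way to check for such skew, and to let the planner capture it better, sketched under the assumption that the relevant column is entity_tag.tag_id (the target value 1000 is an arbitrary example, not taken from this thread):

```sql
-- Inspect the most common values the planner currently knows about.
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'entity_tag' AND attname = 'tag_id';

-- Optionally widen the sample so very frequent tag_ids show up
-- in the most-common-values list, then refresh the statistics.
ALTER TABLE entity_tag ALTER COLUMN tag_id SET STATISTICS 1000;
ANALYZE entity_tag;
```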
> ->  Index Only Scan using entity_tag_tag_id_entity_id_idx on public.entity_tag  (cost=0.43..711.53 rows=201 width=16) (actual time=0.035..756.829 rows=89222 loops=1)
>       Heap Fetches: 89222
You should vacuum this table. Doing that (and only that) probably won't make a great deal of difference to this particular query, but still, it will help some. And might help other ones you haven't noticed yet as well.
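A minimal sketch of that maintenance step (VERBOSE is optional; vacuuming updates the visibility map so the index-only scan can stop doing 89K heap fetches):

```sql
-- Vacuum and re-analyze entity_tag; afterwards, Heap Fetches
-- in the index-only scan should drop toward zero.
VACUUM (ANALYZE, VERBOSE) entity_tag;
```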
> Both tag_key_value_key and entity_tag_tag_id_entity_id_idx are UNIQUE
> constraints, on tag(key, value) and entity_tag(tag_id, entity_id) respectively.
>
> It seems to me that PostgreSQL runs the nested loop against all of the 90K
> records because it wants to sort the result before limiting the result.
It doesn't **know** there are going to be 90000 records. It cannot plan queries based on knowledge it doesn't possess.
> It doesn't take the UNIQUE constraint imposed on the table into account and
> thinks that the join being done inside the subquery will change the ordering of
> entity_id returned by the subquery, thus prompting the sort.
This seems like rather adventurous speculation. It does the sort because the horrible estimate makes it think it will be faster that way, not because it thinks it is the only possible way. If you set enable_sort = off and it still does a sort, then you know it thinks there is no other way.
> I believe that, given how the index is sorted, it should be able to just scan
> the index backwards because at most one tag_id will be returned. When I tried
> changing the predicate here to filter by ID with the following query:
> -- This runs very fast
> SELECT entity.id
> FROM (
>     SELECT entity_tag.entity_id
>     FROM tag
>     JOIN entity_tag ON tag.id = entity_tag.tag_id
>     WHERE tag.id = 24
> ) matched
> JOIN entity ON matched.entity_id = entity.id
> WHERE entity.type = 'execution'
> ORDER BY entity.id DESC
> LIMIT 10;
With this query, it can use the join condition to transfer the knowledge of tag.id=24 to become entity_tag.tag_id=24, and then look up stats on entity_tag.tag_id for the value 24. When you specify the single row of tag indirectly, it can't do that as it doesn't know what specific value of tag.id is going to be the one it finds (until after the query is done being planned and starts executing, at which point it is too late). But the row with id=24 doesn't seem to be the same one with "tag.key = 'status' AND tag.value = 'SUCCEEDED'", so you have basically changed the query entirely on us.
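Given that explanation, one workaround is to resolve the tag row in a separate round trip and then query with the literal id. This is a sketch consistent with the fast plan shown earlier, not something proposed verbatim in the thread (the id value 18 is the one the follow-up message later identifies):

```sql
-- Step 1: resolve the unique (key, value) pair to its id.
SELECT id FROM tag WHERE key = 'status' AND value = 'SUCCEEDED';

-- Step 2: query with the literal id (assumed here to be 18), which
-- lets the planner consult per-value statistics on entity_tag.tag_id
-- and choose the backward index-only scan.
SELECT entity.id
FROM entity_tag
JOIN entity ON entity_tag.entity_id = entity.id
WHERE entity_tag.tag_id = 18
  AND entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;
```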
If you replanned this query with ORDER BY entity.id+0 DESC, (and with the true value of tag_id) that might give you some more insight into the hidden "thought process" behind the planner.
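Concretely, the suggested experiment might look like this (a sketch; the +0 keeps the result order identical but prevents the planner from satisfying the ORDER BY via a backward scan of entity_pkey, exposing what it would do without that option):

```sql
SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.id = 18   -- the true tag id, per the follow-up message
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id + 0 DESC   -- defeats index-based ordering
LIMIT 10;
```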
Cheers,
Jeff
Thanks for the answer!

On Tue, Dec 3, 2019 at 8:39 AM Jeff Janes <jeff.janes@gmail.com> wrote:

> What happens if you set enable_sort to off before running it?

Turning enable_sort off makes the first query not sort[1]. It does run much more slowly than the original query[2], though. This time I ran VACUUM ANALYZE first, so even the slow query is much faster, but it is still much slower than the fast query[3].

> It thinks it will find 1 row, and actually finds 89,222. I don't know exactly
> why that would be, I suppose tag_id has an extremely skewed distribution. But
> yeah, that is going to cause some problems. For one thing, if there was
> actually just one qualifying row, then it wouldn't get to stop early, as the
> LIMIT would never be satisfied. So it thinks that if it chose to walk the
> index backwards, it would have to walk the **entire** index.

I'm not really sure what a skewed distribution is. If by skewed you mean that for a particular tag_id there are many entities while for other tag_ids there might be few, then yes: this particular key value covers 80% of the entities. For this kind of dataset, is there anything I can do to improve it, or is it just impossible?

> With this query, it can use the join condition to transfer the knowledge of
> tag.id=24 to become entity_tag.tag_id=24, and then look up stats on
> entity_tag.tag_id for the value 24. When you specify the single row of tag
> indirectly, it can't do that as it doesn't know what specific value of tag.id
> is going to be the one it finds (until after the query is done being planned
> and starts executing, at which point it is too late). But the row with id=24
> doesn't seem to be the same one with "tag.key = 'status' AND tag.value =
> 'SUCCEEDED'", so you have basically changed the query entirely on us.

Apologies, I previously used the query for the database on another environment. The correct one uses tag_id=18 [3].
So it becomes like this:

SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.id = 18
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

It's still very fast, and the query plan looks similar to me.

> If you replanned this query with ORDER BY entity.id+0 DESC, (and with the true
> value of tag_id) that might give you some more insight into the hidden
> "thought process" behind the planner.

I tried this on the fast query and it becomes very slow [4]. I guess that is because it cannot consult the index for the ordering anymore, so it can't stop early at the LIMIT? I'm not so sure.

[1] https://explain.depesz.com/s/aEmR
[2] https://explain.depesz.com/s/kmNY
[3] https://explain.depesz.com/s/pD5v
[4] https://explain.depesz.com/s/4s7Q

--
Best regards,
Aufar Gilbran