Re: [External] Join queries slow with predicate, limit, and ordering - Mailing list pgsql-performance

From Aufar Gilbran
Subject Re: [External] Join queries slow with predicate, limit, and ordering
Date
Msg-id CAFm+y9JDfq6DMU11FYS1s4SXGbvPLibFRga0iuaLF0VJ509DVg@mail.gmail.com
Whole thread Raw
In response to Re: [External] Join queries slow with predicate, limit, and ordering  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
Thanks for the answer!

On Tue, Dec 3, 2019 at 8:39 AM Jeff Janes <jeff.janes@gmail.com> wrote:
> What happens if you set enable_sort to off before running it?

Turning enable_sort to off makes the first query to not sort[1]. It
does run much slower though compared to the original query[2]. This
time I do VACUUM ANALYZE first so even the slow query is much faster,
but still much slower than the fast query[3].

> It thinks it will find 1 row, and actually finds 89,222.  I don't know exactly why that would be, I suppose tag_id
hasan extremely skewed distribution.  But yeah, that is going to cause some problems.  For one thing, if there was
actuallyjust one qualifying row, then it wouldn't get to stop early, as the LIMIT would never be satisfied.  So it
thinksthat if it choose to walk the index backwards, it would have to walk the **entire** index. 

I'm not really sure what skewed distribution is. If by skewed you mean
that for a particular tag_id there are many entity and other tag_id
there might be low amount entity then yes, this particular key value
covers 80% of the entity. For this kind of dataset, is there any way
that I can do to improve it or is it just impossible?

> With this query, it can use the join condition to transfer the knowledge of tag.id=24 to become entity_tag.tag_id=24,
andthen look up stats on entity_tag.tag_id for the value 24.  When you specify the single row of tag indirectly, it
can'tdo that as it doesn't know what specific value of tag.id is going to be the one it finds (until after the query is
donebeing planned and starts executing, at which point it is too late).  But the row with id=24 doesn't seem to be the
sameone with "tag.key = 'status' AND tag.value = 'SUCCEEDED'", so you have basically changed the query entirely on us. 

Apologies, I used the query for database on another environment
previously. The correct one uses tag_id=18 [3]. So it becomes like
this:

SELECT entity.id
FROM (
    SELECT entity_tag.entity_id
    FROM tag
    JOIN entity_tag ON tag.id = entity_tag.tag_id
    WHERE tag.id = 18
) matched
JOIN entity ON matched.entity_id = entity.id
WHERE entity.type = 'execution'
ORDER BY entity.id DESC
LIMIT 10;

It's still very fast and the query plan looks similar to me.

> If you replanned this query with ORDER BY entity.id+0 DESC, (and with the true value of tag_id) that might give you
somemore insight into the hidden "thought process" behind the planner. 

I tried this on the fast query and it becomes very slow [4]. I guess
because it cannot consult the index for the ordering anymore so it
can't do LIMIT? I'm not so sure.

[1] https://explain.depesz.com/s/aEmR
[2] https://explain.depesz.com/s/kmNY
[3] https://explain.depesz.com/s/pD5v
[4] https://explain.depesz.com/s/4s7Q

--
Best regards,

Aufar Gilbran

--
*_Grab is hiring. Learn more at _**https://grab.careers
<https://grab.careers/>*


By communicating with Grab Inc and/or its
subsidiaries, associate companies and jointly controlled entities (“Grab
Group”), you are deemed to have consented to the processing of your
personal data as set out in the Privacy Notice which can be viewed at
https://grab.com/privacy/ <https://grab.com/privacy/>


This email contains
confidential information and is only for the intended recipient(s). If you
are not the intended recipient(s), please do not disseminate, distribute or
copy this email Please notify Grab Group immediately if you have received
this by mistake and delete this email from your system. Email transmission
cannot be guaranteed to be secure or error-free as any information therein
could be intercepted, corrupted, lost, destroyed, delayed or incomplete, or
contain viruses. Grab Group do not accept liability for any errors or
omissions in the contents of this email arises as a result of email
transmission. All intellectual property rights in this email and
attachments therein shall remain vested in Grab Group, unless otherwise
provided by law.




pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: [External] Join queries slow with predicate, limit, and ordering
Next
From: Michael Lewis
Date:
Subject: Re: Make recently inserted/updated records available in the buffer/cache