Thread: Bad query? Or planner?
Hi,
I have recently started using postgres and have been operating under the assumption that the query planner knows best (as long as I don't do anything too stupid). I've been structuring my queries (and data) in a certain way: writing re-usable subqueries, joining them in as necessary, and only filtering the query at the top level (assuming that the query planner will push down the appropriate restrictions as necessary). Of course, also keeping in mind proper indexes to support efficient joins and sorts.
I recently wrote a query that I thought was easy to reason about, and I assumed the query planner would execute it efficiently.
SELECT * FROM xtag_stack_feed
JOIN (
SELECT DISTINCT ON (do_post_xtag.xtag_ci) *
FROM do_post_xtag
JOIN do_post ON do_post.rc_id=do_post_xtag.post_rc_id
ORDER BY do_post_xtag.xtag_ci ASC, do_post_xtag.post_rc_id DESC) last_post
ON last_post.xtag_ci=xtag_stack_feed.xtag_ci
ORDER BY decayed_to_base DESC
LIMIT 1;
Unfortunately, the query as written is not being executed efficiently. I tried to rewrite it in a couple different ways without success, and then learned about lateral joins. Rewritten as follows, it executes efficiently.
SELECT * FROM xtag_stack_feed
JOIN LATERAL (
SELECT DISTINCT ON (do_post_xtag.xtag_ci) *
FROM do_post_xtag
JOIN do_post ON do_post.rc_id=do_post_xtag.post_rc_id
WHERE do_post_xtag.xtag_ci=xtag_stack_feed.xtag_ci
ORDER BY do_post_xtag.xtag_ci ASC, do_post_xtag.post_rc_id DESC) last_post
ON true
ORDER BY decayed_to_base DESC
LIMIT 1;
From my naive perspective, it seems like the second query is semantically equivalent to the first; it just has the join condition moved into the subquery as a WHERE filter.
Am I doing something wrong? Is there room for query planner improvement in cases like these?
I've attached the EXPLAIN ANALYZEs. Any help would be much appreciated!
Thanks,
-Devin
Attachment
On 28 November 2016 at 21:11, Devin Smith <dsmith@redcurrent.com> wrote:
Hi,I recently wrote a query that I thought was easy to reason about, and I assumed the query planner would execute it efficiently.SELECT * FROM xtag_stack_feedJOIN (SELECT DISTINCT ON (do_post_xtag.xtag_ci) *FROM do_post_xtagJOIN do_post ON do_post.rc_id=do_post_xtag.post_rc_id ORDER BY do_post_xtag.xtag_ci ASC, do_post_xtag.post_rc_id DESC) last_postON last_post.xtag_ci=xtag_stack_feed.xtag_ci ORDER BY decayed_to_base DESCLIMIT 1;Unfortunately, the query as written is not being executed efficiently. I tried to rewrite it in a couple different ways without success, and then learned about lateral joins. Rewritten as follows, it executes efficiently.SELECT * FROM xtag_stack_feedJOIN LATERAL (SELECT DISTINCT ON (do_post_xtag.xtag_ci) *FROM do_post_xtagJOIN do_post ON do_post.rc_id=do_post_xtag.post_rc_id WHERE do_post_xtag.xtag_ci=xtag_stack_feed.xtag_ci ORDER BY do_post_xtag.xtag_ci ASC, do_post_xtag.post_rc_id DESC) last_postON trueORDER BY decayed_to_base DESCLIMIT 1;From my naive perspective, it seems like the second query is semantically equivalent to the first; it just has the join condition moved into the subquery as a WHERE filter.
I do not see a "where" condition in your first query.
Regards
Johann
--
Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)
my lips will praise you. (Psalm 63:3)
Correct. There is no WHERE filter in the first query. The JOIN condition from the first query was moved into a WHERE filter in the second query (enabled by LATERAL). Both have the same ordering applied with a limit of 1.
I chatted with a couple users in the IRC channel, and I think I got the consensus that the two queries are semantically the same, but the query planner doesn't currently optimize my original query.
I chatted with a couple users in the IRC channel, and I think I got the consensus that the two queries are semantically the same, but the query planner doesn't currently optimize my original query.
-Devin
On Mon, Dec 5, 2016 at 1:59 AM Johann Spies <johann.spies@gmail.com> wrote:
I recently wrote a query that I thought was easy to reason about, and I assumed the query planner would execute it efficiently.SELECT * FROM xtag_stack_feedJOIN (SELECT DISTINCT ON (do_post_xtag.xtag_ci) *FROM do_post_xtagJOIN do_post ON do_post.rc_id=do_post_xtag.post_rc_idORDER BY do_post_xtag.xtag_ci ASC, do_post_xtag.post_rc_id DESC) last_postON last_post.xtag_ci=xtag_stack_feed.xtag_ciORDER BY decayed_to_base DESCLIMIT 1;Unfortunately, the query as written is not being executed efficiently. I tried to rewrite it in a couple different ways without success, and then learned about lateral joins. Rewritten as follows, it executes efficiently.SELECT * FROM xtag_stack_feedJOIN LATERAL (SELECT DISTINCT ON (do_post_xtag.xtag_ci) *FROM do_post_xtagJOIN do_post ON do_post.rc_id=do_post_xtag.post_rc_idWHERE do_post_xtag.xtag_ci=xtag_stack_feed.xtag_ciORDER BY do_post_xtag.xtag_ci ASC, do_post_xtag.post_rc_id DESC) last_postON trueORDER BY decayed_to_base DESCLIMIT 1;From my naive perspective, it seems like the second query is semantically equivalent to the first; it just has the join condition moved into the subquery as a WHERE filter.I do not see a "where" condition in your first query.RegardsJohann
--Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)