Thread: two table join with order by on both tables attributes

two table join with order by on both tables attributes

From
Evgeniy Shishkin
Date:
Hello,

suppose you have two very simple tables with fk dependency, by which we join them
and another attribute for sorting

like this
select * from users join  notifications on users.id=notifications.user_id ORDER BY users.priority desc
,notifications.prioritydesc limit 10; 

Very typical web query.

No matter which composite indexes i try, postgresql can not make efficient nested loop plan using indexes.
It chooses all sorts of seq scans and hash joins or merge join and always a sort node and then a limit 10.

Neither plan provides acceptable performance. And tables tend to grow =\

Can anybody suggest something or explain this behavior?

Re: two table join with order by on both tables attributes

From
David G Johnston
Date:
Evgeniy Shishkin wrote
> Hello,
>
> suppose you have two very simple tables with fk dependency, by which we
> join them
> and another attribute for sorting
>
> like this
> select * from users join  notifications on users.id=notifications.user_id
> ORDER BY users.priority desc ,notifications.priority desc limit 10;
>
> Very typical web query.
>
> No matter which composite indexes i try, postgresql can not make efficient
> nested loop plan using indexes.
> It chooses all sorts of seq scans and hash joins or merge join and always
> a sort node and then a limit 10.
>
> Neither plan provides acceptable performance. And tables tend to grow =\
>
> Can anybody suggest something or explain this behavior?

Can you explain why a nested loop is best for your data?  Given my
understanding of an expected "priority"cardinality I would expect your ORDER
BY to be extremely inefficient and not all that compatible with a nested
loop approach.

You can use the various parameters listed on this page to force the desired
plan and then provide EXPLAIN ANALYZE results for the various executed plans
and compare them.

http://www.postgresql.org/docs/9.3/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE

And now for the obligatory "read this" link:

https://wiki.postgresql.org/wiki/SlowQueryQuestions

If you can show that in fact the nested loop (or some other plan) performs
better than the one chosen by the planner - and can provide data that the
developers can use to replicate the experiment - then improvements can be
made.  At worse you will come to understand why the planner is right and can
then explore alternative models.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/two-table-join-with-order-by-on-both-tables-attributes-tp5814135p5814137.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: two table join with order by on both tables attributes

From
Evgeniy Shishkin
Date:
My question was about that you can not have fast execution of this kind of query in postgresql.
With any runtime configuration you just swith from seq scan and hash join to merge join, and then you have a sort node.

In my understanding, i need to have two indexes
on users(priority desc, id)
and notifications(user_id, priority desc)

then postgresql would choose nested loop and get sorted data from indexes.
But it wont.

I don't understand why.

Do you have any schema and GUCs which performs this kind of query well?

Sorry for top posting.

> Can you explain why a nested loop is best for your data?  Given my
> understanding of an expected "priority"cardinality I would expect your ORDER
> BY to be extremely inefficient and not all that compatible with a nested
> loop approach.
>
> You can use the various parameters listed on this page to force the desired
> plan and then provide EXPLAIN ANALYZE results for the various executed plans
> and compare them.
>
> http://www.postgresql.org/docs/9.3/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-ENABLE
>
> And now for the obligatory "read this" link:
>
> https://wiki.postgresql.org/wiki/SlowQueryQuestions
>
> If you can show that in fact the nested loop (or some other plan) performs
> better than the one chosen by the planner - and can provide data that the
> developers can use to replicate the experiment - then improvements can be
> made.  At worse you will come to understand why the planner is right and can
> then explore alternative models.
>
> David J.
>
>
>
>
> --
> View this message in context:
http://postgresql.1045698.n5.nabble.com/two-table-join-with-order-by-on-both-tables-attributes-tp5814135p5814137.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



Re: two table join with order by on both tables attributes

From
Tom Lane
Date:
Evgeniy Shishkin <itparanoia@gmail.com> writes:
>> select * from users join  notifications on users.id=notifications.user_id ORDER BY users.priority desc
,notifications.prioritydesc limit 10; 

> In my understanding, i need to have two indexes
> on users(priority desc, id)
> and notifications(user_id, priority desc)
> then postgresql would choose nested loop and get sorted data from indexes.
> But it wont.

Indeed.  If you think a bit harder, you'll realize that the plan you
suggest would *not* produce the sort order requested by this query.
It would (if I'm not confused myself) produce an ordering like
   users.priority desc, id asc, notifications.priority desc
which would only match what the query asks for if there's just a single
value of id per users.priority value.

Offhand I think that the planner will not recognize a nestloop as
producing a sort ordering of this kind even if the query did request the
right ordering.  That could perhaps be improved, but I've not seen many
if any cases where it would be worth the trouble.

            regards, tom lane


Re: two table join with order by on both tables attributes

From
Evgeniy Shishkin
Date:
>>> select * from users join  notifications on users.id=notifications.user_id ORDER BY users.priority desc
,notifications.prioritydesc limit 10; 
>
>> In my understanding, i need to have two indexes
>> on users(priority desc, id)
>> and notifications(user_id, priority desc)
>> then postgresql would choose nested loop and get sorted data from indexes.
>> But it wont.
>
> Indeed.  If you think a bit harder, you'll realize that the plan you
> suggest would *not* produce the sort order requested by this query.
> It would (if I'm not confused myself) produce an ordering like
>   users.priority desc, id asc, notifications.priority desc
> which would only match what the query asks for if there's just a single
> value of id per users.priority value.
>
> Offhand I think that the planner will not recognize a nestloop as
> producing a sort ordering of this kind even if the query did request the
> right ordering.  That could perhaps be improved, but I've not seen many
> if any cases where it would be worth the trouble.


Thanks Tom, you are right.

But may be some sort of skip index scan ala loose index scan will help with index on notifications(priority
desc,user_id)?

I know that this is currently not handled by native executors.
May by i can work around this using WITH RECURSIVE query?

Also, are there any plans to handle  loose index scan in the upcoming release?

Re: two table join with order by on both tables attributes

From
Evgeniy Shishkin
Date:
> On 08 Aug 2014, at 03:43, Evgeniy Shishkin <itparanoia@gmail.com> wrote:
>
>>>> select * from users join  notifications on users.id=notifications.user_id ORDER BY users.priority desc
,notifications.prioritydesc limit 10; 
>>
>>> In my understanding, i need to have two indexes
>>> on users(priority desc, id)
>>> and notifications(user_id, priority desc)
>>> then postgresql would choose nested loop and get sorted data from indexes.
>>> But it wont.
>>
>> Indeed.  If you think a bit harder, you'll realize that the plan you
>> suggest would *not* produce the sort order requested by this query.
>> It would (if I'm not confused myself) produce an ordering like
>>  users.priority desc, id asc, notifications.priority desc
>> which would only match what the query asks for if there's just a single
>> value of id per users.priority value.
>>
>> Offhand I think that the planner will not recognize a nestloop as
>> producing a sort ordering of this kind even if the query did request the
>> right ordering.  That could perhaps be improved, but I've not seen many
>> if any cases where it would be worth the trouble.
>

And actually with this kind of query we really want the most wanted notifications, by the user.
So we really can rewrite to order by users.priority desc, id asc, notifications.priority desc according to business
logic.
And we will benefit if this case would be improved.

Re: two table join with order by on both tables attributes

From
Marti Raudsepp
Date:
On Fri, Aug 8, 2014 at 4:05 AM, Evgeniy Shishkin <itparanoia@gmail.com> wrote:
>>>>> select * from users join  notifications on users.id=notifications.user_id ORDER BY users.priority desc
,notifications.prioritydesc limit 10; 

>>>> In my understanding, i need to have two indexes
>>>> on users(priority desc, id)
>>>> and notifications(user_id, priority desc)

> And actually with this kind of query we really want the most wanted notifications, by the user.
> So we really can rewrite to order by users.priority desc, id asc, notifications.priority desc according to business
logic.

You can rewrite it with LATERAL to trick the planner into sorting each
user's notifications separately. This should give you the nestloop
plan you expect:

SELECT *
FROM users,
LATERAL (
  SELECT * FROM notifications WHERE notifications.user_id=users.id
  ORDER BY notifications.priority DESC
) AS notifications
ORDER BY users.priority DESC, users.id

It would be great if Postgres could do this transformation automatically.

There's a "partial sort" patch in the current CommitFest, which would
solve the problem partially (it could use the index on users, but the
notifications sort would have to be done in memory still).
https://commitfest.postgresql.org/action/patch_view?id=1368

Regards,
Marti


Re: two table join with order by on both tables attributes

From
Evgeniy Shishkin
Date:
> On 08 Aug 2014, at 16:29, Marti Raudsepp <marti@juffo.org> wrote:
>
> On Fri, Aug 8, 2014 at 4:05 AM, Evgeniy Shishkin <itparanoia@gmail.com> wrote:
>>>>>> select * from users join  notifications on users.id=notifications.user_id ORDER BY users.priority desc
,notifications.prioritydesc limit 10; 
>
>>>>> In my understanding, i need to have two indexes
>>>>> on users(priority desc, id)
>>>>> and notifications(user_id, priority desc)
>
>> And actually with this kind of query we really want the most wanted notifications, by the user.
>> So we really can rewrite to order by users.priority desc, id asc, notifications.priority desc according to business
logic.
>
> You can rewrite it with LATERAL to trick the planner into sorting each
> user's notifications separately. This should give you the nestloop
> plan you expect:
>
> SELECT *
> FROM users,
> LATERAL (
>  SELECT * FROM notifications WHERE notifications.user_id=users.id
>  ORDER BY notifications.priority DESC
> ) AS notifications
> ORDER BY users.priority DESC, users.id
>

Thank you very much.


> It would be great if Postgres could do this transformation automatically.
>
> There's a "partial sort" patch in the current CommitFest, which would
> solve the problem partially (it could use the index on users, but the
> notifications sort would have to be done in memory still).
> https://commitfest.postgresql.org/action/patch_view?id=1368
>
> Regards,
> Marti