Thread: Reproducing incorrect order with order by in a subquery
Hello,
--
For example I have a query:
SELECT main.*, count(*) OVER () FROM (SELECT DISTINCT ... ORDER BY X) main;
So the `ORDER BY` clause ended up in a subquery. Most of the time ordering works until it doesn't.
Can you help me create a set of test tables with some data to reproduce this problem more repeatedly? I just want to write a regression test to make sure it doesn't happen again.
Best regards, Ruslan.
Ruslan Zakirov schrieb am 13.06.2023 um 09:49: > For example I have a query: > > SELECT main.*, count(*) OVER () FROM (SELECT DISTINCT ... ORDER BY X) > main; > > So the `ORDER BY` clause ended up in a subquery. Most of the time > ordering works until it doesn't. > > Can you help me create a set of test tables with some data to > reproduce this problem more repeatedly? I just want to write a > regression test to make sure it doesn't happen again. Your final/overall query has no ORDER BY, so Postgres is free to return the result in any order it likes. You will have to add an ORDER BY to the "main" part to get a guaranteed sort order
On Tue, Jun 13, 2023 at 1:26 PM Thomas Kellerer <shammat@gmx.net> wrote:
Ruslan Zakirov schrieb am 13.06.2023 um 09:49:
> For example I have a query:
>
> SELECT main.*, count(*) OVER () FROM (SELECT DISTINCT ... ORDER BY X)
> main;
>
> So the `ORDER BY` clause ended up in a subquery. Most of the time
> ordering works until it doesn't.
>
> Can you help me create a set of test tables with some data to
> reproduce this problem more repeatedly? I just want to write a
> regression test to make sure it doesn't happen again.
Your final/overall query has no ORDER BY, so Postgres is free to return the result in any order it likes.
You will have to add an ORDER BY to the "main" part to get a guaranteed sort order
I know how to fix the problem and I know that ORDER BY should be in the outermost select.
However, I want to write a test case that shows that the old code is wrong, but can not create
minimal set of tables to reproduce it. With this I'm looking for help.
Best regards, Ruslan.
Ruslan Zakirov <ruslan.zakirov@gmail.com> writes: > I know how to fix the problem and I know that ORDER BY should be in the > outermost select. > However, I want to write a test case that shows that the old code is wrong, > but can not create > minimal set of tables to reproduce it. With this I'm looking for help. The ORDER BY in the sub-select will be honored at the output of the sub-select. To have a different ordering at the final output, you need the upper query to do something that would re-order the rows. Joining the sub-select to something else might make that happen, or you could apply DISTINCT or some other non-trivial processing in the upper query. regards, tom lane
On Tue, Jun 13, 2023 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ruslan Zakirov <ruslan.zakirov@gmail.com> writes:
> I know how to fix the problem and I know that ORDER BY should be in the
> outermost select.
> However, I want to write a test case that shows that the old code is wrong,
> but can not create
> minimal set of tables to reproduce it. With this I'm looking for help.
The ORDER BY in the sub-select will be honored at the output of the
sub-select. To have a different ordering at the final output, you
need the upper query to do something that would re-order the rows.
Joining the sub-select to something else might make that happen,
or you could apply DISTINCT or some other non-trivial processing
in the upper query.
regards, tom lane
Hello Tom,
Thanks for replying. Maybe I'm just wrong in my assumption. A user reports incorrect order in the following query:
SELECT main.*, COUNT(main.id) OVER() AS search_builder_count_all FROM (
SELECT DISTINCT main.* FROM Tickets main
LEFT JOIN Groups Groups_2 ON ( Groups_2.Domain = 'RT::Ticket-Role' ) AND ( Groups_2.Instance = main.id )
JOIN Queues Queues_1 ON ( Queues_1.id = main.Queue )
LEFT JOIN CachedGroupMembers CachedGroupMembers_3 ON ( CachedGroupMembers_3.Disabled = '0' ) AND ( CachedGroupMembers_3.MemberId IN ('38', '837', ... , '987', '58468') ) AND ( CachedGroupMembers_3.GroupId = Groups_2.id )
WHERE ( ( main.Queue IN ('1', ... , '20') OR ( CachedGroupMembers_3.MemberId IS NOT NULL AND LOWER(Groups_2.Name) IN ('cc', 'requestor') ) OR ( main.Owner = '38' ) )
AND (main.IsMerged IS NULL)
AND (main.Status != 'deleted')
AND (main.Type = 'ticket')
AND (main.Owner = '6' AND ( ( Queues_1.Lifecycle = 'default' AND main.Status IN ('new', 'open', 'stalled') ) OR ( Queues_1.Lifecycle = 'approvals' AND main.Status IN ('new', 'open', 'stalled') ) )
) ORDER BY main.Created DESC ) main LIMIT 50
We have an option in our product that makes this query simpler, no joins in the subquery. The user reports that using this option helps with order.
This is a too complex query to build a test on. Tried simpler scenarios and failed.
-- Best regards, Ruslan.
On Tue, Jun 13, 2023 at 10:55 AM Ruslan Zakirov <ruslan.zakirov@gmail.com> wrote:
Thanks for replying. Maybe I'm just wrong in my assumption. A user reports incorrect order in the following query:SELECT main.*, COUNT(main.id) OVER() AS search_builder_count_all FROM (SELECT DISTINCT main.* FROM Tickets mainLEFT JOIN Groups Groups_2 ON ( Groups_2.Domain = 'RT::Ticket-Role' ) AND ( Groups_2.Instance = main.id )JOIN Queues Queues_1 ON ( Queues_1.id = main.Queue )LEFT JOIN CachedGroupMembers CachedGroupMembers_3 ON ( CachedGroupMembers_3.Disabled = '0' ) AND ( CachedGroupMembers_3.MemberId IN ('38', '837', ... , '987', '58468') ) AND ( CachedGroupMembers_3.GroupId = Groups_2.id )WHERE ( ( main.Queue IN ('1', ... , '20') OR ( CachedGroupMembers_3.MemberId IS NOT NULL AND LOWER(Groups_2.Name) IN ('cc', 'requestor') ) OR ( main.Owner = '38' ) )AND (main.IsMerged IS NULL)AND (main.Status != 'deleted')AND (main.Type = 'ticket')AND (main.Owner = '6' AND ( ( Queues_1.Lifecycle = 'default' AND main.Status IN ('new', 'open', 'stalled') ) OR ( Queues_1.Lifecycle = 'approvals' AND main.Status IN ('new', 'open', 'stalled') ) )) ORDER BY main.Created DESC ) main LIMIT 50We have an option in our product that makes this query simpler, no joins in the subquery. The user reports that using this option helps with order.This is a too complex query to build a test on. Tried simpler scenarios and failed.
If you want guaranteed ordered output you must place the order by in the outermost query level (i.e., before your limit 50). Trying to do that for dynamic SQL where you don't actually know what query you are working with is going to be a challenge - maybe force the user to have the order by column first in their query then just say "ORDER BY 1" in the wrapper query you are adding? Basically have them write "row_number() over (order by)" for their query and you then order by row number.
David J.
On Tue, Jun 13, 2023 at 8:55 PM Ruslan Zakirov <ruslan.zakirov@gmail.com> wrote:
On Tue, Jun 13, 2023 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:Ruslan Zakirov <ruslan.zakirov@gmail.com> writes:
> I know how to fix the problem and I know that ORDER BY should be in the
> outermost select.
> However, I want to write a test case that shows that the old code is wrong,
> but can not create
> minimal set of tables to reproduce it. With this I'm looking for help.
The ORDER BY in the sub-select will be honored at the output of the
sub-select. To have a different ordering at the final output, you
need the upper query to do something that would re-order the rows.
Joining the sub-select to something else might make that happen,
or you could apply DISTINCT or some other non-trivial processing
in the upper query.
regards, tom laneHello Tom,Thanks for replying. Maybe I'm just wrong in my assumption. A user reports incorrect order in the following query:SELECT main.*, COUNT(main.id) OVER() AS search_builder_count_all FROM (SELECT DISTINCT main.* FROM Tickets mainLEFT JOIN Groups Groups_2 ON ( Groups_2.Domain = 'RT::Ticket-Role' ) AND ( Groups_2.Instance = main.id )JOIN Queues Queues_1 ON ( Queues_1.id = main.Queue )LEFT JOIN CachedGroupMembers CachedGroupMembers_3 ON ( CachedGroupMembers_3.Disabled = '0' ) AND ( CachedGroupMembers_3.MemberId IN ('38', '837', ... , '987', '58468') ) AND ( CachedGroupMembers_3.GroupId = Groups_2.id )WHERE ( ( main.Queue IN ('1', ... , '20') OR ( CachedGroupMembers_3.MemberId IS NOT NULL AND LOWER(Groups_2.Name) IN ('cc', 'requestor') ) OR ( main.Owner = '38' ) )AND (main.IsMerged IS NULL)AND (main.Status != 'deleted')AND (main.Type = 'ticket')AND (main.Owner = '6' AND ( ( Queues_1.Lifecycle = 'default' AND main.Status IN ('new', 'open', 'stalled') ) OR ( Queues_1.Lifecycle = 'approvals' AND main.Status IN ('new', 'open', 'stalled') ) )) ORDER BY main.Created DESC ) main LIMIT 50We have an option in our product that makes this query simpler, no joins in the subquery. The user reports that using this option helps with order.This is a too complex query to build a test on. Tried simpler scenarios and failed.
Hello,
First of all I want to apologize. We work with multiple RDBMS systems. This particular user is using mysql. So it's not clear if it works ok or not on Pg.
Anyway, yesterday I tried my simplified case on Pg latest, Pg 11 and on mysql latest. Had no luck. Either my test case is too simple or I can not find the correct distribution of data between two tables.
Spent too much time on this. Going to work on the query builder and move the "order by" clause out of the subquery. Just to be on the safe side. Most probably it will fix the issue for the user.
On 6/14/23 05:03, Ruslan Zakirov wrote:
[snip]
This is when you need a prod copy...
[snip]
Anyway, yesterday I tried my simplified case on Pg latest, Pg 11 and on mysql latest. Had no luck. Either my test case is too simple or I can not find the correct distribution of data between two tables.
This is when you need a prod copy...
--
Born in Arizona, moved to Babylonia.
Born in Arizona, moved to Babylonia.
Am Wed, Jun 14, 2023 at 01:03:06PM +0300 schrieb Ruslan Zakirov: > > This is a too complex query to build a test on. Tried simpler scenarios > > and failed. > > > > First of all I want to apologize. We work with multiple RDBMS systems. This > particular user is using mysql. ... > Anyway, yesterday I tried my simplified case on Pg latest, Pg 11 and on > mysql latest. Had no luck. Either my test case is too simple or I can not > find the correct distribution of data between two tables. IOW neither is help to be expected on this list nor can any testing (on PG) help with anything to be expected on MySQL ? As to the question: since the outer query does not have an ORDER BY it can return results in any order INCLUDING the one produced by the subquery. Which renders impossible any *proving* that it can return rows in orders different from the subquery *unless* one forces a different order on the outer query. Which in turn would defeat the purpose as then the outer query *does* have an explicit ordering... IMO you need to either rewrite the query or look to MySQL for more detailed help. Probably the former, regardless of the latter. Best, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On Wed, Jun 14, 2023 at 3:50 PM Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
IOW neither is help to be expected on this list nor can any
testing (on PG) help with anything to be expected on MySQL ?
Don't expect any help on mysql part.
As to the question: since the outer query does not have an
ORDER BY it can return results in any order INCLUDING the one
produced by the subquery. Which renders impossible any
*proving* that it can return rows in orders different from
Well, in *theory* it can return rows in a different order that doesn't match the order produced
by the subquery. As far as I know no RDBMS state in its documentation that ordering between
subquery and its outer query is preserved. Some explicitly state the opposite:
MS SQL server:
"The ORDER BY clause is not valid in views, inline functions, derived tables, and subqueries, unless either the TOP or OFFSET and FETCH clauses are also specified. When ORDER BY is used in these objects, the clause is used only to determine the rows returned by the TOP clause or OFFSET and FETCH clauses. The ORDER BY clause does not guarantee ordered results when these constructs are queried, unless ORDER BY is also specified in the query itself."
My goal was to find a small dataset that demonstrates this ordering mismatch.
Failed to reach my goal. Ended up with a code change with a lot of explanations,
comments and links to documentation. No prove in tests that the old code was wrong
and a new one fixes it.
the subquery *unless* one forces a different order on the
outer query. Which in turn would defeat the purpose as then
the outer query *does* have an explicit ordering...
Best regards, Ruslan.
Am Thu, Jun 15, 2023 at 12:58:55AM +0300 schrieb Ruslan Zakirov: e.g. > https://mariadb.com/kb/en/why-is-order-by-in-a-from-subquery-ignored/ ... > My goal was to find a small dataset that demonstrates this ordering > mismatch. I attempted to think it through whether it is even *possible* to find a dataset (of any size) which *reliably* demonstrates the ordering mismatch. To my understanding -- since the outer query *can* (per happenstance) return rows in the order of the subquery's ORDER BY it is not possible to reliably demonstrate that it won't, is it ? Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B