Thread: Preserving order through an inner join

Preserving order through an inner join

From
Kevin Jardine
Date:
I have a query structured like this:

SELECT stuff FROM
(SELECT more stuff FROM
table1
ORDER BY field1) AS q1
INNER JOIN table2 ON ( ... )

and have found that the INNER JOIN is ignoring the order set for q1.

The final results are not ordered by field1.

This works for other databases (eg. MySQL and Sqllite3) but not PostgreSQL.

I would really like to support PostgreSQL but this ordering problem is stopping me from doing so.

I can make some small changes to the query structure as long as it works for the other DBs as well. Moving the ORDER BY
outsideq1 would be a large amount of work, however (these queries are generated by a program), so I am hoping that
thereis a simpler solution. 

Any suggestions for getting this to work?

Kevin




Re: Preserving order through an inner join

From
Pavel Stehule
Date:
Hello

2010/9/26 Kevin Jardine <kevinjardine@yahoo.com>:
> I have a query structured like this:
>
> SELECT stuff FROM
> (SELECT more stuff FROM
> table1
> ORDER BY field1) AS q1
> INNER JOIN table2 ON ( ... )
>
> and have found that the INNER JOIN is ignoring the order set for q1.
>

you can block a hash join that cannot to protect order. But it can be
very contra-productive - this method of JOIN is the most faster.
SQLite or MySQL are too simple and doesn't provide some more
sophisticated constructs. Principally your query is wrong - the real
order is based on final ORDER BY. It's probable, so your queries will
stop working (on SQLite or MySQL) in future, when these databases will
be more mature.

Regards

Pavel Stehule

> The final results are not ordered by field1.
>
> This works for other databases (eg. MySQL and Sqllite3) but not PostgreSQL.
>
> I would really like to support PostgreSQL but this ordering problem is stopping me from doing so.
>
> I can make some small changes to the query structure as long as it works for the other DBs as well. Moving the ORDER
BYoutside q1 would be a large amount of work, however (these queries are generated by a program), so I am hoping that
thereis a simpler solution. 
>
> Any suggestions for getting this to work?
>
> Kevin
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: Preserving order through an inner join

From
Kevin Jardine
Date:
Hi Pavel,

I'm not really interested in a "my database is better than your database" discussion.

I want to know how to preserve the order. As I said, moving the ORDER BY out of the subquery would be a lot of work.

I am trying to find another solution.

Any suggestions from anyone?

Kevin

--- On Sun, 9/26/10, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> From: Pavel Stehule <pavel.stehule@gmail.com>
> Subject: Re: [GENERAL] Preserving order through an inner join
> To: "Kevin Jardine" <kevinjardine@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Date: Sunday, September 26, 2010, 9:23 PM
> Hello
>
> 2010/9/26 Kevin Jardine <kevinjardine@yahoo.com>:
> > I have a query structured like this:
> >
> > SELECT stuff FROM
> > (SELECT more stuff FROM
> > table1
> > ORDER BY field1) AS q1
> > INNER JOIN table2 ON ( ... )
> >
> > and have found that the INNER JOIN is ignoring the
> order set for q1.
> >
>
> you can block a hash join that cannot to protect order. But
> it can be
> very contra-productive - this method of JOIN is the most
> faster.
> SQLite or MySQL are too simple and doesn't provide some
> more
> sophisticated constructs. Principally your query is wrong -
> the real
> order is based on final ORDER BY. It's probable, so your
> queries will
> stop working (on SQLite or MySQL) in future, when these
> databases will
> be more mature.
>
> Regards
>
> Pavel Stehule
>
> > The final results are not ordered by field1.
> >
> > This works for other databases (eg. MySQL and
> Sqllite3) but not PostgreSQL.
> >
> > I would really like to support PostgreSQL but this
> ordering problem is stopping me from doing so.
> >
> > I can make some small changes to the query structure
> as long as it works for the other DBs as well. Moving the
> ORDER BY outside q1 would be a large amount of work, however
> (these queries are generated by a program), so I am hoping
> that there is a simpler solution.
> >
> > Any suggestions for getting this to work?
> >
> > Kevin
> >
> >
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >
>




Re: Preserving order through an inner join

From
Tom Lane
Date:
Kevin Jardine <kevinjardine@yahoo.com> writes:
> I have a query structured like this:
> SELECT stuff FROM
> (SELECT more stuff FROM
> table1
> ORDER BY field1) AS q1
> INNER JOIN table2 ON ( ... )

> and have found that the INNER JOIN is ignoring the order set for q1.

> The final results are not ordered by field1.

Indeed.  Many of the possible join techniques won't preserve that ordering.

> This works for other databases (eg. MySQL and Sqllite3) but not PostgreSQL.

It might sometimes accidentally fail to fail, but I think you'll find
that there are *no* SQL databases where this is guaranteed to work the
way you expect.  The SQL standard explicitly disavows any particular
output row order unless there is a top-level ORDER BY.  (In fact,
unless things have changed recently an ORDER BY in a sub-select isn't
even legal per spec.)

> I can make some small changes to the query structure as long as it works for the other DBs as well. Moving the ORDER
BYoutside q1 would be a large amount of work, however (these queries are generated by a program), so I am hoping that
thereis a simpler solution. 

Nope, that's what you need to do.

            regards, tom lane

Re: Preserving order through an inner join

From
Kevin Jardine
Date:
OK, then.

The sub-select needs to go.

Thanks for helpful advice.

Kevin

--- On Sun, 9/26/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Subject: Re: [GENERAL] Preserving order through an inner join
> To: "Kevin Jardine" <kevinjardine@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Date: Sunday, September 26, 2010, 9:37 PM
> Kevin Jardine <kevinjardine@yahoo.com>
> writes:
> > I have a query structured like this:
> > SELECT stuff FROM
> > (SELECT more stuff FROM
> > table1
> > ORDER BY field1) AS q1
> > INNER JOIN table2 ON ( ... )
>
> > and have found that the INNER JOIN is ignoring the
> order set for q1.
>
> > The final results are not ordered by field1.
>
> Indeed.  Many of the possible join techniques won't
> preserve that ordering.
>
> > This works for other databases (eg. MySQL and
> Sqllite3) but not PostgreSQL.
>
> It might sometimes accidentally fail to fail, but I think
> you'll find
> that there are *no* SQL databases where this is guaranteed
> to work the
> way you expect.  The SQL standard explicitly disavows
> any particular
> output row order unless there is a top-level ORDER
> BY.  (In fact,
> unless things have changed recently an ORDER BY in a
> sub-select isn't
> even legal per spec.)
>
> > I can make some small changes to the query structure
> as long as it works for the other DBs as well. Moving the
> ORDER BY outside q1 would be a large amount of work, however
> (these queries are generated by a program), so I am hoping
> that there is a simpler solution.
>
> Nope, that's what you need to do.
>
>            
> regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>




Re: Preserving order through an inner join

From
Karsten Hilbert
Date:
On Sun, Sep 26, 2010 at 07:32:01PM +0000, Kevin Jardine wrote:

> Hi Pavel,
>
> I'm not really interested in a "my database is better than your database" discussion.

Pavel did not say that his database is better than yours.

What he said was that your query is wrong (with respect to
what you said you expect it to return). Tom attested that it
isn't even "legal" SQL as per the spec.

Please keep things factual.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Preserving order through an inner join

From
"Igor Neyman"
Date:

> -----Original Message-----
> From: Kevin Jardine [mailto:kevinjardine@yahoo.com]
> Sent: Sunday, September 26, 2010 3:15 PM
> To: pgsql-general@postgresql.org
> Subject: Preserving order through an inner join
>
> I have a query structured like this:
>
> SELECT stuff FROM
> (SELECT more stuff FROM
> table1
> ORDER BY field1) AS q1
> INNER JOIN table2 ON ( ... )
>
> and have found that the INNER JOIN is ignoring the order set for q1.
>
> The final results are not ordered by field1.
>
> This works for other databases (eg. MySQL and Sqllite3) but
> not PostgreSQL.
>
> I would really like to support PostgreSQL but this ordering
> problem is stopping me from doing so.
>
> I can make some small changes to the query structure as long
> as it works for the other DBs as well. Moving the ORDER BY
> outside q1 would be a large amount of work, however (these
> queries are generated by a program), so I am hoping that
> there is a simpler solution.
>
> Any suggestions for getting this to work?
>
> Kevin
>

As others stated, it's a bad idea to order sub-select.
But if you really, really need it (m.b. not in this particular case),
you could do:

SELECT stuff FROM
 (SELECT field1, field2, COUNT(*) FROM
 table1
 GROUP BY field1, field2
 ORDER BY field1) AS q1
 INNER JOIN table2 ON ( ... )


Regards,
Igor Neyman

Re: Preserving order through an inner join

From
Gurjeet Singh
Date:
On Sun, Sep 26, 2010 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The SQL standard explicitly disavows any particular
output row order unless there is a top-level ORDER BY.  (In fact,
unless things have changed recently an ORDER BY in a sub-select isn't
even legal per spec.)


Not sure about the SQL spec allowing it, but an ORDER BY followed by a LIMIT does have valid use cases in sub-selects.

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Re: Preserving order through an inner join

From
Darren Duncan
Date:
Following up on other replies, and agreeing that SQL has no inherent row
ordering and it is only by accident that you are seeing such ...

The *only* way to reliably preserve sort order is by explicitly encoding the
order in your data, by maintaining a column or columns through to the outermost
query on which you sort.

A generic solution which you can apply is to use a SQL "RANK"-alike feature to
generate a serial number column in your rowset; you can do this in the inner
query where you want to have all your unique or complex sorting details, and
then your sort order will be preserved (as long as you explicitly propagate said
column either by name or with *) through to the outer query.  You can then order
by and exclude said column in the outermost query if you want, but you don't
have to because your application will still get the ordering information in said
extra column.

For example, something like this:

SELECT stuff FROM
(SELECT more stuff, rank() OVER (ORDER BY field1) AS ordcol FROM
table1
) AS q1
INNER JOIN table2 ON ( ... )
ORDER BY ordcol

See
http://www.postgresql.org/docs/9.0/interactive/queries-table-expressions.html#QUERIES-WINDOW
and the 3 urls it links to.

-- Darren Duncan

Kevin Jardine wrote:
> I have a query structured like this:
>
> SELECT stuff FROM
> (SELECT more stuff FROM
> table1
> ORDER BY field1) AS q1
> INNER JOIN table2 ON ( ... )
>
> and have found that the INNER JOIN is ignoring the order set for q1.
>
> The final results are not ordered by field1.
>
> This works for other databases (eg. MySQL and Sqllite3) but not PostgreSQL.
>
> I would really like to support PostgreSQL but this ordering problem is stopping me from doing so.
>
> I can make some small changes to the query structure as long as it works for the other DBs as well. Moving the ORDER
BYoutside q1 would be a large amount of work, however (these queries are generated by a program), so I am hoping that
thereis a simpler solution. 
>
> Any suggestions for getting this to work?

Re: Preserving order through an inner join

From
Darren Duncan
Date:
Gurjeet Singh wrote:
> On Sun, Sep 26, 2010 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>     The SQL standard explicitly disavows any particular
>     output row order unless there is a top-level ORDER BY.  (In fact,
>     unless things have changed recently an ORDER BY in a sub-select isn't
>     even legal per spec.)
>
> Not sure about the SQL spec allowing it, but an ORDER BY followed by a
> LIMIT does have valid use cases in sub-selects.

Absolutely it does, but that is just a row *filtering* operation.  You still
have to have a separate ORDER BY in the outermost query to get result rows
output in a particular order. -- Darren Duncan

Re: Preserving order through an inner join

From
Craig Ringer
Date:
On 09/27/2010 03:37 AM, Tom Lane wrote:
> Kevin Jardine<kevinjardine@yahoo.com>  writes:
>> I have a query structured like this:
>> SELECT stuff FROM
>> (SELECT more stuff FROM
>> table1
>> ORDER BY field1) AS q1
>> INNER JOIN table2 ON ( ... )
>
>> and have found that the INNER JOIN is ignoring the order set for q1.
>
>> The final results are not ordered by field1.
>
> Indeed.  Many of the possible join techniques won't preserve that ordering.

Just to elaborate on this for the OP:

When joining two data sets, PostgreSQL has a number of choices about how
to do it. Some of these are:

- Merge join (http://en.wikipedia.org/wiki/Sort-merge_join)
- Hash join (http://en.wikipedia.org/wiki/Hash_join)
- Nested loop + index/sequential table scans
   (http://en.wikipedia.org/wiki/Nested_loop_join)

As you will see from the descriptions and how they work, many join
algorithms do not preserve the order of the input relations. Different
join types are optimal for different absolute and relative sizes of
input relations and different join conditions; all of them are useful in
one situation or another. Only using joins that preserved the natural
order of the relations (which is undefined as per the SQL spec and 99%
of the time people don't care about anyway) would be pretty nasty for
performance.

You could potentially force preservation of order in your particular
query by telling PostgreSQL to use a nested loop with your subquery as
the outer loop, but it'd require overriding the query planner's join
method and join order optimization in ugly ways that aren't well supported.

Not only that, but you'll have to do that kind of hacking-around if you
target any other database that supports at least hash joins, and you'll
have to do it using database-specific query hints, configuration
options, session variables, etc.

I hope this helps to explain in a bit more detail why it's worth putting
in the effort to update your code.

--
Craig ringer