Thread: [RFE] auto ORDER BY for SELECT
Hello 2012/1/23 Douglas Eric <sekkuar@hotmail.com>: > I'm not sure if this is the right list to discuss this, but, I have a > suggestion: > > ORDER BY clause, as defined in the SELECT documentation says: > > "If ORDER BY is not given, the rows are returned in whatever order the > system finds fastest to produce" > > This order is usually not wanted, as it is not predictable. I believe many > people would expect the order of rows > returned in this case, to be ordered as the primary key of the table, or the > same order the rows were inserted. > > I suggest to change this behavior. If one makes a SELECT statement without > any ORDER BY, it would be > clever to automatically sort by the first primary key found in the query, if > any. > The present behavior would still be used in case of queries without any > primary key fields. > > This would save a lot of repeated clauses "ORDER BY <table primary key>" > that we have to add to every SELECT, even the most simple one > "SELECT * FROM <table>" > If we actually want the order of the rows to make any sense. * I don't think so this is good idea. Any sort is not cheap - so ORDER BY "hint" - yes, user can do expensive operation. * second argument SELECT * FROM longtab LIMIT 100 is significantly faster than SELECT * FROM longtab ORDER BY PK LIMIT 100; so implicit ORDER BY can significantly increase a load of server Regards Pavel Stehule
Douglas Eric <sekkuar@hotmail.com> wrote: > I'm not sure if this is the right list to discuss this, but, I have a > suggestion: > > ORDER BY clause, as defined in the SELECT documentation says: > > "If ORDER BY is not given, the rows are returned in whatever order the system > finds fastest to produce" > > This order is usually not wanted, as it is not predictable. I believe many > people would expect the order of rows > returned in this case, to be ordered as the primary key of the table, or the > same order the rows were inserted. > > I suggest to change this behavior. If one makes a SELECT statement without any > ORDER BY, it would be > clever to automatically sort by the first primary key found in the query, if > any. No. Since 8.3 (IIRC) we have a feature called 'concurrent seq. scan', see: http://j-davis.com/postgresql/83v82_scans.html Your suggestion can't work in this way. That's only one problem, there are more. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Piling On…
NO! (not that I actually think this would ever happen anyway).
Sorting is a resource-intensive process and it should NOT be made into a default. Besides, any kind of intelligent/implicit behavior like that ends up being forgotten and/or changed in the future and previously working queries begin to break.
SQL is an explicit, declarative, language and thus will tend toward having verbose syntax generally – this is a good thing since while you are in the middle of writing a query you may be frustrated above having to explain every little thing but when you come back to the code a year from now you have a good chance of knowing exactly what it is doing AND when you upgrade PostgreSQL you have less exposure to the breakage of existing queries since the server is not making guesses as to what you want/mean.
David J.
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Douglas Eric
Sent: Monday, January 23, 2012 7:18 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] [RFE] auto ORDER BY for SELECT
I'm not sure if this is the right list to discuss this, but, I have a suggestion:
ORDER BY clause, as defined in the SELECT documentation says:
"If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce"
This order is usually not wanted, as it is not predictable. I believe many people would expect the order of rows
returned in this case, to be ordered as the primary key of the table, or the same order the rows were inserted.
I suggest to change this behavior. If one makes a SELECT statement without any ORDER BY, it would be
clever to automatically sort by the first primary key found in the query, if any.
The present behavior would still be used in case of queries without any primary key fields.
This would save a lot of repeated clauses "ORDER BY <table primary key>"
that we have to add to every SELECT, even the most simple one
"SELECT * FROM <table>"
If we actually want the order of the rows to make any sense.
On Mon, Jan 23, 2012 at 11:17 PM, Douglas Eric <sekkuar@hotmail.com> wrote: > I suggest to change this behavior. If one makes a SELECT statement without > any ORDER BY, it would be > clever to automatically sort by the first primary key found in the query, if > any. > The present behavior would still be used in case of queries without any > primary key fields. This would require that work be done without text commanding it, which is IMHO a bad idea. Generally, SQL follows the principle that more text --> more work: compare SELECT and SELECT DISTINCT (it's more work to look for duplicates), VACUUM and VACUUM ANALYZE, etc, etc. The default state is to do the least work that makes sense. (There are exceptions - UNION ought to be UNION DISTINCT, versus UNION ALL to reduce the work done - but this is the general rule.) Often, a query is done with genuine disregard for order. If you're going to take the results of the query and stuff them into a hashtable, you don't care what order they come up in. Why have the database sort them? Let 'em come in the easiest order possible. ChrisA
On 24 January 2012 09:29, Chris Angelico <rosuav@gmail.com> wrote: > On Mon, Jan 23, 2012 at 11:17 PM, Douglas Eric <sekkuar@hotmail.com> wrote: >> I suggest to change this behavior. If one makes a SELECT statement without >> any ORDER BY, it would be >> clever to automatically sort by the first primary key found in the query, if >> any. I recently submitted a problem report with a product that had that behaviour. The data involved was a table of database ID's and text labels for use in a drop-down list. In such cases, sorting the data by primary key (the ID) is rarely what you want! For example, if you have a listing of car brands, sorting them by some arbitrary ID quickly makes such a list impossible to use. You want such a list sorted alphabetically. Defaulting to sorting by ID (like aforementioned product did) does not make sense in such a case. So, this is not just a bad idea from a performance perspective, it's also often not what you want. Of course specifying a "different" sort order than the default one would solve the issue, but that's not the point here. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.