Thread: Re: (Select *) vs. (Select id) from table.
Mike wrote: > Hi, > > I am trying to make a website where scalability matters. In my quest to > make my website more scalable I broke down the following SQL statement: > > select * from customers limit 100 > > to: > > select id, updated_date from customers limit 100 > > Then my application would check it's cache to see if it has those > records available and will hit the database with consequent: > > select * from customers where id = 4 or id = 9 or id = 19 > > Am I really speeding things up by breaking down the SQL statements to > what's necessary? or is it faster to get everything right at once! > Well, first, it's never really a good idea to use "SELECT * FROM" in a production application, against a table. Tables can (and do) change from one release to another, and if the layout of the table changes, you could be looking at having to rewrite your code, especially if it relied on the order of the columns in the tables. It's always better to specify the columns that you're looking for, since existing columns should (at least in theory on a production DB) remain present, though their order can change sometimes, depending on what the DBA does. :) Secondly, as far as making your queries more efficient, the only way that you can really do that is to determine actually how long the queries are taking. This is relative to the size of the database in rows, and of course, the data that you're querying against, whether a table scan is necessary, and all of that. This is the process of optimizing queries. For a small table, it can be faster sometimes to just pull all of the records at once (for example, if they're all within a single page). However, if you're pulling from a large DB, it will be faster to use smaller queries against it, using well-placed indexes. Also, you may wish to consider using views if you really like using * with SELECT... Since this way, you can just depreciate a view and start using a new one if the underlying columns are changed. :) Also, if you use views, you can optimize the view's query when it comes time to change it, which mess less messing around in the application code, especially if it is a frequently used query. HTH, Mike
Michael Trausch <michael.trausch@comcast.nope.net> writes: > Well, first, it's never really a good idea to use "SELECT * FROM" in a > production application, against a table. Tables can (and do) change > from one release to another, and if the layout of the table changes, you > could be looking at having to rewrite your code, especially if it relied > on the order of the columns in the tables. Wouldn't it be more logical to say "it's never a good idea to depend on the order of the columns in the tables" then? > It's always better to specify the columns that you're looking for, since > existing columns should (at least in theory on a production DB) remain > present, though their order can change sometimes, depending on what the DBA > does. :) And then when new columns are added (or existing columns removed as you point out) what would have to happen to your code? Hint, you used the word above... This particular piece of dogma dates back to the days under Oracle when "SELECT *" actively broke. If you changed the column you had to recompile the code in question or else you got strange errors or crashes. The *only* way to refer to columns was by position. These days any reasonable driver allows you to use the column names to refer to columns, and application code often has multiple levels of complexity. The layer handling the query often has no idea which columns will be used by later layers. If you hard code the list of columns at each layer then you have to rewrite lots of code when you add a column or change a datatype. -- greg
On Saturday 07 January 2006 13:50, Michael Trausch wrote: > Mike wrote: > > Hi, > > > > I am trying to make a website where scalability matters. In my quest to > > make my website more scalable I broke down the following SQL statement: > > > > select * from customers limit 100 > > > > to: > > > > select id, updated_date from customers limit 100 > > > > Then my application would check it's cache to see if it has those > > records available and will hit the database with consequent: > > > > select * from customers where id = 4 or id = 9 or id = 19 > > > > Am I really speeding things up by breaking down the SQL statements to > > what's necessary? or is it faster to get everything right at once! > > Well, first, it's never really a good idea to use "SELECT * FROM" in a > production application, against a table. Tables can (and do) change > from one release to another, and if the layout of the table changes, you > could be looking at having to rewrite your code, especially if it relied > on the order of the columns in the tables. It's always better to > specify the columns that you're looking for, since existing columns > should (at least in theory on a production DB) remain present, though > their order can change sometimes, depending on what the DBA does. :) > > Secondly, as far as making your queries more efficient, the only way > that you can really do that is to determine actually how long the > queries are taking. This is relative to the size of the database in > rows, and of course, the data that you're querying against, whether a > table scan is necessary, and all of that. This is the process of > optimizing queries. For a small table, it can be faster sometimes to > just pull all of the records at once (for example, if they're all within > a single page). However, if you're pulling from a large DB, it will be > faster to use smaller queries against it, using well-placed indexes. > > Also, you may wish to consider using views if you really like using * > with SELECT... Since this way, you can just depreciate a view and start > using a new one if the underlying columns are changed. :) Also, if you > use views, you can optimize the view's query when it comes time to > change it, which mess less messing around in the application code, > especially if it is a frequently used query. > > HTH, > Mike > If your application uses a kind of mapping algorithm that is based on the information schema you can use select * without a risk. I'd say a select * is slightly slower than a column name based select, simply because you transfer more information to the application. The time the DB needs to find the record should be the same since the record has to be found before the resultset is assembled. I.e. my application uses an object-relational mapper and returns an object to the application. The object is assembled using the information schema and the columns asked for - if any are asked for specifically. Therefor my app does a lot of select *'s, but it won't break if I add, change or delete columns. It doesn't even break if I rename a column. If you don't use something like that I'd go with the "better" approach of selecting by column name. The only time when this is considerably slower is while you program the queries (more typing :-) ). Views are the way to go when you know there's going to be changes. Views allow you to define the "API" your application relies on. Rules allow you to make the view writeable, so you can effectively hide the layout of your database which is allways a good idea. UC -- Open Source Solutions 4U, LLC 1618 Kelly St Phone: +1 707 568 3056 Santa Rosa, CA 95401 Cell: +1 650 302 2405 United States Fax: +1 707 568 6416
> The time the DB needs to find the record > should be the same since the record has to be found before the resultset is > assembled. What if the query can be satisfied from an index? I don't know if PostgreSQL has this kind of optimization or not. But in the original example: select id, updated_date from customers limit 100 Sybase, for instance, would not use table pages at all if there were an index covering id & updated_date. -- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 665-7007 voice
Scott Ribe <scott_ribe@killerbytes.com> writes: >> The time the DB needs to find the record >> should be the same since the record has to be found before the resultset is >> assembled. > > What if the query can be satisfied from an index? I don't know if PostgreSQL > has this kind of optimization or not. Nope. Tuple visibility isn't stored in indexes, so it still has to visit the heap to see if a row is visible to your transaction. -Doug