Thread: Re: (Select *) vs. (Select id) from table.

Re: (Select *) vs. (Select id) from table.

From
Michael Trausch
Date:
Mike wrote:
> Hi,
>
> I am trying to make a website where scalability matters. In my quest to
> make my website more scalable I broke down the following SQL statement:
>
> select * from customers limit 100
>
> to:
>
> select id, updated_date from customers limit 100
>
> Then my application would check it's cache to see if it has those
> records available and will hit the database with consequent:
>
> select  * from customers where id = 4 or id = 9 or id = 19
>
> Am I really speeding things up by breaking down the SQL statements to
> what's necessary? or is it faster to get everything right at once!
>

Well, first, it's never really a good idea to use "SELECT * FROM" in a
production application, against a table.  Tables can (and do) change
from one release to another, and if the layout of the table changes, you
could be looking at having to rewrite your code, especially if it relied
on the order of the columns in the tables.  It's always better to
specify the columns that you're looking for, since existing columns
should (at least in theory on a production DB) remain present, though
their order can change sometimes, depending on what the DBA does.  :)

Secondly, as far as making your queries more efficient, the only way
that you can really do that is to determine actually how long the
queries are taking.  This is relative to the size of the database in
rows, and of course, the data that you're querying against, whether a
table scan is necessary, and all of that.  This is the process of
optimizing queries.  For a small table, it can be faster sometimes to
just pull all of the records at once (for example, if they're all within
a single page).  However, if you're pulling from a large DB, it will be
faster to use smaller queries against it, using well-placed indexes.

Also, you may wish to consider using views if you really like using *
with SELECT... Since this way, you can just depreciate a view and start
using a new one if the underlying columns are changed.  :)  Also, if you
use views, you can optimize the view's query when it comes time to
change it, which mess less messing around in the application code,
especially if it is a frequently used query.

    HTH,
    Mike

Re: (Select *) vs. (Select id) from table.

From
Greg Stark
Date:
Michael Trausch <michael.trausch@comcast.nope.net> writes:

> Well, first, it's never really a good idea to use "SELECT * FROM" in a
> production application, against a table.  Tables can (and do) change
> from one release to another, and if the layout of the table changes, you
> could be looking at having to rewrite your code, especially if it relied
> on the order of the columns in the tables.

Wouldn't it be more logical to say "it's never a good idea to depend on the
order of the columns in the tables" then?

> It's always better to specify the columns that you're looking for, since
> existing columns should (at least in theory on a production DB) remain
> present, though their order can change sometimes, depending on what the DBA
> does. :)

And then when new columns are added (or existing columns removed as you point
out) what would have to happen to your code? Hint, you used the word above...


This particular piece of dogma dates back to the days under Oracle when
"SELECT *" actively broke. If you changed the column you had to recompile the
code in question or else you got strange errors or crashes. The *only* way to
refer to columns was by position.

These days any reasonable driver allows you to use the column names to refer
to columns, and application code often has multiple levels of complexity. The
layer handling the query often has no idea which columns will be used by later
layers. If you hard code the list of columns at each layer then you have to
rewrite lots of code when you add a column or change a datatype.

--
greg

Re: (Select *) vs. (Select id) from table.

From
"Uwe C. Schroeder"
Date:
On Saturday 07 January 2006 13:50, Michael Trausch wrote:
> Mike wrote:
> > Hi,
> >
> > I am trying to make a website where scalability matters. In my quest to
> > make my website more scalable I broke down the following SQL statement:
> >
> > select * from customers limit 100
> >
> > to:
> >
> > select id, updated_date from customers limit 100
> >
> > Then my application would check it's cache to see if it has those
> > records available and will hit the database with consequent:
> >
> > select  * from customers where id = 4 or id = 9 or id = 19
> >
> > Am I really speeding things up by breaking down the SQL statements to
> > what's necessary? or is it faster to get everything right at once!
>
> Well, first, it's never really a good idea to use "SELECT * FROM" in a
> production application, against a table.  Tables can (and do) change
> from one release to another, and if the layout of the table changes, you
> could be looking at having to rewrite your code, especially if it relied
> on the order of the columns in the tables.  It's always better to
> specify the columns that you're looking for, since existing columns
> should (at least in theory on a production DB) remain present, though
> their order can change sometimes, depending on what the DBA does.  :)
>
> Secondly, as far as making your queries more efficient, the only way
> that you can really do that is to determine actually how long the
> queries are taking.  This is relative to the size of the database in
> rows, and of course, the data that you're querying against, whether a
> table scan is necessary, and all of that.  This is the process of
> optimizing queries.  For a small table, it can be faster sometimes to
> just pull all of the records at once (for example, if they're all within
> a single page).  However, if you're pulling from a large DB, it will be
> faster to use smaller queries against it, using well-placed indexes.
>
> Also, you may wish to consider using views if you really like using *
> with SELECT... Since this way, you can just depreciate a view and start
> using a new one if the underlying columns are changed.  :)  Also, if you
> use views, you can optimize the view's query when it comes time to
> change it, which mess less messing around in the application code,
> especially if it is a frequently used query.
>
>     HTH,
>     Mike
>


If your application uses a kind of mapping algorithm that is based on the
information schema you can use select * without a risk. I'd say a select * is
slightly slower than a column name based select, simply because you transfer
more information to the application. The time the DB needs to find the record
should be the same since the record has to be found before the resultset is
assembled.
I.e. my application uses an object-relational mapper and returns an object to
the application. The object is assembled using the information schema and the
columns asked for - if any are asked for specifically. Therefor my app does a
lot of select *'s, but it won't break if I add, change or delete columns. It
doesn't even break if I rename a column.
If you don't use something like that I'd go with the "better" approach of
selecting by column name. The only time when this is considerably slower is
while you program the queries (more typing :-) ).
Views are the way to go when you know there's going to be changes. Views allow
you to define the "API" your application relies on. Rules allow you to make
the view writeable, so you can effectively hide the layout of your database
which is allways a good idea.

    UC

--
Open Source Solutions 4U, LLC    1618 Kelly St
Phone:  +1 707 568 3056        Santa Rosa, CA 95401
Cell:   +1 650 302 2405        United States
Fax:    +1 707 568 6416

Re: (Select *) vs. (Select id) from table.

From
Scott Ribe
Date:
> The time the DB needs to find the record
> should be the same since the record has to be found before the resultset is
> assembled.

What if the query can be satisfied from an index? I don't know if PostgreSQL
has this kind of optimization or not. But in the original example:

select id, updated_date from customers limit 100

Sybase, for instance, would not use table pages at all if there were an
index covering id & updated_date.


--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 665-7007 voice



Re: (Select *) vs. (Select id) from table.

From
Doug McNaught
Date:
Scott Ribe <scott_ribe@killerbytes.com> writes:

>> The time the DB needs to find the record
>> should be the same since the record has to be found before the resultset is
>> assembled.
>
> What if the query can be satisfied from an index? I don't know if PostgreSQL
> has this kind of optimization or not.

Nope.  Tuple visibility isn't stored in indexes, so it still has to
visit the heap to see if a row is visible to your transaction.

-Doug