Re: change natural column order - Mailing list pgsql-general

From Greg Stark
Subject Re: change natural column order
Date
Msg-id 87mzwy7gyl.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: change natural column order  ("Dann Corbit" <DCorbit@connx.com>)
Responses Re: change natural column order  (Russell Smith <mr-russ@pws.com.au>)
List pgsql-general
"Dann Corbit" <DCorbit@connx.com> writes:

> Putting "SELECT * FROM <table_name>" into a compiled program using libpq or
> ESQL is a code defect. Period.
>
>     ALTER TABLE ADD COLUMN /* Most frequent defect maker for SELECT * */

Whatever are you talking about? I've never tried ESQL precisely because it
seems a terribly old-fashioned interface. Does it not support referring to
columns by name? Even in libpq you can use PQfnumber or load all the columns
found with PQfname into an associative array like higher level language
drivers typically do automatically.

>     DROP TABLE/CREATE TABLE /* New version may have the same name and the
> same number of columns, and they may even have the same data types but there is
> no guarantee that the meaning is the same. */

Um, well, there was no guarantee the meaning was the intended meaning in the
first place except for your own interface documentation. Obviously if you
replace the table with a new one you're either maintaining compatible
semantics or else you're changing the interface and it will require some code
changes.

> It is a defect of equal magnitude to assume that columns are returned in any
> particular order unless specified in a column list (again, from a program and
> not interactively).

Actually the spec does guarantee that the columns have a fixed defined
ordering. However I would agree it would be a poor design to depend on that
ordering since it's not self-documenting and requires close synchronization
between far distant pieces of code.

But then I think it's bad to depend on ordering even when it is an explicitly
listed column list because it requires close synchronization between two
pieces of code even if they aren't too far distant. I prefer referring to
columns by name in all but the simplest cases because it means the only
synchronization is the presence or lack of a column, not the precise position
in the output list. I'm free to add columns to a select list in the logical
position without having to make adjustments elsewhere in the code.

> Another typical defect is to assume that columns come backed ordered by the
> primary key if the table is clustered on the primary key column. You can
> have a page split with many database systems and so there is no guarantee
> that data will be returned in order without an ORDER BY clause -- clustered
> or not.

You're confusing columns with rows. Without an ORDER BY clause there's no
guarantee of the order of the rows. Not because of page splits or any other
specific technical reason. There just isn't. There are any number of reasons
the database might choose another ordering to return rows.

In the case of Postgres clustering doesn't really "work" anyways so *any*
sequential table scan without an explicit ORDER BY will be out of order unless
you haven't made any data changes since the clustering.

> Any of (ASSUME NO COLUMN CHANGES/ASSUME COLUMN ORDER/ASSUME CLUSTERED KEY SORT
> ORDER) would cause me to fail code in a code review.
>
> IMO-YMMV

Well with Postgres and reasonably modern drivers none of the above impact at
all on whether "SELECT *" is a good idea or not.

Like I said in my post. Because of historical problems with various other
databases the rule of thumb that "SELECT *" is bad has become entrenched in
standard DBA dogma.

One of those historical problems is that some interfaces depend on column
ordering in result sets and have particular problems dealing when the columns
change in any way. As you point out this can happen for lots of reasons, not
just because "SELECT *" is used. Thankfully nowadays we get to use much more
flexible interfaces that find columns based on column names that don't suffer
from these problems.

--
greg

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Upcoming Changes to News Server ...
Next
From: Jan Wieck
Date:
Subject: Re: [ANNOUNCE] USENET vs Mailing Lists Poll ...