Thread: Clustered table order is not preserved on insert

Clustered table order is not preserved on insert

From
"Andrus"
Date:
I have table of reports

CREATE TABLE report (
ReportName CHAR(5) not null check (reportname<>''),
< a lot of other fields >,
id serial primary key
)

I want to duplicate report so that id order is preserved.

BEGIN;
CREATE temp TABLE tempreport AS
      SELECT * FROM report
      WHERE reportname='oldr'
      ORDER BY id;

ALTER TABLE tempreport DROP COLUMN id;
update tempreport set  reportname='newr';
insert into report SELECT * FROM tempreport;
DROP TABLE tempreport;
COMMIT;

SELECT *
FROM  report
WHERE reportname='newr'
ORDER BY id;

Observed:

order of some rows in newr is different than in oldr

Expected:

newr must have exactly the same order since
CREATE temp TABLE tempreport AS  .... ORDER BY id
creates clustered table.

Is this best method to preform this?
Why postgres 8.1.3 changes order ?
How to preserve order in newr without adding extra field to report table ?

Andrus.




Re: Clustered table order is not preserved on insert

From
Douglas McNaught
Date:
"Andrus" <eetasoft@online.ee> writes:

> Why postgres 8.1.3 changes order ?
> How to preserve order in newr without adding extra field to report table ?

You are never guaranteed any order in a result set unless you use
ORDER BY in the query.  Because PG treats UPDATE as DELETE + INSERT,
the table ordering changes all the time.

-Doug

Re: Clustered table order is not preserved on insert

From
Richard Huxton
Date:
Andrus wrote:
> I have table of reports
>
> CREATE TABLE report (
> ReportName CHAR(5) not null check (reportname<>''),
> < a lot of other fields >,
> id serial primary key
> )
>
> I want to duplicate report so that id order is preserved.

Tables aren't ordered by definition. If you want to get results back in
a particular order use ORDER BY, possibly wrapped in a view.

> BEGIN;
> CREATE temp TABLE tempreport AS
>       SELECT * FROM report
>       WHERE reportname='oldr'
>       ORDER BY id;
>
> ALTER TABLE tempreport DROP COLUMN id;
> update tempreport set  reportname='newr';
> insert into report SELECT * FROM tempreport;
> DROP TABLE tempreport;
> COMMIT;
>
> SELECT *
> FROM  report
> WHERE reportname='newr'
> ORDER BY id;
>
> Observed:
>
> order of some rows in newr is different than in oldr

Yes

> Expected:
>
> newr must have exactly the same order since
> CREATE temp TABLE tempreport AS  .... ORDER BY id
> creates clustered table.

And the INSERT INTO ... SELECT didn't ask for any order.

If you really want to do this, then try something like

-- Don't drop the id column
UPDATE tempreport SET ...;
INSERT INTO report SELECT reportname,... FROM tempreport ORDER BY id;
DROP TABLE tempreport;

> Is this best method to preform this?
> Why postgres 8.1.3 changes order ?

There is no order inside a table - you must supply your own.

Although the solution I describe should work it's still not a good idea.
The reason you are having this problem is that you are trying to do two
things with one column. You are using "id" as a unique ID number and
also as a sort order. If you have a separate sort_order this will let
you duplicate reports as you desire and also allow you to re-arrange
reports without changing their IDs.

Can I recommend getting a book or two on relational theory - "An
Introduction to Database Systems" by Date is widely available.
--
   Richard Huxton
   Archonet Ltd

Re: Clustered table order is not preserved on insert

From
"Andrus"
Date:
> You are never guaranteed any order in a result set unless you use
> ORDER BY in the query.

I cannot use order by since postgres must generate new values for id column.
For this case, id column must not exist in insertable table.

> Because PG treats UPDATE as DELETE + INSERT,
> the table ordering changes all the time.

This is excellent explanation! Thank you.

I changed by code so that clustering is performed after UPDATE command:

CREATE temp TABLE tempreport AS
      SELECT * FROM report
      WHERE reportname='oldr';

UPDATE tempreport SET reportname='newr';
CREATE TEMP TABLE t2 AS SELECT * FROM tempreport ORDER BY id;

ALTER TABLE t2 DROP COLUMN id;
insert into report SELECT * FROM t2;



Will DROP COLUMN preserve table clustering ?

Is it reasonable to  expect that clustered table is inserted in pyhical
order ?
Is it OK to use this code ?

Andrus.



Re: Clustered table order is not preserved on insert

From
"Andrus"
Date:
>> I want to duplicate report so that id order is preserved.
>
> Tables aren't ordered by definition.

From CLUSTER docs:

"When a table is clustered, it is physically reordered based on the index
information. "

> If you want to get results back in a particular order use ORDER BY,
> possibly wrapped in a view.

Using wrapper view would be excellent idea!  Thank you.
However, I have some hundred of columns in report table.
It is very tedious to list all those columns in view definition.

How to create a wrapper view so that it returns all columns except id column
without listing all columns in SELECT clause ? How to implement EXCEPT
COLUMNS clause like:

CREATE TEMP VIEW reportwrapper AS
SELECT  * EXCEPT COLUMNS (id, reportname),
   'newr' as reportname
FROM reports
WHERE reportname='oldr'
ORDER BY id

INSERT INTO reports SELECT * FROM reportwrapper;


> If you really want to do this, then try something like
>
> -- Don't drop the id column
> UPDATE tempreport SET ...;
> INSERT INTO report SELECT reportname,... FROM tempreport ORDER BY id;
> DROP TABLE tempreport;

Since id is primary key field, I got duplicate primary key error when it is
not dropped!

How to preserve order ant let postgres to generate primary keys without
adding extra order field?

> Although the solution I describe should work it's still not a good idea.
> The reason you are having this problem is that you are trying to do two
> things with one column. You are using "id" as a unique ID number and also
> as a sort order. If you have a separate sort_order this will let you
> duplicate reports as you desire and also allow you to re-arrange reports
> without changing their IDs.

Thank you. This is good explanation.
However, this ill-designed structure is used in a lot of different sites and
now it suddenly stops working.
Table stucture change requires re-writing parts of code, testing, debugging
and creating conversion routines from previous table version.

So I'll prefer some other solution if possible.

> Can I recommend getting a book or two on relational theory - "An
> Introduction to Database Systems" by Date is widely available.

Is it possible to read this form internet?
I have read Joel  Celkos book "SQL Programming Style".
Among other things Joel wrote  that every table must have only natural
primary keys. No surrogates, no ids.
There was no discussion about this in Joel's  book

Andrus.



Re: Clustered table order is not preserved on insert

From
"Jim Buttafuoco"
Date:
why don't you just (not tested)

insert into report (col1,col2,col3) SELECT col1,col2,col3 FROM t2 order by id

This should get the row into report in id order, you need to put in the correct column names

---------- Original Message -----------
From: "Andrus" <eetasoft@online.ee>
To: pgsql-general@postgresql.org
Sent: Wed, 26 Apr 2006 22:45:49 +0300
Subject: Re: [GENERAL] Clustered table order is not preserved on insert

> > You are never guaranteed any order in a result set unless you use
> > ORDER BY in the query.
>
> I cannot use order by since postgres must generate new values for id column.
> For this case, id column must not exist in insertable table.
>
> > Because PG treats UPDATE as DELETE + INSERT,
> > the table ordering changes all the time.
>
> This is excellent explanation! Thank you.
>
> I changed by code so that clustering is performed after UPDATE command:
>
> CREATE temp TABLE tempreport AS
>       SELECT * FROM report
>       WHERE reportname='oldr';
>
> UPDATE tempreport SET reportname='newr';
> CREATE TEMP TABLE t2 AS SELECT * FROM tempreport ORDER BY id;
>
> ALTER TABLE t2 DROP COLUMN id;
> insert into report SELECT * FROM t2;
>
> Will DROP COLUMN preserve table clustering ?
>
> Is it reasonable to  expect that clustered table is inserted in pyhical
> order ?
> Is it OK to use this code ?
>
> Andrus.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
------- End of Original Message -------


Re: Clustered table order is not preserved on insert

From
Richard Huxton
Date:
Andrus wrote:
>>> I want to duplicate report so that id order is preserved.
>> Tables aren't ordered by definition.
>
>From CLUSTER docs:
>
> "When a table is clustered, it is physically reordered based on the index
> information. "

And what has that got to do with the order of SELECTs or INSERTs?
Physical storage is an optimisation issue not a logical/functional one.

>> If you want to get results back in a particular order use ORDER BY,
>> possibly wrapped in a view.
>
> Using wrapper view would be excellent idea!  Thank you.
> However, I have some hundred of columns in report table.
> It is very tedious to list all those columns in view definition.
>
> How to create a wrapper view so that it returns all columns except id column
> without listing all columns in SELECT clause ? How to implement EXCEPT
> COLUMNS clause like:

Sorry, you can't at present. You only need to do it the once though.

> CREATE TEMP VIEW reportwrapper AS
> SELECT  * EXCEPT COLUMNS (id, reportname),
>    'newr' as reportname
> FROM reports
> WHERE reportname='oldr'
> ORDER BY id
>
> INSERT INTO reports SELECT * FROM reportwrapper;

Well, I wouldn't have a temporary view either - just create a normal
view. I must admit I still can't see *why* you are duplicating all your
reports and what it means to do that.

>> If you really want to do this, then try something like
>>
>> -- Don't drop the id column
>> UPDATE tempreport SET ...;
>> INSERT INTO report SELECT reportname,... FROM tempreport ORDER BY id;
>> DROP TABLE tempreport;
>
> Since id is primary key field, I got duplicate primary key error when it is
> not dropped!
>
> How to preserve order ant let postgres to generate primary keys without
> adding extra order field?
>
>> Although the solution I describe should work it's still not a good idea.
>> The reason you are having this problem is that you are trying to do two
>> things with one column. You are using "id" as a unique ID number and also
>> as a sort order. If you have a separate sort_order this will let you
>> duplicate reports as you desire and also allow you to re-arrange reports
>> without changing their IDs.
>
> Thank you. This is good explanation.
> However, this ill-designed structure is used in a lot of different sites and
> now it suddenly stops working.

It was working by luck before. No RDBMS guarantees ordering without
SELECT. Some might pull data out by ID number by chance, but that's all
it is.

> Table stucture change requires re-writing parts of code, testing, debugging
> and creating conversion routines from previous table version.

Wrap the base-table in a view that conceals the id and sortorder. Access
reports through that view. Your application doesn't need to know
anything has changed.

> So I'll prefer some other solution if possible.
>
>> Can I recommend getting a book or two on relational theory - "An
>> Introduction to Database Systems" by Date is widely available.
>
> Is it possible to read this form internet?

No. In the UK you can order any in-print book at your local library for
£1 or so - not sure about your corner of the world though.

> I have read Joel  Celkos book "SQL Programming Style".
> Among other things Joel wrote  that every table must have only natural
> primary keys. No surrogates, no ids.

Nice to have natural keys, but not always possible. You also need to be
careful not to rely on "promises" from other systems that they are
providing you with good, guaranteed unique keys. Sometimes they can get
it wrong and then you are stuck.

> There was no discussion about this in Joel's  book

You have two concepts in one column here:
1. Each report has a unique code (a number in this case)
2. Each report has a sort ordering.

If you have two reports (1,repA), (2,repB) and want to add a third
between them then with your system you need to change the ID of
(2,repB). Why? The report is the same, so why has it's ID changed?

However, if you have a separate sort_num column then you just need to
update that, perhaps not even that if you leave gaps. For example: (1,
repA, 10), (2, repB, 20) and we add (3, repC, 15).

As I said, rename the base-table handle the sorting with a view and your
application doesn't need to know anything has changed.

--
   Richard Huxton
   Archonet Ltd


Re: Clustered table order is not preserved on insert

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Andrus
> Sent: Wednesday, April 26, 2006 1:05 PM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Clustered table order is not preserved on
insert
>
> >> I want to duplicate report so that id order is preserved.
> >
> > Tables aren't ordered by definition.
>
> From CLUSTER docs:
>
> "When a table is clustered, it is physically reordered based on the
index
> information. "

The next sentence after that is rather important.
From:
http://www.postgresql.org/docs/8.1/interactive/sql-cluster.html
We have this:
"When a table is clustered, it is physically reordered based on the
index information. Clustering is a one-time operation: when the table is
subsequently updated, the changes are not clustered. That is, no attempt
is made to store new or updated rows according to their index order. If
one wishes, one can periodically recluster by issuing the command
again."

This is typical behavior for other database systems as well.

The only way to be totally sure that rows come back in order is to add
an ORDER BY clause to the SQL query.

There is no additional overhead to adding the command.  The data is
clustered already (in which case there are no unexpected page splits) or
it isn't.  If it isn't you need to follow the page split and get the
right data.  If it is clustered, you will just read pages serially all
the way to the end.

I don't know if it is still true, but you used to have to execute
"UPDATE STATISTICS" on a SQL*Server table if you wanted to be sure that
the pages were physically ordered along the unique clustered index
correctly (I guess that it is still true, but I have not checked the
facts).  So it is normal behavior for a database to act in this way.
When you think about it, really there is no way around it.  If you
reorganize the whole table every time there is a page split, the whole
performance objective of clustering would become a disaster instead of a
benefit.

> > If you want to get results back in a particular order use ORDER BY,
> > possibly wrapped in a view.
>
> Using wrapper view would be excellent idea!  Thank you.
> However, I have some hundred of columns in report table.
> It is very tedious to list all those columns in view definition.
>
> How to create a wrapper view so that it returns all columns except id
> column
> without listing all columns in SELECT clause ? How to implement EXCEPT
> COLUMNS clause like:
>
> CREATE TEMP VIEW reportwrapper AS
> SELECT  * EXCEPT COLUMNS (id, reportname),
>    'newr' as reportname
> FROM reports
> WHERE reportname='oldr'
> ORDER BY id
>
> INSERT INTO reports SELECT * FROM reportwrapper;
>
>
> > If you really want to do this, then try something like
> >
> > -- Don't drop the id column
> > UPDATE tempreport SET ...;
> > INSERT INTO report SELECT reportname,... FROM tempreport ORDER BY
id;
> > DROP TABLE tempreport;
>
> Since id is primary key field, I got duplicate primary key error when
it
> is
> not dropped!
>
> How to preserve order ant let postgres to generate primary keys
without
> adding extra order field?
>
> > Although the solution I describe should work it's still not a good
idea.
> > The reason you are having this problem is that you are trying to do
two
> > things with one column. You are using "id" as a unique ID number and
> also
> > as a sort order. If you have a separate sort_order this will let you
> > duplicate reports as you desire and also allow you to re-arrange
reports
> > without changing their IDs.
>
> Thank you. This is good explanation.
> However, this ill-designed structure is used in a lot of different
sites
> and
> now it suddenly stops working.
> Table stucture change requires re-writing parts of code, testing,
> debugging
> and creating conversion routines from previous table version.
>
> So I'll prefer some other solution if possible.
>
> > Can I recommend getting a book or two on relational theory - "An
> > Introduction to Database Systems" by Date is widely available.
>
> Is it possible to read this form internet?
> I have read Joel  Celkos book "SQL Programming Style".
> Among other things Joel wrote  that every table must have only natural
> primary keys. No surrogates, no ids.
> There was no discussion about this in Joel's  book
>
> Andrus.
>
>
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

Re: Clustered table order is not preserved on insert

From
"Jim C. Nasby"
Date:
On Wed, Apr 26, 2006 at 11:05:09PM +0300, Andrus wrote:
> >> I want to duplicate report so that id order is preserved.
> >
> > Tables aren't ordered by definition.
>
> From CLUSTER docs:
>
> "When a table is clustered, it is physically reordered based on the index
> information. "

Keep reading... from the same paragraph:

Clustering is a one-time operation: when the table is subsequently
updated, the changes are not clustered. That is, no attempt is made to
store new or updated rows according to their index order. If one wishes,
one can periodically recluster by issuing the command again.

There is absolutely no way to enforce any kind of table ordering in
PostgreSQL as soon as you do *anything* that changes the table.

And like others said, I think you need to re-think how you're doing
this... :)
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Clustered table order is not preserved on insert

From
ptjm@interlog.com (Patrick TJ McPhee)
Date:
In article <20060427001523.GR97354@pervasive.com>,
Jim C. Nasby <jnasby@pervasive.com> wrote:

% Keep reading... from the same paragraph:
%
% Clustering is a one-time operation: when the table is subsequently
% updated, the changes are not clustered.

But this isn't really relevant to the question. More to the point
is this tidbit from the documentation for SELECT:

  If the ORDER BY clause is specified, the returned rows are
  sorted in the specified order. If ORDER BY is not given, the
  rows are returned in whatever order the system finds fastest
  to produce.

This is not necessarily the order in which they're stored on disk.
--

Patrick TJ McPhee
North York  Canada
ptjm@interlog.com