Thread: Cluster table and order information
Hello, From the documentation of CLUSTER table (http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I understand that clustering can be achieved by re-creating the table like this: CREATE TABLE newtable AS SELECT * FROM table ORDER BY columnlist; My question is: If I upload with COPY a datafile which is already correctly sorted, can I inform postgres of such order, so that no clustering is necessary after the import? In other words, how can I tell postgres that my file is order by this and that column? Many thanks! Dario -- Dr. Dario Beraldi Institute of Evolutionary Biology University of Edinburgh West Mains Road Edinburgh EH9 3JT Scotland, UK -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
On 2/1/2011 9:08 AM, Dario Beraldi wrote: > Hello, > > From the documentation of CLUSTER table > (http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I > understand that clustering can be achieved by re-creating the table like > this: > > CREATE TABLE newtable AS > SELECT * FROM table ORDER BY columnlist; > > My question is: If I upload with COPY a datafile which is already > correctly sorted, can I inform postgres of such order, so that no > clustering is necessary after the import? In other words, how can I tell > postgres that my file is order by this and that column? > > Many thanks! > > Dario > The planner has no knowledge of cluster. Meaning PG will query a clustered and unclustered table exactly the same way. A table is not marked or anything as clustered. And in fact, during usage of a table it'll become unclustered. Clustering is only useful when you are going to read multiple records in the same order as an index. It turns "more random seeks" into "more sequential reads". If your COPY loads data in indexed order, then just dont run the cluster. -Andy
On 2/1/2011 10:17 AM, Dario Beraldi wrote: > Quoting Andy Colson <andy@squeakycode.net>: > >> On 2/1/2011 9:08 AM, Dario Beraldi wrote: >>> Hello, >>> >>> From the documentation of CLUSTER table >>> (http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I >>> understand that clustering can be achieved by re-creating the table like >>> this: >>> >>> CREATE TABLE newtable AS >>> SELECT * FROM table ORDER BY columnlist; >>> >>> My question is: If I upload with COPY a datafile which is already >>> correctly sorted, can I inform postgres of such order, so that no >>> clustering is necessary after the import? In other words, how can I tell >>> postgres that my file is order by this and that column? >>> >>> Many thanks! >>> >>> Dario >>> >> >> The planner has no knowledge of cluster. Meaning PG will query a >> clustered and unclustered table exactly the same way. A table is not >> marked or anything as clustered. And in fact, during usage of a table >> it'll become unclustered. >> >> Clustering is only useful when you are going to read multiple records >> in the same order as an index. It turns "more random seeks" into "more >> sequential reads". >> >> If your COPY loads data in indexed order, then just dont run the cluster. >> >> -Andy >> > Thanks very much Andy, this clarifies my doubts. > > I was misled by the docs saying "When a table is clustered, PostgreSQL > remembers which index it was clustered by" which made me think that the > order information is stored somewhere. > > All the best > Dario > The next sentience clears it up: The form "CLUSTER table_name" reclusters the table using the same index as before. -Andy
Quoting Andy Colson <andy@squeakycode.net>: > On 2/1/2011 9:08 AM, Dario Beraldi wrote: >> Hello, >> >> From the documentation of CLUSTER table >> (http://www.postgresql.org/docs/8.4/static/sql-cluster.html) I >> understand that clustering can be achieved by re-creating the table like >> this: >> >> CREATE TABLE newtable AS >> SELECT * FROM table ORDER BY columnlist; >> >> My question is: If I upload with COPY a datafile which is already >> correctly sorted, can I inform postgres of such order, so that no >> clustering is necessary after the import? In other words, how can I tell >> postgres that my file is order by this and that column? >> >> Many thanks! >> >> Dario >> > > The planner has no knowledge of cluster. Meaning PG will query a > clustered and unclustered table exactly the same way. A table is > not marked or anything as clustered. And in fact, during usage of a > table it'll become unclustered. > > Clustering is only useful when you are going to read multiple > records in the same order as an index. It turns "more random > seeks" into "more sequential reads". > > If your COPY loads data in indexed order, then just dont run the cluster. > > -Andy > Thanks very much Andy, this clarifies my doubts. I was misled by the docs saying "When a table is clustered, PostgreSQL remembers which index it was clustered by" which made me think that the order information is stored somewhere. All the best Dario -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.