Thread: statistics about tamp tables ...
Recently I have come across a simple issue which made me think about it. When we create a tmp table (SELECT INTO, CREATE TABLE AS) the planner won't know anything about its content after creating it. Many people use temp tables heavy when the amount of data for a certain analysis has to be reduced significantly. Frequently the same tmp table is queried quite frequently. In order to speed those scenarios up it can be useful to vacuum those tmp tables so that the planner will find more clever joins. Is it possible and does it make sense to generate those statistics on the fly (during CREATE TABLE AS)? Maybe we could have a GUC which tells the system whether to generate statistics or not. test=# select * from test; id ---- 4 4 (2 rows) test=# VACUUM test ; VACUUM test=# explain select * from test ; QUERY PLAN ---------------------------------------------------- Seq Scan on test (cost=0.00..1.02 rows=2 width=4) (1 row) test=# select * into tmp from test; SELECT test=# explain select * from tmp; QUERY PLAN ------------------------------------------------------- Seq Scan on tmp (cost=0.00..20.00 rows=1000 width=4) (1 row) Best regards, Hans -- Cybertec Geschwinde u Schoenig Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria Tel: +43/2952/30706 or +43/660/816 40 77 www.cybertec.at, www.postgresql.at, kernel.cybertec.at
Hans-Jürgen Schönig <postgres@cybertec.at> writes: > Recently I have come across a simple issue which made me think about it. > When we create a tmp table (SELECT INTO, CREATE TABLE AS) the planner > won't know anything about its content after creating it. Run ANALYZE on the temp table, if you intend to use it enough to justify gathering stats about it. VACUUM is more work than needed. regards, tom lane
Tom Lane wrote: > Hans-Jürgen Schönig <postgres@cybertec.at> writes: > >>Recently I have come across a simple issue which made me think about it. >>When we create a tmp table (SELECT INTO, CREATE TABLE AS) the planner >>won't know anything about its content after creating it. > > > Run ANALYZE on the temp table, if you intend to use it enough to justify > gathering stats about it. VACUUM is more work than needed. > > regards, tom lane Of course, VACUUM is on overkill (there is no use to shrink something minimal ;) ). The reason why I came up with this posting is slightly different: Assume a JDBC application which works with PostgreSQL + some other database. If you want to use both databases without PostgreSQL being unnecessarily slow an implicit mechanism would be better. Because otherwise you will have an SQL command in there which is off standard - putting a switch into the application seems to be a fairly ugly solution. regards, Hans -- Cybertec Geschwinde u Schoenig Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria Tel: +43/2952/30706 or +43/660/816 40 77 www.cybertec.at, www.postgresql.at, kernel.cybertec.at
On Wed, Nov 26, 2003 at 05:34:28PM +0100, Hans-Jürgen Schönig wrote: > The reason why I came up with this posting is slightly different: Assume > a JDBC application which works with PostgreSQL + some other database. If > you want to use both databases without PostgreSQL being unnecessarily > slow an implicit mechanism would be better. Because otherwise you will > have an SQL command in there which is off standard - putting a switch > into the application seems to be a fairly ugly solution. That's why you delegate the job to something else, like pg_autovacuum or cron ... -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "En las profundidades de nuestro inconsciente hay una obsesiva necesidad de un universo lógico y coherente. Pero el universo real se halla siempre un paso más allá de la lógica" (Irulan)
Alvaro Herrera wrote: > On Wed, Nov 26, 2003 at 05:34:28PM +0100, Hans-Jürgen Schönig wrote: > > >>The reason why I came up with this posting is slightly different: Assume >>a JDBC application which works with PostgreSQL + some other database. If >>you want to use both databases without PostgreSQL being unnecessarily >>slow an implicit mechanism would be better. Because otherwise you will >>have an SQL command in there which is off standard - putting a switch >>into the application seems to be a fairly ugly solution. > > > That's why you delegate the job to something else, like pg_autovacuum or > cron ... > If you are in the middle of a data mining application using a tmp table you don't want to wait for cron ;). You might want the statistics to be correct as soon as the table has been created. Regards, Hans -- Cybertec Geschwinde u Schoenig Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria Tel: +43/2952/30706 or +43/664/233 90 75 www.cybertec.at, www.postgresql.at, kernel.cybertec.at