Thread: statistics about tamp tables ...

statistics about tamp tables ...

From
Hans-Jürgen Schönig
Date:
Recently I have come across a simple issue which made me think about it.
When we create a tmp table (SELECT INTO, CREATE TABLE AS) the planner 
won't know anything about its content after creating it.
Many people use temp tables heavy when the amount of data for a certain 
analysis has to be reduced significantly. Frequently the same tmp table 
is queried quite frequently. In order to speed those scenarios up it can 
be useful to vacuum those tmp tables so that the planner will find more 
clever joins.
Is it possible and does it make sense to generate those statistics on 
the fly (during CREATE TABLE AS)? Maybe we could have a GUC which tells 
the system whether to generate statistics or not.

test=# select * from test; id
----  4  4
(2 rows)

test=# VACUUM test ;
VACUUM

test=# explain select * from test ;                     QUERY PLAN
---------------------------------------------------- Seq Scan on test  (cost=0.00..1.02 rows=2 width=4)
(1 row)



test=# select * into tmp from test;
SELECT
test=# explain select * from tmp;                      QUERY PLAN
------------------------------------------------------- Seq Scan on tmp  (cost=0.00..20.00 rows=1000 width=4)
(1 row)

Best regards,        Hans

-- 
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: statistics about tamp tables ...

From
Tom Lane
Date:
Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> Recently I have come across a simple issue which made me think about it.
> When we create a tmp table (SELECT INTO, CREATE TABLE AS) the planner 
> won't know anything about its content after creating it.

Run ANALYZE on the temp table, if you intend to use it enough to justify
gathering stats about it.  VACUUM is more work than needed.
        regards, tom lane


Re: statistics about tamp tables ...

From
Hans-Jürgen Schönig
Date:
Tom Lane wrote:
> Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> 
>>Recently I have come across a simple issue which made me think about it.
>>When we create a tmp table (SELECT INTO, CREATE TABLE AS) the planner 
>>won't know anything about its content after creating it.
> 
> 
> Run ANALYZE on the temp table, if you intend to use it enough to justify
> gathering stats about it.  VACUUM is more work than needed.
> 
>             regards, tom lane

Of course, VACUUM is on overkill (there is no use to shrink something 
minimal ;) ).
The reason why I came up with this posting is slightly different: Assume 
a JDBC application which works with PostgreSQL + some other database. If 
you want to use both databases without PostgreSQL being unnecessarily 
slow an implicit mechanism would be better. Because otherwise you will 
have an SQL command in there which is off standard - putting a switch 
into the application seems to be a fairly ugly solution.
regards,
    Hans

-- 
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at




Re: statistics about tamp tables ...

From
Alvaro Herrera
Date:
On Wed, Nov 26, 2003 at 05:34:28PM +0100, Hans-Jürgen Schönig wrote:

> The reason why I came up with this posting is slightly different: Assume 
> a JDBC application which works with PostgreSQL + some other database. If 
> you want to use both databases without PostgreSQL being unnecessarily 
> slow an implicit mechanism would be better. Because otherwise you will 
> have an SQL command in there which is off standard - putting a switch 
> into the application seems to be a fairly ugly solution.

That's why you delegate the job to something else, like pg_autovacuum or
cron ...

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)


Re: statistics about tamp tables ...

From
Hans-Jürgen Schönig
Date:
Alvaro Herrera wrote:
> On Wed, Nov 26, 2003 at 05:34:28PM +0100, Hans-Jürgen Schönig wrote:
> 
> 
>>The reason why I came up with this posting is slightly different: Assume 
>>a JDBC application which works with PostgreSQL + some other database. If 
>>you want to use both databases without PostgreSQL being unnecessarily 
>>slow an implicit mechanism would be better. Because otherwise you will 
>>have an SQL command in there which is off standard - putting a switch 
>>into the application seems to be a fairly ugly solution.
> 
> 
> That's why you delegate the job to something else, like pg_autovacuum or
> cron ...
> 


If you are in the middle of a data mining application using a tmp table 
you don't want to wait for cron ;). You might want the statistics to be 
correct as soon as the table has been created.
Regards,    Hans


-- 
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at