Re: Planner estimates and cast operations ,... - Mailing list pgsql-hackers

From Hans-Juergen Schoenig
Subject Re: Planner estimates and cast operations ,...
Date
Msg-id 14AF07E7-CEFC-4BC3-96C5-CFC394D7DE87@cybertec.at
Whole thread Raw
In response to Re: Planner estimates and cast operations ,...  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Planner estimates and cast operations ,...  (Bruno Wolff III <bruno@wolff.to>)
List pgsql-hackers
hi tom ...

i thought about creating an index on the expression but the problem  
is that this is hardly feasable.
in 8.0 (what i have here) this would block the table and i would run  
out of disk space as well. this is a 600 gb biest :(

what about the planner approach?
this would solve the problem for some other issues as well. an index  
might not be flexible enough :(.
many thanks,
    hans


On Sep 4, 2006, at 4:57 PM, Tom Lane wrote:

> Hans-Juergen Schoenig <postgres@cybertec.at> writes:
>> consider the following:
>
>>     SELECT some_timestamp::date FROM very_large_table GROUP BY
>> some_timestamp::date
>
>> my very_large_table is around 1billion entries.
>> the problem is: the planner has a problem here as it is taking the
>> (correct) estimates for timestamp. this avoids a HashAggregate
>> because the dataset seems to large for work_mem.
>> what the planner cannot know is that the number of days is quite
>> limited (in my case around 1000 different values).
>> i wonder how to teach the planner to take the cast into  
>> consideration.
>
> Create an index on that expression.
>
> regression=# create table foo(x) as select x * '864 sec'::interval  
> + now()::timestamp from generate_series(1,10000) x;
> SELECT
> regression=# analyze foo;
> ANALYZE
> regression=# explain select x::date from foo group by x::date;
>                           QUERY PLAN
> ---------------------------------------------------------------
>  HashAggregate  (cost=205.00..330.00 rows=10000 width=8)
>    ->  Seq Scan on foo  (cost=0.00..180.00 rows=10000 width=8)
> (2 rows)
>
> regression=# create index fooi on foo((x::date));
> CREATE INDEX
> regression=# analyze foo;
> ANALYZE
> regression=# explain select x::date from foo group by x::date;
>                           QUERY PLAN
> ---------------------------------------------------------------
>  HashAggregate  (cost=205.00..206.26 rows=101 width=8)
>    ->  Seq Scan on foo  (cost=0.00..180.00 rows=10000 width=8)
> (2 rows)
>
> regression=#
>
> I had to cheat a little bit here: I tried to do this example with a
> timestamptz column, and the index creation failed because  
> timestamptz to
> date isn't immutable (it depends on TimeZone).  If yours is too, you
> could perhaps do something involving AT TIME ZONE to generate an
> immutable conversion to date.
>
> It would perhaps make sense to provide a way to cue ANALYZE to compute
> stats on expressions that aren't actually being indexed, but I see no
> good reason to limit our attention to cast expressions.
>
>             regards, tom lane



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Getting a move on for 8.2 beta
Next
From: Tom Lane
Date:
Subject: Re: [PATCHES] Contrib module to examine client