
From: Hans-Juergen Schoenig
Subject: Planner estimates and cast operations ,...
Msg-id: 8B9AC7D9-407E-4B72-90E0-E129AF370DD9@cybertec.at
Responses: Re: Planner estimates and cast operations ,... (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
i am looking at a corner case which might also cause trouble for other people. consider the following:

SELECT some_timestamp::date FROM very_large_table
    GROUP BY some_timestamp::date

my very_large_table has around 1 billion rows.
the problem is: the planner takes the (correct) distinct-value estimate for the timestamp column itself, and that rules out a HashAggregate because the expected number of groups seems too large for work_mem.
what the planner cannot know is that the number of distinct days is quite limited (in my case around 1000 different values).
i wonder how to teach the planner to take the cast into consideration.
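
here is a small sketch of what i mean (toy data and made-up names, not my real schema) showing the gap between the per-column estimate and the real number of groups:

-- toy data: ~10 million rows spread over roughly 1000 days
CREATE TABLE very_large_table AS
    SELECT (now() - (random() * 1000 || ' days')::interval)::timestamp
        AS some_timestamp
    FROM generate_series(1, 10000000);
ANALYZE very_large_table;

-- what the planner knows: the distinct values of the raw timestamp column
SELECT n_distinct FROM pg_stats
    WHERE tablename = 'very_large_table' AND attname = 'some_timestamp';

-- what it cannot know: the distinct values after the cast (about 1000)
SELECT count(DISTINCT some_timestamp::date) FROM very_large_table;

-- the group estimate in the plan is taken from the column, not from the cast
EXPLAIN SELECT some_timestamp::date FROM very_large_table
    GROUP BY some_timestamp::date;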

at the moment the planner uses the per-column statistics - it cannot know that the cast might change the number of distinct values.
how about the following?

Command:     CREATE CAST
Description: define a new cast
Syntax:
CREATE CAST (sourcetype AS targettype)
    [ USING SELECTIVITY number | funcname(argtypes) ]
    WITH FUNCTION funcname (argtypes)
    [ AS ASSIGNMENT | AS IMPLICIT ]
 

if it were possible to assign a constant or some function to the cast, i think we could make the example above work. by default no estimates would change. if somebody is doing some fancy query it would be possible to tweak GROUP BY planning by assigning a cleverly written function or a constant to the cast.
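
to illustrate, for the timestamp to date cast it could look roughly like this (purely hypothetical - neither the USING SELECTIVITY clause nor the estimator function exist, and whether the number means a distinct count or a selectivity fraction would still have to be defined; date(timestamp) is the function behind the existing built-in cast):

-- constant hint attached to the cast (proposed syntax only)
CREATE CAST (timestamp AS date)
    USING SELECTIVITY 1000
    WITH FUNCTION date(timestamp)
    AS IMPLICIT;

-- or hand over an estimator function instead of a constant
-- (my_cast_estimator is a made-up name, its signature is an open question)
CREATE CAST (timestamp AS date)
    USING SELECTIVITY my_cast_estimator(internal)
    WITH FUNCTION date(timestamp)
    AS IMPLICIT;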

a constant would be useful for casts to boolean and the like.

does anybody have an idea that could help solve this issue?
best regards,
    hans


