Thread: planner and having clausule

planner and having clausule

From
Ľubomír Varga
Date:
Hi. I have one query which possibly is not optimized by planner (not using index for aggregate having clause
restriction):

explain
SELECT stocktaking_id
FROM t_weighting
GROUP BY stocktaking_id
HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';

with result:
"HashAggregate  (cost=59782.43..59787.39 rows=248 width=32)"
"  Filter: ((min(stat_item_start) >= '2013-08-01 00:00:00'::timestamp without time zone) AND (min(stat_item_start) <=
'2013-09-0100:00:00'::timestamp without time zone))" 
"  ->  Seq Scan on t_weighting  (cost=0.00..49002.39 rows=1437339 width=32)"

I have probably an obvious tough, that query will touch only rows with stat_item_start values only within given
constrainsin having clause. If (and only if) planner have some info that MIN and MAX aggregate functions could return
onlyone of values that comes into them, it can search only rows within given constraints in having part of select.
Somethinglike this: 


explain
SELECT stocktaking_id
FROM t_weighting
--added restriction by hand:
WHERE stat_item_start BETWEEN '2013-08-01' AND '2013-09-01'
GROUP BY stocktaking_id
HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';

with result:
"HashAggregate  (cost=8.45..8.47 rows=1 width=32)"
"  Filter: ((min(stat_item_start) >= '2013-08-01 00:00:00'::timestamp without time zone) AND (min(stat_item_start) <=
'2013-09-0100:00:00'::timestamp without time zone))" 
"  ->  Index Scan using idx_t_weighting_stat_item_start on t_weighting  (cost=0.00..8.44 rows=1 width=32)"
"        Index Cond: ((stat_item_start >= '2013-08-01 00:00:00'::timestamp without time zone) AND (stat_item_start <=
'2013-09-0100:00:00'::timestamp without time zone))" 

Is this optimization by planner possible, or it is already have been done on newer DB version (I am using PostgreSQL
8.4.13)?IMHO it should be added into planner if possible for all built in aggregate functions. 

Best regards,
--
Ing. Ľubomír Varga
+421 (0)908 541 700
varga@plaintext.sk
www.plaintext.sk


Re: planner and having clausule

From
Ľubomír Varga
Date:
Hi again, my mistake. I have found why there are not this optimization (thus I have found other one, correct, see
bellow).I can have for example: 

stocktaking_id   |   stat_item_start
------------------------------------
abc              |   2013-01-01
abc              |   2013-08-08

And when applied my "optimization", it will return me abc (minimum for abc is 2013-01-01 and it does not conform having
restriction,but I have applied where restriction to date which broke my result...) 

Proper optimization should be:

explain
SELECT stocktaking_id
FROM t_weighting
--proper optimization restriction
WHERE stocktaking_id IN (SELECT DISTINCT stocktaking_id FROM t_weighting WHERE stat_item_start BETWEEN '2013-08-01' AND
'2013-09-01')
GROUP BY stocktaking_id
HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';

with result:
"HashAggregate  (cost=15485.12..15490.08 rows=248 width=32)"
"  Filter: ((min(public.t_weighting.stat_item_start) >= '2013-08-01 00:00:00'::timestamp without time zone) AND
(min(public.t_weighting.stat_item_start)<= '2013-09-01 00:00:00'::timestamp without time zone))" 
"  ->  Nested Loop  (cost=222.05..15441.65 rows=5796 width=32)"
"        ->  HashAggregate  (cost=8.47..8.48 rows=1 width=32)"
"              ->  Subquery Scan "ANY_subquery"  (cost=8.45..8.47 rows=1 width=32)"
"                    ->  HashAggregate  (cost=8.45..8.46 rows=1 width=24)"
"                          ->  Index Scan using idx_t_weighting_stat_item_start on t_weighting  (cost=0.00..8.44 rows=1
width=24)"
"                                Index Cond: ((stat_item_start >= '2013-08-01 00:00:00'::timestamp without time zone)
AND(stat_item_start <= '2013-09-01 00:00:00'::timestamp without time zone))" 
"        ->  Bitmap Heap Scan on t_weighting  (cost=213.58..15360.73 rows=5796 width=32)"
"              Recheck Cond: ((public.t_weighting.stocktaking_id)::text = ("ANY_subquery".stocktaking_id)::text)"
"              ->  Bitmap Index Scan on idx_t_weighting_stocktaking_id_user_id  (cost=0.00..212.13 rows=5796 width=0)"
"                    Index Cond: ((public.t_weighting.stocktaking_id)::text = ("ANY_subquery".stocktaking_id)::text)"


This will be probably a little bit harder to use in planner in general manner.

Best regards,
--
Ing. Ľubomír Varga
+421 (0)908 541 700
varga@plaintext.sk
www.plaintext.sk

----- "Ľubomír Varga" <varga@plaintext.sk> wrote:

> Hi. I have one query which possibly is not optimized by planner (not
> using index for aggregate having clause restriction):
>
> explain
> SELECT stocktaking_id
> FROM t_weighting
> GROUP BY stocktaking_id
> HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';
>
> with result:
> "HashAggregate  (cost=59782.43..59787.39 rows=248 width=32)"
> "  Filter: ((min(stat_item_start) >= '2013-08-01 00:00:00'::timestamp
> without time zone) AND (min(stat_item_start) <= '2013-09-01
> 00:00:00'::timestamp without time zone))"
> "  ->  Seq Scan on t_weighting  (cost=0.00..49002.39 rows=1437339
> width=32)"
>
> I have probably an obvious tough, that query will touch only rows with
> stat_item_start values only within given constrains in having clause.
> If (and only if) planner have some info that MIN and MAX aggregate
> functions could return only one of values that comes into them, it can
> search only rows within given constraints in having part of select.
> Something like this:
>
>
> explain
> SELECT stocktaking_id
> FROM t_weighting
> --added restriction by hand:
> WHERE stat_item_start BETWEEN '2013-08-01' AND '2013-09-01'
> GROUP BY stocktaking_id
> HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';
>
> with result:
> "HashAggregate  (cost=8.45..8.47 rows=1 width=32)"
> "  Filter: ((min(stat_item_start) >= '2013-08-01 00:00:00'::timestamp
> without time zone) AND (min(stat_item_start) <= '2013-09-01
> 00:00:00'::timestamp without time zone))"
> "  ->  Index Scan using idx_t_weighting_stat_item_start on t_weighting
>  (cost=0.00..8.44 rows=1 width=32)"
> "        Index Cond: ((stat_item_start >= '2013-08-01
> 00:00:00'::timestamp without time zone) AND (stat_item_start <=
> '2013-09-01 00:00:00'::timestamp without time zone))"
>
> Is this optimization by planner possible, or it is already have been
> done on newer DB version (I am using PostgreSQL 8.4.13)? IMHO it
> should be added into planner if possible for all built in aggregate
> functions.
>
> Best regards,
> --
> Ing. Ľubomír Varga
> +421 (0)908 541 700
> varga@plaintext.sk
> www.plaintext.sk