Re: planner and having clausule - Mailing list pgsql-performance

From Ľubomír Varga
Subject Re: planner and having clausule
Date
Msg-id 128353624.931378457060911.JavaMail.root@shiva
Whole thread Raw
In response to planner and having clausule  (Ľubomír Varga <varga@plaintext.sk>)
List pgsql-performance
Hi again, my mistake. I have found why there are not this optimization (thus I have found other one, correct, see
bellow).I can have for example: 

stocktaking_id   |   stat_item_start
------------------------------------
abc              |   2013-01-01
abc              |   2013-08-08

And when applied my "optimization", it will return me abc (minimum for abc is 2013-01-01 and it does not conform having
restriction,but I have applied where restriction to date which broke my result...) 

Proper optimization should be:

explain
SELECT stocktaking_id
FROM t_weighting
--proper optimization restriction
WHERE stocktaking_id IN (SELECT DISTINCT stocktaking_id FROM t_weighting WHERE stat_item_start BETWEEN '2013-08-01' AND
'2013-09-01')
GROUP BY stocktaking_id
HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';

with result:
"HashAggregate  (cost=15485.12..15490.08 rows=248 width=32)"
"  Filter: ((min(public.t_weighting.stat_item_start) >= '2013-08-01 00:00:00'::timestamp without time zone) AND
(min(public.t_weighting.stat_item_start)<= '2013-09-01 00:00:00'::timestamp without time zone))" 
"  ->  Nested Loop  (cost=222.05..15441.65 rows=5796 width=32)"
"        ->  HashAggregate  (cost=8.47..8.48 rows=1 width=32)"
"              ->  Subquery Scan "ANY_subquery"  (cost=8.45..8.47 rows=1 width=32)"
"                    ->  HashAggregate  (cost=8.45..8.46 rows=1 width=24)"
"                          ->  Index Scan using idx_t_weighting_stat_item_start on t_weighting  (cost=0.00..8.44 rows=1
width=24)"
"                                Index Cond: ((stat_item_start >= '2013-08-01 00:00:00'::timestamp without time zone)
AND(stat_item_start <= '2013-09-01 00:00:00'::timestamp without time zone))" 
"        ->  Bitmap Heap Scan on t_weighting  (cost=213.58..15360.73 rows=5796 width=32)"
"              Recheck Cond: ((public.t_weighting.stocktaking_id)::text = ("ANY_subquery".stocktaking_id)::text)"
"              ->  Bitmap Index Scan on idx_t_weighting_stocktaking_id_user_id  (cost=0.00..212.13 rows=5796 width=0)"
"                    Index Cond: ((public.t_weighting.stocktaking_id)::text = ("ANY_subquery".stocktaking_id)::text)"


This will be probably a little bit harder to use in planner in general manner.

Best regards,
--
Ing. Ľubomír Varga
+421 (0)908 541 700
varga@plaintext.sk
www.plaintext.sk

----- "Ľubomír Varga" <varga@plaintext.sk> wrote:

> Hi. I have one query which possibly is not optimized by planner (not
> using index for aggregate having clause restriction):
>
> explain
> SELECT stocktaking_id
> FROM t_weighting
> GROUP BY stocktaking_id
> HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';
>
> with result:
> "HashAggregate  (cost=59782.43..59787.39 rows=248 width=32)"
> "  Filter: ((min(stat_item_start) >= '2013-08-01 00:00:00'::timestamp
> without time zone) AND (min(stat_item_start) <= '2013-09-01
> 00:00:00'::timestamp without time zone))"
> "  ->  Seq Scan on t_weighting  (cost=0.00..49002.39 rows=1437339
> width=32)"
>
> I have probably an obvious tough, that query will touch only rows with
> stat_item_start values only within given constrains in having clause.
> If (and only if) planner have some info that MIN and MAX aggregate
> functions could return only one of values that comes into them, it can
> search only rows within given constraints in having part of select.
> Something like this:
>
>
> explain
> SELECT stocktaking_id
> FROM t_weighting
> --added restriction by hand:
> WHERE stat_item_start BETWEEN '2013-08-01' AND '2013-09-01'
> GROUP BY stocktaking_id
> HAVING MIN(stat_item_start) BETWEEN '2013-08-01' AND '2013-09-01';
>
> with result:
> "HashAggregate  (cost=8.45..8.47 rows=1 width=32)"
> "  Filter: ((min(stat_item_start) >= '2013-08-01 00:00:00'::timestamp
> without time zone) AND (min(stat_item_start) <= '2013-09-01
> 00:00:00'::timestamp without time zone))"
> "  ->  Index Scan using idx_t_weighting_stat_item_start on t_weighting
>  (cost=0.00..8.44 rows=1 width=32)"
> "        Index Cond: ((stat_item_start >= '2013-08-01
> 00:00:00'::timestamp without time zone) AND (stat_item_start <=
> '2013-09-01 00:00:00'::timestamp without time zone))"
>
> Is this optimization by planner possible, or it is already have been
> done on newer DB version (I am using PostgreSQL 8.4.13)? IMHO it
> should be added into planner if possible for all built in aggregate
> functions.
>
> Best regards,
> --
> Ing. Ľubomír Varga
> +421 (0)908 541 700
> varga@plaintext.sk
> www.plaintext.sk


pgsql-performance by date:

Previous
From: Ľubomír Varga
Date:
Subject: planner and having clausule
Next
From: Roberto Grandi
Date:
Subject: RESTORE multiple DBs concurrently