performance question

From: "Oliveiros Cristina"
Hi, All

Suppose you have a query like this:
 

SELECT *
FROM t_sitesresumebydate a
JOIN t_triple_association c
  ON a."IDSiteResume" = c."IDResume"
WHERE a."dtDate" BETWEEN '2009-2-1' AND '2009-2-3'
  AND a."IDHitsAccount" = 378284
 
 
With this time interval it completes in less than a second.
If I add one day to the condition, i.e., if it becomes BETWEEN '2009-2-1' AND '2009-2-4', the performance is radically different: it jumps to a staggering 424 seconds, and the number of records returned is just 117 (against 79 with the former condition).
 
Frankly, I cannot understand the reason for this; it seems the planner is choosing radically different plans for the two conditions (both are below).
I have an index on all the fields used in the query.
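The two plans below were obtained with EXPLAIN ANALYZE, along these lines (only the upper BETWEEN bound differs between the two runs):

EXPLAIN ANALYZE
SELECT *
FROM t_sitesresumebydate a
JOIN t_triple_association c
  ON a."IDSiteResume" = c."IDResume"
WHERE a."dtDate" BETWEEN '2009-2-1' AND '2009-2-4'
  AND a."IDHitsAccount" = 378284;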
 
Can anyone help me fix this, please?
 
Thanks in advance for your kind help
 
Best,
Oliveiros
 
 
"Hash Join  (cost=46644.50..751271.16 rows=117 width=60) (actual time=15821.110..424120.924 rows=247 loops=1)"
"  Hash Cond: (c."IDResume" = a."IDSiteResume")"
"  ->  Seq Scan on t_triple_association c  (cost=0.00..554934.99 rows=29938099 width=32) (actual time=38.253..392251.754 rows=30101626 loops=1)"
"  ->  Hash  (cost=46644.30..46644.30 rows=82 width=28) (actual time=2711.356..2711.356 rows=23 loops=1)"
"        ->  Index Scan using "fki_FGK_SITERESUME_ACCOUNT" on t_sitesresumebydate a  (cost=0.00..46644.30 rows=82 width=28) (actual time=881.146..2711.303 rows=23 loops=1)"
"              Index Cond: ("IDHitsAccount" = 378284)"
"              Filter: (("dtDate" >= '2009-02-01'::date) AND ("dtDate" <= '2009-02-04'::date))"
"Total runtime: 424121.180 ms"
 
"Nested Loop  (cost=108.43..745296.34 rows=79 width=60) (actual time=44.283..311.942 rows=185 loops=1)"
"  ->  Index Scan using "fki_FGK_SITERESUME_ACCOUNT" on t_sitesresumebydate a  (cost=0.00..46644.30 rows=55 width=28) (actual time=5.825..23.828 rows=17 loops=1)"
"        Index Cond: ("IDHitsAccount" = 378284)"
"        Filter: (("dtDate" >= '2009-02-01'::date) AND ("dtDate" <= '2009-02-03'::date))"
"  ->  Bitmap Heap Scan on t_triple_association c  (cost=108.43..12658.83 rows=3515 width=32) (actual time=16.902..16.910 rows=11 loops=17)"
"        Recheck Cond: (a."IDSiteResume" = c."IDResume")"
"        ->  Bitmap Index Scan on "index"  (cost=0.00..108.43 rows=3515 width=0) (actual time=14.466..14.466 rows=11 loops=17)"
"              Index Cond: (a."IDSiteResume" = c."IDResume")"
"Total runtime: 312.192 ms"

Re: performance question

From: Tom Lane
"Oliveiros Cristina" <oliveiros.cristina@marktest.pt> writes:
> Frankly, I cannot understand the reason for this; it seems the planner is choosing radically different plans for the two conditions (both are below).
 

Yup, and you seem to be near the crossover point where it thinks they
have equal cost.  You need to be fixing the inaccurate cost estimates.
The most obvious problem is the bad rowcount estimate here:

> "        ->  Bitmap Index Scan on "index"  (cost=0.00..108.43 rows=3515 width=0) (actual time=14.466..14.466 rows=11
loops=17)"
> "              Index Cond: (a."IDSiteResume" = c."IDResume")"

Perhaps increasing the statistics targets for one or both tables would
help on that.
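For example (an illustrative sketch, not a prescription: the 500 target is arbitrary, and the join-key columns are taken from the plans above):

ALTER TABLE t_triple_association
    ALTER COLUMN "IDResume" SET STATISTICS 500;
ALTER TABLE t_sitesresumebydate
    ALTER COLUMN "IDSiteResume" SET STATISTICS 500;
ANALYZE t_triple_association;
ANALYZE t_sitesresumebydate;

Then re-run EXPLAIN ANALYZE and see whether the bitmap scan's estimate moves closer to the 11 rows it actually returns.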

Another odd thing is that essentially identical indexscans are taking
radically different times:

> "        ->  Index Scan using "fki_FGK_SITERESUME_ACCOUNT" on t_sitesresumebydate a  (cost=0.00..46644.30 rows=82
width=28)(actual time=881.146..2711.303 rows=23 loops=1)"
 
> "              Index Cond: ("IDHitsAccount" = 378284)"
> "              Filter: (("dtDate" >= '2009-02-01'::date) AND ("dtDate" <= '2009-02-04'::date))"

> "  ->  Index Scan using "fki_FGK_SITERESUME_ACCOUNT" on t_sitesresumebydate a  (cost=0.00..46644.30 rows=55 width=28)
(actualtime=5.825..23.828 rows=17 loops=1)"
 
> "        Index Cond: ("IDHitsAccount" = 378284)"
> "        Filter: (("dtDate" >= '2009-02-01'::date) AND ("dtDate" <= '2009-02-03'::date))"

I think probably the second one was fast because the data is already
cached, so you're not making an entirely fair comparison.  If your
expectation is that the database is going to be operating under mostly
cached conditions, then you probably ought to adjust the planner cost
parameters to reflect that (look at effective_cache_size, and try
reducing random_page_cost).
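For instance (illustrative values only; the right settings depend on how much RAM is actually available for caching):

-- per session, to experiment before changing postgresql.conf:
SET effective_cache_size = '4GB';  -- roughly the memory available for disk caching
SET random_page_cost = 2.0;        -- default is 4.0; lower favors index scans on cached data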
        regards, tom lane