Re: Different plan for very similar queries - Mailing list pgsql-performance

From Peter J. Holzer
Subject Re: Different plan for very similar queries
Date
Msg-id 20150719204144.GA14234@hjp.at
Whole thread Raw
In response to Different plan for very similar queries  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List pgsql-performance
On 2015-05-29 10:55:44 +0200, Peter J. Holzer wrote:
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>         from term t where facttablename='facttable_stat_fta4' and columnname='einheit' and exists (select 1 from
facttable_stat_fta4f where f.einheit=t.term ); 
>                                                                                QUERY PLAN
                                                   
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop Semi Join  (cost=0.00..384860.48 rows=1 width=81) (actual time=0.061..0.119 rows=2 loops=1)
>    ->  Index Scan using term_facttablename_columnname_idx on term t  (cost=0.00..391.46 rows=636 width=81) (actual
time=0.028..0.030rows=3 loops=1) 
>          Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'einheit'::text))
>    ->  Index Scan using facttable_stat_fta4_einheit_idx on facttable_stat_fta4 f  (cost=0.00..384457.80 rows=21788970
width=3)(actual time=0.027..0.027 rows=1 loops=3) 
>          Index Cond: ((einheit)::text = (t.term)::text)
>  Total runtime: 0.173 ms
> (6 rows)
>
> 0.17 ms. Much faster than a plain select distinct over a table with 43
> million rows could ever hope to be.
>
> warenstrom is very similar and the columns with more distinct values
> aren't that bad either.
>
> But for column berechnungsart the result is bad:
>
> wdsah=> explain analyze select facttablename, columnname, term, concept_id, t.hidden, language, register
>         from term t where facttablename='facttable_stat_fta4' and columnname='berechnungsart' and exists (select 1
fromfacttable_stat_fta4 f where f.berechnungsart=t.term ); 
>                                                                                          QUERY PLAN
                                                                      
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Semi Join  (cost=316864.57..319975.79 rows=1 width=81) (actual time=7703.917..30948.271 rows=2 loops=1)
>    Merge Cond: ((t.term)::text = (f.berechnungsart)::text)
>    ->  Index Scan using term_term_idx on term t  (cost=0.00..319880.73 rows=636 width=81) (actual
time=7703.809..7703.938rows=3 loops=1) 
>          Filter: (((facttablename)::text = 'facttable_stat_fta4'::text) AND ((columnname)::text =
'berechnungsart'::text))
>    ->  Index Scan using facttable_stat_fta4_berechnungsart_idx on facttable_stat_fta4 f  (cost=0.00..2545748.85
rows=43577940width=2) (actual time=0.089..16263.582 rows=21336180 loops=1) 
>  Total runtime: 30948.648 ms
> (6 rows)
>
> Over 30 seconds! That's almost 200'000 times slower.

First I'd like to apologize for dropping out of the thread without
providing a test data set. I actually had one prepared (without
confidential data), but I wanted to make sure that I could reproduce the
problem with the test data, and I didn't get around to it for a week or
two and then I went on vacation ...

Anyway, in the meantime you released 9.5alpha (thanks for that, I
probably would have compiled a snapshot sooner or later, but installing
debian packages is just a lot more convenient - I hope you get a lot of
useful feedback) and I installed that this weekend.

I am happy to report that the problem appears to be solved. All the
queries of this type I threw at the database finish in a few
milliseconds now.

    hp


--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/

Attachment

pgsql-performance by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: intel s3500 -- hot stuff
Next
From: Jeison Bedoya Delgado
Date:
Subject: hyperthreadin low performance