Thread: Why search term results different query plan?

Why search term results different query plan?

From
"Erol Öz"
Date:
Hi,
Could anybody give me hint, or reccomend a source about this:
Please note that two queries are different only in search terms (STAR and
A). Plan and performane difference between them confused me.
Thanks,
Erol

trollandtoad2=# explain SELECT DISTINCT p.* FROM product_t p,
product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%STAR%' OR p.description LIKE '%STAR%' OR p.basesku LIKE
'%STAR%') order by p.name;
NOTICE:  QUERY PLAN:

Unique  (cost=1599.50..1599.58 rows=1 width=98)
  ->  Sort  (cost=1599.50..1599.50 rows=2 width=98)
        ->  Nested Loop  (cost=0.00..1599.48 rows=2 width=98)
              ->  Seq Scan on product_t p  (cost=0.00..613.41 rows=1
width=94)
              ->  Seq Scan on product_detail_t pd  (cost=0.00..983.19
rows=231 width=4)


[postgres@trollandtoad postgres]$ time psql -c "SELECT DISTINCT p.* FROM
product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%STAR%' OR p.description LIKE '%STAR%' OR p.basesku LIKE
'%STAR%') order by p.name;" -d trollandtoad2
real    4m24.500s
user    0m0.020s
sys     0m0.010s

------------------------------------------
EXPLAIN
trollandtoad2=# explain SELECT DISTINCT p.* FROM product_t p,
product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%A%' OR p.description LIKE '%A%' OR p.basesku LIKE '%A%')
order by p.name;
NOTICE:  QUERY PLAN:

Unique  (cost=4456.37..5127.88 rows=1919 width=98)
  ->  Sort  (cost=4456.37..4456.37 rows=19186 width=98)
        ->  Merge Join  (cost=2389.21..2496.11 rows=19186 width=98)
              ->  Sort  (cost=1396.97..1396.97 rows=8321 width=94)
                    ->  Seq Scan on product_t p  (cost=0.00..613.41
rows=8321 width=94)
              ->  Sort  (cost=992.24..992.24 rows=231 width=4)
                    ->  Seq Scan on product_detail_t pd  (cost=0.00..983.19
rows=231 width=4)

EXPLAIN

[postgres@trollandtoad postgres]$ time psql -c "SELECT DISTINCT p.* FROM
product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%A%' OR p.description LIKE '%A%' OR p.basesku LIKE '%A%')
order by p.name;" -d trollandtoad2
real    0m6.284s
user    0m0.270s
sys     0m0.030s





Re: Why search term results different query plan?

From
Martijn van Oosterhout
Date:
On Sun, Sep 30, 2001 at 04:18:53AM +0300, Erol ?z wrote:
> Hi,
> Could anybody give me hint, or reccomend a source about this:
> Please note that two queries are different only in search terms (STAR and
> A). Plan and performane difference between them confused me.
> Thanks,
> Erol

%STAR%:
> Unique  (cost=1599.50..1599.58 rows=1 width=98)
>   ->  Sort  (cost=1599.50..1599.50 rows=2 width=98)
>         ->  Nested Loop  (cost=0.00..1599.48 rows=2 width=98)
>               ->  Seq Scan on product_t p  (cost=0.00..613.41 rows=1 width=94)
>               ->  Seq Scan on product_detail_t pd  (cost=0.00..983.19 rows=231 width=4)

%A%:
> Unique  (cost=4456.37..5127.88 rows=1919 width=98)
>   ->  Sort  (cost=4456.37..4456.37 rows=19186 width=98)
>         ->  Merge Join  (cost=2389.21..2496.11 rows=19186 width=98)
>               ->  Sort  (cost=1396.97..1396.97 rows=8321 width=94)
>                     ->  Seq Scan on product_t p  (cost=0.00..613.41 rows=8321 width=94)
>               ->  Sort  (cost=992.24..992.24 rows=231 width=4)
>                     ->  Seq Scan on product_detail_t pd  (cost=0.00..983.19 rows=231 width=4)

Note how in the first one it assumes that there is only one match in p
whereas in the second there are 8321. Are either of those correct?

I don't really see how you can do any real estimates on %X% type queries,
although maybe it's assuming longer string => less matches.

Not sure how to fix it though...

HTH,
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.

Re: Why search term results different query plan?

From
Tom Lane
Date:
"=?iso-8859-9?B?RXJvbCDWeg==?=" <eroloz@esg.com.tr> writes:
> Could anybody give me hint, or reccomend a source about this:
> Please note that two queries are different only in search terms (STAR and
> A). Plan and performane difference between them confused me.

Note the difference in the estimated number of rows matched in
product_t:

>          ->  Seq Scan on product_t p  (cost=0.00..613.41 rows=1 width=94)

>          ->  Seq Scan on product_t p  (cost=0.00..613.41 rows=8321 width=94)

The planner thinks --- not unreasonably, IMHO --- that LIKE '%A%' will
match many more rows than LIKE '%STAR%'.  Accordingly, it prepares
different plans for the two cases.

Since you're complaining, I suppose that LIKE '%STAR%' matches many more
rows than random chance would suggest, and so that plan turns out to be
poorly chosen.

            regards, tom lane

Re: Why search term results different query plan?

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> I don't really see how you can do any real estimates on %X% type queries,
> although maybe it's assuming longer string => less matches.

More exactly, it's assuming more fixed characters in the pattern ->
less matches; see like_selectivity in src/backend/utils/adt/selfuncs.c.
While the specific numbers it's using are made from whole cloth, I think
the principle should hold good.

I don't see any way to accumulate actual statistics that would improve
the estimate, do you?

            regards, tom lane

Re: Why search term results different query plan?

From
Martijn van Oosterhout
Date:
On Sat, Sep 29, 2001 at 10:37:24PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > I don't really see how you can do any real estimates on %X% type queries,
> > although maybe it's assuming longer string => less matches.
>
> More exactly, it's assuming more fixed characters in the pattern ->
> less matches; see like_selectivity in src/backend/utils/adt/selfuncs.c.
> While the specific numbers it's using are made from whole cloth, I think
> the principle should hold good.
>
> I don't see any way to accumulate actual statistics that would improve
> the estimate, do you?

It is difficult, although obviously the results he is getting are bunk. The
only problem I can see is that it doesn't appear to be using any of the
gathered statistics at all. For example, if %STAR% matched the most common
value in the column, the selectivity would still be very low.

What I find most interesting about his case is that the query with %A% ran
40 times faster (260s to 6s) than the one where it thought only 1 row in p
was going to match. What that tells me is that the plan used for %A% is more
appropriate even though %STAR% is more selective.

How can you convince the planner of that?

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.