Thread: Why search term results different query plan?
Hi, Could anybody give me hint, or reccomend a source about this: Please note that two queries are different only in search terms (STAR and A). Plan and performane difference between them confused me. Thanks, Erol trollandtoad2=# explain SELECT DISTINCT p.* FROM product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND (pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and (p.Name LIKE '%STAR%' OR p.description LIKE '%STAR%' OR p.basesku LIKE '%STAR%') order by p.name; NOTICE: QUERY PLAN: Unique (cost=1599.50..1599.58 rows=1 width=98) -> Sort (cost=1599.50..1599.50 rows=2 width=98) -> Nested Loop (cost=0.00..1599.48 rows=2 width=98) -> Seq Scan on product_t p (cost=0.00..613.41 rows=1 width=94) -> Seq Scan on product_detail_t pd (cost=0.00..983.19 rows=231 width=4) [postgres@trollandtoad postgres]$ time psql -c "SELECT DISTINCT p.* FROM product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND (pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and (p.Name LIKE '%STAR%' OR p.description LIKE '%STAR%' OR p.basesku LIKE '%STAR%') order by p.name;" -d trollandtoad2 real 4m24.500s user 0m0.020s sys 0m0.010s ------------------------------------------ EXPLAIN trollandtoad2=# explain SELECT DISTINCT p.* FROM product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND (pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and (p.Name LIKE '%A%' OR p.description LIKE '%A%' OR p.basesku LIKE '%A%') order by p.name; NOTICE: QUERY PLAN: Unique (cost=4456.37..5127.88 rows=1919 width=98) -> Sort (cost=4456.37..4456.37 rows=19186 width=98) -> Merge Join (cost=2389.21..2496.11 rows=19186 width=98) -> Sort (cost=1396.97..1396.97 rows=8321 width=94) -> Seq Scan on product_t p (cost=0.00..613.41 rows=8321 width=94) -> Sort (cost=992.24..992.24 rows=231 width=4) -> Seq Scan on product_detail_t pd (cost=0.00..983.19 rows=231 width=4) EXPLAIN [postgres@trollandtoad postgres]$ time psql -c "SELECT DISTINCT p.* FROM product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND (pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and (p.Name LIKE '%A%' OR p.description LIKE '%A%' OR p.basesku LIKE '%A%') order by p.name;" -d trollandtoad2 real 0m6.284s user 0m0.270s sys 0m0.030s
On Sun, Sep 30, 2001 at 04:18:53AM +0300, Erol ?z wrote: > Hi, > Could anybody give me hint, or reccomend a source about this: > Please note that two queries are different only in search terms (STAR and > A). Plan and performane difference between them confused me. > Thanks, > Erol %STAR%: > Unique (cost=1599.50..1599.58 rows=1 width=98) > -> Sort (cost=1599.50..1599.50 rows=2 width=98) > -> Nested Loop (cost=0.00..1599.48 rows=2 width=98) > -> Seq Scan on product_t p (cost=0.00..613.41 rows=1 width=94) > -> Seq Scan on product_detail_t pd (cost=0.00..983.19 rows=231 width=4) %A%: > Unique (cost=4456.37..5127.88 rows=1919 width=98) > -> Sort (cost=4456.37..4456.37 rows=19186 width=98) > -> Merge Join (cost=2389.21..2496.11 rows=19186 width=98) > -> Sort (cost=1396.97..1396.97 rows=8321 width=94) > -> Seq Scan on product_t p (cost=0.00..613.41 rows=8321 width=94) > -> Sort (cost=992.24..992.24 rows=231 width=4) > -> Seq Scan on product_detail_t pd (cost=0.00..983.19 rows=231 width=4) Note how in the first one it assumes that there is only one match in p whereas in the second there are 8321. Are either of those correct? I don't really see how you can do any real estimates on %X% type queries, although maybe it's assuming longer string => less matches. Not sure how to fix it though... HTH, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Magnetism, electricity and motion are like a three-for-two special offer: > if you have two of them, the third one comes free.
"=?iso-8859-9?B?RXJvbCDWeg==?=" <eroloz@esg.com.tr> writes: > Could anybody give me hint, or reccomend a source about this: > Please note that two queries are different only in search terms (STAR and > A). Plan and performane difference between them confused me. Note the difference in the estimated number of rows matched in product_t: > -> Seq Scan on product_t p (cost=0.00..613.41 rows=1 width=94) > -> Seq Scan on product_t p (cost=0.00..613.41 rows=8321 width=94) The planner thinks --- not unreasonably, IMHO --- that LIKE '%A%' will match many more rows than LIKE '%STAR%'. Accordingly, it prepares different plans for the two cases. Since you're complaining, I suppose that LIKE '%STAR%' matches many more rows than random chance would suggest, and so that plan turns out to be poorly chosen. regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > I don't really see how you can do any real estimates on %X% type queries, > although maybe it's assuming longer string => less matches. More exactly, it's assuming more fixed characters in the pattern -> less matches; see like_selectivity in src/backend/utils/adt/selfuncs.c. While the specific numbers it's using are made from whole cloth, I think the principle should hold good. I don't see any way to accumulate actual statistics that would improve the estimate, do you? regards, tom lane
On Sat, Sep 29, 2001 at 10:37:24PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > I don't really see how you can do any real estimates on %X% type queries, > > although maybe it's assuming longer string => less matches. > > More exactly, it's assuming more fixed characters in the pattern -> > less matches; see like_selectivity in src/backend/utils/adt/selfuncs.c. > While the specific numbers it's using are made from whole cloth, I think > the principle should hold good. > > I don't see any way to accumulate actual statistics that would improve > the estimate, do you? It is difficult, although obviously the results he is getting are bunk. The only problem I can see is that it doesn't appear to be using any of the gathered statistics at all. For example, if %STAR% matched the most common value in the column, the selectivity would still be very low. What I find most interesting about his case is that the query with %A% ran 40 times faster (260s to 6s) than the one where it thought only 1 row in p was going to match. What that tells me is that the plan used for %A% is more appropriate even though %STAR% is more selective. How can you convince the planner of that? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Magnetism, electricity and motion are like a three-for-two special offer: > if you have two of them, the third one comes free.