Thread: LIKE query verses =

LIKE query verses =

From

Karthikeyan Mahadevan

Date:

29 August 2007, 14:00:12

*************************************************************************************************************************
1)

EXPLAIN ANALYSE SELECT job_category.job_id,job.name,job.state,job.build_id,cat.name as reporting_group
FROM category,job_category,job,category as cat
WHERE job.job_id=job_category.job_id
AND job_category.category_id=category.category_id
AND cat.build_id=category.build_id
AND category.name = 'build_id.pap3260-20070828_01'
AND cat.name like ('reporting_group.Tier2%');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..291.53 rows=8 width=103) (actual time=98.999..385.590 rows=100 loops=1)
-> Nested Loop (cost=0.00..250.12 rows=9 width=34) (actual time=98.854..381.106 rows=100 loops=1)
-> Nested Loop (cost=0.00..123.22 rows=1 width=34) (actual time=98.717..380.185 rows=1 loops=1)
-> Index Scan using idx_cat_by_name on category cat (cost=0.00..5.97 rows=1 width=34) (actual time=95.834..245.276 rows=977 loops=1)
Index Cond: (((name)::text >= 'reporting'::character varying) AND ((name)::text < 'reportinh'::character varying))
Filter: ((name)::text ~~ 'reporting_group.Tier2%'::text)
-> Index Scan using idx_cat_by_bld_id on category (cost=0.00..117.24 rows=1 width=8) (actual time=0.134..0.134 rows=0 loops=977)
Index Cond: ("outer".build_id = category.build_id)
Filter: ((name)::text = 'build_id.pap3260-20070828_01'::text)
-> Index Scan using idx_jcat_by_cat_id on job_category (cost=0.00..126.00 rows=71 width=8) (actual time=0.126..0.569 rows=100 loops=1)
Index Cond: (job_category.category_id = "outer".category_id)
-> Index Scan using job_pkey on job (cost=0.00..4.59 rows=1 width=73) (actual time=0.033..0.036 rows=1 loops=100)
Index Cond: (job.job_id = "outer".job_id)

Total runtime: 385.882 ms
------------------------------------------------------------------------------------------------------------------------------------------------------

but , if I use AND cat.name = 'reporting_group.Tier2' ;

*************************************************************************************************************************
2)

EXPLAIN ANALYSE SELECT job_category.job_id,job.name,job.state,job.build_id,cat.name as reporting_group
FROM category,job_category,job,category as cat
WHERE job.job_id=job_category.job_id
AND job_category.category_id=category.category_id
AND cat.build_id=category.build_id
AND category.name = 'build_id.pap3260-20070828_01'
AND cat.name = 'reporting_group.Tier2' ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=8186.96..26124.40 rows=796 width=103) (actual time=40.584..48.966 rows=100 loops=1)
-> Nested Loop (cost=8186.96..21776.35 rows=945 width=34) (actual time=40.445..41.437 rows=100 loops=1)
-> Merge Join (cost=8186.96..8198.88 rows=107 width=34) (actual time=40.290..40.303 rows=1 loops=1)
Merge Cond: ("outer".build_id = "inner".build_id)
-> Sort (cost=4093.48..4096.19 rows=1085 width=8) (actual time=0.206..0.211 rows=3 loops=1)
Sort Key: category.build_id
-> Index Scan using idx_cat_by_name on category (cost=0.00..4038.78 rows=1085 width=8) (actual time=0.130..0.183 rows=3 loops=1)
Index Cond: ((name)::text = 'build_id.pap3260-20070828_01'::text)
-> Sort (cost=4093.48..4096.19 rows=1085 width=34) (actual time=37.424..38.591 rows=956 loops=1)
Sort Key: cat.build_id
-> Index Scan using idx_cat_by_name on category cat (cost=0.00..4038.78 rows=1085 width=34) (actual time=0.076..34.328 rows=962 loops=1)
Index Cond: ((name)::text = 'reporting_group.Tier2'::text)
-> Index Scan using idx_jcat_by_cat_id on job_category (cost=0.00..126.00 rows=71 width=8) (actual time=0.139..0.743 rows=100 loops=1)
Index Cond: (job_category.category_id = "outer".category_id)
-> Index Scan using job_pkey on job (cost=0.00..4.59 rows=1 width=73) (actual time=0.063..0.066 rows=1 loops=100)
Index Cond: (job.job_id = "outer".job_id)
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Total runtime: 49.453 ms

How to increase the performance of the first query ?

Thank you !
Regards,
Karthi
-------------------------------------------------------------------
Karthikeyan Mahadevan
Java Technology Center
IBM Software Labs ,Bangalore, India.
Phone: +91 80 2504 4000 or 2509 4000 Ext: 2413
Direct : +91 80 25094413
Email : kamahade@in.ibm.com
"Doesn't expecting the unexpected make the unexpected become the expected? "
----------------------------------------------------------------------------

Re: LIKE query verses =

From

Mark Lewis

Date:

29 August 2007, 14:12:44

On Wed, 2007-08-29 at 18:01 +0530, Karthikeyan Mahadevan wrote:
>
>
*************************************************************************************************************************

> 1)
>
> EXPLAIN ANALYSE SELECT
> job_category.job_id,job.name,job.state,job.build_id,cat.name as
> reporting_group
> FROM category,job_category,job,category as cat
> WHERE job.job_id=job_category.job_id
> AND job_category.category_id=category.category_id
> AND cat.build_id=category.build_id
> AND category.name = 'build_id.pap3260-20070828_01'
> AND cat.name like ('reporting_group.Tier2%');
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------

>  Nested Loop  (cost=0.00..291.53 rows=8 width=103) (actual
> time=98.999..385.590 rows=100 loops=1)
>    ->  Nested Loop  (cost=0.00..250.12 rows=9 width=34) (actual
> time=98.854..381.106 rows=100 loops=1)
>          ->  Nested Loop  (cost=0.00..123.22 rows=1 width=34) (actual
> time=98.717..380.185 rows=1 loops=1)
>                ->  Index Scan using idx_cat_by_name on category cat
>  (cost=0.00..5.97 rows=1 width=34) (actual time=95.834..245.276
> rows=977 loops=1)
>                      Index Cond: (((name)::text >=
> 'reporting'::character varying) AND ((name)::text <
> 'reportinh'::character varying))
>                      Filter: ((name)::text ~~
> 'reporting_group.Tier2%'::text)
>                ->  Index Scan using idx_cat_by_bld_id on category
>  (cost=0.00..117.24 rows=1 width=8) (actual time=0.134..0.134 rows=0
> loops=977)
>                      Index Cond: ("outer".build_id =
> category.build_id)
>                      Filter: ((name)::text =
> 'build_id.pap3260-20070828_01'::text)
>          ->  Index Scan using idx_jcat_by_cat_id on job_category
>  (cost=0.00..126.00 rows=71 width=8) (actual time=0.126..0.569
> rows=100 loops=1)
>                Index Cond: (job_category.category_id =
> "outer".category_id)
>    ->  Index Scan using job_pkey on job  (cost=0.00..4.59 rows=1
> width=73) (actual time=0.033..0.036 rows=1 loops=100)
>          Index Cond: (job.job_id = "outer".job_id)
>
>  Total runtime: 385.882 ms
>
------------------------------------------------------------------------------------------------------------------------------------------------------


Remember that using LIKE causes PG to interpret an underscore as 'any
character', which means that it can only scan the index for all records
that start with 'reporting', and then it needs to apply a filter to each
match.  This is going to be slower than just going directly to the
matching index entry.

What you probably want to do is tell PG that you're looking for a
literal underscore and not for any matching character by escaping the
underscore, that will allow it to do a much quicker index scan.
Something like:

cat.name like 'reporting|_group.Tier2%' ESCAPE '|'

-- Mark Lewis

Re: LIKE query verses =

From

Tom Lane

Date:

29 August 2007, 14:52:15

Mark Lewis <mark.lewis@mir3.com> writes:
> What you probably want to do is tell PG that you're looking for a
> literal underscore and not for any matching character by escaping the
> underscore, that will allow it to do a much quicker index scan.

The other half of the problem is that the planner is drastically
misestimating the number of matching rows --- it thinks only one
when there are really about a thousand, and this leads it to use
a nestloop that will be very inefficient with so many rows.
Try increasing the statistics target for that column.  Also, if
this is a pre-8.2 PG release, consider upgrading; I believe we
improved the LIKE estimator in 8.2.

            regards, tom lane