Thread: Query/database optimization

Query/database optimization

From
Eugeny N Dzhurinsky
Date:
Hello, I have a query:

explain analyze
select tu.url_id, tu.url, coalesce(sd.recurse, 100),
       case when COALESCE(get_option('use_banner')::integer, 0) = 0
            then 0 else ts.use_banner end as use_banner,
       ts.use_cookies, ts.use_robots, ts.includes, ts.excludes,
       ts.track_domain, ts.task_id,
       get_available_pages(ts.task_id, ts.customer_id), ts.redirects
from task_url tu
inner join task_scheduler ts on tu.task_id = ts.task_id
inner join (subscription s
            inner join subscription_dic sd on sd.id = s.dict_id)
       on s.customer_id = ts.customer_id
inner join customer c on c.customer_id = ts.customer_id and c.active
where get_available_pages(ts.task_id, ts.customer_id) > 0
  and ((get_option('expired_users')::integer = 0)
       or (isfinite(last_login)
           and extract('day' from current_timestamp - last_login)::integer
               <= coalesce(get_option('expired_users')::integer, 100)))
  and ((s.status is null and ts.customer_id is null) or s.status > 0)
  and (get_check_period(ts.task_id, ts.next_check) is null
       or (unix_timestamp(get_check_period(ts.task_id, ts.next_check))
           - unix_timestamp(timenow()) < 3600))
  and ts.status <> 1
  and ((ts.start_time < current_time and ts.stop_time > current_time)
       or (ts.start_time is null and ts.stop_time is null))
  and tu.url_id = 1
  and ts.customer_id not in
      (select distinct customer_id from task_scheduler where status = 1)
order by ts.next_check is not null,
         unix_timestamp(ts.next_check) - unix_timestamp(timenow())
limit 10;

which produces this query plan:
 Limit  (cost=2874.98..2874.99 rows=2 width=88) (actual time=11800.535..11800.546 rows=3 loops=1)
   ->  Sort  (cost=2874.98..2874.99 rows=2 width=88) (actual time=11800.529..11800.532 rows=3 loops=1)
         Sort Key: (ts.next_check IS NOT NULL), (date_part('epoch'::text, ts.next_check) - date_part('epoch'::text, (timenow())::timestamp without time zone))
         ->  Nested Loop  (cost=4.37..2874.97 rows=2 width=88) (actual time=10249.115..11800.486 rows=3 loops=1)
               ->  Nested Loop  (cost=4.37..2868.87 rows=2 width=55) (actual time=10247.721..11796.303 rows=3 loops=1)
                     Join Filter: ("inner".id = "outer".dict_id)
                     ->  Nested Loop  (cost=2.03..2865.13 rows=2 width=55) (actual time=10247.649..11796.142 rows=3 loops=1)
                           Join Filter: ((("inner".status IS NULL) AND ("outer".customer_id IS NULL)) OR ("inner".status > 0))
                           ->  Nested Loop  (cost=2.03..2858.34 rows=2 width=55) (actual time=10247.583..11795.936 rows=3 loops=1)
                                 ->  Seq Scan on customer c  (cost=0.00..195.71 rows=231 width=4) (actual time=0.082..154.344 rows=4161 loops=1)
                                       Filter: (active AND isfinite(last_login) AND ((date_part('day'::text, (('now'::text)::timestamp(6) with time zone - (last_login)::timestamp with time zone)))::integer <= 150))
                                 ->  Index Scan using task_scheduler_icustomer_id on task_scheduler ts  (cost=2.03..11.51 rows=1 width=51) (actual time=2.785..2.785 rows=0 loops=4161)
                                       Index Cond: ("outer".customer_id = ts.customer_id)
                                       Filter: ((get_available_pages(task_id, customer_id) > 0) AND ((get_check_period(task_id, next_check) IS NULL) OR ((date_part('epoch'::text, get_check_period(task_id, next_check)) - date_part('epoch'::text, (timenow())::timestamp without time zone)) < 3600::double precision)) AND (status <> 1) AND ((((start_time)::time with time zone < ('now'::text)::time(6) with time zone) AND ((stop_time)::time with time zone > ('now'::text)::time(6) with time zone)) OR ((start_time IS NULL) AND (stop_time IS NULL))) AND (NOT (hashed subplan)))
                                       SubPlan
                                         ->  Unique  (cost=2.02..2.03 rows=1 width=4) (actual time=0.617..0.631 rows=3 loops=1)
                                               ->  Sort  (cost=2.02..2.03 rows=1 width=4) (actual time=0.613..0.617 rows=3 loops=1)
                                                     Sort Key: customer_id
                                                     ->  Index Scan using task_scheduler_istatus on task_scheduler  (cost=0.00..2.01 rows=1 width=4) (actual time=0.044..0.580 rows=3 loops=1)
                                                           Index Cond: (status = 1)
                           ->  Index Scan using subscription_icustomer_id on subscription s  (cost=0.00..3.38 rows=1 width=12) (actual time=0.035..0.041 rows=1 loops=3)
                                 Index Cond: ("outer".customer_id = s.customer_id)
                     ->  Materialize  (cost=2.34..2.65 rows=31 width=8) (actual time=0.008..0.027 rows=6 loops=3)
                           ->  Seq Scan on subscription_dic sd  (cost=0.00..2.31 rows=31 width=8) (actual time=0.013..0.034 rows=6 loops=1)
               ->  Index Scan using task_url_storage_task_id on task_url tu  (cost=0.00..3.03 rows=1 width=37) (actual time=0.028..0.045 rows=1 loops=3)
                     Index Cond: (tu.task_id = "outer".task_id)
                     Filter: (url_id = 1)
 Total runtime: 11801.082 ms
(28 rows)


Do I need to optimize the query somehow, or is this related to the database
configuration?

I'm running PostgreSQL 8.0.0 on CentOS release 3.7.

--
Eugene N Dzhurinsky

Re: Query/database optimization

From
Tom Lane
Date:
Eugeny N Dzhurinsky <bofh@redwerk.com> writes:
> [slow query]

The bulk of your time seems to be going into this indexscan:

>                                  ->  Index Scan using task_scheduler_icustomer_id on task_scheduler ts  (cost=2.03..11.51 rows=1 width=51) (actual time=2.785..2.785 rows=0 loops=4161)
>                                        Index Cond: ("outer".customer_id = ts.customer_id)
>                                        Filter: ((get_available_pages(task_id, customer_id) > 0) AND ((get_check_period(task_id, next_check) IS NULL) OR ((date_part('epoch'::text, get_check_period(task_id, next_check)) - date_part('epoch'::text, (timenow())::timestamp without time zone)) < 3600::double precision)) AND (status <> 1) AND ((((start_time)::time with time zone < ('now'::text)::time(6) with time zone) AND ((stop_time)::time with time zone > ('now'::text)::time(6) with time zone)) OR ((start_time IS NULL) AND (stop_time IS NULL))) AND (NOT (hashed subplan)))
>                                        SubPlan
>                                          ->  Unique  (cost=2.02..2.03 rows=1 width=4) (actual time=0.617..0.631 rows=3 loops=1)
> ...

I kinda doubt that the index search itself is that slow --- doubtless
the problem comes from having to evaluate that filter condition on a lot
of rows.  How fast are those functions you're calling?
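One way to check (a sketch; the function and table names are taken from the plan above) is to time the function in isolation over the same table the filter scans, and compare against a plain pass over the rows:

```sql
-- Baseline: how long a bare pass over task_scheduler takes.
EXPLAIN ANALYZE SELECT count(*) FROM task_scheduler;

-- Same pass, but invoking the function once per row.
EXPLAIN ANALYZE SELECT count(get_available_pages(task_id, customer_id))
FROM task_scheduler;
```

The difference between the two runtimes is roughly the total time spent inside the function; dividing by the row count gives the per-call cost to compare against the 2.785 ms per loop shown in the plan.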

            regards, tom lane

Re: Query/database optimization

From
Eugeny N Dzhurinsky
Date:
On Tue, Aug 01, 2006 at 11:15:11PM -0400, Tom Lane wrote:
> Eugeny N Dzhurinsky <bofh@redwerk.com> writes:
> > [slow query]
> The bulk of your time seems to be going into this indexscan:
> >                                  ->  Index Scan using task_scheduler_icustomer_id on task_scheduler ts  (cost=2.03..11.51 rows=1 width=51) (actual time=2.785..2.785 rows=0 loops=4161)
> >                                        Index Cond: ("outer".customer_id = ts.customer_id)
> >                                        Filter: ((get_available_pages(task_id, customer_id) > 0) AND ((get_check_period(task_id, next_check) IS NULL) OR ((date_part('epoch'::text, get_check_period(task_id, next_check)) - date_part('epoch'::text, (timenow())::timestamp without time zone)) < 3600::double precision)) AND (status <> 1) AND ((((start_time)::time with time zone < ('now'::text)::time(6) with time zone) AND ((stop_time)::time with time zone > ('now'::text)::time(6) with time zone)) OR ((start_time IS NULL) AND (stop_time IS NULL))) AND (NOT (hashed subplan)))
> >                                        SubPlan
> >                                          ->  Unique  (cost=2.02..2.03 rows=1 width=4) (actual time=0.617..0.631 rows=3 loops=1)
> > ...
>
> I kinda doubt that the index search itself is that slow --- doubtless
> the problem comes from having to evaluate that filter condition on a lot
> of rows.  How fast are those functions you're calling?

Well, not really fast, especially get_available_pages.

There is a special table with a history of changes, and a view over it that
exposes the latest change per task. The function selects all records from that
view for the given ID, sums the pages of those tasks, and then computes the
number of available pages as the number of allowed pages minus the number of
processed pages.

Could the bottleneck be in this view selection?
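For reference, a function of that shape might look roughly like the sketch below; the view name and columns (latest_task_changes, allowed_pages, processed_pages) are assumptions, not the real schema:

```sql
-- Hypothetical sketch of get_available_pages; all object names are assumed.
CREATE OR REPLACE FUNCTION get_available_pages(p_task_id integer,
                                               p_customer_id integer)
RETURNS integer AS $$
    SELECT coalesce(max(allowed_pages), 0)::integer
         - coalesce(sum(processed_pages), 0)::integer
    FROM latest_task_changes        -- view: latest history row per task
    WHERE task_id = p_task_id
      AND customer_id = p_customer_id;
$$ LANGUAGE sql STABLE;
```

If the view computes "latest change per task" by scanning the whole history table (e.g. via DISTINCT ON or a max() subquery), every call re-scans that history; running EXPLAIN ANALYZE on the view for a single ID would show whether that is where the time goes.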

--
Eugene N Dzhurinsky