Thread: Re: Very slow queries - please help

Re: Very slow queries - please help

From

"Bealach-na Bo"

Date:

24 November 2005, 14:14:57

OK.

The consensus seems to be that I need more indexes and I also need to
look into the NOT IN statement as a possible bottleneck. I've
introduced the indexes which has led to a DRAMATIC change in response
time. Now I have to experiment with INNER JOIN -> OUTER JOIN
variations, SET ENABLE_SEQSCAN=OFF.

Forgive me for not mentioning each person individually and by name.
You have all contributed to confirming what I had suspected (and
hoped): that *I* have a lot to learn!

I'm attaching table descriptions, the first few lines of top output
while the queries were running, index lists, sample queries and
EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the
indexes. As I said, DRAMATIC :) I notice that the CPU usage does not
vary very much, it's nearly 100% anyway, but the memory usage drops
markedly, which is another very nice result of the index introduction.

Any more comments and tips would be very welcome.

Thank you all for your input.

Bealach.




blouser@10.0.0.2.dbdev=> \d job_log
                                     Table "blouser.job_log"
     Column     |            Type             |                    Modifiers
----------------+-----------------------------+--------------------------------------------------
job_log_id     | integer                     | not null default
nextval('job_log_id_seq'::text)
first_registry | timestamp without time zone |
blogger_name   | character(50)               |
node_id        | integer                     |
job_type       | character(50)               |
job_name       | character(256)              |
job_start      | timestamp without time zone |
job_timeout    | interval                    |
job_stop       | timestamp without time zone |
nfiles_in_job  | integer                     |
status         | integer                     |
error_code     | smallint                    |
Indexes:
    "job_log_id_pkey" PRIMARY KEY, btree (job_log_id)
Check constraints:
    "job_log_status_check" CHECK (status = 0 OR status = 1 OR status = 8 OR
status = 9)
Foreign-key constraints:
    "legal_node" FOREIGN KEY (node_id) REFERENCES node(node_id)





blouser@10.0.0.2.dbdev=> \d node
                          Table "blouser.node"
Column  |     Type      |                   Modifiers
---------+---------------+-----------------------------------------------
node_id | integer       | not null default nextval('node_id_seq'::text)
name    | character(50) |
type    | character(1)  |
usage   | character(4)  |
Indexes:
    "node_id_pkey" PRIMARY KEY, btree (node_id)
    "node_name_key" UNIQUE, btree (name)
Check constraints:
    "node_type_check" CHECK ("type" = 'B'::bpchar OR "type" = 'K'::bpchar OR
"type" = 'C'::bpchar OR "type" = 'T'::bpchar OR "type" = 'R'::bpchar)
    "node_usage_check" CHECK ("usage" = 'TEST'::bpchar OR "usage" =
'LIVE'::bpchar)


#========================before new indexes were created


Tasks: 114 total,   2 running, 112 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.7% us, 24.5% sy,  0.0% ni, 49.4% id,  0.3% wa,  0.0% hi,  0.0% si
Mem:   1554788k total,  1513576k used,    41212k free,    31968k buffers
Swap:  1020024k total,    27916k used,   992108k free,   708728k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25883 postgres  25   0 20528  12m  11m R 99.7  0.8   4:54.91 postmaster





blouser@10.0.0.2.dbdev=> \di
                   List of relations
Schema  |      Name       | Type  |  Owner  |  Table
---------+-----------------+-------+---------+---------
blouser | job_log_id_pkey | index | blouser | job_log
blouser | node_id_pkey    | index | blouser | node
blouser | node_name_key   | index | blouser | node
(3 rows)


      EXPLAIN ANALYSE SELECT n.name,n.type,
             n.usage, j.status,
             j.job_start,j.job_stop,
             j.nfiles_in_job,j.job_name
      FROM job_log j
      INNER JOIN node n
      ON j.node_id = n.node_id
      WHERE n.name
      LIKE '711%'
      AND n.type = 'K'
      AND n.usage = 'LIVE'
      AND j.job_name = 'COPY FILES'
      AND j.job_start >= '2005-11-14 00:00:00'
      AND (j.job_stop <= '2005-11-14 05:00:00' OR j.job_stop IS NULL)
      ORDER BY n.name;


                                  QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.00..75753.31 rows=1 width=461) (actual
time=270486.692..291662.350 rows=3 loops=1)
   Join Filter: ("inner".node_id = "outer".node_id)
   ->  Index Scan using node_name_key on node n  (cost=0.00..307.75 rows=1
width=181) (actual time=0.135..11.034 rows=208 loops=1)
         Filter: ((name ~~ '711%'::text) AND ("type" = 'K'::bpchar) AND
("usage" = 'LIVE'::bpchar))
   ->  Seq Scan on job_log j  (cost=0.00..75445.54 rows=1 width=288) (actual
time=273.374..1402.089 rows=22 loops=208)
         Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start >=
'2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop <=
'2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL)))
Total runtime: 291662.482 ms
(7 rows)


      EXPLAIN ANALYSE SELECT n.name, n.type, n.usage
      FROM node n
      WHERE n.name
      LIKE '56x%'
      AND n.type  = 'K'
      AND n.usage = 'LIVE'
      AND n.node_id NOT IN
      (SELECT n.node_id
      FROM job_log j
      INNER JOIN node n
      ON j.node_id = n.node_id
      WHERE n.name
      LIKE '711%'
      AND n.type  = 'K'
      AND n.usage = 'LIVE'
      AND j.job_name = 'COPY FILES'
      AND j.job_start >= '2005-11-14 00:00:00'
      AND (j.job_stop <= '2005-11-14 05:00:00' OR j.job_stop IS NULL))
      ORDER BY n.name;



                                      QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
Index Scan using node_name_key on node n  (cost=75451.55..75764.94 rows=1
width=177) (actual time=1394.617..1398.609 rows=205 loops=1)
   Filter: ((name ~~ '56x%'::text) AND ("type" = 'K'::bpchar) AND ("usage" =
'LIVE'::bpchar) AND (NOT (hashed subplan)))
   SubPlan
     ->  Nested Loop  (cost=0.00..75451.54 rows=1 width=4) (actual
time=1206.622..1394.462 rows=3 loops=1)
           ->  Seq Scan on job_log j  (cost=0.00..75445.54 rows=1 width=4)
(actual time=271.361..1393.363 rows=22 loops=1)
                 Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start
 >= '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop <=
'2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL)))
           ->  Index Scan using node_id_pkey on node n  (cost=0.00..5.99
rows=1 width=4) (actual time=0.042..0.042 rows=0 loops=22)
                 Index Cond: ("outer".node_id = n.node_id)
                 Filter: ((name ~~ '711%'::text) AND ("type" = 'K'::bpchar)
AND ("usage" = 'LIVE'::bpchar))
Total runtime: 1398.868 ms
(10 rows)




#===================================after the new indexes were created

Tasks: 114 total,   2 running, 112 sleeping,   0 stopped,   0 zombie
Cpu(s): 22.9% us, 27.2% sy,  0.0% ni, 49.7% id,  0.0% wa,  0.2% hi,  0.0% si
Mem:   1554788k total,  1414632k used,   140156k free,    14784k buffers
Swap:  1020024k total,    28008k used,   992016k free,   623652k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
26409 postgres  25   0 21580 8684 7116 R 99.9  0.6   0:25.38 postmaster



Schema  |        Name        | Type  |  Owner  |  Table
---------+--------------------+-------+---------+---------
blouser | idx_job_log_filter | index | blouser | job_log
blouser | idx_node_filter    | index | blouser | node
blouser | job_log_id_pkey    | index | blouser | job_log
blouser | node_id_pkey       | index | blouser | node
blouser | node_name_key      | index | blouser | node
(5 rows)


      EXPLAIN ANALYSE SELECT n.name,n.type,
             n.usage, j.status,
             j.job_start,j.job_stop,
             j.nfiles_in_job,j.job_name
      FROM job_log j
      INNER JOIN node n
      ON j.node_id = n.node_id
      WHERE n.name
      LIKE '711%'
      AND n.type = 'K'
      AND n.usage = 'LIVE'
      AND j.job_name = 'COPY FILES'
      AND j.job_start >= '2005-11-14 00:00:00'
      AND (j.job_stop <= '2005-11-14 05:00:00' OR j.job_stop IS NULL)
      ORDER BY n.name;



----------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=18049.23..18049.23 rows=1 width=461) (actual
time=223.540..223.543 rows=3 loops=1)
   Sort Key: n.name
   ->  Nested Loop  (cost=0.00..18049.22 rows=1 width=461) (actual
time=201.575..223.470 rows=3 loops=1)
         ->  Index Scan using idx_job_log_filter on job_log j
(cost=0.00..18043.21 rows=1 width=288) (actual time=52.567..222.855 rows=22
loops=1)
               Index Cond: ((job_name = 'COPY FILES'::bpchar) AND (job_start
 >= '2005-11-14 00:00:00'::timestamp without time zone))
               Filter: ((job_stop <= '2005-11-14 05:00:00'::timestamp
without time zone) OR (job_stop IS NULL))
         ->  Index Scan using node_id_pkey on node n  (cost=0.00..5.99
rows=1 width=181) (actual time=0.022..0.022 rows=0 loops=22)
               Index Cond: ("outer".node_id = n.node_id)
               Filter: ((name ~~ '711%'::text) AND ("type" = 'K'::bpchar)
AND ("usage" = 'LIVE'::bpchar))
Total runtime: 223.677 ms
(10 rows)



      EXPLAIN ANALYSE SELECT n.name, n.type, n.usage
      FROM node n
      WHERE n.name
      LIKE '56x%'
      AND n.type  = 'K'
      AND n.usage = 'LIVE'
      AND n.node_id NOT IN
      (SELECT n.node_id
      FROM job_log j
      INNER JOIN node n
      ON j.node_id = n.node_id
      WHERE n.name
      LIKE '711%'
      AND n.type  = 'K'
      AND n.usage = 'LIVE'
      AND j.job_name = 'COPY FILES'
      AND j.job_start >= '2005-11-14 00:00:00'
      AND (j.job_stop <= '2005-11-14 05:00:00' OR j.job_stop IS NULL))
      ORDER BY n.name;



QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=18141.89..18141.89 rows=1 width=177) (actual
time=223.495..223.627 rows=205 loops=1)
   Sort Key: name
   ->  Seq Scan on node n  (cost=18049.22..18141.88 rows=1 width=177)
(actual time=220.293..222.526 rows=205 loops=1)
         Filter: ((name ~~ '56x%'::text) AND ("type" = 'K'::bpchar) AND
("usage" = 'LIVE'::bpchar) AND (NOT (hashed subplan)))
         SubPlan
           ->  Nested Loop  (cost=0.00..18049.22 rows=1 width=4) (actual
time=198.343..220.195 rows=3 loops=1)
                 ->  Index Scan using idx_job_log_filter on job_log j
(cost=0.00..18043.21 rows=1 width=4) (actual time=50.748..219.741 rows=22
loops=1)
                       Index Cond: ((job_name = 'COPY FILES'::bpchar) AND
(job_start >= '2005-11-14 00:00:00'::timestamp without time zone))
                       Filter: ((job_stop <= '2005-11-14
05:00:00'::timestamp without time zone) OR (job_stop IS NULL))
                 ->  Index Scan using node_id_pkey on node n
(cost=0.00..5.99 rows=1 width=4) (actual time=0.015..0.016 rows=0 loops=22)
                       Index Cond: ("outer".node_id = n.node_id)
                       Filter: ((name ~~ '711%'::text) AND ("type" =
'K'::bpchar) AND ("usage" = 'LIVE'::bpchar))
Total runtime: 223.860 ms
(13 rows)

Re: Very slow queries - please help

From

"Thomas F. O'Connell"

Date:

04 December 2005, 02:40:08

On Nov 24, 2005, at 12:14 PM, Bealach-na Bo wrote:

> The consensus seems to be that I need more indexes and I also need to
> look into the NOT IN statement as a possible bottleneck. I've
> introduced the indexes which has led to a DRAMATIC change in response
> time. Now I have to experiment with INNER JOIN -> OUTER JOIN
> variations, SET ENABLE_SEQSCAN=OFF.
>
> Forgive me for not mentioning each person individually and by name.
> You have all contributed to confirming what I had suspected (and
> hoped): that *I* have a lot to learn!
>
> I'm attaching table descriptions, the first few lines of top output
> while the queries were running, index lists, sample queries and
> EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the
> indexes. As I said, DRAMATIC :) I notice that the CPU usage does not
> vary very much, it's nearly 100% anyway, but the memory usage drops
> markedly, which is another very nice result of the index introduction.
>
> Any more comments and tips would be very welcome.

You might find the following resources from techdocs instructive:

http://techdocs.postgresql.org/redir.php?link=/techdocs/
pgsqladventuresep2.php

http://techdocs.postgresql.org/redir.php?link=/techdocs/
pgsqladventuresep3.php

These documents provide some guidance into the process of index
selection. It seems like you could still stand to benefit from more
indexes based on your queries, table definitions, and current indexes.

--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)

Re: Very slow queries - please help

From

"Bealach-na Bo"

Date:

12 December 2005, 12:24:45

Thanks very  much - there are a lot of good articles there... Reading as
fast as I can :)

Best,

Bealach


>From: "Thomas F. O'Connell" <tfo@sitening.com>
>To: Bealach-na Bo <bealach_na_bo@hotmail.com>
>CC: PgSQL - Performance <pgsql-performance@postgresql.org>
>Subject: Re: [PERFORM] Very slow queries - please help
>Date: Sun, 4 Dec 2005 00:40:01 -0600
>
>
>On Nov 24, 2005, at 12:14 PM, Bealach-na Bo wrote:
>
>>The consensus seems to be that I need more indexes and I also need to
>>look into the NOT IN statement as a possible bottleneck. I've
>>introduced the indexes which has led to a DRAMATIC change in response
>>time. Now I have to experiment with INNER JOIN -> OUTER JOIN
>>variations, SET ENABLE_SEQSCAN=OFF.
>>
>>Forgive me for not mentioning each person individually and by name.
>>You have all contributed to confirming what I had suspected (and
>>hoped): that *I* have a lot to learn!
>>
>>I'm attaching table descriptions, the first few lines of top output
>>while the queries were running, index lists, sample queries and
>>EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the
>>indexes. As I said, DRAMATIC :) I notice that the CPU usage does not
>>vary very much, it's nearly 100% anyway, but the memory usage drops
>>markedly, which is another very nice result of the index introduction.
>>
>>Any more comments and tips would be very welcome.
>
>You might find the following resources from techdocs instructive:
>
>http://techdocs.postgresql.org/redir.php?link=/techdocs/
>pgsqladventuresep2.php
>
>http://techdocs.postgresql.org/redir.php?link=/techdocs/
>pgsqladventuresep3.php
>
>These documents provide some guidance into the process of index  selection.
>It seems like you could still stand to benefit from more  indexes based on
>your queries, table definitions, and current indexes.
>
>--
>Thomas F. O'Connell
>Database Architecture and Programming
>Co-Founder
>Sitening, LLC
>
>http://www.sitening.com/
>110 30th Avenue North, Suite 6
>Nashville, TN 37203-6320
>615-260-0005 (cell)
>615-469-5150 (office)
>615-469-5151 (fax)
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: Don't 'kill -9' the postmaster