Thread: This query is still running after 10 hours...

From: Robert Creager
Help?

Normally, this query takes from 5 minutes to 2 hours to run.  On this update, it's been running for more than 10 hours.

Can it be helped?

UPDATE obs_v
SET mag = obs_v.imag + zp.zero_v + cg.color_v * (obs_v.imag - i.imag),
    use = true
FROM color_groups AS cg, zero_pair AS zp, obs_i AS i, files AS f, groups AS g
WHERE  obs_v.star_id = i.star_id
   AND obs_v.file_id = f.file_id
   AND cg.group_id = g.group_id
   AND g.night_id = f.night_id
   AND g.group_id = $group_id
   AND zp.pair_id = f.pair_id

Hash Join  (cost=130079.22..639663.94 rows=1590204 width=63)
  Hash Cond: ("outer".star_id = "inner".star_id)
  ->  Seq Scan on obs_i i  (cost=0.00..213658.19 rows=10391319 width=8)
  ->  Hash  (cost=129094.19..129094.19 rows=77211 width=59)
        ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
              ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
                    Hash Cond: ("outer".pair_id = "inner".pair_id)
                    ->  Seq Scan on zero_pair zp  (cost=0.00..43.32 rows=2532 width=8)
                    ->  Hash  (cost=250.40..250.40 rows=118 width=12)
                          ->  Hash Join  (cost=4.80..250.40 rows=118 width=12)
                                Hash Cond: ("outer".night_id = "inner".night_id)
                                ->  Seq Scan on files f  (cost=0.00..199.28 rows=9028 width=12)
                                ->  Hash  (cost=4.80..4.80 rows=1 width=8)
                                      ->  Nested Loop  (cost=0.00..4.80 rows=1 width=8)
                                            ->  Seq Scan on color_groups cg  (cost=0.00..2.84 rows=1 width=8)
                                                  Filter: (171 = group_id)
                                            ->  Seq Scan on groups g  (cost=0.00..1.95 rows=1 width=8)
                                                  Filter: (group_id = 171)
              ->  Index Scan using obs_v_file_id_index on obs_v  (cost=0.00..1893.23 rows=2317 width=51)
                    Index Cond: (obs_v.file_id = "outer".file_id)

Table definitions:

tassiv=# \d color_groups
                              Table "public.color_groups"
    Column    |  Type   |                           Modifiers
--------------+---------+---------------------------------------------------------------
 group_id     | integer | not null default nextval('"color_groups_group_id_seq"'::text)
 color_u      | real    |
 color_b      | real    |
 color_v      | real    |
 color_r      | real    |
 color_i      | real    |
 max_residual | real    |
Indexes:
    "color_groups_pkey" primary key, btree (group_id)
    "color_group_group_id_index" btree (group_id)

tassiv=# \d zero_pair
   Table "public.zero_pair"
 Column  |  Type   | Modifiers
---------+---------+-----------
 pair_id | integer | not null
 zero_u  | real    | default 0
 zero_b  | real    | default 0
 zero_v  | real    | default 0
 zero_r  | real    | default 0
 zero_i  | real    | default 0
Indexes:
    "zero_pair_pkey" primary key, btree (pair_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (pair_id) REFERENCES pairs(pair_id) ON DELETE CASCADE

tassiv=# \d obs_v
                        Table "public.obs_v"
 Column  |  Type   |                   Modifiers
---------+---------+------------------------------------------------
 x       | real    | not null
 y       | real    | not null
 imag    | real    | not null
 smag    | real    | not null
 loc     | spoint  | not null
 obs_id  | integer | not null default nextval('"obs_id_seq"'::text)
 file_id | integer | not null
 use     | boolean | default false
 solve   | boolean | default false
 star_id | integer |
 mag     | real    |
Indexes:
    "obs_v_file_id_index" btree (file_id)
    "obs_v_loc_index" gist (loc)
    "obs_v_obs_id_index" btree (obs_id)
    "obs_v_star_id_index" btree (star_id)
    "obs_v_use_index" btree (use)
Foreign-key constraints:
    "obs_v_files_constraint" FOREIGN KEY (file_id) REFERENCES files(file_id) ON DELETE CASCADE
    "obs_v_star_id_constraint" FOREIGN KEY (star_id) REFERENCES catalog(star_id) ON DELETE SET NULL
Triggers:
    obs_v_trig BEFORE INSERT OR DELETE OR UPDATE ON obs_v FOR EACH ROW EXECUTE PROCEDURE observations_trigger
()

tassiv=# \d files
                                      Table "public.files"
  Column  |            Type             |                       Modifiers
----------+-----------------------------+-------------------------------------------------------
 file_id  | integer                     | not null default nextval('"files_file_id_seq"'::text)
 night_id | integer                     |
 pair_id  | integer                     |
 name     | character varying           | not null
 date     | timestamp without time zone |
Indexes:
    "files_pkey" primary key, btree (file_id)
    "files_name_key" unique, btree (name)
    "files_id_index" btree (file_id, night_id, pair_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (night_id) REFERENCES nights(night_id) ON UPDATE CASCADE ON DELETE CASCADE
    "$2" FOREIGN KEY (pair_id) REFERENCES pairs(pair_id) ON DELETE CASCADE

tassiv=# \d groups
     Table "public.groups"
  Column  |  Type   | Modifiers
----------+---------+-----------
 group_id | integer | not null
 night_id | integer | not null
Indexes:
    "groups_pkey" primary key, btree (group_id, night_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (group_id) REFERENCES color_groups(group_id) ON DELETE CASCADE
    "$2" FOREIGN KEY (night_id) REFERENCES nights(night_id) ON DELETE CASCADE

Server is a dual AMD2600+ with 2Gb mem:

shared_buffers = 20000          # min 16, at least max_connections*2, 8KB each
sort_mem = 16000                # min 64, size in KB
max_fsm_pages = 100000          # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 5000        # min 100, ~50 bytes each
effective_cache_size = 100000   # typically 8KB each
random_page_cost = 2            # units are one sequential page
default_statistics_target = 500 # range 1-1000

Thanks,
Rob

--
 08:06:34 up 5 days, 10:33,  2 users,  load average: 3.13, 3.29, 3.61
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


Re: This query is still running after 10 hours...

From: Kevin Barnard
What does observations_trigger do?



On Tue, 28 Sep 2004 08:19:57 -0600, Robert Creager
<robert_creager@logicalchaos.org> wrote:
>
> Help?
>
> Normally, this query takes from 5 minutes to 2 hours to run.  On this update, it's been running for more than 10 hours.
>
> Can it be helped?
>
> [snip: full query, plan, table definitions, and config quoted above]

Re: This query is still running after 10 hours...

From: Tom Lane
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> Normally, this query takes from 5 minutes to 2 hours to run.  On this update, it's been running for more than 10 hours.

> ...
>         ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
>               ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
>                     Hash Cond: ("outer".pair_id = "inner".pair_id)
> ...

It chose a nested loop here because it was only expecting 67 rows out of
the next-lower join, and so it thought it would only need 67 repetitions
of the index probe into obs_v_file_id_index.  I'm suspicious that that
estimate was way low and so the nestloop is taking forever.  You might
try "SET enable_nestloop = off" as a crude way of avoiding that trap.
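For anyone following along, the suggested workaround can be applied per-session before re-running the statement.  A minimal sketch (the literal 171 stands in for $group_id, as in the plan the original poster showed):

```sql
-- Disable nested-loop joins for this session only; the planner
-- will fall back to hash or merge joins where it can.
SET enable_nestloop = off;

EXPLAIN
UPDATE obs_v
SET mag = obs_v.imag + zp.zero_v + cg.color_v * (obs_v.imag - i.imag),
    use = true
FROM color_groups AS cg, zero_pair AS zp, obs_i AS i, files AS f, groups AS g
WHERE obs_v.star_id = i.star_id
  AND obs_v.file_id = f.file_id
  AND cg.group_id = g.group_id
  AND g.night_id = f.night_id
  AND g.group_id = 171
  AND zp.pair_id = f.pair_id;

-- Restore the default afterwards (or simply close the session).
SET enable_nestloop = on;
```

The setting only discourages nestloops (by pricing them absurdly high), so the planner can still pick one when no other join method is possible.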

            regards, tom lane

Re: This query is still running after 10 hours...

From: Gaetano Mendola
Robert Creager wrote:
> Help?
>
> Normally, this query takes from 5 minutes to 2 hours to run.  On this update, it's been running for more than 10 hours.
>
> Can it be helped?


When I see this it usually means that the tables are full of
dead rows.  Did you vacuum your DB?  Which version are you
using?

Regards
Gaetano Mendola


Re: This query is still running after 10 hours...

From: Robert Creager
When grilled further on (Tue, 28 Sep 2004 09:28:47 -0500),
Kevin Barnard <kevin.barnard@gmail.com> confessed:

> What does observations_trigger do?
>

The trigger keeps another table (catalog) up to date with the information from the obs_v and obs_i tables.  There are
no direct inserts/updates/deletes on the catalog table, only through the trigger.

--
 19:56:54 up 5 days, 22:23,  2 users,  load average: 2.46, 2.27, 2.15
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


Re: This query is still running after 10 hours...

From: Robert Creager
When grilled further on (Tue, 28 Sep 2004 16:55:13 +0200),
Gaetano Mendola <mendola@bigfoot.com> confessed:

> Robert Creager wrote:
> > Help?
> >
> > Normally, this query takes from 5 minutes to 2 hours to run.  On this
> > update, it's been running for more than 10 hours.
> >
> > Can it be helped?
>
>
> When I see this usually means that tables are full of
> dead rows. Did you vacuum you DB. Which version are you
> using ?
>

Gee, the two questions I realized I forgot to answer going into work ;-)  I run
pg_autovacuum, and it's working.  Even ran a FULL ANALYZE, no help.  The version
is 7.4.1.

Cheers,
Rob

--
 20:22:11 up 5 days, 22:48,  2 users,  load average: 2.16, 2.18, 2.15
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


Re: This query is still running after 10 hours...

From: Kevin Barnard
On Tue, 28 Sep 2004 20:21:40 -0600, Robert Creager
<robert_creager@logicalchaos.org> wrote:
>
> The trigger keeps another table (catalog) up to date with the information from the obs_v and obs_i tables.  There are no direct inserts/updates/deletes on the catalog table, only through the trigger.
>

It's possible that the update to catalog is what is really taking a
long time.  You might wish to try and EXPLAIN that query just to make
sure.  You might also wish to disable the trigger just to rule it out.
Does catalog have any triggers on it?  Does it have any foreign keys?

I've shot myself in the foot on this before which is the only reason I
ask about it.
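On 7.4 there is no ALTER TABLE ... DISABLE TRIGGER, so ruling the trigger out means dropping and recreating it inside a transaction.  A sketch, using the trigger definition shown earlier in the thread (try it on a copy first; DROP TRIGGER takes an exclusive lock on obs_v):

```sql
BEGIN;
-- Temporarily remove the trigger so the UPDATE runs without it.
DROP TRIGGER obs_v_trig ON obs_v;

-- ... run the slow UPDATE (or an EXPLAIN ANALYZE of it) here ...

-- Recreate the trigger exactly as it was defined.
CREATE TRIGGER obs_v_trig BEFORE INSERT OR DELETE OR UPDATE ON obs_v
    FOR EACH ROW EXECUTE PROCEDURE observations_trigger();
COMMIT;
```

If the UPDATE is fast with the trigger gone, the per-row work inside observations_trigger (or something it touches) is the culprit rather than the join plan.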

Re: This query is still running after 10 hours...

From: Robert Creager
When grilled further on (Tue, 28 Sep 2004 11:04:23 -0400),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> > Normally, this query takes from 5 minutes to 2 hours to run.  On this
> > update, it's been running for more than 10 hours.
>
> > ...
> >         ->  Nested Loop  (cost=250.69..129094.19 rows=77211 width=59)
> >               ->  Hash Join  (cost=250.69..307.34 rows=67 width=12)
> >                     Hash Cond: ("outer".pair_id = "inner".pair_id)
> > ...
>
> It chose a nested loop here because it was only expecting 67 rows out of
> the next-lower join, and so it thought it would only need 67 repetitions
> of the index probe into obs_v_file_id_index.  I'm suspicious that that
> estimate was way low and so the nestloop is taking forever.  You might
> try "SET enable_nestloop = off" as a crude way of avoiding that trap.

I tried your suggestion.  It did generate a different plan (below), but the
estimate is still blown, and it still used a nested loop.  The query is currently
running (42 minutes so far).  For the query in question, there are 151 distinct
pair_id's in the pairs table, which equates to 302 entries in the files table
(part of the query), which leads on to 533592 entries in the obs_v table and
533699 entries in the obs_i table.

The groups table has 76 total entries, files 9028, zero_pair 2532, color_groups
147. Only the obs_v and obs_i tables have data of any significant quantities
with 10M rows apiece.  The trigger hitting the catalog table (875499 entries) is
searching for single entries to match (one fire per obs_v/obs_i update) on an
index (took 54ms on the first query of a random id just now).

There is no significant disk activity (read 0), one CPU is pegged, and that
process is consuming 218M Resident memory, 168M Shared (10% available memory
total). All reasonable, except for the fact it doesn't come back...

Hash Join  (cost=100267870.17..100751247.13 rows=1578889 width=63)
  Hash Cond: ("outer".star_id = "inner".star_id)
  ->  Seq Scan on obs_i i  (cost=0.00..213658.19 rows=10391319 width=8)
  ->  Hash  (cost=100266886.39..100266886.39 rows=77113 width=59)
        ->  Hash Join  (cost=100000307.51..100266886.39 rows=77113 width=59)
              Hash Cond: ("outer".file_id = "inner".file_id)
              ->  Seq Scan on obs_v  (cost=0.00..213854.50 rows=10390650 width=51)
              ->  Hash  (cost=100000307.34..100000307.34 rows=67 width=12)
                    ->  Hash Join  (cost=100000250.69..100000307.34 rows=67 width=12)
                          Hash Cond: ("outer".pair_id = "inner".pair_id)
                          ->  Seq Scan on zero_pair zp  (cost=0.00..43.32 rows=2532 width=8)
                          ->  Hash  (cost=100000250.40..100000250.40 rows=118 width=12)
                                ->  Hash Join  (cost=100000004.80..100000250.40 rows=118 width=12)
                                      Hash Cond: ("outer".night_id = "inner".night_id)
                                      ->  Seq Scan on files f  (cost=0.00..199.28 rows=9028 width=12)
                                      ->  Hash  (cost=100000004.80..100000004.80 rows=1 width=8)
                                            ->  Nested Loop  (cost=100000000.00..100000004.80 rows=1 width=8)
                                                  ->  Seq Scan on color_groups cg  (cost=0.00..2.84 rows=1 width=8)
                                                        Filter: (175 = group_id)
                                                  ->  Seq Scan on groups g  (cost=0.00..1.95 rows=1 width=8)
                                                        Filter: (group_id = 175)
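One way to see exactly where the estimates diverge from reality (at the cost of actually executing the join) is EXPLAIN ANALYZE, which prints actual row counts and timings next to the planner's estimates.  A sketch using a SELECT with the same joins, so the UPDATE is not performed and the trigger does not fire:

```sql
-- EXPLAIN ANALYZE runs the query and reports "actual rows" beside
-- each estimate, exposing the node whose 67-row guess is wrong.
EXPLAIN ANALYZE
SELECT count(*)
FROM obs_v, color_groups AS cg, zero_pair AS zp, obs_i AS i,
     files AS f, groups AS g
WHERE obs_v.star_id = i.star_id
  AND obs_v.file_id = f.file_id
  AND cg.group_id = g.group_id
  AND g.night_id = f.night_id
  AND g.group_id = 175
  AND zp.pair_id = f.pair_id;
```

If even this SELECT never returns, the problem is the join itself; if it finishes quickly, suspicion shifts back to the UPDATE path (trigger, dead tuples, index maintenance).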



--
 20:48:23 up 5 days, 23:14,  2 users,  load average: 2.56, 2.91, 2.78
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


Re: This query is still running after 10 hours...

From: Robert Creager
When grilled further on (Tue, 28 Sep 2004 21:41:50 -0500),
Kevin Barnard <kevin.barnard@gmail.com> confessed:

> On Tue, 28 Sep 2004 20:21:40 -0600, Robert Creager
> <robert_creager@logicalchaos.org> wrote:
> >
> > The trigger keeps another table (catalog) up to date with the information
> > from the obs_v and obs_i tables.  There are no direct insert/update/delete's
> > on the catalog table, only through the trigger.
> >
>
> It's possible that the update to catalog is what is really taking a
> long time.  You might wish to try and explain that query just to make
> sure.  You might also wish to disable to trigger just to rule it out.
> Does catalog have any triggers on it?  Does it have any foreign keys?

A select on the catalog is really quick (54ms on a random query - ~1M entries).  The updates use the index.  The
catalog table has no triggers or foreign keys.  The trigger on the obs_? tables manages the catalog table.

tassiv=# \d catalog
                                Table "public.catalog"
      Column      |       Type       |                    Modifiers
------------------+------------------+-------------------------------------------------
 star_id          | integer          | not null default nextval('"star_id_seq"'::text)
 loc_count        | integer          | default 0
 loc              | spoint           | not null
 ra_sum           | double precision | default 0
 ra_sigma         | real             | default 0
 ra_sum_square    | double precision | default 0
 dec_sum          | double precision | default 0
 dec_sigma        | real             | default 0
 dec_sum_square   | double precision | default 0
 mag_u_count      | integer          | default 0
 mag_u            | real             | default 99
 mag_u_sum        | double precision | default 0
 mag_u_sigma      | real             | default 0
 mag_u_sum_square | double precision | default 0
 mag_b_count      | integer          | default 0
 mag_b            | real             | default 99
 mag_b_sum        | double precision | default 0
 mag_b_sigma      | real             | default 0
 mag_b_sum_square | double precision | default 0
 mag_v_count      | integer          | default 0
 mag_v            | real             | default 99
 mag_v_sum        | double precision | default 0
 mag_v_sigma      | real             | default 0
 mag_v_sum_square | double precision | default 0
 mag_r_count      | integer          | default 0
 mag_r            | real             | default 99
 mag_r_sum        | double precision | default 0
 mag_r_sigma      | real             | default 0
 mag_r_sum_square | double precision | default 0
 mag_i_count      | integer          | default 0
 mag_i            | real             | default 99
 mag_i_sum        | double precision | default 0
 mag_i_sigma      | real             | default 0
 mag_i_sum_square | double precision | default 0
Indexes:
    "catalog_pkey" primary key, btree (star_id)
    "catalog_ra_decl_index" gist (loc)
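The actual source of observations_trigger is never posted in this thread.  Purely as a hypothetical illustration of the shape such a function might take given the running-sum columns above (not the author's code), on 7.4 it would look roughly like this:

```sql
-- HYPOTHETICAL sketch only: the real observations_trigger is not
-- shown in the thread.  A trigger maintaining running sums such as
-- catalog.mag_v_count / mag_v_sum might look roughly like this
-- (7.4-era quoted function body; dollar-quoting arrived in 8.0).
CREATE OR REPLACE FUNCTION observations_trigger() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''UPDATE'' AND NEW.star_id IS NOT NULL THEN
        UPDATE catalog
        SET mag_v_count      = mag_v_count + 1,
            mag_v_sum        = mag_v_sum + NEW.mag,
            mag_v_sum_square = mag_v_sum_square + NEW.mag * NEW.mag
        WHERE star_id = NEW.star_id;  -- single row, via catalog_pkey
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;
```

Even a cheap single-row UPDATE like this fires once per modified obs_v row, so on a multi-hundred-thousand-row UPDATE its cost multiplies accordingly.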


--
 21:44:49 up 6 days, 11 min,  2 users,  load average: 2.03, 2.17, 2.39
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


Re: This query is still running after 10 hours...

From: Steven Rosenstein
Hi Robert,

"There is no significant disk activity (read 0), one CPU is pegged, and
that process is consuming 218M Resident memory, 168M Shared (10% available
memory total). All reasonable, except for the fact it doesn't come back..."

Just to let you know, I've observed the identical phenomenon on my RHEL3-WS
server running PostgreSQL V7.3.4: One of the CPU's pegged at 100% (2-way
SMP with hyperthreading, so 4 apparent CPU's), virtually zero disk I/O
activity, high memory usage, etc.  I thought it might be due to a casting
problem in a JOIN's ON clause, but that did not turn out to be the case.  I
*have* recently observed that if I run a vacuum analyze on the entire
database, the amount of time spent in this looping state decreases greatly,
but it has *not* disappeared in all cases.

Next week I hope to be able to run some directed test with stats collection
turned on, to try to see if I can find out what's causing this to occur.
I'll post the results if I find anything significant.

--- Steve
___________________________________________________________________________________

Steven Rosenstein
Senior IT Architect/Specialist | IBM Virtual Server Administration
Voice/FAX: 845-689-2064 | Cell: 646-345-6978 | Tieline: 930-6001
Text Messaging: 6463456978 @ mobile.mycingular.com
Email: srosenst @ us.ibm.com

"Learn from the mistakes of others because you can't live long enough to
make them all yourself." -- Eleanor Roosevelt


