Thread: simple join uses indexes, very slow

simple join uses indexes, very slow

From

george young

Date:

27 March 2006, 14:47:53

[PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1]
I have a simple join on two tables that takes way too long.  Can you help
me understand what's wrong?  There are indexes defined on the relevant columns.
I just did a fresh vacuum --full --analyze on the two tables.
Is there something I'm not seeing?
[CPU is 950Mhz AMD, 256MB RAM, 15k rpm scsi disk]
-- George Young

Table sizes: parameters has 2.1512e+07 tuples, run_opsets has 211745 tuples.

explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM run_opsets ro, parameters p WHERE ro.run = p.run AND
ro.opset_num= p.opset_num and ro.run='team9'; 
                                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2.16..7957.40 rows=6707 width=22) (actual time=14.986..70197.129 rows=43050 loops=1)
   ->  Index Scan using run_opsets_pkey on run_opsets ro  (cost=0.00..128.75 rows=71 width=18) (actual
time=0.386..62.959rows=263 loops=1) 
         Index Cond: (run = 'team9'::text)
   ->  Bitmap Heap Scan on parameters p  (cost=2.16..109.93 rows=27 width=22) (actual time=1.591..266.211 rows=164
loops=263)
         Recheck Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
         ->  Bitmap Index Scan on parameters_idx  (cost=0.00..2.16 rows=27 width=0) (actual time=1.153..1.153 rows=164
loops=263)
               Index Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
 Total runtime: 70237.727 ms
(8 rows)

                     Table "public.run_opsets"
    Column    |            Type             |        Modifiers
--------------+-----------------------------+-------------------------
 run          | text                        | not null
 opset        | text                        |
 opset_ver    | integer                     |
 opset_num    | integer                     | not null
 status       | opset_status                |
 date_started | timestamp without time zone |
 date_done    | timestamp without time zone |
 work_started | timestamp without time zone |
 lock_user    | text                        | default 'NO-USER'::text
 lock_pid     | integer                     |
 needs_review | text                        |
Indexes:
    "run_opsets_pkey" PRIMARY KEY, btree (run, opset_num) CLUSTER


--              Table "public.parameters"
  Column   |  Type   |           Modifiers
-----------+---------+-------------------------------
 run       | text    | not null
 opset_num | integer | not null
 opset     | text    | not null
 opset_ver | integer | not null
 step_num  | integer | not null
 step      | text    | not null
 step_ver  | integer | not null
 name      | text    | not null
 value     | text    |
 split     | boolean | not null default false
 wafers    | text[]  | not null default '{}'::text[]
Indexes:
    "parameters_idx" btree (run, opset_num, step_num, opset, opset_ver, step, step_ver, name, split, wafers)
    "parameters_opset_idx" btree (opset, step, name)
    "parameters_step_idx" btree (step, name)


"Are the gods not just?"  "Oh no, child.
What would become of us if they were?" (CSL)

Re: simple join uses indexes, very slow

From

Simon Riggs

Date:

28 March 2006, 04:31:05

On Mon, 2006-03-27 at 13:47 -0500, george young wrote:

> Table sizes: parameters has 2.1512e+07 tuples, run_opsets has 211745 tuples.
>
> explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM run_opsets ro, parameters p WHERE ro.run = p.run AND
ro.opset_num= p.opset_num and ro.run='team9'; 
>                                                                  QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=2.16..7957.40 rows=6707 width=22) (actual time=14.986..70197.129 rows=43050 loops=1)
>    ->  Index Scan using run_opsets_pkey on run_opsets ro  (cost=0.00..128.75 rows=71 width=18) (actual
time=0.386..62.959rows=263 loops=1) 
>          Index Cond: (run = 'team9'::text)
>    ->  Bitmap Heap Scan on parameters p  (cost=2.16..109.93 rows=27 width=22) (actual time=1.591..266.211 rows=164
loops=263)
>          Recheck Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
>          ->  Bitmap Index Scan on parameters_idx  (cost=0.00..2.16 rows=27 width=0) (actual time=1.153..1.153
rows=164loops=263) 
>                Index Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
>  Total runtime: 70237.727 ms
> (8 rows)

The planner appears to be underestimating the number of rows retrieved
in both cases, then multiplying them together to make it worse.
Multi-column indexes provide less accurate estimates (right now).

Looks like a hash join might be faster. What is your work_mem set to?

Can you SET enable_nestloop=off and rerun the EXPLAIN ANALYZE?

Best Regards, Simon Riggs

Re: simple join uses indexes, very slow

From

george young

Date:

28 March 2006, 11:22:31

On Tue, 28 Mar 2006 09:30:54 +0100
Simon Riggs <simon@2ndquadrant.com> threw this fish to the penguins:

> On Mon, 2006-03-27 at 13:47 -0500, george young wrote:
>
> > Table sizes: parameters has 2.1512e+07 tuples, run_opsets has 211745 tuples.
> >
> > explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM run_opsets ro, parameters p WHERE ro.run = p.run AND
ro.opset_num= p.opset_num and ro.run='team9'; 
> >                                                                  QUERY PLAN
> >
--------------------------------------------------------------------------------------------------------------------------------------------
> >  Nested Loop  (cost=2.16..7957.40 rows=6707 width=22) (actual time=14.986..70197.129 rows=43050 loops=1)
> >    ->  Index Scan using run_opsets_pkey on run_opsets ro  (cost=0.00..128.75 rows=71 width=18) (actual
time=0.386..62.959rows=263 loops=1) 
> >          Index Cond: (run = 'team9'::text)
> >    ->  Bitmap Heap Scan on parameters p  (cost=2.16..109.93 rows=27 width=22) (actual time=1.591..266.211 rows=164
loops=263)
> >          Recheck Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
> >          ->  Bitmap Index Scan on parameters_idx  (cost=0.00..2.16 rows=27 width=0) (actual time=1.153..1.153
rows=164loops=263) 
> >                Index Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
> >  Total runtime: 70237.727 ms
> > (8 rows)
>
> The planner appears to be underestimating the number of rows retrieved
> in both cases, then multiplying them together to make it worse.
> Multi-column indexes provide less accurate estimates (right now).
>
> Looks like a hash join might be faster. What is your work_mem set to?
work_mem= 1024


> Can you SET enable_nestloop=off and rerun the EXPLAIN ANALYZE?
newschm3=> set enable_nestloop=off ;
SET
newschm3=> explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM run_opsets ro, parameters p WHERE ro.run =
p.runAND ro.opset_num = p.opset_num and ro.run='team9'; 
                                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=34177.87..34291.36 rows=6707 width=22) (actual time=68421.681..68547.686 rows=43050 loops=1)
   Merge Cond: ("outer".opset_num = "inner".opset_num)
   ->  Sort  (cost=130.93..131.11 rows=71 width=18) (actual time=107.744..107.901 rows=263 loops=1)
         Sort Key: ro.opset_num
         ->  Index Scan using run_opsets_pkey on run_opsets ro  (cost=0.00..128.75 rows=71 width=18) (actual
time=57.641..106.096rows=263 loops=1) 
               Index Cond: (run = 'team9'::text)
   ->  Sort  (cost=34046.94..34070.02 rows=9231 width=22) (actual time=68301.325..68358.087 rows=43050 loops=1)
         Sort Key: p.opset_num
         ->  Bitmap Heap Scan on parameters p  (cost=272.31..33438.97 rows=9231 width=22) (actual
time=526.462..67363.577rows=43050 loops=1) 
               Recheck Cond: ('team9'::text = run)
               ->  Bitmap Index Scan on parameters_idx  (cost=0.00..272.31 rows=9231 width=0) (actual
time=483.500..483.500rows=43050 loops=1) 
                     Index Cond: ('team9'::text = run)
 Total runtime: 68595.868 ms
(13 rows)

-- George Young

--
"Are the gods not just?"  "Oh no, child.
What would become of us if they were?" (CSL)

Re: simple join uses indexes, very slow

From

Markus Schaber

Date:

28 March 2006, 11:26:36

Hi, George,

george young wrote:

>>Looks like a hash join might be faster. What is your work_mem set to?
>
> work_mem= 1024

This is 1 Megabyte. By all means, increase it, if possible.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: simple join uses indexes, very slow

From

"Dave Dutcher"

Date:

28 March 2006, 12:18:37


> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-
> owner@postgresql.org] On Behalf Of george young
> Sent: Monday, March 27, 2006 12:48 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] simple join uses indexes, very slow
>
[Snip]
>
> Indexes:
>     "parameters_idx" btree (run, opset_num, step_num, opset,
opset_ver,
> step, step_ver, name, split, wafers)
>     "parameters_opset_idx" btree (opset, step, name)
>     "parameters_step_idx" btree (step, name)
>


Have you tried creating some different indexes on parameters?  I don't
know if it should matter or not, but I would try some indexes like:

(run, opset_num) //Without all the other columns
(opset_num, run) //Backwards
(opset_num)

I don't really know Postgres internals all that well.  It just seems to
me that parameters_idx has a lot of columns this query is not interested
in.  I'd just be curious to see what happens.

Re: simple join uses indexes, very slow

From

"Steinar H. Gunderson"

Date:

28 March 2006, 12:29:18

On Tue, Mar 28, 2006 at 10:18:25AM -0600, Dave Dutcher wrote:
>>     "parameters_idx" btree (run, opset_num, step_num, opset,
> opset_ver,
>> step, step_ver, name, split, wafers)
>>     "parameters_opset_idx" btree (opset, step, name)
>>     "parameters_step_idx" btree (step, name)
> Have you tried creating some different indexes on parameters?  I don't
> know if it should matter or not, but I would try some indexes like:
>
> (run, opset_num) //Without all the other columns
> (opset_num, run) //Backwards
> (opset_num)

An index on (A,B,C) can be used for a query on (A,B) or (A), so it doesn't
really matter. It isn't usable for a query on (B), (C) or (B,C), though. (The
index rows will get bigger, of course, so you'll need more I/O if you want to
scan large parts of it, but I guess that's beside the point.)

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: simple join uses indexes, very slow

From

"Dave Dutcher"

Date:

28 March 2006, 13:20:27

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-
> owner@postgresql.org] On Behalf Of Steinar H. Gunderson
> Sent: Tuesday, March 28, 2006 10:29 AM
>
> An index on (A,B,C) can be used for a query on (A,B) or (A), so it
doesn't
> really matter. It isn't usable for a query on (B), (C) or (B,C),
though.
> (The
> index rows will get bigger, of course, so you'll need more I/O if you
want
> to
> scan large parts of it, but I guess that's beside the point.)


I guess what I am really curious about is why was the OP getting an
expensive sort when the planner tried a merge join?  Most of the time
was spent sorting the parameters parameters table by opset_num even
though opset_num is indexed.  Isn't Postgres able to walk the index
instead of sorting?  I was wondering if maybe Postgres wasn't
recognizing that it could just walk the index because the opset_num
column isn't the first in the index.

Re: simple join uses indexes, very slow

From

"Jim C. Nasby"

Date:

28 March 2006, 13:30:54

On Tue, Mar 28, 2006 at 06:29:08PM +0200, Steinar H. Gunderson wrote:
> On Tue, Mar 28, 2006 at 10:18:25AM -0600, Dave Dutcher wrote:
> >>     "parameters_idx" btree (run, opset_num, step_num, opset,
> > opset_ver,
> >> step, step_ver, name, split, wafers)
> >>     "parameters_opset_idx" btree (opset, step, name)
> >>     "parameters_step_idx" btree (step, name)
> > Have you tried creating some different indexes on parameters?  I don't
> > know if it should matter or not, but I would try some indexes like:
> >
> > (run, opset_num) //Without all the other columns
> > (opset_num, run) //Backwards
> > (opset_num)
>
> An index on (A,B,C) can be used for a query on (A,B) or (A), so it doesn't
> really matter. It isn't usable for a query on (B), (C) or (B,C), though. (The
> index rows will get bigger, of course, so you'll need more I/O if you want to
> scan large parts of it, but I guess that's beside the point.)

Note that given how statistics currenly work, there are many situations
where the planner will refuse to use a multi-column index. This probably
won't change until there's some concept of multi-column statistics, at
least for multi-column indexes.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: simple join uses indexes, very slow

From

"Steinar H. Gunderson"

Date:

28 March 2006, 13:34:55

On Tue, Mar 28, 2006 at 11:20:19AM -0600, Dave Dutcher wrote:
> I guess what I am really curious about is why was the OP getting an
> expensive sort when the planner tried a merge join?

A merge join requires sorted inputs.

> Most of the time was spent sorting the parameters parameters table by
> opset_num even though opset_num is indexed. Isn't Postgres able to walk the
> index instead of sorting?

The time of an index scan vs. a sequential scan + sort depends on several
factors, so it's not just a matter of walking the index whenever there is one.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: simple join uses indexes, very slow

From

Simon Riggs

Date:

28 March 2006, 14:18:04

On Tue, 2006-03-28 at 10:22 -0500, george young wrote:

> work_mem= 1024

Set that higher.

Try a couple of other plans using enable_* and let us have the EXPLAIN
ANALYZE plans.

Best Regards, Simon Riggs

Re: simple join uses indexes, very slow

From

george young

Date:

28 March 2006, 16:57:20

On Tue, 28 Mar 2006 19:17:49 +0100
Simon Riggs <simon@2ndquadrant.com> threw this fish to the penguins:

> On Tue, 2006-03-28 at 10:22 -0500, george young wrote:
>
> > work_mem= 1024
>
> Set that higher.
>
> Try a couple of other plans using enable_* and let us have the EXPLAIN
> ANALYZE plans.
I tried this, but it doesn't seem to have made much difference that I can see:

newschm3=> show work_mem;
 work_mem
----------
 8024

newschm3=> explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM run_opsets ro, parameters p WHERE ro.run =
p.runAND ro.opset_num = p.opset_num and ro.run='team9'; 
                                                                  QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2.16..7957.40 rows=6707 width=22) (actual time=292.739..107672.525 rows=43050 loops=1)
   ->  Index Scan using run_opsets_pkey on run_opsets ro  (cost=0.00..128.75 rows=71 width=18) (actual
time=115.134..197.818rows=263 loops=1) 
         Index Cond: (run = 'team9'::text)
   ->  Bitmap Heap Scan on parameters p  (cost=2.16..109.93 rows=27 width=22) (actual time=2.559..408.125 rows=164
loops=263)
         Recheck Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
         ->  Bitmap Index Scan on parameters_idx  (cost=0.00..2.16 rows=27 width=0) (actual time=2.099..2.099 rows=164
loops=263)
               Index Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
 Total runtime: 107860.493 ms
(8 rows)

newschm3=> shoe enable_nestloop;
ERROR:  syntax error at or near "shoe" at character 1
LINE 1: shoe enable_nestloop;
        ^
newschm3=> show enable_nestloop;
 enable_nestloop
-----------------
 on
(1 row)

newschm3=> set enable_nestloop=off;
SET
newschm3=> explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM run_opsets ro, parameters p WHERE ro.run =
p.runAND ro.opset_num = p.opset_num and ro.run='team9'; 
                                                                    QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=34177.87..34291.36 rows=6707 width=22) (actual time=64654.744..64760.875 rows=43050 loops=1)
   Merge Cond: ("outer".opset_num = "inner".opset_num)
   ->  Sort  (cost=130.93..131.11 rows=71 width=18) (actual time=62.177..62.333 rows=263 loops=1)
         Sort Key: ro.opset_num
         ->  Index Scan using run_opsets_pkey on run_opsets ro  (cost=0.00..128.75 rows=71 width=18) (actual
time=40.415..55.745rows=263 loops=1) 
               Index Cond: (run = 'team9'::text)
   ->  Sort  (cost=34046.94..34070.02 rows=9231 width=22) (actual time=64592.526..64615.228 rows=43050 loops=1)
         Sort Key: p.opset_num
         ->  Bitmap Heap Scan on parameters p  (cost=272.31..33438.97 rows=9231 width=22) (actual
time=333.975..64126.200rows=43050 loops=1) 
               Recheck Cond: ('team9'::text = run)
               ->  Bitmap Index Scan on parameters_idx  (cost=0.00..272.31 rows=9231 width=0) (actual
time=309.199..309.199rows=43050 loops=1) 
                     Index Cond: ('team9'::text = run)
 Total runtime: 64919.714 ms
(13 rows)

--
"Are the gods not just?"  "Oh no, child.
What would become of us if they were?" (CSL)

Re: simple join uses indexes, very slow

From

"Dave Dutcher"

Date:

28 March 2006, 21:30:30

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-
> owner@postgresql.org] On Behalf Of Steinar H. Gunderson
> A merge join requires sorted inputs.
>
> > Most of the time was spent sorting the parameters parameters table
by
> > opset_num even though opset_num is indexed. Isn't Postgres able to
walk
> the
> > index instead of sorting?
>
> The time of an index scan vs. a sequential scan + sort depends on
several
> factors, so it's not just a matter of walking the index whenever there
is
> one.

I was just looking this over again and I realized I misread the query
plan.  The slowest step was the Bitmap Heap Scan not the sort.  (The
sort was relatively fast.)

Re: simple join uses indexes, very slow

From

Chris

Date:

29 March 2006, 01:16:02

george young wrote:
> [PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1]
> I have a simple join on two tables that takes way too long.  Can you help
> me understand what's wrong?  There are indexes defined on the relevant columns.
> I just did a fresh vacuum --full --analyze on the two tables.
> Is there something I'm not seeing?
> [CPU is 950Mhz AMD, 256MB RAM, 15k rpm scsi disk]
> -- George Young
>
> Table sizes: parameters has 2.1512e+07 tuples, run_opsets has 211745 tuples.
>
> explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM run_opsets ro, parameters p WHERE ro.run = p.run AND
ro.opset_num= p.opset_num and ro.run='team9'; 
>                                                                  QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=2.16..7957.40 rows=6707 width=22) (actual time=14.986..70197.129 rows=43050 loops=1)
>    ->  Index Scan using run_opsets_pkey on run_opsets ro  (cost=0.00..128.75 rows=71 width=18) (actual
time=0.386..62.959rows=263 loops=1) 
>          Index Cond: (run = 'team9'::text)
>    ->  Bitmap Heap Scan on parameters p  (cost=2.16..109.93 rows=27 width=22) (actual time=1.591..266.211 rows=164
loops=263)
>          Recheck Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
>          ->  Bitmap Index Scan on parameters_idx  (cost=0.00..2.16 rows=27 width=0) (actual time=1.153..1.153
rows=164loops=263) 
>                Index Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
>  Total runtime: 70237.727 ms
> (8 rows)
>
>                      Table "public.run_opsets"
>     Column    |            Type             |        Modifiers
> --------------+-----------------------------+-------------------------
>  run          | text                        | not null
>  opset        | text                        |
>  opset_ver    | integer                     |
>  opset_num    | integer                     | not null
>  status       | opset_status                |
>  date_started | timestamp without time zone |
>  date_done    | timestamp without time zone |
>  work_started | timestamp without time zone |
>  lock_user    | text                        | default 'NO-USER'::text
>  lock_pid     | integer                     |
>  needs_review | text                        |
> Indexes:
>     "run_opsets_pkey" PRIMARY KEY, btree (run, opset_num) CLUSTER
>
>
> --              Table "public.parameters"
>   Column   |  Type   |           Modifiers
> -----------+---------+-------------------------------
>  run       | text    | not null
>  opset_num | integer | not null
>  opset     | text    | not null
>  opset_ver | integer | not null
>  step_num  | integer | not null
>  step      | text    | not null
>  step_ver  | integer | not null
>  name      | text    | not null
>  value     | text    |
>  split     | boolean | not null default false
>  wafers    | text[]  | not null default '{}'::text[]
> Indexes:
>     "parameters_idx" btree (run, opset_num, step_num, opset, opset_ver, step, step_ver, name, split, wafers)
>     "parameters_opset_idx" btree (opset, step, name)
>     "parameters_step_idx" btree (step, name)

More for my own information (because nobody else has suggested it),
would it make a difference if 'run' was a varchar field rather than text?


--
Postgresql & php tutorials
http://www.designmagick.com/

Re: simple join uses indexes, very slow

From

stef

Date:

29 March 2006, 02:11:52

If your looking for suggestions, I would suggest updating the 8.1.x you
have installed to the latest version, as of typing this is 8.1.3 ;) Most
notable is some of the  -bug- fixes that are in since 8.1.0, for example;

* Fix incorrect optimizations of outer-join conditions (Tom)

You know, minor point releases aren't adding new features or changing
basic functionality, they are pure and simple bugfixes. If I was in
-your- position, I would run (don't walk ;) and install upto 8.1.3

of course, thats jst my 2c, feel free to ignore :D
Regards
Stef

Chris wrote:

> george young wrote:
>
>> [PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1]
>> I have a simple join on two tables that takes way too long.  Can you
>> help
>> me understand what's wrong?  There are indexes defined on the
>> relevant columns.
>> I just did a fresh vacuum --full --analyze on the two tables.
>> Is there something I'm not seeing?
>> [CPU is 950Mhz AMD, 256MB RAM, 15k rpm scsi disk]
>> -- George Young
>>
>> Table sizes: parameters has 2.1512e+07 tuples, run_opsets has 211745
>> tuples.
>>
>> explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM
>> run_opsets ro, parameters p WHERE ro.run = p.run AND ro.opset_num =
>> p.opset_num and ro.run='team9';
>>
>> QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------------------------------

>>
>>  Nested Loop  (cost=2.16..7957.40 rows=6707 width=22) (actual
>> time=14.986..70197.129 rows=43050 loops=1)
>>    ->  Index Scan using run_opsets_pkey on run_opsets ro
>> (cost=0.00..128.75 rows=71 width=18) (actual time=0.386..62.959
>> rows=263 loops=1)
>>          Index Cond: (run = 'team9'::text)
>>    ->  Bitmap Heap Scan on parameters p  (cost=2.16..109.93 rows=27
>> width=22) (actual time=1.591..266.211 rows=164 loops=263)
>>          Recheck Cond: (('team9'::text = p.run) AND
>> ("outer".opset_num = p.opset_num))
>>          ->  Bitmap Index Scan on parameters_idx  (cost=0.00..2.16
>> rows=27 width=0) (actual time=1.153..1.153 rows=164 loops=263)
>>                Index Cond: (('team9'::text = p.run) AND
>> ("outer".opset_num = p.opset_num))
>>  Total runtime: 70237.727 ms
>> (8 rows)
>>
>>                      Table "public.run_opsets"
>>     Column    |            Type             |        Modifiers
>> --------------+-----------------------------+-------------------------
>>  run          | text                        | not null
>>  opset        | text                        |
>>  opset_ver    | integer                     |
>>  opset_num    | integer                     | not null
>>  status       | opset_status                |
>>  date_started | timestamp without time zone |
>>  date_done    | timestamp without time zone |
>>  work_started | timestamp without time zone |
>>  lock_user    | text                        | default 'NO-USER'::text
>>  lock_pid     | integer                     |
>>  needs_review | text                        |
>> Indexes:
>>     "run_opsets_pkey" PRIMARY KEY, btree (run, opset_num) CLUSTER
>>
>>
>> --              Table "public.parameters"
>>   Column   |  Type   |           Modifiers
>> -----------+---------+-------------------------------
>>  run       | text    | not null
>>  opset_num | integer | not null
>>  opset     | text    | not null
>>  opset_ver | integer | not null
>>  step_num  | integer | not null
>>  step      | text    | not null
>>  step_ver  | integer | not null
>>  name      | text    | not null
>>  value     | text    |
>>  split     | boolean | not null default false
>>  wafers    | text[]  | not null default '{}'::text[]
>> Indexes:
>>     "parameters_idx" btree (run, opset_num, step_num, opset,
>> opset_ver, step, step_ver, name, split, wafers)
>>     "parameters_opset_idx" btree (opset, step, name)
>>     "parameters_step_idx" btree (step, name)
>
>
> More for my own information (because nobody else has suggested it),
> would it make a difference if 'run' was a varchar field rather than text?
>
>

Re: simple join uses indexes, very slow

From

george young

Date:

29 March 2006, 11:29:35

On Wed, 29 Mar 2006 01:08:15 -0500
stef <stef@ummon.com> threw this fish to the penguins:

>
> If your looking for suggestions, I would suggest updating the 8.1.x you
> have installed to the latest version, as of typing this is 8.1.3 ;) Most
> notable is some of the  -bug- fixes that are in since 8.1.0, for example;
>
> * Fix incorrect optimizations of outer-join conditions (Tom)
>
> You know, minor point releases aren't adding new features or changing
> basic functionality, they are pure and simple bugfixes. If I was in
> -your- position, I would run (don't walk ;) and install upto 8.1.3

I just did this(8.1.3).  I also moved the server to a host with more
ram and faster cpu.  And I did cluster on the main index of the large
parameters table. The result is less than a second instead of 70
seconds.

Sorry I didn't have time to isolate the individual effects
of the above changes, but sometimes you just have to do "a bunch of
good things" and move on.  For your enjoyment here's the latest analyze:

newschm3=> explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM run_opsets ro, parameters p WHERE ro.run =
p.runAND ro.opset_num = p.opset_num and ro.run='team9'; 
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..6194.18 rows=9186 width=22) (actual time=0.477..175.554 rows=43050 loops=1)
   ->  Index Scan using run_opsets_pkey on run_opsets ro  (cost=0.00..122.27 rows=68 width=18) (actual
time=0.222..1.093rows=263 loops=1) 
         Index Cond: (run = 'team9'::text)
   ->  Index Scan using parameters_idx on parameters p  (cost=0.00..88.72 rows=46 width=22) (actual time=0.023..0.498
rows=164loops=263) 
         Index Cond: (('team9'::text = p.run) AND ("outer".opset_num = p.opset_num))
 Total runtime: 190.821 ms

Thank you all very much for you help!

-- George Young

>
> of course, thats jst my 2c, feel free to ignore :D
> Regards
> Stef
>
> Chris wrote:
>
> > george young wrote:
> >
> >> [PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1]
> >> I have a simple join on two tables that takes way too long.  Can you
> >> help
> >> me understand what's wrong?  There are indexes defined on the
> >> relevant columns.
> >> I just did a fresh vacuum --full --analyze on the two tables.
> >> Is there something I'm not seeing?
> >> [CPU is 950Mhz AMD, 256MB RAM, 15k rpm scsi disk]
> >> -- George Young
> >>
> >> Table sizes: parameters has 2.1512e+07 tuples, run_opsets has 211745
> >> tuples.
> >>
> >> explain analyze SELECT ro.run, ro.opset_num, p.step_num FROM
> >> run_opsets ro, parameters p WHERE ro.run = p.run AND ro.opset_num =
> >> p.opset_num and ro.run='team9';
> >>
> >> QUERY PLAN
> >>
--------------------------------------------------------------------------------------------------------------------------------------------

> >>
> >>  Nested Loop  (cost=2.16..7957.40 rows=6707 width=22) (actual
> >> time=14.986..70197.129 rows=43050 loops=1)
> >>    ->  Index Scan using run_opsets_pkey on run_opsets ro
> >> (cost=0.00..128.75 rows=71 width=18) (actual time=0.386..62.959
> >> rows=263 loops=1)
> >>          Index Cond: (run = 'team9'::text)
> >>    ->  Bitmap Heap Scan on parameters p  (cost=2.16..109.93 rows=27
> >> width=22) (actual time=1.591..266.211 rows=164 loops=263)
> >>          Recheck Cond: (('team9'::text = p.run) AND
> >> ("outer".opset_num = p.opset_num))
> >>          ->  Bitmap Index Scan on parameters_idx  (cost=0.00..2.16
> >> rows=27 width=0) (actual time=1.153..1.153 rows=164 loops=263)
> >>                Index Cond: (('team9'::text = p.run) AND
> >> ("outer".opset_num = p.opset_num))
> >>  Total runtime: 70237.727 ms
> >> (8 rows)
> >>
> >>                      Table "public.run_opsets"
> >>     Column    |            Type             |        Modifiers
> >> --------------+-----------------------------+-------------------------
> >>  run          | text                        | not null
> >>  opset        | text                        |
> >>  opset_ver    | integer                     |
> >>  opset_num    | integer                     | not null
> >>  status       | opset_status                |
> >>  date_started | timestamp without time zone |
> >>  date_done    | timestamp without time zone |
> >>  work_started | timestamp without time zone |
> >>  lock_user    | text                        | default 'NO-USER'::text
> >>  lock_pid     | integer                     |
> >>  needs_review | text                        |
> >> Indexes:
> >>     "run_opsets_pkey" PRIMARY KEY, btree (run, opset_num) CLUSTER
> >>
> >>
> >> --              Table "public.parameters"
> >>   Column   |  Type   |           Modifiers
> >> -----------+---------+-------------------------------
> >>  run       | text    | not null
> >>  opset_num | integer | not null
> >>  opset     | text    | not null
> >>  opset_ver | integer | not null
> >>  step_num  | integer | not null
> >>  step      | text    | not null
> >>  step_ver  | integer | not null
> >>  name      | text    | not null
> >>  value     | text    |
> >>  split     | boolean | not null default false
> >>  wafers    | text[]  | not null default '{}'::text[]
> >> Indexes:
> >>     "parameters_idx" btree (run, opset_num, step_num, opset,
> >> opset_ver, step, step_ver, name, split, wafers)
> >>     "parameters_opset_idx" btree (opset, step, name)
> >>     "parameters_step_idx" btree (step, name)
> >
> >
> > More for my own information (because nobody else has suggested it),
> > would it make a difference if 'run' was a varchar field rather than text?
> >
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>


--
"Are the gods not just?"  "Oh no, child.
What would become of us if they were?" (CSL)

Re: simple join uses indexes, very slow

From

"Jim C. Nasby"

Date:

31 March 2006, 11:48:07

On Wed, Mar 29, 2006 at 01:08:15AM -0500, stef wrote:
>
> If your looking for suggestions, I would suggest updating the 8.1.x you
> have installed to the latest version, as of typing this is 8.1.3 ;) Most
> notable is some of the  -bug- fixes that are in since 8.1.0, for example;
>
> * Fix incorrect optimizations of outer-join conditions (Tom)
>
> You know, minor point releases aren't adding new features or changing
> basic functionality, they are pure and simple bugfixes. If I was in
> -your- position, I would run (don't walk ;) and install upto 8.1.3

More important, there are data loss bugfixes between 8.1.0 and 8.1.3.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461