Re: same plan, add 1 condition, 1900x slower - Mailing list pgsql-performance

From Mitch Skinner
Subject Re: same plan, add 1 condition, 1900x slower
Date
Msg-id 1131700635.29496.136.camel@enzian
Whole thread Raw
In response to Re: same plan, add 1 condition, 1900x slower  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: same plan, add 1 condition, 1900x slower
Re: same plan, add 1 condition, 1900x slower
List pgsql-performance
On Thu, 2005-11-10 at 12:23 -0500, Tom Lane wrote:
> Apparently, you are using a platform and/or locale in which strcoll() is
> spectacularly, god-awfully slow --- on the order of 10 msec per comparison.

The version with the condition is definitely doing more I/O.  The
version without the condition doesn't read at all.  I strace'd an
explain analyze for each separately, and this is what I ended up with
(the first is with the condition, the second is without):

bash-2.05b$ cut '-d(' -f1 subsourcestrace | sort | uniq -c
   7127 gettimeofday
  75213 _llseek
      1 Process 30227 attached - interrupt to quit
      1 Process 30227 detached
 148671 read
      2 recv
      4 semop
      4 send
bash-2.05b$ cut '-d(' -f1 subsourcestrace-nocond | sort | uniq -c
   9103 gettimeofday
      7 _llseek
      1 Process 30227 attached - interrupt to quit
      1 Process 30227 detached
      2 recv
      4 send

For the moment, all of the rows in the view I'm selecting from satisfy
the condition, so the output of both queries is the same.  The relevant
rows of the underlying tables are probably pretty contiguous (all of the
rows satisfying the condition and the join were inserted at the same
time).  Could it just be the result of a weird physical distribution of
data in the table/index files?  For the fast query, the actual number of
rows is a lot less than the planner expects.

> This is a bit hard to believe but I can't make sense of those numbers
> any other way.  What is the platform exactly, and what database locale
> and encoding are you using?

It's RHEL 3 on x86:
[root@rehoboam root]# uname -a
Linux rehoboam 2.4.21-32.0.1.ELsmp #1 SMP Tue May 17 17:52:23 EDT 2005
i686 i686 i386 GNU/Linux

The glibc version is 2.3.2.

statgen=# select current_setting('lc_collate');
 current_setting
-----------------
 en_US.UTF-8

Not sure what's relevant, but here's some more info:
The machine has 4.5GiB of RAM and a 5-disk Raid 5.  It's a dual xeon
3.2ghz.

           relname           | relpages |  reltuples
-----------------------------+----------+-------------
 external_id_map             |   126883 | 1.55625e+07
 external_id_map_primary_key |    64607 | 1.55625e+07
 subject                     |       31 |        1186
 subject_pkey                |       19 |        1186

I've attached the output of "select name, setting from pg_settings".

And, in case my original message isn't handy, the explain analyze output
and table/view info is below.

Thanks for taking a look,
Mitch

statgen=> explain analyze select * from subject_source;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..330.72 rows=1186 width=46) (actual
time=0.051..8.890 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36
rows=1186 width=28) (actual time=0.022..1.441 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map
eim  (cost=0.00..2485226.70 rows=15562513 width=26) (actual
time=0.016..2.532 rows=2175 loops=1)
 Total runtime: 9.592 ms
(5 rows)

statgen=> explain analyze select * from subject_source where
source='SCH';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..1147.33 rows=1 width=46) (actual
time=0.054..20258.161 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36
rows=1186 width=28) (actual time=0.022..1.478 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map
eim  (cost=0.00..2524132.99 rows=2335 width=26) (actual
time=0.022..20252.326 rows=1186 loops=1)
         Filter: (source = 'SCH'::bpchar)
 Total runtime: 20258.922 ms
(6 rows)

statgen=> \d subject_source
         View "public.subject_source"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 id        | bigint                |
 sex       | integer               |
 parent1   | bigint                |
 parent2   | bigint                |
 source    | character(3)          |
 source_id | character varying(32) |
View definition:
 SELECT norm.id, norm.sex, norm.parent1, norm.parent2, eim.source,
eim.source_id
   FROM subject norm
   JOIN util.external_id_map eim ON norm.id = eim.target_id;

statgen=> \d subject
    Table "public.subject"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id      | bigint  | not null
 sex     | integer |
 parent1 | bigint  |
 parent2 | bigint  |
Indexes:
    "subject_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "subject_parent1" FOREIGN KEY (parent1) REFERENCES subject(id)
DEFERRABLE INITIALLY DEFERRED
    "subject_parent2" FOREIGN KEY (parent2) REFERENCES subject(id)
DEFERRABLE INITIALLY DEFERRED
    "subject_id_map" FOREIGN KEY (id) REFERENCES
util.external_id_map(target_id) DEFERRABLE INITIALLY DEFERRED

statgen=> \d util.external_id_map
         Table "util.external_id_map"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 source_id | character varying(32) | not null
 source    | character(3)          | not null
 target_id | bigint                | not null
Indexes:
    "external_id_map_primary_key" PRIMARY KEY, btree (target_id)
    "external_id_map_source_source_id_unique" UNIQUE, btree (source,
source_id)
    "external_id_map_source" btree (source)
    "external_id_map_source_target_id" btree (source, target_id)
Foreign-key constraints:
    "external_id_map_source" FOREIGN KEY (source) REFERENCES
util.source(id)



Attachment

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Index Scan Costs versus Sort
Next
From: Richard Huxton
Date:
Subject: Re: same plan, add 1 condition, 1900x slower