DIfferent plans for explicit versus implicit join using link table - Mailing list pgsql-general

From John D. Burger
Subject DIfferent plans for explicit versus implicit join using link table
Date
Msg-id 7ABD4750-BFC5-4E5A-BD89-2EAF8478B284@mitre.org
Whole thread Raw
Responses Re: DIfferent plans for explicit versus implicit join using link table  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hi -

I have a table of words and a table linking words in various ways:

create table allWords (
   wordID    serial    PRIMARY KEY,
   word        text
);
create unique index ix_allwords_word ON allwords (word);

create table allWordRelations (
   word1ID    integer references allWords,
   word2ID    integer references allWords,
   pos1        integer references posTypes,
   pos2        integer references posTypes,
   relID        integer references allRelationTypes,
   confidence    float,
   primary key (word1ID, word2ID, pos1, pos2, relID)
);
create index ix_allWordRelations_word1_pos1 on allWordRelations
(word1ID, pos1);
create index ix_allWordRelations_word2_pos2 on allWordRelations
(word2ID, pos2);

I have two queries for looking up related words which I think should
be equivalent, but 7.4.8 comes up with very different plans.  The
first query joins the word table to itself explicitly via the
relations table - this is very fast.  The second query uses an IN
against the link table in the where clause, and is very slow.  I'm
sure I can affect this by adding indexes, but I'm mainly trying to
understand what difference the planner is seeing.  EXPLAIN ANALYZE
output is below - can anyone explain?  Are my two queries subtly
different in terms of NULLs, or something like that?  Thanks.

- John Burger
   MITRE


explain analyze select w2.word from allwords w1 join allwordrelations
as r on (w1.wordid = r.word1id) join allwords w2 on (w2.wordid =
r.word2id) where w1.word = 'dogging';

       QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
--------------------
Nested Loop  (cost=0.00..579.05 rows=81 width=15) (actual
time=0.607..30.509 rows=59 loops=1)
    ->  Nested Loop  (cost=0.00..333.94 rows=81 width=4) (actual
time=0.564..29.032 rows=59 loops=1)
          ->  Index Scan using ix_allwords_word on allwords w1
(cost=0.00..3.49 rows=1 width=4) (actual time=0.326..0.329 rows=1
loops=1)
                Index Cond: (word = 'dogging'::text)
          ->  Index Scan using ix_allwordrelations_word1_pos1 on
allwordrelations r  (cost=0.00..329.36 rows=87 width=8) (actual
time=0.220..28.564 rows=59 loops=1)
                Index Cond: ("outer".wordid = r.word1id)
    ->  Index Scan using allwords_pkey on allwords w2
(cost=0.00..3.01 rows=1 width=19) (actual time=0.018..0.020 rows=1
loops=59)
          Index Cond: (w2.wordid = "outer".word2id)
Total runtime: 30.713 ms



explain analyze select w2.word from allwords w1, allwords w2 where
(w1.wordid, w2.wordid) in (select word1id, word2id from
allwordrelations ) and w1.word = 'dogging';

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-----
Nested Loop  (cost=760422.86..817628.29 rows=1 width=15) (actual
time=99277.403..111291.862 rows=59 loops=1)
    ->  Hash Join  (cost=760422.86..817625.27 rows=1 width=4) (actual
time=99277.110..111270.093 rows=59 loops=1)
          Hash Cond: ("outer".word1id = "inner".wordid)
          ->  Unique  (cost=760419.36..794740.32 rows=4576128
width=8) (actual time=96713.791..107843.446 rows=4302242 loops=1)
                ->  Sort  (cost=760419.36..771859.68 rows=4576128
width=8) (actual time=96713.785..102973.088 rows=4576035 loops=1)
                      Sort Key: allwordrelations.word1id,
allwordrelations.word2id
                      ->  Seq Scan on allwordrelations
(cost=0.00..79409.28 rows=4576128 width=8) (actual
time=0.008..8668.255 rows=4576035 loops=1)
          ->  Hash  (cost=3.49..3.49 rows=1 width=4) (actual
time=0.078..0.078 rows=0 loops=1)
                ->  Index Scan using ix_allwords_word on allwords w1
(cost=0.00..3.49 rows=1 width=4) (actual time=0.067..0.070 rows=1
loops=1)
                      Index Cond: (word = 'dogging'::text)
    ->  Index Scan using allwords_pkey on allwords w2
(cost=0.00..3.01 rows=1 width=19) (actual time=0.360..0.363 rows=1
loops=59)
          Index Cond: (w2.wordid = "outer".word2id)
Total runtime: 111292.449 ms



pgsql-general by date:

Previous
From: Gerhard Wiesinger
Date:
Subject: Re: Fw: PostgreSQL Performance Tuning
Next
From: Jon Sime
Date:
Subject: Re: Feature request - have postgresql log warning when new sub-release comes out.