postgresql uses Hash-join, i need Nested-loop - Mailing list pgsql-performance

From Gábor Farkas
Subject postgresql uses Hash-join, i need Nested-loop
Date
Msg-id 2c173a350908232354r3f4f75f1p48e604f784b4ff93@mail.gmail.com
Whole thread Raw
Responses Re: postgresql uses Hash-join, i need Nested-loop  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
hi,

i have a query that uses a Hash-Join, but it would be faster with Nested-Loop,
and i don't know how to persuade postgresql to do it.

details:

postgresql-8.2 + tsearch2

i have 2 tables, one for people, and one that does a many-to-many
link between people:

CREATE TABLE personlink (
    id integer NOT NULL,
    relid integer NOT NULL,
    created timestamp with time zone DEFAULT now() NOT NULL,
    changed timestamp with time zone,
    editorid integer NOT NULL
);

btree indexes on "id" and "relid",
PRIMARY KEY btree index on (id,relid).

CREATE TABLE person (
    id integer NOT NULL,
    firstname character varying(255),
    .
    .
    .
);
PRIMARY KEY btree index on "id".
gin index on "firstname" (for tsearch2)

(the "person" table contains more columns (around 30))

personlink contains 1.500.000 rows, person contains 900.000 rows.
i did a vacuum-with-analyze.

my query is:

SELECT personlink.id
FROM personlink
INNER JOIN person ON personlink.relid=person.id
WHERE to_tsquery('default','duck') @@ to_tsvector('default',person.firstname);

explain analyze says this:


QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3108.62..35687.67 rows=1535 width=4) (actual
time=901.110..6113.683 rows=2 loops=1)
   Hash Cond: (personlink.relid = person.id)
   ->  Seq Scan on personlink  (cost=0.00..26805.14 rows=1535614
width=8) (actual time=0.029..3000.503 rows=1535614 loops=1)
   ->  Hash  (cost=3097.80..3097.80 rows=866 width=4) (actual
time=0.185..0.185 rows=8 loops=1)
         ->  Bitmap Heap Scan on person  (cost=23.09..3097.80 rows=866
width=4) (actual time=0.078..0.160 rows=8 loops=1)
               Recheck Cond: ('''duck'''::tsquery @@
to_tsvector('default'::text, (firstname)::text))
               ->  Bitmap Index Scan on person_firstname_exact
(cost=0.00..22.87 rows=866 width=0) (actual time=0.056..0.056 rows=8
loops=1)
                     Index Cond: ('''duck'''::tsquery @@
to_tsvector('default'::text, (firstname)::text))
 Total runtime: 6113.748 ms
(9 rows)

if i disable hash-joins with "SET enable_hashjoin =false;"

i get:


 QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..45698.23 rows=1535 width=4) (actual
time=4.960..15.098 rows=2 loops=1)
   ->  Index Scan using person_firstname_exact on person
(cost=0.00..3463.53 rows=866 width=4) (actual time=0.117..0.234 rows=8
loops=1)
         Index Cond: ('''duck'''::tsquery @@
to_tsvector('default'::text, (firstname)::text))
   ->  Index Scan using personlink_relid_idx on personlink
(cost=0.00..48.54 rows=18 width=8) (actual time=1.848..1.849 rows=0
loops=8)
         Index Cond: (personlink.relid = person.id)
 Total runtime: 15.253 ms
(6 rows)

what could i do to persuade postgresql to choose the faster Nested-Loop?

thanks,
gabor

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: [PERFORMANCE] how to set wal_buffers
Next
From: "Kevin Grittner"
Date:
Subject: Re: Number of tables