Re: Inefficient query plan - Mailing list pgsql-performance

From Jann Röder
Subject Re: Inefficient query plan
Date
Msg-id i4tsgo$sm8$1@dough.gmane.org
Whole thread Raw
In response to Re: Inefficient query plan  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Inefficient query plan
Re: Inefficient query plan
Re: Inefficient query plan
List pgsql-performance
Thanks for your help,
here is the information you requested:

Table information: A = Papers, B = PaperReferences

wos-db=> \d Papers
                 Table "public.papers"
      Column      |          Type           | Modifiers
------------------+-------------------------+-----------
 itemid           | character(15)           | not null
 t9id             | integer                 |
 artn             | character varying       |
 doi              | character varying       |
 pii              | character varying       |
 unsp             | character varying       |
 issueid          | character(10)           | not null
 title            | character varying(1500) | not null
 titleenhancement | character varying(500)  |
 beginningpage    | character varying(19)   |
 pagecount        | integer                 | not null
 documenttype     | character(1)            | not null
 abstract         | text                    |
Indexes:
    "papers_pkey" PRIMARY KEY, btree (itemid)
    "idx_papers_issueid" btree (issueid)
Foreign-key constraints:
    "papers_issueid_fkey" FOREIGN KEY (issueid) REFERENCES
journals(issueid) ON DELETE CASCADE
Referenced by:
    TABLE "authorkeywords" CONSTRAINT "authorkeywords_itemid_fkey"
FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE
    TABLE "authors" CONSTRAINT "authors_itemid_fkey" FOREIGN KEY
(itemid) REFERENCES papers(itemid) ON DELETE CASCADE
    TABLE "grantnumbers" CONSTRAINT "grantnumbers_itemid_fkey" FOREIGN
KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE
    TABLE "keywordsplus" CONSTRAINT "keywordsplus_itemid_fkey" FOREIGN
KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE
    TABLE "languages" CONSTRAINT "languages_itemid_fkey" FOREIGN KEY
(itemid) REFERENCES papers(itemid) ON DELETE CASCADE
    TABLE "paperreferences" CONSTRAINT "paperreferences_fromitemid_fkey"
FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE

wos-db=> \d PaperReferences
             Table "public.paperreferences"
       Column       |         Type          | Modifiers
--------------------+-----------------------+-----------
 itemid             | character varying(15) | not null
 t9id               | integer               |
 citedauthor        | character varying(75) |
 citedartn          | character varying     |
 citeddoi           | character varying     |
 citedpii           | character varying     |
 citedunsp          | character varying     |
 citedreferenceyear | integer               |
 citedtitle         | character varying(20) | not null
 citedvolume        | character varying(4)  |
 citedpage          | character varying(5)  |
 referenceindex     | integer               | not null
Indexes:
    "paperreferences_pkey" PRIMARY KEY, btree (itemid, referenceindex)
Foreign-key constraints:
    "paperreferences_fromitemid_fkey" FOREIGN KEY (itemid) REFERENCES
papers(itemid) ON DELETE CASCADE

I just noticed that PaperReferences uses character varying (15) and
Papers uses character(15). Stupid mistake of mine. Do you think this
might cause the bad query planning? I will alter the table to use
character(15) in both cases and see if that helps.

postgresql.conf:
max_connections = 20
shared_buffers = 256MB
work_mem = 10MB
maintenance_work_mem = 128MB
max_stack_depth = 4MB
synchronous_commit = off
wal_buffers = 1MB
checkpoint_segments = 10
effective_cache_size = 768MB
default_statistics_target = 200
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.simple'

The query I run:
SELECT p.ItemID FROM Papers AS p, PaperReferences AS r WHERE p.itemID =
r.ItemID AND p.issueID = 'A1983PW823'

Query plan with seqscan enabled:

"Hash Join  (cost=512.71..17709356.53 rows=8283226 width=16)"
"  Hash Cond: ((r.itemid)::bpchar = p.itemid)"
"  ->  Seq Scan on paperreferences r  (cost=0.00..15110856.68
rows=670707968 width=16)"
"  ->  Hash  (cost=500.29..500.29 rows=994 width=16)"
"        ->  Index Scan using idx_papers_issueid on papers p
(cost=0.00..500.29 rows=994 width=16)"
"              Index Cond: (issueid = 'A1983PW823'::bpchar)"

Query plan with seqscan disbaled

"Hash Join  (cost=10000000280.88..10017668625.22 rows=4233278 width=16)"
"  Hash Cond: ((r.itemid)::bpchar = p.itemid)"
"  ->  Seq Scan on paperreferences r
(cost=10000000000.00..10015110856.68 rows=670707968 width=16)"
"  ->  Hash  (cost=274.53..274.53 rows=508 width=16)"
"        ->  Index Scan using idx_papers_issueid on papers p
(cost=0.00..274.53 rows=508 width=16)"
"              Index Cond: (issueid = 'A1983PW823'::bpchar)"

Do you need an EXPLAIN ANALYZE output? Since it takes so long I can't
easily post one right now. But maybe I can get one over night.

My Hardware is an iMac running OS X 10.6.4 with 1.5 GB RAM and a 2.1 GHz
(or so) core 2 Duo processor.

Jann

Am 23.08.10 14:08, schrieb Kevin Grittner:
> Jann Röder wrote:
> Am 23.08.10 12:18, schrieb Scott Marlowe:
>
>>> What happens if you try
>>>
>>> set enable_seqscan=off;
>>> (your query here)
>>>
>> Tried that already. The query plan is exactly the same.
>
> Exactly?  Not even the cost shown for the seq scan changed?
>
> You are almost certainly omitting some crucial piece of information
> in your report.  Please look over this page and post a more complete
> report.  In particular, please show the results of \d for both tables
> (or of pg_dump -s -t 'tablename'), your complete postgresql.conf file
> stripped of comments, and a description of your hardware and OS.
>
> -Kevin
>


pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Inefficient query plan
Next
From: Grzegorz Jaśkiewicz
Date:
Subject: Re: Inefficient query plan