Re: Inefficient query plan - Mailing list pgsql-performance
From | Jann Röder |
---|---|
Subject | Re: Inefficient query plan |
Date | |
Msg-id | i4tsgo$sm8$1@dough.gmane.org Whole thread Raw |
In response to | Re: Inefficient query plan ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
Responses |
Re: Inefficient query plan
Re: Inefficient query plan Re: Inefficient query plan |
List | pgsql-performance |
Thanks for your help, here is the information you requested: Table information: A = Papers, B = PaperReferences wos-db=> \d Papers Table "public.papers" Column | Type | Modifiers ------------------+-------------------------+----------- itemid | character(15) | not null t9id | integer | artn | character varying | doi | character varying | pii | character varying | unsp | character varying | issueid | character(10) | not null title | character varying(1500) | not null titleenhancement | character varying(500) | beginningpage | character varying(19) | pagecount | integer | not null documenttype | character(1) | not null abstract | text | Indexes: "papers_pkey" PRIMARY KEY, btree (itemid) "idx_papers_issueid" btree (issueid) Foreign-key constraints: "papers_issueid_fkey" FOREIGN KEY (issueid) REFERENCES journals(issueid) ON DELETE CASCADE Referenced by: TABLE "authorkeywords" CONSTRAINT "authorkeywords_itemid_fkey" FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE TABLE "authors" CONSTRAINT "authors_itemid_fkey" FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE TABLE "grantnumbers" CONSTRAINT "grantnumbers_itemid_fkey" FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE TABLE "keywordsplus" CONSTRAINT "keywordsplus_itemid_fkey" FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE TABLE "languages" CONSTRAINT "languages_itemid_fkey" FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE TABLE "paperreferences" CONSTRAINT "paperreferences_fromitemid_fkey" FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE wos-db=> \d PaperReferences Table "public.paperreferences" Column | Type | Modifiers --------------------+-----------------------+----------- itemid | character varying(15) | not null t9id | integer | citedauthor | character varying(75) | citedartn | character varying | citeddoi | character varying | citedpii | character varying | citedunsp | character varying | citedreferenceyear | integer | citedtitle | character varying(20) | not null citedvolume | character varying(4) | citedpage | character varying(5) | referenceindex | integer | not null Indexes: "paperreferences_pkey" PRIMARY KEY, btree (itemid, referenceindex) Foreign-key constraints: "paperreferences_fromitemid_fkey" FOREIGN KEY (itemid) REFERENCES papers(itemid) ON DELETE CASCADE I just noticed that PaperReferences uses character varying (15) and Papers uses character(15). Stupid mistake of mine. Do you think this might cause the bad query planning? I will alter the table to use character(15) in both cases and see if that helps. postgresql.conf: max_connections = 20 shared_buffers = 256MB work_mem = 10MB maintenance_work_mem = 128MB max_stack_depth = 4MB synchronous_commit = off wal_buffers = 1MB checkpoint_segments = 10 effective_cache_size = 768MB default_statistics_target = 200 datestyle = 'iso, mdy' lc_messages = 'C' lc_monetary = 'C' lc_numeric = 'C' lc_time = 'C' default_text_search_config = 'pg_catalog.simple' The query I run: SELECT p.ItemID FROM Papers AS p, PaperReferences AS r WHERE p.itemID = r.ItemID AND p.issueID = 'A1983PW823' Query plan with seqscan enabled: "Hash Join (cost=512.71..17709356.53 rows=8283226 width=16)" " Hash Cond: ((r.itemid)::bpchar = p.itemid)" " -> Seq Scan on paperreferences r (cost=0.00..15110856.68 rows=670707968 width=16)" " -> Hash (cost=500.29..500.29 rows=994 width=16)" " -> Index Scan using idx_papers_issueid on papers p (cost=0.00..500.29 rows=994 width=16)" " Index Cond: (issueid = 'A1983PW823'::bpchar)" Query plan with seqscan disbaled "Hash Join (cost=10000000280.88..10017668625.22 rows=4233278 width=16)" " Hash Cond: ((r.itemid)::bpchar = p.itemid)" " -> Seq Scan on paperreferences r (cost=10000000000.00..10015110856.68 rows=670707968 width=16)" " -> Hash (cost=274.53..274.53 rows=508 width=16)" " -> Index Scan using idx_papers_issueid on papers p (cost=0.00..274.53 rows=508 width=16)" " Index Cond: (issueid = 'A1983PW823'::bpchar)" Do you need an EXPLAIN ANALYZE output? Since it takes so long I can't easily post one right now. But maybe I can get one over night. My Hardware is an iMac running OS X 10.6.4 with 1.5 GB RAM and a 2.1 GHz (or so) core 2 Duo processor. Jann Am 23.08.10 14:08, schrieb Kevin Grittner: > Jann Röder wrote: > Am 23.08.10 12:18, schrieb Scott Marlowe: > >>> What happens if you try >>> >>> set enable_seqscan=off; >>> (your query here) >>> >> Tried that already. The query plan is exactly the same. > > Exactly? Not even the cost shown for the seq scan changed? > > You are almost certainly omitting some crucial piece of information > in your report. Please look over this page and post a more complete > report. In particular, please show the results of \d for both tables > (or of pg_dump -s -t 'tablename'), your complete postgresql.conf file > stripped of comments, and a description of your hardware and OS. > > -Kevin >
pgsql-performance by date: