Re: [GENERAL] nested loops in joins, ambiguous rewrite rules - Mailing list pgsql-general
From | Charles Hornberger |
---|---|
Subject | Re: [GENERAL] nested loops in joins, ambiguous rewrite rules |
Date | |
Msg-id | 3.0.5.32.19990129145115.00b0ac50@k4azl.net Whole thread Raw |
In response to | nested loops in joins, ambiguous rewrite rules (Charles Hornberger <hornberger@tabloid.net>) |
List | pgsql-general |
Hello! I'm in a bit of a fix, and I bet that someone out there has just the right answer for me. I've been struggling with this optimization issue for a couple days now. The explanation that follows is very detailed ... I include the dumps from a few versions of one of my test databases. It's been suggested to me by others that perhaps there's some kind of bug in the optimizer, but that's not a diagnois that I could make on my own. I'd really appreciate a fresh view on this problem. The most recent suggestion I got from Vadim was to run vacuum analyze against the databases. I've tried this multiple times, but it has no effect. What is happening (please forgive my non-technical explanation) is that the optimizer is occasionally lapsing into these "nested loop" query plans, rather than using something more efficient. It doesn't matter whether I vacuum the DB; it seems to depend on something else entirely. But I can't figure out what that is. Here are two identically structured sets of tables from two separate databases, with slightly different data in them. This DB is named apx13: ---------------------- CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue 2147483647 minvalue 1 cache 1 ; SELECT nextval ('article_article_id_seq'); CREATE TABLE "article" ("article_id" "int4" DEFAULT nextval ( 'article_article_id_seq' ) NOT NULL, "section_id" "int4" NOT NULL, "locale_id" "int4" NOT NULL, "article_source_id" "int4", "volume_id" "int4", "issue_id" "int4", "print_page_no" "int4", "print_publ_date" "date", "site_publ_date" "date" NOT NULL, "site_publ_time" "datetime" NOT NULL, "inputter_id" "int4" NOT NULL, "input_date" "datetime" DEFAULT text 'now', "published" "int4" DEFAULT 0); INSERT INTO "article" ("article_id","section_id","locale_id","article_source_id","volume_id","issu e_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","i nputter_id","input_date","published") values (10,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,'Thu Jan 28 19:28:40 1999 PST',0); CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree ( "article_id" "int4_ops" ); CREATE INDEX "article_vol_ix" on "article" using btree ( "volume_id" "int4_ops" ); CREATE INDEX "article_source_ix" on "article" using btree ( "article_source_id" "int4_ops" ); CREATE INDEX "article_issue_ix" on "article" using btree ( "issue_id" "int4_ops" ); CREATE INDEX "article_locale_ix" on "article" using btree ( "locale_id" "int4_ops" ); CREATE INDEX "article_section_ix" on "article" using btree ( "section_id" "int4_ops" ); CREATE TABLE "article_text" ("article_id" "int4" NOT NULL, "headline" varchar, "subhead" varchar); INSERT INTO "article_text" ("article_id","headline","subhead") values (10,'Mayor Signs Contract With Company','Legally binding document said to be four pages long'); Here's the same thing, but with a couple extra rows in the 'article' and 'article_text' tables This one is named apx14 ----------------------- CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue 2147483647 minvalue 1 cache 1 ; SELECT nextval ('article_article_id_seq'); CREATE TABLE "article" ("article_id" "int4" DEFAULT nextval ( 'article_article_id_seq' ) NOT NULL, "section_id" "int4" NOT NULL, "locale_id" "int4" NOT NULL, "article_source_id" "int4", "volume_id" "int4", "issue_id" "int4", "print_page_no" "int4", "print_publ_date" "date", "site_publ_date" "date" NOT NULL, "site_publ_time" "datetime" NOT NULL, "inputter_id" "int4" NOT NULL, "input_date" "datetime" DEFAULT text 'now', "published" "int4" DEFAULT 0); INSERT INTO "article" ("article_id","section_id","locale_id","article_source_id","volume_id","issu e_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","i nputter_id","input_date","published") values (10,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,'Thu Jan 28 19:28:40 1999 PST',0); INSERT INTO "article" ("article_id","section_id","locale_id","article_source_id","volume_id","issu e_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","i nputter_id","input_date","published") values (11,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,'Thu Jan 28 19:28:40 1999 PST',0); CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree ( "article_id" "int4_ops" ); CREATE INDEX "article_vol_ix" on "article" using btree ( "volume_id" "int4_ops" ); CREATE INDEX "article_source_ix" on "article" using btree ( "article_source_id" "int4_ops" ); CREATE INDEX "article_issue_ix" on "article" using btree ( "issue_id" "int4_ops" ); CREATE INDEX "article_locale_ix" on "article" using btree ( "locale_id" "int4_ops" ); CREATE INDEX "article_section_ix" on "article" using btree ( "section_id" "int4_ops" ); CREATE TABLE "article_text" ("article_id" "int4" NOT NULL, "headline" varchar, "subhead" varchar); INSERT INTO "article_text" ("article_id","headline","subhead") values (10,'Mayor Signs Contract With Company','Legally binding document said to be four pages long'); INSERT INTO "article_text" ("article_id","headline","subhead") values (11,'Mayor Cancels Contract','Company Promises to Sue Over Scuttled Deal'); CREATE INDEX "article_text_ix" on "article_text" using btree ( "article_id" "int4_ops" ); When I run the following query... EXPLAIN SELECT a.article_id, b.article_id FROM article a, article_text b WHERE a.article_id = b.article_id; ... against apx13, it produces: NOTICE: QUERY PLAN: Nested Loop (cost=2.07 size=1 width=8) -> Seq Scan on article a (cost=1.03 size=1 width=4) -> Seq Scan on article_text b (cost=1.03 size=1 width=4) EXPLAIN When I run it against apx14, it produces: NOTICE: QUERY PLAN: Merge Join (cost=0.00 size=1 width=8) -> Seq Scan (cost=0.00 size=0 width=0) -> Sort (cost=0.00 size=0 width=0) -> Seq Scan on article a (cost=0.00 size=0 width=4) -> Seq Scan (cost=0.00 size=0 width=0) -> Sort (cost=0.00 size=0 width=0) -> Seq Scan on article_text b (cost=0.00 size=0 width=4) EXPLAIN In order to further try and track down this problem, I dumped the schema from apx14 and the data from apx13, then created a new database, apx15, by reloading the apx14 schema and the apx13 data. Join queries in the new apx15 database also produce "Merge Join" query plans instead of nested loops. So what's wrong with apx13? Can anyone give me any hints as to how to fix this? Thanks in advance! Charlie (My previous posting, and Vadim's reponse, follow:) At 07:03 PM 1/29/99 +0700, you wrote: >Charles Hornberger wrote: >> >> I've got a Postgres 6.4 DB that is doing, to my mind, funny things. >> >> The main problem is that joins over multiple tables are taking a long, long >> time to execute. >> > >man vacuum > >Vadim > > >> >> >>When I do EXPLAINs on join queries, I find that the optimizer is choosing >>to do a lot of nested loops and sequential scans, instead of Merge Joins or >>something that sounds more efficient. >> >>For instance, here's a EXPLAIN SELECT on a view that joins 7 tables. All >>of the join columns are indexed. >> >>NOTICE: QUERY PLAN: >> >>Nested Loop (cost=0.00 size=1 width=140) >> -> Nested Loop (cost=0.00 size=1 width=124) >> -> Nested Loop (cost=0.00 size=1 width=108) >> -> Nested Loop (cost=0.00 size=1 width=92) >> -> Nested Loop (cost=0.00 size=1 width=76) >> -> Merge Join (cost=0.00 size=1 width=60) >> -> Seq Scan (cost=0.00 size=0 width=0) >> -> Sort (cost=0.00 size=0 width=0) >> -> Seq Scan on article (cost=0.00 size=0 width=32) >> -> Seq Scan (cost=0.00 size=0 width=0) >> -> Sort (cost=0.00 size=0 width=0) >> -> Seq Scan on article_text (cost=0.00 size=0 width=28) >> -> Seq Scan on article_source (cost=0.00 size=0 width=16) >> -> Seq Scan on section (cost=0.00 size=0 width=16) >> -> Seq Scan on locale (cost=0.00 size=0 width=16) >> -> Seq Scan on volume (cost=0.00 size=0 width=16) >> -> Seq Scan on issue (cost=0.00 size=0 width=16) >> >>EXPLAIN >> >>This view was created with: >> >>CREATE VIEW all_articles AS >>SELECT article.article_id, article.print_publ_date, >> article.print_page_no, >> article_text.headline, article_text.subhead, >> article_source.source_name, >> section.section_name, >> locale.locale_name, >> volume.volume_name, >> issue.issue_name >> FROM article, >> article_text, >> article_source, >> section, >> locale, >> volume, >> issue >> WHERE article.article_id = article_text.article_id >> AND article.article_source_id = article_source.source_id >> AND article.section_id = section.section_id >> AND article.locale_id = locale.locale_id >> AND article.volume_id = volume.volume_id >> AND article.issue_id = issue.issue_id ; >> >> >>It takes 5-7 minutes to perform a query on this view, even though there >>are no records in any of >these tables. >> >> >>One more thing I noticed. In trying to figure out what's going wrong, I >>dumped the structure of an existing DB using `pg_dump -s`. Then when I >>tried to create a new DB from the dump file, I got the following error: >> >>CREATE RULE "_RETall_articles" AS ON SELECT TO "all_articles" DO INSTEAD >>SELECT "article_id", "print_publ_date", "print_page_no", "headline", >>"subhead", "source_name", "section_name", "locale_name", "volume_name", >>"issue_name" FROM "article", "article_text", "article_source", "section", >>"locale", "volume", "issue" WHERE ((((("article_id" = "article_id") AND >>("article_source_id" = "source_id")) AND ("section_id" = "section_id")) >>AND ("locale_id" = "locale_id")) AND ("volume_id" = "volume_id")) AND >>("issue_id" = "issue_id"); >>ERROR: Column article_id is ambiguous >> >>It seems that pg_dump isn't handling the SQL correctly. >> >>Thanks in advance for any advice. >> >>Charlie
pgsql-general by date: