Re: [GENERAL] nested loops in joins, ambiguous rewrite rules - Mailing list pgsql-general

From Charles Hornberger
Subject Re: [GENERAL] nested loops in joins, ambiguous rewrite rules
Date
Msg-id 3.0.5.32.19990129145115.00b0ac50@k4azl.net
Whole thread Raw
In response to nested loops in joins, ambiguous rewrite rules  (Charles Hornberger <hornberger@tabloid.net>)
List pgsql-general
Hello! I'm in a bit of a fix, and I bet that someone out there has just the
right answer for me. I've been struggling with this optimization issue for
a couple days now.

The explanation that follows is very detailed ... I include the dumps from
a few versions of
one of my test databases.

It's been suggested to me by others that perhaps there's some kind of bug
in the optimizer, but that's not a diagnois that I could make on my own.
I'd really appreciate a fresh view on this problem.

The most recent suggestion I got from Vadim was to run vacuum analyze
against the databases. I've tried this multiple times, but it has no effect.

What is happening (please forgive my non-technical explanation) is that the
optimizer is occasionally lapsing into these "nested loop" query plans,
rather than using something more efficient.  It doesn't matter whether I
vacuum the DB; it seems to depend on something else entirely. But I can't
figure out what that is.

Here are two identically structured sets of tables from two separate
databases, with slightly different data in them.

This DB is named apx13:
----------------------
CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue
2147483647 minvalue 1  cache 1 ;
SELECT nextval ('article_article_id_seq');
CREATE TABLE "article" ("article_id" "int4" DEFAULT nextval (
'article_article_id_seq' ) NOT NULL, "section_id" "int4" NOT NULL,
"locale_id" "int4" NOT NULL, "article_source_id" "int4", "volume_id"
"int4", "issue_id" "int4", "print_page_no" "int4", "print_publ_date"
"date", "site_publ_date" "date" NOT NULL, "site_publ_time" "datetime" NOT
NULL, "inputter_id" "int4" NOT NULL, "input_date" "datetime" DEFAULT text
'now', "published" "int4" DEFAULT 0);
INSERT INTO "article"
("article_id","section_id","locale_id","article_source_id","volume_id","issu
e_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","i
nputter_id","input_date","published") values
(10,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999
PST',100,'Thu Jan 28 19:28:40 1999 PST',0);
CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree (
"article_id" "int4_ops" );
CREATE  INDEX "article_vol_ix" on "article" using btree ( "volume_id"
"int4_ops" );
CREATE  INDEX "article_source_ix" on "article" using btree (
"article_source_id" "int4_ops" );
CREATE  INDEX "article_issue_ix" on "article" using btree ( "issue_id"
"int4_ops" );
CREATE  INDEX "article_locale_ix" on "article" using btree ( "locale_id"
"int4_ops" );
CREATE  INDEX "article_section_ix" on "article" using btree ( "section_id"
"int4_ops" );
CREATE TABLE "article_text" ("article_id" "int4" NOT NULL, "headline"
varchar, "subhead" varchar);
INSERT INTO "article_text" ("article_id","headline","subhead") values
(10,'Mayor Signs Contract With Company','Legally binding document said to
be four pages long');


Here's the same thing, but with a couple extra rows in the 'article' and
'article_text' tables

This one is named apx14
-----------------------
CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue
2147483647 minvalue 1  cache 1 ;
SELECT nextval ('article_article_id_seq');
CREATE TABLE "article" ("article_id" "int4" DEFAULT nextval (
'article_article_id_seq' ) NOT NULL, "section_id" "int4" NOT NULL,
"locale_id" "int4" NOT NULL, "article_source_id" "int4", "volume_id"
"int4", "issue_id" "int4", "print_page_no" "int4", "print_publ_date"
"date", "site_publ_date" "date" NOT NULL, "site_publ_time" "datetime" NOT
NULL, "inputter_id" "int4" NOT NULL, "input_date" "datetime" DEFAULT text
'now', "published" "int4" DEFAULT 0);
INSERT INTO "article"
("article_id","section_id","locale_id","article_source_id","volume_id","issu
e_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","i
nputter_id","input_date","published") values
(10,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999
PST',100,'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article"
("article_id","section_id","locale_id","article_source_id","volume_id","issu
e_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","i
nputter_id","input_date","published") values
(11,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999
PST',100,'Thu Jan 28 19:28:40 1999 PST',0);
CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree (
"article_id" "int4_ops" );
CREATE  INDEX "article_vol_ix" on "article" using btree ( "volume_id"
"int4_ops" );
CREATE  INDEX "article_source_ix" on "article" using btree (
"article_source_id" "int4_ops" );
CREATE  INDEX "article_issue_ix" on "article" using btree ( "issue_id"
"int4_ops" );
CREATE  INDEX "article_locale_ix" on "article" using btree ( "locale_id"
"int4_ops" );
CREATE  INDEX "article_section_ix" on "article" using btree ( "section_id"
"int4_ops" );
CREATE TABLE "article_text" ("article_id" "int4" NOT NULL, "headline"
varchar, "subhead" varchar);
INSERT INTO "article_text" ("article_id","headline","subhead") values
(10,'Mayor Signs Contract With Company','Legally binding document said to
be four pages long');
INSERT INTO "article_text" ("article_id","headline","subhead") values
(11,'Mayor Cancels Contract','Company Promises to Sue Over Scuttled Deal');
CREATE  INDEX "article_text_ix" on "article_text" using btree (
"article_id" "int4_ops" );


When I run the following query...

    EXPLAIN SELECT a.article_id, b.article_id
       FROM article a, article_text b
      WHERE a.article_id = b.article_id;

... against apx13, it produces:

NOTICE: QUERY PLAN:

Nested Loop  (cost=2.07 size=1 width=8)
  ->  Seq Scan on article a  (cost=1.03 size=1 width=4)
  ->  Seq Scan on article_text b  (cost=1.03 size=1 width=4)

EXPLAIN


When I run it against apx14, it produces:

NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00 size=1 width=8)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on article a  (cost=0.00 size=0 width=4)
  ->  Seq Scan  (cost=0.00 size=0 width=0)
        ->  Sort  (cost=0.00 size=0 width=0)
              ->  Seq Scan on article_text b  (cost=0.00 size=0 width=4)

EXPLAIN

In order to further try and track down this problem, I dumped the schema
from apx14 and the data from apx13, then created a new database, apx15, by
reloading the apx14 schema and the apx13 data.  Join queries in the new
apx15 database also produce "Merge Join" query plans instead of nested loops.

So what's wrong with apx13?

Can anyone give me any hints as to how to fix this?

Thanks in advance!

Charlie

(My previous posting, and Vadim's reponse, follow:)

At 07:03 PM 1/29/99 +0700, you wrote:
>Charles Hornberger wrote:
>>
>> I've got a Postgres 6.4 DB that is doing, to my mind, funny things.
>>
>> The main problem is that joins over multiple tables are taking a long, long
>> time to execute.
>>
>
>man vacuum
>
>Vadim
>
>
>>
>>
>>When I do EXPLAINs on join queries, I find that the optimizer is choosing
>>to do a lot of nested loops and sequential scans, instead of Merge Joins or
>>something that sounds more efficient.
>>
>>For instance, here's a EXPLAIN SELECT on a view that joins 7 tables.  All
>>of the join columns are indexed.
>>
>>NOTICE:  QUERY PLAN:
>>
>>Nested Loop  (cost=0.00 size=1 width=140)
>>  ->  Nested Loop  (cost=0.00 size=1 width=124)
>>        ->  Nested Loop  (cost=0.00 size=1 width=108)
>>              ->  Nested Loop  (cost=0.00 size=1 width=92)
>>                    ->  Nested Loop  (cost=0.00 size=1 width=76)
>>                         ->  Merge Join  (cost=0.00 size=1 width=60)
>>                                ->  Seq Scan  (cost=0.00 size=0 width=0)
>>                                      ->  Sort  (cost=0.00 size=0 width=0)
>>                                            ->  Seq Scan on article
(cost=0.00 size=0 width=32)
>>                                ->  Seq Scan  (cost=0.00 size=0 width=0)
>>                                      ->  Sort  (cost=0.00 size=0 width=0)
>>                                            ->  Seq Scan on article_text
(cost=0.00 size=0 width=28)
>>                          ->  Seq Scan on article_source  (cost=0.00
size=0 width=16)
>>                    ->  Seq Scan on section  (cost=0.00 size=0 width=16)
>>              ->  Seq Scan on locale  (cost=0.00 size=0 width=16)
>>        ->  Seq Scan on volume  (cost=0.00 size=0 width=16)
>>  ->  Seq Scan on issue  (cost=0.00 size=0 width=16)
>>
>>EXPLAIN
>>
>>This view was created with:
>>
>>CREATE VIEW all_articles AS
>>SELECT  article.article_id, article.print_publ_date,
>>        article.print_page_no,
>>        article_text.headline, article_text.subhead,
>>        article_source.source_name,
>>        section.section_name,
>>        locale.locale_name,
>>        volume.volume_name,
>>        issue.issue_name
>>   FROM article,
>>        article_text,
>>        article_source,
>>        section,
>>        locale,
>>        volume,
>>        issue
>>  WHERE article.article_id = article_text.article_id
>>    AND article.article_source_id = article_source.source_id
>>    AND article.section_id = section.section_id
>>    AND article.locale_id = locale.locale_id
>>    AND article.volume_id = volume.volume_id
>>    AND article.issue_id = issue.issue_id ;
>>
>>
>>It takes 5-7 minutes to perform a query on this view, even though there
>>are no records in any of >these tables.
>>
>>
>>One more thing I noticed.  In trying to figure out what's going wrong, I
>>dumped the structure of an existing DB using `pg_dump -s`.  Then when I
>>tried to create a new DB from the dump file, I got the following error:
>>
>>CREATE RULE "_RETall_articles" AS ON SELECT TO "all_articles" DO INSTEAD
>>SELECT "article_id", "print_publ_date", "print_page_no", "headline",
>>"subhead", "source_name", "section_name", "locale_name", "volume_name",
>>"issue_name" FROM "article", "article_text", "article_source", "section",
>>"locale", "volume", "issue" WHERE ((((("article_id" = "article_id") AND
>>("article_source_id" = "source_id")) AND ("section_id" = "section_id"))
>>AND ("locale_id" = "locale_id")) AND ("volume_id" = "volume_id")) AND
>>("issue_id" = "issue_id");
>>ERROR:  Column article_id is ambiguous
>>
>>It seems that pg_dump isn't handling the SQL correctly.
>>
>>Thanks in advance for any advice.
>>
>>Charlie


pgsql-general by date:

Previous
From: Mateus Cordeiro Inssa
Date:
Subject: [ADMIN] Duplicate Data/RAID-1/postgres 6.4.2
Next
From: Al Dev
Date:
Subject: readline lib: command line history editing - Is it needed?