Thread: Re: [GENERAL] nested loops in joins, ambiguous rewrite rules

From: Charles Hornberger
Date:
Hello.  I just sent this note to the pgsql-general list, but I thought it
probably deserved to land on the hackers list as well.  My apologies if I'm
wrong.

Charlie
-----------------------------------------

X-Sender: hornberger@tabloid.net (Unverified)
Date: Fri, 29 Jan 1999 18:00:11 -0800
To: pgsql-general@postgreSQL.org
From: Charles Hornberger <hornberger@tabloid.net>
Subject: Re: [GENERAL] nested loops in joins, ambiguous rewrite rules
Sender: owner-pgsql-general@postgreSQL.org

Hello again.  I've got a few more notes on this "strange" optimizer
behavior, which might be useful to anyone interested in this problem:

As I mentioned in my last posting, I have one DB called apx13 that is
producing expensive "nested loop" query plans when I try to join two
tables, and one called apx14 that's using "merge join" query plans.  I've
been trying to figure out why this is.  (This is all happening under 6.4.
We haven't upgraded to 6.4.2 yet but I don't see anything in the 6.4.2
HISTORY file that suggests that major changes have been made to the
optimizer, so I guess that probably wouldn't change anything?)

In any case, I dumped the entire contents of the apx14 database using `pg_dump
-D apx14 > apx14.out`.  Then I created a new database called apx15 and ran
`psql -f apx14.out apx15`.

Then I ran a join query against apx15:
 apx15=> \i ../query
 explain select a.article_id, b.article_id
   from article a, article_text b
   where a.article_id = b.article_id;
 NOTICE:  QUERY PLAN:

 Nested Loop  (cost=3.20 size=2 width=8)
   ->  Seq Scan on article a  (cost=1.07 size=2 width=4)
   ->  Seq Scan on article_text b  (cost=1.07 size=2 width=4)

 EXPLAIN

So I ran vacuum analyze and did it again.  And got the same query plan.

At that point, I decided to create yet another DB, apx16.  I just used the
schema from apx14, and the contents of the two tables being used in my join
query.

In other words:
`pg_dump -s apx14 > apx14.schema.out`
`pg_dump -aDt article apx14 > apx14.article.out`
`pg_dump -aDt article_text apx14 > apx14.article_text.out`
`psql -f apx14.schema.out apx16`
`psql -f apx14.article.out apx16`
`psql -f apx14.article_text.out apx16`


Then I ran the same join query against apx16, and this time the optimizer
decided to use a "merge join":
 apx16=> \i ../query
 explain select a.article_id, b.article_id
   from article a, article_text b
   where a.article_id = b.article_id;
 NOTICE:  QUERY PLAN:

 Merge Join  (cost=0.00 size=1 width=8)
   ->  Seq Scan  (cost=0.00 size=0 width=0)
         ->  Sort  (cost=0.00 size=0 width=0)
               ->  Seq Scan on article a  (cost=0.00 size=0 width=4)
   ->  Seq Scan  (cost=0.00 size=0 width=0)
         ->  Sort  (cost=0.00 size=0 width=0)
               ->  Seq Scan on article_text b  (cost=0.00 size=0 width=4)

 EXPLAIN

Maybe I'm missing something, but I can't understand why two identical join
queries run against two identical sets of tables should produce different
query plans.  Maybe there's something in the way my tables and indices are
set up, but I can't imagine what that would be.
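For what it's worth, here is a toy cost model (invented constants, NOT the
actual 6.4 planner arithmetic) showing how the planner's *estimated* row
counts alone can flip which join strategy looks cheaper.  If the two
databases carry different statistics, the same query over the same data
could well get different plans:

```python
import math

# Toy join cost model -- invented constants, not the real planner.
def nested_loop_cost(outer_rows, inner_rows):
    # Scan the outer table once, then rescan the inner table
    # once per outer row.
    return outer_rows + outer_rows * inner_rows

def sort_cost(n):
    # n log n comparison sort, with a small floor for tiny inputs.
    return n * math.log2(n) if n > 1 else 1.0

def merge_join_cost(outer_rows, inner_rows):
    # Sort both inputs, then make a single merge pass over each.
    return (sort_cost(outer_rows) + sort_cost(inner_rows)
            + outer_rows + inner_rows)

# With tiny estimated tables the nested loop looks cheaper ...
print(nested_loop_cost(2, 2), merge_join_cost(2, 2))        # 6 vs 8.0
# ... but with larger estimates the merge join wins easily.
print(nested_loop_cost(1000, 1000) > merge_join_cost(1000, 1000))  # True
```

The point is only that plan choice is driven by estimated sizes, so two
"identical" databases can diverge if their statistics differ.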

One final note: I just did another test, hoping to exclude other factors
from the equation.  I created two more databases using only the two
relevant tables (and sequences & indices) from apx16. What I discovered is
that if I build a database from "full" dump files (i.e., dump files that
include both table schema & data), the optimizer will always elect to use
"nested loops" for its query plans.  On the other hand, if I build the
database by first creating the tables and indices, and then doing the
inserts in a second operation, it uses the less costly "merge join".  As a
matter of fact, it appears that if you create an index on a table *after*
inserting data into that table, joins performed on that table will *always*
use these expensive nested loops.

So my theory is:

Since pg_dump creates dump files that perform INSERTs on tables before
performing any CREATE INDEX statements on tables, using raw dump files to
create databases will cause this "nested loop" behavior.
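If that theory holds, one way to test it might be to reorder the dump file
so the CREATE INDEX statements run before any INSERTs.  Here's a rough
sketch of such a reordering (it assumes every statement fits on a single
line, which real dump files do not guarantee, so treat it as an
illustration rather than a robust tool):

```python
# Rough sketch: push the INSERT statements of a `pg_dump -D` style dump
# to the end, so CREATE TABLE and CREATE INDEX run first.
# Assumes one statement per line -- real dumps need proper parsing.

def reorder_dump(lines):
    inserts, others = [], []
    for line in lines:
        if line.lstrip().upper().startswith("INSERT"):
            inserts.append(line)
        else:
            others.append(line)
    return others + inserts

dump = [
    'CREATE TABLE "t" ("id" "int4");',
    'INSERT INTO "t" ("id") values (1);',
    'CREATE INDEX "t_ix" on "t" using btree ( "id" "int4_ops" );',
]
for stmt in reorder_dump(dump):
    print(stmt)
```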

And there's another catch: If you create your tables and indices first,
then insert your data, the optimizer will continue to build "merge join"
query plans until you make the mistake of running a "vacuum analyze".  If
you run "vacuum analyze", the optimizer will then decide to begin using the
more costly nested loops to handle any join queries.  The only way to get
around this is to delete all records from the tables being joined,
re-insert them and remember NOT to run vacuum analyze.

If anyone wants to replicate this, I've documented my tests below.

(And, of course, if I've totally missed the point on anything in here,
please correct me.)

===============
Table structure
===============

apx16=> \d article
Table    = article
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| article_id                       | int4 not null default nextval (  |     4 |
| section_id                       | int4 not null                    |     4 |
| locale_id                        | int4 not null                    |     4 |
| article_source_id                | int4                             |     4 |
| volume_id                        | int4                             |     4 |
| issue_id                         | int4                             |     4 |
| print_page_no                    | int4                             |     4 |
| print_publ_date                  | date                             |     4 |
| site_publ_date                   | date not null                    |     4 |
| site_publ_time                   | datetime not null                |     8 |
| inputter_id                      | int4 not null                    |     4 |
| input_date                       | datetime default text 'now'      |     8 |
| published                        | int4 default 0                   |     4 |
+----------------------------------+----------------------------------+-------+
Indices:  article_article_id_key
          article_issue_ix
          article_locale_ix
          article_section_ix
          article_source_ix
          article_vol_ix


apx16=> \d article_article_id_key

Table    = article_article_id_key
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| article_id                       | int4                             |     4 |
+----------------------------------+----------------------------------+-------+

apx16=> \d article_text

Table    = article_text
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| article_id                       | int4 not null                    |     4 |
| headline                         | varchar()                        |     0 |
| subhead                          | varchar()                        |     0 |
+----------------------------------+----------------------------------+-------+
Index:    article_text_ix

apx16=> \d article_text_ix

Table    = article_text_ix
+----------------------------------+----------------------------------+-------+
|              Field               |              Type                | Length|
+----------------------------------+----------------------------------+-------+
| article_id                       | int4                             |     4 |
+----------------------------------+----------------------------------+-------+



=================
CREATE STATEMENTS
=================

CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue
2147483647 minvalue 1  cache 1 ;
SELECT nextval ('article_article_id_seq');
CREATE TABLE "article" ("article_id" "int4" DEFAULT nextval (
'article_article_id_seq' ) NOT NULL, "section_id" "int4" NOT NULL,
"locale_id" "int4" NOT NULL, "article_source_id" "int4", "volume_id"
"int4", "issue_id" "int4", "print_page_no" "int4", "print_publ_date"
"date", "site_publ_date" "date" NOT NULL, "site_publ_time" "datetime" NOT
NULL, "inputter_id" "int4" NOT NULL, "input_date" "datetime" DEFAULT text
'now', "published" "int4" DEFAULT 0);
CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree (
"article_id" "int4_ops" );
CREATE  INDEX "article_vol_ix" on "article" using btree ( "volume_id"
"int4_ops" );
CREATE  INDEX "article_source_ix" on "article" using btree (
"article_source_id" "int4_ops" );
CREATE  INDEX "article_issue_ix" on "article" using btree ( "issue_id"
"int4_ops" );
CREATE  INDEX "article_locale_ix" on "article" using btree ( "locale_id"
"int4_ops" );
CREATE  INDEX "article_section_ix" on "article" using btree ( "section_id"
"int4_ops" );
CREATE TABLE "article_text" ("article_id" "int4" NOT NULL, "headline"
varchar, "subhead" varchar);
CREATE  INDEX "article_text_ix" on "article_text" using btree (
"article_id" "int4_ops" );


=================
INSERT STATEMENTS
=================

INSERT INTO "article" ("article_id","section_id","locale_id",
  "article_source_id","volume_id","issue_id","print_page_no",
  "print_publ_date","site_publ_date","site_publ_time","inputter_id",
  "input_date","published") values (10,3,1,4,2,3,4,'04-05-2006',
  '01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,
  'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article" ("article_id","section_id","locale_id",
  "article_source_id","volume_id","issue_id","print_page_no",
  "print_publ_date","site_publ_date","site_publ_time","inputter_id",
  "input_date","published") values (11,3,1,4,2,3,4,'04-05-2006',
  '01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,
  'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article_text" ("article_id","headline","subhead") values
  (10,'Mayor Signs Contract With Company','Legally binding document said to be four pages long');
INSERT INTO "article_text" ("article_id","headline","subhead") values
  (11,'Mayor Cancels Contract','Company Promises to Sue Over Scuttled Deal');

==================
Steps to replicate
==================

Vacuum analyze 'problem'
------------------------
1. Create a fresh database and execute the CREATE statements above
2. Execute the INSERT statements above
3. Execute the following query:
     EXPLAIN SELECT a.article_id, b.article_id
     FROM article a, article_text b
     WHERE a.article_id = b.article_id;

4. You should see:
   NOTICE:  QUERY PLAN:
   Merge Join  (cost=0.00 size=1 width=8)
     ->  Seq Scan  (cost=0.00 size=0 width=0)
           ->  Sort  (cost=0.00 size=0 width=0)
                 ->  Seq Scan on article a  (cost=0.00 size=0 width=4)
     ->  Seq Scan  (cost=0.00 size=0 width=0)
           ->  Sort  (cost=0.00 size=0 width=0)
                 ->  Seq Scan on article_text b  (cost=0.00 size=0 width=4)

   EXPLAIN

5. Do "vacuum analyze".
6. Execute the query again.
7. Now you'll see:
   NOTICE:  QUERY PLAN:
   Nested Loop  (cost=3.20 size=2 width=8)
     ->  Seq Scan on article a  (cost=1.07 size=2 width=4)
     ->  Seq Scan on article_text b  (cost=1.07 size=2 width=4)

   EXPLAIN


pg_dump 'problem'
-----------------

1. Dump the DB you created above to a file with `pg_dump -D dbname > foo`

2. Create a new DB

3. Do `psql -f foo newdb`

4. Execute the join query:
     EXPLAIN SELECT a.article_id, b.article_id
     FROM article a, article_text b
     WHERE a.article_id = b.article_id;
 

5. You should see:
   NOTICE:  QUERY PLAN:
   Nested Loop  (cost=3.20 size=2 width=8)
     ->  Seq Scan on article a  (cost=1.07 size=2 width=4)
     ->  Seq Scan on article_text b  (cost=1.07 size=2 width=4)

   EXPLAIN