Re: [HACKERS] nested loops in joins, ambiguous rewrite rules - Mailing list pgsql-hackers

From Charles Hornberger
Subject Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
Date
Msg-id 3.0.5.32.19990130183504.00acc300@k4azl.net
In response to Re: [HACKERS] nested loops in joins, ambiguous rewrite rules  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
List pgsql-hackers
At 04:07 PM 1/30/99 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>First, you're assuming that a merge-join plan is necessarily better than
>a nested-loop plan.  That should be true for large queries, but it is
>*not* necessarily true for small tables --- when there are only a few
>tuples in the tables being scanned, a simple nested loop wins because it
>has much less startup overhead.  (Or at least that's what our optimizer
>thinks; I have not tried to measure this for myself.)

OK, I understand that I don't understand whether merge-join plans are
necessarily better than nested-loop plans, and that it can make sense to
pick one or the other depending on the size of the tables and the number of
rows in them.  Also, your explanation of how 'vacuum analyze' updates the
statistics in pg_class and pg_statistic makes it very clear why I'm seeing
one query plan in one DB and a different plan in the other.  Thanks for the
quick lesson, and my apologies for making it happen on the hackers list.
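
For anyone following along, this is roughly the sort of thing I'm running to
look at those statistics after a 'vacuum analyze' (just a sketch; article is
one of the tables from the dump at the end of this message):

  VACUUM ANALYZE;
  SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relname = 'article';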

But let's leave that aside for the moment and address the underlying
question:  Why does it take ~11.5 minutes to process the following query?
It performs a simple ANDed join on seven tables.  All of the columns named
in the WHERE clauses are indexed, and every table except article and
article_text contains just a single row; article and article_text each
contain two rows.

EXPLAIN SELECT a.article_id, a.print_publ_date, b.section_name,
               c.source_name, d.headline, e.issue_name,
               f.locale_name, g.volume_name
          FROM article a, section b, article_source c, article_text d,
               issue e, locale f, volume g
         WHERE a.article_id = d.article_id
           AND a.section_id = b.section_id
           AND a.article_source_id = c.source_id
           AND a.issue_id = e.issue_id
           AND a.locale_id = f.locale_id
           AND a.volume_id = g.volume_id;

Let me explain each of the WHERE clauses:
 1) a.article_id has a unique index, d.article_id has a non-unique index
 2) a.section_id has a non-unique index, b.section_id has a unique index
 3) a.article_source_id has a non-unique index, c.source_id has a unique index
 4) a.issue_id has a non-unique index, e.issue_id has a unique index
 5) a.locale_id has a non-unique index, f.locale_id has a unique index
 6) a.volume_id has a non-unique index, g.volume_id has a unique index
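
If anyone wants to double-check those indexing claims against the dump at the
end of this message, a quick sketch like this lists the indexes and the tuple
counts the optimizer has recorded for them:

  SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relkind = 'i'
   ORDER BY relname;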
 


This is being done under Postgres 6.4.2 (we upgraded from 6.4 last night)
on a Pentium 150 with 192MB of RAM running Linux 2.0.35.

Here's the query plan it generated:

NOTICE:  QUERY PLAN:

Nested Loop  (cost=12.49 size=2 width=124)
  ->  Nested Loop  (cost=10.43 size=2 width=108)
        ->  Nested Loop  (cost=8.36 size=2 width=92)
              ->  Nested Loop  (cost=6.30 size=2 width=76)
                    ->  Nested Loop  (cost=4.16 size=2 width=60)
                          ->  Nested Loop  (cost=2.10 size=2 width=44)
                                ->  Seq Scan on locale f  (cost=1.03 size=1 width=16)
                                ->  Seq Scan on article a  (cost=1.07 size=2 width=28)
                          ->  Seq Scan on issue e  (cost=1.03 size=1 width=16)
                    ->  Seq Scan on article_text d  (cost=1.07 size=2 width=16)
              ->  Seq Scan on article_source c  (cost=1.03 size=1 width=16)
        ->  Seq Scan on section b  (cost=1.03 size=1 width=16)
  ->  Seq Scan on volume g  (cost=1.03 size=1 width=16)


>At any rate, if you still think there's something flaky going on,
>please have a look at what's happening in these statistical fields;
>that'll give us more info about whether there's really a bug or not.
>
>Also, if the optimizer still wants to use nested loop when it knows
>there are a *lot* of tuples to be processed, there might be a bug in
>its equations for the cost of these operations --- how large are your
>tables?
>

The tables are tiny.

Here are some statistics from the DB.

apx00=> select * from pg_statistic;
starelid|staattnum|staop|stalokey                                  |stahikey
--------+---------+-----+------------------------------------------+----------------------------------------------------
  407828|        4|    0|4                                         |4
  407828|        5|    0|2                                         |2
  407828|        6|    0|3                                         |3
  407828|        7|    0|4                                         |4
  407828|        8|    0|04-05-2006                                |04-05-2006
  407828|        9|    0|01-28-1999                                |01-28-1999
  407828|       10|    0|Thu Jan 28 19:28:40 1999 PST              |Thu Jan 28 19:28:40 1999 PST
  407828|       11|    0|100                                       |100
  407828|       12|    0|Thu Jan 28 19:28:40 1999 PST              |Thu Jan 28 19:28:40 1999 PST
  407828|       13|    0|0                                         |0
  407828|        1|    0|10                                        |11
  407828|        2|    0|3                                         |3
  407828|        3|    0|1                                         |1
  407852|        1|    0|10                                        |11
  407852|        2|    0|Mayor Cancels Contract                    |Mayor Signs Contract With Company
  407852|        3|    0|Company Promises to Sue Over Scuttled Deal|Legally binding document said to be four pages long
  407863|        1|    0|3                                         |3
  407863|        2|    0|News                                      |News
  407874|        1|    0|1                                         |1
  407874|        2|    0|Downtown                                  |Downtown
  407885|        1|    0|4                                         |4
  407885|        2|    0|The Times                                 |The Times
  407896|        1|    0|2                                         |2
  407896|        2|    0|2                                         |2
  407907|        1|    0|3                                         |3
  407907|        2|    0|25                                        |25
  407907|        3|    0|04-05-2006                                |04-05-2006
(27 rows)

apx00=> select oid,* from pg_class where relowner <> 505;
   oid|relname                     |reltype|relowner|relam|relpages|reltuples|relhasindex|relisshared|relkind|relnatts|relchecks|reltriggers|relukeys|relfkeys|relrefs|relhaspkey|relhasrules|relacl
------+----------------------------+-------+--------+-----+--------+---------+-----------+-----------+-------+--------+---------+-----------+--------+--------+-------+----------+-----------+------
407923|article_vol_ix              |      0|     508|  403|       2|        2|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407926|article_source_ix           |      0|     508|  403|       2|        2|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407929|article_issue_ix            |      0|     508|  403|       2|        2|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407932|article_locale_ix           |      0|     508|  403|       2|        2|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407726|article_article_id_seq      |      0|     508|    0|       0|        0|f          |f          |S      |       8|        0|          0|       0|       0|      0|f         |f          |
407743|section_section_id_seq      |      0|     508|    0|       0|        0|f          |f          |S      |       8|        0|          0|       0|       0|      0|f         |f          |
407760|locale_locale_id_seq        |      0|     508|    0|       0|        0|f          |f          |S      |       8|        0|          0|       0|       0|      0|f         |f          |
407777|article_source_source_id_seq|      0|     508|    0|       0|        0|f          |f          |S      |       8|        0|          0|       0|       0|      0|f         |f          |
407794|volume_volume_id_seq        |      0|     508|    0|       0|        0|f          |f          |S      |       8|        0|          0|       0|       0|      0|f         |f          |
407811|issue_issue_id_seq          |      0|     508|    0|       0|        0|f          |f          |S      |       8|        0|          0|       0|       0|      0|f         |f          |
407828|article                     |      0|     508|    0|       1|        2|t          |f          |r      |      13|        0|          0|       0|       0|      0|f         |f          |
407935|article_section_ix          |      0|     508|  403|       2|        2|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407852|article_text                |      0|     508|    0|       1|        2|t          |f          |r      |       3|        0|          0|       0|       0|      0|f         |f          |
407938|article_text_ix             |      0|     508|  403|       2|        2|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407863|section                     |      0|     508|    0|       1|        1|t          |f          |r      |       2|        0|          0|       0|       0|      0|f         |f          |
407941|section_section_id_key      |      0|     508|  403|       2|        1|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407874|locale                      |      0|     508|    0|       1|        1|t          |f          |r      |       2|        0|          0|       0|       0|      0|f         |f          |
407944|locale_locale_id_key        |      0|     508|  403|       2|        1|f          |f          |i      |       1|        0|
407885|article_source              |      0|     508|    0|       1|        1|t          |f          |r      |       2|        0|          0|       0|       0|      0|f         |f          |
407920|article_article_id_key      |      0|     508|  403|       2|        2|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407947|article_source_source_id_key|      0|     508|  403|       2|        1|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407896|volume                      |      0|     508|    0|       1|        1|t          |f          |r      |       2|        0|          0|       0|       0|      0|f         |f          |
407950|volume_volume_id_key        |      0|     508|  403|       2|        1|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
407907|issue                       |      0|     508|    0|       1|        1|t          |f          |r      |       3|        0|          0|       0|       0|      0|f         |f          |
407953|issue_issue_id_key          |      0|     508|  403|       2|        1|f          |f          |i      |       1|        0|          0|       0|       0|      0|f         |f          |
(25 rows)
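
In case it helps with reading the pg_statistic output above, a join along
these lines (just a sketch) maps each starelid back to its table name:

  SELECT c.relname, s.staattnum, s.stalokey, s.stahikey
    FROM pg_statistic s, pg_class c
   WHERE s.starelid = c.oid
   ORDER BY c.relname, s.staattnum;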


I've appended a dump of this database to the end of this e-mail, if anyone
else wants to give it a shot.
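
For anyone who wants to try it, something along these lines should do (the
file and database names here are just placeholders):

  -- save the appended CREATE/INSERT statements to a file, then:
  --   createdb apxtest
  --   psql apxtest < dump.sql
  -- and, inside psql, refresh the statistics and re-run the EXPLAIN shown above:
  VACUUM ANALYZE;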

(And finally, a note of apology.  I think my last e-mail to the list was
posted three times; I got two BOUNCE messages back from the list, and so I
resent the message each time.  I think the "From" header in my e-mails was
wrong, but apparently it wasn't wrong enough to actually stop the message
from making its way onto the list; it was just wrong enough to generate a
BOUNCE.)

Thanks,
Charlie

>What's really going on here is that when the optimizer *knows* how small
>your test tables are, it deliberately chooses nested-loop as being the
>fastest thing.  If it doesn't know, it makes some default assumptions
>about the sizes of the tables, and with those default sizes it decides
>that merge-join will be cheaper.
>
>So the next question is why apparently similar database situations yield
>different states of optimizer knowledge.  The answer is that creating
>an index before inserting tuples and creating it afterwards have
>different side effects on the optimizer's statistics.
>
>I've only spent about ten minutes looking into this, so my understanding
>is no doubt incomplete, but what I've found out is:
>  1. There are a couple of statistical fields in the system pg_class
>     table, namely relpages and reltuples (number of disk pages and
>     tuples in each relation).
>  2. There are per-attribute statistics kept in the pg_statistic table.
>  3. The pg_class statistics fields are updated by a VACUUM (with or
>     without ANALYZE) *and also by CREATE INDEX*.  Possibly by other
>     things too ... but plain INSERTs and so forth don't update 'em.
>  4. The pg_statistics info only seems to be updated by VACUUM ANALYZE.
>
>So if you do
>    CREATE TABLE
>    CREATE INDEX
>    insert tuples
>then the state of play is that the optimizer thinks the table is empty.
>(Or, perhaps, it makes its default assumption --- pg_class doesn't seem
>to have any special representation for "size of table unknown" as
>opposed to "size of table is zero", so maybe the optimizer treats
>reltuples = 0 as meaning it should use a default table size.  I haven't
>looked at that part of the code to find out.)
>
>But if you do
>    CREATE TABLE
>    insert tuples
>    CREATE INDEX
>the state of play is that the optimizer thinks there are as many tuples
>in the table as there were when you created the index.  This explains
>the varying behavior in your detailed test case.
>
>I'll bet that if you investigate the contents of pg_class and
>pg_statistic in the two databases you were originally working with,
>you'll find that they are different.  But a VACUUM ANALYZE should
>bring everything into sync, assuming that the actual data in the
>databases is the same.
>
>At any rate, if you still think there's something flaky going on,
>please have a look at what's happening in these statistical fields;
>that'll give us more info about whether there's really a bug or not.
>
>Also, if the optimizer still wants to use nested loop when it knows
>there are a *lot* of tuples to be processed, there might be a bug in
>its equations for the cost of these operations --- how large are your
>tables?
>
>            regards, tom lane
>
>
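
To restate the create-index-ordering point above in SQL, here's a minimal
sketch (throwaway table names, not part of the dump below):

  -- Ordering 1: index created before the rows are inserted.
  -- pg_class.reltuples keeps its creation-time value until something
  -- (VACUUM, CREATE INDEX, ...) updates it, so the optimizer is guessing.
  CREATE TABLE t1 ("id" int4);
  CREATE UNIQUE INDEX "t1_id_key" on "t1" using btree ( "id" "int4_ops" );
  INSERT INTO t1 VALUES (1);

  -- Ordering 2: index created after the rows are inserted.
  -- CREATE INDEX records the row count it saw, so the optimizer knows the table is tiny.
  CREATE TABLE t2 ("id" int4);
  INSERT INTO t2 VALUES (1);
  CREATE UNIQUE INDEX "t2_id_key" on "t2" using btree ( "id" "int4_ops" );

  -- In either case, VACUUM ANALYZE brings pg_class and pg_statistic up to date.
  VACUUM ANALYZE;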


CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue
2147483647 minvalue 1  cache 1 ;
CREATE SEQUENCE "section_section_id_seq" start 7 increment 1 maxvalue
2147483647 minvalue 1  cache 1 ;
CREATE SEQUENCE "locale_locale_id_seq" start 3 increment 1 maxvalue
2147483647 minvalue 1  cache 1 ;
CREATE SEQUENCE "article_source_source_id_seq" start 4 increment 1 maxvalue
2147483647 minvalue 1  cache 1 ;
CREATE SEQUENCE "volume_volume_id_seq" start 2 increment 1 maxvalue
2147483647 minvalue 1  cache 1 ;
CREATE SEQUENCE "issue_issue_id_seq" start 3 increment 1 maxvalue
2147483647 minvalue 1  cache 1 ;
CREATE TABLE "article" (       "article_id" int4 DEFAULT nextval ( 'article_article_id_seq' ) NOT
NULL,       "section_id" int4 NOT NULL,       "locale_id" int4 NOT NULL,       "article_source_id" int4,
"volume_id"int4,       "issue_id" int4,       "print_page_no" int4,       "print_publ_date" date,
"site_publ_date"date NOT NULL,       "site_publ_time" datetime NOT NULL,       "inputter_id" int4 NOT NULL,
"input_date"datetime DEFAULT text 'now',       "published" int4 DEFAULT 0);
 
CREATE TABLE "article_text" (       "article_id" int4 NOT NULL,       "headline" character varying,       "subhead"
charactervarying);
 
CREATE TABLE "section" (       "section_id" int4 DEFAULT nextval ( 'section_section_id_seq' ) NOT
NULL,       "section_name" character varying NOT NULL);
CREATE TABLE "locale" (       "locale_id" int4 DEFAULT nextval ( 'locale_locale_id_seq' ) NOT NULL,       "locale_name"
charactervarying NOT NULL);
 
CREATE TABLE "article_source" (       "source_id" int4 DEFAULT nextval ( 'article_source_source_id_seq' )
NOT NULL,       "source_name" character varying NOT NULL);
CREATE TABLE "volume" (       "volume_id" int4 DEFAULT nextval ( 'volume_volume_id_seq' ) NOT NULL,       "volume_name"
charactervarying);
 
CREATE TABLE "issue" (       "issue_id" int4 DEFAULT nextval ( 'issue_issue_id_seq' ) NOT NULL,       "issue_name"
charactervarying,       "issue_date" date NOT NULL);
 
INSERT INTO "article"
("article_id","section_id","locale_id","article_source_id","volume_id","issu
e_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","i
nputter_id","input_date","published") values
(10,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999
PST',100,'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article"
("article_id","section_id","locale_id","article_source_id","volume_id","issu
e_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","i
nputter_id","input_date","published") values
(11,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999
PST',100,'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article_text" ("article_id","headline","subhead") values
(10,'Mayor Signs Contract With Company','Legally binding document said to
be four pages long');
INSERT INTO "article_text" ("article_id","headline","subhead") values
(11,'Mayor Cancels Contract','Company Promises to Sue Over Scuttled Deal');
INSERT INTO "section" ("section_id","section_name") values (3,'News');
INSERT INTO "locale" ("locale_id","locale_name") values (1,'Downtown');
INSERT INTO "article_source" ("source_id","source_name") values (4,'The
Times');
INSERT INTO "volume" ("volume_id","volume_name") values (2,'2');
INSERT INTO "issue" ("issue_id","issue_name","issue_date") values
(3,'25','04-05-2006');
CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree (
"article_id" "int4_ops" );
CREATE  INDEX "article_vol_ix" on "article" using btree ( "volume_id"
"int4_ops" );
CREATE  INDEX "article_source_ix" on "article" using btree (
"article_source_id" "int4_ops" );
CREATE  INDEX "article_issue_ix" on "article" using btree ( "issue_id"
"int4_ops" );
CREATE  INDEX "article_locale_ix" on "article" using btree ( "locale_id"
"int4_ops" );
CREATE  INDEX "article_section_ix" on "article" using btree ( "section_id"
"int4_ops" );
CREATE  INDEX "article_text_ix" on "article_text" using btree (
"article_id" "int4_ops" );
CREATE UNIQUE INDEX "section_section_id_key" on "section" using btree (
"section_id" "int4_ops" );
CREATE UNIQUE INDEX "locale_locale_id_key" on "locale" using btree (
"locale_id" "int4_ops" );
CREATE UNIQUE INDEX "article_source_source_id_key" on "article_source"
using btree ( "source_id" "int4_ops" );
CREATE UNIQUE INDEX "volume_volume_id_key" on "volume" using btree (
"volume_id" "int4_ops" );
CREATE UNIQUE INDEX "issue_issue_id_key" on "issue" using btree (
"issue_id" "int4_ops" );


