Re: [HACKERS] nested loops in joins, ambiguous rewrite rules - Mailing list pgsql-hackers
From: Charles Hornberger
Subject: Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
Msg-id: 3.0.5.32.19990130183504.00acc300@k4azl.net
In response to: Re: [HACKERS] nested loops in joins, ambiguous rewrite rules (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
List: pgsql-hackers
At 04:07 PM 1/30/99 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>First, you're assuming that a merge-join plan is necessarily better than
>a nested-loop plan.  That should be true for large queries, but it is
>*not* necessarily true for small tables --- when there are only a few
>tuples in the tables being scanned, a simple nested loop wins because it
>has much less startup overhead.  (Or at least that's what our optimizer
>thinks; I have not tried to measure this for myself.)

OK, I understand that I don't understand whether merge-join plans are
necessarily better than nested-loop plans, and that it could make sense to
pick one or the other depending on the size of the tables and the number of
rows in them.  Also, your explanation of how 'vacuum analyze' updates the
statistics in pg_class and pg_statistic makes it very clear why I'm seeing
one query plan in one DB and a different plan in the other.  Thanks for the
quick lesson, and my apologies for making it happen on the hackers list.

But let's leave that aside for the moment and address the underlying
question: Why does it take ~11.5 minutes to process the following query?
It performs a simple ANDed join on seven tables.  All of the columns named
in the WHERE clauses are indexed, and all of the tables except article and
article_text contain just one single row; article and article_text contain
two rows.

EXPLAIN SELECT a.article_id, a.print_publ_date, b.section_name,
    c.source_name, d.headline, e.issue_name, f.locale_name, g.volume_name
  FROM article a, section b, article_source c, article_text d, issue e,
    locale f, volume g
  WHERE a.article_id = d.article_id
    AND a.section_id = b.section_id
    AND a.article_source_id = c.source_id
    AND a.issue_id = e.issue_id
    AND a.locale_id = f.locale_id
    AND a.volume_id = g.volume_id;

Let me explain each of the WHERE clauses:
1) a.article_id has a unique index, d.article_id has a non-unique index
2) a.section_id has a non-unique index, b.section_id has a unique index
3) a.article_source_id has a non-unique index, c.source_id has a unique index
4) a.issue_id has a non-unique index, e.issue_id has a unique index
5) a.locale_id has a non-unique index, f.locale_id has a unique index
6) a.volume_id has a non-unique index, g.volume_id has a unique index

This is being done under Postgres 6.4.2 (we upgraded from 6.4 last night)
on a Pentium 150 with 192MB of RAM running Linux 2.0.35.

Here's the query plan it generated:

NOTICE:  QUERY PLAN:

Nested Loop  (cost=12.49 size=2 width=124)
  ->  Nested Loop  (cost=10.43 size=2 width=108)
        ->  Nested Loop  (cost=8.36 size=2 width=92)
              ->  Nested Loop  (cost=6.30 size=2 width=76)
                    ->  Nested Loop  (cost=4.16 size=2 width=60)
                          ->  Nested Loop  (cost=2.10 size=2 width=44)
                                ->  Seq Scan on locale f  (cost=1.03 size=1 width=16)
                                ->  Seq Scan on article a  (cost=1.07 size=2 width=28)
                          ->  Seq Scan on issue e  (cost=1.03 size=1 width=16)
                    ->  Seq Scan on article_text d  (cost=1.07 size=2 width=16)
              ->  Seq Scan on article_source c  (cost=1.03 size=1 width=16)
        ->  Seq Scan on section b  (cost=1.03 size=1 width=16)
  ->  Seq Scan on volume g  (cost=1.03 size=1 width=16)

>At any rate, if you still think there's something flaky going on,
>please have a look at what's happening in these statistical fields;
>that'll give us more info about whether there's really a bug or not.
>
>Also, if the optimizer still wants to use nested loop when it knows
>there are a *lot* of tuples to be processed, there might be a bug in
>its equations for the cost of these operations --- how large are your
>tables?
>

The tables are tiny.
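(For what it's worth, before getting to the statistics below, here's roughly
what I plan to try next, if I've understood the advice correctly: refresh the
statistics for these tables and then re-run the EXPLAIN above to see whether
the plan, and the 11.5-minute runtime, changes.  Just a sketch; I haven't run
it yet:)

VACUUM ANALYZE article;
VACUUM ANALYZE article_text;
VACUUM ANALYZE section;
VACUUM ANALYZE article_source;
VACUUM ANALYZE issue;
VACUUM ANALYZE locale;
VACUUM ANALYZE volume;
-- or simply "VACUUM ANALYZE;" to update pg_class and pg_statistic for
-- every table in the database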
Here are some statistics from the DB.

apx00=> select * from pg_statistic;
starelid|staattnum|staop|stalokey|stahikey
--------+---------+-----+--------+--------
407828| 4| 0|4 |4
407828| 5| 0|2 |2
407828| 6| 0|3 |3
407828| 7| 0|4 |4
407828| 8| 0|04-05-2006 |04-05-2006
407828| 9| 0|01-28-1999 |01-28-1999
407828| 10| 0|Thu Jan 28 19:28:40 1999 PST |Thu Jan 28 19:28:40 1999 PST
407828| 11| 0|100 |100
407828| 12| 0|Thu Jan 28 19:28:40 1999 PST |Thu Jan 28 19:28:40 1999 PST
407828| 13| 0|0 |0
407828| 1| 0|10 |11
407828| 2| 0|3 |3
407828| 3| 0|1 |1
407852| 1| 0|10 |11
407852| 2| 0|Mayor Cancels Contract |Mayor Signs Contract With Company
407852| 3| 0|Company Promises to Sue Over Scuttled Deal|Legally binding document said to be four pages long
407863| 1| 0|3 |3
407863| 2| 0|News |News
407874| 1| 0|1 |1
407874| 2| 0|Downtown |Downtown
407885| 1| 0|4 |4
407885| 2| 0|The Times |The Times
407896| 1| 0|2 |2
407896| 2| 0|2 |2
407907| 1| 0|3 |3
407907| 2| 0|25 |25
407907| 3| 0|04-05-2006 |04-05-2006
(27 rows)

apx00=> select oid,* from pg_class where relowner <> 505;
oid|relname|reltype|relowner|relam|relpages|reltuples|relhasindex|relisshared|relkind|relnatts|relchecks|reltriggers|relukeys|relfkeys|relrefs|relhaspkey|relhasrules|relacl
------+----------------------------+-------+--------+-----+--------+---------+-----------+-----------+-------+--------+---------+-----------+--------+--------+-------+----------+-----------+------
407923|article_vol_ix | 0| 508| 403| 2| 2|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407926|article_source_ix | 0| 508| 403| 2| 2|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407929|article_issue_ix | 0| 508| 403| 2| 2|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407932|article_locale_ix | 0| 508| 403| 2| 2|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407726|article_article_id_seq | 0| 508| 0| 0| 0|f |f |S | 8| 0| 0| 0| 0| 0|f |f |
407743|section_section_id_seq | 0| 508| 0| 0| 0|f |f |S | 8| 0| 0| 0| 0| 0|f |f |
407760|locale_locale_id_seq | 0| 508| 0| 0| 0|f |f |S | 8| 0| 0| 0| 0| 0|f |f |
407777|article_source_source_id_seq| 0| 508| 0| 0| 0|f |f |S | 8| 0| 0| 0| 0| 0|f |f |
407794|volume_volume_id_seq | 0| 508| 0| 0| 0|f |f |S | 8| 0| 0| 0| 0| 0|f |f |
407811|issue_issue_id_seq | 0| 508| 0| 0| 0|f |f |S | 8| 0| 0| 0| 0| 0|f |f |
407828|article | 0| 508| 0| 1| 2|t |f |r | 13| 0| 0| 0| 0| 0|f |f |
407935|article_section_ix | 0| 508| 403| 2| 2|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407852|article_text | 0| 508| 0| 1| 2|t |f |r | 3| 0| 0| 0| 0| 0|f |f |
407938|article_text_ix | 0| 508| 403| 2| 2|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407863|section | 0| 508| 0| 1| 1|t |f |r | 2| 0| 0| 0| 0| 0|f |f |
407941|section_section_id_key | 0| 508| 403| 2| 1|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407874|locale | 0| 508| 0| 1| 1|t |f |r | 2| 0| 0| 0| 0| 0|f |f |
407944|locale_locale_id_key | 0| 508| 403| 2| 1|f |f |i | 1| 0|
407885|article_source | 0| 508| 0| 1| 1|t |f |r | 2| 0| 0| 0| 0| 0|f |f |
407920|article_article_id_key | 0| 508| 403| 2| 2|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407947|article_source_source_id_key| 0| 508| 403| 2| 1|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407896|volume | 0| 508| 0| 1| 1|t |f |r | 2| 0| 0| 0| 0| 0|f |f |
407950|volume_volume_id_key | 0| 508| 403| 2| 1|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
407907|issue | 0| 508| 0| 1| 1|t |f |r | 3| 0| 0| 0| 0| 0|f |f |
407953|issue_issue_id_key | 0| 508| 403| 2| 1|f |f |i | 1| 0| 0| 0| 0| 0|f |f |
(25 rows)

I've appended a dump of this database to the end of this e-mail, if anyone
else wants to give it a shot.
(And finally, a note of apology.  I think my last e-mail to the list was
posted three times; I got two BOUNCE messages back from the list, and so I
resent the message each time.  I think the "From" header in my e-mails was
wrong, but apparently it wasn't wrong enough to actually stop the message
from making its way onto the list; it was just wrong enough to generate a
BOUNCE.)

Thanks,
Charlie

>What's really going on here is that when the optimizer *knows* how small
>your test tables are, it deliberately chooses nested-loop as being the
>fastest thing.  If it doesn't know, it makes some default assumptions
>about the sizes of the tables, and with those default sizes it decides
>that merge-join will be cheaper.
>
>So the next question is why apparently similar database situations yield
>different states of optimizer knowledge.  The answer is that creating
>an index before inserting tuples and creating it afterwards have
>different side effects on the optimizer's statistics.
>
>I've only spent about ten minutes looking into this, so my understanding
>is no doubt incomplete, but what I've found out is:
> 1. There are a couple of statistical fields in the system pg_class
>    table, namely relpages and reltuples (number of disk pages and
>    tuples in each relation).
> 2. There are per-attribute statistics kept in the pg_statistic table.
> 3. The pg_class statistics fields are updated by a VACUUM (with or
>    without ANALYZE) *and also by CREATE INDEX*.  Possibly by other
>    things too ... but plain INSERTs and so forth don't update 'em.
> 4. The pg_statistics info only seems to be updated by VACUUM ANALYZE.
>
>So if you do
>	CREATE TABLE
>	CREATE INDEX
>	insert tuples
>then the state of play is that the optimizer thinks the table is empty.
>(Or, perhaps, it makes its default assumption --- pg_class doesn't seem
>to have any special representation for "size of table unknown" as
>opposed to "size of table is zero", so maybe the optimizer treats
>reltuples = 0 as meaning it should use a default table size.  I haven't
>looked at that part of the code to find out.)
>
>But if you do
>	CREATE TABLE
>	insert tuples
>	CREATE INDEX
>the state of play is that the optimizer thinks there are as many tuples
>in the table as there were when you created the index.  This explains
>the varying behavior in your detailed test case.
>
>I'll bet that if you investigate the contents of pg_class and
>pg_statistic in the two databases you were originally working with,
>you'll find that they are different.  But a VACUUM ANALYZE should
>bring everything into sync, assuming that the actual data in the
>databases is the same.
>
>At any rate, if you still think there's something flaky going on,
>please have a look at what's happening in these statistical fields;
>that'll give us more info about whether there's really a bug or not.
>
>Also, if the optimizer still wants to use nested loop when it knows
>there are a *lot* of tuples to be processed, there might be a bug in
>its equations for the cost of these operations --- how large are your
>tables?
>
>			regards, tom lane
>

CREATE SEQUENCE "article_article_id_seq" start 10 increment 1 maxvalue 2147483647 minvalue 1 cache 1;
CREATE SEQUENCE "section_section_id_seq" start 7 increment 1 maxvalue 2147483647 minvalue 1 cache 1;
CREATE SEQUENCE "locale_locale_id_seq" start 3 increment 1 maxvalue 2147483647 minvalue 1 cache 1;
CREATE SEQUENCE "article_source_source_id_seq" start 4 increment 1 maxvalue 2147483647 minvalue 1 cache 1;
CREATE SEQUENCE "volume_volume_id_seq" start 2 increment 1 maxvalue 2147483647 minvalue 1 cache 1;
CREATE SEQUENCE "issue_issue_id_seq" start 3 increment 1 maxvalue 2147483647 minvalue 1 cache 1;

CREATE TABLE "article" (
	"article_id" int4 DEFAULT nextval ( 'article_article_id_seq' ) NOT NULL,
	"section_id" int4 NOT NULL,
	"locale_id" int4 NOT NULL,
	"article_source_id" int4,
	"volume_id" int4,
	"issue_id" int4,
	"print_page_no" int4,
	"print_publ_date" date,
	"site_publ_date" date NOT NULL,
	"site_publ_time" datetime NOT NULL,
	"inputter_id" int4 NOT NULL,
	"input_date" datetime DEFAULT text 'now',
	"published" int4 DEFAULT 0);

CREATE TABLE "article_text" (
	"article_id" int4 NOT NULL,
	"headline" character varying,
	"subhead" character varying);

CREATE TABLE "section" (
	"section_id" int4 DEFAULT nextval ( 'section_section_id_seq' ) NOT NULL,
	"section_name" character varying NOT NULL);

CREATE TABLE "locale" (
	"locale_id" int4 DEFAULT nextval ( 'locale_locale_id_seq' ) NOT NULL,
	"locale_name" character varying NOT NULL);

CREATE TABLE "article_source" (
	"source_id" int4 DEFAULT nextval ( 'article_source_source_id_seq' ) NOT NULL,
	"source_name" character varying NOT NULL);

CREATE TABLE "volume" (
	"volume_id" int4 DEFAULT nextval ( 'volume_volume_id_seq' ) NOT NULL,
	"volume_name" character varying);

CREATE TABLE "issue" (
	"issue_id" int4 DEFAULT nextval ( 'issue_issue_id_seq' ) NOT NULL,
	"issue_name" character varying,
	"issue_date" date NOT NULL);

INSERT INTO "article" ("article_id","section_id","locale_id","article_source_id","volume_id","issue_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","inputter_id","input_date","published") values (10,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article" ("article_id","section_id","locale_id","article_source_id","volume_id","issue_id","print_page_no","print_publ_date","site_publ_date","site_publ_time","inputter_id","input_date","published") values (11,3,1,4,2,3,4,'04-05-2006','01-28-1999','Thu Jan 28 19:28:40 1999 PST',100,'Thu Jan 28 19:28:40 1999 PST',0);
INSERT INTO "article_text" ("article_id","headline","subhead") values (10,'Mayor Signs Contract With Company','Legally binding document said to be four pages long');
INSERT INTO "article_text" ("article_id","headline","subhead") values (11,'Mayor Cancels Contract','Company Promises to Sue Over Scuttled Deal');
INSERT INTO "section" ("section_id","section_name") values (3,'News');
INSERT INTO "locale" ("locale_id","locale_name") values (1,'Downtown');
INSERT INTO "article_source" ("source_id","source_name") values (4,'The Times');
INSERT INTO "volume" ("volume_id","volume_name") values (2,'2');
INSERT INTO "issue" ("issue_id","issue_name","issue_date") values (3,'25','04-05-2006');

CREATE UNIQUE INDEX "article_article_id_key" on "article" using btree ( "article_id" "int4_ops" );
CREATE INDEX "article_vol_ix" on "article" using btree ( "volume_id" "int4_ops" );
CREATE INDEX "article_source_ix" on "article" using btree ( "article_source_id" "int4_ops" );
CREATE INDEX "article_issue_ix" on "article" using btree ( "issue_id" "int4_ops" );
"article" using btree ( "issue_id" "int4_ops" ); CREATE INDEX "article_locale_ix" on "article" using btree ( "locale_id" "int4_ops" ); CREATE INDEX "article_section_ix" on "article" using btree ( "section_id" "int4_ops" ); CREATE INDEX "article_text_ix" on "article_text" using btree ( "article_id" "int4_ops" ); CREATE UNIQUE INDEX "section_section_id_key" on "section" using btree ( "section_id" "int4_ops" ); CREATE UNIQUE INDEX "locale_locale_id_key" on "locale" using btree ( "locale_id" "int4_ops" ); CREATE UNIQUE INDEX "article_source_source_id_key" on "article_source" using btree ( "source_id" "int4_ops" ); CREATE UNIQUE INDEX "volume_volume_id_key" on "volume" using btree ( "volume_id" "int4_ops" ); CREATE UNIQUE INDEX "issue_issue_id_key" on "issue" using btree ( "issue_id" "int4_ops" );