Re: [HACKERS] nested loops in joins, ambiguous rewrite rules - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
Date
Msg-id 5397.917730427@sss.pgh.pa.us
In response to Re: [GENERAL] nested loops in joins, ambiguous rewrite rules  (Charles Hornberger <charlie@k4azl.net>)
Responses Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
List pgsql-hackers
Charles Hornberger <charlie@k4azl.net> writes:
> Hello again.  I've got a few more notes on this "strange" optimizer
> behavior, which might be useful to anyone interested in this problem:

Well, I understand at least part of the "problem" here.

First, you're assuming that a merge-join plan is necessarily better than
a nested-loop plan.  That should be true for large queries, but it is
*not* necessarily true for small tables --- when there are only a few
tuples in the tables being scanned, a simple nested loop wins because it
has much less startup overhead.  (Or at least that's what our optimizer
thinks; I have not tried to measure this for myself.)

What's really going on here is that when the optimizer *knows* how small
your test tables are, it deliberately chooses nested-loop as being the
fastest thing.  If it doesn't know, it makes some default assumptions
about the sizes of the tables, and with those default sizes it decides
that merge-join will be cheaper.
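
For instance, you can watch the optimizer's choice directly with
EXPLAIN.  (The table and column names below are just placeholders,
not anything from your schema.)

    EXPLAIN SELECT *
    FROM foo f, bar b
    WHERE f.id = b.foo_id;

    -- With up-to-date stats on tiny tables, the plan should come
    -- back as a Nested Loop; with the default size assumptions
    -- you'd likely see a Merge Join instead.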

So the next question is why apparently similar database situations yield
different states of optimizer knowledge.  The answer is that creating
an index before inserting tuples and creating it afterwards have
different side effects on the optimizer's statistics.

I've only spent about ten minutes looking into this, so my understanding
is no doubt incomplete, but what I've found out is:

1. There are a couple of statistical fields in the system pg_class
   table, namely relpages and reltuples (number of disk pages and
   tuples in each relation).
2. There are per-attribute statistics kept in the pg_statistic table.
3. The pg_class statistics fields are updated by a VACUUM (with or
   without ANALYZE) *and also by CREATE INDEX*.  Possibly by other
   things too ... but plain INSERTs and so forth don't update 'em.
4. The pg_statistic info only seems to be updated by VACUUM ANALYZE.

So if you do
    CREATE TABLE
    CREATE INDEX
    insert tuples
then the state of play is that the optimizer thinks the table is empty.
(Or, perhaps, it makes its default assumption --- pg_class doesn't seem
to have any special representation for "size of table unknown" as
opposed to "size of table is zero", so maybe the optimizer treats
reltuples = 0 as meaning it should use a default table size.  I haven't
looked at that part of the code to find out.)
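
Either way, the relevant numbers are easy to look at.  Something like
this should show them (using a made-up table name 'mytable'; I'm also
assuming pg_statistic's starelid column here, so check your version's
catalogs if that errors out):

    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'mytable';

    -- per-attribute stats are keyed by the table's OID
    SELECT *
    FROM pg_statistic
    WHERE starelid = (SELECT oid FROM pg_class
                      WHERE relname = 'mytable');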

But if you do
    CREATE TABLE
    insert tuples
    CREATE INDEX
the state of play is that the optimizer thinks there are as many tuples
in the table as there were when you created the index.  This explains
the varying behavior in your detailed test case.
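
You can reproduce the effect in a few lines of SQL ('foo' is just a
scratch table for illustration):

    -- index created before the data: reltuples stays at zero
    CREATE TABLE foo (f1 int4);
    CREATE INDEX foo_f1 ON foo (f1);
    INSERT INTO foo VALUES (1);
    INSERT INTO foo VALUES (2);
    SELECT reltuples FROM pg_class WHERE relname = 'foo';

    -- index created after the data: reltuples reflects the rows
    -- present at CREATE INDEX time
    DROP TABLE foo;
    CREATE TABLE foo (f1 int4);
    INSERT INTO foo VALUES (1);
    INSERT INTO foo VALUES (2);
    CREATE INDEX foo_f1 ON foo (f1);
    SELECT reltuples FROM pg_class WHERE relname = 'foo';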

I'll bet that if you investigate the contents of pg_class and
pg_statistic in the two databases you were originally working with,
you'll find that they are different.  But a VACUUM ANALYZE should
bring everything into sync, assuming that the actual data in the
databases is the same.
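
That is, in both databases:

    VACUUM ANALYZE mytable;    -- 'mytable' again a placeholder

    -- then re-check the stats
    SELECT relpages, reltuples
    FROM pg_class
    WHERE relname = 'mytable';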

At any rate, if you still think there's something flaky going on,
please have a look at what's happening in these statistical fields;
that'll give us more info about whether there's really a bug or not.

Also, if the optimizer still wants to use nested loop when it knows
there are a *lot* of tuples to be processed, there might be a bug in
its equations for the cost of these operations --- how large are your
tables?
        regards, tom lane

