Re: [HACKERS] nested loops in joins, ambiguous rewrite rules - Mailing list pgsql-hackers
From | Tom Lane
---|---
Subject | Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
Date |
Msg-id | 5397.917730427@sss.pgh.pa.us
In response to | Re: [GENERAL] nested loops in joins, ambiguous rewrite rules (Charles Hornberger <charlie@k4azl.net>)
Responses | Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
List | pgsql-hackers
Charles Hornberger <charlie@k4azl.net> writes:
> Hello again. I've got a few more notes on this "strange" optimizer
> behavior, which might be useful to anyone interested in this problem:

Well, I understand at least part of the "problem" here.

First, you're assuming that a merge-join plan is necessarily better than a nested-loop plan. That should be true for large queries, but it is *not* necessarily true for small tables --- when there are only a few tuples in the tables being scanned, a simple nested loop wins because it has much less startup overhead. (Or at least that's what our optimizer thinks; I have not tried to measure this for myself.)

What's really going on here is that when the optimizer *knows* how small your test tables are, it deliberately chooses nested-loop as being the fastest thing. If it doesn't know, it makes some default assumptions about the sizes of the tables, and with those default sizes it decides that merge-join will be cheaper.

So the next question is why apparently similar database situations yield different states of optimizer knowledge. The answer is that creating an index before inserting tuples and creating it afterwards have different side effects on the optimizer's statistics. I've only spent about ten minutes looking into this, so my understanding is no doubt incomplete, but what I've found out is:

1. There are a couple of statistical fields in the system pg_class table, namely relpages and reltuples (the number of disk pages and tuples in each relation).

2. There are per-attribute statistics kept in the pg_statistic table.

3. The pg_class statistics fields are updated by a VACUUM (with or without ANALYZE) *and also by CREATE INDEX*. Possibly by other things too ... but plain INSERTs and so forth don't update 'em.

4. The pg_statistic info only seems to be updated by VACUUM ANALYZE.

So if you do

    CREATE TABLE
    CREATE INDEX
    insert tuples

then the state of play is that the optimizer thinks the table is empty. (Or, perhaps, it makes its default assumption --- pg_class doesn't seem to have any special representation for "size of table unknown" as opposed to "size of table is zero", so maybe the optimizer treats reltuples = 0 as meaning it should use a default table size. I haven't looked at that part of the code to find out.)

But if you do

    CREATE TABLE
    insert tuples
    CREATE INDEX

then the state of play is that the optimizer thinks there are as many tuples in the table as there were when you created the index.

This explains the varying behavior in your detailed test case. I'll bet that if you investigate the contents of pg_class and pg_statistic in the two databases you were originally working with, you'll find that they are different. But a VACUUM ANALYZE should bring everything into sync, assuming that the actual data in the databases is the same.

At any rate, if you still think there's something flaky going on, please have a look at what's happening in these statistical fields; that'll give us more info about whether there's really a bug or not. Also, if the optimizer still wants to use nested loop when it knows there are a *lot* of tuples to be processed, there might be a bug in its equations for the cost of these operations --- how large are your tables?

            regards, tom lane
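A minimal psql sketch of the two index-creation orderings described above (the table foo, its columns, and the index name are illustrative, not taken from the thread; exact statistics values may differ by PostgreSQL version):

    -- Ordering 1: create the index before inserting any tuples.
    CREATE TABLE foo (id int4, name text);
    CREATE INDEX foo_id_idx ON foo (id);
    INSERT INTO foo VALUES (1, 'one');
    INSERT INTO foo VALUES (2, 'two');
    -- CREATE INDEX ran against an empty table, so the stats still say zero:
    SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'foo';

    -- Ordering 2: insert the tuples first, then create the index.
    DROP TABLE foo;
    CREATE TABLE foo (id int4, name text);
    INSERT INTO foo VALUES (1, 'one');
    INSERT INTO foo VALUES (2, 'two');
    CREATE INDEX foo_id_idx ON foo (id);
    -- Now reltuples reflects the rows present when the index was built:
    SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'foo';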
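And the sync step suggested at the end of the message; joining through pg_class avoids assuming any particular pg_statistic layout, since its per-attribute columns vary considerably across PostgreSQL versions:

    -- VACUUM ANALYZE refreshes both the pg_class fields and the
    -- per-attribute rows in pg_statistic:
    VACUUM ANALYZE foo;
    SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'foo';
    -- Per-attribute statistics for foo, whatever columns this version keeps:
    SELECT s.* FROM pg_statistic s, pg_class c
    WHERE s.starelid = c.oid AND c.relname = 'foo';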