Thread: Statistics with temporary tables, optimizer question
I have a table with 45 million rows, partitioned on a date field. There are 3 partitions so far, 15 million records each, and each partition has its own primary key. I retrieve up to 200k document IDs (the primary key for each partition) from the application and insert them into a temporary table, which I then join to the partitioned monster and some other tables.

I discovered a strange thing: the optimizer chooses a hash join, with a full table scan of all underlying tables, unless I create a primary key on the temporary table, in which case the appropriate nested loop join is chosen. What makes the optimizer do that, and why? I can work around the problem by disabling hash joins in postgresql.conf, but I don't like that solution. Not even lowering random_page_cost to the same value as seq_page_cost helps.

My question is: how does the optimizer calculate statistics for temporary tables? I am probably not expected to do a VACUUM ANALYZE on the temporary table after finishing the inserts? How exactly does the optimizer deal with temporary tables?

PostgreSQL is 8.4.4 on 64-bit Red Hat 5.5.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
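(For reference, the shape of the query is roughly as follows; the table and column names here are made up for illustration, since the real schema isn't shown:)

    -- Hypothetical names; the actual schema is not in the post.
    CREATE TEMPORARY TABLE doc_ids (document_id integer);

    -- ... application inserts up to 200k ids here ...

    -- Without this index/PK the planner picks a hash join with
    -- full scans; with it, a nested loop join is chosen.
    ALTER TABLE doc_ids ADD PRIMARY KEY (document_id);

    SELECT b.*
      FROM doc_ids t
      JOIN big_partitioned_table b
        ON b.document_id = t.document_id;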
On Thu, Sep 23, 2010 at 9:33 AM, Mladen Gogala <mladen.gogala@vmsinfo.com> wrote:
> My question is how does the optimizer calculate stats for the temporary
> tables? I am probably not expected to do a vacuum analyze on the
> temporary table, after finishing the inserts? How exactly does the
> optimizer deal with the temporary tables?

The advice I've seen says to manually run an ANALYZE on any temporary table you're going to be doing something performance-intensive with.

Old thread:
http://archives.postgresql.org/pgsql-general/2004-01/msg01553.php

and the 9.0 docs briefly mention this as well:
http://www.postgresql.org/docs/current/static/sql-createtable.html
saying "... appropriate vacuum and analyze operations should be performed via session SQL commands. For example, if a temporary table is going to be used in complex queries, it is wise to run ANALYZE on the temporary table after it is populated."

Otherwise, I think the planner assumes some bogus default (1000?) number of rows for temporary tables. Can't find a more reliable reference, but see here:
http://stackoverflow.com/questions/486154/postgresql-temporary-tables

I also use temp tables fairly heavily, and I just run an ANALYZE on any decently-sized tables I'm going to use in further queries, which seems to work well for me.

Josh
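(A sketch of that workflow, using made-up names; the key point is the explicit ANALYZE between the load and the big join:)

    CREATE TEMPORARY TABLE doc_ids (document_id integer PRIMARY KEY);

    INSERT INTO doc_ids (document_id)
        SELECT ...;          -- populate from the application

    ANALYZE doc_ids;         -- gives the planner real row counts
                             -- and statistics for this session

    -- now run the performance-sensitive join against the big table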
Josh Kupershmidt <schmiddy@gmail.com> writes:
> On Thu, Sep 23, 2010 at 9:33 AM, Mladen Gogala
> <mladen.gogala@vmsinfo.com> wrote:
>> My question is how does the optimizer calculate stats for the temporary
>> tables? I am probably not expected to do a vacuum analyze on the
>> temporary table, after finishing the inserts? How exactly does the
>> optimizer deal with the temporary tables?

> The advice I've seen says to manually run an ANALYZE on a temporary
> table which you're going to be doing something performance-intensive
> with.

Yeah. Autovacuum cannot help you with a temp table --- it can't safely access such a table. So the stuff that normally goes on "in the background" has to be done explicitly if you need it done.

			regards, tom lane
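(One way to see whether that explicit maintenance has happened, again with a hypothetical table name: the planner's stored estimates in pg_class stay at their initial values until the first ANALYZE or VACUUM of the table in your session:)

    SELECT relpages, reltuples
      FROM pg_class
     WHERE relname = 'doc_ids';
    -- before ANALYZE these reflect no gathered statistics,
    -- so the planner falls back to default assumptions

    ANALYZE doc_ids;
    -- afterwards reltuples tracks the actual row count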