Optimizer fed bad data about some system-table indexes - Mailing list pgsql-hackers

From Tom Lane
Subject Optimizer fed bad data about some system-table indexes
Date
Msg-id 28894.925487663@sss.pgh.pa.us
Whole thread Raw
Responses Re: [HACKERS] Optimizer fed bad data about some system-table indexes  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
Last night I was looking into optimizer misbehavior on the sample query

explain select * from pg_class, pg_description
where pg_class.oid = pg_description.objoid;

As of yesterday the system was generating

Hash Join  (cost=86.59 rows=1007 width=101) ->  Seq Scan on pg_description  (cost=41.23 rows=1007 width=16) ->  Hash
(cost=0.00rows=0 width=0)       ->  Index Scan using pg_class_oid_index on pg_class  (cost=5.57 rows=138 width=85)
 

which was pretty stupid; why use an index scan to load the hashtable?
The reason was that the optimizer was actually estimating the index scan
to be cheaper than a sequential scan (cost of sequential scan was
figured at 6.55).  When I poked into this, I found that costsize.c
was being fed a size of zero for pg_class_oid_index, and was generating
a bogus cost for the index scan because of it.

I changed costsize.c to ensure that cost_index with a selectivity of 1
will always return a larger value than cost_seqscan does with the same
relation-size stats, regardless of what it's told about the index size.
This fixes the immediate problem, but it's still bad that costsize is
getting a bogus index size value; the cost estimates won't be very
accurate.  And considering that there are reasonable stats for 
pg_class_oid_index in pg_class, you'd sort of expect those numbers to
get passed to the optimizer.

As near as I can tell, the bogus data is the fault of the relation
cache.  Info about pg_class_oid_index and a couple of other indexes on
system relations is preloaded into the relcache and locked there on
startup --- and it is *not* coming from pg_class, but from an
initialization file that evidently was made when these system tables
were empty.

Bottom line is that optimization estimates that involve these critical
system indexes will be wrong.  That's not a show-stopper, but it seems
to me that it must be costing us performance somewhere along the line.
I'd like to see if it can be fixed.

Does anyone understand:

(a) why does the relcache need an initialization file for the system
index cache entries in the first place?  If I'm reading the code
correctly, it is able to build the initialization file from the info
in pg_class, so one would think it'd be better to just do that during
every startup and forget the initialization file.

(b) if we can't just get rid of the init file, how about dropping and
rebuilding it at the end of the initdb process (after template1 has
been vacuumed)?  Then at least it'd show a size of a few hundred for
pg_class, instead of zero.
        regards, tom lane


pgsql-hackers by date:

Previous
From: The Hermit Hacker
Date:
Subject: Mailing list 'relay' sites ... News servers ... WWW/FTP Mirrors
Next
From: Massimo Dal Zotto
Date:
Subject: patches for 6.5.0