Patrick Krecker <> writes:
> -> Nested Loop (cost=0.57..6148400.28 rows=1 width=24)
> (actual time=0.063..4.054 rows=88 loops=9)
> -> WorkTable Scan on path path_1 (cost=0.00..33.40
> rows=1670 width=16) (actual time=0.000..0.006 rows=112 loops=9)
> -> Index Scan using component_document_id on
> component t_1 (cost=0.57..3681.65 rows=1 width=16) (actual
> time=0.023..0.036 rows=1 loops=1007)
> Index Cond: (document_id = path_1.document_id)
> Filter: (path_1.parent_id = internal_id)
> Rows Removed by Filter: 237
> I would think that it has decided that the document_id index is not
> very selective for the given mix of rows, however I checked the
> statistics for the table and I found that n_distinct for document_id
> is 101559 (the true value is 162545). The value of pg_class.reltuples
> for the table is 96055600, which is very close to the true value.
> In the first query, it appears to me that postgres thinks the index
> scan is much more expensive than it really is. However, given the
> accurate statistics, I can't see how.
I think the problem is that it doesn't have any stats for the output of
path_1, so it's probably falling back on some rather generic assumptions
about how many component rows will match each of the two join conditions.
That causes it to think that the indexscan will reject a lot of rows at
the filter step and therefore be expensive. Possibly that could be
improved, but it won't happen overnight.
The most expeditious way to fix this would likely be to provide an
index on component(document_id, internal_id). The planner should
then think an indexscan on that is cheap, regardless of whether the
check on internal_id is really doing much of anything, since both
join conditions become index conditions and nothing is left for the
filter step.
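Concretely, that would be something like (index name is just a
suggestion; table and column names are as shown in your plan):

    -- composite index covering both join conditions, so the scan's
    -- filter on internal_id becomes an index condition instead
    CREATE INDEX component_document_id_internal_id
        ON component (document_id, internal_id);

Once that's in place, re-run EXPLAIN ANALYZE and you should see the
"Filter" and "Rows Removed by Filter" lines disappear from that
indexscan node.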
regards, tom lane