On Nov 7, 2024, at 9:27 PM, Andrei Lepikhov <lepihov@gmail.com> wrote:
> Postgres didn't want Materialize in this example because of the low estimation on its outer subquery. AFAIC, by
increasingthe *_page_cost's value, you added extra weight to the inner subquery and shifted the decision to use
materialisation.
Interesting, except I decreased the random_page_cost. Just clarifying.
> I see huge underestimation in the simple scan:
>
> Bitmap Heap Scan on metainfo b_1
> (cost=23.96..35.77 rows=3 width=38)
> (actual time=1.225..4.206 rows=1025 loops=1)
>
> It may be caused by some functional dependency in its filter:
>
> ((relation = 'located'::text) AND (type = 'document'::text))
>
> You can create extended statistics on the columns 'relation' and 'type'. These statistics can reduce estimation
errorsand enable the optimiser to find a better plan without changing the cost balance.
OK, this is the first I'm learning about extended statistics...
I'm looking at https://www.postgresql.org/docs/15/sql-createstatistics.html
and https://www.postgresql.org/docs/15/planner-stats.html#PLANNER-STATS-EXTENDED
What kind of extended statistics do you suggest for this? ndistinct, dependencies, or mcv?
CREATE STATISTICS tablename_stats (<statistics type?>) ON relation, type FROM tablename;
ANALYZE tablename;
Thanks,
Ed