Re: An "obvious" index not being used - Mailing list pgsql-performance

From Daniele Varrazzo
Subject Re: An "obvious" index not being used
Date
Msg-id 488222A7.1000202@develer.com
Whole thread Raw
In response to Re: An "obvious" index not being used  (Francisco Reyes <lists@stringsutils.com>)
Responses Re: An "obvious" index not being used  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Francisco Reyes writes:
> Daniele Varrazzo writes:
>
>> I suspect the foo.account_id statistical data are not used at all in
>> query: the query planner can only estimate the number of accounts to
>> look for, not
>
> You mentioned you bumped your default_statistics_target.
> What did you increase it to?
> My data sets are so "strange" that anything less than 350 gives many bad
> plans.

Not default_statistics_target: I used "ALTER TABLE SET STATISTICS" to change
the stats only for the tables I was interested in, arriving up to 1000. I
think the result is the same, but it was a red herring anyway: these stats
couldn't be used at all in my query.

In my problem I had 2 tables: a small one (accounts), a large one (foo). The
way the query is written doesn't allow the stats from the large table to be
used at all, unless the records from the small table are fetched. This is
independent from the stats accuracy.

What the planner does is to assume an even distribution in the data in the
joined fields. The assumption is probably better than not having anything, but
in my data set (where there were a bunch of accounts with many foo each,but
many accounts with too little foo) this proved false.

The stats can be used only if at planning time the planner knows what values
to look for in the field: this is the reason for which, if the query is split
in two parts, performances become acceptable. In this case we may fall in your
situation: a data set may be "strange" and thus require an increase in the
stats resolution. I can't remember if the default 10 was too low, but 100 was
definitely enough for me.

It would be nice if the planner could perform the "split query" optimization
automatically, i.e. fetch records from small tables to plan the action on
larger tables. But I suspect this doesn't fit at all in the current PostgreSQL
query pipeline... or does it?

--
Daniele Varrazzo - Develer S.r.l.
http://www.develer.com

pgsql-performance by date:

Previous
From: "Luke Lonergan"
Date:
Subject: Re: Performance on Sun Fire X4150 x64 (dd, bonnie++, pgbench)
Next
From: Tom Lane
Date:
Subject: Re: An "obvious" index not being used