Re: Incorrect assumptions with low LIMITs - Mailing list pgsql-hackers

From: Daniel Farina
Subject: Re: Incorrect assumptions with low LIMITs
Date:
Msg-id: CAAZKuFZLGjryXT3C=ebwGyB-NyUNOxn2H8CJLmhab5f8VsU=BQ@mail.gmail.com
In response to: Re: Incorrect assumptions with low LIMITs (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Mon, Mar 19, 2012 at 6:19 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sat, 2012-03-17 at 12:48 +0000, Simon Riggs wrote:
>> The problems are as I described them
>>
>> (1) no account made for sparsity, and other factors leading to an
>> overestimate of rows (N)
>>
>> (2) inappropriate assumption of the effect of LIMIT m, which causes a
>> costly SeqScan to appear better than an IndexScan for low m/N, when in
>> fact that is seldom the case.
>>
>> Overestimating N in (1) inverts the problem, so that an overestimate
>> isn't the safe thing at all.
>
> I think the actual problem has more to do with risk. The planner doesn't
> know how uniform the distribution of the table is, which introduces risk
> for the table scan.
>
> I would tend to agree that for low selectivity fraction and a very low
> limit (e.g. 1-3 in your example) and a large table, it doesn't seem like
> a good risk to use a table scan. I don't know how that should be modeled
> or implemented though.
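
To make the failure mode concrete, here is a minimal sketch (the
table, column, and value are hypothetical, and the comments describe
the planner's reasoning only informally):

    -- A large table with an indexed, skewed column.
    CREATE TABLE events (id bigserial PRIMARY KEY,
                         account_id int,
                         payload text);
    CREATE INDEX ON events (account_id);

    -- With a low LIMIT, the planner can prefer a seq scan on the
    -- theory that it will find m matching rows "early".  That holds
    -- only if matches are spread uniformly through the heap; if
    -- account_id = 42 is rare, or clustered near the end of the
    -- table, the seq scan reads most of the table before the LIMIT
    -- is satisfied.
    EXPLAIN SELECT * FROM events WHERE account_id = 42 LIMIT 1;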

FWIW, we have been bitten by the uniformity assumption a number of
times, but we kind of know what to look for.  It also has an
interesting interpretation as applied to garbage (insomuch as every
query conceptually carries a "WHERE xmin ... xmax ..." predicate).
Sometimes a table is mostly garbage for a particular value, and the
index scan constantly has to seek to the next tuple in its search to
find non-garbage.
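
A sketch of that garbage case, continuing the hypothetical table
above and assuming no intervening VACUUM:

    -- Turn nearly every tuple for one value into garbage.
    DELETE FROM events WHERE account_id = 42;
    INSERT INTO events (account_id) VALUES (42);

    -- Until VACUUM reclaims the dead tuples, the index scan visits
    -- heap page after heap page of dead versions of account_id = 42
    -- before it finds the one live row.
    EXPLAIN (ANALYZE, BUFFERS)
        SELECT * FROM events WHERE account_id = 42 LIMIT 1;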

I don't know how many times the uniformity assumption has held up
fine (guess: many), but cases where an index scan carries that little
"Filter:" footnote have been a ticking time bomb for a number of
people I've interacted with.  It can be subtle because the amount of
scanned data may not be enormous, so upon re-running the query it is
not nearly as pathological as the original run, thanks to cache
effects.  The only way to make it very visible is to look at EXPLAIN
with the BUFFERS option and note how many pages of the underlying
relation and indexes are being processed only to be discarded by the
filter.
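
The telltale shape is something like the following (the plan text and
buffer counts are illustrative, not verbatim output):

    EXPLAIN (ANALYZE, BUFFERS)
        SELECT * FROM events
        WHERE account_id = 42 AND payload LIKE '%rare%'
        LIMIT 1;

    Limit
      ->  Index Scan using events_account_id_idx on events
            Index Cond: (account_id = 42)
            Filter: (payload ~~ '%rare%'::text)
            Buffers: shared hit=10000 read=35000

    -- A large Buffers count relative to the one row returned means
    -- most pages were processed only to be thrown away by the Filter.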

Were I writing an OLTP-query performance linting tool, queries with
partially-matched index conditions might even be on my list of things
to warn about.  That is unfortunate for cases where the unmatched
part of the index condition qualifies, say, only five records in all
situations, but the database has no construct to know and enforce
this, AFAIK.

-- 
fdr

