Re: avoiding seq scans when two columns are very correlated - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: avoiding seq scans when two columns are very correlated
Date	November 11, 2011 11:36:34
Msg-id	5467.1321025786@sss.pgh.pa.us
In response to	avoiding seq scans when two columns are very correlated (Ruslan Zakirov <ruz@bestpractical.com>)
Responses	Re: avoiding seq scans when two columns are very correlated
List	pgsql-performance

Ruslan Zakirov <ruz@bestpractical.com> writes:
> A table has two columns id and EffectiveId. First is primary key.
> EffectiveId is almost always equal to id (95%) unless records are
> merged. Many queries have id = EffectiveId condition. Both columns are
> very distinct and Pg reasonably decides that condition has very low
> selectivity and picks sequence scan.

I think the only way is to rethink your data representation.  PG doesn't
have cross-column statistics at all, and even if it did, you'd be asking
for an estimate of conditions in the "long tail" of the distribution.
That's unlikely to be very accurate.

Consider adding a "merged" boolean, or defining effectiveid differently.
For instance you could set it to null in unmerged records; then you
could get the equivalent of the current meaning with
COALESCE(effectiveid, id).  In either case, PG would then have
statistics that bear directly on the question of how many merged vs
unmerged records there are.

            regards, tom lane
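
The NULL-based representation suggested above can be sketched as follows. This is only an illustration under stated assumptions: the table name "tickets" and the sample rows are hypothetical, and SQLite (through Python's standard sqlite3 module) is used purely so the example runs anywhere. It shows the semantics of the rewrite and says nothing about how the PostgreSQL planner would behave.

```python
import sqlite3

# Hypothetical table and data; SQLite is used only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, effectiveid INTEGER)"
)

# Ticket 3 was merged into ticket 1; unmerged tickets store NULL.
conn.executemany(
    "INSERT INTO tickets (id, effectiveid) VALUES (?, ?)",
    [(1, None), (2, None), (3, 1), (4, None)],
)

# The old meaning of EffectiveId is recovered with COALESCE.
effective = conn.execute(
    "SELECT id, COALESCE(effectiveid, id) FROM tickets ORDER BY id"
).fetchall()

# Counterpart of the old "id = EffectiveId" filter: unmerged rows only.
unmerged = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM tickets WHERE effectiveid IS NULL ORDER BY id"
    )
]

print(effective)
print(unmerged)
```

With this layout the old "id = EffectiveId" condition becomes "effectiveid IS NULL". In PostgreSQL, the per-column statistics include the fraction of NULL entries (the null_frac column of pg_stats), which is the kind of statistic that bears directly on the merged vs unmerged split. A "merged" boolean column would be the other variant mentioned above and is not shown here.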
