Thread: partial unique index and the planner

partial unique index and the planner

From

Michal Politowski

Date:

15 February 2009, 13:50:41

I'm using PostgreSQL 8.3.
Is it normal that plans using a scan on a partial unique index
estimate that much more than one row is returned?

Eg. I see:
 ->  Bitmap Index Scan on tmp_idx_oss_archive_object_id_current (cost=0.00..3.12 rows=4189 width=0)
where the tmp_idx_oss_archive_object_id_current index is a partial unique index.

The estimated row count would be correct for the whole table but obviously not for
the part covered by the unique index.

This happens to be a problem in this case because then the planner
prefers a sequence scan on a table joined to this one and a hash join to an index scan
and a nested loop join. Which takes hundreds of milliseconds instead of
one, so setting enable_hashjoin to false increases performance immensely.

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.

Re: partial unique index and the planner

From

Scott Marlowe

Date:

15 February 2009, 14:28:49

2009/2/15 Michal Politowski <mpol+pg@meep.pl>:
> I'm using PostgreSQL 8.3.
> Is it normal that plans using a scan on a partial unique index
> estimate that much more than one row is returned?
>
> Eg. I see:
>  ->  Bitmap Index Scan on tmp_idx_oss_archive_object_id_current (cost=0.00..3.12 rows=4189 width=0)
> where the tmp_idx_oss_archive_object_id_current index is a partial unique index.
>
> The estimated row count would be correct for the whole table but obviously not for
> the part covered by the unique index.
>
> This happens to be a problem in this case because then the planner
> prefers a sequence scan on a table joined to this one and a hash join to an index scan
> and a nested loop join. Which takes hundreds of milliseconds instead of
> one, so setting enable_hashjoin to false increases performance immensely.

Have you run analyze since creating the unique partial index?

Re: partial unique index and the planner

From

Tom Lane

Date:

15 February 2009, 14:42:26

Michal Politowski <mpol+pg@meep.pl> writes:
> Is it normal that plans using a scan on a partial unique index
> estimate that much more than one row is returned?

There isn't currently any special logic to recognize that case;
the estimate is just whatever is going to come out of the normal
statistics-based estimation.

I'm unsure how hard it'd be to improve the situation.  If we've already
identified relevant partial indexes before any of the stats code has to
run then it'd be pretty easy, but that might be a bit fragile.

Anyway, the usual advice for such cases is to see if raising the
statistics target helps.

            regards, tom lane

Re: partial unique index and the planner

From

Michal Politowski

Date:

16 February 2009, 14:18:45

On Sun, 15 Feb 2009 13:41:05 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  Michal Politowski <mpol+pg@meep.pl> writes:
> > Is it normal that plans using a scan on a partial unique index
> > estimate that much more than one row is returned?
>
>  There isn't currently any special logic to recognize that case;
>  the estimate is just whatever is going to come out of the normal
>  statistics-based estimation.

Too bad. It seems then that the schema is not well suited to what Postgres
would like. Maybe changing it will be the right thing to do.

The situation is that there are potentially several versions of a row,
only one of which is active at any given moment. The partial unique index
lets a query find the active row quickly, but since it is not known to the
planner that there is only one such row, a join caused problems.

So it looks like, at least for the current problem, separating the active
and inactive rows in their own tables would help.

On the other hand it seems that the table finally grew big enough for
Postgres to prefer index scans to heap scans even with the skewed
estimates.

>  I'm unsure how hard it'd be to improve the situation.  If we've already
>  identified relevant partial indexes before any of the stats code has to
>  run then it'd be pretty easy, but that might be a bit fragile.
>
>  Anyway, the usual advice for such cases is to see if raising the
>  statistics target helps.

Unfortunately it's already 1000. And it seems to me it woud not help here
anyway. The estimate is very correct for the number of rows with the most
common identifier, only it cannot take the active status into consideration.

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.

Re: partial unique index and the planner

From

Alban Hertroys

Date:

16 February 2009, 14:32:01

On Feb 16, 2009, at 7:18 PM, Michal Politowski wrote:

> On Sun, 15 Feb 2009 13:41:05 -0500, Tom Lane <tgl@sss.pgh.pa.us>
> wrote:
>> Michal Politowski <mpol+pg@meep.pl> writes:
>>> Is it normal that plans using a scan on a partial unique index
>>> estimate that much more than one row is returned?
>>
>> There isn't currently any special logic to recognize that case;
>> the estimate is just whatever is going to come out of the normal
>> statistics-based estimation.
>
> Too bad. It seems then that the schema is not well suited to what
> Postgres
> would like. Maybe changing it will be the right thing to do.
>
> The situation is that there are potentially several versions of a row,
> only one of which is active at any given moment. The partial unique
> index
> lets a query find the active row quickly, but since it is not known
> to the
> planner that there is only one such row, a join caused problems.
>
> So it looks like, at least for the current problem, separating the
> active
> and inactive rows in their own tables would help.

I don't know your exact situation, but you could define a foreign key
from some other table to the 'active' rows in your table. That key
would then need to be updated by a few triggers (on insert, update &
delete) on your table. Put an index on the row that's referencing the
foreign key and you get the same index content-wise that your partial
index was covering, except that all the rows in the referencing table
(and thus the index) are known to have only one matching row in your
table. This way you shouldn't have your earlier problem with the
estimates.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4999b115747031962913450!