Thread: partial unique index and the planner
I'm using PostgreSQL 8.3. Is it normal that plans using a scan on a partial unique index estimate that much more than one row is returned? Eg. I see: -> Bitmap Index Scan on tmp_idx_oss_archive_object_id_current (cost=0.00..3.12 rows=4189 width=0) where the tmp_idx_oss_archive_object_id_current index is a partial unique index. The estimated row count would be correct for the whole table but obviously not for the part covered by the unique index. This happens to be a problem in this case because then the planner prefers a sequence scan on a table joined to this one and a hash join to an index scan and a nested loop join. Which takes hundreds of milliseconds instead of one, so setting enable_hashjoin to false increases performance immensely. -- Michał Politowski Talking has been known to lead to communication if practiced carelessly.
2009/2/15 Michal Politowski <mpol+pg@meep.pl>: > I'm using PostgreSQL 8.3. > Is it normal that plans using a scan on a partial unique index > estimate that much more than one row is returned? > > Eg. I see: > -> Bitmap Index Scan on tmp_idx_oss_archive_object_id_current (cost=0.00..3.12 rows=4189 width=0) > where the tmp_idx_oss_archive_object_id_current index is a partial unique index. > > The estimated row count would be correct for the whole table but obviously not for > the part covered by the unique index. > > This happens to be a problem in this case because then the planner > prefers a sequence scan on a table joined to this one and a hash join to an index scan > and a nested loop join. Which takes hundreds of milliseconds instead of > one, so setting enable_hashjoin to false increases performance immensely. Have you run analyze since creating the unique partial index?
Michal Politowski <mpol+pg@meep.pl> writes: > Is it normal that plans using a scan on a partial unique index > estimate that much more than one row is returned? There isn't currently any special logic to recognize that case; the estimate is just whatever is going to come out of the normal statistics-based estimation. I'm unsure how hard it'd be to improve the situation. If we've already identified relevant partial indexes before any of the stats code has to run then it'd be pretty easy, but that might be a bit fragile. Anyway, the usual advice for such cases is to see if raising the statistics target helps. regards, tom lane
On Sun, 15 Feb 2009 13:41:05 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michal Politowski <mpol+pg@meep.pl> writes: > > Is it normal that plans using a scan on a partial unique index > > estimate that much more than one row is returned? > > There isn't currently any special logic to recognize that case; > the estimate is just whatever is going to come out of the normal > statistics-based estimation. Too bad. It seems then that the schema is not well suited to what Postgres would like. Maybe changing it will be the right thing to do. The situation is that there are potentially several versions of a row, only one of which is active at any given moment. The partial unique index lets a query find the active row quickly, but since it is not known to the planner that there is only one such row, a join caused problems. So it looks like, at least for the current problem, separating the active and inactive rows in their own tables would help. On the other hand it seems that the table finally grew big enough for Postgres to prefer index scans to heap scans even with the skewed estimates. > I'm unsure how hard it'd be to improve the situation. If we've already > identified relevant partial indexes before any of the stats code has to > run then it'd be pretty easy, but that might be a bit fragile. > > Anyway, the usual advice for such cases is to see if raising the > statistics target helps. Unfortunately it's already 1000. And it seems to me it woud not help here anyway. The estimate is very correct for the number of rows with the most common identifier, only it cannot take the active status into consideration. -- Michał Politowski Talking has been known to lead to communication if practiced carelessly.
On Feb 16, 2009, at 7:18 PM, Michal Politowski wrote: > On Sun, 15 Feb 2009 13:41:05 -0500, Tom Lane <tgl@sss.pgh.pa.us> > wrote: >> Michal Politowski <mpol+pg@meep.pl> writes: >>> Is it normal that plans using a scan on a partial unique index >>> estimate that much more than one row is returned? >> >> There isn't currently any special logic to recognize that case; >> the estimate is just whatever is going to come out of the normal >> statistics-based estimation. > > Too bad. It seems then that the schema is not well suited to what > Postgres > would like. Maybe changing it will be the right thing to do. > > The situation is that there are potentially several versions of a row, > only one of which is active at any given moment. The partial unique > index > lets a query find the active row quickly, but since it is not known > to the > planner that there is only one such row, a join caused problems. > > So it looks like, at least for the current problem, separating the > active > and inactive rows in their own tables would help. I don't know your exact situation, but you could define a foreign key from some other table to the 'active' rows in your table. That key would then need to be updated by a few triggers (on insert, update & delete) on your table. Put an index on the row that's referencing the foreign key and you get the same index content-wise that your partial index was covering, except that all the rows in the referencing table (and thus the index) are known to have only one matching row in your table. This way you shouldn't have your earlier problem with the estimates. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4999b115747031962913450!