Thread: Migration study, step 2: rewriting queries

Migration study, step 2: rewriting queries

From
"Mikael Carneholm"
Date:
Since my first post in this series, data has been successfully exported
from the proprietary db (in CSV format) and imported into postgres
(PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.4
20050721 (Red Hat 3.4.4-2)) using
COPY. The tablespace holding the tables+indexes is now 60Gb, total time
for exporting+transfering over  network+importing+indexing+vacuuming is
about 7 hours.

I now have a query (originally using an analytical function) that
behaves strangely as soon as I go from 2 person_information__id:s to 3
person_information__id:s (see below):

//
// 2 PID:s
//
mica@TEST=> explain analyze select distinct on
(driver_activity.person_information__id)
TEST-> driver_activity.person_information__id,
TEST-> driver_activity.end_time as prev_timestamp
TEST-> from
TEST-> driver_activity
TEST-> where
TEST-> driver_activity.person_information__id in (5398,5399)
TEST-> and driver_activity.driver_activity_type__id = 5
TEST-> and driver_activity.start_time < '2006-04-01'
TEST-> order by
TEST-> driver_activity.person_information__id,
driver_activity.start_time desc;


QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------
 Unique  (cost=1909.90..1912.32 rows=1 width=20) (actual
time=1.153..1.623 rows=2 loops=1)
   ->  Sort  (cost=1909.90..1911.11 rows=485 width=20) (actual
time=1.151..1.389 rows=213 loops=1)
         Sort Key: person_information__id, start_time
         ->  Bitmap Heap Scan on driver_activity  (cost=6.91..1888.26
rows=485 width=20) (actual time=0.141..0.677 rows=213 loops=1)
               Recheck Cond: (((person_information__id = 5398) AND
(driver_activity_type__id = 5)) OR ((person_information__id = 5399) AND
(driver_activity_type__id = 5)))
               Filter: (start_time < '2006-04-01 00:00:00'::timestamp
without time zone)
               ->  BitmapOr  (cost=6.91..6.91 rows=485 width=0) (actual
time=0.102..0.102 rows=0 loops=1)
                     ->  Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.042..0.042 rows=56
loops=1)
                           Index Cond: ((person_information__id = 5398)
AND (driver_activity_type__id = 5))
                     ->  Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.055..0.055 rows=157
loops=1)
                           Index Cond: ((person_information__id = 5399)
AND (driver_activity_type__id = 5))
 Total runtime: 1.693 ms
(12 rows)

//
// 3 PID:s
//
mica@TEST=> explain analyze select distinct on
(driver_activity.person_information__id)
TEST-> driver_activity.person_information__id,
TEST-> driver_activity.end_time as prev_timestamp
TEST-> from
TEST-> driver_activity
TEST-> where
TEST-> driver_activity.person_information__id in (5398,5399,5400)
TEST-> and driver_activity.driver_activity_type__id = 5
TEST-> and driver_activity.start_time < '2006-04-01'
TEST-> order by
TEST-> driver_activity.person_information__id,
driver_activity.start_time desc;

QUERY PLAN

------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
-----------------------------------------------------------------
 Unique  (cost=2808.35..2811.98 rows=1 width=20) (actual
time=5450.281..5450.948 rows=3 loops=1)
   ->  Sort  (cost=2808.35..2810.17 rows=727 width=20) (actual
time=5450.278..5450.607 rows=305 loops=1)
         Sort Key: person_information__id, start_time
         ->  Bitmap Heap Scan on driver_activity  (cost=2713.82..2773.80
rows=727 width=20) (actual time=5436.259..5449.043 rows=305 loops=1)
               Recheck Cond: ((((person_information__id = 5398) AND
(driver_activity_type__id = 5)) OR ((person_information__id = 5399) AND
(driver_activity_type__id = 5)) OR ((person_information__id = 5400) AND
(driver_activity_type__id = 5))) AND (driver_activity_type__id = 5))
               Filter: (start_time < '2006-04-01 00:00:00'::timestamp
without time zone)
               ->  BitmapAnd  (cost=2713.82..2713.82 rows=15 width=0)
(actual time=5436.148..5436.148 rows=0 loops=1)
                     ->  BitmapOr  (cost=10.37..10.37 rows=728 width=0)
(actual time=0.384..0.384 rows=0 loops=1)
                           ->  Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.135..0.135 rows=56
loops=1)
                                 Index Cond: ((person_information__id =
5398) AND (driver_activity_type__id = 5))
                           ->  Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.115..0.115 rows=157
loops=1)
                                 Index Cond: ((person_information__id =
5399) AND (driver_activity_type__id = 5))
                           ->  Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.126..0.126 rows=93
loops=1)
                                 Index Cond: ((person_information__id =
5400) AND (driver_activity_type__id = 5))
                     ->  Bitmap Index Scan on xda_dat
(cost=0.00..2703.21 rows=474916 width=0) (actual time=5435.431..5435.431
rows=451541 loops=1)
                           Index Cond: (driver_activity_type__id = 5)
 Total runtime: 5451.094 ms
(17 rows)

Question: why is the extra step (Bitmap Index Scan on xda_dat)
introduced in the latter case? I can see it is originating from

((((person_information__id = 5398) AND (driver_activity_type__id = 5))
OR ((person_information__id = 5399) AND (driver_activity_type__id = 5))
OR ((person_information__id = 5400) AND (driver_activity_type__id = 5)))
AND (driver_activity_type__id = 5))

..which, indented for readability, looks like this:

(
    (
        (
            (person_information__id = 5398) AND
(driver_activity_type__id = 5)
        )
        OR
        (
            (person_information__id = 5399) AND
(driver_activity_type__id = 5)
        )
        OR
        (
            (person_information__id = 5400) AND
(driver_activity_type__id = 5)
        )
    )
    AND
    (driver_activity_type__id = 5)
)

..and this last AND seems unnessecary, since the predicate on
(driver_activity_type__id = 5) is included in each of the above
conditions.

Can this be a bug in the planner?

/Mikael

Re: Migration study, step 2: rewriting queries

From
Tom Lane
Date:
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> ..and this last AND seems unnessecary, since the predicate on
> (driver_activity_type__id = 5) is included in each of the above
> conditions.

This should be fixed by the changes I made recently in choose_bitmap_and
--- it wasn't being aggressive about pruning overlapping AND conditions
when a sub-OR was involved.  It's possible the new coding is *too*
aggressive, and will reject indexes that it'd be profitable to include;
but at least it won't make this particular mistake.

            regards, tom lane

Re: Migration study, step 2: rewriting queries

From
"Mikael Carneholm"
Date:
>This should be fixed by the changes I made recently in
choose_bitmap_and
>--- it wasn't being aggressive about pruning overlapping AND conditions
when a sub-OR was involved.  It's possible the new coding is >*too*
aggressive, and will reject indexes that it'd be profitable to include;
but at least it won't make this particular mistake.

Ok, cool. I don't have time to test this right now as the project has to
move on (and I guess testing the fix would require a dump+build CVS
version+restore), but as a temporary workaround I simly dropped the
xda_dat index (all queries on that table include the
person_information__id column anyway).

- Mikael



Re: Migration study, step 2: rewriting queries

From
Tom Lane
Date:
"Mikael Carneholm" <Mikael.Carneholm@WirelessCar.com> writes:
> Ok, cool. I don't have time to test this right now as the project has to
> move on (and I guess testing the fix would require a dump+build CVS
> version+restore), but as a temporary workaround I simly dropped the
> xda_dat index (all queries on that table include the
> person_information__id column anyway).

The patch is in the 8.1 branch so you don't need dump/restore anyway...

            regards, tom lane