Since my first post in this series, data has been successfully exported
from the proprietary db (in CSV format) and imported into postgres
(PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.4
20050721 (Red Hat 3.4.4-2)) using
COPY. The tablespace holding the tables+indexes is now 60Gb, total time
for exporting+transfering over network+importing+indexing+vacuuming is
about 7 hours.
I now have a query (originally using an analytical function) that
behaves strangely as soon as I go from 2 person_information__id:s to 3
person_information__id:s (see below):
//
// 2 PID:s
//
mica@TEST=> explain analyze select distinct on
(driver_activity.person_information__id)
TEST-> driver_activity.person_information__id,
TEST-> driver_activity.end_time as prev_timestamp
TEST-> from
TEST-> driver_activity
TEST-> where
TEST-> driver_activity.person_information__id in (5398,5399)
TEST-> and driver_activity.driver_activity_type__id = 5
TEST-> and driver_activity.start_time < '2006-04-01'
TEST-> order by
TEST-> driver_activity.person_information__id,
driver_activity.start_time desc;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----------------------------
Unique (cost=1909.90..1912.32 rows=1 width=20) (actual
time=1.153..1.623 rows=2 loops=1)
-> Sort (cost=1909.90..1911.11 rows=485 width=20) (actual
time=1.151..1.389 rows=213 loops=1)
Sort Key: person_information__id, start_time
-> Bitmap Heap Scan on driver_activity (cost=6.91..1888.26
rows=485 width=20) (actual time=0.141..0.677 rows=213 loops=1)
Recheck Cond: (((person_information__id = 5398) AND
(driver_activity_type__id = 5)) OR ((person_information__id = 5399) AND
(driver_activity_type__id = 5)))
Filter: (start_time < '2006-04-01 00:00:00'::timestamp
without time zone)
-> BitmapOr (cost=6.91..6.91 rows=485 width=0) (actual
time=0.102..0.102 rows=0 loops=1)
-> Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.042..0.042 rows=56
loops=1)
Index Cond: ((person_information__id = 5398)
AND (driver_activity_type__id = 5))
-> Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.055..0.055 rows=157
loops=1)
Index Cond: ((person_information__id = 5399)
AND (driver_activity_type__id = 5))
Total runtime: 1.693 ms
(12 rows)
//
// 3 PID:s
//
mica@TEST=> explain analyze select distinct on
(driver_activity.person_information__id)
TEST-> driver_activity.person_information__id,
TEST-> driver_activity.end_time as prev_timestamp
TEST-> from
TEST-> driver_activity
TEST-> where
TEST-> driver_activity.person_information__id in (5398,5399,5400)
TEST-> and driver_activity.driver_activity_type__id = 5
TEST-> and driver_activity.start_time < '2006-04-01'
TEST-> order by
TEST-> driver_activity.person_information__id,
driver_activity.start_time desc;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
------------------------------------------------------------------------
-----------------------------------------------------------------
Unique (cost=2808.35..2811.98 rows=1 width=20) (actual
time=5450.281..5450.948 rows=3 loops=1)
-> Sort (cost=2808.35..2810.17 rows=727 width=20) (actual
time=5450.278..5450.607 rows=305 loops=1)
Sort Key: person_information__id, start_time
-> Bitmap Heap Scan on driver_activity (cost=2713.82..2773.80
rows=727 width=20) (actual time=5436.259..5449.043 rows=305 loops=1)
Recheck Cond: ((((person_information__id = 5398) AND
(driver_activity_type__id = 5)) OR ((person_information__id = 5399) AND
(driver_activity_type__id = 5)) OR ((person_information__id = 5400) AND
(driver_activity_type__id = 5))) AND (driver_activity_type__id = 5))
Filter: (start_time < '2006-04-01 00:00:00'::timestamp
without time zone)
-> BitmapAnd (cost=2713.82..2713.82 rows=15 width=0)
(actual time=5436.148..5436.148 rows=0 loops=1)
-> BitmapOr (cost=10.37..10.37 rows=728 width=0)
(actual time=0.384..0.384 rows=0 loops=1)
-> Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.135..0.135 rows=56
loops=1)
Index Cond: ((person_information__id =
5398) AND (driver_activity_type__id = 5))
-> Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.115..0.115 rows=157
loops=1)
Index Cond: ((person_information__id =
5399) AND (driver_activity_type__id = 5))
-> Bitmap Index Scan on xda_pi_dat
(cost=0.00..3.46 rows=243 width=0) (actual time=0.126..0.126 rows=93
loops=1)
Index Cond: ((person_information__id =
5400) AND (driver_activity_type__id = 5))
-> Bitmap Index Scan on xda_dat
(cost=0.00..2703.21 rows=474916 width=0) (actual time=5435.431..5435.431
rows=451541 loops=1)
Index Cond: (driver_activity_type__id = 5)
Total runtime: 5451.094 ms
(17 rows)
Question: why is the extra step (Bitmap Index Scan on xda_dat)
introduced in the latter case? I can see it is originating from
((((person_information__id = 5398) AND (driver_activity_type__id = 5))
OR ((person_information__id = 5399) AND (driver_activity_type__id = 5))
OR ((person_information__id = 5400) AND (driver_activity_type__id = 5)))
AND (driver_activity_type__id = 5))
..which, indented for readability, looks like this:
(
(
(
(person_information__id = 5398) AND
(driver_activity_type__id = 5)
)
OR
(
(person_information__id = 5399) AND
(driver_activity_type__id = 5)
)
OR
(
(person_information__id = 5400) AND
(driver_activity_type__id = 5)
)
)
AND
(driver_activity_type__id = 5)
)
..and this last AND seems unnessecary, since the predicate on
(driver_activity_type__id = 5) is included in each of the above
conditions.
Can this be a bug in the planner?
/Mikael