Thread: Query performance problems with partitioned tables

From:
Andreas Haumer
Date:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

I'm currently experimenting with PostgreSQL 8.2.4 and table
partitioning in order to improve the performance of an
application I'm working on.

My application is about managing measurement values (lots of!)
I have one table "t_mv" which stores all the measurement values.
A single measurement value has a timestamp and belongs
to a single time series, so table "t_mv" looks like this:

CREATE TABLE t_mv
(
  zr integer NOT NULL,                  -- the time series id
  ts timestamp with time zone NOT NULL, -- the timestamp
  ...                                   -- other attributes of a mv
)
WITHOUT OIDS;

ALTER TABLE t_mv
  ADD CONSTRAINT pk_mv_zr_ts PRIMARY KEY (zr, ts);

Each time series defines several other attributes which are common
to all measurement values of this time series (like sampling location,
physical parameter, aggregation, cardinality, type, visibility, etc.)

The application should be able to handle several thousand
different time series and hundreds of millions of measurement
values, so table t_mv can get quite large.

I have tested installations with up to 70 millions rows in t_mv
and PostgreSQL can handle that with a quite good performance
even on non high-end machines (operating system is Linux, btw)

But as I expect installations witch much more rows in t_mv, I
tried to implement a "partitioned tables" concept using inheritance
and CHECK constraints, just like it is described in the docs
(e.g. chapter 5.9 in the current PostgreSQL 8.2.4 documentation)

I split the t_mv table on the timestamp attribute to build
child tables which hold all measurement values for a single month.
That way I have several tables called "t_mv_YYYYMM" which all
inherit from "t_mv". The number of child tables depends on the
time period the application has to store the measurement values
(which can be several years so I'm expecting up to 100 child
tables or even more).
For the application everything looks the same: inserts, updates
and queries all are against the "t_mv" parent table, the application
is not aware of the fact that this table is actually "split" into
several child tables.

This is working fine and for some standard queries it actually
gives some performance improvement compared to the standard
"everything in one big table" concept. The performance improvement
increases with the number of rows in t_mv, for a small table (less
than 10 million rows or so) IMHO it is not really worth the effort
or even counter-productive.

But I have some special queries where the performance with
partitioned tables actually get much worse: those are queries where
I'm working with "open" time intervals, i.e. where I want to
get the previous and/or next timestamp from a given interval.

A simple example: Get the timestamp of a measurement value for time
series 3622 which is right before the measurement value with time
stamp '2007-04-22 00:00:00':

testdb_std=> select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc limit 1;
           ts
- ------------------------
 2007-04-21 23:00:00+02
(1 row)


Im my application there are many queries like this. Such
queries also come in several variations, including quite
sophisticated joins with lots of other tables "above" the
time series table.

Note: as I'm working with (potentially) non-equidistant
time series I can not just calculate the timestamps, I
have to retrieve them from the database!

In the standard case, the query plan for the example query looks like this:

testdb_std=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc
limit1; 
                                                               QUERY PLAN
-
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.70 rows=1 width=8) (actual time=0.233..0.235 rows=1 loops=1)
   ->  Index Scan Backward using pk_mv_zr_ts on t_mv  (cost=0.00..21068.91 rows=12399 width=8) (actual
time=0.221..0.221rows=1 loops=1) 
         Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22 00:00:00+02'::timestamp
withtime zone)) 
 Total runtime: 0.266 ms
(4 rows)


If I switch to partitioned tables, the query retrieves the same result (of course):

testdb_std=> \c testdb_part
You are now connected to database "testdb_part".
testdb_part=> select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc limit 1;
           ts
- ------------------------
 2007-04-21 23:00:00+02
(1 row)


But the query plan becomes:

testdb_part=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc
limit1; 
                                                                          QUERY PLAN
-
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=23985.83..23985.83 rows=1 width=8) (actual time=230.100..230.102 rows=1 loops=1)
   ->  Sort  (cost=23985.83..24019.84 rows=13605 width=8) (actual time=230.095..230.095 rows=1 loops=1)
         Sort Key: mwdb.t_mv.ts
         ->  Result  (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.154..177.519 rows=15810 loops=1)
               ->  Append  (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.149..114.186 rows=15810 loops=1)
                     ->  Index Scan using pk_mv_zr_ts on t_mv  (cost=0.00..8.27 rows=1 width=8) (actual
time=0.047..0.047rows=0 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200507 on t_mv_200507 t_mv  (cost=0.00..2417.53 rows=1519 width=8)
(actualtime=0.095..2.419 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200508 on t_mv_200508 t_mv  (cost=0.00..918.81 rows=539 width=8)
(actualtime=0.081..2.134 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200509 on t_mv_200509 t_mv  (cost=0.00..941.88 rows=555 width=8)
(actualtime=0.061..2.051 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200510 on t_mv_200510 t_mv  (cost=0.00..915.29 rows=538 width=8)
(actualtime=0.064..2.113 rows=715 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200511 on t_mv_200511 t_mv  (cost=0.00..925.93 rows=545 width=8)
(actualtime=0.048..2.986 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200512 on t_mv_200512 t_mv  (cost=0.00..936.53 rows=550 width=8)
(actualtime=0.049..2.212 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200601 on t_mv_200601 t_mv  (cost=0.00..981.42 rows=579 width=8)
(actualtime=0.065..3.029 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200602 on t_mv_200602 t_mv  (cost=0.00..856.25 rows=502 width=8)
(actualtime=0.045..2.866 rows=672 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200603 on t_mv_200603 t_mv  (cost=0.00..977.84 rows=575 width=8)
(actualtime=0.052..3.044 rows=743 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200604 on t_mv_200604 t_mv  (cost=0.00..906.40 rows=531 width=8)
(actualtime=0.053..1.976 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200605 on t_mv_200605 t_mv  (cost=0.00..938.28 rows=550 width=8)
(actualtime=0.050..2.357 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200606 on t_mv_200606 t_mv  (cost=0.00..922.35 rows=541 width=8)
(actualtime=0.054..2.063 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200607 on t_mv_200607 t_mv  (cost=0.00..2112.64 rows=1315 width=8)
(actualtime=0.047..2.226 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200608 on t_mv_200608 t_mv  (cost=0.00..990.23 rows=582 width=8)
(actualtime=0.048..2.094 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200609 on t_mv_200609 t_mv  (cost=0.00..902.84 rows=528 width=8)
(actualtime=0.039..2.252 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200610 on t_mv_200610 t_mv  (cost=0.00..964.87 rows=567 width=8)
(actualtime=0.033..2.118 rows=745 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200611 on t_mv_200611 t_mv  (cost=0.00..947.17 rows=557 width=8)
(actualtime=0.060..2.160 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200612 on t_mv_200612 t_mv  (cost=0.00..929.43 rows=545 width=8)
(actualtime=0.039..2.051 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200701 on t_mv_200701 t_mv  (cost=0.00..940.05 rows=551 width=8)
(actualtime=0.036..2.217 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200702 on t_mv_200702 t_mv  (cost=0.00..847.38 rows=496 width=8)
(actualtime=0.035..1.830 rows=672 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200703 on t_mv_200703 t_mv  (cost=0.00..956.00 rows=561 width=8)
(actualtime=0.062..2.326 rows=743 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200704 on t_mv_200704 t_mv  (cost=0.00..814.38 rows=378 width=8)
(actualtime=0.050..1.406 rows=504 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
 Total runtime: 231.730 ms
(52 rows)

Oops!
Compare the costs or the actual query time between those queries!
(Note: I set "constraint_exclusion = on", of course!)

As such queries are used all over the application, this nullifies
any performance improvements for standard queries and in fact makes
the overall application performance as "feeled" by the user _much_
worse.

I also tried it with "min()" and "max()" aggregate functions
instead of the "limit 1" query, but this does not change much:


Standard "big" table:

testdb_std=> select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ;
          max
- ------------------------
 2007-04-21 23:00:00+02
(1 row)


testdb_std=> explain analyze select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ;
                                                                   QUERY PLAN
-
-------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1.70..1.71 rows=1 width=0) (actual time=0.071..0.073 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..1.70 rows=1 width=8) (actual time=0.060..0.062 rows=1 loops=1)
           ->  Index Scan Backward using pk_mv_zr_ts on t_mv  (cost=0.00..21068.91 rows=12399 width=8) (actual
time=0.056..0.056rows=1 loops=1) 
                 Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
                 Filter: ((ts)::timestamp with time zone IS NOT NULL)
 Total runtime: 0.221 ms
(7 rows)


"Partitioned table":

testdb_part=> select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ;
          max
- ------------------------
 2007-04-21 23:00:00+02
(1 row)

testdb_part=> explain analyze select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ;
                                                                     QUERY PLAN
-
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=23085.73..23085.74 rows=1 width=8) (actual time=390.094..390.096 rows=1 loops=1)
   ->  Append  (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.241..290.934 rows=15810 loops=1)
         ->  Index Scan using pk_mv_zr_ts on t_mv  (cost=0.00..8.27 rows=1 width=8) (actual time=0.038..0.038 rows=0
loops=1)
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200507 on t_mv_200507 t_mv  (cost=0.00..2417.53 rows=1519 width=8) (actual
time=0.197..12.598rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200508 on t_mv_200508 t_mv  (cost=0.00..918.81 rows=539 width=8) (actual
time=0.095..5.947rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200509 on t_mv_200509 t_mv  (cost=0.00..941.88 rows=555 width=8) (actual
time=0.118..2.247rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200510 on t_mv_200510 t_mv  (cost=0.00..915.29 rows=538 width=8) (actual
time=0.121..6.219rows=715 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200511 on t_mv_200511 t_mv  (cost=0.00..925.93 rows=545 width=8) (actual
time=2.287..9.991rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200512 on t_mv_200512 t_mv  (cost=0.00..936.53 rows=550 width=8) (actual
time=0.110..2.285rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200601 on t_mv_200601 t_mv  (cost=0.00..981.42 rows=579 width=8) (actual
time=0.209..4.682rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200602 on t_mv_200602 t_mv  (cost=0.00..856.25 rows=502 width=8) (actual
time=0.079..6.079rows=672 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200603 on t_mv_200603 t_mv  (cost=0.00..977.84 rows=575 width=8) (actual
time=0.091..4.793rows=743 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200604 on t_mv_200604 t_mv  (cost=0.00..906.40 rows=531 width=8) (actual
time=0.108..7.637rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200605 on t_mv_200605 t_mv  (cost=0.00..938.28 rows=550 width=8) (actual
time=0.116..4.772rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200606 on t_mv_200606 t_mv  (cost=0.00..922.35 rows=541 width=8) (actual
time=0.074..6.071rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200607 on t_mv_200607 t_mv  (cost=0.00..2112.64 rows=1315 width=8) (actual
time=0.082..4.807rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200608 on t_mv_200608 t_mv  (cost=0.00..990.23 rows=582 width=8) (actual
time=2.283..8.671rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200609 on t_mv_200609 t_mv  (cost=0.00..902.84 rows=528 width=8) (actual
time=0.107..6.067rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200610 on t_mv_200610 t_mv  (cost=0.00..964.87 rows=567 width=8) (actual
time=0.074..3.933rows=745 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200611 on t_mv_200611 t_mv  (cost=0.00..947.17 rows=557 width=8) (actual
time=0.091..6.291rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200612 on t_mv_200612 t_mv  (cost=0.00..929.43 rows=545 width=8) (actual
time=0.077..4.101rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200701 on t_mv_200701 t_mv  (cost=0.00..940.05 rows=551 width=8) (actual
time=0.077..2.558rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200702 on t_mv_200702 t_mv  (cost=0.00..847.38 rows=496 width=8) (actual
time=0.073..4.346rows=672 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200703 on t_mv_200703 t_mv  (cost=0.00..956.00 rows=561 width=8) (actual
time=2.532..7.206rows=743 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200704 on t_mv_200704 t_mv  (cost=0.00..814.38 rows=378 width=8) (actual
time=0.120..4.163rows=504 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
 Total runtime: 394.384 ms
(49 rows)


Now my question is: Does the query planner in the case of partitioned tables
really have to scan all indexes in order to get the next timestamp smaller
(or larger) than a given one?

There are check conditions on all table partitions like this:

For table t_mv_200704:
CHECK (ts::timestamp with time zone >= '2007-04-01 00:00:00+02'::timestamp with time zone
   AND ts::timestamp with time zone < '2007-05-01 00:00:00+02'::timestamp with time zone)

For table t_mv_200703:
CHECK (ts::timestamp with time zone >= '2007-03-01 00:00:00+01'::timestamp with time zone
   AND ts::timestamp with time zone < '2007-04-01 00:00:00+02'::timestamp with time zone)

and so on...

So the tables are in a well defined, monotonic sort order regarding the timestamp.

This means that if there is a max(ts) for ts < '2007-04-22 00:00:00'
already in table t_mv_200704, it makes no sense to look further in
other tables where the timestamps can only be smaller than the
timestamp already found. Am I correct?

Is there room for improvements of the query planner for queries
like this or is this a special case which will never get handled
anyway?

Or would you suggest a completely different table structure
or perhaps some other query?

I'm open for any suggestion!

- - andreas

- --
Andreas Haumer                     | mailto:
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGNdZ2xJmyeGcXPhERAsbfAJ9nA+z50uXiV4SHntt1Y9IuZ/rzWwCff8ar
xKSMfzwgjx9kQipeDoEnXWE=
=57aJ
-----END PGP SIGNATURE-----

From:
Guillaume Cottenceau
Date:

Andreas Haumer <andreas 'at' xss.co.at> writes:

[...]

> testdb_part=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc
limit1; 
>                                                                           QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=23985.83..23985.83 rows=1 width=8) (actual time=230.100..230.102 rows=1 loops=1)
>    ->  Sort  (cost=23985.83..24019.84 rows=13605 width=8) (actual time=230.095..230.095 rows=1 loops=1)
>          Sort Key: mwdb.t_mv.ts
>          ->  Result  (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.154..177.519 rows=15810 loops=1)
>                ->  Append  (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.149..114.186 rows=15810 loops=1)
>                      ->  Index Scan using pk_mv_zr_ts on t_mv  (cost=0.00..8.27 rows=1 width=8) (actual
time=0.047..0.047rows=0 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200507 on t_mv_200507 t_mv  (cost=0.00..2417.53 rows=1519 width=8)
(actualtime=0.095..2.419 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 

[...]

>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200704 on t_mv_200704 t_mv  (cost=0.00..814.38 rows=378 width=8)
(actualtime=0.050..1.406 rows=504 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>  Total runtime: 231.730 ms
> (52 rows)
>
> Oops!
> Compare the costs or the actual query time between those queries!

Well, I'd say that scanning all partitions until the partition
containing april 2007, when one of the query parameter is having
timestamp before april 2007 but without an initial timestamp
limit, looks normal :)


[...]

> Now my question is: Does the query planner in the case of partitioned tables
> really have to scan all indexes in order to get the next timestamp smaller
> (or larger) than a given one?

Well, how can the planner know inside which partition the wanted
row is? There might be no data, say, inside a couple of
partitions in the past before finding the wanted row, in which
case 3 partitions in the past must be scanned.

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

From:
Andreas Haumer
Date:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Guillaume Cottenceau schrieb:
> Andreas Haumer <andreas 'at' xss.co.at> writes:
[...]
>
>> Now my question is: Does the query planner in the case of partitioned tables
>> really have to scan all indexes in order to get the next timestamp smaller
>> (or larger) than a given one?
>
> Well, how can the planner know inside which partition the wanted
> row is? There might be no data, say, inside a couple of
> partitions in the past before finding the wanted row, in which
> case 3 partitions in the past must be scanned.
>

I think the planner could do the following:

a) It could make a better decision in which direction to scan
   the partitions (depending on sort order involved in the query)

b) It could stop scanning as soon as there can not be any further
   resulting row according to the CHECK constraints given on the tables.


Currently it doesn't do this.

Look at this example:

testdb_part=> select ts from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00' order by ts asc limit 1;
           ts
- ------------------------
 2006-01-01 01:00:00+01
(1 row)

testdb_part=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00' order by ts asc
limit1; 
                                                                          QUERY PLAN
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15843.41..15843.41 rows=1 width=8) (actual time=152.476..152.478 rows=1 loops=1)
   ->  Sort  (cost=15843.41..15865.39 rows=8795 width=8) (actual time=152.472..152.472 rows=1 loops=1)
         Sort Key: mwdb.t_mv.ts
         ->  Result  (cost=0.00..15267.23 rows=8795 width=8) (actual time=0.102..122.540 rows=11629 loops=1)
               ->  Append  (cost=0.00..15267.23 rows=8795 width=8) (actual time=0.098..76.140 rows=11629 loops=1)
                     ->  Index Scan using pk_mv_zr_ts on t_mv  (cost=0.00..8.27 rows=1 width=8) (actual
time=0.022..0.022rows=0 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200601 on t_mv_200601 t_mv  (cost=0.00..986.73 rows=582 width=8)
(actualtime=0.070..2.136 rows=743 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200602 on t_mv_200602 t_mv  (cost=0.00..847.40 rows=497 width=8)
(actualtime=0.066..2.063 rows=672 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200603 on t_mv_200603 t_mv  (cost=0.00..961.33 rows=565 width=8)
(actualtime=0.063..2.115 rows=743 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200604 on t_mv_200604 t_mv  (cost=0.00..901.09 rows=528 width=8)
(actualtime=0.156..2.200 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200605 on t_mv_200605 t_mv  (cost=0.00..945.38 rows=555 width=8)
(actualtime=0.052..2.088 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200606 on t_mv_200606 t_mv  (cost=0.00..995.58 rows=587 width=8)
(actualtime=0.054..1.869 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200607 on t_mv_200607 t_mv  (cost=0.00..983.15 rows=578 width=8)
(actualtime=0.045..1.989 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200608 on t_mv_200608 t_mv  (cost=0.00..976.05 rows=573 width=8)
(actualtime=0.048..1.877 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200609 on t_mv_200609 t_mv  (cost=0.00..902.86 rows=529 width=8)
(actualtime=0.054..2.225 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200610 on t_mv_200610 t_mv  (cost=0.00..934.74 rows=548 width=8)
(actualtime=0.034..2.671 rows=745 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200611 on t_mv_200611 t_mv  (cost=0.00..913.50 rows=536 width=8)
(actualtime=0.053..2.302 rows=720 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200612 on t_mv_200612 t_mv  (cost=0.00..983.15 rows=578 width=8)
(actualtime=0.059..2.449 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200701 on t_mv_200701 t_mv  (cost=0.00..929.43 rows=545 width=8)
(actualtime=0.034..2.035 rows=744 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200702 on t_mv_200702 t_mv  (cost=0.00..863.33 rows=506 width=8)
(actualtime=0.034..1.675 rows=672 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200703 on t_mv_200703 t_mv  (cost=0.00..925.87 rows=542 width=8)
(actualtime=0.055..2.036 rows=743 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
                     ->  Index Scan using pk_mv_200704 on t_mv_200704 t_mv  (cost=0.00..1209.39 rows=545 width=8)
(actualtime=0.061..2.296 rows=711 loops=1) 
                           Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
 Total runtime: 153.195 ms
(40 rows)


Table t_mv_200601 gets scanned first, which is fine.

This already gives a row matching the given WHERE clause.
It makes no sense to scan the other tables, as the query
asks for one row only and all the other tables have timestamps
larger than all the timestamps in table t_mv_200601 (according
to the CHECK constraints for the partion tables)

The same would be true with the following query using an aggregate function
(perhaps this is a better example for my reasoning):

testdb_part=> select min(ts) from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00';
          min
- ------------------------
 2006-01-01 01:00:00+01
(1 row)

testdb_part=> explain analyze select min(ts) from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00';
                                                                    QUERY PLAN
-
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15289.22..15289.23 rows=1 width=8) (actual time=106.735..106.737 rows=1 loops=1)
   ->  Append  (cost=0.00..15267.23 rows=8795 width=8) (actual time=0.184..78.174 rows=11629 loops=1)
         ->  Index Scan using pk_mv_zr_ts on t_mv  (cost=0.00..8.27 rows=1 width=8) (actual time=0.035..0.035 rows=0
loops=1)
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200601 on t_mv_200601 t_mv  (cost=0.00..986.73 rows=582 width=8) (actual
time=0.143..2.207rows=743 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200602 on t_mv_200602 t_mv  (cost=0.00..847.40 rows=497 width=8) (actual
time=0.020..1.709rows=672 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200603 on t_mv_200603 t_mv  (cost=0.00..961.33 rows=565 width=8) (actual
time=0.033..2.076rows=743 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200604 on t_mv_200604 t_mv  (cost=0.00..901.09 rows=528 width=8) (actual
time=0.027..2.039rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200605 on t_mv_200605 t_mv  (cost=0.00..945.38 rows=555 width=8) (actual
time=0.031..2.109rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200606 on t_mv_200606 t_mv  (cost=0.00..995.58 rows=587 width=8) (actual
time=0.023..2.001rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200607 on t_mv_200607 t_mv  (cost=0.00..983.15 rows=578 width=8) (actual
time=0.027..2.064rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200608 on t_mv_200608 t_mv  (cost=0.00..976.05 rows=573 width=8) (actual
time=0.030..1.932rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200609 on t_mv_200609 t_mv  (cost=0.00..902.86 rows=529 width=8) (actual
time=0.021..2.408rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200610 on t_mv_200610 t_mv  (cost=0.00..934.74 rows=548 width=8) (actual
time=0.014..2.046rows=745 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200611 on t_mv_200611 t_mv  (cost=0.00..913.50 rows=536 width=8) (actual
time=0.024..1.846rows=720 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200612 on t_mv_200612 t_mv  (cost=0.00..983.15 rows=578 width=8) (actual
time=0.019..2.556rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200701 on t_mv_200701 t_mv  (cost=0.00..929.43 rows=545 width=8) (actual
time=0.022..2.188rows=744 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200702 on t_mv_200702 t_mv  (cost=0.00..863.33 rows=506 width=8) (actual
time=0.023..2.311rows=672 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200703 on t_mv_200703 t_mv  (cost=0.00..925.87 rows=542 width=8) (actual
time=0.027..1.977rows=743 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
         ->  Index Scan using pk_mv_200704 on t_mv_200704 t_mv  (cost=0.00..1209.39 rows=545 width=8) (actual
time=0.022..2.084rows=711 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
 Total runtime: 107.152 ms
(37 rows)


As soon as the query found a "min(ts)" which is larger than
timestamp "2006-01-01 00:00:00" (the WHERE clause) in table
t_mv_200601, it can stop scanning, as there *can not* be any
timestamp smaller than the one already found in the other
tables (again, according to the CHECK constraints)!

Perhaps the logic to implement this is complex, but IMHO
it _should_ be doable (and proofable), shouldn't it?

In fact, the query planner already does partly select the tables
to scan in an intelligent way, because it does not scan the tables
with timestamps smaller than "2006-01-01 00:00:00", but IMHO it
still scans too much tables.

Comments?

- - andreas

- --
Andreas Haumer                     | mailto:
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGNe83xJmyeGcXPhERAk2fAJ98aqfKl7pQtac4HvSRr9GYbktadgCfU76J
ZmMj1A3UFejvS+2JrstrTaA=
=Myeo
-----END PGP SIGNATURE-----

From:
Richard Huxton
Date:

Andreas Haumer wrote:
>
> I think the planner could do the following:
>
> a) It could make a better decision in which direction to scan
>    the partitions (depending on sort order involved in the query)
>
> b) It could stop scanning as soon as there can not be any further
>    resulting row according to the CHECK constraints given on the tables.
[snip]
> Perhaps the logic to implement this is complex, but IMHO
> it _should_ be doable (and proofable), shouldn't it?

Ah, it might be do-able for some subset of cases, but is it
cost-effective to check for in *all* cases? Don't forget the constraints
and where clauses can be arbitrarily complex.
--
   Richard Huxton
   Archonet Ltd

From:
Guillaume Cottenceau
Date:

Andreas Haumer <andreas 'at' xss.co.at> writes:

> > Well, how can the planner know inside which partition the wanted
> > row is? There might be no data, say, inside a couple of
> > partitions in the past before finding the wanted row, in which
> > case 3 partitions in the past must be scanned.
> >
>
> I think the planner could do the following:
>
> a) It could make a better decision in which direction to scan
>    the partitions (depending on sort order involved in the query)
>
> b) It could stop scanning as soon as there can not be any further
>    resulting row according to the CHECK constraints given on the tables.

About these precise points, I'll let a pg guru give an answer.

> Look at this example:
>
> testdb_part=> select ts from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00' order by ts asc limit 1;
>            ts
> ------------------------
>  2006-01-01 01:00:00+01
> (1 row)
>
> testdb_part=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00' order by ts asc
limit1; 
>                                                                           QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=15843.41..15843.41 rows=1 width=8) (actual time=152.476..152.478 rows=1 loops=1)
>    ->  Sort  (cost=15843.41..15865.39 rows=8795 width=8) (actual time=152.472..152.472 rows=1 loops=1)
>          Sort Key: mwdb.t_mv.ts
>          ->  Result  (cost=0.00..15267.23 rows=8795 width=8) (actual time=0.102..122.540 rows=11629 loops=1)
>                ->  Append  (cost=0.00..15267.23 rows=8795 width=8) (actual time=0.098..76.140 rows=11629 loops=1)
>                      ->  Index Scan using pk_mv_zr_ts on t_mv  (cost=0.00..8.27 rows=1 width=8) (actual
time=0.022..0.022rows=0 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 

[...]

>                      ->  Index Scan using pk_mv_200704 on t_mv_200704 t_mv  (cost=0.00..1209.39 rows=545 width=8)
(actualtime=0.061..2.296 rows=711 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2006-01-01
00:00:00+01'::timestampwith time zone)) 
>  Total runtime: 153.195 ms
> (40 rows)
>
>
> Table t_mv_200601 gets scanned first, which is fine.
>
> This already gives a row matching the given WHERE clause.
> It makes no sense to scan the other tables, as the query
> asks for one row only and all the other tables have timestamps
> larger than all the timestamps in table t_mv_200601 (according
> to the CHECK constraints for the partion tables)

I think this is the last claimed point which is incorrect. Pg has
no general guarantee the partitions actually create a disjoint
set, even with the CHECK constraints. Pg can only optimize by
avoiding scanning the partitions inside which no satisfactory
data could be found by the CHECK constraint, but I think it's not
possible (too complicated) to infer that any found row in your
other partitions would not be in the final resultset because of
1. the query's resultset order 2. the limit 3. the actual
conditions in the CHECK constraints (there is no direct way to
see that timestamps in your 200704 partition are greater than
timsteamp in your 200601 partition).

I guess some sort of pg guru would be needed here to clarify
things in a smart way, unlike me :)

--
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36

From:
"Neil Peter Braggio"
Date:

Just cast the value in the WHERE clause:

select ts from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00' ::TIMESTAMP order by ts asc limit 1;

This search only into the right partitioned tables if you build the rules based in the ts field.


----
Neil Peter Braggio



On 4/30/07, Richard Huxton <> wrote:
Andreas Haumer wrote:
>
> I think the planner could do the following:
>
> a) It could make a better decision in which direction to scan
>    the partitions (depending on sort order involved in the query)
>
> b) It could stop scanning as soon as there can not be any further
>    resulting row according to the CHECK constraints given on the tables.
[snip]
> Perhaps the logic to implement this is complex, but IMHO
> it _should_ be doable (and proofable), shouldn't it?

Ah, it might be do-able for some subset of cases, but is it
cost-effective to check for in *all* cases? Don't forget the constraints
and where clauses can be arbitrarily complex.
--
   Richard Huxton
   Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to so that your
       message can get through to the mailing list cleanly

From:
Tom Lane
Date:

Andreas Haumer <> writes:
> A simple example: Get the timestamp of a measurement value for time
> series 3622 which is right before the measurement value with time
> stamp '2007-04-22 00:00:00':

> testdb_std=> select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc limit 1;

As already pointed out, this is only going to be able to exclude
partitions that are strictly after the limit-time, since you have no
WHERE clause that excludes anything before.  Can you set a reasonable
upper bound on the maximum inter-measurement time?  If so, you could
query something like this:

  select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00'
    and ts > '2007-04-21 00:00:00'
    order by ts desc limit 1;

If you don't have a hard limit, but do have some smarts on the client
side, you could try successive queries like this with larger and larger
windows until you get an answer.

            regards, tom lane

From:
Gregory Stark
Date:

"Guillaume Cottenceau" <> writes:

> I think this is the last claimed point which is incorrect. Pg has
> no general guarantee the partitions actually create a disjoint
> set, even with the CHECK constraints. Pg can only optimize by
> avoiding scanning the partitions inside which no satisfactory
> data could be found by the CHECK constraint, but I think it's not
> possible (too complicated) to infer that any found row in your
> other partitions would not be in the final resultset because of
> 1. the query's resultset order 2. the limit 3. the actual
> conditions in the CHECK constraints (there is no direct way to
> see that timestamps in your 200704 partition are greater than
> timsteamp in your 200601 partition).

I think the answer is that yes there are a number of query transformations
that could be done for partitioned tables that we're not doing currently.

Generally speaking we need to look at each type of plan node and figure out
whether it can usefully be pushed down below the Append node and how we can
determine when that can be done.

So if each arm of the Append was already in order we could easily push the
LIMIT inside the Append (and duplicating the work above the Append). But that
wouldn't save us any work because we only generate rows from the partitions as
they're needed anyways.

In your example we could save work because the Sort needs all the rows before
it starts. So we could sort each arm and apply the limit before passing it up
to the outer Sort. That might save work in this case.

But figuring out when that's less work than just sorting all of them is
tricky. If the LIMIT is 1,000 and you have ten arms of 1,000 tuples each then
you aren't going to save any work doing this. But if the LIMIT is 1 and you
have few arms with many tuples then you could save a lot of work.

Actually I happen to have been reading up on algorithms related to this this
weekend. It's possible to implement a LimitUnsorted in linear-time which would
fetch the first n records according to some sort key without actually sorting
the records. That might make it more worthwhile.

In short. Yes, there are a lot of optimizations possible around partitioned
tables that we don't do either because it's not clear how to tell when they're
worthwhile or because the code just isn't there yet.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


From:
Andreas Haumer
Date:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Neil Peter Braggio schrieb:
> Just cast the value in the WHERE clause:
>
> select ts from mwdb.t_mv where zr=3622 and ts > '2006-01-01 00:00:00'
> ::TIMESTAMP order by ts asc limit 1;
>
> This search only into the right partitioned tables if you build the
> rules based in the ts field.
>

This doesn't help.

A cast is not needed in this case, as the following query
shows, where the query planner already is able to reduce
the scan to the right tables:

testdb_part=> select ts from mwdb.t_mv where zr=3622 and ts > '2005-12-31 22:00:00' and ts < '2006-01-01 02:00:00';
           ts
- ------------------------
 2005-12-31 23:00:00+01
 2006-01-01 00:00:00+01
 2006-01-01 01:00:00+01
(3 rows)


testdb_part=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts > '2005-12-31 22:00:00' and ts <
'2006-01-0102:00:00'; 
                                                                                                               QUERY
PLAN
-
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..26.64 rows=4 width=8) (actual time=0.040..0.088 rows=3 loops=1)
   ->  Append  (cost=0.00..26.64 rows=4 width=8) (actual time=0.035..0.071 rows=3 loops=1)
         ->  Index Scan using i_mv_ts on t_mv  (cost=0.00..8.27 rows=1 width=8) (actual time=0.010..0.010 rows=0
loops=1)
               Index Cond: (((ts)::timestamp with time zone > '2005-12-31 22:00:00+01'::timestamp with time zone) AND
((ts)::timestampwith time zone < '2006-01-01 02:00:00+01'::timestamp with time zone)) 
               Filter: ((zr)::integer = 3622)
         ->  Index Scan using pk_mv_200512 on t_mv_200512 t_mv  (cost=0.00..8.30 rows=1 width=8) (actual
time=0.019..0.022rows=1 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2005-12-31
22:00:00+01'::timestampwith time zone) AND ((ts)::timestamp with time zone < '2006-01-01 02:00:00+01'::timestamp with
timezone)) 
         ->  Index Scan using pk_mv_200601 on t_mv_200601 t_mv  (cost=0.00..10.07 rows=2 width=8) (actual
time=0.014..0.019rows=2 loops=1) 
               Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone > '2005-12-31
22:00:00+01'::timestampwith time zone) AND ((ts)::timestamp with time zone < '2006-01-01 02:00:00+01'::timestamp with
timezone)) 
 Total runtime: 0.176 ms
(10 rows)


Here, two child tables are involved (t_mv_200512 and t_mv_200601)
and the query only uses those two, even without cast of the constants
in the where clause.

- - andreas

- --
Andreas Haumer                     | mailto:
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGNiATxJmyeGcXPhERAo23AJwPCBwvWQT/m3QRXRWqK0aECeMQ2gCbBDjA
E5iZNnU41vrFBNtXzdCSmWY=
=0+pC
-----END PGP SIGNATURE-----

From:
Andreas Haumer
Date:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Tom Lane schrieb:
[...]
> As already pointed out, this is only going to be able to exclude
> partitions that are strictly after the limit-time, since you have no
> WHERE clause that excludes anything before.  Can you set a reasonable
> upper bound on the maximum inter-measurement time?  If so, you could
> query something like this:
>
>   select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00'
>     and ts > '2007-04-21 00:00:00'
>     order by ts desc limit 1;
>

That might be possible, though I'll have to check with our
business logic. Those "open interval" queries usually are
needed to catch the immediate neighbors for navigation
purposes (e.g. the "next" and "previous" values in a list)
or for drawing diagrams where the line starts somewhere
left or right "outside" the diagram.

> If you don't have a hard limit, but do have some smarts on the client
> side, you could try successive queries like this with larger and larger
> windows until you get an answer.
>

Well, the beauty of the "inheritance method" of course is
to keep such rules out of the application... ;-)

I have a DAO layer on top of Hibernate and I'd rather not
touch this to put special database access logic in (especially
as I plan to use partitioned tables as an option for really
large installations. For small ones it looks like we don't
need or want partitioned tables anyway)

Perhaps I can hide this logic in some stored procedures
(I already have several stored procedures to handle
automatic and transparent creation of child tables on
INSERTs anyway...)

- - andreas

- --
Andreas Haumer                     | mailto:
*x Software + Systeme              | http://www.xss.co.at/
Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD4DBQFGNiVvxJmyeGcXPhERAkbzAJj7HBK6tMZpb0RPD7iN6vpyc1tiAKC2heFx
7pnq02iqW2QosLd93Y03PA==
=pJ7q
-----END PGP SIGNATURE-----

From:
"Steinar H. Gunderson"
Date:

On Mon, Apr 30, 2007 at 03:29:30PM +0200, Andreas Haumer wrote:
> This already gives a row matching the given WHERE clause.
> It makes no sense to scan the other tables, as the query
> asks for one row only and all the other tables have timestamps
> larger than all the timestamps in table t_mv_200601 (according
> to the CHECK constraints for the partion tables)

So for each row, it has to check all CHECK constraints to see if it has
enough rows? That sounds fairly inefficient.

I wonder if the planner could copy the limit down through the Append, though
-- it certainly doesn't need more than one row from each partition. It sounds
slightly cumbersome to try to plan such a thing, though...

/* Steinar */
--
Homepage: http://www.sesse.net/

From:
Gregory Stark
Date:

Wait, rereading the original queries I seem to have misunderstood something.
The individual parts of the partitioned tables are being accessed in timestamp
order. So what's missing is some way for the optimizer to know that the
resulting append results will still be in order. If it knew that all the
constraints were mutually exclusive and covered ascending ranges then it could
avoid doing the extra sort. Hm...

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


From:
Fei Liu
Date:

Andreas Haumer wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi!
>
> I'm currently experimenting with PostgreSQL 8.2.4 and table
> partitioning in order to improve the performance of an
> application I'm working on.
>
> My application is about managing measurement values (lots of!)
> I have one table "t_mv" which stores all the measurement values.
> A single measurement value has a timestamp and belongs
> to a single time series, so table "t_mv" looks like this:
>
> CREATE TABLE t_mv
> (
>   zr integer NOT NULL,                  -- the time series id
>   ts timestamp with time zone NOT NULL, -- the timestamp
>   ...                                   -- other attributes of a mv
> )
> WITHOUT OIDS;
>
> ALTER TABLE t_mv
>   ADD CONSTRAINT pk_mv_zr_ts PRIMARY KEY (zr, ts);
>
> Each time series defines several other attributes which are common
> to all measurement values of this time series (like sampling location,
> physical parameter, aggregation, cardinality, type, visibility, etc.)
>
> The application should be able to handle several thousand
> different time series and hundreds of millions of measurement
> values, so table t_mv can get quite large.
>
> I have tested installations with up to 70 millions rows in t_mv
> and PostgreSQL can handle that with a quite good performance
> even on non high-end machines (operating system is Linux, btw)
>
> But as I expect installations witch much more rows in t_mv, I
> tried to implement a "partitioned tables" concept using inheritance
> and CHECK constraints, just like it is described in the docs
> (e.g. chapter 5.9 in the current PostgreSQL 8.2.4 documentation)
>
> I split the t_mv table on the timestamp attribute to build
> child tables which hold all measurement values for a single month.
> That way I have several tables called "t_mv_YYYYMM" which all
> inherit from "t_mv". The number of child tables depends on the
> time period the application has to store the measurement values
> (which can be several years so I'm expecting up to 100 child
> tables or even more).
> For the application everything looks the same: inserts, updates
> and queries all are against the "t_mv" parent table, the application
> is not aware of the fact that this table is actually "split" into
> several child tables.
>
> This is working fine and for some standard queries it actually
> gives some performance improvement compared to the standard
> "everything in one big table" concept. The performance improvement
> increases with the number of rows in t_mv, for a small table (less
> than 10 million rows or so) IMHO it is not really worth the effort
> or even counter-productive.
>
> But I have some special queries where the performance with
> partitioned tables actually get much worse: those are queries where
> I'm working with "open" time intervals, i.e. where I want to
> get the previous and/or next timestamp from a given interval.
>
> A simple example: Get the timestamp of a measurement value for time
> series 3622 which is right before the measurement value with time
> stamp '2007-04-22 00:00:00':
>
> testdb_std=> select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc limit 1;
>            ts
> - ------------------------
>  2007-04-21 23:00:00+02
> (1 row)
>
>
> Im my application there are many queries like this. Such
> queries also come in several variations, including quite
> sophisticated joins with lots of other tables "above" the
> time series table.
>
> Note: as I'm working with (potentially) non-equidistant
> time series I can not just calculate the timestamps, I
> have to retrieve them from the database!
>
> In the standard case, the query plan for the example query looks like this:
>
> testdb_std=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc
limit1; 
>                                                                QUERY PLAN
> -
-----------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1.70 rows=1 width=8) (actual time=0.233..0.235 rows=1 loops=1)
>    ->  Index Scan Backward using pk_mv_zr_ts on t_mv  (cost=0.00..21068.91 rows=12399 width=8) (actual
time=0.221..0.221rows=1 loops=1) 
>          Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>  Total runtime: 0.266 ms
> (4 rows)
>
>
> If I switch to partitioned tables, the query retrieves the same result (of course):
>
> testdb_std=> \c testdb_part
> You are now connected to database "testdb_part".
> testdb_part=> select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc limit 1;
>            ts
> - ------------------------
>  2007-04-21 23:00:00+02
> (1 row)
>
>
> But the query plan becomes:
>
> testdb_part=> explain analyze select ts from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' order by ts desc
limit1; 
>                                                                           QUERY PLAN
> -
---------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=23985.83..23985.83 rows=1 width=8) (actual time=230.100..230.102 rows=1 loops=1)
>    ->  Sort  (cost=23985.83..24019.84 rows=13605 width=8) (actual time=230.095..230.095 rows=1 loops=1)
>          Sort Key: mwdb.t_mv.ts
>          ->  Result  (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.154..177.519 rows=15810 loops=1)
>                ->  Append  (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.149..114.186 rows=15810 loops=1)
>                      ->  Index Scan using pk_mv_zr_ts on t_mv  (cost=0.00..8.27 rows=1 width=8) (actual
time=0.047..0.047rows=0 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200507 on t_mv_200507 t_mv  (cost=0.00..2417.53 rows=1519 width=8)
(actualtime=0.095..2.419 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200508 on t_mv_200508 t_mv  (cost=0.00..918.81 rows=539 width=8)
(actualtime=0.081..2.134 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200509 on t_mv_200509 t_mv  (cost=0.00..941.88 rows=555 width=8)
(actualtime=0.061..2.051 rows=720 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200510 on t_mv_200510 t_mv  (cost=0.00..915.29 rows=538 width=8)
(actualtime=0.064..2.113 rows=715 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200511 on t_mv_200511 t_mv  (cost=0.00..925.93 rows=545 width=8)
(actualtime=0.048..2.986 rows=720 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200512 on t_mv_200512 t_mv  (cost=0.00..936.53 rows=550 width=8)
(actualtime=0.049..2.212 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200601 on t_mv_200601 t_mv  (cost=0.00..981.42 rows=579 width=8)
(actualtime=0.065..3.029 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200602 on t_mv_200602 t_mv  (cost=0.00..856.25 rows=502 width=8)
(actualtime=0.045..2.866 rows=672 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200603 on t_mv_200603 t_mv  (cost=0.00..977.84 rows=575 width=8)
(actualtime=0.052..3.044 rows=743 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200604 on t_mv_200604 t_mv  (cost=0.00..906.40 rows=531 width=8)
(actualtime=0.053..1.976 rows=720 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200605 on t_mv_200605 t_mv  (cost=0.00..938.28 rows=550 width=8)
(actualtime=0.050..2.357 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200606 on t_mv_200606 t_mv  (cost=0.00..922.35 rows=541 width=8)
(actualtime=0.054..2.063 rows=720 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200607 on t_mv_200607 t_mv  (cost=0.00..2112.64 rows=1315 width=8)
(actualtime=0.047..2.226 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200608 on t_mv_200608 t_mv  (cost=0.00..990.23 rows=582 width=8)
(actualtime=0.048..2.094 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200609 on t_mv_200609 t_mv  (cost=0.00..902.84 rows=528 width=8)
(actualtime=0.039..2.252 rows=720 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200610 on t_mv_200610 t_mv  (cost=0.00..964.87 rows=567 width=8)
(actualtime=0.033..2.118 rows=745 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200611 on t_mv_200611 t_mv  (cost=0.00..947.17 rows=557 width=8)
(actualtime=0.060..2.160 rows=720 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200612 on t_mv_200612 t_mv  (cost=0.00..929.43 rows=545 width=8)
(actualtime=0.039..2.051 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200701 on t_mv_200701 t_mv  (cost=0.00..940.05 rows=551 width=8)
(actualtime=0.036..2.217 rows=744 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200702 on t_mv_200702 t_mv  (cost=0.00..847.38 rows=496 width=8)
(actualtime=0.035..1.830 rows=672 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200703 on t_mv_200703 t_mv  (cost=0.00..956.00 rows=561 width=8)
(actualtime=0.062..2.326 rows=743 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                      ->  Index Scan using pk_mv_200704 on t_mv_200704 t_mv  (cost=0.00..814.38 rows=378 width=8)
(actualtime=0.050..1.406 rows=504 loops=1) 
>                            Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>  Total runtime: 231.730 ms
> (52 rows)
>
> Oops!
> Compare the costs or the actual query time between those queries!
> (Note: I set "constraint_exclusion = on", of course!)
>
> As such queries are used all over the application, this nullifies
> any performance improvements for standard queries and in fact makes
> the overall application performance as "feeled" by the user _much_
> worse.
>
> I also tried it with "min()" and "max()" aggregate functions
> instead of the "limit 1" query, but this does not change much:
>
>
> Standard "big" table:
>
> testdb_std=> select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ;
>           max
> - ------------------------
>  2007-04-21 23:00:00+02
> (1 row)
>
>
> testdb_std=> explain analyze select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ;
>                                                                    QUERY PLAN
> -
-------------------------------------------------------------------------------------------------------------------------------------------------
>  Result  (cost=1.70..1.71 rows=1 width=0) (actual time=0.071..0.073 rows=1 loops=1)
>    InitPlan
>      ->  Limit  (cost=0.00..1.70 rows=1 width=8) (actual time=0.060..0.062 rows=1 loops=1)
>            ->  Index Scan Backward using pk_mv_zr_ts on t_mv  (cost=0.00..21068.91 rows=12399 width=8) (actual
time=0.056..0.056rows=1 loops=1) 
>                  Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>                  Filter: ((ts)::timestamp with time zone IS NOT NULL)
>  Total runtime: 0.221 ms
> (7 rows)
>
>
> "Partitioned table":
>
> testdb_part=> select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ;
>           max
> - ------------------------
>  2007-04-21 23:00:00+02
> (1 row)
>
> testdb_part=> explain analyze select max(ts) from mwdb.t_mv where zr=3622 and ts < '2007-04-22 00:00:00' ;
>                                                                      QUERY PLAN
> -
----------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=23085.73..23085.74 rows=1 width=8) (actual time=390.094..390.096 rows=1 loops=1)
>    ->  Append  (cost=0.00..23051.72 rows=13605 width=8) (actual time=0.241..290.934 rows=15810 loops=1)
>          ->  Index Scan using pk_mv_zr_ts on t_mv  (cost=0.00..8.27 rows=1 width=8) (actual time=0.038..0.038 rows=0
loops=1)
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200507 on t_mv_200507 t_mv  (cost=0.00..2417.53 rows=1519 width=8) (actual
time=0.197..12.598rows=744 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200508 on t_mv_200508 t_mv  (cost=0.00..918.81 rows=539 width=8) (actual
time=0.095..5.947rows=744 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200509 on t_mv_200509 t_mv  (cost=0.00..941.88 rows=555 width=8) (actual
time=0.118..2.247rows=720 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200510 on t_mv_200510 t_mv  (cost=0.00..915.29 rows=538 width=8) (actual
time=0.121..6.219rows=715 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200511 on t_mv_200511 t_mv  (cost=0.00..925.93 rows=545 width=8) (actual
time=2.287..9.991rows=720 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200512 on t_mv_200512 t_mv  (cost=0.00..936.53 rows=550 width=8) (actual
time=0.110..2.285rows=744 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200601 on t_mv_200601 t_mv  (cost=0.00..981.42 rows=579 width=8) (actual
time=0.209..4.682rows=744 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200602 on t_mv_200602 t_mv  (cost=0.00..856.25 rows=502 width=8) (actual
time=0.079..6.079rows=672 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200603 on t_mv_200603 t_mv  (cost=0.00..977.84 rows=575 width=8) (actual
time=0.091..4.793rows=743 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200604 on t_mv_200604 t_mv  (cost=0.00..906.40 rows=531 width=8) (actual
time=0.108..7.637rows=720 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200605 on t_mv_200605 t_mv  (cost=0.00..938.28 rows=550 width=8) (actual
time=0.116..4.772rows=744 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200606 on t_mv_200606 t_mv  (cost=0.00..922.35 rows=541 width=8) (actual
time=0.074..6.071rows=720 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200607 on t_mv_200607 t_mv  (cost=0.00..2112.64 rows=1315 width=8) (actual
time=0.082..4.807rows=744 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200608 on t_mv_200608 t_mv  (cost=0.00..990.23 rows=582 width=8) (actual
time=2.283..8.671rows=744 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200609 on t_mv_200609 t_mv  (cost=0.00..902.84 rows=528 width=8) (actual
time=0.107..6.067rows=720 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200610 on t_mv_200610 t_mv  (cost=0.00..964.87 rows=567 width=8) (actual
time=0.074..3.933rows=745 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200611 on t_mv_200611 t_mv  (cost=0.00..947.17 rows=557 width=8) (actual
time=0.091..6.291rows=720 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200612 on t_mv_200612 t_mv  (cost=0.00..929.43 rows=545 width=8) (actual
time=0.077..4.101rows=744 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200701 on t_mv_200701 t_mv  (cost=0.00..940.05 rows=551 width=8) (actual
time=0.077..2.558rows=744 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200702 on t_mv_200702 t_mv  (cost=0.00..847.38 rows=496 width=8) (actual
time=0.073..4.346rows=672 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200703 on t_mv_200703 t_mv  (cost=0.00..956.00 rows=561 width=8) (actual
time=2.532..7.206rows=743 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>          ->  Index Scan using pk_mv_200704 on t_mv_200704 t_mv  (cost=0.00..814.38 rows=378 width=8) (actual
time=0.120..4.163rows=504 loops=1) 
>                Index Cond: (((zr)::integer = 3622) AND ((ts)::timestamp with time zone < '2007-04-22
00:00:00+02'::timestampwith time zone)) 
>  Total runtime: 394.384 ms
> (49 rows)
>
>
> Now my question is: Does the query planner in the case of partitioned tables
> really have to scan all indexes in order to get the next timestamp smaller
> (or larger) than a given one?
>
> There are check conditions on all table partitions like this:
>
> For table t_mv_200704:
> CHECK (ts::timestamp with time zone >= '2007-04-01 00:00:00+02'::timestamp with time zone
>    AND ts::timestamp with time zone < '2007-05-01 00:00:00+02'::timestamp with time zone)
>
> For table t_mv_200703:
> CHECK (ts::timestamp with time zone >= '2007-03-01 00:00:00+01'::timestamp with time zone
>    AND ts::timestamp with time zone < '2007-04-01 00:00:00+02'::timestamp with time zone)
>
> and so on...
>
> So the tables are in a well defined, monotonic sort order regarding the timestamp.
>
> This means that if there is a max(ts) for ts < '2007-04-22 00:00:00'
> already in table t_mv_200704, it makes no sense to look further in
> other tables where the timestamps can only be smaller than the
> timestamp already found. Am I correct?
>
> Is there room for improvements of the query planner for queries
> like this or is this a special case which will never get handled
> anyway?
>
> Or would you suggest a completely different table structure
> or perhaps some other query?
>
> I'm open for any suggestion!
>
> - - andreas
>
> - --
> Andreas Haumer                     | mailto:
> *x Software + Systeme              | http://www.xss.co.at/
> Karmarschgasse 51/2/20             | Tel: +43-1-6060114-0
> A-1100 Vienna, Austria             | Fax: +43-1-6060114-71
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGNdZ2xJmyeGcXPhERAsbfAJ9nA+z50uXiV4SHntt1Y9IuZ/rzWwCff8ar
> xKSMfzwgjx9kQipeDoEnXWE=
> =57aJ
> -----END PGP SIGNATURE-----
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
Hello, Andreas, I too am having exactly the same issue as you do.
Comparing my partitioned and plain table performance, I've found that
the plain tables perform about 25% faster than partitioned table. Using
'explain select ...', I see that constraints are being used so in
partitioned tables fewer rows are examined. But still partitioned tables
are 25% slower, what a let down.

Fei

From:
"Merlin Moncure"
Date:

On 5/3/07, Fei Liu <> wrote:
> Hello, Andreas, I too am having exactly the same issue as you do.
> Comparing my partitioned and plain table performance, I've found that
> the plain tables perform about 25% faster than partitioned table. Using
> 'explain select ...', I see that constraints are being used so in
> partitioned tables fewer rows are examined. But still partitioned tables
> are 25% slower, what a let down.

That's a little bit harsh.  The main use of partitioning is not to
make the table faster but to make the maintenance easier.  When
constraint exclusion works well for a particular query you can get a
small boost but many queries will break down in a really negative way.
 So, you are sacrificing flexibility for easier maintenance.  You have
to really be careful how you use it.

The best case for partitioning is when you can logically divide up
your data so that you really only have to deal with one sliver of it
at a time...for joins and such.  If the OP could force the constraint
exclusion (maybe by hashing the timestamp down to a period and using
that for where clause), his query would be fine.  The problem is it's
not always easy to do that.

merlin

From:
Scott Marlowe
Date:

On Thu, 2007-05-03 at 21:37, Merlin Moncure wrote:
> On 5/3/07, Fei Liu <> wrote:
> > Hello, Andreas, I too am having exactly the same issue as you do.
> > Comparing my partitioned and plain table performance, I've found that
> > the plain tables perform about 25% faster than partitioned table. Using
> > 'explain select ...', I see that constraints are being used so in
> > partitioned tables fewer rows are examined. But still partitioned tables
> > are 25% slower, what a let down.
>
> That's a little bit harsh.  The main use of partitioning is not to
> make the table faster but to make the maintenance easier.  When
> constraint exclusion works well for a particular query you can get a
> small boost but many queries will break down in a really negative way.
>  So, you are sacrificing flexibility for easier maintenance.  You have
> to really be careful how you use it.
>
> The best case for partitioning is when you can logically divide up
> your data so that you really only have to deal with one sliver of it
> at a time...for joins and such.  If the OP could force the constraint
> exclusion (maybe by hashing the timestamp down to a period and using
> that for where clause), his query would be fine.  The problem is it's
> not always easy to do that.

Agree++

I've been testing partitioning for a zip code lookup thing that was
posted here earlier, and I partitioned a 10,000,000 row set into about
400 partitions.  I found that selecting a range of areas defined by x/y
coordinates was faster without any indexes.  The same selection with one
big table and one big (x,y) index took 3 to 10 seconds typically, same
select against the partitions with no indexes took 0.2 to 0.5 seconds.

For that particular application, the only way to scale it was with
partitioning.

From:
"Merlin Moncure"
Date:

On 5/4/07, Scott Marlowe <> wrote:
> On Thu, 2007-05-03 at 21:37, Merlin Moncure wrote:
> > On 5/3/07, Fei Liu <> wrote:
> > > Hello, Andreas, I too am having exactly the same issue as you do.
> > > Comparing my partitioned and plain table performance, I've found that
> > > the plain tables perform about 25% faster than partitioned table. Using
> > > 'explain select ...', I see that constraints are being used so in
> > > partitioned tables fewer rows are examined. But still partitioned tables
> > > are 25% slower, what a let down.
> >
> > That's a little bit harsh.  The main use of partitioning is not to
> > make the table faster but to make the maintenance easier.  When
> > constraint exclusion works well for a particular query you can get a
> > small boost but many queries will break down in a really negative way.
> >  So, you are sacrificing flexibility for easier maintenance.  You have
> > to really be careful how you use it.
> >
> > The best case for partitioning is when you can logically divide up
> > your data so that you really only have to deal with one sliver of it
> > at a time...for joins and such.  If the OP could force the constraint
> > exclusion (maybe by hashing the timestamp down to a period and using
> > that for where clause), his query would be fine.  The problem is it's
> > not always easy to do that.
>
> Agree++
>
> I've been testing partitioning for a zip code lookup thing that was
> posted here earlier, and I partitioned a 10,000,000 row set into about
> 400 partitions.  I found that selecting a range of areas defined by x/y
> coordinates was faster without any indexes.  The same selection with one
> big table and one big (x,y) index took 3 to 10 seconds typically, same
> select against the partitions with no indexes took 0.2 to 0.5 seconds.

I was thinking about that problem....one approach I was playing with
was to normalize the 10mm table to zipcode (chopping off + 4) and then
doing bounding box ops on the zipcode (using earthdistance/gist) table
and also the detail table using tradictional tactics or gist.  I think
this would give reasonable performance without partitioning (10mm
records doesn't scare me anymore!).  If the records are frequently
updated you may want to TP anways though do to (pre-hot) vacuum
issues.

merlin

From:
Fei Liu
Date:

Scott Marlowe wrote:
> On Thu, 2007-05-03 at 21:37, Merlin Moncure wrote:
>
>> On 5/3/07, Fei Liu <> wrote:
>>
>>> Hello, Andreas, I too am having exactly the same issue as you do.
>>> Comparing my partitioned and plain table performance, I've found that
>>> the plain tables perform about 25% faster than partitioned table. Using
>>> 'explain select ...', I see that constraints are being used so in
>>> partitioned tables fewer rows are examined. But still partitioned tables
>>> are 25% slower, what a let down.
>>>
>> That's a little bit harsh.  The main use of partitioning is not to
>> make the table faster but to make the maintenance easier.  When
>> constraint exclusion works well for a particular query you can get a
>> small boost but many queries will break down in a really negative way.
>>  So, you are sacrificing flexibility for easier maintenance.  You have
>> to really be careful how you use it.
>>
>> The best case for partitioning is when you can logically divide up
>> your data so that you really only have to deal with one sliver of it
>> at a time...for joins and such.  If the OP could force the constraint
>> exclusion (maybe by hashing the timestamp down to a period and using
>> that for where clause), his query would be fine.  The problem is it's
>> not always easy to do that.
>>
>
> Agree++
>
> I've been testing partitioning for a zip code lookup thing that was
> posted here earlier, and I partitioned a 10,000,000 row set into about
> 400 partitions.  I found that selecting a range of areas defined by x/y
> coordinates was faster without any indexes.  The same selection with one
> big table and one big (x,y) index took 3 to 10 seconds typically, same
> select against the partitions with no indexes took 0.2 to 0.5 seconds.
>
> For that particular application, the only way to scale it was with
> partitioning.
>
In my particular case, I have 2 million records uniformly split up in 40
partitions. It's ranged data varying with time, each partition has one
month of data. Do you think this is a good candidate to seek performance
boost with partitioned tables?

From:
Scott Marlowe
Date:

On Tue, 2007-05-08 at 13:41, Fei Liu wrote:
> Scott Marlowe wrote:
> > On Thu, 2007-05-03 at 21:37, Merlin Moncure wrote:
> >
> >> On 5/3/07, Fei Liu <> wrote:
> >>
> >>> Hello, Andreas, I too am having exactly the same issue as you do.
> >>> Comparing my partitioned and plain table performance, I've found that
> >>> the plain tables perform about 25% faster than partitioned table. Using
> >>> 'explain select ...', I see that constraints are being used so in
> >>> partitioned tables fewer rows are examined. But still partitioned tables
> >>> are 25% slower, what a let down.
> >>>
> >> That's a little bit harsh.  The main use of partitioning is not to
> >> make the table faster but to make the maintenance easier.  When
> >> constraint exclusion works well for a particular query you can get a
> >> small boost but many queries will break down in a really negative way.
> >>  So, you are sacrificing flexibility for easier maintenance.  You have
> >> to really be careful how you use it.
> >>
> >> The best case for partitioning is when you can logically divide up
> >> your data so that you really only have to deal with one sliver of it
> >> at a time...for joins and such.  If the OP could force the constraint
> >> exclusion (maybe by hashing the timestamp down to a period and using
> >> that for where clause), his query would be fine.  The problem is it's
> >> not always easy to do that.
> >>
> >
> > Agree++
> >
> > I've been testing partitioning for a zip code lookup thing that was
> > posted here earlier, and I partitioned a 10,000,000 row set into about
> > 400 partitions.  I found that selecting a range of areas defined by x/y
> > coordinates was faster without any indexes.  The same selection with one
> > big table and one big (x,y) index took 3 to 10 seconds typically, same
> > select against the partitions with no indexes took 0.2 to 0.5 seconds.
> >
> > For that particular application, the only way to scale it was with
> > partitioning.
> >
> In my particular case, I have 2 million records uniformly split up in 40
> partitions. It's ranged data varying with time, each partition has one
> month of data. Do you think this is a good candidate to seek performance
> boost with partitioned tables?

That really really really depends on your access patterns.  IF you
typically access them by certain date ranges, then partitioning is
almost always a win.  If you have enough requests that don't select a
range of dates, it might wind up being slow.

There are other advantages to partitioning though, such as ease of
maintenance, being able to do partial backups easily, archiving old
partitions, placing the more active partitions on faster storage.