Thread: partitioning question 1

partitioning question 1

From

Ben

Date:

28 October 2010, 13:37:08

hello --

my last email was apparently too long to respond to so i'll split it up into shorter pieces.  my first question :

my understanding of how range partitioning and constraint exclusion works leads me to believe that it does not buy any
queryperformance that a clustered index doesn't already give you -- the advantages are all in maintainability.  an
indexis able to eliminate pages just as well as constraint exclusion is able to eliminate table partitions.  the I/O
advantagesof having queries target small subtables are the same as the I/O advantages of clustering the index : result
pagesin a small range are very close to each other on disk. 

finally, since constraint exclusion isn't as flexible as indexing (i've seen old mailing list posts that say that
constraintexclusion only works with static constants in where clauses, and only works with simple operators like >, <
whichbasically forces btree indexes when i want to use gist) it is indeed likely that partitioning can be slower than
onebig table with a clustered index. 

is my intuition completely off on this?

best regards, ben

Re: partitioning question 1

From

"Joshua D. Drake"

Date:

28 October 2010, 14:31:59

On Thu, 2010-10-28 at 09:36 -0700, Ben wrote:
> hello --
>
> my last email was apparently too long to respond to so i'll split it up into shorter pieces.  my first question :
>
> my understanding of how range partitioning and constraint exclusion works leads me to believe that it does not buy
anyquery performance that a clustered index doesn't already give you -- the advantages are all in maintainability.  an
indexis able to eliminate pages just as well as constraint exclusion is able to eliminate table partitions.  the I/O
advantagesof having queries target small subtables are the same as the I/O advantages of clustering the index : result
pagesin a small range are very close to each other on disk. 

Not entirely true. One a clustered index will not stay clustered if you
are still updating data that is in the partition. You shouldn't
underestimate the benefit of smaller relations in terms of maintenance
either.

>
> finally, since constraint exclusion isn't as flexible as indexing (i've seen old mailing list posts that say that
constraintexclusion only works with static constants in where clauses, and only works with simple operators like >, <
whichbasically forces btree indexes when i want to use gist) it is indeed likely that partitioning can be slower than
onebig table with a clustered index. 

Yes the constraints have to be static. Not sure about the operator
question honestly.


> is my intuition completely off on this?

You may actually want to look into expression indexes, not clustered
ones.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: partitioning question 1

From

Ben

Date:

28 October 2010, 15:44:56

thanks for the prompt response.  some comments / questions below :

On Oct 28, 2010, at 10:31 AM, Joshua D. Drake wrote:
>> ...constraint exclusion is able to eliminate table partitions.  the I/O advantages of having queries target small
subtablesare the same as the I/O advantages of clustering the index : result pages in a small range are very close to
eachother on disk. 
>
> Not entirely true. One a clustered index will not stay clustered if you
> are still updating data that is in the partition. You shouldn't
> underestimate the benefit of smaller relations in terms of maintenance
> either.

in my situation, the update come in-order (it is timeseries data and the clustered index is on time.)  so the table
shouldremain relatively clustered.  updates also happen relatively infrequently (once a day in one batch.)  so it
appearsthat we will continue to get the I/O benefits described above. 

are there any other benefits which partitioning provides for query performance (as opposed to update performance)
besidesthe ones which i have mentioned? 

> Yes the constraints have to be static. Not sure about the operator
> question honestly.

this seems to severely restrict their usefulness -- our queries are data warehouse analytical -type  queries, so the
constraintsare usually data-driven (come from joining against other tables.) 

>> is my intuition completely off on this?
>
> You may actually want to look into expression indexes, not clustered
> ones.

what would expression indexes give me?

thanks and best regards, ben

Re: partitioning question 1

From

"Joshua D. Drake"

Date:

28 October 2010, 15:50:33

On Thu, 2010-10-28 at 11:44 -0700, Ben wrote:

> > Yes the constraints have to be static. Not sure about the operator
> > question honestly.
>
> this seems to severely restrict their usefulness -- our queries are data warehouse analytical -type  queries, so the
constraintsare usually data-driven (come from joining against other tables.) 

Well it does and it doesn't. Keep in mind that the constraint can be:

date >= '2010-10-01" and date <= '2010-10-31'

What it can't be is something that contains date_part() or extract() (as
an example)

>
> >> is my intuition completely off on this?
> >
> > You may actually want to look into expression indexes, not clustered
> > ones.

Take a look at the docs:

http://www.postgresql.org/docs/8.4/interactive/indexes-expressional.html

It "could" be considered partitioning without breaking up the table,
just the indexes.

Sincerely,

Joshua D. Drake


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: partitioning question 1

From

Ben

Date:

28 October 2010, 16:25:30

On Oct 28, 2010, at 11:50 AM, Joshua D. Drake wrote:
>>> Yes the constraints have to be static. Not sure about the operator
>>> question honestly.
>>
>> this seems to severely restrict their usefulness -- our queries are data warehouse analytical -type  queries, so the
constraintsare usually data-driven (come from joining against other tables.) 
>
> Well it does and it doesn't. Keep in mind that the constraint can be:
>
> date >= '2010-10-01" and date <= '2010-10-31'
>
> What it can't be is something that contains date_part() or extract() (as
> an example)

i think we are talking about two different things here: the constraints on the table, and the where-clause constraints
ina query which may or may not trigger constraint exclusion.  i understand that table constraints have to be constants
--it doesn't make much sense otherwise.  what i am wondering about is, will constraint exclusion be triggered for
querieswhere the column that is being partitioned on is being constrained things that are not static constants, for
instance,in a join.  (i'm pretty sure the answer is no, because i think constraint exclusion happens before real query
planning.) a concrete example : 

create table foo (i integer not null, j float not null);
create table foo_1 (check ( i >= 0 and i < 10) ) inherits (foo);
create table foo_2 (check ( i >= 10 and i < 20) ) inherits (foo);
create table foo_3 (check ( i >= 20 and i < 30) ) inherits (foo);
etc..

create table bar (i integer not null, k float not null);

my understanding is that a query like

select * from foo, bar using (i);

can't use constraint exclusion, even if the histogram of i-values on table bar says they only live in the range 0-9,
andso the query will touch all of the tables.  i think this is not favorable compared to a single foo table with a
well-maintainedbtree index on i. 

>>>> is my intuition completely off on this?
>>>
>>> You may actually want to look into expression indexes, not clustered
>>> ones.
>
> Take a look at the docs:
>
> http://www.postgresql.org/docs/8.4/interactive/indexes-expressional.html
>
> It "could" be considered partitioning without breaking up the table,
> just the indexes.

do you mean partial indexes?  i have to confess to not understanding how this is relevant -- how could partial indexes
giveany advantage over a full clustered index? 

b

Re: partitioning question 1

From

"Joshua D. Drake"

Date:

28 October 2010, 16:44:32

On Thu, 2010-10-28 at 12:25 -0700, Ben wrote:

> i think we are talking about two different things here: the constraints on the table, and the where-clause
constraintsin a query which may or may not trigger constraint exclusion.  i understand that table constraints have to
beconstants -- it doesn't make much sense otherwise.  what i am wondering about is, will constraint exclusion be
triggeredfor queries where the column that is being partitioned on is being constrained things that are not static
constants,for instance, in a join.  (i'm pretty sure the answer is no, because i think constraint exclusion happens
beforereal query planning.)  a concrete example : 
>
> create table foo (i integer not null, j float not null);
> create table foo_1 (check ( i >= 0 and i < 10) ) inherits (foo);
> create table foo_2 (check ( i >= 10 and i < 20) ) inherits (foo);
> create table foo_3 (check ( i >= 20 and i < 30) ) inherits (foo);
> etc..
>
> create table bar (i integer not null, k float not null);
>
> my understanding is that a query like
>
> select * from foo, bar using (i);
>
> can't use constraint exclusion, even if the histogram of i-values on table bar says they only live in the range 0-9,
andso the query will touch all of the tables.  i think this is not favorable compared to a single foo table with a
well-maintainedbtree index on i. 
>

My tests show you are incorrect:


part_test=# explain analyze select * from foo join bar using (i) where
i=9;
                                                    QUERY
PLAN
------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=34.26..106.76 rows=200 width=20) (actual
time=0.004..0.004 rows=0 loops=1)
   ->  Append  (cost=0.00..68.50 rows=20 width=12) (actual
time=0.004..0.004 rows=0 loops=1)
         ->  Seq Scan on foo  (cost=0.00..34.25 rows=10 width=12)
(actual time=0.001..0.001 rows=0 loops=1)
               Filter: (i = 9)
         ->  Seq Scan on foo_1 foo  (cost=0.00..34.25 rows=10 width=12)
(actual time=0.000..0.000 rows=0 loops=1)
               Filter: (i = 9)
   ->  Materialize  (cost=34.26..34.36 rows=10 width=12) (never
executed)
         ->  Seq Scan on bar  (cost=0.00..34.25 rows=10 width=12) (never
executed)
               Filter: (i = 9)
 Total runtime: 0.032 ms
(10 rows)



--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: partitioning question 1

From

Ben

Date:

28 October 2010, 17:00:03

On Oct 28, 2010, at 12:44 PM, Joshua D. Drake wrote:
>
> My tests show you are incorrect:
>
>
> part_test=# explain analyze select * from foo join bar using (i) where
> i=9;
>                                                    QUERY
> PLAN
> ------------------------------------------------------------------------------------------------------------------
> Nested Loop  (cost=34.26..106.76 rows=200 width=20) (actual
> time=0.004..0.004 rows=0 loops=1)
>   ->  Append  (cost=0.00..68.50 rows=20 width=12) (actual
> time=0.004..0.004 rows=0 loops=1)
>         ->  Seq Scan on foo  (cost=0.00..34.25 rows=10 width=12)
> (actual time=0.001..0.001 rows=0 loops=1)
>               Filter: (i = 9)
>         ->  Seq Scan on foo_1 foo  (cost=0.00..34.25 rows=10 width=12)
> (actual time=0.000..0.000 rows=0 loops=1)
>               Filter: (i = 9)
>   ->  Materialize  (cost=34.26..34.36 rows=10 width=12) (never
> executed)
>         ->  Seq Scan on bar  (cost=0.00..34.25 rows=10 width=12) (never
> executed)
>               Filter: (i = 9)
> Total runtime: 0.032 ms
> (10 rows)

strange.  my tests don't agree with your tests :

create table foo (i integer not null, j float not null);
create table foo_1 ( check (i >= 0 and i < 10) ) inherits (foo);
create table foo_2 ( check (i >= 10 and i < 20) ) inherits (foo);
create table foo_3 ( check (i >= 20 and i < 30) ) inherits (foo);
create index foo_1_idx on foo_1 (i);
create index foo_2_idx on foo_2 (i);
create index foo_3_idx on foo_3 (i);
insert into foo_1 select generate_series, generate_series from generate_series(0,9);
insert into foo_2 select generate_series, generate_series from generate_series(10,19);
insert into foo_3 select generate_series, generate_series from generate_series(20,29);
create table bar (i integer not null, k float not null);
create index bar_idx on bar (i);
insert into bar select generate_series, -generate_series from generate_series(0,9);
vacuum analyze;
explain analyze select * from foo join bar using (i);

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.23..42.29 rows=98 width=20) (actual time=0.056..0.118 rows=10 loops=1)
   Hash Cond: (public.foo.i = bar.i)
   ->  Append  (cost=0.00..32.70 rows=1970 width=12) (actual time=0.008..0.043 rows=30 loops=1)
         ->  Seq Scan on foo  (cost=0.00..29.40 rows=1940 width=12) (actual time=0.001..0.001 rows=0 loops=1)
         ->  Seq Scan on foo_1 foo  (cost=0.00..1.10 rows=10 width=12) (actual time=0.005..0.008 rows=10 loops=1)
         ->  Seq Scan on foo_2 foo  (cost=0.00..1.10 rows=10 width=12) (actual time=0.003..0.006 rows=10 loops=1)
         ->  Seq Scan on foo_3 foo  (cost=0.00..1.10 rows=10 width=12) (actual time=0.003..0.006 rows=10 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=12) (actual time=0.025..0.025 rows=10 loops=1)
         ->  Seq Scan on bar  (cost=0.00..1.10 rows=10 width=12) (actual time=0.005..0.013 rows=10 loops=1)
 Total runtime: 0.205 ms
(10 rows)


i'm running pg 8.4.3 with constraint_exclusion=on (just to be safe.)

best, b

Re: partitioning question 1

From

"Reid Thompson"

Date:

28 October 2010, 17:08:57

On Thu, 2010-10-28 at 12:59 -0700, Ben wrote:
> explain analyze select * from foo join bar using (i);
vs
explain analyze select * from foo join bar using (i) where i=9;

Re: partitioning question 1

From

"Joshua D. Drake"

Date:

28 October 2010, 17:48:58

On Thu, 2010-10-28 at 12:59 -0700, Ben wrote:
> On Oct 28, 2010, at 12:44 PM, Joshua D. Drake wrote:
> >
> > My tests show you are incorrect:
> >
> >
> > part_test=# explain analyze select * from foo join bar using (i) where
> > i=9;
> >                                                    QUERY
> > PLAN
> > ------------------------------------------------------------------------------------------------------------------
> > Nested Loop  (cost=34.26..106.76 rows=200 width=20) (actual
> > time=0.004..0.004 rows=0 loops=1)
> >   ->  Append  (cost=0.00..68.50 rows=20 width=12) (actual
> > time=0.004..0.004 rows=0 loops=1)
> >         ->  Seq Scan on foo  (cost=0.00..34.25 rows=10 width=12)
> > (actual time=0.001..0.001 rows=0 loops=1)
> >               Filter: (i = 9)
> >         ->  Seq Scan on foo_1 foo  (cost=0.00..34.25 rows=10 width=12)
> > (actual time=0.000..0.000 rows=0 loops=1)
> >               Filter: (i = 9)
> >   ->  Materialize  (cost=34.26..34.36 rows=10 width=12) (never
> > executed)
> >         ->  Seq Scan on bar  (cost=0.00..34.25 rows=10 width=12) (never
> > executed)
> >               Filter: (i = 9)
> > Total runtime: 0.032 ms
> > (10 rows)
>
> strange.  my tests don't agree with your tests :

Do you have constraint_exclusion turned on? You should verify with show
constraint_exclusion (I saw what you wrote below).

JD

P.S. Blatant plug, you coming to http://www.postgresqlconference.org ?


--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

Re: partitioning question 1

From

Ben

Date:

28 October 2010, 18:07:13

whoops, didn't see the i=9 (linebreak! linebreak!)

nonetheless that is a static constant constraint on the column i, and i was asking if constraint exclusions would work
fordynamic constraints (like those derived from a table joined against.)  so for example the bar table has only 0-9 in
itshistogram for i, but constraint exclusion can't use that to eliminate tables foo_2 and foo_3.  this is precisely the
kindof information an index can use via join selectivity. 

i am not going to the pg conference, sorry to say.

b


On Oct 28, 2010, at 1:48 PM, Joshua D. Drake wrote:

> On Thu, 2010-10-28 at 12:59 -0700, Ben wrote:
>> On Oct 28, 2010, at 12:44 PM, Joshua D. Drake wrote:
>>>
>>> My tests show you are incorrect:
>>>
>>>
>>> part_test=# explain analyze select * from foo join bar using (i) where
>>> i=9;
>>>                                                   QUERY
>>> PLAN
>>> ------------------------------------------------------------------------------------------------------------------
>>> Nested Loop  (cost=34.26..106.76 rows=200 width=20) (actual
>>> time=0.004..0.004 rows=0 loops=1)
>>>  ->  Append  (cost=0.00..68.50 rows=20 width=12) (actual
>>> time=0.004..0.004 rows=0 loops=1)
>>>        ->  Seq Scan on foo  (cost=0.00..34.25 rows=10 width=12)
>>> (actual time=0.001..0.001 rows=0 loops=1)
>>>              Filter: (i = 9)
>>>        ->  Seq Scan on foo_1 foo  (cost=0.00..34.25 rows=10 width=12)
>>> (actual time=0.000..0.000 rows=0 loops=1)
>>>              Filter: (i = 9)
>>>  ->  Materialize  (cost=34.26..34.36 rows=10 width=12) (never
>>> executed)
>>>        ->  Seq Scan on bar  (cost=0.00..34.25 rows=10 width=12) (never
>>> executed)
>>>              Filter: (i = 9)
>>> Total runtime: 0.032 ms
>>> (10 rows)
>>
>> strange.  my tests don't agree with your tests :
>
> Do you have constraint_exclusion turned on? You should verify with show
> constraint_exclusion (I saw what you wrote below).
>
> JD
>
> P.S. Blatant plug, you coming to http://www.postgresqlconference.org ?
>
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>

Re: partitioning question 1

From

"Igor Neyman"

Date:

29 October 2010, 11:38:56

> -----Original Message-----
> From: Ben [mailto:midfield@gmail.com]
> Sent: Thursday, October 28, 2010 12:37 PM
> To: pgsql-performance@postgresql.org
> Subject: partitioning question 1
>
> hello --
>
> my last email was apparently too long to respond to so i'll
> split it up into shorter pieces.  my first question :
>
> my understanding of how range partitioning and constraint
> exclusion works leads me to believe that it does not buy any
> query performance that a clustered index doesn't already give
> you -- the advantages are all in maintainability.  an index
> is able to eliminate pages just as well as constraint
> exclusion is able to eliminate table partitions.  the I/O
> advantages of having queries target small subtables are the
> same as the I/O advantages of clustering the index : result
> pages in a small range are very close to each other on disk.
>
> finally, since constraint exclusion isn't as flexible as
> indexing (i've seen old mailing list posts that say that
> constraint exclusion only works with static constants in
> where clauses, and only works with simple operators like >, <
> which basically forces btree indexes when i want to use gist)
> it is indeed likely that partitioning can be slower than one
> big table with a clustered index.
>
> is my intuition completely off on this?
>
> best regards, ben
>

If your SELECT retrieves substantial amount of records, table scan could
be more efficient than index access.

Now, if while retrieving large amount of records "WHERE clause" of this
SELECT still satisfies constraints on some partition(s), then obviously
one (or few) partition scans will be more efficient than full table scan
of non-partitioned table.

So, yes partitioning provides performance improvements, not only
maintenance convenience.

Regards,
Igor Neyman

Re: partitioning question 1

From

Ben

Date:

29 October 2010, 13:16:25

On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote:

>> is my intuition completely off on this?
>>
>> best regards, ben
>>
>
> If your SELECT retrieves substantial amount of records, table scan could
> be more efficient than index access.
>
> Now, if while retrieving large amount of records "WHERE clause" of this
> SELECT still satisfies constraints on some partition(s), then obviously
> one (or few) partition scans will be more efficient than full table scan
> of non-partitioned table.
>
> So, yes partitioning provides performance improvements, not only
> maintenance convenience.

my impression was that a *clustered* index would give a lot of the same I/O benefits, in a more flexible way.  if
you'reclustered on the column in question, then an index scan for a range is much like a sequential scan over a
partition(as far as i understand.) 

b

Re: partitioning question 1

From

"Igor Neyman"

Date:

29 October 2010, 13:28:29

> -----Original Message-----
> From: Ben [mailto:midfield@gmail.com]
> Sent: Friday, October 29, 2010 12:16 PM
> To: Igor Neyman
> Cc: pgsql-performance@postgresql.org
> Subject: Re: partitioning question 1
>
> On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote:
>
> >> is my intuition completely off on this?
> >>
> >> best regards, ben
> >>
> >
> > If your SELECT retrieves substantial amount of records, table scan
> > could be more efficient than index access.
> >
> > Now, if while retrieving large amount of records "WHERE clause" of
> > this SELECT still satisfies constraints on some partition(s), then
> > obviously one (or few) partition scans will be more efficient than
> > full table scan of non-partitioned table.
> >
> > So, yes partitioning provides performance improvements, not only
> > maintenance convenience.
>
> my impression was that a *clustered* index would give a lot
> of the same I/O benefits, in a more flexible way.  if you're
> clustered on the column in question, then an index scan for a
> range is much like a sequential scan over a partition (as far
> as i understand.)
>
> b
>

Even with clustered index you still read index+table, which is more
expensive than just table scan (in situation I described above).
PG clustered index is not the same as SQL Server clustered index (which
includes actual table pages on the leaf level).

Igor Neyman