Thread: partitioned table query question

partitioned table query question

From

"Mason Hale"

Date:

08 December 2007, 00:51:48

I'm implementing table partitioning on 8.2.5 -- I've got the tables set up to partition based on the % 10 value of a key.

My problem is that I can't get the planner to take advantage of the partitioning without also adding a key % 10 to the where clause.

Is there any way around that?

My child table definitions are:

CREATE TABLE topic_version_page_0 (

CHECK (topic_version_id % 10 = 0::integer )

) inherits (topic_version_page);

...

CREATE TABLE topic_version_page_9 (

CHECK (topic_version_id % 10 = 9::integer )

) inherits (topic_version_page);

I've also created indexes and constraints for each child table, and an insert trigger on the master table (topic_version_page).

If I include a 'topic_version_id % 10 = [some value]' in my query, then the partitioning shows up in the query plan:

test=> explain select * from topic_version_page where topic_version_id % 10 = (102 % 10) and topic_version_id = 102;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------

Result (cost=4.27..19.23 rows=2 width=194)

-> Append (cost= 4.27..19.23 rows=2 width=194)

-> Bitmap Heap Scan on topic_version_page (cost=4.27..9.62 rows=1 width=194)

Recheck Cond: (topic_version_id = 102)

Filter: ((topic_version_id % 10) = 2)

-> Bitmap Index Scan on index_topic_version_page_on_topic_version_id_and_created_at (cost=0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_2 topic_version_page (cost=4.27..9.62 rows=1 width=194)

Recheck Cond: (topic_version_id = 102)

Filter: ((topic_version_id % 10) = 2)

-> Bitmap Index Scan on index_topic_version_page_2_on_topic_version_id_and_page_id (cost=0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

(12 rows)

But if I don't explicitly include a 'topic_version_id % 10' -- the plan gets much worse, checking every table (see below).

test=> explain select * from topic_version_page where topic_version_id = 102;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------

Result (cost=4.27..105.68 rows=22 width=194)

-> Append (cost= 4.27..105.68 rows=22 width=194)

-> Bitmap Heap Scan on topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_on_topic_version_id_and_created_at (cost= 0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_0 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_0_on_topic_version_id_and_page_id (cost=0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_1 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_1_on_topic_version_id_and_page_id (cost= 0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_2 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_2_on_topic_version_id_and_page_id (cost=0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_3 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_3_on_topic_version_id_and_page_id (cost= 0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_4 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_4_on_topic_version_id_and_page_id (cost=0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_5 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_5_on_topic_version_id_and_page_id (cost= 0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_6 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_6_on_topic_version_id_and_page_id (cost=0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_7 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_7_on_topic_version_id_and_page_id (cost= 0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_8 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_8_on_topic_version_id_and_page_id (cost=0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

-> Bitmap Heap Scan on topic_version_page_9 topic_version_page (cost=4.27..9.61 rows=2 width=194)

Recheck Cond: (topic_version_id = 102)

-> Bitmap Index Scan on index_topic_version_page_9_on_topic_version_id_and_page_id (cost= 0.00..4.27 rows=2 width=0)

Index Cond: (topic_version_id = 102)

(46 rows)

Is there anyway to get the benefit of partitioning without adding a additional 'topic_version_id % 10' condition to every query that touches this table?

Thanks in advance.

Mason

Re: partitioned table query question

From

Erik Jones

Date:

10 December 2007, 14:22:02

On Dec 7, 2007, at 10:51 PM, Mason Hale wrote:

> I'm implementing table partitioning on 8.2.5 -- I've got the tables
> set up to partition based on the % 10 value of a key.
>
> My problem is that I can't get the planner to take advantage of the
> partitioning without also adding a key % 10 to the where clause.
> Is there any way around that?
>
> My child table definitions are:
>
> CREATE TABLE topic_version_page_0 (
>    CHECK (topic_version_id % 10 = 0::integer )
> ) inherits (topic_version_page);
>
> ...
>
> CREATE TABLE topic_version_page_9 (
>    CHECK (topic_version_id % 10 = 9::integer )
> ) inherits (topic_version_page);
>
>
> I've also created indexes and constraints for each child table, and
> an insert trigger on the master table (topic_version_page).
>
> If I include a 'topic_version_id % 10 = [some value]' in my query,
> then the partitioning shows up in the query plan:
>
> test=> explain select * from topic_version_page where
> topic_version_id % 10 = (102 % 10) and topic_version_id = 102;
>                                                               QUERY
> PLAN
> ----------------------------------------------------------------------
> ----------------------------------------------------------------
>  Result  (cost=4.27..19.23 rows=2 width=194)
>    ->  Append  (cost= 4.27..19.23 rows=2 width=194)
>          ->  Bitmap Heap Scan on topic_version_page
> (cost=4.27..9.62 rows=1 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                Filter: ((topic_version_id % 10) = 2)
>                ->  Bitmap Index Scan on
> index_topic_version_page_on_topic_version_id_and_created_at
> (cost=0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_2
> topic_version_page  (cost=4.27..9.62 rows=1 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                Filter: ((topic_version_id % 10) = 2)
>                ->  Bitmap Index Scan on
> index_topic_version_page_2_on_topic_version_id_and_page_id
> (cost=0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
> (12 rows)
>
> But if I don't explicitly include a  'topic_version_id % 10' -- the
> plan gets much worse, checking every table (see below).
>
>
> test=> explain select * from topic_version_page where
> topic_version_id = 102;
>                                                               QUERY
> PLAN
> ----------------------------------------------------------------------
> ----------------------------------------------------------------
>  Result  (cost=4.27..105.68 rows=22 width=194)
>    ->  Append  (cost= 4.27..105.68 rows=22 width=194)
>          ->  Bitmap Heap Scan on topic_version_page
> (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_on_topic_version_id_and_created_at  (cost=
> 0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_0
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_0_on_topic_version_id_and_page_id
> (cost=0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_1
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_1_on_topic_version_id_and_page_id  (cost=
> 0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_2
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_2_on_topic_version_id_and_page_id
> (cost=0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_3
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_3_on_topic_version_id_and_page_id  (cost=
> 0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_4
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_4_on_topic_version_id_and_page_id
> (cost=0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_5
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_5_on_topic_version_id_and_page_id  (cost=
> 0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_6
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_6_on_topic_version_id_and_page_id
> (cost=0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_7
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_7_on_topic_version_id_and_page_id  (cost=
> 0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_8
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_8_on_topic_version_id_and_page_id
> (cost=0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
>          ->  Bitmap Heap Scan on topic_version_page_9
> topic_version_page  (cost=4.27..9.61 rows=2 width=194)
>                Recheck Cond: (topic_version_id = 102)
>                ->  Bitmap Index Scan on
> index_topic_version_page_9_on_topic_version_id_and_page_id  (cost=
> 0.00..4.27 rows=2 width=0)
>                      Index Cond: (topic_version_id = 102)
> (46 rows)
>
>
> Is there anyway to get the benefit of partitioning without adding a
> additional 'topic_version_id % 10' condition to every query that
> touches this table?
>
> Thanks in advance.

You beat me to the punch on this one.  I was wanting to use modulo
operations for bin style partitioning as well, but this makes things
pretty awkward as well as unintuitive.  So, to the postgres gurus:
What are the limitations of check constraints when used with
constraint exclusion?  Is this really the intended behavior?

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Re: partitioned table query question

From

Vivek Khera

Date:

10 December 2007, 15:17:05

On Dec 10, 2007, at 1:21 PM, Erik Jones wrote:

> You beat me to the punch on this one.  I was wanting to use modulo
> operations for bin style partitioning as well, but this makes things
> pretty awkward as well as unintuitive.  So, to the postgres gurus:
> What are the limitations of check constraints when used with
> constraint exclusion?  Is this really the intended behavior?
>

/me too!

I have vague recollection of reading that the constraints on the child
tables needed to be free of computation (ie, just straight comparison
ranges) but I can't find that reference now.

But in my case, I can almost always pick the appropriate sub-table
from the application level anyway.

Script to reset all sequence values in the a given DB?

From

Nathan Wilhelmi

Date:

10 December 2007, 16:01:11

Hello - Does anyone happen to have a SQL script or function that can
reset all the sequence values found in a given DB? When we rebuild the
DB it would be handy to be able to set all the sequence back to a known
starting place.

Thanks!

-Nate

Re: Script to reset all sequence values in the a given DB?

From

"Pavel Stehule"

Date:

10 December 2007, 17:33:24

On 10/12/2007, Nathan Wilhelmi <wilhelmi@ucar.edu> wrote:
> Hello - Does anyone happen to have a SQL script or function that can
> reset all the sequence values found in a given DB? When we rebuild the
> DB it would be handy to be able to set all the sequence back to a known
> starting place.
>

create or replace function resetall()
returns void as $$
declare
  v varchar;
  m integer;
begin
  for v in
     select n.nspname || '.' || c.relname
         from pg_catalog.pg_class c
                  left join
                  pg_catalog.pg_namespace n
                  on n.oid = c.relnamespace
        where c.relkind = 'S'
  loop
    execute 'select min_value from '||v into m;
    setval(v, m, false);
  end loop;
  return;
end; $$ language plpgsql;

Regards

Pavel Stehule
> Thanks!
>
> -Nate
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: Script to reset all sequence values in the a given DB?

From

"Obe, Regina"

Date:

10 December 2007, 17:52:58

 Did you want to set to a specific known value or the min value of the
sequence.  I think Pavel's sets to the min value of the sequence.

The below sets all the sequences to the same value

CREATE AGGREGATE sum ( BASETYPE = text,
                      SFUNC = textcat,
                        STYPE = text,
                        INITCOND = '' );


CREATE OR REPLACE FUNCTION cp_resetsequences(resetto integer)
  RETURNS void AS
$BODY$
BEGIN
    EXECUTE (SELECT SUM('ALTER SEQUENCE ' || sequence_schema || '.'
|| sequence_name || ' RESTART WITH ' || CAST(resetto As varchar(50)) ||
'; ' ) 
        FROM  information_schema.sequences);
END
$BODY$
  LANGUAGE 'plpgsql' VOLATILE;


--Note this will set all the sequences in the database to 150
 SELECT cp_resetsequences(150);



-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Pavel Stehule
Sent: Monday, December 10, 2007 4:33 PM
To: Nathan Wilhelmi
Cc: PGSQL Mailing List
Subject: Re: [GENERAL] Script to reset all sequence values in the a
given DB?

On 10/12/2007, Nathan Wilhelmi <wilhelmi@ucar.edu> wrote:
> Hello - Does anyone happen to have a SQL script or function that can
> reset all the sequence values found in a given DB? When we rebuild the
> DB it would be handy to be able to set all the sequence back to a
known
> starting place.
>

create or replace function resetall()
returns void as $$
declare
  v varchar;
  m integer;
begin
  for v in
     select n.nspname || '.' || c.relname
         from pg_catalog.pg_class c
                  left join
                  pg_catalog.pg_namespace n
                  on n.oid = c.relnamespace
        where c.relkind = 'S'
  loop
    execute 'select min_value from '||v into m;
    setval(v, m, false);
  end loop;
  return;
end; $$ language plpgsql;

Regards

Pavel Stehule
> Thanks!
>
> -Nate
>
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.

Re: Script to reset all sequence values in the a given DB?

From

Vivek Khera

Date:

10 December 2007, 18:25:50

please don't hijack old threads ("partitioned table query question" in
this case) and change the subject line to start your new question. it
messes up threaded mail readers.

thanks.

On Dec 10, 2007, at 3:00 PM, Nathan Wilhelmi wrote:

> Hello - Does anyone happen to have a SQL script or function that can
> reset all the sequence values found in a given DB? When we rebuild
> the DB it would be handy to be able to set all the sequence back to
> a known starting place.
>
> Thanks!
>
> -Nate
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

Re: partitioned table query question

From

Tom Lane

Date:

10 December 2007, 18:29:55

Erik Jones <erik@myemma.com> writes:
> You beat me to the punch on this one.  I was wanting to use modulo
> operations for bin style partitioning as well, but this makes things
> pretty awkward as well as unintuitive.  So, to the postgres gurus:
> What are the limitations of check constraints when used with
> constraint exclusion?  Is this really the intended behavior?

Don't hold your breath.  predtest.c has some intelligence about
btree-indexable comparison operators, but none about modulo.

In the particular case here, the reason that
    WHERE (foo % 10) = 3
is seen to be incompatible with a check constraint
    (foo % 10) = 9
is that the "=" is btree indexable, so predtest knows something about
its semantics; and given that % is an immutable operator, the code is
able to see that these could only both be true if 3 = 9.  This
deduction involves exactly zero %-specific knowledge.  In particular
it doesn't require assuming that "a=b" implies "(a % c) = (b % c)",
which would involve much more knowledge about the specific operators
involved than is available to the planner.  (The fact that an operator
is a btree equality member doesn't mean that it might not consider two
values to be equal that are distinct to some other operators of the
data type.  See plus and minus zero in IEEE float arithmetic for one
handy example ... and that's not even considering nonstandard versions
of equality.)

            regards, tom lane

Re: partitioned table query question

From

Erik Jones

Date:

10 December 2007, 19:12:33

On Dec 10, 2007, at 4:29 PM, Tom Lane wrote:

> Erik Jones <erik@myemma.com> writes:
>> You beat me to the punch on this one.  I was wanting to use modulo
>> operations for bin style partitioning as well, but this makes things
>> pretty awkward as well as unintuitive.  So, to the postgres gurus:
>> What are the limitations of check constraints when used with
>> constraint exclusion?  Is this really the intended behavior?
>
> Don't hold your breath.  predtest.c has some intelligence about
> btree-indexable comparison operators, but none about modulo.
>
> In the particular case here, the reason that
>     WHERE (foo % 10) = 3
> is seen to be incompatible with a check constraint
>     (foo % 10) = 9
> is that the "=" is btree indexable, so predtest knows something about
> its semantics; and given that % is an immutable operator, the code is
> able to see that these could only both be true if 3 = 9.

I get that.

> This
> deduction involves exactly zero %-specific knowledge.  In particular
> it doesn't require assuming that "a=b" implies "(a % c) = (b % c)",
> which would involve much more knowledge about the specific operators
> involved than is available to the planner.  (The fact that an operator
> is a btree equality member doesn't mean that it might not consider two
> values to be equal that are distinct to some other operators of the
> data type.  See plus and minus zero in IEEE float arithmetic for one
> handy example ... and that's not even considering nonstandard versions
> of equality.)

Forgive me if I'm nagging on this, I just want to understand this
better.  Why does evaluating a CHECK constraint like 'CHECK some_id %
100 = 32' against WHERE clause like 'WHERE some_id=1132' need to know
anything about equality properites of %?  Or, rather, why does it
stop there?  Can't it just substitute the given value for some_id in
to the check expression, execute it and check the result value for
TRUE/FALSE?

On a related note, how would you recommend implementing some kind of
bin based (i.e. hash values, round robin, etc...) partitioning scheme
if this won't work?  I've tried a number of different approaches with
functions in the check constraint but can't seem to get anything
going there either.  I'm the third person this week (all in this
thread and another I had going) that's interested in this approach.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Re: partitioned table query question

From

Tom Lane

Date:

10 December 2007, 19:51:06

Erik Jones <erik@myemma.com> writes:
> Forgive me if I'm nagging on this, I just want to understand this
> better.  Why does evaluating a CHECK constraint like 'CHECK some_id %
> 100 = 32' against WHERE clause like 'WHERE some_id=1132' need to know
> anything about equality properites of %?  Or, rather, why does it
> stop there?  Can't it just substitute the given value for some_id in
> to the check expression, execute it and check the result value for
> TRUE/FALSE?

What "given value"?

What you're missing is that the condition "a = b" does not mean that
"f(a) = f(b)" for every function f.  It is possible to define
constraints on equality that would make that true, but such constraints
would be far stronger than what is required to make btree (or even hash)
indexes work.

In the example I gave, we are able to conclude that 3 is unequal to 9
not because of any a-priori knowledge, but because we apply the specific
operator to the specific constants and find out that it yields false.
Our knowledge of the consistency requirements that are imposed on btree
equality operators then allows us to determine that the two original
conditions can't be true at the same time.

This does *not* imply assuming that the two constants are really "the
same" in the sense that no other operator in the system could tell them
apart.  This isn't mere academic hairsplitting: there actually are
standard equality operators in the system for which such a conclusion
would fail.  I already mentioned float comparison, and numeric
comparison has similar behaviors --- for instance,

regression=# select '0.00'::numeric = '0.0'::numeric;
 ?column?
----------
 t
(1 row)

regression=# select text('0.00'::numeric) = text('0.0'::numeric);
 ?column?
----------
 f
(1 row)

            regards, tom lane

Re: partitioned table query question

From

Erik Jones

Date:

10 December 2007, 21:01:59

On Dec 10, 2007, at 5:50 PM, Tom Lane wrote:

> Erik Jones <erik@myemma.com> writes:
>> Forgive me if I'm nagging on this, I just want to understand this
>> better.  Why does evaluating a CHECK constraint like 'CHECK some_id %
>> 100 = 32' against WHERE clause like 'WHERE some_id=1132' need to know
>> anything about equality properites of %?  Or, rather, why does it
>> stop there?  Can't it just substitute the given value for some_id in
>> to the check expression, execute it and check the result value for
>> TRUE/FALSE?
>
> What "given value"?

The where clause in the select query.
>
> What you're missing is that the condition "a = b" does not mean that
> "f(a) = f(b)" for every function f.  It is possible to define
> constraints on equality that would make that true, but such
> constraints
> would be far stronger than what is required to make btree (or even
> hash)
> indexes work.

In the table constraint CHECK (some_id % 100 = 32), isn't that just f
(a) = 32?

> In the example I gave, we are able to conclude that 3 is unequal to 9
> not because of any a-priori knowledge, but because we apply the
> specific
> operator to the specific constants and find out that it yields false.
> Our knowledge of the consistency requirements that are imposed on
> btree
> equality operators then allows us to determine that the two original
> conditions can't be true at the same time.
>
> This does *not* imply assuming that the two constants are really "the
> same" in the sense that no other operator in the system could tell
> them
> apart.  This isn't mere academic hairsplitting: there actually are
> standard equality operators in the system for which such a conclusion
> would fail.  I already mentioned float comparison, and numeric
> comparison has similar behaviors --- for instance,
>
> regression=# select '0.00'::numeric = '0.0'::numeric;
>  ?column?
> ----------
>  t
> (1 row)
>
> regression=# select text('0.00'::numeric) = text('0.0'::numeric);
>  ?column?
> ----------
>  f
> (1 row)

I get your arguments wrt the known (to the planner) semantics of the
equality operator.  I guess what I don't understand is that given the
query

SELECT COUNT(*)
FROM table
WHERE some_id=34;

on a table with the much discussed constraint (34 % 100) = 32 isn't
simply evaluated as a one-time filter whenever whatever constraint
exclusion code examines child partition tables' constraints.

Again, though, is there some better way to go about implementing some
kind of hash based partitioning in postgres besides this that would
be more natural wrt queries?

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Re: partitioned table query question

From

Tom Lane

Date:

11 December 2007, 00:19:02

Erik Jones <erik@myemma.com> writes:
> I guess what I don't understand is that given the query

> SELECT COUNT(*)
> FROM table
> WHERE some_id=34;

> on a table with the much discussed constraint (34 % 100) = 32 isn't
> simply evaluated as a one-time filter whenever whatever constraint
> exclusion code examines child partition tables' constraints.

I'm not sure how else to explain it: the fact that the WHERE clause
asserts that some operator named "=" will succeed on some_id and 34
is not sufficient grounds to assume that "some_id % 100" and "34 % 100"
will give the same result.  Knowing that the "=" operator is a btree
equality operator gives us latitude to make certain conclusions, but
not that one, because there is no way to know whether the semantics
of the particular btree operator class have anything to do with the
behavior of "%".

If you dig in the PG archives you will find some discussions of
inventing a "real equality" flag for operators, which would authorize
the planner to make such deductions for any immutable operator/function.
The idea hasn't gone anywhere, partly because it's not clear that it
would really help in very many common cases.  The fact that we could
*not* set the flag on such common cases as float and numeric equality
is a bit discouraging in that connection.

            regards, tom lane

Re: partitioned table query question

From

"Trevor Talbot"

Date:

11 December 2007, 00:50:22

On 12/10/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Erik Jones <erik@myemma.com> writes:
> > I guess what I don't understand is that given the query
>
> > SELECT COUNT(*)
> > FROM table
> > WHERE some_id=34;
>
> > on a table with the much discussed constraint (34 % 100) = 32 isn't
> > simply evaluated as a one-time filter whenever whatever constraint
> > exclusion code examines child partition tables' constraints.
>
> I'm not sure how else to explain it: the fact that the WHERE clause
> asserts that some operator named "=" will succeed on some_id and 34
> is not sufficient grounds to assume that "some_id % 100" and "34 % 100"
> will give the same result.  Knowing that the "=" operator is a btree
> equality operator gives us latitude to make certain conclusions, but
> not that one, because there is no way to know whether the semantics
> of the particular btree operator class have anything to do with the
> behavior of "%".

Erik is questioning is why it has to assume anything. Why can't it
just execute the expression and find out? On a high level, the
partitioning system looks exactly like partial expression indexes.

Re: partitioned table query question

From

"Trevor Talbot"

Date:

11 December 2007, 00:53:15

On 12/10/07, Trevor Talbot <quension@gmail.com> wrote:
> On 12/10/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Erik Jones <erik@myemma.com> writes:
> > > I guess what I don't understand is that given the query
> >
> > > SELECT COUNT(*)
> > > FROM table
> > > WHERE some_id=34;
> >
> > > on a table with the much discussed constraint (34 % 100) = 32 isn't
> > > simply evaluated as a one-time filter whenever whatever constraint
> > > exclusion code examines child partition tables' constraints.
> >
> > I'm not sure how else to explain it: the fact that the WHERE clause
> > asserts that some operator named "=" will succeed on some_id and 34
> > is not sufficient grounds to assume that "some_id % 100" and "34 % 100"
> > will give the same result.  Knowing that the "=" operator is a btree
> > equality operator gives us latitude to make certain conclusions, but
> > not that one, because there is no way to know whether the semantics
> > of the particular btree operator class have anything to do with the
> > behavior of "%".
>
> Erik is questioning is why it has to assume anything. Why can't it
> just execute the expression and find out? On a high level, the
> partitioning system looks exactly like partial expression indexes.

...Oops. I sit here for 10 minutes pondering it, and figure out the
comparison with expression indexes isn't really true 2 seconds after I
hit "send". Sigh.

Re: partitioned table query question

From

Tom Lane

Date:

11 December 2007, 01:18:55

"Trevor Talbot" <quension@gmail.com> writes:
> Erik is questioning is why it has to assume anything. Why can't it
> just execute the expression and find out?

Because the whole point of the problem is to *not* execute the
expression, but to assume that it must yield false, for every row
of a given partition.  Without a solid logical basis for that
assumption, you're just building a house of cards.

The bottom line here is that we have built a partitioning facility
out of spare parts, ie, a very generalized contradiction-proving
section of the planner.  It's been an interesting exercise, and
it's certainly resulted in a better contradiction-prover than
we would have had otherwise, but it's got obvious limitations both
in planning performance and in the sorts of partitioning rules we
can support.  My feeling is that trying to push the current approach to
do bin or hash partitioning transparently is likely not reasonable.
Eventually we'll have to push an understanding of partitioning down to
some lower level of the system --- that is, if we think it's critical
enough to justify that much effort.

            regards, tom lane

Re: partitioned table query question

From

Simon Riggs

Date:

11 December 2007, 04:41:03

On Mon, 2007-12-10 at 23:18 -0500, Tom Lane wrote:
> Erik Jones <erik@myemma.com> writes:

> If you dig in the PG archives you will find some discussions of
> inventing a "real equality" flag for operators, which would authorize
> the planner to make such deductions for any immutable operator/function.
> The idea hasn't gone anywhere, partly because it's not clear that it
> would really help in very many common cases.

I think the only thing we can really say is that we haven't done it yet,
everything takes time and this is a tricky problem.

People are asking for this to work with an integer to allow hashing. We
already use that exact concept internally for hash aggregation and hash
joins, so I don't think we can argue it wouldn't help in common cases.

> The fact that we could
> *not* set the flag on such common cases as float and numeric equality
> is a bit discouraging in that connection.

With respect, that argument is circular. You required the implementation
to be a marking of the operator in that way, which leads to the
restriction you note. It can work for float and numeric, just not in all
conceivable cases. Most datatypes have some kind of weirdness that
prevents us from specifying it for everything in an easy manner.

Can I ask you to please lead us out of this impasse? The main problem is
working out an acceptable way to mark operators/datatypes/whatever or
combinations or thereof as "able to be transformed". Once we have that,
working out the rest is complex but doable. And it will then allow
partitioning, partial indexes and functional indexes to have a wider
range of application.

We really only want this for
- hashing functions, various, mostly on integers, uuids, text?
- upper() and lower() on text
- substr() on text
- very few others

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: partitioned table query question

From

"Mike Rylander"

Date:

11 December 2007, 09:20:44

On Dec 10, 2007 8:01 PM, Erik Jones <erik@myemma.com> wrote:
>
[snip]

> Again, though, is there some better way to go about implementing some
> kind of hash based partitioning in postgres besides this that would
> be more natural wrt queries?
>

Adding a column to hold the result of the %, perhaps updated by a
trigger so your app needn't change, and partitioning on that would be
the obvious way to get what you want today.  If you have a byte or two
of slack space in the tuple (by alignment), just use a "char" or an
INT2.  Assuming you don't affect fully aligned base tuple size, there
should be no table bloat, and no noticeable effect on speed.  As far
as being more natural WRT queries, well, you'd add to your where
clause

 bin = 34

instead of

  some_id % 100 = 34

The former seems to me to be more natural from the narrow perspective
of the SELECT statement.

--miker

Re: partitioned table query question

From

Erik Jones

Date:

11 December 2007, 11:08:53

On Dec 11, 2007, at 7:20 AM, Mike Rylander wrote:

> On Dec 10, 2007 8:01 PM, Erik Jones <erik@myemma.com> wrote:
>>
> [snip]
>
>> Again, though, is there some better way to go about implementing some
>> kind of hash based partitioning in postgres besides this that would
>> be more natural wrt queries?
>>
>
> Adding a column to hold the result of the %, perhaps updated by a
> trigger so your app needn't change, and partitioning on that would be
> the obvious way to get what you want today.  If you have a byte or two
> of slack space in the tuple (by alignment), just use a "char" or an
> INT2.  Assuming you don't affect fully aligned base tuple size, there
> should be no table bloat, and no noticeable effect on speed.  As far
> as being more natural WRT queries, well, you'd add to your where
> clause
>
>  bin = 34
>
> instead of
>
>   some_id % 100 = 34
>
> The former seems to me to be more natural from the narrow perspective
> of the SELECT statement.

Well, given that the bin is computed as a function of some_id, the
most natural way would be to not have to mention that bin in SELECT
statements at all.  However, it does appear that either a.) including
the bin as a table attribute and in the where clause (either directly
or the computation) or b.) precomputing the bin and directly
accessing the child table will be the only options we have for now.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Re: partitioned table query question

From

"Mason Hale"

Date:

11 December 2007, 11:37:47

Well, given that the bin is computed as a function of some_id, the
most natural way would be to not have to mention that bin in SELECT
statements at all. However, it does appear that either a.) including
the bin as a table attribute and in the where clause (either directly
or the computation) or b.) precomputing the bin and directly
accessing the child table will be the only options we have for now.

It occurs to me that if you are going to have to compute the bin anyway, you can also determine which table you need to work with directly.
And if you can do that you can modify the table name in the query instead of the adding an extra condition. This will save you a (short) step in the query plan, by avoiding checking the parent table for any matching rows. It may be a very small difference, but hey, it adds up.

The downside, that my application code needs to be aware of partitioning at the database layer, seems equivalent either way. And to be clear this is a big downside for me, I'm going to have to make some significant application layer changes to take advantage of partitioning, and if we later decide to change our partitioning rules in the future, we're going to have to update the application logic again. I'll willing to bite that bullet now, but just want to register my disappointment that partitioning isn't able to handle this common case more effectively. I hope it will handle it better in some future release.

In effect, all partitioning is doing for you in this case is giving you a more simple way to query the entire set of tables at once, rather than building a query that UNIONs all the tables. I also guess that if you do any bulk insert via COPY or INSERT ... SELECT, and have an insert trigger on the parent table, then that will help you route the inserted rows to the appropriate child tables. Of course the trigger is doing the work in that case as well, not the partitioning.

Re: partitioned table query question

From

Gregory Stark

Date:

11 December 2007, 11:44:16

"Erik Jones" <erik@myemma.com> writes:

> Well, given that the bin is computed as a function of some_id, the  most
> natural way would be to not have to mention that bin in SELECT  statements at
> all.

The problem Tom's tried to explain is that the function may or may not
preserve the bin. So for example if you wanted to bin based on the final digit
of a numeric number, so you had a constraint like

 CHECK substring(x::text, length(x::text)) = 0

And then you performed a query with something like "WHERE x = 1.0". The
constraint would appear to exclude all but bin 0. Whereas in fact it's
possible that records with the value "1" would appear in bin 1.

What's needed to make this work is some knowledge in the planner that the
numeric->text cast does not preserve the equality property of the numeric
operator class.

This would be the same information that would be needed to expression indexes
more useful. So if you had an expression index on "substring(name,1,3)" and
performed a query with a clause like "WHERE name = 'Gregory'" it could
intelligently perform an index scan on the key "Greg" and then recheck the key
"Gregory" against the table column.

The problem is that that's quite a lot of machinery. It's not just a boolean
flag for each function since there could be multiple "equals". Also you want
to know separately whether it preserves equality and whether it preserves the
entire btree ordering. So you potentially need a whole new table with every
combination of btree operator class and function and several boolean columns
for each combination.

> However, it does appear that either a.) including the bin as a table
> attribute and in the where clause (either directly or the computation) or
> b.) precomputing the bin and directly accessing the child table will be the
> only options we have for now.

Or the near future.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: partitioned table query question

From

Vivek Khera

Date:

11 December 2007, 13:20:57

On Dec 11, 2007, at 10:08 AM, Erik Jones wrote:

> b.) precomputing the bin and directly accessing the child table will
> be the only options we have for now.

This is where I'm headed.... I have only one or two queries that don't
specify the partitioned ID, and those need a full table scan anyhow. :-(

Re: partitioned table query question

From

Erik Jones

Date:

11 December 2007, 13:22:46

On Dec 11, 2007, at 9:44 AM, Gregory Stark wrote:

>
> "Erik Jones" <erik@myemma.com> writes:
>
>> Well, given that the bin is computed as a function of some_id,
>> the  most
>> natural way would be to not have to mention that bin in SELECT
>> statements at
>> all.
>
> The problem Tom's tried to explain is that the function may or may not
> preserve the bin. So for example if you wanted to bin based on the
> final digit
> of a numeric number, so you had a constraint like
>
>  CHECK substring(x::text, length(x::text)) = 0
>
> And then you performed a query with something like "WHERE x = 1.0".
> The
> constraint would appear to exclude all but bin 0. Whereas in fact it's
> possible that records with the value "1" would appear in bin 1.
>
> What's needed to make this work is some knowledge in the planner
> that the
> numeric->text cast does not preserve the equality property of the
> numeric
> operator class.
>
> This would be the same information that would be needed to
> expression indexes
> more useful. So if you had an expression index on "substring(name,
> 1,3)" and
> performed a query with a clause like "WHERE name = 'Gregory'" it could
> intelligently perform an index scan on the key "Greg" and then
> recheck the key
> "Gregory" against the table column.
>
> The problem is that that's quite a lot of machinery. It's not just
> a boolean
> flag for each function since there could be multiple "equals". Also
> you want
> to know separately whether it preserves equality and whether it
> preserves the
> entire btree ordering. So you potentially need a whole new table
> with every
> combination of btree operator class and function and several
> boolean columns
> for each combination.

Ok, that all makes sense and I can see that that's what Tom was
saying, it just took a bit of paraphrasing for me to get it.

>> However, it does appear that either a.) including the bin as a table
>> attribute and in the where clause (either directly or the
>> computation) or
>> b.) precomputing the bin and directly accessing the child table
>> will be the
>> only options we have for now.
>
> Or the near future.

sigh :)

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

Re: partitioned table query question

From

Vivek Khera

Date:

11 December 2007, 13:32:10

On Dec 11, 2007, at 10:44 AM, Gregory Stark wrote:

> The problem Tom's tried to explain is that the function may or may not
> preserve the bin. So for example if you wanted to bin based on the
> final digit
> of a numeric number, so you had a constraint like

I, along with at least Erik, was thinking that the constraint
expression would be evaluated to determine whether to include the
partition in the final plan.  Based on Tom's description, it is not
the case: the planner basically proves that the constraint will be
false.  Until this was clarified, Tom's points totally confused the
heck out of me.

It would be amazingly wonderful if this distinction could be posted to
the online docs.  It will surely help future generations :-)

Re: partitioned table query question

From

Tom Lane

Date:

11 December 2007, 13:47:44

Vivek Khera <khera@kcilink.com> writes:
> I, along with at least Erik, was thinking that the constraint
> expression would be evaluated to determine whether to include the
> partition in the final plan.  Based on Tom's description, it is not
> the case: the planner basically proves that the constraint will be
> false.  Until this was clarified, Tom's points totally confused the
> heck out of me.

> It would be amazingly wonderful if this distinction could be posted to
> the online docs.  It will surely help future generations :-)

Feel free to send in a proposed doc patch.  I'm not very clear on where
you think this should go or what it should say instead of what it does
say.

BTW, I always think of it the other way around: we're proving that the
WHERE condition must be false for any row meeting the check constraint.

            regards, tom lane

Re: partitioned table query question

From

Robert Treat

Date:

12 December 2007, 10:51:08

On Monday 10 December 2007 20:01, Erik Jones wrote:
> Again, though, is there some better way to go about implementing some
> kind of hash based partitioning in postgres besides this that would
> be more natural wrt queries?
>

One way is to set a static bin id for each partition, then do a select with
where bin_id = mod(2112,3);   Where 2112 equals the number your looking for,
and 3 would be the number of buckets.

--
Robert Treat
http://www.omniti.com/
We're Big. On PostgreSQL.