Thread: questions about very large table and partitioning

questions about very large table and partitioning

From
"fdu.xiaojf@gmail.com"
Date:
Hi all,

I'm trying to create a very large table with more than 0.6 billion rows,
which is really a big number, so I think I have to create partitioned
tables after some googling.

However, I have a few questions about partitioning in PostgreSQL.

1) PostgreSQL only support partition by inheritance, and rules have to
be created for each child table, this will result *a lot of* rules if
the number of child tables is large.

Are there some smart ways to avoid this kind of mass ?

2) I have added check constraints for child tables. According to the
documents, "query performance can be improved dramatically for certain
kinds of queries". Does this mean that the query can be improved only if
the query contains the constrained column? What will happen if the
constrained column doesn't appear in the WHERE clause?

3) Is partition by inheritance the only appropriate way to organize very
large table in PostgreSQL ?

Thanks in advance.



Re: questions about very large table and partitioning

From
Enrico Sirola
Date:
Il giorno 18/feb/08, alle ore 17:37, fdu.xiaojf@gmail.com ha scritto:
> 1) PostgreSQL only support partition by inheritance, and rules have to
> be created for each child table, this will result *a lot of* rules if
> the number of child tables is large.
>
> Are there some smart ways to avoid this kind of mass ?

you can obtain the same result using a trigger, but you must replace
the trigger function every time you add/remove a partition. The
trigger also has an additional feature: you can use "copy in" in the
"father" table, while copy in bypasses the rules subsystem

> 2) I have added check constraints for child tables. According to the
> documents, "query performance can be improved dramatically for certain
> kinds of queries". Does this mean that the query can be improved
> only if
> the query contains the constrained column? What will happen if the
> constrained column doesn't appear in the WHERE clause?

if the constraint doesn't appear in the where clause, then it is
executed in all partitions

> 3) Is partition by inheritance the only appropriate way to organize
> very
> large table in PostgreSQL ?

don't know. I think partitioning is useful when you perform partitions
"rotation" e.g. when you periodically delete old rows and insert new
ones (think about log files). In this case you should periodically
perform vacuums to ensure that the dead rows gets recycled otherwise
the DB will continue to grow. Partitions help a lot in this case (also
autovacuum does)
I'd try to tune autovacuum for your workload, and only at a second
time I'd try to partition the tables.
There has been some discussion on partitioning in this list in the
past. Try also to take a look at the archives for last june or july
Bye,
e.


Re: questions about very large table and partitioning

From
Erik Jones
Date:
On Feb 18, 2008, at 11:23 AM, Enrico Sirola wrote:

>
> Il giorno 18/feb/08, alle ore 17:37, fdu.xiaojf@gmail.com ha scritto:
>> 1) PostgreSQL only support partition by inheritance, and rules
>> have to
>> be created for each child table, this will result *a lot of* rules if
>> the number of child tables is large.
>>
>> Are there some smart ways to avoid this kind of mass ?
>
> you can obtain the same result using a trigger, but you must
> replace the trigger function every time you add/remove a partition.

This is not strictly true.  If you use EXECUTE to run dynamically
built INSERT/UPDATE/DELETE statements you don't have to replace the
trigger each time.



Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com




Re: questions about very large table and partitioning

From
Enrico Sirola
Date:
Il giorno 18/feb/08, alle ore 18:42, Erik Jones ha scritto:

>
> On Feb 18, 2008, at 11:23 AM, Enrico Sirola wrote:
>
>>
>> Il giorno 18/feb/08, alle ore 17:37, fdu.xiaojf@gmail.com ha scritto:
>>> 1) PostgreSQL only support partition by inheritance, and rules
>>> have to
>>> be created for each child table, this will result *a lot of* rules
>>> if
>>> the number of child tables is large.
>>>
>>> Are there some smart ways to avoid this kind of mass ?
>>
>> you can obtain the same result using a trigger, but you must
>> replace the trigger function every time you add/remove a partition.
>
> This is not strictly true.  If you use EXECUTE to run dynamically
> built INSERT/UPDATE/DELETE statements you don't have to replace the
> trigger each time.

oh sure, in that case you can, but I do the opposite way: dynamically
generate the trigger function with a plpgsql script at partition-
creation time :-)


Re: questions about very large table and partitioning

From
"fdu.xiaojf@gmail.com"
Date:
Hi,

Thank you guys.


Enrico Sirola wrote:
>
> Il giorno 18/feb/08, alle ore 17:37, fdu.xiaojf@gmail.com ha scritto:
>> 1) PostgreSQL only support partition by inheritance, and rules have to
>> be created for each child table, this will result *a lot of* rules if
>> the number of child tables is large.
>>
>> Are there some smart ways to avoid this kind of mass ?
>
> you can obtain the same result using a trigger, but you must replace the
> trigger function every time you add/remove a partition. The trigger also
> has an additional feature: you can use "copy in" in the "father" table,
> while copy in bypasses the rules subsystem
>
>> 2) I have added check constraints for child tables. According to the
>> documents, "query performance can be improved dramatically for certain
>> kinds of queries". Does this mean that the query can be improved only if
>> the query contains the constrained column? What will happen if the
>> constrained column doesn't appear in the WHERE clause?
>
> if the constraint doesn't appear in the where clause, then it is
> executed in all partitions
>
>> 3) Is partition by inheritance the only appropriate way to organize very
>> large table in PostgreSQL ?
>
> don't know. I think partitioning is useful when you perform partitions
> "rotation" e.g. when you periodically delete old rows and insert new
> ones (think about log files). In this case you should periodically
> perform vacuums to ensure that the dead rows gets recycled otherwise the
> DB will continue to grow. Partitions help a lot in this case (also
> autovacuum does)
> I'd try to tune autovacuum for your workload, and only at a second time
> I'd try to partition the tables.
> There has been some discussion on partitioning in this list in the past.
> Try also to take a look at the archives for last june or july
> Bye,
> e.
>
>

I have tried to do partition with inheritance and rules. First, I created
master table and many child table, and also the rules for insert, delete and
update. Then I do some select, insert, delete and update operations on the
master to test if it works. However, the insert an delete work very well, but
the update operation seems never return. I tried several times, and could wait
it to return and killed the process.

I tried the commands manually, and it seemed very weird.
The delete command:
DELETE FROM master_table WHERE id='9999' AND data_type='aaa'
and select command with the same condition expression:
SELECT * FROM master_table WHERE id='9999' AND data_type='aaa'
both return without delay.
But the update command with the same condition expression:
UPDATE master_table set data_value='somevalue' WHERE id='9999' AND data_type='aaa'
didn't return in 5 minutes.

Every table has index and constraint on column "id". I have already set
constraint_exclusion=true.

Why the update command runs so slow ?

Thanks a lot.

Re: questions about very large table and partitioning

From
Erik Jones
Date:
On Feb 19, 2008, at 7:46 AM, fdu.xiaojf@gmail.com wrote:

> I have tried to do partition with inheritance and rules. First, I
> created master table and many child table, and also the rules for
> insert, delete and update. Then I do some select, insert, delete
> and update operations on the master to test if it works. However,
> the insert an delete work very well, but the update operation seems
> never return. I tried several times, and could wait it to return
> and killed the process.
>
> I tried the commands manually, and it seemed very weird.
> The delete command:
> DELETE FROM master_table WHERE id='9999' AND data_type='aaa'
> and select command with the same condition expression:
> SELECT * FROM master_table WHERE id='9999' AND data_type='aaa'
> both return without delay.
> But the update command with the same condition expression:
> UPDATE master_table set data_value='somevalue' WHERE id='9999' AND
> data_type='aaa'
> didn't return in 5 minutes.
>
> Every table has index and constraint on column "id". I have already
> set constraint_exclusion=true.
>
> Why the update command runs so slow ?

You'll need to post some of your tables' schemas along with the
results of running your queries through EXPLAIN or EXPLAIN ANALYZE
(better) in order for anyone to answer that.

Erik Jones

DBA | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com