Thread: Partitioning an existing table

Partitioning an existing table

From
Phoenix Kiula
Date:
Hi.

The partitioning documentation in PG is very clear on how to partition
a new table. Create child tables, and have triggers that manage
INSERT, UPDATE and DELETE commands.

How about doing this with existing massive tables? (Over 120 million rows)

I could create a new parent table with child tables, and then INSERT
all these millions of rows to put them into the right partition. But
is that recommended?

Thanks

Re: Partitioning an existing table

From
Cédric Villemain
Date:
2011/4/25 Phoenix Kiula <phoenix.kiula@gmail.com>:
> Hi.
>
> The partitioning documentation in PG is very clear on how to partition
> a new table. Create child tables, and have triggers that manage
> INSERT, UPDATE and DELETE commands.
>
> How about doing this with existing massive tables? (Over 120 million rows)
>
> I could create a new parent table with child tables, and then INSERT
> all these millions of rows to put them into the right partition. But
> is that recommended?

It will be faster to INSERT directly in the good child tables and you
can make multiple INSERT in parrallel (depend of your IO system and
number of tables)

>
> Thanks
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: Partitioning an existing table

From
"David Johnston"
Date:
INSERT INTO child SELECT FROM parent WHERE ....;

Once you have fully allocated all the records on the parent table you can
then:

TRUNCATE parent;

Then install the triggers.

The only real downside is that the queries are going to take forever to run.
If you can naturally sub-divide the child tables (say into timestamp ranges)
you might try moving one month at a time just to keep your eye on query
progress.  Make use of functions to perform the actual INSERT INTO SELECT;
and probably have it capture and log statistics regarding each "INSERT INTO
SELECT" statement that is fired (data start time, data end time, target
child, number of records, execution start/end times).

David J.


-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Cédric Villemain
Sent: Monday, April 25, 2011 6:58 AM
To: Phoenix Kiula
Cc: PG-General Mailing List
Subject: Re: [GENERAL] Partitioning an existing table

2011/4/25 Phoenix Kiula <phoenix.kiula@gmail.com>:
> Hi.
>
> The partitioning documentation in PG is very clear on how to partition
> a new table. Create child tables, and have triggers that manage
> INSERT, UPDATE and DELETE commands.
>
> How about doing this with existing massive tables? (Over 120 million
> rows)
>
> I could create a new parent table with child tables, and then INSERT
> all these millions of rows to put them into the right partition. But
> is that recommended?

It will be faster to INSERT directly in the good child tables and you can
make multiple INSERT in parrallel (depend of your IO system and number of
tables)

>
> Thanks
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To
> make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
Cédric Villemain               2ndQuadrant http://2ndQuadrant.fr/    
PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Partitioning an existing table

From
Vick Khera
Date:
On Mon, Apr 25, 2011 at 6:46 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
I could create a new parent table with child tables, and then INSERT
all these millions of rows to put them into the right partition. But
is that recommended?

I did this twice (several years ago).  Basically, you create your partitions and set up the necessary triggers you want (I re-write the app to insert directly into the correct partition).  Then all new data starts going into the partitions.  Next, write a program that loops over the current master table, and moves the data into each partition some small hunk at a time, in a transaction.  This can take a long time.  For us, it took about 7 days to move O(100m) rows.  Then, when you're done, truncate the master table, and enforce that no new data is allowed to be inserted into it.

If you can, of course, try this out on a spare copy of that table.

Re: Partitioning an existing table

From
Raghavendra
Date:
On Mon, Apr 25, 2011 at 7:40 PM, Vick Khera <vivek@khera.org> wrote:
On Mon, Apr 25, 2011 at 6:46 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
I could create a new parent table with child tables, and then INSERT
all these millions of rows to put them into the right partition. But
is that recommended?

I did this twice (several years ago).  Basically, you create your partitions and set up the necessary triggers you want (I re-write the app to insert directly into the correct partition).  Then all new data starts going into the partitions.  Next, write a program that loops over the current master table, and moves the data into each partition some small hunk at a time, in a transaction.  This can take a long time.  For us, it took about 7 days to move O(100m) rows.  Then, when you're done, truncate the master table, and enforce that no new data is allowed to be inserted into it.

If you can, of course, try this out on a spare copy of that table.


COPY command would be the best and fast way to do bulk inserts. Even for partitioning table from the base table, 

1. Send all the data to .csv file with COPY TO command of the base table(which will undergo for partition).
2. Create the partition setup with TRIGGER's
3. Use COPY FROM command for inserting data into partition table. 

This approach will fasten the inserts. Increasing the memory will also help.

Note: RULE based approach wont work with COPY command. 

Best Regards,
Raghavendra
EnterpriseDB Corporation
The Enterprise Postgres Company




Re: Partitioning an existing table

From
Vick Khera
Date:


On Mon, Apr 25, 2011 at 10:53 AM, Raghavendra <raghavendra.rao@enterprisedb.com> wrote:
1. Send all the data to .csv file with COPY TO command of the base table(which will undergo for partition).
2. Create the partition setup with TRIGGER's
3. Use COPY FROM command for inserting data into partition table. 

so now when your running application goes to query the table, it gets doubles?  if you do it in transactions, then how long are you going to cause the master table to be locked when doing such a bulk delete?

my point is to minimize service interruption, and that means moving small hunks at a time to minimize the locks needed.

Re: Partitioning an existing table

From
Raghavendra
Date:
so now when your running application goes to query the table, it gets doubles?  if you do it in transactions, then how long are you going to cause the master table to be locked when doing such a bulk delete?

my point is to minimize service interruption, and that means moving small hunks at a time to minimize the locks needed.


Agreed, if you are pointing to the application..

The partitioning documentation in PG is very clear on how to partition
a new table. Create child tables, and have triggers that manage
INSERT, UPDATE and DELETE commands.
How about doing this with existing massive tables? (Over 120 million rows)
I could create a new parent table with child tables, and then INSERT
all these millions of rows to put them into the right partition. But
is that recommended?

Here, I would go with COPY command rather than INSERT. Firstly, setup the partition/child tables with relevant triggers and calling function on it. Use COPY FROM command pointing to parent table by calling the .csv file(created on MASSIVE table).  Triggers will push the data to the respective child tables. Faster and efficient way.  

Best Regards,
Raghavendra
EnterpriseDB Corporation
The Enterprise Postgres Company

Re: Partitioning an existing table

From
Greg Smith
Date:
On 04/25/2011 10:10 AM, Vick Khera wrote:
> Basically, you create your partitions and set up the necessary
> triggers you want (I re-write the app to insert directly into the
> correct partition).  Then all new data starts going into the
> partitions.  Next, write a program that loops over the current master
> table, and moves the data into each partition some small hunk at a
> time, in a transaction.  This can take a long time.  For us, it took
> about 7 days to move O(100m) rows.  Then, when you're done, truncate
> the master table, and enforce that no new data is allowed to be
> inserted into it.

Vick's presentation at
http://cdn.mailermailer.com/documents/PGCon2008TablePartitioningCaseStudy.pdf
is still one of the best case studies of how to do this sort of
migration around.  I was inspired by several of the ideas there when
doing the partitioning chapter of my book, which is the only place I'm
aware of covering this in even more detail than his case study.

Cedric's idea for how to do this even more aggressively (multiple
workers) is what you want if this is a one-time operation you're taking
the system down for.  In other situations, the gradual migration
strategy Vick is suggesting is more appropriate.

Some of the other ideas suggested in this thread won't work at all, so
be careful who you listen to here.  You can't leave copies of the data
in the parent and put it into the child partition without all sorts of
potential downsides.  And you really, really want to do this as a proper
database transaction, which is easiest to express using INSERT instead
of COPY.  If any step of the migration goes wrong, being able to do
ROLLBACK and undo the recent bad steps is vital.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Partitioning an existing table

From
Phoenix Kiula
Date:
On Tue, Apr 26, 2011 at 8:28 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 04/25/2011 10:10 AM, Vick Khera wrote:
>>
>> Basically, you create your partitions and set up the necessary triggers
>> you want (I re-write the app to insert directly into the correct partition).
>>  Then all new data starts going into the partitions.  Next, write a program
>> that loops over the current master table, and moves the data into each
>> partition some small hunk at a time, in a transaction.  This can take a long
>> time.  For us, it took about 7 days to move O(100m) rows.  Then, when you're
>> done, truncate the master table, and enforce that no new data is allowed to
>> be inserted into it.
>
> Vick's presentation at
> http://cdn.mailermailer.com/documents/PGCon2008TablePartitioningCaseStudy.pdf



Thanks everyone for the excellent suggestions.

Vick/Greg, thanks in particular for this reference. The doc gives me
ideas for other things too!

Re: Partitioning an existing table

From
Vick Khera
Date:
On Thu, Apr 28, 2011 at 12:12 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Vick's presentation at
> http://cdn.mailermailer.com/documents/PGCon2008TablePartitioningCaseStudy.pdf


Thanks everyone for the excellent suggestions.

Vick/Greg, thanks in particular for this reference. The doc gives me
ideas for other things too!

I've been trying to get the OSCON folk to accept this talk for several years now, to reach a wider audience.  Seems they don't like me... :(

Re: Partitioning an existing table

From
Raghavendra
Date:
Thanks everyone for the excellent suggestions.

Vick/Greg, thanks in particular for this reference. The doc gives me
ideas for other things too!


+1
 
I've been trying to get the OSCON folk to accept this talk for several years now, to reach a wider audience.  Seems they don't like me... :(

Seems they missed one valid presentation :)

Best Regards,
Raghavendra
EnterpriseDB Corporation

Re: Partitioning an existing table

From
Greg Smith
Date:
Vick Khera wrote:
> I've been trying to get the OSCON folk to accept this talk for several
> years now, to reach a wider audience.  Seems they don't like me... :(

Too specific for OSCON.  It's not you, it's the topic.  I've tried
submitting far more generic things than that, but still with a
PostgreSQL tilt to them, and I never get those accepted either.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Partitioning an existing table

From
Dimitri Fontaine
Date:
disclaimer : I didn't read the presentation paper Greg Smith talked
about yet, nor his partitioning chapter yet, so it might be about the
same trick.

Phoenix Kiula <phoenix.kiula@gmail.com> writes:
> How about doing this with existing massive tables? (Over 120 million rows)
>
> I could create a new parent table with child tables, and then INSERT
> all these millions of rows to put them into the right partition. But
> is that recommended?

If you're partitioning by date, for example, then what I regularly do is
to consider the existing table to be the first partition with data from
origin to now.

Then what I do is to create a new parent table and is children, prepare
the trigger(s), etc.  The switch is then a light transaction which only
renames the current table to say "name_past_201104", have it inherits
the parent table, and finally rename the new parent table to the "name".

Later on you still can rejigger your data around if you wish.  With time
based partitioning it's best to wait until the old partition is not the
target of INSERTs or UPDATEs any more.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support