Thread: Partitioning an existing table
Hi.

The partitioning documentation in PG is very clear on how to partition a new table: create child tables, and have triggers that manage INSERT, UPDATE and DELETE commands.

How about doing this with existing massive tables? (Over 120 million rows)

I could create a new parent table with child tables, and then INSERT all these millions of rows to put them into the right partition. But is that recommended?

Thanks
2011/4/25 Phoenix Kiula <phoenix.kiula@gmail.com>:
> Hi.
>
> The partitioning documentation in PG is very clear on how to partition
> a new table. Create child tables, and have triggers that manage
> INSERT, UPDATE and DELETE commands.
>
> How about doing this with existing massive tables? (Over 120 million rows)
>
> I could create a new parent table with child tables, and then INSERT
> all these millions of rows to put them into the right partition. But
> is that recommended?

It will be faster to INSERT directly into the correct child tables, and you can run multiple INSERTs in parallel (depending on your I/O system and the number of tables).

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
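[A minimal sketch of that parallel approach; the big_table names and the created_at partition key are hypothetical, and each statement would run in its own session:]

    -- session 1: fill the 2010 child directly from the old table
    INSERT INTO big_table_2010
    SELECT * FROM big_table_old
    WHERE created_at >= '2010-01-01' AND created_at < '2011-01-01';

    -- session 2, concurrently: fill the 2011 child
    INSERT INTO big_table_2011
    SELECT * FROM big_table_old
    WHERE created_at >= '2011-01-01' AND created_at < '2012-01-01';

Because each session writes to a different child table, the loads don't block each other; how many sessions are worth running depends on the I/O system, as noted above.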
INSERT INTO child SELECT * FROM parent WHERE ...;

Once you have fully allocated all the records from the parent table you can then:

TRUNCATE parent;

Then install the triggers.

The only real downside is that the queries are going to take forever to run. If you can naturally sub-divide the child tables (say into timestamp ranges) you might try moving one month at a time, just to keep your eye on query progress. Make use of functions to perform the actual INSERT INTO ... SELECT, and probably have them capture and log statistics for each "INSERT INTO ... SELECT" statement that is fired (data start time, data end time, target child, number of records, execution start/end times).

David J.
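[A sketch of the kind of helper function described here, moving one month into a child and logging what happened. All names — parent, created_at, partition_move_log, move_month — are hypothetical:]

    CREATE TABLE partition_move_log (
        target_child text,
        range_start  timestamptz,
        range_end    timestamptz,
        rows_moved   bigint,
        started_at   timestamptz,
        finished_at  timestamptz
    );

    CREATE OR REPLACE FUNCTION move_month(child text,
                                          from_ts timestamptz,
                                          to_ts   timestamptz)
    RETURNS bigint AS $$
    DECLARE
        moved bigint;
        t0    timestamptz := clock_timestamp();
    BEGIN
        -- Copy one month's rows from the parent's own heap into the child.
        EXECUTE 'INSERT INTO ' || quote_ident(child)
             || ' SELECT * FROM ONLY parent'
             || ' WHERE created_at >= $1 AND created_at < $2'
        USING from_ts, to_ts;
        GET DIAGNOSTICS moved = ROW_COUNT;

        -- Record the statistics David suggests capturing.
        INSERT INTO partition_move_log
        VALUES (child, from_ts, to_ts, moved, t0, clock_timestamp());
        RETURN moved;
    END;
    $$ LANGUAGE plpgsql;

    -- Example call, one month at a time:
    -- SELECT move_month('parent_y2011m03', '2011-03-01', '2011-04-01');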
On Mon, Apr 25, 2011 at 6:46 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> I could create a new parent table with child tables, and then INSERT
> all these millions of rows to put them into the right partition. But
> is that recommended?
I did this twice (several years ago). Basically, you create your partitions and set up the necessary triggers you want (I re-wrote the app to insert directly into the correct partition). Then all new data starts going into the partitions.

Next, write a program that loops over the current master table and moves the data into each partition some small hunk at a time, each hunk in a transaction. This can take a long time. For us, it took about 7 days to move O(100m) rows.

Then, when you're done, truncate the master table, and enforce that no new data is allowed to be inserted into it.
If you can, of course, try this out on a spare copy of that table.
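[A minimal sketch of one such hunk as a single transaction; the master table name, the id key, and the hunk bounds are hypothetical, and the row-routing INSERT trigger is assumed to already be installed on master:]

    BEGIN;

    -- The trigger on master redirects each inserted row into its child,
    -- so nothing lands back in the master's own heap.
    INSERT INTO master
    SELECT * FROM ONLY master
    WHERE id BETWEEN 1 AND 10000;

    -- Remove the originals from the master's heap only.
    DELETE FROM ONLY master
    WHERE id BETWEEN 1 AND 10000;

    COMMIT;

The mover program then advances the id window and repeats until SELECT count(*) FROM ONLY master reaches zero, at which point the master can be truncated as described above.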
On Mon, Apr 25, 2011 at 7:40 PM, Vick Khera <vivek@khera.org> wrote:
> On Mon, Apr 25, 2011 at 6:46 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> I could create a new parent table with child tables, and then INSERT
>> all these millions of rows to put them into the right partition. But
>> is that recommended?
>
> I did this twice (several years ago). Basically, you create your
> partitions and set up the necessary triggers you want (I re-wrote the
> app to insert directly into the correct partition). Then all new data
> starts going into the partitions. Next, write a program that loops over
> the current master table and moves the data into each partition some
> small hunk at a time, in a transaction. This can take a long time. For
> us, it took about 7 days to move O(100m) rows. Then, when you're done,
> truncate the master table, and enforce that no new data is allowed to
> be inserted into it.
>
> If you can, of course, try this out on a spare copy of that table.
The COPY command would be the best and fastest way to do bulk inserts, even for partitioning an existing base table:

1. Dump all the data to a .csv file with the COPY TO command on the base table (the one which will undergo partitioning).
2. Create the partition setup with triggers.
3. Use the COPY FROM command to load the data into the partitioned table (a sketch follows below).

This approach will speed up the inserts. Increasing the memory settings will also help.

Note: a RULE-based approach won't work with the COPY command.
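[A minimal sketch of those three steps; the file path and table names are hypothetical, and the server-side COPY paths require superuser rights:]

    -- 1. Dump the existing table out to a flat file.
    COPY big_table TO '/var/tmp/big_table.csv' WITH CSV;

    -- 2. Create the new parent, its children, and the row-routing
    --    INSERT trigger (a sketch of such a trigger appears later
    --    in this thread).

    -- 3. Reload through the new parent: COPY fires the INSERT trigger
    --    for each row, which redirects it to the right child.
    COPY big_table_new FROM '/var/tmp/big_table.csv' WITH CSV;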
Best Regards,
Raghavendra
EnterpriseDB Corporation
The Enterprise Postgres Company
On Mon, Apr 25, 2011 at 10:53 AM, Raghavendra <raghavendra.rao@enterprisedb.com> wrote:
> 1. Dump all the data to a .csv file with the COPY TO command on the base table (the one which will undergo partitioning).
> 2. Create the partition setup with triggers.
> 3. Use the COPY FROM command to load the data into the partitioned table.
So now, when your running application goes to query the table, it gets duplicates? And if you do it in transactions, how long is the master table going to be locked while doing such a bulk delete?

My point is to minimize service interruption, and that means moving small hunks at a time to minimize the locks needed.
> So now, when your running application goes to query the table, it gets
> duplicates? And if you do it in transactions, how long is the master
> table going to be locked while doing such a bulk delete?
>
> My point is to minimize service interruption, and that means moving
> small hunks at a time to minimize the locks needed.
Agreed, if you are referring to the application.
> The partitioning documentation in PG is very clear on how to partition
> a new table. Create child tables, and have triggers that manage
> INSERT, UPDATE and DELETE commands.
>
> How about doing this with existing massive tables? (Over 120 million rows)
>
> I could create a new parent table with child tables, and then INSERT
> all these millions of rows to put them into the right partition. But
> is that recommended?
Here, I would go with the COPY command rather than INSERT. First, set up the partition/child tables with the relevant triggers and the function they call. Then use the COPY FROM command, pointing at the parent table, to load the .csv file (created from the MASSIVE table). The triggers will push the data to the respective child tables. A faster and more efficient way.
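[A minimal sketch of such a routing trigger; the big_table names and the created_at partition key are hypothetical:]

    CREATE OR REPLACE FUNCTION big_table_insert_router()
    RETURNS trigger AS $$
    BEGIN
        -- Send each row to the child matching its date range.
        IF NEW.created_at < DATE '2011-01-01' THEN
            INSERT INTO big_table_2010 VALUES (NEW.*);
        ELSE
            INSERT INTO big_table_2011 VALUES (NEW.*);
        END IF;
        RETURN NULL;   -- don't also store the row in the parent itself
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER big_table_insert
        BEFORE INSERT ON big_table
        FOR EACH ROW EXECUTE PROCEDURE big_table_insert_router();

This is also why the earlier note about rules matters: COPY FROM fires INSERT triggers but never INSERT rules, so a trigger-based setup works with COPY while a RULE-based one does not.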
Best Regards,
Raghavendra
EnterpriseDB Corporation
The Enterprise Postgres Company
On 04/25/2011 10:10 AM, Vick Khera wrote:
> Basically, you create your partitions and set up the necessary
> triggers you want (I re-wrote the app to insert directly into the
> correct partition). Then all new data starts going into the
> partitions. Next, write a program that loops over the current master
> table, and moves the data into each partition some small hunk at a
> time, in a transaction. This can take a long time. For us, it took
> about 7 days to move O(100m) rows. Then, when you're done, truncate
> the master table, and enforce that no new data is allowed to be
> inserted into it.

Vick's presentation at http://cdn.mailermailer.com/documents/PGCon2008TablePartitioningCaseStudy.pdf is still one of the best case studies around of how to do this sort of migration. I was inspired by several of the ideas there when doing the partitioning chapter of my book, which is the only place I'm aware of that covers this in even more detail than his case study.

Cédric's idea for how to do this even more aggressively (multiple workers) is what you want if this is a one-time operation you're taking the system down for. In other situations, the gradual migration strategy Vick is suggesting is more appropriate.

Some of the other ideas suggested in this thread won't work at all, so be careful who you listen to here. You can't leave copies of the data in the parent and put it into the child partition without all sorts of potential downsides. And you really, really want to do this as a proper database transaction, which is easiest to express using INSERT instead of COPY. If any step of the migration goes wrong, being able to do ROLLBACK and undo the recent bad steps is vital.

--
Greg Smith
2ndQuadrant US
greg@2ndQuadrant.com
Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Tue, Apr 26, 2011 at 8:28 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 04/25/2011 10:10 AM, Vick Khera wrote:
>> Basically, you create your partitions and set up the necessary triggers
>> you want (I re-wrote the app to insert directly into the correct
>> partition). Then all new data starts going into the partitions. Next,
>> write a program that loops over the current master table, and moves the
>> data into each partition some small hunk at a time, in a transaction.
>> This can take a long time. For us, it took about 7 days to move O(100m)
>> rows. Then, when you're done, truncate the master table, and enforce
>> that no new data is allowed to be inserted into it.
>
> Vick's presentation at
> http://cdn.mailermailer.com/documents/PGCon2008TablePartitioningCaseStudy.pdf

Thanks everyone for the excellent suggestions.

Vick/Greg, thanks in particular for this reference. The doc gives me ideas for other things too!
On Thu, Apr 28, 2011 at 12:12 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> Vick's presentation at
>> http://cdn.mailermailer.com/documents/PGCon2008TablePartitioningCaseStudy.pdf
>
> Thanks everyone for the excellent suggestions.
>
> Vick/Greg, thanks in particular for this reference. The doc gives me
> ideas for other things too!
I've been trying to get the OSCON folk to accept this talk for several years now, to reach a wider audience. Seems they don't like me... :(
> Thanks everyone for the excellent suggestions.
>
> Vick/Greg, thanks in particular for this reference. The doc gives me
> ideas for other things too!
+1
> I've been trying to get the OSCON folk to accept this talk for several
> years now, to reach a wider audience. Seems they don't like me... :(
Seems they missed one valid presentation :)
Best Regards,
Raghavendra
Vick Khera wrote:
> I've been trying to get the OSCON folk to accept this talk for several
> years now, to reach a wider audience. Seems they don't like me... :(

Too specific for OSCON. It's not you, it's the topic. I've tried submitting far more generic things than that, but still with a PostgreSQL tilt to them, and I never get those accepted either.

--
Greg Smith
2ndQuadrant US
greg@2ndQuadrant.com
Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Disclaimer: I haven't yet read the presentation paper Greg Smith talked about, nor his partitioning chapter, so this might be the same trick.

Phoenix Kiula <phoenix.kiula@gmail.com> writes:
> How about doing this with existing massive tables? (Over 120 million rows)
>
> I could create a new parent table with child tables, and then INSERT
> all these millions of rows to put them into the right partition. But
> is that recommended?

If you're partitioning by date, for example, then what I regularly do is consider the existing table to be the first partition, holding data from the origin to now. I create a new parent table and its children, prepare the trigger(s), etc. The switch is then a light transaction which merely renames the current table to, say, "name_past_201104", makes it inherit the new parent table, and finally renames the new parent table to "name".

Later on you can still rejigger your data around if you wish. With time-based partitioning it's best to wait until the old partition is no longer the target of INSERTs or UPDATEs.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
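[A minimal sketch of that switch; "name" and the new parent's temporary name "name_new" are hypothetical, and the new parent with its children, matching columns, and routing trigger is assumed to exist already:]

    BEGIN;

    -- The existing 120M-row table becomes the first, historical partition.
    ALTER TABLE name RENAME TO name_past_201104;
    ALTER TABLE name_past_201104 INHERIT name_new;

    -- The empty, trigger-equipped parent takes over the old name.
    ALTER TABLE name_new RENAME TO name;

    COMMIT;

Each ALTER here is a catalog-only change, so although they take exclusive locks, no rows are rewritten and the switch is brief; that is what makes it a "light transaction".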