Re: best practice for moving millions of rows to child table when setting up partitioning? - Mailing list pgsql-admin

From Bob Lunney
Subject Re: best practice for moving millions of rows to child table when setting up partitioning?
Date
Msg-id 659395.83416.qm@web39708.mail.mud.yahoo.com
Whole thread Raw
In response to best practice for moving millions of rows to child table when setting up partitioning?  (Mark Stosberg <mark@summersault.com>)
Responses Re: best practice for moving millions of rows to child table when setting up partitioning?
List pgsql-admin
Mark,

Comments inline below for items 2 and 3.  What you are planning should work like a charm.

Bob Lunney

--- On Wed, 4/27/11, Mark Stosberg <mark@summersault.com> wrote:

> From: Mark Stosberg <mark@summersault.com>
> Subject: [ADMIN] best practice for moving millions of rows to child table when setting up partitioning?
> To: pgsql-admin@postgresql.org
> Date: Wednesday, April 27, 2011, 10:48 AM
>
> Hello,
>
> I'm working on moving a table with over 30 million to rows
> to be
> partitioned. The table seeing several inserts per second.
> It's
> essentially an activity log that only sees insert activity
> and is
> lightly used for reporting, such that queries against it
> can safely be
> disabled during a transition.
>
> I'm looking for recommendations for a way to do this that
> will be least
> disruptive to the flow of inserts statements that will
> continue to
> stream in.
>
> Here's the plan which seems best to me at the moment. Is it
> is
> reasonable?
>
> 1. Handling creating the empty/future partitions is easy. I
> have the
> code for this done already, and will make several
> partitions in advance
> of needing them.
>
> 2. To create the partitions that should have data moved
> from the parent,
> I'm thinking of creating them, and then before they are
> "live",
> using INSERT ... SELECT to fill them with data from the
> parent table.
> I'll run the INSERT first, and then add their indexes.
>

Use "create table as select ..." (CTAS) instead of creating the table, then inserting.  Since the table is created and
populatedatomically there is no need to log the inserts in WAL, and the operation is much faster. 

> 3. I will then install the trigger to redirect the inserts
> to the child
> table.
>

If possible, its better to have the code simply do inserts directly into the child table - after all, if the
partitioningis based on date, both the code and database know the date, so the code knows to which child table it
shouldwrite at any given moment. 

> 4. There will still be a relatively small number of new
> rows from the
> parent table to be deal with that came in after the INSERT
> from #2 was
> started, so a final INSERT .. SELECT statement will be made
> to copy the
> remaining rows.
>
> 5. Finally, I'll drop the indexes on the parent table and
> truncate it.
>
> Thanks for advice here. If there's a tutorial out there
> about this that
> I've missed, I'm happy to review it instead having it
> rehashed here.
>
> Thanks for the help!
>
>     Mark
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>

pgsql-admin by date:

Previous
From: Mark Stosberg
Date:
Subject: Re: best practice for moving millions of rows to child table when setting up partitioning?
Next
From: "Fred Parkinson"
Date:
Subject: Re: Starting psql without a database?