Re: [GENERAL] efficiently migrating 'old' data from one table to another - Mailing list pgsql-general

From Merlin Moncure
Subject Re: [GENERAL] efficiently migrating 'old' data from one table to another
Date
Msg-id CAHyXU0y5nY6Zp-mYANNbyOa0L=9HMVBbrhJaLq5HHcNeTenAEA@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] efficiently migrating 'old' data from one table to another  ("" <kbrannen@pwhome.com>)
List pgsql-general
On Fri, Jan 13, 2017 at 12:03 PM,  <kbrannen@pwhome.com> wrote:
> On Jan 12, 2017, Jonathan Vanasco <postgres@2xlp.com> wrote:
>>On Jan 12, 2017, at 5:52 PM, Merlin Moncure wrote:
>>
>>> On Thu, Jan 12, 2017 at 2:19 PM, btober@computer.org
>>> <btober@broadstripe.net> wrote:
>>>>
>>>> Review manual section 7.8.2. Data-Modifying Statements in WITH
>>>>
>>>>
>>>> https://www.postgresql.org/docs/9.6/static/queries-with.html
>>>
>>> this.
>>>
>>> with data as (delete from foo where ... returning * ) insert into
>>> foo_backup select * from data;
>>
>>Thanks, btober and merlin.  that's exactly what i want.
>
> To help you a little more, I just did this for a set of tables within the
> last week. :) The heart of the program is this sql:
>
>     my $Chunk_size = 10000;
>     my $Interval = 24;
>     my $sql = "
>         WITH
>             keys AS (
>                 SELECT $pk_column
>                 FROM $table
>                 WHERE $time_column < NOW() - '$Interval MONTHS'::INTERVAL
>                 ORDER BY $pk_column
>                 LIMIT $Chunk_size ),
>             data AS (
>                 DELETE FROM $table
>                 WHERE $pk_column <= (SELECT MAX($pk_column) FROM keys)
>                 RETURNING * )
>         INSERT INTO archive_$table SELECT * FROM data;";
>
> That's from Perl, but I suspect you can guess as to what each var should be for
> your application. You can set $Chunk_size to whatever you want. There is
> obviously a loop around that which executes until we get 0 rows, then we move
> on to the next table.
>
> The point of the chunks was to limit the impact on the production tables
> as we move data out of them. If you don't have that concern and want to do all
> rows at once then remove the LIMIT and ORDER BY.

FYI, although it's likely ok in this instance, directly inserting
table names without precaution is considered dubious and should be
avoided as practice.  SQL injection is a risk, and your code will fail
in the presence of unusual bug legal table names containing spaces.

For posterity handling this kind of action inside the database (via
plpgsql/EXECUTE) in order to leverage some internal routines,
especially quote_ident(), is generally a good idea.

merlin


pgsql-general by date:

Previous
From: "Peter J. Holzer"
Date:
Subject: Re: [GENERAL] Are new connection/security features in order, givenconnection pooling?
Next
From: "Christopher J. Bottaro"
Date:
Subject: [GENERAL] Streaming replication protocol