Thread: continuous copy/update one table to another

continuous copy/update one table to another

From
Terry
Date:
Hello,

I am looking for a way to copy all the data from one table to another
on a regular basis, every 5 minutes let's say.

INSERT INTO table2 SELECT * FROM table1;

The above will copy all the data as is and insert it into the other
table.  What happens if I run it again?  Will it just append table1
to table2 again?  How can I have it only insert rows that are
different?  Would that be a program to look up the last record in
table1 and then a query to select only the rows after it for the
insert into table2?

Thanks!

Re: continuous copy/update one table to another

From
John R Pierce
Date:
Terry wrote:
> Hello,
>
> I am looking for a way to copy all the data from one table to another
> on a regular basis, every 5 minutes let's say.
>
> INSERT INTO table2 SELECT * FROM table1;
>
> The above will copy all the data as is and insert it into the other
> table.  What happens if I run it again?  Will it just append table1
> to table2 again?  How can I have it only insert rows that are
> different?  Would that be a program to look up the last record in
> table1 and then a query to select only the rows after it for the
> insert into table2?
>
>

both tables should have a serial 'id' column (or bigserial if you
expect over 2 billion entries); then use something like ...


    insert into table2
      select * from table1 as t1
      where t1.id > (select max(t.id) from table2 as t);


i haven't tested this, but I think it should work, as long as id is
indexed in both tables.
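
one caveat: if table2 starts out empty, max(t.id) is NULL, the
comparison is never true, and the first run copies nothing.  wrapping
the subquery in coalesce() guards against that (again, untested):

    insert into table2
      select * from table1 as t1
      -- the 0 assumes the serial ids start above zero
      where t1.id > coalesce((select max(t.id) from table2 as t), 0);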



Re: continuous copy/update one table to another

From
David W Noon
Date:
On Sun, 28 Feb 2010 15:56:41 -0600, Terry wrote about [GENERAL]
continuous copy/update one table to another:

>Hello,
>
>I am looking for a way to copy all the data from one table to another
>on a regular basis, every 5 minutes let's say.
>
>INSERT INTO table2 SELECT * FROM table1;
>
>The above will copy all the data as is and insert it into the other
>table.  What happens if I run it again?  Will it just append table1
>to table2 again?

You will get key duplication errors.

>How can I have it only insert rows that are different?

Use an EXISTS predicate:

INSERT INTO table2 SELECT * FROM table1 AS t1
  WHERE NOT EXISTS (SELECT * FROM table2 WHERE table2.key = t1.key);

You will need to compare all fields in the key if it is a multi-column
key.
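For example, with a hypothetical two-column key (key1, key2):

INSERT INTO table2 SELECT * FROM table1 AS t1
  WHERE NOT EXISTS (SELECT * FROM table2
                    WHERE table2.key1 = t1.key1
                      AND table2.key2 = t1.key2);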
--
Regards,

Dave  [RLU #314465]
=======================================================================
david.w.noon@ntlworld.com (David W Noon)
=======================================================================

Re: continuous copy/update one table to another

From
Szymon Guz
Date:
2010/2/28 John R Pierce <pierce@hogranch.com>
> Terry wrote:
>> Hello,
>>
>> I am looking for a way to copy all the data from one table to another
>> on a regular basis, every 5 minutes let's say.
>>
>> INSERT INTO table2 SELECT * FROM table1;
>>
>> The above will copy all the data as is and insert it into the other
>> table.  What happens if I run it again?  Will it just append table1
>> to table2 again?  How can I have it only insert rows that are
>> different?  Would that be a program to look up the last record in
>> table1 and then a query to select only the rows after it for the
>> insert into table2?
>
> both tables should have a serial 'id' column (or bigserial if you
> expect over 2 billion entries); then use something like ...
>
>     insert into table2
>       select * from table1 as t1
>       where t1.id > (select max(t.id) from table2 as t);
>
> i haven't tested this, but I think it should work, as long as id is
> indexed in both tables.

Different doesn't mean that the id is greater or lower; the row itself should be different. I'd rather do something like:

insert into table2
  select * from table1 as t1
  where not exists (select 42 from table2 as t2 where t2.id = t1.id);

Of course this assumes that the primary key is id.

Another problem is that this wouldn't copy changed records (that should rather be done with triggers).
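
For example something like this (just a sketch, not tested; it assumes
table2 has exactly the same columns as table1 and that id is the
primary key in both):

create or replace function mirror_table1() returns trigger as $$
begin
  -- poor man's upsert: drop any old copy of the row, then re-insert it
  delete from table2 where id = new.id;
  insert into table2 select new.*;
  return new;
end;
$$ language plpgsql;

create trigger table1_mirror
after insert or update on table1
for each row execute procedure mirror_table1();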

Szymon Guz

Re: continuous copy/update one table to another

From
John R Pierce
Date:
Szymon Guz wrote:
> Different doesn't mean that the id is greater or lower; the row itself
> should be different. I'd rather do something like:

indeed, my code assumed that records are only INSERT'd into table1 and
never UPDATE'd or DELETE'd.  my statement -did- have the advantage of
being fast, at least assuming id is indexed in both tables.  if you do
update records, you could use a separate SERIAL/BIGSERIAL field that
gets a new value on every UPDATE, and drive the copy from that field,
but you'd need an UPSERT kind of function to handle the duplicate
primary keys.
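
something along the lines of the UPDATE-then-INSERT loop in the
PL/pgSQL docs would do for the UPSERT -- a rough sketch, untested,
assuming a hypothetical destination table2(id integer primary key,
data text):

create or replace function upsert_table2(k integer, d text)
returns void as $$
begin
    loop
        -- try to update an existing row first
        update table2 set data = d where id = k;
        if found then
            return;
        end if;
        -- no row yet: try to insert, and fall back to the update
        -- if another transaction inserted the same key first
        begin
            insert into table2 (id, data) values (k, d);
            return;
        exception when unique_violation then
            -- loop back and retry the update
        end;
    end loop;
end;
$$ language plpgsql;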

checking for deletions will be more difficult and, more importantly,
more time consuming, as it will likely require multiple full table
scans of both tables.



Re: continuous copy/update one table to another

From
Terry
Date:
On Sun, Feb 28, 2010 at 6:29 PM, John R Pierce <pierce@hogranch.com> wrote:
> Szymon Guz wrote:
>>
>> Different doesn't mean that the id is greater or lower; the row itself
>> should be different. I'd rather do something like:
>
> indeed, my code assumed that records are only INSERT'd into table1 and
> never UPDATE'd or DELETE'd.  my statement -did- have the advantage of
> being fast, at least assuming id is indexed in both tables.  if you do
> update records, you could use a separate SERIAL/BIGSERIAL field that
> gets a new value on every UPDATE, and drive the copy from that field,
> but you'd need an UPSERT kind of function to handle the duplicate
> primary keys.
>
> checking for deletions will be more difficult and, more importantly,
> more time consuming, as it will likely require multiple full table
> scans of both tables.

Good stuff.  My rows won't change.  I wrote a script to do all of the
logic in Perl, but it clearly won't be as efficient as what you guys
did in one line.  :)

Re: continuous copy/update one table to another

From
Terry
Date:

One more question.  This is a pretty decent sized table.  It is
estimated to be 19,038,200 rows.  That said, should I see results
immediately pouring into the destination table while this is running?

Re: continuous copy/update one table to another

From
John R Pierce
Date:
Terry wrote:
> One more question.  This is a pretty decent sized table.  It is
> estimated to be 19,038,200 rows.  That said, should I see results
> immediately pouring into the destination table while this is running?
>

SQL transactions are atomic.   you won't see anything in the 'new' table
until the INSERT finishes committing, then you'll see it all at once.

you will see a fair amount of disk write activity while it's running.
20M rows will take a while to run the first time, and will probably use
a fair amount of memory.









Re: continuous copy/update one table to another

From
Terry
Date:
On Sun, Feb 28, 2010 at 7:12 PM, John R Pierce <pierce@hogranch.com> wrote:
> Terry wrote:
>>
>> One more question.  This is a pretty decent sized table.  It is
>> estimated to be 19,038,200 rows.  That said, should I see results
>> immediately pouring into the destination table while this is running?
>>
>
> SQL transactions are atomic.   you won't see anything in the 'new' table
> until the INSERT finishes committing, then you'll see it all at once.
>
> you will see a fair amount of disk write activity while it's running.
> 20M rows will take a while to run the first time, and will probably use
> a fair amount of memory.

This is working very well.  The initial load worked great.  Took a
little while but fine after that.  I am using this:
INSERT INTO client_logs SELECT * FROM clients_event_log as t1 where
t1.ev_id > (select max(t.ev_id) from client_logs as t);

However, I got lost in this little problem and overlooked another.  I
need to convert the unix time in the ev_time column to a timestamp.  I
have the idea with this little bit but not sure how to integrate it
nicely:
select timestamptz 'epoch' + 1267417261 * interval '1 second'

Re: continuous copy/update one table to another

From
Terry
Date:
On Sun, Feb 28, 2010 at 10:23 PM, Terry <td3201@gmail.com> wrote:
> This is working very well.  The initial load worked great.  Took a
> little while but fine after that.  I am using this:
> INSERT INTO client_logs SELECT * FROM clients_event_log as t1 where
> t1.ev_id > (select max(t.ev_id) from client_logs as t);
>
> However, I got lost in this little problem and overlooked another.  I
> need to convert the unix time in the ev_time column to a timestamp.  I
> have the idea with this little bit but not sure how to integrate it
> nicely:
> select timestamptz 'epoch' + 1267417261 * interval '1 second'
>

I love overcomplicating things:
SELECT *, to_timestamp(ev_time) FROM clients_event_log AS t1
WHERE t1.ev_id > (SELECT max(t.ev_id) FROM client_logs AS t);
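
For the actual INSERT the column list needs to be explicit, so the
converted value replaces ev_time instead of being appended as an extra
column.  A sketch, untested, with ev_msg standing in for whatever other
columns the table really has:

INSERT INTO client_logs (ev_id, ev_time, ev_msg)
SELECT t1.ev_id,
       to_timestamp(t1.ev_time),  -- unix epoch -> timestamptz
       t1.ev_msg
FROM clients_event_log AS t1
WHERE t1.ev_id > (SELECT max(t.ev_id) FROM client_logs AS t);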

Re: continuous copy/update one table to another

From
Grzegorz Jaśkiewicz
Date:
don't use 'NOT EXISTS', as this will be damn slow. Use LEFT JOIN.

Re: continuous copy/update one table to another

From
Szymon Guz
Date:

On 1 March 2010 09:40, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> don't use 'NOT EXISTS', as this will be damn slow. Use LEFT JOIN.


Right, LEFT JOIN should be faster (with proper indices even much faster).
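
Something like this should be equivalent (again assuming id is the
primary key):

insert into table2
  select t1.*
  from table1 as t1
  left join table2 as t2 on t2.id = t1.id
  where t2.id is null;   -- keep only rows with no match in table2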

Re: continuous copy/update one table to another

From
Adrian von Bidder
Date:
Hi Terry,

On Sunday 28 February 2010 22:56:41 Terry wrote:
> I am looking for a way to copy all the data from one table to another
> on a regular basis, every 5 minutes let's say.
>
> INSERT INTO table2 SELECT * FROM table1;

Why do you want this?  Is it necessary for the data in table2 to appear
only after a delay?

I usually try to avoid scheduled actions and instead use a "push" or
"pull" model, where data is replicated or generated when it is needed
or when the previous step produces it.  So to replicate (part of the)
data into a 2nd table, I'd usually go for triggers, or try to avoid
the data duplication altogether.

(But obviously I don't know what you're doing; this is just my opinion,
based on a feeling that the underlying problem you're solving might be
solved in a more elegant way.)

cheers
-- vbi

--
Pzat!


Re: continuous copy/update one table to another

From
Tom Lane
Date:
Szymon Guz <mabewlun@gmail.com> writes:
> On 1 March 2010 09:40, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
>> don't use 'NOT EXISTS', as this will be damn slow. Use LEFT JOIN.

> Right, LEFT JOIN should be faster (with proper indices even much faster).

Converting NOT EXISTS into an outer join is a manual application of an
optimization that Postgres can do for itself in 8.4 and up.  So the
above advice should only be correct for 8.3 or before.

            regards, tom lane