Thread: continuous copy/update one table to another
Hello,

I am looking for a way to copy all the data from one table to another on a regular basis, every 5 minutes let's say.

INSERT INTO table2 SELECT * FROM table1;

The above will copy all the data as is and insert it into the other table. What happens if I rerun it? Will it just append table1 again into table2? How can I have it only insert rows that are different? Would that be a program to look up the last record in table1 and then a query to only select after that row for the insert into table2?

Thanks!
Terry wrote:
> Hello,
>
> I am looking for a way to copy all the data from one table to another
> on a regular basis, every 5 minutes let's say.
>
> INSERT INTO table2 SELECT * FROM table1;
>
> The above will copy all the data as is and insert it into the other
> table. What happens if I rerun it? Will it just append table1
> again into table2? How can I have it only insert rows that are
> different? Would that be a program to look up the last record in
> table1 and then a query to only select after that row for the insert
> into table2?

Both tables should have a serial 'id' (or bigserial if you expect over 2 billion entries), and use something like ...

insert into table2 select * from table1 as t1 where t1.id > (select max(t.id) from table2 as t);

I haven't tested this but I think it should work, as long as id is indexed in both tables.
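A minimal, untested sketch of the setup this approach assumes (the 'payload' column is made up for illustration):

CREATE TABLE table1 (
    id      bigserial PRIMARY KEY,   -- monotonically increasing key
    payload text
);

CREATE TABLE table2 (
    id      bigint PRIMARY KEY,      -- same values as table1.id, indexed via the PK
    payload text
);

-- copy only rows newer than what table2 already has
INSERT INTO table2
SELECT *
FROM table1 AS t1
WHERE t1.id > (SELECT max(t.id) FROM table2 AS t);
-- note: if table2 starts out empty, max() is NULL and nothing is copied;
-- wrapping it as coalesce(max(t.id), 0) avoids that.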
On Sun, 28 Feb 2010 15:56:41 -0600, Terry wrote about [GENERAL] continuous copy/update one table to another:

> Hello,
>
> I am looking for a way to copy all the data from one table to another
> on a regular basis, every 5 minutes let's say.
>
> INSERT INTO table2 SELECT * FROM table1;
>
> The above will copy all the data as is and insert it into the other
> table. What happens if I rerun it again? Will it just append table1
> again into table2?

You will get key duplication errors.

> How can I have it only insert rows that are different?

Use an EXISTS predicate:

INSERT INTO table2
SELECT * FROM table1 AS t1
WHERE NOT EXISTS (SELECT * FROM table2 WHERE table2.key = t1.key);

You will need to compare all fields in the key if it is a multi-column key.

--
Regards,
Dave  [RLU #314465]
=======================================================================
david.w.noon@ntlworld.com (David W Noon)
=======================================================================
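For the multi-column case, a rough untested sketch (key1 and key2 are made-up column names standing in for the real key columns):

INSERT INTO table2
SELECT *
FROM table1 AS t1
WHERE NOT EXISTS (
    SELECT *
    FROM table2 AS t2
    WHERE t2.key1 = t1.key1
      AND t2.key2 = t1.key2
);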
2010/2/28 John R Pierce <pierce@hogranch.com>:

> Terry wrote:
>> Hello,
>>
>> I am looking for a way to copy all the data from one table to another
>> on a regular basis, every 5 minutes let's say.
>>
>> INSERT INTO table2 SELECT * FROM table1;
>>
>> The above will copy all the data as is and insert it into the other
>> table. What happens if I rerun it again? Will it just append table1
>> again into table2? How can I have it only insert rows that are
>> different? Would that be a program to look up the last record in
>> table1 and then a query to only select after that row for the insert
>> into table2?
>
> Both tables should have a serial 'id' (or bigserial if you expect over 2 billion entries), and use something like ...
>
> insert into table2 select * from table1 as t1 where t1.id > (select max(t.id) from table2 as t);
>
> I haven't tested this but I think it should work, as long as id is indexed in both tables.
Different doesn't mean that the id should be greater or lower; it just means the row should be different. I'd rather do something like:

insert into table2 select * from table1 as t1 where not exists (select 42 from table2 as t2 where t2.id = t1.id);

of course assuming that the primary key is id.

Another problem is that it wouldn't copy changed records (this should rather be done using some triggers).
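A rough, untested sketch of that trigger approach, assuming id is the primary key and a made-up payload column:

CREATE OR REPLACE FUNCTION copy_to_table2() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO table2 (id, payload) VALUES (NEW.id, NEW.payload);
    ELSIF TG_OP = 'UPDATE' THEN
        UPDATE table2 SET payload = NEW.payload WHERE id = NEW.id;
    ELSIF TG_OP = 'DELETE' THEN
        DELETE FROM table2 WHERE id = OLD.id;
        RETURN OLD;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table1_copy
AFTER INSERT OR UPDATE OR DELETE ON table1
FOR EACH ROW EXECUTE PROCEDURE copy_to_table2();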
Szymon Guz
Szymon Guz wrote:
> Different doesn't mean that the id should be greater or lower; it
> just means the row should be different. I'd rather do something like:

Indeed, my code assumed that records were only INSERT'd into table1 and never UPDATE'd or DELETE'd. My statement -did- have the advantage of being fast, at least assuming the id is an index on both tables. If you do update records, you could use a separate SERIAL/BIGSERIAL field for this, which you update on your INSERTs, and use this bigserial for your inserts, but you'd need an UPSERT kind of function to handle duplicate primary keys.

Checking for deletions will be more difficult and, more importantly, more time consuming, as it will likely require multiple full table scans of both tables.
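For reference, on PostgreSQL 9.5 and later that kind of UPSERT can be written with INSERT ... ON CONFLICT; a rough sketch, again assuming id is the primary key and a made-up payload column (on the 8.x versions current when this thread was written, you'd need a plpgsql function instead):

INSERT INTO table2 (id, payload)
SELECT id, payload FROM table1
ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;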
On Sun, Feb 28, 2010 at 6:29 PM, John R Pierce <pierce@hogranch.com> wrote:
> Indeed, my code assumed that records were only INSERT'd into table1 and
> never UPDATE'd or DELETE'd. My statement -did- have the advantage of being
> fast, at least assuming the id is an index on both tables. If you do
> update records, you could use a separate SERIAL/BIGSERIAL field for this,
> which you update on your INSERTs, and use this bigserial for your inserts,
> but you'd need an UPSERT kind of function to handle duplicate primary keys.
>
> Checking for deletions will be more difficult and, more importantly, more
> time consuming, as it will likely require multiple full table scans of both
> tables.

Good stuff. My rows won't change. I wrote a script to do all of the logic in Perl, but clearly it won't be as efficient as what you guys did in one line. :)
On Sun, Feb 28, 2010 at 6:29 PM, John R Pierce <pierce@hogranch.com> wrote:
> Indeed, my code assumed that records were only INSERT'd into table1 and
> never UPDATE'd or DELETE'd. [...]

One more question. This is a pretty decent sized table. It is estimated to be 19,038,200 rows. That said, should I see results immediately pouring into the destination table while this is running?
Terry wrote:
> One more question. This is a pretty decent sized table. It is
> estimated to be 19,038,200 rows. That said, should I see results
> immediately pouring into the destination table while this is running?

SQL transactions are atomic. You won't see anything in the 'new' table until the INSERT finishes committing, then you'll see it all at once.

You will see a fair amount of disk write activity while it's running. 20M rows will take a while to run the first time, and probably a fair amount of memory.
On Sun, Feb 28, 2010 at 7:12 PM, John R Pierce <pierce@hogranch.com> wrote:
> SQL transactions are atomic. You won't see anything in the 'new' table
> until the INSERT finishes committing, then you'll see it all at once.
>
> You will see a fair amount of disk write activity while it's running. 20M
> rows will take a while to run the first time, and probably a fair amount of
> memory.

This is working very well. The initial load worked great. Took a little while but fine after that. I am using this:

INSERT INTO client_logs SELECT * FROM clients_event_log as t1 where t1.ev_id > (select max(t.ev_id) from client_logs as t);

However, I got lost in this little problem and overlooked another. I need to convert the unix time in the ev_time column to a timestamp. I have the idea with this little bit but not sure how to integrate it nicely:

select timestamptz 'epoch' + 1267417261 * interval '1 second'
On Sun, Feb 28, 2010 at 10:23 PM, Terry <td3201@gmail.com> wrote:
> This is working very well. The initial load worked great. [...]
>
> However, I got lost in this little problem and overlooked another. I
> need to convert the unix time in the ev_time column to a timestamp. I
> have the idea with this little bit but not sure how to integrate it
> nicely:
> select timestamptz 'epoch' + 1267417261 * interval '1 second'

I love overcomplicating things:

SELECT *, to_timestamp(ev_time) FROM clients_event_log as t1 where t1.ev_id > (select max(t.ev_id) from client_logs as t)
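To fold that into the periodic copy, one possibility (untested, and assuming client_logs stores the converted value in its own column; ev_time_ts and ev_text are made-up column names alongside ev_id) is to list the target columns explicitly and do the conversion in the SELECT:

INSERT INTO client_logs (ev_id, ev_time_ts, ev_text)
SELECT ev_id, to_timestamp(ev_time), ev_text
FROM clients_event_log AS t1
WHERE t1.ev_id > (SELECT max(t.ev_id) FROM client_logs AS t);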
don't use 'NOT EXISTS', as this will be damn slow. Use LEFT JOIN.
On 1 March 2010 at 09:40, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> don't use 'NOT EXISTS', as this will be damn slow. Use LEFT JOIN.
Right, LEFT JOIN should be faster (with proper indices even much faster).
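For reference, the anti-join form of the earlier statement would look roughly like this (assuming id is the key in both tables):

INSERT INTO table2
SELECT t1.*
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t2.id = t1.id
WHERE t2.id IS NULL;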
Hi Terry,

On Sunday 28 February 2010 22.56:41 Terry wrote:
> I am looking for a way to copy all the data from one table to another
> on a regular basis, every 5 minutes let's say.
>
> INSERT INTO table2 SELECT * FROM table1;

Why do you want this? Is it necessary for the data in table2 to appear only delayed?

I usually try to avoid all scheduled actions and try to use a "push" or "pull" model where data is replicated / generated when needed or when it is generated by the last step.

So to replicate (a part of the) data into a 2nd table, I'd usually go for triggers, or try to avoid having the data duplication altogether.

(But obviously, I don't know what you're doing; this is just my opinion based on a feeling that the underlying problem you're solving might be solved in a more elegant way.)

cheers
-- vbi

--
Pzat!
Szymon Guz <mabewlun@gmail.com> writes:
> On 1 March 2010 at 09:40, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
>> don't use 'NOT EXISTS', as this will be damn slow. Use LEFT JOIN.

> Right, LEFT JOIN should be faster (with proper indices even much faster).

Converting NOT EXISTS into an outer join is a manual application of an optimization that Postgres can do for itself in 8.4 and up. So the above advice should only be correct for 8.3 or before.

			regards, tom lane
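If in doubt on a particular version, EXPLAIN both forms and compare the plans; whether the NOT EXISTS becomes an anti join depends on the version and the data, so the output is not shown here:

EXPLAIN
SELECT * FROM table1 AS t1
WHERE NOT EXISTS (SELECT 1 FROM table2 AS t2 WHERE t2.id = t1.id);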