Thread: Continuous inserts...
Hi!

I have an application where I have to insert data into a table at several rows per second, 24 hours a day, 365 days a year.

After some period (a week, maybe a month) the data will be reduced to some degree and deleted from the table.

As far as I understand, I would have to use VACUUM to really free the table from deleted rows - but VACUUM (esp. on a table with several million rows) takes some time and prevents me from inserting new data.

Now, I thought I could just rename the table, insert into a temp table, and switch the tables back after VACUUMing. Ideally, this should work unnoticed (and thus without programming effort) on the client (inserter) side.

Question: would it work to use a transaction to perform the rename?

I.e.: continuous insert into table 'main' from the client. From somewhere else, execute:

    begin;
    alter table main rename to vac_main;
    create table main (...);
    end;

Would the inserter notice this? Read: would ALL inserts AT ANY TIME succeed?

I know I could simulate such functionality in the client (inserter), but it seems more elegant this way...

Greetings,
Joerg

+------**** Science & Engineering Applications GmbH ****------+
|                                                              |
| Joerg Hessdoerfer                                            |
| Leading SW developer      Phone: +49 (0)2203-962211          |
| S.E.A GmbH                Fax:              -962212          |
| D-51147 Koeln             Internet: joerg.hessdoerfer@sea-gmbh.com |
| http://www.sea-gmbh.com                                      |
+--------------------------------------------------------------+
Isn't it easier to reduce the table every day and run a daily vacuum, which only lasts a few seconds?

Joerg Hessdoerfer wrote:
> I have an application where I have to insert data into a table at several
> rows per second, 24 hours a day, 365 days a year.
>
> After some period (a week, maybe a month) the data will be reduced to some
> degree and deleted from the table.
>
> As far as I understand, I would have to use VACUUM to really free the table
> from deleted rows - but VACUUM (esp. on a table with several million rows)
> takes some time and prevents me from inserting new data.
> [...]
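A minimal sketch of such a daily cleanup, assuming the table is called main and has a timestamp column ts (both names are illustrative), run e.g. from a nightly cron job:

    -- delete everything older than a week, then reclaim the space
    DELETE FROM main WHERE ts < now() - '7 days'::interval;
    VACUUM ANALYZE main;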
On Thu, 17 Aug 2000, Joerg Hessdoerfer wrote:

> Question: would it work to use a transaction to perform the rename?
>
> i.e.: continuous insert into table 'main' from client.
> From somewhere else, execute:
>
>     begin;
>     alter table main rename to vac_main;
>     create table main (...);
>     end;
>
> would the inserter notice this? Read: would ALL inserts AT ANY TIME succeed?

Unfortunately - no. Also, bad things can happen if the transaction errors, since the rename happens immediately. There's been talk on -hackers about this subject in the past.

However, you might be able to do something like this, but I'm not sure it'll work and it's rather weird:

Have three tables you work with: a, b and c.

Set up a rule on a to change inserts into inserts on b.
Insert into a.
When you want to vacuum, change the rule to insert into c.
Vacuum b.
Change the rule back.
Move the rows from a and c into b.
Vacuum c.

[You will slowly lose space in a, but it should only be an occasional row, since rows only land in a while you've deleted the insert rule to b but haven't yet added the insert rule to c - not too many rows here.]
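A rough SQL sketch of this three-table scheme (the two-column layout and the rule names are illustrative, not from the original post):

    CREATE TABLE a (num int4, name text);
    CREATE TABLE b (num int4, name text);
    CREATE TABLE c (num int4, name text);

    -- normal operation: clients insert into a, a rule redirects the rows into b
    CREATE RULE redirect_b AS ON INSERT TO a
        DO INSTEAD INSERT INTO b VALUES (new.num, new.name);

    -- to vacuum b, first point the redirect at c
    DROP RULE redirect_b;           -- pre-7.3 syntax; newer releases want DROP RULE redirect_b ON a
    CREATE RULE redirect_c AS ON INSERT TO a
        DO INSTEAD INSERT INTO c VALUES (new.num, new.name);
    VACUUM b;

    -- switch back, drain the side tables into b, and clean them up
    DROP RULE redirect_c;
    CREATE RULE redirect_b AS ON INSERT TO a
        DO INSTEAD INSERT INTO b VALUES (new.num, new.name);
    INSERT INTO b SELECT * FROM a;  -- the few rows that slipped into a during the switch
    DELETE FROM a;
    INSERT INTO b SELECT * FROM c;
    DELETE FROM c;
    VACUUM c;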
Hi all.

Shouldn't Postgres block while vacuuming, and then continue inserting, starting where it left off? Is the time lag too much?

I am curious because I am going to build a similar app soon, basically parsing and inserting log file entries.

W

--- Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:
> Unfortunately - no. Also, bad things can happen if the transaction
> errors, since the rename happens immediately.
> [...]
> Have three tables you work with: a, b and c.
> Set up a rule on a to change inserts into inserts on b.
> [...]
Poul L. Christiansen writes:
> Isn't it easier to reduce the table every day and run a daily vacuum,
> which only lasts a few seconds?

I doubt that it would last just a few seconds. From my experience, VACUUM on large tables can sap your I/O subsystem, slowing down overall performance for everyone else.

Joerg, if this is a logging-type application, you may want to consider creating new tables periodically, e.g. rawdata_YYYY_MM or rawdata_WEEKNO, and put a little more logic into your app to pick the correct table name for the INSERT. The rawdata_YYYY_MM tables should be created in advance, of course.

You can then safely post-process last month's data, insert the results into a much smaller postprocess_YYYY_MM table, then archive or drop rawdata_YYYY_MM altogether.

Perhaps my suggestions are coloured by my experiences with 6.5, but this seems to be the safest way to do it without losing data.

Alternately, you could log data to flat files, post-process them, and then INSERT into Postgres.

Brian

--
Brian Baquiran <brianb@edsamail.com>
http://www.baquiran.com/  AIM: bbaquiran
Work: +63(2)7182222  Home: +63(2)9227123
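A small sketch of this monthly-table approach; the column layout and the aggregation in the post-processing step are made up purely for illustration:

    -- created in advance, e.g. at the end of the previous month
    CREATE TABLE rawdata_2000_09 (ts timestamp, num int4, name text);

    -- the client derives the table name from the current date and inserts into it
    INSERT INTO rawdata_2000_09 VALUES (now(), 42, 'sample');

    -- once September is over, condense the raw data and drop the big table
    CREATE TABLE postprocess_2000_09 AS
        SELECT name, count(*) AS n, min(ts) AS first_seen, max(ts) AS last_seen
        FROM rawdata_2000_09
        GROUP BY name;
    DROP TABLE rawdata_2000_09;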
Hi!

Thanks all for your input...

At 09:15 17.08.00 -0700, you wrote:
[...]
> However, you might be able to do something like this, but
> I'm not sure it'll work and it's rather weird:
>
> Have three tables you work with: a, b and c.
> [...]

Good idea - I immediately tested it, and rules rule! It seems to work perfectly, and the client doesn't even see it happen (except for 'selects'; one would have to set up a rule to return something meaningful then...).

I did:

Two tables, a and b.
Normally, insert into a.
When vacuuming starts, create a rule on a to insert into b.
Vacuum a.
Drop the rule.
Copy the records from b to a.
Vacuum b.

Why did you propose three tables? Did I overlook something?

Greetings,
Joerg
Hi!

At 11:38 17.08.00 -0700, you wrote:
> Shouldn't Postgres block while vacuuming, and then
> continue inserting, starting where it left off? Is the
> time lag too much?

For me - yes. My app can accept a time lag of some hundreds of ms - not seconds or, as with a VACUUM, minutes (I've seen it take hours, but that was a _LARGE_ table on 6.5.x).

> I am curious because I am going to build a similar app
> soon, basically parsing and inserting log file entries.

Rules do the trick - insert into a second table, unnoticed by the inserting client. See my previous posting for slightly more detail.

Greetings,
Joerg
I'm not familiar with rules. Could you please post the SQL for creating the rule that you've created? I'm going to make such a setup in the near future and this seems to fit the bill.

Joerg Hessdoerfer wrote:
> I did:
>
> Two tables, a and b.
> Normally, insert into a.
> When vacuuming starts, create a rule on a to insert into b.
> Vacuum a.
> Drop the rule.
> Copy the records from b to a.
> Vacuum b.
> [...]
Hi!

At 11:57 18.08.00 +0100, you wrote:
> I'm not familiar with rules. Could you please post the SQL for creating
> the rule that you've created?

Here we go (if memory serves ;-)

    create table a ( num int4, name text );
    create table b ( num int4, name text );

Rule to insert into b instead of a:

    CREATE RULE redirect AS ON INSERT TO a
        DO INSTEAD
        insert into b values ( new.num, new.name );

... the INSTEAD is important! BTW: is it really necessary to list all fields in the INSTEAD part? Anyone?

When you're finished vacuuming a, do a

    DROP RULE redirect;

Of course, when you have more or other fields in your table, you need to change the rule's definition accordingly.

Hope this helps,
Joerg
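For reference, here is the complete two-table cycle Joerg described in his earlier posting, spelled out with this rule (a sketch only; the DELETE from b is an inferred step, since moving the rows out of b is what lets VACUUM reclaim its space):

    -- normal operation: clients insert directly into a

    -- when it's time to vacuum a, redirect new rows into b
    CREATE RULE redirect AS ON INSERT TO a
        DO INSTEAD INSERT INTO b VALUES (new.num, new.name);
    VACUUM a;

    -- switch back and drain b into a
    DROP RULE redirect;             -- on 7.3 and later: DROP RULE redirect ON a
    INSERT INTO a SELECT * FROM b;
    DELETE FROM b;
    VACUUM b;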
On Fri, 18 Aug 2000, Joerg Hessdoerfer wrote:

> I did:
>
> Two tables, a and b.
> Normally, insert into a.
> [...]
> Why did you propose three tables? Did I overlook something?

I didn't try it with vacuum; I just did a table lock, and that seemed to still hang the inserts with two tables, so I figured maximum safety was adding the third table. If it works with two, that's much cooler. Was this with real data or just a small test set?
Hi!

At 08:18 18.08.00 -0700, you wrote:
[...]
> If it works with two, that's much cooler. Was this with real data
> or just a small test set?

It was a test set ... ~20000 records, *BUT* I found that Postgres decides when it starts to use the rule - meaning, if you do continuous inserts on the table and create the rule, there's a varying delay until the rule applies. In my first tests I re-connected to the DB very often, and then the change seemed immediate.

Any ideas on how to 'promote' the rules faster?!?

Greetings,
Joe
Weird, I've not seen that behavior really, although I've never done time-sensitive stuff. It might be the time before the shared cache updates? Not sure, really. If you create the rule inline with your inserts (rather than in a second transaction), does it still wait?

Stephan Szabo
sszabo@bigpanda.com

On Tue, 22 Aug 2000, Joerg Hessdoerfer wrote:

> It was a test set ... ~20000 records, *BUT* I found that Postgres
> decides when it starts to use the rule - meaning, if you do continuous
> inserts on the table and create the rule, there's a varying delay until
> the rule applies. In my first tests I re-connected to the DB very often,
> and then the change seemed immediate.
>
> Any ideas on how to 'promote' the rules faster?!?
Stephan Szabo wrote:
> Weird, I've not seen that behavior really, although I've never
> done time-sensitive stuff. It might be the time before the
> shared cache updates? Not sure, really. If you create the rule
> inline with your inserts (rather than in a second transaction),
> does it still wait?

Just jumping in without having followed the discussion. But...

The rules applied to a table by the rewriter are taken out of the relation descriptor that is returned by heap_open() or heap_openr(). I haven't looked at the code, but pg_class only has a boolean telling whether a class has rules or not. Could it be that adding more rules (or dropping just a few instead of all) doesn't update the pg_class tuple, so the syscache entry for the table isn't invalidated and other backends continue to use the old information instead of rescanning pg_rewrite?

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Jan Wieck <janwieck@Yahoo.com> writes:
> I haven't looked at the code, but pg_class only has a boolean
> telling whether a class has rules or not. Could it be that adding
> more rules (or dropping just a few instead of all) doesn't
> update the pg_class tuple, so the syscache entry for the table
> isn't invalidated and other backends continue to use the old
> information instead of rescanning pg_rewrite?

This is done correctly in current sources --- see setRelhasrulesInRelation(). However, I recall having dorked with that code not long ago, and I forget what it looked like before. Perhaps 7.0.* is broken in this respect? Would think people would have noticed, though.

			regards, tom lane
Hi!

At 14:16 22.08.00 -0400, you wrote:
> This is done correctly in current sources --- see
> setRelhasrulesInRelation(). However, I recall having dorked with that
> code not long ago, and I forget what it looked like before. Perhaps
> 7.0.* is broken in this respect? Would think people would have noticed,
> though.

Perhaps I should have mentioned that the test was done using 6.5.3 - I haven't gotten around to upgrading yet... Oh, yes, the mentioned rule was the ONLY rule in that database.

Greetings,
Joerg