Thread: Continuous inserts...

Continuous inserts...

From
Joerg Hessdoerfer
Date:
Hi!

I have an application where I have to insert data into a table at several
rows per second, 24 hours a day, 365 days a year.

After some period (a week, maybe a month) the data will be reduced to some
degree and deleted from the table.

As far as I understood, I would have to use VACUUM to really free the table
from deleted rows - but VACUUM (esp. on a table with several million rows)
takes some time and prevents me from inserting new data.

Now, I thought I could just rename the table, insert into a temp table, and
switch the tables back after VACUUMing. Ideally, this should work unnoticed
(and thus without programming effort) on the client (inserter) side.

Question: would it work to use a transaction to perform the rename?

i.e.: continuous insert into table 'main' from client.
From somewhere else, execute:

begin;
alter table main rename to vac_main;
create table main (...);
end;

would the inserter notice this? Read: would ALL inserts AT ANY TIME succeed?

I know I could simulate such functionality in the client (inserter), but it
seems more elegant this way...

Greetings,
        Joerg
+------****  Science & Engineering Applications GmbH  ****------+
|                                                               |
| Joerg Hessdoerfer                                             |
| Leading SW developer Phone:    +49 (0)2203-962211             |
| S.E.A GmbH           Fax:                 -962212             |
| D-51147 Koeln        Internet: joerg.hessdoerfer@sea-gmbh.com |
|                                http://www.sea-gmbh.com        |
+---------------------------------------------------------------+



Re: Continuous inserts...

From
"Poul L. Christiansen"
Date:
Isn't it easier to reduce the table every day and do a daily vacuum, which would
only last a few seconds?

Joerg Hessdoerfer wrote:

> Hi!
>
> I have an application, where I have to insert data into a table at several
> rows per second, 24 hours a day, 365 days a year.
>
> After some period (a week, maybe a month) the data will be reducted to some
> degree and deleted from the table.
>
> As far as I understood, I would have to use VACUUM to really free the table
> from deleted rows - but VACUUM (esp. on a table with several million rows)
> takes some time and prevents me from inserting new data.
>
> Now, I thought I could just rename the table, inserting into a temp table, and
> switch the tables back after VACUUMing. Ideally, this should work unnoticed
> (and thus without prog. effort) on the client (inserter) side.
>
> Question: would it work to use a transaction to perform the rename?
>
> i.e.: continuous insert into table 'main' from client.
>
>  From somewhere else, execute:
>
> begin;
> alter table main rename to vac_main;
> create table main (...);
> end;
>
> would the inserter notice this? Read: would ALL inserts AT ANY TIME succeed?
>
> I know, I could simulate such functionality in the client (inserter). But it
> seems more elegant this way...
>
> Greetings,
>         Joerg
> +------****  Science & Engineering Applications GmbH  ****------+
> |                                                               |
> | Joerg Hessdoerfer                                             |
> | Leading SW developer Phone:    +49 (0)2203-962211             |
> | S.E.A GmbH           Fax:                 -962212             |
> | D-51147 Koeln        Internet: joerg.hessdoerfer@sea-gmbh.com |
> |                                http://www.sea-gmbh.com        |
> +---------------------------------------------------------------+



Re: Continuous inserts...

From
Stephan Szabo
Date:
On Thu, 17 Aug 2000, Joerg Hessdoerfer wrote:

> Hi!
> 
> I have an application, where I have to insert data into a table at several
> rows per second, 24 hours a day, 365 days a year.
> 
> After some period (a week, maybe a month) the data will be reducted to some
> degree and deleted from the table.
> 
> As far as I understood, I would have to use VACUUM to really free the table
> from deleted rows - but VACUUM (esp. on a table with several million rows)
> takes some time and prevents me from inserting new data.
> 
> Now, I thought I could just rename the table, inserting into a temp table, and
> switch the tables back after VACUUMing. Ideally, this should work unnoticed
> (and thus without prog. effort) on the client (inserter) side.
> 
> Question: would it work to use a transaction to perform the rename?
> 
> i.e.: continuous insert into table 'main' from client.
> 
>  From somewhere else, execute:
> 
> begin;
> alter table main rename to vac_main;
> create table main (...);
> end;
> 
> would the inserter notice this? Read: would ALL inserts AT ANY TIME succeed?

Unfortunately -- no.  Also, bad things can happen if the transaction
errors since the rename happens immediately.  There's been talk on 
-hackers about this subject in the past.

However, you might be able to do something like this, but
I'm not sure it'll work and it's rather weird:

Have three tables you work with: a, b, and c.

Set up a rule on a to redirect inserts into b.
Insert into a.
When you want to vacuum, change the rule to redirect inserts into c.
Vacuum b.
Change the rule back.
Move the rows from a and c into b.
Vacuum c.

[You will slowly lose space in a, but it should only be an occasional
row, since rows land in a only during the short window after you've
dropped the insert rule to b but haven't yet added the insert rule to
c -- not too many rows here.]
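
Roughly, in SQL -- just a sketch with an illustrative two-column schema
(num, name), so adjust it to your real columns:

create table a ( num int4, name text );
create table b ( num int4, name text );
create table c ( num int4, name text );

-- normal operation: clients insert into a, the rows actually land in b
create rule redirect as on insert to a
    do instead insert into b values ( new.num, new.name );

-- vacuum time: point the rule at c, then vacuum b
drop rule redirect;
create rule redirect as on insert to a
    do instead insert into c values ( new.num, new.name );
vacuum b;

-- switch back, then fold the stray rows from a and c into b
drop rule redirect;
create rule redirect as on insert to a
    do instead insert into b values ( new.num, new.name );
insert into b select * from a;
delete from a;
insert into b select * from c;
delete from c;
vacuum c;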




Re: Continuous inserts...

From
Webb Sprague
Date:
Hi All.

Shouldn't Postgres block while vacuuming, and then
continue inserting starting where it left off?  Is the
time lag too much?

I am curious because I am going to build a similar app
soon, basically parsing and inserting log file
entries.

W 
--- Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:
> 
> On Thu, 17 Aug 2000, Joerg Hessdoerfer wrote:
> 
> > Hi!
> > 
> > I have an application, where I have to insert data into a table at several
> > rows per second, 24 hours a day, 365 days a year.
> > 
> > After some period (a week, maybe a month) the data will be reducted to some
> > degree and deleted from the table.
> > 
> > As far as I understood, I would have to use VACUUM to really free the table
> > from deleted rows - but VACUUM (esp. on a table with several million rows)
> > takes some time and prevents me from inserting new data.
> > 
> > Now, I thought I could just rename the table, inserting into a temp table, and
> > switch the tables back after VACUUMing. Ideally, this should work unnoticed
> > (and thus without prog. effort) on the client (inserter) side.
> > 
> > Question: would it work to use a transaction to perform the rename?
> > 
> > i.e.: continuous insert into table 'main' from client.
> > 
> >  From somewhere else, execute:
> > 
> > begin;
> > alter table main rename to vac_main;
> > create table main (...);
> > end;
> > 
> > would the inserter notice this? Read: would ALL inserts AT ANY TIME succeed?
> 
> Unfortunately -- no.  Also, bad things can happen if the transaction
> errors since the rename happens immediately.  There's been talk on 
> -hackers about this subject in the past.
> 
> However, you might be able to do something like this, but
> I'm not sure it'll work and it's rather wierd:
> 
> Have three tables you work with, a and b and c
> 
> Set up rule on a to change insert to insert on b.
> Insert into a.
> When you want to vacuum, change the rule to insert to c.
> Vacuum b
> Change rule back
> move rows from a and c into b
> vacuum c
> 
> [you will slowly lose space in a, but it should only
> be an occasional row since you should only insert into
> a while you've deleted the insert rule to b, but
> haven't yet added the insert rule to c -- not too many
> rows here]
> 
> 




Re: Continuous inserts...

From
brianb-pgsql@edsamail.com
Date:
Poul L. Christiansen writes:

> Isn't easier to reduce the table every day and make a daily vacuum which only
> lasts a few seconds?

I doubt that it would last just a few seconds. From my experience, VACUUM
on large tables can sap your I/O subsystem, slowing down overall
performance for everyone else.

Joerg, if this is a logging-type application, you may want to consider
creating new tables periodically, e.g. rawdata_YYYY_MM or rawdata_WEEKNO,
and putting a little more logic into your app to pick the correct table to
perform the INSERT on. The rawdata_YYYY_MM tables should be created in
advance, of course.

You can then safely post-process last month's data, insert results into a
much smaller postprocess_YYYY_MM table, then archive or drop
rawdata_YYYY_MM altogether.
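
For instance (just a sketch -- the table and column names here are made up
for illustration):

-- created in advance, one table per month
create table rawdata_2000_08 ( ts timestamp, value float8 );
create table rawdata_2000_09 ( ts timestamp, value float8 );

-- the client computes the table name from the current date and inserts
insert into rawdata_2000_08 values ( 'now', 42.0 );

-- at month end: summarize last month, then archive or drop the raw table
create table postprocess_2000_08 as
    select date_trunc('day', ts) as day, count(*) as n, avg(value) as avg_value
    from rawdata_2000_08
    group by date_trunc('day', ts);
drop table rawdata_2000_08;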

Perhaps my suggestions are coloured by my experiences w/ 6.5, but this
seems to be the safest way to do it without losing data.

Alternately, you could log data to flat files, post-process and then INSERT
into Postgres.

Brian

> Joerg Hessdoerfer wrote:
> 
> > Hi!
> >
> > I have an application, where I have to insert data into a table at several
> > rows per second, 24 hours a day, 365 days a year.
> >
> > After some period (a week, maybe a month) the data will be reducted to some
> > degree and deleted from the table.
> >
> > As far as I understood, I would have to use VACUUM to really free the table
> > from deleted rows - but VACUUM (esp. on a table with several million rows)
> > takes some time and prevents me from inserting new data.
> >
> > Now, I thought I could just rename the table, inserting into a temp table, and
> > switch the tables back after VACUUMing. Ideally, this should work unnoticed
> > (and thus without prog. effort) on the client (inserter) side.
> >
> > Question: would it work to use a transaction to perform the rename?
> >
> > i.e.: continuous insert into table 'main' from client.
> >
> >  From somewhere else, execute:
> >
> > begin;
> > alter table main rename to vac_main;
> > create table main (...);
> > end;
> >
> > would the inserter notice this? Read: would ALL inserts AT ANY TIME succeed?
> >

--
Brian Baquiran <brianb@edsamail.com>
http://www.baquiran.com/  AIM: bbaquiran 
Work: +63(2)7182222       Home: +63(2) 9227123

I'm smarter than average. Therefore, average, to me, seems kind of stupid. 
People weren't purposely being stupid. It just came naturally.                             -- Bruce "Tog" Toganazzini


Re: Continuous inserts...

From
Joerg Hessdoerfer
Date:
Hi!

Thanks all for your input...

At 09:15 17.08.00 -0700, you wrote:
[...]
> > Question: would it work to use a transaction to perform the rename?
> >
> > i.e.: continuous insert into table 'main' from client.
> >
> >  From somewhere else, execute:
> >
> > begin;
> > alter table main rename to vac_main;
> > create table main (...);
> > end;
> >
> > would the inserter notice this? Read: would ALL inserts AT ANY TIME 
> succeed?
>
>Unfortunately -- no.  Also, bad things can happen if the transaction
>errors since the rename happens immediately.  There's been talk on
>-hackers about this subject in the past.
>
>However, you might be able to do something like this, but
>I'm not sure it'll work and it's rather wierd:
>
>Have three tables you work with, a and b and c
>
>Set up rule on a to change insert to insert on b.
>Insert into a.
>When you want to vacuum, change the rule to insert to c.
>Vacuum b
>Change rule back
>move rows from a and c into b
>vacuum c
[...]

Good idea - I immediately tested it - rules rule! That seems to work perfectly,
and the client doesn't even see it happen (except for 'selects'; one would have
to set up a rule to return something meaningful there...).

I did:
Two tables, a and b.
Normally, insert into a.
When vacuuming starts, create a rule on a to redirect inserts into b.
Vacuum a.
Drop the rule.
Copy the records from b back to a.
Vacuum b.

Why did you suggest three tables? Did I overlook something?

Greetings,        Joerg
+------****  Science & Engineering Applications GmbH  ****------+
|                                                               |
| Joerg Hessdoerfer                                             |
| Leading SW developer Phone:    +49 (0)2203-962211             |
| S.E.A GmbH           Fax:                 -962212             |
| D-51147 Koeln        Internet: joerg.hessdoerfer@sea-gmbh.com |
|                                http://www.sea-gmbh.com        |
+---------------------------------------------------------------+



Re: Continuous inserts...

From
Joerg Hessdoerfer
Date:
hi!

At 11:38 17.08.00 -0700, you wrote:
>Hi All.
>
>Shouldn't Postgres block while vacuuming, and then
>continue inserting starting where it left off?  Is the
>time lag too much?

For me - yes. My app can accept a time lag of a few hundred ms - not
seconds or, as with a VACUUM, minutes (I've seen it take hours,
but that was a _LARGE_ table on 6.5.x).

>I am curious because I am going to build a similar app
>soon, basically parsing and inserting log file
>entries.

Rules do the trick - insert into a second table unnoticed by the inserter
client. See my previous posting for slightly more detail.

Greetings,        Joerg
+------****  Science & Engineering Applications GmbH  ****------+
|                                                               |
| Joerg Hessdoerfer                                             |
| Leading SW developer Phone:    +49 (0)2203-962211             |
| S.E.A GmbH           Fax:                 -962212             |
| D-51147 Koeln        Internet: joerg.hessdoerfer@sea-gmbh.com |
|                                http://www.sea-gmbh.com        |
+---------------------------------------------------------------+



Re: Continuous inserts...

From
"Poul L. Christiansen"
Date:
I'm not familiar with rules. Could you please post the SQL for the rule
you've created?

I'm going to build such a setup in the near future, and this seems like a good
way to do it.

Joerg Hessdoerfer wrote:

> Hi!
>
> Thanks all for your input...
>
> At 09:15 17.08.00 -0700, you wrote:
> [...]
> > > Question: would it work to use a transaction to perform the rename?
> > >
> > > i.e.: continuous insert into table 'main' from client.
> > >
> > >  From somewhere else, execute:
> > >
> > > begin;
> > > alter table main rename to vac_main;
> > > create table main (...);
> > > end;
> > >
> > > would the inserter notice this? Read: would ALL inserts AT ANY TIME
> > succeed?
> >
> >Unfortunately -- no.  Also, bad things can happen if the transaction
> >errors since the rename happens immediately.  There's been talk on
> >-hackers about this subject in the past.
> >
> >However, you might be able to do something like this, but
> >I'm not sure it'll work and it's rather wierd:
> >
> >Have three tables you work with, a and b and c
> >
> >Set up rule on a to change insert to insert on b.
> >Insert into a.
> >When you want to vacuum, change the rule to insert to c.
> >Vacuum b
> >Change rule back
> >move rows from a and c into b
> >vacuum c
> [...]
>
> Good idea - I immediately tested it - rules rule! That seems to work perfectly,
> and the client doesn't even see it happen (except for 'selects', one would
> have to setup
> a rule to return something meaningful then...).
>
> I did:
> Two tables, a and b.
> Normally, insert into a.
> When Vacuuming starts, create rule on a to insert into b
> Vacuum a
> drop rule
> copy records from b to a
> vacuum b
>
> Why did you suppose three tables? Did I overlook something?
>
> Greetings,
>          Joerg
> +------****  Science & Engineering Applications GmbH  ****------+
> |                                                               |
> | Joerg Hessdoerfer                                             |
> | Leading SW developer Phone:    +49 (0)2203-962211             |
> | S.E.A GmbH           Fax:                 -962212             |
> | D-51147 Koeln        Internet: joerg.hessdoerfer@sea-gmbh.com |
> |                                http://www.sea-gmbh.com        |
> +---------------------------------------------------------------+



Re: Continuous inserts...

From
Joerg Hessdoerfer
Date:
Hi!

At 11:57 18.08.00 +0100, you wrote:
>I'm not familiar with rules. Could you please post the SQL for creating
>the rule that you've created?

Here we go (if memory serves ;-)

create table a ( num int4, name text );
create table b ( num int4, name text );

rule to insert into b instead of a:

CREATE RULE redirect AS ON insert TO a DO INSTEAD
    insert into b values ( new.num, new.name );

... the INSTEAD is important!
BTW: is it really necessary to list all fields in the instead part? Anyone?

when finished vacuuming a, do a

DROP RULE redirect;

Of course, when you have more or different fields in your table, you need to
change the rule's definition accordingly.
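
Put together, the whole cycle looks roughly like this (a sketch, assuming the
two tables above -- adjust the column list to your table):

-- normal operation: clients simply insert into a

-- vacuum time: redirect new rows into b, then vacuum a
CREATE RULE redirect AS ON insert TO a DO INSTEAD
    insert into b values ( new.num, new.name );
VACUUM a;

-- switch back and fold the redirected rows into a
DROP RULE redirect;
insert into a select * from b;
delete from b;
VACUUM b;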

Hope this helps,        Joerg
+------****  Science & Engineering Applications GmbH  ****------+
|                                                               |
| Joerg Hessdoerfer                                             |
| Leading SW developer Phone:    +49 (0)2203-962211             |
| S.E.A GmbH           Fax:                 -962212             |
| D-51147 Koeln        Internet: joerg.hessdoerfer@sea-gmbh.com |
|                                http://www.sea-gmbh.com        |
+---------------------------------------------------------------+



Re: Continuous inserts...

From
Stephan Szabo
Date:
On Fri, 18 Aug 2000, Joerg Hessdoerfer wrote:
> Good idea - I immediately tested it - rules rule! That seems to work perfectly,
> and the client doesn't even see it happen (except for 'selects', one would 
> have to setup
> a rule to return something meaningful then...).
> 
> I did:
> Two tables, a and b.
> Normally, insert into a.
> When Vacuuming starts, create rule on a to insert into b
> Vacuum a
> drop rule
> copy records from b to a
> vacuum b
> 
> Why did you suppose three tables? Did I overlook something?

I didn't try with vacuum, I just did a table lock and that
seemed to still hang the inserts with two tables, so I figured 
maximum safety was adding the third table.  If it works with two 
that's much cooler.  Was this with real data or just a small test 
set? 



Re: Continuous inserts...

From
Joerg Hessdoerfer
Date:
Hi!

At 08:18 18.08.00 -0700, you wrote:
[...]

>I didn't try with vacuum, I just did a table lock and that
>seemed to still hang the inserts with two tables, so I figured
>maximum safety was adding the third table.  If it works with two
>that's much cooler.  Was this with real data or just a small test
>set?

It was a test set ... ~20000 records, *BUT* I found that postgres
decides when it starts to use the rule - meaning, if you do continuous
inserts on the table and create the rule, there's a varying delay until
the rule applies. In my first tests, I re-connected the DB very often,
and the change seemed immediate.

Any ideas on how to 'promote' the rules faster?!?

Greetings,        Joe
--
+------****  Science & Engineering Applications GmbH  ****------+
|                                                               |
| Joerg Hessdoerfer                                             |
| Leading SW developer Phone:    +49 (0)2203-962211             |
| S.E.A GmbH           Fax:                 -962212             |
| D-51147 Koeln        Internet: joerg.hessdoerfer@sea-gmbh.com |
|                                http://www.sea-gmbh.com        |
+---------------------------------------------------------------+



Re: Continuous inserts...

From
Stephan Szabo
Date:
Weird, I've not seen that behavior really, although I've never
done time sensitive stuff.  It might be the time before the
shared cache updates?  Not sure really.  If you do the rule
inline with your inserts (rather than a second transaction)
does it still wait?

Stephan Szabo
sszabo@bigpanda.com

On Tue, 22 Aug 2000, Joerg Hessdoerfer wrote:

> Hi!
> 
> At 08:18 18.08.00 -0700, you wrote:
> [...]
> 
> >I didn't try with vacuum, I just did a table lock and that
> >seemed to still hang the inserts with two tables, so I figured
> >maximum safety was adding the third table.  If it works with two
> >that's much cooler.  Was this with real data or just a small test
> >set?
> 
> It was a test set ... ~20000 records, *BUT* I found that postgres
> decides when it starts to use the rule - means, if you do continous
> inserts on the table and create the rule, there's a varying time until
> the rule applies. In my first tests, I re-connected the DB very often,
> and the the change seemed immediate.
> 
> Any ideas on how to 'promote' the rules faster?!?
> 



Re: Continuous inserts...

From
Jan Wieck
Date:
Stephan Szabo wrote:
> Wierd, I've not seen that behavior really, although I've never
> done time sensitive stuff.  It might be the time before the
> shared cache updates?  Not sure really.  If you do the rule
> inline with your inserts (rather than a second transaction)
> does it still wait?
    Just jumping in not having followed the discussion. But...

    The rules applied to a table by the rewriter are taken out of
    the relation descriptor that is returned by heap_open() or
    heap_openr().

    I haven't looked at the code, but pg_class only has a boolean
    telling if a class has rules or not. Could it be that adding
    more rules (or dropping just a few instead of all) doesn't
    update the pg_class tuple, thus the syscache for the table
    isn't invalidated and other backends continue to use the old
    information instead of rescanning pg_rewrite?


Jan

>
> Stephan Szabo
> sszabo@bigpanda.com
>
> On Tue, 22 Aug 2000, Joerg Hessdoerfer wrote:
>
> > Hi!
> >
> > At 08:18 18.08.00 -0700, you wrote:
> > [...]
> >
> > >I didn't try with vacuum, I just did a table lock and that
> > >seemed to still hang the inserts with two tables, so I figured
> > >maximum safety was adding the third table.  If it works with two
> > >that's much cooler.  Was this with real data or just a small test
> > >set?
> >
> > It was a test set ... ~20000 records, *BUT* I found that postgres
> > decides when it starts to use the rule - means, if you do continous
> > inserts on the table and create the rule, there's a varying time until
> > the rule applies. In my first tests, I re-connected the DB very often,
> > and the the change seemed immediate.
> >
> > Any ideas on how to 'promote' the rules faster?!?
> >
>


--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: Continuous inserts...

From
Tom Lane
Date:
Jan Wieck <janwieck@Yahoo.com> writes:
>     I haven't looked at the code, but pg_class only has a boolean
>     telling if a class has rules or not. Could it be that  adding
>     more  rules  (or  dropping just a few instead of all) doesn't
>     update the pg_class tuple, thus the syscache  for  the  table
>     isn't  invalidated and other backends continue to use the old
>     information instead of rescanning pg_rewrite?

This is done correctly in current sources --- see
setRelhasrulesInRelation().  However I recall having dorked with that
code not long ago, and I forget what it looked like before.  Perhaps
7.0.* is broken in this respect?  Would think people would have noticed,
though.
        regards, tom lane


Re: Continuous inserts...

From
Joerg Hessdoerfer
Date:
Hi!

At 14:16 22.08.00 -0400, you wrote:
>Jan Wieck <janwieck@Yahoo.com> writes:
> >     I haven't looked at the code, but pg_class only has a boolean
> >     telling if a class has rules or not. Could it be that  adding
> >     more  rules  (or  dropping just a few instead of all) doesn't
> >     update the pg_class tuple, thus the syscache  for  the  table
> >     isn't  invalidated and other backends continue to use the old
> >     information instead of rescanning pg_rewrite?
>
>This is done correctly in current sources --- see
>setRelhasrulesInRelation().  However I recall having dorked with that
>code not long ago, and I forget what it looked like before.  Perhaps
>7.0.* is broken in this respect?  Would think people would have noticed,
>though.
>
>                         regards, tom lane

Perhaps I should have mentioned that the test was done using 6.5.3 - I haven't
gotten around to upgrading yet...

Oh, yes, the mentioned rule was the ONLY rule on that database.

Greetings,        Joerg
--
+------****  Science & Engineering Applications GmbH  ****------+
|                                                               |
| Joerg Hessdoerfer                                             |
| Leading SW developer Phone:    +49 (0)2203-962211             |
| S.E.A GmbH           Fax:                 -962212             |
| D-51147 Koeln        Internet: joerg.hessdoerfer@sea-gmbh.com |
|                                http://www.sea-gmbh.com        |
+---------------------------------------------------------------+