Thread: Possibilities of PgSQL
Hi,

I have a question about PgSQL. I am working on a project and I want to have a few tables with special properties:

There would be many rows, and these would be changed many times every day. I want to be able to find out how a row looked at a given time (e.g. 2 days ago, 6 minutes ago, on 2nd Sept. 2002, ...). Nevertheless, it should work with diffs (e.g. if I had a row with a 100MB string and I changed only 2 characters, it should not require 200MB on disk but only 100MB + a few (kilo)bytes).

Is there any way to solve this? I have heard that I can do it with triggers (but the worst part is how to implement the diffs), but I hope there is another (easier) way.

Karel Břinda
Karel Břinda wrote:
> Hi,
>
> I have a question about PgSQL. I am working on a project and I want
> to have a few tables with special properties:
>
> There would be many rows, and these would be changed many times every
> day. I want to be able to find out how a row looked at a given time
> (e.g. 2 days ago, 6 minutes ago, on 2nd Sept. 2002, ...). Nevertheless,
> it should work with diffs (e.g. if I had a row with a 100MB string and
> I changed only 2 characters, it should not require 200MB on disk but
> only 100MB + a few (kilo)bytes).

100MB of text in each row?? Is each row a copy of the English dictionary?

My thought is to have a table that records the changes: something along the lines of a timestamp of the change, the substring start and end positions of the change, and the before and after text, which can be used to 'replay' the changes.

When you want to rewind to a certain time, you find all entries made after the given time and undo those changes to get back to where it was.

> Is there any way to solve this? I have heard that I can do it with
> triggers (but the worst part is how to implement the diffs), but I
> hope there is another (easier) way.

Triggers are the most reliable way to implement this; I would think that PL/Perl may be the fastest way to implement it as a trigger.

Basically you would need to compare before and after as the record is saved. Doing that with 100MB of data will make saving changes somewhat slower.

What sort of client is updating this data? I would think that the client could keep track of changes as they are typed and save this info with the updated data, thus removing the time needed to compare the before and after data at the DB end.

--
Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz
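Shane's "record the change, then undo to rewind" idea can be sketched in plain Python. This is an illustrative model, not an existing API: `Change` stands in for one row of a history table (timestamp, start offset, before/after text), and `rewind` undoes every change newer than the target time, newest first.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Change:
    at: datetime   # when the change was made
    start: int     # start offset of the edit (end is start + len(after))
    before: str    # text that was replaced
    after: str     # text it was replaced with

def rewind(current, history, to):
    """Undo every change made after `to`, newest first."""
    text = current
    for ch in sorted((c for c in history if c.at > to),
                     key=lambda c: c.at, reverse=True):
        # sanity check: the recorded 'after' text must still be in place
        assert text[ch.start:ch.start + len(ch.after)] == ch.after
        text = text[:ch.start] + ch.before + text[ch.start + len(ch.after):]
    return text
```

In the database, this would correspond to a history table keyed by row id and timestamp, with a trigger inserting one such record per UPDATE.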
> 100MB of text in each row?? Is each row a copy of the English
> dictionary?

No, that was only an example.

It will be a web application (Python + PgSQL). I thought about a table with records plus a table with changes, but is there some easy way to get the differences between 2 strings?

On Thu, 23 Aug 2007 at 22:21 +0930, Shane Ambler wrote:
> Triggers are the most reliable way to implement this; I would think
> that PL/Perl may be the fastest way to implement it as a trigger.
> [...]
> What sort of client is updating this data? I would think that the
> client could keep track of changes as they are typed and save this
> info with the updated data, thus removing the time needed to compare
> the before and after data at the DB end.
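For Karel's question about getting the differences between two strings, Python's standard `difflib` module is one easy option. A minimal sketch (`char_diffs` is an illustrative helper name, not a library function):

```python
from difflib import SequenceMatcher

def char_diffs(a, b):
    """Return only the non-equal regions between strings a and b."""
    sm = SequenceMatcher(None, a, b)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]
```

Each tuple carries the operation (`replace`, `delete`, `insert`) plus the affected text from both sides, which is exactly the before/after information a change-history table would store.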
Karel Břinda wrote:
>> 100MB of text in each row?? Is each row a copy of the English
>> dictionary?
>
> No, that was only an example.
>
> It will be a web application (Python + PgSQL). I thought about a table
> with records plus a table with changes, but is there some easy way to
> get the differences between 2 strings?

Using a trigger will be the most reliable way for you to go. Unless you can find an existing one that fits your needs (or gets you close), you will need to build your own; there are no built-in functions that meet your needs.

A quick search at pgFoundry shows Auditable Postgresql http://pgfoundry.org/projects/aupg/ - this seems to have been started a while ago and then left, but you may find it to be a good starting point. The developer who started it may even be able to help you get it to meet your needs.

How precisely do you want your changes recorded? Just each character that has changed, or the entire line that has changed? If the text you are storing is, say, 200-2000 lines of 100-200 characters per line, then recording the line before and after the change may be fine.

If you are developing with Python, then look at using Python for the trigger function; see http://www.postgresql.org/docs/8.1/interactive/plpython.html and the next couple of pages to get you started.

Either you write a trigger that asks another program to generate the diff info for you, which you can then record in your history table, or you write a function that goes through line by line (or character by character) and records the differences.

--
Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz
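The line-by-line approach Shane describes can also be sketched with `difflib`: an `ndiff` delta keeps enough context to reconstruct either side, so a trigger (e.g. in PL/Python) could store just the delta in a history table and still recover the old version later. A minimal sketch, with illustrative function names:

```python
import difflib

def make_diff(old, new):
    """ndiff delta: retains enough info to reconstruct either version."""
    return list(difflib.ndiff(old.splitlines(keepends=True),
                              new.splitlines(keepends=True)))

def restore_old(delta):
    """Recover the pre-change text from a stored delta."""
    return "".join(difflib.restore(delta, 1))

def restore_new(delta):
    """Recover the post-change text from the same delta."""
    return "".join(difflib.restore(delta, 2))
```

Storing the delta instead of a full copy of the old value is what keeps a 2-character edit down to a few bytes of history, as Karel asked for, though unchanged lines in the delta still cost space and a real implementation might trim those.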