Thread: Possibilities of PgSQL
Hi,

I have a question about PgSQL. I am working on a project and I want to have a few tables with special properties:

There would be many rows, and these would be changed many times every day. I want to be able to find out how a row looked at a given time (e.g. 2 days ago, 6 minutes ago, on 2nd Sept. 2002, ...). Nevertheless, it should work with diffs (e.g. if I had a row with a 100MB string and I changed only 2 characters, it should not require 200MB on disk but only 100MB + a few (kilo)bytes).

Is there any way to solve this? I have heard that I can do it with triggers (but the worst part is how to implement the diffs), but I hope there is another (easier) way.

Karel Břinda
Karel Břinda wrote:
> Hi,
>
> I have a question about PgSQL. I am working on a project and I want
> to have a few tables with special properties:
>
> There would be many rows, and these would be changed many times every
> day. I want to be able to find out how a row looked at a given time
> (e.g. 2 days ago, 6 minutes ago, on 2nd Sept. 2002, ...). Nevertheless,
> it should work with diffs (e.g. if I had a row with a 100MB string and
> I changed only 2 characters, it should not require 200MB on disk but
> only 100MB + a few (kilo)bytes).

100MB of text in each row?? Is each row a copy of the English dictionary?

My thought is to have a table that records the changes: something along the lines of a timestamp of the change, the substring start and end positions of the change, and the before and after text, which can be used to 'replay' the changes.

When you want to rewind to a certain time, you find all entries made after the given time and undo those changes to get back to where it was.

> Is there any way to solve this? I have heard that I can do it with
> triggers (but the worst part is how to implement the diffs), but I
> hope there is another (easier) way.

Triggers are the most reliable way to implement this; I would think that PL/Perl may be the fastest way to implement it as a trigger.

Basically you would need to compare before and after as the record is saved. Doing that with 100MB of data will make saving changes somewhat slower.

What sort of client is updating this data? I would think that the client could keep track of changes as they are typed and save this info with the updated data, thus removing the time needed to compare the before and after data at the DB end.

--
Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz
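Shane's "record the change, then undo to rewind" idea can be sketched in plain Python. This is an illustrative model, not an existing API: `Change` stands in for one row of a history table (timestamp, start offset, before/after text), and `rewind` undoes every change newer than the target time, newest first.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Change:
    at: datetime   # when the change was made
    start: int     # start offset of the edit (end is start + len(after))
    before: str    # text that was replaced
    after: str     # text it was replaced with

def rewind(current, history, to):
    """Undo every change made after `to`, newest first."""
    text = current
    for ch in sorted((c for c in history if c.at > to),
                     key=lambda c: c.at, reverse=True):
        # sanity check: the recorded 'after' text must still be in place
        assert text[ch.start:ch.start + len(ch.after)] == ch.after
        text = text[:ch.start] + ch.before + text[ch.start + len(ch.after):]
    return text
```

In the database, this would correspond to a history table keyed by row id and timestamp, with a trigger inserting one such record per UPDATE.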
> 100MB of text in each row?? Is each row a copy of the English
> dictionary?

No, that was only an example.

It will be a web application (Python + PgSQL). I thought about a table with records plus a table with changes, but is there some easy way to get the differences between 2 strings?

On Thu, 23 Aug 2007 at 22:21 +0930, Shane Ambler wrote:
> Triggers are the most reliable way to implement this; I would think
> that PL/Perl may be the fastest way to implement it as a trigger.
> [...]
> What sort of client is updating this data? I would think that the
> client could keep track of changes as they are typed and save this
> info with the updated data, thus removing the time needed to compare
> the before and after data at the DB end.
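For Karel's question about getting the differences between two strings, Python's standard `difflib` module is one easy option. A minimal sketch (`char_diffs` is an illustrative helper name, not a library function):

```python
from difflib import SequenceMatcher

def char_diffs(a, b):
    """Return only the non-equal regions between strings a and b."""
    sm = SequenceMatcher(None, a, b)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]
```

Each tuple carries the operation (`replace`, `delete`, `insert`) plus the affected text from both sides, which is exactly the before/after information a change-history table would store.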
Karel Břinda wrote:
>> 100MB of text in each row?? Is each row a copy of the English
>> dictionary?
>
> No, that was only an example.
>
> It will be a web application (Python + PgSQL). I thought about a table
> with records plus a table with changes, but is there some easy way to
> get the differences between 2 strings?

Using a trigger will be the most reliable way for you to go. Unless you can find an existing one that fits your needs (or gets you close), you will need to build your own; there are no built-in functions that meet your needs.

A quick search at pgFoundry shows Auditable Postgresql http://pgfoundry.org/projects/aupg/ - this seems to have been started a while ago and then left, but you may find it to be a good starting point. The developer who started it may even be able to help you get it to meet your needs.

How precisely do you want your changes recorded? Just each character that has changed, or the entire line that has changed? If the text you are storing is, say, 200-2000 lines of 100-200 characters per line, then recording the line before and after the change may be fine.

If you are developing with Python, then look at using Python for the trigger function; see http://www.postgresql.org/docs/8.1/interactive/plpython.html and the next couple of pages to get you started.

Either you write a trigger that asks another program to generate the diff info for you, which you can then record in your history table, or you write a function that goes through line by line (or character by character) and records the differences.

--
Shane Ambler
pgSQL@Sheeky.Biz

Get Sheeky @ http://Sheeky.Biz
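The line-by-line approach Shane describes can also be sketched with `difflib`: an `ndiff` delta keeps enough context to reconstruct either side, so a trigger (e.g. in PL/Python) could store just the delta in a history table and still recover the old version later. A minimal sketch, with illustrative function names:

```python
import difflib

def make_diff(old, new):
    """ndiff delta: retains enough info to reconstruct either version."""
    return list(difflib.ndiff(old.splitlines(keepends=True),
                              new.splitlines(keepends=True)))

def restore_old(delta):
    """Recover the pre-change text from a stored delta."""
    return "".join(difflib.restore(delta, 1))

def restore_new(delta):
    """Recover the post-change text from the same delta."""
    return "".join(difflib.restore(delta, 2))
```

Storing the delta instead of a full copy of the old value is what keeps a 2-character edit down to a few bytes of history, as Karel asked for, though unchanged lines in the delta still cost space and a real implementation might trim those.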