Thread: diff-/patch-functionality for text-type data inside PostgreSQL
Hi! I want to implement a versioning system for text columns of a table inside a PostgreSQL 8.3 database. As some of the changes to a text field might be very small compared to the total text size, I'd prefer storing diffs instead of full previous versions of the text and use a patch-like function whenever I want to actually roll back to a certain version. I know that I could probably handle this quite easily in the application code, but I'd prefer some solution running on the database itself, so that the application wouldn't have to know anything about storing the diffs, instead that process would be handled by a ON UPDATE trigger. So far I have been playing around with PL/PerlU for diff/path functionality, using CPAN modules Text::Diff and Text::Patch, but haven't been too successful, as there seems to be some issue with this mechanism if the text data doesn't contain newlines. Just as an off-topic info, because it's some issue with the CPAN modules, not with PostgreSQL: #!/usr/bin/perl use Text::Patch; use Text::Diff; $src = "foo sdffasd"; $dst = "34asd sdf"; $diff = diff( \$src, \$dst, { STYLE => 'Unified' } ); print $diff . "\n"; $out = patch( $src, $diff, { STYLE => 'Unified' } ); print "Patch successful\n" if $out eq $dst; Running this results in the following output: @@ -1 +1 @@ -foo sdffasd+34asd sdf Hunk #1 failed at line 1. Anyway, has anybody already done something in this direction, preferrably in some way that is purely pl/* and wouldn't require any custom-made C-library? So far I have only found this interesting description of the implementation of the very same functionality here: http://www.ciselant.de/projects/pg_ci_diff/doc.html - but there's no source code supplied for the libpg_ci_diff.so library. Kind regards Markus Computec Media AG Sitz der Gesellschaft und Registergericht: Furth (HRB 8818) Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch Vorsitzender des Aufsichtsrates: Jurg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276
"Markus Wollny" <Markus.Wollny@computec.de> writes: > So far I have been playing around with PL/PerlU for diff/path > functionality, using CPAN modules Text::Diff and Text::Patch, but > haven't been too successful, as there seems to be some issue with this > mechanism if the text data doesn't contain newlines. Almost all diff/patch functions operate line-by-line, so that hardly seems surprising. regards, tom lane
Re: diff-/patch-functionality for text-type data inside PostgreSQL
From
Martijn van Oosterhout
Date:
On Mon, May 04, 2009 at 12:26:13PM +0200, Markus Wollny wrote: > So far I have been playing around with PL/PerlU for diff/path > functionality, using CPAN modules Text::Diff and Text::Patch, but > haven't been too successful, as there seems to be some issue with this > mechanism if the text data doesn't contain newlines. Just as an > off-topic info, because it's some issue with the CPAN modules, not with > PostgreSQL: I've used the Algorithm::Diff module in the past with success. It works on sequences of objects rather than just text but it works well. That means you can diff on word or character level at your choice, and even control what sequences you consider "equal". That said, it doesn't have a patch function but that should be fairly easy to make. You'll need to define your own storage format for the diff though. http://search.cpan.org/~nedkonz/Algorithm-Diff-1.15/lib/Algorithm/Diff.pm Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
Hi! > -----Ursprüngliche Nachricht----- > Von: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Gesendet: Montag, 4. Mai 2009 15:04 > "Markus Wollny" <Markus.Wollny@computec.de> writes: > > So far I have been playing around with PL/PerlU for diff/path > > functionality, using CPAN modules Text::Diff and Text::Patch, but > > haven't been too successful, as there seems to be some > issue with this > > mechanism if the text data doesn't contain newlines. > > Almost all diff/patch functions operate line-by-line, so that > hardly seems surprising. Not so much surprising, no, but I hadn't expected it to fail altogether on entries that just end after one line of text justbecause they lack a newline character - they are a one line text after all, so I assumed that the diff would producea "replace this old line with the new one" type of instruction instead of producing something that patch doesn't seemto be able to process at all. Kind regards Markus Computec Media AG Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818) Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch Vorsitzender des Aufsichtsrates: Jürg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276
Hi! > -----Ursprüngliche Nachricht----- > Von: Martijn van Oosterhout [mailto:kleptog@svana.org] > Gesendet: Montag, 4. Mai 2009 15:30 > I've used the Algorithm::Diff module in the past with > success. It works on sequences of objects rather than just > text but it works well. That means you can diff on word or > character level at your choice, and even control what > sequences you consider "equal". That said, it doesn't have a > patch function but that should be fairly easy to make. You'll > need to define your own storage format for the diff though. > > http://search.cpan.org/~nedkonz/Algorithm-Diff-1.15/lib/Algori > thm/Diff.pm Thank you - I have considered using Algorithm::Diff before, as Text::Diff seems to be based on that and one could, as youdescribe, create even more granular deltas than just line by line comparisons. The latter would however be fully sufficientfor my needs, but I think I'll give Algorithm::Diff a closer look now :) Kind regards Markus Computec Media AG Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818) Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch Vorsitzender des Aufsichtsrates: Jürg Marquard Umsatzsteuer-Identifikationsnummer: DE 812 575 276