Thread: help with data recovery from injected UPDATE

help with data recovery from injected UPDATE

From
Gus Gutoski
Date:
Hi,

I'm a noob who failed to properly sanitize incoming data from the
front end.  As a result, a poor hapless user managed to smuggle in a
malicious UPDATE statement that corrupted every single record in a
70000+ table.  Only 3 fields were corrupted and of those only one is
vital.  But it's REALLY vital.

I don't expect there's anything anyone can do, but I've been advised
that some subscribers to this list are miracle-workers, so it's worth
a shot.

Here's how it happened.  A typical update statement from the front end
has the form
UPDATE collections SET foreign_id=2, coin=50-30, bills=10+20 WHERE
entry_date='2009-09-09';

The hapless user accidentally included TWO minus signs in one entry,
so the statement looked like this:
UPDATE collections SET foreign_id=2, coin=50--30, bills=10+20 WHERE
entry_date='2009-09-09';

(These examples are simplified for the sake of brevity in this message.)

Of course, the double minus sign comments out the rest of the line and
the statement is left dangling, looking for a terminating semicolon.

Now, my front-end happens to be Visual Basic 6.0 (yeah, I know) via
ActiveX Data Objects (ADO).  In particular, the SQL statement is
invoked via the ADO Recordset object's Open() method.  It appears that
this Open() method automagically terminates unfinished statements,
because the above statement *actually executes* in postgres when
invoked form the VB front end.

Naturally then, *every* record in the database has its "foreign_id"
field set to 2 and its "coin" field set to 50.  I *really* need to
recover that "foreign_id" field.  (As its name suggests, that field is
a foreign key into a different table.)

Here's some more info.  As I'm a noob, I don't know what all to
include here -- please ask for more info if you need it.

psql version() returns
PostgreSQL 8.1.5 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
3.4.2 (mingw-special)

OS is Windows XP (I think -- possibly Vista.  I'll check next time I'm
at that machine).

Yours in need,

Gus

Re: help with data recovery from injected UPDATE

From
Tom Lane
Date:
Gus Gutoski <shared.entanglement@gmail.com> writes:
> Naturally then, *every* record in the database has its "foreign_id"
> field set to 2 and its "coin" field set to 50.  I *really* need to
> recover that "foreign_id" field.  (As its name suggests, that field is
> a foreign key into a different table.)

Well, in principle you could roll back that transaction, as long as the
old versions of the rows are still there and haven't been vacuumed away.
This being 8.1, if you haven't turned on autovacuum there is some chance
of that.  Shut down any manual vacuuming setup you may have *NOW*,
and then go trawl the pgsql archives for discussions of data recovery.
You most likely will need to find or write a special-purpose program
to change the xmin/xmax fields on the modified rows.

            regards, tom lane

Re: help with data recovery from injected UPDATE

From
Vick Khera
Date:
On Wed, Jun 10, 2009 at 12:49 PM, Gus
Gutoski<shared.entanglement@gmail.com> wrote:
> Of course, the double minus sign comments out the rest of the line and
> the statement is left dangling, looking for a terminating semicolon.

SQL statements are not terminated with semi-colons.  The semi-colon is
used in the psql shell to indicate that you're done typing your
statement and to execute it.  The API just submits the string you give
it to the executor.

Re: help with data recovery from injected UPDATE

From
Merlin Moncure
Date:
On Wed, Jun 10, 2009 at 12:49 PM, Gus
Gutoski<shared.entanglement@gmail.com> wrote:
> Hi,
>
> I'm a noob who failed to properly sanitize incoming data from the
> front end.  As a result, a poor hapless user managed to smuggle in a
> malicious UPDATE statement that corrupted every single record in a
> 70000+ table.  Only 3 fields were corrupted and of those only one is
> vital.  But it's REALLY vital.
>
> I don't expect there's anything anyone can do, but I've been advised
> that some subscribers to this list are miracle-workers, so it's worth
> a shot.
>
> Here's how it happened.  A typical update statement from the front end
> has the form
> UPDATE collections SET foreign_id=2, coin=50-30, bills=10+20 WHERE
> entry_date='2009-09-09';
>
> The hapless user accidentally included TWO minus signs in one entry,
> so the statement looked like this:
> UPDATE collections SET foreign_id=2, coin=50--30, bills=10+20 WHERE
> entry_date='2009-09-09';
>
> (These examples are simplified for the sake of brevity in this message.)
>
> Of course, the double minus sign comments out the rest of the line and
> the statement is left dangling, looking for a terminating semicolon.
>
> Now, my front-end happens to be Visual Basic 6.0 (yeah, I know) via
> ActiveX Data Objects (ADO).  In particular, the SQL statement is
> invoked via the ADO Recordset object's Open() method.  It appears that
> this Open() method automagically terminates unfinished statements,
> because the above statement *actually executes* in postgres when
> invoked form the VB front end.
>
> Naturally then, *every* record in the database has its "foreign_id"
> field set to 2 and its "coin" field set to 50.  I *really* need to
> recover that "foreign_id" field.  (As its name suggests, that field is
> a foreign key into a different table.)
>
> Here's some more info.  As I'm a noob, I don't know what all to
> include here -- please ask for more info if you need it.
>
> psql version() returns
> PostgreSQL 8.1.5 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
> 3.4.2 (mingw-special)

If you have not done so already, immediately shut down the database,
and make a full filesystem copy of it (two better).  On windows, iirc
this in postgresql/$pgversion/data by default.  Following that you are
in for a tough slog, depending on how proficient you are with manually
setting up the database...

does postgresql 8.1 have last_vacuum time in the pg_stat_all_tables
coiumn? if so, I'd check that to see if vacuum was fired since the
'incident'.

plus, there is no way you are escaping the obligatory 'where are your
backups?'.  :-).   postgresql  8.1 supports pitr archiving.  you can
do continuous backups and restore the database to just before the bad
data.

merlin

Re: help with data recovery from injected UPDATE

From
Gus Gutoski
Date:
Thanks for the replies.

Tom Lane wrote:
> This being 8.1, if you haven't turned on autovacuum there is some chance
> of that.

Unfortunately, autovacuum was on.  I don't recall ever turning it on,
but this database is over two years old; it's possible that I blindly
followed advice from pgAdmin or something way back when.

Merlin Moncure wrote:
> does postgresql 8.1 have last_vacuum time in the pg_stat_all_tables
> coiumn?

I did not see a column called last_vacuum in the pg_stat_all_tables table.

> postgresql  8.1 supports pitr archiving.  you can
> do continuous backups and restore the database to just before the bad
> data.

I'm learning about this now.  I'm cautiously optimistic, as my pg_xlog
directory contains some files whose timestamp is near the time of the
'incident'.

By "backup" do you mean the contents of a pg_dump?  The most recent
dump was two months ago.  I'm worried that my log files might not go
far enough back in time to restore the table from the most recent
dump.

Both Tom's and Merlin's suggestions carry a significant learning
curve.  I'll do what I can in the coming days and post to the list if
anything noteworthy happens.

> plus, there is no way you are escaping the obligatory 'where are your
> backups?'.  :-).

It's a classic story.  I'm volunteering about one day per month for
this project, learning SQL as I go.  Priority was always given to the
"get it working" tasks and never the "make it safe" tasks.  I had/have
grandiose plans to rewrite the whole system properly after I graduate.
 Unfortunately, the inevitable corruption didn't wait that long.

Cheers.

-Gus

Re: help with data recovery from injected UPDATE

From
"Chris Spotts"
Date:
> It's a classic story.  I'm volunteering about one day per month for
> this project, learning SQL as I go.  Priority was always given to the
> "get it working" tasks and never the "make it safe" tasks.  I had/have
> grandiose plans to rewrite the whole system properly after I graduate.
>  Unfortunately, the inevitable corruption didn't wait that long.
As you're learning, it sounds like parametrized queries might have saved you
from the sql injection that caused this.

Chris
> Cheers.
>
> -Gus


Re: help with data recovery from injected UPDATE

From
Scott Marlowe
Date:
On Thu, Jun 11, 2009 at 1:32 PM, Chris Spotts<rfusca@gmail.com> wrote:
>
>> It's a classic story.  I'm volunteering about one day per month for
>> this project, learning SQL as I go.  Priority was always given to the
>> "get it working" tasks and never the "make it safe" tasks.  I had/have
>> grandiose plans to rewrite the whole system properly after I graduate.
>>  Unfortunately, the inevitable corruption didn't wait that long.
> As you're learning, it sounds like parametrized queries might have saved you
> from the sql injection that caused this.

Very true, and always a good idea.  However, OPs true failure here is
on the backup front.  Without recent, reliable backups, on another
machine / media / datacenter etc. is the only way your data can be
truly safe.

Re: help with data recovery from injected UPDATE

From
"Chris Spotts"
Date:
> >> It's a classic story.  I'm volunteering about one day per month for
> >> this project, learning SQL as I go.  Priority was always given to
> the
> >> "get it working" tasks and never the "make it safe" tasks.  I
> had/have
> >> grandiose plans to rewrite the whole system properly after I
> graduate.
> >>  Unfortunately, the inevitable corruption didn't wait that long.
> > As you're learning, it sounds like parametrized queries might have
> saved you
> > from the sql injection that caused this.
>
> Very true, and always a good idea.  However, OPs true failure here is
> on the backup front.  Without recent, reliable backups, on another
> machine / media / datacenter etc. is the only way your data can be
> truly safe.
[Spotts, Christopher]
Oh absolutely. Regardless of anything you do on the functional aspect, you'd
still need backups.  I was just saying that if you're eventually going to
redesign (like mentioned), a nudge towards parameterized queries doesn't
hurt.


Re: help with data recovery from injected UPDATE

From
Gus Gutoski
Date:
Merlin Moncure wrote:
>> postgresql  8.1 supports pitr archiving.  you can
>> do continuous backups and restore the database to just before the bad
>> data.

I tried using point-in-time-recovery to restore the state of the
database immediately before the corruption.  It didn't work, but it
was quite a show.  Here's the story.

After much wailing and gnashing of teeth, I got postmaster to execute
a recovery (so that recovery.conf was renamed to recovery.done).  But
the database was completely screwed after the recovery.  Here's an
example of the kind of output I saw while executing a simple SELECT
statement:

postgres=# SELECT entry_date,machine_id,coin FROM collections WHERE
entry_date::date>'2009-06-06' ORDER BY entry_date;
WARNING:  could not write block 32 of 1663/10793/2608
DETAIL:  Multiple failures --- write error may be permanent.
ERROR:  xlog flush request 0/4DC6CC88 is not satisfied --- flushed
only to 0/4DC06180
CONTEXT:  writing block 32 of relation 1663/10793/2608

Here's the recovery procedure I followed:
0. Shortly after the corruption on June 9, 2009, I shut down the
server and backed up the entire data directory.  The recovery
procedure described herein begins with this file system backup.
1. The most recent non-corrupted snapshot of the database is a pg_dump
from May 13, 2009.  (I don't have any file system backups from before
the corruption.)  I restored the database to this snapshot by
executing the commands from the May 13 pg_dump on the June 9 corrupted
data.
2.  I removed the files in the pg_xlog directory and replaced them
with the contents of pg_xlog from the corrupted file system backup
from June 9.
3.  I modified the sample recovery.conf file so as to replay all the
transactions right up until the point of corruption.  The hope was
that postmaster would somehow know to begin replaying transactions at
the appropriate point from the May 13 state.

I guess it's too much to ask postmaster to do a PITR from a pg_dump
backup, as opposed to a file system backup.  Bummer.

By the way, I can reliably get postmaster to hang during startup if I
manually create the pg_xlog\RECOVERYHISTORY and pg_xlog\RECOVERYXLOG
directories (even with correct permissions) before starting up the
server.  When I say "hang", I mean that
(i) any attempt to connect is met with the response "FATAL: the
database system is starting up", and
(ii) "pg_ctl stop" cannot be used to shut down the server -- I have to
use "pg_ctl kill"

Anyway, I'm going to try implementing Tom's suggestion of writing a
program to modify the xmin/xmax values.  I expect this approach won't
work, as autovacuum was on at the time of corruption.  However, the
files in the data directory are quite large -- many times larger than
a pg_dump.  The database sees such a small amount of traffic that it's
possible that even vacuum decided not to bother reclaiming the unused
storage created by the corrupting transaction (?).  Here's hoping.

-Gus

On Thu, Jun 11, 2009 at 1:43 PM, Gus Gutoski wrote:
> Thanks for the replies.
>
> Tom Lane wrote:
>> This being 8.1, if you haven't turned on autovacuum there is some chance
>> of that.
>
> Unfortunately, autovacuum was on.  I don't recall ever turning it on,
> but this database is over two years old; it's possible that I blindly
> followed advice from pgAdmin or something way back when.
>
> Merlin Moncure wrote:
>> does postgresql 8.1 have last_vacuum time in the pg_stat_all_tables
>> coiumn?
>
> I did not see a column called last_vacuum in the pg_stat_all_tables table.
>
>> postgresql  8.1 supports pitr archiving.  you can
>> do continuous backups and restore the database to just before the bad
>> data.
>
> I'm learning about this now.  I'm cautiously optimistic, as my pg_xlog
> directory contains some files whose timestamp is near the time of the
> 'incident'.
>
> By "backup" do you mean the contents of a pg_dump?  The most recent
> dump was two months ago.  I'm worried that my log files might not go
> far enough back in time to restore the table from the most recent
> dump.
>
> Both Tom's and Merlin's suggestions carry a significant learning
> curve.  I'll do what I can in the coming days and post to the list if
> anything noteworthy happens.
>
>> plus, there is no way you are escaping the obligatory 'where are your
>> backups?'.  :-).
>
> It's a classic story.  I'm volunteering about one day per month for
> this project, learning SQL as I go.  Priority was always given to the
> "get it working" tasks and never the "make it safe" tasks.  I had/have
> grandiose plans to rewrite the whole system properly after I graduate.
>  Unfortunately, the inevitable corruption didn't wait that long.
>
> Cheers.
>
> -Gus
>

Re: help with data recovery from injected UPDATE

From
Craig Ringer
Date:
On Sun, 2009-06-14 at 10:32 -0400, Gus Gutoski wrote:

> 0. Shortly after the corruption on June 9, 2009, I shut down the
> server and backed up the entire data directory.  The recovery
> procedure described herein begins with this file system backup.
> 1. The most recent non-corrupted snapshot of the database is a pg_dump
> from May 13, 2009.  (I don't have any file system backups from before
> the corruption.)  I restored the database to this snapshot by
> executing the commands from the May 13 pg_dump on the June 9 corrupted
> data.
> 2.  I removed the files in the pg_xlog directory and replaced them
> with the contents of pg_xlog from the corrupted file system backup
> from June 9.

That really, REALLY won't work. It just doesn't work like that.

You're trying to use a block-level restore process (the transaction
logs) with a base backup that's at a much higher level, and isn't
block-for-block the same as the old database files. Additionally, you're
trying to do so over a known corrupt database.

The only thing that confuses me is how you convinced Pg to run recovery
using the xlog files you put in place. It should've refused, surely?

> I guess it's too much to ask postmaster to do a PITR from a pg_dump
> backup, as opposed to a file system backup.  Bummer.

Yep. No hope.

--
Craig Ringer


Re: help with data recovery from injected UPDATE

From
Merlin Moncure
Date:
On Sun, Jun 14, 2009 at 10:32 AM, Gus
Gutoski<shared.entanglement@gmail.com> wrote:
> Merlin Moncure wrote:
>>> postgresql  8.1 supports pitr archiving.  you can
>>> do continuous backups and restore the database to just before the bad
>>> data.
>
> I tried using point-in-time-recovery to restore the state of the
> database immediately before the corruption.  It didn't work, but it
> was quite a show.  Here's the story.

yes, I'm sorry...you  misunderstood my suggestion.  the database
supports continuous *archiving* from which a recovery can be made.  No
archives, no recovery :-).  Here is what I'd do if I in your shoes:

From a copy of your filesystem backup, set up the database to run and
attempt pg_resetxlog before starting it up.  Log in and see if your
data is there...if it is, you hit the jackpot...if not...the next step
is to determine if the data is actually _in_ the table.  There are a
couple of ways to do this..tinkering around with transaction
visibility is one...simply dumping the heap file for the table and
inspecting it is another.

merlin

Re: help with data recovery from injected UPDATE

From
Merlin Moncure
Date:
On Tue, Jun 23, 2009 at 2:05 PM, Gus
Gutoski<shared.entanglement@gmail.com> wrote:
> Success, of sorts.  I was able to retrieve 90% the corrupted data by
> dumping the heap file.  Many thanks to those who replied with helpful
> suggestions.
>
> If you're interested in detail then read on.  Otherwise, don't bother.
>
> The data was still in the table -- I could see it using a hex editor.
> This surprised me, as autovacuum was on at the time of corruption.
> Perhaps vacuum didn't bother reclaiming storage space because the
> database is relatively small and low-traffic.
>
> The attempt at point-in-time-recovery via transaction logs was doomed
> to failure, as I do not have a file system backup from before the
> corruption.  Still, I tried Merlin's trick with pg_resetxlog to no
> avail.
>
> I tried using the pg_filedump utility to dump the heap file, but it
> wasn't what I needed.  I later discovered a souped-up utility called
> pg_dumpdata:
> http://blogs.sun.com/avalon/entry/recovering_postgres_data
> While this utility still didn't provide everything I needed, it was a
> sufficient starting point.
> (It's written for postgres 8.2, whereas I'm running 8.1 -- it
> segfaulted when I first ran it on my heap file.)
>
> I sifted through the postgres source tree looking for the code that
> reads/writes the heap files, but I couldn't make head or tail of
> anything.  In the end, it was easier to reverse engineer the format
> for user data and use the pg_dumpdata source as a base to get me to
> the "items" in the heap files.  The reason that I couldn't get 100% of
> the lost data is that the heap tuple header that points to the user
> data sometimes landed me at a random point in the middle of the item,
> rather than at the beginning.  At this point I gave up trying to get
> the last 10% of the data -- I had run out of time and patience.
>
> Having partially learned my lesson, I've set up a utility to run
> pg_dump each day.  After I've taken a break, I'll look into a
> reasonabe set-up for file system backups with point-in-time recovery.
> But really, what are the chances anything like this will ever happen
> again? ;-)

Regular scheduled pg_dump is often enough :-)

merlin

Re: help with data recovery from injected UPDATE

From
Gus Gutoski
Date:
Success, of sorts.  I was able to retrieve 90% the corrupted data by
dumping the heap file.  Many thanks to those who replied with helpful
suggestions.

If you're interested in detail then read on.  Otherwise, don't bother.

The data was still in the table -- I could see it using a hex editor.
This surprised me, as autovacuum was on at the time of corruption.
Perhaps vacuum didn't bother reclaiming storage space because the
database is relatively small and low-traffic.

The attempt at point-in-time-recovery via transaction logs was doomed
to failure, as I do not have a file system backup from before the
corruption.  Still, I tried Merlin's trick with pg_resetxlog to no
avail.

I tried using the pg_filedump utility to dump the heap file, but it
wasn't what I needed.  I later discovered a souped-up utility called
pg_dumpdata:
http://blogs.sun.com/avalon/entry/recovering_postgres_data
While this utility still didn't provide everything I needed, it was a
sufficient starting point.
(It's written for postgres 8.2, whereas I'm running 8.1 -- it
segfaulted when I first ran it on my heap file.)

I sifted through the postgres source tree looking for the code that
reads/writes the heap files, but I couldn't make head or tail of
anything.  In the end, it was easier to reverse engineer the format
for user data and use the pg_dumpdata source as a base to get me to
the "items" in the heap files.  The reason that I couldn't get 100% of
the lost data is that the heap tuple header that points to the user
data sometimes landed me at a random point in the middle of the item,
rather than at the beginning.  At this point I gave up trying to get
the last 10% of the data -- I had run out of time and patience.

Having partially learned my lesson, I've set up a utility to run
pg_dump each day.  After I've taken a break, I'll look into a
reasonabe set-up for file system backups with point-in-time recovery.
But really, what are the chances anything like this will ever happen
again? ;-)

-Gus

On Mon, Jun 15, 2009 at 9:02 AM, Merlin Moncure<mmoncure@gmail.com> wrote:
> On Sun, Jun 14, 2009 at 10:32 AM, Gus
> Gutoski<shared.entanglement@gmail.com> wrote:
>> Merlin Moncure wrote:
>>>> postgresql  8.1 supports pitr archiving.  you can
>>>> do continuous backups and restore the database to just before the bad
>>>> data.
>>
>> I tried using point-in-time-recovery to restore the state of the
>> database immediately before the corruption.  It didn't work, but it
>> was quite a show.  Here's the story.
>
> yes, I'm sorry...you  misunderstood my suggestion.  the database
> supports continuous *archiving* from which a recovery can be made.  No
> archives, no recovery :-).  Here is what I'd do if I in your shoes:
>
> From a copy of your filesystem backup, set up the database to run and
> attempt pg_resetxlog before starting it up.  Log in and see if your
> data is there...if it is, you hit the jackpot...if not...the next step
> is to determine if the data is actually _in_ the table.  There are a
> couple of ways to do this..tinkering around with transaction
> visibility is one...simply dumping the heap file for the table and
> inspecting it is another.
>
> merlin