Thread: help with data recovery from injected UPDATE
Hi,

I'm a noob who failed to properly sanitize incoming data from the front end. As a result, a poor hapless user managed to smuggle in a malicious UPDATE statement that corrupted every single record in a 70,000+ row table. Only 3 fields were corrupted, and of those only one is vital. But it's REALLY vital.

I don't expect there's anything anyone can do, but I've been advised that some subscribers to this list are miracle-workers, so it's worth a shot.

Here's how it happened. A typical update statement from the front end has the form

UPDATE collections SET foreign_id=2, coin=50-30, bills=10+20 WHERE entry_date='2009-09-09';

The hapless user accidentally included TWO minus signs in one entry, so the statement looked like this:

UPDATE collections SET foreign_id=2, coin=50--30, bills=10+20 WHERE entry_date='2009-09-09';

(These examples are simplified for the sake of brevity in this message.)

Of course, the double minus sign comments out the rest of the line, and the statement is left dangling, looking for a terminating semicolon.

Now, my front end happens to be Visual Basic 6.0 (yeah, I know) via ActiveX Data Objects (ADO). In particular, the SQL statement is invoked via the ADO Recordset object's Open() method. It appears that this Open() method automagically terminates unfinished statements, because the above statement *actually executes* in postgres when invoked from the VB front end.

Naturally then, *every* record in the database has its "foreign_id" field set to 2 and its "coin" field set to 50. I *really* need to recover that "foreign_id" field. (As its name suggests, that field is a foreign key into a different table.)

Here's some more info. As I'm a noob, I don't know what all to include here -- please ask for more info if you need it.

In psql, SELECT version() returns:
PostgreSQL 8.1.5 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special)

OS is Windows XP (I think -- possibly Vista. I'll check next time I'm at that machine).

Yours in need,
Gus
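[To spell out the failure mode described above: once the "--" starts a comment, the rest of the line -- including the WHERE clause and the terminating semicolon -- is swallowed with it, which is why every row was hit:]

-- What the user's entry produced (everything after -- is a comment):
UPDATE collections SET foreign_id=2, coin=50--30, bills=10+20 WHERE entry_date='2009-09-09';

-- What effectively executes: no WHERE clause, so all 70,000+ rows match.
UPDATE collections SET foreign_id=2, coin=50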
Gus Gutoski <shared.entanglement@gmail.com> writes:
> Naturally then, *every* record in the database has its "foreign_id"
> field set to 2 and its "coin" field set to 50. I *really* need to
> recover that "foreign_id" field. (As its name suggests, that field is
> a foreign key into a different table.)

Well, in principle you could roll back that transaction, as long as the old versions of the rows are still there and haven't been vacuumed away. This being 8.1, if you haven't turned on autovacuum there is some chance of that. Shut down any manual vacuuming setup you may have *NOW*, and then go trawl the pgsql archives for discussions of data recovery. You most likely will need to find or write a special-purpose program to change the xmin/xmax fields on the modified rows.

 regards, tom lane
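[A concrete way to see the MVCC bookkeeping Tom refers to: every row carries hidden system columns, including xmin (the transaction that created this row version) and xmax (the transaction that deleted or superseded it). A query like the following -- table and column names taken from the thread -- shows them for the surviving rows; the dead pre-UPDATE row versions are what a recovery tool would have to resurrect:]

-- Inspect MVCC system columns alongside the (now corrupted) user data.
SELECT xmin, xmax, ctid, foreign_id, coin
FROM collections
LIMIT 5;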
On Wed, Jun 10, 2009 at 12:49 PM, Gus Gutoski <shared.entanglement@gmail.com> wrote:
> Of course, the double minus sign comments out the rest of the line,
> and the statement is left dangling, looking for a terminating semicolon.

SQL statements are not terminated with semicolons. The semicolon is used in the psql shell to indicate that you're done typing your statement and want to execute it. The API just submits the string you give it to the executor.
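[One way to convince yourself of this -- the database name here is a placeholder: psql's -c option submits the whole string as a single query, and it executes fine with no semicolon at all. Obviously don't run this against real data:]

psql -d somedb -c "UPDATE collections SET foreign_id=2, coin=50"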
On Wed, Jun 10, 2009 at 12:49 PM, Gus Gutoski <shared.entanglement@gmail.com> wrote:
> [...]
> In psql, SELECT version() returns:
> PostgreSQL 8.1.5 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
> 3.4.2 (mingw-special)

If you have not done so already, immediately shut down the database and make a full filesystem copy of it (two copies are better). On Windows, iirc, this is in postgresql/$pgversion/data by default. Following that you are in for a tough slog, depending on how proficient you are with manually setting up the database...

does postgresql 8.1 have last_vacuum time in the pg_stat_all_tables column? if so, I'd check that to see if vacuum has fired since the 'incident'.

plus, there is no way you are escaping the obligatory 'where are your backups?'. :-)

postgresql 8.1 supports pitr archiving. you can do continuous backups and restore the database to just before the bad data.

merlin
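[For anyone reading along, a rough sketch of what "pitr archiving" involves on 8.1 -- paths are placeholders, and the "Continuous Archiving" chapter of the docs is the authority: set archive_command in postgresql.conf so each filled WAL segment is copied somewhere safe, then take a filesystem-level base backup bracketed by pg_start_backup()/pg_stop_backup():]

# postgresql.conf (Windows-flavoured sketch; C:\wal_archive is a placeholder)
archive_command = 'copy "%p" "C:\\wal_archive\\%f"'

-- then, in psql, bracket a filesystem copy of the data directory:
SELECT pg_start_backup('weekly base backup');
-- ... copy the data directory to backup media ...
SELECT pg_stop_backup();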
Thanks for the replies.

Tom Lane wrote:
> This being 8.1, if you haven't turned on autovacuum there is some chance
> of that.

Unfortunately, autovacuum was on. I don't recall ever turning it on, but this database is over two years old; it's possible that I blindly followed advice from pgAdmin or something way back when.

Merlin Moncure wrote:
> does postgresql 8.1 have last_vacuum time in the pg_stat_all_tables
> column?

I did not see a column called last_vacuum in pg_stat_all_tables.

> postgresql 8.1 supports pitr archiving. you can
> do continuous backups and restore the database to just before the bad
> data.

I'm learning about this now. I'm cautiously optimistic, as my pg_xlog directory contains some files whose timestamp is near the time of the 'incident'.

By "backup" do you mean the contents of a pg_dump? The most recent dump was two months ago. I'm worried that my log files might not go far enough back in time to restore the table from the most recent dump.

Both Tom's and Merlin's suggestions carry a significant learning curve. I'll do what I can in the coming days and post to the list if anything noteworthy happens.

> plus, there is no way you are escaping the obligatory 'where are your
> backups?'. :-)

It's a classic story. I'm volunteering about one day per month for this project, learning SQL as I go. Priority was always given to the "get it working" tasks and never the "make it safe" tasks. I had/have grandiose plans to rewrite the whole system properly after I graduate. Unfortunately, the inevitable corruption didn't wait that long.

Cheers.

-Gus
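[A side note, offered as an assumption worth verifying against the release notes: the last_vacuum / last_autovacuum columns of pg_stat_all_tables only appeared in 8.2, which would explain their absence here. On 8.2 or later, the check Merlin suggests would look roughly like:]

SELECT relname, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'collections';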
> It's a classic story. I'm volunteering about one day per month for
> this project, learning SQL as I go. Priority was always given to the
> "get it working" tasks and never the "make it safe" tasks. I had/have
> grandiose plans to rewrite the whole system properly after I graduate.
> Unfortunately, the inevitable corruption didn't wait that long.

As you're learning, it sounds like parameterized queries might have saved you from the SQL injection that caused this.

Chris
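[To make the suggestion concrete: the fix is to keep user input out of the statement text entirely and hand it over as bound parameters -- in ADO that means a Command object with "?" placeholders rather than string concatenation into Recordset.Open. The same separation can be illustrated at the SQL level with a prepared statement; a hedged sketch, not the ADO API itself:]

-- The statement's shape is fixed once; user input arrives only as values.
PREPARE upd(integer, integer, integer, date) AS
  UPDATE collections SET foreign_id = $1, coin = $2, bills = $3
  WHERE entry_date = $4;

-- An entry like '50--30' now fails to parse as an integer instead of
-- silently rewriting the statement.
EXECUTE upd(2, 20, 30, '2009-09-09');

[The protection only holds when the values are bound by the driver; building the EXECUTE string by concatenation would reintroduce the same hole.]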
On Thu, Jun 11, 2009 at 1:32 PM, Chris Spotts <rfusca@gmail.com> wrote:
>> It's a classic story. [...]
> As you're learning, it sounds like parameterized queries might have
> saved you from the SQL injection that caused this.

Very true, and always a good idea. However, the OP's true failure here is on the backup front. Recent, reliable backups on another machine / media / datacenter etc. are the only way your data can be truly safe.
>>> It's a classic story. [...]
>> As you're learning, it sounds like parameterized queries might have
>> saved you from the SQL injection that caused this.
>
> Very true, and always a good idea. However, the OP's true failure here
> is on the backup front. Recent, reliable backups on another machine /
> media / datacenter etc. are the only way your data can be truly safe.

[Spotts, Christopher] Oh absolutely. Regardless of anything you do on the functional aspect, you'd still need backups. I was just saying that if you're eventually going to redesign (like mentioned), a nudge towards parameterized queries doesn't hurt.
Merlin Moncure wrote:
>> postgresql 8.1 supports pitr archiving. you can
>> do continuous backups and restore the database to just before the bad
>> data.

I tried using point-in-time recovery to restore the state of the database immediately before the corruption. It didn't work, but it was quite a show. Here's the story.

After much wailing and gnashing of teeth, I got postmaster to execute a recovery (so that recovery.conf was renamed to recovery.done). But the database was completely screwed after the recovery. Here's an example of the kind of output I saw while executing a simple SELECT statement:

postgres=# SELECT entry_date,machine_id,coin FROM collections WHERE entry_date::date>'2009-06-06' ORDER BY entry_date;
WARNING: could not write block 32 of 1663/10793/2608
DETAIL: Multiple failures --- write error may be permanent.
ERROR: xlog flush request 0/4DC6CC88 is not satisfied --- flushed only to 0/4DC06180
CONTEXT: writing block 32 of relation 1663/10793/2608

Here's the recovery procedure I followed:

0. Shortly after the corruption on June 9, 2009, I shut down the server and backed up the entire data directory. The recovery procedure described herein begins with this file system backup.

1. The most recent non-corrupted snapshot of the database is a pg_dump from May 13, 2009. (I don't have any file system backups from before the corruption.) I restored the database to this snapshot by executing the commands from the May 13 pg_dump on the June 9 corrupted data.

2. I removed the files in the pg_xlog directory and replaced them with the contents of pg_xlog from the corrupted file system backup from June 9.

3. I modified the sample recovery.conf file so as to replay all the transactions right up until the point of corruption. The hope was that postmaster would somehow know to begin replaying transactions at the appropriate point from the May 13 state.

I guess it's too much to ask postmaster to do a PITR from a pg_dump backup, as opposed to a file system backup. Bummer.

By the way, I can reliably get postmaster to hang during startup if I manually create the pg_xlog\RECOVERYHISTORY and pg_xlog\RECOVERYXLOG directories (even with correct permissions) before starting up the server. When I say "hang", I mean that (i) any attempt to connect is met with the response "FATAL: the database system is starting up", and (ii) "pg_ctl stop" cannot be used to shut down the server -- I have to use "pg_ctl kill".

Anyway, I'm going to try implementing Tom's suggestion of writing a program to modify the xmin/xmax values. I expect this approach won't work, as autovacuum was on at the time of corruption. However, the files in the data directory are quite large -- many times larger than a pg_dump. The database sees such a small amount of traffic that it's possible that even vacuum decided not to bother reclaiming the unused storage created by the corrupting transaction (?). Here's hoping.

-Gus

On Thu, Jun 11, 2009 at 1:43 PM, Gus Gutoski wrote:
> Thanks for the replies.
> [...]
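[For comparison, here is roughly what a real PITR run would have required -- all paths and the timestamp are placeholders: a filesystem base backup taken while WAL archiving was active, the archived segments, and a recovery.conf along these lines:]

# recovery.conf -- sketch only; useless without a proper base backup
# and a WAL archive, which is exactly what was missing here.
restore_command = 'copy "C:\\wal_archive\\%f" "%p"'
recovery_target_time = '2009-06-09 12:00:00'   # just before the bad UPDATE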
On Sun, 2009-06-14 at 10:32 -0400, Gus Gutoski wrote:
> 0. Shortly after the corruption on June 9, 2009, I shut down the
> server and backed up the entire data directory. The recovery
> procedure described herein begins with this file system backup.
> 1. The most recent non-corrupted snapshot of the database is a pg_dump
> from May 13, 2009. (I don't have any file system backups from before
> the corruption.) I restored the database to this snapshot by
> executing the commands from the May 13 pg_dump on the June 9 corrupted
> data.
> 2. I removed the files in the pg_xlog directory and replaced them
> with the contents of pg_xlog from the corrupted file system backup
> from June 9.

That really, REALLY won't work. It just doesn't work like that. You're trying to use a block-level restore process (the transaction logs) with a base backup that's at a much higher level and isn't block-for-block the same as the old database files. Additionally, you're trying to do so over a known corrupt database.

The only thing that confuses me is how you convinced Pg to run recovery using the xlog files you put in place. It should've refused, surely?

> I guess it's too much to ask postmaster to do a PITR from a pg_dump
> backup, as opposed to a file system backup. Bummer.

Yep. No hope.

--
Craig Ringer
On Sun, Jun 14, 2009 at 10:32 AM, Gus Gutoski <shared.entanglement@gmail.com> wrote:
> Merlin Moncure wrote:
>>> postgresql 8.1 supports pitr archiving. you can
>>> do continuous backups and restore the database to just before the bad
>>> data.
>
> I tried using point-in-time recovery to restore the state of the
> database immediately before the corruption. It didn't work, but it
> was quite a show. Here's the story.

yes, I'm sorry... you misunderstood my suggestion. the database supports continuous *archiving* from which a recovery can be made. No archives, no recovery :-). Here is what I'd do if I were in your shoes:

From a copy of your filesystem backup, set up the database to run and attempt pg_resetxlog before starting it up. Log in and see if your data is there... if it is, you hit the jackpot... if not, the next step is to determine if the data is actually _in_ the table. There are a couple of ways to do this: tinkering around with transaction visibility is one; simply dumping the heap file for the table and inspecting it is another.

merlin
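[To locate the heap file Merlin means: a table's data lives in the file <data directory>/base/<database OID>/<relfilenode>, and both numbers can be read out of the catalogs before shutting the copy down (values returned here would be specific to the installation):]

SELECT oid FROM pg_database WHERE datname = current_database();
SELECT relfilenode FROM pg_class WHERE relname = 'collections';
-- heap file: <data directory>\base\<oid>\<relfilenode>

[pg_resetxlog itself is run against the data directory of the copy, e.g. pg_resetxlog "<path to copied data directory>" -- and it is destructive, so never run it against the only copy you have.]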
On Tue, Jun 23, 2009 at 2:05 PM, Gus Gutoski <shared.entanglement@gmail.com> wrote:
> Success, of sorts. I was able to retrieve 90% of the corrupted data by
> dumping the heap file. Many thanks to those who replied with helpful
> suggestions.
> [...]
> Having partially learned my lesson, I've set up a utility to run
> pg_dump each day. After I've taken a break, I'll look into a
> reasonable set-up for file system backups with point-in-time recovery.
> But really, what are the chances anything like this will ever happen
> again? ;-)

Regular scheduled pg_dump is often enough :-)

merlin
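[A minimal version of "regular scheduled pg_dump" on Windows might look like this -- database name, user, and path are placeholders; -F c writes pg_dump's compressed custom format, restorable with pg_restore:]

pg_dump -U postgres -F c -f "C:\backups\collections_db.dump" collections_db

[scheduled daily with the Windows Task Scheduler (or cron elsewhere), and ideally copied to a second machine.]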
Success, of sorts. I was able to retrieve 90% of the corrupted data by dumping the heap file. Many thanks to those who replied with helpful suggestions.

If you're interested in detail then read on. Otherwise, don't bother.

The data was still in the table -- I could see it using a hex editor. This surprised me, as autovacuum was on at the time of corruption. Perhaps vacuum didn't bother reclaiming storage space because the database is relatively small and low-traffic.

The attempt at point-in-time recovery via transaction logs was doomed to failure, as I do not have a file system backup from before the corruption. Still, I tried Merlin's trick with pg_resetxlog, to no avail.

I tried using the pg_filedump utility to dump the heap file, but it wasn't what I needed. I later discovered a souped-up utility called pg_dumpdata:
http://blogs.sun.com/avalon/entry/recovering_postgres_data
While this utility still didn't provide everything I needed, it was a sufficient starting point. (It's written for postgres 8.2, whereas I'm running 8.1 -- it segfaulted when I first ran it on my heap file.)

I sifted through the postgres source tree looking for the code that reads/writes the heap files, but I couldn't make head or tail of anything. In the end, it was easier to reverse engineer the format for user data and use the pg_dumpdata source as a base to get me to the "items" in the heap files. The reason that I couldn't get 100% of the lost data is that the heap tuple header that points to the user data sometimes landed me at a random point in the middle of the item, rather than at the beginning. At that point I gave up trying to get the last 10% of the data -- I had run out of time and patience.

Having partially learned my lesson, I've set up a utility to run pg_dump each day. After I've taken a break, I'll look into a reasonable set-up for file system backups with point-in-time recovery. But really, what are the chances anything like this will ever happen again? ;-)

-Gus

On Mon, Jun 15, 2009 at 9:02 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> [...]
> From a copy of your filesystem backup, set up the database to run and
> attempt pg_resetxlog before starting it up. Log in and see if your
> data is there... if it is, you hit the jackpot... if not, the next step
> is to determine if the data is actually _in_ the table. There are a
> couple of ways to do this: tinkering around with transaction
> visibility is one; simply dumping the heap file for the table and
> inspecting it is another.
>
> merlin
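[For anyone retracing these steps: pg_filedump is typically pointed straight at the raw heap file, and its -i and -f flags print interpreted item details and a formatted block dump. Invocation from memory -- check the tool's own help; the relfilenode here is a placeholder:]

pg_filedump -i -f base/10793/16384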