Thread: Recovering data via raw table and field separators

Recovering data via raw table and field separators

From
"John Wells"
Date:
Guys,

We had a bit of a misfortunate communication breakdown here at work,
which led to a particular database not being backed up. Before we
recognized this problem, and entire database table was deleted.

I immediately copied the pgdata directory and have been able to find
the file that represents the data that was blown away. Fortunately, I
only really need to restore two pieces of data from the table...the id
field (primary key) and the employee_number field. Once I have this
data, I can re-populate and all my relationships will be honored.

I've been looking through the records with a hex editor, but the
unfortunate thing is that I either don't see consistency with field
separators or I'm overlooking them.

The table definition looks like this:

                                           Table "public.users"
          Column           |            Type             |
        Modifiers
---------------------------+-----------------------------+----------------------------------------------------
 id                        | integer                     | not null
default nextval('users_id_seq'::regclass)
 login                     | character varying(255)      |
 email                     | character varying(255)      |
 crypted_password          | character varying(40)       |
 salt                      | character varying(40)       |
 created_at                | timestamp without time zone |
 updated_at                | timestamp without time zone |
 remember_token            | character varying(255)      |
 remember_token_expires_at | timestamp without time zone |
 manager_id                | integer                     |
 employee_number           | integer                     |
 classification_id         | integer                     |
 name                      | character varying(255)      |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)

Can anyone guide me in how I might parse this out? There has to be a
way...I think I just need a helpful push ;)

Thanks!
John

Re: Recovering data via raw table and field separators

From
Alvaro Herrera
Date:
John Wells wrote:

> I've been looking through the records with a hex editor, but the
> unfortunate thing is that I either don't see consistency with field
> separators or I'm overlooking them.

There are no field separators.  Perhaps you could extract some useful
info with pg_filedump, which you can grab at
http://sources.redhat.com/rhdb


--
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"

Re: Recovering data via raw table and field separators

From
"John Wells"
Date:
On 12/4/07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> There are no field separators.  Perhaps you could extract some useful
> info with pg_filedump, which you can grab at
> http://sources.redhat.com/rhdb

So is it simply field width? Can one count the number of bytes based
on native datatype length and determine field start/end?

Thanks!
John

Re: Recovering data via raw table and field separators

From
Alvaro Herrera
Date:
John Wells wrote:
> On 12/4/07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > There are no field separators.  Perhaps you could extract some useful
> > info with pg_filedump, which you can grab at
> > http://sources.redhat.com/rhdb
>
> So is it simply field width? Can one count the number of bytes based
> on native datatype length and determine field start/end?

Yes.  For variable length types, there is a 4-byte length word at the
start of the field (unless you are using 8.3 which introduces more
compact representations in some cases).

--
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"If it wasn't for my companion, I believe I'd be having
the time of my life"  (John Dunbar)

Re: Recovering data via raw table and field separators

From
Martijn van Oosterhout
Date:
On Tue, Dec 04, 2007 at 03:38:16PM -0300, Alvaro Herrera wrote:
> > So is it simply field width? Can one count the number of bytes based
> > on native datatype length and determine field start/end?
>
> Yes.  For variable length types, there is a 4-byte length word at the
> start of the field (unless you are using 8.3 which introduces more
> compact representations in some cases).

And NULLs are skipped entirely. They are represented in the null-bitmap
at the beginning of the tuple.

What sometimes works is creating a new table with the exact same
structure, shutting down the postmaster and copying the old table over
the new one. If it's the same cluster and the clog/xlog are still there
it might work.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Attachment

Re: Recovering data via raw table and field separators

From
Alvaro Herrera
Date:
John Wells wrote:
> On 12/4/07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > John Wells wrote:
> > > On 12/4/07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > > There are no field separators.  Perhaps you could extract some useful
> > > > info with pg_filedump, which you can grab at
> > > > http://sources.redhat.com/rhdb
> > >
> > > So is it simply field width? Can one count the number of bytes based
> > > on native datatype length and determine field start/end?
> >
> > Yes.  For variable length types, there is a 4-byte length word at the
> > start of the field (unless you are using 8.3 which introduces more
> > compact representations in some cases).
>
> Oh fun...what about record separators? I think I could live with doing
> it by name and id. Since name is the last field in the table, and id
> is the first, one could assume that it looks something like this:
>
> 1rst record name | record separator | 2nd record id | etc, etc
>
> If I could split on that record separator I might be able to get what I'm after.

There are no record separators either.  Records are stored following
offsets which are stored in a fixed-size array at the start of each
page, called "line pointers".  Have a look at what pg_filedump shows you
for interpreted output.

Maybe you can get away with your idea, keeping in mind that record
separators are just not there.

--
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)

Re: Recovering data via raw table and field separators

From
"John Wells"
Date:
On 12/4/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Tue, Dec 04, 2007 at 03:38:16PM -0300, Alvaro Herrera wrote:
> > > So is it simply field width? Can one count the number of bytes based
> > > on native datatype length and determine field start/end?
> >
> > Yes.  For variable length types, there is a 4-byte length word at the
> > start of the field (unless you are using 8.3 which introduces more
> > compact representations in some cases).
>
> And NULLs are skipped entirely. They are represented in the null-bitmap
> at the beginning of the tuple.
>
> What sometimes works is creating a new table with the exact same
> structure, shutting down the postmaster and copying the old table over
> the new one. If it's the same cluster and the clog/xlog are still there
> it might work.
>
> Have a nice day,

Martijn,

Wow...interesting idea...but to clarify, I copied the table file
*after* the delete was run on the table. Although the data appears to
still be there, wouldn't they be marked as deleted in some way and not
appear in the new table even if the copy worked?

Re: Recovering data via raw table and field separators

From
Martijn van Oosterhout
Date:
On Tue, Dec 04, 2007 at 02:26:21PM -0500, John Wells wrote:
> Wow...interesting idea...but to clarify, I copied the table file
> *after* the delete was run on the table. Although the data appears to
> still be there, wouldn't they be marked as deleted in some way and not
> appear in the new table even if the copy worked?

Ah sorry, I though you meant de table was dropped or the database was
deleted. If you actually ran a DELETE FROM on the table, then yes
they'll all be marked deleted.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Attachment

Re: Recovering data via raw table and field separators

From
"John Wells"
Date:
On 12/4/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> Ah sorry, I though you meant de table was dropped or the database was
> deleted. If you actually ran a DELETE FROM on the table, then yes
> they'll all be marked deleted.


So, given a database table file that still has records in it, and
given the fact that these records could be parsed and displayed if the
proper utilty knew how to read the various data structures used to
denote field and record length, is there no utility to do this? I
seems that it would be fairly straight forward to somehow read the
records, yet to pay no mind to the deleted flag (or whatever mechanism
postgresql uses to mark them as deleted).

Re: Recovering data via raw table and field separators

From
Martijn van Oosterhout
Date:
On Tue, Dec 04, 2007 at 03:05:53PM -0500, John Wells wrote:
> So, given a database table file that still has records in it, and
> given the fact that these records could be parsed and displayed if the
> proper utilty knew how to read the various data structures used to
> denote field and record length, is there no utility to do this? I
> seems that it would be fairly straight forward to somehow read the
> records, yet to pay no mind to the deleted flag (or whatever mechanism
> postgresql uses to mark them as deleted).

Ofcourse, see the pg_filedump mentioned at the beginning of this
thread.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Attachment

Re: Recovering data via raw table and field separators

From
"John Wells"
Date:
On 12/4/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Tue, Dec 04, 2007 at 03:05:53PM -0500, John Wells wrote:
> > So, given a database table file that still has records in it, and
> > given the fact that these records could be parsed and displayed if the
> > proper utilty knew how to read the various data structures used to
> > denote field and record length, is there no utility to do this? I
> > seems that it would be fairly straight forward to somehow read the
> > records, yet to pay no mind to the deleted flag (or whatever mechanism
> > postgresql uses to mark them as deleted).
>
> Ofcourse, see the pg_filedump mentioned at the beginning of this
> thread.

Thanks Martijn,

I have pg_filedump installed, but can't figure out how to dump the
rows themselves. I get the equivalent of the output at the end of this
post. Looking over the --help, there's nothing obvious that has gotten
me further.

Is there a trick I'm missing?

Thanks!
John

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1
*
* File: 17741
* Options used: -i -f
*
* Dump created on: Wed Dec  5 11:21:07 2007
*******************************************************************

Block    0 ********************************************************
<Header> -----
 Block Offset: 0x00000000         Offsets: Lower     196 (0x00c4)
 Block: Size 8192  Version    3            Upper    8192 (0x2000)
 LSN:  logid      0 recoff 0x0181e758      Special  8192 (0x2000)
 Items:   44                   Free Space: 7996
 Length (including item array): 200

  0000: 00000000 58e78101 01000000 c4000020  ....X..........
  0010: 00200320 441f0000 781e0000 b81d0000  . . D...x.......
  0020: f41c0000 301c0000 641b0000 981a0000  ....0...d.......
  0030: c4190000 f4180000 24180000 54170000  ........$...T...
  0040: 80160000 ac150000 e0140000 10140000  ................
  0050: 40130000 74120000 a0110000 d0100000  @...t...........
  0060: 04100000 380f0000 680e0000 980d0000  ....8...h.......
  0070: c40c0000 f80b0000 280b0000 540a0000  ........(...T...
  0080: 88090000 b4080000 00080000 48070000  ............H...
  0090: 90060000 d8050000 20050000 68040000  ........ ...h...
  00a0: b4030000 fc020000 48020000 90010000  ........H.......
  00b0: d4000000 48030000 94020000 e0010000  ....H...........
  00c0: 30010000 00000000                    0.......

Re: Recovering data via raw table and field separators

From
Alvaro Herrera
Date:
John Wells wrote:

> I have pg_filedump installed, but can't figure out how to dump the
> rows themselves. I get the equivalent of the output at the end of this
> post. Looking over the --help, there's nothing obvious that has gotten
> me further.

-i is the option you need; but you have to keep in mind that it won't
show you the textual representation of data.  It will only get you
internal representation (binary).

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/DXLWNGRJD34J
"MySQL is a toy compared to PostgreSQL."             (Randal L. Schwartz)
      (http://archives.postgresql.org/pgsql-general/2005-07/msg00517.php)

Re: Recovering data via raw table and field separators

From
Alvaro Herrera
Date:
John Wells wrote:
> On 12/5/07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > John Wells wrote:
> >
> > > I have pg_filedump installed, but can't figure out how to dump the
> > > rows themselves. I get the equivalent of the output at the end of this
> > > post. Looking over the --help, there's nothing obvious that has gotten
> > > me further.
> >
> > -i is the option you need; but you have to keep in mind that it won't
> > show you the textual representation of data.  It will only get you
> > internal representation (binary).
>
> Well, then isn't that no better than having the raw table file in this
> case? What I really need is a way to just spit out all tuples in the
> file in a readable format, regardless of whether they've been marked
> deleted. Is there any way to do this?

Not that I know of.

I think the simplest thing to get your tuples back is:

1. mark the transaction that deleted them as aborted in pg_clog
2. reset the hint bits in the deleted tuples, or hack your postgres copy
to ignore hint bits

You can figure out the transaction that deleted the tuples by seeing
that their Xmax value is with pg_filedump.

The hint bits part makes it rather messy :-(

Please keep the list on CC:.

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/DXLWNGRJD34J
"Puedes vivir solo una vez, pero si lo haces bien, una vez es suficiente"

Re: Recovering data via raw table and field separators

From
"John Wells"
Date:
On 12/5/07, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Not that I know of.
>
> I think the simplest thing to get your tuples back is:
>
> 1. mark the transaction that deleted them as aborted in pg_clog
> 2. reset the hint bits in the deleted tuples, or hack your postgres copy
> to ignore hint bits
>
> You can figure out the transaction that deleted the tuples by seeing
> that their Xmax value is with pg_filedump.
>
> The hint bits part makes it rather messy :-(

A bit beyond me I'm afriad, at least at my current level with
postgresql. Does anyone offer a commercial tool to do this? Or, would
anyone be interested in doing it for a fee?

Re: Recovering data via raw table and field separators

From
Martijn van Oosterhout
Date:
On Thu, Dec 06, 2007 at 02:35:42PM -0500, John Wells wrote:
> A bit beyond me I'm afriad, at least at my current level with
> postgresql. Does anyone offer a commercial tool to do this? Or, would
> anyone be interested in doing it for a fee?

There was a tool pgfsck which could dump table data, but it's not been
updated in quite a while so I don't know if it'll work for your
version...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Attachment

Re: Recovering data via raw table and field separators

From
"John Wells"
Date:
On 12/6/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Thu, Dec 06, 2007 at 02:35:42PM -0500, John Wells wrote:
> > A bit beyond me I'm afriad, at least at my current level with
> > postgresql. Does anyone offer a commercial tool to do this? Or, would
> > anyone be interested in doing it for a fee?
>
> There was a tool pgfsck which could dump table data, but it's not been
> updated in quite a while so I don't know if it'll work for your
> version...

I'm on 8.1. I'll check and see if I can make it work.

Thanks!
John

Re: Recovering data via raw table and field separators

From
Bruce Momjian
Date:
John Wells wrote:
> On 12/4/07, Martijn van Oosterhout <kleptog@svana.org> wrote:
> > Ah sorry, I though you meant de table was dropped or the database was
> > deleted. If you actually ran a DELETE FROM on the table, then yes
> > they'll all be marked deleted.
>
>
> So, given a database table file that still has records in it, and
> given the fact that these records could be parsed and displayed if the
> proper utilty knew how to read the various data structures used to
> denote field and record length, is there no utility to do this? I
> seems that it would be fairly straight forward to somehow read the
> records, yet to pay no mind to the deleted flag (or whatever mechanism
> postgresql uses to mark them as deleted).

We used to have the C defined MAKE_EXPIRED_TUPLES_VISIBLE that would
make deleted rows visible, but it seems it was removed in this commit as
part of a restructuring:

    http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/time/tqual.c.diff?r1=1.95;r2=1.96;f=h

    Sun Sep 3 15:59:39 2006 UTC (15 months, 1 week ago) by tgl
    Branches: MAIN
    Diff to: previous 1.95: preferred, colored
    Changes since revision 1.95: +100 -66 lines

    Arrange for GetSnapshotData to copy live-subtransaction XIDs from the
    PGPROC array into snapshots, and use this information to avoid visits
    to pg_subtrans in HeapTupleSatisfiesSnapshot.  This appears to solve
    the pg_subtrans-related context swap storm problem that's been reported
    by several people for 8.1.  While at it, modify GetSnapshotData to not
    take an exclusive lock on ProcArrayLock, as closer analysis shows that
    shared lock is always sufficient.
    Itagaki Takahiro and Tom Lane

Not sure if we should re-add it for later use.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Recovering data via raw table and field separators

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> We used to have the C defined MAKE_EXPIRED_TUPLES_VISIBLE that would
> make deleted rows visible, but it seems it was removed in this commit as
> part of a restructuring:

It was removed because it was utterly useless.

            regards, tom lane

Re: Recovering data via raw table and field separators

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > We used to have the C defined MAKE_EXPIRED_TUPLES_VISIBLE that would
> > make deleted rows visible, but it seems it was removed in this commit as
> > part of a restructuring:
>
> It was removed because it was utterly useless.

It worked in some cases so I don't think it was useless.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +