Thread: Dump/restore with bad data and large objects

Dump/restore with bad data and large objects

From: "John T. Dow"
By "bad data", I mean a character that's not UTF8, such as hex 98.

As far as I can tell, pg_dump is the tool to use. But it has
serious drawbacks.

If you dump in the custom format, the data is compressed (nice) and
includes large objects (very nice). But, from my tests and the postings of
others, if there is invalid data in a table, although PostgreSQL won't complain and
pg_dump won't complain, pg_restore will strenuously object, rejecting all rows for that
particular table (not nice at all).
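
For example, the sequence from my tests is roughly this (the database and file names are just placeholders):

pg_dump -Fc -f mydb.dump mydb
createdb newdb
pg_restore -d newdb mydb.dump

Only the final pg_restore step fails, with an error along the lines of "invalid byte sequence for encoding UTF8", after which the COPY for that table is abandoned and all of its rows are lost.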

If you dump in plain text format, you can at least inspect the dumped
data and fix it manually or with iconv. But the plain text
format doesn't support large objects (again, not nice). While byte arrays are supported, they result in very large dump
files.
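
For instance, the sort of cleanup I have in mind is something like this (iconv's -c option simply drops bytes that aren't valid in the target encoding, so the result has to be checked by hand):

pg_dump mydb > mydb.sql
iconv -f UTF-8 -t UTF-8 -c mydb.sql > mydb-clean.sql
psql -d newdb -f mydb-clean.sql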

Also, neither of these methods gets information such as the roles, so
that has to be captured some other way if the database has to be rebuilt
from scratch.

Is my understanding incomplete or wrong? Is there no good solution?

Why isn't there a dumpall that writes in compressed format and allows recovery from bad data?

John


Re: Dump/restore with bad data and large objects

From: Tom Lane
"John T. Dow" <john@johntdow.com> writes:
> If you dump in plain text format, you can at least inspect the dumped
> data and fix it manually or with iconv. But the plain text
> format doesn't support large objects (again, not nice).

It does in 8.1 and later ...

> Also, neither of these methods gets information such as the roles,

Use pg_dumpall.

            regards, tom lane

Re: Dump/restore with bad data and large objects

From: "John T. Dow"
Tom

My mistake in not realizing that 8.1 and later can dump large objects in the plain text format. I guess when searching
for answers to a problem, the posted information doesn't always specify the version. So, sorry about that.

But the plain text format still has serious problems: the generated file is large for byte arrays and large
objects, there is no ability to selectively restore a table, and bad data still isn't detected until you try to restore.

Or did I miss something else?

John

PS: Yes, I know you can pipe the output from pg_dumpall into an archiver, but it's my understanding that the binary
data is output in an inefficient format, so even if zipped, the resulting file would be significantly larger than the
custom format.



On Mon, 25 Aug 2008 12:14:41 -0400, Tom Lane wrote:

>"John T. Dow" <john@johntdow.com> writes:
>> If you dump in plain text format, you can at least inspect the dumped
>> data and fix it manually or with iconv. But the plain text
>> format doesn't support large objects (again, not nice).
>
>It does in 8.1 and later ...
>
>> Also, neither of these methods gets information such as the roles,
>
>Use pg_dumpall.
>
>            regards, tom lane



Re: Dump/restore with bad data and large objects

From: Joshua Drake
On Mon, 25 Aug 2008 10:21:54 -0400
"John T. Dow" <john@johntdow.com> wrote:

> By "bad data", I mean a character that's not UTF8, such as hex 98.
>
> As far as I can tell, pg_dump is the tool to use. But it has
> serious drawbacks.
>
> If you dump in the custom format, the data is compressed (nice) and
> includes large objects (very nice). But, from my tests and the
> postings of others, if there is invalid data in a table, although
> PostgreSQL won't complain and pg_dump won't complain, pg_restore will
> strenuously object, rejecting all rows for that particular table (not
> nice at all).

You can use the TOC feature of -Fc to skip restoring that single
table. You can then convert that single table to a plain text dump and
clean the data. Then restore it separately.

If you have foreign keys and indexes on the bad data table, don't
restore the keys until *after* you have done the above.
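
Roughly, something along these lines (file and table names are only
examples):

pg_restore -l foo.sqlc > foo.toc
  (edit foo.toc and put a ; in front of the TABLE DATA entry
   for the bad table)
pg_restore -L foo.toc -d mydb foo.sqlc
pg_restore -a -t badtable foo.sqlc --file=badtable.sql
  (clean badtable.sql, e.g. with iconv, then load it)
psql -d mydb -f badtable.sql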

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



Re: Dump/restore with bad data and large objects

From: "John T. Dow"
Joshua

The TOC feature sounds good, as does converting a single table to plain text.

But I can't find documentation for the TOC feature under pg_dump or pg_restore. I'm looking in postgresql-8.2.1-US.pdf.

Neither could I see anything about converting a single table to a plain text dump.

Also, I stumbled across the statement that you can't restore large objects for a single table. Is that true?

Another thing I couldn't find was how to dump roles using -Fc.

John



On Mon, 25 Aug 2008 10:04:13 -0700, Joshua Drake wrote:

>On Mon, 25 Aug 2008 10:21:54 -0400
>"John T. Dow" <john@johntdow.com> wrote:
>
>> By "bad data", I mean a character that's not UTF8, such as hex 98.
>>
>> As far as I can tell, pg_dump is the tool to use. But it has
>> serious drawbacks.
>>
>> If you dump in the custom format, the data is compressed (nice) and
>> includes large objects (very nice). But, from my tests and the
>> postings of others, if there is invalid data in a table, although
>> PostgreSQL won't complain and pg_dump won't complain, pg_restore will
>> strenuously object, rejecting all rows for that particular table (not
>> nice at all).
>
>You can use the TOC feature of -Fc to skip restoring that single
>table. You can then convert that single table to a plain text dump and
>clean the data. Then restore it separately.
>
>If you have foreign keys and indexes on the bad data table, don't
>restore the keys until *after* you have done the above.
>
>Sincerely,
>
>Joshua D. Drake
>
>--
>The PostgreSQL Company since 1997: http://www.commandprompt.com/
>PostgreSQL Community Conference: http://www.postgresqlconference.org/
>United States PostgreSQL Association: http://www.postgresql.us/
>Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



Re: Dump/restore with bad data and large objects

From: Joshua Drake
On Mon, 25 Aug 2008 13:37:13 -0400
"John T. Dow" <john@johntdow.com> wrote:

> Joshua
>
> The TOC feature sounds good, as does converting a single table to
> plain text.
>
> But I can't find documentation for the TOC feature under pg_dump or
> pg_restore. I'm looking in postgresql-8.2.1-US.pdf.

The commands you are looking for are:

pg_restore -l to get the toc
pg_restore -L to use the toc

If you open the resulting file from something like pg_restore -l
foo.sqlc > foo.toc, it is just a plain text list of objects to restore.
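
Each entry is one line; to keep an entry from being restored, put a
semicolon at the start of its line, for example (the IDs and names
here are made up):

;2398; 0 16403 TABLE DATA public badtable postgres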

I don't know how well it is documented but I am sure we would accept a
patch.

>
> Neither could I see anything about converting a single table to a
> plain text dump.

pg_restore allows you to do so. Something like:

pg_restore foo.sqlc --file=foo.sql
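
If you only want the data for the one problem table, something like
this should work as well (again, the names are just examples):

pg_restore -a -t badtable foo.sqlc --file=badtable.sql

-a limits it to data and -t limits it to the named table; you can
then clean up that file and feed it to psql.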

>
> Also, I stumbled across the statement that you can't restore large
> objects for a single table. Is that true?

Large objects are stored in a central table called pg_largeobject, so
yes, that would be accurate.


>
> Another thing I couldn't find was how to dump roles using -Fc.
>

You can't; that is a known and irritating limitation.

Sincerely,

Joshua D. Drake



--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



Re: Dump/restore with bad data and large objects

From: "John T. Dow"
Joshua

Thank you very much for answering these various questions.

I guess the compressed format is the best overall solution, except for roles. I find myself having a table with other
information about users (application specific user type, etc), so perhaps the thing to do is record enough information
there to reconstruct the roles should that become necessary.

Can pg_dump dump roles to plain text? How does pg_dumpall do it? Doesn't it do everything via pg_dump?

John


On Mon, 25 Aug 2008 10:47:11 -0700, Joshua Drake wrote:

>On Mon, 25 Aug 2008 13:37:13 -0400
>"John T. Dow" <john@johntdow.com> wrote:
>
>> Joshua
>>
>> The TOC feature sounds good, as does converting a single table to
>> plain text.
>>
>> But I can't find documentation for the TOC feature under pg_dump or
>> pg_restore. I'm looking in postgresql-8.2.1-US.pdf.
>
>The commands you are looking for are:
>
>pg_restore -l to get the toc
>pg_restore -L to use the toc
>
>If you open the resulting file from something like pg_restore -l >
>foo.toc it is just a plain text list of objects to restore.
>
>I don't know how well it is documented but I am sure we would accept a
>patch.
>
>>
>> Neither could I see anything about converting a single table to a
>> plain text dump.
>
>pg_restore allows you to do so. Something like:
>
>pg_restore foo.sqlc --file=foo.sql
>
>>
>> Also, I stumbled across the statement that you can't restore large
>> objects for a single table. Is that true?
>
>Large objects are stored in a central table called pg_largeobject, so
>yes, that would be accurate.
>
>
>>
>> Another thing I couldn't find was how to dump roles using -Fc.
>>
>
>You can't; that is a known and irritating limitation.
>
>Sincerely,
>
>Joshua D. Drake
>
>
>
>--
>The PostgreSQL Company since 1997: http://www.commandprompt.com/
>PostgreSQL Community Conference: http://www.postgresqlconference.org/
>United States PostgreSQL Association: http://www.postgresql.us/
>Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



Re: Dump/restore with bad data and large objects

From: Joshua Drake
On Mon, 25 Aug 2008 17:05:53 -0400
"John T. Dow" <john@johntdow.com> wrote:

> Joshua
>
> Thank you very much for answering these various questions.
>
> I guess the compressed format is the best overall solution, except
> for roles. I find myself having a table with other information about
> users (application specific user type, etc) so perhaps the thing to
> do is record enough information there to reconstruct the roles should
> that become necessary.
>
> Can pg_dump dump roles to plain text? How does pg_dumpall do it,

pg_dumpall -g will dump just roles via plain text.
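
So in practice a full backup could look something like this (names
are only examples):

pg_dumpall -g > globals.sql
pg_dump -Fc -f mydb.dump mydb

On a fresh cluster you load globals.sql with psql first and then run
pg_restore against the custom-format dump.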

Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



Re: Dump/restore with bad data and large objects

From: "John T. Dow"
Sorry, I missed that. Thanks again.

Now to put this all into effect.

John

On Mon, 25 Aug 2008 14:25:12 -0700, Joshua Drake wrote:

>On Mon, 25 Aug 2008 17:05:53 -0400
>"John T. Dow" <john@johntdow.com> wrote:
>
>> Joshua
>>
>> Thank you very much for answering these various questions.
>>
>> I guess the compressed format is the best overall solution, except
>> for roles. I find myself having a table with other information about
>> users (application specific user type, etc) so perhaps the thing to
>> do is record enough information there to reconstruct the roles should
>> that become necessary.
>>
>> Can pg_dump dump roles to plain text? How does pg_dumpall do it,
>
>pg_dumpall -g will dump just roles via plain text.
>
>Joshua D. Drake
>--
>The PostgreSQL Company since 1997: http://www.commandprompt.com/
>PostgreSQL Community Conference: http://www.postgresqlconference.org/
>United States PostgreSQL Association: http://www.postgresql.us/
>Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate