Thread: [PATCH] Prevent pg_dump running out of memory

[PATCH] Prevent pg_dump running out of memory

From: Martijn van Oosterhout
Just a patch to clean up a pg_dump bug whose sole purpose is to confuse
users. Why should -d crash pg_dump just because you have a big table? I
couldn't find this listed anywhere, not even on the TODO list, so if some
change to the library has already fixed this, I apologise.

This patch replaces the simple SELECT * with a cursor that fetches 1,000 rows
at a time. The 1,000 was chosen because it was small enough to test, but I
think realistically 10,000 wouldn't be too much.
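
In rough outline the loop looks like the sketch below (libpq-style C; the
cursor name, the error handling and the row output are only illustrative,
and the real code of course formats each row as an INSERT statement):

    /*
     * Minimal sketch of the cursor-based fetch loop, assuming an open
     * libpq connection "conn" and a table name "tbl".  Illustrative
     * only; not the exact code in the patch.
     */
    #include <stdio.h>
    #include <libpq-fe.h>

    static int
    dump_table_data(PGconn *conn, const char *tbl)
    {
        PGresult   *res;
        char        query[1024];

        /* cursors only live inside a transaction block */
        PQclear(PQexec(conn, "BEGIN"));

        snprintf(query, sizeof(query),
                 "DECLARE dump_cursor CURSOR FOR SELECT * FROM %s", tbl);
        res = PQexec(conn, query);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "DECLARE failed: %s", PQerrorMessage(conn));
            PQclear(res);
            return -1;
        }
        PQclear(res);

        for (;;)
        {
            int         ntuples, i;

            /* pull the next batch instead of the whole table at once */
            res = PQexec(conn, "FETCH 1000 FROM dump_cursor");
            if (PQresultStatus(res) != PGRES_TUPLES_OK)
            {
                fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
                PQclear(res);
                return -1;
            }
            ntuples = PQntuples(res);
            if (ntuples == 0)           /* no rows left: cursor exhausted */
            {
                PQclear(res);
                break;
            }
            for (i = 0; i < ntuples; i++)
            {
                /* pg_dump would emit an INSERT statement per row here */
                printf("-- row with first column %s\n",
                       PQgetvalue(res, i, 0));
            }
            PQclear(res);
        }

        PQclear(PQexec(conn, "CLOSE dump_cursor"));
        PQclear(PQexec(conn, "COMMIT"));
        return 0;
    }

The COPY-based dump doesn't have this problem, since COPY OUT already streams
the rows; it's only the -d style INSERT dump that pulled the whole table into
memory with a single SELECT *.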

Also, it seems there is no regression test for pg_dump. Is this intentional
or has no one come up with a good way to test it?

http://svana.org/kleptog/pgsql/pgsql-pg_dump.patch (also attached)

Please CC any replies.

P.S. For those people waiting for the timing patch, I'm just dealing with a
little issue involving getting a flag from ExplainOneQuery to ExecInitNode.
I think I may have an answer but it needs testing.
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.


Re: [PATCH] Prevent pg_dump running out of memory

From: Tom Lane
Date: Mon, 27 Aug 2001 10:42:34 -0400
Martijn van Oosterhout <kleptog@svana.org> writes:
> [ use a cursor for pg_dump -d ]

Excellent idea.  Thanks!

> Also, it seems there is no regression test for pg_dump. Is this intentional
> or has no one come up with a good way to test it?

The latter.  We certainly need one...

            regards, tom lane

Re: [PATCH] Prevent pg_dump running out of memory

From: Martijn van Oosterhout
On Mon, Aug 27, 2001 at 10:42:34AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > [ use a cursor for pg_dump -d ]
>
> Excellent idea.  Thanks!

No problems.

> > Also, it seems there is no regression test for pg_dump. Is this intentional
> > or has no one come up with a good way to test it?
>
> The latter.  We certainly need one...

The only thing I can think of right now is for other parts of the regression
tests to leave various tables, triggers, etc. lying around. Then, somewhere
near the end, do a pg_dump regress | psql newdb (for the various pg_dump
options) and then somehow compare the two databases.

The comparison would be the tricky part because you should avoid using
pg_dump since that is what you are testing...

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.

Re: [PATCH] Prevent pg_dump running out of memory

From: Tom Lane
Date: Mon, 27 Aug 2001 16:36:50 -0400
Martijn van Oosterhout <kleptog@svana.org> writes:
> This patch replaces the simple SELECT * with a cursor that fetches 1,000 rows
> at a time. The 1,000 was chosen because it was small enough to test, but I
> think realistically 10,000 wouldn't be too much.

Applied.  However, I went the other way: I reduced the 1000 to 100.
I doubt it'll make much difference in performance, and with TOAST
available I can believe people might have multi-megabyte rows.

I also added a cursor CLOSE command, just to be a neatnik.  It's
probably not critical given that the same cursor name is used each
time, but still...

            regards, tom lane

Re: [PATCH] Prevent pg_dump running out of memory

From: Martijn van Oosterhout
On Mon, Aug 27, 2001 at 04:36:50PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > This patch replaces the simple SELECT * with a cursor that fetches 1,000 rows
> > at a time. The 1,000 was chosen because it was small enough to test, but I
> > think realistically 10,000 wouldn't be too much.
>
> Applied.  However, I went the other way: I reduced the 1000 to 100.
> I doubt it'll make much difference in performance, and with TOAST
> available I can believe people might have multi-megabyte rows.

Good point. As long as you're not getting one row at a time, it's a win.

> I also added a cursor CLOSE command, just to be a neatnik.  It's
> probably not critical given that the same cursor name is used each
> time, but still...

Doh! I even thought of that while writing it but I couldn't remember the
command. I tend to let cursors die at the end of the transaction.

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> It would be nice if someone came up with a certification system that
> actually separated those who can barely regurgitate what they crammed over
> the last few weeks from those who command secret ninja networking powers.