Re: Largeobject Access Controls (r2460) - Mailing list pgsql-hackers

From KaiGai Kohei
Subject Re: Largeobject Access Controls (r2460)
Date
Msg-id 4B5AA7A5.5020206@kaigai.gr.jp
In response to Re: Largeobject Access Controls (r2460)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Largeobject Access Controls (r2460)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Largeobject Access Controls (r2460)  (KaiGai Kohei <kaigai@ak.jp.nec.com>)
List pgsql-hackers
(2010/01/23 5:12), Tom Lane wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> writes:
>> The attached patch is a revised version.
> 
> I'm inclined to wonder whether this patch doesn't prove that we've
> reached the end of the line for the current representation of blobs
> in pg_dump archives.  The alternative that I'm thinking about is to
> treat each blob as an independent object (hence, with its own TOC
> entry).  If we did that, then the standard pg_dump mechanisms for
> ownership, ACLs, and comments would apply, and we could get rid of
> the messy hacks that this patch is just adding to.  That would also
> open the door to future improvements like being able to selectively
> restore blobs.  (Actually you could do it immediately if you didn't
> mind editing a -l file...)  And it would for instance allow loading
> of blobs to be parallelized.

I also think it is a better approach than the current blob representation.

> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs.  However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either.  If your large objects are actually
> large, there's not going to be an enormous number of them.  We've
> heard of people with many tens of thousands of tables, and pg_dump
> speed didn't seem to be a huge bottleneck for them (at least not
> in recent versions).  So I'm feeling we should not dismiss the
> idea of one TOC entry per blob.

Even if the database contains a massive number of large objects, all that
pg_dump has to manage in RAM is their metadata, not the data contents.
If we have one TOC entry per blob, the total amount of I/O between the
server and pg_dump is no different from the current approach.

If we assume one TOC entry consumes 64 bytes of RAM, then 7 million BLOBs
would need about 450MB of RAM.
On recent computers, is that an unacceptable pain?
If you try to dump a TB-class database, I'd like to assume that the machine
where pg_dump runs has adequate storage and RAM.
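Just to put numbers on it, here is a back-of-the-envelope sketch (the
64-bytes-per-entry figure is only the assumption above; the real pg_dump
TOC entry is larger and carries variable-length strings, so treat this as
an order-of-magnitude estimate):

  #include <stdio.h>

  /*
   * Rough estimate of pg_dump's in-memory TOC overhead for per-blob
   * entries, using the assumed 64 bytes per entry.  The actual TOC
   * entry structure is bigger, so this is only a lower-bound sketch.
   */
  int
  main(void)
  {
      const long bytes_per_entry = 64;        /* assumed per-entry cost */
      const long num_blobs = 7 * 1000 * 1000; /* 7 million large objects */
      double total_mb;

      total_mb = (double) bytes_per_entry * num_blobs / (1000.0 * 1000.0);
      printf("~%.0f MB of RAM for %ld blobs\n", total_mb, num_blobs);
      return 0;
  }

which prints roughly 448 MB, i.e. the ~450MB mentioned above.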

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>

