Re: [PATCH] Largeobject access controls - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: [PATCH] Largeobject access controls
Msg-id: 603c8f070910141743o7c856c3bqa0eb80f226d7a304@mail.gmail.com
In response to: Re: [PATCH] Largeobject access controls (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [PATCH] Largeobject access controls (KaiGai Kohei <kaigai@ak.jp.nec.com>)
List: pgsql-hackers
On Wed, Oct 14, 2009 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> KaiGai Kohei <kaigai@ak.jp.nec.com> writes:
>> Tom Lane wrote:
>>> The most serious problem is that you ripped out myLargeObjectExists(),
>>> apparently because you didn't understand what it's there for.  The reason
>>> it's there is to ensure that pg_dump runs don't fail.  pg_dump is expected
>>> to produce a consistent dump of all large objects that existed at the
>>> time of its transaction snapshot.  With this code, pg_dump would get a
>>> "large object doesn't exist" error on any LO that is deleted between the
>>> time of the snapshot and the time pg_dump actually tries to dump it ---
>>> which could be quite a large window in a large database.
>
>> The origin of this matter is that the basis of existence was changed.
>> Our current basis is whether pg_largeobject has one or more data chunks
>> for the given loid under the correct snapshot, or not.
>> The newer one is whether pg_largeobject_metadata has an entry for the
>> given loid under SnapshotNow, or not.
>
> Which is wrong.  You can certainly switch your attention as to which
> catalog to look in, but you can't change the snapshot that the test is
> referenced to.
>
>> The newer basis is the same as for other database objects, such as functions.
>> But why does pg_dump perform correctly for these database objects?
>
> The reason pg_dump works reasonably well is that it takes locks on
> tables, and the other objects it dumps (such as functions) have
> indivisible one-row representations in the catalogs.  They're either
> there when pg_dump looks, or they're not.  What you would have here
> is that pg_dump would see LO data chunks when it looks into
> pg_largeobject using its transaction snapshot, and then its attempts to
> open those LO OIDs would fail because the metadata is not there anymore
> according to SnapshotNow.
>
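To make that concrete: the existence check has to run under the snapshot the
caller passes in, not under SnapshotNow.  The 8.4-era myLargeObjectExists()
in inv_api.c does essentially the following (a sketch from memory, not the
exact committed code):

/*
 * Does a large object exist according to the *caller's* snapshot?
 * The snapshot argument is the whole point: a read-only LO opened by
 * pg_dump passes its transaction snapshot here, so a concurrent
 * deletion of the LO stays invisible for the life of the dump.
 */
static bool
myLargeObjectExists(Oid loid, Snapshot snapshot)
{
    Relation    pg_largeobject;
    ScanKeyData skey[1];
    SysScanDesc sd;
    bool        retval = false;

    /* look for any data chunk belonging to this loid */
    ScanKeyInit(&skey[0],
                Anum_pg_largeobject_loid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(loid));

    pg_largeobject = heap_open(LargeObjectRelationId, AccessShareLock);

    /* note: the caller's snapshot, not SnapshotNow */
    sd = systable_beginscan(pg_largeobject,
                            LargeObjectLOidPNIndexId, true,
                            snapshot, 1, skey);

    if (systable_getnext(sd) != NULL)
        retval = true;

    systable_endscan(sd);
    heap_close(pg_largeobject, AccessShareLock);

    return retval;
}

Moving the test over to pg_largeobject_metadata would be fine only if the
scan there were still driven by the same caller-supplied snapshot.
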
> There are certainly plenty of corner-case issues in pg_dump; I've
> complained before that it's generally a bad idea to be migrating pg_dump
> functionality into internal backend routines that tend to rely on
> SnapshotNow.  But if we change LO dumping like this, it's not going to
> be a corner case --- it's going to be a common race-condition failure
> with a window measured in many minutes if not hours.  That's more than
> sufficient reason to reject this patch, because the added functionality
> isn't worth it.  And pg_dump isn't even the only client that depends
> on the assumption that a read-only open LO is static.
>
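pg_dump aside, any libpq application that opens a large object read-only
inside a transaction is leaning on that same guarantee.  A minimal
illustration (hypothetical client code, error handling elided):

#include <stdio.h>
#include <libpq-fe.h>
#include <libpq/libpq-fs.h>     /* for INV_READ */

/*
 * Read out one large object within a single transaction.  The
 * expectation under discussion: once lo_open() succeeds with INV_READ,
 * reads keep working even if another session deletes the LO before we
 * finish -- our snapshot, not SnapshotNow, governs what we see.
 */
static void
dump_one_lo(PGconn *conn, Oid loid)
{
    char    buf[8192];
    int     fd;
    int     nbytes;

    PQclear(PQexec(conn, "BEGIN"));

    fd = lo_open(conn, loid, INV_READ);

    while ((nbytes = lo_read(conn, fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, nbytes, stdout);

    lo_close(conn, fd);
    PQclear(PQexec(conn, "COMMIT"));
}

With the patch as posted, the lo_open() could instead fail with "large
object doesn't exist" whenever the metadata row has vanished under
SnapshotNow in the meantime -- exactly the race described above.
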
>>> Moving on to lesser but still significant problems, you probably already
>>> guessed my next one: there's no pg_dump support.
>
>> Hmm. I planned to add pg_dump support after the server-side enhancement.
>
> We do not commit system changes that lack necessary pg_dump support.
> There are some things you can leave till later, but pg_dump is not
> negotiable.  (Otherwise, testers would be left with databases they
> can't dump/reload across the next forced initdb.)

As part of closing out this CommitFest, I am marking this patch as
Returned with Feedback.  Because the amount of work that remains to be
done is so substantial, that might have been a good idea anyway, but
since the clock is expiring the point is moot.

...Robert

