Re: [PATCH] Largeobject access controls - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [PATCH] Largeobject access controls |
Date | |
Msg-id | 3224.1255488185@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [PATCH] Largeobject access controls (KaiGai Kohei <kaigai@ak.jp.nec.com>) |
Responses |
Re: [PATCH] Largeobject access controls
|
List | pgsql-hackers |
KaiGai Kohei <kaigai@ak.jp.nec.com> writes: > The attached patch is the revised one for largeobejct access controls, > because it conflicted to the "GRANT/REVOKE ON ALL xxx IN SCHEMA". I started to look through this patch with the hope of committing it, but found out that it's not really ready. The most serious problem is that you ripped out myLargeObjectExists(), apparently because you didn't understand what it's there for. The reason it's there is to ensure that pg_dump runs don't fail. pg_dump is expected to produce a consistent dump of all large objects that existed at the time of its transaction snapshot. With this code, pg_dump would get a "large object doesn't exist" error on any LO that is deleted between the time of the snapshot and the time pg_dump actually tries to dump it --- which could be quite a large window in a large database. The reason this is a significant problem and not just an easily fixable oversight is that it's not entirely clear what to do instead. We can certainly make the pure existence test use the query snapshot instead of SnapshotNow, but what about the implied permissions tests? Should we attempt to make them run using the version of the LO's ACL found in the query-snapshot-time metadata row? The trouble with that is it might refer to roles that don't exist anymore, perhaps resulting in failures down inside the ACL checking routines. It would be safer to rely on the current metadata row contents, but then we have the question of whether to allow the access if the row doesn't exist according to SnapshotNow. Now these issues arise to some extent already in pg_dump, but the current window for failure is quite short because it obtains access share locks on all the tables it will dump at the start of the run. With large objects the window in which things could have changed is very much longer. Of course, in the cases that people are most concerned about, pg_dump is running as superuser and so the actual ACL contents don't really matter anyway, so long as we don't fail outright before getting to the check. So I'm kind of inclined to say that the least evil solution is to apply the permissions check using the query-snapshot-time metadata row. It's definitely a debatable question though. We'd also want to make sure that the aclcheck code doesn't fail if it finds a nonexistent role ID in the ACL. Moving on to lesser but still significant problems, you probably already guessed my next one: there's no pg_dump support. If the system tracks owner and ACL for large objects, pg_dump *must* be prepared to dump that information. It'd also be worthwhile to teach pg_dump that in servers >= 8.5, it can look in the metadata catalog to make the list of LO OIDs instead of having to do a SELECT DISTINCT from pg_largeobject. I notice that the patch decides to change the pg_description classoid for LO comments from pg_largeobject's OID to pg_largeobject_metadata's. This will break existing clients that look at pg_description (eg, pg_dump and psql, plus any other clients that have any intelligence about comments, for instance it probably breaks pgAdmin). And there's just not a lot of return that I can see. I agree that using pg_largeobject_metadata would be more consistent given the new catalog layout, but I'm inclined to think we should stick to the old convention on compatibility grounds. Given that choice, for consistency we'd better also use pg_largeobject's OID not pg_largeobject_metadata's in pg_shdepend entries for LOs. In the category of lesser issues that have still got to be fixed: * You need to pay more attention to updating comments. For example your changes to LargeObjectCreate render its header comment a complete lie, but you didn't change the comment. (And what is the purpose of renaming it to CreateLargeObject, and similarly for the adjacent routines? You didn't change the API meaningfully, so there is no reason to break calling code that way.) * The documentation needs work too, eg a new system catalog requires a page in catalogs.sgml, and I'll bet there's a few references to large objects and/or permissions that need to be updated. * "largeobject" is not an English word. The occurrences in user-visible messages and documentation must get changed to "large object". I've got mixed emotions even about using it in code identifiers, although we certainly aren't going to rename pg_largeobject, so anything that's named in parallel to that should stay as-is. In the same vein, "acl" is not a word we want to expose to users, so "largeobject_check_acl" is doubly bad as a GUC variable name. Perhaps "large_object_privilege_checks" would do. * I'm not really happy with the ac_largeobject_foo shim layer, and would frankly prefer to rip it out and put those tests inline. It's poorly thought out IMO --- eg, what the heck is that cascade boolean --- and doesn't look like any of the rest of the Postgres code stylistically, and it makes the calling code look different from similar tests elsewhere too. When and if SELinux support ever gets in, I'd expect that the stubs for it would get put into the aclchk.c routines not into all their callers. So this doesn't seem to me that it's going in the right direction even if we posit that that support will get in. * Lastly, that change in psql is neither version-aware nor schema-safe. psql is expected to still work with older server versions, so you need two versions of that query, not just a replacement. And don't omit the "pg_catalog." qualifier. (Also, I wonder whether it shouldn't have a way to display permissions for LOs, though maybe that ought to be conditional. Time for "\lo_list+" perhaps?) regards, tom lane
pgsql-hackers by date: