Thread: Must be owner to truncate?
Greetings,

The current permissions checks for truncate seem to be excessive. It requires that you're the owner of the relation instead of requiring that you have delete permissions on the relation. It was pointed out that truncate doesn't call triggers, but it seems like that would be something easy enough to check for. My thinking is to replace the existing owner check with:

  Must have delete permissions on the relation
  If the relation has triggers:
    Check that the caller is the owner of the relation; if so, then
    issue a NOTICE that those triggers won't be called and perform the
    truncate.
    If not the owner, then error out saying there are ON DELETE triggers
    and that you're not the owner.

I can submit a patch for this today if there's general agreement on this change.

An alternative that was mentioned was to make 'delete' smart enough to know when it's deleting all the rows and there aren't any triggers on it, etc., to perform like truncate, perhaps leaving the old file around until all transactions using it have finished. This sounds like a good idea but also sounds like it'd be a larger change and might have to wait till 8.2.

Thanks,
Stephen
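A sketch of the behavior change being proposed (table, role, and the exact message texts are invented for illustration):

    CREATE TABLE wordlist (word text);
    GRANT DELETE ON wordlist TO webuser;   -- webuser is not the owner

    -- connected as webuser:
    TRUNCATE wordlist;
    -- current behavior:  ERROR:  must be owner of relation wordlist
    -- proposed behavior: succeeds, since webuser holds DELETE
    --   (with a NOTICE, or an error for non-owners, if triggers exist)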
Stephen Frost <sfrost@snowman.net> writes:
> The current permissions checks for truncate seem to be excessive. It
> requires that you're the owner of the relation instead of requiring
> that you have delete permissions on the relation. It was pointed out
> that truncate doesn't call triggers but it seems like that would be
> something easy enough to check for.

There are other reasons for restricting it:
* truncate takes a much stronger lock than a plain delete does.
* truncate is not MVCC-safe.

I don't really agree with the viewpoint that truncate is just a quick DELETE, and so I do not agree that DELETE permissions should be enough to let you do a TRUNCATE.

regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > The current permissions checks for truncate seem to be excessive. It
> > requires that you're the owner of the relation instead of requiring
> > that you have delete permissions on the relation. It was pointed out
> > that truncate doesn't call triggers but it seems like that would be
> > something easy enough to check for.
>
> There are other reasons for restricting it:
> * truncate takes a much stronger lock than a plain delete does.

What permissions are required to lock a table? With just select, insert, update and delete on a table I can LOCK TABLE it, which acquires an ACCESS EXCLUSIVE on it and will therefore hold off anyone else from using the table till the end of my transaction anyway. So I don't see this as being a reason to disallow non-owners the use of truncate.

> * truncate is not MVCC-safe.

Erm, that's why it gets a stronger lock, so I don't really see what this has to do with it.

> I don't really agree with the viewpoint that truncate is just a quick
> DELETE, and so I do not agree that DELETE permissions should be enough
> to let you do a TRUNCATE.

Truncate is exactly a quick DELETE; in fact, DELETE could stand to learn something from truncate to make it suck a little less to 'delete from x;' when x is a reasonably large table. This probably wouldn't actually be all that difficult to do if there's a way to keep the old file around, without too much expense, until all the transactions using it have completed.

Thanks,
Stephen
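The locking point above, as a minimal transcript (table name invented; this is the scenario the message describes for a holder of ordinary DML privileges):

    -- as a non-owner holding only SELECT/INSERT/UPDATE/DELETE:
    BEGIN;
    LOCK TABLE wordlist IN ACCESS EXCLUSIVE MODE;
    -- every other session now blocks on wordlist until this transaction
    -- ends, even though we would not be allowed to TRUNCATE it
    COMMIT;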
On 2005-07-07, Stephen Frost <sfrost@snowman.net> wrote:
>
>> * truncate is not MVCC-safe.
>
> Erm, that's why it gets a stronger lock, so I don't really see what
> this has to do with it.

It's not MVCC-safe even with the AccessExclusive lock; it damages snapshots that were taken before the truncate operation but which don't have a lock on the table yet. The only reason it doesn't break pg_dump is that the first thing that pg_dump does is to take AccessShare locks on every table that it's going to dump.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
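The hazard Andrew describes, shown as a two-session transcript (a sketch; table names invented). The snapshot predates the TRUNCATE, but no lock on x has been taken yet:

    -- session A:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM other_table;  -- snapshot established; x not locked

    -- session B:
    TRUNCATE x;                        -- commits immediately

    -- session A, still using its pre-TRUNCATE snapshot:
    SELECT count(*) FROM x;            -- 0 rows: the rows the snapshot
    COMMIT;                            -- should see are physically gone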
On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > The current permissions checks for truncate seem to be excessive. It
> > requires that you're the owner of the relation instead of requiring
> > that you have delete permissions on the relation. It was pointed out
> > that truncate doesn't call triggers but it seems like that would be
> > something easy enough to check for.
>
> There are other reasons for restricting it:
> * truncate takes a much stronger lock than a plain delete does.
> * truncate is not MVCC-safe.
>
> I don't really agree with the viewpoint that truncate is just a quick
> DELETE, and so I do not agree that DELETE permissions should be enough
> to let you do a TRUNCATE.

What about adding a truncate permission? I would find it useful, as it seems would others.

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
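What the proposed grant might look like (hypothetical syntax at the time of this thread; PostgreSQL did eventually add a TRUNCATE privilege, in 8.4):

    GRANT TRUNCATE ON wordlist TO webuser;
    -- webuser may now TRUNCATE wordlist without owning it
    REVOKE TRUNCATE ON wordlist FROM webuser;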
* Jim C. Nasby (decibel@decibel.org) wrote:
> On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
> > I don't really agree with the viewpoint that truncate is just a quick
> > DELETE, and so I do not agree that DELETE permissions should be enough
> > to let you do a TRUNCATE.
>
> What about adding a truncate permission? I would find it useful, as it
> seems would others.

That would be acceptable for me as well. I'd prefer it just work off delete, but as long as I can grant truncate to someone w/o giving them ownership rights on the table I'd be happy.

Thanks,
Stephen
* Andrew - Supernews (andrew+nonews@supernews.com) wrote:
> On 2005-07-07, Stephen Frost <sfrost@snowman.net> wrote:
> >
> >> * truncate is not MVCC-safe.
> >
> > Erm, that's why it gets a stronger lock, so I don't really see what
> > this has to do with it.
>
> It's not MVCC-safe even with the AccessExclusive lock; it damages snapshots
> that were taken before the truncate operation but which don't have a lock
> on the table yet. The only reason it doesn't break pg_dump is that the
> first thing that pg_dump does is to take AccessShare locks on every table
> that it's going to dump.

This seems like something which should probably be fixed, but which is probably too late to fix for 8.1. Of course, if we could fix this then it seems like it would be possible for us to just change 'delete from x' to behave as truncate does now given appropriate conditions. I'm not as familiar with that area as others are; is this a very difficult thing to do? If not then I may take a look at it, it'd be a very nice improvement.

Thanks,
Stephen
> The current permissions checks for truncate seem to be excessive. It
> requires that you're the owner of the relation instead of requiring
> that you have delete permissions on the relation. It was pointed out
> that truncate doesn't call triggers but it seems like that would be
> something easy enough to check for. My thinking is to replace the
> existing owner check with:
>
> Must have delete permissions on the relation
> If the relation has triggers:
>   Check that the caller is the owner of the relation, if so, then
>   issue a NOTICE that those triggers won't be called and perform the
>   truncate.
>   If not the owner, then error out saying there are ON DELETE triggers
>   and that you're not the owner.

I'm strongly in favour of this patch. I am currently in this situation:

1. Web db user runs as non-superuser, non-owner.
2. I have a table of tens of thousands of rows that I must delete
   entirely and rebuild every day at least (pg_trgm word list)
3. It just gets slow over time, even with autovac.
4. I can't vacuum it as I'm not the owner, and I cannot truncate it either.
5. Table has no triggers or FKs whatsoever.

So, Stephen Frost's suggestion would be fantastic.

Chris
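The daily rebuild Chris describes, sketched with invented names; with only DELETE available it degrades exactly as he notes:

    -- run daily as the non-owner web user:
    BEGIN;
    DELETE FROM trgm_words;                   -- every row marked dead
    INSERT INTO trgm_words (word)
        SELECT DISTINCT word FROM documents;  -- hypothetical rebuild source
    COMMIT;
    -- autovacuum can't keep up, the table bloats, and neither VACUUM
    -- nor TRUNCATE is available to a non-owner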
> There are other reasons for restricting it:
> * truncate takes a much stronger lock than a plain delete does.
> * truncate is not MVCC-safe.
>
> I don't really agree with the viewpoint that truncate is just a quick
> DELETE, and so I do not agree that DELETE permissions should be enough
> to let you do a TRUNCATE.

Ah. I didn't realise that 2nd point. I don't care so much about the stronger lock in my application.

Chris
Stephen Frost <sfrost@snowman.net> writes:
> * Andrew - Supernews (andrew+nonews@supernews.com) wrote:
>> It's not MVCC-safe even with the AccessExclusive lock;

> This seems like something which should probably be fixed,

You've missed the point entirely: this *cannot* be fixed, at least not without giving up the performance advantages that make TRUNCATE interesting.

regards, tom lane
* Christopher Kings-Lynne (chriskl@familyhealth.com.au) wrote:
> I'm strongly in favour of this patch. I am currently in this situation:
>
> 1. Web db user runs as non-superuser, non-owner.
> 2. I have a table of tens of thousands of rows that I must delete
>    entirely and rebuild every day at least (pg_trgm word list)
> 3. It just gets slow over time, even with autovac.
> 4. I can't vacuum it as I'm not the owner, and I cannot truncate it either.
> 5. Table has no triggers or FKs whatsoever.
>
> So, Stephen Frost's suggestion would be fantastic.

This is a very similar situation to what I'm in, which is why I was asking for the change. :)

Thanks,
Stephen
* Christopher Kings-Lynne (chriskl@familyhealth.com.au) wrote:
> > There are other reasons for restricting it:
> > * truncate takes a much stronger lock than a plain delete does.
> > * truncate is not MVCC-safe.
> >
> > I don't really agree with the viewpoint that truncate is just a quick
> > DELETE, and so I do not agree that DELETE permissions should be enough
> > to let you do a TRUNCATE.
>
> Ah. I didn't realise that 2nd point. I don't care so much about the
> stronger lock in my application.

Does truncate not being MVCC-safe cause problems in your situation? It certainly doesn't in mine, and I expect the same is true for a lot of others in the same situation.

Thanks,
Stephen
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Andrew - Supernews (andrew+nonews@supernews.com) wrote:
> >> It's not MVCC-safe even with the AccessExclusive lock;
>
> > This seems like something which should probably be fixed,
>
> You've missed the point entirely: this *cannot* be fixed, at least not
> without giving up the performance advantages that make TRUNCATE
> interesting.

Alright, can we give that rather significant performance advantage to non-owners in some way then? Perhaps as an extra grant right?

This is along the lines of what I was thinking, though I do see that it gets more complicated when dealing with transactions which started before the one committing the truncate (not a problem in my case, but it would have to be dealt with to be MVCC-safe): TRUNCATE is fast because it knows that it's deleting everything, and so it just creates a new (empty) file and deletes the old file. DELETE goes through the entire file marking each record for deletion, and then the system has to wait around for the vacuumer to come through and clean up the file. New transactions using that file have to scan past all of the deleted tuples until they get vacuumed, though. My thinking is along these lines:

delete from x; / truncate x;
 --> Creates a new, empty, file and makes it the 'current' file
 --> Marks the old file for deletion, but it is kept around for any
     transactions which were started before the truncate;
 --> New transactions use the empty file
 --> Once all transactions using the old file have completed, the old
     file can be deleted.
 --> Old transactions which insert rows would need to use the new file
     or scan the old file for rows which they added, I suppose.

Thanks,
Stephen
On Thu, 7 Jul 2005, Stephen Frost wrote:
> * Andrew - Supernews (andrew+nonews@supernews.com) wrote:
> > On 2005-07-07, Stephen Frost <sfrost@snowman.net> wrote:
> > > > * truncate is not MVCC-safe.
> > >
> > > Erm, that's why it gets a stronger lock, so I don't really see what
> > > this has to do with it.
> >
> > It's not MVCC-safe even with the AccessExclusive lock; it damages
> > snapshots that were taken before the truncate operation but which
> > don't have a lock on the table yet. The only reason it doesn't break
> > pg_dump is that the first thing that pg_dump does is to take
> > AccessShare locks on every table that it's going to dump.
>
> This seems like something which should probably be fixed, but which is
> probably too late to fix for 8.1. Of course, if we could fix this then
> it seems like it would be possible for us to just change 'delete from x'
> to behave as truncate does now given appropriate conditions.

Doesn't the lock difference between delete and truncate mean that suddenly deletes on x may or may not block concurrent selects to x (depending on whether it's a full table delete and whether x has delete triggers)? Or are you thinking that after making it MVCC safe the lock could be lessened?

With the current truncate lock, it seems bad to me for users who want to do:

begin;
delete from x;
-- do inserts and other stuff to the now empty x

while still allowing access to x. Especially if whether or not you have access depends on whether there are delete triggers on x.
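The lock-level difference behind this concern, spelled out (a sketch; x_source is an invented repopulation source):

    BEGIN;
    DELETE FROM x;                  -- ROW EXCLUSIVE lock: concurrent
    INSERT INTO x                   -- SELECTs on x keep running
        SELECT * FROM x_source;
    COMMIT;

    BEGIN;
    TRUNCATE x;                     -- ACCESS EXCLUSIVE lock: concurrent
    INSERT INTO x                   -- SELECTs on x block until COMMIT
        SELECT * FROM x_source;
    COMMIT;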
> Does truncate not being MVCC-safe cause problems in your situation? It
> certainly doesn't in mine, and I expect the same is true for a lot of
> others in the same situation.

Well, it is done inside a transaction, plus has concurrent use...

Chris
Stephen Frost wrote:

> delete from x; / truncate x;
>  --> Creates a new, empty, file and makes it the 'current' file
>  --> Marks the old file for deletion, but it is kept around for any
>      transactions which were started before the truncate;
>  --> New transactions use the empty file
>  --> Once all transactions using the old file have completed, the old
>      file can be deleted.
>  --> Old transactions which insert rows would need to use the new file
>      or scan the old file for rows which they added, I suppose.

And when the transaction that issued the TRUNCATE aborts after step 3, but newer transactions commit?

Mike Mascari
* Mike Mascari (mascarm@mascari.com) wrote:
> Stephen Frost wrote:
>
> > delete from x; / truncate x;
> >  --> Creates a new, empty, file and makes it the 'current' file
> >  --> Marks the old file for deletion, but it is kept around for any
> >      transactions which were started before the truncate;
> >  --> New transactions use the empty file
> >  --> Once all transactions using the old file have completed, the old
> >      file can be deleted.
> >  --> Old transactions which insert rows would need to use the new file
> >      or scan the old file for rows which they added, I suppose.
>
> And when the transaction that issued the TRUNCATE aborts after step 3,
> but newer transactions commit?

The newer transactions would have to check for that situation. It's not completely thought through, but at the same time I don't necessarily think it's something that would be completely impossible to do and still retain most of the performance benefits, at least in the most common case.

Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Mike Mascari (mascarm@mascari.com) wrote:
>> And when the transaction that issued the TRUNCATE aborts after step 3,
>> but newer transactions commit?

> The newer transactions would have to check for that situation.

How would they do that? They might be long gone by the time the truncating transaction rolls back.

It might be possible to do something that preserves full MVCC-ness for concurrent readers, but I don't believe there is any choice but to lock out concurrent writers until the truncate commits. If you try to allow that, there's no way to keep straight whose change goes into which file.

regards, tom lane
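A timeline for Tom's point, assuming (hypothetically) that concurrent writers were not locked out; this is an interleaved two-session sketch, not a single runnable script:

    -- session A:
    BEGIN;
    TRUNCATE x;                -- A switches to a fresh file for x

    -- session B, concurrently (hypothetically not blocked):
    INSERT INTO x VALUES (1);  -- old file? lost if A commits.
                               -- new file? lost if A rolls back.

    -- session A:
    ROLLBACK;                  -- either placement of B's row is wrong
                               -- for one of A's two possible outcomes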
On Sat, 2005-07-09 at 09:47 -0400, Mike Mascari wrote:
> Stephen Frost wrote:
>
> > delete from x; / truncate x;
> >  --> Creates a new, empty, file and makes it the 'current' file
> >  --> Marks the old file for deletion, but it is kept around for any
> >      transactions which were started before the truncate;
> >  --> New transactions use the empty file
> >  --> Once all transactions using the old file have completed, the old
> >      file can be deleted.
> >  --> Old transactions which insert rows would need to use the new file
> >      or scan the old file for rows which they added, I suppose.
>
> And when the transaction that issued the TRUNCATE aborts after step 3,
> but newer transactions commit?

It should be the same as when newer transactions had used a file after a DELETE had been issued.

Could the new file not be made to cover the next available 1GB of file space, that is, a new physical file? This could make use of the same kind of machinery my proposal for concurrent indexing does (i.e. locks that forbid putting new tuples in certain tuple ranges).

Then, if the truncating transaction commits, the first N physical 1GB files are removed, and just the remaining ones are used. If it aborts, the first files stay, and we just have some tuples placed sparsely starting at the next 1GB boundary.

--
Hannu Krosing <hannu@skype.net>
On Sat, Jul 09, 2005 at 11:43:52PM +0300, Hannu Krosing wrote:
> Could the new file not be made to cover the next available 1GB of file
> space, that is, a new physical file?
>
> This could make use of the same kind of machinery my proposal for
> concurrent indexing does (i.e. locks that forbid putting new tuples in
> certain tuple ranges)

I think your proposals are too "handwavy", but there is a similar mechanism outlined for on-line index reorganization, whereby new tuples can be inserted concurrently with the reorganization, being stored on a "spill area". See

@inproceedings{DBLP:conf/sigmod/ZouS96,
  author    = {C. Zou and B. Salzberg},
  editor    = {H. V. Jagadish and Inderpal Singh Mumick},
  title     = {On-line Reorganization of Sparsely-populated B+trees},
  booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference
               on Management of Data, Montreal, Quebec, Canada,
               June 4-6, 1996},
  publisher = {ACM Press},
  year      = {1996},
  pages     = {115-124},
  bibsource = {DBLP, \url{http://dblp.uni-trier.de}}
}

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Para tener más hay que desear menos" (To have more, desire less)
On Sat, 2005-07-09 at 16:50 -0400, Alvaro Herrera wrote:
> On Sat, Jul 09, 2005 at 11:43:52PM +0300, Hannu Krosing wrote:
> > Could the new file not be made to cover the next available 1GB of file
> > space, that is, a new physical file?
> >
> > This could make use of the same kind of machinery my proposal for
> > concurrent indexing does (i.e. locks that forbid putting new tuples in
> > certain tuple ranges)
>
> I think your proposals are too "handwavy", but there is a similar
> mechanism outlined for on-line index reorganization, whereby new tuples
> can be inserted concurrently with the reorganization, being stored on a
> "spill area". See

I'll try to state my reworked idea of concurrent indexing in a clearer way:

The index build is done in 2 transactions, and needs one new type of lock and a new system column in pg_class to tell the planner not to use an incomplete index. This is similar to vacuum in that it needs its own transactions and is not rollbackable. Perhaps the decision to use either this or the current INDEX behaviour should be based on whether the (RE)INDEX command is run in its own transaction.

1st transaction:
----------------
The index for the part of the datafile that exists at the start of the INDEX command is created within the 1st transaction, in a similar way to what we do now. "The part" is defined as all tuples with ctid below (<=) the max(ctid) stored at the start as MAX_CTID_1.

To be sure that we cover all the tuples in range <= MAX_CTID_1, and that no new tuples are stored there as the result of INSERT or UPDATE, we need a new type of lock (let's call it "TupleStoreRangeLock"), which prevents new tuples from being placed below MAX_CTID_1 and which is acquired before starting the initial build.

After the initial build of the index for tuples below MAX_CTID_1 is finished, it is made visible to the rest of the system by committing the transaction, but marking the index as "incomplete" (probably a new column in pg_class is needed for that), so that it will not be used by the planner, but all new inserts/updates will see and use it.

2nd transaction:
----------------
After that we need to wait for all other running transactions to complete, so we can be sure that all other backends know about the new index. Once we are sure they do, we record the new max(ctid) as MAX_CTID_2. At this point we can release the TupleStoreRangeLock, to make it possible to place new tuples below MAX_CTID_1.

As the final step we need to scan all tuples in the range (MAX_CTID_1 to MAX_CTID_2) and insert their corresponding index entries into the new index. If the entry already exists for the exact same ctid, that is ok.

After reaching MAX_CTID_2, the index is ready for use by the planner as well and can be marked as "complete" in pg_class. In case of REINDEX, the new index can be made to replace the old one at this point.

TODO: work out which locks TupleStoreRangeLock conflicts with and with which it can coexist.

--
Hannu Krosing <hannu@skype.net>
Hannu Krosing <hannu@skype.net> writes:
> I'll try to state my reworked idea of concurrent indexing in a clearer
> way:

> The index build is done in 2 transactions, and needs one new type of
> lock and a new system column in pg_class to tell the planner not to use
> an incomplete index. This is similar to vacuum in that it needs its own
> transactions and is not rollbackable.

"Not rollbackable" is certainly "not acceptable"... what if there's a crash partway through? VACUUM FULL is designed so that there isn't anything special that needs to be done to clean up after it if it fails partway through, but this seems to be untrue of your proposal. You'd have committed-created but useless indexes that have to be got rid of somehow.

> To be sure that we cover all the tuples in range <= MAX_CTID_1, and
> that no new tuples are stored there as the result of INSERT or UPDATE,
> we need a new type of lock (let's call it "TupleStoreRangeLock"), which
> prevents new tuples from being placed below MAX_CTID_1 and which is
> acquired before starting the initial build.

Checking for such a lock will put a nontrivial distributed cost on every insert and update, whether the facility is in use or not.

> After the initial build of the index for tuples below MAX_CTID_1 is
> finished, it is made visible to the rest of the system by committing the
> transaction, but marking the index as "incomplete" (probably a new
> column in pg_class is needed for that), so that it will not be used by
> the planner, but all new inserts/updates will see and use it.

I can see how this might work for a new index build, but it doesn't work for REINDEX --- you get to have other transactions inserting into either the old or the new index, not both. You could possibly make it work by creating a complete new index with its own OID, and then swapping that out for the old index at completion --- but see below.

> After that we need to wait for all other running transactions to
> complete, so we can be sure that all other backends know about the new
> index.

That could be a pretty long time ... and presumably the index build is pretty long, too, or we'd not be bothering with all this mechanism. All the while, tuples are getting inserted into highly nonoptimal pages far from the start of the table. Doesn't this idea conflict rather badly with your desire expressed nearby to force tuples to be inserted near the front of the table?

> As the final step we need to scan all tuples in the range (MAX_CTID_1 to
> MAX_CTID_2) and insert their corresponding index entries into the new
> index. If the entry already exists for the exact same ctid, that is ok.

I don't think that's as easy as it sounds; at the very least it requires mechanism comparable to the unique-index checking code, which we don't have for any index type except btree. Also, in an index that's actually highly non-unique, that mechanism is *expensive* --- you may have to scan many pages of identically-keyed entries to see if any of them match the target ctid ... all the while holding a lock that prevents anyone else from inserting on the same starting page. What's more, the pseudo uniqueness check has to be done on both sides --- else index build might decide the entry's not there, insert it, only to have the original tuple inserter come along right after and insert again. So this is a second major operational mode that has to affect everybody in the system, not only index build. I'm not sure whether there are race conditions associated with getting in and out of this mode, but it wouldn't surprise me.
> After reaching MAX_CTID_2, the index is ready for use by the planner as
> well and can be marked as "complete" in pg_class. In case of REINDEX,
> the new index can be made to replace the old one at this point.

AFAICS, the "replace" bit requires exclusive lock to make sure that no one is in the midst of using the old index. This means that you have a situation where you need to upgrade your table lock at the very end of the operation --- which means the whole thing is prone to failing at the very end because of deadlock.

regards, tom lane
On Sat, Jul 09, 2005 at 11:48:35AM -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Mike Mascari (mascarm@mascari.com) wrote:
> >> And when the transaction that issued the TRUNCATE aborts after step 3,
> >> but newer transactions commit?
>
> > The newer transactions would have to check for that situation.
>
> How would they do that? They might be long gone by the time the
> truncating transaction rolls back.
>
> It might be possible to do something that preserves full MVCC-ness for
> concurrent readers, but I don't believe there is any choice but to lock
> out concurrent writers until the truncate commits. If you try to allow
> that, there's no way to keep straight whose change goes into which file.

What about if inserts go into both files until the truncate commits? Updates are a bit trickier; AFAICS they would have to block until the truncate commits because they would have different effects depending on whether the commit happened or not. But I don't see this as an issue. Likewise, inserts that would violate a unique constraint would have to wait until the truncate committed, but inserts that are unique could commit.

Of course, something else to consider is that just being able to support concurrent reads against truncate would be a step forward, even if writes were still blocked. That alone means this is probably worthwhile, although it might require a slightly different syntax since it would behave differently than the existing truncate does.

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
On Tue, 2005-07-12 at 12:20 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > I'll try to state my reworked idea of concurrent indexing in a clearer
> > way:
> >
> > The index build is done in 2 transactions, and needs one new type of
> > lock and a new system column in pg_class to tell the planner not to
> > use an incomplete index. This is similar to vacuum in that it needs
> > its own transactions and is not rollbackable.
>
> "Not rollbackable" is certainly "not acceptable"... what if there's a
> crash partway through? VACUUM FULL is designed so that there isn't
> anything special that needs to be done to clean up after it if it fails
> partway through, but this seems to be untrue of your proposal. You'd
> have committed-created but useless indexes that have to be got rid of
> somehow.

I meant non-rollbackable in the sense that it can't be rolled back by user command. If there is an index marked "incomplete" after recovery, it is simply dropped, as it was logically never there anyway. If it fails in the first stage, it is the same as what happens with the current implementation.

> > To be sure that we cover all the tuples in range <= MAX_CTID_1, and
> > that no new tuples are stored there as the result of INSERT or UPDATE,
> > we need a new type of lock (let's call it "TupleStoreRangeLock"),
> > which prevents new tuples from being placed below MAX_CTID_1 and which
> > is acquired before starting the initial build.
>
> Checking for such a lock will put a nontrivial distributed cost on every
> insert and update, whether the facility is in use or not.

Do we not take at least some kind of lock at each insert/update now? I guess we do; how else am I prevented from changing a table during create index now? We are currently most likely doing the locking _after_ deciding where to put the tuple, so taking that lock should inform us about the existence of a TupleStoreRangeLock on that relation if there is one and force a reallocation of tuple space. This way there is very little extra cost for the case with no TupleStoreRangeLock.

My aim is not making concurrent indexing as fast as current indexing with an exclusive lock, just making it bearable on a 24/7 system. And I'd like to run it in the same kind of situation as my VACUUM patch, so that it won't show up in VACUUM's GetOldestXmin calculations. In this case it could run for hours/days/weeks ;) if that is needed for doing it without halting the system.

> > After the initial build of the index for tuples below MAX_CTID_1 is
> > finished, it is made visible to the rest of the system by committing
> > the transaction, but marking the index as "incomplete" (probably a new
> > column in pg_class is needed for that), so that it will not be used by
> > the planner, but all new inserts/updates will see and use it.
>
> I can see how this might work for a new index build, but it doesn't work
> for REINDEX --- you get to have other transactions inserting into either
> the old or the new index, not both. You could possibly make it work by
> creating a complete new index with its own OID, and then swapping that
> out for the old index at completion --- but see below.

That's what I meant - create another index, then drop the old one and rename the new to old. Index names are not _that_ important. And I see REINDEX really as a shorthand for

BEGIN; CREATE INDEX newndx; DROP INDEX oldndx; ALTER TABLE newndx RENAME TO oldndx; COMMIT;

(spelled out in the sketch after this message) without the need to look up the index definition. Or is there anything more REINDEX provides us?
And even the BEGIN/COMMIT don't give us any added benefit - any sane query plan won't use 2 similar indexes in the same query anyway even if they see them. For only a very brief moment will there be a slight extra burden for the planner in considering 2 similar indexes.

> > After that we need to wait for all other running transactions to
> > complete, so we can be sure that all other backends know about the new
> > index.
>
> That could be a pretty long time ... and presumably the index build is
> pretty long, too, or we'd not be bothering with all this mechanism.

<sarcasm>Of course it would take less time if all other critical, "need to complete below 5 sec", transactions were waiting for the create index to complete.</sarcasm>

I mainly see a use case for concurrent indexing in 24/7 OLTP systems, which can't allow even scheduled downtime. There are generally no long transactions, except VACUUM or COPY, and these can be scheduled to run at a different time from create index.

> All the while, tuples are getting inserted into highly nonoptimal pages
> far from the start of the table. Doesn't this idea conflict rather
> badly with your desire expressed nearby to force tuples to be inserted
> near the front of the table?

They are for two different use-cases, both needed in maintaining 24/7 systems:

1. concurrent index is needed for a *big* table, for either replacing a
   *bloated* *index* or creating a new index

2. placing new tuples near the beginning is for getting down the size of
   a *bloated* *table* (say 1-2% live tuples)

The first one is currently solvable only with slony (not log-shipping) type of replication and switchovers. The second one can be done using some extra non-functional updates.

Both are needed to fight performance degradations (usually sharp ones), which happen when a significant ratio of index pages no longer fits in RAM, or when say 5000 often-updated records are spread over 50000 pages thanks to the inability to vacuum them often enough while running another vacuum on a big table or doing a backup using pg_dump. The sparse table problem is not too bad by itself (index access is still fast), but both vacuums and reindexes are 1) slow, as they use seqscans internally, and 2) polluting RAM with empty pages that could be used for better ends.

> > As the final step we need to scan all tuples in the range (MAX_CTID_1
> > to MAX_CTID_2) and insert their corresponding index entries into the
> > new index. If the entry already exists for the exact same ctid, that
> > is ok.
>
> I don't think that's as easy as it sounds; at the very least it requires
> mechanism comparable to the unique-index checking code, which we don't
> have for any index type except btree.

I currently don't use anything besides btree, so I would be happy with that.

> Also, in an index that's actually
> highly non-unique, that mechanism is *expensive* --- you may have to
> scan many pages of identically-keyed entries to see if any of them match
> the target ctid ... all the while holding a lock that prevents anyone
> else from inserting on the same starting page.

We could consider ctid part of the index and place ctids in order in btrees (and probably consider ctid an additional dimension in gist/rtrees). That would likely give some benefits similar to bitmap scanning, in index scan locality. Now that I think of it, highly non-unique indexes should have been built like that all along, and more unique indexes need not care, so this applies to *all* (btree) indexes.
> What's more, the pseudo uniqueness check has to be done on both sides
> --- else index build might decide the entry's not there, insert it, only
> to have the original tuple inserter come along right after and insert
> again.

Yes, this surely needs to be avoided. Waiting for other transactions to finish and then locking the area between MAX_CTID_1 and MAX_CTID_2 should make sure that this never happens.

> So this is a second major operational mode that has to affect
> everybody in the system, not only index build. I'm not sure whether
> there are race conditions associated with getting in and out of this
> mode, but it wouldn't surprise me.

Me neither.

> > After reaching MAX_CTID_2, the index is ready for use by the planner
> > as well and can be marked as "complete" in pg_class. In case of
> > REINDEX, the new index can be made to replace the old one at this
> > point.
>
> AFAICS, the "replace" bit requires exclusive lock to make sure that no
> one is in the midst of using the old index. This means that you have a
> situation where you need to upgrade your table lock at the very end of
> the operation --- which means the whole thing is prone to failing at the
> very end because of deadlock.

This is definitely not something I would want. Why not make the whole REINDEX just a "macro" doing create/drop/rename? Or if this is hard, then let the needy DBAs do all three manually. The main thing is the ability to add indexes to big tables in a 24/7 OLTP system.

--
Hannu Krosing <hannu@skype.net>
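The "REINDEX as a macro" sequence from the message above, spelled out (table, column, and index names are invented):

    BEGIN;
    CREATE INDEX newndx ON bigtable (somecol);
    DROP INDEX oldndx;
    ALTER TABLE newndx RENAME TO oldndx;  -- only to preserve the old name
    COMMIT;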
Stephen Frost wrote:
-- Start of PGP signed section.
> * Jim C. Nasby (decibel@decibel.org) wrote:
> > On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
> > > I don't really agree with the viewpoint that truncate is just a quick
> > > DELETE, and so I do not agree that DELETE permissions should be enough
> > > to let you do a TRUNCATE.
> >
> > What about adding a truncate permission? I would find it useful, as it
> > seems would others.
>
> That would be acceptable for me as well. I'd prefer it just work off
> delete, but as long as I can grant truncate to someone w/o giving them
> ownership rights on the table I'd be happy.

Added to TODO:

* Add TRUNCATE permission

  Currently only the owner can TRUNCATE a table because triggers are not
  called, and the table is locked in exclusive mode.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:
> Stephen Frost wrote:
>> * Jim C. Nasby (decibel@decibel.org) wrote:
>> > On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
>> > > I don't really agree with the viewpoint that truncate is just a quick
>> > > DELETE, and so I do not agree that DELETE permissions should be enough
>> > > to let you do a TRUNCATE.
>> >
>> > What about adding a truncate permission? I would find it useful, as it
>> > seems would others.
>>
>> That would be acceptable for me as well. I'd prefer it just work off
>> delete, but as long as I can grant truncate to someone w/o giving them
>> ownership rights on the table I'd be happy.
>
> Added to TODO:
>
> * Add TRUNCATE permission
>
>   Currently only the owner can TRUNCATE a table because triggers are not
>   called, and the table is locked in exclusive mode.

Is anyone working on this yet? I looked at the code involved, and it seems there are just a couple of lines needed, some regression test and documentation updates, and most importantly, tab-completion updates.

However, a question arose quickly: according to the standard, revoking INSERT, UPDATE and DELETE after GRANT ALL PRIVILEGES would leave the relation read-only, but with the TRUNCATE privilege lying around, this would no longer be true for PostgreSQL. Would this open a security hole, or is it okay as far as extensions to the standard go?

regards,
Andreas
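Andreas's scenario as a transcript (a sketch; table and role names invented):

    GRANT ALL PRIVILEGES ON t TO reader;
    REVOKE INSERT, UPDATE, DELETE ON t FROM reader;
    -- per the standard, t is now read-only for reader; but if
    -- ALL PRIVILEGES had included a separate TRUNCATE privilege:
    TRUNCATE t;   -- reader could still empty the "read-only" table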
On Wed, 24 Aug 2005 07:01:00 +0200, Andreas Seltenreich <andreas+pg@gate450.dyndns.org> wrote:
> However, a question arose quickly: according to the standard, revoking
> INSERT, UPDATE and DELETE after GRANT ALL PRIVILEGES would leave the
> relation read-only, but with the TRUNCATE privilege lying around, this
> would no longer be true for PostgreSQL.

I'd say that the TRUNCATE privilege includes DELETE, so that REVOKE DELETE implicitly revokes TRUNCATE and GRANT TRUNCATE implicitly grants DELETE.

Servus
Manfred
* Andreas Seltenreich (andreas+pg@gate450.dyndns.org) wrote:
> Bruce Momjian wrote:
> > Added to TODO:
> >
> > * Add TRUNCATE permission
> >
> >   Currently only the owner can TRUNCATE a table because triggers are not
> >   called, and the table is locked in exclusive mode.
>
> Is anyone working on this yet? I looked at the code involved, and it
> seems there are just a couple of lines needed, some regression test
> and documentation updates, and most importantly, tab-completion
> updates.

I'm not working on it, though I agree that it really shouldn't be very difficult to add. I'd certainly like to see it done. While you're in there, I'd really like to see analyze and vacuum as grantable permissions too... Of course, eliminating the need for them would be even better... :)

> However, a question arose quickly: according to the standard, revoking
> INSERT, UPDATE and DELETE after GRANT ALL PRIVILEGES would leave the
> relation read-only, but with the TRUNCATE privilege lying around, this
> would no longer be true for PostgreSQL. Would this open a security
> hole, or is it okay as far as extensions to the standard go?

Hrm, I'm not really sure about this one. I could see linking TRUNCATE with DELETE (ie: you must have both DELETE and TRUNCATE permissions on a table to TRUNCATE it, ala SELECT/UPDATE), or perhaps excluding TRUNCATE from GRANT ALL PRIVILEGES.

Thanks,
Stephen
* Manfred Koizar (mkoi-pg@aon.at) wrote:
> On Wed, 24 Aug 2005 07:01:00 +0200, Andreas Seltenreich
> <andreas+pg@gate450.dyndns.org> wrote:
> > However, a question arose quickly: according to the standard, revoking
> > INSERT, UPDATE and DELETE after GRANT ALL PRIVILEGES would leave the
> > relation read-only, but with the TRUNCATE privilege lying around, this
> > would no longer be true for PostgreSQL.
>
> I'd say that the TRUNCATE privilege includes DELETE, so that REVOKE
> DELETE implicitly revokes TRUNCATE and GRANT TRUNCATE implicitly
> grants DELETE.

I disagree with implicitly granting/revoking. Rather, if we're going to go this route, we should require both DELETE and TRUNCATE rights on the object in order to TRUNCATE it, but otherwise have TRUNCATE privileges and DELETE privileges be distinct from each other in terms of being granted/revoked. This follows the SELECT/UPDATE relationship. Granting UPDATE doesn't implicitly grant SELECT, and revoking SELECT doesn't implicitly revoke UPDATE; but in order to actually UPDATE you need SELECT rights.

Thanks,
Stephen
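The SELECT/UPDATE relationship Stephen refers to, as a transcript (table, column, and role names invented). Strictly, SELECT is needed whenever the UPDATE reads any column, which the WHERE clause does here:

    GRANT UPDATE ON accounts TO app;   -- UPDATE granted, SELECT not
    -- as app:
    UPDATE accounts SET flag = true;   -- allowed: reads no columns
    UPDATE accounts SET flag = true
        WHERE id = 7;                  -- fails: the WHERE clause needs SELECT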