Thread: Must be owner to truncate?

Must be owner to truncate?

From
Stephen Frost
Date:
Greetings,
 The current permissions checks for truncate seem to be excessive.  It
requires that you're the owner of the relation instead of requiring that
you have delete permissions on the relation.  It was pointed out that
truncate doesn't call triggers but it seems like that would be something
easy enough to check for.  My thinking is to replace the existing owner
check with:

 Must have delete permissions on the relation
 If the relation has triggers:
   Check that the caller is the owner of the relation, if so, then
   issue a NOTICE that those triggers won't be called and perform the
   truncate.
   If not the owner, then error out saying there are ON DELETE triggers
   and that you're not the owner.
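The proposed check can be sketched as a toy decision function (illustrative Python, not PostgreSQL source; all names here are made up):

```python
# Toy sketch of the proposed TRUNCATE permission check.
def may_truncate(has_delete_priv, has_triggers, is_owner):
    """Return (allowed, message) for the proposed rules."""
    if not has_delete_priv:
        return False, "ERROR: permission denied"   # DELETE priv is the minimum
    if not has_triggers:
        return True, None                          # plain case: allow
    if is_owner:
        # owner may truncate, but is warned the triggers won't fire
        return True, "NOTICE: ON DELETE triggers will not be called"
    return False, "ERROR: relation has ON DELETE triggers and you are not the owner"
```

So a non-owner with DELETE rights on a trigger-less table would be allowed, and triggers only matter when they exist.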
 
 I can submit a patch for this today if there's general agreement on this
change.  An alternative that was mentioned was to make 'delete' smart
enough to know when it's delete'ing all the rows and there aren't any
triggers on it, etc, to perform like truncate, perhaps leaving the old
file around until all transactions using it have finished.  This sounds
like a good idea but also sounds like it'd be a larger change and might
have to wait till 8.2.
 
     Thanks,
    Stephen

Re: Must be owner to truncate?

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
>   The current permissions checks for truncate seem to be excessive.  It
>   requires that you're the owner of the relation instead of requiring
>   that you have delete permissions on the relation.  It was pointed out
>   that truncate doesn't call triggers but it seems like that would be
>   something easy enough to check for.

There are other reasons for restricting it:
 * truncate takes a much stronger lock than a plain delete does.
 * truncate is not MVCC-safe.
 

I don't really agree with the viewpoint that truncate is just a quick
DELETE, and so I do not agree that DELETE permissions should be enough
to let you do a TRUNCATE.
        regards, tom lane


Re: Must be owner to truncate?

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> >   The current permissions checks for truncate seem to be excessive.  It
> >   requires that you're the owner of the relation instead of requiring
> >   that you have delete permissions on the relation.  It was pointed out
> >   that truncate doesn't call triggers but it seems like that would be
> >   something easy enough to check for.
>
> There are other reasons for restricting it:
>  * truncate takes a much stronger lock than a plain delete does.

What permissions are required to lock a table?  With just select,
insert, update and delete on a table I can LOCK TABLE it, which acquires
an ACCESS EXCLUSIVE on it and will therefore hold off anyone else from
using the table till the end of my transaction anyway.  So I don't see
this as being a reason to disallow non-owners use of truncate.

>  * truncate is not MVCC-safe.

Erm, that's why it gets a stronger lock, so I don't really see what
this has to do with it.

> I don't really agree with the viewpoint that truncate is just a quick
> DELETE, and so I do not agree that DELETE permissions should be enough
> to let you do a TRUNCATE.

Truncate is exactly a quick DELETE; in fact, DELETE could stand to learn
a thing or two from truncate to make it suck a little less to
'delete from x;' when x is a reasonably large table.  This probably
wouldn't actually be all that difficult to do if there's a way to keep
the old file around until all the transactions using it have completed
that's not too expensive, etc.
Thanks,
    Stephen

Re: Must be owner to truncate?

From
Andrew - Supernews
Date:
On 2005-07-07, Stephen Frost <sfrost@snowman.net> wrote:
>
>>  * truncate is not MVCC-safe.
>
> Erm, that's why it gets a stronger lock, so I don't really see what
> this has to do with it.

It's not MVCC-safe even with the AccessExclusive lock; it damages snapshots
that were taken before the truncate operation but which don't have a lock
on the table yet. The only reason it doesn't break pg_dump is that the
first thing that pg_dump does is to take AccessShare locks on every table
that it's going to dump.
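The hazard can be shown with a toy model (illustrative Python, not PostgreSQL code): a DELETE leaves the dead tuples in the file so an older snapshot can still see them, while TRUNCATE swaps in an empty file and the same snapshot suddenly sees nothing.

```python
# Toy model of why TRUNCATE damages pre-existing snapshots.
class Table:
    def __init__(self, rows):
        # each entry is (row, xid_that_deleted_it_or_None)
        self.file = [(r, None) for r in rows]
    def delete_all(self, xid):
        # MVCC delete: mark rows dead as of xid, keep them in the file
        self.file = [(r, xid) for (r, _) in self.file]
    def truncate(self):
        # TRUNCATE: point the relation at a brand-new empty file
        self.file = []

def visible(table, snapshot_xid):
    # a row is visible if never deleted, or deleted after the snapshot
    return [r for (r, dead) in table.file
            if dead is None or dead > snapshot_xid]

t = Table(["a", "b"])
t.delete_all(xid=200)
assert visible(t, snapshot_xid=100) == ["a", "b"]  # old snapshot still sees rows

t2 = Table(["a", "b"])
t2.truncate()
assert visible(t2, snapshot_xid=100) == []         # rows gone for everyone
```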

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: Must be owner to truncate?

From
"Jim C. Nasby"
Date:
On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> >   The current permissions checks for truncate seem to be excessive.  It
> >   requires that you're the owner of the relation instead of requiring
> >   that you have delete permissions on the relation.  It was pointed out
> >   that truncate doesn't call triggers but it seems like that would be
> >   something easy enough to check for.
> 
> There are other reasons for restricting it:
>  * truncate takes a much stronger lock than a plain delete does.
>  * truncate is not MVCC-safe.
> 
> I don't really agree with the viewpoint that truncate is just a quick
> DELETE, and so I do not agree that DELETE permissions should be enough
> to let you do a TRUNCATE.

What about adding a truncate permission? I would find it useful, as it
seems would others.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Must be owner to truncate?

From
Stephen Frost
Date:
* Jim C. Nasby (decibel@decibel.org) wrote:
> On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
> > I don't really agree with the viewpoint that truncate is just a quick
> > DELETE, and so I do not agree that DELETE permissions should be enough
> > to let you do a TRUNCATE.
>
> What about adding a truncate permission? I would find it useful, as it
> seems would others.

That would be acceptable for me as well.  I'd prefer it just work off
delete, but as long as I can grant truncate to someone w/o giving them
ownership rights on the table I'd be happy.
Thanks,
    Stephen

Re: Must be owner to truncate?

From
Stephen Frost
Date:
* Andrew - Supernews (andrew+nonews@supernews.com) wrote:
> On 2005-07-07, Stephen Frost <sfrost@snowman.net> wrote:
> >
> >>  * truncate is not MVCC-safe.
> >
> > Erm, that's why it gets a stronger lock, so I don't really see what
> > this has to do with it.
>
> It's not MVCC-safe even with the AccessExclusive lock; it damages snapshots
> that were taken before the truncate operation but which don't have a lock
> on the table yet. The only reason it doesn't break pg_dump is that the
> first thing that pg_dump does is to take AccessShare locks on every table
> that it's going to dump.

This seems like something which should probably be fixed, but which is
probably too late to fix for 8.1.  Of course, if we could fix this then
it seems like it would be possible for us to just change 'delete from x'
to behave as truncate does now given appropriate conditions.  I'm not as
familiar with that area as others are; is this a very difficult thing to
do?  If not then I may take a look at it, it'd be a very nice
improvement.
Thanks,
    Stephen

Re: Must be owner to truncate?

From
Christopher Kings-Lynne
Date:
>   The current permissions checks for truncate seem to be excessive.  It
>   requires that you're the owner of the relation instead of requiring
>   that you have delete permissions on the relation.  It was pointed out
>   that truncate doesn't call triggers but it seems like that would be
>   something easy enough to check for.  My thinking is to replace the
>   existing ownercheck with:
> 
>   Must have delete permissions on the relation
>   If the relation has triggers:
>     Check that the caller is the owner of the relation, if so, then
>     issue a NOTICE that those triggers won't be called and perform the
>     truncate.
>     If not the owner, then error out saying there are ON DELETE triggers
>     and that you're not the owner.

I'm strongly in favour of this patch.  I am currently in this situation:

1. Web db user runs as non-superuser, non-owner.
2. I have a table of tens of thousands of rows that I must delete
entirely and rebuild every day at least (pg_trgm word list)
3. It just gets slow over time, even with autovac.
4. I can't vacuum it as I'm not the owner, and I cannot truncate it either.
5. Table has no triggers or FK's whatsoever.

So, Stephen Frost's suggestion would be fantastic.

Chris



Re: Must be owner to truncate?

From
Christopher Kings-Lynne
Date:
> There are other reasons for restricting it:
>  * truncate takes a much stronger lock than a plain delete does.
>  * truncate is not MVCC-safe.
> 
> I don't really agree with the viewpoint that truncate is just a quick
> DELETE, and so I do not agree that DELETE permissions should be enough
> to let you do a TRUNCATE.

Ah.  I didn't realise that 2nd point.  I don't care so much about the 
stronger lock in my application.

Chris



Re: Must be owner to truncate?

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Andrew - Supernews (andrew+nonews@supernews.com) wrote:
>> It's not MVCC-safe even with the AccessExclusive lock;

> This seems like something which should probably be fixed,

You've missed the point entirely: this *cannot* be fixed, at least not
without giving up the performance advantages that make TRUNCATE
interesting.
        regards, tom lane


Re: Must be owner to truncate?

From
Stephen Frost
Date:
* Christopher Kings-Lynne (chriskl@familyhealth.com.au) wrote:
> I'm strongly in favour of this patch.  I am currently in this situation:
>
> 1. Web db user runs as non-superuser, non-owner.
> 2. I have a table of tens of thousands of rows that I must delete
> entirely and rebuild every day at least (pg_trgm word list)
> 3. It just gets slow over time, even with autovac.
> 4. I can't vacuum it as i'm not the owner, and I cannot truncate it either.
> 5. Table has no triggers or FK's whatsoever.
>
> So, stephen frost's suggestion would be fantastic.

This is a very similar situation to what I'm in, which is why I was
asking for the change. :)
Thanks,
    Stephen

Re: Must be owner to truncate?

From
Stephen Frost
Date:
* Christopher Kings-Lynne (chriskl@familyhealth.com.au) wrote:
> >There are other reasons for restricting it:
> > * truncate takes a much stronger lock than a plain delete does.
> > * truncate is not MVCC-safe.
> >
> >I don't really agree with the viewpoint that truncate is just a quick
> >DELETE, and so I do not agree that DELETE permissions should be enough
> >to let you do a TRUNCATE.
>
> Ah.  I didn't realise that 2nd point.  I don't care so much about the
> stronger lock in my application.

Does truncate not being MVCC-safe cause problems in your situation?  It
certainly doesn't in mine and I expect the same is true for a lot of
others in the same situation.
Thanks,
    Stephen

Re: Must be owner to truncate?

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Andrew - Supernews (andrew+nonews@supernews.com) wrote:
> >> It's not MVCC-safe even with the AccessExclusive lock;
>
> > This seems like something which should probably be fixed,
>
> You've missed the point entirely: this *cannot* be fixed, at least not
> without giving up the performance advantages that make TRUNCATE
> interesting.

Alright, can we give that rather significant performance advantage to
non-owners in some way then?  Perhaps as an extra grant right?

This is along the lines of what I was thinking, though I do see that it
gets more complicated when dealing with transactions which started
before the one committing the truncate (Not a problem in my case, but
would have to be dealt with to be MVCC-safe):

TRUNCATE is fast because it knows that it's delete'ing everything and
so it just creates a new (empty) file and deletes the old file.  DELETE
goes through the entire file marking each record for deletion and then
the system has to wait around for the vacuum'er to come through and
clean up the file.  New transactions using that file have to scan past
all of the deleted tuples until they get vacuumed though.  My thinking
is along these lines:

delete from x;/truncate x;
  --> Creates a new, empty, file and makes it the 'current' file
  --> Marks the old file for deletion, but it is kept around for any
      transactions which were started before the truncate;
  --> New transactions use the empty file
  --> Once all transactions using the old file have completed, the old
      file can be deleted.
  --> Old transactions which insert rows would need to use the new file
      or scan the old file for rows which they added, I suppose.
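The steps above can be condensed into a toy simulation (illustrative Python; all names and structure are mine, not a proposed implementation, and the abort and old-transaction-insert cases are not modelled):

```python
# Toy simulation of a file-swapping truncate that keeps the old file
# around for transactions that started before it.
class Relation:
    def __init__(self, rows):
        self.current = list(rows)   # file that new transactions read
        self.old_files = []         # truncated files kept for old readers
        self.snapshots = {}         # xid -> file seen by that transaction
    def begin(self, xid):
        self.snapshots[xid] = self.current
    def truncate(self):
        self.old_files.append(self.current)
        self.current = []           # new, empty file becomes 'current'
    def commit(self, xid):
        self.snapshots.pop(xid)
        # an old file may be deleted once no open transaction points at it
        self.old_files = [f for f in self.old_files
                          if any(s is f for s in self.snapshots.values())]

rel = Relation(["r1", "r2"])
rel.begin(xid=1)                    # old transaction: sees the full file
rel.truncate()
rel.begin(xid=2)                    # new transaction: sees the empty file
assert rel.snapshots[1] == ["r1", "r2"]
assert rel.snapshots[2] == []
rel.commit(1)                       # last user of the old file finishes
assert rel.old_files == []          # old file can now be deleted
```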
Thanks,
    Stephen

Re: Must be owner to truncate?

From
Stephan Szabo
Date:
On Thu, 7 Jul 2005, Stephen Frost wrote:

> * Andrew - Supernews (andrew+nonews@supernews.com) wrote:
> > On 2005-07-07, Stephen Frost <sfrost@snowman.net> wrote:
> > >
> > >>  * truncate is not MVCC-safe.
> > >
> > > Erm, that's why it gets a stronger lock, so I don't really see what
> > > this has to do with it.
> >
> > It's not MVCC-safe even with the AccessExclusive lock; it damages snapshots
> > that were taken before the truncate operation but which don't have a lock
> > on the table yet. The only reason it doesn't break pg_dump is that the
> > first thing that pg_dump does is to take AccessShare locks on every table
> > that it's going to dump.
>
> This seems like something which should probably be fixed, but which is
> probably too late to fix for 8.1.  Of course, if we could fix this then
> it seems like it would be possible for us to just change 'delete from x'
> to behave as truncate does now given appropriate conditions.  I'm not as

Doesn't the lock difference between delete and truncate mean that suddenly
deletes on x may or may not block concurrent selects to x (depending on
whether it's a full table delete and whether x has delete triggers)? Or
are you thinking that after making it MVCC safe the lock could be
lessened?

With the current truncate lock, it seems bad to me for users who want to
do:
begin;
delete from x;
-- do inserts and other stuff to the now empty x

while still allowing access to x. Especially if whether or not you have
access depends on whether there are delete triggers on x.


Re: Must be owner to truncate?

From
Christopher Kings-Lynne
Date:
> Does truncate not being MVCC-safe cause problems in your situation?  It
> certainly doesn't in mine and I expect the same is true for alot of
> others in the same situation.

Well, it is done inside a transaction, plus has concurrent use...

Chris


Re: Must be owner to truncate?

From
Mike Mascari
Date:
Stephen Frost wrote:

> delete from x;/truncate x;
>   --> Creates a new, empty, file and makes it the 'current' file
>   --> Marks the old file for deletion, but it is kept around for any
>       transactions which were started before the truncate;
>   --> New transactions use the empty file
>   --> Once all transactions using the old file have completed, the old
>       file can be deleted.
>   --> Old transactions which insert rows would need to use the new file
>       or scan the old file for rows which they added, I suppose.

And when the transaction that issued the TRUNCATE aborts after step 3, 
but newer transactions commit?

Mike Mascari


Re: Must be owner to truncate?

From
Stephen Frost
Date:
* Mike Mascari (mascarm@mascari.com) wrote:
> Stephen Frost wrote:
>
> >delete from x;/truncate x;
> >  --> Creates a new, empty, file and makes it the 'current' file
> >  --> Marks the old file for deletion, but it is kept around for any
> >      transactions which were started before the truncate;
> >  --> New transactions use the empty file
> >  --> Once all transactions using the old file have completed, the old
> >      file can be deleted.
> >  --> Old transactions which insert rows would need to use the new file
> >      or scan the old file for rows which they added, I suppose.
>
> And when the transaction that issued the TRUNCATE aborts after step 3,
> but newer transactions commit?

The newer transactions would have to check for that situation.  It's not
completely thought through, but at the same time I don't necessarily
think it's something that would be completely impossible to do and still
retain most of the performance benefits, at least in the most common
case.
Thanks,
    Stephen

Re: Must be owner to truncate?

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Mike Mascari (mascarm@mascari.com) wrote:
>> And when the transaction that issued the TRUNCATE aborts after step 3,
>> but newer transactions commit?

> The newer transactions would have to check for that situation.

How would they do that?  They might be long gone by the time the
truncating transaction rolls back.

It might be possible to do something that preserves full MVCC-ness for
concurrent readers, but I don't believe there is any choice but to lock
out concurrent writers until the truncate commits.  If you try to allow
that, there's no way to keep straight whose change goes into which file.
        regards, tom lane


Re: Must be owner to truncate?

From
Hannu Krosing
Date:
On L, 2005-07-09 at 09:47 -0400, Mike Mascari wrote:
> Stephen Frost wrote:
> 
> > delete from x;/truncate x;
> >   --> Creates a new, empty, file and makes it the 'current' file
> >   --> Marks the old file for deletion, but it is kept around for any
> >       transactions which were started before the truncate;
> >   --> New transactions use the empty file
> >   --> Once all transactions using the old file have completed, the old
> >       file can be deleted.
> >   --> Old transactions which insert rows would need to use the new file
> >       or scan the old file for rows which they added, I suppose.
> 
> And when the transaction that issued the TRUNCATE aborts after step 3, 
> but newer transactions commit?

Should be the same as when newer transactions had used a file after a
'DELETE;' had been issued.

Could the new file not be made to cover the next available 1GB of file
space, that is a new physical file ?

This could make use of the same kind of machinery my proposal for
concurrent indexing does (i.e. locks that forbid putting new tuples in
certain tuple ranges).

Then, if the truncating transaction commits, the N first physical 1GB
files are removed, and just the remaining ones are used. If it aborts,
the first files stay, and we just have some tuples placed sparsely
starting at the next 1GB boundary.

-- 
Hannu Krosing <hannu@skype.net>



Re: Must be owner to truncate?

From
Alvaro Herrera
Date:
On Sat, Jul 09, 2005 at 11:43:52PM +0300, Hannu Krosing wrote:

> Could the new file not be made to cover the next available 1GB of file
> space, that is a new physical file ?
> 
> This could make use of the same kind of machinery my proposal for
> concurrent indexing does (i.e. locks that forbid putting new tuples in
> certain tuple ranges)

I think your proposals are too "handwavy", but there is a similar
mechanism outlined for on-line index reorganization, whereby new tuples
can be inserted concurrently with the reorganization, being stored on a
"spill area".  See

@inproceedings{DBLP:conf/sigmod/ZouS96,
  author    = {C. Zou and B. Salzberg},
  editor    = {H. V. Jagadish and Inderpal Singh Mumick},
  title     = {On-line Reorganization of Sparsely-populated B+trees},
  booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference on
               Management of Data, Montreal, Quebec, Canada, June 4-6, 1996},
  publisher = {ACM Press},
  year      = {1996},
  pages     = {115-124},
  bibsource = {DBLP, \url{http://dblp.uni-trier.de}}
}

-- 
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Para tener más hay que desear menos"


CONCURRENT INDEXing again (was: Must be owner to truncate?)

From
Hannu Krosing
Date:
On L, 2005-07-09 at 16:50 -0400, Alvaro Herrera wrote:
> On Sat, Jul 09, 2005 at 11:43:52PM +0300, Hannu Krosing wrote:
> 
> > Could the new file not be made to cover the next available 1GB of file
> > space, that is a new physical file ?
> > 
> > This could make use of the same kind of machinery my proposal for
> > concurrent indexing does (i.e. locks that forbid putting new tuples in
> > certain tuple ranges)
> 
> I think your proposals are too "handwavy", but there is a similar
> mechanism outlined for on-line index reorganizarion, whereby new tuples
> can be inserted concurrently with the reorganization, being stored on a
> "spill area".  See

I'll try to state my reworked idea of concurrent indexing in a clearer
way:

The index build is done in 2 transactions, and needs one new type of
lock and a new system column in pg_class to tell the planner not to use
an incomplete index. This is similar to vacuum in that it needs its own
transactions and is not rollbackable. Perhaps the decision to use either
this or the current INDEX behaviour should be based on whether the
(RE)INDEX command is run in its own transaction.

1st transaction:
----------------

The index for the part of the datafile that exists at the start of the
INDEX command is created within the 1st transaction, in a similar way to
what we do now.  "The part" is defined as all tuples with ctid below
(<=) the max(ctid) stored at the start as MAX_CTID_1.

To be sure that we cover all the tuples in range <= MAX_CTID_1, and no
new tuples are stored there as the result of INSERT or UPDATE, we need a
new type of lock (let's call it "TupleStoreRangeLock"), which prevents
new tuples from being placed below MAX_CTID_1 and which is acquired
before starting the initial build.

After the initial build of the index for tuples below MAX_CTID_1 is
finished, it is made visible to the rest of the system by committing the
transaction, but marking the index as "incomplete" (probably a new
column in pg_class is needed for that), so that it will not be used by
the planner, but all new inserts/updates will see and use it.

2nd transaction
---------------

After that we need to wait for all other running transactions to
complete, so we can be sure that all other backends know about the new
index.

Once we are sure they do, we record the new max(ctid) as MAX_CTID_2.  At
this point we can release the TupleStoreRangeLock, to make it possible
to place new tuples below MAX_CTID_1.

As the final step we need to scan all tuples in range ( MAX_CTID_1 to
MAX_CTID_2 ) and insert their corresponding index entries into the new
index. If the entry already exists for exact same ctid, that is ok.

After reaching MAX_CTID_2, the index is ready for use by planner as well
and can be marked as "complete" in pg_class. In case of REINDEX, the new
index can be made to replace the old one at this point.



TODO: work out with which locks TupleStoreRangeLock conflicts and with
which it can coexist.
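The two transactions can be condensed into a toy, single-process sketch (illustrative Python; not the proposed C implementation). Heap tuples are (ctid, key) pairs, and using a set for the index papers over exactly the duplicate-entry check that would be the hard part in a real btree:

```python
# Toy sketch of the two-phase concurrent index build described above.
def build_concurrent_index(heap, inserts_during_build):
    index = set()                   # entries are (key, ctid) pairs
    # Phase 1: with TupleStoreRangeLock held, nothing new lands at or
    # below MAX_CTID_1, so that range can be indexed and then committed
    # as an "incomplete" index.
    max_ctid_1 = max(ctid for ctid, _ in heap)
    for ctid, key in heap:
        if ctid <= max_ctid_1:
            index.add((key, ctid))
    # Concurrent activity: new inserts go above MAX_CTID_1 and, seeing
    # the incomplete index, maintain it themselves.
    for ctid, key in inserts_during_build:
        assert ctid > max_ctid_1    # guaranteed by TupleStoreRangeLock
        heap.append((ctid, key))
        index.add((key, ctid))
    # Phase 2: after all older transactions finish, index the range
    # (MAX_CTID_1, MAX_CTID_2]; re-adding an identical (key, ctid) entry
    # is a no-op here thanks to set semantics.
    max_ctid_2 = max(ctid for ctid, _ in heap)
    for ctid, key in heap:
        if max_ctid_1 < ctid <= max_ctid_2:
            index.add((key, ctid))
    return index

idx = build_concurrent_index([(1, "a"), (2, "b")], [(3, "c")])
assert idx == {("a", 1), ("b", 2), ("c", 3)}
```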

-- 
Hannu Krosing <hannu@skype.net>



Re: CONCURRENT INDEXing again (was: Must be owner to truncate?)

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> I try to state my reworked idea of concurrent indexing in a more clear
> way:

> The index build is done in 2 transactions, and needs one new type of
> lock and a new system column in pg_class to tell the planner not to
> use an incomplete index. This is similar to vacuum in that it needs
> its own transactions and is not rollbackable.

"Not rollbackable" is certainly "not acceptable"... what if there's a
crash partway through?  VACUUM FULL is designed so that there isn't
anything special that needs to be done to clean up after it if it fails
partway through, but this seems to be untrue of your proposal.  You'd
have committed-created but useless indexes that have to be got rid of
somehow.

> To be sure that we cover all the tuples in range <= MAX_CTID_1, and no
> new tuples are stored there as the result of INSERT or UPDATE, we need a
> new type of lock (let's call it "TupleStoreRangeLock"), which prevents
> new tuples from being placed below MAX_CTID_1 and which is acquired
> before starting the initial build.

Checking for such a lock will put a nontrivial distributed cost on every
insert and update, whether the facility is in use or not.

> After the initial build of the index for tuples below MAX_CTID_1 is
> finished, it is made visible to the rest of the system by committing the
> transaction, but marking the index as "incomplete" (probably a new
> column in pg_class is needed for that), so that it will not be used by
> the planner, but all new inserts/updates will see and use it.

I can see how this might work for a new index build, but it doesn't work
for REINDEX --- you get to have other transactions inserting into either
the old or the new index, not both.  You could possibly make it work by
creating a complete new index with its own OID, and then swapping that
out for the old index at completion --- but see below.

> After that we need to wait for all other running transactions to
> complete, so we can be sure that all other backends know about the new
> index.

That could be a pretty long time ... and presumably the index build is
pretty long, too, or we'd not be bothering with all this mechanism.
All the while, tuples are getting inserted into highly nonoptimal pages
far from the start of the table.  Doesn't this idea conflict rather
badly with your desire expressed nearby to force tuples to be inserted
near the front of the table?

> As the final step we need to scan all tuples in range ( MAX_CTID_1 to
> MAX_CTID_2 ) and insert their corresponding index entries into the new
> index. If the entry already exists for exact same ctid, that is ok.

I don't think that's as easy as it sounds; at the very least it requires
mechanism comparable to the unique-index checking code, which we don't
have for any index type except btree.  Also, in an index that's actually
highly non-unique, that mechanism is *expensive* --- you may have to
scan many pages of identically-keyed entries to see if any of them match
the target ctid ... all the while holding a lock that prevents anyone
else from inserting on the same starting page.

What's more, the pseudo uniqueness check has to be done on both sides
--- else index build might decide the entry's not there, insert it, only
to have the original tuple inserter come along right after and insert
again.  So this is a second major operational mode that has to affect
everybody in the system, not only index build.  I'm not sure whether
there are race conditions associated with getting in and out of this
mode, but it wouldn't surprise me.

> After reaching MAX_CTID_2, the index is ready for use by planner as well
> and can be marked as "complete" in pg_class. In case of REINDEX, the new
> index can be made to replace the old one at this point.

AFAICS, the "replace" bit requires exclusive lock to make sure that no
one is in the midst of using the old index.  This means that you have a
situation where you need to upgrade your table lock at the very end of
the operation --- which means the whole thing is prone to failing at the
very end because of deadlock.
        regards, tom lane


Re: Must be owner to truncate?

From
"Jim C. Nasby"
Date:
On Sat, Jul 09, 2005 at 11:48:35AM -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Mike Mascari (mascarm@mascari.com) wrote:
> >> And when the transaction that issued the TRUNCATE aborts after step 3,
> >> but newer transactions commit?
> 
> > The newer transactions would have to check for that situation.
> 
> How would they do that?  They might be long gone by the time the
> truncating transaction rolls back.
> 
> It might be possible to do something that preserves full MVCC-ness for
> concurrent readers, but I don't believe there is any choice but to lock
> out concurrent writers until the truncate commits.  If you try to allow
> that, there's no way to keep straight whose change goes into which file.

What about if inserts go into both files until the truncate commits?

Updates are a bit trickier; AFAICS they would have to block until the
truncate commits because they would have different effects depending on
whether the commit happened or not. But I don't see this as an issue.
Likewise, inserts that would violate a unique constraint would have to
wait until the truncate committed, but inserts that are unique could
commit.

Of course something else to consider is that just being able to support
concurrent reads against truncate would be a step forward, even if
writes were still blocked. That alone means this is probably
worth-while, although it might require a slightly different syntax since
it would behave differently than the existing truncate does.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: CONCURRENT INDEXing again (was: Must be owner to

From
Hannu Krosing
Date:
On T, 2005-07-12 at 12:20 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > I try to state my reworked idea of concurrent indexing in a more clear
> > way:
> 
> > The index build is done in 2 transactions, and needs one new type of
> > lock and a new system column in pg_class to tell the planner not to
> > use an incomplete index. This is similar to vacuum in that it needs
> > its own transactions and is not rollbackable.
> 
> "Not rollbackable" is certainly "not acceptable"... what if there's a
> crash partway through?  VACUUM FULL is designed so that there isn't
> anything special that needs to be done to clean up after it if it fails
> partway through, but this seems to be untrue of your proposal.  You'd
> have committed-created but useless indexes that have to be got rid of
> somehow.

I meant non-rollbackable in the sense that it can't be rolled back by
user command.

If there is an index marked "incomplete" after recovery, it is simply
dropped, as it was logically never there anyway.

If it fails in the first stage, it is the same to what happens with
current implementation.


> > To be sure that we cover all the tuples in range <= MAX_CTID_1, and no
> > new tuples are stored there as the result of INSERT or UPDATE, we need a
> > new type of lock (let's call it "TupleStoreRangeLock"), which prevents
> > new tuples from being placed below MAX_CTID_1 and which is acquired
> > before starting the initial build.
> 
> Checking for such a lock will put a nontrivial distributed cost on every
> insert and update, whether the facility is in use or not.

Do we not take at least some kind of lock at each insert/update now ?

I guess we do, how else am I prevented from changing a table during
create index now ?

We are currently most likely doing the locking _after_ deciding where to
put the tuple, so taking that lock should inform us about the existence
of a TupleStoreRangeLock on that relation if there is one and force a
reallocation of tuple space. This way there is very little extra cost
for the case with no TupleStoreRangeLock.

My aim is not making concurrent indexing as fast as current indexing
with exclusive lock, just making it bearable on 24/7 system.

And I'd like to run it in the same kind of situation as my VACUUM patch,
so that it won't show up in VACUUM's GetOldestXmin calculations. In this
case it could run for hours/days/weeks ;) if that is needed for doing it
without halting the system.

> > After the initial build of the index for tuples below MAX_CTID_1 is
> > finished, it is made visible to the rest of the system by committing the
> > transaction, but marking the index as "incomplete" (probably a new
> > column in pg_class is needed for that), so that it will not be used by
> > the planner, but all new inserts/updates will see and use it.
> 
> I can see how this might work for a new index build, but it doesn't work
> for REINDEX --- you get to have other transactions inserting into either
> the old or the new index, not both.  You could possibly make it work by
> creating a complete new index with its own OID, and then swapping that
> out for the old index at completion --- but see below.

That's what I meant - create another index, then drop the old one and rename
the new to the old. Index names are not _that_ important. And I see REINDEX
really as a shorthand for
BEGIN; CREATE INDEX newndx; DROP INDEX oldndx; ALTER TABLE newndx RENAME TO oldndx;
COMMIT;
without the need to look up the index definition.
Or is there anything more REINDEX provides us?
And even the BEGIN/COMMIT don't give us any added benefit - any sane
query plan won't use two similar indexes in the same query anyway, even if
it sees them. Only for a very brief moment will there be a slight extra
burden on the planner from considering two similar indexes.
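Spelled out, the manual sequence described above might look like this sketch
(the table name "t", column "col", and index names are hypothetical):

```sql
-- Sketch of REINDEX as a create/drop/rename sequence
BEGIN;
CREATE INDEX newndx ON t (col);       -- build the replacement index
DROP INDEX oldndx;                    -- drop the bloated original
ALTER TABLE newndx RENAME TO oldndx;  -- keep the old name for any tooling
COMMIT;
```

The only thing REINDEX saves you here is looking up the original index
definition before issuing the CREATE INDEX.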

> > After that we need to wait for all other running transactions to
> > complete, so we can be sure that all other backends know about the new
> > index.
> 
> That could be a pretty long time ... and presumably the index build is
> pretty long, too, or we'd not be bothering with all this mechanism.

<sarcasm>Of course it would take less time if all the other critical
"must complete in under 5 sec" transactions were waiting for the CREATE
INDEX to complete.</sarcasm>

I mainly see a use case for concurrent indexing in 24/7 OLTP systems,
which can't allow even scheduled downtime. There are generally no long
transactions, except VACUUM or COPY, and those can be scheduled to run at
a different time from CREATE INDEX.

> All the while, tuples are getting inserted into highly nonoptimal pages
> far from the start of the table.  Doesn't this idea conflict rather
> badly with your desire expressed nearby to force tuples to be inserted
> near the front of the table?


They are for two different use-cases, both needed in maintaining 24/7
systems:

1. concurrent index is needed for a *big* table for either replacing a
*bloated* *index* or creating a new index

2. placing new tuples near beginning is for getting down the size of a
*bloated* *table* (say 1-2% live tuples).

The first one is currently solvable only with a slony-type (not
log-shipping) replication setup and a switchover.

The second one can be done using some extra non-functional updates.

Both are needed to fight performance degradations (usually sharp ones),
which happen when a significant fraction of index pages no longer fits in
RAM, or when, say, 5000 often-updated records are spread over
50000 pages thanks to the inability to vacuum them often enough while
another vacuum is running on a big table or a backup is being made with pg_dump.

The sparse-table problem is not too bad by itself (index access is still
fast), but both vacuums and reindexes are 1) slow, as they use seqscans
internally, and 2) they pollute RAM with empty pages that could be put
to better use.


> > As the final step we need to scan all tuples in range ( MAX_CTID_1 to
> > MAX_CTID_2 ) and insert their corresponding index entries into the new
> > index. If the entry already exists for exact same ctid, that is ok.
> 
> I don't think that's as easy as it sounds; at the very least it requires
> mechanism comparable to the unique-index checking code, which we don't
> have for any index type except btree.  

I currently don't use anything besides btree, so I would be happy with
that.


> Also, in an index that's actually
> highly non-unique, that mechanism is *expensive* --- you may have to
> scan many pages of identically-keyed entries to see if any of them match
> the target ctid ... all the while holding a lock that prevents anyone
> else from inserting on the same starting page.

We could consider ctid part of the index key and place ctids in order
within btrees (and probably treat ctid as an additional dimension in
gist/rtrees). That would likely give some of the same index-scan-locality
benefits as bitmap scanning. Now that I think of it, highly
non-unique indexes should have been built like that all along, and more
unique indexes need not care, so this applies to *all* (btree) indexes.

> What's more, the pseudo uniqueness check has to be done on both sides
> --- else index build might decide the entry's not there, insert it, only
> to have the original tuple inserter come along right after and insert
> again.  

Yes, this surely needs to be avoided.

Waiting for other transactions to finish and then locking the area
between MAX_CTID_1 and MAX_CTID_2 should make sure that this never
happens.

> So this is a second major operational mode that has to affect
> everybody in the system, not only index build.  I'm not sure whether
> there are race conditions associated with getting in and out of this
> mode, but it wouldn't surprise me.

Me neither.

> > After reaching MAX_CTID_2, the index is ready for use by planner as well
> > and can be marked as "complete" in pg_class. In case of REINDEX, the new
> > index can be made to replace the old one at this point.
> 
> AFAICS, the "replace" bit requires exclusive lock to make sure that no
> one is in the midst of using the old index.  This means that you have a
> situation where you need to upgrade your table lock at the very end of
> the operation --- which means the whole thing is prone to failing at the
> very end because of deadlock.

This is definitely not something I would want.

Why not make the whole REINDEX just a "macro" doing create/drop/rename?
Or, if this is hard, then let the needy DBAs do all three steps manually.

The main thing is the ability to add indexes to big tables in a 24/7 OLTP
system.

-- 
Hannu Krosing <hannu@skype.net>


Re: Must be owner to truncate?

From
Bruce Momjian
Date:
Stephen Frost wrote:
> * Jim C. Nasby (decibel@decibel.org) wrote:
> > On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
> > > I don't really agree with the viewpoint that truncate is just a quick
> > > DELETE, and so I do not agree that DELETE permissions should be enough
> > > to let you do a TRUNCATE.
> > 
> > What about adding a truncate permission? I would find it useful, as it
> > seems would others.
> 
> That would be acceptable for me as well.  I'd prefer it just work off
> delete, but as long as I can grant truncate to someone w/o giving them
> ownership rights on the table I'd be happy.

Added to TODO:

	* Add TRUNCATE permission

	  Currently only the owner can TRUNCATE a table because triggers are not
	  called, and the table is locked in exclusive mode.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: Must be owner to truncate?

From
Andreas Seltenreich
Date:
Bruce Momjian wrote:

> Stephen Frost wrote:
>> * Jim C. Nasby (decibel@decibel.org) wrote:
>> > On Thu, Jul 07, 2005 at 01:48:59PM -0400, Tom Lane wrote:
>> > > I don't really agree with the viewpoint that truncate is just a quick
>> > > DELETE, and so I do not agree that DELETE permissions should be enough
>> > > to let you do a TRUNCATE.
>> > 
>> > What about adding a truncate permission? I would find it useful, as it
>> > seems would others.
>> 
>> That would be acceptable for me as well.  I'd prefer it just work off
>> delete, but as long as I can grant truncate to someone w/o giving them
>> ownership rights on the table I'd be happy.
>
> Added to TODO:
>
>     * Add TRUNCATE permission
>     
>       Currently only the owner can TRUNCATE a table because triggers are not
>       called, and the table is locked in exclusive mode.

Is anyone working on this yet? I looked at the code involved, and it
seems there are just a couple of lines needed, some regression test
and documentation updates, and most importantly, tab-completion
updates.

However, a question arose quickly: According to the standard, revoking
INSERT, UPDATE and DELETE after GRANT ALL PRIVILEGES would leave the
relation read-only, but with the TRUNCATE privilege lying around, this
would no longer be true for PostgreSQL. Would this open a security
hole or is it okay as far as extensions to the standard go?

regards,
Andreas


Re: Must be owner to truncate?

From
Manfred Koizar
Date:
On Wed, 24 Aug 2005 07:01:00 +0200, Andreas Seltenreich
<andreas+pg@gate450.dyndns.org> wrote:
>However, a question arose quickly: According to the standard, revoking
>INSERT, UPDATE and DELETE after GRANT ALL PRIVILEGES would leave the
>relation read-only, but with the TRUNCATE privilege lying around, this
>would no longer be true for PostgreSQL.

I'd say that the TRUNCATE privilege includes DELETE, so that REVOKE
DELETE implicitly revokes TRUNCATE and GRANT TRUNCATE implicitly
grants DELETE.

Servus
 Manfred



Re: Must be owner to truncate?

From
Stephen Frost
Date:
* Andreas Seltenreich (andreas+pg@gate450.dyndns.org) wrote:
> Bruce Momjian wrote:
> > Added to TODO:
> >
> >     * Add TRUNCATE permission
> >
> >       Currently only the owner can TRUNCATE a table because triggers are not
> >       called, and the table is locked in exclusive mode.
>
> Is anyone working on this yet? I looked at the code involved, and it
> seems there are just a couple of lines needed, some regression test
> and documentation updates, and most importantly, tab-completion
> updates.

I'm not working on it, though I agree that it really shouldn't be very
difficult to add.  I'd certainly like to see it done.  While you're in
there I'd really like to see analyze and vacuum as grantable permissions
too...

Of course, eliminating the need for them would be even better... :)

> However, a question arose quickly: According to the standard, revoking
> INSERT, UPDATE and DELETE after GRANT ALL PRIVILEGES would leave the
> relation read-only, but with the TRUNCATE privilege lying around, this
> would no longer be true for PostgreSQL. Would this open a security
> hole or is it okay as far as extensions to the standard go?

Hrm, I'm not really sure about this one.  I could see linking TRUNCATE
with DELETE (ie: you must have both DELETE and TRUNCATE permissions on a
table to TRUNCATE it, ala SELECT/UPDATE), or perhaps excluding TRUNCATE
from GRANT ALL PRIVILEGES.
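Under the first variant, grants might look something like this sketch (the
syntax is hypothetical, since no TRUNCATE privilege exists yet, and the
table and role names are made up):

```sql
-- Hypothetical: TRUNCATE as a separately grantable privilege
GRANT DELETE   ON TABLE accounts TO batch_role;
GRANT TRUNCATE ON TABLE accounts TO batch_role;
-- In the "both rights required" variant, revoking either privilege
-- would be enough to stop batch_role from truncating:
REVOKE DELETE ON TABLE accounts FROM batch_role;
```

In the second variant, GRANT ALL PRIVILEGES would simply never hand out
TRUNCATE, so a standard-conforming REVOKE of INSERT/UPDATE/DELETE would
still leave the relation read-only.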
Thanks,
    Stephen

Re: Must be owner to truncate?

From
Stephen Frost
Date:
* Manfred Koizar (mkoi-pg@aon.at) wrote:
> On Wed, 24 Aug 2005 07:01:00 +0200, Andreas Seltenreich
> <andreas+pg@gate450.dyndns.org> wrote:
> >However, a question arose quickly: According to the standard, revoking
> >INSERT, UPDATE and DELETE after GRANT ALL PRIVILEGES would leave the
> >relation read-only, but with the TRUNCATE privilege lying around, this
> >would no longer be true for PostgreSQL.
>
> I'd say that the TRUNCATE privilege includes DELETE, so that REVOKE
> DELETE implicitly revokes TRUNCATE and GRANT TRUNCATE implicitly
> grants DELETE.

I disagree with implicit granting/revoking.  Rather, if we're going
to go this route, we should require both DELETE and TRUNCATE rights on
the object in order to TRUNCATE it, but otherwise have the TRUNCATE
and DELETE privileges be distinct from each other in terms of
being granted/revoked.

This follows the SELECT/UPDATE relationship.  Granting UPDATE doesn't
implicitly grant SELECT, and revoking SELECT doesn't implicitly revoke
UPDATE; but in order to actually UPDATE you need SELECT rights.
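The SELECT/UPDATE relationship referred to here can be sketched as follows
(the table "t" and role "alice" are hypothetical):

```sql
-- UPDATE alone is grantable, but an UPDATE whose WHERE clause reads
-- the table still requires SELECT privilege as well:
GRANT UPDATE ON TABLE t TO alice;
-- As alice:
UPDATE t SET x = 1 WHERE id = 42;  -- fails: the WHERE clause needs SELECT on t
-- After additionally granting SELECT, the same UPDATE is allowed:
GRANT SELECT ON TABLE t TO alice;
```

The proposal is for TRUNCATE to relate to DELETE the same way: two
independently granted privileges, both required to perform the operation.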
Thanks,
    Stephen