Thread: allow LIMIT in UPDATE and DELETE
Hi all,

Currently the LIMIT clause is not allowed in UPDATE or DELETE statements. I wonder how easy it would be to allow it, and what people think about it? For our application it would help a lot when processing things chunk-wise to avoid long-running queries (see the sketch below for the kind of statement I mean).

The fact that the actual rows processed would be unpredictable does not make it less useful for us. We actually don't care which rows are processed, we process them all anyway, we just want to make sure it is a limited number at a time. A lot of our processes take large amounts of time (hours up to days), and we cannot allow that to run in one transaction, as the system does on-line processing too...

I guess the low-level infrastructure is already there (from what I understood from earlier postings, but I may be wrong), and the question is more whether this feature is worth including or not... and of course the syntax would have to be settled if it is.

All comments welcome...

Thanks,
Csaba.
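For illustration, the statement being asked for would look roughly like this (hypothetical syntax that PostgreSQL does not accept; the table and column names are invented, shown only to make the chunk-wise pattern concrete):

-- hypothetical, NOT valid PostgreSQL: remove at most 1000 matching rows,
-- with no guarantee about which ones
DELETE FROM request_queue WHERE processed LIMIT 1000;
-- the application would run this repeatedly in short transactions,
-- committing between chunks, until it reports 0 rows affected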
On 5/19/06, Csaba Nagy <nagy@ecircle-ag.com> wrote:
> Hi all,
>
> Currently the LIMIT clause is not allowed in UPDATE or DELETE
> statements. I wonder how easy it would be to allow it, and what people
> think about it? For our application it would help a lot when processing
> things chunk-wise to avoid long-running queries.

I asked that question a while ago:

http://archives.postgresql.org/pgsql-general/2005-04/msg00386.php

and got this response:

http://archives.postgresql.org/pgsql-general/2005-04/msg00387.php

Works quite well :)

--
Postgresql & php tutorials
http://www.designmagick.com/
On 5/19/06, Csaba Nagy <nagy@ecircle-ag.com> wrote:
> Hi all,
>
> Currently the LIMIT clause is not allowed in UPDATE or DELETE
> statements. I wonder how easy it would be to allow it, and what people
> think about it? For our application it would help a lot when processing
> things chunk-wise to avoid long-running queries.
>
> The fact that the actual rows processed would be unpredictable does not
> make it less useful for us. We actually don't care which rows are
> processed, we process them all anyway, we just want to make sure it is a
> limited number at a time. A lot of our processes take large amounts
> of time (hours up to days), and we cannot allow that to run in one
> transaction, as the system does on-line processing too...
>
> I guess the low-level infrastructure is already there (from what I
> understood from earlier postings, but I may be wrong), and the question
> is more whether this feature is worth including or not... and of course
> the syntax would have to be settled if it is.

-- sample data
CREATE TEMP TABLE tab (id serial primary key, n int, t text);
INSERT INTO tab(n) SELECT * FROM generate_series(1,1000);

-- say, you want such an update:
UPDATE tab SET t = 'aqq' WHERE n > 10;

-- but with a limit:
UPDATE tab SET t = 'aqq'
 WHERE id IN (SELECT id FROM tab WHERE n > 10 LIMIT 100);

-- or this way (join):
UPDATE tab SET t = 'aqq'
  FROM (SELECT id FROM tab WHERE n > 10 LIMIT 100) AS tab_ids
 WHERE tab.id = tab_ids.id;

...this of course assumes that you have a primary key you can use to "target" the update.

Then again, there are places where there is no primary key, like:

CREATE TABLE foo (t text);
INSERT INTO foo VALUES('aaa');
INSERT INTO foo VALUES('aaa');

...and you want to update the first 'aaa' to 'bbb'. But you can handle it this way:

CREATE TEMP SEQUENCE aaa_temp_seq;
UPDATE foo SET t='bbb' WHERE t='aaa' AND nextval('aaa_temp_seq') <= 1; -- LIMIT 1

...this of course will suck for big queries (all matching rows will be scanned, but not updated).

Regards,
Dawid
> I asked that question a while ago:
>
> http://archives.postgresql.org/pgsql-general/2005-04/msg00386.php
>
> and got this response:
>
> http://archives.postgresql.org/pgsql-general/2005-04/msg00387.php
>
> Works quite well :)

I don't doubt that it works well, in fact that's what I plan to use myself... I was not asking for a workaround, that I can figure out myself :-)

Regarding the arguments of the post you linked: "It isn't in the SQL standard" - neither is LIMIT on SELECT, and it is still a very useful feature; "it would have undefined behavior" - so does LIMIT on SELECT without ORDER BY, and it is again still useful when you don't care which entry you get, you just want one of them. It certainly provides a planning advantage in some cases where ordering would mean a sort.

The subquery delete will definitely have a more involved plan than a delete with LIMIT. On some of my tables that would make a difference, even if not that big a one due to caching effects (the same rows are deleted/updated which were already visited by the subquery). I can't say for sure how big the penalty of doing the subquery is, but the plans I have seen look like this:

db=# prepare test_001(bigint, bigint, smallint) as
db-# DELETE FROM big_table
db-# WHERE (col1, col2) IN
db-# (SELECT col1, col2 FROM big_table
db(#  WHERE col1=$2
db(#  AND col3 IS NOT NULL
db(#  AND col4 =$3
db(#  AND col5 <> 'o'
db(#  LIMIT 1000);
PREPARE
db=# explain execute test_001(1,1,1);
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2166.10..2282.99 rows=1 width=6)
   ->  HashAggregate  (cost=2166.10..2166.10 rows=29 width=16)
         ->  Subquery Scan "IN_subquery"  (cost=0.00..2165.95 rows=29 width=16)
               ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
                     ->  Index Scan using idx_big_table_col3 on big_table  (cost=0.00..2165.66 rows=29 width=16)
                           Index Cond: (col1 = $2)
                           Filter: ((col3 IS NOT NULL) AND (col4 = $3) AND ("col5" <> 'o'::bpchar))
   ->  Index Scan using pk_big_table on big_table  (cost=0.00..4.02 rows=1 width=22)
         Index Cond: ((big_table.col1 = "outer".col1) AND (big_table.col2 = "outer".col2))
(9 rows)

idx_big_table_col3 is a partial index on the rows where col3 is not null, which is ~10% of the total rows. The estimates are generic; the worst case is that a few hundred thousand rows are selected by the subselect before the limit. So I guess the worst case can have as much as double the cost of a plan for DELETE with LIMIT. With LIMIT allowed on DELETE the plan would be something like:

   ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
         ->  Index Scan using idx_big_table_col3 on big_table  (cost=0.00..2165.66 rows=29 width=16)
               Index Cond: (col1 = $2)
               Filter: ((col3 IS NOT NULL) AND (col4 = $3) AND ("col5" <> 'o'::bpchar))

That would spare a HashAggregate and an index scan. The index scan would very likely not be a problem, as the same index entries are visited as in the subquery and are likely cached, and the HashAggregate should also be fast for the max 1000 rows it has to handle, but they are still completely unnecessary for my purpose, so I still think LIMIT on DELETE and UPDATE would make perfect sense.

Cheers,
Csaba.
> -- sample data
> CREATE TEMP TABLE tab (id serial primary key, n int, t text);
> INSERT INTO tab(n) SELECT * FROM generate_series(1,1000);
>
> -- say, you want such an update:
> UPDATE tab SET t = 'aqq' WHERE n > 10;
>
> -- but with a limit:
> UPDATE tab SET t = 'aqq'
>  WHERE id IN (SELECT id FROM tab WHERE n > 10 LIMIT 100);
>
> -- or this way (join):
> UPDATE tab SET t = 'aqq'
>   FROM (SELECT id FROM tab WHERE n > 10 LIMIT 100) AS tab_ids
>  WHERE tab.id = tab_ids.id;
>
> ...this of course assumes that you have a primary key you can use
> to "target" the update.
>
> Then again, there are places where there is no primary key, like:
>
> CREATE TABLE foo (t text);
> INSERT INTO foo VALUES('aaa');
> INSERT INTO foo VALUES('aaa');
>
> ...and you want to update the first 'aaa' to 'bbb'. But you can handle it this way:
>
> CREATE TEMP SEQUENCE aaa_temp_seq;
> UPDATE foo SET t='bbb' WHERE t='aaa' AND nextval('aaa_temp_seq') <= 1; -- LIMIT 1
>
> ...this of course will suck for big queries (all matching rows will be
> scanned, but not updated).
>
> Regards,
> Dawid

Like I said in a previous post, I can figure out the workarounds, but it would be nice not to need them, not to mention that the LIMIT would work faster. I am specifically looking for opinions about the LIMIT on DELETE/UPDATE feature, not for workarounds.

Cheers,
Csaba.
> That would spare a HashAggregate and an index scan. The index scan would
> very likely not be a problem, as the same index entries are visited as in
> the subquery and are likely cached, and the HashAggregate should also be
> fast for the max 1000 rows it has to handle, but they are still
> completely unnecessary for my purpose, so I still think LIMIT on
> DELETE and UPDATE would make perfect sense.

Oh, it just occurred to me: the subquery is using a different index than the outer loop, so we can forget about caching. Considering that the outer loop uses an index 10x bigger than the subquery's (which means ~50 million entries), and that the typical situation where this would be used indeed selects mostly 1000 rows, the LIMIT on DELETE would surely mean ~50% speedup for this case.

Cheers,
Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes:
> Currently the LIMIT clause is not allowed in UPDATE or DELETE
> statements. I wonder how easy it would be to allow it, and what people
> think about it?

This has been proposed before, and rejected before, and the arguments against are just as strong as they were before. See the archives.

regards, tom lane
On Fri, 2006-05-19 at 15:51, Tom Lane wrote:
> Csaba Nagy <nagy@ecircle-ag.com> writes:
> > Currently the LIMIT clause is not allowed in UPDATE or DELETE
> > statements. I wonder how easy it would be to allow it, and what people
> > think about it?
>
> This has been proposed before, and rejected before, and the arguments
> against are just as strong as they were before. See the archives.

Tom, I guess you refer to the following:

http://archives.postgresql.org/pgsql-patches/2002-09/msg00256.php

Well, let me fight your arguments. The first one here is not the first in your mail, but the most important I guess:

"Just because MySQL is willing to implement nonstandard bad ideas doesn't mean we are. In any case the idea that this might provide some amount of compatibility is illusory: the odds are good that we'd delete or update a different tuple than they do, because of implementation differences. An application that actually depends on MySQL's behavior would surely be broken."

Well, first of all, you're not competing here with MySQL in this case, but with Oracle. Our application does this using Oracle's ROWNUM trick and it works perfectly fine. Now I guess you think Oracle's ROWNUM is also stupid in this case, but it certainly helps us write cleaner SQL, and a missing, easy-to-use postgres alternative won't help you in attracting Oracle users.

Regarding the compatibility, I do not expect that the rows deleted/updated will be the same as on Oracle or any other DB, I simply expect that only a limited number of rows will be processed at a time... the rest will be processed in the next runs, on all DBs... Is this so outrageous?

Regarding non-standard: LIMIT is non-standard in SELECTs as well, and I don't see how it would be any more non-standard in DELETE/UPDATE.

Regarding "bad ideas" and "depending on MySQL's behavior", I think you're simply overreacting here... for me, LIMIT in DELETE/UPDATE would have saved a few days of compatibility work on our application WITH ORACLE, not MySQL, and a few contrived queries.

"And how exactly do you control *which* tuple(s) get deleted or updated, if the WHERE clause selects more than the limit?"

I DO NOT CARE about which rows are deleted. The fact that it is nondeterministic can be very clearly specified in the documentation if you think it is such a bad thing, but nondeterministic is perfectly fine sometimes. There are lots of nondeterminisms in the data base world, starting with the ordering of selects if you don't use ORDER BY, so why don't we force everybody to use ORDER BY? Why don't you force ORDER BY on a SELECT with LIMIT? Why is it enough there to say in the docs that it WILL BE NON-DETERMINISTIC?

"Then use ctid."

For the problem at hand in your post it is a good solution, except that it will cause a full table scan, because I guess few people have indexes on ctid. Or you have to write your queries really contrived, duplicating most of your query conditions so that it can use some indexes. I'm not sure you'll get away without at least 2 full table scans if using ctid and no indexes, one for the subquery and one for the delete itself... not to mention the need for something like a HashAggregate on the subquery results... all this is speculation, but for sure you'll spend 10x the time optimizing the subquery that you would spend writing a simple DELETE with LIMIT.

"Have you got any evidence that there's a meaningful speedup?"

No, but from speculating about the query plans I see it would mean up to a 50% speedup for my use case.
"We have a zero-tolerance policy on yacc warnings." "This just seems like a really bad idea ..." You simply have prejudices against this feature. If you wouldn't be so against it I'm sure the problems could be solved.You're one of the most influent person on where postgres is going, and it's a pity when you're so against somethingyou don't like based on gut feelings... Cheers, Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes:
> I DO NOT CARE about which rows are deleted.

You can't possibly think that that holds true in general.

> The fact that it is nondeterministic can be very clearly specified in
> the documentation if you think it is such a bad thing, but
> nondeterministic is perfectly fine sometimes. There are lots of
> nondeterminisms in the data base world, starting with the ordering of
> selects if you don't use ORDER BY, so why don't we force everybody to
> use ORDER BY? Why don't you force ORDER BY on a SELECT with LIMIT? Why
> is it enough there to say in the docs that it WILL BE NON-DETERMINISTIC?

I can tolerate nondeterminism in SELECT because it doesn't change the data. If you get it wrong you can always do it over. UPDATE/DELETE need to have higher standards though.

regards, tom lane
On Fri, 2006-05-19 at 16:31, Tom Lane wrote:
> Csaba Nagy <nagy@ecircle-ag.com> writes:
> > I DO NOT CARE about which rows are deleted.
>
> You can't possibly think that that holds true in general.

I agree that it is not true in the general case, but then I also don't want to use DELETE with LIMIT in the general case. I only want to use it in the very specific cases where it makes sense, where it results in cleaner SQL, and where it would likely result in a better execution plan.

> I can tolerate nondeterminism in SELECT because it doesn't change the
> data. If you get it wrong you can always do it over. UPDATE/DELETE
> need to have higher standards though.

Please Tom, there are so many ways you can shoot yourself in the foot already... I don't see why this one would be a bigger foot-gun than the subquery style. It is functionally equivalent, it's only easier to write... if somebody wants to shoot himself, he can do it one way or the other. Placing a big warning in the docs should be enough... <rant>except if postgres is really targeting the MySQL users instead of the Oracle folks. Those guys already have this foot-gun readily loaded... where's the American spirit where you are allowed to carry guns and expected to act responsibly?</rant>

Cheers,
Csaba.
>----- Original Message ----
>From: Tom Lane <tgl@sss.pgh.pa.us>
>To: Csaba Nagy <nagy@ecircle-ag.com>
>Cc: Postgres general mailing list <pgsql-general@postgresql.org>
>Sent: Friday, May 19, 2006 9:31:24 AM
>Subject: Re: [GENERAL] allow LIMIT in UPDATE and DELETE
>
>You can't possibly think that that holds true in general.
>
>I can tolerate nondeterminism in SELECT because it doesn't change the
>data. If you get it wrong you can always do it over. UPDATE/DELETE
>need to have higher standards though.
>
> regards, tom lane

The usage Csaba is referring to seems to be pretty common practice in the world of Oracle. If I need to purge 5-10 million rows from a non-partitioned table on a regular basis (e.g. archiving), I'm going to use delete in conjunction with an appropriate where clause (typically something like less than some sequence number or date) and tack a "rownum < X" (where X is some fairly large constant) on the end so that the delete is done in chunks. I'll commit immediately afterwards and loop until sql%rowcount < X indicates that I'm finished (sketched below).

Now the question... why would you do that instead of doing everything in one big transaction on Oracle? I guess performance is one reason. Oracle's (at least with 8/8i) performance seems to tank very quickly on deletes as you increase the number of records you delete in a single transaction. The other (at least with my understanding of Oracle internals) is that using smaller transactions means less rollback segment space used, which reduces the likelihood of your transaction getting killed due to Oracle running out of rollback space on a database that has heavy usage.

Since Postgresql's MVCC system doesn't require rollback segments, I guess the performance question is the more important of the two. If there could be a performance gain in Postgresql with "delete from X where Y limit Z" vs "delete from X where Y in (some subselect limit Z)" I'd think Csaba's suggestion has some merit.

Regards,
Shelby Cain
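As a minimal sketch of the chunked purge loop described above (Oracle PL/SQL; the table, column, predicate and chunk size are invented for illustration, not taken from the thread):

DECLARE
  rows_deleted PLS_INTEGER;
BEGIN
  LOOP
    -- delete at most 10000 matching rows per pass ("rownum < X")
    DELETE FROM audit_archive
     WHERE created_at < SYSDATE - 90
       AND ROWNUM <= 10000;
    rows_deleted := SQL%ROWCOUNT;    -- capture before COMMIT, defensively
    COMMIT;                          -- commit each chunk to keep rollback usage small
    EXIT WHEN rows_deleted < 10000;  -- less than a full chunk: nothing left to purge
  END LOOP;
END;
/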
> Now the question... why would you do that instead of doing everything in
> one big transaction on Oracle? I guess performance is one reason.
> Oracle's (at least with 8/8i) performance seems to tank very quickly on
> deletes as you increase the number of records you delete in a single
> transaction. The other (at least with my understanding of Oracle
> internals) is that using smaller transactions means less rollback
> segment space used, which reduces the likelihood of your transaction
> getting killed due to Oracle running out of rollback space on a database
> that has heavy usage.

Running out of rollback segments is the answer in our case. It happened more than once...

> Since Postgresql's MVCC system doesn't require rollback segments, I
> guess the performance question is the more important of the two. If
> there could be a performance gain in Postgresql with "delete from X
> where Y limit Z" vs "delete from X where Y in (some subselect limit Z)"
> I'd think Csaba's suggestion has some merit.

But postgres has in turn the problem of vacuum not working when you have long-running transactions. That is a problem for heavily recycled tables like queue tables. Recently I have solved the vacuum problem by regularly CLUSTER-ing our most heavily used queue table (see the sketch below), so long-running transactions are no longer such a huge problem for us, but we still have the case of some user-triggered operations which time out on the web before finishing on the DB. For some of those it would make perfect sense to do the work partially and let the user restart the operation from where it left off... as it is, we often have 5 minutes of updates rolled back, leaving nothing useful behind, only dead rows.

And yes, not everything we do must be perfectly transactional... sometimes partial work is worth more than partial work rolled back.

Cheers,
Csaba.
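For concreteness, the periodic CLUSTER mentioned above amounts to something like the following (8.1-era syntax; the table and index names are invented):

-- one-time: cluster the queue table on its primary key
CLUSTER pk_request_queue ON request_queue;
-- afterwards a bare CLUSTER reuses the remembered index, so a periodic
-- maintenance job can simply run:
CLUSTER request_queue;

With the CLUSTER behavior discussed later in this thread (it is not fully MVCC-safe), the rewrite also discards recently-dead rows that plain VACUUM cannot yet reclaim while long-running transactions are open, which is exactly what makes it attractive for a heavily recycled queue table.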
On Fri, May 19, 2006 at 04:22:42PM +0200, Csaba Nagy wrote:
> "Then use ctid."
>
> For the problem at hand in your post it is a good solution, except
> that it will cause a full table scan, because I guess few people have
> indexes on ctid. Or you have to write your queries really contrived,
> duplicating most of your query conditions so that it can use some
> indexes. I'm not sure you'll get away without at least 2 full
> table scans if using ctid and no indexes, one for the subquery and
> one for the delete itself... not to mention the need for something
> like a HashAggregate on the subquery results... all this is
> speculation, but for sure you'll spend 10x the time optimizing
> the subquery that you would spend writing a simple DELETE with LIMIT.

Err, you don't need an index on ctid, because the ctid is the physical location of the tuple on disk. ctids are what indexes use to refer to tuples...

# explain DELETE FROM t WHERE ctid=(SELECT ctid FROM t WHERE pronargs=1 LIMIT 1);
                           QUERY PLAN
----------------------------------------------------------------
 Tid Scan on t  (cost=3.75..7.76 rows=2 width=6)
   Filter: (ctid = $0)
   InitPlan
     ->  Limit  (cost=0.00..3.75 rows=1 width=6)
           ->  Seq Scan on t  (cost=0.00..22.50 rows=6 width=6)
                 Filter: (pronargs = 1)
(6 rows)

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
> Err, you don't need an index on ctid, because the ctid is the physical
> location of the tuple on disk. ctids are what indexes use to refer to
> tuples...

OK, then how do you explain this:

db=# prepare test_001(bigint, bigint, smallint) as
db-# DELETE FROM big_table
db-# WHERE ctid IN
db-# (SELECT ctid FROM big_table
db(#  WHERE col1=$2
db(#  AND col2 IS NOT NULL
db(#  AND col3 =$3
db(#  AND col4 <> 'o'
db(#  LIMIT 1000);
PREPARE
db=# explain execute test_001(1,1,1);
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop IN Join  (cost=2165.98..24570725.13 rows=29 width=6)
   Join Filter: ("outer".ctid = "inner".ctid)
   ->  Seq Scan on big_table  (cost=0.00..1037188.04 rows=36063404 width=6)
   ->  Materialize  (cost=2165.98..2166.27 rows=29 width=6)
         ->  Subquery Scan "IN_subquery"  (cost=0.00..2165.95 rows=29 width=6)
               ->  Limit  (cost=0.00..2165.66 rows=29 width=6)
                     ->  Index Scan using idx_big_table_col2 on big_table  (cost=0.00..2165.66 rows=29 width=6)
                           Index Cond: (col1 = $2)
                           Filter: ((col2 IS NOT NULL) AND (col3 = $3) AND ("col4" <> 'o'::bpchar))
(9 rows)

Cheers,
Csaba.
Regarding Oracle's ROWNUM - since they have to use that instead of OFFSET and LIMIT, that isn't much of an argument for the Oracle way. When converting queries into Oracle SQL, I always _really_ miss OFFSET and LIMIT. They are much easier to use than ROWNUM, especially with ORDER BY. I think that more databases support OFFSET and LIMIT than ROWNUM (the Oracle way).

Personally, I have never wanted a DELETE or UPDATE with LIMIT. The one time I did something similar in Oracle, I used partitions, and just dropped or truncated the partition containing the old data.

Susan

Csaba Nagy <nagy@ecircle-ag.com> wrote on 05/19/2006 07:22 AM:

> Well, first of all, you're not competing here with MySQL in this case,
> but with Oracle. Our application does this using Oracle's ROWNUM trick
> and it works perfectly fine. Now I guess you think Oracle's ROWNUM is
> also stupid in this case, but it certainly helps us write cleaner SQL,
> and a missing, easy-to-use postgres alternative won't help you in
> attracting Oracle users.
>
> Cheers,
> Csaba.
On Fri, 2006-05-19 at 17:43, SCassidy@overlandstorage.com wrote:
> Regarding Oracle's ROWNUM - since they have to use that instead of OFFSET
> and LIMIT, that isn't much of an argument for the Oracle way. When
> converting queries into Oracle SQL, I always _really_ miss OFFSET and
> LIMIT. They are much easier to use than ROWNUM, especially with ORDER BY.

I would agree here, but for my purposes it would have been a lot easier to parameterize only the limit syntax than to completely rewrite the query for postgres.

> I think that more databases support OFFSET and LIMIT than ROWNUM (the
> Oracle way).

I actually don't care what the syntax is, as long as I can create a query with the syntactically equivalent parts in the same place and with the same number and order of parameters. Then it's fairly easy to parameterize it and have the same code handle it... otherwise I have to write special code for each database. Sometimes I have to do that anyway for other reasons, but it is a PITA to maintain, so I try to minimize the number of places where I have to do it.

> Personally, I have never wanted a DELETE or UPDATE with LIMIT. The one
> time I did something similar in Oracle, I used partitions, and just dropped
> or truncated the partition containing the old data.

Perfectly fine, I also don't use many of the features available :-) That doesn't mean it is not useful for others...

Cheers,
Csaba.
>----- Original Message ----
>From: SCassidy@overlandstorage.com
>To: Csaba Nagy <nagy@ecircle-ag.com>
>Cc: Postgres general mailing list <pgsql-general@postgresql.org>;
>pgsql-general-owner@postgresql.org
>Sent: Friday, May 19, 2006 10:43:43 AM
>Subject: Re: [GENERAL] allow LIMIT in UPDATE and DELETE
>
>Personally, I have never wanted a DELETE or UPDATE with LIMIT. The one
>time I did something similar in Oracle, I used partitions, and just dropped
>or truncated the partition containing the old data.

Yeah, that's the proper way to handle the issue, assuming that sufficient forethought was put into the design of the database. It becomes trivial to drop a partition as part of your weekly/monthly maintenance.

Unfortunately, when dealing with legacy database systems that have grown in a very organic way over a span of 5-10 years, you don't have much control over the schema and there is significant inertia to leave things as they are. As an example, the DBA group at the client I'm currently working with has their databases managed by IBM global services, and they will push back on altering a table definition to use partitioning because it costs the organization money to have those changes put into production. Having an internal developer use a script to perform such maintenance as a batch process is considered "free", so you can guess how many such processes have been created over the years.

Regards,
Shelby Cain
> >Personally, I have never wanted a DELETE or UPDATE with LIMIT. The one
> >time I did something similar in Oracle, I used partitions, and just dropped
> >or truncated the partition containing the old data.
>
> Yeah, that's the proper way to handle the issue, assuming that sufficient
> forethought was put into the design of the database. It becomes trivial
> to drop a partition as part of your weekly/monthly maintenance.

Well, sometimes it's not that easy. How would you handle a batch processing system which stores the incoming requests in a queue table in the data base, and then periodically processes a batch of them, with the additional constraint that it is allowed to process at most 1000 at a time so it won't produce a too-long-running transaction? Suppose the processing is quite costly, and the queue can have bursts of incoming requests which then have to be slowly processed... the requests are coming from the web and must be processed asynchronously, and the insert into the data base must be very fast (a sketch of this kind of chunked processing follows below).

Partitioning might be able to solve this kind of problem, if one processing chunk is one partition and it is dropped after being processed, but it needs a whole lot of setup and I'm not sure how well it would work with largely variable bursts of data... the number of partitions could grow indefinitely, and there would be a race condition when there's low traffic and we need to process an incomplete chunk (after a maximum sleep timeout, for example) while there are still some incoming requests we don't want to lose.

So what I'm talking about is not maintenance, but on-line operation...

Cheers,
Csaba.
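To make the queue scenario above concrete, here is a minimal sketch of one bounded processing pass using the IN-subquery workaround discussed earlier in the thread, since DELETE/UPDATE ... LIMIT is not available. The table, columns, status values and function name are invented, and a single worker is assumed; it only illustrates how each pass is bounded to 1000 rows:

-- Sketch only: bound one processing pass to at most 1000 queued requests.
CREATE OR REPLACE FUNCTION process_request_chunk() RETURNS integer AS $$
DECLARE
    n integer;
BEGIN
    -- claim a bounded chunk via the primary-key subquery workaround
    UPDATE request_queue
       SET status = 'processing'
     WHERE id IN (SELECT id
                    FROM request_queue
                   WHERE status = 'new'
                   LIMIT 1000);
    GET DIAGNOSTICS n = ROW_COUNT;

    -- ... the expensive per-request work would happen here ...

    -- remove the rows that were just processed
    DELETE FROM request_queue WHERE status = 'processing';

    RETURN n;   -- 0 means the queue is currently drained
END;
$$ LANGUAGE plpgsql;

-- The application would call this repeatedly in short transactions:
--   SELECT process_request_chunk();
-- and stop (or sleep) when it returns 0.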
Martijn van Oosterhout <kleptog@svana.org> writes:
> Err, you don't need an index on ctid, because the ctid is the physical
> location of the tuple on disk. ctids are what indexes use to refer to
> tuples...

> # explain DELETE FROM t WHERE ctid=(SELECT ctid FROM t WHERE pronargs=1 LIMIT 1);

This doesn't currently work for more than one tuple, though: the natural locution would be "WHERE ctid IN (SELECT returning more than one tid)" but the planner/executor don't cope with doing that efficiently. Might be worth trying to fix that.

regards, tom lane
----- Original Message ----
>From: Csaba Nagy <nagy@ecircle-ag.com>
>To: Shelby Cain <alyandon@yahoo.com>
>Cc: SCassidy@overlandstorage.com; Postgres general mailing list
><pgsql-general@postgresql.org>; pgsql-general-owner@postgresql.org
>Sent: Friday, May 19, 2006 11:46:42 AM
>Subject: Re: [GENERAL] allow LIMIT in UPDATE and DELETE
>
>Well, sometimes it's not that easy. How would you handle a batch
>processing system which stores the incoming requests in a queue table in
>the data base, and then periodically processes a batch of them, with the
>additional constraint that it is allowed to process at most 1000 at a
>time so it won't produce a too-long-running transaction? Suppose the
>processing is quite costly, and the queue can have bursts of incoming
>requests which then have to be slowly processed... the requests are
>coming from the web and must be processed asynchronously, and the insert
>into the data base must be very fast.

I can't imagine a case where a properly tuned Postgresql installation with appropriate hardware backing it couldn't handle that particular kind of workload pattern. However, I usually work with Oracle, so tables used as queues don't have the same performance issues you'd run into with Postgresql.

Regardless, this type of queue problem can also be tackled by having your data layer persist the input from the web in memory (which maintains a low perceived response time for the client) and post to the table as fast as the database allows.

>
>So what I'm talking about is not maintenance, but on-line operation...

Different problems will always require different solutions. In the case you present I don't really think partitioning is the answer.

Regards,
Shelby Cain
On Fri, May 19, 2006 at 05:19:37PM +0200, Csaba Nagy wrote:
> > Since Postgresql's MVCC system doesn't require rollback segments, I
> > guess the performance question is the more important of the two. If
> > there could be a performance gain in Postgresql with "delete from X
> > where Y limit Z" vs "delete from X where Y in (some subselect limit Z)"
> > I'd think Csaba's suggestion has some merit.
>
> But postgres has in turn the problem of vacuum not working when you
> have long-running transactions. That is a problem for heavily recycled
> tables like queue tables.

I'd rather see time invested in coming up with a way to separate long-running and short-running transactions from an MVCC standpoint than rely on the ability to shorten transaction lengths. But there is another advantage to doing deletes or even updates in chunks: it means fewer dead rows in the table being modified, since autovacuum can go to work any time after the first long-running update/delete finishes.

> Recently I have solved the vacuum problem by regularly CLUSTER-ing our
> most heavily used queue table, so long-running transactions are no
> longer such a huge problem for us, but we still have the case of some
> user-triggered operations which time out on the web before finishing on
> the DB. For some of those it would make perfect sense to do the work
> partially and let the user restart the operation from where it left
> off... as it is, we often have 5 minutes of updates rolled back, leaving
> nothing useful behind, only dead rows.

BTW, there's a bug/issue with CLUSTER that makes it not entirely transaction-safe.

> And yes, not everything we do must be perfectly transactional...
> sometimes partial work is worth more than partial work rolled back.

Yes, something that some people in the community have a hard time accepting. A great example is storing session data for websites. It's convenient to do it in the database, but it can cause big performance problems with MVCC. But it's session data; if you have to wipe the table after a database crash, who cares?

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Fri, May 19, 2006 at 10:25:19AM -0700, Shelby Cain wrote:
> >Well, sometimes it's not that easy. How would you handle a batch
> >processing system which stores the incoming requests in a queue table in
> >the data base, and then periodically processes a batch of them, with the
> >additional constraint that it is allowed to process at most 1000 at a
> >time so it won't produce a too-long-running transaction?
>
> I can't imagine a case where a properly tuned Postgresql installation
> with appropriate hardware backing it couldn't handle that particular
> kind of workload pattern. However, I usually work with Oracle, so
> tables used as queues don't have the same performance issues you'd run
> into with Postgresql.

Just try and do (what should stay) a small queue table in the same database as long-running reporting transactions. As long as a long-running report is going you might as well suspend all vacuuming on that queue table, because it won't do you any good; the report transaction means that vacuum can't remove anything.

I've seen a case where a queue table should always fit into a single database page; 2 at most. But because some transactions will run for a minute or two, that table is normally about 40 pages, almost entirely dead space. Of course the same problem affects all the indexes on that table as well.

I can't imagine how bad this would be if the database actually had hour-long reports that had to run... and luckily the system is quiet at night when pg_dump runs.

> Regardless, this type of queue problem can also be tackled by having
> your data layer persist the input from the web in memory (which
> maintains a low perceived response time for the client) and post to the
> table as fast as the database allows.

Uh, and just what happens when your web front-end crashes then??

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
----- Original Message ----
>From: Jim C. Nasby <jnasby@pervasive.com>
>To: Shelby Cain <alyandon@yahoo.com>
>Cc: Csaba Nagy <nagy@ecircle-ag.com>; SCassidy@overlandstorage.com;
>Postgres general mailing list <pgsql-general@postgresql.org>;
>pgsql-general-owner@postgresql.org
>Sent: Monday, May 22, 2006 5:04:51 PM
>Subject: Re: [GENERAL] allow LIMIT in UPDATE and DELETE
>
>Just try and do (what should stay) a small queue table in the same
>database as long-running reporting transactions. As long as a
>long-running report is going you might as well suspend all vacuuming on
>that queue table, because it won't do you any good; the report
>transaction means that vacuum can't remove anything.
>
>I've seen a case where a queue table should always fit into a single
>database page; 2 at most. But because some transactions will run for a
>minute or two, that table is normally about 40 pages, almost entirely
>dead space. Of course the same problem affects all the indexes on that
>table as well.
>
>I can't imagine how bad this would be if the database actually had
>hour-long reports that had to run... and luckily the system is quiet at
>night when pg_dump runs.

My experience with job queues comes from clients that mostly use Oracle as the backend. However, even with Oracle a queue table should be storing information about a job and not have records unnecessarily locked simply because they are being "worked on" by another hypothetical "job runner" process... by this I mean that the status of a job should be updated to a distinct state at any given moment in time (e.g. unprocessed, processing, processed). In the case I present above, if you are using Postgresql you wouldn't have any open long-running transactions on that table and vacuuming should work... or am I misunderstanding the issue?

>> Regardless, this type of queue problem can also be tackled by having your data
>> layer persist the input from the web in memory (which maintains a low
>> perceived response time for the client) and post to the table as fast as the
>> database allows.
>
>Uh, and just what happens when your web front-end crashes then??

Sorry, I probably should have said middle tier instead of data layer. Obviously, no single design choice will cover 100% of circumstances.

If your middle tier is located on the web server and you have a physical crash, then obviously you are going to miss the jobs that were being held in memory waiting to be posted to the database, and you'll have to resubmit them (if possible) on restart. Having sane restart requirements/procedures thought out beforehand also goes a long way towards making recovery from an unexpected failure easier.

If your middle tier lives on another machine then you'd miss (at most) the one job that the web application was in the process of handing off to your middle tier. If your application is such that a missed job submission is a highly critical problem then you'd want a redundant solution for your middle tier to help avoid the problem, or perhaps you might want to rethink posting asynchronously from your web UI and buffering the submissions.

Regards,
Shelby Cain
On Mon, 2006-05-22 at 23:55, Jim C. Nasby wrote:
> BTW, there's a bug/issue with CLUSTER that makes it not entirely
> transaction-safe.

For God's sake, don't fix that one! I rely on it... Now seriously, until there's a way to tell the DB that an exclusive lock is enough to vacuum all dead rows, even if they would still be visible to other transactions, CLUSTER is the only thing I can use to circumvent the long-running transaction syndrome.

Cheers,
Csaba.
On Tue, 2006-05-23 at 00:04, Jim C. Nasby wrote:
> I can't imagine how bad this would be if the database actually had
> hour-long reports that had to run... and luckily the system is quiet at
> night when pg_dump runs.

BTW, pg_dump is the only thing that still gets in the way of the CLUSTER strategy against long-running transactions, because it locks (in shared mode, but that's just enough) all dumped tables at the beginning. It would be nice to have an "exclude-table" option on it. I actually started working on a patch to allow that; I will make it just good enough for my purpose (very poor C skills here). Would that be interesting for others?

> > Regardless, this type of queue problem can also be tackled by having
> > your data layer persist the input from the web in memory (which
> > maintains a low perceived response time for the client) and post to
> > the table as fast as the database allows.
>
> Uh, and just what happens when your web front-end crashes then??

Well, actually we do have some processes where we use a buffer in the web layer which collects ~1000 entries (or 5 minutes' worth, whichever comes first) in memory and then flushes them to the DB. But the processing still happens in the DB, as we have a sizable cluster of web machines and it is still more efficient to handle the processing centrally than on each machine. So we still insert a batch of raw entries and bulk process them on the DB using stored procedures. If the web machine crashes, we lose some entries. But in those cases we have bigger problems too...

We do also have a job table where the whole cluster puts tasks, and whichever machine has free resources takes some to process. We first used JMS for this purpose, but that was just not flexible enough to accommodate our constraints and rules of load distribution across the cluster. This is the really bitchy queue table.

Cheers,
Csaba.
On Tue, May 23, 2006 at 10:30:35AM +0200, Csaba Nagy wrote:
> On Tue, 2006-05-23 at 00:04, Jim C. Nasby wrote:
> > I can't imagine how bad this would be if the database actually had
> > hour-long reports that had to run... and luckily the system is quiet at
> > night when pg_dump runs.
>
> BTW, pg_dump is the only thing that still gets in the way of the CLUSTER
> strategy against long-running transactions, because it locks (in shared
> mode, but that's just enough) all dumped tables at the beginning. It
> would be nice to have an "exclude-table" option on it. I actually
> started working on a patch to allow that; I will make it just good
> enough for my purpose (very poor C skills here). Would that be
> interesting for others?

Well, being able to have finer control over what you're dumping is on the TODO list, and I think there was even consensus reached on -hackers as to how the syntax should work.

But that's only a partial fix, because generally you'd want a complete dump of your database anyway. What would be better is if pg_dump could release locks as it no longer needs them, namely as it dumps each table. Though this might require pg_dump remembering some state information about each object, since certain things, such as RI, are dumped after all the COPY commands.

This also doesn't address the issue of long-running transactions preventing dead rows from being vacuumed.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Tue, May 23, 2006 at 10:19:25AM +0200, Csaba Nagy wrote:
> On Mon, 2006-05-22 at 23:55, Jim C. Nasby wrote:
> > BTW, there's a bug/issue with CLUSTER that makes it not entirely
> > transaction-safe.
>
> For God's sake, don't fix that one! I rely on it... Now seriously,
> until there's a way to tell the DB that an exclusive lock is enough to
> vacuum all dead rows, even if they would still be visible to other
> transactions, CLUSTER is the only thing I can use to circumvent the
> long-running transaction syndrome.

I believe it's already fixed in HEAD/8.2.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Jim C. Nasby wrote:
> On Tue, May 23, 2006 at 10:19:25AM +0200, Csaba Nagy wrote:
> > On Mon, 2006-05-22 at 23:55, Jim C. Nasby wrote:
> > > BTW, there's a bug/issue with CLUSTER that makes it not entirely
> > > transaction-safe.
> >
> > For God's sake, don't fix that one! I rely on it... now seriously,
> > until there's a way to tell the DB that an exclusive lock is enough to
> > vacuum all dead rows, even if they would still be visible to other
> > transactions, CLUSTER is the only thing I can use to circumvent the
> > long-running transaction syndrome.
>
> I believe it's already fixed in HEAD/8.2.

If it is, TODO doesn't reflect that:

* Make CLUSTER preserve recently-dead tuples per MVCC requirements

--
Bruce Momjian   http://candle.pha.pa.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Mon, May 22, 2006 at 03:49:01PM -0700, Shelby Cain wrote:
> My experience with job queues comes from clients that mostly use Oracle
> as the backend. However, even with Oracle a queue table should be
> storing information about a job and not have records unnecessarily
> locked simply because they are being "worked on" by another hypothetical
> "job runner" process... by this I mean that the status of a job should
> be updated to a distinct state at any given moment in time (e.g.
> unprocessed, processing, processed). In the case I present above, if you
> are using Postgresql you wouldn't have any open long-running
> transactions on that table and vacuuming should work... or am I
> misunderstanding the issue?

The issue is that vacuum has to base its decisions not on the oldest running transaction that has locks on a table, but on the oldest running transaction in the entire database, because that transaction could start reading any table at any time. Until that changes, long-running transactions of any kind and heavy-update tables simply won't mix well at all in a single database.

I recently proposed a way around this [1], but it didn't get much traction.

[1] http://archives.postgresql.org/pgsql-general/2006-05/msg00184.php

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
----- Original Message ----
>From: Jim C. Nasby <jnasby@pervasive.com>
>To: Shelby Cain <alyandon@yahoo.com>
>Cc: Csaba Nagy <nagy@ecircle-ag.com>; SCassidy@overlandstorage.com;
>Postgres general mailing list <pgsql-general@postgresql.org>;
>pgsql-general-owner@postgresql.org
>Sent: Tuesday, May 23, 2006 5:19:58 PM
>Subject: Re: [GENERAL] allow LIMIT in UPDATE and DELETE
>
>The issue is that vacuum has to base its decisions not on the oldest
>running transaction that has locks on a table, but on the oldest running
>transaction in the entire database, because that transaction could start
>reading any table at any time. Until that changes, long-running
>transactions of any kind and heavy-update tables simply won't mix well
>at all in a single database.

Now I understand... thanks for the clarification.

Regards,
Shelby Cain
> > It would be nice to have an "exclude-table" option on it. I actually
> > started working on a patch to allow that; I will make it just good
> > enough for my purpose (very poor C skills here). Would that be
> > interesting for others?
>
> Well, being able to have finer control over what you're dumping is on
> the TODO list, and I think there was even consensus reached on -hackers
> as to how the syntax should work.

I actually managed to do that: I have a pg_dump which accepts "-e table_name" multiple times, so I can exclude tables from the dump. It's nice to have the source at hand :-) For the tables I do exclude it works fine for me; it might have problems in the generic case, e.g. when you have large object references in the excluded table, the referenced large objects will still be transferred, and I don't know what happens if there are other tables depending on the excluded ones... neither applies in my case.

> But that's only a partial fix, because generally you'd want a complete
> dump of your database anyway. What would be better is if pg_dump could
> release locks as it no longer needs them, namely as it dumps each table.
> Though this might require pg_dump remembering some state information
> about each object, since certain things, such as RI, are dumped after
> all the COPY commands.

For my purpose that wouldn't help much; the queue tables are useless in the dump anyway (their jobs might have already been executed, with external side effects such as emails, which can't be rolled back and shouldn't be executed again either). So the queue tables are out of context by the time I load the dump; I can dump/reload them separately if needed, and I have to filter their content anyway.

> This also doesn't address the issue of long-running transactions
> preventing dead rows from being vacuumed.

But it works fine with CLUSTER, at least as currently implemented.

Cheers,
Csaba.
On Wed, 2006-05-24 at 00:15, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Tue, May 23, 2006 at 10:19:25AM +0200, Csaba Nagy wrote:
> > > For God's sake, don't fix that one! I rely on it... now seriously,
> > > until there's a way to tell the DB that an exclusive lock is enough to
> > > vacuum all dead rows, even if they would still be visible to other
> > > transactions, CLUSTER is the only thing I can use to circumvent the
> > > long-running transaction syndrome.
> >
> > I believe it's already fixed in HEAD/8.2.
>
> If it is, TODO doesn't reflect that:
>
> * Make CLUSTER preserve recently-dead tuples per MVCC requirements

Can I ask that some kind of switch be left in (configuration, syntax, I don't care) so that the old behavior is still available if needed?

Thanks,
Csaba.