Thread: transaction control in pl/pgsql
Dear list,

I have some data (big size) and I've written a long function in PL/pgSQL which processes the data in several steps. At a test run my function aborted because of memory exhaustion. My guess is that everything that happens during the function's transaction is kept in memory - until it's full... So, my idea for solving that problem would be to cut the big function into smaller functions. But I don't want to write 30 function calls at the end - I would rather have one function which calls all these small functions, so I would only have to write one SQL query at the end. What I fear is that, if this function calls the other functions, everything is only one transaction again and I get memory overflow once more.

I've read the documentation regarding this on http://www.postgresql.org/docs/8.4/interactive/plpgsql-structure.html :

"It is important not to confuse the use of BEGIN/END for grouping statements in PL/pgSQL with the similarly-named SQL commands for transaction control. PL/pgSQL's BEGIN/END are only for grouping; they do not start or end a transaction. Functions and trigger procedures are always executed within a transaction established by an outer query — they cannot start or commit that transaction, since there would be no context for them to execute in."

Somewhere else I've read: "PostgreSQL does not have nested transactions."

I'm still not sure if I got it right, or if there are other possibilities to solve my problem. Any suggestions would be appreciated!

Thanks and regards,

Birgit.
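For illustration, a minimal sketch of the wrapper Birgit describes - the step_01() .. step_30() names are hypothetical placeholders, and, as the replies below explain, everything still runs inside one single transaction:

    -- hypothetical wrapper: one function calling all the small step
    -- functions in order, so only one SQL query is needed at the end.
    -- NOTE: all steps still execute inside the caller's transaction.
    CREATE OR REPLACE FUNCTION run_all_steps() RETURNS void
    LANGUAGE plpgsql AS $$
    BEGIN
        PERFORM step_01();   -- each step_NN() stands in for one of
        PERFORM step_02();   -- the 30 smaller processing functions
        -- ...
        PERFORM step_30();
    END;
    $$;

    -- the single query at the end:
    SELECT run_all_steps();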
a) you can't explicitly control transactions in plpgsql. If you need some form of transaction control, use savepoints (see the sketch below).
b) you are trying to outsmart the database software, and this is just a biiig mistake - you should stop doing that completely.
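As a minimal, runnable sketch of what savepoints look like in plain SQL (the temp table exists only for the example):

    BEGIN;
    CREATE TEMP TABLE demo (msg text);        -- throwaway table for the example
    INSERT INTO demo VALUES ('step 1 done');
    SAVEPOINT before_step2;
    INSERT INTO demo VALUES ('step 2 attempt');
    -- on an error in step 2 you can undo just that step and keep the
    -- rest of the transaction alive:
    ROLLBACK TO SAVEPOINT before_step2;
    COMMIT;                                   -- 'step 1 done' is kept, step 2 is not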
2010/4/1 Birgit Laggner <birgit.laggner@vti.bund.de>:
> [...] So, my idea for solving that problem would be to cut the
> big function into smaller functions. [...] What I fear is that, if this
> function calls the other functions, everything is only one transaction
> again and I get memory overflow once more.

PL/pgSQL gives you implicit subtransactions - every block with a protected (EXCEPTION) section is evaluated inside a subtransaction. But I don't think a subtransaction will help in your case. You are doing something that is memory-expensive - for example a larger SRF function in plpgsql, a badly used hash aggregation, or maybe deferred triggers. A subtransaction doesn't help with that. Try to use temp tables instead.

Regards
Pavel Stehule
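A minimal sketch of the implicit subtransaction Pavel mentions (function and table names are hypothetical): a block with an EXCEPTION clause runs in its own subtransaction, so a failure inside it can be caught without aborting the outer transaction.

    CREATE OR REPLACE FUNCTION process_chunk(from_id int, to_id int)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        BEGIN   -- this inner block has an EXCEPTION clause, so it
                -- executes inside an implicit subtransaction
            UPDATE my_table SET processed = true
             WHERE id BETWEEN from_id AND to_id;   -- 'my_table' is hypothetical
        EXCEPTION WHEN others THEN
            RAISE NOTICE 'chunk %-% rolled back: %', from_id, to_id, SQLERRM;
        END;
    END;
    $$;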
Hi Grzegorz,

sorry, but that doesn't help me; perhaps you could be a little clearer:

@a) Does the use of SAVEPOINT avoid memory overflow? I could not find an explanation about memory use in the documentation of SAVEPOINT.

@b) Do you mean I should not process my data, or that I should not use plpgsql to do it? In what way am I trying to outsmart the software???

Thanks, Birgit.

On 01.04.2010 12:35, Grzegorz Jaśkiewicz wrote:
> a) you can't explicitly control transactions in plpgsql. If you need
> some form of transaction control, use savepoints.
> b) you are trying to outsmart the database software, and this is just a
> biiig mistake - you should stop doing that completely.
2010/4/1 Birgit Laggner <birgit.laggner@vti.bund.de>
> Hi Grzegorz,
> sorry, but that doesn't help me; perhaps you could be a little clearer:
> @a) Does the use of SAVEPOINT avoid memory overflow? I could not find an
> explanation about memory use in the documentation of SAVEPOINT.
Transactions don't really use a lot of memory, but you want to keep them short anyway, because of possible locking, etc.
> @b) Do you mean I should not process my data, or that I should not use plpgsql
> to do it? In what way am I trying to outsmart the software???
You are trying to save some memory that the database is going to allocate. You can control that much better by writing things the way others do, and by tweaking your config.

What I was trying to say in a) is that you can control transactions in a way - by using savepoints.

You started to talk about saving memory that the database might allocate for transactions. That sounds like 'I am trying to be smarter about things than my DB'. Don't do that. It is good to know how things work, and to use that knowledge wisely, but don't do the database's job for it - it was designed to take care of memory, transactions, etc.
--
GJ
On 1 Apr 2010, at 12:22, Birgit Laggner wrote:

> I have some data (big size) and I've written a long function in pl/pgsql
> which processes the data in several steps. At a test run my function
> aborted because of memory exhaustion. My guess is that everything that
> happens during the function's transaction is kept in memory - until
> it's full...

No, transactions operate on the database just like anything else. Running out of memory doing that is not impossible, but you'd need to create some really big transactions to reach those limits (I think I've been quoted a limit of 2^32 instructions per transaction a looong time ago, just to give you an indication).

You're probably running out of memory for another reason. Now to figure out why...

I've seen an issue with deferred constraints causing this, for example. Do you use deferred constraints?

Another cause that pops up regularly is that people specify too much global work_mem for postgres. work_mem gets allocated per connection, so the more connections you have, the more memory goes to work_mem and the less other resources may receive, or you exceed what's available. What are your relevant postgres settings (max connections, work_mem, shared mem, etc.), and how much physical, total and shared memory does your server have?

Another possibility is that your function stores large amounts of data in variables that are not backed by database tables. That means all that data will be stored in memory, and even if it goes to swap at some point (not sure it will; it would seriously hurt performance) there is a limit. If this is the case, maybe you could use temporary tables to process that data instead of trying to do it all in memory.

> So, my idea for solving that problem would be to cut the big function
> into smaller functions. But I don't want to write 30 function calls at
> the end [...]

Splitting up big functions into smaller functions is always a good idea; that's part of general programming practice. It won't cut down the size of your transaction, though.

> What I fear is that, if this function calls the other functions,
> everything is only one transaction again and I get memory overflow once
> more.

If the problem persists, maybe you could post your function somewhere. As it's apparently a rather long function, can you strip it down to something that still causes it to run out of memory but that will be a bit easier for the people on this list to wade through?

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
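A sketch of the temporary-table approach Alban suggests - the table and column names here are invented for illustration:

    -- keep interim results in a temp table instead of plpgsql variables;
    -- the temp table is dropped automatically at the end of the session
    CREATE TEMP TABLE interim AS
        SELECT id, some_expensive_expression AS result   -- names are hypothetical
          FROM source_data;
    ANALYZE interim;   -- give the planner statistics for the next step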
Hi Alban,

thanks for your detailed answer!

My database settings:
max connections: 20
work_mem: 100MB
shared buffers: 12000MB

Server memory:
physical 32GB
total memory 50GB (incl. swap)
shared memory ??

I am not sure if I use deferred constraints - the only constraints I use in the function are NOT NULL constraints. But I don't believe my memory overflow had anything to do with them, because the error occurred during a loop. Nothing should suddenly change regarding any NOT NULL constraints between one loop cycle and the next.

My function was the only connection to the database as far as I know - so I don't think allocation of memory should be a reason...

I would say: yes, my function seems to store large amounts of data in memory. But in my function I tried to store as much as possible of the interim results in real tables (not temp tables) instead of storing them in variables. My guess is that postgres doesn't write the tables out and therefore keeps everything in memory. (By the way: the swap was used up, too.)

It's really difficult to post only a part of the function, just because every next step is based on the result of the previous step. I also guess that every step on its own wouldn't cause memory overflow, but that the memory use adds up with every step. But I will try to cut the function into little snippets and run them one by one - perhaps the memory overflow still occurs for one snippet...

If you have any ideas...

Thanks again and regards,

Birgit.

On 01.04.2010 13:27, Alban Hertroys wrote:
> [...]
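For reference, the settings Birgit lists can be checked from any session; a minimal sketch:

    SHOW max_connections;
    SHOW work_mem;
    SHOW shared_buffers;
    -- or all at once:
    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('max_connections', 'work_mem', 'shared_buffers');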
On 1 Apr 2010, at 14:38, Birgit Laggner wrote:

> My database settings:
> max connections: 20
> work_mem: 100MB
> shared buffers: 12000MB
>
> Server memory:
> physical 32GB
> total memory 50GB (incl. swap)
> shared memory ??

Ok, so the maximum work_mem in use at any given time is 20 * 100MB ~ 2GB. Considering you have 12GB of shared buffers and 50GB available in total, that's not your problem.

Btw, the amount of shared memory is something controlled from the kernel (you appear to be on a Linux system). If less than your 12GB of shared buffers had been configured for kernel shared memory, then Postgres wouldn't start up at all, so there's probably no need to look at its value.

> I am not sure if I use deferred constraints - the only constraints I use
> in the function are NOT NULL constraints. [...]

Deferred constraints are constraints that aren't checked until the end of the transaction. You can imagine that keeping track of the things to check on commit can build up. But this doesn't apply to NOT NULL constraints, as the documentation says:

  DEFERRABLE
  NOT DEFERRABLE
  This controls whether the constraint can be deferred. A constraint that is not deferrable will be checked immediately after every command. Checking of constraints that are deferrable may be postponed until the end of the transaction (using the SET CONSTRAINTS command). NOT DEFERRABLE is the default. Only foreign key constraints currently accept this clause. All other constraint types are not deferrable.

> My function was the only connection to the database as far as I know -
> so, I don't think allocation of memory should be a reason...

Probably not, no.

> I would say: Yes, my function seems to store large amounts of data in
> memory. But in my function, I tried to store as much as possible of the
> interim results in real tables (not temp tables) instead of storing them
> in variables.

Good practice, I think. Do any of those variables ever contain a large amount of data? I realised that since you have one single function, many of your variables stay allocated until the final END statement. If that's what's causing you to run out of memory then it would help to split the function up into smaller ones; it would reduce the scope of such variables.

> But my guess is that postgres doesn't write the tables and
> therefore keeps everything in memory. (by the way: the swap was used up,
> too)

No, it certainly doesn't do that; if you're operating on tables (like you say you do) then it writes such things to its WAL. Maybe there's something else trying to keep lots of data in memory? Can you tell what is using all that memory?

What happens to the results of your function? Is that a large result set, and is some piece of software on the same machine buffering all of that in memory, to display it for example? I recall psql can suffer from that problem, but it has a switch to turn that off (the name eludes me). Web-site scripts also have a tendency to handle data that way.

A few wild guesses: did someone for example configure the WAL to be on a memory-disk? Is some other process churning up memory while you're running your function?

> It's really difficult to post only a part of the function, just because
> every next step is based on the result of the previous step. [...]

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.
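For illustration, this is what a deferrable constraint of the kind Alban describes looks like - the tables are made up for the example:

    CREATE TABLE parent (id int PRIMARY KEY);
    CREATE TABLE child (
        parent_id int REFERENCES parent(id)
            DEFERRABLE INITIALLY DEFERRED   -- checked at COMMIT; the pending
                                            -- checks are queued up until then
    );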
On Thu, Apr 1, 2010 at 6:22 AM, Birgit Laggner <birgit.laggner@vti.bund.de> wrote:
> [...] So, my idea for solving that problem would be to cut the big
> function into smaller functions. [...] What I fear is that, if this
> function calls the other functions, everything is only one transaction
> again and I get memory overflow once more.

I don't know all the specifics of your case, but ultimately there are limits to what you can reasonably do in a single transaction, especially if you are writing to the database. If you push the limit, the database starts to push back. Transactions generally should be as short as possible. Long transactions inhibit the database's ability to do certain types of maintenance on itself and have other issues, like bad performance and memory exhaustion.

Regardless of how many separate functions/savepoints/begin/end blocks your 'outer' function calls, your entire set of work is going to operate within the context of a single transaction. This is an iron-clad rule for which (at present) there is no workaround. For this reason, certain classes of data processing must unhappily be done on the client side, introducing another language and forcing all the data back and forth through the protocol.

In the future, it may be possible to execute pl/pgsql-ish code in the backend that allows explicit transaction control. This feature might be a 'stored procedure', or there might be some other nomenclature to distinguish functions that manage their own transaction state.

merlin
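A sketch of the client-side approach Merlin alludes to: a plain SQL script (run with psql -f, for example) that commits between the steps, so each step function runs in its own transaction. The step_NN() names are hypothetical placeholders.

    -- run_steps.sql: each step commits before the next one starts
    BEGIN;
    SELECT step_01();
    COMMIT;

    BEGIN;
    SELECT step_02();
    COMMIT;
    -- ... and so on for the remaining steps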
Hi Merlin, hi Alban,

thank you both for your helpful answers. Now I have split the function into smaller parts which have to be called separately, one after another. Probably I will write a script for calling all the functions needed. Not as nice as an all-in-one function, but if there is no other way...

Executing my function snippets, I came to a possible explanation of the memory overflow: the function runs some loop cycles, and for every cycle I let the function write a notice with the current loop cycle number. I use pgAdmin for writing and executing most of my postgres stuff. So, I guess, writing all these loop cycle notices into the pgAdmin window led to the exorbitant memory usage. Now I let the function write a notice only every 100 loop cycles. So far, I haven't had any problems with memory overflow anymore...

Regards,

Birgit.

On 01.04.2010 22:33, Merlin Moncure wrote:
> [...]
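The throttled notice Birgit describes could look like this inside the loop - a minimal sketch, with the function and its per-cycle work standing in for her real code:

    CREATE OR REPLACE FUNCTION noisy_loop(n int) RETURNS void
    LANGUAGE plpgsql AS $$
    BEGIN
        FOR i IN 1..n LOOP
            -- ... per-cycle work would go here ...
            IF i % 100 = 0 THEN   -- notice only every 100th cycle
                RAISE NOTICE 'loop cycle %', i;
            END IF;
        END LOOP;
    END;
    $$;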