Thread: Survey results on Oracle/M$NT4 to PG72/RH72 migration
Hi all,

Here are the results of our survey on a migration from Oracle 8.0 / W$ NT4 SP5 to PostgreSQL 7.2 / Red Hat 7.2. You'll probably remember a thread I initiated on this list a couple of weeks ago; this is the same survey for the same customer. Now the survey is finished.

So, we migrated all the Oracle-specific syntax successfully, including CONNECT BY statements (thanks to all the hackers at the OpenACS project (visit http://www.openacs.org) for the good code!). We migrated the Oracle Pro*C successfully thanks to the fantastic ECPG (thanks Michael).

The overall performance of PostgreSQL is 33% slower than the Oracle/NT solution. One must say we faced a well-tuned Oracle, tuned for best performance. Even the SQL queries were very well tuned, using Oracle optimizer hints for example (e.g. /*+ USE_HASH(foobar) */). Since our customer accepted up to 50%, this is technically a success for us on this point.

BUT, we faced a real problem. Some batches, in ECPG, make intensive use of CURSOR loops in the Pro*C structures. In Oracle, CURSORs can be PREPAREd. Thus, it seems Oracle computes the query plan for the cursor only once, even if the cursor is closed and re-opened. Maybe some kind of stored query plan / caching / whatever makes it possible.

This does not seem to be the case with ECPG. At each COMMIT the cursors are closed (they don't even need to close cursors in Oracle!), and at each BEGIN TRANSACTION PostgreSQL seems to compute the parsing and query plan again.

So this finally makes the batch work take 300% of the time Oracle needs. We clearly see our ECPG programs wait for PostgreSQL in the functions where CURSORs are opened. So we know the problem is not in ECPG but in the PG backend.

This is unacceptable for our customer. Many batches are launched during the night and have to be completed in 5h (between 0h and 5h).
With a ratio of 3, it is not worth thinking about the migration anymore :-(

We know we could get much better performance with something other than ECPG, for example using C or Tcl stored procedures, placing the SQL work much closer to the PG backend, using SPI, etc... But this is not possible. We have to do it under ECPG: there are tons of Pro*C code to migrate, and we must do it the same way. With ECPG/Pro*C compiled programs we can stop executions, renice programs, etc., which we would lose by putting the work in stored procedures.

So, I'd really like some of you to validate this point about cursors. We have a strange feeling, blended of pride for a ratio of only 1.33 against the giant Oracle, and a feeling of something unfinished, only because of a feature not yet implemented...

I have read the current TODO list many times. I think our problem is somewhere between the CURSOR items and the CACHE items in the TODO.

We would really appreciate some of you validating this behaviour of CURSORs; that would confirm we didn't spend 40 man-days for nothing, that we reached a good explanation of the problem, and that we have not dug just next to the treasure.

Thanks a lot anyway for such a good database.

PS: Bad English, I know... :-) I hope the customer will agree to putting the survey into GNU/LinuxDoc so all the PG folks can read it. It is still under an NDA.

--
Jean-Paul ARGUDO
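[For illustration, the problematic pattern looks roughly like this. This is a minimal ECPG sketch; the real Pro*C code is under NDA, so the table, column, and variable names here are all made up.]

```c
/* Hypothetical ECPG sketch of the batch's cursor loop.  Because
 * PostgreSQL 7.2 closes cursors at COMMIT, the cursor must be
 * re-OPENed in every transaction, and each OPEN re-parses and
 * re-plans the query -- unlike Oracle, where a PREPAREd cursor
 * keeps its plan across close/re-open. */
EXEC SQL INCLUDE sqlca;

void process_receipt(long receipt_id)
{
    EXEC SQL BEGIN DECLARE SECTION;
    long   id = receipt_id;     /* the "bind variable" of the cursor  */
    long   item;
    double amount;
    EXEC SQL END DECLARE SECTION;

    EXEC SQL DECLARE items CURSOR FOR
        SELECT item_id, amount FROM receipt_item WHERE receipt_id = :id;

    EXEC SQL OPEN items;        /* parsed + planned again every time  */
    for (;;) {
        EXEC SQL FETCH items INTO :item, :amount;
        if (sqlca.sqlcode != 0) /* 100 = no more rows                 */
            break;
        /* ... update the per-sector totals here ... */
    }
    EXEC SQL CLOSE items;
    EXEC SQL COMMIT;            /* commit per receipt: closes cursors */
}
```

(In ECPG the DECLARE itself is only a declaration; the query is actually sent at OPEN, which is where the repeated parse/plan cost shows up.)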
> This does not seem to be the case with ECPG. At each COMMIT the
> cursors are closed (they don't even need to close cursors in
> Oracle!). And at each BEGIN TRANSACTION PostgreSQL seems to compute
> the parsing and query plan again.

I am still convinced that there is room for interpretation of the standard here, Michael. Since we have "begin work", all cursors that were opened outside a tx block (before "begin work") should imho in no way be affected by a commit. (E.g. Informix does it like that.)

Someone who wants more conformant behavior would need to use the mode of operation where you are always in a tx anyway, thus losing the above feature :-)

Unfortunately I think the backend currently lacks the necessary support for this, since commit does the cleanup work for the cursor? Such a cursor would need an explicit close, or an open on the prepared statement, to be cleaned up.

Andreas
> So this finally makes the batch work take 300% of the time Oracle
> needs. We clearly see our ECPG programs wait for PostgreSQL in the
> functions where CURSORs are opened. So we know the problem is not in
> ECPG but in the PG backend.
>
> This is unacceptable for our customer. Many batches are launched
> during the night and have to be completed in 5h (between 0h and 5h).
> With a ratio of 3, it is not worth thinking about the migration
> anymore :-(

So why exactly can you not simply do the whole batch in one transaction?

Unless you need to run concurrent vacuums, or are low on disk space, or need to concurrently update the affected rows (and thus fear deadlocks or locking out interactive clients that update), there is no need to do frequent commits in PostgreSQL for batch work.

Andreas

PS: I know that coming from other DBs one fears "snapshot too old", filling rollback segments, or other "deficiencies" like "long transaction aborted" :-)
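[For illustration, the one-transaction variant of the batch might look like the ECPG sketch below, with made-up names again. Since no COMMIT intervenes, the cursor is never force-closed, so it is declared, parsed, and planned only once for the whole run.]

```c
/* Hypothetical sketch: one transaction around the whole batch.
 * The cursor survives from one receipt to the next and is planned
 * once; a single COMMIT ends the batch. */
EXEC SQL BEGIN DECLARE SECTION;
long   rid;
long   item;
double amount;
EXEC SQL END DECLARE SECTION;

EXEC SQL BEGIN TRANSACTION;
EXEC SQL DECLARE all_items CURSOR FOR
    SELECT receipt_id, item_id, amount
      FROM receipt_item
     ORDER BY receipt_id;

EXEC SQL OPEN all_items;        /* parsed + planned only once     */
for (;;) {
    EXEC SQL FETCH all_items INTO :rid, :item, :amount;
    if (sqlca.sqlcode != 0)
        break;
    /* ... update per-sector totals ... */
}
EXEC SQL CLOSE all_items;
EXEC SQL COMMIT;                /* single commit at the very end  */
```

The trade-off, as noted above, is losing intermediate commits: a crash mid-batch loses all the work, and concurrent vacuums cannot reclaim tuples the long transaction can still see.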
On Thursday, Mar 14, 2002 at 09:32:19AM +0500, Hannu Krosing wrote:
> On Thu, 2002-03-14 at 02:30, Zeugswetter Andreas SB SD wrote:
> > > So this finally makes the batch work take 300% of the time Oracle
> > > needs. We clearly see our ECPG programs wait for PostgreSQL in the
> > > functions where CURSORs are opened. So we know the problem is not
> > > in ECPG but in the PG backend.
> > >
> > > This is unacceptable for our customer. Many batches are launched
> > > during the night and have to be completed in 5h (between 0h and
> > > 5h). With a ratio of 3, it is not worth thinking about the
> > > migration anymore :-(
> >
> > So why exactly can you not simply do the whole batch in one
> > transaction?

Ah! Sorry, I forgot to say this work has a double use. In fact, it can be used as a batch, and it is also used as a kind of "daemon".

This daemon wakes up every 5 seconds. It scans (SELECT...) for new inserts in a table (like a trigger). When new tuples are found, it launches the work. The work consists of computing the total sales of a big store... Each receipt has many items. The batch computes total sales for each section/sector of the store.

The "daemon" mode permits having total sales in "real time"... The batch mode is there to compute the final total sales at the end of the day. It can also be used to recompute previous days (up to 5).

So, putting "the whole batch in one transaction" is not possible, because of the daemon mode. A commit per receipt also ensures that previous database work is not lost if, for example, the daemon goes down.

> > Unless you need to run concurrent vacuums,

I also forgot to say that the x3 ratio is based only on the batch mode. The daemon mode is as fast as Oracle (wow!).

I forgot to say too that in batch mode we launch a concurrent vacuum analyze on the 2 tables constantly accessed (updates/inserts only: updating total sales by sector / sub-sector / sub-sub-sector, etc.; the total sales have a tree structure).
The vacuum analyze on those 2 tables runs with a sleep of 10 s, in a while [ 1 ] loop in a .sh.

> I ran some tests based on their earlier description and concurrent
> vacuums (the new, non-locking ones) are a must, best run every few
> seconds, as without them the ratio of dead/live tuples will be huge
> and that will bog down the whole process.

Yes, concurrent vacuums are really *GREAT*; without them, the batch work goes slower and slower over time. Concurrent vacuum allows constant performance.

> I also suspect (from reading their description) that the main problem
> of parsing/optimising each and every similar query will remain even
> if they do run in one transaction.

Exactly.

To answer a question in this thread: the batch has really basic SQL statements! The CURSORs are really simple too, based on 1 to 2 "bind variables" that unfortunately are not processed the same way as in Oracle. :-(

Thanks for your support, much appreciated :-))

--
Jean-Paul ARGUDO
On Thu, 2002-03-14 at 11:20, Jean-Paul ARGUDO wrote:
> > Unless you need to run concurrent vacuums,
>
> I also forgot to say that the x3 ratio is based only on the batch
> mode. The daemon mode is as fast as Oracle (wow!).
>
> I forgot to say too that in batch mode we launch a concurrent vacuum
> analyze on the 2 tables constantly accessed (updates/inserts only:
> updating total sales by sector / sub-sector / sub-sub-sector, etc.;
> the total sales have a tree structure).
>
> The vacuum analyze on those 2 tables runs with a sleep of 10 s, in a
> while [ 1 ] loop in a .sh.

If the general distribution of values in these tables does not change drastically, you can save some time by running just VACUUM, not VACUUM ANALYZE. VACUUM does all the old-tuple-removal work; VACUUM ANALYZE does that and also gathers statistics, which makes it slower.

> > I ran some tests based on their earlier description and concurrent
> > vacuums (the new, non-locking ones) are a must, best run every few
> > seconds, as without them the ratio of dead/live tuples will be huge
> > and that will bog down the whole process.
>
> Yes, concurrent vacuums are really *GREAT*; without them, the batch
> work goes slower and slower over time. Concurrent vacuum allows
> constant performance.
>
> > I also suspect (from reading their description) that the main
> > problem of parsing/optimising each and every similar query will
> > remain even if they do run in one transaction.
>
> Exactly.
>
> To answer a question in this thread: the batch has really basic SQL
> statements! The CURSORs are really simple too, based on 1 to 2 "bind
> variables" that unfortunately are not processed the same way as in
> Oracle. :-(

Can you give me a small made-up example and then tell me what performance you get on Oracle/NT and what on PostgreSQL/Linux?
I'd like to try to move the cursor into a backend procedure and see 1) whether the gain is big enough to warrant further work, and 2) whether it can be done automatically, either by preprocessing ECPG or just by changing it.

--------------
Hannu
On Wed, 2002-03-13 at 21:18, Jean-Paul ARGUDO wrote:
> Hi all,
>
> Here are the results of our survey on a migration from Oracle 8.0 /
> W$ NT4 SP5 to PostgreSQL 7.2 / Red Hat 7.2.
>
> You'll probably remember a thread I initiated on this list a couple
> of weeks ago; this is the same survey for the same customer. Now the
> survey is finished.
>
> So, we migrated all the Oracle-specific syntax successfully,
> including CONNECT BY statements (thanks to all the hackers at the
> OpenACS project (visit http://www.openacs.org) for the good code!).

Could you elaborate here? I know they do some of it using triggers and bitmap indexes; do you mean this?

> We migrated the Oracle Pro*C successfully thanks to the fantastic
> ECPG (thanks Michael).
>
> The overall performance of PostgreSQL is 33% slower than the
> Oracle/NT solution. One must say we faced a well-tuned Oracle, tuned
> for best performance. Even the SQL queries were very well tuned,
> using Oracle optimizer hints for example.
>
> Since our customer accepted up to 50%, this is technically a success
> for us on this point.
>
> BUT, we faced a real problem. Some batches, in ECPG, make intensive
> use of CURSOR loops. In Oracle, CURSORs can be PREPAREd. Thus, it
> seems Oracle computes the query plan for the cursor only once, even
> if the cursor is closed and re-opened. Maybe some kind of stored
> query plan / caching / whatever makes it possible.

What kind of work do you do in these cursors? Is it inserts, updates, deletes, complicated selects...?

> This does not seem to be the case with ECPG. At each COMMIT the
> cursors are closed (they don't even need to close cursors in
> Oracle!). And at each BEGIN TRANSACTION PostgreSQL seems to compute
> the parsing and query plan again.
>
> So this finally makes the batch work take 300% of the time Oracle
> needs. We clearly see our ECPG programs wait for PostgreSQL in the
> functions where CURSORs are opened. So we know the problem is not in
> ECPG but in the PG backend.
Could you make a small sample test case with minimal schema/data that demonstrates this behaviour, so I can try to optimise it?

> This is unacceptable for our customer. Many batches are launched
> during the night and have to be completed in 5h (between 0h and 5h).
> With a ratio of 3, it is not worth thinking about the migration
> anymore :-(
>
> We know we could get much better performance with something other
> than ECPG, for example using C or Tcl stored procedures, placing the
> SQL work much closer to the PG backend, using SPI, etc...

Did you do any tests? How much faster did it get?

> But this is not possible. We have to do it under ECPG: there are tons
> of Pro*C code to migrate, and we must do it the same way. With
> ECPG/Pro*C compiled programs we can stop executions, renice programs,
> etc., which we would lose by putting the work in stored procedures.

AFAIK some SQL/C-type precompilers and other frontend tools for other databases do generate stored procedures for PREPAREd CURSORs. I'm afraid ECPG does not :( But making ECPG do it might be one way to fix this until real prepared queries become available to the frontend.

> So, I'd really like some of you to validate this point about cursors.
> We have a strange feeling, blended of pride for a ratio of only 1.33
> against the giant Oracle, and a feeling of something unfinished, only
> because of a feature not yet implemented...
>
> I have read the current TODO list many times. I think our problem is
> somewhere between the CURSOR items and the CACHE items in the TODO.
>
> We would really appreciate some of you validating this behaviour of
> CURSORs; that would confirm we didn't spend 40 man-days for nothing,
> that we reached a good explanation of the problem, and that we have
> not dug just next to the treasure.

The treasure is currently locked up in the backend behind the FE/BE protocol.

---------------------
Hannu
On Thu, 2002-03-14 at 02:30, Zeugswetter Andreas SB SD wrote:
> > So this finally makes the batch work take 300% of the time Oracle
> > needs. We clearly see our ECPG programs wait for PostgreSQL in the
> > functions where CURSORs are opened. So we know the problem is not
> > in ECPG but in the PG backend.
>
> > This is unacceptable for our customer. Many batches are launched
> > during the night and have to be completed in 5h (between 0h and
> > 5h). With a ratio of 3, it is not worth thinking about the
> > migration anymore :-(
>
> So why exactly can you not simply do the whole batch in one
> transaction?
>
> Unless you need to run concurrent vacuums,

I ran some tests based on their earlier description, and concurrent vacuums (the new, non-locking ones) are a must, best run every few seconds; without them the ratio of dead to live tuples will be huge, and that will bog down the whole process.

> or are low on disk space, or need to concurrently update the affected
> rows (and thus fear deadlocks or locking out interactive clients that
> update), there is no need to do frequent commits in PostgreSQL for
> batch work.

I also suspect (from reading their description) that the main problem of parsing/optimising each and every similar query will remain even if they do run in one transaction.

In my tests of simple updates I got a 3/2 speed increase (from 1050 to 1500 updates/sec) by using prepared statements inside a stored procedure.

--------------------
Hannu
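[A sketch of what such a test might look like. Everything here is hypothetical: the function, table, and variable names are made up, and PL/pgSQL must have been installed in the database with createlang. The gain presumably comes from PL/pgSQL saving the plan of each statement after its first execution in a session, so repeated calls skip the parse/plan step.]

```c
/* The function would be created once, e.g. from psql:
 *
 *   CREATE FUNCTION add_sale(integer, numeric) RETURNS integer AS '
 *   BEGIN
 *       UPDATE sector_total SET amount = amount + $2
 *        WHERE sector_id = $1;
 *       RETURN 1;
 *   END;' LANGUAGE 'plpgsql';
 *
 * Then the batch calls it instead of issuing a fresh UPDATE, so the
 * UPDATE's plan is prepared once and reused for the whole session: */
EXEC SQL BEGIN DECLARE SECTION;
int    sector;
double amount;
int    dummy;
EXEC SQL END DECLARE SECTION;

/* inside the batch loop, one cheap call per row: */
EXEC SQL SELECT add_sale(:sector, :amount) INTO :dummy;
```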
On Thu, 14 Mar 2002, Jean-Paul ARGUDO wrote:
> This daemon wakes up every 5 seconds. It scans (SELECT...) for new
> inserts in a table (like a trigger). When new tuples are found, it
> launches the work. The work consists of computing the total sales of
> a big store...

You might find it worthwhile to investigate "listen" and "notify" -- combined with a rule or trigger, you can get this effect in near-real-time. You'll probably still want a sleep(5) at the end of the loop so you can batch a reasonable number of updates if there's a lot going on.

Matthew.
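[A rough libpq sketch of this suggestion. The rule, table, and notification names are made up, and error handling is mostly omitted; it is an illustration, not the daemon's actual code.]

```c
/* Hypothetical sketch: a daemon driven by LISTEN/NOTIFY instead of
 * blind polling.  A rule fires NOTIFY on every insert; the daemon
 * only does its work when a notification has arrived. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=sales");
    PGnotify *note;

    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* one-time setup: NOTIFY on every insert into the receipts table */
    PQclear(PQexec(conn,
        "CREATE RULE receipt_notify AS ON INSERT TO receipt "
        "DO NOTIFY new_receipt"));
    PQclear(PQexec(conn, "LISTEN new_receipt"));

    for (;;) {
        PQconsumeInput(conn);
        while ((note = PQnotifies(conn)) != NULL) {
            /* ... compute the new per-sector totals here ... */
            free(note);
        }
        sleep(5);   /* still batch updates, as suggested above */
    }
}
```

(A production daemon would block on select() over PQsocket(conn) rather than sleeping, but the sleep(5) matches the batching suggestion above.)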
On Thu, Mar 14, 2002 at 09:08:55AM +0500, Hannu Krosing wrote:
> AFAIK some SQL/C-type precompilers and other frontend tools for other
> databases do generate stored procedures for PREPAREd CURSORs.

You mean ECPG should/could replace a PREPARE statement with a CREATE FUNCTION, and then the usage of the cursor with the usage of that function?

Should be possible, but needs some work.

> I'm afraid ECPG does not :(

That's correct of course.

Michael
--
Michael Meskes
Michael@Fam-Meskes.De
Go SF 49ers! Go Rhein Fire!
Use Debian GNU/Linux! Use PostgreSQL!
> > AFAIK some SQL/C-type precompilers and other frontend tools for
> > other databases do generate stored procedures for PREPAREd CURSORs.
>
> You mean ECPG should/could replace a PREPARE statement with a CREATE
> FUNCTION, and then the usage of the cursor with the usage of that
> function?
>
> Should be possible, but needs some work.

Wow Michael, this would be much, much appreciated. :-)

> > I'm afraid ECPG does not :(
>
> That's correct of course.
> Michael

Thanks. Now we know our conclusions in the survey are right. We hope functionality for prepared cursors, bind variables, etc. will come to PG soon :-)

We are actually thinking about ways to patch PostgreSQL and contribute back that way, adding a feature we need for our business.

Thanks.

--
Jean-Paul ARGUDO
Database consultant, IDEALX S.A.S
15-17, av. de Ségur, F-75007 PARIS
http://www.idealx.com
On Fri, Mar 15, 2002 at 10:25:09AM +0100, Jean-Paul ARGUDO wrote:
> > You mean ECPG should/could replace a PREPARE statement with a CREATE
> > FUNCTION, and then the usage of the cursor with the usage of that
> > function?
> >
> > Should be possible, but needs some work.
>
> Wow Michael, this would be much, much appreciated. :-)

The problem is that I have no idea when I will find time to work on such an addition. It certainly won't be possible before May or so. Sorry.

Michael
--
Michael Meskes
Michael@Fam-Meskes.De
Go SF 49ers! Go Rhein Fire!
Use Debian GNU/Linux! Use PostgreSQL!