Thread: PsqlODBC slow on UNION queries
Hi, we are using PowerBuilder for development and I experimented a little with PostgreSQL-8.1 through ODBC and I experienced the following on psqlodbc-8.01.0106. We have some large views, consisting of 6+ UNION members. Recently, I rewrote them to use UNION ALL, since the members give distinct rows across the whole view. The performance difference between the two is enormous, e.g. using UNION, the rows starts flowing only after 30-32 seconds but when using UNION ALL, it starts instantly. Doing the same in psql like below runs under 4 seconds using UNION and under about 12 seconds using UNION ALL, and this includes getting all the rows from the server. time echo "select * from view_x" | psql db >result.txt What may cause such a large difference?
Zoltan Boszormenyi <zboszor@dunaweb.hu> writes: > We have some large views, consisting of 6+ UNION members. > Recently, I rewrote them to use UNION ALL, since the members > give distinct rows across the whole view. > The performance difference between the two is enormous, e.g. using UNION, > the rows starts flowing only after 30-32 seconds but when using UNION ALL, > it starts instantly. Well, UNION requires fetching all the source rows and then doing a duplicate-elimination step (sort+uniq basically). Why does the performance change surprise you? regards, tom lane
Tom Lane írta: >Zoltan Boszormenyi <zboszor@dunaweb.hu> writes: > > >>We have some large views, consisting of 6+ UNION members. >>Recently, I rewrote them to use UNION ALL, since the members >>give distinct rows across the whole view. >> >> > > > >>The performance difference between the two is enormous, e.g. using UNION, >>the rows starts flowing only after 30-32 seconds but when using UNION ALL, >>it starts instantly. >> >> > >Well, UNION requires fetching all the source rows and then doing a >duplicate-elimination step (sort+uniq basically). Why does the >performance change surprise you? > > regards, tom lane > > Well, as I said, doing the same test from PSQL, the performance difference is much less. I was really asking about the difference between psql and PsqlODBC: psql(output to file) psqlodbc UNION total time: < 12 sec first row received after 30 seconds UNION ALL total time < 4 sec first row received instantly My question is, why does PsqlODBC does the same unique seach again, when the server already did it? Only this would explain the difference between the psql and PsqlODBC timings. Best regards, Zoltán Böszörményi
> Well, as I said, doing the same test from PSQL, the performance > difference is much less. I was really asking about the difference > between psql and PsqlODBC: > > psql(output to file) psqlodbc > UNION total time: < 12 sec first row received after > 30 seconds > UNION ALL total time < 4 sec first row received instantly I suppose you have used Declare/Fetch turned on. Please try the same query in psql client. The exact query you can find in mylog output. It could be something like DECLARE CURSOR <CURSOR_NAME> FOR SELECT <your select>; FETCH FORWARD <number of rows to fetch at first time> FROM <CURSOR_NAME>; If you have used Server side prepare then query is already different. It use: PREPARE <plan_name>[(paremters types)] AS <query>; EXECUTE <plan_name> [(parameters)]; Regards, Luf
Hi, Ludek Finstrle írta: >>Well, as I said, doing the same test from PSQL, the performance >>difference is much less. I was really asking about the difference >>between psql and PsqlODBC: >> >> psql(output to file) psqlodbc >>UNION total time: < 12 sec first row received after >>30 seconds >>UNION ALL total time < 4 sec first row received instantly >> >> > >I suppose you have used Declare/Fetch turned on. Please try the same >query in psql client. The exact query you can find in mylog output. >It could be something like >DECLARE CURSOR <CURSOR_NAME> FOR SELECT <your select>; >FETCH FORWARD <number of rows to fetch at first time> FROM <CURSOR_NAME>; > >If you have used Server side prepare then query is already different. >It use: >PREPARE <plan_name>[(paremters types)] AS <query>; >EXECUTE <plan_name> [(parameters)]; > >Regards, > >Luf > > At the moment I am at home, being sick, so I cannot tell which queries PowerBuilder use. But here are the above queries and their timings from psql, this is on my home machine, so the timings are different from what I quoted. 84693 rows were produced by SELECT *, so that's the number I used in FETCH FORWARD. [zozo@host-81-17-177-202 psql]$ cat q1.txt select * from v_invoice_browse; [zozo@host-81-17-177-202 psql]$ cat q2.txt begin; declare mycur1 cursor for select * from v_invoice_browse; FETCH FORWARD 84693 FROM mycur1; commit; [zozo@host-81-17-177-202 psql]$ cat q3.txt prepare plan1 as select * from v_invoice_browse; execute plan1; DECLARE CURSOR complained about being outside of a transaction block, so I put it between BEGIN and COMMIT. UNION ALL: [zozo@host-81-17-177-202 psql]$ time psql -f q1.txt -U bolt bolt2 >r1.txt real 0m5.663s user 0m2.440s sys 0m0.347s [zozo@host-81-17-177-202 psql]$ time psql -f q2.txt -U bolt bolt2 >r2.txt real 0m5.870s user 0m2.480s sys 0m0.354s [zozo@host-81-17-177-202 psql]$ time psql -f q3.txt -U bolt bolt2 >r3.txt real 0m5.981s user 0m2.449s sys 0m0.353s UNION: [zozo@host-81-17-177-202 psql]$ time psql -f q1.txt -U bolt bolt2 >r1.txt real 0m10.237s user 0m2.453s sys 0m0.214s [zozo@host-81-17-177-202 psql]$ time psql -f q2.txt -U bolt bolt2 >r2.txt real 0m10.535s user 0m2.480s sys 0m0.241s [zozo@host-81-17-177-202 psql]$ time psql -f q3.txt -U bolt bolt2 >r3.txt real 0m10.488s user 0m2.466s sys 0m0.245s The timings between UNION and UNION ALL comparing the same query is about what I expected. But there isn't too much difference between the different queries on the same view, so the PowerBuilder behaviour still isn't clear to me. BTW, the quoted timings in the previous post from psql and UNION vs UNION ALL were taken on a 3GHz HT P4, running RedHat 9. The client is Windows2000 with PsqlODBC 08.01.0106. The above timings were on an Athlon 3200+, running FC3/x86-64. Both servers are running PostgreSQL 8.1.1. And the timing differences happen differently if I attach the view in an Access database in Office2003 using PsqlODBC 8.01.0106. This is on WinXP running in VMWare on the same machine as the PostgreSQL server. E.g. the view opens in Access in about 7 seconds when I use UNION ALL, but it requires about 17 seconds if I use UNION. This was with both Declare/Fetch and Server side prepare being OFF. Also the same with only Server side prepare being ON. If I set only Declare/Fetch or both to ON, opening the VIEW in Access comes close to what I experinced in psql, e.g. it appears almost instantly when using UNION ALL, and just under 7 seconds when using UNION. I don't remember which ODBC settings I used for the Windows2000 client and PowerBuilder, I am not near that machine. I will recheck it when I get back to my workplace next monday. Best regards, Zoltán Böszörményi
> >>Well, as I said, doing the same test from PSQL, the performance > >>difference is much less. I was really asking about the difference > >>between psql and PsqlODBC: > >> > >> psql(output to file) psqlodbc > >>UNION total time: < 12 sec first row received after > >>30 seconds > >>UNION ALL total time < 4 sec first row received instantly > > > >I suppose you have used Declare/Fetch turned on. Please try the same > >query in psql client. The exact query you can find in mylog output. > >It could be something like > >DECLARE CURSOR <CURSOR_NAME> FOR SELECT <your select>; > >FETCH FORWARD <number of rows to fetch at first time> FROM <CURSOR_NAME>; > > > >If you have used Server side prepare then query is already different. > >It use: > >PREPARE <plan_name>[(paremters types)] AS <query>; > >EXECUTE <plan_name> [(parameters)]; > > At the moment I am at home, being sick, so I cannot tell which queries > PowerBuilder use. You don't understand me. When you turn on psqlODBC option Use Declare/Fetch all SELECT statements are internally (by psqlODBC) changed to DECLARE/FETCH commands. It's the same with Server side prepare. So it doesn't matter what does PowerBuilder use. > But here are the above queries and their timings from psql, > this is on my home machine, so the timings are different from > what I quoted. 84693 rows were produced by SELECT *, > so that's the number I used in FETCH FORWARD. psqlODBC driver use only Cache size (default is 100) in FETCH FORWARD. But you can change it to get better performance. > And the timing differences happen differently if I attach the view > in an Access database in Office2003 using PsqlODBC 8.01.0106. > This is on WinXP running in VMWare on the same machine as the PostgreSQL > server. E.g. the view opens in Access in about 7 seconds when I use > UNION ALL, > but it requires about 17 seconds if I use UNION. This was with both > Declare/Fetch and Server side prepare being OFF. Also the same with > only Server side prepare being ON. > > If I set only Declare/Fetch or both to ON, opening the VIEW in Access > comes close to what I experinced in psql, e.g. it appears almost instantly > when using UNION ALL, and just under 7 seconds when using UNION. > > I don't remember which ODBC settings I used for the Windows2000 client > and PowerBuilder, I am not near that machine. I will recheck it when > I get back to my workplace next monday. There is another thing. psqlODBC driver simulate SQLColAttributes (I wrote it on the fly so it could be different ODBC API) throught call the query and then get the columns information before query is already open. So it could be the time bottleneck. I try describe it better: SQLPrepare, SQLColAttributes (this call the statement for getting columns information), SQLExecute (this call the statement for result). If you want trace it down the good start point is turn mylog output on and read the result in C:\mylog_XXXX.log (or txt extension?). Regards, Luf
> > If I set only Declare/Fetch or both to ON, opening the VIEW in Access > > comes close to what I experinced in psql, e.g. it appears almost instantly > > when using UNION ALL, and just under 7 seconds when using UNION. > > > > I don't remember which ODBC settings I used for the Windows2000 client > > and PowerBuilder, I am not near that machine. I will recheck it when > > I get back to my workplace next monday. > > There is another thing. psqlODBC driver simulate SQLColAttributes (I wrote > it on the fly so it could be different ODBC API) throught call > the query and then get the columns information before query is already > open. So it could be the time bottleneck. > I try describe it better: > SQLPrepare, SQLColAttributes (this call the statement for getting columns > information), SQLExecute (this call the statement for result). > > If you want trace it down the good start point is turn mylog output on > and read the result in C:\mylog_XXXX.log (or txt extension?). I forgot. Feel free to send the mylog output here to list. We can help you to decode it ;-) Regards, Luf
Ludek Finstrle írta: >>>If I set only Declare/Fetch or both to ON, opening the VIEW in Access >>>comes close to what I experinced in psql, e.g. it appears almost instantly >>>when using UNION ALL, and just under 7 seconds when using UNION. >>> >>>I don't remember which ODBC settings I used for the Windows2000 client >>>and PowerBuilder, I am not near that machine. I will recheck it when >>>I get back to my workplace next monday. >>> >>> >>There is another thing. psqlODBC driver simulate SQLColAttributes (I wrote >>it on the fly so it could be different ODBC API) throught call >>the query and then get the columns information before query is already >>open. So it could be the time bottleneck. >>I try describe it better: >>SQLPrepare, SQLColAttributes (this call the statement for getting columns >>information), SQLExecute (this call the statement for result). >> >>If you want trace it down the good start point is turn mylog output on >>and read the result in C:\mylog_XXXX.log (or txt extension?). >> >> > >I forgot. Feel free to send the mylog output here to list. We can >help you to decode it ;-) > > I will, when I am back to my workplace. In the meantime, I have set MyLog on in WinXP under VMWare here, but Access produced some 120+ MB log and it wasn't even finished reading the records... I didn't feel like I have to send it to the list. ;-) Anyway, it turned out it used DECLARE CURSOR and it fetched 100 records at a time. Best regards, Zoltán Böszörményi
> >I forgot. Feel free to send the mylog output here to list. We can > >help you to decode it ;-) > > I will, when I am back to my workplace. > > In the meantime, I have set MyLog on in WinXP under VMWare here, > but Access produced some 120+ MB log and it wasn't even finished > reading the records... I didn't feel like I have to send it to the list. ;-) > Anyway, it turned out it used DECLARE CURSOR and it fetched > 100 records at a time. Ufff. What tons of records do you get? BTW could you compile psqlODBC from source code? It is possible to disable mylog output in function CC_mapping (it's per record mylog for problems with encoding). I think you don't have such a problem so there is no problem to comment out mylog(...) calls from CC_mapping. This results in much smaller mylog output. Regards, Luf
Ludek Finstrle írta: >>>I forgot. Feel free to send the mylog output here to list. We can >>>help you to decode it ;-) >>> >>> >>I will, when I am back to my workplace. >> >>In the meantime, I have set MyLog on in WinXP under VMWare here, >>but Access produced some 120+ MB log and it wasn't even finished >>reading the records... I didn't feel like I have to send it to the list. ;-) >>Anyway, it turned out it used DECLARE CURSOR and it fetched >>100 records at a time. >> >> > >Ufff. What tons of records do you get? BTW could you compile psqlODBC > > About 86000 records and I found every fields in every records logged, too. >from source code? It is possible to disable mylog output in function > > I can, but not on Windows. I can take the postgresql-odbc src.rpm from RedHat Rawhide and replace the source TGZ with the current CVS snapshot. In fact, I have done it several times, to test my Linux apps against the "official" ODBC driver instead of the ones that come with unixODBC. But that doesn't help me under Windows. :-) I don't have VC++. >CC_mapping (it's per record mylog for problems with encoding). I think >you don't have such a problem so there is no problem to comment out >mylog(...) calls from CC_mapping. This results in much smaller mylog >output. > >Regards, > >Luf > > > Best regards, Zoltán Böszörményi
> I can, but not on Windows. I can take the postgresql-odbc src.rpm ... > But that doesn't help me under Windows. :-) I don't have VC++. Ok. I'll send you changed psqlodbc[aw].dll tomorrow at first. I'm going to come at home late today. Regards, Luf
Wed, Jan 11, 2006 at 10:25:33AM +0100, Ludek Finstrle napsal(a): > > I can, but not on Windows. I can take the postgresql-odbc src.rpm > ... > > But that doesn't help me under Windows. :-) I don't have VC++. > > Ok. I'll send you changed psqlodbc[aw].dll tomorrow at first. > I'm going to come at home late today. Oh, I'm sorry. I'm cleaning my mailbox and I see I forgot to you :-( Please feel free to notify me next time. I'll send it to you privately in few minutes. BTW there was similar problem reported from some man. He use BC++ and the solution was turn on Parse Statement with recent psqlODBC driver. Regards, Luf
> finally here's the results. I hope you can find something useful in it. Finally I see the problem in mylog output. Unfortunatelly I don't know how to change the behaviour :-( without more investigation. There is problem that you (I mean PowerBuilder) calling SQLExecDirect select * from view (which fire the execution) SQLDescribeCol (for each column in result) SQLCancel (I don't know why the statement is canceled it seems to me that PB miss some information) ... SQLTables (get additional informations) ... SQLColumns ... SQLPrimaryKeys ... SQLStatistics ... SQLExecDirect select * from view (which fire the execution again) I have a lot of questions it will be the best if you could try create some repro app. Some of the questions: 1) could you try similar view without text (char, varchar, ...) columns? 2) maybe minimalization of view to one column could be the best test 3) could you try enable trace ODBC (in ODBC manager) - independent on datasource and create two logs - againist psqlodbc and another DB - maybe we could find the difference there 4) could you say me what are the query for (it's fired bettwen two calls of select * from view): - select pbd_fhgt, pbd_fwgt, pbd_fitl, pbd_funl, pbd_fchr, pbd_fptc, pbd_ffce, pbh_fhgt, pbh_fwgt, pbh_fitl, pbh_funl, pbh_fchr, pbh_fptc, pbh_ffce, pbl_fhgt, pbl_fwgt, pbl_fitl, pbl_funl, pbl_fchr, pbl_fptc, pbl_ffce, pbt_cmnt from informix.pbcattbl where pbt_tnam = 'v_invoice_browse' and pbt_ownr = 'informix' - select pbv_name, pbv_vald, pbv_type, pbv_cntr, pbv_msg from informix.pbcatvld - select pbf_name, pbf_frmt, pbf_type, pbf_cntr from informix.pbcatfmt - select pbe_name, pbe_edit, pbe_type, pbe_cntr, pbe_work, pbe_seqn, pbe_flag from informix.pbcatedt order by pbe_name, pbe_seqn - select pbc_tnam, pbc_ownr, pbc_cnam, pbc_labl, pbc_lpos, pbc_hdr, pbc_hpos, pbc_jtfy, pbc_mask, pbc_case, pbc_hght, pbc_wdth, pbc_ptrn, pbc_bmap, pbc_init, pbc_cmnt, pbc_edit from informix.pbcatcol where (pbc_tnam = 'v_invoice_browse' and pbc_ownr = 'informix') 5) How long does it take only (without fetch): BEGIN; DECLARE CURSOR csr_test AS select * from v_invoice_browse; COMMIT; 6) How long does it take: BEGIN; DECLARE CURSOR csr_test AS select * from v_invoice_browse; FETCH BACKWARD in csr_test; COMMIT; Maybe the PowerBuilder go this way all the time. It's our problem that we get first data in SQLExecDirect. I don't know it right now. We have to find the solution but the way could be hard. Regards, Luf
Ludek Finstrle írta: >>finally here's the results. I hope you can find something useful in it. >> >> > >Finally I see the problem in mylog output. Unfortunatelly I don't know >how to change the behaviour :-( without more investigation. > >There is problem that you (I mean PowerBuilder) calling >SQLExecDirect select * from view (which fire the execution) >SQLDescribeCol (for each column in result) >SQLCancel (I don't know why the statement is canceled > it seems to me that PB miss some information) >... >SQLTables (get additional informations) >... >SQLColumns >... >SQLPrimaryKeys >... >SQLStatistics >... >SQLExecDirect select * from view (which fire the execution again) > > ... retrieve 1000+ rows and then _I_ pressed Cancel. Can you confirm that? I am just curious. >I have a lot of questions it will be the best if you could try >create some repro app. Some of the questions: >1) could you try similar view without text (char, varchar, ...) columns? > > Would the same view do but only seleting numeric (integer and decimal) columns? >2) maybe minimalization of view to one column could be the best test > > OK. >3) could you try enable trace ODBC (in ODBC manager) - independent on > datasource and create two logs - againist psqlodbc and another DB > - maybe we could find the difference there > > OK. >4) could you say me what are the query for (it's fired bettwen two calls > of select * from view): > - select pbd_fhgt, pbd_fwgt, pbd_fitl, pbd_funl, pbd_fchr, pbd_fptc, > pbd_ffce, pbh_fhgt, pbh_fwgt, pbh_fitl, pbh_funl, pbh_fchr, > pbh_fptc, pbh_ffce, pbl_fhgt, pbl_fwgt, pbl_fitl, pbl_funl, > pbl_fchr, pbl_fptc, pbl_ffce, pbt_cmnt > from informix.pbcattbl > where pbt_tnam = 'v_invoice_browse' and pbt_ownr = 'informix' > - select pbv_name, pbv_vald, pbv_type, pbv_cntr, pbv_msg > from informix.pbcatvld > - select pbf_name, pbf_frmt, pbf_type, pbf_cntr > from informix.pbcatfmt > - select pbe_name, pbe_edit, pbe_type, pbe_cntr, pbe_work, pbe_seqn, > pbe_flag > from informix.pbcatedt > order by pbe_name, pbe_seqn > - select pbc_tnam, pbc_ownr, pbc_cnam, pbc_labl, pbc_lpos, pbc_hdr, > pbc_hpos, pbc_jtfy, pbc_mask, pbc_case, pbc_hght, pbc_wdth, > pbc_ptrn, pbc_bmap, pbc_init, pbc_cmnt, pbc_edit > from informix.pbcatcol > where (pbc_tnam = 'v_invoice_browse' and pbc_ownr = 'informix') > > I don't know what are these. I guess PowerBuilder tries something behind everyones' back. I just opened the "Database Painter" and issued "select * from v_invoice_browse;". I will recheck my database but those table names aren't familiar at all. Maybe PB created them at first connect and collected something about the database in those tables. >5) How long does it take only (without fetch): > BEGIN; DECLARE CURSOR csr_test AS select * from v_invoice_browse; COMMIT; >6) How long does it take: > BEGIN; DECLARE CURSOR csr_test AS select * from v_invoice_browse; > FETCH BACKWARD in csr_test; COMMIT; > > I will test these, too. >Maybe the PowerBuilder go this way all the time. It's our problem >that we get first data in SQLExecDirect. I don't know it right now. >We have to find the solution but the way could be hard. > >Regards, > >Luf > > Best regards, Zoltán Böszörményi
> >SQLExecDirect select * from view (which fire the execution again) > > ... retrieve 1000+ rows and then _I_ pressed Cancel. Can you confirm that? > I am just curious. Yes. I see > 1000 < 1100 rows. You could go to the bottom of mylog (I see union one) and search backward "select * from v_invoice_browse". Then you search forward fetch <case size> (<case size> = 100 in your configuration so you search "fetch 100"). I count 11 (I hope I don't create a mistake) "fetch 100" => > 1000 < 1100 :-) There is SQLCancel after last "fetch 100" ;-) > >create some repro app. Some of the questions: > >1) could you try similar view without text (char, varchar, ...) columns? > > Would the same view do but only seleting numeric (integer and decimal) > columns? Yes. It's enough. I don't know if the problem isn't char column handling. > >4) could you say me what are the query for (it's fired bettwen two calls > > of select * from view): ... > I don't know what are these. I guess PowerBuilder tries something > behind everyones' back. I think so. Could you try these selects in psql or PgAdmin tool? I'm curios what you see. Regards, Luf