Thread: Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1
Hi, I am transplanting an application to use PostgreSQL8.2.4 instead of DB2 9.1. CLI was used to connect to DB2, and ODBC is used to connect to PostgreSQL. The query statement is as follows: SELECT void,nameId,tag FROM (SELECT void,nameId,tag,.... FROM Attr WHERE attributeof IN (SELECT oid_ FROM ItemView WHERE ItemView.ItemId=?)) x RIGHT OUTER JOIN (SELECT oid_ FROM ItemView WHERE ItemView.ItemId=? and ItemView.assignedTo_=?) y ON attributeof = oid_ FOR READ ONLY I tested the performance on PostgreSQL against DB2, and found that First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds PostgreSQL cost nearly the same time but DB2 ran 30 times faster in second execution. I tried to adjust shared_buffers parameter in postgresql.conf, no speed up. Actually my DB holds only several records so I don't think memory is the reason. Could anybody give some advice to speed up in repeated execution in PostgreSQL or give an explanation why DB2 is so mush faster in repeated execution? Thank you. ning
On Wed, 2009-07-15 at 12:10 +0900, ning wrote: > Hi, > > I am transplanting an application to use PostgreSQL8.2.4 instead of DB2 9.1. > CLI was used to connect to DB2, and ODBC is used to connect to PostgreSQL. > The query statement is as follows: > PostgreSQL cost nearly the same time but DB2 ran 30 times faster in > second execution. Can you run your query with EXPLAIN ANALYZE and post the results, both of the first and later executions? -- Craig Ringer
On Wed, 2009-07-15 at 12:10 +0900, ning wrote: > First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds > Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds Actually, on second thoughts that looks a lot like DB2 is caching the query results and is just returning the cached results when you repeat the query. I'm not sure to what extent PostgreSQL is capable of result caching, but I'd be surprised if it could do as much as DB2. -- Craig Ringer
On Wed, Jul 15, 2009 at 9:27 AM, Craig Ringer<craig@postnewspapers.com.au> wrote: > On Wed, 2009-07-15 at 12:10 +0900, ning wrote: > >> First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds >> Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds > > Actually, on second thoughts that looks a lot like DB2 is caching the > query results and is just returning the cached results when you repeat > the query. Yeah, is 6ms really a problematic response time for your system? If so you might consider whether executing millions of small queries is really the best approach instead of absorbing them all into queries which operate on more records at a time. For example, it's a lot faster to join two large tables than look up matches for every record one by one in separate queries. There's no question if you match up results from DB2 and Postgres one to one there will be cases where DB2 is faster and hopefully cases where Postgres is faster. It's only interesting if the differences could cause problems, otherwise you'll be running around in circles hunting down every difference between two fundamentally different products. -- greg http://mit.edu/~gsstark/resume.pdf
Hi Craig, The log is really long, but I compared the result of "explain analyze" for first and later executions, except for 3 "time=XXX" numbers, they are identical. I agree with you that PostgreSQL is doing different level of caching, I just wonder if there is any way to speed up PostgreSQL in this scenario, which is a wrong way perhaps. Thank you. Ning On Wed, Jul 15, 2009 at 5:27 PM, Craig Ringer<craig@postnewspapers.com.au> wrote: > On Wed, 2009-07-15 at 12:10 +0900, ning wrote: > >> First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds >> Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds > > Actually, on second thoughts that looks a lot like DB2 is caching the > query results and is just returning the cached results when you repeat > the query. > > I'm not sure to what extent PostgreSQL is capable of result caching, but > I'd be surprised if it could do as much as DB2. > > -- > Craig Ringer > >
Hi Greg, I am doing performance test by running unit test program to compare time used on PostgreSQL and DB2. As you pointed out, there are cases that PostgreSQL is faster. Actually in real world for my application, repeatedly executing same query statement will hardly happen. I am investigating this because on the performance test report automatically generated by running unit test program, DB2 is 20-30 times faster than PostgreSQL in some test cases because of repeatedly executed query. I am thinking that ignoring these test cases for performance measure is safe and acceptable, since PostgreSQL is quicker than DB2 for the first execution. Thank you. Ning On Wed, Jul 15, 2009 at 5:37 PM, Greg Stark<gsstark@mit.edu> wrote: > On Wed, Jul 15, 2009 at 9:27 AM, Craig > Ringer<craig@postnewspapers.com.au> wrote: >> On Wed, 2009-07-15 at 12:10 +0900, ning wrote: >> >>> First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds >>> Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds >> >> Actually, on second thoughts that looks a lot like DB2 is caching the >> query results and is just returning the cached results when you repeat >> the query. > > > Yeah, is 6ms really a problematic response time for your system? > > If so you might consider whether executing millions of small queries > is really the best approach instead of absorbing them all into queries > which operate on more records at a time. For example, it's a lot > faster to join two large tables than look up matches for every record > one by one in separate queries. > > There's no question if you match up results from DB2 and Postgres one > to one there will be cases where DB2 is faster and hopefully cases > where Postgres is faster. It's only interesting if the differences > could cause problems, otherwise you'll be running around in circles > hunting down every difference between two fundamentally different > products. > > -- > greg > http://mit.edu/~gsstark/resume.pdf >
ning wrote: > The log is really long, Which usually signals a problem with the query. > but I compared the result of "explain analyze" > for first and later executions, except for 3 "time=XXX" numbers, they > are identical. > They are supposed to be identical unless something is really badly broken. > I agree with you that PostgreSQL is doing different level of caching, > I just wonder if there is any way to speed up PostgreSQL in this > scenario, > This is what EXPLAIN ANALYZE for. Could you post the results please? Cheers, Mike
On Wednesday 15 July 2009 10:27:50 Craig Ringer wrote: > On Wed, 2009-07-15 at 12:10 +0900, ning wrote: > > First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds > > Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds > > Actually, on second thoughts that looks a lot like DB2 is caching the > query results and is just returning the cached results when you repeat > the query. Are you sure getting the query *result* is causing the delay? If my faint memory serves right DB2 does plan caching - PG does not. To test this theory you could prepare it and execute it twice. Prepare it: PREPARE test_query AS SELECT void,nameId,tag FROM (SELECT void,nameId,tag,.... FROM Attr WHERE attributeof IN (SELECT oid_ FROM ItemView WHERE ItemView.ItemId=?)) x RIGHT OUTER JOIN (SELECT oid_ FROM ItemView WHERE ItemView.ItemId=? and ItemView.assignedTo_=?) y ON attributeof = oid_ FOR READ ONLY; Execute it: EXECUTE test_query; EXECUTE test_query; Greetings, Andres
Hi Mike, Thank you for your explanation. The "explain analyze" command used is as follows, several integers are bound to '?'. ----- SELECT oid_,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval FROM (SELECT attributeOf,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval FROM DenormAttributePerf WHERE attributeof IN (SELECT oid_ FROM JobView WHERE JobView.JobId=? and JobView.assignedTo_=?) AND nameId in (?)) x RIGHT OUTER JOIN (SELECT oid_ FROM JobView WHERE JobView.JobId=? and JobView.assignedTo_=?) y ON attributeof = oid_ FOR READ ONLY ----- The result of the command is ----- QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=575.60..1273.15 rows=81 width=568) (actual time=0.018..0.018 rows=0 loops=1) Join Filter: (x.attributeof = j1.oid_) -> Index Scan using job_tc1 on job j1 (cost=0.00..8.27 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1) Index Cond: ((assignedto_ = 888) AND (jobid = 0)) -> Merge Left Join (cost=575.60..899.41 rows=16243 width=564) (never executed) Merge Cond: (v.void = b.void) -> Merge Left Join (cost=470.77..504.87 rows=2152 width=556) (never executed) Merge Cond: (v.void = res.void) -> Sort (cost=373.61..374.39 rows=310 width=544) (never executed) Sort Key: v.void -> Hash Left Join (cost=112.07..360.78 rows=310 width=544) (never executed) Hash Cond: (v.void = i.void) -> Hash Left Join (cost=65.40..303.17 rows=38 width=540) (never executed) Hash Cond: (v.void = r.void) -> Hash Left Join (cost=21.42..257.86 rows=5 width=532) (never executed) Hash Cond: (v.void = s.void) -> Nested Loop Left Join (cost=8.27..244.65 rows=5 width=16) (never executed) Join Filter: (v.containedin = a.id) -> Nested Loop (cost=8.27..16.57 rows=1 width=12) (never executed) -> HashAggregate (cost=8.27..8.28 rows=1 width=4) (never executed) -> Index Scan using job_tc1 on job j1 (cost=0.00..8.27 rows=1 width=4) (never executed) Index Cond: ((assignedto_ = 888) AND (jobid = 0)) -> Index Scan using attribute_tc1 on attribute a (cost=0.00..8.27 rows=1 width=12) (never executed) Index Cond: ((a.attributeof = j1.oid_) AND (a.nameid = 6)) -> Append (cost=0.00..137.60 rows=7239 width=12) (never executed) -> Index Scan using attribute_value_i on attribute_value v (cost=0.00..5.30 rows=9 width=12) (never executed) Index Cond: (v.containedin = a.id) -> Seq Scan on string_value v (cost=0.00..11.40 rows=140 width=12) (never executed) -> Seq Scan on integer_value v (cost=0.00..26.30 rows=1630 width=12) (never executed) -> Seq Scan on bigint_value v (cost=0.00..25.10 rows=1510 width=12) (never executed) -> Seq Scan on rangeofint_value v (cost=0.00..25.10 rows=1510 width=12) (never executed) -> Seq Scan on resolution_value v (cost=0.00..24.00 rows=1400 width=12) (never executed) -> Seq Scan on opaque_value v (cost=0.00..20.40 rows=1040 width=12) (never executed) -> Hash (cost=11.40..11.40 rows=140 width=520) (never executed) -> Seq Scan on string_value s (cost=0.00..11.40 rows=140 width=520) (never executed) -> Hash (cost=25.10..25.10 rows=1510 width=12) (never executed) -> Seq Scan on rangeofint_value r (cost=0.00..25.10 rows=1510 width=12) (never executed) -> Hash (cost=26.30..26.30 rows=1630 width=8) (never executed) -> Seq Scan on integer_value i (cost=0.00..26.30 rows=1630 width=8) (never executed) -> Sort (cost=97.16..100.66 rows=1400 width=16) (never executed) Sort Key: res.void -> Seq Scan on resolution_value res (cost=0.00..24.00 rows=1400 width=16) (never executed) -> Sort (cost=104.83..108.61 rows=1510 width=12) (never executed) Sort Key: b.void -> Seq Scan on bigint_value b (cost=0.00..25.10 rows=1510 width=12) (never executed) Total runtime: 0.479 ms (46 rows) ----- Best regards, Ning On Thu, Jul 16, 2009 at 7:37 AM, Mike Ivanov<mikei@activestate.com> wrote: > ning wrote: >> >> The log is really long, > > Which usually signals a problem with the query. > >> but I compared the result of "explain analyze" >> for first and later executions, except for 3 "time=XXX" numbers, they >> are identical. >> > > They are supposed to be identical unless something is really badly broken. > >> I agree with you that PostgreSQL is doing different level of caching, >> I just wonder if there is any way to speed up PostgreSQL in this >> scenario, > > This is what EXPLAIN ANALYZE for. Could you post the results please? > > Cheers, > Mike > >
Hi Andres, The log for the test you suggested is as follows in PostgreSQL8.2.4, but I cannot find a clue to prove or prove not PostgreSQL is doing plan caching. Best regards, Ning ----- job=# prepare test_query as SELECT oid_,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval FROM (SELECT attributeOf,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval FROM DenormAttributePerf WHERE attributeof IN (SELECT oid_ FROM JobView WHERE JobView.JobId=100 and JobView.assignedTo_=1) AND nameId in (6)) x RIGHT OUTER JOIN (SELECT oid_ FROM JobView WHERE JobView.JobId=100 and JobView.assignedTo_=1) y ON attributeof = oid_ FOR READ ONLY ; PREPARE job=# execute test_query; oid_ | void | nameid | tag | intval | lowerbound | upperbound | crossfeeddir | feeddir | units | opqval | bigval | strval ------+------+--------+-----+--------+------------+------------+--------------+---------+-------+--------+--------+-------- 101 | | | | | | | | | | | | (1 row) job=# execute test_query; oid_ | void | nameid | tag | intval | lowerbound | upperbound | crossfeeddir | feeddir | units | opqval | bigval | strval ------+------+--------+-----+--------+------------+------------+--------------+---------+-------+--------+--------+-------- 101 | | | | | | | | | | | | (1 row) ----- On Thu, Jul 16, 2009 at 7:51 AM, Andres Freund<andres@anarazel.de> wrote: > On Wednesday 15 July 2009 10:27:50 Craig Ringer wrote: >> On Wed, 2009-07-15 at 12:10 +0900, ning wrote: >> > First execution: PostgreSQL 0.006277 seconds / DB2 0.009028 seconds >> > Second execution: PostgreSQL 0.005932 seconds / DB2 0.000332 seconds >> >> Actually, on second thoughts that looks a lot like DB2 is caching the >> query results and is just returning the cached results when you repeat >> the query. > Are you sure getting the query *result* is causing the delay? If my faint > memory serves right DB2 does plan caching - PG does not. > To test this theory you could prepare it and execute it twice. > > Prepare it: > PREPARE test_query AS SELECT void,nameId,tag FROM (SELECT void,nameId,tag,.... > FROM Attr > WHERE attributeof IN (SELECT oid_ FROM ItemView WHERE > ItemView.ItemId=?)) x RIGHT OUTER JOIN (SELECT oid_ FROM ItemView > WHERE ItemView.ItemId=? and ItemView.assignedTo_=?) y ON attributeof = > oid_ FOR READ ONLY; > > > Execute it: > EXECUTE test_query; > EXECUTE test_query; > > Greetings, > > Andres >
On Thursday 16 July 2009 03:11:29 ning wrote: > Hi Andres, > > The log for the test you suggested is as follows in PostgreSQL8.2.4, > but I cannot find a clue to prove or prove not PostgreSQL is doing > plan caching. Well. How long is the PREPARE and the EXECUTEs taking? Andres
Hi Andres, By executing #explain analyze execute test_query; the first execution cost 0.389 seconds the second cost 0.285 seconds Greetings, Ning On Thu, Jul 16, 2009 at 4:45 PM, Andres Freund<andres@anarazel.de> wrote: > On Thursday 16 July 2009 03:11:29 ning wrote: >> Hi Andres, >> >> The log for the test you suggested is as follows in PostgreSQL8.2.4, >> but I cannot find a clue to prove or prove not PostgreSQL is doing >> plan caching. > Well. How long is the PREPARE and the EXECUTEs taking? > > > Andres >
On Thursday 16 July 2009 11:30:00 ning wrote: > Hi Andres, > > By executing > #explain analyze execute test_query; > > the first execution cost 0.389 seconds > the second cost 0.285 seconds Seconds or milliseconds? If seconds that would be by far slower than the plain SELECT, right? Andres
I'm sorry, they are in milliseconds, not seconds. The time used is quite same to the result of "explain analyze select ...." I pasted above, which was " Total runtime: 0.479 ms". Greetings, Ning On Thu, Jul 16, 2009 at 6:33 PM, Andres Freund<andres@anarazel.de> wrote: > On Thursday 16 July 2009 11:30:00 ning wrote: >> Hi Andres, >> >> By executing >> #explain analyze execute test_query; >> >> the first execution cost 0.389 seconds >> the second cost 0.285 seconds > Seconds or milliseconds? > > If seconds that would be by far slower than the plain SELECT, right? > > Andres >
On Thursday 16 July 2009 11:46:18 ning wrote: > I'm sorry, they are in milliseconds, not seconds. > The time used is quite same to the result of "explain analyze select > ...." I pasted above, > which was " Total runtime: 0.479 ms". Yea. Unfortunately that time does not including planning time. If you work locally on the server using psql you can use '\timing' to make psql output timing information. If I interpret those findings correcty the execution is approx. as fast as DB2, only DB2 is doing automated plan caching while pg is not. If it _really_ is necessary that this is that fast, you can prepare the query like I showed. Andres
Hi, Le 16 juil. 09 à 11:52, Andres Freund a écrit : > If I interpret those findings correcty the execution is approx. as > fast as DB2, > only DB2 is doing automated plan caching while pg is not. > > If it _really_ is necessary that this is that fast, you can prepare > the query > like I showed. A form of automatic plan caching by PostgreSQL is possible and available as an extension called "preprepare", which aims to have your statements already prepared by the time you're able to send queries in, at which point you simply EXECUTE and never care about the PREPARE in your code. http://preprepare.projects.postgresql.org/README.html The module is only available for PostgreSQL 8.3 or 8.4, and works in a fully automatic manner only in 8.4 (using local_preload_librairies), and in both cases requires you to: - edit postgresql.conf and restart if preloading, else reload - put your full PREPARE statements into a table - psql -c "EXECUTE myquery;" mydb - enjoy Regards, -- dim PS: soon to be in debian sid for both 8.3 and 8.4, currently in NEW queue
Interesting. It's quite a hairy plan even though all the branches are cut off by conditions ("never executed") so the query yields 0 rows. 0.018 is not a bad timing for that. However, if you run this query with different parameters, the result could be quite sad. There are some deeply nested loops with joins filtered by inner seq scans; this can be extremely expensive. Also, note that Left Merge Join with 16243 rows being reduced into just 1. With a database like DB2, the results you had are quite predictable: slow first time execution (because of the ineffective query) and then fast consequent responses because the tiny resultset produced by the query can be stored in the memory. Now, with Postgres the picture is different: all this complex stuff has to be executed each time the query is sent. I would rather rewrite the query without inner selects, using straight joins instead. Also, try to filter things before joining, not after. Correct me if I'm wrong, but in this particular case this seems pretty much possible. Cheers, Mike ning wrote: > Hi Mike, > > Thank you for your explanation. > The "explain analyze" command used is as follows, several integers are > bound to '?'. > ----- > SELECT oid_,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval > FROM (SELECT attributeOf,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval > FROM DenormAttributePerf WHERE attributeof IN (SELECT oid_ FROM > JobView WHERE JobView.JobId=? and JobView.assignedTo_=?) AND nameId in > (?)) x RIGHT OUTER JOIN (SELECT oid_ FROM JobView WHERE > JobView.JobId=? and JobView.assignedTo_=?) y ON attributeof = oid_ FOR > READ ONLY > ----- > > The result of the command is > ----- > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Nested Loop Left Join (cost=575.60..1273.15 rows=81 width=568) > (actual time=0.018..0.018 rows=0 loops=1) > Join Filter: (x.attributeof = j1.oid_) > -> Index Scan using job_tc1 on job j1 (cost=0.00..8.27 rows=1 > width=4) (actual time=0.016..0.016 rows=0 loops=1) > Index Cond: ((assignedto_ = 888) AND (jobid = 0)) > -> Merge Left Join (cost=575.60..899.41 rows=16243 width=564) > (never executed) > Merge Cond: (v.void = b.void) > -> Merge Left Join (cost=470.77..504.87 rows=2152 > width=556) (never executed) > Merge Cond: (v.void = res.void) > -> Sort (cost=373.61..374.39 rows=310 width=544) > (never executed) > Sort Key: v.void > -> Hash Left Join (cost=112.07..360.78 rows=310 > width=544) (never executed) > Hash Cond: (v.void = i.void) > -> Hash Left Join (cost=65.40..303.17 > rows=38 width=540) (never executed) > Hash Cond: (v.void = r.void) > -> Hash Left Join > (cost=21.42..257.86 rows=5 width=532) (never executed) > Hash Cond: (v.void = s.void) > -> Nested Loop Left Join > (cost=8.27..244.65 rows=5 width=16) (never executed) > Join Filter: (v.containedin = a.id) > -> Nested Loop > (cost=8.27..16.57 rows=1 width=12) (never executed) > -> HashAggregate > (cost=8.27..8.28 rows=1 width=4) (never executed) > -> Index > Scan using job_tc1 on job j1 (cost=0.00..8.27 rows=1 width=4) (never > executed) > Index > Cond: ((assignedto_ = 888) AND (jobid = 0)) > -> Index Scan > using attribute_tc1 on attribute a (cost=0.00..8.27 rows=1 width=12) > (never executed) > Index Cond: > ((a.attributeof = j1.oid_) AND (a.nameid = 6)) > -> Append > (cost=0.00..137.60 rows=7239 width=12) (never executed) > -> Index Scan > using attribute_value_i on attribute_value v (cost=0.00..5.30 rows=9 > width=12) (never executed) > Index Cond: > (v.containedin = a.id) > -> Seq Scan on > string_value v (cost=0.00..11.40 rows=140 width=12) (never executed) > -> Seq Scan on > integer_value v (cost=0.00..26.30 rows=1630 width=12) (never > executed) > -> Seq Scan on > bigint_value v (cost=0.00..25.10 rows=1510 width=12) (never executed) > -> Seq Scan on > rangeofint_value v (cost=0.00..25.10 rows=1510 width=12) (never > executed) > -> Seq Scan on > resolution_value v (cost=0.00..24.00 rows=1400 width=12) (never > executed) > -> Seq Scan on > opaque_value v (cost=0.00..20.40 rows=1040 width=12) (never executed) > -> Hash (cost=11.40..11.40 > rows=140 width=520) (never executed) > -> Seq Scan on > string_value s (cost=0.00..11.40 rows=140 width=520) (never executed) > -> Hash (cost=25.10..25.10 > rows=1510 width=12) (never executed) > -> Seq Scan on > rangeofint_value r (cost=0.00..25.10 rows=1510 width=12) (never > executed) > -> Hash (cost=26.30..26.30 rows=1630 > width=8) (never executed) > -> Seq Scan on integer_value i > (cost=0.00..26.30 rows=1630 width=8) (never executed) > -> Sort (cost=97.16..100.66 rows=1400 width=16) > (never executed) > Sort Key: res.void > -> Seq Scan on resolution_value res > (cost=0.00..24.00 rows=1400 width=16) (never executed) > -> Sort (cost=104.83..108.61 rows=1510 width=12) (never executed) > Sort Key: b.void > -> Seq Scan on bigint_value b (cost=0.00..25.10 > rows=1510 width=12) (never executed) > Total runtime: 0.479 ms > (46 rows) > ----- > > Best regards, > Ning > > On Thu, Jul 16, 2009 at 7:37 AM, Mike Ivanov<mikei@activestate.com> wrote: > >> ning wrote: >> >>> The log is really long, >>> >> Which usually signals a problem with the query. >> >> >>> but I compared the result of "explain analyze" >>> for first and later executions, except for 3 "time=XXX" numbers, they >>> are identical. >>> >>> >> They are supposed to be identical unless something is really badly broken. >> >> >>> I agree with you that PostgreSQL is doing different level of caching, >>> I just wonder if there is any way to speed up PostgreSQL in this >>> scenario, >>> >> This is what EXPLAIN ANALYZE for. Could you post the results please? >> >> Cheers, >> Mike >> >> >>