Thread: How to optimize a query...
I originally had this query with an OR clause instead of the UNION, which
made it amazingly, impossibly slow; with the UNION it's far faster, but
it's still very slow (10,000+ rows in both tables). Is there a way to
speed this up?  I have BTREE indexes on all concerned fields of the
tables, but it doesn't seem to use any of them...

I have another query that joins the result of this with 5 other tables;
unfortunately that one takes like 10 minutes... Any suggestions on
optimizations would be very appreciated.

ftc=> explain
ftc-> select po_id,ticket_pk
ftc-> from tickets,po
ftc-> where po_id=material_po
ftc-> union
ftc-> select po_id,ticket_pk
ftc-> from tickets,po
ftc-> where po_id=trucking_po
ftc-> ;
NOTICE:  QUERY PLAN:

Unique  (cost=4744.05 size=0 width=0)
  ->  Sort  (cost=4744.05 size=0 width=0)
    ->  Append  (cost=4744.05 size=0 width=0)
      ->  Hash Join  (cost=2372.03 size=11659303 width=12)
        ->  Seq Scan on tickets  (cost=849.03 size=19213 width=8)
        ->  Hash  (cost=0.00 size=0 width=0)
          ->  Seq Scan on po  (cost=528.98 size=10848 width=4)
      ->  Hash Join  (cost=2372.03 size=13838477 width=12)
        ->  Seq Scan on tickets  (cost=849.03 size=19213 width=8)
        ->  Hash  (cost=0.00 size=0 width=0)
          ->  Seq Scan on po  (cost=528.98 size=10848 width=4)

EXPLAIN
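[Editor's note: for readers following along, the OR form being compared
against is a single query with both conditions. The sketch below is
purely illustrative -- it uses Python's sqlite3 module as a stand-in
engine (not the PostgreSQL of the thread) and invented toy rows, with
table and column names taken from the post -- to show that the OR query
and the UNION rewrite return the same matches:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Toy versions of the two tables from the post.
cur.execute("CREATE TABLE po (po_id INTEGER)")
cur.execute("CREATE TABLE tickets (ticket_pk INTEGER, material_po INTEGER, trucking_po INTEGER)")
cur.executemany("INSERT INTO po VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO tickets VALUES (?,?,?)",
                [(10, 1, 2),    # matches po 1 as material, po 2 as trucking
                 (11, 3, 99)])  # matches po 3 as material only

# The OR form the thread started from:
or_rows = cur.execute("""
    SELECT po_id, ticket_pk
    FROM tickets, po
    WHERE po_id = material_po OR po_id = trucking_po
    ORDER BY po_id, ticket_pk
""").fetchall()

# The UNION rewrite whose EXPLAIN is shown above:
union_rows = cur.execute("""
    SELECT po_id, ticket_pk FROM tickets, po WHERE po_id = material_po
    UNION
    SELECT po_id, ticket_pk FROM tickets, po WHERE po_id = trucking_po
    ORDER BY po_id, ticket_pk
""").fetchall()

print(or_rows == union_rows)  # True: the two forms agree on these rows
```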
secret wrote:
<description of slow query, with proper EXPLAIN output... thanks!>

Hmm, it looks like your tables are big enough to benefit from indices,
but the query plan didn't choose any.  I'd guess an index on po.po_id,
and perhaps tickets.material_po and tickets.trucking_po, might help.

From what I understand, indices are of most use on attributes (fields)
that are mostly unique, like your po_id (probably) is.  They can hurt
you if you build them on fields that are mostly _not_ unique (the
classic example being a gender field: only two likely values, lots of
rows returned for either, so you scan anyway...).

Any real experts out there to correct me? ;-)

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
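[Editor's note: whether a plan actually picks up an index can be checked
with the database's EXPLAIN facilities, as Ross suggests. A minimal
sketch -- again using Python's sqlite3 as a stand-in engine, with an
invented `notes` column for contrast -- showing a lookup on an indexed,
near-unique po_id using the index, while a lookup on an unindexed column
falls back to a full table scan:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("CREATE TABLE po (po_id INTEGER, notes TEXT)")
cur.executemany("INSERT INTO po VALUES (?, ?)",
                [(i, "note %d" % i) for i in range(1000)])
cur.execute("CREATE INDEX po_po_id_idx ON po (po_id)")  # near-unique key

def plan(sql):
    # The human-readable 'detail' column is the last field of each
    # EXPLAIN QUERY PLAN row.
    return " ".join(row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

indexed = plan("SELECT * FROM po WHERE po_id = 5")          # uses the index
unindexed = plan("SELECT * FROM po WHERE notes = 'note 5'")  # scans the table

print(indexed)
print(unindexed)
```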
> I originally had this query with an OR clause instead of the UNION,
> which made it amazingly impossibly slow, with the UNION it's far faster
> however it's still very slow (10,000+ rows in both tables), is there a
> way to speed this up?

Hmm.  What does EXPLAIN give as the query plan if you use the OR
approach?  The UNION method requires much more work than a decent OR
plan would need; as you can see from the query plan, it has to sort and
unique-ify the results of the two sub-selects in order to discard
duplicate tuples.  If you don't mind seeing the same tuple twice when it
matches on both PO fields, you could use UNION ALL rather than UNION to
avoid the sort step.  But that's just a hack...

I think the problem might be that Postgres doesn't know how to handle
ORed join clauses very efficiently.  Something to work on in the future.

> I have another query that joins the result of this with 5 other
> tables, unfortunately that one takes like 10 minutes...

How long does it take to EXPLAIN that query?  Postgres' optimizer has
some bugs that cause it to take an unreasonable amount of time to plan a
query that joins more than a few tables.  You can tell whether it's the
planning or execution time that's the problem by comparing the EXPLAIN
runtime to the actual query's.  (The bugs are fixed in 6.5, btw.  In the
meantime a workaround is to reduce the GEQO threshold to less than the
number of tables in the query that's giving you trouble.)

			regards, tom lane
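[Editor's note: the UNION ALL suggestion changes results only when the
same (po_id, ticket_pk) pair comes out of both branches -- i.e. a ticket
whose material_po and trucking_po point at the same PO. A toy sketch of
exactly that case (sqlite3 as a stand-in engine, invented data, names
from the thread):]

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE po (po_id INTEGER)")
cur.execute("CREATE TABLE tickets (ticket_pk INTEGER, material_po INTEGER, trucking_po INTEGER)")
cur.execute("INSERT INTO po VALUES (1)")
# One ticket whose material PO and trucking PO are the same order:
cur.execute("INSERT INTO tickets VALUES (10, 1, 1)")

q = """SELECT po_id, ticket_pk FROM tickets, po WHERE po_id = material_po
       UNION %s
       SELECT po_id, ticket_pk FROM tickets, po WHERE po_id = trucking_po"""

union_rows = cur.execute(q % "").fetchall()         # duplicate discarded (sort/unique step)
union_all_rows = cur.execute(q % "ALL").fetchall()  # duplicate kept, no sort step

print(len(union_rows), len(union_all_rows))  # 1 2
```

Whether the duplicate row matters is an application question; if it
doesn't, skipping the sort/unique step is pure savings.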
secret wrote:
> I originally had this query with an OR clause instead of the UNION,
> which made it amazingly impossibly slow, with the UNION it's far faster
> however it's still very slow (10,000+ rows in both tables), is there a
> way to speed this up?  I have BTREE indexes on all concerned fields of
> the tables, but it doesn't seem to use any of them...
>
> I have another query that joins the result of this with 5 other
> tables, unfortunately that one takes like 10 minutes... Any suggestions
> on optimizations would be very appreciated.
>
> ftc=> explain
> ftc-> select po_id,ticket_pk
> ftc-> from tickets,po
> ftc-> where po_id=material_po
> ftc-> union
> ftc-> select po_id,ticket_pk
> ftc-> from tickets,po
> ftc-> where po_id=trucking_po
> ftc-> ;
> NOTICE:  QUERY PLAN:
>
> Unique  (cost=4744.05 size=0 width=0)
>   ->  Sort  (cost=4744.05 size=0 width=0)
>     ->  Append  (cost=4744.05 size=0 width=0)
>       ->  Hash Join  (cost=2372.03 size=11659303 width=12)
>         ->  Seq Scan on tickets  (cost=849.03 size=19213 width=8)
>         ->  Hash  (cost=0.00 size=0 width=0)
>           ->  Seq Scan on po  (cost=528.98 size=10848 width=4)
>       ->  Hash Join  (cost=2372.03 size=13838477 width=12)
>         ->  Seq Scan on tickets  (cost=849.03 size=19213 width=8)
>         ->  Hash  (cost=0.00 size=0 width=0)
>           ->  Seq Scan on po  (cost=528.98 size=10848 width=4)
>
> EXPLAIN

I'm having a lot of problems with performance under PostgreSQL; it seems
most of my major queries aren't using indexes (as above)... tickets has
15k rows, po 10k... The above query takes about 4 minutes.
Unfortunately I need to add in joins to about 5 tables... If I add one
more table it's 10 minutes... 2?  Who knows; I know by 5 it's
unacceptable.  Can anyone please give me some advice on what my problems
might be with these queries?  I've done VACUUM ANALYZE... I didn't used
to do that -- could it be missing data from way back?  Should I reload
the entire database?

--David
Subject: Re: [SQL] Re: How to optimize a query... -- Extreme performance difference, same EXPLAIN
From: secret
Date:
Tom Lane wrote:
> > I originally had this query with an OR clause instead of the UNION,
> > which made it amazingly impossibly slow, with the UNION it's far faster
> > however it's still very slow (10,000+ rows in both tables), is there a
> > way to speed this up?
>
> Hmm.  What does EXPLAIN give as the query plan if you use the OR
> approach?  The UNION method requires much more work than a decent OR
> plan would need; as you can see from the query plan, it has to sort and
> unique-ify the results of the two sub-selects in order to discard
> duplicate tuples.  If you don't mind seeing the same tuple twice when
> it matches on both PO fields, you could use UNION ALL rather than UNION
> to avoid the sort step.  But that's just a hack...

EXPLAIN seems to be telling a different story now... It used to give a
bunch of sequential scans; when I used UNION it gave fewer hash joins.
I thought fewer hash joins might be better (and they were: one would
take hours & run out of memory, the other would take an hour & not run
out of memory)... I have a stripped-down version of the query I need.
The UNION takes:

! system usage stats:
!	938.603740 elapsed 914.250000 user 1.140000 system sec
!	[914.260000 user 1.140000 sys total]
!	0/0 [0/0] filesystem blocks in/out
!	3799/13766 [3915/13873] page faults/reclaims, 654 [654] swaps
!	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!	0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!	Shared blocks:	0 read, 0 written, buffer hit rate = 100.00%
!	Local  blocks:	0 read, 0 written, buffer hit rate = 0.00%
!	Direct blocks:	688 read, 716 written

When I use OR:

! system usage stats:
!	706.158704 elapsed 698.970000 user 0.370000 system sec
!	[698.980000 user 0.370000 sys total]
!	0/0 [0/0] filesystem blocks in/out
!	1025/6810 [1062/6906] page faults/reclaims, 0 [0] swaps
!	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!	0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!	Shared blocks:	232 read, 0 written, buffer hit rate = 84.61%
!	Local  blocks:	0 read, 0 written, buffer hit rate = 0.00%
!	Direct blocks:	0 read, 0 written

> I think the problem might be that Postgres doesn't know how to handle
> ORed join clauses very efficiently.  Something to work on in the future.

I think that may have been because I didn't know to VACUUM ANALYZE.
However, today I found a much more disturbing thing: if I simply change
the order of the SELECT clause I can change a 9.69 second query into a
605 second query.  Below is the example:

SELECT po_id,          <- PK of po
       material.name,  <- Notice material name here.
       ticket_pk       <- PK of tickets
FROM po,tickets,material
WHERE po_id=material_po AND
      po.units_id=tickets.units_id AND
      po.material_id=material.material_id;

! system usage stats:
!	315.427223 elapsed 315.010000 user 0.250000 system sec
!	[315.020000 user 0.250000 sys total]
!	0/0 [0/0] filesystem blocks in/out
!	645/6717 [682/6812] page faults/reclaims, 0 [0] swaps
!	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!	0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!	Shared blocks:	144 read, 0 written, buffer hit rate = 89.66%
!	Local  blocks:	0 read, 0 written, buffer hit rate = 0.00%
!	Direct blocks:	0 read, 0 written

SELECT po_id,ticket_pk,material.name
FROM po,tickets,material
WHERE po_id=material_po AND
      po.units_id=tickets.units_id AND
      po.material_id=material.material_id;

! system usage stats:
!	2.290408 elapsed 1.890000 user 0.250000 system sec
!	[1.900000 user 0.250000 sys total]
!	0/0 [0/0] filesystem blocks in/out
!	73/6691 [110/6786] page faults/reclaims, 0 [0] swaps
!	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!	0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!	Shared blocks:	0 read, 0 written, buffer hit rate = 100.00%
!	Local  blocks:	0 read, 0 written, buffer hit rate = 0.00%
!	Direct blocks:	0 read, 0 written

315 seconds vs 2.3 seconds?
Is there some light you can shed on how queries are processed under
PostgreSQL?  They both have the exact same EXPLAIN:

Hash Join  (cost=5773.47 size=24494486 width=36)
  ->  Hash Join  (cost=899.69 size=101128 width=26)
    ->  Seq Scan on po  (cost=530.38 size=10860 width=10)
    ->  Hash  (cost=0.00 size=0 width=0)
      ->  Seq Scan on material  (cost=5.47 size=105 width=16)
  ->  Hash  (cost=0.00 size=0 width=0)
    ->  Seq Scan on tickets  (cost=878.78 size=19872 width=10)

The only GUESS I can come up with is that it's joining po & material
then tickets, vs po & tickets then material.  The latter would be more
efficient, since joining po & tickets is going to knock out lots of rows
from po, so there would be less joining against material, which will
have a matching row for every po.  I have indexes on all involved
fields.

> > I have another query that joins the result of this with 5 other
> > tables, unfortunately that one takes like 10 minutes...
>
> How long does it take to EXPLAIN that query?  Postgres' optimizer has
> some bugs that cause it to take an unreasonable amount of time to plan
> a query that joins more than a few tables.  You can tell whether it's
> the planning or execution time that's the problem by comparing EXPLAIN
> runtime to the actual query.  (The bugs are fixed in 6.5, btw.  In the
> meantime a workaround is to reduce the GEQO threshold to less than the
> number of tables in the query that's giving you trouble.)
>
> 	regards, tom lane

The EXPLAIN takes a few seconds, maybe 10... It's definitely a runtime
thing; one time I ran the silly thing and it sucked up 192M of RAM (64M
physical + 128M swap) then died... Sigh.  I'm slowly working my way back
up to it -- I just bought 128M of extra RAM... If you could shed some
light on the extreme performance difference for a simple change of order
in the SELECT part I'd appreciate it... I don't think PostgreSQL is
using my indexes very well, or at all... :(

Sorry about the length of the message; I wanted to include as much
detail as possible.
David Secret
MIS Director
Kearney Development Co., Inc.
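[Editor's note: David's join-order guess can be made concrete by
counting intermediate results. In the sketch below (sqlite3 stand-in,
invented toy data sized only to make the relative cardinalities
obvious), every po row references a material row, so joining po to
material first filters nothing, while the po-to-tickets join is highly
selective -- which is why doing it first leaves far fewer rows to feed
into the second join:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE material (material_id INTEGER)")
cur.execute("CREATE TABLE po (po_id INTEGER, material_id INTEGER, units_id INTEGER)")
cur.execute("CREATE TABLE tickets (ticket_pk INTEGER, material_po INTEGER, units_id INTEGER)")

cur.executemany("INSERT INTO material VALUES (?)", [(m,) for m in range(10)])
# Every PO references some material, so the po/material join filters nothing.
cur.executemany("INSERT INTO po VALUES (?,?,?)",
                [(p, p % 10, p % 3) for p in range(100)])
# Only a few POs have tickets, so the po/tickets join is selective.
cur.executemany("INSERT INTO tickets VALUES (?,?,?)",
                [(t, t, t % 3) for t in range(5)])

po_material = cur.execute("""SELECT COUNT(*) FROM po, material
                             WHERE po.material_id = material.material_id""").fetchone()[0]
po_tickets = cur.execute("""SELECT COUNT(*) FROM po, tickets
                            WHERE po_id = material_po
                              AND po.units_id = tickets.units_id""").fetchone()[0]

print(po_material, po_tickets)  # 100 5: the tickets join shrinks the work first
```

A planner that starts with the po/tickets pair carries 5 rows into the
material join instead of 100, which matches David's intuition about why
one join order is so much cheaper.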