Re: [SQL] Re: How to optimize a query... -- Extreme performance difference, same EXPLAIN - Mailing list pgsql-sql
| From | secret |
|---|---|
| Subject | Re: [SQL] Re: How to optimize a query... -- Extreme performance difference, same EXPLAIN |
| Date | |
| Msg-id | 36E6EB54.D6624BD@kearneydev.com |
| In response to | Re: How to optimize a query... (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-sql |
Tom Lane wrote:
> > I originally had this query with an OR clause instead of the UNION,
> > which made it amazingly impossibly slow, with the UNION it's far faster
> > however it's still very slow (10,000+ rows in both tables), is there a
> > way to speed this up?
>
> Hmm.  What does EXPLAIN give as the query plan if you use the OR
> approach?  The UNION method requires much more work than a decent OR
> plan would need; as you can see from the query plan, it has to sort and
> unique-ify the results of the two sub-selects in order to discard
> duplicate tuples.  If you don't mind seeing the same tuple twice when
> it matches on both PO fields, you could use UNION ALL rather than UNION
> to avoid the sort step.  But that's just a hack...

EXPLAIN seems to be telling a different story now. It used to give a bunch of sequential scans; when I used UNION it gave fewer hash joins. I thought fewer hash joins might be better (and they were: one would take hours and run out of memory, the other would take an hour and not run out of memory). I have a stripped-down version of the query I need. The UNION takes:

! system usage stats:
!	938.603740 elapsed 914.250000 user 1.140000 system sec
!	[914.260000 user 1.140000 sys total]
!	0/0 [0/0] filesystem blocks in/out
!	3799/13766 [3915/13873] page faults/reclaims, 654 [654] swaps
!	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!	0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!	Shared blocks: 0 read, 0 written, buffer hit rate = 100.00%
!	Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
!	Direct blocks: 688 read, 716 written

When I use OR:

! system usage stats:
!	706.158704 elapsed 698.970000 user 0.370000 system sec
!	[698.980000 user 0.370000 sys total]
!	0/0 [0/0] filesystem blocks in/out
!	1025/6810 [1062/6906] page faults/reclaims, 0 [0] swaps
!	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!	0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!	Shared blocks: 232 read, 0 written, buffer hit rate = 84.61%
!	Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
!	Direct blocks: 0 read, 0 written

> I think the problem might be that Postgres doesn't know how to handle
> ORed join clauses very efficiently.  Something to work on in the future.

I think that may have been because I didn't know to VACUUM ANALYZE. However, today I found a much more disturbing thing: if I simply change the order of the SELECT clause, I can turn a 9.69-second query into a 605-second query. Below is the example:

SELECT po_id,          <- PK of po
       material.name,  <- Notice material name here
       ticket_pk       <- PK of tickets
FROM po, tickets, material
WHERE po_id = material_po
  AND po.units_id = tickets.units_id
  AND po.material_id = material.material_id;

! system usage stats:
!	315.427223 elapsed 315.010000 user 0.250000 system sec
!	[315.020000 user 0.250000 sys total]
!	0/0 [0/0] filesystem blocks in/out
!	645/6717 [682/6812] page faults/reclaims, 0 [0] swaps
!	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!	0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!	Shared blocks: 144 read, 0 written, buffer hit rate = 89.66%
!	Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
!	Direct blocks: 0 read, 0 written

SELECT po_id, ticket_pk, material.name
FROM po, tickets, material
WHERE po_id = material_po
  AND po.units_id = tickets.units_id
  AND po.material_id = material.material_id;

! system usage stats:
!	2.290408 elapsed 1.890000 user 0.250000 system sec
!	[1.900000 user 0.250000 sys total]
!	0/0 [0/0] filesystem blocks in/out
!	73/6691 [110/6786] page faults/reclaims, 0 [0] swaps
!	0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!	0/0 [0/0] voluntary/involuntary context switches
! postgres usage stats:
!	Shared blocks: 0 read, 0 written, buffer hit rate = 100.00%
!	Local blocks: 0 read, 0 written, buffer hit rate = 0.00%
!	Direct blocks: 0 read, 0 written

315 seconds vs 2.3 seconds?
Is there some light you can shed on how queries are processed under PostgreSQL? They both have the exact same EXPLAIN:

Hash Join  (cost=5773.47 size=24494486 width=36)
  ->  Hash Join  (cost=899.69 size=101128 width=26)
        ->  Seq Scan on po  (cost=530.38 size=10860 width=10)
        ->  Hash  (cost=0.00 size=0 width=0)
              ->  Seq Scan on material  (cost=5.47 size=105 width=16)
  ->  Hash  (cost=0.00 size=0 width=0)
        ->  Seq Scan on tickets  (cost=878.78 size=19872 width=10)

The only GUESS I can come up with is that one is joining po & material and then tickets, versus po & tickets and then material; the latter would be more efficient, since po & material are going to knock out lots of rows from po, so there would be less joining against material, which will have a matching material row for every po. I have indexes on all involved columns.

> > I have another query that joins the result of this with 5 other
> > tables, unfortunately that one takes like 10 minutes...
>
> How long does it take to EXPLAIN that query?  Postgres' optimizer has
> some bugs that cause it to take an unreasonable amount of time to plan
> a query that joins more than a few tables.  You can tell whether it's
> the planning or execution time that's the problem by comparing EXPLAIN
> runtime to the actual query.  (The bugs are fixed in 6.5, btw.  In the
> meantime a workaround is to reduce the GEQO threshold to less than the
> number of tables in the query that's giving you trouble.)
>
> 			regards, tom lane

The EXPLAIN takes a few seconds, maybe 10. It's definitely a runtime thing; one time I ran the silly thing and it sucked up 192M of RAM (64M physical + 128M swap) and then died... Sigh. I'm slowly working my way back up to it; I just bought 128M of extra RAM. If you could shed some light on the extreme performance difference for a simple change of order in the SELECT part of it, I'd appreciate it. I don't think PostgreSQL is using my indexes very well, or at all... :(

Sorry about the length of the message; I wanted to include as much detail as possible.
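[Editor's note] Tom's UNION ALL suggestion from earlier in the thread can be sketched as follows. The column names vendor_po and customer_po are placeholders, since the thread never shows the two actual PO fields:

```sql
-- UNION must sort and unique-ify the combined result in order to
-- discard duplicate tuples:
SELECT po_id FROM po WHERE vendor_po = 'A100'
UNION
SELECT po_id FROM po WHERE customer_po = 'A100';

-- UNION ALL skips that sort step; a PO matching on both fields
-- simply appears twice in the output:
SELECT po_id FROM po WHERE vendor_po = 'A100'
UNION ALL
SELECT po_id FROM po WHERE customer_po = 'A100';
```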
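[Editor's note] On later PostgreSQL releases (these settings do not exist on the 6.4-era server discussed here, so treat this as a sketch for newer versions), the join-order guess above can be tested directly: explicit JOIN syntax combined with join_collapse_limit pins the join order, and raising geqo_threshold keeps the exhaustive planner in charge of many-table joins instead of the genetic optimizer:

```sql
-- Assumes a modern PostgreSQL; neither setting exists in 6.4.
SET join_collapse_limit = 1;  -- honor the textual JOIN order below

SELECT po_id, ticket_pk, material.name
FROM po
JOIN tickets  ON po.units_id    = tickets.units_id
JOIN material ON po.material_id = material.material_id
WHERE po_id = material_po;

-- Let the regular planner handle joins of up to 11 tables before
-- the genetic (GEQO) planner kicks in:
SET geqo_threshold = 12;
```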
David Secret
MIS Director
Kearney Development Co., Inc.