Thread: Help interpreting the output of EXPLAIN
All, I have an 4-5 table SQL database where I do queries of various expense. The most expensive query involves a UNION of two inner joins. I'm having trouble interpreting the output of the EXPLAIN in order to optimise it. Some problems I'm having: 1) The costs seem to change radically without me taking any action - do I have to run them on an unloaded machine? 2) VACUUM seemed to massively increase the cost 3) VACUUM ANALYZE decreased it - so, should I always do "VACUUM; VACUUM ANALYZE;" 4) I'm using LIMIT after the outer join - should I LIMIT the two inner joins to the same amount, since or will Postgres do this itself? After a VACUUM ANALYZE and using the default indexing scheme (probably what's slowing me down) EXPLAIN gives this: 1 Unique (cost=27403.90..27559.07 rows=1034 width=84) 2 -> Sort (cost=27403.90..27403.90 rows=10345 width=84) 3 -> Append (cost=716.18..26447.76 rows=10345 width=84) 4 -> Hash Join (cost=716.18..20480.67 rows=9209 width=72) 5 -> Nested Loop (cost=0.00..18931.96 rows=9209 width=60) 6 -> Seq Scan on host (cost=0.00..211.09 rows=9209 width=42) 7 -> Index Scan using interface_pkey on interface (cost=0.00..2.02 rows=1 width=18) 8 -> Hash (cost=604.94..604.94 rows=17294 width=12) 9 -> Seq Scan on machine (cost=0.00..604.94 rows=17294 width=12) 10 -> Hash Join (cost=4656.18..5967.08 rows=1136 width=84) 11 -> Seq Scan on machine (cost=0.00..604.94 rows=17294 width=12) 12 -> Hash (cost=4653.34..4653.34 rows=1136 width=72) 13 -> Nested Loop (cost=0.00..4653.34 rows=1136 width=72) 14 -> Nested Loop (cost=0.00..2343.98 rows=1136 width=54) 15 -> Seq Scan on alias (cost=0.00..26.36 rows=1136 width=36) 16 -> Index Scan using host_pkey on host (cost=0.00..2.03 rows=1 width=18) 17 -> Index Scan using interface_pkey on interface (cost=0.00..2.02 rows=1 width=18) Am I right in thinking that the loop on line 5 is costing most, and what's costing most under that is the sequential scan on host? That corresponds to a join of the form "host.mac = interface.mac", so I should hash index the host.mac column? Unfortunately, mac is of type "macaddr" and this happens: hdb=> create index host_mac on host using hash (mac); ERROR: fmgr_info: function 0: cache lookup failed hdb=> I'm more used to optimising C than SQL - how can you tell from an explain which type of index will decrease the cost (or how to restructure your query to lower the cost). Will postgres order WHERE clauses such that the least expensive happens first? Regards, Phil +----------------------------------+ | Phil Mayers, Network Support | | Centre for Computing Services | | Imperial College | +----------------------------------+
I need to remove a FOREIGN KEY constraint; I take it that a foreign key constraint is implemented by way of three CONSTRAINT TRIGGERS; however, I can find no way to remove or drop a CONSTRAINT TRIGGER. Can someone please point me in the right direction? Thank you, Bryan
I believe you should be able to use DROP TRIGGER to drop the triggers once you have their trigger name from pg_trigger. On Fri, 8 Dec 2000, Bryan (Mailing Lists) wrote: > I need to remove a FOREIGN KEY constraint; I take it that a foreign key > constraint is implemented by way of three CONSTRAINT TRIGGERS; however, I > can find no way to remove or drop a CONSTRAINT TRIGGER. Can someone please > point me in the right direction?
"Bryan \(Mailing Lists\)" <bryan_lists@netmeme.org> writes: > I need to remove a FOREIGN KEY constraint; I take it that a foreign key > constraint is implemented by way of three CONSTRAINT TRIGGERS; however, I > can find no way to remove or drop a CONSTRAINT TRIGGER. Doesn't a regular DROP TRIGGER work? (I dunno, I haven't tried it.) regards, tom lane
"Mayers, Philip J" <p.mayers@ic.ac.uk> writes: > 1) The costs seem to change radically without me taking any action - do I > have to run them on an unloaded machine? > 2) VACUUM seemed to massively increase the cost > 3) VACUUM ANALYZE decreased it - so, should I always do "VACUUM; VACUUM > ANALYZE;" The cost estimates certainly don't depend on load ;-). They aren't going to change when you "haven't taken any action", either. However, they do depend on statistics gathered by VACUUM and VACUUM ANALYZE, so the estimates can change after you run those. > 4) I'm using LIMIT after the outer join - should I LIMIT the two inner joins > to the same amount, since or will Postgres do this itself? Hard to say. You're not going to get much useful response about a query as complex as this one evidently is when you haven't shown us the query itself and the declarations of the relevant tables/indexes. > a join of the form "host.mac = interface.mac", so I should hash index the > host.mac column? Unfortunately, mac is of type "macaddr" and this happens: > hdb=> create index host_mac on host using hash (mac); > ERROR: fmgr_info: function 0: cache lookup failed Hash indexes on macaddr don't work in 7.0 (the support's only partially present). Personally I can see very little use for hash indexes anyway. I don't know of any situation where I'd prefer hash to btree, at least not given the implementations of the two in Postgres. regards, tom lane
Ok, I'm slowly getting it. The table declarations and query are shown at: http://blacklotus.cc.ic.ac.uk/~pjm3/sql.txt http://blacklotus.cc.ic.ac.uk/~pjm3/query The output of the explain with just the SQL is at: http://blacklotus.cc.ic.ac.uk/~pjm3/explain1 So, the major optimisation point seems to be the nested loop on line 5 - the major portion of the cost there is coming from 9000 rows times 2.0 cost for index scan on interface_pkey, which corresponds to the join on line 11 of the query. Hmm - interface.mac is the primary key and thus already indexed - if I eliminate the sequential scan on host (line 6 of the explain) by indexing the host.mac (which is macaddr, so no hash index), what happens: http://blacklotus.cc.ic.ac.uk/~pjm3/explain2 Now the major portion of the cost is an index scan on host_mac at a cost of 2.0, as the inner scan of a nested join of 9000 rows, so a cost of (surprise) 18000 again... Damn. I'm stuck - restructure the query? Cheers, Phil -----Original Message----- From: Tom Lane To: Mayers, Philip J Cc: 'pgsql-general@postgresql.org' Sent: 09/12/00 00:07 Subject: Re: [GENERAL] Help interpreting the output of EXPLAIN "Mayers, Philip J" <p.mayers@ic.ac.uk> writes: > 1) The costs seem to change radically without me taking any action - do I > have to run them on an unloaded machine? > 2) VACUUM seemed to massively increase the cost > 3) VACUUM ANALYZE decreased it - so, should I always do "VACUUM; VACUUM > ANALYZE;" The cost estimates certainly don't depend on load ;-). They aren't going to change when you "haven't taken any action", either. However, they do depend on statistics gathered by VACUUM and VACUUM ANALYZE, so the estimates can change after you run those. > 4) I'm using LIMIT after the outer join - should I LIMIT the two inner joins > to the same amount, since or will Postgres do this itself? Hard to say. You're not going to get much useful response about a query as complex as this one evidently is when you haven't shown us the query itself and the declarations of the relevant tables/indexes. > a join of the form "host.mac = interface.mac", so I should hash index the > host.mac column? Unfortunately, mac is of type "macaddr" and this happens: > hdb=> create index host_mac on host using hash (mac); > ERROR: fmgr_info: function 0: cache lookup failed Hash indexes on macaddr don't work in 7.0 (the support's only partially present). Personally I can see very little use for hash indexes anyway. I don't know of any situation where I'd prefer hash to btree, at least not given the implementations of the two in Postgres. regards, tom lane