Re: [PERFORM] How to read query plan - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [PERFORM] How to read query plan |
Date | |
Msg-id | 6715.1110758921@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: [PERFORM] How to read query plan
Re: [PERFORM] How to read query plan |
List | pgsql-hackers |
=?windows-1250?Q?Miroslav_=8Aulc?= <miroslav.sulc@startnet.cz> writes: >> Is the data confidential? If you'd be willing to send me a pg_dump >> off-list, I'd like to replicate this test and try to see where the time >> is going. >> > Thank you very much for your offer. The data are partially confidental > so I hashed some of the text information and changed some values (not > the values for the JOINs) so I could send it to you. I've checked the > EXPLAIN ANALYZE if anything changed and the result is merely the same > (maybe cca 1 sec slower - maybe because the hash caused the text data to > be longer). No problem; thank you for supplying the test case. What I find is rather surprising: most of the runtime can be blamed on disassembling and reassembling tuples during the join steps. Here are the hot spots according to gprof: ----------------------------------------------- 1.27 7.38 8277/103737 ExecScan [16] 2.93 17.02 19092/103737 ExecNestLoop <cycle 2> [14] 3.91 22.70 25456/103737 ExecMergeJoin <cycle 2> [13] 7.81 45.40 50912/103737 ExecHashJoin <cycle 2> [12] [9] 86.3 15.92 92.50 103737 ExecProject [9] 7.65 76.45 8809835/9143692 ExecEvalVar [10] 3.42 4.57 103737/103775 heap_formtuple [17] 0.03 0.24 12726/143737 ExecMakeFunctionResultNoSets [24] 0.02 0.12 103737/290777 ExecStoreTuple [44] 0.01 0.00 2/2 ExecEvalFunc [372] 0.00 0.00 2/22 ExecMakeFunctionResult [166] ----------------------------------------------- 0.00 0.00 42/9143692 ExecEvalFuncArgs [555] 0.05 0.51 59067/9143692 ExecHashGetHashValue [32] 0.24 2.38 274748/9143692 ExecMakeFunctionResultNoSets [24] 7.65 76.45 8809835/9143692 ExecProject [9] [10] 69.5 7.94 79.34 9143692 ExecEvalVar [10] 79.34 0.00 8750101/9175517 nocachegetattr [11] ----------------------------------------------- I think the reason this is popping to the top of the runtime is that the joins are so wide (an average of ~85 columns in a join tuple according to the numbers above). Because there are lots of variable-width columns involved, most of the time the fast path for field access doesn't apply and we end up going to nocachegetattr --- which itself is going to be slow because it has to scan over so many columns. So the cost is roughly O(N^2) in the number of columns. As a short-term hack, you might be able to improve matters if you can reorder your LEFT JOINs to have the minimum number of columns propagating up from the earlier join steps. In other words make the later joins add more columns than the earlier, as much as you can. This is actually good news, because before 8.0 we had much worse problems than this with extremely wide tuples --- there were O(N^2) behaviors all over the place due to the old List algorithms. Neil Conway's rewrite of the List code got rid of those problems, and now we can see the places that are left to optimize. The fact that there seems to be just one is very nice indeed. Since ExecProject operations within a nest of joins are going to be dealing entirely with Vars, I wonder if we couldn't speed matters up by having a short-circuit case for a projection that is only Vars. Essentially it would be a lot like execJunk.c, except able to cope with two input tuples. Using heap_deformtuple instead of retail extraction of fields would eliminate the O(N^2) penalty for wide tuples. regards, tom lane
pgsql-hackers by date: