Thread: Is there something wrong with my test case?
Hi all and merry Christmas I was under the impression that updating a table with values from (an) other table(s) would be implemented most efficiently with a correlated subquery a long the schema as follows, let's name it A. update TO_BE_UPDATED U set ({column list}) = ({correlated subquery 1}) where exists ({correlated subquery 1}) ; I set up a test case to figure out if this is the case. I compared the execution plan of a query with above pattern with the execution plans with the following patterns. B update TO_BE_UPDATED U set COL_1 = ({correlated subquery 1}), COL_2 = ({correlated subquery 2}) where exists ({correlated subquery 3}) ; C update TO_BE_UPDATED U set COL_1 = ({correlated subquery 1}), COL_2 = ({correlated subquery 2}) where U.KEY_U in ({correlated subquery 3}) ; Explain analyze verbose showed for: A (cost=264.72..626.97 rows=31 width=90) (actual time=1.117..1.117 rows=0 loops=1) B (cost=264.72..974.25 rows=31 width=58) (actual time=1.508..1.508 rows=0 loops=1) C (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217 rows=0 loops=1) I am very surprised that the cost of A is (much) higher than that of C which I suspected to be the most inefficient. I was that much fixed on the costs that I initially ignored the actual time where my assumptions on efficiency are reflected. Funny though is that the subjective impression when waiting for the update queries to complete was that C was fastest by far, followed by B and only at the end was update A. Now I wonder whether I do not know, how to read the explain plan summary, I should not trust my subjective impression with respect to time measurement or if my test case is faulty. I carried out the test on an idle Windows 10 laptop with portable PostgreSQL 10.4 provided by PortableApps. You can find test case script and log at https://pastebin.com/W2HsTBwi I would appreciate your two dimes. Kind regards Thiemo -- Signal: +49 1578 7723737 Handys: +41 78 947 36 21 | +49 1578 772 37 37 Tox-Id: B7282A9E31D944DDAD1ECC5B33C36AAE80B32D119AB7391D602C937A65E8CA0625C495C5322B
On 2018-12-25 11:54:11 +0000, Thiemo Kellner wrote: [three different but functionally equivalent queries] > Explain analyze verbose showed for: > A (cost=264.72..626.97 rows=31 width=90) (actual time=1.117..1.117 rows=0 > loops=1) > B (cost=264.72..974.25 rows=31 width=58) (actual time=1.508..1.508 rows=0 > loops=1) > C (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217 rows=0 > loops=1) > > I am very surprised that the cost of A is (much) higher than that of C which 626.97 doesn't seem "much higher" to me than 611.19. I would call that "about the same". > I suspected to be the most inefficient. I was that much fixed on the costs > that I initially ignored the actual time where my assumptions on efficiency > are reflected. Funny though is that the subjective impression when waiting > for the update queries to complete was that C was fastest by far, followed > by B and only at the end was update A. This is weird. C takes almost exactly twice as long as A, and while humans aren't very good at estimating times, One second should feel faster than two, not slower, and certainly not slower by far. Is it possible that your subjective impression wasn't based on the executions you posted but on others? Caching and load spikes can cause quite large variations in run time, so running the same query again may not take the same time (usually the second time is faster - sometimes much faster). hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
Attachment
On Wed, 26 Dec 2018 at 00:54, Thiemo Kellner <thiemo@gelassene-pferde.biz> wrote: > Explain analyze verbose showed for: > A (cost=264.72..626.97 rows=31 width=90) (actual time=1.117..1.117 > rows=0 loops=1) > B (cost=264.72..974.25 rows=31 width=58) (actual time=1.508..1.508 > rows=0 loops=1) > C (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217 > rows=0 loops=1) > > I am very surprised that the cost of A is (much) higher than that of C > which I suspected to be the most inefficient. I was that much fixed on > the costs that I initially ignored the actual time where my > assumptions on efficiency are reflected. Funny though is that the > subjective impression when waiting for the update queries to complete > was that C was fastest by far, followed by B and only at the end was > update A. While the times mentioned in "actual time" are for execution only and don't account for the time taken to plan the query, the results you put in [1] disagree entirely with your claim that 'C' was faster. 'A' comes out fastest with the explain analyzes you've listed. A: Planning TIME: 0.423 ms Execution TIME: 1.170 ms C: Planning TIME: 0.631 ms Execution TIME: 2.281 ms Have you confused each of the results, perhaps because they're in a different order as to your cases above? I'd certainly expect 'A' to be the fastest of the bunch since it's both less effort for the planner and also the executor. I didn't look at why the cost is estimated to be slightly higher, but the planner wouldn't consider rewriting the queries to one of the other cases anyway, so it's likely not that critical that the costings are slightly out from reality. > where U.KEY_U in ({correlated subquery 3}) This is not correlated in [1]. [1] https://pastebin.com/W2HsTBwi -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi HP Thanks for your reply. Quoting "Peter J. Holzer" <hjp-pgsql@hjp.at>: > On 2018-12-25 11:54:11 +0000, Thiemo Kellner wrote: > [three different but functionally equivalent queries] > >> Explain analyze verbose showed for: >> A (cost=264.72..626.97 rows=31 width=90) (actual time=1.117..1.117 rows=0 >> loops=1) >> C (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217 rows=0 >> loops=1) > > 626.97 doesn't seem "much higher" to me than 611.19. I would call that > "about the same". > So would I but the cost is given as a range. Taking the the average somewhat 400 compare to somewhat 300. I do not know whether averaging is appropriate here. > This is weird. C takes almost exactly twice as long as A, and while > humans aren't very good at estimating times, One second should feel > faster than two, not slower, and certainly not slower by far. Is it > possible that your subjective impression wasn't based on the executions > you posted but on others? Caching and load spikes can cause quite large > variations in run time, so running the same query again may not take the > same time (usually the second time is faster - sometimes much faster). I am pretty sure not to have confused the association of my felt time measure to the query. I did run the script several times but as the script create everything anew this might not have any impact caching wise. However, if caching had an impact it just would add to the discrepancy between feeling that the first statement was much faster than the supposedly optimal statement. Being as may, there is still fact that cost for A was estimated about the same as C or much higher but A was executed in half of the time of C. -- Signal: +49 1578 7723737 Handys: +41 78 947 36 21 | +49 1578 772 37 37 Tox-Id: B7282A9E31D944DDAD1ECC5B33C36AAE80B32D119AB7391D602C937A65E8CA0625C495C5322B
Hi David Thanks for your revision. Quoting David Rowley <david.rowley@2ndquadrant.com>: > On Wed, 26 Dec 2018 at 00:54, Thiemo Kellner > <thiemo@gelassene-pferde.biz> wrote: >> Explain analyze verbose showed for: >> A (cost=264.72..626.97 rows=31 width=90) (actual time=1.117..1.117 >> rows=0 loops=1) >> B (cost=264.72..974.25 rows=31 width=58) (actual time=1.508..1.508 >> rows=0 loops=1) >> C (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217 >> rows=0 loops=1) >> >> I am very surprised that the cost of A is (much) higher than that of C >> which I suspected to be the most inefficient. I was that much fixed on >> the costs that I initially ignored the actual time where my >> assumptions on efficiency are reflected. Funny though is that the >> subjective impression when waiting for the update queries to complete >> was that C was fastest by far, followed by B and only at the end was >> update A. > > While the times mentioned in "actual time" are for execution only and > don't account for the time taken to plan the query, the results you > put in [1] disagree entirely with your claim that 'C' was faster. 'A' > comes out fastest with the explain analyzes you've listed. > > A: > Planning TIME: 0.423 ms > Execution TIME: 1.170 ms > > C: > Planning TIME: 0.631 ms > Execution TIME: 2.281 ms > > Have you confused each of the results, perhaps because they're in a > different order as to your cases above? I am pretty sure I did not confuse. I am not worried about planning times as I assume that PostgreSQL has a time limit restricting the time used to find the best execution path in the order of seconds such that for a heavy load query it would get neglectable. > I'd certainly expect 'A' to be the fastest of the bunch since it's > both less effort for the planner and also the executor. I didn't look > at why the cost is estimated to be slightly higher, but the planner > wouldn't consider rewriting the queries to one of the other cases > anyway, so it's likely not that critical that the costings are > slightly out from reality. I am glad, that my feeling what should be the best query meets reality. However, I am left a bit concerned about the usefulness of the costs of the execution plan. I feel the costs rather contradict the actual execution times in my test case. To me this would render the cost useless for comparison of queries. >> where U.KEY_U in ({correlated subquery 3}) > > This is not correlated in [1]. > > [1] https://pastebin.com/W2HsTBwi Right you are, my fault. Thanks for your attention there as well. :-) -- Signal: +49 1578 7723737 Handys: +41 78 947 36 21 | +49 1578 772 37 37 Tox-Id: B7282A9E31D944DDAD1ECC5B33C36AAE80B32D119AB7391D602C937A65E8CA0625C495C5322B
Hi Thiemo,
The cost is not a range. The 2 numbers you see are:
On 07/01/2019 11:30, Thiemo Kellner wrote:
Hi HP
Thanks for your reply.
Quoting "Peter J. Holzer" <hjp-pgsql@hjp.at>:On 2018-12-25 11:54:11 +0000, Thiemo Kellner wrote:
[three different but functionally equivalent queries]Explain analyze verbose showed for:
A (cost=264.72..626.97 rows=31 width=90) (actual time=1.117..1.117 rows=0
loops=1)
C (cost=0.42..611.19 rows=31 width=52) (actual time=2.217..2.217 rows=0
loops=1)
626.97 doesn't seem "much higher" to me than 611.19. I would call that
"about the same".
So would I but the cost is given as a range. Taking the the average somewhat 400 compare to somewhat 300. I do not know whether averaging is appropriate here.
The cost is not a range. The 2 numbers you see are:
Estimated start-up cost. This is the time expended before the output phase can begin, e.g., time to do the sorting in a sort node.
- Estimated total cost. This is stated on the assumption that the plan node is run to completion, i.e., all available rows are retrieved. In practice a node's parent node might stop short of reading all available rows (see the LIMIT example below).
As you can read here:
https://www.postgresql.org/docs/current/using-explain.html
regards,
fabio pardi
Hi Fabio Quoting Fabio Pardi <f.pardi@portavita.eu>: > The cost is not a range. The 2 numbers you see are: > > > * > Estimated start-up cost. This is the time expended before the > output phase can begin, e.g., time to do the sorting in a sort node. > > * Estimated total cost. This is stated on the assumption that > the plan node is run to completion, i.e., all available rows are > retrieved. In practice a node's parent node might stop short of > reading all available rows (see the LIMIT example below). > > As you can read here: > > https://www.postgresql.org/docs/current/using-explain.html Thanks for pointing this out. I was mislead than be the form those two different things are displayed. -- Signal: +49 1578 7723737 Handys: +41 78 947 36 21 | +49 1578 772 37 37 Tox-Id: B7282A9E31D944DDAD1ECC5B33C36AAE80B32D119AB7391D602C937A65E8CA0625C495C5322B