Scrollable cursors and Sort performance - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Scrollable cursors and Sort performance |
Date | |
Msg-id | 1139578364.1258.469.camel@localhost.localdomain Whole thread Raw |
Responses |
Re: Scrollable cursors and Sort performance
Re: Scrollable cursors and Sort performance |
List | pgsql-hackers |
I'm interested in the behaviour of ExecSort, which *for all queries* prepares the sort result for randomAccess, even when part of a plan that will *never* go backwards/rewind etc. A recent performance test shows this output from mid-way through a heap sort with trace_sort=on (the query itself is not relevant here) > LOG: 00000: finished writing final run 65 to tape 64: CPU 57.50s/484.51u sec elapsed 597.46 sec > LOG: 00000: finished merge step: CPU 107.90s/653.53u sec elapsed 941.83 sec Which shows that the *unnecessary* final merge takes 344 secs, adding approximately 60% to the elapsed time of the query and nearly doubling the CPU requirement. [Aside: you'll notice the above test was performed with my recent sort improvement patch applied, but the behaviour of ExecSort is identical in both cases. However in the current cvstip case, you simply don't notice the extra expense of the request for randomAccess because of the additional time taken by the sort] So, why does the planner think random access is required? Well, only for use in queries; CREATE INDEX for example does not force this. To allow support for CURSORs, of course. >From the code, we never call ExecSort with a direction other than Forward unless we are issuing a FETCH with a direction other than one identified internally as FETCH_FORWARD. According to the SQL standard, that can only happen when a scrollable cursor has been declared using DECLARE ... SCROLL. The current PostgreSQL manual says the following: FETCH: "The cursor should be declared with the SCROLL option if one intends to use any variants of FETCH other than FETCH NEXT or FETCH FORWARD with a positive count. For simple queries PostgreSQL will allow backwards fetch from cursors not declared with SCROLL, but this behavior is best not relied on. If the cursor is declared with NO SCROLL, no backward fetches are allowed." DECLARE: "The SCROLL option should be specified when defining a cursor that will be used to fetch backwards. This is required by the SQL standard. However, for compatibility with earlier versions, PostgreSQL will allow backward fetches without SCROLL, if the cursor's query plan is simple enough that no extra overhead is needed to support it. However, application developers are advised not to rely on using backward fetches from a cursor that has not been created with SCROLL. If NO SCROLL is specified, then backward fetches are disallowed in any case." The current behaviour is to plan every query as if it would allow backwards scans, then immediately disallow backwards scans *if* it fails the "no extra overhead" test later on in the Declare Cursor processing. [portalcmds.c:PerformCursorOpen()]. (i.e. you pay, but get no benefit) My suggestion is that the backwards-compatible behaviour of allowing backwards/absolute FETCHes *without* a specific SCROLL command be deprecated in the next release, so that the default is *disallow*. We've warned people and now its time to turn it off by default. (This would be re-enabled using default_cursor_scroll = on). If that is not acceptable, then we should re-evaluate the idea that sorts *always* allow backward scans [execAmi.c:ExecSupportsBackwardScan()], replacing this either with *never* or some kind of query cost test (but that seems much less preferable). Materialize already provides the infrastructure required to do this. This will then allow us to use the firm knowledge that a plan will only ever be scanned in a Forwards direction at plan time. ...and that will allow us to optimize out the rather large step taken during Sort to freeze the result unnecessarily for randomAccess. This will then give a good perfomance gain for larger joins and aggregations, neither of which would ever allow backwards scans using them current method anyway. I intend to add a short patch to pass down the cursor state during planning, so that when it is explicitly specified the sort node is able to recognise this and avoid work. Also, to add a GUC to force the not-explicitly-specified case to be the same as the NO SCROLL case, as the standard requires. Comments? Best Regards, Simon Riggs
pgsql-hackers by date: