Thread: Uninterruptible slow geo_ops.c
Hi, A customer of ours hit some very slow code while running the @>(polygon, polygon) operator with some big polygons. I'm not familiar with this stuff but I think the problem is that the algorithm converges too slowly to a solution and also has some pretty expensive calls somewhere. (Perhaps there is also a problem that the algorithm *never* converges for some inputs ...) While I'm not familiar with the code itself, and can't post the exact slow query just yet, I have noticed that it is missing a CHECK_FOR_INTERRUPTS() call to enable cancelling the slow query. I'd backpatch this all the way back. (The exact issue they hit is mutual recursion between touched_lseg_between_poly and lseg_between_poly. Since the latter also recurses on itself, the best way forward seem to add a check for interrupts in the loop there.) I will follow up on the actual slowness later, as warranted. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 11/12/15 18:48, Alvaro Herrera wrote: > Hi, > > A customer of ours hit some very slow code while running the > @>(polygon, polygon) operator with some big polygons. I'm not familiar > with this stuff but I think the problem is that the algorithm converges > too slowly to a solution and also has some pretty expensive calls > somewhere. (Perhaps there is also a problem that the algorithm *never* > converges for some inputs ...) > > While I'm not familiar with the code itself, and can't post the exact > slow query just yet, I have noticed that it is missing a > CHECK_FOR_INTERRUPTS() call to enable cancelling the slow query. I'd > backpatch this all the way back. (The exact issue they hit is mutual > recursion between touched_lseg_between_poly and lseg_between_poly. > Since the latter also recurses on itself, the best way forward seem to > add a check for interrupts in the loop there.) > > I will follow up on the actual slowness later, as warranted. > I would add that it was not simply a slow computation, but more probably they hit a case where the algorithm doesn't convergeat all. I've killed it manually by calling ProcessInterrupts() through gdb after 7 days and half of CPU time (100% of one CPU). The server CPU is an Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz. The query doesn't involve any table and is a simple call of @>(polygon, polygon) operator. SELECT polygon 'poligon literal with 522 points' @> polygon 'poligon box' I'm checking if we can share the full query. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
On 11/12/15 19:18, Marco Nenciarini wrote: > On 11/12/15 18:48, Alvaro Herrera wrote: >> Hi, >> >> A customer of ours hit some very slow code while running the >> @>(polygon, polygon) operator with some big polygons. I'm not familiar >> with this stuff but I think the problem is that the algorithm converges >> too slowly to a solution and also has some pretty expensive calls >> somewhere. (Perhaps there is also a problem that the algorithm *never* >> converges for some inputs ...) >> >> While I'm not familiar with the code itself, and can't post the exact >> slow query just yet, I have noticed that it is missing a >> CHECK_FOR_INTERRUPTS() call to enable cancelling the slow query. I'd >> backpatch this all the way back. (The exact issue they hit is mutual >> recursion between touched_lseg_between_poly and lseg_between_poly. >> Since the latter also recurses on itself, the best way forward seem to >> add a check for interrupts in the loop there.) >> >> I will follow up on the actual slowness later, as warranted. >> > > I would add that it was not simply a slow computation, but more probably they hit a case where the algorithm doesn't convergeat all. > > I've killed it manually by calling ProcessInterrupts() through gdb after 7 days and half of CPU time (100% of one CPU). > The server CPU is an Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz. > > The query doesn't involve any table and is a simple call of @>(polygon, polygon) operator. > > SELECT polygon 'poligon literal with 522 points' @> polygon 'poligon box' > > I'm checking if we can share the full query. > The full query is attached. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Attachment
Alvaro Herrera wrote: > While I'm not familiar with the code itself, and can't post the exact > slow query just yet, I have noticed that it is missing a > CHECK_FOR_INTERRUPTS() call to enable cancelling the slow query. I'd > backpatch this all the way back. (The exact issue they hit is mutual > recursion between touched_lseg_between_poly and lseg_between_poly. > Since the latter also recurses on itself, the best way forward seem to > add a check for interrupts in the loop there.) Pushed this part. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[Replying to an old thread...] > A customer of ours hit some very slow code while running the > @>(polygon, polygon) operator with some big polygons. I'm not familiar > with this stuff but I think the problem is that the algorithm converges > too slowly to a solution and also has some pretty expensive calls > somewhere. (Perhaps there is also a problem that the algorithm *never* > converges for some inputs ...) I believe there is a simple flaw on the algorithm that causes it loop endlessly. It should stop looking at the other segments of the polygon after finding the touching one. The attached patch fixes the issue for the query posted to this thread. I suggest backpacking the fix. There are many code quality issues and bugs like this still unresolved on geo_ops.c. I am trying to improve it. Any help appreciated: https://commitfest.postgresql.org/16/1160/
Attachment
Emre Hasegeli <emre@hasegeli.com> writes: > I believe there is a simple flaw on the algorithm that causes it loop > endlessly. It should stop looking at the other segments of the > polygon after finding the touching one. The attached patch fixes the > issue for the query posted to this thread. I suggest backpacking the > fix. I think this fix is wrong, because it assumes that the answer of touched_lseg_inside_poly is authoritative, which it clearly isn't; the blind "return true" for noncollinear segments breaks it. That is, if "a" is an interior point on the polygon's first segment, but "b" is outside the polygon, then lseg_inside_poly's first on_ps_internal call will succeed, its second will not, then it will call touched_lseg_inside_poly. All four of the latter's tests will fail and it'll return true. Your patch would have us break out of the loop without looking at any subsequent segments, which is wrong. It's a bit difficult to demonstrate the error, because lseg_inside_poly isn't exposed directly, only through poly_contain (maybe we should add an operator that does expose it?). But after experimenting awhile I found a counterexample: select '(0,0),(10,0),(0,10)'::polygon @> '(6,0),(7,0),(6,6)'::polygon; This surely should return false, and it does in HEAD, but your patch makes it return true. There are other things to be unhappy about. As written, lseg_inside_poly and touched_lseg_inside_poly can mutually recurse to a depth equal to the number of points in the polygon, which opens a stack-overflow hazard. I'd like to get rid of the recursion; it seems unnecessary. Also, while it's sadly underdocumented what touched_lseg_inside_poly is meant to accomplish at all, I think what it's there for is to deal with cases involving collinear polygon segments. That is, if we inject some useless points into the polygon, say '(0,0),(5,0),(10,0),(0,10)'::polygon then it should still be true that the line segment '(1,0),(9,0)' is "inside" this polygon, but we won't find that out if we just consider the (0,0),(5,0) and (5,0),(10,0) polygon segments independently. I am not, however, convinced that this coding fixes that problem if there are multiple polygon segments we need to match up against. What concerns me is that if the first collinear poly segment we come to is an interior subset of the segment-under-test, say we're considering line segment '(1,0),(9,0)' and we find a poly segment '(3,0),(5,0)', then we need to separately detect whether each of '(1,0),(3,0)' and '(5,0),(9,0)' are subsets of the rest of the polygon ... and I don't see where this code can handle that. I've not managed to build a weaponized test case though. regards, tom lane