Thread: Uninterruptible slow geo_ops.c

Uninterruptible slow geo_ops.c

From
Alvaro Herrera
Date:
Hi,

A customer of ours hit some very slow code while running the
@>(polygon, polygon) operator with some big polygons.  I'm not familiar
with this stuff but I think the problem is that the algorithm converges
too slowly to a solution and also has some pretty expensive calls
somewhere.  (Perhaps there is also a problem that the algorithm *never*
converges for some inputs ...)

While I'm not familiar with the code itself, and can't post the exact
slow query just yet, I have noticed that it is missing a
CHECK_FOR_INTERRUPTS() call to enable cancelling the slow query.  I'd
backpatch this all the way back.  (The exact issue they hit is mutual
recursion between touched_lseg_between_poly and lseg_between_poly.
Since the latter also recurses on itself, the best way forward seem to
add a check for interrupts in the loop there.)

I will follow up on the actual slowness later, as warranted.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Uninterruptible slow geo_ops.c

From
Marco Nenciarini
Date:
On 11/12/15 18:48, Alvaro Herrera wrote:
> Hi,
>
> A customer of ours hit some very slow code while running the
> @>(polygon, polygon) operator with some big polygons.  I'm not familiar
> with this stuff but I think the problem is that the algorithm converges
> too slowly to a solution and also has some pretty expensive calls
> somewhere.  (Perhaps there is also a problem that the algorithm *never*
> converges for some inputs ...)
>
> While I'm not familiar with the code itself, and can't post the exact
> slow query just yet, I have noticed that it is missing a
> CHECK_FOR_INTERRUPTS() call to enable cancelling the slow query.  I'd
> backpatch this all the way back.  (The exact issue they hit is mutual
> recursion between touched_lseg_between_poly and lseg_between_poly.
> Since the latter also recurses on itself, the best way forward seem to
> add a check for interrupts in the loop there.)
>
> I will follow up on the actual slowness later, as warranted.
>

I would add that it was not simply a slow computation, but more probably they hit a case where the algorithm doesn't
convergeat all. 

I've killed it manually by calling ProcessInterrupts() through gdb after 7 days and half of CPU time (100% of one CPU).
The server CPU is an Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz.

The query doesn't involve any table and is a simple call of @>(polygon, polygon) operator.
   SELECT polygon 'poligon literal with 522 points' @> polygon 'poligon box'

I'm checking if we can share the full query.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Uninterruptible slow geo_ops.c

From
Marco Nenciarini
Date:
On 11/12/15 19:18, Marco Nenciarini wrote:
> On 11/12/15 18:48, Alvaro Herrera wrote:
>> Hi,
>>
>> A customer of ours hit some very slow code while running the
>> @>(polygon, polygon) operator with some big polygons.  I'm not familiar
>> with this stuff but I think the problem is that the algorithm converges
>> too slowly to a solution and also has some pretty expensive calls
>> somewhere.  (Perhaps there is also a problem that the algorithm *never*
>> converges for some inputs ...)
>>
>> While I'm not familiar with the code itself, and can't post the exact
>> slow query just yet, I have noticed that it is missing a
>> CHECK_FOR_INTERRUPTS() call to enable cancelling the slow query.  I'd
>> backpatch this all the way back.  (The exact issue they hit is mutual
>> recursion between touched_lseg_between_poly and lseg_between_poly.
>> Since the latter also recurses on itself, the best way forward seem to
>> add a check for interrupts in the loop there.)
>>
>> I will follow up on the actual slowness later, as warranted.
>>
>
> I would add that it was not simply a slow computation, but more probably they hit a case where the algorithm doesn't
convergeat all. 
>
> I've killed it manually by calling ProcessInterrupts() through gdb after 7 days and half of CPU time (100% of one
CPU).
> The server CPU is an Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz.
>
> The query doesn't involve any table and is a simple call of @>(polygon, polygon) operator.
>
>     SELECT polygon 'poligon literal with 522 points' @> polygon 'poligon box'
>
> I'm checking if we can share the full query.
>

The full query is attached.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it

Attachment

Re: Uninterruptible slow geo_ops.c

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> While I'm not familiar with the code itself, and can't post the exact
> slow query just yet, I have noticed that it is missing a
> CHECK_FOR_INTERRUPTS() call to enable cancelling the slow query.  I'd
> backpatch this all the way back.  (The exact issue they hit is mutual
> recursion between touched_lseg_between_poly and lseg_between_poly.
> Since the latter also recurses on itself, the best way forward seem to
> add a check for interrupts in the loop there.)

Pushed this part.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Uninterruptible slow geo_ops.c

From
Emre Hasegeli
Date:
[Replying to an old thread...]

> A customer of ours hit some very slow code while running the
> @>(polygon, polygon) operator with some big polygons.  I'm not familiar
> with this stuff but I think the problem is that the algorithm converges
> too slowly to a solution and also has some pretty expensive calls
> somewhere.  (Perhaps there is also a problem that the algorithm *never*
> converges for some inputs ...)

I believe there is a simple flaw on the algorithm that causes it loop
endlessly.  It should stop looking at the other segments of the
polygon after finding the touching one.  The attached patch fixes the
issue for the query posted to this thread.  I suggest backpacking the
fix.

There are many code quality issues and bugs like this still unresolved
on geo_ops.c.  I am trying to improve it.  Any help appreciated:

https://commitfest.postgresql.org/16/1160/

Attachment

Re: [HACKERS] Uninterruptible slow geo_ops.c

From
Tom Lane
Date:
Emre Hasegeli <emre@hasegeli.com> writes:
> I believe there is a simple flaw on the algorithm that causes it loop
> endlessly.  It should stop looking at the other segments of the
> polygon after finding the touching one.  The attached patch fixes the
> issue for the query posted to this thread.  I suggest backpacking the
> fix.

I think this fix is wrong, because it assumes that the answer of
touched_lseg_inside_poly is authoritative, which it clearly isn't;
the blind "return true" for noncollinear segments breaks it.
That is, if "a" is an interior point on the polygon's first segment,
but "b" is outside the polygon, then lseg_inside_poly's first
on_ps_internal call will succeed, its second will not, then it
will call touched_lseg_inside_poly.  All four of the latter's
tests will fail and it'll return true.  Your patch would have us
break out of the loop without looking at any subsequent segments,
which is wrong.

It's a bit difficult to demonstrate the error, because lseg_inside_poly
isn't exposed directly, only through poly_contain (maybe we should add
an operator that does expose it?).  But after experimenting awhile
I found a counterexample:

select '(0,0),(10,0),(0,10)'::polygon @> '(6,0),(7,0),(6,6)'::polygon;

This surely should return false, and it does in HEAD, but your patch
makes it return true.

There are other things to be unhappy about.  As written, lseg_inside_poly
and touched_lseg_inside_poly can mutually recurse to a depth equal to
the number of points in the polygon, which opens a stack-overflow hazard.
I'd like to get rid of the recursion; it seems unnecessary.

Also, while it's sadly underdocumented what touched_lseg_inside_poly is
meant to accomplish at all, I think what it's there for is to deal with
cases involving collinear polygon segments.  That is, if we inject some
useless points into the polygon, say

    '(0,0),(5,0),(10,0),(0,10)'::polygon

then it should still be true that the line segment '(1,0),(9,0)' is
"inside" this polygon, but we won't find that out if we just consider
the (0,0),(5,0) and (5,0),(10,0) polygon segments independently.
I am not, however, convinced that this coding fixes that problem if
there are multiple polygon segments we need to match up against.
What concerns me is that if the first collinear poly segment we come
to is an interior subset of the segment-under-test, say we're
considering line segment '(1,0),(9,0)' and we find a poly segment
'(3,0),(5,0)', then we need to separately detect whether each of
'(1,0),(3,0)' and '(5,0),(9,0)' are subsets of the rest of the
polygon ... and I don't see where this code can handle that.
I've not managed to build a weaponized test case though.

            regards, tom lane