Re: [HACKERS] [PATCH] Improve geometric types - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: [HACKERS] [PATCH] Improve geometric types |
Date | |
Msg-id | 20180112.171656.119920237.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] [PATCH] Improve geometric types (Emre Hasegeli <emre@hasegeli.com>) |
Responses |
Re: [HACKERS] [PATCH] Improve geometric types
|
List | pgsql-hackers |
Hello, sorry for the absense. (Sorry for the long but should be hard-to-read article..) At Thu, 4 Jan 2018 14:53:47 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in <CAE2gYzz0hRfC-M8KDV4Aytj+6k6cvoiGqVvEjVyHTbtraf=YBQ@mail.gmail.com> > Rebased versions are attached. Thank you for the new patch. I'm looking 0001 patch. This does many things at once but seems hard to split into step-by-step patches. So I reviewed this from the viewpoint that how each of the external function is changed. (Attatchment 1). I think that this patch is not intending to change any behaviors of the external functions, but intending make it mathematically reasonable. So some behavioral changes are ineviablly requried. The problem here is what is the most reasonable (and acceptable) behavior. The comments below are just the starting of a discussion about some aspects of design. I'm sorry that this discussion might be changing the way to go largily, but I'd like to hear what you think about two topics. I found seemingly inconsistent handling of NaN. - Old box_same assumed NaN != NaN, but new assumes NaN == NaN. I'm not sure how the difference is significant out of the context of sorting, though... - box_overlap()'s behavior has not been changed. As the result box_same and box_overlap now behaves differently concerning NaN handling. - line_construct_pts has been changed so that generates {NaN,-1,NaN} for two identical points (old code generated {-1,0,0}) but I think the right behavior here is erroring out. (as line_in behaves.) I feel that it'd better to define how to handle non-numbers before refactoring. I prefer to just denying NaN as a part of the geometric types, or any input containing NaN just yields a result filled with NaNs. The next point is reasonability of the algorithm and consistency among related functions. - line_parallel considers the EPSILON(ugaa) with different scale from the old code, but both don't seem reasonable.. It might not be the time to touch it, but if we changed the behavior, I'd like to choose reasonable one. At least the tolerance of parallelity should be taken by rotation, not by slope. I think that one reasonable way to do this is taking the distance between the far ends of two direction (unit) vectors. Assuming Ax + By + C = 0, the direction vecotr dv(x, y) for the line is: n = sqrt(A^2 + B^2), dv = (B / n, -A / n). (and the normal vector nv = (A / n, B / n)) The vector needs to be restricted within upper or any two successive quadrants so that it is usable for this objective. So let's restrict A to be negative value for now. Something like the follwoing would be an answer. void line_direction_vector(Point *result, LINE *l) { float8 A = l->A; float8 B = l->B; float8 n; n = HYPOT(A, B); /* keep the result vector within the 1st-2nd quadrants */ if (A > 0) { A = -A; B = -B; } point_construct(result, B / n, -A / n); } bool line_parallel(LINE *l1, LINE *l2) { Point d1, d2; line_direction_vector(&d1, l1); line_direction_vector(&d2, l2); return FPzero(point_dt(&d1, &d2)); } Also perpendicularity is detected by comparing the direction vector of one line and the normal vector of another in the same manner. void line_normal_vector(Point *result, LINE *l) { float8 A = l->A; float8 B = l->B; float8 n; /* keep the result vector within the 1st-2nd quadrants */ n = HYPOT(A, B); if (B < 0) { A = -A; B = -B; } point_construct(result, A / n, B / n); } bool line_perp(LINE *l1, LINE *l2) { Point d1, d2; line_direction_vector(&d1, l1); line_normal_vector(&d2, l2); return FPzero(point_dt(&d1, &d2)); } In relation to this, equality of two lines is also nuisance. From the definitions, two equal lines are parallel. In contraposition, two nonparallel lines cannot be equal. This relationship is represented by the following expression: is_equal =: line_parallel(l1, l2) && FPzero(line_distance(l1, l2)) But the line_distance returns zero in most cases even if it is considered "parallel" with considering tolerance. This means that There's almost no two lines that are parallel but not equal (that is, the truely parallel lines)... sigh. So we might should calculate the distance in different (not purely mathematical/geometrical) way. For example, if we can assume the geometric objects are placed mainly around the origin, we could take the distance between the points nearest to origin on both lines. (In other words, between the foots of perpendicular lines from the origin onto the both lines). Of couse this is not ideal but... The same discussion also affects line_interpt_line or other similar functions. At last, just a simple comment. - point_eq_point() assumes that NaN == NaN. This is an inherited behavior from old float[48]_cmp_internal() but it's not a common treat. point_eq_point() needs any comment about the special definition as float[48]_cmp_internal has. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: