Thread: Optimizing CartoDB's JenksBins function

Optimizing CartoDB's JenksBins function

From
Seamus Abshere
Date:
hi,

Our friends from CartoDB [1] provide a beautiful Jenks Natural Breaks
function for Postgres [2]. It is quite computationally intensive.

Even if you don't know what Jenks is, do you see any optimizations?

Best, thanks,
Seamus

PS. I was hoping for something magical like Tom Lane's VALUES() fix for
DataDog [3] or HeapAnalytic's array fix [4], although I realize both are
fundamentally different :) This is probably a more subtle loop
optimization problem, if anything.


[1] http://cartodb.com
[2]
https://github.com/CartoDB/cartodb/blob/master/lib/sql/scripts-available/CDB_JenksBins.sql
[3]
https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/
[4]
http://blog.heapanalytics.com/dont-iterate-over-a-postgres-array-with-a-loop/


--
Seamus Abshere, SCEA
https://github.com/seamusabshere


Re: Optimizing CartoDB's JenksBins function

From
Joe Conway
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 03/18/2014 09:42 AM, Seamus Abshere wrote:
> Our friends from CartoDB [1] provide a beautiful Jenks Natural
> Breaks function for Postgres [2]. It is quite computationally
> intensive.
>
> Even if you don't know what Jenks is, do you see any
> optimizations?

Not exactly what you asked, but you might try PL/R with the R
"classInt" package:

  http://cran.at.r-project.org/web/packages/classInt/classInt.pdf

HTH,

Joe

- --
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTKIAcAAoJEDfy90M199hlTcIP/0jK/8yuAN140a4oj54H4bOr
mJwf1XAj0/CefwdgCkaSioiiwSOUGTuJyz+VH05GhKeuWchKiQzuD+w5/ijbjOFj
qnxupIoULxBfk/ZA/8bh6/54IGDIlA1F8lFH9lHntPCLGgNnZkwYGsPMGntXUf2/
WfesFAY8D185rnlEVWbC3ZUgeEXj1kxl6xjAJEXlGBoESfGWTKUI04tfg3+wTpHp
A2F2Ck579XuxA5WK62fk0+d+B+ua/tixl2xCuRotQvtSQ6LVPQOlo1mBGuVxYtou
KrBdyv7rxQzWOxgNePuMBqnMsSowFMP9XC7Lzdlg1NsHQL9o1yLT1bdIrT5lX5OQ
DErvAHwh87kfcyFOYTl6uwS0I89gOuaTimYkghlX7dCPWnEu0KKDaLJnVVkcNfi8
Xpyj42AodsfKb7F0elPCYqgBhv+pRwXaPP3ig9SYMgU/h82UFdk/RexqWYK5FwAU
XwoeY/NioKahaQYpMXEnWmnvvueuuz+s8KCbNj7avXPEJE4RxuNVLIYShrN64Ydx
bW/1/Hpv98Gn71ksSjrVlru1xaxArQhiAImuepWp76LvpvW1rtRUy4+fDDIek+nF
TeHiQrzgkdywSKTJCgqxTjIO/PkWV8JIXghc/m0z6qud+8YPa6nbWnIq0XjSjOIQ
jQ4u2w6r/7GOl93tT3yY
=zPkM
-----END PGP SIGNATURE-----


Re: Optimizing CartoDB's JenksBins function

From
Andy Colson
Date:
On 3/18/2014 12:19 PM, Joe Conway wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 03/18/2014 09:42 AM, Seamus Abshere wrote:
>> Our friends from CartoDB [1] provide a beautiful Jenks Natural
>> Breaks function for Postgres [2]. It is quite computationally
>> intensive.
>>
>> Even if you don't know what Jenks is, do you see any
>> optimizations?
>
> Not exactly what you asked, but you might try PL/R with the R
> "classInt" package:
>
>    http://cran.at.r-project.org/web/packages/classInt/classInt.pdf
>
> HTH,
>
> Joe
>

Actually rewriting in any of the script languages might help.  plpgsql
is good a sql, but not so much with the math and string handling.  I'm
not sure about array's.

It looks like most of that code is playing with arrays and math.  I'd
bet pl/perl (or R, python, javascript, etc) would run much faster.

-Andy