Thread: Optimizing CartoDB's JenksBins function
hi, Our friends from CartoDB [1] provide a beautiful Jenks Natural Breaks function for Postgres [2]. It is quite computationally intensive. Even if you don't know what Jenks is, do you see any optimizations? Best, thanks, Seamus PS. I was hoping for something magical like Tom Lane's VALUES() fix for DataDog [3] or HeapAnalytic's array fix [4], although I realize both are fundamentally different :) This is probably a more subtle loop optimization problem, if anything. [1] http://cartodb.com [2] https://github.com/CartoDB/cartodb/blob/master/lib/sql/scripts-available/CDB_JenksBins.sql [3] https://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/ [4] http://blog.heapanalytics.com/dont-iterate-over-a-postgres-array-with-a-loop/ -- Seamus Abshere, SCEA https://github.com/seamusabshere
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 03/18/2014 09:42 AM, Seamus Abshere wrote: > Our friends from CartoDB [1] provide a beautiful Jenks Natural > Breaks function for Postgres [2]. It is quite computationally > intensive. > > Even if you don't know what Jenks is, do you see any > optimizations? Not exactly what you asked, but you might try PL/R with the R "classInt" package: http://cran.at.r-project.org/web/packages/classInt/classInt.pdf HTH, Joe - -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJTKIAcAAoJEDfy90M199hlTcIP/0jK/8yuAN140a4oj54H4bOr mJwf1XAj0/CefwdgCkaSioiiwSOUGTuJyz+VH05GhKeuWchKiQzuD+w5/ijbjOFj qnxupIoULxBfk/ZA/8bh6/54IGDIlA1F8lFH9lHntPCLGgNnZkwYGsPMGntXUf2/ WfesFAY8D185rnlEVWbC3ZUgeEXj1kxl6xjAJEXlGBoESfGWTKUI04tfg3+wTpHp A2F2Ck579XuxA5WK62fk0+d+B+ua/tixl2xCuRotQvtSQ6LVPQOlo1mBGuVxYtou KrBdyv7rxQzWOxgNePuMBqnMsSowFMP9XC7Lzdlg1NsHQL9o1yLT1bdIrT5lX5OQ DErvAHwh87kfcyFOYTl6uwS0I89gOuaTimYkghlX7dCPWnEu0KKDaLJnVVkcNfi8 Xpyj42AodsfKb7F0elPCYqgBhv+pRwXaPP3ig9SYMgU/h82UFdk/RexqWYK5FwAU XwoeY/NioKahaQYpMXEnWmnvvueuuz+s8KCbNj7avXPEJE4RxuNVLIYShrN64Ydx bW/1/Hpv98Gn71ksSjrVlru1xaxArQhiAImuepWp76LvpvW1rtRUy4+fDDIek+nF TeHiQrzgkdywSKTJCgqxTjIO/PkWV8JIXghc/m0z6qud+8YPa6nbWnIq0XjSjOIQ jQ4u2w6r/7GOl93tT3yY =zPkM -----END PGP SIGNATURE-----
On 3/18/2014 12:19 PM, Joe Conway wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 03/18/2014 09:42 AM, Seamus Abshere wrote: >> Our friends from CartoDB [1] provide a beautiful Jenks Natural >> Breaks function for Postgres [2]. It is quite computationally >> intensive. >> >> Even if you don't know what Jenks is, do you see any >> optimizations? > > Not exactly what you asked, but you might try PL/R with the R > "classInt" package: > > http://cran.at.r-project.org/web/packages/classInt/classInt.pdf > > HTH, > > Joe > Actually rewriting in any of the script languages might help. plpgsql is good a sql, but not so much with the math and string handling. I'm not sure about array's. It looks like most of that code is playing with arrays and math. I'd bet pl/perl (or R, python, javascript, etc) would run much faster. -Andy