For the NumPy functionality (fetchallnumpy), NULL values are handled by returning masked arrays
(https://docs.scipy.org/doc/numpy/reference/maskedarray.html) instead of plain arrays.
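A minimal sketch of what a masked-array result looks like. It assumes a hypothetical column whose NULLs are reported through an ODBC-style length/indicator buffer (in ODBC, SQL_NULL_DATA is -1). This illustrates the numpy.ma idea only; it is not turbodbc's actual conversion code, and the variable names are made up.

```python
import numpy as np

# Hypothetical stand-in for one fetched BIGINT column: the driver writes
# values into a fixed-width buffer plus a separate per-row indicator.
# In ODBC, an indicator equal to SQL_NULL_DATA (-1) marks a NULL value;
# otherwise it holds the value length in bytes (8 for a 64-bit integer).
SQL_NULL_DATA = -1
values = np.array([10, 0, 30, 0], dtype=np.int64)  # slots for NULL rows hold junk
indicators = np.array([8, SQL_NULL_DATA, 8, SQL_NULL_DATA], dtype=np.int64)

# The mask is True wherever the database value was NULL; both steps are
# vectorized, so no per-row Python objects are created.
column = np.ma.masked_array(values, mask=(indicators == SQL_NULL_DATA))

print(column)          # masked entries are displayed as --
print(column.sum())    # NULL entries are ignored by the reduction
print(column.count())  # number of non-NULL values
```

Reductions such as sum() skip the masked (NULL) entries, which mirrors how SQL aggregates ignore NULLs.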
Regards
Michael
On 17/01/2017, 16:06, "Jim Nasby" <Jim.Nasby@BlueTreble.com> wrote:
On 1/17/17 4:51 AM, Uwe L. Korn wrote:
> One important thing for fast columnar data access is that you don't want
> to have the data as Python objects before they will be turned into a
> DataFrame. Besides much better buffering, this was one of the main
> advantages we gained with turbodbc. Given that the ODBC drivers for
> Postgres seem to be in a miserable state, it would be much preferable to
> have such functionality directly in psycopg2. From meetings at PyData
> conferences with people I showed turbodbc to, I can say with confidence
> that there are users out there who would like a fast path for
> Postgres-to-Pandas.
>
> In turbodbc, there are two additional functions added to the DB-API
> cursor object: fetchallnumpy and fetchallarrow. These mostly suffice for
> typical pandas workloads. My experience from implementing them is that
> Arrow made it quite simple to add a columnar interface, as it handled
> most of the data conversions. I also never had to interface with any
> Python types, because Arrow transparently handled the language
> "barrier".
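To illustrate the point about not materializing per-value Python objects, here is a small sketch of a columnar result: one NumPy array per column, wrapped around an existing binary buffer with np.frombuffer (which does not copy). The make_batch and fetch_columns helpers and the column names are hypothetical stand-ins for what a driver would fill in; this is not how fetchallnumpy is actually implemented.

```python
import numpy as np

def make_batch():
    """Hypothetical stand-in for raw bytes a driver wrote for one batch.

    Each column is a contiguous little-endian buffer plus its dtype.
    """
    ids = np.array([1, 2, 3, 4], dtype="<i8").tobytes()
    prices = np.array([9.5, 10.0, 10.25, 11.0], dtype="<f8").tobytes()
    return {"id": (ids, "<i8"), "price": (prices, "<f8")}

def fetch_columns(batch):
    """Return {column_name: ndarray}, one read-only array per column.

    np.frombuffer wraps each existing buffer without copying, so no
    per-value Python ints or floats are created along the way.
    """
    return {name: np.frombuffer(buf, dtype=dtype)
            for name, (buf, dtype) in batch.items()}

columns = fetch_columns(make_batch())
print(columns["id"].sum())      # vectorized reduction, stays in C
print(columns["price"].mean())
```

A dict of arrays like this can be handed to pandas (e.g. pandas.DataFrame(columns)) without going through row tuples first.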
I certainly see the advantages of not creating objects. How do you end
up handling NULLs?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)