Re: [psycopg] Turbo ODBC - Mailing list psycopg

From Uwe L. Korn
Subject Re: [psycopg] Turbo ODBC
Date
Msg-id 1484650266.264445.850189824.3538CE31@webmail.messagingengine.com
In response to Re: [psycopg] Turbo ODBC  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: [psycopg] Turbo ODBC  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List psycopg
One important thing for fast columnar data access is that you don't want
to materialize the data as Python objects before it is turned into a
DataFrame. Besides much better buffering, this was one of the main
advantages we have with Turbodbc. Given that the ODBC drivers for
Postgres seem to be in a miserable state, it would be much preferable to
have such functionality directly in psycopg2. From meetings with people
at some PyData conferences to whom I showed turbodbc, I can definitely
say that there are some users out there who would like a fast path for
Postgres-to-Pandas.
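
To illustrate why per-value Python objects hurt, here is a minimal
sketch using only the standard library. It is not how turbodbc or
psycopg2 work internally; the sample data and the names `rows`, `ids`
and `values` are made up for illustration. It only contrasts the two
memory layouts. A real columnar driver would fill the typed buffers
directly from the wire and never build the row tuples at all.

```python
import sys
from array import array

# Row-oriented result, as a DB-API fetchall() would return it:
# one tuple per row, one Python object per value. (Illustrative data.)
rows = [(i, i * 0.5) for i in range(1000)]

# Columnar layout: one typed, contiguous buffer per column.
ids = array("q", (r[0] for r in rows))     # signed 64-bit integers
values = array("d", (r[1] for r in rows))  # 64-bit floats

# Each column stores raw 8-byte values with no per-value Python object.
columnar_bytes = ids.itemsize * len(ids) + values.itemsize * len(values)

# The row layout pays for a tuple per row plus a boxed float per value.
# The int objects are not counted, so this is an underestimate.
row_bytes = sum(sys.getsizeof(r) + sys.getsizeof(r[1]) for r in rows)

print(columnar_bytes, row_bytes)
```

On CPython the row layout comes out several times larger, and that is
before counting the time spent creating and later unpacking all those
objects.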

In turbodbc, there are two additional functions added to the DB-API
cursor object: fetchallnumpy and fetchallarrow. These mostly suffice for
typical pandas workloads. My experience from implementing this is
basically that with Arrow it was quite simple to add a columnar
interface, as most of the data conversions were handled by Arrow. Also,
there was no need for me to interface with any Python types, as the
language "barrier" was transparently handled by Arrow.
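
As a rough sketch of how calling code could use such a cursor: the
helper below prefers fetchallnumpy when the cursor offers it and
otherwise falls back to plain DB-API fetchall plus a transpose. The
name `fetch_columns` is hypothetical and not part of turbodbc or
psycopg2. The fallback is exactly the slow path described above, shown
only for contrast.

```python
from collections import OrderedDict


def fetch_columns(cursor):
    """Return the result set as an ordered mapping: column name -> values.

    Hypothetical helper, not part of any driver.
    """
    fetch_numpy = getattr(cursor, "fetchallnumpy", None)
    if fetch_numpy is not None:
        # turbodbc path: columns arrive as NumPy arrays keyed by column
        # name, without building a Python tuple per row.
        return fetch_numpy()

    # Generic DB-API fallback: materialize every row as a tuple of
    # Python objects, then transpose into per-column lists.
    names = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    if rows:
        columns = [list(col) for col in zip(*rows)]
    else:
        columns = [[] for _ in names]
    return OrderedDict(zip(names, columns))
```

With a turbodbc cursor the returned mapping can be handed straight to
pandas.DataFrame(...). With fetchallarrow the cursor returns a pyarrow
Table instead, which has its own to_pandas() method.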

CC'ing Michael König, the creator of Turbodbc, who might be able to give
some more input.

--
  Uwe L. Korn
  uwelk@xhochy.com

On Tue, Jan 17, 2017, at 03:07 AM, Jim Nasby wrote:
> On 1/16/17 7:32 PM, Adrian Klaver wrote:
> > All of this is very interesting and definitely worth exploring, just not
> > sure how much of it ties back to psycopg2 and this list. Not trying to
> > rain on anyone's parade, I am wondering if this might not be better
> > explored on a 'meta' list, something like the various Python projects
> > that deal with Excel do:
>
> Since this is a user mailing list that might make sense. Though, I'm
> getting the impression that there's some disconnect between what data
> science users are doing and this list. Tuple-based results vs
> vector-based (ie: columnar) results is an example of that.
>
> I do think there's 3 items that would best be handled at the "bottom" of
> the stack (namely, psycopg2), because they'll enable every higher level
> as well as make life easier for direct users of psycopg2:
>
> 1) Performance, both in low-latency (ie: filesystem socket) and
> high-latency environments.
> 2) Type conversion (in particular, getting rid of strings as the
> intermediate representation).
> 3) Optionally providing a columnar result set.
>
> #3 might be in direct opposition to the standard Python DB accessor
> stuff, so maybe that would need to be a separate module on top of
> psycopg2, but psycopg2 would certainly still need to support it. (IE:
> you certainly do NOT want psycopg2 to build a list of dicts only to then
> try and convert that to a columnar format).
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)
