On 01/15/2017 01:19 PM, Jim Nasby wrote:
> On 1/15/17 3:04 PM, Adrian Klaver wrote:
>> On 01/15/2017 12:32 PM, Jim Nasby wrote:
>>> Interesting... https://github.com/blue-yonder/turbodbc
>>
>> Yes, interesting but you are still dependent on the underlying ODBC
>> implementation. Not sure how that impacts performance versus a method
>> with fewer hops?
>
> Oh, I'd hope that a native libpq implementation would be faster than
> going through ODBC. But, there's presumably useful info that can be
> picked up here; the bit about buffering was certainly interesting.
>
> BTW, the person that brought this to my attention had mentioned that a
> lot of people doing data science with data living in Postgres feel the
> need to extract the data from Postgres into something like HDFS before
> they can do anything useful, because apparently data access through HDFS
> is 3x faster than through Postgres. My impression is that at least part
> of that is due to using Pandas from_sql functionality (which AIUI
> marshals everything through SQL Alchemy), but anything that can be done
> on the psycopg2 side would help.

Have you looked at asyncpg?
https://github.com/MagicStack/asyncpg
It is Python 3.5+ only, though.

> I'm also looking into speeding up SPI access through plpython; depending
> on how you want to measure I've gotten a 30-600% improvement by removing
> the buffering that SPI does by default.
--
Adrian Klaver
adrian.klaver@aklaver.com