Thread: Parquet support

Parquet support

From: Christopher Bader
Does psycopg support parquet as an input format?

Thanks,
Christopher Bader
Staff Data Scientist
Zscaler

Re: Parquet support

From: Daniele Varrazzo
On Wed, 23 Nov 2022 at 19:49, Christopher Bader <cbader@zscaler.com> wrote:
>
> Does psycopg support parquet as an input format?

No, not yet.

I have had some conversations in the past about parquet input/output:
it is a major project which I would like to either develop or see
developed, but at the moment I don't have the several months required
to do the former, and nobody has volunteered for the latter.

Cheers

-- Daniele



Re: Parquet support

From: Vladimir Ryabtsev
Just curious folks, what are your thoughts about the scope of that potential support? What is the use case? Is it loading data from Parquet to Postgres (and back)? Why is the combination with Python modules like pyarrow not enough?

Regards,
--VR

On Wed, 23 Nov 2022 at 10:56, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> On Wed, 23 Nov 2022 at 19:49, Christopher Bader <cbader@zscaler.com> wrote:
> >
> > Does psycopg support parquet as an input format?
>
> No, not yet.
>
> I had some conversation in the past around parquet input/output: it is
> a major project which I would like to either develop or see developed,
> but at the moment I don't have the several months required to do the
> former, and nobody has volunteered for the latter.
>
> Cheers
>
> -- Daniele


Re: Parquet support

From: Daniele Varrazzo
On Wed, 23 Nov 2022 at 20:56, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
>
> Just curious folks, what are your thoughts about the scope of that
> potential support? What is the use case? Is it loading data from
> Parquet to Postgres (and back)? Why is the combination with Python
> modules like pyarrow not enough?

I am not an expert, but I understand that the Python-Postgres round
trip goes via generating and parsing CSV files, whereas there is some
performance gain to be had by creating native Arrow data.
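
To make the CSV route concrete, here is a minimal sketch of loading a
Parquet file into Postgres through COPY ... FROM STDIN (FORMAT csv).
It assumes psycopg 3 and pyarrow are installed and that the target
table already exists with the same column names as the file. The names
rows_to_csv and parquet_to_postgres, and their parameters, are made up
for illustration and are not part of either library. It is only meant
for simple column types (numbers, text, dates).

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of row tuples to CSV text.

    None is written as an empty, unquoted field, which COPY ... CSV
    reads back as NULL by default. Empty strings in multi-column rows
    are written the same way, so this sketch cannot tell them apart.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def parquet_to_postgres(parquet_path, conninfo, table, batch_size=10_000):
    """Hypothetical helper: stream a Parquet file into an existing table."""
    # Third-party imports are kept local so rows_to_csv works without them.
    import psycopg
    import pyarrow.parquet as pq
    from psycopg import sql

    pf = pq.ParquetFile(parquet_path)
    columns = pf.schema_arrow.names

    # Build the COPY statement with properly quoted identifiers.
    stmt = sql.SQL("COPY {} ({}) FROM STDIN (FORMAT csv)").format(
        sql.Identifier(table),
        sql.SQL(", ").join(sql.Identifier(name) for name in columns),
    )

    with psycopg.connect(conninfo) as conn:
        with conn.cursor() as cur:
            with cur.copy(stmt) as copy:
                # Read the file in record batches to bound memory use.
                for batch in pf.iter_batches(batch_size=batch_size):
                    data = batch.to_pydict()
                    rows = zip(*(data[name] for name in columns))
                    copy.write(rows_to_csv(rows))
```

Every value here is turned into a Python object, rendered as text and
parsed again by the server, which is the overhead that a native Arrow
path would presumably avoid.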

-- Daniele



Re: Parquet support

From: Brian M Hamlin
Hi - desktop Linux user/maker here in California --

The engineering stakes are high in the clouds these days. There are
some important efforts underway to make "cloud-native" ways for Python,
Python installation, Python data and Python communication tools. In my
corners of the world (remote sensing, urban planning) that means Dask
and xarray. As a desktop Linux distribution, we (OSGeoLive) ship both,
and enthusiastically so; the "cloud-native" data storage formats Zarr
and Parquet, not so much. My best understanding is that xarray is a
happy medium between "what only runs on cloud" and "the powerful Linux
I can run myself on standard equipment today".

I support a Python ecosystem that individual people can run entirely
locally, and that can interoperate well with standard networking and
data formats. Not every Python environment is doing that. Change happens.

Interested to see the common and useful Python discussion here
regarding PostgreSQL, PostGIS and cloudy interoperability.

   --Brian M Hamlin    /  MAPLABS  /  OSGeoLive PSC


On 11/23/22 12:00, Daniele Varrazzo wrote:
> On Wed, 23 Nov 2022 at 20:56, Vladimir Ryabtsev <greatvovan@gmail.com> wrote:
>> Just curious folks, what are your thoughts about the scope of that
>> potential support? What is the use case? Is it loading data from
>> Parquet to Postgres (and back)? Why is the combination with Python
>> modules like pyarrow not enough?
>
> I am not an expert, but I understand that Python-Postgres roundtrip
> goes via generating and parsing CSV files, whereas there is some
> performance gain to be had by creating native arrow data.
>
> -- Daniele
>
>