Thread: csv_populate_recordset and csv_agg
Hello hackers,
The `json_populate_recordset` and `json_agg` functions allow systems to process/generate JSON directly in the database. This "cuts out the middle tier"[1] and notably reduces the complexity of web applications.
CSV processing is also a common use case, and PostgreSQL has the COPY .. FROM .. CSV form, but COPY is not compatible with libpq pipeline mode and its interface is clunkier to use.
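For context, the existing route looks roughly like this (table and column names here are placeholders, not from any real schema):

```sql
-- Ingesting CSV today requires the COPY machinery, which cannot be
-- issued inside a libpq pipeline:
COPY widgets (id, name) FROM STDIN WITH (FORMAT csv, HEADER true);
```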
I propose to include two new functions:
- csv_populate_recordset ( base anyelement, from_csv text )
- csv_agg ( anyelement )
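For illustration, usage could mirror the existing `json_populate_recordset`/`json_agg` pair. This is a hypothetical sketch only; the exact signatures, the column-definition-list syntax, and the (option-free) dialect behavior are all assumptions:

```sql
-- Hypothetical: parse CSV text into a set of records.
SELECT *
FROM csv_populate_recordset(null::record, $$id,name
1,foo
2,bar$$) AS t(id int, name text);

-- Hypothetical: aggregate rows back into CSV text.
SELECT csv_agg(t)
FROM (VALUES (1, 'foo'), (2, 'bar')) AS t(id, name);
```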
I would gladly implement these if it sounds like a good idea.
I see there's already some code that deals with CSV in:
- src/backend/commands/copyfromparse.c (CopyReadAttributesCSV)
- src/fe_utils/print.c (csv_print_field)
- src/backend/utils/error/csvlog.c (write_csvlog)
So perhaps a new csv module could benefit the codebase as well.
Best regards,
Steve
Steve Chavez <steve@supabase.io> writes:
> CSV processing is also a common use case and PostgreSQL has the COPY ..
> FROM .. CSV form but COPY is not compatible with libpq pipeline mode and
> the interface is clunkier to use.
> I propose to include two new functions:
> - csv_populate_recordset ( base anyelement, from_csv text )
> - csv_agg ( anyelement )

The trouble with CSV is there are so many mildly-incompatible versions of it. I'm okay with supporting it in COPY, where we have the freedom to add random sub-options (QUOTE, ESCAPE, FORCE_QUOTE, yadda yadda) to cope with those variants. I don't see a nice way to handle that issue in the functions you propose --- you'd have to assume that there is One True CSV, which sadly ain't so, or else complicate the functions beyond usability.

Also, in the end CSV is a surface presentation layer, and as such it's not terribly well suited as the calculation representation for aggregates and other functions. I think these proposed functions would have pretty terrible performance as a consequence of the need to constantly re-parse the surface format. The same point could be made about JSON ... which is why we prefer to implement processing functions with JSONB.

			regards, tom lane
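The dialect-drift point is easy to demonstrate outside PostgreSQL, e.g. with Python's csv module (an illustration only; the delimiter and quoting choices below are assumptions picked to show the divergence, not anything from this thread):

```python
import csv
import io

# One logical record serialized under two common "CSV" conventions.
row = ["widget", 'say "hi"', "1,5"]

# RFC 4180 style: comma delimiter, minimal double-quoting.
rfc4180 = io.StringIO()
csv.writer(rfc4180, delimiter=",", quotechar='"',
           quoting=csv.QUOTE_MINIMAL).writerow(row)

# Semicolon convention common in locales where ',' is the decimal separator.
euro = io.StringIO()
csv.writer(euro, delimiter=";", quotechar='"',
           quoting=csv.QUOTE_MINIMAL).writerow(row)

print(rfc4180.getvalue())  # widget,"say ""hi""","1,5"
print(euro.getvalue())     # widget;"say ""hi""";1,5

# Reading one dialect with the other's parser silently misparses the record:
fields = next(csv.reader(io.StringIO(euro.getvalue()), delimiter=","))
print(fields)  # 2 mangled fields instead of the original 3
```

A parameter-free `csv_populate_recordset` would have to hard-wire exactly one of these conventions, which is Tom's objection: COPY can absorb such variants via sub-options, but a plain function signature cannot.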