Re: COPY FROM WHEN condition - Mailing list pgsql-hackers

From David Fetter
Subject Re: COPY FROM WHEN condition
Date
Msg-id 20181102142132.GJ12677@fetter.org
Whole thread Raw
In response to Re: COPY FROM WHEN condition  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: COPY FROM WHEN condition  (Corey Huinker <corey.huinker@gmail.com>)
List pgsql-hackers
On Thu, Nov 01, 2018 at 10:57:25PM -0400, Corey Huinker wrote:
> >
> > > Are you thinking something like having a COPY command that provides
> > > results in such a way that they could be referenced in a FROM clause
> > > (perhaps a COPY that defines a cursor…)?
> >
> > That would also be nice, but what I was thinking of was that some
> > highly restricted subset of cases of SQL in general could lend
> > themselves to levels of optimization that would be impractical in
> > other contexts.
> 
> If COPY (or a syntactical equivalent) can return a result set, then the
> whole of SQL is available to filter and aggregate the results and we don't
> have to invent new syntax, or endure confusion whenCOPY-WHEN syntax behaves
> subtly different from a similar FROM-WHERE.

That's an excellent point.

> Also, what would we be saving computationally? The whole file (or program
> output) has to be consumed no matter what, the columns have to be parsed no
> matter what. At least some of the columns have to be converted to their
> assigned datatypes enough to know whether or not to filter the row, but we
> might be able push that logic inside a copy. I'm thinking of something like
> this:
> 
> SELECT x.a, sum(x.b)
> FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b numeric, c text, d date, e json) )

Apologies for bike-shedding, but wouldn't the following be a better
fit with the current COPY?

    COPY t(a integer, b numeric, c text, d date, e json) FROM '/path/to/foo.txt' WITH (FORMAT CSV, INLINE)

> WHERE x.d >= '2018-11-01'
> 
> 
> In this case, there is the *opportunity* to see the following optimizations:
> - columns c and e are never referenced, and need never be turned into a
> datum (though we might do so just to confirm that they conform to the data
> type)

That sounds like something that could go inside the WITH extension
I'm proposing above.

[STRICT_TYPE boolean DEFAULT true]?

This might not be something that has to be in version 1.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: WIP: Avoid creation of the free space map for small tables
Next
From: Robert Haas
Date:
Subject: Re: Hash Joins vs. Bloom Filters / take 2