Re: COPY as a set returning function - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: COPY as a set returning function
Date
Msg-id CADkLM=cdXdB3UQ4MNeg86WAGpZRDb8Rd4K+3vmS+tDLOTSJLEg@mail.gmail.com
Whole thread Raw
In response to Re: COPY as a set returning function  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><span>> That sounds fantastic. It'd help this copy variant retain
festureparity<br /> > with normal copy. And it'd bring us closer to being able to FETCH in non<br /> >
queries.<br/><br /></span>On second thought, though, this couldn't exactly duplicate the existing<br /> COPY syntax,
becauseCOPY relies heavily on the rowtype of the named<br /> target table to tell it what it's copying.  You'd need
somenew syntax<br /> to provide the list of column names and types, which puts a bit of<br /> a hole in the "syntax we
alreadyknow" argument.  A SRF-returning-record<br /> would have a leg up on that, because we do have existing syntax
for<br/> defining the concrete rowtype that any particular call returns.<br /><br />                         regards,
tomlane<br /></blockquote></div><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">I would like
tomake COPY itself a SRF. That's a bit beyond my capabilities, so if that is the route we want to go, I will need
help.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">The syntax would probably look like this (new
bitsin bold):</div><div class="gmail_extra"><br /></div><blockquote style="margin:0 0 0
40px;border:none;padding:0px"><divclass="gmail_extra"><font face="monospace, monospace">WITH my_copy  AS
(</font></div><divclass="gmail_extra"><font face="monospace, monospace">    COPY FROM 'example.csv' TO <b>RESULT SET(c1
text,c2 integer, dummy1 text, dummy2 text, c5 date)</b> WITH (FORMAT CSV)</font></div><div class="gmail_extra"><font
face="monospace,monospace">    <b>RETURNING c1, c2, c3</b></font></div><div class="gmail_extra"><font face="monospace,
monospace">)</font></div><divclass="gmail_extra"><font face="monospace, monospace">SELECT ...</font></div><div
class="gmail_extra"><fontface="monospace, monospace">FROM my_copy</font></div><div class="gmail_extra"><font
face="monospace,monospace">LEFT OUTER JOIN ref_table ...</font></div></blockquote><div class="gmail_extra"><br
/></div><divclass="gmail_extra">The RESULT SET (colspecs) bit would be the rsinfo currently used by copy_srf(). It
wouldbe nice if the CTE declaration could take types, but it can't.<br /></div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">The RETURNING clause here doesn't return all the columns made available from the COPY.
Thatwould be nice, but not required because the same filtration could be done when the CTE is referenced. So if we
require<font face="monospace, monospace">RETURNING *</font> be the only returning option I'd be fine with
that.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">If we're ok with adding a function like
copy_srf()to the core, will we still be happy with it when COPY does get a RETURNING clause? </div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra">Somewhat off-topic: here's
someother observations of a n00b who spent a fair amount of time looking at the copy.c code.<br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra">1. BeginCopyFrom and NextCopyFrom pull attlist/tupdesc info
from->rel, repeatedly. If we were going to try to leverage that code we'd need to store those things in a separate
cstatemember so that we add complexity only in the initialization of the copy state data struct, pulling the result
structurefrom rsinfo rather than a relation. There's probably a minor performance gain to be had in keeping that info
around.Refactoring those two procs to allow for a pre-set attlist/tupdesc would help.</div><div class="gmail_extra"><br
/></div><divclass="gmail_extra">2. NextCopyFrom() checks every single time to see if it's row 0 and if it should skip
thisheader row. I know a single (row_num == 0 && has_header) isn't much extra processing, but shouldn't we
digestand discard headers before going into the per-row loop?<br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra">3.All the code that handles indexes, constraints, buffering, etc, simply doesn't apply in the SRF
context.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">4. The code somewhat needlessly mixes code
forthe COPY FROM and COPY TO cases. There's probably a good reason for this, but it made for a lot of clutter in
achievingmy very narrow goal.</div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div></div> 

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [sqlsmith] Infinite recursion in bitshift
Next
From: Kevin Grittner
Date:
Subject: Re: btree vacuum and suspended scans can deadlock