Re: COPY as a set returning function - Mailing list pgsql-hackers
From | Corey Huinker |
---|---|
Subject | Re: COPY as a set returning function |
Date | |
Msg-id | CADkLM=cdXdB3UQ4MNeg86WAGpZRDb8Rd4K+3vmS+tDLOTSJLEg@mail.gmail.com Whole thread Raw |
In response to | Re: COPY as a set returning function (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px#ccc solid;padding-left:1ex"><span>> That sounds fantastic. It'd help this copy variant retain festureparity<br /> > with normal copy. And it'd bring us closer to being able to FETCH in non<br /> > queries.<br/><br /></span>On second thought, though, this couldn't exactly duplicate the existing<br /> COPY syntax, becauseCOPY relies heavily on the rowtype of the named<br /> target table to tell it what it's copying. You'd need somenew syntax<br /> to provide the list of column names and types, which puts a bit of<br /> a hole in the "syntax we alreadyknow" argument. A SRF-returning-record<br /> would have a leg up on that, because we do have existing syntax for<br/> defining the concrete rowtype that any particular call returns.<br /><br /> regards, tomlane<br /></blockquote></div><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">I would like tomake COPY itself a SRF. That's a bit beyond my capabilities, so if that is the route we want to go, I will need help.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">The syntax would probably look like this (new bitsin bold):</div><div class="gmail_extra"><br /></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><divclass="gmail_extra"><font face="monospace, monospace">WITH my_copy AS (</font></div><divclass="gmail_extra"><font face="monospace, monospace"> COPY FROM 'example.csv' TO <b>RESULT SET(c1 text,c2 integer, dummy1 text, dummy2 text, c5 date)</b> WITH (FORMAT CSV)</font></div><div class="gmail_extra"><font face="monospace,monospace"> <b>RETURNING c1, c2, c3</b></font></div><div class="gmail_extra"><font face="monospace, monospace">)</font></div><divclass="gmail_extra"><font face="monospace, monospace">SELECT ...</font></div><div class="gmail_extra"><fontface="monospace, monospace">FROM my_copy</font></div><div class="gmail_extra"><font face="monospace,monospace">LEFT OUTER JOIN ref_table ...</font></div></blockquote><div class="gmail_extra"><br /></div><divclass="gmail_extra">The RESULT SET (colspecs) bit would be the rsinfo currently used by copy_srf(). It wouldbe nice if the CTE declaration could take types, but it can't.<br /></div><div class="gmail_extra"><br /></div><divclass="gmail_extra">The RETURNING clause here doesn't return all the columns made available from the COPY. Thatwould be nice, but not required because the same filtration could be done when the CTE is referenced. So if we require<font face="monospace, monospace">RETURNING *</font> be the only returning option I'd be fine with that.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">If we're ok with adding a function like copy_srf()to the core, will we still be happy with it when COPY does get a RETURNING clause? </div><div class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra">Somewhat off-topic: here's someother observations of a n00b who spent a fair amount of time looking at the copy.c code.<br /></div><div class="gmail_extra"><br/></div><div class="gmail_extra">1. BeginCopyFrom and NextCopyFrom pull attlist/tupdesc info from->rel, repeatedly. If we were going to try to leverage that code we'd need to store those things in a separate cstatemember so that we add complexity only in the initialization of the copy state data struct, pulling the result structurefrom rsinfo rather than a relation. There's probably a minor performance gain to be had in keeping that info around.Refactoring those two procs to allow for a pre-set attlist/tupdesc would help.</div><div class="gmail_extra"><br /></div><divclass="gmail_extra">2. NextCopyFrom() checks every single time to see if it's row 0 and if it should skip thisheader row. I know a single (row_num == 0 && has_header) isn't much extra processing, but shouldn't we digestand discard headers before going into the per-row loop?<br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">3.All the code that handles indexes, constraints, buffering, etc, simply doesn't apply in the SRF context.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra">4. The code somewhat needlessly mixes code forthe COPY FROM and COPY TO cases. There's probably a good reason for this, but it made for a lot of clutter in achievingmy very narrow goal.</div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra"><br/></div><div class="gmail_extra"><br /></div></div>
pgsql-hackers by date: