Modifying COPY TO - Mailing list pgsql-hackers

From Dave Held
Subject Modifying COPY TO
Date
Msg-id 49E94D0CFCD4DB43AFBA928DDD20C8F902618457@asg002.asg.local
Whole thread Raw
Responses Re: Modifying COPY TO  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Modifying COPY TO  (Stephen Frost <sfrost@snowman.net>)
Re: Modifying COPY TO  ("Jim C. Nasby" <decibel@decibel.org>)
List pgsql-hackers
<p><font face="Arial" size="2">I am interested in hacking COPY TO such that one can specify that</font><br /><font
face="Arial"size="2">rows are copied in a certain index order.  I got as far as </font><br /><font face="Arial"
size="2">src/backend/commands/copy.c:CopyTo(),and it looks like I would need</font><br /><font face="Arial" size="2">to
modifythe call to heap_beginscan() so that it uses a key.  However,</font><br /><font face="Arial" size="2">I couldn't
figureout how to provide one, or if I'm even looking at the</font><br /><font face="Arial" size="2">right area. 
Ideally,this behavior would be specified with a flag,</font><br /><font face="Arial" size="2">perhaps: "WITH INDEX
<index_name>"or "WITH PRIMARY KEY"</font><br /><font face="Arial" size="2">or something similar.</font><p><font
face="Arial"size="2">The motivation for this change is as follows.  I have a fairly large</font><br /><font
face="Arial"size="2">database (10 million+ records) that mirrors the data in a proprietary</font><br /><font
face="Arial"size="2">system.  The only access to that data is through exported flat files.</font><br /><font
face="Arial"size="2">Currently, those flat files are copied directly into a staging area in the </font><br /><font
face="Arial"size="2">db via a COPY FROM, the actual tables are truncated, and the</font><br /><font face="Arial"
size="2">stagingdata is inserted into the live tables.  Since the data is read-only,</font><br /><font face="Arial"
size="2">itdoesn't matter that it is recreated every day.  However, as you</font><br /><font face="Arial" size="2">can
imagine,the import process takes quite a while (several hours).</font><br /><font face="Arial" size="2">Also,
rebuildingthe db from scratch every day loses any statistical</font><br /><font face="Arial" size="2">information
gatheredfrom the execution of queries during the day.</font><p><font face="Arial" size="2">A possibility that I would
liketo pursue is to keep the staging data</font><br /><font face="Arial" size="2">from the previous day, do a COPY TO,
importthe new data into</font><br /><font face="Arial" size="2">another staging table with a COPY FROM, then export the
fresh</font><br/><font face="Arial" size="2">data with another COPY TO.  Then, I can write a fast C/C++</font><br
/><fontface="Arial" size="2">program to do a line-by-line comparison of each record, isolating</font><br /><font
face="Arial"size="2">the ones that have changed from the previous day.  I can then</font><br /><font face="Arial"
size="2">emitthose records in a change file that should be relatively small</font><br /><font face="Arial" size="2">and
easyto update.  Of course, this scheme can only work if</font><br /><font face="Arial" size="2">COPY TO emits the
recordsin a reliable order.</font><p><font face="Arial" size="2">Any assistance on this project would be greatly
appreciated. The</font><br /><font face="Arial" size="2">best I can see, I'm stuck on line 1053 from
copy.c:</font><p><fontface="Arial" size="2">    scandesc = heap_beginscan(rel, mySnapshot, 0, NULL);</font><p><font
face="Arial"size="2">I suspect that I want it to look like this:</font><p><font face="Arial" size="2">    scandesc =
heap_beginscan(rel,mySnapshot, 1, key);</font><p><font face="Arial" size="2">where 'key' is an appropriately
constructedScanKey.  It looks</font><br /><font face="Arial" size="2">like I want to call ScanKeyEntryInitialize(), but
I'mnot sure what</font><br /><font face="Arial" size="2">parameters I need to pass to it to get an index or the
primary</font><br/><font face="Arial" size="2">key.  I mostly need help building the ScanKey object.  I think I
</font><br/><font face="Arial" size="2">can figure out how to hack the custom option, etc.  I should </font><br /><font
face="Arial"size="2">mention that I am using the 7.4.7 codebase on Linux 2.4.</font><p><font face="Courier New"
size="2">__</font><br/><font face="Courier New" size="2">David B. Held</font><br /><font face="Courier New"
size="2">SoftwareEngineer/Array Services Group</font><br /><font face="Courier New" size="2">200 14th Ave. East, 
Sartell,MN 56377</font><br /><font face="Courier New" size="2">320.534.3637 320.253.7800 800.752.8129</font><br /> 

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Development schedule
Next
From: Tom Lane
Date:
Subject: Re: Modifying COPY TO