Re: what would tar file FDW look like? - Mailing list pgsql-hackers

From Bear Giles
Subject Re: what would tar file FDW look like?
Date
Msg-id CALBNtw4dpNi9YJksmrju6S8BZLy-vd=kpvK7K3fH4pxqM1r9aw@mail.gmail.com
In response to Re: what would tar file FDW look like?  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
<div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif;color:#000000">I've written readers for
both from scratch. Tar isn't that bad since it's blocked - you read the header, skip forward N blocks, continue. The
hardest part is setting up the decompression libraries if you want to support tar.gz or tar.bz2 files.</div><div
class="gmail_default" style="font-family:tahoma,sans-serif;color:#000000"><br /></div><div class="gmail_default"
style="font-family:tahoma,sans-serif;color:#000000">Zip files are more complex. You have (iirc) 5 control blocks - start
of archive, start of file, end of file, start of index, end of archive, and the information in the control block is
pretty limited. That's not a huge burden since there's support for extensions for things like the unix file metadata.
One complication is that you need to support compression from the start.</div><div class="gmail_default"
style="font-family:tahoma,sans-serif;color:#000000"><br /></div><div class="gmail_default"
style="font-family:tahoma,sans-serif;color:#000000">Zip files support two types of encryption. There's a really weak
version that almost nobody supports and a much stronger modern version that's subject to license restrictions. (Some
people use the weak version on embedded systems because of legal requirements to /do something/, no matter how
lame.)</div><div class="gmail_default" style="font-family:tahoma,sans-serif;color:#000000"><br /></div><div
class="gmail_default" style="font-family:tahoma,sans-serif;color:#000000">There are third-party libraries, of course,
but that introduces dependencies. Both formats are simple enough to write from scratch.</div><div class="gmail_default"
style="font-family:tahoma,sans-serif;color:#000000"><br /></div><div class="gmail_default"
style="font-family:tahoma,sans-serif;color:#000000">I guess my bigger question is if there's an interest in either or
both for "real" use. I'm doing this as an exercise but am willing to contribute the code if there's a general interest in
it.</div><div class="gmail_default" style="font-family:tahoma,sans-serif;color:#000000"><br /></div><div
class="gmail_default" style="font-family:tahoma,sans-serif;color:#000000">(BTW the more complex object I'm working on is
the .p12 keystore for digital certificates and private keys. We have everything we need in the openssl library so
there are no additional third-party dependencies. I have a minimal FDW for the digital certificate itself and am now
working on a way to access keys stored in a standard format on the filesystem instead of in the database itself. A
natural fit is a specialized archive FDW. Unlike tar and zip it will have two payloads, the digital certificate and the
(optionally encrypted) private key. It has searchable metadata, e.g., finding all records with a specific
subject.)</div><div class="gmail_default" style="font-family:tahoma,sans-serif;color:#000000"><br /></div><div
class="gmail_default" style="font-family:tahoma,sans-serif;color:#000000">Bear</div></div><div class="gmail_extra"><br
/><div class="gmail_quote">On Mon, Aug 17, 2015 at 8:29 AM, Greg Stark <span dir="ltr">&lt;<a
href="mailto:stark@mit.edu" target="_blank">stark@mit.edu</a>&gt;</span> wrote:<br /><blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Mon, Aug 17, 2015 at 3:14 PM,
Bear Giles &lt;<a href="mailto:bgiles@coyotesong.com">bgiles@coyotesong.com</a>&gt; wrote:<br /> &gt; I'm starting to
work on a tar FDW as a proxy for a much more specific FDW.<br /> &gt; (It's the 'faster to build two and toss the first
away' approach - tar lets<br /> &gt; me get the FDW stuff nailed down before attacking the more complex<br /> &gt;
container.) It could also be useful in its own right, or as the basis for a<br /> &gt; zip file FDW.<br /><br
/></span>Hm. tar may be a bad fit where zip may be much easier. Tar has no<br /> index or table of contents. You have to
scan the entire file to find<br /> all the members. IIRC Zip does have a table of contents at the end of<br /> the
file.<br /><br /> The most efficient way to process a tar file is to describe exactly<br /> what you want to happen with
each member and then process it linearly<br /> from start to end (or until you've found the members you're looking<br />
for). Trying to return meta info and then go looking for individual<br /> members will be quite slow and have a large
startup cost.<br /><span class="HOEnZb"><font color="#888888"><br /><br /> --<br /> greg<br
/></font></span></blockquote></div><br /></div>
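
To make the contrast in this exchange concrete, here is a minimal Python sketch (not the FDW code discussed above, which would be C) of the two access patterns: walking a tar file header by header, and locating the zip table of contents from the end of the file. The function names are hypothetical. It assumes an uncompressed tar with plain ustar-style headers (octal size field, no GNU base-256 sizes, the ustar name prefix field ignored) and a non-Zip64 zip archive.

```python
import io
import struct

BLOCK = 512  # tar is organised in fixed 512-byte blocks


def iter_tar_members(f):
    """Yield (name, size, data_offset) for each member of an uncompressed tar."""
    while True:
        header = f.read(BLOCK)
        # A short read or an all-zero block marks the end of the archive.
        if len(header) < BLOCK or not any(header):
            return
        # Member name: first 100 bytes, NUL-terminated.
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        # Size: 12-byte field at offset 124, octal text padded with NUL/space.
        size = int(header[124:136].strip(b"\0 ") or b"0", 8)
        data_offset = f.tell()
        yield name, size, data_offset
        # Skip the payload, rounded up to a whole number of blocks.
        f.seek(data_offset + (size + BLOCK - 1) // BLOCK * BLOCK)


def find_zip_directory(f):
    """Return (entry_count, directory_offset) from the zip end-of-archive record."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    # The record is 22 bytes plus an optional comment of at most 65535 bytes,
    # so only the tail of the file needs to be searched.
    tail_len = min(file_size, 22 + 65535)
    f.seek(file_size - tail_len)
    tail = f.read(tail_len)
    pos = tail.rfind(b"PK\x05\x06")
    if pos < 0 or pos + 22 > len(tail):
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack("<4sHHHHIIH", tail[pos:pos + 22])
    total_entries, directory_offset = fields[4], fields[6]
    return total_entries, directory_offset
```

The tar walk has to touch every header to enumerate members, while the zip lookup reads only the tail of the file before jumping to the index, which is the startup-cost difference Greg describes.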
