Re: Database restore speed - Mailing list pgsql-performance

From Mitch Skinner
Subject Re: Database restore speed
Date
Msg-id 1133652555.4333.41.camel@firebolt
In response to Re: Database restore speed  ("Luke Lonergan" <LLonergan@greenplum.com>)
List pgsql-performance
On Fri, 2005-12-02 at 23:03 -0500, Luke Lonergan wrote:
> And how do we compose the binary data on the client?  Do we trust that
> the client encoding conversion logic is identical to the backend's?

Well, my newbieness is undoubtedly showing already, so I might as well
continue with my line of dumb questions.  I did a little mail archive
searching, but had a hard time coming up with unique query terms.

This is a slight digression, but my question about binary format query
results wasn't rhetorical.  Do I have to worry about different platforms
when I'm getting binary DataRow messages back from the server?  Or when
I'm sending binary Bind messages?
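
To make the platform question concrete, here is a minimal sketch in
Python.  It assumes (this is not confirmed anywhere in this thread) that
a binary float8 field travels as 8 bytes of IEEE 754 in network byte
order.  The helper names encode_float8 and decode_float8 are
hypothetical, not part of any client library.

```python
import struct


def encode_float8(value: float) -> bytes:
    """Pack a double as 8 bytes, big-endian (network order) IEEE 754.

    ASSUMPTION: this is the wire layout; check the protocol docs.
    """
    return struct.pack("!d", value)


def decode_float8(payload: bytes) -> float:
    """Unpack an 8-byte field under the same big-endian assumption."""
    if len(payload) != 8:
        raise ValueError("expected 8 bytes, got %d" % len(payload))
    return struct.unpack("!d", payload)[0]


if __name__ == "__main__":
    wire = encode_float8(1.0)
    print(wire.hex())
    # Reading the same bytes with the wrong byte order gives a
    # different number, which is the portability worry in a nutshell.
    print(decode_float8(wire), struct.unpack("<d", wire)[0])
```

If the byte order really is fixed on the wire, the client only has to
care about its own float representation, not the server's platform.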

Regarding whether or not the client has identical encoding/conversion
logic, how about a fast path that starts out by checking for
compatibility?  In addition to a BOM, you could add a "float format
mark" that was an array of things like +0.0, -0.0, min, max, +Inf, -Inf,
NaN, etc.
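
A rough sketch of how such a "float format mark" check could work, in
Python.  The probe list, the function names make_mark and mark_matches,
and the choice of 8-byte doubles are all assumptions made for
illustration; nothing like this exists in the protocol today.  The
sender packs the probe values in its native format, and the receiver
decodes them in its own native format and checks that every value
survived.

```python
import math
import struct
import sys

# Probe values from the proposal: signed zeros, min, max, infinities, NaN.
PROBES = [0.0, -0.0, sys.float_info.min, sys.float_info.max,
          math.inf, -math.inf, math.nan]


def make_mark(byteorder: str = "=") -> bytes:
    """Pack the probes as doubles; "=" means this machine's byte order."""
    return struct.pack("%s%dd" % (byteorder, len(PROBES)), *PROBES)


def mark_matches(mark: bytes, byteorder: str = "=") -> bool:
    """Decode a peer's mark locally and compare it with the probes."""
    try:
        values = struct.unpack("%s%dd" % (byteorder, len(PROBES)), mark)
    except struct.error:
        return False  # wrong length, so not the same float size
    for expected, got in zip(PROBES, values):
        if math.isnan(expected):
            # NaN bit patterns vary, so only require "is a NaN".
            if not math.isnan(got):
                return False
        elif expected != got:
            return False
        elif math.copysign(1.0, expected) != math.copysign(1.0, got):
            return False  # +0.0 and -0.0 compare equal; check the sign
    return True


if __name__ == "__main__":
    print(mark_matches(make_mark()))          # same machine
    print(mark_matches(make_mark("<"), ">"))  # simulated byte-order mismatch
```

If the mark matches, both sides can take the fast path and ship raw
floats; if not, they fall back to the text format.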

It looks like XDR specifies byte order for floats and otherwise punts to
IEEE.  I have no experience with SQL*Loader, but a quick read of the
docs appears to divide data types into "portable" and "nonportable"
groups, where loading nonportable data types requires extra care.

This may be overkill, but have you looked at HDF5?  Only one hit came up
in the mail archives.
http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.format.html
For (e.g.) floats, the format includes metadata that specifies byte
order, padding, normalization, the location of the sign, exponent, and
mantissa, and the size of the exponent and mantissa.  The format appears
not to require length information on a per-datum basis.  A cursory look
at the data format page gives me the impression that there's a useful
streamable subset.  The license of the implementation is BSD-style (no
advertising clause), and it appears to support a large variety of
platforms.  Currently, the format spec only mentions ASCII, but since
the library doesn't do any actual string manipulation (just storage and
retrieval, AFAICS) it may be UTF-8 clean.

Mitch
