
Re: Database restore speed - Mailing list pgsql-performance

From	Mitch Skinner
Subject	Re: Database restore speed
Date	December 3, 2005 19:29:22
Msg-id	1133652555.4333.41.camel@firebolt
In response to	Re: Database restore speed ("Luke Lonergan" <LLonergan@greenplum.com>)
List	pgsql-performance


On Fri, 2005-12-02 at 23:03 -0500, Luke Lonergan wrote:
> And how do we compose the binary data on the client? Do we trust that the client encoding conversion logic is
> identical to the backend's?

Well, my newbieness is undoubtedly showing already, so I might as well
continue with my line of dumb questions. I did a little mail archive
searching, but had a hard time coming up with unique query terms.

This is a slight digression, but my question about binary-format query
results wasn't rhetorical. Do I have to worry about platform differences
when I'm getting binary DataRow messages back from the server? Or when
I'm sending binary Bind messages?

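Regarding whether the client has identical encoding/conversion logic,
how about a fast path that starts by checking for compatibility? In
addition to a BOM (byte order mark), you could send a "float format
mark": an array of known values such as +0.0, -0.0, min, max, +Inf,
-Inf, and NaN. If the receiver's own encoding of those values matched
byte for byte, it could skip conversion; otherwise it would fall back
to the slow path.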
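A minimal sketch of that compatibility check, in Python for illustration only. The probe list, the 16-bit BOM layout, and the names `make_format_mark` and `can_use_fast_path` are hypothetical and not part of any PostgreSQL protocol; the point is just that each side encodes the same known values in its native layout and takes the fast path only when the bytes match exactly.

```python
import math
import struct
import sys

# Hypothetical "float format mark": probe values whose bit patterns exercise
# the sign bit, the exponent extremes, and the IEEE special values.
PROBE_VALUES = [
    0.0,
    -0.0,
    sys.float_info.min,  # smallest positive normalized double
    sys.float_info.max,  # largest finite double
    math.inf,
    -math.inf,
    math.nan,
]

BOM = 0xFEFF  # 16-bit byte order mark, written in the sender's native order


def make_format_mark() -> bytes:
    """Encode the BOM and the probe values in this machine's native byte order."""
    mark = struct.pack("=H", BOM)
    for v in PROBE_VALUES:
        mark += struct.pack("=d", v)
    return mark


def can_use_fast_path(peer_mark: bytes) -> bool:
    """True only if the peer's mark is byte-for-byte identical to ours,
    i.e. raw native doubles could be copied without any conversion."""
    return peer_mark == make_format_mark()
```

A mismatch (a different byte order, or even just a different NaN payload) only means falling back to the converting path, so a false negative costs speed but not correctness.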

It looks like XDR specifies byte order for floats and otherwise punts to
IEEE. I have no experience with SQL*Loader, but from a quick read of its
docs, it appears to divide data types into "portable" and "nonportable"
groups, where loading nonportable data types requires extra care.
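For comparison, XDR (RFC 4506) encodes a double as the 8-byte IEEE 754 binary64 value, most significant byte first, and prefixes a variable-length array with a 4-byte unsigned count. A short Python sketch of that encoding using the standard `struct` module; the helper names are illustrative, not from any XDR library.

```python
import struct


def xdr_pack_double(x: float) -> bytes:
    """XDR double: IEEE 754 binary64, most significant byte first."""
    return struct.pack(">d", x)


def xdr_unpack_double(buf: bytes) -> float:
    """Decode the first 8 bytes of buf as an XDR double."""
    return struct.unpack(">d", buf[:8])[0]


def xdr_pack_double_array(values: list[float]) -> bytes:
    """XDR variable-length array: 4-byte big-endian unsigned count, then elements."""
    return struct.pack(">I", len(values)) + b"".join(xdr_pack_double(v) for v in values)


# The output does not depend on the host's native byte order:
# xdr_pack_double(1.0) is bytes.fromhex("3ff0000000000000") on every platform.
```

Because the wire layout is fixed, a reader never needs a format mark; the trade-off is that little-endian hosts always pay for a byte swap.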

This may be overkill, but have you looked at HDF5? Only one hit came up
in the mail archives.
http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.format.html
For floats, for example, the format includes metadata that specifies byte
order, padding, normalization, the location of the sign, exponent, and
mantissa, and the size of the exponent and mantissa. The format appears
not to require per-datum length information. A cursory look at the data
format page gives me the impression that there's a useful streamable
subset. The implementation's license is BSD-style (no advertising
clause), and it appears to support a wide variety of platforms.
Currently, the format spec only mentions ASCII, but since the library
doesn't do any actual string manipulation (just storage and retrieval,
AFAICS), it may be UTF-8 clean.

Mitch
