Thread: Database restore speed

Database restore speed

From
"Steve Oualline"
Date:

Our application tries to insert data into the database as fast as it can.
Currently the work is being split into a number of 1MB copy operations.

When we restore the postmaster process tries to use 100% of the CPU.

The questions we have are:

1) What is postmaster doing that it needs so much CPU?

2) How can we get our system to go faster?

Note: We've tried adjusting the checkpoint_segments parameter to no effect.
Any suggestions welcome.

Re: Database restore speed

From
"Luke Lonergan"
Date:
Steve,

> When we restore the postmaster process tries to use 100% of the CPU.
>
> The questions we have are:
>
> 1) What is postmaster doing that it needs so much CPU?

Parsing mostly, and attribute conversion from text to DBMS native
formats.

> 2) How can we get our system to go faster?

Use Postgres 8.1 or Bizgres.  Get a faster CPU.

These two points are based on our work to improve COPY speed, which led
to a near doubling in Bizgres, and in the 8.1 version it's about 60-70%
faster than in Postgres 8.0.

There are currently two main bottlenecks in COPY, one is parsing +
attribute conversion (if the postgres CPU is nailed at 100% that's what
your limit is) and the other is the write speed through the WAL.  You
can roughly divide the write speed of your disk by 3 to get that limit,
e.g. if your disk can write 8k blocks at 100MB/s, then your COPY speed
might be limited to 33MB/s.  You can tell which of these limits you've
hit using "vmstat 1" on Linux or iostat on Solaris and watch the blocks
input/output on your disk while you watch your CPU.
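As a back-of-the-envelope check, that rule of thumb works out like this (the divisor of 3 is empirical - rows are effectively written more than once via WAL plus the table heap plus checkpoint traffic - so treat it as a rough guide, not a measured constant):

```python
# Rough COPY ceiling from raw disk write speed. The divisor of 3 is the
# empirical rule of thumb above (data is effectively written more than
# once: WAL, table heap, checkpoint traffic).
disk_write_mb_s = 100                     # sequential 8k-block write speed
copy_limit_mb_s = disk_write_mb_s / 3.0
print(f"COPY limited to roughly {copy_limit_mb_s:.0f} MB/s")
```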

> Note: We've tried adjusting the checkpoint_segments
> parameter to no effect.

No surprise.

- Luke


Re: Database restore speed

From
David Lang
Date:
On Fri, 2 Dec 2005, Luke Lonergan wrote:

> Steve,
>
>> When we restore the postmaster process tries to use 100% of the CPU.
>>
>> The questions we have are:
>>
>> 1) What is postmaster doing that it needs so much CPU?
>
> Parsing mostly, and attribute conversion from text to DBMS native
> formats.
>
>> 2) How can we get our system to go faster?
>
> Use Postgres 8.1 or Bizgres.  Get a faster CPU.
>
> These two points are based on our work to improve COPY speed, which led
> to a near doubling in Bizgres, and in the 8.1 version it's about 60-70%
> faster than in Postgres 8.0.
>
> There are currently two main bottlenecks in COPY, one is parsing +
> attribute conversion (if the postgres CPU is nailed at 100% that's what
> your limit is) and the other is the write speed through the WAL.  You
> can roughly divide the write speed of your disk by 3 to get that limit,
> e.g. if your disk can write 8k blocks at 100MB/s, then your COPY speed
> might be limited to 33MB/s.  You can tell which of these limits you've
> hit using "vmstat 1" on Linux or iostat on Solaris and watch the blocks
> input/output on your disk while you watch your CPU.

Luke, would it help to have one machine read the file and have it connect
to postgres on a different machine when doing the copy? (I'm thinking that
the first machine may be able to do a lot of the parsing and conversion,
leaving the second machine to just worry about doing the writes)

David Lang

Re: Database restore speed

From
"Luke Lonergan"
Date:
David,

> Luke, would it help to have one machine read the file and
> have it connect to postgres on a different machine when doing
> the copy? (I'm thinking that the first machine may be able to
> do a lot of the parsing and conversion, leaving the second
> machine to just worry about doing the writes)

Unfortunately not - the parsing / conversion core is in the backend,
where it should be IMO because of the need to do the attribute
conversion there in the machine-native representation of the attributes
(int4, float, etc) in addition to having the backend convert from client
encoding (like LATIN1) to the backend encoding (like UNICODE aka UTF8).

There are a few areas of discussion about continued performance
increases in the codebase for COPY FROM, here are my picks:
- More micro-optimization of the parsing and att conversion core - maybe
100% speedup in the parse/convert stage is possible
- A user selectable option to bypass transaction logging, similar to
Oracle's
- A well-defined binary input format, like Oracle's SQL*Loader - this
would bypass most parsing / att conversion
- A direct-to-table storage loader facility - this would probably be the
fastest possible load rate

- Luke


Re: Database restore speed

From
Stephen Frost
Date:
* Luke Lonergan (LLonergan@greenplum.com) wrote:
> > Luke, would it help to have one machine read the file and
> > have it connect to postgres on a different machine when doing
> > the copy? (I'm thinking that the first machine may be able to
> > do a lot of the parsing and conversion, leaving the second
> > machine to just worry about doing the writes)
>
> Unfortunately not - the parsing / conversion core is in the backend,
> where it should be IMO because of the need to do the attribute
> conversion there in the machine-native representation of the attributes
> (int4, float, etc) in addition to having the backend convert from client
> encoding (like LATIN1) to the backend encoding (like UNICODE aka UTF8).

Just a thought, but couldn't psql be made to use the binary mode of
libpq and do at least some of the conversion on the client side?  Or
does binary mode not work with copy (that wouldn't surprise me, but
perhaps copy could be made to support it)?
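COPY does have a binary mode in the v3 protocol; to make that concrete, here is a sketch of the file framing it uses - an 11-byte signature, two 32-bit header words, then per-tuple field counts and length-prefixed fields (the int4 values here are invented):

```python
import struct

# Sketch of the framing COPY ... BINARY uses: an 11-byte signature, a
# 32-bit flags word, a 32-bit header-extension length, then per tuple a
# 16-bit field count followed by (32-bit length, bytes) per field, and
# a 16-bit -1 trailer at the end. Field values here are invented.
def binary_copy_row(int4_values):
    out = struct.pack("!h", len(int4_values))   # field count
    for v in int4_values:
        out += struct.pack("!i", 4)             # field length in bytes
        out += struct.pack("!i", v)             # int4 in network byte order
    return out

header = b"PGCOPY\n\377\r\n\0" + struct.pack("!ii", 0, 0)
trailer = struct.pack("!h", -1)
payload = header + binary_copy_row([42, 7]) + trailer
print(len(payload), "bytes for one two-column row")
```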

The other thought, of course, is that you could use PITR for your
backups instead of pgdump...

    Thanks,

        Stephen


Re: Database restore speed

From
"Luke Lonergan"
Date:
Stephen,

On 12/2/05 12:18 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

> Just a thought, but couldn't psql be made to use the binary mode of
> libpq and do at least some of the conversion on the client side?  Or
> does binary mode not work with copy (that wouldn't surprise me, but
> perhaps copy could be made to support it)?

Yes - I think this idea is implicit in what David suggested, and my response
as well.  The problem is that the way the client does conversions can
potentially differ from the way the backend does.  Some of the types in
Postgres are machine intrinsic and the encoding conversions use on-machine
libraries, each of which precludes the use of client conversion methods
(without a lot of restructuring).  We'd tackled this problem in the past and
concluded that the parse / convert stage really belongs in the backend.

> The other thought, of course, is that you could use PITR for your
> backups instead of pgdump...

Totally - great idea, if this is actually a backup / restore then PITR plus
filesystem copy (tarball) is hugely faster than dump / restore.

- Luke



Re: Database restore speed

From
Stephen Frost
Date:
* Luke Lonergan (llonergan@greenplum.com) wrote:
> On 12/2/05 12:18 PM, "Stephen Frost" <sfrost@snowman.net> wrote:
> > Just a thought, but couldn't psql be made to use the binary mode of
> > libpq and do at least some of the conversion on the client side?  Or
> > does binary mode not work with copy (that wouldn't surprise me, but
> > perhaps copy could be made to support it)?
>
> Yes - I think this idea is implicit in what David suggested, and my response
> as well.  The problem is that the way the client does conversions can
> potentially differ from the way the backend does.  Some of the types in
> Postgres are machine intrinsic and the encoding conversions use on-machine
> libraries, each of which precludes the use of client conversion methods
> (without a lot of restructuring).  We'd tackled this problem in the past and
> concluded that the parse / convert stage really belongs in the backend.

I've used the binary mode stuff before, sure, Postgres may have to
convert some things but I have a hard time believing it'd be more
expensive to do a network_encoding -> host_encoding (or toasting, or
whatever) than to do the ascii -> binary change.

    Thanks,

        Stephen


Re: Database restore speed

From
"Luke Lonergan"
Date:
Stephen,

On 12/2/05 1:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote:

> I've used the binary mode stuff before, sure, Postgres may have to
> convert some things but I have a hard time believing it'd be more
> expensive to do a network_encoding -> host_encoding (or toasting, or
> whatever) than to do the ascii -> binary change.

From a performance standpoint no argument, although you're betting that you
can do parsing / conversion faster than the COPY core in the backend can (I
know *we* can :-).  It's a matter of safety and generality - in general you
can't be sure that client machines / OS'es will render the same conversions
that the backend does in all cases IMO.

- Luke



Re: Database restore speed

From
"Luke Lonergan"
Date:
Stephen,

On 12/2/05 1:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote:
>
>> I've used the binary mode stuff before, sure, Postgres may have to
>> convert some things but I have a hard time believing it'd be more
>> expensive to do a network_encoding -> host_encoding (or toasting, or
>> whatever) than to do the ascii -> binary change.
>
> From a performance standpoint no argument, although you're betting that you
> can do parsing / conversion faster than the COPY core in the backend can (I
> know *we* can :-).  It's a matter of safety and generality - in general you
> can't be sure that client machines / OS'es will render the same conversions
> that the backend does in all cases IMO.

One more thing - this is really about the lack of a cross-platform binary
input standard for Postgres IMO.  If there were such a thing, it *would* be
safe to do this.  The current Binary spec is not cross-platform AFAICS, it
embeds native representations of the DATUMs, and does not specify a
universal binary representation of same.

For instance - when representing a float, is it an IEEE 32-bit floating
point number in little endian byte ordering? Or is it IEEE 64-bit?  With
libpq, we could do something like an XDR implementation, but the machinery
isn't there AFAICS.
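To illustrate, an XDR-style encoding answers those questions by fiat: always IEEE 754, always big endian, width explicit in the format. A sketch (Python's struct notation standing in for the XDR machinery libpq lacks):

```python
import struct

# XDR-style portable floats: always IEEE 754, always big endian, width
# explicit. Two machines that agree on this never have to ask which
# representation is in the bytes - the format fixes the answers.
def xdr_float32(x):
    return struct.pack(">f", x)   # IEEE single, big endian

def xdr_float64(x):
    return struct.pack(">d", x)   # IEEE double, big endian

assert xdr_float32(1.0) == b"\x3f\x80\x00\x00"
assert struct.unpack(">d", xdr_float64(-2.5))[0] == -2.5
```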

- Luke



Re: Database restore speed

From
Michael Stone
Date:
On Fri, Dec 02, 2005 at 01:24:31PM -0800, Luke Lonergan wrote:
>From a performance standpoint no argument, although you're betting that you
>can do parsing / conversion faster than the COPY core in the backend can

Not necessarily; you may be betting that it's more *efficient* to do the
parsing on a bunch of lightly loaded clients than your server. Even if
you're using the same code this may be a big win.

Mike Stone

Re: Database restore speed

From
Tom Lane
Date:
"Luke Lonergan" <llonergan@greenplum.com> writes:
> One more thing - this is really about the lack of a cross-platform binary
> input standard for Postgres IMO.  If there were such a thing, it *would* be
> safe to do this.  The current Binary spec is not cross-platform AFAICS, it
> embeds native representations of the DATUMs, and does not specify a
> universal binary representation of same.

Sure it does ... at least as long as you are willing to assume everybody
uses IEEE floats, and if they don't you have semantic problems
translating float datums anyhow.

What we lack is documentation, more than functionality.

            regards, tom lane

Re: Database restore speed

From
"Luke Lonergan"
Date:
Michael,

On 12/2/05 1:46 PM, "Michael Stone" <mstone+postgres@mathom.us> wrote:

> Not necessarily; you may be betting that it's more *efficient* to do the
> parsing on a bunch of lightly loaded clients than your server. Even if
> you're using the same code this may be a big win.

If it were possible in light of the issues on client parse / convert, then
we should analyze whether it's a performance win.

In the restore case, where we've got a dedicated server with a dedicated
client machine, I don't see why there would be a speed benefit from running
the same parse / convert code on the client versus running it on the server.
Imagine a pipeline where there is a bottleneck; moving the bottleneck to a
different machine doesn't make it less of a bottleneck.

- Luke



Re: Database restore speed

From
Mitch Skinner
Date:
On Fri, 2005-12-02 at 13:24 -0800, Luke Lonergan wrote:
> It's a matter of safety and generality - in general you
> can't be sure that client machines / OS'es will render the same conversions
> that the backend does in all cases IMO.

Can't binary values be safely sent cross-platform in DataRow
messages?  At least from my ignorant, cursory look at printtup.c,
there's a binary format code path.  float4send in utils/adt/float.c uses
pq_sendfloat4.  I obviously haven't followed the entire rabbit trail,
but it seems like it happens.

IOW, why isn't there a cross-platform issue when sending binary data
from the backend to the client in query results?  And if there isn't a
problem there, why can't binary data be sent from the client to the
backend?

Mitch

Re: Database restore speed

From
"Luke Lonergan"
Date:
And how do we compose the binary data on the client?  Do we trust that the
client encoding conversion logic is identical to the backend's?  If there is
a difference, what happens if the same file loaded from different client
machines has different results?  Key conflicts when loading a restore from
one machine and not from another?
 
- Luke


Re: Database restore speed

From
David Lang
Date:
On Fri, 2 Dec 2005, Luke Lonergan wrote:

> Stephen,
>
> On 12/2/05 12:18 PM, "Stephen Frost" <sfrost@snowman.net> wrote:
>
>> Just a thought, but couldn't psql be made to use the binary mode of
>> libpq and do at least some of the conversion on the client side?  Or
>> does binary mode not work with copy (that wouldn't surprise me, but
>> perhaps copy could be made to support it)?
>
> Yes - I think this idea is implicit in what David suggested, and my response
> as well.  The problem is that the way the client does conversions can
> potentially differ from the way the backend does.  Some of the types in
> Postgres are machine intrinsic and the encoding conversions use on-machine
> libraries, each of which precludes the use of client conversion methods
> (without a lot of restructuring).  We'd tackled this problem in the past and
> concluded that the parse / convert stage really belongs in the backend.

I'll bet this parsing cost varies greatly with the data types used. I'm
also willing to bet that for the data types that have different encodings on
different systems there could be an intermediate encoding that is far
faster to parse than ASCII text is.

for example (and I know nothing about the data storage itself, so this is
just an example), if the issue was storing numeric values on big-endian
and little-endian systems (and 32-bit vs 64-bit systems, to end up with 4
ways of holding the data), you have a substantial cost in parsing the
ASCII and converting it to a binary value, but the client can't (and
shouldn't) know which endian type and word size the server is. But it
could create a big-endian multi-precision encoding that would then be very
cheap for the server to split and flip as necessary. Yes, this means more
work is done overall, but it's split between different machines, and the
binary representation of the data will probably reduce your network
traffic as a side effect.
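a sketch of what I mean (a hypothetical encoding, not any existing wire
format): the client always emits fixed-width big-endian integers, and the
server's decode is at worst a byte swap, far cheaper than parsing ASCII
digits:

```python
import struct

# Hypothetical intermediate encoding: the client always emits a
# fixed-width big-endian integer, whatever its own byte order is.
def encode_int64_be(v):
    return struct.pack(">q", v)

# The server's job is then a straight copy (big-endian host) or a byte
# swap (little-endian host); struct performs whichever the local machine
# needs. Either way it's far cheaper than parsing ASCII digits.
def decode_int64(buf):
    return struct.unpack(">q", buf)[0]

assert decode_int64(encode_int64_be(-12345)) == -12345
```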

and for things like dates, which get parsed in multiple ways until one is
found that seems sane, there's a significant amount of work that the
server could avoid.

David Lang

>> The other thought, of course, is that you could use PITR for your
>> backups instead of pgdump...
>
> Totally - great idea, if this is actually a backup / restore then PITR plus
> filesystem copy (tarball) is hugely faster than dump / restore.
>
> - Luke
>
>
>

Re: Database restore speed

From
David Lang
Date:
On Fri, 2 Dec 2005, Michael Stone wrote:

> On Fri, Dec 02, 2005 at 01:24:31PM -0800, Luke Lonergan wrote:
>> From a performance standpoint no argument, although you're betting that you
>> can do parsing / conversion faster than the COPY core in the backend can
>
> Not necessarily; you may be betting that it's more *efficient* to do the
> parsing on a bunch of lightly loaded clients than your server. Even if
> you're using the same code this may be a big win.

it's a lot easier to throw hardware at the problem by splitting your
incoming data between multiple machines and having them all work in
parallel throwing the data at one database than it is to throw more
hardware at the database server to speed it up (and yes, assuming that MPP
splits the parsing costs as well, it can be an answer for some types of
systems)

David Lang

Re: Database restore speed

From
David Lang
Date:
On Fri, 2 Dec 2005, Luke Lonergan wrote:

> Stephen,
>
> On 12/2/05 1:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote:
>>
>>> I've used the binary mode stuff before, sure, Postgres may have to
>>> convert some things but I have a hard time believing it'd be more
>>> expensive to do a network_encoding -> host_encoding (or toasting, or
>>> whatever) than to do the ascii -> binary change.
>>
>> From a performance standpoint no argument, although you're betting that you
>> can do parsing / conversion faster than the COPY core in the backend can (I
>> know *we* can :-).  It's a matter of safety and generality - in general you
>> can't be sure that client machines / OS'es will render the same conversions
>> that the backend does in all cases IMO.
>
> One more thing - this is really about the lack of a cross-platform binary
> input standard for Postgres IMO.  If there were such a thing, it *would* be
> safe to do this.  The current Binary spec is not cross-platform AFAICS, it
> embeds native representations of the DATUMs, and does not specify a
> universal binary representation of same.
>
> For instance - when representing a float, is it an IEEE 32-bit floating
> point number in little endian byte ordering? Or is it IEEE 64-bit?  With
> libpq, we could do something like an XDR implementation, but the machinery
> isn't there AFAICS.

This makes sense; however, it then raises the question of how much effort
it would take to define such a standard and implement the shim layer
needed to accept the connections, vs how much of a speedup it would result
in (the gain could probably be approximated with just a little hacking to
use the existing binary format between two machines of the same type)

as for the standards, standard network byte order is big endian, so that
should be the standard used (in spite of the quantity of x86 machines out
there). for the size of the data elements, using the largest size of each
will probably still be a win in size compared to ASCII. converting between
binary formats is usually a matter of a few AND and shift opcodes (and
with the core so much faster than its memory, you can afford to do quite a
few of these on each chunk of data without it being measurable in your
overall time)

an alternative would be to add a 1-byte type tag before each data element
to specify its type, but then the server-side code would have to be
smarter to deal with the additional possibilities.
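a sketch of the type-tag idea (the tag values are invented for
illustration; nothing in postgres uses these):

```python
import struct

# Hypothetical self-describing element: a 1-byte type tag before each
# datum, so the server can dispatch on the tag and reject anything it
# doesn't understand. Tag values are invented for illustration.
TAG_INT64, TAG_FLOAT64 = 0x01, 0x02

def tag_encode(value):
    if isinstance(value, int):
        return bytes([TAG_INT64]) + struct.pack(">q", value)
    return bytes([TAG_FLOAT64]) + struct.pack(">d", value)

def tag_decode(buf):
    tag = buf[0]
    if tag == TAG_INT64:
        return struct.unpack(">q", buf[1:9])[0]
    if tag == TAG_FLOAT64:
        return struct.unpack(">d", buf[1:9])[0]
    raise ValueError("unknown tag - reject the message")

assert tag_decode(tag_encode(42)) == 42
assert tag_decode(tag_encode(2.5)) == 2.5
```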

David Lang

Re: Database restore speed

From
David Lang
Date:
On Fri, 2 Dec 2005, Luke Lonergan wrote:

> Michael,
>
> On 12/2/05 1:46 PM, "Michael Stone" <mstone+postgres@mathom.us> wrote:
>
>> Not necessarily; you may be betting that it's more *efficient* to do the
>> parsing on a bunch of lightly loaded clients than your server. Even if
>> you're using the same code this may be a big win.
>
> If it were possible in light of the issues on client parse / convert, then
> we should analyze whether it's a performance win.
>
> In the restore case, where we've got a dedicated server with a dedicated
> client machine, I don't see why there would be a speed benefit from running
> the same parse / convert code on the client versus running it on the server.
> Imagine a pipeline where there is a bottleneck; moving the bottleneck to a
> different machine doesn't make it less of a bottleneck.

your database server needs to use its CPU for
other things besides the parsing. you could buy a bigger machine, but
it's usually far cheaper to buy two dual-proc machines than one
quad-proc machine (and if your load is such that you already have an
8-proc machine as the database, swallow hard when you ask for the price
of a 16-proc machine), and in addition there is a substantial efficiency
loss in multi-proc machines (some software, some hardware) that may give
you more available work cycles on the multiple small machines.

if you can remove almost all the parsing load (CPU cycles, memory
footprint, and cache-thrashing effects) then that box can do the rest of
its stuff more efficiently. meanwhile the client can use what would
otherwise be idle CPU to do the parsing.

if you only have a 1-1 relationship it's a good question as to whether
it's a win (it depends on how much other stuff each box is having to do
to support this), but if you allow for multiple clients it easily becomes
a win.

David Lang

Re: Database restore speed

From
David Lang
Date:
On Fri, 2 Dec 2005, Luke Lonergan wrote:

> And how do we compose the binary data on the client?  Do we trust that
> the client encoding conversion logic is identical to the backend's?  If
> there is a difference, what happens if the same file loaded from
> different client machines has different results?  Key conflicts when
> loading a restore from one machine and not from another? - Luke

the same way you deal with text data that could be in different encodings:
you tag your message with the format version you are using and throw an
error if you get a format you don't understand how to deal with.

if a client claims to be using one format but is instead doing something
different, you will be in deep trouble anyway.

remember, we aren't talking about random application code here, we are
talking about postgres client code and libraries. if the library is
incorrect then it's a bug; parsing bugs could happen in the server as
well. (in fact, the server could parse things to the intermediate format
and then convert them; this sounds expensive, but given the high clock
multipliers in use, it may not end up being measurable)

David Lang

Re: Database restore speed

From
Simon Riggs
Date:
On Fri, 2005-12-02 at 15:18 -0500, Stephen Frost wrote:

> The other thought, of course, is that you could use PITR for your
> backups instead of pgdump...

Yes, it is much faster that way.

Over on -hackers a few optimizations of COPY have been discussed; one of
those is to optimize COPY when it is loading into a table created within
the same transaction as the COPY. This would allow pg_dumps to be
restored much faster, since no WAL need be written in this case.
I hope to work on this fairly soon.

Dumping/restoring data with pg_dump has wider uses than backup for data
protection.

Best Regards, Simon Riggs


Re: Database restore speed

From
"Luke Lonergan"
Date:
Tom,

On 12/2/05 3:00 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> Sure it does ... at least as long as you are willing to assume everybody
> uses IEEE floats, and if they don't you have semantic problems
> translating float datums anyhow.
>
> What we lack is documentation, more than functionality.

Cool - sounds like the transport part might be there - the thing we desire
is a file format that allows for efficient representation of portable binary
datums.

Last I looked at the Postgres binary dump format, it was not portable or
efficient enough to suit the need.  The efficiency problem with it was that
there was descriptive information attached to each individual data item, as
compared to the approach where that information is specified once for the
data group as a template for input.

Oracle's format allows for the expression of fixed width fields within the
input file, and specifies the data type of the fields in the metadata.  We
could choose to support exactly the specification of the SQL*Loader format,
which would certainly be general enough, and would have the advantage of
providing a compatibility option with Oracle SQL*Loader input.

Note that Oracle does not provide a similar functionality for the expression
of *output* files, those that can be dumped from an Oracle database.  Their
mechanism for database dump is the exp/imp utility pair, and it is a
proprietary "shifting sands" specification AFAIK.  This limits the benefit
of implementing the Oracle SQL*Loader compatibility to those customers who
have designed utilities to emit that format, which may still be valuable.

The alternative is to design a Postgres portable binary input file format.
I'd like to see a record oriented format like that of FORTRAN unformatted,
which uses bookends around each record to identify the length of each
record.  This allows for fast record oriented positioning within the file,
and provides some self-description for integrity checking, etc.
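A sketch of that bookend framing - each record carries its length at both
ends, so a reader can walk the file forward or backward and sanity-check
every record (illustrative, not a proposed spec):

```python
import struct

def write_record(payload):
    # FORTRAN-unformatted-style framing: 4-byte length, data, then the
    # 4-byte length again. The trailing copy lets a reader walk the file
    # backward and doubles as a cheap integrity check.
    n = struct.pack(">I", len(payload))
    return n + payload + n

def read_record(buf, offset=0):
    (n,) = struct.unpack_from(">I", buf, offset)
    data = buf[offset + 4 : offset + 4 + n]
    (n2,) = struct.unpack_from(">I", buf, offset + 4 + n)
    assert n == n2, "corrupt record bookends"
    return data, offset + 8 + n            # record plus next offset

blob = write_record(b"row one") + write_record(b"row two")
rec, pos = read_record(blob)
assert rec == b"row one"
rec, pos = read_record(blob, pos)
assert rec == b"row two"
```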

- Luke



Re: Database restore speed

From
Tom Lane
Date:
"Luke Lonergan" <llonergan@greenplum.com> writes:
> Last I looked at the Postgres binary dump format, it was not portable or
> efficient enough to suit the need.  The efficiency problem with it was that
> there was descriptive information attached to each individual data item, as
> compared to the approach where that information is specified once for the
> data group as a template for input.

Are you complaining about the length words?  Get real...

            regards, tom lane

Re: Database restore speed

From
"Luke Lonergan"
Date:
Tom,

On 12/3/05 12:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Luke Lonergan" <llonergan@greenplum.com> writes:
>> Last I looked at the Postgres binary dump format, it was not portable or
>> efficient enough to suit the need.  The efficiency problem with it was that
>> there was descriptive information attached to each individual data item, as
>> compared to the approach where that information is specified once for the
>> data group as a template for input.
>
> Are you complaining about the length words?  Get real...

Hmm - "<sizeof int><int>" repeat, efficiency is 1/2 of "<int>" repeat.  I
think that's worth complaining about.

- Luke



Re: Database restore speed

From
Mitch Skinner
Date:
On Fri, 2005-12-02 at 23:03 -0500, Luke Lonergan wrote:
> And how do we compose the binary data on the client?  Do we trust that the
> client encoding conversion logic is identical to the backend's?

Well, my newbieness is undoubtedly showing already, so I might as well
continue with my line of dumb questions.  I did a little mail archive
searching, but had a hard time coming up with unique query terms.

This is a slight digression, but my question about binary format query
results wasn't rhetorical.  Do I have to worry about different platforms
when I'm getting binary RowData(s) back from the server?  Or when I'm
sending binary bind messages?

Regarding whether or not the client has identical encoding/conversion
logic, how about a fast path that starts out by checking for
compatibility?  In addition to a BOM, you could add a "float format
mark" that was an array of things like +0.0, -0.0, min, max, +Inf, -Inf,
NaN, etc.
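Here's roughly what I have in mind for the float format mark (NaN left out
of this sketch so plain equality works):

```python
import math
import struct

# A probe array of distinguished float values (the "float format mark").
# If the peer's bytes for these decode back to the same values, its float
# handling is compatible; otherwise fall back to the text path. NaN is
# left out so plain equality suffices; copysign catches -0.0 vs +0.0.
PROBE = [0.0, -0.0, 1.0, -1.0, float("inf"), float("-inf")]

def float_mark():
    return b"".join(struct.pack(">d", v) for v in PROBE)

def compatible(mark):
    vals = [struct.unpack(">d", mark[i:i + 8])[0]
            for i in range(0, len(mark), 8)]
    return len(vals) == len(PROBE) and all(
        a == b and math.copysign(1.0, a) == math.copysign(1.0, b)
        for a, b in zip(vals, PROBE))

assert compatible(float_mark())
```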

It looks like XDR specifies byte order for floats and otherwise punts to
IEEE.  I have no experience with SQL*Loader, but a quick read of the
docs appears to divide data types into "portable" and "nonportable"
groups, where loading nonportable data types requires extra care.

This may be overkill, but have you looked at HDF5?  Only one hit came up
in the mail archives.
http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.format.html
For (e.g.) floats, the format includes metadata that specifies byte
order, padding, normalization, the location of the sign, exponent, and
mantissa, and the size of the exponent and mantissa.  The format appears
not to require length information on a per-datum basis.  A cursory look
at the data format page gives me the impression that there's a useful
streamable subset.  The license of the implementation is BSD-style (no
advertising clause), and it appears to support a large variety of
platforms.  Currently, the format spec only mentions ASCII, but since
the library doesn't do any actual string manipulation (just storage and
retrieval, AFAICS) it may be UTF-8 clean.

Mitch

Re: Database restore speed

From
David Lang
Date:
On Sat, 3 Dec 2005, Luke Lonergan wrote:

> Tom,
>
> On 12/3/05 12:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
>> "Luke Lonergan" <llonergan@greenplum.com> writes:
>>> Last I looked at the Postgres binary dump format, it was not portable or
>>> efficient enough to suit the need.  The efficiency problem with it was that
>>> there was descriptive information attached to each individual data item, as
>>> compared to the approach where that information is specified once for the
>>> data group as a template for input.
>>
>> Are you complaining about the length words?  Get real...
>
> Hmm - "<sizeof int><int>" repeat, efficiency is 1/2 of "<int>" repeat.  I
> think that's worth complaining about.

but how does it compare to the ASCII representation of that int? (remember
to include your separator characters as well)

yes it seems less efficient, and it may be better to do something like
sending a record description header that gives the sizes of each item and
then sending the records following that without the size items, but either
way should still be an advantage over the existing ASCII messages.
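a quick size check of the three layouts under discussion (ASCII with
separators, per-datum length words, one-time record header) for a hundred
4-digit ints - note that per-datum length words actually lose to ASCII
here, which is exactly the comparison I'm asking about:

```python
values = list(range(1000, 1100))   # a hundred 4-digit integers

# ASCII: digits plus a separator per value (tab-delimited, newline at end)
ascii_bytes = len("\t".join(str(v) for v in values) + "\n")
# per-datum length words: 4-byte length + 4-byte int4 for every value
per_datum = sum(4 + 4 for _ in values)
# record description header once (a 4-byte count), then bare int4s
header_once = 4 + 4 * len(values)

print(ascii_bytes, per_datum, header_once)
```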

also, how large is the <sizeof int> in the message?

there are other optimizations that can be done as well, but if there's
still a question about whether it's worth it to do the parsing on the
client, then a first implementation should be done without making too
many changes, to test things.

also, some of the optimizations need measurements done to see if they
are worth it (even something that seems as obvious as separating the
sizeof from the data itself, as you suggest above, has a penalty: it
spreads the data that needs to be accessed to process a line across
different cache lines, so in some cases it won't be worth it)

David Lang