Re: Database restore speed - Mailing list pgsql-performance

From David Lang
Subject Re: Database restore speed
Date
Msg-id Pine.LNX.4.62.0512030112330.2807@qnivq.ynat.uz
In response to Re: Database restore speed  ("Luke Lonergan" <llonergan@greenplum.com>)
List pgsql-performance
On Fri, 2 Dec 2005, Luke Lonergan wrote:

> Stephen,
>
> On 12/2/05 1:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote:
>>
>>> I've used the binary mode stuff before, sure, Postgres may have to
>>> convert some things but I have a hard time believing it'd be more
>>> expensive to do a network_encoding -> host_encoding (or toasting, or
>>> whatever) than to do the ascii -> binary change.
>>
>> From a performance standpoint no argument, although you're betting that you
>> can do parsing / conversion faster than the COPY core in the backend can (I
>> know *we* can :-).  It's a matter of safety and generality - in general you
>> can't be sure that client machines / OS'es will render the same conversions
>> that the backend does in all cases IMO.
>
> One more thing - this is really about the lack of a cross-platform binary
> input standard for Postgres IMO.  If there were such a thing, it *would* be
> safe to do this.  The current Binary spec is not cross-platform AFAICS, it
> embeds native representations of the DATUMs, and does not specify a
> universal binary representation of same.
>
> For instance - when representing a float, is it an IEEE 32-bit floating
> point number in little endian byte ordering? Or is it IEEE 64-bit?  With
> libpq, we could do something like an XDR implementation, but the machinery
> isn't there AFAICS.

This makes sense; however, it then raises the question of how much effort
it would take to define such a standard and implement the shim layer
needed to accept the connections, versus how much of a speedup it would
produce (the gain could probably be approximated with just a little hacking
to use the existing binary format between two machines of the same type, as
in the sketch below).
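A minimal, untested sketch of that "little hacking": stream the existing
binary COPY format straight from one backend into another of the same
architecture using libpq, and time it against the plain-text path. The
table name "t" and the connection strings are made up, and error handling
is mostly omitted.

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn   *src = PQconnectdb("host=machine_a dbname=test");  /* hypothetical */
        PGconn   *dst = PQconnectdb("host=machine_b dbname=test");  /* hypothetical */
        PGresult *r;
        char     *buf;
        int       len;

        if (PQstatus(src) != CONNECTION_OK || PQstatus(dst) != CONNECTION_OK)
            return 1;

        r = PQexec(src, "COPY t TO STDOUT BINARY");
        PQclear(r);
        r = PQexec(dst, "COPY t FROM STDIN BINARY");
        PQclear(r);

        /*
         * Pass the binary rows through untouched; both machines must be the
         * same architecture for this to be safe with the current format.
         */
        while ((len = PQgetCopyData(src, &buf, 0)) > 0)
        {
            PQputCopyData(dst, buf, len);
            PQfreemem(buf);
        }
        PQputCopyEnd(dst, NULL);
        PQclear(PQgetResult(dst));

        PQfinish(src);
        PQfinish(dst);
        return 0;
    }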

As for the standards, standard network byte order is big-endian, so that
should be what's used (in spite of the quantity of x86 machines out there).
For the size of the data elements, using the largest size for each will
probably still be a win compared to ASCII. Converting between binary
formats is usually a matter of a few AND and shift opcodes (and with the
CPU core so much faster than its memory, you can afford to do quite a few
of these on each chunk of data without it being measurable in the overall
time).
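For illustration, here is roughly what "a few AND and shift opcodes" per
field looks like: decoding a big-endian (network order) 64-bit value from a
byte buffer into the host's native representation. Plain portable C, no
Postgres internals involved.

    #include <stdint.h>
    #include <string.h>

    static uint64_t decode_be64(const unsigned char *buf)
    {
        uint64_t v = 0;
        int      i;

        /*
         * Each iteration is a shift and an OR; compilers typically turn the
         * whole loop into a handful of instructions (or a single byte swap).
         */
        for (i = 0; i < 8; i++)
            v = (v << 8) | buf[i];
        return v;
    }

    /*
     * An IEEE-754 double sent in network byte order can be rebuilt the same
     * way: decode the 8 bytes, then reinterpret the bit pattern (this
     * assumes the host itself uses IEEE-754 doubles).
     */
    static double decode_be_double(const unsigned char *buf)
    {
        uint64_t bits = decode_be64(buf);
        double   d;

        memcpy(&d, &bits, sizeof(d));
        return d;
    }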

An alternative would be to add a 1-byte type tag before each data element
to specify its type, but then the server-side code would have to be
smarter to deal with the additional possibilities.
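A hedged sketch of that tagging idea (the tag values and layout are
hypothetical, not an existing Postgres format): each element on the wire is
a tag byte, a length, and the payload, and the server has to switch on the
tag for every element it receives, which is the extra complexity mentioned
above.

    #include <stdint.h>

    enum wire_type
    {
        WIRE_INT64   = 1,   /* 8-byte big-endian integer */
        WIRE_FLOAT64 = 2,   /* 8-byte IEEE-754 double, big-endian */
        WIRE_TEXT    = 3    /* length-prefixed byte string */
    };

    /* One element as it appears in the stream: tag, payload length, payload. */
    struct wire_element
    {
        uint8_t  tag;       /* one of enum wire_type */
        uint32_t len;       /* payload length in bytes, big-endian on the wire */
        /* payload bytes follow */
    };

    /* The receiving side must handle every tag it might see. */
    static const char *wire_type_name(uint8_t tag)
    {
        switch (tag)
        {
            case WIRE_INT64:   return "int64";
            case WIRE_FLOAT64: return "float64";
            case WIRE_TEXT:    return "text";
            default:           return "unknown";   /* reject or skip */
        }
    }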

David Lang
