On Sat, 3 Dec 2005, Luke Lonergan wrote:
> Tom,
>
> On 12/3/05 12:32 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
>> "Luke Lonergan" <llonergan@greenplum.com> writes:
>>> Last I looked at the Postgres binary dump format, it was not portable or
>>> efficient enough to suit the need. The efficiency problem with it was that
>>> there was descriptive information attached to each individual data item, as
>>> compared to the approach where that information is specified once for the
>>> data group as a template for input.
>>
>> Are you complaining about the length words? Get real...
>
> Hmm - "<sizeof int><int>" repeat, efficiency is 1/2 of "<int>" repeat. I
> think that's worth complaining about.
but how does it compare to the ASCII representation of that int? (remember
to include your separator characters as well)
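
A rough way to put numbers on that comparison is sketched below. It assumes a 4-byte length word, a 4-byte int, and a single-byte separator in the text form; all three are assumptions for illustration (the function names are made up as well), not a description of the actual wire format.

```python
def text_size(values, sep=b"\t"):
    """Bytes needed for ASCII decimal digits plus one separator per value."""
    return sum(len(str(v).encode("ascii")) + len(sep) for v in values)


def length_prefixed_size(values, len_word_bytes=4, int_bytes=4):
    """Bytes needed when every int is preceded by its own length word."""
    return len(values) * (len_word_bytes + int_bytes)


def raw_binary_size(values, int_bytes=4):
    """Bytes needed when the ints are sent back to back with no length word."""
    return len(values) * int_bytes


if __name__ == "__main__":
    sample = [7, 1234, 2147483647]
    print("text:           ", text_size(sample))
    print("length-prefixed:", length_prefixed_size(sample))
    print("raw binary:     ", raw_binary_size(sample))
```

Under these assumptions the answer depends on the data: small values are shorter as text, large values are shorter in binary, so byte counts alone don't settle it and the parsing cost has to be measured too.
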
yes, it seems less efficient, and it may be better to send a record
description header that gives the size of each item, and then send the
records after it without the per-item sizes, but either way it should
still be an advantage over the existing ASCII messages.
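
The header-then-records idea could look something like the sketch below. The layout (a 2-byte field count, one 2-byte size per field, then fixed-width big-endian signed integers) and the names encode_with_header / decode_with_header are hypothetical, picked only to illustrate the approach; variable-length items would still need their own lengths.

```python
import struct


def encode_with_header(rows, field_sizes):
    """Write the field sizes once, then the records with no per-item sizes."""
    # Header: field count, followed by the size in bytes of each field.
    header = struct.pack(f"!H{len(field_sizes)}H", len(field_sizes), *field_sizes)
    body = b"".join(
        value.to_bytes(size, "big", signed=True)
        for row in rows
        for value, size in zip(row, field_sizes)
    )
    return header + body


def decode_with_header(buf):
    """Read the header once and use it as the template for every record."""
    (nfields,) = struct.unpack_from("!H", buf, 0)
    field_sizes = struct.unpack_from(f"!{nfields}H", buf, 2)
    offset = 2 + 2 * nfields
    rows = []
    while offset < len(buf):
        row = []
        for size in field_sizes:
            chunk = buf[offset:offset + size]
            row.append(int.from_bytes(chunk, "big", signed=True))
            offset += size
        rows.append(tuple(row))
    return rows
```

With two fields of 4 and 2 bytes, the header costs 6 bytes once, and each record is 6 bytes instead of the 14 it would take with a 4-byte length word in front of every item.
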
also, how large is the <sizeof int> in the message?
there are other optimizations that can be done as well, but if there's
still a question about whether it's worth doing the parsing on the client,
then a first implementation should be done without making too many changes,
so that things can be tested.
also, some of the optimizations need to have measurements done to see if
they are worth it (even something that seems as obvious as separating the
sizeof from the data itself, as you suggest above, has a penalty: it
spreads the data that needs to be accessed to process a line across
different cache lines, so in some cases it won't be worth it)
David Lang