Thread: Streaming solution and v3.1 protocol

Streaming solution and v3.1 protocol

From:
Radosław Smogura
Date:
Hello,

Sorry for the short introduction to this, and please, as far as possible,
keep it separate from LOBs, as it sits on top of LOBs.

The idea of streaming is to reduce memory copying, mainly while receiving and
sending tuples. Currently, receive works as follows:
1. Read the bytes of the tuple (allocate x memory).
2. Possibly convert them to the database encoding.
3. Use this data to create a datum (which is a slightly changed copy of 1 or 2).
Streaming will be allowed only in binary mode, and the stream in/out functions
will actually return binary data.

Look, for example, at the execution chain from textrecv.

The idea is to add stream_in and stream_out columns to pg_type.

When a value is to be serialized, the sin/sout function is called. The caller
must pass the length of the data and a stream struct (similar to C's FILE*).

The caller should validate that all bytes have been consumed (simple methods
will be exposed for this).
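A minimal sketch of what such a FILE*-like stream handle and the "all bytes consumed" check might look like. All names here are illustrative assumptions, not actual PostgreSQL API; a real implementation would wrap the socket rather than a memory buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stream handle passed to a stream_in function,
 * analogous to C's FILE*. */
typedef struct PGStream
{
    const uint8_t *data;  /* backing buffer in this toy version */
    size_t limit;         /* declared length of the value */
    size_t consumed;      /* bytes handed to the reader so far */
} PGStream;

/* Read up to len bytes, never past the declared value length. */
size_t
pgstream_read(PGStream *s, void *buf, size_t len)
{
    size_t avail = s->limit - s->consumed;

    if (len > avail)
        len = avail;
    memcpy(buf, s->data + s->consumed, len);
    s->consumed += len;
    return len;
}

/* The check the caller performs after stream_in returns: did the
 * type's input function consume every byte of the value? */
int
pgstream_fully_consumed(const PGStream *s)
{
    return s->consumed == s->limit;
}
```

The caller-side check makes it cheap to detect a misbehaving stream_in function that leaves unread bytes in the middle of a message.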

To implement (code API requirements):
The first stream is a buffered socket reader.

Fast substreams - create a fast stream limited to x bytes on top of another stream.

Skipping bytes + skipAll()

Stream filtering - do fast encoding conversion (the conversion will be faster
if it occurs in buffered chunks).
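The substream and skipping primitives above could be sketched as follows; this is a toy in-memory version under assumed names, only to show the intended semantics (a substream is a zero-copy view, and skipping just advances a position):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy parent stream over a memory buffer (illustrative only). */
typedef struct ByteStream
{
    const uint8_t *data;
    size_t len;
    size_t pos;
} ByteStream;

/* "Fast substream": a view exposing at most `limit` bytes of the
 * parent, starting at its current position.  No data is copied;
 * the parent's position advances past the view. */
ByteStream
substream(ByteStream *parent, size_t limit)
{
    ByteStream sub;
    size_t avail = parent->len - parent->pos;

    sub.data = parent->data + parent->pos;
    sub.len = limit < avail ? limit : avail;
    sub.pos = 0;
    parent->pos += sub.len;
    return sub;
}

/* Skip up to n bytes; returns how many were actually skipped. */
size_t
stream_skip(ByteStream *s, size_t n)
{
    size_t avail = s->len - s->pos;

    if (n > avail)
        n = avail;
    s->pos += n;
    return n;
}

/* skipAll(): discard everything that remains in the stream. */
size_t
stream_skip_all(ByteStream *s)
{
    return stream_skip(s, s->len - s->pos);
}
```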

Support for a direct PG printf() version. Linux has the ability to create
cookie streams and use them with fprintf(), so it's a great advantage for
formatting huge strings. Other systems should buffer the output. The problem
is: if a Linux cookie stream fails, will it already have written something to
the output? A Windows proxy will push the value to a temporary buffer.
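For reference, the Linux (glibc) facility meant here is fopencookie(), which lets fprintf() write straight into a custom sink. The sketch below uses a byte-counting sink as a stand-in for the client socket; the Sink type and names are illustrative assumptions:

```c
#define _GNU_SOURCE           /* fopencookie() is a glibc extension */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Stand-in for the client socket: counts the bytes "sent" and
 * remembers the last chunk, so the demo is checkable. */
typedef struct
{
    size_t total;
    char last[64];
} Sink;

static ssize_t
sink_write(void *cookie, const char *buf, size_t size)
{
    Sink *s = cookie;
    size_t n = size < sizeof(s->last) - 1 ? size : sizeof(s->last) - 1;

    memcpy(s->last, buf, n);
    s->last[n] = '\0';
    s->total += size;
    return (ssize_t) size;    /* a partial/failed write here is exactly
                               * the open question raised above */
}

/* Wrap a sink in a stdio stream usable with fprintf(). */
FILE *
open_sink_stream(Sink *s)
{
    cookie_io_functions_t io = { .write = sink_write };

    return fopencookie(s, "w", io);
}
```

Because stdio buffers internally, the cookie write function may only be called on flush or fclose(), which is why a failure can surface after fprintf() has already "succeeded".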

It may be a good idea to introduce a new version of the protocol, reserving
the length field values:
(-2) for fixed-size streams above 4GB
(-3) for chunked streaming - this is actually innovative functionality and is
not required by any driver.
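Concretely, the proposal extends the interpretation of the 32-bit length field in a DataRow field value. A small sketch of the resulting decoding, where -1 keeps its v3.0 meaning of SQL NULL; note that -2 and -3 are only the proposal above, not part of any released protocol version:

```c
#include <assert.h>
#include <stdint.h>

/* Proposed interpretation of the per-field length word. */
typedef enum
{
    FIELD_NULL,           /* -1: SQL NULL, as in protocol v3.0      */
    FIELD_INLINE,         /* >= 0: value bytes follow inline        */
    FIELD_FIXED_STREAM,   /* -2: fixed-size stream (e.g. above 4GB);
                           * presumably an int64 length would follow */
    FIELD_CHUNKED_STREAM, /* -3: chunked stream of unknown size     */
    FIELD_INVALID
} FieldKind;

FieldKind
classify_field_len(int32_t len)
{
    if (len >= 0)
        return FIELD_INLINE;
    if (len == -1)
        return FIELD_NULL;
    if (len == -2)
        return FIELD_FIXED_STREAM;
    if (len == -3)
        return FIELD_CHUNKED_STREAM;
    return FIELD_INVALID;
}
```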

With streaming, it is imaginable that the socket's fd would be passed to the
sin functions.

Problems: during output, something fails while writing. Resolution: add some
control flags for every n bytes sent to the client. This prevents sending
e.g. 4GB of data when the first byte has already failed; you send only n bytes
and then an abort is received - or send the data in frames.
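The frame-based variant could look like the following sketch: the value goes out in length-prefixed frames and a terminator frame ends it, so a sender that hits an error mid-value can stop after the current frame instead of having committed the whole 4GB. The wire format, sizes, and names are illustrative assumptions, not an actual protocol:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { FRAME_MAX = 4 };   /* tiny chunk size to keep the demo small */

/* Stand-in for the client connection's output buffer. */
typedef struct
{
    uint8_t buf[256];
    size_t len;
} Wire;

/* Append a big-endian 32-bit frame-length prefix. */
static void
put_i32(Wire *w, int32_t v)
{
    w->buf[w->len++] = (uint8_t) (v >> 24);
    w->buf[w->len++] = (uint8_t) (v >> 16);
    w->buf[w->len++] = (uint8_t) (v >> 8);
    w->buf[w->len++] = (uint8_t) v;
}

/* Send `len` bytes as length-prefixed frames, followed by a
 * zero-length terminator frame.  Returns the number of data
 * frames emitted. */
int
send_in_frames(Wire *w, const uint8_t *data, size_t len)
{
    int frames = 0;

    while (len > 0)
    {
        size_t n = len < FRAME_MAX ? len : FRAME_MAX;

        put_i32(w, (int32_t) n);
        memcpy(w->buf + w->len, data, n);
        w->len += n;
        data += n;
        len -= n;
        frames++;
    }
    put_i32(w, 0);        /* terminator: value complete */
    return frames;
}
```

On error, the sender would emit a distinguished abort frame (for instance a negative length) instead of the terminator, so the receiver knows the value is incomplete.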

Regards,
Radek


Re: Streaming solution and v3.1 protocol

From:
Heikki Linnakangas
Date:
On 03.06.2011 19:19, Radosław Smogura wrote:
> Hello,
>
> Sorry for the short introduction to this, and please, as far as possible,
> keep it separate from LOBs, as it sits on top of LOBs.
>
> The idea of streaming is to reduce memory copying, mainly while receiving
> and sending tuples. Currently, receive works as follows:
> 1. Read the bytes of the tuple (allocate x memory).
> 2. Possibly convert them to the database encoding.
> 3. Use this data to create a datum (which is a slightly changed copy of 1 or 2).
> Streaming will be allowed only in binary mode, and the stream in/out
> functions will actually return binary data.

Hmm, I was thinking that streaming would be a whole new mode, alongside 
the current text and binary mode.

> Look, for example, at the execution chain from textrecv.
>
> The idea is to add stream_in and stream_out columns to pg_type.
>
> When a value is to be serialized, the sin/sout function is called. The caller
> must pass the length of the data and a stream struct (similar to C's FILE*).
>
> The caller should validate that all bytes have been consumed (simple methods
> will be exposed for this).
>
> To implement (code API requirements):
> The first stream is a buffered socket reader.
>
> Fast substreams - create a fast stream limited to x bytes on top of another
> stream.
>
> Skipping bytes + skipAll()
>
> Stream filtering - do fast encoding conversion (the conversion will be
> faster if it occurs in buffered chunks).
>
> Support for a direct PG printf() version. Linux has the ability to create
> cookie streams and use them with fprintf(), so it's a great advantage for
> formatting huge strings. Other systems should buffer the output. The problem
> is: if a Linux cookie stream fails, will it already have written something
> to the output? A Windows proxy will push the value to a temporary buffer.

This is pretty low-level stuff; I think we should focus on the protocol 
changes and the user-visible libpq API first.

However, we don't want to use anything Linux-specific here, so cookie 
streams are not an option.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Streaming solution and v3.1 protocol

From:
Merlin Moncure
Date:
On Fri, Jun 3, 2011 at 12:04 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 03.06.2011 19:19, Radosław Smogura wrote:
>>
>> Hello,
>>
>> Sorry for the short introduction to this, and please, as far as possible,
>> keep it separate from LOBs, as it sits on top of LOBs.
>>
>> The idea of streaming is to reduce memory copying, mainly while receiving
>> and sending tuples. Currently, receive works as follows:
>> 1. Read the bytes of the tuple (allocate x memory).
>> 2. Possibly convert them to the database encoding.
>> 3. Use this data to create a datum (which is a slightly changed copy of 1 or 2).
>> Streaming will be allowed only in binary mode, and the stream in/out
>> functions will actually return binary data.
>
> Hmm, I was thinking that streaming would be a whole new mode, alongside the
> current text and binary mode.
>
>> Look, for example, at the execution chain from textrecv.
>>
>> The idea is to add stream_in and stream_out columns to pg_type.
>>
>> When a value is to be serialized, the sin/sout function is called. The
>> caller must pass the length of the data and a stream struct (similar to
>> C's FILE*).
>>
>> The caller should validate that all bytes have been consumed (simple
>> methods will be exposed for this).
>>
>> To implement (code API requirements):
>> The first stream is a buffered socket reader.
>>
>> Fast substreams - create a fast stream limited to x bytes on top of
>> another stream.
>>
>> Skipping bytes + skipAll()
>>
>> Stream filtering - do fast encoding conversion (the conversion will be
>> faster if it occurs in buffered chunks).
>>
>> Support for a direct PG printf() version. Linux has the ability to create
>> cookie streams and use them with fprintf(), so it's a great advantage for
>> formatting huge strings. Other systems should buffer the output. The
>> problem is: if a Linux cookie stream fails, will it already have written
>> something to the output? A Windows proxy will push the value to a
>> temporary buffer.
>
> This is pretty low-level stuff; I think we should focus on the protocol
> changes and the user-visible libpq API first.

+1.  In particular, I'd like to see the libpq API changes.

merlin