Thread: Workarounds for getBinaryStream returning ByteArrayInputStream on bytea

Workarounds for getBinaryStream returning ByteArrayInputStream on bytea

From
Александър Шопов
Date:
Hi everyone,
I have a table containing file contents in bytea columns.
What I am trying to achieve is to have a result set containing such
columns, iterate over it, and stream the contents while zipping them.
The problem is that I get a ByteArrayInputStream from
ResultSet.getBinaryStream.
Thus iterating over many rows, each containing more than 10MB of data,
smashes the heap. At peak times I will have several such processes.
I am using postgresql-8.4-702.jdbc3.jar against a PG 8.4.5 installation.
I looked at the current source of the driver.
Jdbc3ResultSet extends AbstractJdbc3ResultSet, which extends
AbstractJdbc2ResultSet, which is the place that provides the
implementation of getBinaryStream: it returns a ByteArrayInputStream for
bytea columns and a BlobInputStream for blob columns. On skimming, it
seems that BlobInputStream does indeed stream the bytes instead of
reading them into memory (reads are done in 4k chunks).
So what are my options? Refactor the DB schema to use blobs rather than
bytea? Is it impossible to have bytea read in chunks?
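
For illustration, a minimal sketch of the pattern I mean (the files
table and its name and content columns are invented for the example):

import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ByteaZipper {
    // Zips the content of every row into one archive.
    public static void zipAll(Connection con, OutputStream out) throws Exception {
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("SELECT name, content FROM files");
        ZipOutputStream zip = new ZipOutputStream(out);
        byte[] buf = new byte[8192];
        while (rs.next()) {
            zip.putNextEntry(new ZipEntry(rs.getString(1)));
            // This looks like streaming, but getBinaryStream returns a
            // ByteArrayInputStream: the whole bytea value has already been
            // materialized on the heap by the time we get here.
            InputStream in = rs.getBinaryStream(2);
            int n;
            while ((n = in.read(buf)) != -1)
                zip.write(buf, 0, n);
            in.close();
            zip.closeEntry();
        }
        zip.finish();
        rs.close();
        st.close();
    }
}
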
Kind regards:
al_shopov



Re: Workarounds for getBinaryStream returning ByteArrayInputStream on bytea

From
Radosław Smogura
Date:
I see only two possibilities
1. Decrease fetch size, e.g. to 1.
2. Refactor schema.
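
For 1, a sketch (table and column names invented); note the driver only
uses a cursor-based fetch when autocommit is off and the result set is
the default forward-only type:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

class FetchSizeExample {
    // Option 1: fetch rows one at a time via a server-side cursor.
    static ResultSet openRowByRow(Connection con) throws Exception {
        con.setAutoCommit(false);              // required, or the fetch size is ignored
        Statement st = con.createStatement();  // default TYPE_FORWARD_ONLY
        st.setFetchSize(1);                    // one row per network round trip
        // At most one fetched batch (here one row, so one bytea value)
        // is buffered in the driver at a time.
        return st.executeQuery("SELECT name, content FROM files");
    }
}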

Kind regards,
Radek

On Wed, 24 Nov 2010 22:50:46 +0200, Александър Шопов
<lists@kambanaria.org>
wrote:
> Hi everyone,
> I have a table containing file contents in bytea columns.
> What I am trying to achieve is to have a result set containing such
> columns, iterate over it, and stream the contents while zipping them.
[...]
> So what are my options? Refactor the DB schema to use blobs rather than
> bytea? Is it impossible to have bytea read in chunks?
> Kind regards:
> al_shopov

--
----------
Radosław Smogura
http://www.softperience.eu

Re: Workarounds for getBinaryStream returning ByteArrayInputStream on bytea

From
Александър Шопов
Date:
At 16:04 -0600 on 24.11.2010 (Wed), Radosław Smogura wrote:
> I see only two possibilities
> 1. Decrease fetch size, e.g. to 1.
Even if I do, a bytea value is potentially 1GB. Plus peaks in usage can
still smash the heap.
So refactoring to BLOBs is perhaps the only way out.
Will the JDBC driver always present a bytea InputStream as a
ByteArrayInputStream? Are there no plans to change that? (Even if there
are, I will still have to refactor in the meantime.)
Perhaps this behaviour should be better communicated to DB schema
designers.
It seems to me from the Npgsql 2.0.11 readme.txt that reading in chunks
is provided for .Net.
Is there perhaps a need for patches for this in the JDBC driver?
Kind regards:
al_shopov


Re: Workarounds for getBinaryStream returning ByteArrayInputStream on bytea

From
Radosław Smogura
Date:
On Thu, 25 Nov 2010 00:53:31 +0200, Александър Шопов
<lists@kambanaria.org>
wrote:
> At 16:04 -0600 on 24.11.2010 (Wed), Radosław Smogura wrote:
>> I see only two possibilities
>> 1. Decrease fetch size, e.g. to 1.
> Even if I do, a bytea value is potentially 1GB. Plus peaks in usage can
> still smash the heap.
bytea is like varchar: the value is transmitted to the client in full
regardless. Even if you don't want to read it, it is already somewhere
on the heap.

> So refactoring to BLOBs is perhaps the only way out.
Here is another solution:
http://archives.postgresql.org/pgsql-jdbc/2007-08/msg00078.php
You can write a simple stream to do such reads on demand (select
everything without the bytea column...), but blobs will probably be
better.
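
To sketch that on-demand idea (assuming chunked reads via SQL
substring(), which works on bytea; the linked post may do it
differently, and files/content/id are invented names):

import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Pulls one bytea value in 64kB pieces, so only one chunk is on the heap
// at a time. Each chunk is a separate query, so use it inside one
// transaction to get a consistent view of the row.
class ChunkedByteaInputStream extends InputStream {
    private static final int CHUNK = 64 * 1024;
    private final PreparedStatement ps;
    private byte[] chunk = new byte[0];
    private int off = 0;
    private int pos = 1; // substring() positions are 1-based
    private boolean eof = false;

    ChunkedByteaInputStream(Connection con, long id) throws SQLException {
        ps = con.prepareStatement(
                "SELECT substring(content FROM ? FOR ?) FROM files WHERE id = ?");
        ps.setLong(3, id);
    }

    public int read() throws IOException {
        if (eof)
            return -1;
        if (off >= chunk.length) {
            try {
                ps.setInt(1, pos);
                ps.setInt(2, CHUNK);
                ResultSet rs = ps.executeQuery();
                chunk = rs.next() ? rs.getBytes(1) : null;
                rs.close();
            } catch (SQLException e) {
                throw new IOException(e);
            }
            if (chunk == null || chunk.length == 0) {
                eof = true;
                chunk = new byte[0];
                return -1;
            }
            pos += chunk.length;
            off = 0;
        }
        return chunk[off++] & 0xFF;
    }

    public void close() throws IOException {
        try {
            ps.close();
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }
}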

> Will the JDBC driver always present a bytea InputStream as a
> ByteArrayInputStream? Are there no plans to change that? (Even if there
> are, I will still have to refactor in the meantime.)
As above, the content is on the heap; even worse, when you read it the
driver transforms the content and creates a new array (so 2x heap).
Maybe some chunked, on-demand transformation would be better, but that
is driver specific... or you can wait until I finish binary JDBC and it
is in the main release...

> Perhaps this behaviour should be better communicated to DB schema
> designers.
> It seems to me from the Npgsql 2.0.11 readme.txt that reading in chunks
> is provided for .Net.
At a glance it looks like that chunked reading means reading the whole
network response in chunks, not the bytea value itself.

> Is there perhaps a need for patches for this in the JDBC driver?
> Kind regards:
> al_shopov

----------
Radosław Smogura
http://www.softperience.eu

Re: Workarounds for getBinaryStream returning ByteArrayInputStream on bytea

From
Radosław Smogura
Date:
Hi,

I would like to send a few files for getBinaryStream(). With these it
will work much like a stream and will not eat so much heap. I don't copy
the source this_row[i] array, so I don't know how this will behave with
concurrent updates (the original method doesn't copy either when the
column is not bytea). I left a few comments asking whether we should
throw an exception on broken streams in 8.4, or just silently signal
EOF.

One thing I changed in the code below: odd-length hex data now throws a
PSQLException.
Below is AbstractJdbc2ResultSet.getBinaryStream:

public InputStream getBinaryStream(int columnIndex) throws SQLException
{
    checkResultSet( columnIndex );
    if (wasNullFlag)
        return null;

    if (connection.haveMinimumCompatibleVersion("7.2"))
    {
        // Version 7.2 supports BinaryStream for all PG bytea types.
        // As the spec/javadoc for this method indicate, this is to be used
        // for large binary values (i.e. LONGVARBINARY). PG doesn't have a
        // separate long binary datatype, but with TOAST the bytea datatype
        // is capable of handling very large values. Thus the implementation
        // previously ended up calling getBytes(), since there is no current
        // way to stream the value from the server.

        // Copy of some logic from getBytes.
        // Version 7.2 supports the bytea datatype for byte arrays.
        if (fields[columnIndex - 1].getOID() == Oid.BYTEA)
        {
            // TODO: Move to datacast in future.
            final byte[] bytes = this_row[columnIndex - 1];
            // Starting with PG 9.0, a new hex format is supported
            // that starts with "\x". Figure out which format we're
            // dealing with here.
            if (bytes.length < 2 || bytes[0] != '\\' || bytes[1] != 'x')
            {
                return new PGByteaTextInputStream_8_4(bytes,
                        (maxFieldSize > 0 && isColumnTrimmable(columnIndex)) ? maxFieldSize : Long.MAX_VALUE);
            }
            else
            {
                // The hex form must hold an even number of digits; throw
                // the PSQLException instead of merely creating it.
                if (bytes.length % 2 == 1)
                    throw new PSQLException(
                            GT.tr("Unexpected bytea result size, should be even."),
                            PSQLState.DATA_ERROR);
                return new PGByteaTextInputStream_9_0_1(bytes,
                        (maxFieldSize > 0 && isColumnTrimmable(columnIndex)) ? maxFieldSize : Long.MAX_VALUE);
            }
        }
        else
        {
            return new ByteArrayInputStream(getBytes(columnIndex));
        }
    }
    else
    {
        // In 7.1, handle as BLOBs: return the LargeObject input stream.
        if (fields[columnIndex - 1].getOID() == Oid.OID)
        {
            LargeObjectManager lom = connection.getLargeObjectAPI();
            LargeObject lob = lom.open(getLong(columnIndex));
            return lob.getInputStream();
        }
    }
    return null;
}
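
The PGByteaTextInputStream classes themselves are in the attachment. As
a rough, hypothetical sketch (the attached code may differ), the 9.0
variant decodes the hex text form lazily, two hex digits per returned
byte, instead of materializing a binary copy:

// Hypothetical sketch only - the attached class may differ. Assumes
// well-formed hex input ("\x" prefix, then pairs of hex digits).
class PGByteaTextInputStream_9_0_1 extends java.io.InputStream {
    private final byte[] text; // raw text-format bytes from this_row
    private int pos = 2;       // skip the leading "\x"
    private long remaining;    // honours maxFieldSize trimming

    PGByteaTextInputStream_9_0_1(byte[] text, long maxLength) {
        this.text = text;
        this.remaining = maxLength;
    }

    public int read() {
        if (remaining <= 0 || pos + 1 >= text.length)
            return -1;
        // Decode one byte from two hex digits on the fly ("on-line"
        // conversion) instead of building the full binary array.
        int hi = Character.digit(text[pos++], 16);
        int lo = Character.digit(text[pos++], 16);
        remaining--;
        return (hi << 4) | lo;
    }
}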

On Thu, 25 Nov 2010 00:53:31 +0200, Александър Шопов
<lists@kambanaria.org>
wrote:
> At 16:04 -0600 on 24.11.2010 (Wed), Radosław Smogura wrote:
>> I see only two possibilities
>> 1. Decrease fetch size, e.g. to 1.
> Even if I do, a bytea value is potentially 1GB. Plus peaks in usage can
> still smash the heap.
> So refactoring to BLOBs is perhaps the only way out.
[...]

--
----------
Radosław Smogura
http://www.softperience.eu

Attachment

Re: Workarounds for getBinaryStream returning ByteArrayInputStream on bytea

From
Kris Jurka
Date:

On Fri, 26 Nov 2010, Radosław Smogura wrote:

> I would like to send a few files for getBinaryStream(). With these it
> will work much like a stream and will not eat so much heap. I don't copy
> the source this_row[i] array, so I don't know how this will behave with
> concurrent updates (the original method doesn't copy either when the
> column is not bytea). I left a few comments asking whether we should
> throw an exception on broken streams in 8.4, or just silently signal EOF.

The problem is that the whole bytea is still in this_row[i].  The value
isn't being streamed from the server.  So yes, you are saving a copy of
the value which does save heap space, but that won't really help the
described problem where many large bytea values are fetched because the
driver will have read and stored them all prior to getBinaryStream being
called.

Kris Jurka

Re: Workarounds for getBinaryStream returning ByteArrayInputStream on bytea

From
Radosław Smogura
Date:
On Fri, 26 Nov 2010 10:25:01 -0500 (EST), Kris Jurka <books@ejurka.com>
wrote:
> On Fri, 26 Nov 2010, Radosław Smogura wrote:
>
>> I would like to send a few files for getBinaryStream(). With these it
>> will work much like a stream and will not eat so much heap. I don't
>> copy the source this_row[i] array, so I don't know how this will behave
>> with concurrent updates (the original method doesn't copy either when
>> the column is not bytea). I left a few comments asking whether we
>> should throw an exception on broken streams in 8.4, or just silently
>> signal EOF.
>
> The problem is that the whole bytea is still in this_row[i].  The value
> isn't being streamed from the server.  So yes, you are saving a copy of
> the value which does save heap space, but that won't really help the
> described problem where many large bytea values are fetched because the
> driver will have read and stored them all prior to getBinaryStream being
> called.
>
> Kris Jurka

Yes, indeed it won't give you a "big" heap saving, but in
getBinaryStream() the driver calls getBytes(), then the PGBytea...
method. This method transforms the source, text-based array into a pure
binary array, so it creates a kind of copy of the source, generally
smaller (though this copy will not be smaller than the source divided by
4). So when Aleksander compresses 1GB files (I assume he uses stream
compression), he allocates an additional 500-800MB on the heap for this
transformed array, but he never needs it all at one time, as a
compression block isn't larger than 1MB.

That is why the submitted streams perform "on-line" conversion.
--
----------
Radosław Smogura
http://www.softperience.eu

Improved JDBC driver part 2

From
Radosław Smogura
Date:
Hello,

Maybe you are interested in what I have done with JDBC.

=== Original driver (Text mode) ===
* Memory *
1. Memory usage improvements when using result set input streams (no
unneeded memory copy) - still needs a few touches for bigger performance.
2. Memory usage improvements for large data: it should be no problem to
load a 1GB bytea[] with only 300MB of memory (the "threshold" size is
still hardcoded).

* JDBC 4 *
1. XML is now correctly transformed before being sent to the server -
the previous version used normal text-file transformations, which is not
enough.
2. In all modes (text/binary) XML is sent in binary mode, so the driver
doesn't need to do a special transformation (does it require libxml?),
unless character streams are used.
3. JDBC4 exception throwing.
4. XML objects are readable only once; you can't reuse them. An update
from a result set (silently set to null on RS.updateRow() - it shouldn't
be silent) returns null until refreshRow(), but you can write to them
after load.
5. The target XML behaviour is streaming behaviour, so as not to repeat
the problems with bytea.

* JDBC 4.1 *
1. Just started.

* Others *
1. A few additional test cases, and a few utilities for XML checking
(string equality is not enough) - not good, but better.
2. Fixed a bug causing improper time(stamp) encoding for WITH TIME ZONE
fields after changing the default time zone.

=== Binary mode ===
1. Reads for almost all data types, including arrays.
2. Writes for a few.
3. Much more restrictive checking when casting from one type to another.
4. Exceptions when casting from one type to an improper type.
5. ResultSet.getString() for XML still returns the XML - the spec
prohibits this (X - base type conversion, x - possible conversion, no x -
no base and no possible = no conversion).
6. No getPrivileges for metadata - there is no binary output for ACLs!!!
7. Many, many tests passed.
8. Data reading is faster for all reads (checked with a profiler, against
the original driver).

The driver is here: http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
It is currently JDK 6 compatible (it will not stay so); the compressed
patch is about 136kb gzipped.

Kind regards & have a nice day
----------
Radosław Smogura
http://www.softperience.eu

Re: [HACKERS] Improved JDBC driver part 2

From
Valentine Gogichashvili
Date:
Hi, 

I cannot get the file:

wget http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
--2010-12-01 12:05:28--  http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
Resolving www.rsmogura.net... 64.120.14.83
Connecting to www.rsmogura.net|64.120.14.83|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2010-12-01 12:05:29 ERROR 404: Not Found.


On Tue, Nov 30, 2010 at 7:49 PM, Radosław Smogura <mail@smogura.eu> wrote:
Hello,

Maybe you are interested in what I have done with JDBC.
[...]

Re: [HACKERS] Improved JDBC driver part 2

From
Radosław Smogura
Date:
I had just started a small clean-up - it's there now.

On Wed, 1 Dec 2010 12:06:19 +0100, Valentine Gogichashvili
<valgog@gmail.com> wrote:
> Hi,
>
> I cannot get the file:
>
> wget http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
> --2010-12-01 12:05:28--  http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
> Resolving www.rsmogura.net... 64.120.14.83
> Connecting to www.rsmogura.net|64.120.14.83|:80... connected.
> HTTP request sent, awaiting response... 404 Not Found
> 2010-12-01 12:05:29 ERROR 404: Not Found.
>
>
> On Tue, Nov 30, 2010 at 7:49 PM, Radosław Smogura <mail@smogura.eu> wrote:
>
>> Hello,
>>
>> Maybe you are interested in what I have done with JDBC.
>> [...]

--
----------
Radosław Smogura
http://www.softperience.eu

Re: [HACKERS] Improved JDBC driver part 2

From
Magnus Hagander
Date:
On Tue, Nov 30, 2010 at 19:49, Radosław Smogura <mail@smogura.eu> wrote:
> Hello,
>
> Maybe you are interested in what I have done with JDBC.

<snip>


> The driver is here: http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
> It is currently JDK 6 compatible (it will not stay so); the compressed
> patch is about 136kb gzipped.

Is there any particular reason why this work can't be maintained as a
branch to the main driver? My understanding is your work is based off
that one? Being able to work like that would make things a lot easier
to review.

That said, such a process would also be a lot easier if the JDBC
driver wasn't in cvs ;)


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Improved JDBC driver part 2

From
Lew
Date:
Magnus Hagander wrote:
> Is there any particular reason why this work can't be maintained as a
> branch to the main driver? My understanding is your work is based off
> that one? Being able to work like that would make things a lot easier
> to review.
>
> That said, such a process would also be a lot easier if the JDBC
> driver wasn't in cvs ;)

Why is that a problem?

--
Lew

Re: [HACKERS] Improved JDBC driver part 2

From
Radosław Smogura
Date:
On Wed, 1 Dec 2010 12:47:13 +0100, Magnus Hagander <magnus@hagander.net>
wrote:
> On Tue, Nov 30, 2010 at 19:49, Radosław Smogura <mail@smogura.eu> wrote:
>> Hello,
>>
>> Maybe you are interested in what I have done with JDBC.
>
> <snip>
>
>
>> The driver is here: http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
>> It is currently JDK 6 compatible (it will not stay so); the compressed
>> patch is about 136kb gzipped.
>
> Is there any particular reason why this work can't be maintained as a
> branch to the main driver? My understanding is your work is based off
> that one? Being able to work like that would make things a lot easier
> to review.
Yes, it's based on that, with the CVS subfolders in the sources. I don't
see any problem maintaining this as a branch; only one would need to
read up on CVS branching first.

> That said, such a process would also be a lot easier if the JDBC
> driver wasn't in cvs ;)
Yes, SVN is much nicer.
>
> --
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/

--
----------
Radosław Smogura
http://www.softperience.eu

Re: [HACKERS] Improved JDBC driver part 2

From
David Fetter
Date:
On Wed, Dec 01, 2010 at 12:47:13PM +0100, Magnus Hagander wrote:
> On Tue, Nov 30, 2010 at 19:49, Radosław Smogura <mail@smogura.eu> wrote:
> > Hello,
> >
> > Maybe you are interested in what I have done with JDBC.
>
> <snip>
>
>
> > The driver is here: http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
> > It is currently JDK 6 compatible (it will not stay so); the compressed
> > patch is about 136kb gzipped.
>
> Is there any particular reason why this work can't be maintained as a
> branch to the main driver? My understanding is your work is based off
> that one? Being able to work like that would make things a lot easier
> to review.
>
> That said, such a process would also be a lot easier if the JDBC
> driver wasn't in cvs ;)

This brings up an excellent point.  Would the people now developing
the JDBC driver object to switching to git?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] Improved JDBC driver part 2

From
"Kevin Grittner"
Date:
David Fetter <david@fetter.org> wrote:

> Would the people now developing the JDBC driver object to
> switching to git?

If we move to git, don't forget that there is not one repository
which has the entire history for PostgreSQL JDBC -- the current
repository is missing some work, including releases, from one stable
branch.  (Was it 7.4?)  We'd probably want to merge that in as part
of any conversion effort.

-Kevin

Re: [HACKERS] Improved JDBC driver part 2

From
David Fetter
Date:
On Wed, Dec 01, 2010 at 10:15:38AM -0600, Kevin Grittner wrote:
> David Fetter <david@fetter.org> wrote:
>
> > Would the people now developing the JDBC driver object to
> > switching to git?
>
> If we move to git, don't forget that there is not one repository
> which has the entire history for PostgreSQL JDBC -- the current
> repository is missing some work, including releases, from one stable
> branch.  (Was it 7.4?)  We'd probably want to merge that in as part
> of any conversion effort.

I guess that depends on our current needs.  As pre-split-off JDBC
driver history is already preserved in the main git tree, I'd see
putting the pre-split history into the JDBC tree as less important
than making current and future JDBC development easier, but that's
just me.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] Improved JDBC driver part 2

From
David Fetter
Date:
On Wed, Dec 01, 2010 at 07:27:59AM -0500, Lew wrote:
> Magnus Hagander wrote:
> >Is there any particular reason why this work can't be maintained as
> >a branch to the main driver?  My understanding is your work is
> >based off that one?  Being able to work like that would make things
> >a lot easier to review.
> >
> >That said, such a process would also be a lot easier if the JDBC
> >driver wasn't in cvs ;)
>
> Why is that a problem?

Because to an excellent approximation, in practice, CVS does not
actually provide the ability to branch and merge, which means that
patches like Radosław's are developed pretty much in isolation.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: [HACKERS] Improved JDBC driver part 2

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Wed, Dec 01, 2010 at 10:15:38AM -0600, Kevin Grittner wrote:
>> If we move to git, don't forget that there is not one repository
>> which has the entire history for PostgreSQL JDBC -- the current
>> repository is missing some work, including releases, from one stable
>> branch.  (Was it 7.4?)  We'd probably want to merge that in as part
>> of any conversion effort.

> I guess that depends on our current needs.  As pre-split-off JDBC
> driver history is already preserved in the main git tree, I'd see
> putting the pre-split history into the JDBC tree as less important
> than making current and future JDBC development easier, but that's
> just me.

It was difficult enough to get an accurate conversion from cvs to git
when that was the only constraint.  Trying to merge some unrelated
history in at the same time seems like a recipe for trouble.  I'd
recommend just converting your existing repo and calling it good.

            regards, tom lane

Re: [HACKERS] Improved JDBC driver part 2

From
Lew
Date:
Magnus Hagander wrote:
>>> That said, such a process would also be a lot easier if the JDBC
>>> driver wasn't in cvs ;)

Lew wrote:
>> Why is that a problem?

David Fetter wrote:
> Because to an excellent approximation, in practice, CVS does not
> actually provide the ability to branch and merge, which means that
> patches like Radosław's are developed pretty much in isolation.

That answer surprises me.  Over the last ten years I've used CVS at many jobs,
and I've used it to branch and merge lots of times.  I found it roughly
equivalent to, say, subversion in utility for that purpose.  In fact, for home
development to this day I use CVS and put IDE-specific files (e.g., NetBeans
"nbproject" directory tree) in a branch away from the trunk.

Well, as they say, YMMV.

--
Lew

Re: [HACKERS] Improved JDBC driver part 2

From
Maciek Sakrejda
Date:
> Over the last ten years I've used CVS at many jobs, and I've used it to branch and
> merge lots of times. I found it roughly equivalent to, say, subversion in utility for that
> purpose.

That's probably true. I don't think anyone would suggest a move to SVN
at this point.

> Well, as they say, YMMV.

Without getting into a distributed / centralized version control holy
war, as the core PostgreSQL project itself has found, there are major
advantages to the DVCS model for open source projects. It enables
workflows that would lead to a nightmare of merge conflicts with many
other tools (including CVS/SVN). This is part of what has made github
so successful.

I've done a private clone of JDBC CVS with git cvsimport for my work.
I imagine a number of other developers do that too. This is almost
certainly not sufficient for a proper migration, but it's usable. For
what it's worth, right now, it's not causing me any grief to manage my
git clone privately. I'll certainly put in a +1 for git, though.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: [HACKERS] Improved JDBC driver part 2

From
Craig Ringer
Date:
On 12/02/2010 07:43 AM, Lew wrote:
> Magnus Hagander wrote:
>>>> That said, such a process would also be a lot easier if the JDBC
>>>> driver wasn't in cvs ;)
>
> Lew wrote:
>>> Why is that a problem?
>
> David Fetter wrote:
>> Because to an excellent approximation, in practice, CVS does not
>> actually provide the ability to branch and merge, which means that
>> patches like Radosław's are developed pretty much in isolation.
>
> That answer surprises me. Over the last ten years I've used CVS at many
> jobs, and I've used it to branch and merge lots of times. I found it
> roughly equivalent to, say, subversion in utility for that purpose.

I agree - it's roughly equivalent to svn (though w/o atomic commits).

Both suffer from the problem that interested contributors who have not
been granted commit access cannot branch. They cannot publish their work
with reference to mainline - they have to use patches or just copy the
whole HEAD into their own independent repository. Both options suck when
you want to track upstream HEAD and make sure that the upstream
developers can understand and follow your proposed changes.

I'd love to see JDBC move to git.postgresql.org or github, both to be
consistent with the rest of Pg and to make it easier to contribute.
PostgreSQL is mirrored at github, and the same would make sense for JDBC:
keep the master on git.postgresql.org and mirror at github for easier
branch/fork/pull/merge.

--
Craig Ringer