Thread: Problems with Large Objects using Postgres 7.2.1

Problems with Large Objects using Postgres 7.2.1

From
"Chris White"
Date:
I have a Java application that writes large objects to the database through the JDBC interface. The application reads binary data from an input stream, writes it to a large object in 8K chunks, and keeps a running count of the data length. At the end of the input stream it closes the large object, commits it, and then updates the associated tables with the large object id and length and commits that info to the database. The application has multiple threads (max 8) writing these large objects simultaneously, each over its own connection. Whenever the system has a problem, a monitor application detects the need for a system shutdown and shuts down Postgres using a smart shutdown.
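
In outline, the write path is roughly the following (a minimal sketch, not
our exact code; 'in' is the input stream, and the driver class names are
from memory and may differ by driver version):

    // Hypothetical sketch of the write path (7.2-era PostgreSQL JDBC LO API).
    // import org.postgresql.largeobject.LargeObject;
    // import org.postgresql.largeobject.LargeObjectManager;
    conn.setAutoCommit(false);           // LO calls must run in a transaction
    LargeObjectManager lom =
        ((org.postgresql.PGConnection) conn).getLargeObjectAPI();
    int oid = lom.create(LargeObjectManager.READ | LargeObjectManager.WRITE);
    LargeObject lo = lom.open(oid, LargeObjectManager.WRITE);
    byte[] buf = new byte[8192];         // write in 8K chunks
    long length = 0;
    for (int n; (n = in.read(buf)) > 0; ) {
        lo.write(buf, 0, n);
        length += n;
    }
    lo.close();
    conn.commit();                       // commit the large object itself
    // ...then record oid and length in the associated tables and commit.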
 
What I am seeing is that when all 8 threads are running and the system is shut down, large objects committed in transactions close to the shutdown are corrupt when the database is restarted. I know the large objects were committed, because the associated table entries pointing to them are present after the restart, with valid information about the large object length and oid. However, when I access the large objects I get back only a 2K chunk, even though the table entry says the object should be 320K.
 
Anybody have any ideas what the problem is? Are there any known issues with the recovery of large objects?
 
Chris White

Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
"Chris White"
Date:
Never saw any responses to this. Anybody have any ideas?

Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
Tom Lane
Date:
"Chris White" <cjwhite@cisco.com> writes:
>   What I am seeing is that when all 8 threads are running and the system is
> shut down, large objects committed in transactions close to the shutdown
> are corrupt when the database is restarted. I know the large objects were
> committed, because the associated table entries pointing to them are
> present after the restart, with valid information about the large object
> length and oid. However, when I access the large objects I get back only a
> 2K chunk, even though the table entry says the object should be 320K.

Hmm.  Have you tried looking directly into pg_largeobject to see what
row(s) are present for the particular LO ID?  Is the data that's there
valid?
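
For instance (12345 standing in for the actual LO oid; pg_largeobject keys
its rows by loid and pageno):

    -- which pages exist for this LO, and how many bytes each one holds
    SELECT pageno, octet_length(data) AS bytes
      FROM pg_largeobject
     WHERE loid = 12345
     ORDER BY pageno;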

>   Anybody have any ideas what the problem is? Are there any known issues
> with the recovery of large objects?

No, news to me.  I would suggest that you should be running 7.2.4, not
7.2.1; we don't make dot-releases just to keep busy.  But offhand I
don't know of any recent reports of symptoms like this.

            regards, tom lane


Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
"Chris White"
Date:
Tom,

I didn't look at the data in the table. However, when I did an lo_export of
one of the objects, I only got a 2K file out.

Next time this happens I will look at the table data.
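
(The export was along these lines, with 12345 standing in for the real oid;
lo_export writes the file on the server side:)

    SELECT lo_export(12345, '/tmp/lo_12345.out');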

Chris



Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
Tom Lane
Date:
"Chris White" <cjwhite@cisco.com> writes:
> I didn't look at the data in the table. However, when I did an lo_export
> of one of the objects, I only got a 2K file out.

IIRC, we store 2K per row in pg_largeobject.  So this is consistent with
the idea that row 0 is present for the LO ID, while row 1 is not.  What
I'm wondering is if the other hundred-odd rows that would be needed to
hold a 300K large object are there or not.  Also, do the rows contain
the appropriate data for their parts of the overall large object?

            regards, tom lane


Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
"Chris White"
Date:
Looking at our code further: the code actually writes the large object and
commits it, then reopens the large object and updates its header (the first
58 bytes) with some length info using seeks, then writes and commits the
object again, before updating and committing the associated tables. The data
I saw in the exported file was the header info without the length updates,
i.e. the state as of the first commit!!
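
The header-update pass is roughly this (again a sketch, not the exact code,
reusing the LO classes from my first mail; 58 is our header length):

    // Hypothetical sketch of the second transaction (header rewrite).
    conn.setAutoCommit(false);
    LargeObject lo = lom.open(oid, LargeObjectManager.WRITE);
    lo.seek(0);                  // back to the start of the object
    lo.write(header, 0, 58);     // rewrite the 58-byte header with length info
    lo.close();
    conn.commit();               // second commit: header now updated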

Chris



Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
Tom Lane
Date:
"Chris White" <cjwhite@cisco.com> writes:
> Looking at our code further: the code actually writes the large object and
> commits it, then reopens the large object and updates its header (the
> first 58 bytes) with some length info using seeks, then writes and commits
> the object again, before updating and committing the associated tables.
> The data I saw in the exported file was the header info without the length
> updates, i.e. the state as of the first commit!!

Oh, that's interesting.  I wonder whether you could be running into some
variant of this issue:
http://archives.postgresql.org/pgsql-hackers/2002-05/msg00875.php

I looked a little bit at fixing this, but wasn't sure how to get the
appropriate snapshot passed to the LO functions --- the global
QuerySnapshot might not be the right thing, but then what is?  Also,
what if a transaction opens multiple LO handles for the same object
--- should they be able to see each others' updates?  (I'm not sure
we could prevent it, so this may be moot.)

BTW what do you mean exactly by "commit" above?  There is no notion of
committing a large object separately from committing a transaction.

            regards, tom lane


Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
"Chris White"
Date:
See inline ***CJW***

Tom Lane wrote:

>"Chris White" <cjwhite@cisco.com> writes:
>> Looking at our code further, the actual code writes the large object
commits
>> it, opens the large object updates the header of the large object (first
58
>> bytes) with some length info using seeks, then writes and commits the
object
>> again, before updating and committing the associated tables. The data I
saw
>> in the exported file was the header info without the updates for the
length
>> info i.e. after the first commit!!
>
>Oh, that's interesting.  I wonder whether you could be running into some
>variant of this issue:
>http://archives.postgresql.org/pgsql-hackers/2002-05/msg00875.php
>
>I looked a little bit at fixing this, but wasn't sure how to get the
>appropriate snapshot passed to the LO functions --- the global
>QuerySnapshot might not be the right thing, but then what is?  Also,
>what if a transaction opens multiple LO handles for the same object
>--- should they be able to see each others' updates?  (I'm not sure
>we could prevent it, so this may be moot.)

***CJW***
Nobody knows about these large objects until the associated table with the
large object info is committed to the database, which happens after the
header info has been updated. So nobody will be reading them while they are
being updated. The problem appears after the database is restarted following
a smart shutdown; that is when we have trouble reading the large objects.
***CJW***

>BTW what do you mean exactly by "commit" above?  There is no notion of
>committing a large object separately from committing a transaction.

***CJW***
I meant committing the transaction. The first transaction commit is after
the large object is written and closed. Second is after the large object
update and close. Then the third is after the associated tables are updated.
***CJW***

>            regards, tom lane




Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
Tom Lane
Date:
"Chris White" <cjwhite@cisco.com> writes:
>> BTW what do you mean exactly by "commit" above?  There is no notion of
>> committing a large object separately from committing a transaction.

> I meant committing the transaction. The first transaction commit is after
> the large object is written and closed. Second is after the large object
> update and close. Then the third is after the associated tables are updated.

Hmm.  So the state you are seeing corresponds to the commit of the first
transaction, as far as the LO itself goes --- that's perfectly
reasonable.  But I don't see how it could be that the third transaction
appears committed while the second does not.  Are you issuing all these
transactions over the same database connection?  Perhaps the second
transaction isn't really committed?

            regards, tom lane


Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
"Chris White"
Date:
The first and second are over the same connection. The third is over a
different connection, but issued after the second transaction has completed.



Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
Tom Lane
Date:
"Chris White" <cjwhite@cisco.com> writes:
> The first and second are over the same connection. The third is over a
> different connection, but issued after the second transaction has completed.

Oh?  At this point I'm prepared to bet that the second transaction isn't
really committed when you think it is.  Perhaps the "COMMIT" command
isn't getting flushed out to the backend on time.  You might want to
turn on query logging and watch to see exactly what happens when.

            regards, tom lane


Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
"Chris White"
Date:
How do I do that?



Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
Tom Lane
Date:
"Chris White" <cjwhite@cisco.com> writes:
> How do I do that?

In 7.2 I believe what you want to turn on is debug_print_query, and
likely also log_timestamp and log_pid (unless you are logging to syslog,
which will supply those items for you).  Don't forget to SIGHUP the
postmaster after you edit postgresql.conf.
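
Concretely, something along these lines (a sketch; check the 7.2 docs for
the exact spellings):

    # postgresql.conf (7.2)
    debug_print_query = true    # log each statement as it arrives
    log_timestamp = true        # prefix log lines with a timestamp
    log_pid = true              # prefix log lines with the backend PID

    # then signal the postmaster to re-read the file:
    #   kill -HUP `head -1 $PGDATA/postmaster.pid`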

            regards, tom lane


Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
Barry Lind
Date:
It is a setting in the postgresql.conf file.  In 7.2 you set the parameter
debug_print_query to true (in 7.3 the parameter name changed to
log_statement).

--Barry

Chris White wrote:
> How do I do that?


Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
"Chris White"
Date:
Thanks, will try these. Do you think I need to change the sync method for
the database? It is currently set to the default. My theory is that the
connection is being reused to save another large object when the database is
shut down; although we try a smart shutdown first, if Postgres hasn't shut
down after 60 seconds we kill it, and thus could lose unsynced transactions.
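
The monitor's shutdown sequence is roughly this (an illustrative sketch, not
the real script):

    pg_ctl stop -D $PGDATA -m smart    # ask for a smart shutdown
    # ...if the postmaster is still up after 60 seconds we kill it outright,
    # which is presumably where unsynced transactions could be lost:
    kill -9 `head -1 $PGDATA/postmaster.pid`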

Chris



Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
Tom Lane
Date:
"Chris White" <cjwhite@cisco.com> writes:
> Do you think I need to change the sync method for the
> database?

No, that's not relevant unless you are looking at failure to recover
from a power loss.

            regards, tom lane


Re: [ADMIN] Problems with Large Objects using Postgres 7.2.1

From
"Chris White"
Date:
Yes, we are looking at the ability to recover from a power loss. However, we
only want to recover what we believe was committed to the database.
