This message raises the doubt of a possible PostgreSQL bug connected
with
large objects and the locking mechanism.
1) The problem
I'm currently experiencing the followin problem.
I need to store in a PostgreSQL database a large amount of double
precision
numbers (they are wavelets coefficients, if you know what they are).
Since they are more than 8kb, I store them as a large object of about
46kb.
I've also written a set of functions that operate on them.
One of this functions is the following:
float8 *
get_stddev (Oid wdata, int4 elem)
{
float8 *result;
result = (float8 *) palloc (sizeof (float8));
if ((fd = lo_open (wdata, INV_READ)) == -1)
elog (ERROR, "wav_dist: Cannot access wavelet data");
<some `lo_read's>
lo_close (fd);
return result;
}
Once registered in the database, I call it as
SELECT DISTINCT get_stddev (fieldname, 1) FROM tablename;
Of course, there are also more complicated functions.
When the number of records in the db is around 300 (and above), I get
the
following messages:
NOTICE: LockReleaseAll: cannot remove lock from HTAB
NOTICE: LockRelease: find xid, table corrupted
NOTICE: LockRelease: find xid, table corrupted
NOTICE: LockRelease: find xid, table corrupted
FATAL: unrecognized data from the backend. It probably dumped core.
FATAL: unrecognized data from the backend. It probably dumped core.
Please note that the first run of the query gives the expected results
(sometimes).
If I run
gdb postgres core
and type where, I get
#0 0x8100bd9 in hash_search ()
#1 0x8100aec in hash_search ()
#2 0x80d4a75 in LockAcquire ()
#3 0x80d6538 in SingleLockPage ()
#4 0x80d4486 in RelationSetSingleRLockPage ()
#5 0x8070b6a in _bt_pagedel ()
#6 0x80708c0 in _bt_getbuf ()
#7 0x80703d9 in _bt_getroot ()
#8 0x8072105 in _bt_first ()
#9 0x8070fef in btgettuple ()
#10 0x8100414 in fmgr_c ()
#11 0x810071b in fmgr ()
#12 0x806b608 in index_getnext ()
#13 0x80d36c3 in inv_read ()
#14 0x80d354a in inv_read ()
#15 0x8098d09 in lo_read ()
#16 0x40230856 in ?? () from <<<this is my shared library>>>
<other frames follow>
1.1) Further analisys
To further study this problem, I've created the following table:
CREATE TABLE foo (fii oid);
and added it
INSERT INTO foo VALUES (lo_import ('/tmp/f'));
300 times. /tmp/f is a sample file of 46116 bytes.
The problem continues to arise.
I also noted that, using a code that does the following:
for each tuple
open connection
lo_export
close connection
<something on the exported file>
all goes well.
Otherwise, the following
open connection
for each tuple
lo_export
<something on the exported file>
close connection
fails around the same tuple.
Once, using dmesg, I found the message
VFS: file-max limit 1024 reached
but only once.
-->> Everything seems connected with the locking mechanism.
-->> If I run the postmaster with -o -L, everything (but not all) works.
I usually run postmaster with the -F flag. I tried to disable it, but
PostgreSQL continues to fail.
I'm running a Linux box (i586 120Mh) with kernel 2.1.65 ELF,
PostgreSQL 6.3.2 compiled with GCC 2.8.1, 64 Mb RAM.
Thanks for any help or suggestion
Alessandro Baldoni
abaldoni@csr.unibo.it
http://www.csr.unibo.it/~abaldoni