Thread: PostgreSQL BugTool Submission

PostgreSQL BugTool Submission

From
Unprivileged user
Date:
Chi Fan (chifungfan@yahoo.com) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
pgsql 7.0.2 cursor bug

Long Description
============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================


Your name               :       Chi Fan
Your email address      :       chifungfan@yahoo.com


System Configuration
- ---------------------
  Architecture (example: Intel Pentium)         : Intel Pentium II

  Operating System (example: Linux 2.0.26 ELF)  : Linux 2.2.14 RedHat 6.2

  PostgreSQL version (example: PostgreSQL-6.3)  : PostgreSQL-7.0.2

  Compiler used (example:  gcc 2.7.2)           : I used rpm to install pg


Please enter a FULL description of your problem:
-------------------------------------------------

The backend crash after seeing a message 'NOTICE: trying to delete
portal name that does not exist' after using a cursor on a particular
query (which'll be shown below).


Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
-----------------------------------------------------------------------

CREATE SEQUENCE objectseq;
CREATE TABLE Party (
    id integer PRIMARY KEY,
    code text NOT NULL,
    creditRating text NOT NULL,
    subtype text NOT NULL,
    CONSTRAINT choice_subtype CHECK(subtype IN ('Person', 'Org'))
);
CREATE TABLE Org (
    id integer PRIMARY KEY,
    name text NOT NULL,
    FOREIGN KEY (id) REFERENCES Party(id) ON DELETE CASCADE INITIALLY DEFERRED
);
CREATE TABLE PartyRelationship (
    id integer PRIMARY KEY,
    fromPartyID integer NOT NULL,
    toPartyID integer NOT NULL,
    status text,
    FOREIGN KEY (fromPartyID) REFERENCES Party(id),
    FOREIGN KEY (toPartyID) REFERENCES Party(id)
);


begin;
delete from partyrelationship;
delete from org;
delete from party;
end;
begin;
INSERT INTO Party (subtype, id, code, creditRating) VALUES ('Org', 13654, 'NTD', 'good');
INSERT INTO Party (subtype, id, code, creditRating) VALUES ('Org', 13655, 'NTC', 'good');

INSERT INTO Org (id, name) VALUES (13654, 'N.T.D.');
INSERT INTO Org (id, name) VALUES (13655, 'NTC');

INSERT INTO PartyRelationship (id, fromPartyID, toPartyID, status) VALUES (13669, 13654, 13655, null);
end;



-- This is the problematic select query which'll cause the backend
-- crash.
begin;
declare x cursor for
select o.id
  from Org o, Party p
 where o.id = p.id
   and exists(select p3.id
                from Org o2, Party p2, PartyRelationship pr, Party p3
               where o2.id = 13654
                 and o2.id = p2.id
                 and p2.id = pr.fromPartyID
                 and pr.toPartyID = p3.id
                 and o.id = p3.id)
;
fetch all in x;
commit;

-- After commit, a notice warning will be shown.  The backend hasn't
-- crashed yet.  But once you got this warning, the backend will be
-- crashed after the following transaction.
-- NOTICE:  trying to delete portal name that does not exist.


-- The backend'll crash after running the following transaction twice.
begin;
declare y cursor for select * from party p, org o where o.id = p.id;
fetch all in y;
INSERT INTO Org (id, name) VALUES (23654, 'Test');
INSERT INTO Party (subtype, id, code, creditRating) VALUES ('Org', 23654, 'Test', 'good');
commit;
delete from party where id = 23654;


-- The first time you run the query, you'll get:
NOTICE:  CreatePortal: portal <SPI 0> already exists
NOTICE:  CreatePortal: portal <SPI 0> already exists
NOTICE:  CreatePortal: portal <SPI 0> already exists

-- The second time you run the query, the backend crashes and
-- you'll get these messages.
NOTICE:  CreatePortal: portal <SPI 0> already exists
NOTICE:  trying to delete portal name that does not exist.
NOTICE:  LockRelease: you don't own a lock of type AccessShareLock
NOTICE:  PortalHeapMemoryFree: 0x0x8218368 not in alloc set!
NOTICE:  PortalHeapMemoryFree: 0x0x82186c8 not in alloc set!
NOTICE:  AbortTransaction and not in in-progress state
NOTICE:  trying to delete portal name that does not exist.
NOTICE:  LockRelease: you don't own a lock of type AccessShareLock
NOTICE:  PortalHeapMemoryFree: 0x0x8218368 not in alloc set!
NOTICE:  PortalHeapMemoryFree: 0x0x82186c8 not in alloc set!
ERROR:  btree scan list trashed; can't find 0x0x8218640
ERROR:  btree scan list trashed; can't find 0x0x8218640
FATAL 2:  elog: error during error recovery, giving up!
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.



-- But if I simplified the problematic query to the one shown below, the
-- problem is gone.
begin;
declare x cursor for
select o.id
  from Org o, Party p
 where o.id = p.id
   and exists(select * from party p3 where p3.id = 13655 and o.id = p3.id)
;
fetch all in x;
commit;




Sample Code


No file was uploaded with this report

Re: PostgreSQL BugTool Submission

From
Tom Lane
Date:
Unprivileged user <nobody@hub.org> writes:
> The backend crash after seeing a message 'NOTICE: trying to delete
> portal name that does not exist' after using a cursor on a particular
> query (which'll be shown below).

Oooh, that's a nasty one!  The problem is one of bogus memory management
for the hash table that's used by the hash join that the sub-select is
implemented with.  As seen in 7.0.*, the problem is that the hash table
is stored in a separate "portal" which might be deleted before the
portal the CURSOR itself is kept in.  (If so, the eventual delete of the
cursor finds itself referencing already-freed memory.)  Current sources,
7.1-to-be, use a different memory management scheme but still exhibited
a genetically related bug.

I have fixed the problem in current sources but don't see any reasonably
simple/trustworthy way of fixing it in 7.0.*.  What I'd suggest as a
short-term band-aid is picking a different cursor name.  A little
experimentation should find a name that hashes before the name generated
internally for the hashtable portal --- that will ensure that shutdown
occurs in appropriate order.  A kluge, I know :-(

Thanks for the excellent bug report ... I'm sure it was a pain nailing
down a reproducible example of this creepie-crawlie.

            regards, tom lane