Thread: Silent deadlock

Silent deadlock

From
Boris Folgmann
Date:
Hi!

We run a web application based on a complex database using
postgresql-7.2.3-5.80 on Red Hat 8.0.

Generally using pgsql with JDBC is very nice, but from time to time we run
into problems that are caused by something like a silent deadlock, which
means that it isn't reported in the logfile and doesn't rollback one of the
transactions.
Hanging database connections cause the webapp to not return web pages to
the client, so that the connection pool gets filled up soon, as users try
and try to reload the page. This can freeze the entire application very fast.

Most actions are implemented in PL/PGSQL functions to use the transaction
features of pgsql, so we simply call things like

SELECT create_user('test');

in the application. As it seems to us that it's not very portable to parse
pgsql exceptions, e.g. if you try to insert a duplicate value in a unique
coloumn, we check things like this inside the function before doing the
insert. A table lock is necessary to be sure, that the result of the select
is still valid when the new value is inserted. Have a look at this functions:

CREATE OR REPLACE FUNCTION create_user(
        users.username%TYPE,
        users.password%TYPE,
        users.email%TYPE
        )
RETURNS BOOL AS '
DECLARE
        old_user    INTEGER;
        new_user    INTEGER;
BEGIN
        LOCK users IN SHARE MODE;
        SELECT INTO old_user uid FROM users WHERE username=$1;
        IF NOT FOUND THEN
                INSERT INTO users (username, password, email, status,
userlevel)
                VALUES ($1, $2, $3, 1, 0);
                GET DIAGNOSTICS new_user = RESULT_OID;
                PERFORM some_more_stuff((SELECT uid FROM users WHERE
oid=new_user));
                RETURN true;
        ELSE
                RETURN false; -- username already exists
        END IF;
END;
' LANGUAGE 'plpgsql';


-- This is called after a successful login
CREATE OR REPLACE FUNCTION save_login_stats(
        users.uid%TYPE,
        )
RETURNS BOOL AS '
DECLARE
        u      users.uid%TYPE;
BEGIN
        SELECT INTO u uid FROM users
        WHERE uid=$1
        FOR UPDATE;
        IF FOUND THEN
                UPDATE users SET
                last_login    = CURRENT_TIMESTAMP,
                login_counter = login_counter + 1,
                WHERE uid=u;
        ELSE
                RETURN FALSE; -- user not found
        END IF;

        RETURN TRUE;
END;
' LANGUAGE 'plpgsql';


The database reached a condition were nobody could login because
save_login_stats() did not return. A short time before that problem began
create_user() was called. I can not say if it that create_user() call
returned, because the logfile is not verbose enough at the moment. But to
me it seems as create_user() in any way did not free the shared lock of the
users table, that's why the SELECT ... uid ... FOR UPDATE in
save_login_stats() seems to hang.

As all calls are totally concurrent, can you imagine what could be the problem?

greetings,
    boris