DB Performance decreases due to often written/accessed table - Mailing list pgsql-performance

From Jens Schipkowski
Subject DB Performance decreases due to often written/accessed table
Date
Msg-id op.thn5e4sj81rjf6@xjens.apus.local
Whole thread Raw
Responses Re: DB Performance decreases due to often written/accessed  (Richard Huxton <dev@archonet.com>)
Re: DB Performance decreases due to often written/accessed table  ("Merlin Moncure" <mmoncure@gmail.com>)
List pgsql-performance
Hi,

we've got performance problems due to repeating SELECT, UPDATE, DELETE,
INSERT statements. This statements have to be executed every 10 seconds,
but they run into a timeout.
To obviate problems regarding to our Java Software and JDBC drivers, we
put the repeating sequence of statements to a file more than 100k times
(half a million statements) and executed "psql ourDB -f ourFile.sql -o
/dev/null". To accelerate the occurence of the performance drop, we
started 6 instances of this command.
The performance drop occured after 10 minutes shifting the server to 0
percent idle and 85 - 95 percent user.
For tracing the statement which raised the load, we are using pg_locks,
pg_stat_activity with current_query enabled. The responsible statement is
the DELETE, it hangs until its canceled by timeout. The first run on an
vacuumed DB took 300 - 600ms.
In a second test we removed the DELETE statements to see wich statements
also needs longer time by increasing the amount of data. After half an
hour the SELECT statements timed out.
An additional VACUUM - every 1 minute - does extend the timeout occurence
by factor 5 - 6.
It does not sound professional, but the database seems to be aging by the
increase of executed statements.

The Statements
---------------
// normal insert - does not seem to be the problem - runtime is ok
INSERT INTO tbl_reg(idreg,idtype,attr1,...,attr6,loc1,...,loc3,register)
VALUES(nextval('tbl_reg_idreg_seq'),1,[attr],[loc],1);

// select finds out which one has not an twin
// a twin is defined as record with the same attr* values
// decreases speed over time until timeout by postgresql
SELECT *
 FROM tbl_reg reg
WHERE register <> loc1 AND
    idreg NOT IN
        (
        SELECT reg.idreg
        FROM tbl_reg reg, tbl_reg regtwin
        WHERE regtwin.register = 1 AND
            regtwin.type <> 20 AND
            reg.attr1 = regtwin.attr1 AND
            reg.attr2 = regtwin.attr2 AND
            reg.attr3 = regtwin.attr3 AND
            reg.attr4 = regtwin.attr4 AND
            reg.attr5 = regtwin.attr5 AND
            reg.attr6 = regtwin.attr6 AND
            reg.idreg <> regtwin.idreg AND
            reg.register = 2
        );
I tried to optimize the seslect statement but the group by having count(*)
> 1 solution is half as fast as this statement - relating to the query
plan of EXPLAIN ANALYZE.

// delete data without a twin
// drastically decreases speed over time until timeout by postgresql
DELETE
 FROM tbl_reg
WHERE idregs IN
    (
    SELECT reg.idreg
    FROM tbl_reg reg, tbl_reg regtwin
    WHERE regtwin.register = 1 AND
        regtwin.type <> 20 AND
        reg.attr1 = regtwin.attr1 AND
        reg.attr2 = regtwin.attr2 AND
        reg.attr3 = regtwin.attr3 AND
        reg.attr4 = regtwin.attr4 AND
        reg.attr5 = regtwin.attr5 AND
        reg.attr6 = regtwin.attr6 AND
        reg.idreg <> regtwin.idreg AND
        reg.register = 2
    ) OR
    (loc1 = '2' AND loc2 = '2');
The runtime of this statement increases until it will canceled by
PostgreSQL.

// the where clause of this update statement is normally build in java
UPDATE tbl_reg SET loc1=2 WHERE idreg IN ('...',...,'...');

The Table
---------------
Tested with: 20.000, 80.000, 500.000 records

CREATE TABLE tbl_reg
(
   idreg bigserial NOT NULL,
   idtype int8 DEFAULT 0,
   attr1 int4,
   attr2 int4,
   attr3 varchar(20),
   attr4 varchar(20),
   attr5 int4,
   attr6 varchar(140) DEFAULT ''::character varying,
   loc1 int2 DEFAULT 0,
   loc2 int2 DEFAULT 0,
   loc3 int2 DEFAULT 0,
   register int2 DEFAULT 1,
   "timestamp" timestamp DEFAULT now(),
   CONSTRAINT tbl_reg_pkey PRIMARY KEY (idreg)
)
WITHOUT OIDS;

The Hardware
----------------
Dual Xeon 3.2GHz Hyperthreading
SCSI harddrives
RAID and non-RAID tested


We have the problem, that we cannot see any potential to improve SQL
statements. Indexing the attr* columns seems not to be an solution,
because the data mustn't be unique (twins) and changes really often so
reindexing will took too long.


thanks,
Jens Schipkowski

pgsql-performance by date:

Previous
From: "Rohit_Behl"
Date:
Subject:
Next
From: Andrzej Zawadzki
Date:
Subject: VACUUM FULL ANALYZE on 8.1.4 is slower then on 8.0