Home > mailing lists

diff between 6.5.3 and 7.0.2 with SELECT/UPDATE - Mailing list pgsql-general

From	Jim Mercer
Subject	diff between 6.5.3 and 7.0.2 with SELECT/UPDATE
Date	June 23, 2000 10:03:43
Msg-id	20000623100338.F24735@reptiles.org Whole thread Raw
List	pgsql-general

Tree view

i have some code, running production, on 6.5.3.

i have one process, which does a SELECT within a transaction, using a cursor.

this process then passes the read records, through stdout, and a pipeline,
into other processes which examine and deal with the data.

currently, i have:

$ read-rec | adjust-rec | summarize-rec

read-rec does a:
BEGIN WORK;
DECLARE records CURSOR FOR SELECT ... FROM table1 WHERE ...;
loop
   FETCH FORWARD n IN records
   print to stdout
END WORK;

adjust-rec does a:
BEGIN WORK;
loop
   read stdin
   verify some values
   if changed
       UPDATE table1 SET f1 = value .... WHERE key = value;
   print to stdout
END WORK;

summarize-rec does a:
loop
   read stdin
   tabulate various columns
BEGIN WORK;
INSERT summary values INTO table2;
END WORK;

the reason i do this as a pipeline, is that there are various procedures
i need to do on that set of data that read-rec reads.

the pipeline allows me to add processing, without changing any of the existing
code.  i simply add another element to the pipeline, or remove it.
(i can also test the pipeline elements by saving the output of read-rec
to a file and doing "newmodule < file.dat", the data sets i'm dealing with
here have millions of records, and as such, a full test can take some time
and a simple test using redirected data may not even need to touch the
database.)

in any case, that code is working wonderfully on a FreeBSD machine running
6.5.3.

i built a new FreeBSD machine and put 7.0.2 on it.

when i run the same pipeline on the 7.0.2 machine, i am finding that on
the first update performed by adjust-rec, the backend for that query
jumps to 90% CPU and stays there.  also, setproctitle of that process shows
that it is doing "UPDATE".

now, in a sample run, the read-rec process will read 400,000 records.
in that batch, there are a known 100 records, randomly dispersed, that will
need adjustment.

the backend of the adjust-rec process will have a proctitle of idle, until
the first update, then the read-rec process gets bogged down and the whole
pipeline slows down.

i've tried a couple things like changing from TRANSACTION ISOLATION from
READ COMMITTED to SERIALIZED for one and for all elements of the pipeline.

i'm sure there is something i'm missing here, probably to do with something
in the locking or concurrency changes between 6.5.3 and 7.0.2.

in short, if you have one process doing a long select on a table, and a second
process jumps in and updates a record of that table, why does it slow things
down so much?

--
[ Jim Mercer                 jim@reptiles.org              +1 416 410-5633 ]
[          Reptilian Research -- Longer Life through Colder Blood          ]
[  Don't be fooled by cheap Finnish imitations; BSD is the One True Code.  ]

pgsql-general by date:

From: Holger Klawitter
Date: 23 June 2000, 04:50:16
Subject: just a note

From:
Date: 23 June 2000, 10:20:50
Subject: app-index in www.pgsql.com

diff between 6.5.3 and 7.0.2 with SELECT/UPDATE - Mailing list pgsql-general

Previous

Next