diff between 6.5.3 and 7.0.2 with SELECT/UPDATE - Mailing list pgsql-general
From | Jim Mercer |
---|---|
Subject | diff between 6.5.3 and 7.0.2 with SELECT/UPDATE |
Date | |
Msg-id | 20000623100338.F24735@reptiles.org Whole thread Raw |
List | pgsql-general |
i have some code, running production, on 6.5.3. i have one process, which does a SELECT within a transaction, using a cursor. this process then passes the read records, through stdout, and a pipeline, into other processes which examine and deal with the data. currently, i have: $ read-rec | adjust-rec | summarize-rec read-rec does a: BEGIN WORK; DECLARE records CURSOR FOR SELECT ... FROM table1 WHERE ...; loop FETCH FORWARD n IN records print to stdout END WORK; adjust-rec does a: BEGIN WORK; loop read stdin verify some values if changed UPDATE table1 SET f1 = value .... WHERE key = value; print to stdout END WORK; summarize-rec does a: loop read stdin tabulate various columns BEGIN WORK; INSERT summary values INTO table2; END WORK; the reason i do this as a pipeline, is that there are various procedures i need to do on that set of data that read-rec reads. the pipeline allows me to add processing, without changing any of the existing code. i simply add another element to the pipeline, or remove it. (i can also test the pipeline elements by saving the output of read-rec to a file and doing "newmodule < file.dat", the data sets i'm dealing with here have millions of records, and as such, a full test can take some time and a simple test using redirected data may not even need to touch the database.) in any case, that code is working wonderfully on a FreeBSD machine running 6.5.3. i built a new FreeBSD machine and put 7.0.2 on it. when i run the same pipeline on the 7.0.2 machine, i am finding that on the first update performed by adjust-rec, the backend for that query jumps to 90% CPU and stays there. also, setproctitle of that process shows that it is doing "UPDATE". now, in a sample run, the read-rec process will read 400,000 records. in that batch, there are a known 100 records, randomly dispersed, that will need adjustment. the backend of the adjust-rec process will have a proctitle of idle, until the first update, then the read-rec process gets bogged down and the whole pipeline slows down. i've tried a couple things like changing from TRANSACTION ISOLATION from READ COMMITTED to SERIALIZED for one and for all elements of the pipeline. i'm sure there is something i'm missing here, probably to do with something in the locking or concurrency changes between 6.5.3 and 7.0.2. in short, if you have one process doing a long select on a table, and a second process jumps in and updates a record of that table, why does it slow things down so much? -- [ Jim Mercer jim@reptiles.org +1 416 410-5633 ] [ Reptilian Research -- Longer Life through Colder Blood ] [ Don't be fooled by cheap Finnish imitations; BSD is the One True Code. ]
pgsql-general by date: