Re: UPDATE many records - Mailing list pgsql-general
From | Justin |
---|---|
Subject | Re: UPDATE many records |
Date | |
Msg-id | CALL-XeMYo6r90bW3zqFf6SZYnzCpvAGpRj2a1iY3hhaaPXuJGw@mail.gmail.com Whole thread Raw |
In response to | Re: UPDATE many records (Israel Brewster <ijbrewster@alaska.edu>) |
Responses |
Re: UPDATE many records
|
List | pgsql-general |
What was the HD wait time ? What tool is being use to monitor the server resources??
It appears based on this information there is allot more going on than a simple Update command
Moving code out of the trigger probably not going to improve performance, unless there is allot of code that does not need to be processed for this update or code touching other tables
Study the trigger identify what has to run, pull that code out, then disable the trigger. Move the necessary code to a new function for Updating..
On Mon, Jan 6, 2020 at 4:24 PM Israel Brewster <ijbrewster@alaska.edu> wrote:
So a test run on 9,299 records took about 7 seconds to complete (EXPLAIN ANALYZE output at https://explain.depesz.com/s/lIYn if it matters), during which time I did see a postmaster process consuming 100% CPU. Upping the test to 20,819 records took about 16.5 seconds, so that looks relatively linear to me. Also, CPU bound. So by my calculations, doing all 50M records would take around 10 hours.On Jan 6, 2020, at 11:54 AM, Michael Lewis <mlewis@entrata.com> wrote:I’m thinking it might be worth it to do a “quick” test on 1,000 or so records (or whatever number can run in a minute or so), watching the processor utilization as it runs. That should give me a better feel for where the bottlenecks may be, and how long the entire update process would take. I’m assuming, of course, that the total time would scale more or less linearly with the number of records.I think that depends on how your identify and limit the update to those 1000 records. If it is using a primary key with specific keys in an array, probably close to linear increase because the where clause isn't impactful to the overall execution time. If you write a sub-query that is slow, then you would need to exclude that from the time. You can always run explain analyze on the update and rollback rather than commit.One potentially significant note: most of the execution time is spent in a trigger. This trigger is actually what’s doing the REAL update that I need to happen. If it would make a difference, I could easily pull the trigger code out to a separate function that I just call directly (with triggers temporarily disabled). My thinking is that calling a function is calling a function, and the fact that it is currently called via a trigger rather than direct is of little consequence, but I’m willing to be corrected on that :-)---Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320Work: 907-474-5172
cell: 907-328-9145
pgsql-general by date: