Thread: Very slow update / hash join
Hi,

I have an update query that's been running for 48 hours now. Since it
started it has used about 2.5% CPU, and is writing to disk at about
3 MB/s and reading at about 2 MB/s. It's mostly waiting for the disks.

The query plan looks like this:

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Update on certificates c  (cost=1224052.45..60710389.31 rows=19950420 width=1371)
   ->  Hash Join  (cost=1224052.45..60710389.31 rows=19950420 width=1371)
         Hash Cond: (c.id = cu.id)
         ->  Seq Scan on certificates c  (cost=0.00..8372137.31 rows=147868231 width=1258)
         ->  Hash  (cost=623981.20..623981.20 rows=19950420 width=117)
               ->  Seq Scan on certificates_update cu  (cost=0.00..623981.20 rows=19950420 width=117)

I've set work_mem for this query to 6 GB, which seems to be enough to
keep pgsql_tmp empty; when it was only set to 1 GB it did have files
in it. The process is using about 4 GB of RAM, of which 0.5 GB
probably comes from shared_buffers.

It did use 100% CPU at the start, but only for about 80 seconds. I'm
guessing that's the time it needs to read and hash the update table.
But from that point on, it gets really slow.

As you can see, the table is quite large, and I want to update about
20M of the 133M rows (not sure why the plan says 147M). The table
itself is 53 GB, and the table it updates from is only 3.3 GB. There
are indexes on some of the fields (like the id), but none of them are
being updated. I tried removing those that did get updated, but that
had little effect. The table has foreign keys to other tables, and
other tables reference it, but none of the keys should get updated.

Reading or writing the whole table shouldn't take that long, and I
have no idea why it's this slow. Does anybody have an idea?

From what I understand, the hash join should be the one I want. I
tried to force the other join methods, but that doesn't seem to
improve anything.

Kurt
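For context, the statement in question presumably has roughly this
shape; the SET list is hypothetical, since the plan only shows the
join condition on id:

    SET work_mem = '6GB';

    -- some_column is a placeholder; the real column list isn't shown.
    UPDATE certificates c
    SET    some_column = cu.some_column
    FROM   certificates_update cu
    WHERE  c.id = cu.id;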
On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx <kurt@roeckx.be> wrote:
> Hi,
>
> I have an update query that's been running for 48 hours now. Since
> it started it has used about 2.5% CPU, and is writing to disk at
> about 3 MB/s and reading at about 2 MB/s. It's mostly waiting for
> the disks.

The easiest way to figure out what is going on is to identify the
process, and then trace it with something like:

strace -T -ttt -y -p <PID of process>

That should make it obvious which file it is waiting for IO on. Then
you can look up that relfilenode in pg_class to see what table/index
it is.

> The table itself is 53 GB, and the table it updates from is only
> 3.3 GB. There are indexes on some of the fields (like the id), but
> none of them are being updated. I tried removing those that did get
> updated, but that had little effect. The table has foreign keys to
> other tables, and other tables reference it, but none of the keys
> should get updated.

What version of PostgreSQL are you using? Have you tried dropping the
foreign keys?

Cheers,

Jeff
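The file names strace shows end in the relfilenode, so the lookup Jeff
describes would look something like this (12345 stands in for the
number seen in a path like base/16384/12345):

    -- Find out which relation owns a given data file:
    SELECT relname, relkind
    FROM   pg_class
    WHERE  relfilenode = 12345;

    -- Or let the server do the mapping (9.4+); a tablespace of 0
    -- means the database's default tablespace:
    SELECT pg_filenode_relation(0, 12345);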
On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
> The easiest way to figure out what is going on is to identify the
> process, and then trace it with something like:
>
> strace -T -ttt -y -p <PID of process>
>
> That should make it obvious which file it is waiting for IO on. Then
> you can look up that relfilenode in pg_class to see what table/index
> it is.

Thanks for the hint, I didn't think of that.

So it's busy reading all the index files, including the primary key,
and only writing to the table I'm updating.

> What version of PostgreSQL are you using? Have you tried dropping
> the foreign keys?

I'm using 9.5.2.

I think the foreign keys are unrelated now. They all point to the
primary key, which isn't changing, and it's reading all the indexes
on the table itself, not those on the other tables.

It's kind of annoying that I would need to drop the indexes that
aren't modified just to run an update query.

Kurt
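A sketch of how to list the table's indexes together with the file
names to watch for in strace output, assuming the table is named
certificates:

    -- All indexes on certificates, with their relfilenodes:
    SELECT ci.relname AS index_name,
           ci.relfilenode
    FROM   pg_index i
    JOIN   pg_class ci ON ci.oid = i.indexrelid
    WHERE  i.indrelid = 'certificates'::regclass;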
On Fri, May 06, 2016 at 11:38:27AM +0200, Kurt Roeckx wrote:
> So it's busy reading all the index files, including the primary key,
> and only writing to the table I'm updating.
>
> It's kind of annoying that I would need to drop the indexes that
> aren't modified just to run an update query.

I dropped all the indexes except for the primary key. It was still as
slow when it started, but then I forced the primary key into the
filesystem cache and it seems to be much happier now, averaging reads
at about 10 MB/s and writes at about 30 MB/s.

Kurt
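One way to do that warming from within the database is the pg_prewarm
extension (available since 9.4); the index name here assumes the
default primary-key naming convention:

    CREATE EXTENSION IF NOT EXISTS pg_prewarm;

    -- 'read' pulls the index into the OS filesystem cache; the
    -- default 'buffer' mode would load it into shared_buffers instead.
    SELECT pg_prewarm('certificates_pkey', 'read');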
> > It's kind of annoying that I would need to drop the indexes that
> > aren't modified just to run an update query.
>
> I dropped all the indexes except for the primary key. It was still
> as slow when it started, but then I forced the primary key into the
> filesystem cache and it seems to be much happier now, averaging
> reads at about 10 MB/s and writes at about 30 MB/s.

Look at the PG tuning docs. It seems as if you have too little index
cache assigned for the size of your database (and its indexes).
Dropping indexes isn't the real answer; tuning the database to
accommodate them is a better bet.

It would also be worth checking whether the I/O was entirely due to
sequential reads or may have been swapping. procinfo, vmstat, or just
top can tell you about that.

--
Steven Lembark                                          3646 Flora Pl
Workhorse Computing                                St Louis, MO 63110
lembark@wrkhors.com                                   +1 888 359 3508
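A quick way to see what the server currently has to work with, before
reaching for the tuning docs:

    SHOW shared_buffers;
    SHOW effective_cache_size;
    SHOW work_mem;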
On Fri, May 06, 2016 at 09:13:09AM -0500, Steven Lembark wrote:
> > I dropped all the indexes except for the primary key. It was still
> > as slow when it started, but then I forced the primary key into
> > the filesystem cache and it seems to be much happier now, averaging
> > reads at about 10 MB/s and writes at about 30 MB/s.

It finished in 2 hours.

> Look at the PG tuning docs. It seems as if you have too little index
> cache assigned for the size of your database (and its indexes).
> Dropping indexes isn't the real answer; tuning the database to
> accommodate them is a better bet.

As far as I know, the only way to improve this is to make
shared_buffers larger. But the database is several times the total
RAM; even the table itself is, and so are all its indexes combined.

And I do agree that dropping the indexes isn't a good idea, but it
seems to be the only way to get this done in a reasonable time.

> It would also be worth checking whether the I/O was entirely due to
> sequential reads or may have been swapping. procinfo, vmstat, or
> just top can tell you about that.

There was no swapping. The point is that it wasn't doing sequential
reads but was randomly accessing the indexes, for what looks like no
good reason to me.

Kurt
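The sizes being compared here can be checked directly:

    -- Heap, index, and combined sizes for the table under discussion:
    SELECT pg_size_pretty(pg_relation_size('certificates'))       AS table_size,
           pg_size_pretty(pg_indexes_size('certificates'))        AS indexes_size,
           pg_size_pretty(pg_total_relation_size('certificates')) AS total_size;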
On Fri, May 6, 2016 at 3:21 AM, Kurt Roeckx <kurt@roeckx.be> wrote:
>> So it's busy reading all the index files, including the primary key,
>> and only writing to the table I'm updating.

It is probably dirtying the indexes as well, but the writes are being
buffered in shared_buffers. The dirty blocks will eventually get
written by the background writer or the checkpoint writer (which you
won't see if you aren't tracing them).

>> It's kind of annoying that I would need to drop the indexes that
>> aren't modified just to run an update query.
>
> I dropped all the indexes except for the primary key. It was still
> as slow when it started, but then I forced the primary key into the
> filesystem cache and it seems to be much happier now, averaging
> reads at about 10 MB/s and writes at about 30 MB/s.

OK, so it sounds like what is happening is that your update cannot do
a "Heap-Only Tuple" (HOT) update, because there is not enough room in
each data page for the new copy of the rows being updated. So it is
forced to put the new copy on a different page, which means it has to
update all the indexes so they know where to find the new version.

If this is not a one-time event, then one thing you could do is lower
the table's fillfactor, so that the table is more loosely packed and
future updates are more likely to be able to do HOT updates. If the
rows being updated are randomly scattered, it wouldn't take much
lowering to make this happen (maybe 90). But if the rows being
updated in a single transaction are co-located with each other, then
you might have to lower it to below 50 before it would solve the
problem, which might be a solution worse than the problem. When you
change the parameter, it won't take full effect until the table has
been completely rewritten, either due to natural churn, or running a
VACUUM FULL or CLUSTER.

If lowering the fillfactor isn't a good solution, then pre-warming
your indexes (using pg_prewarm, or just doing it yourself from the
filesystem) might be your best bet. If not all of the relevant
indexes can fit in cache at the same time, then you might still have
to drop some of them, or just be patient.

If this giant update does not have to occur atomically in order for
your application to behave correctly, then I would probably break it
up into a series of smaller transactions. Then you could even run
them in parallel, which would be a big help if you have a RAID (which
can efficiently work on multiple random IO read requests in parallel)
but not help so much if you have a single disk.

Cheers,

Jeff
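A sketch of both suggestions; the fillfactor value, the SET list, and
the batch bounds are all placeholders:

    -- Lower the fillfactor; it only takes full effect once the
    -- pages are rewritten, e.g. by VACUUM FULL or CLUSTER:
    ALTER TABLE certificates SET (fillfactor = 90);
    VACUUM FULL certificates;

    -- One batch of a range-partitioned update, assuming id is a
    -- numeric key; repeat with successive ranges in separate
    -- transactions:
    UPDATE certificates c
    SET    some_column = cu.some_column
    FROM   certificates_update cu
    WHERE  c.id = cu.id
      AND  c.id >= 0 AND c.id < 1000000;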
On Fri, May 06, 2016 at 10:25:34AM -0700, Jeff Janes wrote:
> OK, so it sounds like what is happening is that your update cannot do
> a "Heap-Only Tuple" (HOT) update, because there is not enough room in
> each data page for the new copy of the rows being updated. So it is
> forced to put the new copy on a different page, which means it has to
> update all the indexes so they know where to find the new version.

That makes total sense now.

> If this is not a one-time event, then one thing you could do is lower
> the table's fillfactor, so that the table is more loosely packed and
> future updates are more likely to be able to do HOT updates. [...]

I will probably want to run this a few times. The data being updated
comes from an external tool, and once I add new things or fix bugs in
it I would like to update the old rows. It's normally an
insert/select-only table. But only about 20M of the 133M current rows
(about 15%) are ones I'm really interested in, so I guess something
like an 85% fillfactor might actually help.

> If this giant update does not have to occur atomically in order for
> your application to behave correctly, then I would probably break it
> up into a series of smaller transactions. Then you could even run
> them in parallel, which would be a big help if you have a RAID (which
> can efficiently work on multiple random IO read requests in parallel)
> but not help so much if you have a single disk.

I don't care about it being atomic or not. I actually tried to do it
in smaller batches before, and I ended up calculating that it would
take 2 weeks to do the update.

Kurt
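Whether a lower fillfactor is paying off can be watched in the
statistics views, where HOT updates are counted separately:

    -- n_tup_hot_upd counts updates that avoided touching the indexes:
    SELECT n_tup_upd, n_tup_hot_upd
    FROM   pg_stat_user_tables
    WHERE  relname = 'certificates';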