Thread: 10+hrs vs 15min because of just one index
So I'm trying to figure out how to optimize my PG install (8.0.3) to get better performance without dropping one of my indexes.

Basically, I have a table of 5M records with 3 columns:

pri_key (SERIAL)
data char(48)
groupid integer

There is an additional unique index on the data column.

The problem is that when I update the groupid column for all the records, the query takes over 10hrs (after that I just canceled the update). Looking at iostat, top, vmstat shows I'm horribly disk IO bound (for data, not WAL; CPU 85-90% iowait) and not swapping.

Dropping the unique index on data (which isn't used in the query), running the update and recreating the index runs in under 15 min. Hence it's pretty clear to me that the index is the problem and there's really nothing worth optimizing in my query.

As I understand from #postgresql, doing an UPDATE on one column causes all indexes for the affected row to be updated, due to the way PG replaces the old row with a new one for updates. This seems to explain why dropping the unique index on data solves the performance problem.

interesting settings:
shared_buffers = 32768
maintenance_work_mem = 262144
fsync = true
wal_sync_method = open_sync
wal_buffers = 512
checkpoint_segments = 30
effective_cache_size = 10000
work_mem = <default> (1024 i think?)

box:
Linux 2.6.9-11EL (CentOS 4.1)
2x Xeon 3.4 HT
2GB of RAM (but Apache and other services are running)
4 disk raid 10 (74G Raptor) for data
4 disk raid 10 (7200rpm) for WAL

Other than throwing more spindles at the problem, any suggestions?

Thanks,
Aaron

--
Aaron Turner
http://synfin.net/
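For reference, the setup described boils down to roughly the sketch below. The table and index names are placeholders (the post doesn't give them), and the constant in the UPDATE is just a stand-in for whatever the real statement sets:

    CREATE TABLE rawdata (
        pri_key  SERIAL PRIMARY KEY,
        data     char(48),
        groupid  integer
    );
    CREATE UNIQUE INDEX rawdata_data_key ON rawdata (data);

    -- the problematic statement: every one of the 5M rows gets a new row
    -- version, so every index on the table gets a new entry per row
    UPDATE rawdata SET groupid = 42;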
On 2/10/06, Aaron Turner <synfinatic@gmail.com> wrote:
> So I'm trying to figure out how to optimize my PG install (8.0.3) to
> get better performance without dropping one of my indexes.
>
> Basically, I have a table of 5M records with 3 columns:
>
> pri_key (SERIAL)
> data char(48)
> groupid integer
>
> There is an additional unique index on the data column.
>
> The problem is that when I update the groupid column for all the
> records, the query takes over 10hrs (after that I just canceled the
> update). Looking at iostat, top, vmstat shows I'm horribly disk IO
> bound (for data, not WAL; CPU 85-90% iowait) and not swapping.
>
> Dropping the unique index on data (which isn't used in the query),

For such a large update I would suggest a different scenario: split the update into packets (10000 or 50000 rows at a time), and for each packet do:

update packet
vacuum table

Then, after all packets, reindex the table. Should work much nicer.

depesz
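A minimal sketch of that batched pattern, re-using the placeholder names from above and assuming pri_key runs roughly contiguously from 1 to 5M; each batch is its own transaction, since VACUUM can't run inside one:

    -- one batch; repeat with the next key range until the table is covered
    UPDATE rawdata
       SET groupid = 42
     WHERE pri_key >= 1 AND pri_key < 50001;

    VACUUM rawdata;

    -- ... next batch: pri_key >= 50001 AND pri_key < 100001, and so on ...

    -- once all batches are done:
    REINDEX TABLE rawdata;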
On 2/10/06, hubert depesz lubaczewski <depesz@gmail.com> wrote:
> On 2/10/06, Aaron Turner <synfinatic@gmail.com> wrote:
> > So I'm trying to figure out how to optimize my PG install (8.0.3) to
> > get better performance without dropping one of my indexes.
> >
> > Basically, I have a table of 5M records with 3 columns:
> >
> > pri_key (SERIAL)
> > data char(48)
> > groupid integer
> >
> > There is an additional unique index on the data column.
> >
> > The problem is that when I update the groupid column for all the
> > records, the query takes over 10hrs (after that I just canceled the
> > update). Looking at iostat, top, vmstat shows I'm horribly disk IO
> > bound (for data, not WAL; CPU 85-90% iowait) and not swapping.
> >
> > Dropping the unique index on data (which isn't used in the query),
>
> For such a large update I would suggest a different scenario: split the
> update into packets (10000 or 50000 rows at a time), and for each
> packet do:
>
> update packet
> vacuum table
>
> Then, after all packets, reindex the table. Should work much nicer.

The problem is that all 5M records are being updated by a single UPDATE statement, not 5M individual statements. Also, vacuum can't run inside of a transaction.

On a side note, is there any performance information on updating indexes (via insert/update) relative to the size of the column? Obviously, char(48) is larger than most for indexing purposes, but I wonder if performance drops linearly or exponentially as the column width increases. Right now my column is hexadecimal... if I stored it as a binary representation it would be smaller.

Thanks,
Aaron
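For a rough feel of the width difference, decode() can shrink the value back to raw bytes. The encoding is an assumption: the message above says hexadecimal, while later messages describe the column as base64, so swap 'base64' for 'hex' as appropriate (per-value storage overhead isn't counted in either number):

    SELECT octet_length(data)                   AS encoded_width,
           octet_length(decode(data, 'base64')) AS raw_width   -- or 'hex'
      FROM rawdata
     LIMIT 1;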
Aaron Turner wrote:
> So I'm trying to figure out how to optimize my PG install (8.0.3) to
> get better performance without dropping one of my indexes.

What about something like this:

begin;
drop index slow_index_name;
update ...;
create index slow_index_name on ...;
commit;
vacuum;

Matt
On 2/10/06, Matthew T. O'Connor <matthew@zeut.net> wrote:
> Aaron Turner wrote:
> > So I'm trying to figure out how to optimize my PG install (8.0.3) to
> > get better performance without dropping one of my indexes.
>
> What about something like this:
>
> begin;
> drop index slow_index_name;
> update ...;
> create index slow_index_name on ...;
> commit;
> vacuum;

Right. That's exactly what I'm doing to get the update to occur in 15 minutes. Unfortunately, I'm basically at the point where every time I insert into or update that table I have to drop the index, which is making my life very painful (having to de-dupe records in RAM in my application is a lot faster, but also more complicated/error prone).

Basically, I need some way to optimize PG so that I don't have to drop that index every time.

Suggestions?

--
Aaron Turner
http://synfin.net/
On Fri, Feb 10, 2006 at 09:24:39AM -0800, Aaron Turner wrote:
> On 2/10/06, Matthew T. O'Connor <matthew@zeut.net> wrote:
> > Aaron Turner wrote:
> > > So I'm trying to figure out how to optimize my PG install (8.0.3) to
> > > get better performance without dropping one of my indexes.
> >
> > What about something like this:
> >
> > begin;
> > drop index slow_index_name;
> > update ...;
> > create index slow_index_name on ...;
> > commit;
> > vacuum;
>
> Right. That's exactly what I'm doing to get the update to occur in 15
> minutes. Unfortunately, I'm basically at the point where every time I
> insert into or update that table I have to drop the index, which is
> making my life very painful (having to de-dupe records in RAM in my
> application is a lot faster, but also more complicated/error prone).
>
> Basically, I need some way to optimize PG so that I don't have to drop
> that index every time.
>
> Suggestions?

I think you'll have a tough time making this faster; or I'm just not understanding the problem well enough. It's probably time to start thinking about re-architecting some things in the application so that you don't have to do this.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On 2/11/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
> On Fri, Feb 10, 2006 at 09:24:39AM -0800, Aaron Turner wrote:
> > On 2/10/06, Matthew T. O'Connor <matthew@zeut.net> wrote:
> > > Aaron Turner wrote:
> >
> > Basically, I need some way to optimize PG so that I don't have to drop
> > that index every time.
> >
> > Suggestions?
>
> I think you'll have a tough time making this faster; or I'm just not
> understanding the problem well enough. It's probably time to start
> thinking about re-architecting some things in the application so that
> you don't have to do this.

Well, before I go about re-architecting things, it would be good to have a strong understanding of just what is going on. Obviously, the unique index on the char(48) is the killer. What I don't know is:

1) Is this because the column is so long?

2) Is this because PG is not optimized for char(48) (maybe it wants powers of 2? or doesn't like even numbers... I don't know, just throwing it out there)

3) Is there some algorithm I can use to estimate relative UPDATE speed? Ie, if I cut the column length in 1/2 does that make it 50% faster?

4) Does decoding the data (currently base64) and storing the binary data improve the distribution of the index, thereby making it more efficient?

Obviously, one solution would be to store the column to be UPDATED in a separate joined table. That would cost more disk space and be more complex, but it would be more efficient for updates (inserts would of course be more expensive, since now I have to do two).

--
Aaron Turner
http://synfin.net/
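A sketch of that split-table layout, re-using the placeholder names from earlier with the groupid column moved out; only the narrow table is touched by the bulk update, so the unique index on data is never rewritten by it:

    -- static part: written once, carries the wide unique key
    CREATE TABLE rawdata (
        pri_key  SERIAL PRIMARY KEY,
        data     char(48) NOT NULL UNIQUE
    );

    -- volatile part: the only table the bulk UPDATE has to touch
    CREATE TABLE rawdata_group (
        pri_key  integer PRIMARY KEY REFERENCES rawdata (pri_key),
        groupid  integer
    );

    UPDATE rawdata_group SET groupid = 42;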
Aaron Turner <synfinatic@gmail.com> writes:
> Well, before I go about re-architecting things, it would be good to
> have a strong understanding of just what is going on. Obviously, the
> unique index on the char(48) is the killer. What I don't know is:

You have another unique index on the integer primary key, so it's not the mere fact of a unique index that's hurting you.

> 1) Is this because the column is so long?

Possibly. Allowing for 12 bytes index-entry overhead, the char keys would be 60 bytes vs 16 for the integer column, so this index is physically almost 4x larger than the other. You might say "but that should only cause 4x more I/O" but it's not necessarily so. What's hard to tell is whether you are running out of RAM disk cache space, resulting in re-reads of pages that could have stayed in memory when dealing with one-fifth as much index data. You did not show us the iostat numbers for the two cases, but it'd be interesting to look at the proportion of writes to reads on the data drive in both cases.

> 2) Is this because PG is not optimized for char(48) (maybe it wants
> powers of 2? or doesn't like even numbers... I don't know, just
> throwing it out there)

Are the key values really all 48 chars long? If not, you made a bad datatype choice: varchar(n) (or even text) would be a lot smarter. char(n) wastes space on blank-padding.

Another thing to think about is whether this is C locale or not. String comparisons in non-C locales can be horrendously expensive ... though I'd expect that to cost CPU not I/O. (Hmm ... is it possible your libc is hitting locale config files constantly? Might be worth strace'ing to confirm exactly where the I/O is going.)

> 4) Does decoding the data (currently base64) and storing the binary
> data improve the distribution of the index, thereby making it more
> efficient?

No, but it'd reduce the size of the index, which you certainly want. Storing as bytea would also eliminate any questions about wasteful locale-dependent comparisons.

The only one of these effects that looks to me like it could result in worse-than-linear degradation of I/O demand is maxing out the available RAM for disk cache. So while improving the datatype choice would probably be worth your while, you should first see if fooling with shared_buffers helps, and if not it's time to buy RAM not disk.

			regards, tom lane
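To put numbers on the two things Tom points at, the catalog can report how big each index actually is on disk, and SHOW reveals the locale; the index names are placeholders, and relpages (in 8K blocks) is only as fresh as the last VACUUM/ANALYZE:

    SELECT relname, relpages, relpages * 8 / 1024 AS approx_mb
      FROM pg_class
     WHERE relname IN ('rawdata_pkey', 'rawdata_data_key');

    SHOW lc_collate;   -- 'C' or 'POSIX' means cheap byte-wise comparisons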
We've done a lot of testing on large DBs with a lot of "inserts" and have a few comments. The updates are "treated" as a large "insert", as we all know, from pg's point of view.

We've run into 2 classes of problems: excessive WAL checkpoints and the effects of low correlation.

The WAL log writes the full 8K block for the first modification, then only the changes. This can be the source of "undesirable" behaviour during large batch inserts like this. From your config, a checkpoint will be forced when

(checkpoint_segments * 16M) < rows * (8K/N * h + (1-h) * 8K) * B

where h is the "hitrate", or correlation between the update scan and the index. Do you have a sense of what this is? In the limits, we have 100% correlation or 0% correlation. N is the lower cost of putting the change in the WAL entry; not sure what this is, but small, I am assuming, say N=100. B is the average number of blocks changed per updated row (assume B=1.1 for your case; the heap and the serial index have very high correlation).

In the 0% correlation case, each updated row will cause the index update to read/modify a block, and the modified block will be entirely written to the WAL log. After (30 * 16M) / 8K / 1.1 ~ 55k rows, a checkpoint will be forced and all modified blocks in shared buffers will be written out.

Try increasing checkpoint_segments to 300 and see if that makes a difference. If so, the excessive WAL checkpoints are your issue. If performance is exactly the same, then I would assume that you have close to 0% correlation between the rows in the heap and the index.

Can you increase shared_buffers? With a low correlation index, the only solution is to hold the working set of blocks in memory. Also, make sure that the checkpoint segments are big enough for you to modify them in place; you don't want checkpoints occurring...

Note that the more updates you do, the larger the table/index becomes and the worse the problem becomes. Vacuuming the table is an "answer" but unfortunately, it tends to decrease correlation, from our observations. :-(

From our observations, dropping indexes and rebuilding them is not always practical; it depends on your application, since the table will be exclusively locked during the transaction due to the drop index. I haven't looked at pg's code for creating an index, but I seriously suspect it's doing an external sort and then inserting into the index. Such operations would have 100% correlation from the index insert point of view, and the "sort" could be in memory or the tape variety (more efficient I/O pattern).

Summary: # of indexes, index correlation, pg's multi-versioning, shared_buffers and checkpoint_segments are interconnected in weird and wonderful ways... Seldom have I found "simple" solutions to performance problems.

Marc

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Aaron Turner
> Sent: Friday, February 10, 2006 3:17 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] 10+hrs vs 15min because of just one index
>
> So I'm trying to figure out how to optimize my PG install (8.0.3) to
> get better performance without dropping one of my indexes.
>
> Basically, I have a table of 5M records with 3 columns:
>
> pri_key (SERIAL)
> data char(48)
> groupid integer
>
> There is an additional unique index on the data column.
>
> The problem is that when I update the groupid column for all the
> records, the query takes over 10hrs (after that I just canceled the
> update). Looking at iostat, top, vmstat shows I'm horribly disk IO
> bound (for data, not WAL; CPU 85-90% iowait) and not swapping.
>
> Dropping the unique index on data (which isn't used in the query),
> running the update and recreating the index runs in under 15 min.
> Hence it's pretty clear to me that the index is the problem and
> there's really nothing worth optimizing in my query.
>
> As I understand from #postgresql, doing an UPDATE on one column causes
> all indexes for the affected row to be updated, due to the way PG
> replaces the old row with a new one for updates. This seems to explain
> why dropping the unique index on data solves the performance problem.
>
> interesting settings:
> shared_buffers = 32768
> maintenance_work_mem = 262144
> fsync = true
> wal_sync_method = open_sync
> wal_buffers = 512
> checkpoint_segments = 30
> effective_cache_size = 10000
> work_mem = <default> (1024 i think?)
>
> box:
> Linux 2.6.9-11EL (CentOS 4.1)
> 2x Xeon 3.4 HT
> 2GB of RAM (but Apache and other services are running)
> 4 disk raid 10 (74G Raptor) for data
> 4 disk raid 10 (7200rpm) for WAL
>
> Other than throwing more spindles at the problem, any suggestions?
>
> Thanks,
> Aaron
>
> --
> Aaron Turner
> http://synfin.net/
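A sketch of the postgresql.conf change Marc is suggesting; the 300 figure is his, and since each WAL segment is 16MB, more segments means correspondingly more disk space on the WAL volume:

    # postgresql.conf
    checkpoint_segments = 300    # was 30; spreads checkpoints much further apart
                                 # during the bulk UPDATE, at the cost of WAL disk space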
On 2/12/06, Marc Morin <marc@sandvine.com> wrote:
> From your config, a checkpoint will be forced when
>
> (checkpoint_segments * 16M) < rows * (8K/N * h + (1-h) * 8K) * B
>
> where h is the "hitrate", or correlation between the update scan and
> the index. Do you have a sense of what this is?

I know my checkpoints happen more than 30 secs apart, since PG isn't complaining in my log. I have no clue what the correlation is.

> In the limits, we have 100% correlation or 0% correlation. N is the
> lower cost of putting the change in the WAL entry; not sure what this
> is, but small, I am assuming, say N=100. B is the average number of
> blocks changed per updated row (assume B=1.1 for your case; the heap
> and the serial index have very high correlation).
>
> In the 0% correlation case, each updated row will cause the index
> update to read/modify a block, and the modified block will be entirely
> written to the WAL log. After (30 * 16M) / 8K / 1.1 ~ 55k rows, a
> checkpoint will be forced and all modified blocks in shared buffers
> will be written out.
>
> Try increasing checkpoint_segments to 300 and see if that makes a
> difference. If so, the excessive WAL checkpoints are your issue. If
> performance is exactly the same, then I would assume that you have
> close to 0% correlation between the rows in the heap and the index.

Ok, I'll have to give that a try.

> Can you increase shared_buffers? With a low correlation index, the
> only solution is to hold the working set of blocks in memory. Also,
> make sure that the checkpoint segments are big enough for you to
> modify them in place; you don't want checkpoints occurring...

I'll have to look at my memory usage on this server... with only 2GB and a bunch of other processes running around I'm not sure if I can go up much more without causing swapping. Of course RAM is cheap...

> Note that the more updates you do, the larger the table/index becomes
> and the worse the problem becomes. Vacuuming the table is an "answer"
> but unfortunately, it tends to decrease correlation, from our
> observations. :-(

Good to know.

> From our observations, dropping indexes and rebuilding them is not
> always practical; it depends on your application, since the table will
> be exclusively locked during the transaction due to the drop index.

Yep. In my case it's not a huge problem right now, but I know it will become a serious one sooner or later.

Thanks a lot Marc. Lots of useful info.

--
Aaron Turner
http://synfin.net/
On 2/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aaron Turner <synfinatic@gmail.com> writes:
> > Well, before I go about re-architecting things, it would be good to
> > have a strong understanding of just what is going on. Obviously, the
> > unique index on the char(48) is the killer. What I don't know is:
>
> You have another unique index on the integer primary key, so it's not
> the mere fact of a unique index that's hurting you.

Understood. I just wasn't sure if, in general, unique indexes are somehow more expensive than non-unique indexes.

> > 1) Is this because the column is so long?
>
> Possibly. Allowing for 12 bytes index-entry overhead, the char keys
> would be 60 bytes vs 16 for the integer column, so this index is
> physically almost 4x larger than the other. You might say "but that
> should only cause 4x more I/O" but it's not necessarily so. What's
> hard to tell is whether you are running out of RAM disk cache space,
> resulting in re-reads of pages that could have stayed in memory when
> dealing with one-fifth as much index data. You did not show us the
> iostat numbers for the two cases, but it'd be interesting to look at
> the proportion of writes to reads on the data drive in both cases.

Sounds a lot like what Marc mentioned.

> > 2) Is this because PG is not optimized for char(48) (maybe it wants
> > powers of 2? or doesn't like even numbers... I don't know, just
> > throwing it out there)
>
> Are the key values really all 48 chars long? If not, you made a
> bad datatype choice: varchar(n) (or even text) would be a lot
> smarter. char(n) wastes space on blank-padding.

Yep, everything is exactly 48. Looks like I'll be storing it as a bytea in the near future though.

> The only one of these effects that looks to me like it could result in
> worse-than-linear degradation of I/O demand is maxing out the available
> RAM for disk cache. So while improving the datatype choice would
> probably be worth your while, you should first see if fooling with
> shared_buffers helps, and if not it's time to buy RAM not disk.

Yeah, that's what it's beginning to sound like. Thanks Tom.

--
Aaron Turner
http://synfin.net/
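Since 8.0 can change a column's type in place with a USING expression (rebuilding the column's indexes as part of the rewrite), the conversion might look roughly like this; the table name is a placeholder, and the encoding is an assumption since the thread mentions both hex and base64 for this column:

    ALTER TABLE rawdata
          ALTER COLUMN data TYPE bytea
          USING decode(data, 'base64');   -- or decode(data, 'hex')

The statement rewrites the whole table under an exclusive lock, so it's a one-off maintenance step rather than something to run routinely.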
Hi, Aaron,

Aaron Turner wrote:
> 4) Does decoding the data (currently base64) and storing the binary
> data improve the distribution of the index, thereby making it more
> efficient?

Yes, but then you should not use varchar, but a bytea. If your data is some number internally, numeric or decimal may be even better.

If most of your data is different in the first 8 bytes, it may also make sense to duplicate those into a bigint column and create the index on that. Then you can use AND in your query to test for the 8 bytes (uses index) and the bytea. Ugly, but may work.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
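A rough sketch of that prefix trick, with placeholder names and dummy values just to show the shape of the schema and query; it assumes the application computes the 8-byte prefix as a bigint when inserting or looking up a row, and that data has already been converted to bytea as discussed above:

    ALTER TABLE rawdata ADD COLUMN data_prefix bigint;   -- first 8 bytes of data, as a number
    CREATE INDEX rawdata_prefix_idx ON rawdata (data_prefix);

    -- the narrow bigint index does the heavy lifting; the second clause
    -- confirms the match on the full value
    SELECT pri_key, groupid
      FROM rawdata
     WHERE data_prefix = 0                              -- prefix of the all-zero dummy value below
       AND data = decode(repeat('A', 48), 'base64');    -- dummy 48-char base64 value (36 zero bytes)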
>> Are the key values really all 48 chars long? If not, you made a
>> bad datatype choice: varchar(n) (or even text) would be a lot
>> smarter. char(n) wastes space on blank-padding.
>
> Yep, everything is exactly 48. Looks like I'll be storing it as a bytea
> in the near future though.

It's a good idea not to bloat a column by base64-encoding it if you want to index it. BYTEA should be your friend.

If your values are not random, you might want to exploit the correlation. But if they are already quite uncorrelated, and you don't need the index for < and >, just for =, you can create an index on the md5 of your column and use it to search. It will use a lot less data, but the data will be more random. With a functional index, you don't need to modify your application too much.
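A sketch of that functional-index idea, shown against the current char(48) column with placeholder names; md5() returns a fixed 32-character hex string regardless of how wide the underlying value is, and the extra equality check on data guards against the (unlikely) md5 collision:

    CREATE INDEX rawdata_data_md5_idx ON rawdata (md5(data));

    SELECT pri_key, groupid
      FROM rawdata
     WHERE md5(data) = md5('some-lookup-value')   -- uses the functional index
       AND data = 'some-lookup-value';            -- confirms the exact match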
Well, just a little update:

1) Looks like I'm definitely RAM constrained. Just placed an order for another 4GB.

2) I ended up dropping the primary key too, which helped with disk thrashing a lot (average disk queue wait was between 500ms and 8500ms before, and 250-500ms after).

3) Playing with most of the settings in the postgresql.conf actually dropped performance significantly. Looks like I'm starving the disk cache.

4) I'm going to assume going to a bytea helped some (width is 54 vs 66), but nothing really measurable.

Thanks everyone for your help!

--
Aaron Turner
http://synfin.net/