Thread: Extreme bloating of intarray GiST indexes
Hackers, I'm currently looking at a database which has some extreme bloating of intarray GiST indexes. As in 1000% bloating in only a few months. This is not a particularly high-transaction-rate database, so the bloating is a little surprising; I can only explain it if vacuum wasn't cleaning the indexes at all, and maybe not even then. We're currently instrumenting the database so that we can collect a bit more data on update activity, but in the meantime, has anyone seen anything like this? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > I'm currently looking at a database which has some extreme bloating of > intarray GiST indexes. As in 1000% bloating in only a few months. This > is not a particularly high-transaction-rate database, so the bloating is > a little surprising; I can only explain it if vacuum wasn't cleaning the > indexes at all, and maybe not even then. > We're currently instrumenting the database so that we can collect a bit > more data on update activity, but in the meantime, has anyone seen > anything like this? 1. What PG version? 2. If new enough to have contrib/pgstattuple, what does pgstattuple() have to say about the index? I'm suspicious that this might be bloat caused by a bad picksplit function, not from having a lot of dead entries in the index. We've fixed several other bogus picksplit functions in contrib in the past. regards, tom lane
> 1. What PG version? 8.4.4, so it has the broken picksplit. > 2. If new enough to have contrib/pgstattuple, what does pgstattuple() > have to say about the index? Will check. > I'm suspicious that this might be bloat caused by a bad picksplit function, > not from having a lot of dead entries in the index. We've fixed several > other bogus picksplit functions in contrib in the past. Yeah, I'll test updating to 8.4.8. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> 1. What PG version? > 8.4.4, so it has the broken picksplit. > ... > Yeah, I'll test updating to 8.4.8. Uh, no, the picksplit bugs we fixed were in cube and seg --- there's no reason to think that updating will help this. But 8.4's pgstattuple does appear to support gist indexes, so please run that and see what you get. regards, tom lane
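For anyone following along, a minimal sketch of what Tom is asking for, assuming the pgstattuple contrib module's SQL script has been loaded into the database (the script path and index name below are placeholders):

-- On 8.4, contrib modules are loaded once per database by running their
-- SQL script; the path varies by packaging:
--   psql -d corp -f .../contrib/pgstattuple.sql

-- Inspect the GiST index directly; free_percent and dead_tuple_percent
-- are the interesting numbers for diagnosing bloat.
SELECT * FROM pgstattuple('some_gist_index');

-- Compare against the raw on-disk size for context.
SELECT pg_size_pretty(pg_relation_size('some_gist_index'));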
On Thu, Apr 28, 2011 at 11:11 PM, Josh Berkus <josh@agliodbs.com> wrote:
> I'm currently looking at a database which has some extreme bloating of
> intarray GiST indexes. As in 1000% bloating in only a few months. This
> is not a particularly high-transaction-rate database, so the bloating is
> a little surprising; I can only explain it if vacuum wasn't cleaning the
> indexes at all, and maybe not even then.
> We're currently instrumenting the database so that we can collect a bit
> more data on update activity, but in the meantime, has anyone seen
> anything like this?
What opclass is used for the GiST index: gist__int_ops or gist__intbig_ops?
Have you taken into account that gist__int_ops is very inefficient for large datasets?
With best regards,
Alexander Korotkov.
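A quick way to answer the opclass question from the catalogs rather than from memory — a sketch, assuming the index name that shows up later in the thread; pg_get_indexdef() also prints the opclass whenever it isn't the column type's default:

-- Show the operator class attached to each column of the index.
SELECT i.relname   AS index_name,
       a.attname   AS column_name,
       opc.opcname AS opclass
FROM pg_index x
JOIN pg_class i     ON i.oid = x.indexrelid
JOIN pg_attribute a ON a.attrelid = x.indexrelid AND a.attnum > 0
JOIN pg_opclass opc ON opc.oid = x.indclass[a.attnum - 1]
WHERE i.relname = 'idx__listings_features';

-- Or simply read it out of the reconstructed definition:
SELECT pg_get_indexdef('idx__listings_features'::regclass);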
Alexander Korotkov <aekorotkov@gmail.com> writes: > What opclass is used for GiST index: gist__int_ops or gist__intbig_ops? > Do you take into account that gist__int_ops is very inefficient for large > datasets? I seem to recall some discussion recently about documenting where you should cut over to using "gist__intbig_ops" --- IIRC, it wasn't all that "big" by modern standards. But it doesn't look like any such change made it into the docs. Should we reopen that discussion? regards, tom lane
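Where that cutover point lies is easy enough to measure empirically. A hedged sketch of a side-by-side comparison on a scratch copy of the data (the table and column names are taken from later in the thread; this ignores the partial-index predicate, and gist__int_ops can be painfully slow to build on long arrays):

-- Copy just the array column and index it both ways.
CREATE TABLE features_scratch AS
    SELECT features FROM boards.listings;

CREATE INDEX features_scratch_int    ON features_scratch USING gist (features gist__int_ops);
CREATE INDEX features_scratch_intbig ON features_scratch USING gist (features gist__intbig_ops);

-- Compare on-disk sizes; timing the builds and a few representative
-- @> / && queries against each is the other half of the comparison.
SELECT relname, pg_size_pretty(pg_relation_size(oid))
FROM pg_class
WHERE relname IN ('features_scratch_int', 'features_scratch_intbig');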
On Fri, Apr 29, 2011 at 1:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I seem to recall some discussion recently about documenting where you
> should cut over to using "gist__intbig_ops" --- IIRC, it wasn't all that
> "big" by modern standards. But it doesn't look like any such change made
> it into the docs. Should we reopen that discussion?
Actually, I don't see a reason to have to decide between gist__int_ops and gist__intbig_ops at all, because we can choose between full enumeration and a lossy bitmap on the fly based on array length (once some length threshold is reached, the array is converted to a bitmap). If this problem is urgent, I can write a patch with an opclass that seems more suitable as a default to me, when I have time for it.
With best regards,
Alexander Korotkov.
Tom Lane <tgl@sss.pgh.pa.us> writes: > Uh, no, the picksplit bugs we fixed were in cube and seg --- there's > no reason to think that updating will help this. But 8.4's pgstattuple > does appear to support gist indexes, so please run that and see what > you get. There's also gevel, which I've used to inspect GiST indexes during development and found pretty useful. I don't know yet how it compares to pgstattuple. http://www.sai.msu.su/~megera/wiki/Gevel Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
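I haven't verified gevel's interface here, so treat the calls below as assumptions taken from that wiki page and check the gevel README before relying on them; the idea is that it reports per-level page-fill statistics, which is exactly what you want when chasing GiST bloat:

-- Assumed gevel calls (verify against the gevel README):
SELECT gist_stat('idx__listings_features');   -- summary: levels, pages, tuples
SELECT gist_tree('idx__listings_features');   -- per-level / per-page breakdown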
Tom, Alexander, So we are using gist__intbig_ops, so that's not the issue. Using pgstattuple might be a bit of a challenge. The client doesn't have it installed, and I can't pull it from Yum without also upgrading PostgreSQL, since Yum doesn't stock old versions AFAIK. Maybe we should consider making diagnostic utilities like this standard with PostgreSQL? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Tom, Alexander, > So we are using gist_intbig_ops, so that's not the issue. > Using pgstattuple might be a bit of a challenge. The client doesn't > have it installed, and I can't pull it from Yum without also upgrading > PostgreSQL, since Yum doesn't stock old versions AFAIK. And updating Postgres to latest minor release is a bad thing why? I can't believe you're not holding your client's feet to the fire about running an old version, quite independently of the fact that they need that contrib module. But having said that, what you say makes no sense at all. They have intarray installed, so they have postgresql-contrib. I know of no Yum-accessible distributions in which intarray and pgstattuple wouldn't be delivered in the same RPM. regards, tom lane
Tom, Alexander,

So, some data:

corp=# select indexname,
              pg_size_pretty(pg_relation_size(indexname::text)) as indexsize,
              pg_size_pretty(pg_relation_size(tablename::text)) as tablesize
       from pg_indexes
       where indexname like '%__listings_features'
       order by pg_relation_size(indexname::text) desc;
        indexname        | indexsize | tablesize
-------------------------+-----------+-----------
 idx__listings_features  | 52 MB     | 20 MB

corp=# select * from pg_indexes where indexname = 'idx__listings_features';
-[ RECORD 1 ]------------------------------------------------------------------
schemaname | boards
tablename  | listings
indexname  | idx__listings_features
tablespace |
indexdef   | CREATE INDEX idx__listings_features ON listings USING gist (features public.gist__intbig_ops) WHERE ((deleted_at IS NULL) AND (status_id = 1))

corp=# select * from public.pgstattuple('idx__listings_features');
-[ RECORD 1 ]------+---------
table_len          | 54190080
tuple_count        | 7786
tuple_len          | 2117792
tuple_percent      | 3.91
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 49297536
free_percent       | 90.97
                     ^^^^^^^^^

Well, that explains the bloating. Why all that free space, though? Maybe autovac isn't running? Nope:

corp=# select * from pg_stat_user_tables where relname = 'listings';
-[ RECORD 1 ]----+------------------------------
relid            | 110919
schemaname       | boards
relname          | listings
seq_scan         | 37492
seq_tup_read     | 328794009
idx_scan         | 33982523
idx_tup_fetch    | 302782765
n_tup_ins        | 19490
n_tup_upd        | 668445
n_tup_del        | 9826
n_tup_hot_upd    | 266661
n_live_tup       | 9664
n_dead_tup       | 776
last_vacuum      | 2010-07-25 19:46:45.922861+00
last_autovacuum  | 2011-04-30 17:30:40.555311+00
last_analyze     | 2010-07-25 19:46:45.922861+00
last_autoanalyze | 2011-04-28 23:49:54.968689+00

I don't know when stats were last reset (see, this is why we need a reset timestamp!) so not sure how long those have been accumulating.

(note: object names changed for confidentiality)

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
All,

Some trending data, since there's a lot of bloated indexes here:

select 'index_' || ( row_number() over ( order by free_percent desc ) ) as "index", *
from (
    select (public.pgstattuple(indexname::text)).free_percent,
           round(( n_tup_upd )::numeric / n_tup_ins, 2) as update_ratio,
           round(( n_tup_hot_upd )::numeric / n_tup_ins, 2) as hot_update_ratio,
           round(( n_tup_del * 100 )::numeric / n_tup_ins) as delete_percent,
           extract('days' from ( now() - last_autovacuum )) as days_since_vac,
           n_live_tup / 1000 as "1K_tuples"
    from pg_indexes
    join pg_stat_user_tables as tables
      ON pg_indexes.schemaname = tables.schemaname
     AND pg_indexes.tablename = tables.relname
    where indexname like '%__listings_features'
) as idxstats
order by free_percent desc;

index|free_percent|update_ratio|hot_update_ratio|delete_percent|days_since_vac|1K_tuples
index_1|90.97|34.30|13.68|50|3|9
index_2|87.14|15.54|2.99|41|1|2
index_3|85.08|10.86|1.42|35|5|77
index_4|84.28|22.27|5.47|18|4|370
index_5|82.4|13.65|3.89|24|49|82
index_6|82.2|11.32|2.22|29|3|54
index_7|80.97|14.38|2.95|6|14|17
index_8|80.59|15.64|2.73|48|1|29
index_9|78.43|12.81|2.97|21|37|42
index_10|77.91|11.24|2.33|57|1|21
index_11|77.26|12.73|2.00|18|11|55
index_12|77.07|16.62|2.71|15|7|7
index_13|76.56|12.20|3.20|11|11|18
index_14|75.94|14.52|2.00|23|13|15
index_15|74.73|14.94|2.68|17|11|34
index_16|73.78|15.94|3.77|25|5|2
index_17|73.54|50.19|4.26|14|14|10
index_18|73.11|15.07|6.70|20|20|7
index_19|72.82|10.26|4.63|19|11|7
index_20|72.55|15.59|5.14|22|3|13
index_21|68.52|19.69|5.49|13|11|3
index_22|61.47|14.00|4.61|27|47|2
index_23|45.06|18.10|11.65|19|96|2
index_24|37.75|6.04|1.32|36|96|15
index_25|36.87|15.32|3.71|10|96|17
index_26|32.32|7.07|2.15|18|96|15
index_27|0|6.28|0.74|10|316|48

This makes a pretty graph, but the only thing it tells me is that the handful of non-bloated tables are the ones which weren't vacuumed recently, and either have very few rows or haven't gotten a lot of updates. This is not a surprise.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > So, some data: > corp=# select * from public.pgstattuple('idx__listings_features'); > -[ RECORD 1 ]------+--------- > table_len | 54190080 > tuple_count | 7786 > tuple_len | 2117792 > tuple_percent | 3.91 > dead_tuple_count | 0 > dead_tuple_len | 0 > dead_tuple_percent | 0 > free_space | 49297536 > free_percent | 90.97 > ^^^^^^^^^ > Well, that explains the bloating. Why all that free space, though? > Maybe autovac isn't running? No, because you have under 10% dead tuples in the main table. I think this is sufficient proof of the crummy-page-splits theory. Can you provide the data in the column that's indexed? regards, tom lane
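One way to pull out the sample Tom is asking for, restricted to the rows the partial index actually covers (object names are the ones shown earlier in the thread; the output file name is a placeholder):

-- From psql: dump only the indexed keys, since those are what the
-- picksplit code actually had to place.
\copy (SELECT features FROM boards.listings WHERE deleted_at IS NULL AND status_id = 1) TO 'features_sample.copy'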
I have another hypothesis about the cause of the index bloat. AFAIK, the vacuum procedure on GiST doesn't make any storage-utilization guarantee. For example, if only one live item is in some page, then only one item will be left in that page; i.e., there is no index reorganization during vacuum. If there aren't many inserts into such pages in the future, they will stay bloated.
With best regards,
Alexander Korotkov.
Alexander Korotkov <aekorotkov@gmail.com> writes: > I have another hypothesis about the cause of the index bloat. AFAIK, the vacuum > procedure on GiST doesn't make any storage-utilization guarantee. For example, if only > one live item is in some page, then only one item will be left in that page; > i.e., there is no index reorganization during vacuum. If there aren't > many inserts into such pages in the future, they will stay bloated. Possibly, but the same is true of btree indexes, and we very seldom see cases where that's a serious issue. In any case, this is all just speculation without evidence --- we need to see actual data to figure out what's going on. regards, tom lane
> No, because you have under 10% dead tuples in the main table.
> I think this is sufficient proof of the crummy-page-splits theory.
> Can you provide the data in the column that's indexed?

Yes, I can. Fortunately, none of it's identifiable. Attached. This is for the index which is 90% free space.

So, some other characteristics of this index:

* If you didn't notice earlier, it's a partial index. The two columns which determine the partial index change more often than the intarray column.

* We've also determined some other unusual patterns from watching the application:
  (a) the "listings" table is a very wide table, with about 60 columns
  (b) whenever the table gets updated, the application code updates these 60 columns in 4 sections. So there's 4 updates to the same row, in a single transaction.
  (c) we *think* that other columns of the table, including other indexed columns, are changed much more frequently than the intarray column is. Currently doing analysis on that.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > (c) we *think* that other columns of the table, including other indexed > columns, are changed much more frequently than the intarray column is. > Currently doing analysis on that. Yeah, I noticed that your statistics for the table showed far more updates than insertions or deletions. If the intarray itself didn't change often, that would result in lots of duplicate entries being made in the index. They'd get cleaned by vacuum eventually, but maybe not fast enough to avoid the one-live-tuple-per-page syndrome that Alexander was speculating about. regards, tom lane
Josh Berkus <josh@agliodbs.com> writes:
>> Can you provide the data in the column that's indexed?
> Attached. This is for the index which is 90% free space.

I tried loading this data in fresh, and then creating a gist__intbig_ops index on it. I got these pgstattuple numbers (in 8.4.8):

table_len          | 8806400
tuple_count        | 15005
tuple_len          | 4081360
tuple_percent      | 46.35
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 4088100
free_percent       | 46.42

On the other hand, loading the data with a pre-existing empty index gave

table_len          | 7798784
tuple_count        | 15005
tuple_len          | 4081360
tuple_percent      | 52.33
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 3183672
free_percent       | 40.82

Neither of those numbers are great, and it's a bit surprising that CREATE INDEX produces a result notably worse than incremental loading; but still a darn sight better than 90% free space. So I think probably the update pattern has a lot to do with this.

> * If you didn't notice earlier, it's a partial index. The two columns
> which determine the partial index change more often than the intarray
> column.

Yeah, with only about half of the table actually indexed, since you showed only 7786 index entries in your results. But unless there's reason to think the indexed and unindexed entries are substantially different in the intarray column, this is probably not very relevant.

> * We've also determined some other unusual patterns from watching the
> application:
> (a) the "listings" table is a very wide table, with about 60 columns
> (b) whenever the table gets updated, the application code updates these
> 60 columns in 4 sections. So there's 4 updates to the same row, in a
> single transaction.

Hmm. That is going to lead to four dead index entries for every live one (unless some of the updates are HOT, which won't happen if you're changing any indexed columns). VACUUM will get back the space eventually, but not before you've caused some index bloat. I tried doing something similar to my test table:

contrib_regression=# alter table listings add column junk int;
ALTER TABLE
contrib_regression=# create index li on listings(junk);
CREATE INDEX
contrib_regression=# begin;
BEGIN
contrib_regression=# update listings set junk=1;
UPDATE 15005
contrib_regression=# update listings set junk=2;
UPDATE 15005
contrib_regression=# update listings set junk=3;
UPDATE 15005
contrib_regression=# update listings set junk=4;
UPDATE 15005
contrib_regression=# commit;
COMMIT
contrib_regression=# vacuum listings;
VACUUM

and then got these pgstattuple numbers:

table_len          | 39460864
tuple_count        | 15005
tuple_len          | 4081360
tuple_percent      | 10.34
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 32923872
free_percent       | 83.43

which is up in the same ballpark with your problem. Now probably your client's app is not updating all rows at once, but still this is a pretty wasteful update pattern. Is there a reason not to update all the columns in a single update? If you can't change the app, I'd suggest more aggressive autovacuuming as the least painful fix.

regards, tom lane
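To make the single-update suggestion concrete, a sketch with hypothetical column names: collapsing the four per-section statements into one means each logical update leaves behind one dead heap tuple instead of four, and at most one obsolete entry per index instead of four:

-- Instead of four statements per row in one transaction ...
--   UPDATE listings SET col_a = $2, col_b = $3 WHERE id = $1;  -- section 1
--   UPDATE listings SET col_c = $4, col_d = $5 WHERE id = $1;  -- section 2
--   UPDATE listings SET col_e = $6            WHERE id = $1;   -- section 3
--   UPDATE listings SET col_f = $7            WHERE id = $1;   -- section 4
-- ... set every changed column once:
UPDATE listings
SET    col_a = $2, col_b = $3, col_c = $4,
       col_d = $5, col_e = $6, col_f = $7
WHERE  id = $1;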
On 5/4/11 11:29 AM, Tom Lane wrote: > which is up in the same ballpark with your problem. Now probably your > client's app is not updating all rows at once, but still this is a > pretty wasteful update pattern. Is there a reason not to update all > the columns in a single update? Yeah, really crappy application code. Discussing it with app developers now ... > If you can't change the app, I'd suggest more aggressive autovacuuming > as the least painful fix. Will test that. It's not clear that vacuuming is helping at all. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
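For the "more aggressive autovacuuming" route, 8.4 does allow per-table autovacuum settings via reloptions, so this can be targeted at just the hot table. A sketch with placeholder numbers to tune:

-- Vacuum after ~2% of rows change instead of the default 20%, and let
-- the worker run with less cost-based throttling on this table only.
ALTER TABLE boards.listings SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay   = 10
);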
Josh Berkus <josh@agliodbs.com> writes: > On 5/4/11 11:29 AM, Tom Lane wrote: >> If you can't change the app, I'd suggest more aggressive autovacuuming >> as the least painful fix. > Will test that. It's not clear that vacuuming is helping at all. Well, you realize of course that you need a REINDEX to get the index size back down to a sane range. Autovacuum may or may not be able to keep it from creeping back up ... but vacuum definitely won't remove existing bloat. regards, tom lane
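For completeness, a sketch of actually reclaiming the existing bloat. REINDEX is the simple route but blocks writes to the table while it runs; building a replacement with CREATE INDEX CONCURRENTLY and swapping it in is the usual lower-downtime alternative (the temporary index name is a placeholder; the definition is the one shown earlier in the thread):

-- Simple, but blocks writes to the table for the duration:
REINDEX INDEX boards.idx__listings_features;

-- Lower-lock alternative: build a replacement without blocking writes,
-- then swap it in (DROP INDEX still needs a brief exclusive lock).
CREATE INDEX CONCURRENTLY idx__listings_features_new
    ON boards.listings USING gist (features public.gist__intbig_ops)
    WHERE deleted_at IS NULL AND status_id = 1;
DROP INDEX boards.idx__listings_features;
ALTER INDEX boards.idx__listings_features_new RENAME TO idx__listings_features;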