Thread: Index Partition Size Double of its Table Partition?
We're trying out a new non-unique covering (including) index on a couple of table partitions. We put the index on partitions for last month and this month. Both table partitions have similar sizes (45-46 GB) and row counts (330-333 million). The covering index on last month's partition is 50GB, but this month's index is 79GB already. The table is basically write-only as well. So there shouldn't be any real bloat here.
One thing worth mentioning is that the table is 4 columns, the index is on two of them and includes the other two. I can't think of an explanation for the index being so much larger than its table, especially compared to last month's index.
Curious if anyone has any thoughts on what might be causing this.
-- Don Seiler
www.seiler.us
www.seiler.us
On Wed, Oct 30, 2024 at 11:24 AM Don Seiler <don@seiler.us> wrote: > One thing worth mentioning is that the table is 4 columns, the index is on two of them and includes the other two. I can'tthink of an explanation for the index being so much larger than its table, especially compared to last month's index. > > Curious if anyone has any thoughts on what might be causing this. You mentioned that this has 4 columns. Sounds like this could be a "locally monotonically increasing index". I wonder if you're benefiting from this optimization, though only inconsistently: https://www.youtube.com/watch?v=p5RaATILoiE&t=2079s https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f21668f3 Of course, this is only a guess. I vaguely recall a complaint that sounded vaguely like yours, also involving partitioning. -- Peter Geoghegan
On Wed, Oct 30, 2024 at 10:35 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Oct 30, 2024 at 11:24 AM Don Seiler <don@seiler.us> wrote:
> One thing worth mentioning is that the table is 4 columns, the index is on two of them and includes the other two. I can't think of an explanation for the index being so much larger than its table, especially compared to last month's index.
>
> Curious if anyone has any thoughts on what might be causing this.
You mentioned that this has 4 columns. Sounds like this could be a
"locally monotonically increasing index". I wonder if you're
benefiting from this optimization, though only inconsistently:
https://www.youtube.com/watch?v=p5RaATILoiE&t=2079s
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f21668f3
Of course, this is only a guess. I vaguely recall a complaint that
sounded vaguely like yours, also involving partitioning.
Thanks Peter, I'll look into that shortly.
I should have also included that this is on PG 15 (currently 15.8 but we created the indexes when it was still 15.7) on Ubuntu 22.04 LTS.
Don.
Don Seiler
www.seiler.us
www.seiler.us
On Wed, Oct 30, 2024 at 11:39 AM Don Seiler <don@seiler.us> wrote: > Thanks Peter, I'll look into that shortly. It sounds like you have no updates and deletes. Right? So the only thing that could be different is the way that the pages are being split (aside from variations in the width of index tuples, which seems highly unlikely to be the only factor). The heuristics used to trigger the relevant behavior are fairly conservative. I wonder if we should be more aggressive about it. > I should have also included that this is on PG 15 (currently 15.8 but we created the indexes when it was still 15.7) onUbuntu 22.04 LTS. That shouldn't matter, as far as this theory of mine is concerned. Anything after 12 could be affected by the issue I'm thinking of. -- Peter Geoghegan
On Wed, Oct 30, 2024 at 10:45 AM Peter Geoghegan <pg@bowt.ie> wrote:
It sounds like you have no updates and deletes. Right? So the only
thing that could be different is the way that the pages are being
split (aside from variations in the width of index tuples, which seems
highly unlikely to be the only factor).
Correct, the table only sees inserts as far as DML goes.
The heuristics used to trigger the relevant behavior are fairly
conservative. I wonder if we should be more aggressive about it.
> I should have also included that this is on PG 15 (currently 15.8 but we created the indexes when it was still 15.7) on Ubuntu 22.04 LTS.
That shouldn't matter, as far as this theory of mine is concerned.
Anything after 12 could be affected by the issue I'm thinking of.
Why would last month's index be so much smaller?
Both indexes were created using CONCURRENTLY, as each was created during its month when we started testing. The September index was created toward the end of the month (Sep 26), whereas the October one was created Oct 1. Both table partitions are getting regularly autovacuum/autoanalyze work.
Don.
-- Don Seiler
www.seiler.us
www.seiler.us
On Wed, Oct 30, 2024 at 12:08 PM Don Seiler <don@seiler.us> wrote: > Why would last month's index be so much smaller? Because the split heuristics worked as designed there. That's the theory, at least. > Both indexes were created using CONCURRENTLY, as each was created during its month when we started testing. The Septemberindex was created toward the end of the month (Sep 26), whereas the October one was created Oct 1. Both table partitionsare getting regularly autovacuum/autoanalyze work. If a substantial amount of the index was written by CREATE INDEX (and not by retail inserts) then my theory is unlikely to be correct. It could just be that you managed to absorb most inserts in one partition, but not in the other. That's probably possible when there are only relatively small differences in the number of inserts that need to use of the space left behind by fillfactor in each case. In general page splits tend to come in distinct "waves" after CREATE INDEX is run. -- Peter Geoghegan
On Wed, Oct 30, 2024 at 11:23 AM Peter Geoghegan <pg@bowt.ie> wrote:
If a substantial amount of the index was written by CREATE INDEX (and
not by retail inserts) then my theory is unlikely to be correct. It
could just be that you managed to absorb most inserts in one
partition, but not in the other. That's probably possible when there
are only relatively small differences in the number of inserts that
need to use of the space left behind by fillfactor in each case. In
general page splits tend to come in distinct "waves" after CREATE
INDEX is run.
What do you mean by "absorb" the inserts?
It sounds like the answer will be "No", but: Would rebuilding the index after the month-end (when inserts have stopped on this partition) change anything?
Don.
-- Don Seiler
www.seiler.us
www.seiler.us
Are you able to cluster the table ? The idea is that rows ordered in the same way as the index might reduce it's size ?
On Wed, 30 Oct 2024, 16:29 Don Seiler, <don@seiler.us> wrote:
On Wed, Oct 30, 2024 at 11:23 AM Peter Geoghegan <pg@bowt.ie> wrote:
If a substantial amount of the index was written by CREATE INDEX (and
not by retail inserts) then my theory is unlikely to be correct. It
could just be that you managed to absorb most inserts in one
partition, but not in the other. That's probably possible when there
are only relatively small differences in the number of inserts that
need to use of the space left behind by fillfactor in each case. In
general page splits tend to come in distinct "waves" after CREATE
INDEX is run.What do you mean by "absorb" the inserts?It sounds like the answer will be "No", but: Would rebuilding the index after the month-end (when inserts have stopped on this partition) change anything?Don.--Don Seiler
www.seiler.us
On Wed, Oct 30, 2024 at 4:59 PM David Mullineux <dmullx@gmail.com> wrote:
Are you able to cluster the table ? The idea is that rows ordered in the same way as the index might reduce it's size ?
I'm not sure on this. There are other indexes on these table partitions as well.
Another bit of useful info that I should have shared immediately is that this is a monthly partitioned table, going back years. We don't drop old partitions (yet) on this one. For now we've only added this index to a few individual partitions. The hope was to add it to all of them and then eventually the template (using an older version of pg_partman).
Don.
Don Seiler
www.seiler.us
www.seiler.us
On Thu, Oct 31, 2024 at 11:13 AM Rob Sargent <robjsargent@gmail.com> wrote:
Whats the fill factor?
No fill factor is specified, so I'm assuming it's the default 90% for indexes.
FYI we did a REINDEX for the index in question tonight. Since the index was for last month, there are no more writes to it so we didn't use CONCURRENTLY either. The size went from 83GB to 48GB, which also happens to be the size of the table partition.
--
Don Seiler
www.seiler.us
www.seiler.us