Thread: Creating A GIN index on JSONB column (large database)
Hi all,
I have a database that is in excess of 20TB in size, partitioned by date on a month to month basis.
There is a column within that stores text (validated to be json but not stored as JSONB). I have a requirement to make the json searchable using GIN indexes. however when we try to create the GIN index on AWS RDS our temp storage maxes out which crashes the create process resulting in unhealthy created indexes.
This might be an AWS issue but we are wondering the reason why creating these indexes (concurrently) would cause the temp storage to blow up and crash out and is it possible to mitigate or a necessary evil.
The only solution we have so far is scale up our RDS instance.
Kind regards,
Taylor Smith
> On Aug 9, 2022, at 7:04 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote: > > Hi all, > > I have a database that is in excess of 20TB in size, partitioned by date on a month to month basis. > > There is a column within that stores text (validated to be json but not stored as JSONB). I have a requirement to makethe json searchable using GIN indexes. however when we try to create the GIN index on AWS RDS our temp storage maxesout which crashes the create process resulting in unhealthy created indexes. > > This might be an AWS issue but we are wondering the reason why creating these indexes (concurrently) would cause the tempstorage to blow up and crash out and is it possible to mitigate or a necessary evil. > > The only solution we have so far is scale up our RDS instance. > > Kind regards, > > Taylor Smith The scale-up might be needed only for the create index. Scale down afterwards?
Thanks Rob,
This is what I was thinking. Would you say it presents any risk then having a database unable to rebuild its own indexes if needed?
On Tue, Aug 9, 2022 at 3:39 PM Rob Sargent <robjsargent@gmail.com> wrote:
> On Aug 9, 2022, at 7:04 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote:
>
> Hi all,
>
> I have a database that is in excess of 20TB in size, partitioned by date on a month to month basis.
>
> There is a column within that stores text (validated to be json but not stored as JSONB). I have a requirement to make the json searchable using GIN indexes. however when we try to create the GIN index on AWS RDS our temp storage maxes out which crashes the create process resulting in unhealthy created indexes.
>
> This might be an AWS issue but we are wondering the reason why creating these indexes (concurrently) would cause the temp storage to blow up and crash out and is it possible to mitigate or a necessary evil.
>
> The only solution we have so far is scale up our RDS instance.
>
> Kind regards,
>
> Taylor Smith
The scale-up might be needed only for the create index. Scale down afterwards?
> On Aug 9, 2022, at 7:45 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote: > > Thanks Rob, > > This is what I was thinking. Would you say it presents any risk then having a database unable to rebuild its own indexesif needed? > > No I would not. The scale-up + rebuild should tell you how much you actually need and if you’ve over done the scale up youcan scale down to what you need. Was the original spec expecting 20+T? Any chance of rolling older months off to someother node?
Yes, the database was always going to be a minimum of 20TB however we build the indexes on the partition which at max is about 1.3TB, which in temp files will use about 800GB temporary files. Note it doesn't have the same effect when doing non-concurrently. I don't think rolling off would help as its likely largest partition will either be the current month or the previous month so the problem would still persist.
On Tue, Aug 9, 2022 at 3:50 PM Rob Sargent <robjsargent@gmail.com> wrote:
> On Aug 9, 2022, at 7:45 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote:
>
> Thanks Rob,
>
> This is what I was thinking. Would you say it presents any risk then having a database unable to rebuild its own indexes if needed?
>
>
No I would not. The scale-up + rebuild should tell you how much you actually need and if you’ve over done the scale up you can scale down to what you need. Was the original spec expecting 20+T? Any chance of rolling older months off to some other node?
On 8/9/22 09:39, Rob Sargent wrote: > >> On Aug 9, 2022, at 7:04 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote: >> >> Hi all, >> >> I have a database that is in excess of 20TB in size, partitioned by date on a month to month basis. >> >> There is a column within that stores text (validated to be json but not stored as JSONB). I have a requirement to makethe json searchable using GIN indexes. however when we try to create the GIN index on AWS RDS our temp storage maxesout which crashes the create process resulting in unhealthy created indexes. >> >> This might be an AWS issue but we are wondering the reason why creating these indexes (concurrently) would cause the tempstorage to blow up and crash out and is it possible to mitigate or a necessary evil. >> >> The only solution we have so far is scale up our RDS instance. >> >> Kind regards, >> >> Taylor Smith > The scale-up might be needed only for the create index. Scale down afterwards? Scaling down disks means using DMS (which costs money) to migrate the instance to a new instance. -- Angular momentum makes the world go 'round.
800GB/20TB = 4%.
You've got a 20TB database, and run out of space when 4% is added.
Either you're running way to close to the edge, or I misunderstand something.
Either way, enable auto-scaling. https://aws.amazon.com/about-aws/whats-new/2019/06/rds-storage-auto-scaling/
You've got a 20TB database, and run out of space when 4% is added.
Either you're running way to close to the edge, or I misunderstand something.
Either way, enable auto-scaling. https://aws.amazon.com/about-aws/whats-new/2019/06/rds-storage-auto-scaling/
On 8/9/22 10:29, Taylor Smith wrote:
Yes, the database was always going to be a minimum of 20TB however we build the indexes on the partition which at max is about 1.3TB, which in temp files will use about 800GB temporary files. Note it doesn't have the same effect when doing non-concurrently. I don't think rolling off would help as its likely largest partition will either be the current month or the previous month so the problem would still persist.On Tue, Aug 9, 2022 at 3:50 PM Rob Sargent <robjsargent@gmail.com> wrote:
> On Aug 9, 2022, at 7:45 AM, Taylor Smith <taylorjrsmith@gmail.com> wrote:
>
> Thanks Rob,
>
> This is what I was thinking. Would you say it presents any risk then having a database unable to rebuild its own indexes if needed?
>
>
No I would not. The scale-up + rebuild should tell you how much you actually need and if you’ve over done the scale up you can scale down to what you need. Was the original spec expecting 20+T? Any chance of rolling older months off to some other node?
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.