Re: pages_in_range for BRIN index - Mailing list pgsql-novice
From | David Harrigan |
---|---|
Subject | Re: pages_in_range for BRIN index |
Date | |
Msg-id | 20200606132720.ji66sn6fc2zraovg@frodo.localdomain Whole thread Raw |
In response to | Re: pages_in_range for BRIN index (Stephen Frost <sfrost@snowman.net>) |
Responses |
Re: pages_in_range for BRIN index
|
List | pgsql-novice |
On 06/06/2020 08:35:56, Stephen Frost wrote: > Greetings, > > * David Harrigan (dharrigan@gmail.com) wrote: > > I'm having difficulty in trying to understand the value I should use for > > `pages_in_range` when creating a BRIN index. I have data coming in, nearly > > 3-4 > > rows per second with an ever increasing `created` timestamp. Perfect for a > > BRIN index to save on space :-). > > Yup, that does sound like the right use-case. > > > I gathered some data (at time of query) on the table I'm interested in: > > > > Size (bytes): 16130588672 > > Pages: 1969066 > > Rows: 48215073 > > Rows/Page: 24 > > Not quite there yet, but hopefully you're thinking about partitioning > too. > > > At the moment, the column I'm indexing on, created, has a BTREE index on it, > > and it's over 2GB in size! > > Well, ok, sure, 2G isn't *that* big tho. :) > > > What would I need to do to go about choosing an appropriate value for > > pages_in_range? > > Ok, so, this really depends on what queries you're running and how many > rows you want to have to scan for them. If you're mostly doing 'bulk' > kinds of queries that hit a lot of rows, then having a larger value is > probably fine since you won't be going through that many rows that you > have to throw away. On the other hand, if the queries that are using > this index are just getting back a couple of rows as a result, then you > probably want a smaller value, to make those queries be fast. > > Of course, the tradeoff here is in the size of the index- a smaller > number means a larger index, and a larger number means a smaller index. > > If you're not sure though- try the default..? > > > Lastly, is it necessary to run a `vacuum analyse freeze` before/after > > index creation? > > Nope. > > Thanks, > > Stephen Hi! Thank you kindly for replying. Very insightful. Thank you! The type of query is mostly "give me all the rows between now and 1 week ago", i.e., 7 days worth of data. In some cases that may be extended to 2 weeks or 1 month ago, but generally it's around that time range (it is filtered on another few attributes, otherwise it would be a massive amount of data - however it is mostly time constrained). Would classify as a "bulk" query (not quite sure what you mean in that regard). Perhaps I will go with the default and see how that works out! I haven't thought about partitioning. I'll have to read up on that (generally, why would I partition?) Thanks again! -=david=- I prefer encrypted and signed messages. GPG Fingerprint: 110AF423364754E2880FADAD1C5285BFB20A22F9 No trees were harmed in the sending of this message, however, a number of electrons were inconvenienced. This email is sent without prejudice.
Attachment
pgsql-novice by date: