Re: pages_in_range for BRIN index - Mailing list pgsql-novice

From David Harrigan
Subject Re: pages_in_range for BRIN index
Date
Msg-id 20200606132720.ji66sn6fc2zraovg@frodo.localdomain
Whole thread Raw
In response to Re: pages_in_range for BRIN index  (Stephen Frost <sfrost@snowman.net>)
Responses Re: pages_in_range for BRIN index
List pgsql-novice
On 06/06/2020 08:35:56, Stephen Frost wrote:

> Greetings,
>
> * David Harrigan (dharrigan@gmail.com) wrote:
> > I'm having difficulty in trying to understand the value I should use for
> > `pages_in_range` when creating a BRIN index. I have data coming in, nearly
> > 3-4
> > rows per second with an ever increasing `created` timestamp. Perfect for a
> > BRIN index to save on space :-).
>
> Yup, that does sound like the right use-case.
>
> > I gathered some data (at time of query) on the table I'm interested in:
> >
> > Size (bytes): 16130588672
> > Pages: 1969066
> > Rows: 48215073
> > Rows/Page: 24
>
> Not quite there yet, but hopefully you're thinking about partitioning
> too.
>
> > At the moment, the column I'm indexing on, created, has a BTREE index on it,
> > and it's over 2GB in size!
>
> Well, ok, sure, 2G isn't *that* big tho. :)
>
> > What would I need to do to go about choosing an appropriate value for
> > pages_in_range?
>
> Ok, so, this really depends on what queries you're running and how many
> rows you want to have to scan for them.  If you're mostly doing 'bulk'
> kinds of queries that hit a lot of rows, then having a larger value is
> probably fine since you won't be going through that many rows that you
> have to throw away.  On the other hand, if the queries that are using
> this index are just getting back a couple of rows as a result, then you
> probably want a smaller value, to make those queries be fast.
>
> Of course, the tradeoff here is in the size of the index- a smaller
> number means a larger index, and a larger number means a smaller index.
>
> If you're not sure though- try the default..?
>
> > Lastly, is it necessary to run a `vacuum analyse freeze` before/after
> > index creation?
>
> Nope.
>
> Thanks,
>
> Stephen

Hi!

Thank you kindly for replying. Very insightful. Thank you! The type of
query is mostly "give me all the rows between now and 1 week ago",
i.e., 7 days worth of data. In some cases that may be extended to 2
weeks or 1 month ago, but generally it's around that time range (it is
filtered on another few attributes, otherwise it would be a massive
amount of data - however it is mostly time constrained). Would
classify as a "bulk" query (not quite sure what you mean in that
regard).

Perhaps I will go with the default and see how that works out!

I haven't thought about partitioning. I'll have to read up on that
(generally, why would I partition?)

Thanks again!

-=david=-

I prefer encrypted and signed messages.
GPG Fingerprint: 110AF423364754E2880FADAD1C5285BFB20A22F9

No trees were harmed in the sending of this message, however, a number
of electrons were inconvenienced.

This email is sent without prejudice.

Attachment

pgsql-novice by date:

Previous
From: Stephen Frost
Date:
Subject: Re: pages_in_range for BRIN index
Next
From: Stephen Frost
Date:
Subject: Re: pages_in_range for BRIN index