Re: pages_in_range for BRIN index - Mailing list pgsql-novice

From David Harrigan
Subject Re: pages_in_range for BRIN index
Date
Msg-id 20200607200215.kajtlbpbgeymk4bz@daffy
Whole thread Raw
In response to Re: pages_in_range for BRIN index  (Stephen Frost <sfrost@snowman.net>)
List pgsql-novice
On 07/06/2020 15:28:24, Stephen Frost wrote:

> Greetings,
>
> * David Harrigan (dharrigan@gmail.com) wrote:
> > Thank you kindly for replying. Very insightful. Thank you! The type of
> > query is mostly "give me all the rows between now and 1 week ago",
> > i.e., 7 days worth of data. In some cases that may be extended to 2
> > weeks or 1 month ago, but generally it's around that time range (it is
> > filtered on another few attributes, otherwise it would be a massive
> > amount of data - however it is mostly time constrained). Would
> > classify as a "bulk" query (not quite sure what you mean in that
> > regard).
>
> Yes, that'd qualify as 'bulk'.
>
> If you really want to work on optimizing this (and, to be clear, I don't
> know that you really need to- the BRIN is likely to be pretty small
> with just the defaults), you could look at how many rows you typically
> have on a page, and then how many pages account for a week or so of
> data, and then maybe set your pages_in_range to, say, a fourth of that?
>
> That'd minimize the size of the BRIN while avoiding having queries using
> it for a single week on average, hopefully, only end up scanning up to a
> fourth of a week or so of pages that weren't actually relevant to the
> query.
>
> > Perhaps I will go with the default and see how that works out!
>
> Definitely a reasonable approach too. :)
>
> > I haven't thought about partitioning. I'll have to read up on that
> > (generally, why would I partition?)
>
> Partitioning is breaking up very large tables (hundreds of millions of
> rows) into multiple tables, to make certain operations easier,
> particularly ones around data maintenance.  Examples include: easier to
> expire out old data (just drop the partition instead of having to do a
> big DELETE query...), takes less time to VACUUM or reindex a partition
> than it does a huge table, pg_dump can export the data in parallel, etc.
>
> Once a partition is known to be 'complete' and you're not adding any
> more rows to it you could also VACUUM FREEZE it, assuming you expect it
> to be around long enough for transaction wraparound to be a possibility,
> so you don't have to wonder when that's going to happen or such.
>
> Thanks,
>
> Stephen

Hi Stephen,

Thanks once again for your reply. Extremely helpful! I'm going to read
up more on partitioning and consider my options with regard the BRIN
index. Exciting! :-)

-=david=-

I prefer encrypted and signed messages.
GPG Fingerprint: 110AF423364754E2880FADAD1C5285BFB20A22F9

No trees were harmed in the sending of this message, however, a number
of electrons were inconvenienced.

This email is sent without prejudice.

Attachment

pgsql-novice by date:

Previous
From: Stephen Frost
Date:
Subject: Re: pages_in_range for BRIN index
Next
From: Tom Lane
Date:
Subject: Re: PGXS on CentOS 7.6