Home > mailing lists

Re: Best partition type for billions of addresses - Mailing list pgsql-performance

From	Justin Pryzby
Subject	Re: Best partition type for billions of addresses
Date	May 2, 2020 17:00:32
Msg-id	20200502140032.GI28974@telsasoft.com Whole thread Raw
In response to	Best partition type for billions of addresses (Arya F <arya6000@gmail.com>)
List	pgsql-performance

Tree view

On Sat, May 02, 2020 at 09:20:06AM -0400, Arya F wrote:
> I need to store about 600 million rows of property addresses across
> multiple counties. I need to have partitioning setup on the table as
> there will be updates and inserts performed to the table frequently
> and I want the queries to have good performance.

I dug up the last messages about this:
https://www.postgresql.org/message-id/flat/CAFoK1aztep-079Fxmaos6umR8X6m3x1K_aZLGtQGpYxfENh9%3DA%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CAFoK1azkv1Z%3DRr7ZWrJjk0RQSoF6ah%2BQMpLSSyBs1KsEiQ3%3Dvw%40mail.gmail.com
https://www.postgresql.org/message-id/CAFoK1axr_T6nB8ZAq8g2QBcqv_pE%3DdsZsxyjatz8Q67k1VKAnw%40mail.gmail.com

> From what I understand hash partitioning would not be the right
> approach in this case, since for each query PostgreSQL has to check
> the indexes of all partitions?

Indexes are separate from partitioning.  Typically, the partitioned columns are
indexed, but it's not required.

If the partition key isn't used in your typical query, then partitioning didn't
help you, and you chose the wrong partition strategy/key.

> Would list partitioning be suitable? if I want PostgreSQL to know
> which partition the row is it can directly load the relevant index
> without having to check other partitions. Should I be including the
> partition key in the where clause?

It sounds like you're thinking about this backwards.

What are your typical queries ?  That should determines the partition strategy
and key, not the other way around.  You should maybe think about whether there
are views/functions/joins of the partitioned column.

For example, at telsasoft, our report queries *always* say "tbl.start_time >=
t1 AND tbl.start_time < t2", so I partitioned our tables BY RANGE(start_time),
so a typical report hits only a single table.  And, start_time has an index on
it, so a typical query over 1-2 days will only hit a fraction of that table.

-- 
Justin

pgsql-performance by date:

From: Arya F
Date: 02 May 2020, 16:20:06
Subject: Best partition type for billions of addresses

From: Stephen Frost
Date: 02 May 2020, 17:01:09
Subject: Re: Best partition type for billions of addresses

Re: Best partition type for billions of addresses - Mailing list pgsql-performance

Previous

Next