Thread: Postgresql 12, 512 partition by hash. Slow select

Postgresql 12, 512 partition by hash. Slow select

From
Arya F
Date:
I have a table with 120 million rows of data spread across 512 hash
partitions. The id column of the table is a uuid, which is what is
being used as the partition key, and it's also the PK for the table.

The table has a text column, which also has a btree index on it. A
select query on an identical non-partitioned table takes 0.144
seconds, but on the partitioned table it takes 5.689 seconds.
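
Roughly, the setup looks like this (table and column names here are
simplified placeholders):

    -- parent table, hash-partitioned on the uuid primary key
    CREATE TABLE addresses (
        id      uuid NOT NULL,
        address text,
        PRIMARY KEY (id)
    ) PARTITION BY HASH (id);

    -- one of the 512 partitions; the others differ only in REMAINDER
    CREATE TABLE addresses_p0 PARTITION OF addresses
        FOR VALUES WITH (MODULUS 512, REMAINDER 0);

    -- btree index on the text column, defined on the parent so it is
    -- created on every partition
    CREATE INDEX ON addresses (address);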

Am I missing something in my setup? Or is this expected? I do know
that having more than 100 partitions in versions prior to PostgreSQL
12 would cause a major slowdown, but from what I read, PostgreSQL 12
addresses that now?

https://www.2ndquadrant.com/en/blog/postgresql-12-partitioning/



Re: Postgresql 12, 512 partition by hash. Slow select

From
Andreas Kretschmer
Date:

On 05.04.20 at 19:48, Arya F wrote:
> Am I missing something in my setup? Or is this expected? I do know
> that having more than 100 partitions in versions prior to PostgreSQL
> 12 would cause a major slowdown, but from what I read, PostgreSQL 12
> addresses that now?

To say more about your problem we need to know more: for instance, the
exact table definition, the query, and the execution plan (explain
analyse ...).
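
Something along these lines would do, with your own table and query
substituted for the placeholder names:

    -- full table definition, including partitioning scheme and indexes
    \d+ addresses

    -- the slow query itself, with actual run times and buffer usage
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM addresses WHERE address = 'some value';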


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: Postgresql 12, 512 partition by hash. Slow select

From
Tom Lane
Date:
Arya F <arya6000@gmail.com> writes:
> I have a table with 120 million rows of data spread across 512 hash
> partitions. The id column of the table is a uuid, which is what is
> being used as the partition key, and it's also the PK for the table.

> The table has a text column, which also has a btree index on it. A
> select query on an identical non-partitioned table takes 0.144
> seconds, but on the partitioned table it takes 5.689 seconds.

> Am I missing something in my setup? Or is this expected? I do know
> that having more than 100 partitions in versions prior to PostgreSQL
> 12 would cause a major slowdown, but from what I read, PostgreSQL 12
> addresses that now?

You have your expectations calibrated wrongly, I suspect.

Your default expectation with a table with many partitions should be
that queries will have to hit all those partitions and it will take a
long time.  If the query is such that the system can prove that it
only needs to access one partition, then it can be fast --- but those
proof rules are not superlatively bright, and they're especially not
bright for hash partitioning since that has so little relationship
to WHERE restrictions that practical queries would use.  But if the
query WHERE isn't restricting the partitioning key at all, as I suspect
is the case for your query, then there's certainly no chance of not
having to search all the partitions.
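
For instance, with a hypothetical hash-partitioned table like the one
sketched earlier in the thread:

    -- restricts the partition key: the planner hashes the constant
    -- and prunes to a single partition
    EXPLAIN SELECT *
    FROM addresses
    WHERE id = '00000000-0000-0000-0000-000000000000'::uuid;

    -- restricts only the text column: that tells the planner nothing
    -- about the hash of id, so all 512 partitions (and their indexes)
    -- have to be visited
    EXPLAIN SELECT *
    FROM addresses
    WHERE address = '123 Main Street';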

If you showed us the specific table declaration and query you're
working with, it might be possible to offer more than generalities.

In general though, partitioning should be a last resort when you've
got so much data that you have no other choice.  I doubt that you
are there at all with 100M rows, and you are certainly not at a point
where using hundreds of partitions is a good idea.  They are not
cost-free, by a very long shot.  And when you do partition, you
typically need to think hard about what the partitioning rule will be.
I'm afraid that hash partitioning is more of a shiny trap for novices
than it is a useful tool, because it doesn't organize the data into
meaningful sub-groups.

            regards, tom lane



Re: Postgresql 12, 512 partition by hash. Slow select

From
Arya F
Date:
On Sun, Apr 5, 2020 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> And when you do partition, you
> typically need to think hard about what the partitioning rule will be.
> I'm afraid that hash partitioning is more of a shiny trap for novices
> than it is a useful tool, because it doesn't organize the data into
> meaningful sub-groups.


The table at some point will have more than 1 billion rows; the
information stored is international residential addresses. I have been
trying to figure out a way of spreading the data fairly evenly
throughout multiple partitions, but I was unable to come up with a way
of splitting the data so that Postgres does not have to search through
all the partitions.



Re: Postgresql 12, 512 partition by hash. Slow select

From
Neil
Date:
> On Apr 5, 2020, at 2:50 PM, Arya F <arya6000@gmail.com> wrote:
>
> The table at some point will have more than 1 billion rows; the
> information stored is international residential addresses. I have been
> trying to figure out a way of spreading the data fairly evenly
> throughout multiple partitions, but I was unable to come up with a way
> of splitting the data so that Postgres does not have to search through
> all the partitions.
>


If you have to use partitions, I would split it by country, using
population for analysis. I understand that address and population are
different, but I would expect some correlation.

The largest 14 countries each have a population of 100 million or more
and represent about 62% of the world population. That means the rest
of the world should fit easily into another 14 partitions.

It seems like it could be fairly easily evened out with a little bit
of data analysis.

You could probably refine this to be no more than 20 partitions.

Now China and India could be a problem and need to be split, but I
would not do that unless necessary. China and India both have 6
nationally recognized regions that could be used if needed.
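
A rough sketch of what that could look like (table name and country
groupings are made up for illustration):

    -- list-partition on a country code instead of hashing the uuid
    CREATE TABLE addresses (
        id           uuid NOT NULL,
        country_code char(2) NOT NULL,
        address      text,
        PRIMARY KEY (country_code, id)
    ) PARTITION BY LIST (country_code);

    -- the largest countries get a partition each ...
    CREATE TABLE addresses_cn PARTITION OF addresses FOR VALUES IN ('CN');
    CREATE TABLE addresses_in PARTITION OF addresses FOR VALUES IN ('IN');
    CREATE TABLE addresses_us PARTITION OF addresses FOR VALUES IN ('US');

    -- ... smaller countries are grouped, and a DEFAULT partition
    -- catches anything not listed
    CREATE TABLE addresses_nordics PARTITION OF addresses
        FOR VALUES IN ('DK', 'FI', 'IS', 'NO', 'SE');
    CREATE TABLE addresses_other PARTITION OF addresses DEFAULT;

Queries that include country_code in the WHERE clause can then be
pruned to the relevant partition.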

Neil
-
Fairwind Software
https://www.fairwindsoft.com