Thread: PostgreSQL 12, 512 partitions by hash. Slow select
I have a table with 120 million rows of data spread among 512 hash partitions. The id column of the table is a uuid, which is what is being used for the partition hash, and it is also the PK for the table.

The table has a text column, which also has a btree index on it. A select query on an identical non-partitioned table takes 0.144 seconds, but on the partitioned table it takes 5.689 seconds.

Am I missing something in my setup? Or is this expected? I do know that having more than 100 partitions in versions of PostgreSQL prior to 12 would cause a major slowdown, but from what I read PostgreSQL 12 addresses that now? https://www.2ndquadrant.com/en/blog/postgresql-12-partitioning/
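[The poster did not share the actual DDL. A minimal sketch of the setup as described might look like this; the table and column names (addresses, addr) are assumptions:]

```sql
-- Hypothetical reconstruction of the described schema: a table
-- hash-partitioned on its uuid primary key, with a btree index
-- on a text column.
CREATE TABLE addresses (
    id   uuid NOT NULL,
    addr text,
    PRIMARY KEY (id)           -- PK is also the partition key
) PARTITION BY HASH (id);

-- 512 partitions, one per hash remainder
-- (in practice generated by a script):
CREATE TABLE addresses_p0 PARTITION OF addresses
    FOR VALUES WITH (MODULUS 512, REMAINDER 0);
-- ... addresses_p1 through addresses_p511 ...

-- Partitioned btree index on the text column; PostgreSQL creates a
-- matching index on every partition.
CREATE INDEX ON addresses (addr);
```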
On 05.04.20 19:48, Arya F wrote:
> Am I missing something in my setup? Or is this expected? I do know
> having more than 100 partitions in versions of PostgreSQL prior to 12
> would cause a major slowdown, but from what I read PostgreSQL 12
> addresses that now?

To say more about your problem we need to know more, for instance the exact table definition, the query, and the execution plan (explain analyse ...).

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
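[For reference, the kind of plan output being asked for is produced like this; the query below is only a placeholder, since the actual slow query was not posted:]

```sql
-- Substitute the real slow query; ANALYZE runs it and reports actual
-- timings per plan node, BUFFERS adds I/O counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM addresses WHERE addr = '123 Example Street';
```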
Arya F <arya6000@gmail.com> writes:
> I have a table with 120 million rows of data spread among 512 hash
> partitions. The id column of the table is a uuid, which is what is
> being used for the partition hash and it's also the PK for the table.
> The table has a text column, which also has a btree index on it. A
> select query on an identical non-partitioned table takes 0.144
> seconds, but on the partitioned table it takes 5.689 seconds.
> Am I missing something in my setup? Or is this expected?

You have your expectations calibrated wrongly, I suspect.

Your default expectation with a table with many partitions should be that queries will have to hit all those partitions and it will take a long time. If the query is such that the system can prove that it only needs to access one partition, then it can be fast --- but those proof rules are not superlatively bright, and they're especially not bright for hash partitioning, since that has so little relationship to WHERE restrictions that practical queries would use. But if the query's WHERE isn't restricting the partitioning key at all, as I suspect is the case for your query, then there's certainly no chance of not having to search all the partitions.

If you showed us the specific table declaration and query you're working with, it might be possible to offer more than generalities.

In general, though, partitioning should be a last resort when you've got so much data that you have no other choice. I doubt that you are there at all with 100M rows, and you are certainly not at a point where using hundreds of partitions is a good idea. They are not cost-free, by a very long shot. And when you do partition, you typically need to think hard about what the partitioning rule will be. I'm afraid that hash partitioning is more of a shiny trap for novices than it is a useful tool, because it doesn't organize the data into meaningful sub-groups.

			regards, tom lane
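[The pruning distinction Tom describes can be seen by comparing two queries against a hash-partitioned table; the table and column names here are hypothetical:]

```sql
-- Prunable: the WHERE clause equates the hash partition key with a
-- constant, so the planner can compute the hash and visit exactly
-- one partition.
EXPLAIN SELECT * FROM addresses
WHERE id = 'a81bc81b-dead-4e5d-abff-90865d1e13b1';

-- Not prunable: the text column has no relationship to the hash of id,
-- so all 512 partitions (and their per-partition indexes) must be
-- searched, one index scan each.
EXPLAIN SELECT * FROM addresses
WHERE addr = '123 Example Street';
```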
On Sun, Apr 5, 2020 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> And when you do partition, you typically need to think hard about what
> the partitioning rule will be. I'm afraid that hash partitioning is
> more of a shiny trap for novices than it is a useful tool, because it
> doesn't organize the data into meaningful sub-groups.
>
> regards, tom lane

The table at some point will have more than 1 billion rows; the information stored is international residential addresses. I am trying to figure out a way of spreading the data fairly evenly throughout multiple partitions, but I was unable to come up with a way of splitting the data so that Postgres does not have to search through all the partitions.
> On Apr 5, 2020, at 2:50 PM, Arya F <arya6000@gmail.com> wrote:
>
> The table at some point will have more than 1 billion rows; the
> information stored is international residential addresses. I am trying
> to figure out a way of spreading the data fairly evenly throughout
> multiple partitions, but I was unable to come up with a way of
> splitting the data so that Postgres does not have to search through
> all the partitions.

If you have to use partitions, I would split by country, using population for the analysis. I understand that address counts and population are different, but I would expect some correlation.

The largest 14 countries each have a population of 100 million or more and represent about 62% of the world population. That means the rest of the world should fit easily into another 14 partitions. It seems like it could be fairly easily evened out with a little bit of data analysis. You could probably refine this to be no more than 20 partitions.

Now China and India could be a problem and need to be split, but I would not do that unless necessary. China and India both have 6 nationally recognized regions that could be used if needed.

Neil - Fairwind Software https://www.fairwindsoft.com
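[A list-partitioning scheme along these lines might be sketched as follows. The table and column names, and which countries get their own partition, are assumptions; the actual groupings would come from the data analysis Neil describes. Note that the primary key must include the partition key:]

```sql
-- Hypothetical list partitioning by country: the largest countries get
-- their own partition, everything else falls into a DEFAULT partition
-- (or a handful of grouped partitions).
CREATE TABLE addresses (
    id           uuid NOT NULL,
    country_code char(2) NOT NULL,
    addr         text,
    PRIMARY KEY (id, country_code)   -- must include the partition key
) PARTITION BY LIST (country_code);

CREATE TABLE addresses_cn PARTITION OF addresses FOR VALUES IN ('CN');
CREATE TABLE addresses_in PARTITION OF addresses FOR VALUES IN ('IN');
CREATE TABLE addresses_us PARTITION OF addresses FOR VALUES IN ('US');
-- ... one partition per remaining large country ...
CREATE TABLE addresses_rest PARTITION OF addresses DEFAULT;
```

With this layout, any query that includes `WHERE country_code = ...` can be pruned to a single partition, which is the meaningful sub-grouping that hash partitioning could not provide.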