Thread: Partitioning a table by integer value (preferably in place)
Hi all,

Linux Fedora 34, 1TB Samsung SSD, 4 CPUs, 2 cores.
PostgreSQL 12.7 (can upgrade if a better solution is to be found in 13 or even 14 beta2 - I'm currently testing a proposed solution, so by the time it's fully implemented, 14 should be GA and hey, I might even find a bug or two to help with the project!).

I have a 400GB joining table (one SMALLINT and the other INTEGER - Primary Keys on other tables) with 1000 fields on one side and 10M on the other, so 10,000M (or 10Bn) records all told.

What I would like to do is to partition by the SMALLINT (1 - 1000) value - which would give 1,000 tables of 400MB each. I wish to avoid having to do this manually 1,000 times - is there a way of telling PostgreSQL to partition by value without specifying the SMALLINT value each time?

I looked here:

https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql

and there is LIST, RANGE and HASH partitioning. I think a RANGE of 1 would be what I want here?

So, questions:

Is 1,000 partitions reasonable? This:

https://elephas.io/is-there-a-limit-on-number-of-partitions-handled-by-postgres/

appears to suggest that it shouldn't be a problem?

Could I go with a RANGE of, say, 10 values per partition? If I have to explicitly code each partition, I'd prefer this for my test - at least it would save on the typing! :-) This would produce 100 tables of ~4GB each. Would I see much performance degradation with a 4GB table on an SSD?

Finally, the icing on the cake would be if this could be done in place - my SSD is 1TB and the output from df -h is:

test=# \! df -h
Filesystem                               Size  Used Avail Use% Mounted on
devtmpfs                                  16G     0   16G   0% /dev
tmpfs                                     16G  212K   16G   1% /dev/shm
tmpfs                                    6.3G  1.8M  6.3G   1% /run
/dev/mapper/fedora_localhost--live-root   69G   11G   55G  17% /
tmpfs                                     16G  284K   16G   1% /tmp
/dev/sda5                                976M  192M  718M  22% /boot
/dev/mapper/fedora_localhost--live-home  1.3T  898G  270G  77% /home
/dev/sda2                                 96M   52M   45M  54% /boot/efi
tmpfs                                    3.2G   96K  3.2G   1% /run/user/1000
test=#

So, I only have 270 GB left on disk - and it took > 12 hrs to fill it with indexes and Foreign Key constraints, so I'd like to be able to do it without having to go through that again. This:

https://www.2ndquadrant.com/en/blog/partitioning-a-large-table-without-a-long-running-lock/

appears to suggest that it can be done online. There will be no other activity on the table while any partitioning &c. will be ongoing. However, the article makes no mention of space considerations.

This is my first time considering partitioning, so I'd be grateful for any advice, pointers, references, URLs &c... and please let me know if I've left out any important information.

TIA and rgs,

Pól...

Some (relevant?) settings:

max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2

as suggested by pgtune. Is pgtune a good bet for configuration suggestions?
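For concreteness, a minimal sketch of the sort of declarative LIST partitioning in question - the table and column names here are invented, not the real schema:

-- Hypothetical join table, partitioned by LIST on the SMALLINT key.
CREATE TABLE joining (
    small_id SMALLINT NOT NULL,   -- references the 1,000-row side
    big_id   INTEGER  NOT NULL    -- references the 10M-row side
) PARTITION BY LIST (small_id);

-- One partition per SMALLINT value, e.g. for value 1:
CREATE TABLE joining_1 PARTITION OF joining FOR VALUES IN (1);

Writing 1,000 of those CREATE TABLE statements by hand is exactly the tedium the rest of the thread is about avoiding.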
Hi again all,

Just a quick follow-up - could I script the creation of 1000 partitions using bash or PL/pgSQL? I think (rightly/wrongly?) that this may be the best solution?

I've found samples on the web, but they are for partitioning by date - a quick sample by integer would be gratefully received!

Is partman recommended by the community?

TIA and rgs,

Pól...
On 8/9/21 9:14 AM, Pól Ua Laoínecháin wrote:
> Hi again all,
>
> Just a quick follow-up - could I script the creation of 1000
> partitions using bash

Sure. That's what scripting languages are for.

> or PL/pgSQL? I think (rightly/wrongly?) that
> this may be the best solution?
>
> I've found samples on the web, but they are for partitioning by date -
> a quick sample by integer would be gratefully received!
>
> Is partman recommended by the community?

--
Angular momentum makes the world go 'round.
> I have a 400GB joining table (one SMALLINT and the other INTEGER -
> Primary Keys on other tables) with 1000 fields on one side and 10M on
> the other, so 10,000M (or 10Bn) records all told.
My queries:
Do you have any EXPLAIN (ANALYZE, BUFFERS) <query> results with the existing setup? Does it look problematic? (See the example after these questions.)
How would your table grow on either side of the join? Append-only, static data, too frequently updated, or dropped periodically, so that delete-based bloating can be skipped completely?
How is the data distributed across the smallint keys - equally or unequally?
What kind of queries would be run and what results returned? OLTP- or OLAP-like? Quick queries with few rows returned, or heavy queries with a lot of rows returned?
Partitioning has been ever improving, so the best option, if possible, would be to use the latest pg version.
Also, is there any scope for normalisation of that table? I mean, I know it is theoretically possible, but I have not seen any design with that wide a table (of 1000 cols), so it would be good to know.
Just asking - maybe partitioning would be the best option, but it would be good to know/see the benefit pre and post partitioning.
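For reference, a minimal sketch of that EXPLAIN (ANALYZE, BUFFERS) invocation - the query and table/column names here are only placeholders, not the real schema:

-- Placeholder query against a hypothetical join table.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM joining
WHERE small_id = 42;

The BUFFERS option adds shared/local/temp block hit and read counts to each plan node, which shows whether the big table is being served from cache or from disk.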
Thanks,
Vijay
Mumbai, India
Hi all - I resolved my issue - and took a 25 minute query down to 5 seconds.

> I have a 400GB joining table (one SMALLINT and the other INTEGER -

> What I would like to do is to partition by the SMALLINT (1 - 1000)
> value - which would give 1,000 tables of 400MB each.

I found this site very helpful for explaining the basics of partitioning (LIST, RANGE and HASH):

https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql

I then found this absolute beaut of a site which was like manna from heaven:

https://www.depesz.com/2021/01/17/are-there-limits-to-partition-counts/

which explained (they had a different issue - I adapted the code) how to do exactly what I required, entirely from the psql client, without the need for bash or PL/pgSQL or anything else.

The "trick" here is to combine the FORMAT function with GENERATE_SERIES as follows (code from the site):

CREATE TABLE test_ranged (id serial PRIMARY KEY, payload TEXT) PARTITION BY RANGE (id);

SELECT FORMAT('CREATE TABLE %I PARTITION OF test_ranged FOR VALUES FROM (%s) TO (%s);', 'test_ranged_' || i, i, i+1)
FROM generate_series(1, 2) i \gexec

Output of this (having removed \gexec - another thing I learnt):

                                     format
---------------------------------------------------------------------------------
 CREATE TABLE test_ranged_1 partition OF test_ranged FOR VALUES FROM (1) to (2);
 CREATE TABLE test_ranged_2 partition OF test_ranged FOR VALUES FROM (2) to (3);

So, I generated the series for (1, 1000) with my own fields, using the LIST method with a single INTEGER value in each list. Ran the script - had my 1000 partitions in a matter of seconds. Loading them (750GB with indexes) was an overnight job, however - but that's not PostgreSQL's fault! :-)

I really love the way that PostgreSQL/psql is so flexible that it's possible to do heaps of stuff without having to resort to other tools.

I posted this answer to my own question in the hope that it may help others in my situation. If I haven't been clear, or there's something missing, please let me know - or add your own opinions/experience if there's an alternative which may or may not be as efficient. I'm trying to collect as many strings to my bow as possible!

Rgs,

Pól...
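For completeness, a sketch of what the adapted LIST version for 1,000 SMALLINT values might look like - the parent table and column names here are invented, since the real ones aren't shown in the thread:

-- Hypothetical parent table, partitioned by LIST on the SMALLINT key.
CREATE TABLE joining (
    small_id SMALLINT NOT NULL,
    big_id   INTEGER  NOT NULL
) PARTITION BY LIST (small_id);

-- Generate and immediately execute 1,000 CREATE TABLE statements from psql.
SELECT FORMAT(
    'CREATE TABLE %I PARTITION OF joining FOR VALUES IN (%s);',
    'joining_' || i, i
)
FROM generate_series(1, 1000) i \gexec

\gexec sends each row of the result back to the server as a statement, so all 1,000 partitions are created in a single pass.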
Hi Vijay, and thanks for replying,

>> I have a 400GB joining table (one SMALLINT and the other INTEGER -
>> Primary Keys on other tables) with 1000 fields on one side and 10M on
>> the other, so 10,000M (or 10Bn) records all told.

> My queries:
> <query> results with the existing setup? Does it look problematic?

> How would your table grow on either side of the join?

In this case uniformly! 1 -> 10

> Append only, static data, too frequently updated, or dropped periodically,

Append only in this case - not updated nor dropped.

> so that delete-based bloating can be skipped completely.

It can be skipped!

> How is the data distributed across the smallint keys - equally or unequally?

Totally uniform - see my own answer to my question - if it wasn't uniform, I might have considered RANGE based partitioning?

> What kind of queries would be run and what results returned? OLTP- or OLAP-like? Quick queries with few rows returned, or heavy queries with a lot of rows returned?

Pretty much OLAP like - summary queries. Point queries return in the sub-millisecond range when based on the PK!

> Partitioning has been ever improving, so the best option, if possible, would be to use the latest pg version.

> Also, is there any scope for normalisation of that table? I mean, I know it is theoretically possible, but I have not seen any design with that wide a table (of 1000 cols), so it would be good to know.

> Just asking - maybe partitioning would be the best option, but it would be good to know/see the benefit pre and post partitioning.

Thanks again for your questions - they gave me pause for thought and I will try to apply them in future partitioning scenarios. (Unfortunately :-) ) there is no magic number of partitions for, say, a given size of table - otherwise it would be the default and would be done automatically!

Rgs,

Pól...

> Vijay
Dear Ron,

> > Just a quick follow-up - could I script the creation of 1000
> > partitions using bash

> Sure. That's what scripting languages are for.

Thank you so much for your helpful and expansive answer. No wonder everyone talks about how friendly and welcoming the PostgreSQL community is!

Your friend always,

Pól...
On Fri, 13 Aug 2021 at 14:13, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
> Thanks again for your questions - they gave me pause for thought and I
> will try to apply them in future partitioning scenarios.
> (Unfortunately :-) ) there is no magic number of partitions for, say,
> a given size of table - otherwise it would be the default and would be
> done automatically!
No worries - it seems you have a fair idea of how things would grow, so partitioning would be of help.

My biggest worry around partitioning is moving data between partitions / rebalancing partitions.
If the scheme is as simple as list/range, it requires making use of triggers, syncing the data,
and then attaching/detaching partitions, using a binary split for large partitions to avoid
large row movements.
But when it comes to hash, it requires rebuilding the entire setup, either via triggers if there
is significant space on the db, or by logically replicating to another server with a new hash
scheme. Because I do not have a lot of experience around sharding/partitioning, nor with how
other DBs claim to do automatic movement of data between partitions, I think I wore myself
out due to poor design.
Thanks,
Vijay
Mumbai, India
On Fri, Aug 13, 2021 at 3:02 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
> ... use binary split for large partitions, to avoid large row movements.
Would you expound on this?
On Fri, 13 Aug 2021 at 21:07, Michael Lewis <mlewis@entrata.com> wrote:
> On Fri, Aug 13, 2021 at 3:02 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
>> ... use binary split for large partitions, to avoid large row movements.
>
> Would you expound on this?
If we have range partitions based on year - 2018-2019, 2019-2020, etc. - and for
any one year we see uneven growth in data, we simply split that partition in two and move only 6 months' worth of data
to another new partition.
If we see similar issues not solved by the first split, we split each again by 2 and only move the remainder of the data to all four partitions.
So at any point in time, if the range partitioning that was planned earlier does not work well for some periods, we split the partitions;
and if we see the trend growing with increased volume, the new partitions are created quarterly, etc.
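For illustration, a rough sketch of one such binary split in plain SQL, using an invented measurements table range-partitioned by a date column (all names here are hypothetical):

-- Assumes: CREATE TABLE measurements (..., logdate date, ...) PARTITION BY RANGE (logdate);
-- with measurements_2019 currently covering 2019-01-01 .. 2020-01-01.

BEGIN;

-- Detach the oversized yearly partition; it keeps its rows as a standalone table.
ALTER TABLE measurements DETACH PARTITION measurements_2019;

-- New partition for the second half of the year.
CREATE TABLE measurements_2019_h2 PARTITION OF measurements
    FOR VALUES FROM ('2019-07-01') TO ('2020-01-01');

-- Move only the second half's rows (about 6 months' worth) into the new partition.
WITH moved AS (
    DELETE FROM measurements_2019
    WHERE logdate >= '2019-07-01'
    RETURNING *
)
INSERT INTO measurements SELECT * FROM moved;

-- Re-attach the old partition with the narrower, first-half range.
ALTER TABLE measurements ATTACH PARTITION measurements_2019
    FOR VALUES FROM ('2019-01-01') TO ('2019-07-01');

COMMIT;

The final ATTACH will scan measurements_2019 to validate the narrower bound unless a matching CHECK constraint is added first; either way, only the second half of the year's rows actually move.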
I have not done a hash partitioning rebuild, but I think this video explains how we could best do it.
Thanks,
Vijay
Mumbai, India