Thread: Time to move table to new tablespace
I'm considering the migration of an existing large (2.3 TB) table to a new tablespace. The table size, according to the '\dt+' command:

 Schema |     Name     | Type  |     Owner     |  Size   | Description
--------+--------------+-------+---------------+---------+-------------
 public | city_summary | table | altosresearch | 2345 GB |
Are there any considerations - besides the usual disk and network IO constraints - that I need to take into account when estimating the amount of time that would be required for the following commands to complete?
psql> CREATE TABLESPACE newstorage LOCATION '/some/new/path';
psql> ALTER TABLE city_summary SET TABLESPACE newstorage;
Any other recommendations are welcome, such as "You are an idiot for not using partitioning in the first place" :)
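[Editor's note: for a rough sense of how much data that ALTER TABLE will physically copy, the sizes can be checked up front. A minimal sketch, assuming PostgreSQL 9.0 or later, where these size functions exist:]

psql> SELECT pg_size_pretty(pg_table_size('city_summary'));   -- heap + TOAST: what SET TABLESPACE copies
psql> SELECT pg_size_pretty(pg_indexes_size('city_summary')); -- indexes are left behind unless moved separately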
Jason L. Buberel
CTO, Altos Research
http://www.altosresearch.com/
650.603.0907
On 7/01/2012 10:52 PM, Jason Buberel wrote:
Maybe you should create a set of partitioned tables in the new tablespace, copy your data over, then drop the old table and rename the new parent to the old table's name instead? If the big table keeps on accumulating data (via INSERTs) you can add a trigger that mirrors all inserts into the new table while the copy runs.
This won't work so well if the big table gets UPDATEs and DELETEs too, since a trigger can't update or delete records that haven't yet been copied to the new table and committed.
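[Editor's note: a minimal sketch of that mirroring trigger, assuming the replacement parent table is named city_summary_new (a hypothetical name) and has the same columns as the original:]

CREATE FUNCTION city_summary_mirror() RETURNS trigger AS $$
BEGIN
    -- Duplicate each newly inserted row into the replacement table
    -- while the bulk copy is still running.
    INSERT INTO city_summary_new VALUES (NEW.*);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mirror_inserts
    AFTER INSERT ON city_summary
    FOR EACH ROW EXECUTE PROCEDURE city_summary_mirror();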
--
Craig Ringer
Hi,
On 8 January 2012 01:52, Jason Buberel <jason@altosresearch.com> wrote:
> psql> CREATE TABLESPACE newstorage LOCATION '/some/new/path';
> psql> ALTER TABLE city_summary SET TABLESPACE newstorage;
Be aware that this will not move the indexes (see ALTER INDEX name SET TABLESPACE tablespace_name). Maybe you don't need to move the data at all; it might be worth moving only the indexes. You would then end up with separate data and index tablespaces, and you could reduce random_page_cost when querying the table.
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
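[Editor's note: with a table this size there may be many indexes; the ALTER INDEX statements can be generated from the catalog. A sketch, reusing the tablespace name from the original post:]

SELECT 'ALTER INDEX ' || quote_ident(schemaname) || '.' || quote_ident(indexname)
       || ' SET TABLESPACE newstorage;'
FROM pg_indexes
WHERE schemaname = 'public' AND tablename = 'city_summary';
-- Each generated statement, once executed, rewrites that index into the new tablespace.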
Craig,
Good suggestion - I have experimented with table partitioning in the past, but the impact on performance was considerable.
In our case, we have one large table of real estate statistics arranged by time (weekly observations for the last five years) and geography (every ZIP code in the US). You could imagine a date-range partitioning strategy (each year as a separate partition, for example) or a spatial approach (ten partitions based on ZIP code modulo 10, perhaps).
However, our query patterns span both data dimensions:
"Select median price for every zip code as of 2012-01-06" (customer exports)
"Select median price for 94086 from 2005-01-01 through 2012-01-06" (charting apps)
So by partitioning in one dimension we impact queries in the other.
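[Editor's note: for concreteness, the date-range option in the 2012-era scheme would be inheritance children with CHECK constraints. A sketch where the week_date column name is hypothetical:]

CREATE TABLE city_summary_2011 (
    CHECK (week_date >= DATE '2011-01-01' AND week_date < DATE '2012-01-01')
) INHERITS (city_summary);

-- Lets the planner skip children whose CHECK constraint contradicts the query:
SET constraint_exclusion = partition;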
If you have a brilliant solution to problems like this, we'd be happy to hire you (not a joke).
Cheers,
Jason
Jason L. Buberel
CTO, Altos Research
http://www.altosresearch.com/
650.603.0907
Hi,
On 10 January 2012 06:10, Jason Buberel <jason@altosresearch.com> wrote:
> "Select median price for every zip code as of 2012-01-06" (customer exports)
> "Select median price for 94086 from 2005-01-01 through 2012-01-06" (charting apps)
>
> So by partitioning in one dimension we impact queries in the other.
I do not see any issue here: if all your queries contain a date range, then you should partition by date.
--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
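[Editor's note: to make that concrete, with yearly date partitions the "as of" export prunes to a single child, and a per-child index can serve the per-ZIP charting queries. A sketch with hypothetical column names:]

CREATE INDEX city_summary_2011_zip_week_idx
    ON city_summary_2011 (zip, week_date);

-- Constraint exclusion leaves only the partition containing 2012-01-06:
SELECT zip, median_price FROM city_summary WHERE week_date = DATE '2012-01-06';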
On 10/01/2012 3:10 AM, Jason Buberel wrote:
Well, I know of *theoretical* solutions that use n-dimensional partitioning of data, but Pg doesn't have automatic, easy-to-administer 1-dimensional partitioning, so I think going for 2D is a bit of a stretch ;-)
I wonder how hard it'd be to extend constraint exclusion to let it use different constraints for different queries, or even combine constraints? Even if the answer is "not very", the hard bit would be maintaining a 2D (or more) partitioned table structure, and making constraint exclusion *efficient* over that many tables.
If the data being partitioned on is not only ordinal but interval too, then it'd be possible to define regular intervals for partitioning and build a bitmap of matching tables. That'd be a fun honours/masters project, but hardly something suitable for hacking into a production setup.
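[Editor's note: to see why maintaining such a structure is the hard bit, a hand-rolled 2D "cell" might be one inheritance child per (year, ZIP-bucket) pair; all names here are hypothetical:]

CREATE TABLE city_summary_2011_b0 (
    CHECK (week_date >= DATE '2011-01-01' AND week_date < DATE '2012-01-01'),
    CHECK (zip::integer % 10 = 0)
) INHERITS (city_summary);

-- Exclusion on the second dimension only fires if the query repeats the
-- modulus predicate: a plain zip = '94086' qual cannot be proven to
-- contradict the CHECK, so all ten buckets still get scanned.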
--
Craig Ringer
"Select median price for every zip code as of 2012-01-06" (customer exports)"Select median price for 94086 from 2005-01-01 through 2012-01-06" (charting apps)So by partitioning in one dimension we impact queries in the other.
Well, I know of *theoretical* solutions that use n-dimensional partitioning of data, but Pg doesn't have automatic and easy-to-administrate 1-dimensional partitioning, so I think going for 2D is a bit of stretch ;-) .
I wonder how hard it'd be to extend constraint exclusion to let it use different constraints for different queries, or even combine constraints? The hard bit would be making it efficient. Even if the answer is "not very", the hard bit would be maintaining a 2D (or more) partitioned table structure. That, and making constraint exclusion *efficient* over that many tables.
If the data being partitioned by is not only ordinal but interval too, then it'd be possible to define regular intervals for partitioning and build a bitmap for matching tables. That'd be a fun honours / masters project, but hardly something suitable for trying to hack into a production setup.
--
Craig Ringer