Thread: Upgrade questions
Hello All,
I've looked through the docs, but I'm unable to find complete answers to my questions, so thanks in advance if you can lend any expertise.
Here's the situation I'm in (always a good opener, right? :) ):
We've got a postgres database with *a lot* of data in one table. On the order of 100 million rows at this point. Postgres is, of course, handling it with aplomb.
However, when the engineer who was setting up our schema got things in place, he neglected to think through how many entries we might eventually have, and went with the default type for the 'id' column on this table (yes, we are a rails shop; no, he shouldn't have been allowed to design the schema).
That default is 'integer' which, in my understanding, is a signed 32-bit type with a maximum of 2,147,483,647. Sadly, this is not gonna get it done: we will hit that limit some time next year, depending on growth.
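For context, here's roughly how we've been checking our remaining headroom against that ceiling (my_table stands in for the real table name):

  -- how close are we to the signed 32-bit maximum?
  SELECT max(id)              AS current_max,
         2147483647 - max(id) AS ids_remaining
  FROM   my_table;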
OK, simple enough, just run something like this:
ALTER TABLE my_table ALTER COLUMN id TYPE bigint;
However, given the size of this table, I have no idea how long something like this might take. In general I've had a tough time getting feedback from postgres on the progress of a query, how long something might take, etc.
So my question is: is there a way to understand roughly how long something like this might take? Our DB is out on crappy Amazon ec2 instances, so we don't exactly have screamers set up. Any tools I can use? Any tips? I don't need anything exact, just on the order of minutes, hours, days or weeks.
Again, thanks in advance,
Carson
> However, given the size of this table, I have no idea how long something
> like this might take. In general I've had a tough time getting feedback
> from postgres on the progress of a query, how long something might take,
> etc.

You can always do this, which would result in minimum hassle:

create a new bigint field.
copy all the IDs to it.
index it in the background
at frequency of your choosing sync the id field to the new field to keep it up.
at a time of your choosing set the default for the new field to be
serial starting at max(id)
drop the ID field
rename the field to id

That last bit could be done in a transaction and hopefully should not take very long at all.
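In rough SQL, that might look something like the following. This is an untested sketch: my_table, new_id and my_table_id_seq are placeholder names (the sequence name assumes the usual serial/rails convention), and the keep-in-sync step (a trigger or application-level double write) is left out:

  ALTER TABLE my_table ADD COLUMN new_id bigint;                       -- instant, no table rewrite
  UPDATE my_table SET new_id = id WHERE new_id IS NULL;                -- backfill; can be done in batches
  CREATE INDEX CONCURRENTLY my_table_new_id_idx ON my_table (new_id);  -- build without blocking writes

  -- at cutover (DDL is transactional in postgres):
  BEGIN;
  ALTER SEQUENCE my_table_id_seq OWNED BY my_table.new_id;  -- so dropping the old id doesn't drop the sequence
  ALTER TABLE my_table ALTER COLUMN new_id SET DEFAULT nextval('my_table_id_seq');
  ALTER TABLE my_table DROP COLUMN id;                       -- note: this also drops the old primary key index
  ALTER TABLE my_table RENAME COLUMN new_id TO id;
  COMMIT;
  -- restoring NOT NULL / PRIMARY KEY on the new id is a separate, slower step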
On 03/12/12 1:25 PM, Tim Uckun wrote:
> create a new bigint field.
> copy all the IDs to it.
> index it in the background
> at frequency of your choosing sync the id field to the new field to keep it up.
> at a time of your choosing set the default for the new field to be
> serial starting at max(id)
> drop the ID field
> rename the field to id

if there's other tables that have FK references to this table's ID, that could be problematic.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Tim,
Commando. I like it. Thanks a ton for that suggestion. I'd still like to hear if anyone has a good way to estimate the performance of these operations, but I'll explore what it would mean to do exactly that.
John: thankfully this is a table without any FK references into it, although it is indexed to hell. I was concerned about the speed of updating the indexes, but with Tim's suggestion we could rebuild everything in the background.
We are also considering sharding the table and maybe the right thing is to simply fix it when we do the sharding.
Thanks for the tips guys,
Carson
P.S. A side question: this table of ours is under a fairly constant insert load, and is read infrequently but rather violently (e.g. a group by over, say 1-10k rows.) Is that a bad access pattern?
On 03/12/12 5:01 PM, Carson Gross wrote:
> We are also considering sharding the table and maybe the right thing
> is to simply fix it when we do the sharding.

postgres generally calls that partitioning... Sharding usually means splitting data across multiple servers.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
John,
Thanks, I'll clarify my language around that.
Still hoping that there is a way to get a rough estimate of how long converting an integer column to a bigint will take. Not possible?
Thanks guys,
Carson
As a follow up, is the upgrade from integer to bigint violent? I assume so: it has to physically resize the column on disk, right?
Thanks,
Carson
On Tue, Mar 13, 2012 at 9:43 AM, Carson Gross <carsongross@gmail.com> wrote:
John,Thanks, I'll clarify my language around that.Still hoping that there is a way to get a rough estimate of how long converting an integer column to a bigint will take. Not possible?Thanks guys,
CarsonOn Mon, Mar 12, 2012 at 6:13 PM, John R Pierce <pierce@hogranch.com> wrote:On 03/12/12 5:01 PM, Carson Gross wrote:postgres generally calls that partitioning... Sharding usually means splitting data across multiple servers.
We are also considering sharding the table and maybe the right thing is to simply fix it when we do the sharding.
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 03/13/12 6:10 PM, Carson Gross wrote:
> As a follow up, is the upgrade from integer to bigint violent? I
> assume so: it has to physically resize the column on disk, right?

I think we've said several times, any ALTER TABLE ADD/ALTER COLUMN like that will cause every single tuple (row) of the table to be updated. when rows are updated, the new row is written, then the old row is flagged for eventual vacuuming.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Got it.
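Something like the standard size functions should show how much data that rewrite has to push through (my_table stands in for our real table name):

  SELECT pg_size_pretty(pg_relation_size('my_table'))       AS heap_size,
         pg_size_pretty(pg_total_relation_size('my_table')) AS heap_plus_indexes_size;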
Thank you, that's very helpful: we could delete quite a few of the rows and cut way down on the size of the table before we issue the ALTER. Trimming the table size down seems obvious enough, but that's good confirmation that it will very much help. And there are quite a few indexes that I've discovered are useless, so dropping those will speed things up too.
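For reference, something like this should surface indexes that are never read (my_table is a placeholder, and idx_scan only counts scans since the last statistics reset):

  SELECT indexrelname,
         pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
         idx_scan
  FROM   pg_stat_user_indexes
  WHERE  relname = 'my_table'
  ORDER  BY pg_relation_size(indexrelid) DESC;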
Looking online I see that a query progress indicator is a commonly requested feature but isn't yet implemented, so it sounds like my best bet is to clone the db onto similar hardware, take all the advice offered here, and just see how it performs.
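Concretely, the dry run I have in mind is something like this (a sketch only; my_table and the sample size are placeholders, and \timing is a psql setting rather than SQL):

  \timing on
  CREATE TABLE my_table_sample AS
      SELECT * FROM my_table LIMIT 1000000;                 -- grab a slice of real rows
  ALTER TABLE my_table_sample ALTER COLUMN id TYPE bigint;
  -- scale the reported time by (full row count / sample row count) for a very rough
  -- estimate; the real table's indexes and concurrent load will make the real run slower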
Thanks to everyone for the feedback,
Carson
OK, last post on this topic, I promise. I'm doing some math, and I think I'll have about 100 million rows in the table to deal with.
Given a table that size, I'd like to do the following math:
100 million rows / inserted rows per second = total seconds
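Just to make the orders of magnitude concrete (pure arithmetic, not measurements):

  100,000,000 rows /     10 rows/sec = 10,000,000 s ≈ 116 days
  100,000,000 rows /    100 rows/sec =  1,000,000 s ≈ 11.6 days
  100,000,000 rows /  1,000 rows/sec =    100,000 s ≈ 28 hours
  100,000,000 rows / 10,000 rows/sec =     10,000 s ≈ 2.8 hours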
Does anyone have a reasonable guess as to the inserts per second postgres is capable of these days on middle-of-the-road hardware? Any order of magnitude would be fine: 10, 100, 1000, 10,000.
Thank you all for your patience,
Carson
On Tue, Mar 13, 2012 at 8:24 PM, Carson Gross <carsongross@gmail.com> wrote:
Got it.Thank you, that's very helpful: we could delete quite a few of the rows before we did the operation and cut way down on the size of the table before we issue the update. Trimming the table size down seems obvious enough, but that's good confirmation that it will very much help. And there are quite a few indexes that I've discovered are useless, so dropping those will speed things up too.Looking online I see that a query progress indicator is a commonly requested feature, but isn't yet implemented, so it sound like my best bet is to clone the db on similar hardware, take all the advice offered here, and just see how it performs.Thanks to everyone for the feedback,CarsonOn Tue, Mar 13, 2012 at 6:32 PM, John R Pierce <pierce@hogranch.com> wrote:On 03/13/12 6:10 PM, Carson Gross wrote:I think we've said several times, any ALTER TABLE ADD/ALTER COLUMN like that will cause every single tuple (row) of the table to be updated. when rows are updated, the new row is written, then the old row is flagged for eventual vacuuming.As a follow up, is the upgrade from integer to bigint violent? I assume so: it has to physically resize the column on disk, right?
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 03/13/12 8:41 PM, Carson Gross wrote:
> Does anyone have a reasonable guess as to the inserts per second
> postgres is capable of these days on middle-of-the-road hardware? Any
> order of magnitude would be fine: 10, 100, 1000, 10,000.

my dedicated database server in my lab, which is a 2U dual Xeon X5660 box with 12 cores at 2.8ghz, 48GB ram, and 20 15000rpm SAS drives in a RAID10 with a 1GB flash-cached raid card, can pretty easily sustain 6000 or more writes/second given enough threads doing the work, although indexes, and/or large rows would slow that down. a single connection/thread will not get that much throughput.

thats my definition of a middle of the road database server. I have no idea what yours is.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
I felt pretty good about my server until I read this.

On Wed, 2012-03-14 at 00:24 -0700, John R Pierce wrote:
> On 03/13/12 8:41 PM, Carson Gross wrote:
> > Does anyone have a reasonable guess as to the inserts per second
> > postgres is capable of these days on middle-of-the-road hardware? Any
> > order of magnitude would be fine: 10, 100, 1000, 10,000.
>
> my dedicated database server in my lab, which is a 2U dual Xeon X5660
> box with 12 cores at 2.8ghz, 48GB ram, and 20 15000rpm SAS drives in a
> RAID10 with a 1GB flash-cached raid card, can pretty easily sustain 6000
> or more writes/second given enough threads doing the work, although
> indexes, and/or large rows would slow that down. a single
> connection/thread will not get that much throughput.
>
> thats my definition of a middle of the road database server. I have no
> idea what yours is.
Heh. OK, so I'll plan on about 100 writes per second... *gulp*
Thanks a bunch for the info guys.
Cheers,
Carson
On 03/14/12 12:24 AM, John R Pierce wrote:
> thats my definition of a middle of the road database server. I have
> no idea what yours is.

let me add... this server was under $7000 plus the disk drives (it actually has 25 drives, 20 are in the raid10 used for the database testing). we built this specifically to compare against 'big iron' RISC unix servers like IBM Power7 and Sun^W Oracle Sparc stuffs with SAN storage, which frequently end up deep into the 6 digit price range.

as a 2-socket Intel 2U server goes, its fairly high end, but there's 4 socket and larger systems out there, as well as the monster RISC stuff where 64 or 128 CPU cores is not unheard of, and 100s of GB of ram.

* HP DL180G6
* dual Xeon X5660 6c 2.8Ghz
* 48GB ECC ram
* p411i 1GB flash-backed RAID card
* 25 bay 2.5" SAS2 backplane (this is an option on this server chassis, and means no DVD/CD)

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
On 03/14/2012 12:04 PM, John R Pierce wrote:
> On 03/14/12 12:24 AM, John R Pierce wrote:
>> thats my definition of a middle of the road database server. I have
>> no idea what yours is.
>
> let me add... this server was under $7000 plus the disk drives (it
> actually has 25 drives...

My car was only $5,000...plus the engine and transmission. :)

I was just looking at some modest-sized 15k SAS drives that priced out in the $400-550 range. 25 of them would add a minimum of $10,000 to the price tag. Still under 6-figures, though.

Cheers,
Steve
On 03/14/12 12:16 PM, Steve Crawford wrote:
> I was just looking at some modest-sized 15k SAS drives that priced out
> in the $400-550 range. 25 of them would add a minimum of $10,000 to
> the price tag. Still under 6-figures, though.

those disks aren't any cheaper when they are behind a EMC or NetApp SAN/NAS...

in fact, most any of the 'big name' enterprise storage vendors would charge about triple that for each disk.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
On Wed, Mar 14, 2012 at 1:41 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 03/14/12 12:16 PM, Steve Crawford wrote:
>> I was just looking at some modest-sized 15k SAS drives that priced out in
>> the $400-550 range. 25 of them would add a minimum of $10,000 to the price
>> tag. Still under 6-figures, though.
>
> those disks aren't any cheaper when they are behind a EMC or NetApp
> SAN/NAS...
>
> in fact, most any of the 'big name' enterprise storage vendors would charge
> about triple that for each disk.

Note that if you don't need a lot of storage you can often use 300G 15k SAS drives which are around $300 each. 20 of those in a RAID-10 gives you ~3TB of storage which is plenty for most transactional applications.
On 03/14/12 12:53 PM, Scott Marlowe wrote:
> Note that if you don't need a lot of storage you can often use 300G
> 15k SAS drives which are around $300 each. 20 of those in a RAID-10
> gives you ~3TB of storage which is plenty for most transactional
> applications.

I'm actually using 25 x 146gb 15k SAS2 as we didn't need space, just speed, so the 20xraid10 is 1.4TB. most of our database tests to date have been in the 50-100GB range. and they shred at IOPS. the controller and/or IO channels seem to bottleneck somewhere up around 1.2GB/sec sustained write, or at about 12000 write IOPS.

afaik, the HP P411 raid card is a LSI Logic megasas2 card with HP firmware, the hardware is equivalent to the LSI 9260-8i. the HP firmware is somewhat less annoying than the LSI megacli stuff.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
On Wed, Mar 14, 2012 at 2:34 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 03/14/12 12:53 PM, Scott Marlowe wrote:
>> Note that if you don't need a lot of storage you can often use 300G
>> 15k SAS drives which are around $300 each. 20 of those in a RAID-10
>> gives you ~3TB of storage which is plenty for most transactional
>> applications.
>
> I'm actually using 25 x 146gb 15k SAS2 as we didn't need space, just speed,
> so the 20xraid10 is 1.4TB. most of our database tests to date have been in
> the 50-100GB range. and they shred at IOPS. the controller and/or IO
> channels seem to bottleneck somewhere up around 1.2GB/sec sustained write,
> or at about 12000 write IOPS.
>
> afaik, the HP P411 raid card is a LSI Logic megasas2 card with HP firmware,
> the hardware is equivalent to the LSI 9260-8i. the HP firmware is somewhat
> less annoying than the LSI megacli stuff.

And don't forget that if you need way less than the 1.5 to 3TB mentioned earlier, you can short stroke the drives to use the fastest parts of the platters and reduce seek times even more.

And yeah, anything is less annoying than megacli. The fact that their GUI / BIOS interface is just as horrific, if not moreso, than megacli is a huge turn off for me with LSI. If you've ever used the web interface on the OOB ethernet interface on an Areca, you're hooked. The fact that it can send emails on its own etc is just frosting on the cake.
On 2012-03-12, Carson Gross <carsongross@gmail.com> wrote:
> We've got a postgres database with *a lot* of data in one table. On the
> order of 100 million rows at this point. Postgres is, of course, handling
> it with aplomb.
>
> ALTER TABLE my_table ALTER COLUMN id TYPE bigint;
>
> However, given the size of this table, I have no idea how long something
> like this might take. In general I've had a tough time getting feedback
> from postgres on the progress of a query, how long something might take,
> etc.

I would estimate minutes to hours; it also depends how many foreign keys must be re-checked.

> So my question is: is there a way to understand roughly how long something
> like this might take? Our DB is out on crappy Amazon ec2 instances, so we
> don't exactly have screamers set up. Any tools I can use?

use the cloud. set up a clone and do some testing.

--
⚂⚃ 100% natural