Thread: What O/S or hardware feature would be useful for databases?
Hi,

I've been wondering: what O/S or hardware feature would be useful for databases?

If PostgreSQL developers could get the CPU and O/S makers to do things that would make certain operations easier/faster in the long term, what would they be?

By long term I mean it's not something that's only useful for a few years. Not something "gimmicky".

For example, something like virtual memory definitely made many things easier. Hardware support for virtualization also makes stuff like VMware easier and better.

It seems CPU makers currently have more transistors than they know what to do with, so they're adding cores and doing a lot of boring stuff like SSE2, SSE3, SSE4, etc.

So is there anything else useful that they (and the O/S people) can do that they aren't doing already?

Better support for distributed locking (across cluster nodes etc.)? OK, that's old stuff, but the last I checked HP was burying VMS and Tandem.

Hardware acceleration for quickly counting the number of set/unset/matching bits?

Regards,
Link.
> It seems CPU makers currently have more transistors than they know what
> to do with, so they're adding cores and doing a lot of boring stuff like
> SSE2, SSE3, SSE4, etc.

SSE(n) isn't useless; it speeds up things like video encoding several-fold. For databases, I'd say scatter/gather IO, especially asynchronous out-of-order reads where a list of blocks to read is passed to the OS.
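Something in that direction already exists as POSIX AIO. A minimal sketch of the kind of batched, reorderable read meant here — the file name and block numbers are made up for illustration; on Linux, compile with -lrt:

    /* Hand the OS a batch of blocks to read and let it complete them
     * in whatever order the kernel/drive finds fastest. */
    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BLKSZ   8192
    #define NBLOCKS 4

    int main(void)
    {
        /* An arbitrary, scattered set of block numbers to fetch. */
        off_t blocks[NBLOCKS] = { 17, 3, 42, 8 };
        struct aiocb cbs[NBLOCKS];
        struct aiocb *list[NBLOCKS];
        char *bufs[NBLOCKS];

        int fd = open("datafile", O_RDONLY);   /* made-up file name */
        if (fd < 0) { perror("open"); return 1; }

        for (int i = 0; i < NBLOCKS; i++) {
            bufs[i] = malloc(BLKSZ);
            memset(&cbs[i], 0, sizeof(cbs[i]));
            cbs[i].aio_fildes = fd;
            cbs[i].aio_offset = blocks[i] * BLKSZ;
            cbs[i].aio_buf = bufs[i];
            cbs[i].aio_nbytes = BLKSZ;
            cbs[i].aio_lio_opcode = LIO_READ;
            list[i] = &cbs[i];
        }

        /* Submit the whole batch; LIO_WAIT blocks until all complete.
         * The individual reads may be serviced out of order. */
        if (lio_listio(LIO_WAIT, list, NBLOCKS, NULL) < 0) {
            perror("lio_listio");
            return 1;
        }

        for (int i = 0; i < NBLOCKS; i++)
            printf("block %ld: read %zd bytes\n",
                   (long) blocks[i], aio_return(&cbs[i]));

        close(fd);
        return 0;
    }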
On 06/16/07 10:47, Lincoln Yeoh wrote:
> Hi,
>
> I've been wondering: what O/S or hardware feature would be useful for
> databases?
>
> If PostgreSQL developers could get the CPU and O/S makers to do things
> that would make certain operations easier/faster in the long term, what
> would they be?
>
> By long term I mean it's not something that's only useful for a few
> years. Not something "gimmicky".
>
> For example, something like virtual memory definitely made many things
> easier. Hardware support for virtualization also makes stuff like VMware
> easier and better.

What's the purpose of a multi-processing OS if you're just going to run a bunch of single-task VMs?

> It seems CPU makers currently have more transistors than they know what
> to do with, so they're adding cores and doing a lot of boring stuff like
> SSE2, SSE3, SSE4, etc.
>
> So is there anything else useful that they (and the O/S people) can do
> that they aren't doing already?

Reducing memory latency always helps. That's AMD's strong point, and now Intel is doing it too. They've both got more cache. While I can't see much use for quad-cores in PCs, multi-core can't help but benefit database servers.

AMD, Intel & IBM are always profiling code to find bottlenecks in their microarchitectures. POWER6 can run at 4GHz and is multi-core.

Anyway... databases are always(?) IO bound. I'd try to figure out how to make a bigger hose (or more hoses) between the spindles and the mobo. The Alpha 8400 had multiple PCI *buses*, so as not to have a 133MBps chokepoint. A server with multiple PCI-e buses, 10Gb Ethernet, and lots of 4Gb HBAs attached to a big, fat SAN chock full of 15K SCSI disks could suck up a heck of a lot of data.

> Better support for distributed locking (across cluster nodes etc.)? OK,
> that's old stuff, but the last I checked HP was burying VMS and Tandem.

AMD's HyperTransport could probably be used similarly to Memory Channel. However, nowadays gigabit Ethernet is the CI of choice, meaning that it's all done in software.

> Hardware acceleration for quickly counting the number of
> set/unset/matching bits?

x86 doesn't already do that?

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
On 6/16/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
> > Hardware acceleration for quickly counting the number of
> > set/unset/matching bits?
>
> x86 doesn't already do that?

I don't think so. The fastest way, I believe, is to use precomputed lookup tables. Same for finding the least/most significant set/unset bit, and other operations useful for dealing with bit vectors.

Alexander.
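To make the table idea concrete, here's a minimal sketch of the classic byte-wide lookup approach for counting set bits (nothing PostgreSQL-specific; a 32-bit word is assumed):

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t popcount_table[256];

    /* Build the table once: the count for i is the count for i>>1
     * plus i's lowest bit. popcount_table[0] is 0 by static init. */
    static void init_table(void)
    {
        for (int i = 1; i < 256; i++)
            popcount_table[i] = popcount_table[i >> 1] + (i & 1);
    }

    /* Count set bits one byte at a time via the table. */
    static int popcount32(uint32_t x)
    {
        return popcount_table[x & 0xff]
             + popcount_table[(x >> 8) & 0xff]
             + popcount_table[(x >> 16) & 0xff]
             + popcount_table[x >> 24];
    }

    int main(void)
    {
        init_table();
        printf("%d\n", popcount32(0xF0F0F0F0));  /* prints 16 */
        return 0;
    }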
On 06/16/07 17:05, Alexander Staubo wrote:
> On 6/16/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
>> > Hardware acceleration for quickly counting the number of
>> > set/unset/matching bits?
>>
>> x86 doesn't already do that?
>
> I don't think so. The fastest way, I believe, is to use precomputed
> lookup tables. Same for finding the least/most significant set/unset
> bit, and other operations useful for dealing with bit vectors.

A couple of new AMD Barcelona opcodes might help do that:

http://www.anandtech.com/showdoc.aspx?i=2939&p=6

  While on the topic of instructions, AMD also introduced a few new
  extensions to its ISA with Barcelona. There are two new bit
  manipulation instructions: LZCNT and POPCNT. Leading Zero Count
  (LZCNT) counts the number of leading zeros in an op, while Pop Count
  counts the leading 1s in an op. Both of these instructions are
  targeted at cryptography applications.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
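(For what it's worth, POPCNT counts all set bits in the operand, not just leading ones; the article misstates it.) Compilers already expose these operations as builtins; a minimal sketch using GCC/Clang's __builtin_popcountll and __builtin_clzll, which can compile to the hardware instructions where available and to a software fallback otherwise:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t v = 0x00F0000000000001ULL;

        printf("set bits:      %d\n", __builtin_popcountll(v));  /* 5 */
        /* __builtin_clzll is undefined for v == 0, so guard it. */
        printf("leading zeros: %d\n", v ? __builtin_clzll(v) : 64);  /* 8 */
        return 0;
    }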
On 6/17/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
> On 06/16/07 17:05, Alexander Staubo wrote:
> > On 6/16/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
> >> > Hardware acceleration for quickly counting the number of
> >> > set/unset/matching bits?
> >>
> >> x86 doesn't already do that?
> >
> > I don't think so. The fastest way, I believe, is to use precomputed
> > lookup tables. Same for finding the least/most significant set/unset
> > bit, and other operations useful for dealing with bit vectors.
>
> A couple of new AMD Barcelona opcodes might help do that:
>
> http://www.anandtech.com/showdoc.aspx?i=2939&p=6

Very cool. Thanks for the pointer.

Alexander.
On Sat, 16 Jun 2007, Ron Johnson wrote:

> Anyway... databases are always(?) IO bound. I'd try to figure out how to
> make a bigger hose (or more hoses) between the spindles and the mobo.

What I keep waiting for is for drives with flash memory built in to mature. I would love to get reliable writes that use the drive's cache for instant fsyncs, instead of right now, where you have to push all that to the controller level.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
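To make the pattern Greg describes concrete, a minimal sketch of the write-then-fsync sequence a commit performs — the file name and record contents are made up:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* "wal_segment" is a made-up stand-in for a WAL file. */
        int fd = open("wal_segment", O_WRONLY | O_CREAT | O_APPEND, 0600);
        if (fd < 0) { perror("open"); return 1; }

        const char record[] = "COMMIT 12345\n";
        if (write(fd, record, sizeof(record) - 1) < 0) {
            perror("write");
            return 1;
        }

        /* The commit is not durable until this returns. Without a
         * trustworthy persistent write cache, fsync means waiting on
         * the platters, which is why drive-level flash would pay off. */
        if (fsync(fd) < 0) {
            perror("fsync");
            return 1;
        }

        close(fd);
        return 0;
    }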
On 06/17/07 00:19, Greg Smith wrote:
> On Sat, 16 Jun 2007, Ron Johnson wrote:
>
>> Anyway... databases are always(?) IO bound. I'd try to figure out how
>> to make a bigger hose (or more hoses) between the spindles and the mobo.
>
> What I keep waiting for is for drives with flash memory built in to
> mature. I would love to get reliable writes that use the drive's cache
> for instant fsyncs, instead of right now, where you have to push all
> that to the controller level.

But drive-based flash memory will always be a fixed size, and only for that drive. Controller-based cache is expandable and caches the whole RAID set (besides being battery-backed). And if you *still* need more cache, rip out that controller and put in a more expensive one, or transition to "plain" SCSI cards and a storage controller.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
On 6/17/07, Greg Smith <gsmith@gregsmith.com> wrote:
> On Sat, 16 Jun 2007, Ron Johnson wrote:
>
> > Anyway... databases are always(?) IO bound. I'd try to figure out how to
> > make a bigger hose (or more hoses) between the spindles and the mobo.
>
> What I keep waiting for is for drives with flash memory built in to
> mature. I would love to get reliable writes that use the drive's cache
> for instant fsyncs, instead of right now, where you have to push all
> that to the controller level.

I don't think flash is the answer here... you should be looking at 'PRAM', I think. Solid state disks are coming very soon, but flash is barely faster than traditional disks for random writes (much faster for random reads, however). Maybe this will change... flash is improving all the time. Already, the write-cycle problem has been all but eliminated for the higher-grade flash devices.

That being said, it's pretty clear to me we are in the last days of the disk drive. When solid state drives become prevalent in server environments, database development will enter a new era... physical considerations will play less and less of a role in how systems are engineered. So, to answer the OP, my answer would be 'get rid of the spinning disk!' :-)

merlin
On 06/18/07 08:05, Merlin Moncure wrote:
[snip]
>
> That being said, it's pretty clear to me we are in the last days of
> the disk drive.

Oh, puhleeze. Seagate, Hitachi, Fuji and WD aren't sitting around with their thumbs up their arses. In 3-4 years, large companies and spooky TLAs will be stuffing SANs with hundreds of 2TB drives. My (young) kids will be out of college before the density/dollar of RAM gets anywhere near that of disks. If it ever does.

What we are in, though, is the last decade of tape.

> When solid state drives become prevalent in server
> environments, database development will enter a new era... physical
> considerations will play less and less of a role in how systems are
> engineered.

"Oh, puhleeze" redux. Even if static RAM drives *do* overtake spindles, there will always be physical considerations, and you'll still need to engineer for them. Why? 1) There's always a bottleneck. 2) There's always more data to "find" the bottleneck.

> So, to answer the OP, my answer would be 'get rid of
> the spinning disk!' :-)

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
On 7/2/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
> On 06/18/07 08:05, Merlin Moncure wrote:
> [snip]
> >
> > That being said, it's pretty clear to me we are in the last days of
> > the disk drive.
>
> Oh, puhleeze. Seagate, Hitachi, Fuji and WD aren't sitting around
> with their thumbs up their arses. In 3-4 years, large companies
> and spooky TLAs will be stuffing SANs with hundreds of 2TB drives.

Haven't we had this debate before?

I don't know if you've been paying attention to what's going on in the storage industry... Apple, Dell, Fuji, Sandisk, Intel, and others are all making strategic plays in the flash market. At the outset of 2007, flash prices were predicted to decline 50% for the year... so far, they have dropped 65% in the first two quarters. Right now it's all about high-end notebooks and media players, but the high-margin, high-rotation-speed drives are next. I admit the high-density, low-speed cold-storage d2d backup systems will be the last to fall, and that's quite some ways off.

Note: by 'next' and 'last days', I mean that pretty loosely... within the next 5 years or so. 'Dead' as well... there are many stages of death for an enterprise legacy product. I consider tape backups to be nearly dead already, although there are many still in use. d2d is where it's at, though.

merlin
On 07/03/07 13:03, Merlin Moncure wrote:
> Haven't we had this debate before?
>
> I don't know if you've been paying attention to what's going on in the
> storage industry... Apple, Dell, Fuji, Sandisk, Intel, and others are
> all making strategic plays in the flash market. At the outset of 2007,
> flash prices were predicted to decline 50% for the year... so far, they
> have dropped 65% in the first two quarters. Right now it's all about
> high-end notebooks and media players, but the high-margin,
> high-rotation-speed drives are next.

Technological nay-sayers have been wrong before, but I just can't see a *database* server full of static RAM in the next 10 years.

> I admit the high-density, low-speed cold-storage d2d backup systems
> will be the last to fall, and that's quite some ways off.
>
> Note: by 'next' and 'last days', I mean that pretty loosely... within
> the next 5 years or so. 'Dead' as well... there are many stages of
> death for an enterprise legacy product. I consider tape backups to be
> nearly dead already, although there are many still in use. d2d is
> where it's at, though.

Mainframers (and various other oldsters like me) think about 1) shock resistance, 2) media costs, 3) Iron Mountain, 4) media longevity.

You can drop a SuperDLT tape from "man height" and recover the data (even if it has to be restrung into a new housing). I wouldn't drop a disk full of data and have any expectation of survival.

A 160GB ("320"GB compressed) SATA drive is about $60 plus a $10 carrier. That compares very well to tapes, I think.

An Iron Mountain delivery truck will drive over some nasty bumps. How shock resistant is a disk drive in an external carrier? Not as resistant as a drive in a padded shipping box. But is it resistant "enough"?

"Enterprise-level" tapes can sit in storage for 7-15 years and then still be readable. Can a disk drive sit unused for 7 years? Would the motor freeze up? Will we still be able to connect SATA drives in 7 years?

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
On 7/4/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
> "Enterprise-level" tapes can sit in storage for 7-15 years and then
> still be readable. Can a disk drive sit unused for 7 years? Would
> the motor freeze up? Will we still be able to connect SATA drives
> in 7 years?

The same applies to a tape drive, no? I've seen so many standard changes in drives and SCSI connectors... if you don't keep spares of all the equipment involved, you'll face the same issue with tapes that you'd face with SATA disks.

--
Cheers,
Andrej

Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.
http://www.american.edu/econ/notes/htmlmail.htm
On 07/04/07 16:00, Andrej Ricnik-Bay wrote:
> On 7/4/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
>
>> "Enterprise-level" tapes can sit in storage for 7-15 years and then
>> still be readable. Can a disk drive sit unused for 7 years? Would
>> the motor freeze up? Will we still be able to connect SATA drives
>> in 7 years?

I was a bit harsh about connecting to SATA drives. IDE has been around for 21 years, and ATA-133 is backwards compatible with 20MB drives of that era, so I predict that you'll be able to plug SATA-1 drives into machines with SATA-9 interfaces. But then, the motor might still not spin up... :(

> The same applies to a tape drive, no? I've seen so many standard changes
> in drives and SCSI connectors... if you don't keep spares of all the
> equipment involved, you'll face the same issue with tapes that you'd
> face with SATA disks.

No. Enterprise tape drives are not "flavor of the month", and can always read the previous one or two generations of tape. And if you've switched from, for example, SuperDLT to LTO, you'll still be able to buy some drives on the used market (either eBay or from a dealer).

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
"Andrej Ricnik-Bay" <andrej.groups@gmail.com> writes: > On 7/4/07, Ron Johnson <ron.l.johnson@cox.net> wrote: >> "Enterprise-level" tapes can sit in storage for 7-15 years and then >> still be readable. Can a disk drive sit un-used for 7 years? Would >> the motor freeze up? Will we still be able to connect SATA drives >> in 7 years? > Same with a tape-drive, no? Uh, no, because the tape is removable. regards, tom lane