Thread: Insert performance
Hi all,

A couple of days ago, one of my colleagues, Rahul Iyer, posted a query regarding insert performance of 5M rows. A common suggestion was to use COPY. Unfortunately he cannot use COPY due to some constraints, so I was helping him get the maximum out of plain inserts. We were playing with a data set of 500K rows on SunOS 5.6 and PostgreSQL 7.3.3. The best we could get was 500K records in 355 sec - roughly 1400 inserts per sec. This was with default settings and around 10K inserts per transaction. PostgreSQL was hogging all the CPU during the load.

Troubled by this, I set up a similar database at home. That is an Athlon XP2000+/512MB machine with a 40GB Seagate disk, running Slackware 9.0 with kernel 2.4.20 (IIRC). I have attached the results of my experiments and the code I used to benchmark. It was a simple table with an integer and a varchar(30) field.

I was really amazed to see the numbers. First of all, it beat the SunOS machine left and right. Bruce posted some numbers of 9K inserts/sec; here we see the same. Secondly, I also played with filesystems. Ext3 does not seem to perform as well; reiserfs and ext2 did fine. Unfortunately the kernel didn't support XFS/JFS, so I could not test them. I have also attached the raw benchmark data in KSpread format, for the curious. I didn't export it to anything else because KSpread had trouble exporting formula values.

I also noticed that reiserfs takes a lot more CPU than ext2 and ext3 - it nearly peaks out all CPU capacity. Not so with ext2.

Comments? One thing I can't help noticing is that SunOS is not on the same scale. The SunOS machine is a 1GB RAM machine. It has Oracle and MySQL running on it and has 300MB of swap in use, but I am sure it has a SCSI disk, and in all respects I would rather expect a RISC machine to perform better than an Athlon XP machine, at least for IO. If you want me to post details of the sparc machine, please let me know how to find them.
I have never worked with sparcs before and have no intention of doing so again.. :-)

Bye
Shridhar

--
Fourth Law of Applied Terror: The night before the English History mid-term, your Biology instructor will assign 200 pages on planaria.

Corollary: Every instructor assumes that you have nothing else to do except study for that instructor's course.
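The batching setup described above (many inserts per COMMIT rather than one transaction per row) can be sketched as follows. This is a hypothetical reconstruction, not the attached benchmark code, and it uses Python's bundled sqlite3 as a stand-in so it runs without a server; the original test ran against PostgreSQL 7.3.3 with the same table shape (an integer and a varchar(30)).

```python
import sqlite3
import time

def load_batched(conn, n_rows, batch_size):
    """Insert n_rows rows, committing once every batch_size rows."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE bench (id INTEGER, name VARCHAR(30))")
    conn.commit()
    start = time.time()
    for i in range(n_rows):
        cur.execute("INSERT INTO bench VALUES (?, ?)", (i, "row-%d" % i))
        if (i + 1) % batch_size == 0:
            conn.commit()          # one commit per batch, not per row
    conn.commit()                  # flush the final partial batch
    return n_rows / (time.time() - start)   # inserts per second

conn = sqlite3.connect(":memory:")
rate = load_batched(conn, 50000, 10000)    # ~10K inserts/transaction, as in the test
print("%.0f inserts/sec" % rate)
```

Against a real PostgreSQL server the per-commit cost is dominated by the WAL fsync, which is why the batch size matters so much more there than it does in this in-memory stand-in.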
Shridhar,

> Unfortunately he can not use copy due to some constraints.

Why not use COPY to load the table, and then apply the constraints by query afterwards? It might not be faster, but then again it might.

> I was really amazed to see the numbers. First of all, it beat the sunOS
> machine left and right. Bruce posted some numbers of 9K inserts/sec. Here
> we see the same.

<snip>

> Comments? One thing I can't help to notice is sunOs is not on same scale.
> The sunOS machine is a 1GB RAM machine. It has oracle and mysql running on
> it and have 300MB swap in use but I am sure it has SCSI disk and in all
> respect I would rather expect a RISC machine to perform better than an
> athlon XP machine, at least for an IO.

It's been reported on this list several times that Solaris is the *worst* of the *nixes for PostgreSQL performance. No analysis has been posted as to why; my own thoughts are:

- Solaris' multi-threaded architecture imposes a hefty per-process overhead, about triple that of Linux, slowing new connections and large multi-user activity;
- Poor filesystem management; Sun simply hasn't kept up with IBM, Reiser, Red Hat and BSD in developing filesystems.

... but that's based on inadequate experimentation, just a few tests with Bonnie++ on a Netra running Solaris 8.

--
Josh Berkus
Aglio Database Solutions
San Francisco
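Transforming rows into COPY-able form, as suggested above, mostly means rendering them in the tab-delimited text format that `COPY ... FROM STDIN` expects. A minimal sketch, assuming the integer-plus-varchar(30) table from the benchmark (the table and column names here are made up):

```python
import io

def to_copy_text(rows):
    """Render (id, name) tuples in the default tab-delimited text
    format understood by COPY ... FROM STDIN."""
    buf = io.StringIO()
    for rid, name in rows:
        # COPY's default text format needs backslash, tab and newline
        # escaped; escape the backslash first so we don't double-escape.
        safe = (name.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n"))
        buf.write("%d\t%s\n" % (rid, safe))
    buf.seek(0)
    return buf

buf = to_copy_text([(1, "alpha"), (2, "beta\twith tab")])
print(buf.getvalue(), end="")
# The buffer can then be streamed to the server, e.g. with psycopg2:
#   cur.copy_from(buf, "bench", columns=("id", "name"))
```

Whether the constraints in question allow this intermediate text form at all is, of course, the question Shridhar answers below.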
Martha Stewart called it a Good Thing when josh@agliodbs.com (Josh Berkus) wrote:
> Shridhar,
>> Unfortunately he can not use copy due to some constraints.
> Why not use COPY to load the table, and then apply the constraints
> by query afterwords? It might not be faster, but then again it
> might.

If you can transform the information into COPYable form, that's certainly a good thing. Whether it will be workable or not is another question.

> ... but that's based on inadequate experimentation, just a few tests
> on Bonnie++ on a Netra running Solaris 8.

As far as the filesystem issues are concerned, you're probably using an "old, standard" version of UFS. The "high performance" option on Solaris involves using third-party Veritas software. The persistence of this is somewhat surprising; I'd somewhat have expected Sun to have bought out Veritas or some such thing, as it's a pretty vital technology that _isn't_ totally under their control. Actually, an entertaining option would be for them to buy out SGI, as that would get them control of XFS and a number of other interesting technologies.

My expectations of a Netra wouldn't be terribly high, either; they seem to exist as a product so that people who need a cheap Sun box have an option. They are mostly running IDE disks, and the latest IA-32 hardware is likely to have newer, faster interface options.

--
(format nil "~S@~S" "cbbrowne" "ntlug.org")
http://cbbrowne.com/info/advocacy.html
"Windows NT was designed to be administered by an idiot and usually is..." -- Chris Adams <adamsc@gigante.ml.org>
On 16 Aug 2003 at 11:40, Josh Berkus wrote:

> Shridhar,
>
> > Unfortunately he can not use copy due to some constraints.
>
> Why not use COPY to load the table, and then apply the constraints by query
> afterwords? It might not be faster, but then again it might.

Lol.. The constraints I mentioned weren't database constraints. It needs to interface with another database module, and the data import cannot be text - that's a project requirement. I should have been clearer in the first place..

Bye
Shridhar

--
O'Toole's commentary on Murphy's Law: Murphy was an optimist.
On 18 Aug 2003 at 18:52, Hannu Krosing wrote:

> My own experimentation also got numbers in 9k/sec range (on a quad
> 1.3GHz Xeons, 2GM mem, 50MB/sec raid) when doing 10-20 parallel runs of
> ~1000 inserts/transaction.
>
> Performance dropped to ~300/sec (at about 60M rows) when I added an
> index (primary key) - as I did random inserts, the hit rates for index
> pages were probably low.

I was loading geographic data a couple of months back. It was 3GB of data when loaded into PostgreSQL. I tried loading the data first and creating the index later, but that ran out of the available 9GB of space. So I created the index on an empty table and then started loading. It was slow, but at least it finished after 3 hours... Coincidentally, Oracle had the same problem. So creating the index beforehand remains the only option at times, it seems. Tom remarked that it shouldn't have made a difference, but apparently it does..

You mentioned parallel runs and still getting 9K/sec. Was that 9K overall or per connection? If it is the former, WAL is probably being hit too hard. You could do some additional testing by giving WAL its own disk.

I was planning to do the multiple-writers test. But since the objective was to find out why PostgreSQL was so slow, and it turned out to be slowaris in the first place, I didn't have any enthusiasm left. I recommended my colleague move to Linux. But apparently this product is part of a suite which runs on some HUGE Solaris machines. So if it has to run PostgreSQL, it has to run SunOS. I hope they are faster with SCSI..

Bye
Shridhar

--
On my planet, to rest is to rest -- to cease using energy. To me, it is quite illogical to run up and down on green grass, using energy, instead of saving it. -- Spock, "Shore Leave", stardate 3025.2
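The trade-off discussed above - build the index before loading and pay per-insert maintenance, or load into an indexless table and build the index in one pass afterwards - can be timed with a small sketch. This again uses sqlite3 as a self-contained stand-in (the thread's actual tests were on PostgreSQL and Oracle, where the space and time costs differ), with a table shaped loosely like the geographic data mentioned (an id plus three floats):

```python
import sqlite3
import time

def timed_load(index_first, n_rows=20000):
    """Load n_rows rows, building the index either before or after."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE pts (id INTEGER, x REAL, y REAL, z REAL)")
    if index_first:
        cur.execute("CREATE INDEX pts_id ON pts (id)")  # maintained per insert
    start = time.time()
    cur.executemany("INSERT INTO pts VALUES (?, ?, ?, ?)",
                    ((i, 0.0, 0.0, 0.0) for i in range(n_rows)))
    if not index_first:
        cur.execute("CREATE INDEX pts_id ON pts (id)")  # one bulk build
    conn.commit()
    return time.time() - start

print("index first: %.3fs" % timed_load(True))
print("index after: %.3fs" % timed_load(False))
```

The index-after ordering is usually faster but, as the message above notes, the bulk index build needs scratch space of its own - which is exactly what ran out here.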
Chris,

> My expectations of a Netra wouldn't be terribly high, either; they
> seem to exist as a product so that people that need a cheap Sun box
> have an option. They are mostly running IDE disk, and the latest
> IA-32 hardware is likely to have newer faster interface options.

Neither are ours ... we bought an array of 1U Netras as a dot-com closeout. At $330 per machine including OS, they were a darned good deal, and with 4 of them the redundancy makes up for the lack of individual server performance.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Shridhar Daithankar wrote on Mon, 18.08.2003 at 09:21:
> On 16 Aug 2003 at 11:40, Josh Berkus wrote:
>
> > Shridhar,
> >
> > > Unfortunately he can not use copy due to some constraints.
> >
> > Why not use COPY to load the table, and then apply the constraints by query
> > afterwords? It might not be faster, but then again it might.
>
> Lol.. The constraints I mentioned weren't database constraints..

Can't you still apply them later ;)

My own experimentation also got numbers in the 9k/sec range (on a quad of 1.3GHz Xeons, 2GB mem, 50MB/sec RAID) when doing 10-20 parallel runs of ~1000 inserts/transaction.

Performance dropped to ~300/sec (at about 60M rows) when I added an index (primary key) - as I did random inserts, the hit rates for index pages were probably low.

--------------
Hannu
Shridhar Daithankar wrote on Mon, 18.08.2003 at 19:02:
> On 18 Aug 2003 at 18:52, Hannu Krosing wrote:
> > My own experimentation also got numbers in 9k/sec range (on a quad
> > 1.3GHz Xeons, 2GM mem, 50MB/sec raid) when doing 10-20 parallel runs of
> > ~1000 inserts/transaction.
> >
> > Performance dropped to ~300/sec (at about 60M rows) when I added an
> > index (primary key) - as I did random inserts, the hit rates for index
> > pages were probably low.
>
> I was loading a geographic data couple of months back.. It was 3GB data when
> loaded in postgresql.

With or without indexes?

> I tried loading data first and creating index later. It ran out of available
> 9GB space. So I created index on an empty table and started loading it. It was
> slow but at least finished after 3 hours... Co-incidentally oracle had same
> problems as well. So creating index beforehand remains only option at times, it
> seems. Tom remarked that it shouldn't have made difference but apparently it
> does..

Tom just fixed some memory leaks in array indexing the other day. Could there be something like that in the geographic types?

> You mentioned parallel runs and still getting 9K/sec. Was that overall 9K or
> per connection?

Overall. But notice that my setup was (a little) slower per processor.

> If it is former, probably WAL is hit too hard. You could do
> some additional testing by having WALit's own disk.

I guess that today's IDE disks are about the same speed (~50MB/sec) as my test RAID was. I ran multiple parallel runs to have a chance to use all 4 processors (though IIRC it was heavily IO-bound), as well as to make better use of writing time on the WAL platters (not waiting for a full rotation on each platter).

--------------
Hannu
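For the record, in 7.3-era PostgreSQL there was no configuration parameter for the WAL location; giving WAL its own disk meant stopping the server, moving the cluster's pg_xlog directory onto the other spindle, and symlinking it back. A sketch with made-up stand-in paths so it is safe to run as-is; in practice $PGDATA would be the real data directory and $WALDISK a mount point on a dedicated disk:

```shell
# Stand-in directories for illustration; substitute the real cluster
# paths, and stop the server before touching pg_xlog.
PGDATA=$(mktemp -d)
WALDISK=$(mktemp -d)
mkdir -p "$PGDATA/pg_xlog"

# Move the WAL directory to the other disk and symlink it back.
mv "$PGDATA/pg_xlog" "$WALDISK/pg_xlog"
ln -s "$WALDISK/pg_xlog" "$PGDATA/pg_xlog"

ls -ld "$PGDATA/pg_xlog"    # now a symlink pointing at the WAL disk
```

This separates the sequential WAL writes from the random heap and index I/O, which is the point of the "WAL on its own disk" suggestion above.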
On 18 Aug 2003 at 9:57, Josh Berkus wrote:

> Chris,
>
> > My expectations of a Netra wouldn't be terribly high, either; they
> > seem to exist as a product so that people that need a cheap Sun box
> > have an option. They are mostly running IDE disk, and the latest
> > IA-32 hardware is likely to have newer faster interface options.
>
> Neither are ours ... we bought an array of 1U Netras as a dot-com closeout.
> At $330 per machine including OS, they were a darned good deal, and with 4 of
> them the redundancy makes up for the lack of individual server performance.

I am sure they would run much better with Linux on them rather than Solaris..

Bye
Shridhar

--
Shannon's Observation: Nothing is so frustrating as a bad situation that is beginning to improve.
On 19 Aug 2003 at 1:16, Hannu Krosing wrote:

> Shridhar Daithankar kirjutas E, 18.08.2003 kell 19:02:
> > I was loading a geographic data couple of months back.. It was 3GB data when
> > loaded in postgresql.
>
> With or without indexes ?

Without the index. The index was another 3GB at 50% utilisation. It was 81M rows with 3 floats each..

> > I tried loading data first and creating index later. It ran out of available
> > 9GB space. So I created index on an empty table and started loading it. It was
> > slow but at least finished after 3 hours... Co-incidentally oracle had same
> > problems as well. So creating index beforehand remains only option at times, it
> > seems. Tom remarked that it shouldn't have made difference but apparently it
> > does..
>
> Tom just fixed some memory leaks on array indexing the other day. Could
> there be something like that on geographic types ?

Dunno.. This was 7.3.2 or earlier.. Later the project abandoned databases altogether and went to in-memory structures, since the flat data was on the order of 200MB. Now they are returning to databases as the flat data approaches 3GB.. God knows what they will do next. It's an ideal example of what schedule pressure can do to the architecture of a software design..

Bye
Shridhar

--
Air Force Inertia Axiom: Consistency is always easier to defend than correctness.