Thread: Large databases, performance
Hi,

Today we concluded the test for database performance. Attached are the results
and the schema, for those who missed the earlier discussion on this.

We have (almost) decided that we will partition the data across machines. The
scheme is: after every short interval a burst of data will be inserted into a
new table in the database, indexed and vacuumed. The table(s) will be
inherited, so that a query on the base table will fetch results from all the
children. The application has to consolidate all the data on a per-node basis.
If the database is not PostgreSQL, the application has to consolidate data
across partitions as well.

Now we need to investigate whether selecting on the base table, including
children, would use the indexes created on the children tables.

It's estimated that when the entire data is gathered, the total number of
children tables would be around 1K-1.1K across all machines.

This is based on the average rate of data insertion, i.e. 5K records/sec, and
the total data size, estimated at 9 billion rows max, i.e. an estimated
database size of 900 GB. Obviously it's impossible to keep the insertion rate
high on an indexed table as the data grows, so partitioning/inheritance looks
like the better approach.

PostgreSQL is not the final winner as yet; MySQL is in close range. I will
keep you guys posted about the results.

Let me know about any comments..

Bye
 Shridhar

--
Price's Advice: It's all a game -- play it to have fun.


Machine
  Compaq Proliant Server ML 530
  Intel Xeon 2.4 GHz Processor x 4
  4 GB RAM, 5 x 72.8 GB SCSI HDD
  RAID 0 (Striping) Hardware Setup, Mandrake Linux 9.0
  Cost - $13,500 ($1,350 for each additional 72 GB HDD)

Performance Parameter                    MySQL 3.23.52       MySQL 3.23.52      PostgreSQL 7.2.2
                                         WITHOUT InnoDB for  WITH InnoDB for    with built-in support
                                         transactional       transactional      for transactions
                                         support             support

Complete Data

  Inserts + building a composite index   3738 secs           18720 secs         20628 secs
  (40 GB data, 432,000,000 tuples,
  about 100 bytes each, schema on
  'schema' sheet; composite index on
  3 fields: esn, min, datetime)

  Load Speed                             115570 tuples/sec   23076 tuples/sec   20942 tuples/sec

  Database Size on Disk                  48 GB               87 GB              111 GB

Average per partition

  Inserts + building a composite index   28 secs             130 secs           150 secs
  (300 MB data, 3,000,000 tuples,
  about 100 bytes each, schema on
  'schema' sheet; composite index on
  3 fields: esn, min, datetime)

  Select Query                           7 secs              7 secs             6 secs
  (based on equality match of 2 fields,
  esn and min; 4 concurrent queries
  running)

  Database Size on Disk                  341 MB              619 MB             788 MB

Field Name   Field Type   Nullable   Indexed
type         int          no         no
esn          char (10)    no         yes
min          char (10)    no         yes
datetime     timestamp    no         yes
opc0         char (3)     no         no
opc1         char (3)     no         no
opc2         char (3)     no         no
dpc0         char (3)     no         no
dpc1         char (3)     no         no
dpc2         char (3)     no         no
npa          char (3)     no         no
nxx          char (3)     no         no
rest         char (4)     no         no
field0       int          yes        no
field1       char (4)     yes        no
field2       int          yes        no
field3       char (4)     yes        no
field4       int          yes        no
field5       char (4)     yes        no
field6       int          yes        no
field7       char (4)     yes        no
field8       int          yes        no
field9       char (4)     yes        no
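A minimal sketch of the inheritance-based partitioning described above. The
table and index names ("calldata", the per-burst child names) are illustrative
only, not taken from the actual test scripts; whether the planner uses the
children's indexes for the final query is exactly the open question raised in
the post.

-- base table: queries go here and transparently include all children
CREATE TABLE calldata (
    type     int        NOT NULL,
    esn      char(10)   NOT NULL,
    min      char(10)   NOT NULL,
    datetime timestamp  NOT NULL
    -- ... remaining columns as on the schema sheet ...
);

-- one child table per burst interval; each gets its own composite index
CREATE TABLE calldata_20021003_1200 () INHERITS (calldata);
CREATE INDEX calldata_20021003_1200_idx
    ON calldata_20021003_1200 (esn, min, datetime);

-- selecting on the base table fetches rows from every child as well
SELECT * FROM calldata WHERE esn = '1234567890' AND min = '0987654321';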
NOTE: Setting follow-up to the performance list

Funny that the conventional wisdom seems to be: if you need fast selects on
data that has few inserts, pick MySQL; otherwise, if you have a lot of inserts
and don't need super fast selects, go with PostgreSQL. Yet your data seems to
cut directly against this.

I'm curious, did you happen to run the select tests while also running the
insert tests? IIRC the older MySQL versions have to lock the table when doing
an insert, so select performance goes in the dumper in that scenario; perhaps
that's not an issue with 3.23.52?

It also seems like the vacuum after each insert is unnecessary, unless you're
also deleting/updating data behind it. Perhaps just running an ANALYZE on the
table would suffice while reducing overhead.

Robert Treat

On Thu, 2002-10-03 at 08:36, Shridhar Daithankar wrote:
> Machine
> Compaq Proliant Server ML 530
> [hardware details and benchmark tables quoted above snipped]
On 3 Oct 2002 at 11:57, Robert Treat wrote:

> Funny that the status quo seems to be if you need fast selects on data
> that has few inserts to pick mysql, otherwise if you have a lot of
> inserts and don't need super fast selects go with PostgreSQL; yet your
> data seems to cut directly against this.

Well, a couple of things.. The number of inserts isn't few; it's 5000/sec
required in the field. Secondly, I don't know exactly why, but PostgreSQL
seems to be doing pretty well on parallel selects. If we use MySQL with
transaction support, the numbers are really close..

Maybe it's time to rewrite the famous myth that PostgreSQL is slow. When
properly tuned, or given enough headroom, it's almost as fast as MySQL..

> I'm curious, did you happen to run the select tests while also running
> the insert tests? IIRC the older mysql versions have to lock the table
> when doing the insert, so select performance goes in the dumper in that
> scenario, perhaps that's not an issue with 3.23.52?

IMO even if it locks tables, that shouldn't affect select performance. It
would be fun to watch when we insert multiple chunks of data and fire queries
concurrently. I would be surprised if MySQL starts slowing down..

> It also seems like the vacuum after each insert is unnecessary, unless
> your also deleting/updating data behind it. Perhaps just running an
> ANALYZE on the table would suffice while reducing overhead.

I believe that was VACUUM ANALYZE only. But it still takes a lot of time. The
good thing is it's not blocking.. Anyway, I don't think such frequent vacuums
are needed to convince the planner to choose an index scan over a sequential
scan. I am sure it's already convinced..

Regards,
 Shridhar

-----------------------------------------------------------
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:shridhar_daithankar@persistent.co.in
Phone:- +91-20-5678900 Extn.270
Fax  :- +91-20-5678901
-----------------------------------------------------------
On Thu, 03 Oct 2002 18:06:10 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:
>Machine
>Compaq Proliant Server ML 530
>Intel Xeon 2.4 GHz Processor x 4
>4 GB RAM, 5 x 72.8 GB SCSI HDD
>RAID 0 (Striping) Hardware Setup, Mandrake Linux 9.0

Shridhar,

forgive me if I ask what has been said before: did you run at 100% CPU, or was
IO bandwidth your limit? And is the answer the same for all three
configurations?

Servus
 Manfred
On Thu, 03 Oct 2002 21:47:03 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:
>I believe that was vacuum analyze only.

Well, there is

	VACUUM [tablename];

and there is

	ANALYZE [tablename];

And

	VACUUM ANALYZE [tablename];

is VACUUM followed by ANALYZE.

Servus
 Manfred
On 3 Oct 2002 at 18:53, Manfred Koizar wrote:
> Well there is
>
> VACUUM [tablename];
>
> and there is
>
> ANALYZE [tablename];
>
> And
>
> VACUUM ANALYZE [tablename];
>
> is VACUUM followed by ANALYZE.

I was using VACUUM ANALYZE. Good that you pointed it out. Now I will modify
the PostgreSQL auto-vacuum daemon that I wrote to run ANALYZE only in the case
of excessive inserts. I hope that's lighter on performance compared to VACUUM
ANALYZE..

Bye
 Shridhar

--
Mix's Law: There is nothing more permanent than a temporary building.
           There is nothing more permanent than a temporary tax.
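A sketch of the per-burst maintenance being discussed, assuming a child table
like the one in the earlier sketch has just been created; the file path and
table name are made up for illustration. On an insert-only table, ANALYZE
alone refreshes the planner statistics without the extra pass VACUUM makes to
find dead tuples.

COPY calldata_20021003_1200 FROM '/tmp/burst_20021003_1200.dat';

-- statistics refresh only; nothing to reclaim on an insert-only table
ANALYZE calldata_20021003_1200;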
Can you comment on the tools you are using to do the insertions (Perl, Java?),
the distribution of the data (all random, all static), and the transaction
scope (all inserts in one transaction, each insert as a single transaction,
some group of inserts as a transaction)?

I'd be curious what happens when you submit more queries than you have
processors (you had four concurrent queries and four CPUs), if you care to run
any additional tests. Also, I'd report the query time in absolute terms (like
you did) and also as "time / number of concurrent queries". This will give you
a sense of how the system is scaling as the workload increases. Personally I
am more concerned about this aspect than the load time, since I am going to
guess that this is where all the time is spent.

Was the original posting on GENERAL or HACKERS? Is this moving to PERFORMANCE
for follow-up? I'd like to follow this discussion and want to know if I should
join another group.

Thanks,

Charlie

P.S. Anyone want to comment on their expectations for 'commercial' databases
handling this load? I know that we cannot speak about specific performance
metrics on some products (licensing restrictions), but I'd be curious if folks
have seen some of the databases out there handle these dataset sizes and
respond reasonably.

Shridhar Daithankar wrote:
>Hi,
>
>Today we concluded test for database performance. Attached are results and the
>schema, for those who have missed earlier discussion on this.
>
>[rest of the original post and attachments snipped]
--
Charles H. Woloszynski
ClearMetrix, Inc.
115 Research Drive
Bethlehem, PA 18015

tel: 610-419-2210 x400
fax: 240-371-3256
web: www.clearmetrix.com
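Charlie's question about transaction scope matters a lot for load speed. As a
rough sketch of the alternatives he lists (using a hypothetical two-column
demo table and file path, since the real table has two dozen NOT NULL
columns): one transaction per row pays a commit for every insert, batching
amortises it, and COPY moves a whole burst in a single statement.

CREATE TABLE load_demo (esn char(10), min char(10));

-- each INSERT as its own transaction: one commit (and WAL flush) per row
INSERT INTO load_demo VALUES ('1234567890', '0987654321');

-- a group of inserts per transaction: the commit cost is amortised
BEGIN;
INSERT INTO load_demo VALUES ('1111111111', '2222222222');
INSERT INTO load_demo VALUES ('3333333333', '4444444444');
COMMIT;

-- COPY loads an entire batch in one statement and is usually fastest
COPY load_demo FROM '/tmp/load_demo.dat';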
On 3 Oct 2002 at 8:54, Charles H. Woloszynski wrote:
> I'd be curious what happens when you submit more queries than you have
> processors (you had four concurrent queries and four CPUs), if you care
> to run any additional tests. Also, I'd report the query time in
> absolute (like you did) and also in 'Time/number of concurrent queries".
> This will give you a sense of how the system is scaling as the workload
> increases. Personally I am more concerned about this aspect than the
> load time, since I am going to guess that this is where all the time is
> spent.

OK. I am back from my cave after some more tests were done. Here are the
results. I am not repeating large parts of it, but answering your questions..
Don't ask me how these numbers changed; I am not the person who conducts the
tests, nor do I have access to the system. The rest (or most) of the setup
remains the same..

MySQL 3.23.52 with InnoDB transaction support:

 4 concurrent queries :- 257.36 ms
40 concurrent queries :-  35.12 ms

PostgreSQL 7.2.2:

 4 concurrent queries :- 257.43 ms
40 concurrent queries :-  41.16 ms

Though I cannot report the Oracle numbers, suffice to say that they fall in
between these two. Oracle seems to be a hell of a lot faster than
MySQL/PostgreSQL at loading raw data, even when it's installed on reiserfs. We
plan to run XFS tests later, in the hope that that would improve the
MySQL/PostgreSQL load times.

In this run PostgreSQL has a better load time than MySQL/InnoDB (18270 secs
v/s 17031 secs), and index creation times are faster as well (100 secs v/s 130
secs). I don't know what parameters were changed.

The only worry is database size: PostgreSQL is 111 GB v/s 87 GB for MySQL. All
numbers include indexes. This is really going to be a problem when things are
deployed. Any idea how it can be brought down? WAL is out; it's not counted.
Schema optimisation is a later issue; right now all three databases are using
the same schema..

Will it help in this situation if I recompile PostgreSQL with a block size of,
say, 32K rather than the 8K default? Will it save some overhead and offer
better performance in data load etc.?

Will keep you guys updated..

Regards,
 Shridhar

-----------------------------------------------------------
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:shridhar_daithankar@persistent.co.in
Phone:- +91-20-5678900 Extn.270
Fax  :- +91-20-5678901
-----------------------------------------------------------
On Mon, 07 Oct 2002 15:07:29 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:
>Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All
>numbers include indexes. This is really going to be a problem when things are
>deployed. Any idea how can it be taken down?

Shridhar,

if I'm not mistaken, a char(n)/varchar(n) column is stored as a 32-bit integer
specifying the length, followed by as many characters as the length tells. On
32-bit Intel hardware this structure is aligned on a 4-byte boundary. For your
row layout this gives the following sizes (look at the "phys size" column):

| Field     Field       Null  Indexed   phys   mini
| Name      Type                        size
|---------------------------------------------------
| type      int         no    no           4      4
| esn       char (10)   no    yes         16     11
| min       char (10)   no    yes         16     11
| datetime  timestamp   no    yes          8      8
| opc0      char (3)    no    no           8      4
| opc1      char (3)    no    no           8      4
| opc2      char (3)    no    no           8      4
| dpc0      char (3)    no    no           8      4
| dpc1      char (3)    no    no           8      4
| dpc2      char (3)    no    no           8      4
| npa       char (3)    no    no           8      4
| nxx       char (3)    no    no           8      4
| rest      char (4)    no    no           8      5
| field0    int         yes   no           4      4
| field1    char (4)    yes   no           8      5
| field2    int         yes   no           4      4
| field3    char (4)    yes   no           8      5
| field4    int         yes   no           4      4
| field5    char (4)    yes   no           8      5
| field6    int         yes   no           4      4
| field7    char (4)    yes   no           8      5
| field8    int         yes   no           4      4
| field9    char (4)    yes   no           8      5
|                                       -----  -----
|                                         176    116

Ignoring nulls for now, you have to add 32 bytes for a v7.2 heap tuple header
and 4 bytes of ItemIdData per tuple, ending up with 212 bytes per tuple, or
ca. 85 GB of heap space for 432,000,000 tuples. Depending on fill factor,
similar calculations give some 30 GB for your index.

Now if we had a datatype with only one byte for the string length, char
columns could be byte-aligned and we'd have the column sizes given under
"mini" in the table above. The columns would have to be rearranged according
to alignment requirements. Thus 60 bytes per heap tuple and 8 bytes per index
tuple could be saved, resulting in a database size of ~ 85 GB (index
included). And I bet this would be significantly faster, too.

Hackers, do you think it's possible to hack together a quick and dirty patch,
so that string length is represented by one byte? IOW, can a database be built
that doesn't contain any char/varchar/text value longer than 255 characters in
the catalog?

If I'm not told that this is impossible, I'd give it a try. Shridhar, if such
a patch can be made available, would you be willing to test it?

What can you do right now? Try using v7.3 beta and creating your table WITHOUT
OIDS. This saves 8 bytes per tuple; not much, but better to save 4% than
nothing.

Servus
 Manfred
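To make the arithmetic above explicit, here is the same estimate as a
throwaway SQL calculation (the byte counts are taken straight from Manfred's
figures; any SQL prompt will do the arithmetic):

-- heap estimate: (tuple data + v7.2 header + line pointer) * tuple count
SELECT (176 + 32 + 4) * 432000000::numeric / (1024*1024*1024) AS heap_gb;
-- => about 85 GB of heap, matching the figure above

-- with the hypothetical 1-byte-length char type: 116-byte tuple data
SELECT (116 + 32 + 4) * 432000000::numeric / (1024*1024*1024) AS heap_gb_mini;
-- => about 61 GB of heap before indexes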
On 7 Oct 2002 at 16:10, Manfred Koizar wrote:
> if i'm not mistaken, a char(n)/varchar(n) column is stored as a 32-bit
> integer specifying the length followed by as many characters as the
> length tells. On 32-bit Intel hardware this structure is aligned on a
> 4-byte boundary.

That shouldn't be necessary for a char field, as the space is always
pre-allocated. Sounds like a possible area of improvement to me, if that's the
case..

> Hackers, do you think it's possible to hack together a quick and dirty
> patch, so that string length is represented by one byte? IOW can a
> database be built that doesn't contain any char/varchar/text value
> longer than 255 characters in the catalog?

I say if it's a char field, there should be no indicator of length, as it's
not required. Just store that many characters straight ahead..

> If I'm not told that this is impossibly, I'd give it a try. Shridhar,
> if such a patch can be made available, would you be willing to test
> it?

Sure. But the server machine is not available this week; some other project is
using it. So the results won't be out until at least a week from now.

> What can you do right now? Try using v7.3 beta and creating your
> table WITHOUT OIDS. This saves 8 bytes per tuple; not much, but
> better save 4% than nothing.

IIRC there was some header optimisation which saved 4 bytes, so without OIDs
that should save 8. Will do that as the very next thing.

I talked to my friend regarding PostgreSQL surpassing MySQL substantially in
this test. He told me that in the last test, where PostgreSQL took 23,000+/150
secs for load/index and MySQL took 18,000+/130, PostgreSQL was running in the
default configuration: he forgot to copy postgresql.conf to the data directory
after he modified it.

This time the results are correct. PostgreSQL loads data faster, indexes it
faster, and queries in almost the same time.. Way to go..

Regards,
 Shridhar

-----------------------------------------------------------
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:shridhar_daithankar@persistent.co.in
Phone:- +91-20-5678900 Extn.270
Fax  :- +91-20-5678901
-----------------------------------------------------------
"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes: > MySQL 3.23.52 with innodb transaction support: > 4 concurrent queries :- 257.36 ms > 40 concurrent queries :- 35.12 ms > Postgresql 7.2.2 > 4 concurrent queries :- 257.43 ms > 40 concurrent queries :- 41.16 ms I find this pretty fishy. The extreme similarity of the 4-client numbers seems improbable, from what I know of the two databases. I suspect your numbers are mostly measuring some non-database-related overhead --- communications overhead, maybe? > Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All > numbers include indexes. This is really going to be a problem when things are > deployed. Any idea how can it be taken down? 7.3 should be a little bit better because of Manfred's work on reducing tuple header size --- if you create your tables WITHOUT OIDS, you should save 8 bytes per row compared to earlier releases. regards, tom lane
On 7 Oct 2002 at 10:30, Tom Lane wrote:
> I find this pretty fishy. The extreme similarity of the 4-client
> numbers seems improbable, from what I know of the two databases.
> I suspect your numbers are mostly measuring some non-database-related
> overhead --- communications overhead, maybe?

I don't know, but all three numbers --- postgresql/mysql/oracle --- are
25x.xx ms. The clients were on the same machine as the server, so there's no
real area to point at..

> 7.3 should be a little bit better because of Manfred's work on reducing
> tuple header size --- if you create your tables WITHOUT OIDS, you should
> save 8 bytes per row compared to earlier releases.

Got it..

Bye
 Shridhar

--
Sweater, n.: A garment worn by a child when its mother feels chilly.
On Mon, 07 Oct 2002 19:48:31 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:
>I say if it's a char field, there should be no indicator of length as it's not
>required. Just store those many characters straight ahead..

This is out of reach for a quick hack ...

>Sure. But the server machine is not available this week. Some other project is
>using it. So the results won't be out unless at least a week from now.

:-)

>This time results are correct. Postgresql loads data faster, indexes it faster
>and queries in almost same time.. Way to go..

Great! And now let's work on making selects faster, too.

Servus
 Manfred
"Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes: > I say if it's a char field, there should be no indicator of length as > it's not required. Just store those many characters straight ahead.. Your assumption fails when considering UNICODE or other multibyte character encodings. regards, tom lane
On 7 Oct 2002 at 11:21, Tom Lane wrote:
> "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in> writes:
> > I say if it's a char field, there should be no indicator of length as
> > it's not required. Just store those many characters straight ahead..
>
> Your assumption fails when considering UNICODE or other multibyte
> character encodings.

Correct, but is it possible to have a real char string when the database is
not Unicode, or when the locale defines the size of a char, to be exact?

In my case varchar does not make sense, as all strings are guaranteed to be of
the defined length. While the argument you have put forward is correct, it's
causing a disk space leak, so to speak.

Bye
 Shridhar

--
Boucher's Observation: He who blows his own horn always plays the music
                       several octaves higher than originally written.
On Tue, Oct 08, 2002 at 11:14:11AM +0530, Shridhar Daithankar wrote:
> On 7 Oct 2002 at 11:21, Tom Lane wrote:
> > Your assumption fails when considering UNICODE or other multibyte
> > character encodings.
>
> Correct but is it possible to have real char string when database is not
> unicode or when locale defines size of char, to be exact?
>
> In my case varchar does not make sense as all strings are guaranteed to be of
> defined length. While the argument you have put is correct, it's causing a
> disk space leak, to say so.

Well, maybe. But since 7.1 or so, char() and varchar() simply became text with
some length restrictions. This was one of the reasons. It also simplified a
lot of code.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.
On Mon, 07 Oct 2002 15:07:29 +0530, "Shridhar Daithankar"
<shridhar_daithankar@persistent.co.in> wrote:
>Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql.

Shridhar,

here is an implementation of a set of user types: char3, char4, char10. Put
the attached files into a new directory contrib/fixchar, make, make install,
and run fixchar.sql through psql. Then create your table as

CREATE TABLE tbl (
	type     int,
	esn      char10,
	min      char10,
	datetime timestamp,
	opc0     char3,
	...
	rest     char4,
	field0   int,
	field1   char4,
	...
)

This should save 76 bytes per heap tuple and 12 bytes per index tuple, giving
a database size of ~ 76 GB. I'd be very interested in how this affects
performance.

The code has been tested for v7.2; it crashes on v7.3 beta 1. If this is a
problem, let me know.

Servus
 Manfred
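As a usage sketch, the new types drop straight into the same statements the
benchmark already uses; the index name and literal values below are made up,
and this assumes the accompanying fixchar.sql registers comparison operators
and a default btree operator class for the types (the later messages in this
thread suggest it does).

-- composite index on the three indexed fields, as on the benchmark sheet
CREATE INDEX tbl_esn_min_datetime ON tbl (esn, min, datetime);

-- the benchmark's select: equality match on esn and min
SELECT *
  FROM tbl
 WHERE esn = '1234567890'
   AND min = '0987654321';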
On 9 Oct 2002 at 10:00, Manfred Koizar wrote:
> here is an implementation of a set of user types: char3, char4,
> char10. Put the attached files into a new directory contrib/fixchar,
> make, make install, and run fixchar.sql through psql.

Thank you very much for this. I will certainly give it a try. Please be
patient, as the next test is scheduled for Monday.

Bye
 Shridhar

--
love, n.: When it's growing, you don't mind watering it with a few tears.
On 9 Oct 2002 at 10:00, Manfred Koizar wrote:
> here is an implementation of a set of user types: char3, char4,
> char10. Put the attached files into a new directory contrib/fixchar,
> make, make install, and run fixchar.sql through psql. Then create
> your table as

I had a quick look at things. I think it's great learning material for pg
internals..;-)

I have a suggestion. In the README, it would be worth mentioning that new
types can be added just by changing the Makefile. e.g. changing the line

OBJS = char3.o char4.o char10.o

to

OBJS = char3.o char4.o char5.o char10.o

would add the datatype char5 as well. Obviously this is for those who might
not take the effort to read the source. (Personally I wouldn't have, had it
been part of the entire postgres source dump. I would just have done
./configure; make; make install.)

Thanks for the solution. It wouldn't have occurred to me in ages to create a
type for this. I guess that's partly because I never used postgresql beyond
select/insert/update/delete. Anyway, I should have been awake..

Thanks once again.

Bye
 Shridhar

--
But it's real. And if it's real it can be affected ... we may not be able to
break it, but, I'll bet you credits to Navy Beans we can put a dent in it.
		-- deSalle, "Catspaw", stardate 3018.2
Manfred Koizar <mkoi-pg@aon.at> writes:
> here is an implementation of a set of user types: char3, char4,
> char10.

Coupla quick comments on these:

> CREATE FUNCTION charNN_lt(charNN, charNN)
>     RETURNS boolean
>     AS '$libdir/fixchar'
>     LANGUAGE 'c';

> bool
> charNN_lt(char *a, char *b)
> {
> 	return (strncmp(a, b, NN) < 0);
> }/*charNN_lt*/

These functions are dangerous as written, because they will crash on null
inputs. I'd suggest marking them strict in the function declarations. Some
attention to volatility declarations (isCachable or isImmutable) would be a
good idea too.

Also, it'd be faster and more portable to write the functions with version-1
calling conventions.

Using the Makefile to auto-create the differently sized versions is a slick
trick...

			regards, tom lane
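For reference, marking the declaration strict and cachable in the 7.2-era
CREATE FUNCTION syntax would look roughly like this; charNN stands for
char3/char4/char10 as in Manfred's template, and this is a sketch only, not
the actual contents of fixchar.sql.

CREATE FUNCTION charNN_lt(charNN, charNN)
    RETURNS boolean
    AS '$libdir/fixchar'
    LANGUAGE 'c'
    WITH (isStrict, isCachable);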
On 9 Oct 2002 at 9:32, Tom Lane wrote:
> These functions are dangerous as written, because they will crash on
> null inputs. I'd suggest marking them strict in the function
> declarations. Some attention to volatility declarations (isCachable
> or isImmutable) would be a good idea too.

Let me add something. Using char* is a bad idea. I faced a situation recently
on HP-UX 11 where, with a libc patch, isspace collapsed for char > 127. The
fix was to use unsigned char. There are other places as well where the input
character is used internally as an index into an array, and that can cause
weird behaviour for values > 127.

I will apply both corrections here. Will post the final stuff soon.

Bye
 Shridhar

--
Hacker's Quicky #313:
	Sour Cream -n- Onion Potato Chips
	Microwave Egg Roll
	Chocolate Milk
On Wed, 09 Oct 2002 09:32:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Coupla quick comments on these:

My first attempt at user types; thanks for the tips.

>These functions are dangerous as written, because they will crash on
>null inputs. I'd suggest marking them strict in the function
>declarations.

I was not aware of this, just wondered why the bpchar routines didn't
crash :-)  Fixed.

>Some attention to volatility declarations (isCachable
>or isImmutable) would be a good idea too.
>Also, it'd be faster and more portable to write the functions with
>version-1 calling conventions.

Done, too.

In the meantime I've found out why it crashed with 7.3: INSERT INTO
pg_opclass is now obsolete; you have to use CREATE OPERATOR CLASS ...

Servus
 Manfred
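For 7.3, the replacement for inserting into pg_opclass directly is the new
CREATE OPERATOR CLASS statement. A sketch of what a btree class for char10
might look like; the class name, operator set, and support-function name here
are assumptions following the module's naming pattern, not copied from the
tarball.

CREATE OPERATOR CLASS char10_ops
    DEFAULT FOR TYPE char10 USING btree AS
        OPERATOR 1  <  ,
        OPERATOR 2  <= ,
        OPERATOR 3  =  ,
        OPERATOR 4  >= ,
        OPERATOR 5  >  ,
        FUNCTION 1  char10_cmp(char10, char10);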
On Wed, 09 Oct 2002 10:00:03 +0200, I wrote:
>here is an implementation of a set of user types: char3, char4,
>char10.

A new version is available. As I don't want to spam the list with various
versions until I eventually get it right, you can get it from
http://members.aon.at/pivot/pg/fixchar20021010.tgz if you are interested.

What's new:

. README updated (per Shridhar's suggestion)
. doesn't crash on NULL (per Tom)
. version-1 calling conventions (per Tom)
. isCachable (per Tom)
. works for 7.2 (as delivered) and for 7.3 (make for73)

Shridhar, you were concerned about signed/unsigned chars; looking at the code
I cannot see how this is a problem, so no change in this regard. Thanks for
your comments.

Have fun!

Servus
 Manfred
On 10 Oct 2002 at 15:30, Manfred Koizar wrote:
> Shridhar, you were concerned about signed/unsigned chars; looking at
> the code I can not see how this is a problem. So no change in this
> regard.

Well, this is not related to postgresql exactly, but to summarise the problem:
with libc patch PHCO_19090 or compatible upwards, on HP-UX 11, isspace does
not work correctly if the input value is > 127. It can cause a lot of problems
for an external app. It works fine with unsigned char.

It does not make a difference from postgresql's point of view, but it would
break non-english locales if they wanted to use this fix in some situations.
But I agree: unless somebody reports it, there is no point fixing it, and we
know the fix anyway..

Bye
 Shridhar

--
Live long and prosper.
		-- Spock, "Amok Time", stardate 3372.7
> Well, this is not related to postgresql exactly but to summerise the
> problem, with libc patch PHCO_19090 or compatible upwards, on
> HP-UX11, isspace does not work correctly if input value is >127.

o isspace() and such are defined in the standards to operate on characters
o for historic C reasons, 'char' is widened to 'int' in function calls
o it is platform dependent whether 'char' is a signed or unsigned type

If your platform has a signed 'char' (as HP-UX does on PA-RISC) and you pass a
value that is negative, it will be sign-extended when converted to 'int', and
may be outside the range of values for which isspace() is defined.

Portable code uses 'unsigned char' when using ctype.h features, even though
for many platforms where 'char' is an unsigned type it's not necessary for
correct functioning.

I don't see any isspace() or similar in the code though, so I'm not sure why
this issue is being raised?

Regards,

Giles
Giles Lean <giles@nemeton.com.au> writes:
> Portable code uses 'unsigned char' when using ctype.h features, even
> though for many platforms where 'char' is an unsigned type it's not
> necessary for correct functioning.

Yup. Awhile back I went through the PG sources and made sure we explicitly
cast the arguments of ctype.h functions to "unsigned char" if they weren't
already. If anyone sees a place I missed (or that snuck in later) please
speak up!

> I don't see any isspace() or similar in the code though, so I'm not
> sure why this issue is being raised?

Ditto, I saw no ctype.h usage in Manfred's code. It matters not whether you
label strcmp's argument as unsigned...

			regards, tom lane
On 12 Oct 2002 at 8:54, Giles Lean wrote:
> Portable code uses 'unsigned char' when using ctype.h features, even
> though for many platforms where 'char' is an unsigned type it's not
> necessary for correct functioning.
>
> I don't see any isspace() or similar in the code though, so I'm not
> sure why this issue is being raised?

Well, I commented on the fixchar contrib module that it should use unsigned
char rather than plain char, to be on the safer side on all platforms. Nothing
much..

Bye
 Shridhar

--
brokee, n: Someone who buys stocks on the advice of a broker.