Thread: PostgreSQL and mySQL database size question
To all,

Before I begin, this is NOT a PostgreSQL versus mySQL post. I have used both
databases and have found both of these products to be excellent choices
depending on the specific requirements for a given project. That said, I am
trying to understand the database size differences between PostgreSQL and
mySQL for a current project I am working on. I have attached the schemas for
the PostgreSQL and mySQL databases below (minus three tables that are not
important for this discussion and have no data in them).

I created the database as defined in the schemas below in both PostgreSQL
v7.1 (on Windows 2000 via cygwin) and mySQL 3.23.37 (on Windows 2000 native)
and populated them both with the same data: the PIN table with 1,740 tuples
and the PINNDX table with 92,488 tuples. Looking at the number of files and
the size of the files (after doing a vacuumdb with PostgreSQL), I get the
following:

    PostgreSQL    Files 109                  Size 60.7MB
    mySQL         Files 24 (3 x 8 tables)    Size 11.1MB

Why is there such a discrepancy in the database sizes? I know that PostgreSQL
has additional capabilities such as logging; however, without being able to
ascribe a function to each of the numbered files in the PostgreSQL database,
it is unclear to me what is taking up so much space. Performance and
capabilities (and the cost of disk space) aside, this is a major problem for
me. If anyone can shed some light on this I would appreciate it.

Thanks, Fred

---- PostgreSQL schema ----

CREATE SEQUENCE pins_seq;

CREATE TABLE pins (
    pinnum    INTEGER NOT NULL DEFAULT nextval('pins_seq') PRIMARY KEY,
    modified  TIMESTAMP NOT NULL DEFAULT now(),
    indexed   CHAR(1) NOT NULL DEFAULT 'N',
    sindexed  CHAR(1) NOT NULL DEFAULT 'N',
    pin       TEXT NOT NULL
);

CREATE INDEX indexed_ndx ON pins (indexed);
CREATE INDEX sindexed_ndx ON pins (sindexed);

CREATE SEQUENCE pinndx_seq;

CREATE TABLE pinndx (
    pinndxnum INTEGER NOT NULL DEFAULT nextval('pinndx_seq') PRIMARY KEY,
    parent    INTEGER NOT NULL,
    tagpath   INTEGER NOT NULL,
    tagtype   CHAR(1) NOT NULL,
    tagname   INTEGER NOT NULL,
    atrname   INTEGER NOT NULL,
    pinnum    INTEGER NOT NULL,
    nvalue    FLOAT4,
    value     TEXT
);

CREATE INDEX parent_ndx  ON pinndx (parent);
CREATE INDEX tagpath_ndx ON pinndx (tagpath);
CREATE INDEX tagname_ndx ON pinndx (tagname);
CREATE INDEX atrname_ndx ON pinndx (atrname);
CREATE INDEX pinnum_ndx  ON pinndx (pinnum);
CREATE INDEX nvalue_ndx  ON pinndx (nvalue);
CREATE INDEX value_ndx   ON pinndx (value);

CREATE SEQUENCE tagpath_seq;

CREATE TABLE tagpathtbl (
    vkey  INTEGER NOT NULL DEFAULT nextval('tagpath_seq') PRIMARY KEY,
    value VARCHAR(255) NOT NULL
);

CREATE INDEX tagpathtbl_ndx ON tagpathtbl (value);

CREATE SEQUENCE tagname_seq;

CREATE TABLE tagnametbl (
    vkey  INTEGER NOT NULL DEFAULT nextval('tagname_seq') PRIMARY KEY,
    value VARCHAR(255) NOT NULL
);

CREATE INDEX tagnametbl_ndx ON tagnametbl (value);

CREATE SEQUENCE atrname_seq;

CREATE TABLE atrnametbl (
    vkey  INTEGER NOT NULL DEFAULT nextval('atrname_seq') PRIMARY KEY,
    value VARCHAR(255) NOT NULL
);

CREATE INDEX atrnametbl_ndx ON atrnametbl (value);

---- mySQL schema ----

CREATE TABLE pins (
    pinnum   INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    modified DATETIME NOT NULL,
    indexed  CHAR(1) NOT NULL DEFAULT 'N',
    INDEX indexedndx (indexed),
    sindexed CHAR(1) NOT NULL DEFAULT 'N',
    INDEX sindexedndx (sindexed),
    pin      MEDIUMTEXT NOT NULL
)
AVG_ROW_LENGTH=5000
MAX_ROWS=315360000
PACK_KEYS=1
ROW_FORMAT=compressed
;

CREATE TABLE pinndx (
    pinndxnum INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    parent    INTEGER UNSIGNED NOT NULL,
    INDEX parentndx (parent),
    tagpath   INTEGER UNSIGNED NOT NULL,
    INDEX tagpathndx (tagpath),
    tagtype   CHAR(1) NOT NULL,
    tagname   INTEGER UNSIGNED NOT NULL,
    INDEX tagnamendx (tagname),
    atrname   INTEGER UNSIGNED NOT NULL,
    INDEX atrnamendx (atrname),
    pinnum    INTEGER UNSIGNED NOT NULL,
    INDEX pinnumndx (pinnum),
    nvalue    DOUBLE,
    INDEX nvaluendx (nvalue),
    value     MEDIUMTEXT,
    INDEX valuendx (value(30))
)
AVG_ROW_LENGTH=500
MAX_ROWS=3153600000
PACK_KEYS=1
;

CREATE TABLE tagpathtbl (
    vkey  INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    value VARCHAR(255) NOT NULL,
    INDEX tagpathndx (value(30))
);

CREATE TABLE tagnametbl (
    vkey  INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    value VARCHAR(255) NOT NULL,
    INDEX tagnamendx (value(30))
);

CREATE TABLE atrnametbl (
    vkey  INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    value VARCHAR(255) NOT NULL,
    INDEX atrnamendx (value(30))
);
Bruce,

I have your book right in front of me... do you have that information in the
book? If not, I will take a look at the FAQ. As for mySQL, here are the
storage requirements:

Storage requirements for numeric types

    Column type           Storage required
    TINYINT               1 byte
    SMALLINT              2 bytes
    MEDIUMINT             3 bytes
    INT                   4 bytes
    INTEGER               4 bytes
    BIGINT                8 bytes
    FLOAT(X)              4 if X <= 24 or 8 if 25 <= X <= 53
    FLOAT                 4 bytes
    DOUBLE                8 bytes
    DOUBLE PRECISION      8 bytes
    REAL                  8 bytes
    DECIMAL(M,D)          M+2 bytes if D > 0, M+1 bytes if D = 0 (D+2, if M < D)
    NUMERIC(M,D)          M+2 bytes if D > 0, M+1 bytes if D = 0 (D+2, if M < D)

Storage requirements for date and time types

    Column type           Storage required
    DATE                  3 bytes
    DATETIME              8 bytes
    TIMESTAMP             4 bytes
    TIME                  3 bytes
    YEAR                  1 byte

Storage requirements for string types

    Column type               Storage required
    CHAR(M)                   M bytes, 1 <= M <= 255
    VARCHAR(M)                L+1 bytes, where L <= M and 1 <= M <= 255
    TINYBLOB, TINYTEXT        L+1 bytes, where L < 2^8
    BLOB, TEXT                L+2 bytes, where L < 2^16
    MEDIUMBLOB, MEDIUMTEXT    L+3 bytes, where L < 2^24
    LONGBLOB, LONGTEXT        L+4 bytes, where L < 2^32
    ENUM('value1','value2',...)  1 or 2 bytes, depending on the number of
                              enumeration values (65535 values maximum)
    SET('value1','value2',...)   1, 2, 3, 4 or 8 bytes, depending on the number
                              of set members (64 members maximum)

Fred

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Friday, April 27, 2001 1:48 PM
To: ingham@erols.com
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] PostgreSQL and mySQL database size question

Did you see the FAQ items on estimating database sizes? Does MySQL have
less overhead per row?
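As a rough cross-check of those storage figures against the pinndx table in
this schema (my own arithmetic, not something from the manual, and it assumes
the value strings are short):

    fixed-width columns: 4+4+4+1+4+4+4+8             =  33 bytes
    value (MEDIUMTEXT)                               =  L+3 bytes

    92,488 rows x (36 bytes + average value length)  ~  3.5-4.5 MB of row data,
    before MyISAM's own per-record overhead and before any index storage.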
Did you see the FAQ items on estimating database sizes? Does MySQL have
less overhead per row?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
FAQ item 4.7 discusses table size computations. My guess is that it is the
36 bytes-per-row overhead that is the problem. Those bytes record the
visibility of the row for proper transaction semantics and MVCC.

> I have your book right in front of me... do you have that information in
> the book? If not, I will take a look at the FAQ.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
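Spelled out with the row counts from the original post, that per-row estimate
comes to roughly:

    36 bytes x 92,488 pinndx rows   ~  3.3 MB
    36 bytes x  1,740 pins rows     ~  0.06 MB
                                       -------
                                       ~3.4 MB

of row overhead alone, before the data and index contents themselves.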
OK, but that only accounts for 3.2MB of the extra 49.6MB used by PostgreSQL.
Fred Ingham wrote:

> OK, but that only accounts for 3.2MB of the extra 49.6MB used by PostgreSQL.

Template databases, system catalog, WAL files?


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
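To put rough numbers on the WAL part of that list (assuming the 60.7MB figure
was measured over the whole $PGDATA tree, and going from memory on the 7.1
defaults):

    WAL segment size in 7.1               16 MB each
    two or three segments under pg_xlog   32-48 MB

so the write-ahead log plus the template database and system catalogs could
account for much of the gap before the user tables are even counted.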
"Fred Ingham" <ingham@erols.com> writes: > and populated them both with the same data: PIN table with 1,740 tuples and > the PINNDX table with 92,488 tuples. Looking at the number of files and > size of the files, I get the following (after doing a vacuumdb with > PostgreSQL): > PostgreSQL > Files 109 > Size 60.7MB That seems excessive to me too --- but it's fair to ask exactly what you were counting; does that include system catalogs, or just these two tables and their indexes? It would help to see an ls -lR of the $PGDATA directory and a map of the table numbers; you can make the latter with select relfilenode, relname from pg_class order by relfilenode; regards, tom lane
Hi,

Perhaps this is a stupid question but I wonder why I could not put a CREATE
USER inside a transaction block?

Regards, jesus.
Jesus Aneiros writes:

> Perhaps this is a stupid question but I wonder why I could not put a
> CREATE USER inside a transaction block?

Because CREATE USER may have to update a flat-text file. If you roll back the
surrounding transaction the changes to the flat-text file will not be
reverted and you may get weird authentication failures. This system is
admittedly a bit silly but the alternatives look like too much work for
little benefit. I think in 7.1 you can run CREATE USER in transaction blocks,
you just have to be careful.

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
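A minimal sketch of the caveat being described (the user name is made up, and
the flat-file name is from memory):

    BEGIN;
    CREATE USER scratch_user;  -- releases before 7.1 reject this inside a
                               -- transaction block
    ROLLBACK;                  -- in 7.1 the pg_shadow entry rolls back, but the
                               -- flat password file (pg_pwd) may already have
                               -- been rewritten, so the two can disagree until
                               -- the next user change rewrites it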
Tom,

Thanks very much for the query, it was extremely helpful. Using your query,
here is what I found for one of the tables:

                             File           Size (KB)
    mySQL                    pinndx.MYD         3,947
                             pinndx.MYI         4,144
                                                -----
                                                8,091

    PostgreSQL               pinndx_seq             8
                             pinndx             7,856
                             pinndx_pkey        6,984
                             parent_ndx         6,952
                             tagpath_ndx        5,552
                             tagname_ndx        5,560
                             atrname_ndx        5,696
                             pinnum_ndx         6,160
                             nvalue_ndx         5,832
                             value_ndx          6,424
                                               ------
                                               57,024

Based on this information, I conclude that PostgreSQL is using significantly
more space for its indexes than mySQL (the .MYI file contains all of the
indexes on the pinndx table in mySQL). That may also explain another item
that has been troubling me, the difference in insert/update speed between
PostgreSQL and mySQL. It seems to take PostgreSQL significantly more time to
insert into its pinndx table than mySQL... perhaps this is due to the I/O of
writing to all of the index files? In any case, is there any way to reduce
the size of the PostgreSQL index tables (are there different indexing
schemes)?

Fred
"Fred Ingham" <ingham@erols.com> writes: > PostgreSQL pinndex_seq 8 > pinndx 7,856 > pinndx_pkey 6,984 > parent_ndx 6,952 > tagpath_ndx 5,552 > tagname_ndx 5,560 > atrname_ndx 5,696 > pinnum_ndx 6,160 > nvalue_ndx 5,832 > value_ndx 6,424 > 57,024 Hm. All but value_ndx are indexes on integer columns, so the keys are only 4 bytes. The index tuple overhead will be either 12 or 16 bytes per entry depending on whether your hardware has any datatypes that require 8-byte alignment (I think not on PCs --- what is MAXIMUM_ALIGNOF in your src/include/config.h?). 16 bytes times 92000 entries is only about a meg and a half; even allowing for the traditional 70% fill factor of btrees, you shouldn't see more than a couple meg per index. That assumes random loading of the index, however, and I think there may be pathological cases where the indexes come out less dense after initial load. Was there any particular order to the data values when you imported them? If you drop any of these indexes and CREATE it again, is the result noticeably smaller? > Based on this information, I conclude that PostgreSQL is using significantly > more space for its indexes than mySQL (the .MYI file contains all of the > indexes on the pinndx table in mySQL). I find it hard to believe that MySQL is storing nine indexes on a 92000-entry table in only 4Mb. Storing the keys alone would take 3.3Mb, never mind making the keys point to anything. Are you sure that you've accounted for all of their index storage? > In any case, is there anyway to reduce the size of the PostgreSQL > index tables (are there different indexing schemes)? Do you actually *need* an index on each column? It seems highly unlikely that each one of these indexes will pay its keep. regards, tom lane
Tom,

Again, thank you for your quick and knowledgeable assistance. After dropping
and recreating the indexes I have (in KB):

                                  old      new
    PostgreSQL   pinndx_seq         8        8
                 pinndx         7,856    7,856
                 pinndx_pkey    6,984    6,984  (did not recreate)
                 parent_ndx     6,952    2,040
                 tagpath_ndx    5,552    2,040
                 tagname_ndx    5,560    2,040
                 atrname_ndx    5,696    2,040
                 pinnum_ndx     6,160    2,040
                 nvalue_ndx     5,832    2,040
                 value_ndx      6,424    2,432
                               ------   ------
                               57,024   29,520

So, it appears that I am running into the pathological case. Most of the
index values are NOT random: the primary key is ever increasing (via the
sequence); the parent, tagpath, tagname, atrname, and pinnum columns contain
many repeated values; only the nvalue and value columns are truly random. I
do not import or bulk load the values, they are sent in via JDBC in an
asynchronous manner (via a server socket). It would be very difficult (not
impossible) for me to periodically drop and recreate the indexes (although, I
must say, it was very fast doing so).

I do in fact need all of the indexes. Without going into detail, the
application is 'indexing' millions of XML documents: the tagname, tagpath,
and atrname columns are for the XML document's element name, the path leading
up to the element, and the element's attributes; the parent column associates
an element with its parent, etc. In short, they are all needed for acceptable
performance querying and retrieving values from the database.

With respect to mySQL, I did verify that mySQL did in fact have all of the
indexes I created and that they were saved in a single file (the *.MYI).
Here is the result of the 'show index' command in mySQL:

mysql> show index from pinndx;
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+---------+
| Table  | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+---------+
| pinndx |          0 | PRIMARY    |            1 | pinndxnum   | A         |       92488 |     NULL | NULL   |         |
| pinndx |          1 | parentndx  |            1 | parent      | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | tagpathndx |            1 | tagpath     | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | tagnamendx |            1 | tagname     | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | atrnamendx |            1 | atrname     | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | pinnumndx  |            1 | pinnum      | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | nvaluendx  |            1 | nvalue      | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | valuendx   |            1 | value       | A         |        NULL |       30 | NULL   |         |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+---------+
8 rows in set (0.07 sec)

Fred
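The drop-and-recreate step mentioned above amounts to a pair of statements
per index, e.g. (a sketch; 7.1 also has a REINDEX command, though whether it
reclaims space the same way is untested here):

    DROP INDEX parent_ndx;
    CREATE INDEX parent_ndx ON pinndx (parent);
    -- repeat for the other pinndx indexes that have bloated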
"Fred Ingham" <ingham@erols.com> writes: > After dropping and recreating the indexes I have (in KB): Ah, that's more like it. > So, it appears that I am running into the pathological case. Actually, a simpler explanation would be if you'd done a bunch of updates and/or deletes at some point. VACUUM currently never shrinks indexes, so the excess index space might just represent the peak size of your tables. (Yes, improving that is on the to-do list.) > In short, they are all needed for acceptable performance querying and > retrieving values from the database. You've got common queries that do selects with WHERE clauses referencing each one of these columns? Possible, certainly, but seems a bit odd... > With respect to mySQL, I did verify that mySQL did in fact have all of the > indexes I created and that they were saved in a single file (the *.MYI). Hm. Anyone know how mySQL represents indexes? That still seems awfully small. regards, tom lane