Thread: PostgreSQL and mySQL database size question

PostgreSQL and mySQL database size question

From
"Fred Ingham"
Date:
To all,

Before I begin, this is NOT a PostgreSQL versus mySQL post.  I have used
both databases and have found both of these products to be excellent choices
depending on the specific requirements for a given project.  That said, I
am trying to understand the database size differences between PostgreSQL and
mySQL for a current project I am working on.  I have attached the schemas
for the PostgreSQL and mySQL databases below (minus three tables that are
not important for this discussion and have no data in them).

I created the database as defined in the schemas below in both PostgreSQL
v7.1 (on Windows 2000 via cygwin) and mySQL 3.23.37 (on Windows 2000 native)
and populated them both with the same data: PIN table with 1,740 tuples and
the PINNDX table with 92,488 tuples.  Looking at the number of files and
size of the files, I get the following (after doing a vacuumdb with
PostgreSQL):

PostgreSQL
  Files  109
  Size   60.7MB

mySQL
  Files  24 (3 x 8 tables)
  Size   11.1MB

Why is there such a discrepancy in the database sizes?  I know that
PostgreSQL has additional capabilities such as logging; however, without
being able to ascribe a function to each of the numbered files in the
PostgreSQL database, it is unclear to me what is taking up so much space.
Performance and capabilities (and the cost of disk space) aside, this is a
major problem for me.  If anyone can shed some light on this, I would
appreciate it.

Thanks, Fred

---- PostgreSQL schema ----

CREATE SEQUENCE pins_seq;

CREATE TABLE pins (
  pinnum    INTEGER   NOT NULL DEFAULT nextval('pins_seq') PRIMARY KEY,
  modified  TIMESTAMP NOT NULL DEFAULT now(),
  indexed   CHAR(1)   NOT NULL DEFAULT 'N',
  sindexed  CHAR(1)   NOT NULL DEFAULT 'N',
  pin       TEXT      NOT NULL
)
;

CREATE INDEX indexed_ndx  ON pins (indexed);
CREATE INDEX sindexed_ndx ON pins (sindexed);

CREATE SEQUENCE pinndx_seq;

CREATE TABLE pinndx (
  pinndxnum INTEGER NOT NULL DEFAULT nextval('pinndx_seq') PRIMARY KEY,
  parent    INTEGER NOT NULL,
  tagpath   INTEGER NOT NULL,
  tagtype   CHAR(1)          NOT NULL,
  tagname   INTEGER NOT NULL,
  atrname   INTEGER NOT NULL,
  pinnum    INTEGER NOT NULL,
  nvalue    FLOAT4,
  value     TEXT
)
;

CREATE INDEX parent_ndx  ON pinndx (parent);
CREATE INDEX tagpath_ndx ON pinndx (tagpath);
CREATE INDEX tagname_ndx ON pinndx (tagname);
CREATE INDEX atrname_ndx ON pinndx (atrname);
CREATE INDEX pinnum_ndx  ON pinndx (pinnum);
CREATE INDEX nvalue_ndx  ON pinndx (nvalue);
CREATE INDEX value_ndx   ON pinndx (value);

CREATE SEQUENCE tagpath_seq;

CREATE TABLE tagpathtbl (
  vkey   INTEGER NOT NULL DEFAULT nextval('tagpath_seq') PRIMARY KEY,
  value VARCHAR(255) NOT NULL
);

CREATE INDEX tagpathtbl_ndx ON tagpathtbl (value);

CREATE SEQUENCE tagname_seq;

CREATE TABLE tagnametbl (
  vkey  INTEGER NOT NULL DEFAULT nextval('tagname_seq') PRIMARY KEY,
  value VARCHAR(255) NOT NULL
);

CREATE INDEX tagnametbl_ndx ON tagnametbl (value);

CREATE SEQUENCE atrname_seq;

CREATE TABLE atrnametbl (
  vkey  INTEGER NOT NULL DEFAULT nextval('atrname_seq') PRIMARY KEY,
  value VARCHAR(255) NOT NULL
);

CREATE INDEX atrnametbl_ndx ON atrnametbl (value);

---- mySQL schema ----

CREATE TABLE pins (
  pinnum    INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  modified  DATETIME         NOT NULL,
  indexed   CHAR(1)          NOT NULL DEFAULT 'N',
  INDEX indexedndx (indexed),
  sindexed  CHAR(1)          NOT NULL DEFAULT 'N',
  INDEX sindexedndx (sindexed),
  pin       MEDIUMTEXT       NOT NULL
)
AVG_ROW_LENGTH=5000
MAX_ROWS=315360000
PACK_KEYS=1
ROW_FORMAT=compressed
;

CREATE TABLE pinndx (
  pinndxnum INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  parent    INTEGER UNSIGNED NOT NULL,
  INDEX parentndx (parent),
  tagpath   INTEGER UNSIGNED NOT NULL,
  INDEX tagpathndx (tagpath),
  tagtype   CHAR(1)          NOT NULL,
  tagname   INTEGER UNSIGNED NOT NULL,
  INDEX tagnamendx (tagname),
  atrname   INTEGER UNSIGNED NOT NULL,
  INDEX atrnamendx (atrname),
  pinnum    INTEGER UNSIGNED NOT NULL,
  INDEX pinnumndx (pinnum),
  nvalue    DOUBLE,
  INDEX nvaluendx (nvalue),
  value     MEDIUMTEXT,
  INDEX valuendx (value(30))
)
AVG_ROW_LENGTH=500
MAX_ROWS=3153600000
PACK_KEYS=1
;

CREATE TABLE tagpathtbl (
  vkey   INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  value VARCHAR(255) NOT NULL,
  INDEX tagpathndx (value(30))
);

CREATE TABLE tagnametbl (
  vkey   INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  value VARCHAR(255) NOT NULL,
  INDEX tagnamendx (value(30))
);

CREATE TABLE atrnametbl (
  vkey   INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  value VARCHAR(255) NOT NULL,
  INDEX atrnamendx (value(30))
);


RE: PostgreSQL and mySQL database size question

From
"Fred Ingham"
Date:
Bruce,

I have your book right in front of me... do you have that information in the
book?  If not, I will take a look at the FAQ.  As for mySQL, here are the
storage requirements:

Storage requirements for numeric types

Column type          Storage required
TINYINT              1 byte
SMALLINT             2 bytes
MEDIUMINT            3 bytes
INT                  4 bytes
INTEGER              4 bytes
BIGINT               8 bytes
FLOAT(X)             4 bytes if X <= 24, 8 bytes if 25 <= X <= 53
FLOAT                4 bytes
DOUBLE               8 bytes
DOUBLE PRECISION     8 bytes
REAL                 8 bytes
DECIMAL(M,D)         M+2 bytes if D > 0, M+1 bytes if D = 0 (D+2, if M < D)
NUMERIC(M,D)         M+2 bytes if D > 0, M+1 bytes if D = 0 (D+2, if M < D)

Storage requirements for date and time types

Column type          Storage required
DATE                 3 bytes
DATETIME             8 bytes
TIMESTAMP            4 bytes
TIME                 3 bytes
YEAR                 1 byte

Storage requirements for string types

Column type          Storage required
CHAR(M)              M bytes, 1 <= M <= 255
VARCHAR(M)           L+1 bytes, where L <= M and 1 <= M <= 255
TINYBLOB, TINYTEXT   L+1 bytes, where L < 2^8
BLOB, TEXT           L+2 bytes, where L < 2^16
MEDIUMBLOB, MEDIUMTEXT   L+3 bytes, where L < 2^24
LONGBLOB, LONGTEXT   L+4 bytes, where L < 2^32
ENUM('value1','value2',...)  1 or 2 bytes, depending on the number of
                     enumeration values (65535 values maximum)
SET('value1','value2',...)   1, 2, 3, 4 or 8 bytes, depending on the number of
                     set members (64 members maximum)

Fred

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Friday, April 27, 2001 1:48 PM
To: ingham@erols.com
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] PostgreSQL and mySQL database size question


Did you see the FAQ items on estimating database sizes?  Does MySQL have
less overhead per row?


Re: PostgreSQL and mySQL database size question

From
Bruce Momjian
Date:
Did you see the FAQ items on estimating database sizes?  Does MySQL have
less overhead per row?

> PostgreSQL
>   Files  109
>   Size   60.7MB
>
> mySQL
>   Files  24 (3 x 8 tables)
>   Size   11.1MB
>
> Why is there such a discrepancy in the database sizes?
> [...]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: PostgreSQL and mySQL database size question

From
Bruce Momjian
Date:
FAQ item 4.7 discusses table size computations.  My guess is that it is
the 36 bytes of per-row overhead that is the problem.  Those bytes record the
visibility of the row for proper transaction semantics and MVCC.


> Bruce,
>
> I have your book right in front of me... do you have that information in the
> book?  If not, I will take a look at the FAQ.  As for mySQL, here are the
> storage requirements: [...]
>
> Fred

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

RE: PostgreSQL and mySQL database size question

From
"Fred Ingham"
Date:
OK, but that only accounts for 3.2MB of the extra 49.6MB used by PostgreSQL.
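
For reference, the arithmetic behind that figure:

    (92,488 + 1,740) rows x 36 bytes/row = 3,392,208 bytes, or roughly 3.2MB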

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Friday, April 27, 2001 2:05 PM
To: ingham@erols.com
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] PostgreSQL and mySQL database size question



FAQ item 4.7 discusses table size computations.  My guess is that it is
the 36 bytes of per-row overhead that is the problem.  Those bytes record the
visibility of the row for proper transaction semantics and MVCC.



--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: PostgreSQL and mySQL database size question

From
Jan Wieck
Date:
Fred Ingham wrote:
> OK, but that only accounts for 3.2MB of the extra 49.6MB used by PostgreSQL.

    Template databases, system catalog, WAL files?
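
    One way to see how much of that is actual user data (system catalogs
    and TOAST tables are all named pg_something, so this sums just the
    user tables, indexes, and sequences; the "* 8" assumes the default
    8K block size):

        select sum(relpages) * 8 as user_kb
          from pg_class
         where relname not like 'pg_%';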

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: PostgreSQL and mySQL database size question

From
Tom Lane
Date:
"Fred Ingham" <ingham@erols.com> writes:
> and populated them both with the same data: PIN table with 1,740 tuples and
> the PINNDX table with 92,488 tuples.  Looking at the number of files and
> size of the files, I get the following (after doing a vacuumdb with
> PostgreSQL):

> PostgreSQL
>   Files  109
>   Size   60.7MB

That seems excessive to me too --- but it's fair to ask exactly what you
were counting; does that include system catalogs, or just these two
tables and their indexes?  It would help to see an ls -lR of the $PGDATA
directory and a map of the table numbers; you can make the latter with
    select relfilenode, relname from pg_class order by relfilenode;
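
A rough per-relation size is also visible from inside the database; relpages
is only maintained by VACUUM, so after your vacuumdb the numbers should be
reasonably current (the "* 8" assumes the default 8K block size):
    select relname, relpages, relpages * 8 as approx_kb
    from pg_class order by relpages desc;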

            regards, tom lane

CREATE USER

From
Jesus Aneiros
Date:
Hi,

Perhaps this is a stupid question, but I wonder why I cannot put a
CREATE USER inside a transaction block?

Regards, jesus.




Re: CREATE USER

From
Peter Eisentraut
Date:
Jesus Aneiros writes:

> Perhaps this is a stupid question, but I wonder why I cannot put a
> CREATE USER inside a transaction block?

Because CREATE USER may have to update a flat-text file.  If you roll back
the surrounding transaction, the changes to the flat-text file will not be
reverted and you may get weird authentication failures.  This system is
admittedly a bit silly, but the alternatives look like too much work for
little benefit.  I think in 7.1 you can run CREATE USER in transaction
blocks; you just have to be careful.
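
A minimal illustration of the failure mode (the user name is just an example):

    BEGIN;
    CREATE USER testuser;
    ROLLBACK;
    -- the pg_shadow entry is rolled back, but if the flat-text password
    -- file was already rewritten, that rewrite is not undone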

--
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter


RE: PostgreSQL and mySQL database size question

From
"Fred Ingham"
Date:
Tom,

Thank you very much for the query; it was extremely helpful.  Using it, here
is what I found for one of the tables:

            File         Size (KB)
mySQL       pinndx.MYD     3,947
            pinndx.MYI     4,144
            Total          8,091

PostgreSQL  pinndx_seq         8
            pinndx         7,856
            pinndx_pkey    6,984
            parent_ndx     6,952
            tagpath_ndx    5,552
            tagname_ndx    5,560
            atrname_ndx    5,696
            pinnum_ndx     6,160
            nvalue_ndx     5,832
            value_ndx      6,424
            Total         57,024

Based on this information, I conclude that PostgreSQL is using significantly
more space for its indexes than mySQL (the .MYI file contains all of the
indexes on the pinndx table in mySQL).

That may also explain another item that has been troubling me: the
difference in insert/update speed between PostgreSQL and mySQL.  It seems to
take PostgreSQL significantly more time to insert into its pinndx table than
mySQL... perhaps this is due to the I/O of writing to all of the index
files?  In any case, is there any way to reduce the size of the PostgreSQL
indexes (are there different indexing schemes)?

Fred

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, April 30, 2001 1:40 AM
To: ingham@erols.com
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] PostgreSQL and mySQL database size question


"Fred Ingham" <ingham@erols.com> writes:
> and populated them both with the same data: PIN table with 1,740 tuples
and
> the PINNDX table with 92,488 tuples.  Looking at the number of files and
> size of the files, I get the following (after doing a vacuumdb with
> PostgreSQL):

> PostgreSQL
>   Files  109
>   Size   60.7MB

That seems excessive to me too --- but it's fair to ask exactly what you
were counting; does that include system catalogs, or just these two
tables and their indexes?  It would help to see an ls -lR of the $PGDATA
directory and a map of the table numbers; you can make the latter with
    select relfilenode, relname from pg_class order by relfilenode;

            regards, tom lane


Re: PostgreSQL and mySQL database size question

From
Tom Lane
Date:
"Fred Ingham" <ingham@erols.com> writes:
> PostgreSQL  pinndx_seq       8
>             pinndx       7,856
>             pinndx_pkey  6,984
>             parent_ndx   6,952
>             tagpath_ndx  5,552
>             tagname_ndx  5,560
>             atrname_ndx  5,696
>             pinnum_ndx   6,160
>             nvalue_ndx   5,832
>             value_ndx    6,424
>                         57,024

Hm.  All but value_ndx are indexes on integer columns, so the keys are
only 4 bytes.  The index tuple overhead will be either 12 or 16 bytes
per entry depending on whether your hardware has any datatypes that
require 8-byte alignment (I think not on PCs --- what is MAXIMUM_ALIGNOF
in your src/include/config.h?).  16 bytes times 92000 entries is only
about a meg and a half; even allowing for the traditional 70% fill
factor of btrees, you shouldn't see more than a couple meg per index.
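
(Spelled out: 92,488 entries x 16 bytes ~= 1.4MB, and 1.4MB / 0.70 fill
factor ~= 2MB per integer-keyed index.)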

That assumes random loading of the index, however, and I think there may
be pathological cases where the indexes come out less dense after
initial load.  Was there any particular order to the data values when
you imported them?  If you drop any of these indexes and CREATE it
again, is the result noticeably smaller?
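
For example, for one of the integer columns (index and table names as in
your schema):

    DROP INDEX parent_ndx;
    CREATE INDEX parent_ndx ON pinndx (parent);

The rebuilt index gets a new numbered file, so re-run the relfilenode query
from my earlier message to find it before comparing sizes.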

> Based on this information, I conclude that PostgreSQL is using significantly
> more space for its indexes than mySQL (the .MYI file contains all of the
> indexes on the pinndx table in mySQL).

I find it hard to believe that MySQL is storing nine indexes on a
92000-entry table in only 4Mb.  Storing the keys alone would take 3.3Mb,
never mind making the keys point to anything.  Are you sure that you've
accounted for all of their index storage?

> In any case, is there anyway to reduce the size of the PostgreSQL
> index tables (are there different indexing schemes)?

Do you actually *need* an index on each column?  It seems highly
unlikely that each one of these indexes will pay its keep.

            regards, tom lane

RE: PostgreSQL and mySQL database size question

From
"Fred Ingham"
Date:
Tom,

Again, thank you for your quick and knowledgeable assistance.

After dropping and recreating the indexes I have (in KB):

                         old     new
PostgreSQL  pinndx_seq       8       8
            pinndx       7,856   7,856
            pinndx_pkey  6,984   6,984  (did not recreate)
            parent_ndx   6,952   2,040
            tagpath_ndx  5,552   2,040
            tagname_ndx  5,560   2,040
            atrname_ndx  5,696   2,040
            pinnum_ndx   6,160   2,040
            nvalue_ndx   5,832   2,040
            value_ndx    6,424   2,432
            Total       57,024  29,520

So, it appears that I am running into the pathological case.  Most of the
index values are NOT random: the primary key is ever increasing (via the
sequence); the parent, tagpath, tagname, atrname, and pinnum columns
contain many repeated values; only the nvalue and value columns are truly
random.

I do not import or bulk load the values; they are sent in via JDBC in an
asynchronous manner (via a server socket).  It would be very difficult
(though not impossible) for me to periodically drop and recreate the indexes
(although, I must say, doing so was very fast).

I do in fact need all of the indexes.  Without going into detail, the
application is 'indexing' millions of XML documents: the tagname, tagpath,
and atrname columns hold an XML element's name, the path leading up to the
element, and the element's attribute names; the parent column associates an
element with its parent; and so on.  In short, they are all needed for
acceptable performance when querying and retrieving values from the database.
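
As a simplified illustration of the kind of lookup these indexes serve (the
element name 'title' is just an example):

    SELECT x.pinnum, x.value
      FROM pinndx x, tagnametbl t
     WHERE t.value   = 'title'     -- resolve the element name to its key
       AND x.tagname = t.vkey;     -- then fetch the matching pinndx rows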

With respect to mySQL, I verified that it does in fact have all of the
indexes I created and that they are stored in a single file (the *.MYI).
Here is the output of the 'show index' command in mySQL.

mysql> show index from pinndx;
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+---------+
| Table  | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Comment |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+---------+
| pinndx |          0 | PRIMARY    |            1 | pinndxnum   | A         |       92488 |     NULL | NULL   |         |
| pinndx |          1 | parentndx  |            1 | parent      | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | tagpathndx |            1 | tagpath     | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | tagnamendx |            1 | tagname     | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | atrnamendx |            1 | atrname     | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | pinnumndx  |            1 | pinnum      | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | nvaluendx  |            1 | nvalue      | A         |        NULL |     NULL | NULL   |         |
| pinndx |          1 | valuendx   |            1 | value       | A         |        NULL |       30 | NULL   |         |
+--------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+---------+
8 rows in set (0.07 sec)

Fred



Re: PostgreSQL and mySQL database size question

From
Tom Lane
Date:
"Fred Ingham" <ingham@erols.com> writes:
> After dropping and recreating the indexes I have (in KB):

Ah, that's more like it.

> So, it appears that I am running into the pathological case.

Actually, a simpler explanation would be if you'd done a bunch of
updates and/or deletes at some point.  VACUUM currently never shrinks
indexes, so the excess index space might just represent the peak size
of your tables.  (Yes, improving that is on the to-do list.)

> In short, they are all needed for acceptable performance querying and
> retrieving values from the database.

You've got common queries that do selects with WHERE clauses referencing
each one of these columns?  Possible, certainly, but seems a bit odd...

> With respect to mySQL, I did verify that mySQL did in fact have all of the
> indexes I created and that they were saved in a single file (the *.MYI).

Hm.  Anyone know how mySQL represents indexes?  That still seems awfully
small.

            regards, tom lane