Thread: BIG files
Hi all PostgreSQL users,

I've been using MySQL for years and now I have decided to switch to PostgreSQL, because I need more robust "enterprise" features like views and triggers. I work with VERY large datasets: 60 monthly tables with 700,000 rows and 99 columns each, mostly large numeric values (15 digits, NUMERIC(15,0) datatypes, not all filled). So far, I've migrated 2 of my tables to a dedicated directory and tablespace in Postgres. I've created some views and triggers, run queries, and let me tell you that performance is quite good after tweaking some parameters in postgresql.conf. Now I'm very excited about all of PostgreSQL's capabilities, and I want to switch right away.

The main problem is disk space. The database files stored in Postgres take 4 or 5 times more space than in MySQL. Just to be sure, after each bulk load I performed a VACUUM FULL to reclaim any possible lost space, but nothing gets reclaimed. My plain text dump files with INSERTs are just 150 MB in size, while the files in the Postgres directory are more than 1 GB each! I've tested other free DBMSs like Firebird and Ingres, but PostgreSQL consumes far more disk space than any of them. Disk space is a very relevant issue for me because the hard disks are shared with other people, and it is not possible to buy more disks right now.

Is there a configuration parameter I'm missing, like page size or something? Or is it an OS problem? What should I look for? I really like Postgres, but if I can't solve the space issue I will have to stick with MySQL (which I don't want).

By the way, I'm working on a P4, 640 MB RAM, with Win2K, PostgreSQL v8.0.3.

Thanks in advance,
Rodrigo, Chile
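One way to see where the space is actually going, from inside the database rather than guessing from the files on disk, is to look at relation sizes in pg_class after the VACUUM. This is only a minimal sketch (it works on 8.0 and later; pages are 8 kB with the default block size, and nothing here refers to Rodrigo's real table names):

SELECT relname,
       relkind,                                -- 'r' = table, 'i' = index, 't' = TOAST table
       relpages,                               -- 8 kB pages currently on disk
       reltuples,                              -- planner's row-count estimate
       relpages * 8192 / (1024 * 1024) AS approx_mb
FROM pg_class
WHERE relkind IN ('r', 'i', 't')
ORDER BY relpages DESC
LIMIT 10;

relpages and reltuples are refreshed by VACUUM and ANALYZE, so the numbers are accurate right after a VACUUM FULL. That makes it easy to tell whether the space is in the heap, in indexes, or in TOAST.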
On Sat, Jun 18, 2005 at 13:45:42 -0400, rabt@dim.uchile.cl wrote:
> Hi all PostgreSQL users,
>
> I've been using MySQL for years and now I have decided to switch to
> PostgreSQL, because I need more robust "enterprise" features like views
> and triggers. I work with VERY large datasets: 60 monthly tables with
> 700,000 rows and 99 columns each, mostly large numeric values (15 digits,
> NUMERIC(15,0) datatypes, not all filled). So far, I've migrated 2 of my
> tables to a dedicated [...]
>
> The main problem is disk space. The database files stored in Postgres
> take 4 or 5 times more space than in MySQL. Just to be sure, after each
> bulk load I performed a VACUUM FULL to reclaim any possible lost space,
> but nothing gets reclaimed. My plain text dump files with INSERTs are
> just 150 MB in size, while the files in the Postgres directory are more
> than 1 GB each! I've tested other free DBMSs like Firebird and Ingres,
> but PostgreSQL consumes far more disk space than any of them.

From discussions I have seen here, MySQL implements NUMERIC using a floating point type. Postgres stores it as a string of base-10000 digits, roughly 2 bytes of storage per digit, plus some overhead for the precision and scale. You might be better off using bigint to store your data. That will take 8 bytes per datum and is probably the same size as was used in MySQL.
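For what it's worth, a 15-digit integer fits comfortably in bigint (its range goes up to about 9.2 * 10^18), and 8.0 can change a column's type in place. A rough sketch of the conversion, using a made-up table and column name rather than the real schema:

-- hypothetical names; repeat for each NUMERIC(15,0) column
ALTER TABLE monthly_2005_06
    ALTER COLUMN amount TYPE bigint;   -- values are cast and the table is rewritten

The rewrite needs roughly the table's size in free disk space while it runs, so on a tight disk it may be simpler to reload the data into freshly declared bigint columns instead.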
On Sun, Jun 19, 2005 at 07:48:08AM -0500, Bruno Wolff III wrote:
> From discussions I have seen here, MySQL implements NUMERIC using a
> floating point type. Postgres stores it as a string of base-10000
> digits, roughly 2 bytes of storage per digit.

Hmmm... that got me wondering about something....

CREATE TABLE foo (n1 decimal(17, 0), n2 decimal(17, 0));
INSERT INTO foo VALUES (10000000000000000, 10000000000000001);
SELECT n1, n2, n1 + n2 FROM foo;

PostgreSQL 8.0.3:

        n1         |        n2         |     ?column?
-------------------+-------------------+-------------------
 10000000000000000 | 10000000000000001 | 20000000000000001

MySQL 4.1.12:

+-------------------+-------------------+-------------------+
| n1                | n2                | n1 + n2           |
+-------------------+-------------------+-------------------+
| 10000000000000000 | 10000000000000001 | 20000000000000000 |
+-------------------+-------------------+-------------------+

Wrong sum in MySQL. Nice. Oh wait, even better....

CREATE TABLE foo (n1 decimal(20, 0), n2 decimal(20, 0));
INSERT INTO foo VALUES (10000000000000000000, 10000000000000000001);
SELECT n1, n2, n1 + n2 FROM foo;

PostgreSQL 8.0.3:

          n1          |          n2          |       ?column?
----------------------+----------------------+----------------------
 10000000000000000000 | 10000000000000000001 | 20000000000000000001

MySQL 4.1.12:

+----------------------+----------------------+-----------------------+
| n1                   | n2                   | n1 + n2               |
+----------------------+----------------------+-----------------------+
| -8446744073709551616 | -8446744073709551615 | -16893488147419099136 |
+----------------------+----------------------+-----------------------+

Beauty. Nice of MySQL not to raise an error if I've exceeded some implementation-defined limit.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/
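For contrast, and as a sketch rather than a verbatim capture from exactly these versions, PostgreSQL refuses a value that does not fit the declared precision instead of silently wrapping:

INSERT INTO foo VALUES (100000000000000000000, 1);   -- 21 digits into decimal(20,0)
ERROR:  numeric field overflow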
On Sun, 19 Jun 2005, Michael Fuhr wrote:

> PostgreSQL 8.0.3:
>
>           n1          |          n2          |       ?column?
> ----------------------+----------------------+----------------------
>  10000000000000000000 | 10000000000000000001 | 20000000000000000001
>
> MySQL 4.1.12:
>
> +----------------------+----------------------+-----------------------+
> | n1                   | n2                   | n1 + n2               |
> +----------------------+----------------------+-----------------------+
> | -8446744073709551616 | -8446744073709551615 | -16893488147419099136 |
> +----------------------+----------------------+-----------------------+
>
> Beauty. Nice of MySQL not to raise an error if I've exceeded some
> implementation-defined limit.

To be fair, we didn't raise proper errors for integer math until fairly recently (as recently as 7.4, we had overflow issues on operators where the arguments were in range but the result was not).
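For anyone following along on a current release, a quick way to check the integer-overflow behaviour being described (the output shown is what I'd expect from 8.0, not a pasted session):

SELECT 2147483647::int4 + 1;
ERROR:  integer out of range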
rabt@dim.uchile.cl writes:
> I work with VERY large datasets: 60 monthly tables with 700,000 rows and
> 99 columns each, mostly large numeric values (15 digits, NUMERIC(15,0)
> datatypes, not all filled).
>
> The main problem is disk space. The database files stored in Postgres
> take 4 or 5 times more space than in MySQL. Just to be sure, after each
> bulk load I performed a VACUUM FULL to reclaim any possible lost space,
> but nothing gets reclaimed. My plain text dump files with INSERTs are
> just 150 MB in size, while the files in the Postgres directory are more
> than 1 GB each!

Hmm ... it's conventional wisdom that a PG data file will be larger than an equivalent text file, but you seem to have a much worse case than I'd expect. As Bruno noted, BIGINT sounds like a better choice than NUMERIC for your data, but even with NUMERIC I don't see where the bloat is coming from. A 15-digit NUMERIC value ought to occupy 16 bytes on disk (8 bytes of overhead + 4 base-10000 digits at 2 bytes each), so that ought to be just about one-for-one with the text representation. And with 99 of those per row, the 32-bytes-per-row row header overhead isn't what's killing you either.

Could we see the exact table declaration (including indexes) and a few rows of sample data?

			regards, tom lane
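If it saves a round trip, the usual way to capture that information from psql is something like the following (the table name here is a placeholder, not the real one):

\d f29.your_table                       -- columns, defaults, constraints, and indexes
SELECT * FROM f29.your_table LIMIT 3;   -- a few rows of sample data
VACUUM FULL VERBOSE f29.your_table;     -- its output also reports actual on-disk row widths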
rabt@dim.uchile.cl writes:
> Tom, this is my table definition including domains:

The problem appears to be that you have defined all the columns as NOT NULL, with defaults that are obviously NULL substitutes, e.g.

> CREATE DOMAIN f29.tipo_idcomuna
>   AS int4
>   DEFAULT 99999
>   NOT NULL
>   CONSTRAINT tipo_idcomuna_check CHECK ((VALUE <= 99999) AND (VALUE >= 0));

That means that every one of the 99 columns is actually present, with a value, in every row. According to VACUUM FULL VERBOSE, the actual on-disk widths of your three sample rows range from 816 to 828 bytes, which works out to an average column width of 8 or so bytes, which seems reasonable. Had you allowed the missing columns to go to NULL, there would be only a dozen or so actual values stored in each of these sample rows, so (allowing for an extra 12 bytes for the nulls bitmap) the stored width ought to be in the vicinity of 140-150 bytes. Which more than accounts for the bloat you are seeing.

In short: use NULL the way it was intended to be used, that is, to indicate missing values.

			regards, tom lane
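To make that concrete, one possible cleanup path is to drop the sentinel default and NOT NULL from the domain, turn the already-loaded placeholder values back into NULLs, and then reclaim the space. Only the domain name below comes from the actual schema; the table and column names are hypothetical:

ALTER DOMAIN f29.tipo_idcomuna DROP DEFAULT;
ALTER DOMAIN f29.tipo_idcomuna DROP NOT NULL;

-- hypothetical table/column names: convert the 99999 sentinels into real NULLs
UPDATE f29.your_table SET idcomuna = NULL WHERE idcomuna = 99999;

-- rewrite the table so the freed space actually goes back to the filesystem
VACUUM FULL f29.your_table;

The same pattern applies to each of the other sentinel-defaulted domains; once the filler values are stored as NULLs, the rows shrink to roughly the 140-150 bytes estimated above.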