Thread: Select count(*), the sequel
There was some doubt about the speed of doing select count(*) in PostgreSQL and Oracle.
To that end, I copied most of the Oracle table I used before to Postgres. Although the copy
wasn't complete, the resulting table is already significantly larger on disk than the table it was copied from. The result still shows that Oracle is significantly faster:
Oracle result:
SQL> alter system flush buffer_cache;
System altered.
SQL> select /*+ full(NO) noparallel */ count(*) from ni_occurrence no;
COUNT(*)
----------
402062638
Elapsed: 00:03:16.45
Hints are necessary because the Oracle table is declared as parallel and I didn't want the PK index to be used for counting. Oracle has a good habit of using PK indexes for counting, if one is available.
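For what it's worth, on the Postgres side there is no equivalent trick to hint away: 9.0 has no index-only scans, so an unqualified count(*) always has to visit the heap and the planner picks a sequential scan no matter what indexes exist. A quick sketch (the expected plan shape is shown as a comment, not output captured from this machine):

EXPLAIN SELECT count(*) FROM ni_occurrence;
-- expected plan shape in 9.0, regardless of the primary key index:
--   Aggregate
--     ->  Seq Scan on ni_occurrence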
SQL> select bytes/1048576 as MB
2 from user_segments
3 where segment_name='NI_OCCURRENCE';
MB
----------
35329
Elapsed: 00:00:00.85
SQL>
So, Oracle stores 402 million records in 35 GB and counts them in 3 minutes 16.45 seconds. The very same table was partially copied to Postgres; the copy died with ORA-01555 "snapshot too old" sometime this morning. I ran vacuumdb -f -z on the database after the copy completed, and the results are below.
mgogala=# select count(*) from ni_occurrence;
count
-----------
382400476
(1 row)
Time: 221716.466 ms
mgogala=#
mgogala=# select 221/60::real;
?column?
------------------
3.68333333333333
(1 row)
Time: 0.357 ms
mgogala=#
mgogala=# select pg_size_pretty(pg_table_size('ni_occurrence'));
pg_size_pretty
----------------
46 GB
(1 row)
Time: 0.420 ms
mgogala=#
The database wasn't restarted and no caches were flushed, so the comparison was done with a serious advantage for PostgreSQL. Postgres needed 3.68 minutes to complete the count, which is about the same as Oracle, but still somewhat slower. Also, I am worried about the sizes: the Postgres table is 11 GB larger than the original, despite holding fewer rows. And it was an unfair, unbalanced comparison, because Oracle's cache was flushed and Oracle was artificially restricted to a full table scan without the aid of parallelism. Here is the same query with no hints and autotrace on, which shows what happens when I turn the hints off:
SQL> select count(*) from ni_occurrence no;
COUNT(*)
----------
402062638
Elapsed: 00:00:52.61
Execution Plan
----------------------------------------------------------
Plan hash value: 53476935
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name                  | Rows | Cost (%CPU)| Time     |   TQ   |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                       |    1 | 54001  (19)| 00:01:08 |        |      |            |
|   1 |  SORT AGGREGATE           |                       |    1 |            |          |        |      |            |
|   2 |   PX COORDINATOR          |                       |      |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000              |    1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |                       |    1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR    |                       | 402M | 54001  (19)| 00:01:08 |  Q1,00 | PCWC |            |
|   6 |       INDEX FAST FULL SCAN| IDX_NI_OCCURRENCE_PID | 402M | 54001  (19)| 00:01:08 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------
It took just 52 seconds to count everything, but Oracle didn't even scan the table: it scanned a unique index, in parallel. That is the algorithmic advantage that forced me to restrict the execution plan with hints. My conclusion is that the speed of the Postgres full scan is OK, about the same as Oracle's. There are, however, three significant algorithmic advantages on Oracle's side (a cheap Postgres-side workaround for the first point is sketched after the list):
1) Oracle can use indexes to compute "select count(*)".
2) Oracle can use parallelism.
3) Oracle can use indexes in combination with the parallel processing.
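The usual Postgres workaround, when an exact number isn't required, is to read the planner's row estimate out of the catalog instead of counting. It is maintained by VACUUM/ANALYZE, so after the vacuumdb run above it should be reasonably close. A sketch:

-- approximate row count from the catalog (an estimate, not an exact count):
SELECT reltuples::bigint AS approximate_rows
FROM pg_class
WHERE relname = 'ni_occurrence';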
Here are the table descriptions, Oracle first and then Postgres:
SQL> desc ni_occurrence
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER(22)
 PERMANENT_ID                              NOT NULL VARCHAR2(12)
 CALL_LETTERS                              NOT NULL VARCHAR2(5)
 AIRDATE                                   NOT NULL DATE
 DURATION                                  NOT NULL NUMBER(4)
 PROGRAM_TITLE                                      VARCHAR2(360)
 COST                                               NUMBER(15)
 ASSETID                                            NUMBER(12)
 MARKET_ID                                          NUMBER
 GMT_TIME                                           DATE
 ORIG_ST_OCC_ID                                     NUMBER
 EPISODE                                            VARCHAR2(450)
 IMPRESSIONS                                        NUMBER
SQL>
mgogala=# \d ni_occurrence
Table "public.ni_occurrence"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 id             | bigint                      | not null
 permanent_id   | character varying(12)       | not null
 call_letters   | character varying(5)        | not null
 airdate        | timestamp without time zone | not null
 duration       | smallint                    | not null
 program_title  | character varying(360)      |
 cost           | bigint                      |
 assetid        | bigint                      |
 market_id      | bigint                      |
 gmt_time       | timestamp without time zone |
 orig_st_occ_id | bigint                      |
 episode        | character varying(450)      |
 impressions    | bigint                      |
Indexes:
    "ni_occurrence_pk" PRIMARY KEY, btree (id)
mgogala=#
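Coming back to the size difference: the 46 GB above is the heap plus its TOAST data only, since pg_table_size does not count indexes. A breakdown along these lines would show where the space goes:

-- heap (incl. TOAST), indexes, and the grand total for the Postgres table:
SELECT pg_size_pretty(pg_table_size('ni_occurrence'))          AS heap_and_toast,
       pg_size_pretty(pg_indexes_size('ni_occurrence'))        AS indexes,
       pg_size_pretty(pg_total_relation_size('ni_occurrence')) AS total;

Since the Oracle figure above is also just the table segment (the PK index is a separate segment), the heap_and_toast number is the apples-to-apples comparison; the rest shows how much the indexes add on top.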
The Oracle block size is 16k; the version is 10.2.0.5 RAC, 64-bit (is anybody still using 32-bit DB servers?). Postgres is 9.0.1, 64-bit. Both machines are running Red Hat 5.5:
[mgogala@lpo-postgres-d01 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
[mgogala@lpo-postgres-d01 ~]$
Linux lpo-postgres-d01 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[mgogala@lpo-postgres-d01 ~]$
--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
The Leader in integrated Media Intelligence Solutions
Hi,

Interesting data points. The number of rows that you managed to insert into PostgreSQL before Oracle gave up the ghost is 95% of the rows in the Oracle version of the database. To count 5% fewer rows, it took PostgreSQL 24 seconds longer; or, adjusting for the missing rows, 52 seconds longer for the entire table, i.e. 18% longer than the full table scan in Oracle. This seems to be well within the table layout size differences, possibly due to the fillfactor used -- not really bad at all.

Now, the timings due to algorithm differences are interesting as an indication of the room for improvement from that type of change. A parallel sequential full-table scan in PostgreSQL could provide the same speedup. Currently that is not possible... but development continues apace. In fact, developing such functionality in PostgreSQL could end up being less expensive in the long term than licensing Oracle RAC.

I think the point that you have helped make is that PostgreSQL performs very well for many use cases that have typically been relegated to expensive commercial databases such as Oracle, DB2, ...

Regards,
Ken

On Sat, Oct 16, 2010 at 12:53:50PM -0400, Mladen Gogala wrote:
> So, Oracle stores 402 million records in 35 GB and counts them in
> 3 minutes 16.45 seconds.
On Sat, Oct 16, 2010 at 2:44 PM, Kenneth Marshall <ktm@rice.edu> wrote:
> This seems to be well within the table layout size differences,
> possibly due to the fillfactor used -- not really bad at all.

I don't think this is due to fillfactor - the default fillfactor is 100, and anyway we ARE larger on disk than Oracle. We really need to do something about that, and the changes to NUMERIC in 9.1 are a step in that direction, but I think a lot more work is needed. I think it would be really helpful if we could try to quantify where the extra space is going. Some places to look:

- Bloated representations of individual datatypes. (I know that even the new NUMERIC format is larger than Oracle's NUMBER.)
- Excessive per-tuple overhead. Ours is 24 bytes, plus the item pointer.
- Alignment requirements. We have a fair number of datatypes that require 4 or 8 byte alignment. How much is that hurting us?
- Compression. Maybe Oracle's algorithm does better than PGLZ.

If we can quantify where we're losing vs. Oracle - or any other competitor - that might give us some idea where to start looking.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
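A rough way to put numbers on that, using the ni_occurrence table from earlier in the thread: compare the average data width per row gathered by ANALYZE with the actual on-disk bytes per row; the gap is roughly the per-tuple header, item pointer, alignment padding and page-level overhead. A sketch, assuming the table has been analyzed:

-- average declared-data bytes per row, from the statistics:
SELECT sum(avg_width) AS avg_data_bytes_per_row
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'ni_occurrence';

-- actual on-disk bytes per row (heap + TOAST divided by the row estimate):
SELECT pg_table_size('ni_occurrence')
       / (SELECT reltuples FROM pg_class WHERE relname = 'ni_occurrence')
       AS actual_bytes_per_row;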
Robert Haas <robertmhaas@gmail.com> writes:
> I don't think this is due to fillfactor - the default fillfactor is
> 100, and anyway we ARE larger on disk than Oracle. We really need to
> do something about that, and the changes to NUMERIC in 9.1 are a step
> in that direction, but I think a lot more work is needed.

Of course, the chances of doing anything more than extremely marginal kluges without breaking on-disk compatibility are pretty tiny. Given where we are at the moment, I see no appetite for forced dump-and-reloads for several years to come. So I don't foresee that anything is likely to come of such efforts in the near future, even if somebody had a great idea that would make things smaller without any other penalty, which I'm not sure I believe either.

			regards, tom lane
On Tue, Oct 26, 2010 at 6:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Of course, the chances of doing anything more than extremely marginal
> kluges without breaking on-disk compatibility are pretty tiny. Given
> where we are at the moment, I see no appetite for forced dump-and-reloads
> for several years to come. So I don't foresee that anything is likely
> to come of such efforts in the near future, even if somebody had a
> great idea that would make things smaller without any other penalty,
> which I'm not sure I believe either.

Let's try not to prejudge the outcome without doing the research.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> Even if somebody had a great idea that would make things smaller
> without any other penalty, which I'm not sure I believe either.

I'd say that the only things likely to bring an improvement significant enough to warrant the (quite large) hassle of implementation would be:

- read-only / archive tables (get rid of row header overhead)
- in-page compression using per-column delta storage, for instance (no random access penalty, but hard to implement; maybe easier for read-only tables)
- dumb LZO-style compression (license problems, needs a parallel decompressor, random access penalty, hard to implement too)
"Pierre C" <lists@peufeu.com> wrote: > in-page compression How would that be different from the in-page compression done by TOAST now? Or are you just talking about being able to make it more aggressive? -Kevin
On Wed, Oct 27, 2010 at 09:52:49PM +0200, Pierre C wrote:
> I'd say that the only things likely to bring an improvement significant
> enough to warrant the (quite large) hassle of implementation would be:
>
> - read-only / archive tables (get rid of row header overhead)
> - in-page compression using per-column delta storage, for instance (no
>   random access penalty, but hard to implement; maybe easier for
>   read-only tables)
> - dumb LZO-style compression (license problems, needs a parallel
>   decompressor, random access penalty, hard to implement too)

Different algorithms have been discussed before. A quick search turned up:

quicklz - GPL or commercial
fastlz  - MIT, works with BSD okay
zippy   - Google - no idea about the licensing
lzf     - BSD-type
lzo     - GPL or commercial
zlib    - current algorithm

Of these, lzf can compress at almost 3.7X the speed of zlib and decompress at 1.7X, and fastlz can compress at 3.1X the speed of zlib and decompress at 1.9X. The same comparison put lzo at 3.0X for compression and 1.8X for decompression. The block design of lzf/fastlz may be useful to support substring access to TOASTed data, among other ideas that have been floated here in the past.

Just keeping the hope alive for faster compression.

Cheers,
Ken
Kenneth Marshall, 27.10.2010 22:41:
> Different algorithms have been discussed before. A quick search turned up:
>
> quicklz - GPL or commercial
> fastlz  - MIT, works with BSD okay
> zippy   - Google - no idea about the licensing
> lzf     - BSD-type
> lzo     - GPL or commercial
> zlib    - current algorithm
>
> Just keeping the hope alive for faster compression.

What about dictionary-based compression (like DB2 does)?

In a nutshell: it creates a list of "words" in a page. For each word, its occurrences in the db block are stored, and the actual word is removed from the page/block itself. This covers all rows on a page and can give a very impressive overall compression. This compression is not done only on disk but in memory as well (the page is loaded with the dictionary into memory).

I believe Oracle 11 does something similar.

Regards,
Thomas
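To get a feel for how much a dictionary could buy on this kind of data, one can compare the raw bytes of a repetitive text column with the bytes of its distinct values. Using the program_title column of the table discussed earlier (an expensive, purely illustrative query over a 400M-row table; a per-page dictionary as in DB2 would of course see less repetition than this global figure):

-- upper bound on what a perfect global dictionary could save for one column:
SELECT pg_size_pretty(sum(octet_length(program_title))::bigint) AS raw_bytes,
       pg_size_pretty((SELECT sum(octet_length(t.program_title))::bigint
                       FROM (SELECT DISTINCT program_title
                             FROM ni_occurrence) t))             AS distinct_bytes
FROM ni_occurrence;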
Kenneth Marshall <ktm@rice.edu> writes:
> Just keeping the hope alive for faster compression.

Is there any evidence that that's something we should worry about? I can't recall ever having seen a code profile that shows the pg_lzcompress.c code high enough to look like a bottleneck compared to other query costs.

Now, the benefits of 2X or 3X space savings would be pretty obvious, but I've seen no evidence that we could easily achieve that either.

			regards, tom lane
> "Pierre C" <lists@peufeu.com> wrote: > >> in-page compression > How would that be different from the in-page compression done by > TOAST now? Or are you just talking about being able to make it > more aggressive? > -Kevin Well, I suppose lzo-style compression would be better used on data that is written a few times maximum and then mostly read (like a forum, data warehouse, etc). Then, good candidate pages for compression also probably have all tuples visible to all transactions, therefore all row headers would be identical and would compress very well. Of course this introduces a "small" problem for deletes and updates... Delta compression is : take all the values for a column inside a page, look at the values and their statistical distribution, notice for example that they're all INTs and the values on the page fit between X+n and X-n, store X and only encode n with as few bits as possible for each row. This is only an example, the idea is to exploit the fact that on the same page, all the values of one column often have lots in common. xid values in row headers are a good example of this. TOAST compresses datums, so it performs well on large datums ; this is the opposite, the idea is to compress small tuples by using the reduncancies between tuples.
On Wed, Oct 27, 2010 at 05:49:42PM -0400, Tom Lane wrote:
> Is there any evidence that that's something we should worry about?
> I can't recall ever having seen a code profile that shows the
> pg_lzcompress.c code high enough to look like a bottleneck compared
> to other query costs.
>
> Now, the benefits of 2X or 3X space savings would be pretty obvious,
> but I've seen no evidence that we could easily achieve that either.

One use is to allow substr() on TOASTed values without needing to decompress the entire contents. Another possibility is to keep larger fields compressed in memory, for some value of "larger". With faster compression, it might be useful to compress the WAL files, to support higher data rates and therefore higher update rates on the same hardware. And there are always the in-page common-substring storage optimizations to reduce index/table sizes.

Regards,
Ken