Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb - Mailing list pgsql-general

From Merlin Moncure
Subject Re: [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb
Date
Msg-id CAHyXU0zeziyYOMtgra3VH-P_m8NR+gAfW8LvQiM8cFxGE-BOOQ@mail.gmail.com
In response to [GENERAL] clustered index benchmark comparing Postgresql vs Mariadb  (유상지<y0212@naver.com>)
List pgsql-general
On Wed, Aug 30, 2017 at 9:03 PM, 유상지 <y0212@naver.com> wrote:

I want to get help with Postgresql.

My research suggested that PostgreSQL could be rather fast in an environment using a secondary index, but the benchmark produced different results.

The database I compared against was MariaDB, and the benchmark tool was sysbench 1.0.8 with the PostgreSQL driver.

Server environment: VMware, Ubuntu 17.04, 4 processors, 4 GB RAM, 40 GB hard disk, MariaDB (v10.3), PostgreSQL (v9.6.4)

MySQL and other systems (for example SQL Server) use a technique where the table is automatically clustered around an index, generally the primary key.  This technique has some tradeoffs: the main upside is that lookups on the pkey are somewhat faster, whereas lookups on any other index are somewhat slower and insertions can be slower in certain cases (especially if GUIDs are the pkey).  I would call this technique 'index organized table'.  It exploits the persistent organization so that the index is implied and does not have to be kept separate from the heap (the main table data storage).
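To make the tradeoff concrete, here is a toy Python sketch of an index organized table. It is not how any real engine is implemented; the class and method names are hypothetical, and it assumes the secondary index stores the primary key value rather than a physical row pointer, which is one common way such systems end up doing a second search for non-pkey lookups.

```python
import bisect


class IndexOrganizedTable:
    """Toy model: rows are stored in primary-key order, so the table
    itself plays the role of the pkey index (illustrative only)."""

    def __init__(self, rows):
        # rows: iterable of (pkey, email) tuples
        self.rows = sorted(rows)
        # Helper list so bisect can search on the key alone.
        self.pkeys = [pk for pk, _ in self.rows]
        # Assumption: the secondary index maps email -> pkey,
        # not email -> physical row location.
        self.by_email = sorted((email, pk) for pk, email in self.rows)
        self.emails = [email for email, _ in self.by_email]
        self.searches = 0  # number of ordered-structure searches performed

    def _search(self, sorted_keys, key):
        self.searches += 1
        i = bisect.bisect_left(sorted_keys, key)
        if i < len(sorted_keys) and sorted_keys[i] == key:
            return i
        return None

    def get_by_pkey(self, pk):
        # One search lands directly on the row.
        i = self._search(self.pkeys, pk)
        return None if i is None else self.rows[i]

    def get_by_email(self, email):
        # First search finds the pkey, second search finds the row.
        i = self._search(self.emails, email)
        if i is None:
            return None
        return self.get_by_pkey(self.by_email[i][1])


table = IndexOrganizedTable(
    [(3, "c@example.com"), (1, "a@example.com"), (2, "b@example.com")]
)
row_by_pkey = table.get_by_pkey(2)
searches_for_pkey = table.searches

table.searches = 0
row_by_email = table.get_by_email("c@example.com")
searches_for_email = table.searches
```

In this model a pkey lookup costs one search and a secondary lookup costs two, which is the shape of the "faster on the pkey, slower on everything else" tradeoff.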

The postgres 'cluster' command is currently a one-time pass over the table that organizes it physically in index order, but it does not maintain the table in that order, nor does it exploit the ordering to eliminate the primary key index in the manner that other systems do.   From a postgres point of view, the main advantage is that scans (not single record lookups) over the key will be sequential physical reads and will tend to read fewer physical pages, since records that are logically adjacent by key will also be adjacent physically.
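A rough way to see the page-count effect is to count how many distinct pages a key range touches under two physical layouts. The Python sketch below is a simplified model, not a measurement of postgres: the 100 rows per page figure, the table size, and the function name are all assumptions made for illustration.

```python
import random

ROWS_PER_PAGE = 100  # assumed for illustration; real pages vary with row width


def pages_touched(physical_order, lo, hi, rows_per_page=ROWS_PER_PAGE):
    """Count distinct pages that hold at least one key in [lo, hi]."""
    pages = set()
    for slot, key in enumerate(physical_order):
        if lo <= key <= hi:
            pages.add(slot // rows_per_page)
    return len(pages)


keys = list(range(10_000))

# Physical order matches key order, as it would right after a cluster pass.
clustered = list(keys)

# Physical order unrelated to key order (fixed seed for repeatability).
scattered = list(keys)
random.Random(0).shuffle(scattered)

clustered_pages = pages_touched(clustered, 1000, 1999)
scattered_pages = pages_touched(scattered, 1000, 1999)
print(clustered_pages, scattered_pages)
```

With the clustered layout the 1000 matching rows occupy 10 consecutive pages, while in the shuffled layout they spread across most of the table's 100 pages. Since the ordering is not maintained, a real table drifts from the first case toward the second as rows are inserted and updated.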

For my part, I generally prefer the postgres style of organization for most workloads, particularly for the surrogate key pattern. I would definitely like to have the option of the index organized style, however.  It's definitely possible to tease out the tradeoffs in synthetic benchmarking, but in the gross aggregate I suspect (but obviously can't prove) the technique is a loser, since as database models mature, the ways tables are indexed, looked up, and joined tend to proliferate.

merlin
 
