[HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory - Mailing list pgsql-hackers
From | Yoshimi Ichiyanagi |
---|---|
Subject | [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory |
Msg-id | C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp |
List | pgsql-hackers |
Hi.

These patches enable the use of the Persistent Memory Development Kit (PMDK)[1] for reading and writing WAL on persistent memory (PMEM).
PMEM is next-generation storage with a number of nice features: it is fast, byte-addressable and non-volatile.

Using pgbench, the general PostgreSQL benchmark, the postgres server with the patches applied is about 5% faster than the original server.
Using my insert benchmark, it is up to 90% faster than the original one.
I describe the details below.

This e-mail describes the following:
A) About PMDK
B) About the patches
C) How to run the benchmarks with the patches, and the results

A) About PMDK
PMDK provides functions that let an application access PMEM directly as memory, without going through the kernel, for high-speed access.
The following APIs are available through PMDK.
A-1. APIs to open a file on PMEM, to create a file on PMEM, and to map a file on PMEM to virtual addresses
A-2. APIs to read/write data from and to a file on PMEM

A-1. APIs to open a file on PMEM, to create a file on PMEM, and to map a file on PMEM to virtual addresses

PMDK provides these APIs using the DAX filesystem (DAX FS)[2] feature.

DAX FS is a PMEM-aware file system that allows direct access to PMEM without using the kernel page cache. A file in DAX FS can be mapped to memory using standard calls such as mmap() on Linux. Furthermore, once the file on PMEM is mapped to virtual addresses (and after any initial minor page faults needed to create the mappings in the MMU), applications can access PMEM using CPU load/store instructions instead of read/write system calls.

A-2. APIs to read/write data from and to a file on PMEM

PMDK provides memcpy()-like APIs that copy data to PMEM using single instruction, multiple data (SIMD) instructions[3] and NT store instructions[4]. These instructions improve the performance of copying data to PMEM. As a result, using these APIs is faster than using read/write system calls.

[1] http://pmem.io/pmdk/
[2] https://www.usenix.org/system/files/login/articles/login_summer17_07_rudoff.pdf
[3] SIMD: SIMD instructions operate on all loaded data in a single operation. If a SIMD unit loads eight data items into registers at once, the store to PMEM happens for all eight values at the same time.
[4] NT store instructions: NT store instructions bypass the CPU cache, so using these instructions does not require a flush.

B) About the patches
Changes by the patches:
0001-Add-configure-option-for-PMDK.patch:
- Added "--with-libpmem" configure option to execute I/O with the PMDK library

0002-Read-write-WAL-files-using-PMDK.patch:
- Added PMDK implementation for WAL I/O operations
- Added "pmem_drain" to the wal_sync_method parameter list to write logs synchronously on PMEM

0003-Walreceiver-WAL-IO-using-PMDK.patch:
- Added PMDK implementation for the walreceiver of secondary server processes

C) How to run the benchmarks with the patches, and the results
Experimental setup
Server: HP ProLiant DL360 Gen9
CPU: Xeon E5-2667 v4 (3.20GHz); 2 processors (without HT)
DRAM: DDR4-2400; 32 GiB/processor
      (8GiB/socket x 4 sockets/processor) x 2 processors
NVDIMM: DDR4-2133; 32 GiB/processor
      (8GiB/socket x 4 sockets/processor) x 2 processors
HDD: Seagate Constellation2 2.5inch SATA 3.0. 6Gb/s 1TB 7200rpm x 1
OS: Ubuntu 16.04, linux-4.12
DAX FS: ext4
NVML: master@Aug 30, 2017
PostgreSQL: master
Note: I bound the postgres processes to one NUMA node, and the benchmarks to the other NUMA node.

C-1. Configuring PMEM for use as a block device
# ndctl list
# ndctl create-namespace -f -e namespace0.0 --mode=memory -M dev

C-2. Creating a file system on PMEM, and mounting it with DAX
# mkfs.ext4 /dev/pmem0
# mount -t ext4 -o dax /dev/pmem0 /mnt/pmem0

C-3. Setting PMEM_IS_PMEM_FORCE, which determines whether the WAL files are treated as stored on PMEM
Note: If this environment variable is not set, the postgres processes will not start.
# export PMEM_IS_PMEM_FORCE=1

C-4. Installing PostgreSQL
Note: There are 3 important points when installing PostgreSQL.
a. Executing "./configure --with-libpmem" to link libpmem
b. Placing the WAL directory on PMEM
c. Changing the wal_sync_method parameter in postgresql.conf from fdatasync to pmem_drain

# cd /path/to/[PG_source dir]
# ./configure --with-libpmem
# make && make install
# initdb /path/to/PG_DATA -X /mnt/pmem0/path/to/[PG_WAL dir]
# cat /path/to/PG_DATA/postgresql.conf | sed -e s/#wal_sync_method\ =\ fsync/wal_sync_method\ =\ pmem_drain/ > /path/to/PG_DATA/postgresql.conf.tmp
# mv /path/to/PG_DATA/postgresql.conf.tmp /path/to/PG_DATA/postgresql.conf
# pg_ctl start -D /path/to/PG_DATA
# createdb [DB_NAME]

C-5. Running the 2 benchmarks (1. pgbench, 2. my insert benchmark)
C-5-1. pgbench
# numactl -N 1 pgbench -c 32 -j 8 -T 120 -M prepared [DB_NAME]

The averages over three pgbench runs are:
wal_sync_method=fdatasync:   tps = 43,179
wal_sync_method=pmem_drain:  tps = 45,254

C-5-2. pclient_thread: my insert benchmark
Preparation
CREATE TABLE [TABLE_NAME] (id int8, value text);
ALTER TABLE [TABLE_NAME] ALTER value SET STORAGE external;
PREPARE insert_sql (int8) AS INSERT INTO %s (id, value) values ($1, ' [1K_data]');

Execution
BEGIN; EXECUTE insert_sql(%lld); COMMIT;
Note: I ran this query 5M times with 32 threads.

# ./pclient_thread
Invalid Arguments:
Usage: ./pclient_thread [The number of threads] [The number to insert tuples] [data size(KB)]
# numactl -N 1 ./pclient_thread 32 5242880 1

The averages over three runs of this benchmark are:
wal_sync_method=fdatasync:   tps =  67,780
wal_sync_method=pmem_drain:  tps = 131,962

--
Yoshimi Ichiyanagi
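
As an illustration of the access pattern described in A-1, here is a minimal sketch of writing a record through a memory mapping instead of write(). It is not taken from the patches and does not use libpmem: it relies only on POSIX mmap(), memcpy() and msync(), so it also runs on an ordinary file system, where msync() stands in for the flush a PMEM-aware library would perform. The function names wal_sketch_write and wal_sketch_read, and the idea of a fixed-size "segment" file, are assumptions made for the example. With libpmem, the roughly corresponding calls would be pmem_map_file() and pmem_memcpy_persist().

```c
/* Sketch only: plain POSIX mmap(), not libpmem. All names are hypothetical. */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) a file of seg_size bytes, map it, copy rec_len bytes
 * of rec to the given offset with ordinary CPU stores, then flush the mapping.
 * Returns 0 on success, -1 on failure.
 */
int
wal_sketch_write(const char *path, size_t seg_size, size_t offset,
                 const void *rec, size_t rec_len)
{
    int   fd;
    int   rc;
    char *addr;

    assert(path != NULL && rec != NULL);
    /* Reject writes that would run past the end of the segment. */
    if (seg_size == 0 || offset > seg_size || rec_len > seg_size - offset)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t) seg_size) != 0)
    {
        close(fd);
        return -1;
    }

    addr = mmap(NULL, seg_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (addr == MAP_FAILED)
        return -1;

    memcpy(addr + offset, rec, rec_len);    /* stores, no write() call */

    /* On a non-PMEM file system, msync() is what makes the data durable. */
    rc = msync(addr, seg_size, MS_SYNC);
    munmap(addr, seg_size);
    return rc == 0 ? 0 : -1;
}

/*
 * Map an existing file read-only and copy len bytes at offset into buf
 * using ordinary CPU loads. Returns 0 on success, -1 on failure.
 */
int
wal_sketch_read(const char *path, size_t offset, void *buf, size_t len)
{
    int         fd;
    char       *addr;
    size_t      size;
    struct stat st;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
    {
        close(fd);
        return -1;
    }
    size = (size_t) st.st_size;
    if (offset > size || len > size - offset)
    {
        close(fd);
        return -1;
    }

    addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return -1;

    memcpy(buf, addr + offset, len);        /* loads, no read() call */
    munmap(addr, size);
    return 0;
}
```

A caller would, for example, write a 5-byte record at offset 128 of a 4096-byte file with wal_sketch_write(path, 4096, 128, "hello", 5) and read it back with wal_sketch_read(path, 128, buf, 5). On a DAX mount the same stores reach PMEM without passing through the page cache, which is the effect the patches aim for.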