Thread: full_page_writes on SSD?

full_page_writes on SSD?

From
Marcin Mańk
Date:
I saw this: http://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages

It made me wonder: if SSDs have 4kB/8kB sectors, and we'd make the Postgres page size equal to the SSD page size, do we still need full_page_writes?

Regards
Marcin Mańk

Re: full_page_writes on SSD?

From
Kevin Grittner
Date:
On Tue, Nov 24, 2015 at 12:48 PM, Marcin Mańk <marcin.mank@gmail.com> wrote:

> if SSDs have 4kB/8kB sectors, and we'd make the Postgres page
> size equal to the SSD page size, do we still need full_page_writes?

If an OS write of the PostgreSQL page size has no chance of being
partially persisted (a/k/a torn), I don't think full page writes
are needed.  That seems likely to be true if pg page size matches
SSD sector size.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: full_page_writes on SSD?

From
Andres Freund
Date:
On 2015-11-24 13:09:58 -0600, Kevin Grittner wrote:
> On Tue, Nov 24, 2015 at 12:48 PM, Marcin Mańk <marcin.mank@gmail.com> wrote:
>
> > if SSDs have 4kB/8kB sectors, and we'd make the Postgres page
> > size equal to the SSD page size, do we still need full_page_writes?
>
> If an OS write of the PostgreSQL page size has no chance of being
> partially persisted (a/k/a torn), I don't think full page writes
> are needed.  That seems likely to be true if pg page size matches
> SSD sector size.

At the very least it also needs to match the page size used by the OS
(4KB on x86).

But be generally wary of turning off FPWs if you use replication. Not
having them often turns an asynchronously batched write workload into one
containing a lot of synchronous, single-threaded reads. Even with SSDs,
that can very quickly lead to not being able to keep up with replay
anymore.


Re: full_page_writes on SSD?

From
John R Pierce
Date:
On 11/24/2015 10:48 AM, Marcin Mańk wrote:
> I saw this:
> http://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages
>
> It made me wonder: if SSDs have 4kB/8kB sectors, and we'd make the
> Postgres page size equal to the SSD page size, do we still need
> full_page_writes?


An SSD's actual write block is much, much larger than that. They
emulate 512-byte or 4k sectors, but they are not actually written in sector
order; rather, new writes are accumulated in a buffer on the drive, then
written out to a whole block, and a sector mapping table is maintained
by the drive.

--
john r pierce, recycling bits in santa cruz



Re: full_page_writes on SSD?

From
Tomas Vondra
Date:
On 11/24/2015 08:14 PM, Andres Freund wrote:
> On 2015-11-24 13:09:58 -0600, Kevin Grittner wrote:
>> On Tue, Nov 24, 2015 at 12:48 PM, Marcin Mańk <marcin.mank@gmail.com> wrote:
>>
>>> if SSDs have 4kB/8kB sectors, and we'd make the Postgres page
>>> size equal to the SSD page size, do we still need
>>> full_page_writes?
>>
>> If an OS write of the PostgreSQL page size has no chance of being
>> partially persisted (a/k/a torn), I don't think full page writes
>> are needed. That seems likely to be true if pg page size matches
>> SSD sector size.
>
> At the very least it also needs to match the page size used by the
> OS (4KB on x86).

Right. I find this possibility (when the OS and SSD page sizes match)
interesting, exactly because it might make the storage resilient to torn
pages.

>
> But be generally wary of turning off FPWs if you use replication.
> Not having them often turns an asynchronously batched write workload
> into one containing a lot of synchronous, single-threaded reads.
> Even with SSDs that can very quickly lead to not being able to keep
> up with replay anymore.
>

I don't immediately see why that would happen? Can you elaborate?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: full_page_writes on SSD?

From
Tomas Vondra
Date:
On 11/24/2015 08:40 PM, John R Pierce wrote:
> On 11/24/2015 10:48 AM, Marcin Mańk wrote:
>> I saw this:
>> http://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages
>>
>> It made me wonder: if SSDs have 4kB/8kB sectors, and we'd make the
>> Postgres page size equal to the SSD page size, do we still need
>> full_page_writes?
>
>
> an SSD's actual write block is much much larger than that. they
> emulate 512 or 4k sectors, but they are not actually written in
> sector order, rather new writes are accumulated in a buffer on the
> drive, then written out to a whole block, and a sector mapping table
> is maintained by the drive.

I don't see how that's related to full_page_writes?

It's true that SSDs optimize the writes in various ways, generally along
the lines you described, because they do work with "erase blocks"
(generally 256kB - 1MB chunks) and such.

But the internal structure of an SSD has very little to do with FPW, because
what matters is whether the on-drive write cache is volatile or not (an SSD
can't really work without it).

What matters (when it comes to resiliency to torn pages) is the page
size at the OS level, because that's what's being handed over to the SSD.
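
As a rough illustration of the size relationship discussed here, the Python sketch below compares a database block size with the OS page size. The helper name `may_tear` and the two block sizes tried are assumptions made for illustration. It models only the size argument from this thread, not what a particular kernel, filesystem, or drive actually guarantees.

```python
import mmap

def may_tear(db_block_size: int, os_page_size: int) -> bool:
    """Return True if one database block spans more than one OS page.

    Simplified model: a block that fits in a single OS page is handed
    to the drive as one unit; a larger block is split across several
    pages and could in principle be only partially persisted.
    """
    return db_block_size > os_page_size

os_page = mmap.PAGESIZE  # page size reported by the running OS
for block in (4096, 8192):  # hypothetical database block sizes
    print(block, os_page, may_tear(block, os_page))
```

Even when this returns False, the drive's write cache behaviour still has to be considered separately.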

Of course, there might be other benefits of further lowering page sizes
at the OS/database level (and AFAIK there are SSD drives that use pages
smaller than 4kB).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: full_page_writes on SSD?

From
NTPT
Date:
Hi,


I investigated a bit how SSDs work and how they need to be aligned.

I concluded that, in an ideal world, we would have a general --ebs=xxx switch in the various Linux tools to ensure alignment. Otherwise, the calculation has to be done by hand.

There are SSDs on the market with a page size of 4 or 8 kB. But SSDs also have another typical property: the EBS (Erase Block Size). If the disk writes to a single sector, the whole erase block must be read by the drive electronics, modified, and written back to the drive.

Devices on the market come with various EBS sizes: 128, 256, 512, 1024, 1536, 2048 KiB, etc.
In my case, a Samsung 850 EVO, there are 8 kB pages and a 1536 KiB erase block.

So the first alignment problem: a partition should start on an erase block boundary. An --ebs switch in the partitioning tools would be practical for proper alignment. Otherwise, calculate by hand: in my case 1536 KiB = 3072 512-byte sectors.
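
The by-hand calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the 512-byte sector size and 1536 KiB erase block are the figures quoted in this message, not values read from a drive.

```python
SECTOR_BYTES = 512  # logical sector size assumed in the message

def erase_block_sectors(ebs_kib: int) -> int:
    """Number of 512-byte sectors in one erase block of ebs_kib KiB."""
    return ebs_kib * 1024 // SECTOR_BYTES

def next_aligned_sector(start_sector: int, ebs_kib: int) -> int:
    """Smallest sector >= start_sector that lies on an erase block boundary."""
    step = erase_block_sectors(ebs_kib)
    return -(-start_sector // step) * step  # ceiling division, then scale

print(erase_block_sectors(1536))        # 3072
print(next_aligned_sector(2048, 1536))  # 3072
```

A partition that a tool would place at sector 2048 would, under these assumptions, be moved to sector 3072 to sit on an erase block boundary.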

Things get complicated if you use mdadm RAID. Because the RAID superblock is located at the beginning of the RAID device and does not fill a whole erase block, it is practical to set --offset when creating the RAID so that the real filesystem starts at the next erase block from the beginning of the RAID device, and the underlying filesystem is aligned as well. So --ebs=xxx in mdadm would be practical.

And now ext4, with a block size of 4096. Because the SSD page size is 8 kB, it is practical to set the stride, which is the smallest unit on which the filesystem operates on one disk, to 2 so that it fills an SSD page. The stripe size can be set to EBS/page size or to the whole EBS. It may also be useful to use an ext4 --offset to the EBS as well.
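
The numbers in this paragraph can be derived as follows. This Python sketch only reproduces the arithmetic under the figures given in the message (4096-byte filesystem blocks, 8 kB SSD pages, 1536 KiB erase block); the function name and the returned keys are hypothetical and do not correspond to any tool's option names.

```python
def ext4_layout(fs_block_bytes: int, ssd_page_bytes: int, ebs_bytes: int) -> dict:
    """Derive the stride and the two stripe candidates named in the message."""
    if ssd_page_bytes % fs_block_bytes or ebs_bytes % ssd_page_bytes:
        raise ValueError("sizes must be whole multiples of each other")
    return {
        # filesystem blocks needed to fill one SSD page
        "stride_blocks": ssd_page_bytes // fs_block_bytes,
        # "EBS / page size": SSD pages per erase block
        "pages_per_erase_block": ebs_bytes // ssd_page_bytes,
        # "whole EBS", expressed in filesystem blocks
        "blocks_per_erase_block": ebs_bytes // fs_block_bytes,
    }

print(ext4_layout(4096, 8192, 1536 * 1024))
```

With the figures above this gives a stride of 2, 192 pages per erase block, and 384 filesystem blocks per erase block.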

This should align the partition, RAID, and filesystem. Correct me if I am wrong.

Now it is the database storage engine's turn. I think that trying to write on erase block boundaries, in erase-block-sized amounts of data, may have some benefit, not in speed but in lower wear-out of the whole SSD.

---------- Original message ----------
From: Marcin Mańk <marcin.mank@gmail.com>
To: PostgreSQL <pgsql-general@postgresql.org>
Date: 24. 11. 2015 20:07:30
Subject: [GENERAL] full_page_writes on SSD?

I saw this: http://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages

It made me wonder: if SSDs have 4kB/8kB sectors, and we'd make the Postgres page size equal to the SSD page size, do we still need full_page_writes?

Regards
Marcin Mańk

Re: full_page_writes on SSD?

From
FarjadFarid(ChkNet)
Date:

I am constantly using SSDs for both my OS and my database and have none of these problems.

However, I don't use an SSD for the OS's virtual memory.

From what I have read of this thread, there could potentially also be a situation where the SSD is hitting its limit of auto-recovery and has been overused. It is well known that using SSDs for the OS's virtual memory causes them to wear out much quicker.

To really test all of this, one needs to use a brand-new SSD. Also ensure you are not using the OS's virtual memory on the same SSD as the DB and its log file.

You might also want to double-check the language of the OS and of the installed PostgreSQL, as these determine the final size of memory used to read and write.

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of NTPT
Sent: 25 November 2015 12:10
To: Marcin Mańk
Cc: PostgreSQL
Subject: Re: [GENERAL] full_page_writes on SSD?

 


Re: full_page_writes on SSD?

From
Jim Nasby
Date:
On 11/25/15 5:38 AM, Tomas Vondra wrote:
>> But be generally wary of turning off FPWs if you use replication.
>> Not having them often turns an asynchronously batched write workload
>> into one containing a lot of synchronous, single-threaded reads.
>> Even with SSDs that can very quickly lead to not being able to keep
>> up with replay anymore.
>>
>
> I don't immediately see why that would happen? Can you elaborate?

If there's no FPI records in WAL then recovery/replay has to read the
blocks from disk before it can apply the real WAL record.

Way back in the day, recovery would always do this... someone had the
bright idea around 8.0 to make use of the FPIs if they're present. IIRC
that resulted in order of magnitude improvements of recovery time in
many cases.
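
A toy model may make the difference concrete. The Python sketch below is not PostgreSQL's replay code; `WalRecord` and `count_replay_reads` are hypothetical names, and the model assumes that a record carrying a full-page image can be applied without touching the old block, while any other record needs the block in memory first.

```python
from dataclasses import dataclass

@dataclass
class WalRecord:
    block: int             # block number the record modifies
    full_page_image: bool  # True if the record carries the whole page

def count_replay_reads(records, cached=()):
    """Count blocks that replay must read from disk before applying a record."""
    in_memory = set(cached)
    reads = 0
    for rec in records:
        if rec.block not in in_memory and not rec.full_page_image:
            reads += 1  # old block contents must be fetched first
        in_memory.add(rec.block)  # after applying, the block is in memory
    return reads

with_fpi = [WalRecord(1, True), WalRecord(1, False), WalRecord(2, True)]
without_fpi = [WalRecord(1, False), WalRecord(1, False), WalRecord(2, False)]
print(count_replay_reads(with_fpi), count_replay_reads(without_fpi))  # 0 2
```

In this model every first touch of a block without a full-page image costs one read, and a single replay process issues those reads one after another.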

For SR, the effect might not be as large, if the slave is actively being
used, and if the queries hitting the slave tend to be grabbing the same
data that's being written on the master. In many environments I expect
that to be the case. But if it's not it wouldn't surprise me if it
became very easy to lag a slave as replay constantly waited for blocks
to come in.

If running with full_page_writes turned off becomes remotely common it'd
probably be worth finding a way to pre-issue read requests to the OS,
similar to what we do in some cases if effective_io_concurrency > 1.
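
One way such pre-issued read requests could look at the OS level is sketched below in Python, using `os.posix_fadvise` with `POSIX_FADV_WILLNEED`. This is an assumption about the general approach, not a description of anything PostgreSQL does; `prefetch_blocks` and the 8192-byte block size are hypothetical, and the call is only available on Unix-like platforms.

```python
import os

BLOCK_SIZE = 8192  # assumed database block size

def prefetch_blocks(fd: int, block_numbers, block_size: int = BLOCK_SIZE) -> int:
    """Hint the OS to start reading the given blocks; return hints issued."""
    if not hasattr(os, "posix_fadvise"):
        return 0  # platform without posix_fadvise: do nothing
    issued = 0
    for blkno in sorted(set(block_numbers)):
        # Advisory only: the kernel may begin reading this range in the
        # background so that a later read does not have to wait.
        os.posix_fadvise(fd, blkno * block_size, block_size,
                         os.POSIX_FADV_WILLNEED)
        issued += 1
    return issued
```

A replay loop could call this for the blocks referenced by upcoming WAL records before applying them, so the reads overlap instead of happening one at a time.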
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com