Thread: DELETE vs TRUNCATE explanation
Hi

After seeing a few discussions here and on Stack Overflow I've put together a quick explanation of why "DELETE FROM table;" may be faster than "TRUNCATE table" for people doing unit testing on lots of tiny tables, people who're doing this so often they care how long it takes.

I'd love it if a few folks who know the guts were to take a look and verify its correctness: http://stackoverflow.com/a/11423886/398670

--
Craig Ringer
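For readers following along, the two reset strategies being compared can be made concrete. This is only an illustrative sketch, not from the thread; the helper and table names are invented, and it just shows the SQL a test harness's teardown might emit in each mode:

```python
def reset_sql(tables, use_truncate=False):
    """Build the SQL a test suite might run to empty its tables between tests."""
    if use_truncate:
        # One statement; constant-time on big tables, but it pays fixed
        # per-relation costs that dominate when the tables are tiny.
        return "TRUNCATE TABLE %s RESTART IDENTITY CASCADE;" % ", ".join(tables)
    # Per-table DELETE: rows are merely marked dead (VACUUM cleans them up
    # later), which is cheap when only a handful of rows exist.
    return "\n".join("DELETE FROM %s;" % t for t in tables)

print(reset_sql(["users", "orders"]))
print(reset_sql(["users", "orders"], use_truncate=True))
```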
On Tue, Jul 10, 2012 at 5:37 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> Hi
>
> After seeing a few discussions here and on Stack Overflow I've put together
> a quick explanation of why "DELETE FROM table;" may be faster than "TRUNCATE
> table" for people doing unit testing on lots of tiny tables, people who're
> doing this so often they care how long it takes.
>
> I'd love it if a few folks who know the guts were to take a look and verify
> its correctness:

I haven't said this before, but think it every time someone asks me about this, so I'll say it now:

This is a papercut that should be solved with improved mechanics. TRUNCATE should simply be very nearly the fastest way to remove data from a table while retaining its type information, and if that means doing DELETE without triggers when the table is small, then it should. The only person who could thwart me is someone who badly wants their 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB of catalog anyway.

Does that sound reasonable? As in, would anyone object if TRUNCATE learned this behavior?

--
fdr
Daniel Farina <daniel@heroku.com> writes:
> TRUNCATE should simply be very nearly the fastest way to remove data
> from a table while retaining its type information, and if that means
> doing DELETE without triggers when the table is small, then it should.
> The only person who could thwart me is someone who badly wants their
> 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
> of catalog anyway.

> Does that sound reasonable? As in, would anyone object if TRUNCATE
> learned this behavior?

Yes, I will push back on that.

(1) We don't need the extra complexity.

(2) I don't believe that you know where the performance crossover point would be (according to what metric, anyway?).

(3) The performance of the truncation itself should not be viewed in isolation; subsequent behavior also needs to be considered. An example of possible degradation is that index bloat would no longer be guaranteed to be cleaned up over a series of repeated truncations. (You might argue that if the table is small then the indexes couldn't be very bloated, but I don't think that holds up over a long series.)

IOW, I think it's fine as-is. I'd certainly wish to see many more than one complainant before we expend effort in this area.

			regards, tom lane
On Wed, Jul 11, 2012 at 10:05:48AM -0400, Tom Lane wrote:
> Daniel Farina <daniel@heroku.com> writes:
> > TRUNCATE should simply be very nearly the fastest way to remove data
> > from a table while retaining its type information, and if that means
> > doing DELETE without triggers when the table is small, then it should.
> > The only person who could thwart me is someone who badly wants their
> > 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
> > of catalog anyway.
>
> > Does that sound reasonable? As in, would anyone object if TRUNCATE
> > learned this behavior?
>
> Yes, I will push back on that.
>
> (1) We don't need the extra complexity.
>
> (2) I don't believe that you know where the performance crossover point
> would be (according to what metric, anyway?).
>
> (3) The performance of the truncation itself should not be viewed in
> isolation; subsequent behavior also needs to be considered. An example
> of possible degradation is that index bloat would no longer be
> guaranteed to be cleaned up over a series of repeated truncations.
> (You might argue that if the table is small then the indexes couldn't
> be very bloated, but I don't think that holds up over a long series.)
>
> IOW, I think it's fine as-is. I'd certainly wish to see many more
> than one complainant before we expend effort in this area.

+1

TRUNCATE needs to keep the same properties independent of the size of the table. Smearing it into a DELETE would not be good at all. If there are optimizations that can be done to keep its current behavior, those might be possible, but the complexity may not be worthwhile for a relative corner case.

Regards,
Ken
Tom Lane wrote:
> (3) The performance of the truncation itself should not be viewed in
> isolation; subsequent behavior also needs to be considered. An example
> of possible degradation is that index bloat would no longer be
> guaranteed to be cleaned up over a series of repeated truncations.
> (You might argue that if the table is small then the indexes couldn't
> be very bloated, but I don't think that holds up over a long series.)
>
> IOW, I think it's fine as-is. I'd certainly wish to see many more
> than one complainant before we expend effort in this area.

I think a documentation change would be worthwhile. At the moment the TRUNCATE page says, with no caveats, that it is faster than unqualified DELETE. It surprised me to find that this wasn't true (with 7.2, again with small tables in a testsuite), and evidently it's still surprising people today.

-M-
On Wed, Jul 11, 2012 at 7:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
It strikes me as a contrived case rather than a use case. What sort of app repeatedly fills and truncates a small table thousands of times ... other than a test app to see whether you can do it or not?
The main point of truncate is to provide a more efficient mechanism to delete all data from large tables. If your app developers don't know within a couple orders of magnitude how much data your tables hold, and can't figure out whether to use delete or truncate, I can't find much sympathy in my heart.
Craig
On 07/11/2012 03:18 PM, Craig James wrote:
> It strikes me as a contrived case rather than a use case. What sort of
> app repeatedly fills and truncates a small table thousands of times ...
> other than a test app to see whether you can do it or not?

Test systems. Any company with even a medium-size QA environment will have continuous integration systems that run unit tests on a trash database hundreds or thousands of times through the day. Aside from dropping/creating the database via template, which would be *really* slow, truncate is the easiest/fastest way to reset between tests.

If TRUNCATE suddenly started defaulting to DELETE on small table-sets and several iterations led to exponential index growth, that would be rather unfortunate.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
On 07/11/2012 04:47 PM, Shaun Thomas wrote:
> On 07/11/2012 03:18 PM, Craig James wrote:
>
>> It strikes me as a contrived case rather than a use case. What sort of
>> app repeatedly fills and truncates a small table thousands of times ...
>> other than a test app to see whether you can do it or not?
>
> Test systems. Any company with even a medium-size QA environment will
> have continuous integration systems that run unit tests on a trash
> database hundreds or thousands of times through the day. Aside from
> dropping/creating the database via template, which would be *really*
> slow, truncate is the easiest/fastest way to reset between tests.

Why is recreating the test db from a (populated) template going to be slower than truncating all the tables and repopulating from an external source? I had a client who achieved a major improvement in speed and reduction in load by moving to this method of test db setup.

cheers

andrew
On 11/07/12 21:18, Craig James wrote:
> It strikes me as a contrived case rather than a use case. What sort
> of app repeatedly fills and truncates a small table thousands of times
> ... other than a test app to see whether you can do it or not?

If I have a lot of data which updates/inserts an existing table but I don't know if a given record will be an update or an insert, then I write all the 'new' data to a temporary table and then use SQL statements to achieve the updates and inserts on the existing table.

Is there a better way of doing this in standard SQL?

Mark
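Mark's staging-table pattern can be sketched end to end. This is only an illustration with invented table and column names; SQLite stands in for PostgreSQL here purely so the example is self-contained (Postgres would use CREATE TEMP TABLE and the same two statements):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old"), (2, "old")])

# Load all the 'new' data into a temp staging table, without deciding
# insert-vs-update per row.
conn.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(2, "new"), (3, "new")])

# Update the rows that already exist in the target...
conn.execute("""
    UPDATE target
       SET val = (SELECT s.val FROM staging s WHERE s.id = target.id)
     WHERE id IN (SELECT id FROM staging)
""")
# ...then insert the rows that don't.
conn.execute("""
    INSERT INTO target
    SELECT s.id, s.val FROM staging s
     WHERE s.id NOT IN (SELECT id FROM target)
""")

print(sorted(conn.execute("SELECT id, val FROM target")))
# -> [(1, 'old'), (2, 'new'), (3, 'new')]
```

The temp table disappears with the session, which is what makes the repeated fill-and-empty cycle under discussion so common.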
On Wed, Jul 11, 2012 at 2:32 PM, Mark Thornton <mthornton@optrak.com> wrote:
If it's a single session, use a temporary table. It is faster to start with (temp tables aren't logged), and it's automatically dropped at the end of the session (or at the end of the transaction if that's what you specified when you created it). This doesn't work if your insert/update spans more than one session.
Another trick that works (depending on how big your tables are) is to scan the primary key before you start, and build a hash table of the keys. That instantly tells you whether each record should be an insert or update.
Craig
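Craig's key-scan trick can be sketched as follows; the record shapes and key values are invented for illustration. One pass collects the existing primary keys, after which constant-time membership tests route each incoming record:

```python
# Scan the primary key once up front (e.g. the result of SELECT id FROM target),
# then use set membership to decide insert vs update for each incoming record.
existing_ids = {101, 102, 105}
incoming = [(101, "a"), (103, "b"), (105, "c")]

updates = [row for row in incoming if row[0] in existing_ids]
inserts = [row for row in incoming if row[0] not in existing_ids]

print(updates)  # [(101, 'a'), (105, 'c')]
print(inserts)  # [(103, 'b')]
```

As Craig notes, this only pays off when the key set fits comfortably in memory.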
On Wed, Jul 11, 2012 at 7:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Farina <daniel@heroku.com> writes:
>> TRUNCATE should simply be very nearly the fastest way to remove data
>> from a table while retaining its type information, and if that means
>> doing DELETE without triggers when the table is small, then it should.
>> The only person who could thwart me is someone who badly wants their
>> 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
>> of catalog anyway.
>
>> Does that sound reasonable? As in, would anyone object if TRUNCATE
>> learned this behavior?
>
> Yes, I will push back on that.
>
> (1) We don't need the extra complexity.

Well, a "need" is justified by the gains, no? It seems like this follows from the thoughts presented afterwards, so I'll discuss those.

> (2) I don't believe that you know where the performance crossover point
> would be (according to what metric, anyway?).

Nope. I don't. But an exact crossover is a level of precision I don't really need, because here is where things stand on a completely unremarkable test suite on the closest project to me that meets the "regular web-app" profile case:

With en-masse DELETE:
rake  41.89s user 3.08s system 76% cpu 58.629 total

With TRUNCATE:
rake  49.86s user 2.93s system 5% cpu 15:17.88 total

15x slower. This is a Macbook Air with full disk encryption and SSD disk with fsync off, e.g. a very typical developer configuration. This is a rather small schema -- probably half a dozen tables, and probably about a dozen indexes. This application is entirely unremarkable in its test-database workload: it wants to load a few records, do a few things, and then clear those handful of records.

> (3) The performance of the truncation itself should not be viewed in
> isolation; subsequent behavior also needs to be considered. An example
> of possible degradation is that index bloat would no longer be
> guaranteed to be cleaned up over a series of repeated truncations.
> (You might argue that if the table is small then the indexes couldn't
> be very bloated, but I don't think that holds up over a long series.)

I'm not entirely wedded to that mechanism; it was simply the most obvious one, but I bet one that is better in every respect is also possible. It did occur to me that bloat might be a sticky point.

> IOW, I think it's fine as-is. I'd certainly wish to see many more
> than one complainant before we expend effort in this area.

I've seen way more than one complaint, and I'm quite sure there are thousands of man-hours (or more) spent by people who don't even know to complain about such atrocious performance (or maybe it's so bad that most people run a web search and find out, probably being left really annoyed from having to yak-shave as a result).

In spite of how familiar I am with Postgres and its mailing lists, I have glossed over this for a long time, just thinking "wow, that really sucks" and only now -- by serendipity of having skimmed this post -- have seen fit to complain on behalf of quite a few rounds of dispensing workaround advice to other people. It's only when this was brought to the fore of my mind that I stopped to consider how much wasted time I've seen in people trying to figure this out over and over again (granted, they tend to remember after the first time).

Perhaps a doc fix is all we need (TRUNCATE is constant-time on large tables, but can be very slow compared to DELETE on small tables), but I completely and enthusiastically reject any notion from people calling this "contrived" or an "edge case," because people writing software against PostgreSQL that have unit tests have this use case constantly, often dozens or even hundreds of times a day. What I don't know is how many people figure out that they should use DELETE instead, and after how long. Even though the teams I work with are very familiar with many of the finer points of Postgres, doing some probing for the first time took a little while.

If we're going to live with it, I contend that we should own it as a real and substantial weakness for development productivity, and not sweep it under the rug as some "contrived" or "corner" case.

--
fdr
On 07/12/2012 02:10 AM, Matthew Woodcraft wrote:
> I think a documentation change would be worthwhile. At the moment the
> TRUNCATE page says, with no caveats, that it is faster than
> unqualified DELETE.

+1 to updating the docs to reflect the fact that TRUNCATE may have a higher fixed cost than "DELETE FROM table;", but also prevents bloat.

It's a weird little corner case, but with database-backed unit testing it's going to become a more significant one whether or not it feels like it makes any sense.

--
Craig Ringer
On 07/11/2012 01:22 PM, Daniel Farina wrote:
> On Tue, Jul 10, 2012 at 5:37 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> Hi
>>
>> After seeing a few discussions here and on Stack Overflow I've put together
>> a quick explanation of why "DELETE FROM table;" may be faster than "TRUNCATE
>> table" for people doing unit testing on lots of tiny tables, people who're
>> doing this so often they care how long it takes.
>>
>> I'd love it if a few folks who know the guts were to take a look and verify
>> its correctness:
>
> I haven't said this before, but think it every time someone asks me
> about this, so I'll say it now:
>
> This is a papercut that should be solved with improved mechanics.
> TRUNCATE should simply be very nearly the fastest way to remove data
> from a table while retaining its type information, and if that means
> doing DELETE without triggers when the table is small, then it should.
> The only person who could thwart me is someone who badly wants their
> 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
> of catalog anyway.
>
> Does that sound reasonable? As in, would anyone object if TRUNCATE
> learned this behavior?

Yep, I'd object. It's more complicated and less predictable. Also, as I strongly and repeatedly highlighted in my post, DELETE FROM table; does a different job to TRUNCATE. You'd at minimum need the effect of DELETE followed by a VACUUM on the table and its indexes to be acceptable and avoid the risk of rapid table + index bloat - and that'd be lots slower than a TRUNCATE. You could be clever and lock the table then DELETE and set xmax at the same time, I guess, but I suspect that'd be a bit of work and still wouldn't take care of the indexes.

It's also too complicated, not least because AFAIK utility commands and CRUD commands go through very different paths in PostgreSQL.

I guess you could propose and post a prototype patch for a new command that tried to empty the table via whatever method it thought would be fastest. Such a new command wouldn't be bound by the accepted and expected rules followed by TRUNCATE, so it could vary its behaviour based on the table, doing a real truncate on big tables and a delete-then-vacuum on small tables. I suspect you'd land up writing the fairly complicated code for the potentially multi-table delete-and-vacuum yourself.

Honestly, though, it might be much better to start with "how can TRUNCATE of empty or near-empty tables be made faster?" and start examining where the time goes.

--
Craig Ringer
On 07/12/2012 06:51 AM, Daniel Farina wrote:
> 15x slower. This is a Macbook Air with full disk encryption and SSD
> disk with fsync off, e.g. a very typical developer configuration.

Don't use full disk encryption for throwaway test data if you care about how long those tests take. It's a lot like tuning the engine in your car while ignoring the fact that the handbrake is jammed on and you're dragging a parachute. Use a ramdisk or un-encrypted partition, something that doesn't take three weeks to fsync().

That said, this performance gap makes me wonder if TRUNCATE is forcing metadata synchronisation even with fsync=off, causing the glacially slow disk access of your average FDE system to kick in, possibly even once per table or even once per file (index, table, toast, etc).

If so, it may be worth:

- Allowing TRUNCATE to skip synchronization when fsync=off. Pg is already allowed to eat all your data if it feels like it in this configuration, so there's no point flushing filesystem metadata to make sure files are really swapped.

- When fsync=on, trying to flush all changes to all files out at once rather than once per file as it could be doing (haven't checked) right now. How to do this without also flushing all other pending I/O on the whole system (with a global "sync()") would be somewhat OS/filesystem dependent, unfortunately.

You could help progress this issue constructively by doing some profiling on your system, tracing Pg's system calls, and determining what exactly it's doing with DELETE vs TRUNCATE and where the time goes. On Linux you'd use OProfile for this and on Solaris you'd use DTrace. Dunno what facilities Mac OS X has, but there must be something similar.

Once you've determined why it's slow, you have a useful starting point for making it faster, first for test systems with fsync=off and then, once that's tracked down, maybe for robust systems with fsync=on.

> I've seen way more than one complaint, and I'm quite sure there are
> thousands of man hours (or more) spent on people who don't even know
> to complain about such atrocious performance (or maybe it's so bad
> that most people run a web search and find out, probably being left
> really annoyed from having to yak shave as a result).

I suspect you're right - as DB-based unit testing becomes more commonplace this is turning up a lot more. As DB unit tests were first really popular in the ruby/rails crowd, they've probably seen the most pain, but as someone who doesn't move in those circles I wouldn't have known. They certainly don't seem to have been making noise about it here, and I've only recently seen some SO questions about it.

> Perhaps a doc fix is all we need (TRUNCATE is constant-time on large
> tables, but can be very slow compared to DELETE on small tables), but
> I completely and enthusiastically reject any notion from people
> calling this "contrived" or an "edge case," because people writing
> software against PostgreSQL that have unit tests have this use case
> constantly, often dozens or even hundreds of times a day.

I have to agree with this - it may have been an edge case in the past, but it's becoming mainstream and is worth being aware of.

That said, the group of people who care about this most are not well represented as active contributors to PostgreSQL. I'd love it if you could help start to change that by stepping in and taking a little time to profile exactly what's going on with your system so we can learn what, exactly, is slow.

--
Craig Ringer
On Wed, Jul 11, 2012 at 6:41 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 07/12/2012 06:51 AM, Daniel Farina wrote:
>>
>> 15x slower. This is a Macbook Air with full disk encryption and SSD
>> disk with fsync off, e.g. a very typical developer configuration.
>
> Don't use full disk encryption for throwaway test data if you care about how
> long those tests take. It's a lot like tuning the engine in your car while
> ignoring the fact that the handbrake is jammed on and you're dragging a
> parachute. Use a ramdisk or un-encrypted partition, something that doesn't
> take three weeks to fsync().

No. Full disk encryption is not that slow. And as we see, there is a workaround that works "just fine" (maybe it could be faster, who knows?) in this exact configuration. The greater problem is more likely to be HFS+, the file system.

If someone produces and gets adoption of a wonderfully packaged test configuration of Postgres using a ram-based block device that somehow has a good user experience living alongside the persistent version, this problem can go away completely. In fact, that would be *phenomenal*, because so many things could be so much faster. But that's surprisingly challenging: for example, last I checked, Postgres.app, principally written by one of my colleagues, does *not* disable fsync, because we don't know of a great way to communicate the relaxed expectations of durability, even though Postgres.app is targeted towards developers: for example, it does not run until you log in, so it's more like a foreground application. Maybe if the connection had an option that said "x-test=true", or something...deposit your idea here.

Until then, this is at the level of an is-ought problem: there is no immediate to even moderately distant future where people are not going to click the full disk encryption button their OS vendor gives them (nor should they *not* click it: people love to download bits of data from production to their local machine to figure out problems, and I think the world is a better place for it), and people are going to use HFS+ in large numbers. So talking about how many people "just" ought to reconfigure is tantamount to blaming the victim, especially when we have a sound and workable workaround in hand to at least prove definitively that the problem is not intractable.

> That said, this performance gap makes me wonder if TRUNCATE is forcing
> metadata synchronisation even with fsync=off, causing the incredibly
> glacially awesomely slow disk access of your average FDE system to kick in,
> possibly even once per table or even once per file (index, table, toast,
> etc).

Lousy file system is my guess. HFS is not that great. I bet ext3 would be a reasonable model of this amount of pain as well.

> You could help progress this issue constructively by doing some profiling on
> your system, tracing Pg's system calls, and determining what exactly it's
> doing with DELETE vs TRUNCATE and where the time goes. On Linux you'd use
> OProfile for this and on Solaris you'd use DTrace. Dunno what facilities Mac
> OS X has but there must be something similar.

I'm sure I could, but first I want to put to complete rest the notion that this is an "edge case." It's only an edge case if the only database you have runs in production. An understanding by more people that this is a problem of at least moderate impact is a good first step. I'll ask some of my more Macintosh-adept colleagues for advice.

>> I've seen way more than one complaint, and I'm quite sure there are
>> thousands of man hours (or more) spent on people who don't even know
>> to complain about such atrocious performance (or maybe it's so bad
>> that most people run a web search and find out, probably being left
>> really annoyed from having to yak shave as a result).
>
> I suspect you're right - as DB based unit testing becomes more commonplace
> this is turning up a lot more. As DB unit tests were first really popular in
> the ruby/rails crowd they've probably seen the most pain, but as someone who
> doesn't move in those circles I wouldn't have known. They certainly don't
> seem to have been making noise about it here, and I've only recently seen
> some SO questions about it.

Well, here's another anecdotal data point to show how this can sneak under the radar: because this was a topic of discussion in the office today, a colleague in the Department of Data discovered his 1.5-minute testing cycle could be cut to thirty seconds. We conservatively estimate he runs the tests 30 times a day when working on his project, and probably more. Multiply that over a few weeks (not even counting the cost of more broken concentration) and we're talking a real loss of productivity and satisfaction.

Here's an example of a person who works on a Postgres-oriented project at his day job, has multi-year experience with it, and can write detailed articles like these: https://devcenter.heroku.com/articles/postgresql-concurrency . If he didn't know to get this right without having it called out as a caveat, what number of people have but the most slim chance? Our best asset is probably the relative obscurity of TRUNCATE vs. DELETE for those who are less familiar with the system. I'm sure he would have found it eventually when starting to profile his tests when they hit the 3-4 minute mark, although he might just as easily have said "well, TRUNCATE, that's the fast one...nothing to do there...".

> That said, the group of people who care about this most are not well
> represented as active contributors to PostgreSQL. I'd love it if you could
> help start to change that by stepping in and taking a little time to profile
> exactly what's going on with your system so we can learn what, exactly, is
> slow.

It's not my platform of choice, per se, but on my Ubuntu Precise on ext4 with fsync off and no disk encryption:

$ rake
55.37user 2.36system 1:15.33elapsed 76%CPU (0avgtext+0avgdata 543120maxresident)k
0inputs+2728outputs (0major+85691minor)pagefaults 0swaps

$ rake
53.85user 1.97system 2:04.38elapsed 44%CPU (0avgtext+0avgdata 547904maxresident)k
0inputs+2640outputs (0major+100226minor)pagefaults 0swaps

Which is a not-as-pathetic slowdown, but still pretty substantial, being somewhat shy of 2x. I'll ask around for someone who is Macintosh-OS-inclined (not as a user, but as a developer) about a good way to get a profile.

--
fdr
On 07/12/2012 02:12 PM, Daniel Farina wrote:
> On Wed, Jul 11, 2012 at 6:41 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> On 07/12/2012 06:51 AM, Daniel Farina wrote:
>>> 15x slower. This is a Macbook Air with full disk encryption and SSD
>>> disk with fsync off, e.g. a very typical developer configuration.
>>
>> Don't use full disk encryption for throwaway test data if you care about how
>> long those tests take. It's a lot like tuning the engine in your car while
>> ignoring the fact that the handbrake is jammed on and you're dragging a
>> parachute. Use a ramdisk or un-encrypted partition, something that doesn't
>> take three weeks to fsync().
>
> No. Full disk encryption is not that slow. And as we see, there is a
> workaround that works "just fine" (maybe it could be faster, who
> knows?) in this exact configuration. The greater problem is more
> likely to be HFS+, the file system.

The two are somewhat hand in hand in any case. "Three weeks" is of course hyperbole. Nonetheless, I haven't seen a full disk encryption system that doesn't dramatically slow down synchronous operations by forcing a lot more work to be done than would be the case without disk encryption. Perhaps the Mac OS X / HFS+ solution is an exception to this, but I doubt it.

Given a small program that repeats the following sequence:

- creates a file
- writes a few bytes to it
- fsync()s and closes it
- deletes it
- fsync()s the directory to ensure the metadata change is flushed

... and times it, it'd be interesting to do test runs with and without encryption on HFS+.

> But that's surprisingly challenging: for example, last I checked,
> Postgres.app, principally written by one of my colleagues, does *not*
> disable fsync because we don't know of a great way to communicate the
> relaxed expectations of durability, even though Postgres.app is
> targeted towards developers

I think this is an issue of developer and user responsibility.
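The timing loop described above (create, write, fsync, close, unlink, fsync the directory) can be sketched in portable Python; the directory path is whatever filesystem you want to measure, and the iteration count is arbitrary:

```python
import os
import tempfile
import time

def fsync_loop(directory, iterations=20):
    """Time the create/write/fsync/unlink/dir-fsync cycle on a filesystem."""
    dir_fd = os.open(directory, os.O_RDONLY)
    start = time.perf_counter()
    for i in range(iterations):
        path = os.path.join(directory, "probe_%d" % i)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        os.write(fd, b"x")
        os.fsync(fd)        # flush the file data
        os.close(fd)
        os.unlink(path)
        os.fsync(dir_fd)    # flush the directory metadata change
    os.close(dir_fd)
    return time.perf_counter() - start

print("%.4f s" % fsync_loop(tempfile.mkdtemp()))
```

Running it once on an encrypted and once on an unencrypted volume would give a rough measure of how much the FDE layer penalizes metadata synchronisation.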
Proper test/dev separation from production, and a bit of thought, is all it takes. After all, Pg can't stop you running your unit tests (full of all those slow TRUNCATEs) against your production database, either. Durability isn't worth a damn if you just deleted all your data.

About the only technical aid I can see for this would be some kind of GUC that the app could proactively check against. Set it to "production" for your production DB, and "test" for your throwaways. If the unit tests see "production" they refuse to run; if the app proper sees "test" it warns about data durability. Have it default to unset or "test" so admins must explicitly set it to "production".

Handily, this is already possible. You can add whatever custom GUCs you want. If you want to make your unit tests require that a GUC called "stage.is_production" be off in order to run, just add to postgresql.conf:

  custom_variable_classes = 'stage'
  stage.is_production = off

Now you can see the new GUC:

  regress=# SHOW stage.is_production;
   stage.is_production
  ---------------------
   off
  (1 row)

... so your unit tests and app can check for it. Since you're producing custom installers, this is something you can bundle as part of the generated postgresql.conf for easy differentiation between test and production DBs.

If requirements like this were integrated into common unit testing frameworks, some of these worries would go away. That's not something Pg can make happen, though. How would you want to see it work? How would you solve this problem?
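A minimal sketch of the guard described above, from the client side; `fetch_guc` is a stand-in for issuing `SHOW stage.is_production` over a real connection, and the GUC name comes from the example in the thread:

```python
def assert_not_production(fetch_guc):
    """Refuse to run destructive tests if the stage GUC marks this DB as production.

    fetch_guc is any callable that returns the value of a GUC by name,
    e.g. a wrapper around SHOW on a live connection.
    """
    value = fetch_guc("stage.is_production")
    if value not in ("off", "false", None):
        raise RuntimeError("refusing to run destructive tests against production")

# Simulated checks (a real harness would pass a DB-backed fetcher):
assert_not_production(lambda name: "off")   # test DB: proceeds silently
try:
    assert_not_production(lambda name: "on")
except RuntimeError as e:
    print("guarded:", e)
```

The same fetcher could also drive the inverse warning, where the app proper complains if it finds itself on a "test" database.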
> Until then, this is at the level of an is-ought problem: there is no
> immediate to even moderately distant future where people are not going
> to click the full disk encryption button their OS vendor gives them
> (nor should they *not* click that: people love to download bits of data
> from production to their local machine to figure out problems, and I
> think the world is a better place for it), and people are going to use
> HFS+ in large numbers, so talking about how many people "just" ought to
> reconfigure is tantamount to blaming the victim, especially when we
> have a sound and workable workaround in hand to at least prove
> definitively that the problem is not intractable.

Yes, people do work on production data in test envs, and FDE is overall a plus. I'd rather they not turn it off - and I'd rather they not have to. That's why I suggested using a ramdisk as an alternative; it's completely non-durable and just gets tossed out, so there's no more worry about data leakage than there is for the disk cache buffered in RAM, or the mounted disks of an FDE machine while it's unlocked.

Setting up Pg to run off a ramdisk isn't a one-click trivial operation, and it sounds like the group you're mainly interested in is the database-as-a-utility crowd who prefer not to see, think about, or touch the database directly - hence Postgres.app etc. If so, this is much more of a packaging problem than a core Pg problem.

I take your point about needing to be able to indicate lack of durability to clients, but I think that's relatively easily done with a custom GUC as shown above.

Of course, Pg on a ramdisk has other issues that quickly become apparent when you "COPY" that 2GB CSV file into your DB...

> Lousy file system is my guess. HFS is not that great. I bet ext3 would
> be a reasonable model of this amount of pain as well.

Hey, HFS+ Journaled/Extended, which is all that you're ever likely to see, is merely bad :-P

The original HFS, now that was a monster.
Not-so-fond memories of regular Norton Tools defrag runs resurfacing from my Mac OS 7 days...

> I'm sure I could, but first I want to put to complete rest the notion
> that this is an "edge case." It's only an edge case if the only
> database you have runs in production. An understanding by more people
> that this is a problem of at least moderate impact is a good first
> step. I'll ask some of my more Macintosh-adept colleagues for advice.

That'd be great; as this is an issue having real-world impact, people with Mac equipment and knowledge need to get involved in helping to solve it. It's not confined to Mac, but it seems to be worse there.

The other way you could help would be by providing canned, self-contained test cases that can be used to demonstrate the big performance gaps you're reporting, and to test them on other platforms / OSes / file systems. Something with an "I've never used Ruby" quickstart.

> Here's an example of a person that works on a Postgres-oriented
> project at his day job, has multi-year experience with it, and can
> write detailed articles like these:
> https://devcenter.heroku.com/articles/postgresql-concurrency . If he
> didn't know to get this right without having it called out as a
> caveat, what number of people have but the most slim chance? Our best
> asset is probably the relative obscurity of TRUNCATE vs. DELETE for
> those who are less familiar with the system.

Yep. This whole issue was new to me until last week too. I run tests against my DB, but it's fast enough here. In any case, for my tests other costs are greatly more significant than a few fractions of a second difference in one DB operation. Clearly that's not the case for some DB unit testing designs.

Other than ruby/rails/rake, what other systems are you aware of that're affected by these issues? I'm not dismissing Ruby, I just want to know if you know of other groups or techs that're ALSO affected.
> Which is a not-as-pathetic slowdown, but still pretty substantial,
> being somewhat shy of 2x. I'll ask around for someone who is
> Macintosh-OS-inclined (not as a user, but as a developer) about a good
> way to get a profile.

That'd be great. Get them onto the list and involved, because if you want to see this improved it's going to take some back and forth, and someone who can interpret the profile results, test changes, etc. I only have a limited ability and willingness to drive this forward; I have to focus on other things. You'll need to be willing to be proactive and push this a bit.

Figuring out what part of truncation is taking the time would be a big plus, as would determining how much worse FDE makes it vs an unencrypted disk. Hopefully others are interested and following along too.

--
Craig Ringer
On Wed, Jul 11, 2012 at 3:51 PM, Daniel Farina <daniel@heroku.com> wrote:
> Nope. I don't. But an exact crossover is a level of precision I don't
> really need, because here are where things stand on a completely
> unremarkable test suite on the closest project to me that meets the
> "regular web-app" profile case:
>
> With en-masse DELETE:
> rake 41.89s user 3.08s system 76% cpu 58.629 total
>
> With TRUNCATE:
> rake 49.86s user 2.93s system 5% cpu 15:17.88 total
>
> 15x slower. This is a Macbook Air with full disk encryption and SSD
> disk with fsync off, e.g. a very typical developer configuration.

What is shared_buffers?

> This is a rather small schema -- probably a half a dozen tables, and
> probably about a dozen indexes. This application is entirely
> unremarkable in its test-database workload: it wants to load a few
> records, do a few things, and then clear those handful of records.

How many rounds of truncation does one rake do? I.e. how many truncations are occurring over the course of that 1 minute or 15 minutes?

Cheers,
Jeff
Hi,
I work with Daniel Farina and was the other engineer who "discovered" this, once again. That is, I got bit by it and have been running TRUNCATE on my test suites for years.
On Thursday, July 12, 2012 at 12:15 PM, Jeff Janes wrote:
> On Wed, Jul 11, 2012 at 3:51 PM, Daniel Farina <daniel@heroku.com> wrote:
>> Nope. I don't. But an exact crossover is a level of precision I don't
>> really need, because here are where things stand on a completely
>> unremarkable test suite on the closest project to me that meets the
>> "regular web-app" profile case:
>>
>> With en-masse DELETE:
>> rake 41.89s user 3.08s system 76% cpu 58.629 total
>>
>> With TRUNCATE:
>> rake 49.86s user 2.93s system 5% cpu 15:17.88 total
>>
>> 15x slower. This is a Macbook Air with full disk encryption and SSD
>> disk with fsync off, e.g. a very typical developer configuration.
>
> What is shared_buffers?
1600kB
Not sure this will make much difference with such small data, but of course I could be dead wrong here.
>> This is a rather small schema -- probably a half a dozen tables, and
>> probably about a dozen indexes. This application is entirely
>> unremarkable in its test-database workload: it wants to load a few
>> records, do a few things, and then clear those handful of records.
>
> How many rounds of truncation does one rake do? I.e. how many
> truncations are occurring over the course of that 1 minute or 15
> minutes?
All tables are cleared out after every test. On this particular project, I'm running 200+ tests in 1.5 minutes (or 30 seconds with DELETE instead of TRUNCATE). For another, bigger project it's running 1700+ tests in about a minute. You can do the math from there.
I'd say this is not atypical at all, so I too encourage teaching TRUNCATE about small tables and optimizing for that, as well as a section in the docs about postgres tweaks for test suites. I'm sure many people have done independent research in this area, and it'd be great to have it documented in one place.
-Harold
On Thu, Jul 12, 2012 at 4:21 PM, Harold A. Giménez <harold.gimenez@gmail.com> wrote:
>> What is shared_buffers?
>
> 1600kB

That is really small, so the buffer flushing should not be a problem. Unless you mean 1600MB.

>>> This is a rather small schema -- probably a half a dozen tables, and
>>> probably about a dozen indexes. This application is entirely
>>> unremarkable in its test-database workload: it wants to load a few
>>> records, do a few things, and then clear those handful of records.
>>
>> How many rounds of truncation does one rake do? I.e. how many
>> truncations are occurring over the course of that 1 minute or 15
>> minutes?
>
> All tables are cleared out after every test. On this particular project,
> I'm running 200+ tests in 1.5 minutes (or 30 seconds with DELETE instead
> of TRUNCATE). For another, bigger project it's running 1700+ tests in
> about a minute. You can do the math from there.

So 1700 rounds * 18 relations = 30,600 truncates per minute. That is actually faster than I can get truncates to go when I am purely limited by CPU.

I think the problem is in the fsync absorption queue. Every truncate adds a FORGET_RELATION_FSYNC to the queue, and processing each one of those leads to sequentially scanning the checkpointer's pending-ops hash table, which is quite large. It is almost entirely full of other requests which have already been cancelled, but it still has to dig through them all. So this is essentially an N^2 operation.

I'm not sure why we don't just delete the entry instead of marking it as cancelled. It looks like the only problem is that you can't delete an entry other than the one just returned by hash_seq_search. Which would be fine, as that is the entry that we would want to delete; except that mdsync might have a different hash_seq_search open, and so it wouldn't be safe to delete.
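To make the shape of the problem concrete, here is a toy model of the two strategies, in Python rather than the actual C (the names and data structures are invented for illustration and are not the checkpointer's real ones). Marking entries cancelled leaves tombstones that every later forget-request must scan past, so N truncations touch O(N^2) entries; deleting by key lookup touches O(N):

```python
def absorb_with_cancel(num_truncates):
    """Model of the current behavior: each forget-request scans the whole
    pending-ops list and marks matching entries cancelled, leaving the
    cancelled entries behind as tombstones.  Returns entries scanned."""
    pending = []       # each entry: {'rel': n, 'cancelled': bool}
    scanned = 0
    for rel in range(num_truncates):
        pending.append({'rel': rel, 'cancelled': False})
        for entry in pending:          # full scan, tombstones included
            scanned += 1
            if entry['rel'] == rel:
                entry['cancelled'] = True
    return scanned

def absorb_with_delete(num_truncates):
    """Model of the proposed behavior: each forget-request removes its
    entry with a single hash lookup.  Returns entries scanned."""
    pending = {}
    scanned = 0
    for rel in range(num_truncates):
        pending[rel] = True
        scanned += 1                   # one O(1) lookup
        del pending[rel]
    return scanned
```

With 1,000 truncations in a checkpoint cycle the cancel model scans 500,500 entries while the delete model scans 1,000, which matches the quadratic blowup described above.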
If the segno was taken out of the hash key and handled some other way, then the forgetting could be done with a simple hash lookup rather than a full scan.

Maybe we could just turn off the pending-ops table altogether when fsync=off, but since fsync is PGC_SIGHUP it is not clear how you could safely turn it back on.

Cheers,
Jeff
Subject: Proposed change for 9.3(?): Require full restart to change fsync parameter, not just pg_ctl reload
From: Craig Ringer
Hi all

Some performance improvements have been proposed - probably for 9.3 - that will mean the `fsync` GUC can only be changed with a full cluster restart. See quoted, at end of message.

It is currently possible to change `fsync` by altering postgresql.conf and issuing a `pg_ctl reload`. It is not clear how safe this really is even now, and changes proposed to reduce the amount of expensive bookkeeping done when fsync is set to 'off' will make it even less safe. Consequently, it is proposed that the ability to change the fsync setting while Pg is running be removed.

fsync=off is very unsafe anyway, and these days production setups are able to get similar results with async commits and group commit.

Is there anyone here relying on being able to change fsync=off to fsync=on at runtime? If so, what for, and what does it gain you over use of group/async commit?

For related discussion see the -hackers thread:

"DELETE vs TRUNCATE explanation"
http://archives.postgresql.org/message-id/CAMkU=1yLXvODRZZ_=fgrEeJfk2tvZPTTD-8n8BwrAhNz_WBT0A@mail.gmail.com

and the background threads:

"PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones."
http://archives.postgresql.org/message-id/CAFXpGYbgmZYij4TgCbOF24-usoiDD0ASQeaVAkYtB7E2TYm8Wg@mail.gmail.com

"DELETE vs TRUNCATE explanation"
http://archives.postgresql.org/message-id/4FFCCAC4.4030503@ringerc.id.au

On 07/16/2012 09:37 AM, Tom Lane wrote:
> Craig Ringer <ringerc@ringerc.id.au> writes:
>> On 07/16/2012 02:29 AM, Tom Lane wrote:
>>> Yeah, you have a point there. It's not real clear that switching fsync
>>> from off to on is an operation that we can make any guarantees about,
>>> short of executing something like the code recently added to initdb
>>> to force-sync the entire PGDATA tree.
>
>> There's one way that doesn't have any housekeeping cost to Pg. It's
>> pretty bad manners if there's anybody other than Pg on the system though:
>> sync()
>
> Yeah, I thought about that: if we could document that issuing a manual
> sync after turning fsync on leaves you in a guaranteed-good state once
> the sync is complete, it'd probably be fine. However, I'm not convinced
> that we could promise that with a straight face. In the first place,
> PG has only very weak guarantees about how quickly all processes in the
> system will absorb a GUC update. In the second place, I'm not entirely
> sure that there aren't race conditions around checkpoints and the fsync
> request queue (particularly if we do what Jeff is suggesting and
> suppress queuing requests at the upstream end). It might be all right,
> or it might be all right after expending some work, but the whole thing
> is not an area where I think anyone wants to spend time. I think it'd
> be much safer to document that the correct procedure is "stop the
> database, do a manual sync, enable fsync in postgresql.conf, restart the
> database". And if that's what we're documenting, we lose little or
> nothing by marking fsync as PGC_POSTMASTER.
>
> regards, tom lane
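For anyone scripting the "stop, manual sync, enable fsync, restart" procedure Tom describes, the steps might be sketched like this (a hypothetical Python sketch, not a documented tool; the pg_ctl invocation details and the naive conf rewrite are my own assumptions):

```python
import os
import subprocess

def rewrite_fsync(conf_text, value="on"):
    """Flip the fsync line in a postgresql.conf text (naive line match)."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.strip().startswith("fsync"):
            out.append("fsync = %s\n" % value)
        else:
            out.append(line)
    return "".join(out)

def enable_fsync_safely(pgdata):
    """Stop the cluster, sync the OS, enable fsync, restart (sketch only)."""
    subprocess.run(["pg_ctl", "-D", pgdata, "stop", "-m", "fast"], check=True)
    os.sync()  # push everything written while fsync was off out to disk
    conf = os.path.join(pgdata, "postgresql.conf")
    with open(conf) as f:
        text = f.read()
    with open(conf, "w") as f:
        f.write(rewrite_fsync(text, "on"))
    subprocess.run(["pg_ctl", "-D", pgdata, "start"], check=True)
```

The point of doing the sync() while the cluster is stopped is exactly the race Tom raises: with the server down there are no checkpoints or fsync-queue operations in flight to race against.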
On Thu, Jul 12, 2012 at 4:21 PM, Harold A. Giménez <harold.gimenez@gmail.com> wrote:
> Hi,
>
> I work with Daniel Farina and was the other engineer who "discovered" this,
> once again. That is, I got bit by it and have been running TRUNCATE on my
> test suites for years.

Hi Daniel and Harold,

I don't know if you followed this thread over into the -hackers mailing list. There was some bookkeeping code that was N^2 in the number of truncations performed during any given checkpoint cycle. That has been fixed in 9.2Beta3. I suspect that this was the root cause of the problem you encountered.

If you are in a position to retest using 9.2Beta3 (http://www.postgresql.org/about/news/1405/), I'd be interested to know if it does make truncations comparable in speed to unqualified deletes.

Thanks,
Jeff