Thread: DELETE vs TRUNCATE explanation
Hi

After seeing a few discussions here and on Stack Overflow I've put together a quick explanation of why "DELETE FROM table;" may be faster than "TRUNCATE table" for people doing unit testing on lots of tiny tables, people who're doing this so often they care how long it takes.

I'd love it if a few folks who know the guts were to take a look and verify its correctness: http://stackoverflow.com/a/11423886/398670

--
Craig Ringer
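For readers following along, the two reset strategies being compared can be made concrete. This is only an illustrative sketch, not from the thread; the helper and table names are invented, and it just shows the SQL a test harness's teardown might emit in each mode:

```python
def reset_sql(tables, use_truncate=False):
    """Build the SQL a test suite might run to empty its tables between tests."""
    if use_truncate:
        # One statement; constant-time on big tables, but it pays fixed
        # per-relation costs that dominate when the tables are tiny.
        return "TRUNCATE TABLE %s RESTART IDENTITY CASCADE;" % ", ".join(tables)
    # Per-table DELETE: rows are merely marked dead (VACUUM cleans them up
    # later), which is cheap when only a handful of rows exist.
    return "\n".join("DELETE FROM %s;" % t for t in tables)

print(reset_sql(["users", "orders"]))
print(reset_sql(["users", "orders"], use_truncate=True))
```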
On Tue, Jul 10, 2012 at 5:37 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> Hi
>
> After seeing a few discussions here and on Stack Overflow I've put together
> a quick explanation of why "DELETE FROM table;" may be faster than "TRUNCATE
> table" for people doing unit testing on lots of tiny tables, people who're
> doing this so often they care how long it takes.
>
> I'd love it if a few folks who know the guts were to take a look and verify
> its correctness:

I haven't said this before, but think it every time someone asks me about this, so I'll say it now:

This is a papercut that should be solved with improved mechanics. TRUNCATE should simply be very nearly the fastest way to remove data from a table while retaining its type information, and if that means doing DELETE without triggers when the table is small, then it should. The only person who could thwart me is someone who badly wants their 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB of catalog anyway.

Does that sound reasonable? As in, would anyone object if TRUNCATE learned this behavior?

--
fdr
Daniel Farina <daniel@heroku.com> writes:
> TRUNCATE should simply be very nearly the fastest way to remove data
> from a table while retaining its type information, and if that means
> doing DELETE without triggers when the table is small, then it should.
> The only person who could thwart me is someone who badly wants their
> 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
> of catalog anyway.

> Does that sound reasonable? As in, would anyone object if TRUNCATE
> learned this behavior?

Yes, I will push back on that.

(1) We don't need the extra complexity.

(2) I don't believe that you know where the performance crossover point would be (according to what metric, anyway?).

(3) The performance of the truncation itself should not be viewed in isolation; subsequent behavior also needs to be considered. An example of possible degradation is that index bloat would no longer be guaranteed to be cleaned up over a series of repeated truncations. (You might argue that if the table is small then the indexes couldn't be very bloated, but I don't think that holds up over a long series.)

IOW, I think it's fine as-is. I'd certainly wish to see many more than one complainant before we expend effort in this area.

			regards, tom lane
On Wed, Jul 11, 2012 at 10:05:48AM -0400, Tom Lane wrote:
> Daniel Farina <daniel@heroku.com> writes:
> > TRUNCATE should simply be very nearly the fastest way to remove data
> > from a table while retaining its type information, and if that means
> > doing DELETE without triggers when the table is small, then it should.
> > The only person who could thwart me is someone who badly wants their
> > 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
> > of catalog anyway.
>
> > Does that sound reasonable? As in, would anyone object if TRUNCATE
> > learned this behavior?
>
> Yes, I will push back on that.
>
> (1) We don't need the extra complexity.
>
> (2) I don't believe that you know where the performance crossover point
> would be (according to what metric, anyway?).
>
> (3) The performance of the truncation itself should not be viewed in
> isolation; subsequent behavior also needs to be considered. An example
> of possible degradation is that index bloat would no longer be
> guaranteed to be cleaned up over a series of repeated truncations.
> (You might argue that if the table is small then the indexes couldn't
> be very bloated, but I don't think that holds up over a long series.)
>
> IOW, I think it's fine as-is. I'd certainly wish to see many more
> than one complainant before we expend effort in this area.

+1

TRUNCATE needs to keep the same properties independent of the size of the table. Smearing it into a DELETE would not be good at all. If there are optimizations that can be done to keep its current behavior, those might be possible, but the complexity may not be worthwhile for a relative corner case.

Regards,
Ken
Tom Lane wrote:
> (3) The performance of the truncation itself should not be viewed in
> isolation; subsequent behavior also needs to be considered. An example
> of possible degradation is that index bloat would no longer be
> guaranteed to be cleaned up over a series of repeated truncations.
> (You might argue that if the table is small then the indexes couldn't
> be very bloated, but I don't think that holds up over a long series.)
>
> IOW, I think it's fine as-is. I'd certainly wish to see many more
> than one complainant before we expend effort in this area.

I think a documentation change would be worthwhile. At the moment the TRUNCATE page says, with no caveats, that it is faster than unqualified DELETE. It surprised me to find that this wasn't true (with 7.2, again with small tables in a testsuite), and evidently it's still surprising people today.

-M-
On Wed, Jul 11, 2012 at 7:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
It strikes me as a contrived case rather than a use case. What sort of app repeatedly fills and truncates a small table thousands of times ... other than a test app to see whether you can do it or not?
The main point of truncate is to provide a more efficient mechanism to delete all data from large tables. If your app developers don't know within a couple orders of magnitude how much data your tables hold, and can't figure out whether to use delete or truncate, I can't find much sympathy in my heart.
Craig
On 07/11/2012 03:18 PM, Craig James wrote:
> It strikes me as a contrived case rather than a use case. What sort of
> app repeatedly fills and truncates a small table thousands of times ...
> other than a test app to see whether you can do it or not?

Test systems. Any company with even a medium-size QA environment will have continuous integration systems that run unit tests on a trash database hundreds or thousands of times through the day. Aside from dropping/creating the database via template, which would be *really* slow, truncate is the easiest/fastest way to reset between tests.

If TRUNCATE suddenly started defaulting to DELETE on small table-sets and several iterations led to exponential index growth, that would be rather unfortunate.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
On 07/11/2012 04:47 PM, Shaun Thomas wrote:
> On 07/11/2012 03:18 PM, Craig James wrote:
>
>> It strikes me as a contrived case rather than a use case. What sort of
>> app repeatedly fills and truncates a small table thousands of times ...
>> other than a test app to see whether you can do it or not?
>
> Test systems. Any company with even a medium-size QA environment will
> have continuous integration systems that run unit tests on a trash
> database hundreds or thousands of times through the day. Aside from
> dropping/creating the database via template, which would be *really*
> slow, truncate is the easiest/fastest way to reset between tests.

Why is recreating the test db from a (populated) template going to be slower than truncating all the tables and repopulating from an external source? I had a client who achieved a major improvement in speed and reduction in load by moving to this method of test db setup.

cheers

andrew
On 11/07/12 21:18, Craig James wrote:
> It strikes me as a contrived case rather than a use case. What sort
> of app repeatedly fills and truncates a small table thousands of times
> ... other than a test app to see whether you can do it or not?

If I have a lot of data which updates/inserts an existing table but I don't know if a given record will be an update or an insert, then I write all the 'new' data to a temporary table and then use SQL statements to achieve the updates and inserts on the existing table.

Is there a better way of doing this in standard SQL?

Mark
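Mark's staging-table pattern can be sketched end to end. This is only an illustration with invented table and column names; SQLite stands in for PostgreSQL here purely so the example is self-contained (Postgres would use CREATE TEMP TABLE and the same two statements):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "old"), (2, "old")])

# Load all the 'new' data into a temp staging table, without deciding
# insert-vs-update per row.
conn.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(2, "new"), (3, "new")])

# Update the rows that already exist in the target...
conn.execute("""
    UPDATE target
       SET val = (SELECT s.val FROM staging s WHERE s.id = target.id)
     WHERE id IN (SELECT id FROM staging)
""")
# ...then insert the rows that don't.
conn.execute("""
    INSERT INTO target
    SELECT s.id, s.val FROM staging s
     WHERE s.id NOT IN (SELECT id FROM target)
""")

print(sorted(conn.execute("SELECT id, val FROM target")))
# -> [(1, 'old'), (2, 'new'), (3, 'new')]
```

The temp table disappears with the session, which is what makes the repeated fill-and-empty cycle under discussion so common.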
On Wed, Jul 11, 2012 at 2:32 PM, Mark Thornton <mthornton@optrak.com> wrote:
If it's a single session, use a temporary table. It is faster to start with (temp tables aren't logged), and it's automatically dropped at the end of the session (or at the end of the transaction if that's what you specified when you created it). This doesn't work if your insert/update spans more than one session.
Another trick that works (depending on how big your tables are) is to scan the primary key before you start, and build a hash table of the keys. That instantly tells you whether each record should be an insert or update.
Craig
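Craig's key-scan trick can be sketched as follows; the record shapes and key values are invented for illustration. One pass collects the existing primary keys, after which constant-time membership tests route each incoming record:

```python
# Scan the primary key once up front (e.g. the result of SELECT id FROM target),
# then use set membership to decide insert vs update for each incoming record.
existing_ids = {101, 102, 105}
incoming = [(101, "a"), (103, "b"), (105, "c")]

updates = [row for row in incoming if row[0] in existing_ids]
inserts = [row for row in incoming if row[0] not in existing_ids]

print(updates)  # [(101, 'a'), (105, 'c')]
print(inserts)  # [(103, 'b')]
```

As Craig notes, this only pays off when the key set fits comfortably in memory.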
On Wed, Jul 11, 2012 at 7:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Daniel Farina <daniel@heroku.com> writes:
>> TRUNCATE should simply be very nearly the fastest way to remove data
>> from a table while retaining its type information, and if that means
>> doing DELETE without triggers when the table is small, then it should.
>> The only person who could thwart me is someone who badly wants their
>> 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
>> of catalog anyway.
>
>> Does that sound reasonable? As in, would anyone object if TRUNCATE
>> learned this behavior?
>
> Yes, I will push back on that.
>
> (1) We don't need the extra complexity.

Well, a "need" is justified by the gains, no? It seems like this follows from the thoughts presented afterwards, so I'll discuss those.

> (2) I don't believe that you know where the performance crossover point
> would be (according to what metric, anyway?).

Nope. I don't. But an exact crossover is a level of precision I don't really need, because here is where things stand on a completely unremarkable test suite on the closest project to me that meets the "regular web-app" profile case:

With en-masse DELETE:
rake  41.89s user 3.08s system 76% cpu 58.629 total

With TRUNCATE:
rake  49.86s user 2.93s system 5% cpu 15:17.88 total

15x slower. This is a Macbook Air with full disk encryption and SSD disk with fsync off, e.g. a very typical developer configuration. This is a rather small schema -- probably half a dozen tables, and probably about a dozen indexes. This application is entirely unremarkable in its test-database workload: it wants to load a few records, do a few things, and then clear those handful of records.

> (3) The performance of the truncation itself should not be viewed in
> isolation; subsequent behavior also needs to be considered. An example
> of possible degradation is that index bloat would no longer be
> guaranteed to be cleaned up over a series of repeated truncations.
> (You might argue that if the table is small then the indexes couldn't
> be very bloated, but I don't think that holds up over a long series.)

I'm not entirely wedded to that mechanism; it was simply the most obvious one, but I bet one that is better in every respect is also possible. It did occur to me that bloat might be a sticky point.

> IOW, I think it's fine as-is. I'd certainly wish to see many more
> than one complainant before we expend effort in this area.

I've seen way more than one complaint, and I'm quite sure there are thousands of man-hours (or more) spent by people who don't even know to complain about such atrocious performance (or maybe it's so bad that most people run a web search and find out, probably being left really annoyed from having to yak-shave as a result).

In spite of how familiar I am with Postgres and its mailing lists, I have glossed over this for a long time, just thinking "wow, that really sucks" and only now -- by serendipity of having skimmed this post -- have seen fit to complain on behalf of quite a few rounds of dispensing workaround advice to other people. It's only when this was brought to the fore of my mind that I stopped to consider how much wasted time I've seen in people trying to figure this out over and over again (granted, they tend to remember after the first time).

Perhaps a doc fix is all we need (TRUNCATE is constant-time on large tables, but can be very slow compared to DELETE on small tables), but I completely and enthusiastically reject any notion from people calling this "contrived" or an "edge case," because people writing software against PostgreSQL that have unit tests have this use case constantly, often dozens or even hundreds of times a day. What I don't know is how many people figure out that they should use DELETE instead, and after how long. Even though the teams I work with are very familiar with many of the finer points of Postgres, doing some probing for the first time took a little while.

If we're going to live with it, I contend that we should own it as a real and substantial weakness for development productivity, and not sweep it under the rug as some "contrived" or "corner" case.

--
fdr
On 07/12/2012 02:10 AM, Matthew Woodcraft wrote:
> I think a documentation change would be worthwhile. At the moment the
> TRUNCATE page says, with no caveats, that it is faster than
> unqualified DELETE.

+1 to updating the docs to reflect the fact that TRUNCATE may have a higher fixed cost than "DELETE FROM table;", but also prevents bloat.

It's a weird little corner case, but with database-backed unit testing it's going to become a more significant one whether or not it feels like it makes any sense.

--
Craig Ringer
On 07/11/2012 01:22 PM, Daniel Farina wrote:
> On Tue, Jul 10, 2012 at 5:37 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> Hi
>>
>> After seeing a few discussions here and on Stack Overflow I've put together
>> a quick explanation of why "DELETE FROM table;" may be faster than "TRUNCATE
>> table" for people doing unit testing on lots of tiny tables, people who're
>> doing this so often they care how long it takes.
>>
>> I'd love it if a few folks who know the guts were to take a look and verify
>> its correctness:
>
> I haven't said this before, but think it every time someone asks me
> about this, so I'll say it now:
>
> This is a papercut that should be solved with improved mechanics.
> TRUNCATE should simply be very nearly the fastest way to remove data
> from a table while retaining its type information, and if that means
> doing DELETE without triggers when the table is small, then it should.
> The only person who could thwart me is someone who badly wants their
> 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
> of catalog anyway.
>
> Does that sound reasonable? As in, would anyone object if TRUNCATE
> learned this behavior?

Yep, I'd object. It's more complicated and less predictable. Also, as I strongly and repeatedly highlighted in my post, DELETE FROM table; does a different job to TRUNCATE. You'd at minimum need the effect of DELETE followed by a VACUUM on the table and its indexes to be acceptable and avoid the risk of rapid table + index bloat - and that'd be lots slower than a TRUNCATE. You could be clever and lock the table then DELETE and set xmax at the same time, I guess, but I suspect that'd be a bit of work and still wouldn't take care of the indexes.

It's also too complicated, not least because AFAIK utility commands and CRUD commands go through very different paths in PostgreSQL.

I guess you could propose and post a prototype patch for a new command that tried to empty the table via whatever method it thought would be fastest. Such a new command wouldn't be bound by the accepted and expected rules followed by TRUNCATE, so it could vary its behaviour based on the table, doing a real truncate on big tables and a delete-then-vacuum on small tables. I suspect you'd land up writing the fairly complicated code for the potentially multi-table delete-and-vacuum yourself.

Honestly, though, it might be much better to start with "how can TRUNCATE of empty or near-empty tables be made faster?" and start examining where the time goes.

--
Craig Ringer
On 07/12/2012 06:51 AM, Daniel Farina wrote:
> 15x slower. This is a Macbook Air with full disk encryption and SSD
> disk with fsync off, e.g. a very typical developer configuration.

Don't use full disk encryption for throwaway test data if you care about how long those tests take. It's a lot like tuning the engine in your car while ignoring the fact that the handbrake is jammed on and you're dragging a parachute. Use a ramdisk or un-encrypted partition, something that doesn't take three weeks to fsync().

That said, this performance gap makes me wonder if TRUNCATE is forcing metadata synchronisation even with fsync=off, causing the glacially slow disk access of your average FDE system to kick in, possibly even once per table or even once per file (index, table, toast, etc).

If so, it may be worth:

- Allowing TRUNCATE to skip synchronization when fsync=off. Pg is already allowed to eat all your data if it feels like it in this configuration, so there's no point flushing filesystem metadata to make sure files are really swapped.

- When fsync=on, trying to flush all changes to all files out at once rather than once per file as it could be doing (haven't checked) right now. How to do this without also flushing all other pending I/O on the whole system (with a global "sync()") would be somewhat OS/filesystem dependent, unfortunately.

You could help progress this issue constructively by doing some profiling on your system, tracing Pg's system calls, and determining what exactly it's doing with DELETE vs TRUNCATE and where the time goes. On Linux you'd use OProfile for this and on Solaris you'd use DTrace. Dunno what facilities Mac OS X has, but there must be something similar.

Once you've determined why it's slow, you have a useful starting point for making it faster, first for test systems with fsync=off and then, once that's tracked down, maybe for robust systems with fsync=on.

> I've seen way more than one complaint, and I'm quite sure there are
> thousands of man hours (or more) spent on people who don't even know
> to complain about such atrocious performance (or maybe it's so bad
> that most people run a web search and find out, probably being left
> really annoyed from having to yak shave as a result).

I suspect you're right - as DB-based unit testing becomes more commonplace this is turning up a lot more. As DB unit tests were first really popular in the ruby/rails crowd, they've probably seen the most pain, but as someone who doesn't move in those circles I wouldn't have known. They certainly don't seem to have been making noise about it here, and I've only recently seen some SO questions about it.

> Perhaps a doc fix is all we need (TRUNCATE is constant-time on large
> tables, but can be very slow compared to DELETE on small tables), but
> I completely and enthusiastically reject any notion from people
> calling this "contrived" or an "edge case," because people writing
> software against PostgreSQL that have unit tests have this use case
> constantly, often dozens or even hundreds of times a day.

I have to agree with this - it may have been an edge case in the past, but it's becoming mainstream and is worth being aware of.

That said, the group of people who care about this most are not well represented as active contributors to PostgreSQL. I'd love it if you could help start to change that by stepping in and taking a little time to profile exactly what's going on with your system so we can learn what, exactly, is slow.

--
Craig Ringer
On Wed, Jul 11, 2012 at 6:41 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 07/12/2012 06:51 AM, Daniel Farina wrote:
>>
>> 15x slower. This is a Macbook Air with full disk encryption and SSD
>> disk with fsync off, e.g. a very typical developer configuration.
>
> Don't use full disk encryption for throwaway test data if you care about how
> long those tests take. It's a lot like tuning the engine in your car while
> ignoring the fact that the handbrake is jammed on and you're dragging a
> parachute. Use a ramdisk or un-encrypted partition, something that doesn't
> take three weeks to fsync().

No. Full disk encryption is not that slow. And as we see, there is a workaround that works "just fine" (maybe it could be faster, who knows?) in this exact configuration. The greater problem is more likely to be HFS+, the file system.

If someone produces and gets adoption of a wonderfully packaged test configuration of Postgres using a ram-based block device that somehow has a good user experience living alongside the persistent version, this problem can go away completely. In fact, that would be *phenomenal*, because so many things could be so much faster. But that's surprisingly challenging: for example, last I checked, Postgres.app, principally written by one of my colleagues, does *not* disable fsync, because we don't know of a great way to communicate the relaxed expectations of durability, even though Postgres.app is targeted towards developers: for example, it does not run until you log in, so it's more like a foreground application. Maybe if the connection had an option that said "x-test=true", or something...deposit your idea here.

Until then, this is at the level of an is-ought problem: there is no immediate to even moderately distant future where people are not going to click the full disk encryption button their OS vendor gives them (nor should they *not* click it: people love to download bits of data from production to their local machine to figure out problems, and I think the world is a better place for it), and people are going to use HFS+ in large numbers. So talking about how many people "just" ought to reconfigure is tantamount to blaming the victim, especially when we have a sound and workable workaround in hand to at least prove definitively that the problem is not intractable.

> That said, this performance gap makes me wonder if TRUNCATE is forcing
> metadata synchronisation even with fsync=off, causing the incredibly
> glacially awesomely slow disk access of your average FDE system to kick in,
> possibly even once per table or even once per file (index, table, toast,
> etc).

Lousy file system is my guess. HFS is not that great. I bet ext3 would be a reasonable model of this amount of pain as well.

> You could help progress this issue constructively by doing some profiling on
> your system, tracing Pg's system calls, and determining what exactly it's
> doing with DELETE vs TRUNCATE and where the time goes. On Linux you'd use
> OProfile for this and on Solaris you'd use DTrace. Dunno what facilities Mac
> OS X has but there must be something similar.

I'm sure I could, but first I want to put to complete rest the notion that this is an "edge case." It's only an edge case if the only database you have runs in production. An understanding by more people that this is a problem of at least moderate impact is a good first step. I'll ask some of my more Macintosh-adept colleagues for advice.

>> I've seen way more than one complaint, and I'm quite sure there are
>> thousands of man hours (or more) spent on people who don't even know
>> to complain about such atrocious performance (or maybe it's so bad
>> that most people run a web search and find out, probably being left
>> really annoyed from having to yak shave as a result).
>
> I suspect you're right - as DB based unit testing becomes more commonplace
> this is turning up a lot more. As DB unit tests were first really popular in
> the ruby/rails crowd they've probably seen the most pain, but as someone who
> doesn't move in those circles I wouldn't have known. They certainly don't
> seem to have been making noise about it here, and I've only recently seen
> some SO questions about it.

Well, here's another anecdotal data point to show how this can sneak under the radar: because this was a topic of discussion in the office today, a colleague in the Department of Data discovered his 1.5-minute testing cycle could be cut to thirty seconds. We conservatively estimate he runs the tests 30 times a day when working on his project, and probably more. Multiply that over a few weeks (not even counting the cost of more broken concentration) and we're talking a real loss of productivity and satisfaction.

Here's an example of a person who works on a Postgres-oriented project at his day job, has multi-year experience with it, and can write detailed articles like these: https://devcenter.heroku.com/articles/postgresql-concurrency . If he didn't know to get this right without having it called out as a caveat, what number of people have but the most slim chance? Our best asset is probably the relative obscurity of TRUNCATE vs. DELETE for those who are less familiar with the system. I'm sure he would have found it eventually when starting to profile his tests when they hit the 3-4 minute mark, although he might just as easily have said "well, TRUNCATE, that's the fast one...nothing to do there...".

> That said, the group of people who care about this most are not well
> represented as active contributors to PostgreSQL. I'd love it if you could
> help start to change that by stepping in and taking a little time to profile
> exactly what's going on with your system so we can learn what, exactly, is
> slow.

It's not my platform of choice, per se, but on my Ubuntu Precise on ext4 with fsync off and no disk encryption:

$ rake
55.37user 2.36system 1:15.33elapsed 76%CPU (0avgtext+0avgdata 543120maxresident)k
0inputs+2728outputs (0major+85691minor)pagefaults 0swaps

$ rake
53.85user 1.97system 2:04.38elapsed 44%CPU (0avgtext+0avgdata 547904maxresident)k
0inputs+2640outputs (0major+100226minor)pagefaults 0swaps

Which is a not-as-pathetic slowdown, but still pretty substantial, being somewhat shy of 2x. I'll ask around for someone who is Macintosh-OS-inclined (not as a user, but as a developer) about a good way to get a profile.

--
fdr
On 07/12/2012 02:12 PM, Daniel Farina wrote:
> On Wed, Jul 11, 2012 at 6:41 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
>> On 07/12/2012 06:51 AM, Daniel Farina wrote:
>>> 15x slower. This is a Macbook Air with full disk encryption and SSD
>>> disk with fsync off, e.g. a very typical developer configuration.
>>
>> Don't use full disk encryption for throwaway test data if you care about how
>> long those tests take. It's a lot like tuning the engine in your car while
>> ignoring the fact that the handbrake is jammed on and you're dragging a
>> parachute. Use a ramdisk or un-encrypted partition, something that doesn't
>> take three weeks to fsync().
>
> No. Full disk encryption is not that slow. And as we see, there is a
> workaround that works "just fine" (maybe it could be faster, who
> knows?) in this exact configuration. The greater problem is more
> likely to be HFS+, the file system.

The two are somewhat hand in hand in any case. "Three weeks" is of course hyperbole. Nonetheless, I haven't seen a full disk encryption system that doesn't dramatically slow down synchronous operations by forcing a lot more work to be done than would be the case without disk encryption. Perhaps the Mac OS X / HFS+ solution is an exception to this, but I doubt it.

Given a small program that repeats the following sequence:

- creates a file
- writes a few bytes to it
- fsync()s and closes it
- deletes it
- fsync()s the directory to ensure the metadata change is flushed

... and times it, it'd be interesting to do test runs with and without encryption on HFS+.

> But that's surprisingly challenging: for example, last I checked,
> Postgres.app, principally written by one of my colleagues, does *not*
> disable fsync because we don't know of a great way to communicate the
> relaxed expectations of durability, even though Postgres.app is
> targeted towards developers

I think this is an issue of developer and user responsibility.
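The timing loop described above (create, write, fsync, close, unlink, fsync the directory) can be sketched in portable Python; the directory path is whatever filesystem you want to measure, and the iteration count is arbitrary:

```python
import os
import tempfile
import time

def fsync_loop(directory, iterations=20):
    """Time the create/write/fsync/unlink/dir-fsync cycle on a filesystem."""
    dir_fd = os.open(directory, os.O_RDONLY)
    start = time.perf_counter()
    for i in range(iterations):
        path = os.path.join(directory, "probe_%d" % i)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        os.write(fd, b"x")
        os.fsync(fd)        # flush the file data
        os.close(fd)
        os.unlink(path)
        os.fsync(dir_fd)    # flush the directory metadata change
    os.close(dir_fd)
    return time.perf_counter() - start

print("%.4f s" % fsync_loop(tempfile.mkdtemp()))
```

Running it once on an encrypted and once on an unencrypted volume would give a rough measure of how much the FDE layer penalizes metadata synchronisation.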
Proper test/dev separation from production, and a bit of thought, is all it takes. After all, Pg can't stop you running your unit tests (full of all those slow TRUNCATEs) against your production database, either. Durability isn't worth a damn if you just deleted all your data.

About the only technical aid I can see for this would be some kind of GUC that the app could proactively check against. Set it to "production" for your production DB, and "test" for your throwaways. If the unit tests see "production" they refuse to run; if the app proper sees "test" it warns about data durability. Have it default to unset or "test" so admins must explicitly set it to "production".

Handily, this is already possible. You can add whatever custom GUCs you want. If you want to make your unit tests require that a GUC called "stage.is_production" be off in order to run, just add to postgresql.conf:

  custom_variable_classes = 'stage'
  stage.is_production = off

Now you can see the new GUC:

  regress=# SHOW stage.is_production;
   stage.is_production
  ---------------------
   off
  (1 row)

... so your unit tests and app can check for it. Since you're producing custom installers, this is something you can bundle as part of the generated postgresql.conf for easy differentiation between test and production DBs.

If requirements like this were integrated into common unit testing frameworks, some of these worries would go away. That's not something Pg can make happen, though. How would you want to see it work? How would you solve this problem?
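A minimal sketch of the guard described above, from the client side; `fetch_guc` is a stand-in for issuing `SHOW stage.is_production` over a real connection, and the GUC name comes from the example in the thread:

```python
def assert_not_production(fetch_guc):
    """Refuse to run destructive tests if the stage GUC marks this DB as production.

    fetch_guc is any callable that returns the value of a GUC by name,
    e.g. a wrapper around SHOW on a live connection.
    """
    value = fetch_guc("stage.is_production")
    if value not in ("off", "false", None):
        raise RuntimeError("refusing to run destructive tests against production")

# Simulated checks (a real harness would pass a DB-backed fetcher):
assert_not_production(lambda name: "off")   # test DB: proceeds silently
try:
    assert_not_production(lambda name: "on")
except RuntimeError as e:
    print("guarded:", e)
```

The same fetcher could also drive the inverse warning, where the app proper complains if it finds itself on a "test" database.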
> Until then, this is at the level of an is-ought problem: there is no
> immediate to even moderately distant future where people are not going
> to click the full disk encryption button their OS vendor gives them
> (nor should they *not* click that: people love to download bits of data
> from production to their local machine to figure out problems, and I
> think the world is a better place for it), and people are going to use
> HFS+ in large numbers, so talking about how many people "just" ought to
> reconfigure is tantamount to blaming the victim, especially when we
> have a sound and workable workaround in hand to at least prove
> definitively that the problem is not intractable.

Yes, people do work on production data in test envs, and FDE is overall a plus. I'd rather they not turn it off - and I'd rather they not have to. That's why I suggested using a ramdisk as an alternative; it's completely non-durable and just gets tossed out, so there's no more worry about data leakage than there is for the disk cache buffered in RAM, or the mounted disks of an FDE machine while it's unlocked.

Setting up Pg to run off a ramdisk isn't a one-click trivial operation, and it sounds like the group you're mainly interested in is the database-as-a-utility crowd who prefer not to see, think about, or touch the database directly - hence Postgres.app etc. If so, this is much more of a packaging problem than a core Pg problem.

I take your point about needing to be able to indicate lack of durability to clients, but I think that's relatively easily done with a custom GUC as shown above.

Of course, Pg on a ramdisk has other issues that quickly become apparent when you "COPY" that 2GB CSV file into your DB...

> Lousy file system is my guess. HFS is not that great. I bet ext3 would
> be a reasonable model of this amount of pain as well.

Hey, HFS+ Journaled/Extended, which is all that you're ever likely to see, is merely bad :-P

The original HFS, now that was a monster.
Not-so-fond memories of regular Norton Tools defrag runs resurfacing from my Mac OS 7 days...

> I'm sure I could, but first I want to put to complete rest the notion
> that this is an "edge case." It's only an edge case if the only
> database you have runs in production. An understanding by more people
> that this is a problem of at least moderate impact is a good first
> step. I'll ask some of my more Macintosh-adept colleagues for advice.

That'd be great; as this is an issue having real-world impact, people with Mac equipment and knowledge need to get involved in helping to solve it. It's not confined to Mac, but it seems to be worse there.

The other way you could help would be by providing canned, self-contained test cases that can be used to demonstrate the big performance gaps you're reporting, and to test them on other platforms / OSes / file systems. Something with an "I've never used Ruby" quickstart.

> Here's an example of a person that works on a Postgres-oriented
> project at his day job, has multi-year experience with it, and can
> write detailed articles like these:
> https://devcenter.heroku.com/articles/postgresql-concurrency . If he
> didn't know to get this right without having it called out as a
> caveat, what number of people have but the most slim chance? Our best
> asset is probably the relative obscurity of TRUNCATE vs. DELETE for
> those who are less familiar with the system.

Yep. This whole issue was new to me until last week too. I run tests against my DB, but it's fast enough here. In any case, for my tests other costs are greatly more significant than a few fractions of a second difference in one DB operation. Clearly that's not the case for some DB unit testing designs.

Other than ruby/rails/rake, what other systems are you aware of that're affected by these issues? I'm not dismissing Ruby, I just want to know if you know of other groups or techs that're ALSO affected.
> Which is a not-as-pathetic slowdown, but still pretty substantial,
> being somewhat shy of 2x. I'll ask around for someone who is
> Macintosh-OS-inclined (not as a user, but as a developer) about a good
> way to get a profile.

That'd be great. Get them onto the list and involved, because if you want to see this improved it's going to take some back and forth, and someone who can interpret the profile results, test changes, etc. I only have a limited ability and willingness to drive this forward; I have to focus on other things. You'll need to be willing to be proactive and push this a bit.

Figuring out what part of truncation is taking the time would be a big plus, as would determining how much worse FDE makes it vs an unencrypted disk. Hopefully others are interested and following along too.

--
Craig Ringer
On Wed, Jul 11, 2012 at 3:51 PM, Daniel Farina <daniel@heroku.com> wrote:
> Nope. I don't. But an exact crossover is a level of precision I don't
> really need, because here are where things stand on a completely
> unremarkable test suite on the closest project to me that meets the
> "regular web-app" profile case:
>
> With en-masse DELETE:
> rake 41.89s user 3.08s system 76% cpu 58.629 total
>
> With TRUNCATE:
> rake 49.86s user 2.93s system 5% cpu 15:17.88 total
>
> 15x slower. This is a Macbook Air with full disk encryption and SSD
> disk with fsync off, e.g. a very typical developer configuration.

What is shared_buffers?

> This is a rather small schema -- probably a half a dozen tables, and
> probably about a dozen indexes. This application is entirely
> unremarkable in its test-database workload: it wants to load a few
> records, do a few things, and then clear those handful of records.

How many rounds of truncation does one rake do? I.e. how many truncations are occurring over the course of that 1 minute or 15 minutes?

Cheers,
Jeff
Hi,
I work with Daniel Farina and was the other engineer who "discovered" this, once again. That is, I got bit by it and have been running TRUNCATE on my test suites for years.
On Thursday, July 12, 2012 at 12:15 PM, Jeff Janes wrote:
> On Wed, Jul 11, 2012 at 3:51 PM, Daniel Farina <daniel@heroku.com> wrote:
>> Nope. I don't. But an exact crossover is a level of precision I don't
>> really need, because here are where things stand on a completely
>> unremarkable test suite on the closest project to me that meets the
>> "regular web-app" profile case:
>>
>> With en-masse DELETE:
>> rake 41.89s user 3.08s system 76% cpu 58.629 total
>>
>> With TRUNCATE:
>> rake 49.86s user 2.93s system 5% cpu 15:17.88 total
>>
>> 15x slower. This is a Macbook Air with full disk encryption and SSD
>> disk with fsync off, e.g. a very typical developer configuration.
>
> What is shared_buffers?
1600kB
Not sure this will make much difference with such small data, but of course I could be dead wrong here.
>> This is a rather small schema -- probably a half a dozen tables, and
>> probably about a dozen indexes. This application is entirely
>> unremarkable in its test-database workload: it wants to load a few
>> records, do a few things, and then clear those handful of records.
>
> How many rounds of truncation does one rake do? I.e. how many
> truncations are occurring over the course of that 1 minute or 15
> minutes?
All tables are cleared out after every test. On this particular project, I'm running 200+ tests in 1.5 minutes (or 30 seconds with DELETE instead of TRUNCATE). For another, bigger project it's running 1700+ tests in about a minute. You can do the math from there.
I'd say this is not atypical at all, so I too encourage teaching TRUNCATE about small tables and optimizing for that, as well as a section in the docs about postgres tweaks for test suites. I'm sure many people have done independent research in this area, and it'd be great to have it documented in one place.
-Harold
On Thu, Jul 12, 2012 at 4:21 PM, Harold A. Giménez <harold.gimenez@gmail.com> wrote:
>> What is shared_buffers?
>
> 1600kB

That is really small, so the buffer flushing should not be a problem. Unless you mean 1600MB.

>>> This is a rather small schema -- probably a half a dozen tables, and
>>> probably about a dozen indexes. This application is entirely
>>> unremarkable in its test-database workload: it wants to load a few
>>> records, do a few things, and then clear those handful of records.
>>
>> How many rounds of truncation does one rake do? I.e. how many
>> truncations are occurring over the course of that 1 minute or 15
>> minutes?
>
> All tables are cleared out after every test. On this particular project,
> I'm running 200+ tests in 1.5 minutes (or 30 seconds with DELETE instead
> of TRUNCATE). For another, bigger project it's running 1700+ tests in
> about a minute. You can do the math from there.

So 1700 rounds * 18 relations = 30,600 truncates per minute. That is actually faster than I can get truncates to go when I am purely limited by CPU.

I think the problem is in the fsync absorption queue. Every truncate adds a FORGET_RELATION_FSYNC to the queue, and processing each one of those leads to sequentially scanning the checkpointer's pending-ops hash table, which is quite large. It is almost entirely full of other requests which have already been cancelled, but it still has to dig through them all. So this is essentially an N^2 operation.

I'm not sure why we don't just delete the entry instead of marking it as cancelled. It looks like the only problem is that you can't delete an entry other than the one just returned by hash_seq_search. Which would be fine, as that is the entry that we would want to delete; except that mdsync might have a different hash_seq_search open, and so it wouldn't be safe to delete.
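To make the shape of the problem concrete, here is a toy model of the two strategies, in Python rather than the actual C (the names and data structures are invented for illustration and are not the checkpointer's real ones). Marking entries cancelled leaves tombstones that every later forget-request must scan past, so N truncations touch O(N^2) entries; deleting by key lookup touches O(N):

```python
def absorb_with_cancel(num_truncates):
    """Model of the current behavior: each forget-request scans the whole
    pending-ops list and marks matching entries cancelled, leaving the
    cancelled entries behind as tombstones.  Returns entries scanned."""
    pending = []       # each entry: {'rel': n, 'cancelled': bool}
    scanned = 0
    for rel in range(num_truncates):
        pending.append({'rel': rel, 'cancelled': False})
        for entry in pending:          # full scan, tombstones included
            scanned += 1
            if entry['rel'] == rel:
                entry['cancelled'] = True
    return scanned

def absorb_with_delete(num_truncates):
    """Model of the proposed behavior: each forget-request removes its
    entry with a single hash lookup.  Returns entries scanned."""
    pending = {}
    scanned = 0
    for rel in range(num_truncates):
        pending[rel] = True
        scanned += 1                   # one O(1) lookup
        del pending[rel]
    return scanned
```

With 1,000 truncations in a checkpoint cycle the cancel model scans 500,500 entries while the delete model scans 1,000, which matches the quadratic blowup described above.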
If the segno was taken out of the hash key and handled some other way, then the forgetting could be done with a simple hash lookup rather than a full scan.

Maybe we could just turn off the pending-ops table altogether when fsync=off, but since fsync is PGC_SIGHUP it is not clear how you could safely turn it back on.

Cheers,
Jeff
Subject: Proposed change for 9.3(?): Require full restart to change fsync parameter, not just pg_ctl reload
From: Craig Ringer
Hi all

Some performance improvements have been proposed - probably for 9.3 - that will mean the `fsync` GUC can only be changed with a full cluster restart. See quoted, at end of message.

It is currently possible to change `fsync` by altering postgresql.conf and issuing a `pg_ctl reload`. It is not clear how safe this really is even now, and changes proposed to reduce the amount of expensive bookkeeping done when fsync is set to 'off' will make it even less safe. Consequently, it is proposed that the ability to change the fsync setting while Pg is running be removed.

fsync=off is very unsafe anyway, and these days production setups are able to get similar results with async commits and group commit.

Is there anyone here relying on being able to change fsync=off to fsync=on at runtime? If so, what for, and what does it gain you over use of group/async commit?

For related discussion see the -hackers thread:

"DELETE vs TRUNCATE explanation"
http://archives.postgresql.org/message-id/CAMkU=1yLXvODRZZ_=fgrEeJfk2tvZPTTD-8n8BwrAhNz_WBT0A@mail.gmail.com

and the background threads:

"PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones."
http://archives.postgresql.org/message-id/CAFXpGYbgmZYij4TgCbOF24-usoiDD0ASQeaVAkYtB7E2TYm8Wg@mail.gmail.com

"DELETE vs TRUNCATE explanation"
http://archives.postgresql.org/message-id/4FFCCAC4.4030503@ringerc.id.au

On 07/16/2012 09:37 AM, Tom Lane wrote:
> Craig Ringer <ringerc@ringerc.id.au> writes:
>> On 07/16/2012 02:29 AM, Tom Lane wrote:
>>> Yeah, you have a point there. It's not real clear that switching fsync
>>> from off to on is an operation that we can make any guarantees about,
>>> short of executing something like the code recently added to initdb
>>> to force-sync the entire PGDATA tree.
>
>> There's one way that doesn't have any housekeeping cost to Pg. It's
>> pretty bad manners if there's anybody other than Pg on the system though:
>> sync()
>
> Yeah, I thought about that: if we could document that issuing a manual
> sync after turning fsync on leaves you in a guaranteed-good state once
> the sync is complete, it'd probably be fine. However, I'm not convinced
> that we could promise that with a straight face. In the first place,
> PG has only very weak guarantees about how quickly all processes in the
> system will absorb a GUC update. In the second place, I'm not entirely
> sure that there aren't race conditions around checkpoints and the fsync
> request queue (particularly if we do what Jeff is suggesting and
> suppress queuing requests at the upstream end). It might be all right,
> or it might be all right after expending some work, but the whole thing
> is not an area where I think anyone wants to spend time. I think it'd
> be much safer to document that the correct procedure is "stop the
> database, do a manual sync, enable fsync in postgresql.conf, restart the
> database". And if that's what we're documenting, we lose little or
> nothing by marking fsync as PGC_POSTMASTER.
>
> regards, tom lane
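For anyone scripting the "stop, manual sync, enable fsync, restart" procedure Tom describes, the steps might be sketched like this (a hypothetical Python sketch, not a documented tool; the pg_ctl invocation details and the naive conf rewrite are my own assumptions):

```python
import os
import subprocess

def rewrite_fsync(conf_text, value="on"):
    """Flip the fsync line in a postgresql.conf text (naive line match)."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.strip().startswith("fsync"):
            out.append("fsync = %s\n" % value)
        else:
            out.append(line)
    return "".join(out)

def enable_fsync_safely(pgdata):
    """Stop the cluster, sync the OS, enable fsync, restart (sketch only)."""
    subprocess.run(["pg_ctl", "-D", pgdata, "stop", "-m", "fast"], check=True)
    os.sync()  # push everything written while fsync was off out to disk
    conf = os.path.join(pgdata, "postgresql.conf")
    with open(conf) as f:
        text = f.read()
    with open(conf, "w") as f:
        f.write(rewrite_fsync(text, "on"))
    subprocess.run(["pg_ctl", "-D", pgdata, "start"], check=True)
```

The point of doing the sync() while the cluster is stopped is exactly the race Tom raises: with the server down there are no checkpoints or fsync-queue operations in flight to race against.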
On Thu, Jul 12, 2012 at 4:21 PM, Harold A. Giménez <harold.gimenez@gmail.com> wrote:
> Hi,
>
> I work with Daniel Farina and was the other engineer who "discovered" this,
> once again. That is, I got bit by it and have been running TRUNCATE on my
> test suites for years.

Hi Daniel and Harold,

I don't know if you followed this thread over into the -hackers mailing list. There was some bookkeeping code that was N^2 in the number of truncations performed during any given checkpoint cycle. That has been fixed in 9.2Beta3. I suspect that this was the root cause of the problem you encountered.

If you are in a position to retest using 9.2Beta3 (http://www.postgresql.org/about/news/1405/), I'd be interested to know if it does make truncations comparable in speed to unqualified deletes.

Thanks,
Jeff