Thread: Re: dsa_allocate() failure
Moving to -hackers, hopefully it doesn't confuse the list scripts too much.

On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> I see the error showing up every night on 2 different servers. But it's a
> bit of a heisenbug because If I go there now it won't be reproducible.

Do you have query logging enabled ? If not, could you consider it on at least
one of those servers ? I'm interested to know what ELSE is running at the time
that query failed.

Perhaps you could enable query logging JUST for the interval of time that the
server usually errors ? The CSV logs can be imported to postgres for analysis.
You might do something like:
SELECT left(message,99), COUNT(1), max(session_id) FROM postgres_log WHERE log_time BETWEEN .. AND .. GROUP BY 1 ORDER BY 2;
And just maybe there'd be a query there that only runs once per day which would
allow reproducing the error at will. Or utility command like vacuum..

I think ideally you'd set:

log_statement = all
log_min_messages = info
log_destination = 'stderr,csvlog'
# stderr isn't important for this purpose, but I keep it set to capture crash messages, too

You should set these to something that works well at your site:

log_rotation_age = '2min'
log_rotation_size = '32MB'

I would normally set these, and I don't see any reason why you wouldn't set
them too:

log_checkpoints = on
log_lock_waits = on
log_temp_files = on
log_min_error_statement = notice
log_temp_files = 0
log_min_duration_statement = '9sec'
log_autovacuum_min_duration = '999sec'

And I would set these too but maybe you'd prefer to do something else:

log_directory = /var/log/postgresql
log_file_mode = 0640
log_filename = postgresql-%Y-%m-%d_%H%M%S.log

Justin
> Do you have query logging enabled ? If not, could you consider it on at least
> one of those servers ? I'm interested to know what ELSE is running at the time
> that query failed.
Ok, I have configured that and will enable in the time window when the errors usually occur. I'll report as soon as I have something.
--
regards,
pozdrawiam,
Jakub Glapa
On Thu, Feb 7, 2019 at 12:21 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
Moving to -hackers, hopefully it doesn't confuse the list scripts too much.
On Mon, Feb 04, 2019 at 08:52:17AM +0100, Jakub Glapa wrote:
> I see the error showing up every night on 2 different servers. But it's a
> bit of a heisenbug because If I go there now it won't be reproducible.
Do you have query logging enabled ? If not, could you consider it on at least
one of those servers ? I'm interested to know what ELSE is running at the time
that query failed.
Perhaps you could enable query logging JUST for the interval of time that the
server usually errors ? The CSV logs can be imported to postgres for analysis.
You might do something like SELECT left(message,99),COUNT(1),max(session_id) FROM postgres_log WHERE log_time BETWEEN .. AND .. GROUP BY 1 ORDER BY 2;
And just maybe there'd be a query there that only runs once per day which would
allow reproducing the error at will. Or utility command like vacuum..
I think ideally you'd set:
log_statement = all
log_min_messages = info
log_destination = 'stderr,csvlog'
# stderr isn't important for this purpose, but I keep it set to capture crash messages, too
You should set these to something that works well at your site:
log_rotation_age = '2min'
log_rotation_size = '32MB'
I would normally set these, and I don't see any reason why you wouldn't set
them too:
log_checkpoints = on
log_lock_waits = on
log_temp_files = on
log_min_error_statement = notice
log_temp_files = 0
log_min_duration_statement = '9sec'
log_autovacuum_min_duration = '999sec'
And I would set these too but maybe you'd prefer to do something else:
log_directory = /var/log/postgresql
log_file_mode = 0640
log_filename = postgresql-%Y-%m-%d_%H%M%S.log
Justin
On Thu, Feb 7, 2019 at 9:10 PM Jakub Glapa <jakub.glapa@gmail.com> wrote:
> > Do you have query logging enabled ? If not, could you consider it on at least
> > one of those servers ? I'm interested to know what ELSE is running at the time
> > that query failed.
>
> Ok, I have configured that and will enable in the time window when the errors
> usually occur. I'll report as soon as I have something.

I don't have the answer yet but I have some progress: I finally reproduced the
"could not find %d free pages" error by running lots of concurrent parallel
queries.  Will investigate.

Set up:

  create table foo (p int, a int, b int) partition by list (p);
  create table foo_1 partition of foo for values in (1);
  create table foo_2 partition of foo for values in (2);
  create table foo_3 partition of foo for values in (3);
  alter table foo_1 set (parallel_workers = 4);
  alter table foo_2 set (parallel_workers = 4);
  alter table foo_3 set (parallel_workers = 4);
  insert into foo
  select generate_series(1, 10000000)::int % 3 + 1,
         generate_series(1, 10000000)::int % 50,
         generate_series(1, 10000000)::int % 50;
  create index on foo_1(a);
  create index on foo_2(a);
  create index on foo_3(a);
  create index on foo_1(b);
  create index on foo_2(b);
  create index on foo_3(b);
  analyze;

Then I ran three copies of:

  #!/bin/sh
  (
    echo "set max_parallel_workers_per_gather = 4;"
    for I in `seq 1 100000`; do
      echo "explain analyze select count(*) from foo where a between 5 and 6 or b between 5 and 6;"
    done
  ) | psql postgres

--
Thomas Munro
http://www.enterprisedb.com
On Fri, Feb 8, 2019 at 4:49 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> I don't have the answer yet but I have some progress: I finally
> reproduced the "could not find %d free pages" error by running lots of
> concurrent parallel queries.  Will investigate.

Sometimes FreePageManagerPutInternal() returns a
number-of-contiguous-pages-created-by-this-insertion that is too large by one.
If this happens to be a new max-number-of-contiguous-pages, it causes trouble
some arbitrary time later because the max is wrong and this FPM cannot satisfy
a request that large, and it may not be recomputed for some time because the
incorrect value prevents recomputation.  Not sure yet if this is due to the
lazy computation logic or a plain old fence-post error in the btree
consolidation code or something else.

--
Thomas Munro
http://www.enterprisedb.com
On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Sometimes FreePageManagerPutInternal() returns a
> number-of-contiguous-pages-created-by-this-insertion that is too large
> by one.  If this happens to be a new max-number-of-contiguous-pages,
> it causes trouble some arbitrary time later because the max is wrong
> and this FPM cannot satisfy a request that large, and it may not be
> recomputed for some time because the incorrect value prevents
> recomputation.  Not sure yet if this is due to the lazy computation
> logic or a plain old fence-post error in the btree consolidation code
> or something else.

I spent a long time thinking about this and staring at code this afternoon,
but I didn't really come up with much of anything useful.

It seems like a strange failure mode, because FreePageManagerPutInternal()
normally just returns its third argument unmodified.  The only cases where
anything else happens are the ones where we're able to consolidate the
returned span with a preceding or following span, and I'm scratching my head
as to how that logic could be wrong, especially since it also has some
Assert() statements that seem like they would detect the kinds of
inconsistencies that would lead to trouble.

For example, if we somehow ended up with two spans that (improperly)
overlapped, we'd trip an Assert().  And if that didn't happen -- because we're
not in an Assert-enabled build -- the code is written so that it only relies
on the npages value of the last of the consolidated spans, so an error in the
npages value of one of the earlier spans would just get fixed up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Feb 9, 2019 at 9:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > Sometimes FreePageManagerPutInternal() returns a
> > number-of-contiguous-pages-created-by-this-insertion that is too large
> > by one.  [...]
>
> I spent a long time thinking about this and staring at code this
> afternoon, but I didn't really come up with much of anything useful.
> It seems like a strange failure mode, because
> FreePageManagerPutInternal() normally just returns its third argument
> unmodified.  [...]

Bleugh.  Yeah.  What I said before wasn't quite right.  The value returned by
FreePageManagerPutInternal() is actually correct at the moment it is returned,
but it ceases to be correct immediately afterwards if the following call to
FreePageBtreeCleanup() happens to reduce the size of that particular span.
The problem is that we clobber fpm->contiguous_pages with the earlier (and by
now incorrect) value that we were holding in a local variable.

--
Thomas Munro
http://www.enterprisedb.com
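[For readers without freepage.c at hand, here is a minimal, self-contained
sketch of the sequence described above.  The types and helper names are
invented for illustration only; the real FreePageManagerPutInternal() and
FreePageBtreeCleanup() are far more involved.]

  #include <stdio.h>
  #include <stddef.h>

  typedef struct
  {
      size_t contiguous_pages;    /* cached size of the largest free run */
      size_t actual_largest_run;  /* ground truth, for the demo */
  } demo_fpm;

  /* Like the real put-internal: the value is correct at the moment it returns. */
  static size_t
  demo_put_internal(demo_fpm *fpm, size_t npages)
  {
      fpm->actual_largest_run = npages;
      return npages;
  }

  /* Stands in for the later cleanup step consuming a page from that same run. */
  static void
  demo_cleanup(demo_fpm *fpm)
  {
      fpm->actual_largest_run -= 1;
  }

  int
  main(void)
  {
      demo_fpm fpm = {0, 0};
      size_t   n = demo_put_internal(&fpm, 179);   /* correct: 179 */

      demo_cleanup(&fpm);                  /* the run shrinks to 178 ... */
      if (n > fpm.contiguous_pages)
          fpm.contiguous_pages = n;        /* ... but we cache the stale 179 */

      /* The cached maximum says 179 pages exist, yet they cannot be found. */
      if (fpm.contiguous_pages >= 179 && fpm.actual_largest_run < 179)
          printf("could not find %d free pages\n", 179);
      return 0;
  }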
On Sun, Feb 10, 2019 at 7:24 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Sat, Feb 9, 2019 at 9:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Feb 8, 2019 at 8:00 AM Thomas Munro
> > <thomas.munro@enterprisedb.com> wrote:
> > > Sometimes FreePageManagerPutInternal() returns a
> > > number-of-contiguous-pages-created-by-this-insertion that is too large
> > > by one.  [...]
> >
> > I spent a long time thinking about this and staring at code this
> > afternoon, but I didn't really come up with much of anything useful.
> > It seems like a strange failure mode, because
> > FreePageManagerPutInternal() normally just returns its third argument
> > unmodified.  [...]
>
> Bleugh.  Yeah.  What I said before wasn't quite right.  The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span.

... but why would it do that?  I can reproduce cases where (for example)
FreePageManagerPutInternal() returns 179, and then
FreePageManagerLargestContiguous() returns 179, but then after
FreePageBtreeCleanup() it returns 178.  At that point FreePageManagerDump()
says:

  btree depth 1:
    77@0 l: 27(1) 78(178)
  freelists:
    1: 27
    129: 78(178)

But at first glance it shouldn't be allocating pages, because it just does
consolidation to try to convert to singleton format, and then it does recycle
list cleanup using soft=true so that no allocation of btree pages should
occur.

--
Thomas Munro
http://www.enterprisedb.com
On Sun, Feb 10, 2019 at 1:55 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Bleugh.  Yeah.  What I said before wasn't quite right.  The value
> returned by FreePageManagerPutInternal() is actually correct at the
> moment it is returned, but it ceases to be correct immediately
> afterwards if the following call to FreePageBtreeCleanup() happens to
> reduce the size of that particular span.  The problem is that we
> clobber fpm->contiguous_pages with the earlier (and by now incorrect)
> value that we were holding in a local variable.

Yeah, I had similar bugs to that during the initial development work I did on
freepage.c, and that's why I got rid of some lazy recomputation thing that I
had tried at some point.  The version that got committed brought that back
again, but possibly it's got the same kind of problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> ... but why would it do that?  I can reproduce cases where (for
> example) FreePageManagerPutInternal() returns 179, and then
> FreePageManagerLargestContiguous() returns 179, but then after
> FreePageBtreeCleanup() it returns 178.  At that point FreePageManagerDump()
> says:
>
>   btree depth 1:
>     77@0 l: 27(1) 78(178)
>   freelists:
>     1: 27
>     129: 78(178)
>
> But at first glance it shouldn't be allocating pages, because it just
> does consolidation to try to convert to singleton format, and then it
> does recycle list cleanup using soft=true so that no allocation of
> btree pages should occur.

I think I see what's happening.  At the moment the problem occurs, there is no
btree - there is only a singleton range.  So FreePageManagerPutInternal()
takes the fpm->btree_depth == 0 branch and then ends up in the section with
the comment /* Not contiguous; we need to initialize the btree. */.  And that
section, sadly, does not respect the 'soft' flag, so kaboom.  Something like
the attached might fix it.

Boy, I love FreePageManagerDump!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
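[The attached patch itself is not reproduced in this archive.  Below is only a
toy sketch of the shape of fix described above: make the btree-initialization
path honor the 'soft' flag by giving up instead of allocating.  All names are
invented; the real FreePageManagerPutInternal() is considerably more
involved.]

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdio.h>

  /* Toy model: "initializing the btree" costs one page of free space. */
  typedef struct
  {
      int    btree_depth;
      size_t free_pages;
  } toy_fpm;

  static size_t
  toy_put_internal(toy_fpm *fpm, size_t npages, bool contiguous, bool soft)
  {
      if (fpm->btree_depth == 0)
      {
          if (contiguous)
              return npages;       /* easy case: extend the singleton range */

          /* Not contiguous; we need to initialize the btree. */
          if (soft)
              return 0;            /* the fix: a soft put must not allocate */

          fpm->free_pages -= 1;    /* btree root consumes an internal page */
          fpm->btree_depth = 1;
      }
      return npages;
  }

  int
  main(void)
  {
      toy_fpm fpm = {0, 100};
      size_t  soft_result = toy_put_internal(&fpm, 10, false, true);
      size_t  hard_result;

      /* The soft put reports "nothing consolidated" rather than building a btree. */
      printf("soft put: %zu pages, depth %d, free %zu\n",
             soft_result, fpm.btree_depth, fpm.free_pages);

      /* The hard put is allowed to spend a page to initialize the btree. */
      hard_result = toy_put_internal(&fpm, 10, false, false);
      printf("hard put: %zu pages, depth %d, free %zu\n",
             hard_result, fpm.btree_depth, fpm.free_pages);
      return 0;
  }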
On Sun, Feb 10, 2019 at 12:10:52PM +0530, Robert Haas wrote:
> I think I see what's happening.  At the moment the problem occurs,
> there is no btree - there is only a singleton range.  So
> FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment /* Not contiguous; we
> need to initialize the btree. */.  And that section, sadly, does not
> respect the 'soft' flag, so kaboom.  Something like the attached might
> fix it.

I ran overnight with this patch, but all parallel processes ended up stuck in
the style of bug#15585.  So that's either not the root cause, or there's a 2nd
issue.

https://www.postgresql.org/message-id/flat/15585-324ff6a93a18da46%40postgresql.org

Justin
Hi

> I ran overnight with this patch, but all parallel processes ended up stuck in
> the style of bug#15585.  So that's either not the root cause, or there's a 2nd
> issue.

Maybe I missed something in this discussion, but you can reproduce bug#15585?
How?  With this testcase:
https://www.postgresql.org/message-id/CAEepm%3D1MvOE-Sfv1chudx5KEmw_qHYrj8F9Og_WmdBRhXSQ%2B%2Bw%40mail.gmail.com ?

regards, Sergei
On Sun, Feb 10, 2019 at 07:11:22PM +0300, Sergei Kornilov wrote:
> > I ran overnight with this patch, but all parallel processes ended up stuck in
> > the style of bug#15585.  So that's either not the root cause, or there's a 2nd
> > issue.
>
> Maybe I missed something in this discussion, but you can reproduce bug#15585?
> How?  With this testcase:
> https://www.postgresql.org/message-id/CAEepm%3D1MvOE-Sfv1chudx5KEmw_qHYrj8F9Og_WmdBRhXSQ%2B%2Bw%40mail.gmail.com ?

By running the queued_alters query multiple times in a loop:
https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com

I'm able to trigger dsa "ERROR"s with that query on a newly initdb'd cluster
with only that table.  But I think some servers are more likely to hit it than
others.

I've only tripped on 15585 twice, and only while trying to trigger other DSA
bugs (the working hypothesis is that that bug is a secondary issue which
happens after hitting some other bug).  And not consistently or quickly.

Justin
On Sun, Feb 10, 2019 at 5:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 10, 2019 at 2:37 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > But at first glance it shouldn't be allocating pages, because it just
> > does consolidation to try to convert to singleton format, and then it
> > does recycle list cleanup using soft=true so that no allocation of
> > btree pages should occur.
>
> I think I see what's happening.  At the moment the problem occurs,
> there is no btree - there is only a singleton range.  So
> FreePageManagerPutInternal() takes the fpm->btree_depth == 0 branch and
> then ends up in the section with the comment /* Not contiguous; we
> need to initialize the btree. */.  And that section, sadly, does not
> respect the 'soft' flag, so kaboom.  Something like the attached might
> fix it.

Ouch.  Yeah, that'd do it and matches the evidence.  With this change, I
couldn't reproduce the problem after 90 minutes with a test case that
otherwise hits it within a couple of minutes.

Here's a patch with a commit message explaining the change.

It also removes an obsolete comment, which is in fact related.  The comment
refers to an output parameter internal_pages_used, which must have been used
to report this exact phenomenon in an earlier development version.  But there
is no such parameter in the committed version, and instead there is the soft
flag to prevent internal allocation.  I have no view on which approach is
best, but yeah, if we're using a soft flag, it has to work reliably.

This brings us to a difficult choice: we're about to cut a new release, and
this could in theory be included.  Even though the fix is quite convincing, it
doesn't seem wise to change such complicated code at the last minute, and I
know from an off-list chat that that is also Robert's view.  So I'll wait
until after the release, and we'll have to live with the bug for another 3
months.

Note that this patch addresses the error "dsa_allocate could not find %zu free
pages".  (The error "dsa_area could not attach to segment" is something else
and apparently rarer.)

> Boy, I love FreePageManagerDump!

Yeah.  And I love reproducible bugs.

--
Thomas Munro
http://www.enterprisedb.com
Attachment
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> This brings us to a difficult choice: we're about to cut a new
> release, and this could in theory be included.  Even though the fix is
> quite convincing, it doesn't seem wise to change such complicated code
> at the last minute, and I know from an off-list chat that that is also
> Robert's view.

Yeah ... at this point we're just too close to the release deadline, I'm
afraid, even though the fix *looks* pretty safe.  Not worth the risk given
that this seems to be a low-probability bug.

I observe from

https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html

that the edge cases in this function aren't too well exercised by our
regression tests, meaning that the buildfarm might not prove much either way
about the correctness of this patch.  That is one factor pushing me to think
we shouldn't risk it.  But, taking a longer view, is that something that's
practical to improve?

> So I'll wait until after the release, and we'll have
> to live with the bug for another 3 months.

Check.  Please hold off committing until you see the release tags appear,
probably late Tuesday my time / Wednesday noonish yours.

regards, tom lane
On Mon, Feb 11, 2019 at 09:45:07AM +1100, Thomas Munro wrote:
> Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
> I couldn't reproduce the problem after 90 minutes with a test case
> that otherwise hits it within a couple of minutes.
...
> Note that this patch addresses the error "dsa_allocate could not find
> %zu free pages".  (The error "dsa_area could not attach to segment" is
> something else and apparently rarer.)

"could not attach" is the error reported early this morning while
stress-testing this patch with queued_alters queries in loops, so that's
consistent with your understanding.  And I guess it preceded getting stuck on
a lock; although I don't know how long passed between the first and the
second, I'm guessing not long and perhaps it was immediate, since the rest of
the processes were all stuck as in bug#15585 rather than ERRORing once every
few minutes.

I mentioned that "could not attach to segment" occurs in the leader and/or a
parallel worker.  Most of the time it causes an ERROR only, and doesn't wedge
all future parallel workers.  Maybe the bug#15585 "wedged" state only occurs
after some pattern of leader+worker failures (?)  I've just triggered
bug#15585 again, but if there's a pattern, I don't see it.

Please let me know whether you're able to reproduce the "not attach" bug using
simultaneous loops around the queued_alters query; it's easy here.

Justin
On Mon, Feb 11, 2019 at 11:02 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Mon, Feb 11, 2019 at 09:45:07AM +1100, Thomas Munro wrote:
> > Ouch.  Yeah, that'd do it and matches the evidence.  With this change,
> > I couldn't reproduce the problem after 90 minutes with a test case
> > that otherwise hits it within a couple of minutes.
> ...
> > Note that this patch addresses the error "dsa_allocate could not find
> > %zu free pages".  (The error "dsa_area could not attach to segment" is
> > something else and apparently rarer.)
>
> "could not attach" is the error reported early this morning while
> stress-testing this patch with queued_alters queries in loops, so that's
> consistent with your understanding.  And I guess it preceded getting stuck on
> a lock; although I don't know how long passed between the first and the
> second, I'm guessing not long and perhaps it was immediate, since the rest of
> the processes were all stuck as in bug#15585 rather than ERRORing once every
> few minutes.
>
> I mentioned that "could not attach to segment" occurs in the leader and/or a
> parallel worker.  Most of the time it causes an ERROR only, and doesn't wedge
> all future parallel workers.  Maybe the bug#15585 "wedged" state only occurs
> after some pattern of leader+worker failures (?)  I've just triggered
> bug#15585 again, but if there's a pattern, I don't see it.
>
> Please let me know whether you're able to reproduce the "not attach" bug using
> simultaneous loops around the queued_alters query; it's easy here.

I haven't ever managed to reproduce that one yet.  It's great you have a
reliable repro...  Let's discuss it on the #15585 thread.

--
Thomas Munro
http://www.enterprisedb.com
On Mon, Feb 11, 2019 at 10:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I observe from
>
> https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html
>
> that the edge cases in this function aren't too well exercised by
> our regression tests, meaning that the buildfarm might not prove
> much either way about the correctness of this patch.  That is one
> factor pushing me to think we shouldn't risk it.  But, taking a
> longer view, is that something that's practical to improve?

Yeah.  This is a nice example of code that really deserves unit tests written
in C.  Could be good motivation to build the infrastructure I mentioned here:

https://www.postgresql.org/message-id/flat/CAEepm%3D2heu%2B5zwB65jWap3XY-UP6PpJZiKLQRSV2UQH9BmVRXQ%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Mon, Feb 11, 2019 at 10:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I observe from
>> https://coverage.postgresql.org/src/backend/utils/mmgr/freepage.c.gcov.html
>> that the edge cases in this function aren't too well exercised by
>> our regression tests, meaning that the buildfarm might not prove
>> much either way about the correctness of this patch.  That is one
>> factor pushing me to think we shouldn't risk it.  But, taking a
>> longer view, is that something that's practical to improve?

> Yeah.  This is a nice example of code that really deserves unit tests
> written in C.  Could be good motivation to build the infrastructure I
> mentioned here:
> https://www.postgresql.org/message-id/flat/CAEepm%3D2heu%2B5zwB65jWap3XY-UP6PpJZiKLQRSV2UQH9BmVRXQ%40mail.gmail.com

Meh.  I think if you hold out for that, you're going to be waiting a long
time.  I was thinking more along the lines of making a test API in
src/test/modules/, akin to what we've got for predtest or rbtree.

regards, tom lane
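[A hedged sketch of what such a src/test/modules/ harness might look like.
The module name and the exercised sequence are hypothetical, and the
freepage.h calls used below (FreePageManagerInitialize/Put/Get and
FPM_PAGE_SIZE) are quoted from memory, so the signatures would need to be
checked against the tree.]

  /* src/test/modules/test_freepage/test_freepage.c (hypothetical) */
  #include "postgres.h"

  #include "fmgr.h"
  #include "utils/freepage.h"

  PG_MODULE_MAGIC;

  PG_FUNCTION_INFO_V1(test_freepage);

  /*
   * Exercise FreePageManagerPut/Get on ordinary backend-local memory, laid
   * out the way dsa.c places the manager at the base of a segment.  Assumes
   * npages >= 2.
   */
  Datum
  test_freepage(PG_FUNCTION_ARGS)
  {
      int32       npages = PG_GETARG_INT32(0);
      char       *base = palloc0((Size) npages * FPM_PAGE_SIZE);
      FreePageManager *fpm = (FreePageManager *) base;
      Size        first_page;
      Size        sz = 1;

      FreePageManagerInitialize(fpm, base);

      /* Page 0 holds the manager itself; hand the rest over as free space. */
      FreePageManagerPut(fpm, 1, (Size) npages - 1);

      /*
       * Drain in varying-sized chunks, putting every other chunk back, so
       * that the btree and freelist paths (not just the singleton range)
       * get used.
       */
      while (FreePageManagerGet(fpm, sz, &first_page))
      {
          if (sz % 2 == 0)
              FreePageManagerPut(fpm, first_page, sz);
          sz = sz % 7 + 1;
      }

      pfree(base);
      PG_RETURN_VOID();
  }

[Such a module would pair this with a CREATE FUNCTION declaration in its SQL
script and a regression test calling it for a range of sizes, ideally under an
assert-enabled build so the Assert() checks in freepage.c fire.]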
On Mon, Feb 11, 2019 at 10:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > So I'll wait until after the release, and we'll have
> > to live with the bug for another 3 months.
>
> Check.  Please hold off committing until you see the release tags
> appear, probably late Tuesday my time / Wednesday noonish yours.

Pushed.

--
Thomas Munro
http://www.enterprisedb.com