Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) - Mailing list pgsql-bugs
From | Thomas Munro
---|---
Subject | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date |
Msg-id | CAEepm=3iK_JrM3=pFzTU-L3c+g61R6Dg5H7R3bsD9CMdYTLRjA@mail.gmail.com
In response to | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated) (Amit Kapila <amit.kapila16@gmail.com>)
Responses | Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
List | pgsql-bugs
On Sun, May 3, 2015 at 4:40 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> About data, I have extracted parts where there is a change in
> oldest_mxid and segments
>
> time      segments  usage_fraction  usage_kb   oldest_mxid  next_mxid   next_offset
>
> 13:48:36         1  0                     16             1          1            0
> 13:49:36       369  .0044              94752             1          1            0
> ..
> 14:44:04     41703  .5083           10713400             1    8528909   2140755909
>
> 14:45:05      1374  .0167             352960       8573819    8722521   2189352521
> ..
> 15:37:16     41001  .4997           10529528       8573819   17060811   4282263311
> ..
> 15:38:16       709  .0086             182056      17132168   17254423     35892627
> ..
> 16:57:15     41440  .5051           10644712      17132168   25592713   2128803417
>
> 16:58:16      1120  .0136             287416      25695507   25786824   2177525278
>
> Based on this data, it seems that truncation of member space
> as well as advancement of oldest multixact id happens once
> it reaches 50% usage and at that time segments drops down to almost
> zero. This happens repeatedly after 1 hour and in-between there
> is no progress which indicates that all the work happens at
> one go rather than in spreaded way. Won't this choke the system
> when it happens due to I/O, isn't it better if we design it in a way such
> that it is spreaded over period of time rather than doing everything at
> one go?

At 50% usage it starts vacuuming tables that have the very oldest mxid in the system. That is, at 50% usage, we try to select the smallest non-zero fraction of tables that can be selected based on relminmxid alone, with any luck exactly one or close to it. As usage increases, we decrease the cutoff age slowly so we start selecting tables with slightly newer relminmxid values for vacuuming too. Only if it reaches 75% usage will it vacuum everything it can and eat all your IO.

In a real system, I suppose there would be lots of big tables that need to be vacuumed, the vacuuming would take some time, and they would tend to have different relminmxids so that vacuuming would be effectively spread out by this selection algorithm. I think. Perhaps we should devise a test procedure to try to see if that happens. We'd need to create a bunch of big tables and modify monitor.sh to show the relminmxid for each of them so that you could see when they are being processed -- I will look into that.

Restricting ourselves to selecting tables to vacuum using their relminmxid alone makes this patch small since autovacuum already works that way. We *could* introduce code that would be able to spread out the work of vacuuming tables that happen to have identical or very close relminmxid (say by introducing some non-determinism or doing something weird based on hashing table oids and the time to explicitly spread the start of processing over time, or <your idea here>), but I didn't want to propose anything too big/complicated/clever/stupid and I suspect that the relminmxid values will tend to diverge over time (but I could be wrong about that, if they all start at 1 and then move forward in lockstep over long periods of time then what I propose is not good enough... let's see if we can find out).
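To make that shape concrete, here is a rough standalone sketch of the kind of calculation I mean (illustration only, not the function from the attached patch; the signature, the exact thresholds and the straight-line interpolation are simplifications):

#include <stdint.h>

/*
 * Illustration only: below the safe threshold (around 50% of member
 * space) return -1, meaning "no special action needed".  Past the
 * dangerous threshold (around 75%) return 0, meaning "select every
 * table".  In between, start the cutoff at the current number of active
 * multixacts (so only the table(s) with the very oldest relminmxid
 * qualify) and shrink it toward zero as member usage grows, pulling in
 * tables with newer relminmxid values.
 */
static int
sketch_max_multixact_age(uint64_t members,
                         uint64_t safe_member_count,
                         uint64_t dangerous_member_count,
                         uint32_t active_multixacts)
{
    double      used_fraction;

    if (members <= safe_member_count)
        return -1;
    if (members >= dangerous_member_count)
        return 0;

    /* 0.0 just past the safe point, 1.0 at the dangerous point */
    used_fraction = (double) (members - safe_member_count) /
                    (double) (dangerous_member_count - safe_member_count);

    return (int) (active_multixacts * (1.0 - used_fraction));
}

The point is just that the cutoff starts at "only the very oldest relminmxid qualifies" and slides toward "everything qualifies" as member usage grows, so how well the work spreads out still depends on the relminmxid values actually diverging.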
> --
> +int
> +compute_max_multixact_age_to_avoid_member_wrap(bool manual)
> {
> ..
> +    if (members <= safe_member_count)
> +    {
> +        /*
> +         * There is no danger of member wrap, so return a number that is not
> +         * lower than autovacuum_multixact_freeze_max_age.
> +         */
> +        return -1;
> +    }
> ..
>
> The above code doesn't seem to match its comments.
> Comment says "..not lower than autovacuum_multixact_freeze_max_age",
> but then return -1. It seems to me here we should return unchanged
> autovacuum_multixact_freeze_max_age as it was coded in the initial
> version of patch. Do you have any specific reason to change it?

Oops, the comment is fixed in the attached patch. In an earlier version, I was only dealing with the autovacuum case. Now that the VACUUM command also calls it, I didn't want this compute_max_multixact_age_to_avoid_member_wrap function to assume that it was being called by autovacuum code and return the autovacuum-specific GUC in the case that no special action is needed.

Also, the function no longer computes a value by scaling autovacuum_multixact_freeze_max_age; it now scales the current number of active multixacts, so that we can begin selecting a small non-zero number of tables to vacuum as soon as we exceed safe_member_count, as described above (whereas when we used a scaled-down autovacuum_multixact_freeze_max_age, we usually didn't select any tables at all until we scaled it down a lot, i.e. until we got close to dangerous_member_count).

Finally, I wanted a special value like -1 for 'none' so that table_recheck_autovac and ExecVacuum could use a simple test >= 0 to know that they also need to set multixact_freeze_min_age to zero in the case of a member-space-triggered vacuum, so that we get maximum benefit from our table scans by freezing all relevant tuples, not just some older ones (that's why you see usage drop to almost zero each time, whereas the monitor.sh results I showed from the earlier patch trimmed usage by varying amounts, which meant that autovacuum wraparounds would need to be done again sooner).

Does that make sense?
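In other words, the caller-side decision boils down to something like this (a schematic only, with made-up names, not the actual code in table_recheck_autovac or ExecVacuum):

/*
 * Schematic of the caller-side test described above: -1 means "no
 * member-space pressure", so the usual GUC-derived freeze min age is
 * kept; >= 0 means the vacuum was triggered by member-space usage, so
 * the multixact freeze min age is forced to zero to freeze every
 * relevant tuple while the table is being scanned anyway.
 */
static int
choose_multixact_freeze_min_age(int max_age_to_avoid_member_wrap,
                                int default_freeze_min_age)
{
    return (max_age_to_avoid_member_wrap >= 0) ? 0 : default_freeze_min_age;
}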
--
Thomas Munro
http://www.enterprisedb.com

Attachment