Thread: Re: pgsql: Avoid creation of the free space map for small heap relations.
On Mon, Jan 28, 2019 at 8:49 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 28, 2019 at 8:17 AM Amit Kapila <akapila@postgresql.org> wrote: > > > > Avoid creation of the free space map for small heap relations. > > > > It seems there is some failure due to this on build farm machines. I > will investigate! > The failure is as below: @@ -26,7 +26,7 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size; heap_size | fsm_size -----------+---------- - 24576 | 0 + 32768 | 0 (1 row) -- Extend table with enough blocks to exceed the FSM threshold @@ -56,7 +56,7 @@ SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size; fsm_size ---------- - 16384 + 24576 (1 row) So here in the above tests, we are deleting some rows, performing vacuum and then checking the size of heap and or vacuum. It seems to me sometimes the vacuum is *not* able to remove the rows and truncate the relation/FSM as per tests expectation. One possible theory is that there is some parallel transaction running which prevents a vacuum from doing so. We have tried to avoid that by not allowing to run the FSM test in parallel with other tests, but I think still something seems to have run in parallel to the FSM test. One possibility is that autovacuum has triggered to perform truncation of some other relation (remove pages at the end) which doesn't allow the FSM test to remove the rows/perform truncation and thus let to the failure. Can there be anything else which can start a transaction when a regression test is running? Still thinking, but inputs are welcome. If my theory is correct, then in the newly added tests by this patch, we can't rely on the vacuum to truncate the relation pages at the end and hence can't rely on heap/FSM size. Thoughts? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
>>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: Amit> One possibility is that autovacuum has triggered to perform Amit> truncation of some other relation (remove pages at the end) which Amit> doesn't allow the FSM test to remove the rows/perform truncation Amit> and thus let to the failure. Can there be anything else which can Amit> start a transaction when a regression test is running? Still Amit> thinking, but inputs are welcome. I've bumped into issues (cf. commit 64ae420b2) with regression tests of this kind caused by concurrent (auto-)ANALYZE (not VACUUM); VACUUM's snapshot is ignored for testing for whether row versions can be removed, but ANALYZE's is not. -- Andrew (irc:RhodiumToad)
>>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: Amit> One possibility is that autovacuum has triggered to perform Amit> truncation of some other relation (remove pages at the end) which Amit> doesn't allow the FSM test to remove the rows/perform truncation Amit> and thus let to the failure. Can there be anything else which can Amit> start a transaction when a regression test is running? Still Amit> thinking, but inputs are welcome. I've bumped into issues (cf. commit 64ae420b2) with regression tests of this kind caused by concurrent (auto-)ANALYZE (not VACUUM); VACUUM's snapshot is ignored for testing for whether row versions can be removed, but ANALYZE's is not. -- Andrew (irc:RhodiumToad)
On Mon, Jan 28, 2019 at 10:25 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > > >>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: > > Amit> One possibility is that autovacuum has triggered to perform > Amit> truncation of some other relation (remove pages at the end) which > Amit> doesn't allow the FSM test to remove the rows/perform truncation > Amit> and thus let to the failure. Can there be anything else which can > Amit> start a transaction when a regression test is running? Still > Amit> thinking, but inputs are welcome. > > I've bumped into issues (cf. commit 64ae420b2) with regression tests of > this kind caused by concurrent (auto-)ANALYZE (not VACUUM); > Yes, so this could be the cause of the problem. I think we need to change the tests added by the patch such that they don't rely on vacuum to remove dead-row versions? Do you or anybody else see any better way to fix this? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 28, 2019 at 10:25 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > > >>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: > > Amit> One possibility is that autovacuum has triggered to perform > Amit> truncation of some other relation (remove pages at the end) which > Amit> doesn't allow the FSM test to remove the rows/perform truncation > Amit> and thus let to the failure. Can there be anything else which can > Amit> start a transaction when a regression test is running? Still > Amit> thinking, but inputs are welcome. > > I've bumped into issues (cf. commit 64ae420b2) with regression tests of > this kind caused by concurrent (auto-)ANALYZE (not VACUUM); > Yes, so this could be the cause of the problem. I think we need to change the tests added by the patch such that they don't rely on vacuum to remove dead-row versions? Do you or anybody else see any better way to fix this? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
>>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: Amit> Yes, so this could be the cause of the problem. I think we need Amit> to change the tests added by the patch such that they don't rely Amit> on vacuum to remove dead-row versions? Do you or anybody else see Amit> any better way to fix this? Do you just need to create free space on a page that didn't have enough before? It might be worth tweaking the fillfactor rather than trying to delete anything. -- Andrew (irc:RhodiumToad)
>>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: Amit> Yes, so this could be the cause of the problem. I think we need Amit> to change the tests added by the patch such that they don't rely Amit> on vacuum to remove dead-row versions? Do you or anybody else see Amit> any better way to fix this? Do you just need to create free space on a page that didn't have enough before? It might be worth tweaking the fillfactor rather than trying to delete anything. -- Andrew (irc:RhodiumToad)
Amit Kapila <amit.kapila16@gmail.com> writes: > Yes, so this could be the cause of the problem. I think we need to > change the tests added by the patch such that they don't rely on > vacuum to remove dead-row versions? Do you or anybody else see any > better way to fix this? To be blunt, this patch needs to be reverted immediately. The failures that are showing up are not just "the fsm test is not portable" problems. See for example https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mantid&dt=2019-01-28%2005%3A07%3A06 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-01-28%2003%3A07%3A39 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-01-28%2003%3A20%3A02 I don't know what the common thread is here, but you don't get to leave the buildfarm broken this badly while you figure it out. regards, tom lane
Amit Kapila <amit.kapila16@gmail.com> writes: > Yes, so this could be the cause of the problem. I think we need to > change the tests added by the patch such that they don't rely on > vacuum to remove dead-row versions? Do you or anybody else see any > better way to fix this? To be blunt, this patch needs to be reverted immediately. The failures that are showing up are not just "the fsm test is not portable" problems. See for example https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mantid&dt=2019-01-28%2005%3A07%3A06 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-01-28%2003%3A07%3A39 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-01-28%2003%3A20%3A02 I don't know what the common thread is here, but you don't get to leave the buildfarm broken this badly while you figure it out. regards, tom lane
On Mon, Jan 28, 2019 at 11:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > Yes, so this could be the cause of the problem. I think we need to > > change the tests added by the patch such that they don't rely on > > vacuum to remove dead-row versions? Do you or anybody else see any > > better way to fix this? > > To be blunt, this patch needs to be reverted immediately. Okay, I will do it. > The failures > that are showing up are not just "the fsm test is not portable" problems. > See for example > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mantid&dt=2019-01-28%2005%3A07%3A06 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-01-28%2003%3A07%3A39 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-01-28%2003%3A20%3A02 > > I don't know what the common thread is here, but you don't get to leave > the buildfarm broken this badly while you figure it out. > Sure, but I am wondering why none of this ever shown in local tests, as we have done quite some testing related to pgbench as well. Anyway, I think we need figure that out seprately. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 28, 2019 at 11:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Kapila <amit.kapila16@gmail.com> writes: > > Yes, so this could be the cause of the problem. I think we need to > > change the tests added by the patch such that they don't rely on > > vacuum to remove dead-row versions? Do you or anybody else see any > > better way to fix this? > > To be blunt, this patch needs to be reverted immediately. Okay, I will do it. > The failures > that are showing up are not just "the fsm test is not portable" problems. > See for example > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mantid&dt=2019-01-28%2005%3A07%3A06 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-01-28%2003%3A07%3A39 > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-01-28%2003%3A20%3A02 > > I don't know what the common thread is here, but you don't get to leave > the buildfarm broken this badly while you figure it out. > Sure, but I am wondering why none of this ever shown in local tests, as we have done quite some testing related to pgbench as well. Anyway, I think we need figure that out seprately. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 28, 2019 at 11:10 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > > >>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: > > Amit> Yes, so this could be the cause of the problem. I think we need > Amit> to change the tests added by the patch such that they don't rely > Amit> on vacuum to remove dead-row versions? Do you or anybody else see > Amit> any better way to fix this? > > Do you just need to create free space on a page that didn't have enough > before? It might be worth tweaking the fillfactor rather than trying to > delete anything. > No, it also depends on vacuum to remove rows and then truncate the relation accordingly. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Jan 28, 2019 at 11:10 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > > >>>>> "Amit" == Amit Kapila <amit.kapila16@gmail.com> writes: > > Amit> Yes, so this could be the cause of the problem. I think we need > Amit> to change the tests added by the patch such that they don't rely > Amit> on vacuum to remove dead-row versions? Do you or anybody else see > Amit> any better way to fix this? > > Do you just need to create free space on a page that didn't have enough > before? It might be worth tweaking the fillfactor rather than trying to > delete anything. > No, it also depends on vacuum to remove rows and then truncate the relation accordingly. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: >> I don't know what the common thread is here, but you don't get to leave >> the buildfarm broken this badly while you figure it out. > Sure, but I am wondering why none of this ever shown in local tests, > as we have done quite some testing related to pgbench as well. Not sure, but I notice that two of the three critters I just pointed to use force_parallel_mode. Dromedary does not, but it's old and slow. Kind of smells like a timing problem ... regards, tom lane
Amit Kapila <amit.kapila16@gmail.com> writes: >> I don't know what the common thread is here, but you don't get to leave >> the buildfarm broken this badly while you figure it out. > Sure, but I am wondering why none of this ever shown in local tests, > as we have done quite some testing related to pgbench as well. Not sure, but I notice that two of the three critters I just pointed to use force_parallel_mode. Dromedary does not, but it's old and slow. Kind of smells like a timing problem ... regards, tom lane
On Mon, Jan 28, 2019 at 6:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 28, 2019 at 11:10 AM Andrew Gierth > you just need to create free space on a page that didn't have enough > > before? It might be worth tweaking the fillfactor rather than trying to > > delete anything. > > > > No, it also depends on vacuum to remove rows and then truncate the > relation accordingly. That particular test could be removed -- it's just verifying behavior that's already been there for years and is a direct consquence of normal truncation combined with the addressing scheme of the FSM logic. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jan 28, 2019 at 6:58 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jan 28, 2019 at 11:10 AM Andrew Gierth > you just need to create free space on a page that didn't have enough > > before? It might be worth tweaking the fillfactor rather than trying to > > delete anything. > > > > No, it also depends on vacuum to remove rows and then truncate the > relation accordingly. That particular test could be removed -- it's just verifying behavior that's already been there for years and is a direct consquence of normal truncation combined with the addressing scheme of the FSM logic. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Jan 28, 2019 at 07:20:29AM +0100, John Naylor wrote: > That particular test could be removed -- it's just verifying behavior > that's already been there for years and is a direct consquence of > normal truncation combined with the addressing scheme of the FSM > logic. Moved to next CF, please feel free to continue this work, this is interesting stuff. -- Michael
Attachment
On Mon, Jan 28, 2019 at 07:20:29AM +0100, John Naylor wrote: > That particular test could be removed -- it's just verifying behavior > that's already been there for years and is a direct consquence of > normal truncation combined with the addressing scheme of the FSM > logic. Moved to next CF, please feel free to continue this work, this is interesting stuff. -- Michael