On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jan 28, 2019 at 10:03 AM John Naylor
> <john.naylor@2ndquadrant.com> wrote:
> >
> > On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > There are a few buildfarm failures due to this commit, see my email on
> > > pgsql-committers. If you have time, you can also once look into
> > > those.
> >
> > I didn't see anything in common with the configs of the failed
> > members. None have a non-default BLCKSZ that I can see.
> >
>
> I have done an analysis of the different failures on buildfarm.
>
>
> 2.
> @@ -15,13 +15,9 @@
> SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
> ERROR: block number 100 is out of range for relation "test_rel_forks"
> SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
> - fsm_0
> --------
> - 8192
> -(1 row)
> -
> +ERROR: could not open file "base/50769/50798_fsm": No such file or directory
> SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
> -ERROR: block number 10 is out of range for relation "test_rel_forks"
> +ERROR: could not open file "base/50769/50798_fsm": No such file or directory
>
> This indicates that even though the Vacuum is executed, but the FSM
> doesn't get created. This could be due to different BLCKSZ, but the
> failed machines don't seem to have a non-default value of it. I am
> not sure why this could happen, maybe we need to check once in the
> failed regression database to see the size of relation?
>
This symptom shows up on the following buildfarm critters:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2019-01-28%2005%3A05%3A22
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-01-28%2003%3A20%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2019-01-28%2003%3A13%3A47
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-01-28%2003%3A07%3A39
All of these seem to run with fsync=off. Is it possible that vacuum has updated the FSM, but the change was not synced to disk, so that when we try to read it we don't get the required page? This is just a guess.
I have checked all the buildfarm failures and see only 4 symptoms, for each of which I have sent some initial analysis. It would be good if you could cross-verify them as well.
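A rough sketch of the relation-size check mentioned above, assuming the test_rel_forks table from the regression test still exists in the failed regression database. It uses only the standard pg_relation_size() and pg_relation_filepath() functions; the column aliases are illustrative.

```sql
-- Re-run the vacuum that is expected to create the FSM fork.
VACUUM test_rel_forks;

-- Compare the on-disk size of the main and FSM forks.
-- An fsm_size of 0 would suggest the FSM fork was never created.
SELECT pg_relation_size('test_rel_forks', 'main') AS main_size,
       pg_relation_size('test_rel_forks', 'fsm')  AS fsm_size;

-- Show the base file path, to match it against the
-- "could not open file ..._fsm" error in the regression diff.
SELECT pg_relation_filepath('test_rel_forks') AS main_path;

-- Confirm the settings suspected to be relevant on the failing members.
SHOW fsync;
SHOW block_size;
```

Comparing this output between a failing member and a passing one should show whether the heap is simply smaller than expected or the FSM file is really missing after vacuum.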
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com