Thread: Query returns no results until REINDEX
Hi all,
Earlier today we were trying to debug why a row wasn't being deleted from a
table and we ran into some interesting behavior.
This is the table in question:
coder=# \d+ extensions
Table "public.extensions"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
ext_name | text | | not null | | extended | |
publisher_name | text | | not null | | extended | |
ext_version | text | | not null | | extended | |
obj | jsonb | | not null | | extended | |
created_at | timestamp with time zone | | not null | | plain | |
updated_at | timestamp with time zone | | not null | | plain | |
Indexes:
"extensions_pkey" PRIMARY KEY, btree (ext_name, publisher_name, ext_version)
Foreign-key constraints:
"extensions_ext_name_fkey" FOREIGN KEY (ext_name, publisher_name) REFERENCES extension_statistics(ext_name, publisher_name) ON DELETE CASCADE
Referenced by:
TABLE "extension_assets" CONSTRAINT "extension_assets_ext_name_fkey" FOREIGN KEY (ext_name, publisher_name, ext_version) REFERENCES extensions(ext_name, publisher_name, ext_version) ON DELETE CASCADE
coder=# select ext_name from extensions;
ext_name
-------------------
vim
Go
min-theme
terraform
prettier-vscode
vscode-icons
gitlens
vscode-eslint
cpptools
rust
Angular-BeastCode
(11 rows)
We start to run into issues when querying for the "Go" extension.
coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
----------
(0 rows)
Other extensions seem to be queried just fine.
coder=# select ext_name from extensions where ext_name = 'Angular-BeastCode';
ext_name
-------------------
Angular-BeastCode
(1 row)
Using LIKE seems to find the broken row fine.
coder=# select ext_name from extensions where ext_name LIKE '%Go';
ext_name
----------
Go
(1 row)
That then begs the question, maybe there's some weird whitespace in front
causing it to fail.
coder=# select encode(ext_name::bytea, 'hex') from extensions where ext_name LIKE '%Go';
encode
--------
476f
(1 row)
Doesn't seem like it. After a bit of confusion I thought to reindex the table.
coder=# reindex table extensions;
REINDEX
coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
----------
Go
(1 row)
Seems to work now. My question is, is this something I should report to the
maintainers? I took a snapshot of the data folder before the reindex in case it
would be helpful. Is index corruption something that should be actively looked
out for?
Earlier today we were trying to debug why a row wasn't being deleted from a
table and we ran into some interesting behavior.
This is the table in question:
coder=# \d+ extensions
Table "public.extensions"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
----------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
ext_name | text | | not null | | extended | |
publisher_name | text | | not null | | extended | |
ext_version | text | | not null | | extended | |
obj | jsonb | | not null | | extended | |
created_at | timestamp with time zone | | not null | | plain | |
updated_at | timestamp with time zone | | not null | | plain | |
Indexes:
"extensions_pkey" PRIMARY KEY, btree (ext_name, publisher_name, ext_version)
Foreign-key constraints:
"extensions_ext_name_fkey" FOREIGN KEY (ext_name, publisher_name) REFERENCES extension_statistics(ext_name, publisher_name) ON DELETE CASCADE
Referenced by:
TABLE "extension_assets" CONSTRAINT "extension_assets_ext_name_fkey" FOREIGN KEY (ext_name, publisher_name, ext_version) REFERENCES extensions(ext_name, publisher_name, ext_version) ON DELETE CASCADE
coder=# select ext_name from extensions;
ext_name
-------------------
vim
Go
min-theme
terraform
prettier-vscode
vscode-icons
gitlens
vscode-eslint
cpptools
rust
Angular-BeastCode
(11 rows)
We start to run into issues when querying for the "Go" extension.
coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
----------
(0 rows)
Other extensions seem to be queried just fine.
coder=# select ext_name from extensions where ext_name = 'Angular-BeastCode';
ext_name
-------------------
Angular-BeastCode
(1 row)
Using LIKE seems to find the broken row fine.
coder=# select ext_name from extensions where ext_name LIKE '%Go';
ext_name
----------
Go
(1 row)
That then begs the question, maybe there's some weird whitespace in front
causing it to fail.
coder=# select encode(ext_name::bytea, 'hex') from extensions where ext_name LIKE '%Go';
encode
--------
476f
(1 row)
Doesn't seem like it. After a bit of confusion I thought to reindex the table.
coder=# reindex table extensions;
REINDEX
coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
----------
Go
(1 row)
Seems to work now. My question is, is this something I should report to the
maintainers? I took a snapshot of the data folder before the reindex in case it
would be helpful. Is index corruption something that should be actively looked
out for?
It's worth noting this particular row has existed in the database for quite a
long time. Probably over 3 months.
Thanks for taking a look,
Colin
On Fri, Feb 7, 2020 at 3:52 PM Colin Adler <colin@coder.com> wrote: > Seems to work now. My question is, is this something I should report to the > maintainers? I am one of the people that maintains the B-Tree code. You didn't mention what version of Postgres you're using here. That could be important. Please let us know. Mention the minor component of the release version, too (i.e. say 12.2, not just 12). > I took a snapshot of the data folder before the reindex in case it > would be helpful. Is index corruption something that should be actively looked > out for? Yes -- look for corruption. If I had to guess, I'd say that this has something to do with upgrading the operating system to use a different, incompatible glibc. Or perhaps it has something to do with streaming replication between machines with different glibc version. You should try running contrib/amcheck, which should be able to isolate index corruption, and give you a specific complaint. You may then be able to inspect the exact index page with the problem using contrib/pageinspect. Something like this ought to do it on Postgres 11 or 12: CREATE EXTENSION IF NOT EXISTS amcheck SELECT bt_index_check('my_index', true); If that doesn't show any errors, then perhaps try this: SELECT bt_index_parent_check('my_index', true); If you're on Postgres 10, then you should leave out the second argument, "true", since that version doesn't have the extra heapallindexed check. Let us know what you see. -- Peter Geoghegan
Peter,
Thanks for taking the time to respond and hope you're having a good weekend. My
apologies for not noting the Postgres version. I thought about it after I sent
the email but I didn't want to spam the thread.
$ psql --version
psql (PostgreSQL) 11.6
One interesting thing you mention is that the system probably changed libc
versions. This is correct. We recently moved from an Alpine based docker
container running `musl libc 1.1.22` to a RHEL based one running `glibc 2.17`.
This was done a month or so ago. The `Go` extension row would've existed in the
database before this switch.
I dumped and restored our database in prod and all seems to be fine now. I'm
now testing using the backup from prod on my local machine with the same RHEL
image running `glibc 2.17`.
coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
----------
(0 rows)
coder=# SELECT bt_index_check('extensions_pkey', true);
ERROR: item order invariant violated for index "extensions_pkey"
DETAIL: Lower index tid=(1,2) (points to heap tid=(0,2)) higher index tid=(1,3) (points to heap tid=(2,1)) page lsn=1/FF9C13B8.
Looks like it found something. I checked out the contrib/pageinspect docs but
wasn't too sure what to run. Are incompatible libc versions causing btree
corruption something you consider a bug? If it's something you'd like to look
into further I can gladly send over the database files.
Thanks,
Colin
Thanks for taking the time to respond and hope you're having a good weekend. My
apologies for not noting the Postgres version. I thought about it after I sent
the email but I didn't want to spam the thread.
$ psql --version
psql (PostgreSQL) 11.6
One interesting thing you mention is that the system probably changed libc
versions. This is correct. We recently moved from an Alpine based docker
container running `musl libc 1.1.22` to a RHEL based one running `glibc 2.17`.
This was done a month or so ago. The `Go` extension row would've existed in the
database before this switch.
I dumped and restored our database in prod and all seems to be fine now. I'm
now testing using the backup from prod on my local machine with the same RHEL
image running `glibc 2.17`.
coder=# select ext_name from extensions where ext_name = 'Go';
ext_name
----------
(0 rows)
coder=# SELECT bt_index_check('extensions_pkey', true);
ERROR: item order invariant violated for index "extensions_pkey"
DETAIL: Lower index tid=(1,2) (points to heap tid=(0,2)) higher index tid=(1,3) (points to heap tid=(2,1)) page lsn=1/FF9C13B8.
Looks like it found something. I checked out the contrib/pageinspect docs but
wasn't too sure what to run. Are incompatible libc versions causing btree
corruption something you consider a bug? If it's something you'd like to look
into further I can gladly send over the database files.
Thanks,
Colin
On Fri, Feb 7, 2020 at 7:08 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Feb 7, 2020 at 3:52 PM Colin Adler <colin@coder.com> wrote:
> Seems to work now. My question is, is this something I should report to the
> maintainers?
I am one of the people that maintains the B-Tree code.
You didn't mention what version of Postgres you're using here. That
could be important. Please let us know. Mention the minor component of
the release version, too (i.e. say 12.2, not just 12).
> I took a snapshot of the data folder before the reindex in case it
> would be helpful. Is index corruption something that should be actively looked
> out for?
Yes -- look for corruption. If I had to guess, I'd say that this has
something to do with upgrading the operating system to use a
different, incompatible glibc. Or perhaps it has something to do with
streaming replication between machines with different glibc version.
You should try running contrib/amcheck, which should be able to isolate
index corruption, and give you a specific complaint. You may then be
able to inspect the exact index page with the problem using
contrib/pageinspect. Something like this ought to do it on Postgres
11 or 12:
CREATE EXTENSION IF NOT EXISTS amcheck
SELECT bt_index_check('my_index', true);
If that doesn't show any errors, then perhaps try this:
SELECT bt_index_parent_check('my_index', true);
If you're on Postgres 10, then you should leave out the second
argument, "true", since that version doesn't have the extra
heapallindexed check.
Let us know what you see.
--
Peter Geoghegan
On Sun, Feb 9, 2020 at 12:50 PM Colin Adler <colin@coder.com> wrote: > Looks like it found something. I checked out the contrib/pageinspect docs but > wasn't too sure what to run. Are incompatible libc versions causing btree > corruption something you consider a bug? If it's something you'd like to look > into further I can gladly send over the database files. No, this is not considered a bug. It's unfortunate that there is no built in collation versioning mechanism, or something like that -- that might have alerted you to the problem before any real damage occurred. We have that for the ICU collations, but it currently isn't possible to use ICU as the default collation provider. You really have to go out of your way to use ICU collations. -- Peter Geoghegan
On Tue, Feb 11, 2020 at 8:33 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Sun, Feb 9, 2020 at 12:50 PM Colin Adler <colin@coder.com> wrote: > > Looks like it found something. I checked out the contrib/pageinspect docs but > > wasn't too sure what to run. Are incompatible libc versions causing btree > > corruption something you consider a bug? If it's something you'd like to look > > into further I can gladly send over the database files. > > No, this is not considered a bug. > > It's unfortunate that there is no built in collation versioning > mechanism, or something like that -- that might have alerted you to > the problem before any real damage occurred. FTR, this is something being actively worked on [1], and I still hope that in pg13 a glibc or ICU upgrade will be detected and raise proper warning. [1] https://commitfest.postgresql.org/27/2367/