Re: [Postgres-xc-general] "Tuple not found error" during Index creation - Mailing list pgsql-general

From Mason Sharp
Subject Re: [Postgres-xc-general] "Tuple not found error" during Index creation
Date
Msg-id CA+A7BJGsDTNFNMgg8rCkA1_3TOcDOv=gWUA0nw+380kDzoxzwg@mail.gmail.com
In response to Re: [Postgres-xc-general] "Tuple not found error" during Index creation  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [Postgres-xc-general] "Tuple not found error" during Index creation  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-general


On Mon, Dec 9, 2013 at 8:49 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Dec 10, 2013 at 7:17 AM, Sandeep Gupta <gupta.sandeep@gmail.com> wrote:
>> We are trying to trace the cause and a potential solution of the
>> "tuple not found" error with postgres-xc. The problem happens when
>> indexing a large table. It seems that autovacuum locks certain cache
>> pages that the indexer tries to read, and the indexing fails with a
>> "tuple not found" error.
>>
>> I am not sure whether it qualifies as a postgres or a postgres-xc
>> error. However, I was just wondering what the recommended way to fix
>> this is. Turning off autovacuum is really not the best solution,
>> because the system then runs into memory usage errors.
>>
>> Would greatly appreciate any pointers on this.
> This smells like a concurrency issue with autovacuum on the XC side. I
> recall fixing issues in the past where autovacuum did not take a
> correct snapshot from GTM in certain code paths, endangering data
> consistency in the cluster because autovacuum might clean more tuples
> than it should. Another possibility to explain this bug would be the
> way RecentGlobalXmin is computed for autovacuum using the GTM
> snapshots, which would explain why autovacuum has cleaned away some
> tuples it should not have, making a failure more likely for
> long-running transactions.

In our StormDB fork (now TransLattice Storm) I made some changes to address issues that were uncovered with XC. I am not sure whether they will address this specific issue, but in most cases we raise an error instead of falling back to a local XID as XC does (imagine a node that cannot reach GTM while autovacuum starts cleaning up data with local XIDs and snapshots). We also use GTM to obtain XIDs for authentication and for the autovacuum launcher, because in concurrency testing not doing so resulted in the same XID being consumed by other sessions, causing hangs and other transaction problems. The bottom line is that falling back to local XIDs and snapshots should almost always be avoided (initdb is OK).

Our code is a bit different from vanilla XC, but I can try to put together a similar patch soon.
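
In the meantime, one possible workaround (untested against this particular failure; the table and column names below are just placeholders) is to keep autovacuum away from the affected table only, instead of disabling it cluster-wide:

    -- Sketch of a per-table workaround: disable autovacuum for the
    -- table being indexed, build the index, then re-enable it.
    ALTER TABLE big_table SET (autovacuum_enabled = false);
    CREATE INDEX big_table_val_idx ON big_table (val);
    ALTER TABLE big_table SET (autovacuum_enabled = true);

That should at least avoid the memory problems that come with turning the autovacuum launcher off entirely.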

As a community, I feel we should prioritize testing and bug fixing, such as the reported issue and replicated table handling, over new features like the merged coordinator and datanode project.

> Those are assumptions though. It would be great if you could provide a
> self-contained test case, with, let's imagine, a table that has its
> data generated with, for example, generate_series. Just by seeing the
> spec of the machine you are using, I am sure that I wouldn't be able
> to reproduce that on my laptop though. The core team has access to
> more powerful machines.

> Also: Postgres-XC 1.1.0 is based on Postgres 9.2.4.
> --
> Michael
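
Something along these lines might be a starting point for a self-contained test case (the table definition, distribution, and row count here are only guesses at what triggers the failure):

    -- Hypothetical reproduction sketch: load a large hash-distributed
    -- table, create dead tuples so autovacuum kicks in, then build an
    -- index while it is running.
    CREATE TABLE t (id int, val text) DISTRIBUTE BY HASH (id);
    INSERT INTO t SELECT g, md5(g::text)
      FROM generate_series(1, 10000000) g;
    DELETE FROM t WHERE id % 10 = 0;
    CREATE INDEX t_val_idx ON t (val);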


Mason Sharp

TransLattice - http://www.translattice.com
Distributed and Clustered Database Solutions

