Home > mailing lists

Re: preserving forensic information when we freeze - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: preserving forensic information when we freeze
Date	January 2, 2014 20:10:03
Msg-id	20140102200952.GB22022@awork2.anarazel.de Whole thread Raw
In response to	Re: preserving forensic information when we freeze (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

On 2014-01-02 14:44:34 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-01-02 12:46:34 -0500, Tom Lane wrote:
> >> For real
> >> forensics work, you need to be able to see all tuples, which makes me
> >> think that something akin to pgstattuple is the right API; that is "return
> >> a set of the header info for all tuples on such-and-such pages of this
> >> relation".  That should dodge any performance problem, because the
> >> heap_open overhead could be amortized across lots of tuples, and it also
> >> sidesteps all problems with adding new system columns.
> 
> > The biggest problem with such an API is that it's painful to use - I've
> > used pageinspect a fair bit, and not being able to easily get the
> > content of the rows you're looking at makes it really far less useful in
> > many scenarios. That could partially be improved by a neater API
> 
> Surely.  Why couldn't you join against the table on ctid?

For the case of pageinspect it's because pageinspect doesn't return the
ctid of a tuple in a useful way - its t_ctid is HeapTupleHeader->t_ctid,
not HeapTuple->t_self...
In many cases bulk access really isn't all that useful - you do a SELECT
searching for data that's looking strange and then need the forensic
data for those. That's just painful with any of the proposed fast APIs
afaics.

> > And I really don't see any page-at-a-time access that's going to be
> > convenient.
> 
> As I commented to Robert, the page-at-a-time behavior of pageinspect
> is not an API detail we'd want to copy for this.  I envision something
> like
> 
>        select hdr.*, foo.*
>          from tuple_header_details('foo'::regclass) as hdr
>               left join foo on hdr.ctid = foo.ctid;
> 
> On a large table you might want a version that restricts its scan
> to pages M through N, but that's just optimization.  More useful
> would be to improve the planner's intelligence about joins on ctid ...

That really makes for but ugly queries. E.g. the database I found the
multixact bugs on was ~300GB and I had to look about 80GB of it. So I
would have had to write chunking code for individual tables. Not what
you want to do when shit has hit the fan.

> >>> [ removing system columns from pg_attribute ]]
> >> I think this will inevitably break a lot of code, not all of it ours,
> >> so I'm not in favor of pursuing that direction.
> 
> > Are you thinking of client or extension code? From what I've looked at I
> > don't think it's all that likely too break much of either.
> 
> It will break anything that assumes that every column is represented in
> pg_attribute.  I think if you think this assumption is easily removed,
> you've not looked hard enough.

Uh. And how much code actually is that? Note that system columns already
aren't in a Relation's TupleDesc. So it's not they would be missing from
that - and if that were the issue we could easily add them there when
the cache entry is built.

There really isn't much code in postgres itself that iterates over all
columns including system columns. Some bits around heap.c, tablecmd.c,
lsyscache.c do lookups by name, but they are easily converted to
SystemAttributeByName()/SystemAttributeDefinition().

Most non DDL code doesn't care at all.

> > It would make pg_attribute a fair bit smaller, especially on systems
> > with lots of narrow relations.
> 
> I'd like to do that too, but I think getting rid of xmin/xmax/cmin/cmax
> would be enough to get most of the benefit, and we could do that without
> any inconsistency if we stopped exposing those as system columns.

Well, I have yet to see any realistic proposal to get rid of
them. Having to write significantly more complex and/or significantly more
expensive queries doesn't qualify.

And there really is code out there using xmin/xmax as part of their
code, so I think the rumor of nobody crying about their near death is
just that, a rumor.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

pgsql-hackers by date:

From: Tom Lane
Date: 02 January 2014, 20:06:50
Subject: Re: preserving forensic information when we freeze

From: Andres Freund
Date: 02 January 2014, 20:19:23
Subject: Re: ERROR: missing chunk number 0 for toast value

Re: preserving forensic information when we freeze - Mailing list pgsql-hackers

Previous

Next