Re: crash-safe visibility map, take four - Mailing list pgsql-hackers

From Jesper Krogh
Subject Re: crash-safe visibility map, take four
Date
Msg-id 4D89902C.7000004@krogh.cc
In response to crash-safe visibility map, take four  (Robert Haas <robertmhaas@gmail.com>)
On 2011-03-22 21:43, Robert Haas wrote:
> I took a crack at implementing the first approach described above,
> which seems to be by far the simplest idea we've come up with to date.
>   Patch attached.  It doesn't seem to be that complicated, which could
> mean either that it's not that complicated or that I'm missing
> something.  Feel free to point and snicker in the latter case.

Looks simple, but there is no benefit yet on the usage side in the patch,
so it isn't really "testable" yet? I would love to spend some time testing
when it's doable (even with rough corners).

I'm still a bit puzzled about how it would end up working with a page-level
visibility map bit for index scans. There is a clear drop-off in usability
as the change rate of the table goes up, which may or may not be
relevant; I cannot really judge, since I don't even have a ballpark
figure for how much table churn would disable, say, 50% of the usage.

= Really naive suggestion approaching =
Another layout might be to simply drag t_xmin and t_xmax out per row (8 bytes)
into a table by itself. This table would be way bigger than the one-bit-per-page
map, but could be WAL-logged like any other change in the system?

It would, by definition, make the visibility testing work (way faster
than today), no matter how fast the underlying table changes.
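As a minimal sketch of what one entry in such a side table might look like, assuming PostgreSQL's 32-bit transaction IDs (the names and packing here are illustrative, not from any patch):

```python
import struct

# Hypothetical per-tuple visibility entry: xmin and xmax packed as two
# 32-bit transaction ids, side by side -- 8 bytes per row, as suggested.
VIS_ENTRY = struct.Struct("<II")

def pack_entry(xmin, xmax):
    """Pack one (xmin, xmax) pair into an 8-byte entry."""
    return VIS_ENTRY.pack(xmin, xmax)

entry = pack_entry(1000, 0)  # xmax = 0: no deleting transaction yet
```

Each entry is fixed-width, so the entry for tuple N sits at offset 8*N and can be read without touching the heap page at all.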

State of today (PG 8.4) is that a query like this:

testdb=# select count(id) from testdb.seq where fts @@ to_tsquery('tag');
 count
-------
 69753
(1 row)

Time: 5863.600 ms
testdb=# select count(id) from testdb.seq where fts @@ to_tsquery('tag');
 count
-------
 69753
(1 row)

Time: 659.832 ms
testdb=# select count(id) from testdb.seq where fts @@ to_tsquery('tag');
 count
-------
 69753
(1 row)

Time: 1005.765 ms

Somewhere around 15 ns/tuple (not bad at all).
(The first run was probably "half warm".)

The average number of rows per page is somewhere between 4 and 8 for this
table. Assuming 8, and that the 69K matches are randomly distributed among
the 16M other tuples (a fair assumption in this case), each match lands on
its own page, so the 600-1000 ms for the fresh-cache runs is the time to drag
69753 * 8192 bytes (page size) = 571 MB into memory for visibility testing
alone; on a warm cache all those pages are already in main memory. Packing
16M tuples at 8 bytes/tuple in a map would be around 128 MB.

Given 8 bytes/row and random distribution of data, we would have to read
all 128 MB, so a speedup of roughly 4x on this example; and it would roughly
let us count the entire table in the same time.
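As a sanity check on the arithmetic above (a sketch; the 16M row count, 8 KB page size, and 8-byte entry size are the figures from this mail):

```python
PAGE_SIZE = 8192          # bytes per heap page
MATCHED_ROWS = 69_753     # rows returned by the tsquery
TABLE_ROWS = 16_000_000   # approximate total row count from this mail
ENTRY_BYTES = 8           # t_xmin + t_xmax per tuple

# Worst case, each matched row sits on a distinct heap page:
heap_bytes = MATCHED_ROWS * PAGE_SIZE      # ~571 MB read for visibility
map_bytes = TABLE_ROWS * ENTRY_BYTES       # 128 MB for the whole map
speedup = heap_bytes / map_bytes           # ~4.5x on this example

print(heap_bytes, map_bytes, round(speedup, 1))
```

Note the map cost is flat: counting the whole table reads the same 128 MB, while the heap-page cost grows with the number of matches.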

With regard to disk vs. memory hotness: those 128 MB compare to a table
size of 32 GB (with a TOAST table next to it of 64 GB, though that
shouldn't be touched by the above query).

The ns/tuple number (today) on a "thin" table in my system is 
approaching 1ns / tuple.

If the page-level bitmap would be set "quite fast" on a fairly busy
system anyway, then the above is just noise in the air; but I currently
have no feeling for that, and there is some math in there that I have
trouble putting reliable ballpark numbers on.

There is, by all approaches, room for significant improvement in
visibility testing for a huge range of installations.

Can I drag numbers on the "frozenness" of tuples out of my current systems
to feed into the discussion? (How?)

Jesper
-- 
Jesper

