Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Fast insertion indexes: why no developments
Date
Msg-id CA+U5nMKba+bCem+uRXm4CQShZoVyS26r_fq7WSWnTtfQk5LYjg@mail.gmail.com
Whole thread Raw
In response to Re: Fast insertion indexes: why no developments  (Leonardo Francalanci <m_lists@yahoo.it>)
Responses Re: Fast insertion indexes: why no developments  (Leonardo Francalanci <m_lists@yahoo.it>)
Re: Fast insertion indexes: why no developments  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On 13 November 2013 09:31, Leonardo Francalanci <m_lists@yahoo.it> wrote:

> Or, in other words: what are you going to write in the minmax index
> documentation, "try and see if they work better for you"?

That seems like good advice to me. Bacon > Aristotle.

There is very little written about suitability of any index type for
Postgres. I'm sure the docs could be improved there.


> Plus, I'm *very* interested in the minmax index

Good, thanks.

> But, at the same time, I don't see any evidence that they work
> better than btrees (except for the size of the index).

Min max indexes are not designed to be a better btree, they are
designed to be roughly same as automatic partitioning. They offer
considerably improved time to build, significantly reduced index size
and significantly improved insert performance. Min max will be clearly
slower than btrees for small numbers of records, though for large
numbers of records we may expect min max to perform same as btrees,
though that requires better testing to get a more accurate picture.
There may yet be optimisations of the patch also.

Based what we have discussed here, we've come up with two new
optimisations that can be used with MinMax, namely the bulk DELETE
case and the Merge Append case. Thank you for highlighting additional
cases and requirements.

From our discussions here, IMHO there is a strong case for avoiding
btrees completely for larger historical data tables. That isn't
something I had even considered as desirable before this conversation
but ISTM now that taking that approach will be more fruitful than
attempting to implement LSM trees.

Having said that, I am also in favour of declarative partitioning,
just that there is no funding available to work on that at present.

Further work on bitmap indexes is expected. They are already designed
with good insert performance in mind and this discussion re-emphasises
that requirement.

> I would like to see some numbers.

Alvaro has given me some results for his patch. The figures I have are
for a 2GB table.

Index Build Time
MinMax 11 s
Btree   96s

Index Size
MinMax 2 pages + metapage
Btree approx 200,000 pages + metapage

Load time
MinMax no overhead, same as raw COPY
BTree - considerably slower

Index SELECT
Slower for small groups of rows
Broadly same for large requests (more results required on that assessment)

I expect to publish results against TPC-H cases in next few weeks.

Additional tests are welcome for other use cases.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: The number of character limitation of custom script on pgbench
Next
From: Peter Eisentraut
Date:
Subject: Re: [COMMITTERS] pgsql: Replace duplicate_oids with Perl implementation