Re: Fast insertion indexes: why no developments - Mailing list pgsql-hackers

From Yann Fontana
Subject Re: Fast insertion indexes: why no developments
Date
Msg-id CAAiUYKYi3xXS3HX7-F057aq7SSgyDEYvuV6+r7C0Q=JhGknf+w@mail.gmail.com
In response to Re: Fast insertion indexes: why no developments  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Fast insertion indexes: why no developments
List pgsql-hackers


On 30 October 2013 11:23, Leonardo Francalanci <m_lists@yahoo.it> wrote:

>> In terms of generality, do you think its worth a man year of developer
>> effort to replicate what you have already achieved? Who would pay?

I work on an application that does exactly what Leonardo described. We hit the exact same problem and came up with the exact same solution (down to the 15-minute interval). But I have also worked on various other datamarts (all using Oracle), and they are all subject to this problem in some form: B-tree indexes slow down bulk inserts too much and need to be disabled, or dropped and then recreated after the load. In some cases this is done easily enough; in others it's more complicated (example: every day, a process imports from 1 million to 1 billion records into a table partition that may already contain from 0 to 1 billion records. To be as efficient as possible, you need some logic that compares the number of rows to insert with the number of rows already present, in order to decide whether to drop the indexes or not).
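The drop-or-keep decision described above could be sketched roughly as follows (the threshold, function name, and cost reasoning are illustrative assumptions, not from the original post):

```python
def should_drop_indexes(rows_to_insert, rows_existing, ratio_threshold=0.5):
    """Illustrative heuristic for a bulk load into an indexed partition.

    Rationale (an assumption, not a measured model): recreating an index
    afterwards costs roughly on the order of the total row count, while
    maintaining it during the load costs per-inserted-row index updates.
    So drop + recreate tends to win when the batch is large relative to
    the rows already in the partition.
    """
    if rows_existing == 0:
        # Empty partition: always load first, build indexes afterwards.
        return True
    return rows_to_insert / rows_existing >= ratio_threshold


# Example: a 1-billion-row load into a near-empty partition -> drop indexes;
# a 1-million-row load into a 1-billion-row partition -> keep them.
```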

Basically, my point is that this is a common problem for datawarehouses and datamarts. In my view, indexes that don't require developers to work around poor insert performance would be a significant feature in a "datawarehouse-ready" DBMS.

Yann
