Re: Are bitmap index scans slow to start? - Mailing list pgsql-performance
From | Carlo Stonebanks |
---|---|
Subject | Re: Are bitmap index scans slow to start? |
Date | |
Msg-id | 005001ce1c43$cbd9acb0$638d0610$@sympatico.ca Whole thread Raw |
In response to | Re: Are bitmap index scans slow to start? (Jeff Janes <jeff.janes@gmail.com>) |
List | pgsql-performance |
Sorry this took so long to get back to you. Here is where we were:
I said: <<
So, this query is not called often, but the fact is that if it takes over 30 seconds to load an item (because the audit report takes so long to prepare the bitmap index scan when passed new query parameters) then it severely restricts how much data we can resurrect at any one time.
>>
Your reply: <<
Is that a restriction you have observed, or are you extrapolating based on a single query? If you run a bunch of similar queries in close succession, it is likely that the first few queries will warm up the cache, and following queries will then run much faster. Also, if you restructure the series of queries into a large one that reconstructs many rows simultaneously, it might choose a more efficient path than if it is fed the queries one at a time.
>>
Actual observation. The first run with a new parameter actually takes 90 seconds. Another run with the same parameter takes 15-30 seconds. Running the query immediately afterwards with different parameters starts with a new 90 seconds query. Unfortunately, since going to LINUX, our sys ops hiss and snarl at anyone who comes anywhere near machine or DB server configs, so I am no longer well informed on how well optimized the machines are.
Ultimately, the machines need to be optimized by an expert. As I mentioned before, our ETL is entirely single-load reads-and-writes (I didn’t go into the “why” of this because the nature of the data and the product dictates this). And this is an example of one of the few complex joins that return hundreds/thousands of rows. The problem is that a full index scan has to be done before we can start building the results. So, if clustering will help such that the index scan KNOWS that there’s no point is scanning the rest of the index because we’ve gone beyond the maximum value in the list of possible values, then that would help, as each table being scanned has 50 – 100 million rows (there is one table for every month of production).
As always, thanks.
From: Jeff Janes [mailto:jeff.janes@gmail.com]
Sent: March 5, 2013 4:21 PM
To: Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Are bitmap index scans slow to start?
On Wed, Feb 27, 2013 at 1:38 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:
>> Is the original query you posted part of the transform process, rather than being the production query you run after the ETL is over?
Neither, it is part of our auditing and maintenance processes. It is not called with any great frequency. The audit report generates rows defining how the a particular item (an “item” being a particular table/row) was created: it returns the names of the import tables, the row ids, the write operations and any transformation messages that may have been generated – all in the order they occurred.
...
So, this query is not called often, but the fact is that if it takes over 30 seconds to load an item (because the audit report takes so long to prepare the bitmap index scan when passed new query parameters) then it severely restricts how much data we can resurrect at any one time.
Is that a restriction you have observed, or are you extrapolating based on a single query? If you run a bunch of similar queries in close succession, it is likely that the first few queries will warm up the cache, and following queries will then run much faster. Also, if you restructure the series of queries into a large one that reconstructs many rows simultaneously, it might choose a more efficient path than if it is fed the queries one at a time.
Cheers,
Jeff
pgsql-performance by date: