Re: Are bitmap index scans slow to start? - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Are bitmap index scans slow to start?
Date
Msg-id CAMkU=1zRqJ6Bn08KiN8vLBoVj14W5xXM7aJLgz+QZwLC0ZRbJQ@mail.gmail.com
Whole thread Raw
In response to Re: Are bitmap index scans slow to start?  ("Carlo Stonebanks" <stonec.register@sympatico.ca>)
Responses Re: Are bitmap index scans slow to start?
Re: Are bitmap index scans slow to start?
List pgsql-performance
On Tue, Feb 26, 2013 at 4:33 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote:

<<Is each of these write operations just covering a single row?  Does this description apply to just one of the many (how many?) databases, so that there are really 14*N concurrent sessions?

>> 

 

All writes are single row. All DB’s have exactly the same structure, only the content is different. Currently the server is hosting five active DB’s – although there 14 DB’s actually on the host, the balance are backups and or testing environments.


I had thought you were saying that any one ETL procedure into one database used 14 concurrent threads.  But really, each ETL procedure is single-threaded, and there can be up to 5 (or theoretically up to 14) of them running at a time into different databases?
 
 

When a feed comes in, it can be anything from dozens to millions of rows, and may take minutes or days to run. I had asked that PG bouncer be installed in front of the host to act as a traffic cop. Try as I may to convince the engineering team that fewer sessions running faster is optimal, they say that the 14 concurrent sessions is based on real-world experience of what imports the fastest.



pgbouncer is more for making connections line up single-file when the line is moving at a very fast clip, say 0.01 second per turn.  If I were trying to make tasks that can each last for hours or days line up and take turns, I don't think pgbouncer would be the way to go.

 

 

<< You really need to know whether those reads and writes are concentrated in a small region (relative to the amount of your RAM), or widely scattered.  If you are reading and writing intensively (which you do seem to be doing) but only within a compact region, then it should not drive other data out of the cache.  But, since you do seem to have IO problems from cache misses, and you do have a high level of activity, the easy conclusion is that you have too little RAM to hold the working size of your data.
>>

 

It won’t be a problem of physical RAM, I believe there is at least 32GB of RAM. What constitutes “a compact region”?


If you have 14 actively going on simultaneously, I'd say a compact region would then be about 512 MB. 
(32GB/ 14 / margin of safety of 4).  Again, assuming that that is the problem.
 

The ETL process takes the feed and distributes it to 85 core tables.  I have been through many PG configuration cycles with the generous help of people in this forum. I think the big problem when getting help has been this issue of those offering assistance understanding that the whopping majority of the time, the system is performing single row reads and writes. The assumption tends to be that the end point of an ETL should just be a series of COPY statements, and it should all happen very quickly in classic SQL bulk queries.


That is often a reasonable assumption, as ETL does end with L :)

Is the original query you posted part of the transform process, rather than being the production query you run after the ETL is over?

If so, maybe you need a EL(S)TL process, were you first load the data to staging table in bulk, and then transform it in bulk rather than one row at a time.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Matt Daw
Date:
Subject: Re: Estimation question...
Next
From: "Carlo Stonebanks"
Date:
Subject: Re: Are bitmap index scans slow to start?