Re: [HACKERS] Slow count(*) again... - Mailing list pgsql-performance

From Віталій Тимчишин
Subject Re: [HACKERS] Slow count(*) again...
Date
Msg-id AANLkTimko0CviRbwkvB=XXci4h7FU8ue7iKPnhM645jX@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Slow count(*) again...  (david@lang.hm)
Responses Re: [HACKERS] Slow count(*) again...  (david@lang.hm)
List pgsql-performance


2011/2/3 <david@lang.hm>

If the table is not large enough to fit in ram, then it will compete for I/O, and the user will have to wait.

what I'm proposing is that as the records are created, the process doing the creation makes copies of the records (either all of them, or some of them if not all are needed for the analysis, possibly via shareing memory with the analysis process), this would be synchronous with the load, not asynchronous.

this would take zero I/O bandwidth, it would take up some ram, memory bandwidth, and cpu time, but a load of a large table like this is I/O contrained.

it would not make sense for this to be the default, but as an option it should save a significant amount of time.

I am making the assumption that an Analyze run only has to go over the data once (a seqential scan of the table if it's >> ram for example) and gathers stats as it goes.

with the current code, this is a completely separate process that knows nothing about the load, so if you kick it off when you start the load, it makes a pass over the table (competing for I/O), finishes, you continue to update the table, so it makes another pass, etc. As you say, this is a bad thing to do. I am saying to have an option that ties the two togeather, essentially making the data feed into the Analyze run be a fork of the data comeing out of the insert run going to disk. So the Analyze run doesn't do any I/O and isn't going to complete until the insert is complete. At which time it will have seen one copy of the entire table.

Actually that are two different problems. The one is to make analyze more automatic to make select right after insert more clever by providing statistics to it. 
Another is to make it take less IO resources.
I dont like for it to be embedded into insert (unless the threshold can be determined before inserts starts). Simply because it is more CPU/memory that will slow down each insert. And if you will add knob, that is disabled by default, this will be no more good than manual analyze.


--
Best regards,
 Vitalii Tymchyshyn

pgsql-performance by date:

Previous
From: Samuel Gendler
Date:
Subject: Re: [HACKERS] Slow count(*) again...
Next
From: Віталій Тимчишин
Date:
Subject: Re: [HACKERS] Slow count(*) again...