Thread: Transactions vs speed.

From
mlw
Date:
I have a question about Postgres:

Take this update:

    update table set field = 'X' ;

This is a very expensive operation when the table has millions of rows;
it takes over an hour. If I dump the database, process the data with
perl, and then reload it, the whole thing takes minutes. Most of the
time is spent creating indexes.
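
For concreteness, the dump-and-reload path being compared looks roughly
like this (a sketch only; the database name, the new database, and the
perl one-liner are hypothetical stand-ins for the real processing):

    pg_dump mydb > dump.out
    perl -pe "s/old_value/X/" dump.out > fixed.out   # hypothetical row rewrite
    createdb mydb_new
    psql -d mydb_new -f fixed.out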

I am not asking for a feature, I am just musing. 

I have a database update procedure which has to merge our data with that
of more than one third party. It takes 6 hours to run.

Do you guys know of any tricks that would let Postgres operate really
fast under the assumption that the tables it is working on are not
otherwise in use? LOCK does not seem to make much difference.
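
For reference, a minimal sketch of the kind of explicit lock being
tried here; "mytable" is a hypothetical stand-in for the real table:

    BEGIN;
    LOCK TABLE mytable IN ACCESS EXCLUSIVE MODE;  -- block all other access
    UPDATE mytable SET field = 'X';
    COMMIT;                                       -- lock released at commit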

Any bit of info would be helpful.

-- 
http://www.mohawksoft.com


Re: Transactions vs speed.

From
Alfred Perlstein
Date:
* mlw <markw@mohawksoft.com> [010113 17:19] wrote:
> I have a question about Postgres:
> 
> Take this update:
>     update table set field = 'X' ;
> 
> 
> This is a very expensive operation when the table has millions of rows;
> it takes over an hour. If I dump the database, process the data with
> perl, and then reload it, the whole thing takes minutes. Most of the
> time is spent creating indexes.
> 
> I am not asking for a feature, I am just musing. 

Well, you really haven't said whether you've tuned your database at all.
The way PostgreSQL ships by default, it doesn't use a very large shared
memory segment, and all writing (at least in 7.0.x) is done synchronously.

There's a boatload of email out there that explains various ways to tune
the system.  Here's some of the flags that I use:

-B 32768   # uses over 300megs of shared memory
-o "-F" # tells database not to call fsync on each update

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Transactions vs speed.

From
mlw
Date:
Alfred Perlstein wrote:
> 
> * mlw <markw@mohawksoft.com> [010113 17:19] wrote:
> > I have a question about Postgres:
> >
> > Take this update:
> >       update table set field = 'X' ;
> >
> >
> > This is a very expensive operation when the table has millions of rows;
> > it takes over an hour. If I dump the database, process the data with
> > perl, and then reload it, the whole thing takes minutes. Most of the
> > time is spent creating indexes.
> >
> > I am not asking for a feature, I am just musing.
> 
> Well, you really haven't said whether you've tuned your database at all.
> The way PostgreSQL ships by default, it doesn't use a very large shared
> memory segment, and all writing (at least in 7.0.x) is done synchronously.
> 
> There's a boatload of email out there that explains various ways to tune
> the system.  Here's some of the flags that I use:
> 
> -B 32768   # uses over 300megs of shared memory
> -o "-F" # tells database not to call fsync on each update

I have a good number of buffers (not 32768, but a few), and I have the
"-F" option.


-- 
http://www.mohawksoft.com


Re: Transactions vs speed.

From
Alfred Perlstein
Date:
* mlw <markw@mohawksoft.com> [010113 19:37] wrote:
> Alfred Perlstein wrote:
> > 
> > * mlw <markw@mohawksoft.com> [010113 17:19] wrote:
> > > I have a question about Postgres:
> > >
> > > Take this update:
> > >       update table set field = 'X' ;
> > >
> > >
> > > This is a very expensive operation when the table has millions of rows;
> > > it takes over an hour. If I dump the database, process the data with
> > > perl, and then reload it, the whole thing takes minutes. Most of the
> > > time is spent creating indexes.
> > >
> > > I am not asking for a feature, I am just musing.
> > 
> > Well, you really haven't said whether you've tuned your database at all.
> > The way PostgreSQL ships by default, it doesn't use a very large shared
> > memory segment, and all writing (at least in 7.0.x) is done synchronously.
> > 
> > There's a boatload of email out there that explains various ways to tune
> > the system.  Here's some of the flags that I use:
> > 
> > -B 32768   # uses over 300megs of shared memory
> > -o "-F" # tells database not to call fsync on each update
> 
> I have a good number of buffers (not 32768, but a few), and I have the
> "-F" option.

Explain a "good number of buffers" :)

Also, when was the last time you ran vacuum on this database?
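
(For reference, a sketch of the maintenance command in question;
"mytable" is a hypothetical table name:)

    VACUUM ANALYZE mytable;   -- reclaims dead tuples and refreshes statistics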

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


Re: Transactions vs speed.

From
Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> Take this update:
>     update table set field = 'X' ;
> This is a very expensive operation when the table has millions of rows;
> it takes over an hour. If I dump the database, process the data with
> perl, and then reload it, the whole thing takes minutes. Most of the
> time is spent creating indexes.

Hm.  CREATE INDEX is well known to be faster than incremental building/
updating of indexes, but I didn't think it was *that* much faster.
Exactly what indexes do you have on this table?  Exactly how many
minutes is "minutes", anyway?

You might consider some hack like

    drop inessential indexes;
    UPDATE;
    recreate dropped indexes;

"inessential" being any index that's not UNIQUE (or even the UNIQUE
ones, if you don't mind finding out about uniqueness violations at
the end).

Might be a good idea to do a VACUUM before rebuilding the indexes, too.
It won't save time in this process, but it'll be cheaper to do it then
rather than later.
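
Concretely, the whole sequence might look like this (a minimal sketch;
"mytable" and the index name are hypothetical stand-ins):

    DROP INDEX mytable_field_idx;        -- drop inessential indexes
    UPDATE mytable SET field = 'X';      -- bulk update, no index maintenance
    VACUUM mytable;                      -- reclaim dead tuples before rebuilding
    CREATE INDEX mytable_field_idx ON mytable (field);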
        regards, tom lane

PS: I doubt transactions have anything to do with it.