Thread: Transactions vs speed.
I have a question about Postgres. Take this update:

    update table set field = 'X' ;

This is a very expensive operation when the table has millions of rows; it takes over an hour. If I dump the database, process the data with perl, and then reload it, it takes minutes. Most of the time is spent creating indexes.

I am not asking for a feature, I am just musing. I have a database update procedure which has to merge our data with that of more than one third party, and it takes 6 hours to run. Do you guys know of any tricks that would allow postgres to operate really fast on the assumption that it is working on tables which are not otherwise being used? LOCK does not seem to make much difference. Any bit of info would be helpful.

--
http://www.mohawksoft.com
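For the record, a rough sketch of the dump/process/reload path being compared here; the table name, file path, and column names are only placeholders, and the point is that the indexes are rebuilt in one pass at the end instead of being updated row by row:

    -- dump the table to a flat file
    COPY mytable TO '/tmp/mytable.txt';
    -- ... massage the file with perl outside the database ...
    -- reload into the emptied table, with its indexes dropped
    DELETE FROM mytable;
    COPY mytable FROM '/tmp/mytable.txt';
    -- rebuild the indexes last, in a single CREATE INDEX pass each
    CREATE INDEX mytable_field_idx ON mytable (field);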
* mlw <markw@mohawksoft.com> [010113 17:19] wrote:
> I have a question about Postgres. Take this update:
>
>     update table set field = 'X' ;
>
> This is a very expensive operation when the table has millions of rows;
> it takes over an hour. If I dump the database, process the data with
> perl, and then reload it, it takes minutes. Most of the time is spent
> creating indexes.
>
> I am not asking for a feature, I am just musing.

Well, you really haven't said whether you've tuned your database at all. The way postgresql ships by default it doesn't use a very large shared memory segment, and all the writing (at least in 7.0.x) is done synchronously.

There's a boatload of email out there that explains various ways to tune the system. Here are some of the flags that I use:

    -B 32768   # uses over 300 megs of shared memory
    -o "-F"    # tells the database not to call fsync on each update

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
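For concreteness, those flags go on the postmaster command line; the data directory path below is just an example, and with -B that large you may also need to raise the kernel's shared memory limits (e.g. SHMMAX):

    # each shared buffer is 8K, so -B 32768 asks for roughly 256 MB of buffers alone;
    # -o "-F" passes -F to the backends, which turns off per-update fsync
    postmaster -D /usr/local/pgsql/data -B 32768 -o "-F" > server.log 2>&1 &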
Alfred Perlstein wrote:
>
> * mlw <markw@mohawksoft.com> [010113 17:19] wrote:
> > I have a question about Postgres. Take this update:
> >
> >     update table set field = 'X' ;
> >
> > This is a very expensive operation when the table has millions of rows;
> > it takes over an hour. If I dump the database, process the data with
> > perl, and then reload it, it takes minutes. Most of the time is spent
> > creating indexes.
> >
> > I am not asking for a feature, I am just musing.
>
> Well, you really haven't said whether you've tuned your database at all. The
> way postgresql ships by default it doesn't use a very large shared memory
> segment, and all the writing (at least in 7.0.x) is done synchronously.
>
> There's a boatload of email out there that explains various ways to tune
> the system. Here are some of the flags that I use:
>
>     -B 32768   # uses over 300 megs of shared memory
>     -o "-F"    # tells the database not to call fsync on each update

I have a good number of buffers (not 32768, but a fair few), and I have the "-F" option.

--
http://www.mohawksoft.com
* mlw <markw@mohawksoft.com> [010113 19:37] wrote:
> Alfred Perlstein wrote:
> >
> > * mlw <markw@mohawksoft.com> [010113 17:19] wrote:
> > > I have a question about Postgres. Take this update:
> > >
> > >     update table set field = 'X' ;
> > >
> > > This is a very expensive operation when the table has millions of rows;
> > > it takes over an hour. If I dump the database, process the data with
> > > perl, and then reload it, it takes minutes. Most of the time is spent
> > > creating indexes.
> > >
> > > I am not asking for a feature, I am just musing.
> >
> > Well, you really haven't said whether you've tuned your database at all. The
> > way postgresql ships by default it doesn't use a very large shared memory
> > segment, and all the writing (at least in 7.0.x) is done synchronously.
> >
> > There's a boatload of email out there that explains various ways to tune
> > the system. Here are some of the flags that I use:
> >
> >     -B 32768   # uses over 300 megs of shared memory
> >     -o "-F"    # tells the database not to call fsync on each update
>
> I have a good number of buffers (not 32768, but a fair few), and I have the
> "-F" option.

Explain a "good number of buffers". :)

Also, when was the last time you ran vacuum on this database?

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
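For reference, the sort of maintenance being asked about here is simply (table name is a placeholder; ANALYZE also refreshes the planner's statistics):

    VACUUM VERBOSE ANALYZE mytable;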
mlw <markw@mohawksoft.com> writes:
> Take this update:
>
>     update table set field = 'X' ;
>
> This is a very expensive operation when the table has millions of rows;
> it takes over an hour. If I dump the database, process the data with
> perl, and then reload it, it takes minutes. Most of the time is spent
> creating indexes.

Hm. CREATE INDEX is well known to be faster than incremental building/updating of indexes, but I didn't think it was *that* much faster. Exactly what indexes do you have on this table? Exactly how many minutes is "minutes", anyway?

You might consider some hack like

    drop inessential indexes;
    UPDATE;
    recreate dropped indexes;

"inessential" being any index that's not UNIQUE (or even the UNIQUE ones, if you don't mind finding out about uniqueness violations at the end).

It might be a good idea to do a VACUUM before rebuilding the indexes, too. It won't save time in this process, but it'll be cheaper to do it then rather than later.

regards, tom lane

PS: I doubt transactions have anything to do with it.
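Spelled out as actual commands, that hack might look roughly like this; the table, column, and index names are placeholders, and the VACUUM is run outside any transaction block:

    DROP INDEX mytable_field_idx;                        -- drop the inessential indexes first
    UPDATE mytable SET field = 'X';                      -- the bulk update itself
    VACUUM mytable;                                      -- cheaper to clear the dead tuples before rebuilding
    CREATE INDEX mytable_field_idx ON mytable (field);   -- rebuild each index in one pass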