Thread: PG 7.0 is 2.5 times slower running a big report
I have a large report that I run once a day. Under 6.5.3 it took just over 3hrs to run. Under 7.0 it is now taking 8 hours to run. No other changes were made.

This is on RedHat Linux 6.2, a PIII 733 with 384MB RAM and 2 IDE 7200 RPM disks. One disk contains the Postgres directory including the data directory, and the other disk has everything else.

The Postmaster is started with these options: -i -B 1024 -N 256 -o -F

The report is being run on a backup server just after the database has been loaded from a dump and 'vacuum analyze'd. There is practically nothing else running on the box during the run.

The report creates four separate concurrent cursors. Each of the queries is sorted by a customer number, which is indexed. The index is unique on the customer table but not the others. For the other cursors it pops values as needed to process data for the current customer number. There are also other selects that are run for each customer order processed to retrieve its line items. The report does not update the database at all; it is just accumulating totals that will be written out when the report finishes.

Top tells me the front end process is using 5 to 10 percent of the CPU and the back end is using 10 to 20 percent. The load average is about 1.0 and the CPU is about 80% idle. I am pretty certain that on 6.5.3 the CPU usage was much higher. It's almost like the new version has some sort of throttle to keep one backend from saturating the system. Indeed the box is much more responsive than it used to be while running this report.

Suggestions?
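[The four-cursor pattern described above could be sketched roughly as follows. Column lists are abbreviated, and the FETCH-driven merge logic lives in the report client; only the queries themselves appear later in this thread, so the exact structure is an assumption.]

```sql
BEGIN;  -- cursors must live inside a transaction block

-- One cursor per table, all ordered by the same key so the client
-- can merge them stream-wise, one customer at a time.
DECLARE cust_cur CURSOR FOR
    SELECT custid, bouncecount FROM customer ORDER BY custid;
DECLARE ord_cur CURSOR FOR
    SELECT custid, orderid FROM orders ORDER BY custid;
DECLARE con_cur CURSOR FOR
    SELECT custid, action FROM contact ORDER BY custid;
DECLARE list_cur CURSOR FOR
    SELECT custid, listid FROM custlist ORDER BY custid;

-- The client drives the merge from the customer cursor, popping
-- matching rows from the other cursors as each custid comes up.
FETCH 1 FROM cust_cur;
FETCH 1 FROM ord_cur;

COMMIT;
```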
"Bryan White" <bryan@arcamax.com> writes:
> Top tells me the front end process is using 5 to 10 percent of the CPU and
> the back end is using 10 to 20 percent. The load average is about 1.0 and
> the CPU is about 80% idle.

It's probably waiting for disk I/O ...

What does EXPLAIN tell you about how the queries are being executed? Do you by any chance have the 6.5.3 system still available to compare its EXPLAIN output?

			regards, tom lane
Try putting 'nofsync' in your pg_options.

Bryan White wrote:
> I have a large report that I run once a day. Under 6.5.3 it took just over
> 3hrs to run. Under 7.0 it is now taking 8 hours to run. No other changes
> were made.
> [...]
> Suggestions?
> Try putting 'nofsync' in your pg_options.

I am already starting the postmaster with -o -F which I believe has the same effect. Besides, this process does not alter the database so there should be nothing to sync.
I might have missed it, but did you submit the EXPLAIN results of the query you are attempting, as Tom Lane requested?

On Wed, 24 May 2000, Bryan White wrote:
> > Try putting 'nofsync' in your pg_options.
>
> I am already starting the postmaster with -o -F which I believe has the same
> effect. Besides, this process does not alter the database so there should be
> nothing to sync.

Marc G. Fournier                  ICQ#7615664             IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org          secondary: scrappy@{freebsd|postgresql}.org
> "Bryan White" <bryan@arcamax.com> writes:
> > Top tells me the front end process is using 5 to 10 percent of the CPU and
> > the back end is using 10 to 20 percent. The load average is about 1.0 and
> > the CPU is about 80% idle.
>
> It's probably waiting for disk I/O ...
>
> What does EXPLAIN tell you about how the queries are being executed?
> Do you by any chance have the 6.5.3 system still available to compare
> its EXPLAIN output?

explain select custid, poregdate, firstcontactdate,
    mastersubscribed, offersubscribed, bouncecount
    from customer order by custid;

Sort  (cost=598354.56..598354.56 rows=2446621 width=40)
  ->  Seq Scan on customer  (cost=0.00..75939.21 rows=2446621 width=40)

explain select custid, orderid, date, leadsource,
    paymentstatus, shipping + tax
    from orders order by custid;

Sort  (cost=167945.80..167945.80 rows=588242 width=60)
  ->  Seq Scan on orders  (cost=0.00..31399.42 rows=588242 width=60)

explain select custid, action, offer, date, source
    from contact order by custid;

Index Scan using iconcus4 on contact  (cost=0.00..1446338.62 rows=6462635 width=44)

explain select custid, listid
    from custlist order by custid;

Index Scan using iclcust3 on custlist  (cost=0.00..334501.73 rows=2738543 width=8)

I find the 'Sort's on customer and orders surprising.
Here are the index creates from a dump:

CREATE INDEX "icusln" on "customer" using btree ( "lname" "text_ops" );
CREATE UNIQUE INDEX "icusem2" on "customer" using btree ( "email" "text_ops" );
CREATE INDEX "icusph" on "customer" using btree ( "phone" "text_ops" );
CREATE UNIQUE INDEX "icusid" on "customer" using btree ( "custid" "int4_ops" );
CREATE INDEX "iordldsrc3" on "orders" using btree ( "leadsource" "text_ops" );
CREATE UNIQUE INDEX "iordid3" on "orders" using btree ( "orderid" "int4_ops" );
CREATE INDEX "iordcus3" on "orders" using btree ( "custid" "int4_ops", "date" "date_ops" );
CREATE INDEX "iorddate3" on "orders" using btree ( "date" "date_ops" );

The iordcus3 index has a second component that is irrelevant to this operation. Is the optimizer intelligent enough to still use it?
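[As it happens, a btree index on (custid, date) can satisfy an ORDER BY on just its leading column, since entries are already sorted by custid first. A quick way to check, as a sketch only; whether the planner actually picks the index depends on its cost estimates, as discussed later in the thread:]

```sql
-- The planner may use the two-column index iordcus3 for either of
-- these, because custid is its leading column:
EXPLAIN SELECT custid, orderid FROM orders ORDER BY custid;
EXPLAIN SELECT custid, orderid FROM orders ORDER BY custid, date;

-- An ORDER BY on the trailing column alone cannot be satisfied by
-- iordcus3 this way; iorddate3 would be the candidate instead:
EXPLAIN SELECT custid, orderid FROM orders ORDER BY date;
```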
> What does EXPLAIN tell you about how the queries are being executed?
> Do you by any chance have the 6.5.3 system still available to compare
> its EXPLAIN output?

I can shuffle directories and reload under 6.5.3 but I need to wait until today's run of the report finishes. I got bit by the change in the default date format and had to run it again. As it is I won't be able to do this until the morning.
"Bryan White" <bryan@arcamax.com> writes:
>> What does EXPLAIN tell you about how the queries are being executed?
>> Do you by any chance have the 6.5.3 system still available to compare
>> its EXPLAIN output?

> explain select custid, poregdate, firstcontactdate,
>     mastersubscribed, offersubscribed, bouncecount
>     from customer order by custid;
> Sort  (cost=598354.56..598354.56 rows=2446621 width=40)
>   ->  Seq Scan on customer  (cost=0.00..75939.21 rows=2446621 width=40)

> explain select custid, orderid, date, leadsource,
>     paymentstatus, shipping + tax
>     from orders order by custid;
> Sort  (cost=167945.80..167945.80 rows=588242 width=60)
>   ->  Seq Scan on orders  (cost=0.00..31399.42 rows=588242 width=60)

> explain select custid, action, offer, date, source
>     from contact order by custid;
> Index Scan using iconcus4 on contact  (cost=0.00..1446338.62 rows=6462635 width=44)

> explain select custid, listid
>     from custlist order by custid;
> Index Scan using iclcust3 on custlist  (cost=0.00..334501.73 rows=2738543 width=8)

> I find the 'Sort's on customer and orders surprising.

Well, in fact you should not, because 6.5.* didn't know how to use an index to implement ORDER BY at all ;-). 7.0 knows how to do it both ways, and so the real question here is whether the planner's cost estimates for the two ways are accurate or not.

My guess is that the index-based ORDER BYs are actually a lot slower than sort-based for your tables, and that's why 7.0 is slower than 6.5. But the planner thinks they will be faster in two of these cases. We need to figure out why it's making the wrong choice.

You can force sort-based or index-based ordering by issuing
	SET enable_indexscan = OFF;
or
	SET enable_sort = OFF;
respectively. I suggest that the next step should be to see what EXPLAIN says for all four queries in both cases (so we can see what the planner's estimates for the two cases actually are for each table), and then to measure the actual runtimes of each of the SELECTs both ways.
regards, tom lane
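[The experiment Tom suggests might look like this in a psql session. This is a sketch using one of the four queries; actual runtimes would be measured with an external clock, since EXPLAIN in 7.0 shows only the planner's estimates.]

```sql
-- Plan A: force sort-based ordering (seq scan + sort)
SET enable_indexscan = OFF;
EXPLAIN SELECT custid, listid FROM custlist ORDER BY custid;

-- Plan B: force index-based ordering
SET enable_indexscan = ON;
SET enable_sort = OFF;
EXPLAIN SELECT custid, listid FROM custlist ORDER BY custid;

-- Restore the default before running anything else
SET enable_sort = ON;
```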
"Bryan White" <bryan@arcamax.com> writes:
>> What does EXPLAIN tell you about how the queries are being executed?
>> Do you by any chance have the 6.5.3 system still available to compare
>> its EXPLAIN output?

> I can shuffle directories and reload under 6.5.3 but I need to wait until
> today's run of the report finishes. I got bit by the change in the default
> date format and had to run it again. As it is I won't be able to do this
> until the morning.

Don't worry about it. Now that I've seen the queries, I know that there is only one query plan 6.5.3 could have used: seq-scan and sort. So there's no need to reload 6.5.3 just to see its EXPLAIN output.

			regards, tom lane
> You can force sort-based or index-based ordering by issuing
>	SET enable_indexscan = OFF;
> or
>	SET enable_sort = OFF;
> respectively. I suggest that the next step should be to see what
> EXPLAIN says for all four queries in both cases (so we can see what
> the planner's estimates for the two cases actually are for each
> table), and then to measure the actual runtimes of each of the
> SELECTs both ways.

Here is the comparison:

explain select ... from customer order by custid;
  Sort  (cost=598354.56..598354.56 rows=2446621 width=40)
    ->  Seq Scan on customer  (cost=0.00..75939.21 rows=2446621 width=40)
  Index Scan using icusid on customer  (cost=0.00..823287.37 rows=2446621 width=40)
  Choice was sort

explain select ... from orders order by custid;
  Sort  (cost=167945.80..167945.80 rows=588242 width=60)
    ->  Seq Scan on orders  (cost=0.00..31399.42 rows=588242 width=60)
  Index Scan using iordcus3 on orders  (cost=0.00..326167.12 rows=588242 width=60)
  Choice was sort

explain select ... from contact order by custid;
  Sort  (cost=1874247.64..1874247.64 rows=6462635 width=44)
    ->  Seq Scan on contact  (cost=0.00..141404.35 rows=6462635 width=44)
  Index Scan using iconcus4 on contact  (cost=0.00..1446338.62 rows=6462635 width=44)
  Choice was index

explain select ... from custlist order by custid;
  Sort  (cost=469342.83..469342.83 rows=2738543 width=8)
    ->  Seq Scan on custlist  (cost=0.00..42109.43 rows=2738543 width=8)
  Index Scan using iclcust3 on custlist  (cost=0.00..334501.73 rows=2738543 width=8)
  Choice was index
> "Bryan White" <bryan@arcamax.com> writes:
> >> What does EXPLAIN tell you about how the queries are being executed?
> >> Do you by any chance have the 6.5.3 system still available to compare
> >> its EXPLAIN output?
>
> > I can shuffle directories and reload under 6.5.3 but I need to wait until
> > today's run of the report finishes. I got bit by the change in the default
> > date format and had to run it again. As it is I won't be able to do this
> > until the morning.
>
> Don't worry about it. Now that I've seen the queries, I know that there
> is only one query plan 6.5.3 could have used: seq-scan and sort. So
> there's no need to reload 6.5.3 just to see its EXPLAIN output.

I have set index_scan off for tomorrow morning's run. I will let you know what happens.
> Well, when you have 2.7 million records in a database, the code might be as
> good as it can be.

I have recovered the performance lost when I moved to Postgres 7.0 by executing SET enable_indexscan = OFF before creating my cursors and turning it back on for the inner loop query. It may even be faster than before, so I am happy.

> So each run of the log takes 2.7 million queries.
>
> Is there no way to use a where clause to limit the scope of your queries?

No, the point is that the resulting report covers the entire database.

> An explanation of the purpose of these queries would help me think about
> what you are after. Maybe it isn't a code problem, maybe it's a business
> rules problem. At the moment, I don't understand why you need to go through
> all the items in the db on every day. Can't you just go through the items
> that were updated on a given day? So, in general, what is the goal of this
> report?

The report summarizes a lot of different aspects of our database. It breaks down totals by a variety of variables. For example, orders are broken down by source, date, status, and sku (most orders have only one line item). I could come up with quicker methods to do any one of the breakdowns, but to do them all it is more efficient to make a single pass over all the data.

> I'm new to the cursor method (and to PostgreSQL, though I've done a lot of
> work with MySQL), but it is interesting, so I'm certainly going to look
> into it.

Part of the advantage of the cursor is in memory management. In a non-cursor select, the database builds and transfers the entire result set to the front end application. For large queries this can be quite a chunk of memory. With a sequential scan and sort of the database I think it is still buffering the result set on the back end, but one copy is better than two.

Keep in mind I am running this report on a backup database server, so I don't have to worry much about other processes being hurt by the load.
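[The fix described above amounts to toggling the planner setting around cursor creation, roughly like this. This is a sketch: the per-order line-item query and the orderitems table name are hypothetical, since the actual inner-loop query never appears in the thread.]

```sql
BEGIN;

-- Force seq-scan-plus-sort plans while the big ordered cursors
-- are being created
SET enable_indexscan = OFF;
DECLARE cust_cur CURSOR FOR
    SELECT custid FROM customer ORDER BY custid;

-- Re-enable index scans so the small per-order lookup in the
-- inner loop can use its index
SET enable_indexscan = ON;

-- Hypothetical inner-loop query: this one should use an index scan
SELECT * FROM orderitems WHERE orderid = 12345;

COMMIT;
```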
It seems that with index scans the cursors start producing data right away (but the overall rate is slower). With sequential scan and sort, the report gets no data for the first 30 minutes and then runs at about 4 times the rate of the index scan.

> What difference do you want to capture in distinguishing the customer table
> from the custlist table?

I am not certain I understand the question. The custlist table contains subscriptions to various services. It is a very simple table containing a listid and a custid.
> I have set index_scan off for tomorrow morning's run. I will let you know
> what happens.

I think my problem is fixed. By disabling index scan while creating the cursors and then turning it back on again for the small query that occurs in my inner loop, the performance has returned to normal (it may in fact be better than before).

Thanks a bunch.
Bryan
"Bryan White" <bryan@arcamax.com> writes:
> I have recovered the performance lost when I moved to Postgres 7.0 by
> executing SET enable_indexscan = OFF before creating my cursors and
> turning it back on for the inner loop query. It may even be faster
> than before so I am happy.

OK, so it was the indexscans that were hurting. (7.0 has new sorting code too, so I was a little afraid that the problem might be with the sorts. Evidently not.)

This suggests that at least on your setup, the default value of 4.0 for random_page_cost might still be too low. I have not tried to measure that number on a Linux machine, just on machines with BSD-derived filesystems. Maybe Linux does a lot worse with random accesses than BSD? Needs looking into.

> It seems that with index scans the cursors start producing data right away
> (but the overall rate is slower). With sequential scan and sort the report
> gets no data for the first 30 minutes and then runs at about 4 times the
> rate of the index scan.

Right, that's what you'd expect: the sort has to be completed before it knows which row to deliver first, but an indexscan has no such startup cost.

			regards, tom lane
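[The startup-cost difference Tom describes is also why a query that only needs the first few rows favors the index plan: an index scan can deliver them immediately, while a sort plan must order the whole table first. A sketch, using the thread's customer table; which plan is actually chosen depends on the planner's cost settings:]

```sql
-- First rows in custid order: an index scan on icusid can return
-- these at once; a sort plan must first sort all ~2.4 million rows.
SELECT custid FROM customer ORDER BY custid LIMIT 10;

-- The same trade-off shows up with a cursor: the first FETCH returns
-- quickly under an index-scan plan, slowly under a sort plan.
BEGIN;
DECLARE c CURSOR FOR SELECT custid FROM customer ORDER BY custid;
FETCH 10 FROM c;
COMMIT;
```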
> "Bryan White" <bryan@arcamax.com> writes:
> > I have recovered the performance lost when I moved to Postgres 7.0 by
> > executing SET enable_indexscan = OFF before creating my cursors and
> > turning it back on for the inner loop query. It may even be faster
> > than before so I am happy.
>
> OK, so it was the indexscans that were hurting. (7.0 has new sorting
> code too, so I was a little afraid that the problem might be with the
> sorts. Evidently not.)
>
> This suggests that at least on your setup, the default value of 4.0 for
> random_page_cost might still be too low. I have not tried to measure
> that number on a Linux machine, just on machines with BSD-derived
> filesystems. Maybe Linux does a lot worse with random accesses than
> BSD? Needs looking into.

As I remember, Linux turns off file system prefetch if a seek is done. BSD file systems turn off prefetch only if prefetched blocks remain unused.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Hi,

Bruce Momjian:
> As I remember, Linux turns off file system prefetch if a seek is done.

Correct (still so for Linux 2.4).

> BSD file systems turn off prefetch only if prefetched blocks remain
> unused.

"Unused" is decided when? Do any, all, or some percentage of prefetched blocks need to be unused?

--
Matthias Urlichs  |  noris network GmbH  |  smurf@noris.de  |  ICQ: 20193661
The quote was selected randomly. Really.  |  http://smurf.noris.de/
--
If hyperspace did not already exist, science fiction writers
would have had to invent it.
                                        -- Peter Oakley
* Matthias Urlichs <smurf@noris.de> [000526 02:47] wrote:
> Bruce Momjian:
> > As I remember, Linux turns off file system prefetch if a seek is done.
>
> Correct (still so for Linux 2.4).
>
> > BSD file systems turn off prefetch only if prefetched blocks remain
> > unused.
>
> "Unused" is decided when?
> Do any, all, or some percentage of prefetched blocks need to be unused?

The statement:

> > BSD file systems turn off prefetch only if prefetched blocks remain
> > unused.

is false for FreeBSD afaik.

FreeBSD tracks how often sequential access happens and will retry it as the file's access trend indicates. It can flip back and forth between readahead and non-readahead.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
> The statement:
>
> > > BSD file systems turn off prefetch only if prefetched blocks remain
> > > unused.
>
> is false for FreeBSD afaik.
>
> FreeBSD tracks how often sequential access happens and will retry it
> as the file's access trend indicates. It can flip back and forth between
> readahead and non-readahead.

Yes, read my later posting describing the actual algorithm. Let me know if it is wrong.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> Bruce Momjian:
> > As I remember, Linux turns off file system prefetch if a seek is done.
>
> Correct (still so for Linux 2.4).
>
> > BSD file systems turn off prefetch only if prefetched blocks remain
> > unused.
>
> "Unused" is decided when?
> Do any, all, or some percentage of prefetched blocks need to be unused?

It starts out prefetching blocks on first file read. In later reads, if the desired block is in the cache, it continues read-ahead and may even scale up the read-ahead window. If the desired block is not in the cache, it starts to think it is reading ahead too much or random reads are being done, and scales back the read-ahead window.

It knows if the current read is sequential from the previous read, and can scale up the read-ahead window in those cases. So the case where a file is being read sequentially but in two sections at the same time should work fine, because the read-aheads are being used.

It's basically a feedback loop: a read that is sequential from a previous read makes the read-ahead go up, and a read that is not in the cache makes the read-ahead go down. At least that is how I am reading the BSDI code. I could send a bit of the code over, but it is pretty confusing.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Might I suggest a change in the name of this thread, or starting a new one? The subject problem was resolved some time ago and this is just bad publicity. It almost scared me off till I read that the user with the problem was now happy.

Bill Barnes

>===== Original Message From Bruce Momjian <pgman@candle.pha.pa.us> =====
>> The statement:
>>
>> > > BSD file systems turn off prefetch only if prefetched blocks remain
>> > > unused.
>>
>> is false for FreeBSD afaik.
>>
>> FreeBSD tracks how often sequential access happens and will retry it
>> as the file's access trend indicates. It can flip back and forth from
>> readahead and non-readahead.
>
>Yes, read my later posting describing the actual algorithm. Let me know
>if it is wrong.
> Might I suggest a change in the name of this thread, or starting a new one?
> The subject problem was resolved some time ago and this is just bad
> publicity. It almost scared me off till I read that the user with the
> problem was now happy.

Yeah, I still find it scary when I see it, and I started the thread.