[HACKERS] Highly Variable Planning Times - Mailing list pgsql-hackers
From | Michael Malis |
---|---|
Subject | [HACKERS] Highly Variable Planning Times |
Date | |
Msg-id | CAFQtOwqWdDFZ-1Xv2=b43=ScNjsi5nOQ7MQyEz6TEXW1cf=3Pg@mail.gmail.com Whole thread Raw |
Responses |
Re: [HACKERS] Highly Variable Planning Times
(Andres Freund <andres@anarazel.de>)
Re: [HACKERS] Highly Variable Planning Times (Alvaro Herrera <alvherre@2ndquadrant.com>) Re: [HACKERS] Highly Variable Planning Times (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Hi, I've been encountering highly variable planning time on PG 9.5.4. Running `EXPLAIN SELECT * FROM events_1171738` will take anywhere from 200ms to 4s. This likely has to do with the table having 1300 partial indexes on it (for reasons elaborated on in https://blog.heapanalytics.com/running-10-million-postgresql-indexes-in-production/). I'm trying to figure out if there is something I can do to eliminate the slow planning from happening. When I used `strace -c` on the backend process, I found that the number of `semop` system calls is much higher for the slower queries than the faster ones. After digging in a bit more I found that when the query was slow, there would be a single semaphore which would make up most of the `semop` calls. Here is a snippet of the output when I ran `strace -r` on a query with high planning time: 0.000225 open("base/16384/13236647", O_RDWR) = 219 0.000110 lseek(219, 0, SEEK_END) = 16384 0.000122 semop(1736711, [{9, -1, 0}], 1) = 0 0.004110 semop(1769480, [{10, 1, 0}], 1) = 0 0.000197 semop(1736711, [{9, -1, 0}], 1) = 0 0.010096 semop(1540097, [{1, 1, 0}], 1) = 0 0.000176 semop(1736711, [{9, -1, 0}], 1) = 0 0.004069 semop(1638404, [{15, 1, 0}], 1) = 0 0.000244 open("base/16384/13236648", O_RDWR) = 220 0.000131 lseek(220, 0, SEEK_END) = 16384 0.000212 semop(1736711, [{9, -1, 0}], 1) = 0 0.007851 semop(1736711, [{8, 1, 0}], 1) = 0 0.000130 semop(1736711, [{9, -1, 0}], 1) = 0 0.000101 semop(1769480, [{5, 1, 0}], 1) = 0 0.000450 open("base/16384/13236653", O_RDWR) = 221 0.000151 lseek(221, 0, SEEK_END) = 180224 0.000141 semop(1736711, [{9, -1, 0}], 1) = 0 0.000239 semop(1736711, [{9, -1, 0}], 1) = 0 0.000419 semop(1736711, [{10, 1, 0}], 1) = 0 0.000127 semop(1736711, [{9, -1, 0}], 1) = 0 0.001493 semop(1703942, [{3, 1, 0}], 1) = 0 You can see that most of the semop calls are waits on semaphore #9 in semaphore set 1736711. I looked up the relation for the files being opened and they all seem to be indexes on the table. Based on some flame graphs I generated, I believe most of the time is spent inside of the function get_relation_info. I think the open/lseek calls are coming from the estimate_rel_size call in get_relation_info, but I haven't been able to figure out where the semop calls are coming from. It seems that the semop calls are at least a symptom of the high planning time. I'm looking for help in finding the root cause of the high planning time and for figuring out a way for eliminating the high planning time. Thanks, Michael
pgsql-hackers by date: