Thread: Transaction progress
Hi, is there any way to know a transaction's progress state? I'm executing queries from psql and I have a query that has been running for 24 hours, and I don't know when it will end. Is there any way I could find out? If not, can I estimate the time from the EXPLAIN result? How?

This is the EXPLAIN result for the query I'm running:

Hash Join  (cost=100347487.59..145240934.03 rows=26 width=60)
  ->  Index Scan using inars_rectificaciones_inx on inars_rectificaciones b2
        (cost=0.00..37839140.62 rows=9546466 width=34)
  ->  Hash  (cost=100347379.07..100347379.07 rows=43407 width=26)
        ->  Seq Scan on inars_nocontrib_perm1
              (cost=100000000.00..100347379.07 rows=43407 width=26)

and this is the query:

update inars_nocontrib_perm1
   set rectificada=TRUE, actividad=b2.actividad, presentacion=b2.presentacion, remuneracion=b2.remuneracion
  from inars_rectificaciones b2
 where inars_nocontrib_perm1.cuil=b2.cuil
   and inars_nocontrib_perm1.cuit=b2.cuit
   and inars_nocontrib_perm1.ano=b2.ano
   and inars_nocontrib_perm1.mes=b2.mes;

Thanks in advance,
Pablo Yaggi
On Sun, 19 Jan 2003 14:53:49 -0300, Pablo Yaggi <pyaggi@aulamagna.com.ar> wrote:
>if not, can I estimate the time from the explain result ? how ?
>this is the explain result about the query I'm running:
>
>Hash Join  (cost=100347487.59..145240934.03 rows=26 width=60)
>  ->  Index Scan using inars_rectificaciones_inx on inars_rectificaciones b2
>        (cost=0.00..37839140.62 rows=9546466 width=34)
                                      ^^^^^^^       ^^
                                      (c)           (d)
>  ->  Hash  (cost=100347379.07..100347379.07 rows=43407 width=26)
>        ->  Seq Scan on inars_nocontrib_perm1
>              (cost=100000000.00..100347379.07 rows=43407 width=26)
                     ^^^^^^^^^^^^               ^^^^^
                     (a)                        (b)

The end result of explain, 145240934.03, basically tells us that you have SET enable_seqscan = OFF (a).

(b) looks a bit strange.  Does inars_nocontrib_perm1 have 300000 pages but only 43407 tuples?  If so, you should VACUUM more often.  Anyway this seq scan contributes only a small fraction to the overall cost.

(c) There are almost 10M rows in inars_rectificaciones, the width of which is at least 34 (d).  Counting for overhead and columns not used by your statement, let's guess 80 bytes per tuple (might be more).  This gives at least 800MB for the whole relation.  Assuming that your physical memory is much less than this and the index inars_rectificaciones_inx doesn't retrieve tuples in physical order, you need one random page read per tuple.  So your question boils down to how long it takes to do 10M random reads on your hardware.

I have to guess again: 4ms seek, 4ms latency.  Ignoring CPU and transfer times we get 8ms per page access, or 80000 seconds (more than 22 hours) for the query on average PC hardware.  This could be much more if your disk is fragmented or slow or both ...

Next time you run that query, SET enable_seqscan = ON; this should cut down execution time to way below one hour.  YMMV.

Servus
 Manfred
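(A minimal sketch of what that last suggestion looks like in psql, using the query from the thread; enable_seqscan is a per-session setting, so it can be tried without touching postgresql.conf:)

    SET enable_seqscan = ON;
    EXPLAIN
    UPDATE inars_nocontrib_perm1
       SET rectificada = TRUE,
           actividad = b2.actividad,
           presentacion = b2.presentacion,
           remuneracion = b2.remuneracion
      FROM inars_rectificaciones b2
     WHERE inars_nocontrib_perm1.cuil = b2.cuil
       AND inars_nocontrib_perm1.cuit = b2.cuit
       AND inars_nocontrib_perm1.ano  = b2.ano
       AND inars_nocontrib_perm1.mes  = b2.mes;
    -- With seq scans allowed, the index scan on inars_rectificaciones should be
    -- replaced by a sequential scan feeding a hash or merge join, which is what
    -- the estimate above assumes would be much cheaper.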
On Sunday 19 January 2003 07:08 pm, Manfred Koizar wrote:
> On Sun, 19 Jan 2003 14:53:49 -0300, Pablo Yaggi <pyaggi@aulamagna.com.ar> wrote:
> >if not, can I estimate the time from the explain result ? how ?
> >this is the explain result about the query I'm running:
> >
> >Hash Join  (cost=100347487.59..145240934.03 rows=26 width=60)
> >  ->  Index Scan using inars_rectificaciones_inx on inars_rectificaciones b2
> >        (cost=0.00..37839140.62 rows=9546466 width=34)
>                                       ^^^^^^^       ^^
>                                       (c)           (d)
> >  ->  Hash  (cost=100347379.07..100347379.07 rows=43407 width=26)
> >        ->  Seq Scan on inars_nocontrib_perm1
> >              (cost=100000000.00..100347379.07 rows=43407 width=26)
>                      ^^^^^^^^^^^^               ^^^^^
>                      (a)                        (b)
>
> The end result of explain, 145240934.03, basically tells us that you
> have SET enable_seqscan = OFF (a).

You're right, that's a configuration error. I fixed it, but the query is still running, so the new setting will only take effect after I restart the server. But anyway, the planner says it will use a Seq Scan either way, so isn't it the same whether enable_seqscan is on or not?

> (b) looks a bit strange.  Does inars_nocontrib_perm1 have 300000 pages
> but only 43407 tuples?  If so, you should VACUUM more often.  Anyway
> this seq scan contributes only a small fraction to the overall cost.

How did you realize that inars_nocontrib_perm1 is 300000 pages? I did a VACUUM before I executed the query that is still in progress, but the plan I posted last time was made before that. This is the output of the planner while the query is running:

Merge Join  (cost=209527960.44..247478144.43 rows=612 width=60)
  ->  Sort  (cost=209527960.44..209527960.44 rows=35037092 width=26)
        ->  Seq Scan on inars_nocontrib_perm1  (cost=100000000.00..100697315.92 rows=35037092 width=26)
  ->  Index Scan using inars_rectificaciones_inx on inars_rectificaciones b2  (cost=0.00..37838713.13 rows=9546358 width=34)

> (c) There are almost 10M rows in inars_rectificaciones, the width of
> which is at least 34 (d).  Counting for overhead and columns not used

34 (d)? Sorry, I didn't get it. These are the tables and the index:

create table inars_rectificaciones (
    cuil int8, cuit int8, ano int2, mes int2,
    presentacion date, actividad int2, remuneracion float
);

create table inars_contrib_perm1 (
    cuil int8, cuit int8, ano int2, mes int2,
    presentacion date, rectificada bool, actividad int2, remuneracion float
);

create index inars_rectificaciones_inx on inars_rectificaciones (ano,mes,cuil,cuit);

> by your statement, let's guess 80 bytes per tuple (might be more).
> This gives at least 800MB for the whole relation.  Assuming that your
> physical memory is much less than this and the index
> inars_rectificaciones_inx doesn't retrieve tuples in physical order,
> you need one random page read per tuple.  So your question boils down
> to how long it takes to do 10M random reads on your hardware.

My memory size is 512MB and the processor is a dual Pentium 1.4GHz.

> I have to guess again: 4ms seek, 4ms latency.  Ignoring CPU and
> transfer times we get 8ms per page access or 80000 seconds (more than
> 22 hours) for the query on average PC hardware.  This could be much
> more, if your disk is fragmented or slow or both ...

Thanks a lot,
Pablo
On Sun, 19 Jan 2003 20:22:45 -0300, Pablo Yaggi <pyaggi@aulamagna.com.ar> wrote:
>But anyway, the planner says it will use a Seq Scan either way,
>so isn't it the same whether enable_seqscan is on or not?

"Seq scan anyway" on inars_nocontrib_perm1, but there's still that index scan on inars_rectificaciones which will turn into a (much faster) seq scan, if you allow it.

>How did you realize that inars_nocontrib_perm1 is 300000 pages?

cost=100000000.00..100347379.07: 100000000.00 is a fake startup cost to talk the planner out of choosing the seq scan.  From costsize.c: "The cost of reading a page sequentially is 1.0, by definition."  347379.07 is the cost for reading all pages plus processing all tuples.  Given the low number of tuples (43407) the latter is not important, so I guessed 300K pages.

>Merge Join  (cost=209527960.44..247478144.43 rows=612 width=60)
>  ->  Sort  (cost=209527960.44..209527960.44 rows=35037092 width=26)
>        ->  Seq Scan on inars_nocontrib_perm1  (cost=100000000.00..100697315.92 rows=35037092 width=26)
>  ->  Index Scan using inars_rectificaciones_inx on inars_rectificaciones b2  (cost=0.00..37838713.13 rows=9546358 width=34)

I have absolutely no feeling for how long a sort of 35M rows might take.  Be sure to give it enough sort_mem; but OTOH not too much: the whole dataset doesn't fit into memory, so it has to be written to disk anyway.  Sort_mem (and shared_buffers) is discussed on -general, -admin and -performance every now and then.  Search the list archives.

>> (c) There are almost 10M rows in inars_rectificaciones, the width of
>> which is at least 34 (d).  Counting for overhead and columns not used
>34 (d)? Sorry, I didn't get it.

"34 (d)" means "34 bytes.  See note (d) above."

BTW, you join 35M rows to 10M rows and the planner expects to get only 612 rows.  Is this realistic?  If it is, can you change your application to store the "candidates" in a third table as soon as they are created?  Then you could get a plan like

Nested loop
  ->  Seq scan on candidates (rows=612)
  ->  Index scan on inars_rectificaciones_inx (rows=1 loops=612)
  ->  Index scan on inars_nocontrib_perm1_inx (rows=1 loops=612)

and an execution time of a few seconds.

Servus
 Manfred
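(A side note, not from the thread: the 300000-page guess can also be checked directly, since the planner's page and tuple estimates live in pg_class. A generic catalog query such as the following should show them; the numbers are only as fresh as the last VACUUM or ANALYZE:)

    SELECT relname, relpages, reltuples
      FROM pg_class
     WHERE relname IN ('inars_nocontrib_perm1', 'inars_rectificaciones');
    -- relpages  = the planner's page count for the table
    -- reltuples = the planner's row count estimate; both are refreshed by VACUUM/ANALYZE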
> BTW, you join 35M rows to 10M rows and the planner expects to get only
> 612 rows.  Is this realistic?  If it is, can you change your
> application to store the "candidates" in a third table as soon as they
> are created?  Then you could get a plan like

I expect the join to produce about 4M rows, so I think the planner is wrong. About the data: I already have it and I am preparing it for later processing, so I can't go back to any original source. I had also created an index inars_nocontrib_perm1_inx this way:

create index inars_nocontrib_perm1_inx on inars_nocontrib_perm1 (ano,mes,cuil,cuit);

but the planner didn't use it, as you can see. That's how I broke the config file before (enable_seqscan=false).

So this is my scenario. The query I need to run is this:

update inars_nocontrib_perm1
   set rectificada=TRUE, actividad=b2.actividad, presentacion=b2.presentacion, remuneracion=b2.remuneracion
  from inars_rectificaciones b2
 where inars_nocontrib_perm1.cuil=b2.cuil
   and inars_nocontrib_perm1.cuit=b2.cuit
   and inars_nocontrib_perm1.ano=b2.ano
   and inars_nocontrib_perm1.mes=b2.mes;

where:
  inars_nocontrib_perm1 is about 35M rows
  inars_rectificaciones is about 10M rows
  sort memory is 4096

Based on your experience/calculations, could you give me some advice? Do I have to increase sort memory? Do I have to change the query? ... well, something. The query has been running for about 28 hours; do I stop it and try something else? Is there any way to check how long, even estimated, it will take to finish?

Well, thanks a lot for your help. I hope I'm not abusing it.
Best Regards,
Pablo
Sorry, I checked and I noticed that the index on inars_nocontrib_perm1 doesn't exist. Do you think it's better if I stop the query and create this index?

Pablo

PS: I hope you read this one together with the last one. =)
On Sun, 19 Jan 2003 22:23:49 -0300, Pablo Yaggi <pyaggi@aulamagna.com.ar> wrote:
>I had also created an index inars_nocontrib_perm1_inx this way:
>
>create index inars_nocontrib_perm1_inx on inars_nocontrib_perm1 (ano,mes,cuil,cuit)

How long did this take?  Knowing this can help estimate sort cost.  As I told you, I have no gut feeling for large sorts; simply not enough experience ...

Does this index still exist?

>but the planner didn't use it, as you can see. That's how I broke the
>config file before (enable_seqscan=false).

If your tuples are physically ordered by ano, mes, cuil, cuit, then an index scan is almost as fast as a seq scan and there is no need for a separate sort step.  Unfortunately the planner has its problems with multi-column indices.  So if *you* know that tuples are stored in index order in both relations, this might indeed be a good case for setting enable_seqscan=off.

>Based on your experience/calculations, could you give me some advice?
>Do I have to increase sort memory?

Definitely!  I just don't know how much :-(
A shot in the dark: 60000 or even 120000, but don't leave it that high when you go multiuser.

>Do I have to change the query?

If the sort turns out to be the problem and it is unavoidable, I'd do several smaller updates:

UPDATE ... WHERE ano=2000::int2 AND mes=1::int2;
UPDATE ... WHERE ano=2000::int2 AND mes=2::int2;
...
UPDATE ... WHERE ano=2003::int2 AND mes=1::int2;

>... well, something. The query has been running for about 28 hours; do I stop it
>and try something else? Is there any way to check how long, even estimated, it will take to finish?

Not that I know of, except watching your disk files grow and trying to estimate how many tuples have already been updated ...

Servus
 Manfred
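(Putting those two suggestions together, a session might look roughly like the sketch below. This assumes sort_mem is set per session in kilobytes, as in PostgreSQL 7.x, and the ano/mes values are just placeholders for whatever ranges actually exist in the data:)

    SET sort_mem = 60000;      -- roughly 60MB per sort in this session; lower it again for multiuser work
    SET enable_seqscan = ON;

    UPDATE inars_nocontrib_perm1
       SET rectificada = TRUE, actividad = b2.actividad,
           presentacion = b2.presentacion, remuneracion = b2.remuneracion
      FROM inars_rectificaciones b2
     WHERE inars_nocontrib_perm1.cuil = b2.cuil
       AND inars_nocontrib_perm1.cuit = b2.cuit
       AND inars_nocontrib_perm1.ano  = b2.ano
       AND inars_nocontrib_perm1.mes  = b2.mes
       AND inars_nocontrib_perm1.ano  = 2000::int2   -- repeat the statement for each
       AND inars_nocontrib_perm1.mes  = 1::int2;     -- (ano, mes) pair in the data

Each UPDATE then only has to join and sort one month's worth of rows, and each one runs in its own transaction.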
Manfred,
Well, thanks a lot; you really helped me clarify many concepts about postgres EXPLAIN and setup parameters. About the query: it's finally done. It took about 40 hours and updated about 9.1M rows, so the planner missed that.
Once again, thanks a lot,
Best Regards,
Pablo

On Monday 20 January 2003 07:21 am, Manfred Koizar wrote:
> On Sun, 19 Jan 2003 22:23:49 -0300, Pablo Yaggi <pyaggi@aulamagna.com.ar> wrote:
> >I had also created an index inars_nocontrib_perm1_inx this way:
> >
> >create index inars_nocontrib_perm1_inx on inars_nocontrib_perm1 (ano,mes,cuil,cuit)
>
> How long did this take?  Knowing this can help estimate sort cost.
> As I told you, I have no gut feeling for large sorts; simply not
> enough experience ...
>
> Does this index still exist?
>
> >but the planner didn't use it, as you can see. That's how I broke the
> >config file before (enable_seqscan=false).
>
> If your tuples are physically ordered by ano, mes, cuil, cuit, then an
> index scan is almost as fast as a seq scan and there is no need for a
> separate sort step.  Unfortunately the planner has its problems with
> multi-column indices.  So if *you* know that tuples are stored in
> index order in both relations, this might indeed be a good case for
> setting enable_seqscan=off.
>
> >Based on your experience/calculations, could you give me some advice?
> >Do I have to increase sort memory?
>
> Definitely!  I just don't know how much :-(
> A shot in the dark: 60000 or even 120000, but don't leave it that
> high when you go multiuser.
>
> >Do I have to change the query?
>
> If the sort turns out to be the problem and it is unavoidable, I'd do
> several smaller updates:
>
> UPDATE ... WHERE ano=2000::int2 AND mes=1::int2;
> UPDATE ... WHERE ano=2000::int2 AND mes=2::int2;
> ...
> UPDATE ... WHERE ano=2003::int2 AND mes=1::int2;
>
> >... well, something. The query has been running for about 28 hours; do I stop it
> >and try something else? Is there any way to check how long, even
> >estimated, it will take to finish?
>
> Not that I know of, except watching your disk files grow and trying to
> estimate how many tuples have already been updated ...
>
> Servus
>  Manfred