Re: Fwd: Tricky join and update with same table - Mailing list pgsql-novice

From Michael Swierczek
Subject Re: Fwd: Tricky join and update with same table
Date
Msg-id CAHp1f1MrE0eZ_aHRRKTauJ-7gJ8bkPLzkbRCKtSjz1uk43qo1g@mail.gmail.com
Whole thread Raw
In response to Re: Fwd: Tricky join and update with same table  (James David Smith <james.david.smith@gmail.com>)
List pgsql-novice
On Fri, Jul 26, 2013 at 10:36 AM, James David Smith
<james.david.smith@gmail.com> wrote:
> Hi Mike,
>
> About half an hour before you replied I actually managed to come up with
> that myself! :-)
>
> Using it in an update query is troubling me though. I've done it a few times
> now and it keeps updating the wrong fields.
>
> Luca sent me an email a short while ago with some code to try however, and
> I'm doing that at the moment. Though it's been running for about 30 minutes
> now and isn't done. Though there are 230,000 rows, so perhaps that isn't
> that surprising after all. Here it is for posterity:
>
> PDATE hlhs_day hd
> SET point_geom = ( SELECT hd2.point_geom
>                       FROM hlhs_day hd2
>                       WHERE hd.spid = hd2.spid
>                       AND hd2.point_geom IS NOT NULL
>                       AND hd2.point_time = ( SELECT min( hd3.point_time )
>                                            FROM hlhs_day hd3
>                                            WHERE hd.spid = hd3.spid
>                                            AND hd3.point_geom IS NOT NULL
>                                            AND hd3.point_time >
> hd.point_time
>                                            )
>                       )
> WHERE hd.point_geom IS null
>
> Thanks
>
> James
>
>


This is a bit of a kluge solution, but if it were me I would take the
query I suggested (and you figured out on your own), put the results
in a text file, and then use Excel or some regular expressions in
Vim/Emacs/Notepad++/whatever to translate each line from
spidx,point_timex,point_geomx
to
update hlhs_day set point_geom = point_geomx where spid = spidx and
point_time  = point_timex;
Then run it through psql or pgAdmin3.

I believe that would probably run much more quickly, and you can break
it up into batches.

Good luck either way.
-Mike


>
>
>
>
> On 26 July 2013 14:37, Michael Swierczek <mike.swierczek@gmail.com> wrote:
>>
>> On Fri, Jul 26, 2013 at 5:24 AM, James David Smith
>> <james.david.smith@gmail.com> wrote:
>> > Hi all,
>> >
>> > Some help if you can please. I have GPS data for a number of people.
>> > Each
>> > person turned the GPS on, at home, at a different time of the day.
>> > Before
>> > that time, I presume that the person was in their house. I have made a
>> > table
>> > for each person for a whole 24 hours (one record per minute), and I now
>> > want
>> > to 'fill in' their location for the rows before they turned the GPS on.
>> > So
>> > for each person I want to take the first row where the point_geom is not
>> > null, and update all of the rows above it with that value. It's driving
>> > me
>> > nuts.
>> >
>> > spid     |     point_time                 |     point_geom
>> > -----------------------------------------------------------
>> > 1         |     2012-01-01 00:01:00   |
>> > 1         |     2012-01-01 00:02:00   |
>> > 1         |     2012-01-01 00:03:00   |     POINT(X, Y)
>> > 1         |     2012-01-01 00:04:00   |     POINT(X, Y)
>> > 1         |     2012-01-01 00:05:00   |     POINT(X, Y)
>> > 2         |     2012-01-01 00:01:00   |
>> > 2         |     2012-01-01 00:02:00   |
>> > 2         |     2012-01-01 00:03:00   |
>> > 2         |     2012-01-01 00:04:00   |
>> > 2         |     2012-01-01 00:05:00   |     POINT(X, Y)
>> > 3         |     2012-01-01 00:01:00   |
>> > 3         |     2012-01-01 00:02:00   |     POINT(X, Y)
>> > 3         |     2012-01-01 00:03:00   |     POINT(X, Y)
>> > 3         |     2012-01-01 00:04:00   |     POINT(X, Y)
>> > 3         |     2012-01-01 00:05:00   |     POINT(X, Y)
>> >
>> > I've managed to select the correct row using this:
>> >
>> > SELECT spid, min(point_time) as point_time
>> > FROM hlhs_day
>> > WHERE point_geom IS NOT NULL
>> > GROUP BY spid;
>> >
>> > However when I try to add in the column 'point_geom' to the query, it
>> > won't
>> > work.
>> >
>> > Thanks for your help and suggestions.
>> >
>> > James
>>
>> Usually when I chime in on questions like this, someone comes along
>> with a better solution after I finish.  But I think you can get the
>> information you want by "wrapping" that query:
>>
>> SELECT hd1.spid, hd1.point_time, hd2.point_geom
>> FROM
>> (SELECT spid, min(point_time) as point_time
>> FROM hlhs_day
>> WHERE point_geom IS NOT NULL GROUP BY spid) as hd1
>> INNER JOIN hlhs_day hd2 ON hd1.spid = hd2.spid AND hd1.point_time =
>> hd2.point_time
>> ORDER BY hd1.spid;
>>
>> I hope this helps.
>> --Mike
>
>


pgsql-novice by date:

Previous
From: David Johnston
Date:
Subject: Re: Fwd: Tricky join and update with same table
Next
From: Luca Ferrari
Date:
Subject: Re: Fwd: Tricky join and update with same table