Thread: large numbers of inserts out of memory strategy
I'm writing a migration utility to move data from a non-rdbms data source to a postgres db. Currently I'm generating SQL INSERT statements involving 6 related tables for each 'thing'. With 100k or more 'things' to migrate I'm generating a lot of statements, and when I try to import using psql, postgres fails with 'out of memory' when running on a Linux VM with 4G of memory. If I break it into smaller chunks, say ~50K statements, then the import succeeds. I can change my migration utility to generate multiple files, each with a limited number of INSERTs, to get around this issue, but maybe there's another/better way? Ted
> On Nov 28, 2017, at 10:17 AM, Ted Toth <txtoth@gmail.com> wrote: > > I'm writing a migration utility to move data from non-rdbms data > source to a postgres db. Currently I'm generating SQL INSERT > statements involving 6 related tables for each 'thing'. With 100k or > more 'things' to migrate I'm generating a lot of statements and when I > try to import using psql postgres fails with 'out of memory' when > running on a Linux VM with 4G of memory. If I break into smaller > chunks say ~50K statements then thde import succeeds. I can change my > migration utility to generate multiple files each with a limited > number of INSERTs to get around this issue but maybe there's > another/better way? > > Ted > what tools / languages are you using?
Hi, On 11/28/2017 06:17 PM, Ted Toth wrote: > I'm writing a migration utility to move data from non-rdbms data > source to a postgres db. Currently I'm generating SQL INSERT > statements involving 6 related tables for each 'thing'. With 100k or > more 'things' to migrate I'm generating a lot of statements and when I > try to import using psql postgres fails with 'out of memory' when > running on a Linux VM with 4G of memory. If I break into smaller > chunks say ~50K statements then thde import succeeds. I can change my > migration utility to generate multiple files each with a limited > number of INSERTs to get around this issue but maybe there's > another/better way? > The question is what exactly runs out of memory, and how did you modify the configuration (particularly related to memory). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 28, 2017 at 11:19 AM, Rob Sargent <robjsargent@gmail.com> wrote: > >> On Nov 28, 2017, at 10:17 AM, Ted Toth <txtoth@gmail.com> wrote: >> >> I'm writing a migration utility to move data from non-rdbms data >> source to a postgres db. Currently I'm generating SQL INSERT >> statements involving 6 related tables for each 'thing'. With 100k or >> more 'things' to migrate I'm generating a lot of statements and when I >> try to import using psql postgres fails with 'out of memory' when >> running on a Linux VM with 4G of memory. If I break into smaller >> chunks say ~50K statements then thde import succeeds. I can change my >> migration utility to generate multiple files each with a limited >> number of INSERTs to get around this issue but maybe there's >> another/better way? >> >> Ted >> > what tools / languages ate you using? I'm using python to read binary source files and create the text files containing the SQL. Then I'm running psql -f <file containing SQL>.
On Tue, Nov 28, 2017 at 11:22 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Hi, > > On 11/28/2017 06:17 PM, Ted Toth wrote: >> I'm writing a migration utility to move data from non-rdbms data >> source to a postgres db. Currently I'm generating SQL INSERT >> statements involving 6 related tables for each 'thing'. With 100k or >> more 'things' to migrate I'm generating a lot of statements and when I >> try to import using psql postgres fails with 'out of memory' when >> running on a Linux VM with 4G of memory. If I break into smaller >> chunks say ~50K statements then thde import succeeds. I can change my >> migration utility to generate multiple files each with a limited >> number of INSERTs to get around this issue but maybe there's >> another/better way? >> > > The question is what exactly runs out of memory, and how did you modify > the configuration (particularly related to memory). > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services I'm pretty new to postgres so I haven't changed any configuration setting and the log is a bit hard for me to make sense of :(
On 11/28/2017 10:50 AM, Ted Toth wrote: > On Tue, Nov 28, 2017 at 11:19 AM, Rob Sargent <robjsargent@gmail.com> wrote: >>> On Nov 28, 2017, at 10:17 AM, Ted Toth <txtoth@gmail.com> wrote: >>> >>> I'm writing a migration utility to move data from non-rdbms data >>> source to a postgres db. Currently I'm generating SQL INSERT >>> statements involving 6 related tables for each 'thing'. With 100k or >>> more 'things' to migrate I'm generating a lot of statements and when I >>> try to import using psql postgres fails with 'out of memory' when >>> running on a Linux VM with 4G of memory. If I break into smaller >>> chunks say ~50K statements then thde import succeeds. I can change my >>> migration utility to generate multiple files each with a limited >>> number of INSERTs to get around this issue but maybe there's >>> another/better way? >>> >>> Ted >>> >> what tools / languages ate you using? > I'm using python to read binary source files and create the text files > contains the SQL. Them I'm running psql -f <file containing SQL>. If you're going out to the file system, I would use COPY of csv files (if number of records per table is non-trivial). Any bulk loading python available?
On 11/28/2017 06:54 PM, Ted Toth wrote: > On Tue, Nov 28, 2017 at 11:22 AM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> Hi, >> >> On 11/28/2017 06:17 PM, Ted Toth wrote: >>> I'm writing a migration utility to move data from non-rdbms data >>> source to a postgres db. Currently I'm generating SQL INSERT >>> statements involving 6 related tables for each 'thing'. With 100k or >>> more 'things' to migrate I'm generating a lot of statements and when I >>> try to import using psql postgres fails with 'out of memory' when >>> running on a Linux VM with 4G of memory. If I break into smaller >>> chunks say ~50K statements then thde import succeeds. I can change my >>> migration utility to generate multiple files each with a limited >>> number of INSERTs to get around this issue but maybe there's >>> another/better way? >>> >> >> The question is what exactly runs out of memory, and how did you modify >> the configuration (particularly related to memory). >> >> regards >> >> -- >> Tomas Vondra http://www.2ndQuadrant.com >> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > I'm pretty new to postgres so I haven't changed any configuration > setting and the log is a bit hard for me to make sense of :( > The most interesting part of the log is this: SPI Proc: 2464408024 total in 279 blocks; 1672 free (1 chunks); 2464406352 used PL/pgSQL function context: 537911352 total in 74 blocks; 2387536 free (4 chunks); 535523816 used That is, most of the memory is allocated for SPI (2.4GB) and PL/pgSQL procedure (500MB). How do you do the load? What libraries/drivers? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, 28 Nov 2017 11:17:07 -0600 Ted Toth <txtoth@gmail.com> wrote: > I'm writing a migration utility to move data from non-rdbms data > source to a postgres db. Currently I'm generating SQL INSERT > statements involving 6 related tables for each 'thing'. With 100k or > more 'things' to migrate I'm generating a lot of statements and when I > try to import using psql postgres fails with 'out of memory' when > running on a Linux VM with 4G of memory. If I break into smaller > chunks say ~50K statements then thde import succeeds. I can change my > migration utility to generate multiple files each with a limited > number of INSERTs to get around this issue but maybe there's > another/better way? Chunking the loads has a number of advantages other than avoiding OOM errors, not the least of which are possibly parallel loading and being able to restart after other failures without re-processing all of the input data. Note that 4GiB of core is not all that much by today's standards. You might want to run top/vmstat and ask if the PG daemons are using all/most of the available memory. If PG is sucking up all of the core as it is then tuning the database may not have much of an effect; if there is lots of spare memory then it'll be worth looking at ways to tune PG. Note also that "out of memory" frequently means virtual memory. Q: Does the VM have swap configured? If not then add 8GiB and see if that solves your problem; if so then how much swap is in use when you get the OOM error? -- Steven Lembark 1505 National Ave Workhorse Computing Rockford, IL 61103 lembark@wrkhors.com +1 888 359 3508
On Tue, Nov 28, 2017 at 12:04 PM, Steven Lembark <lembark@wrkhors.com> wrote: > On Tue, 28 Nov 2017 11:17:07 -0600 > Ted Toth <txtoth@gmail.com> wrote: > >> I'm writing a migration utility to move data from non-rdbms data >> source to a postgres db. Currently I'm generating SQL INSERT >> statements involving 6 related tables for each 'thing'. With 100k or >> more 'things' to migrate I'm generating a lot of statements and when I >> try to import using psql postgres fails with 'out of memory' when >> running on a Linux VM with 4G of memory. If I break into smaller >> chunks say ~50K statements then thde import succeeds. I can change my >> migration utility to generate multiple files each with a limited >> number of INSERTs to get around this issue but maybe there's >> another/better way? > > Chunking the loads has a number of advantages other than avoiding > OOM errors, not the least of which are possibly parallel loading > and being able to restart after other failures without re-processing > all of the input data. > > Note that 4GiB of core is not all that much by today's standards. > You might want to run top/vmstat and ask if the PG daemons are using > all/most of the available memory. If PG is sucking up all of the core > as it is then tuning the database may not have much of an effect; if > there is lots of spare memory then it'll be worth looking at ways to > tune PG. > > Note also that "out of memory" frequently means virutal memory. > > Q: Does the VM have swap configured? > > If not then add 8GiB and see if that solves your problem; if so then > how much swap is in use when you get the OOM error? > > > -- > Steven Lembark 1505 National Ave > Workhorse Computing Rockford, IL 61103 > lembark@wrkhors.com +1 888 359 3508 > I understand that 4G is not much ... Yeah in top I see the postmaster process RES grow until it fails. The VM is basically a Centos 6 box with 4G of swap.
On Tue, Nov 28, 2017 at 12:01 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > On 11/28/2017 06:54 PM, Ted Toth wrote: >> On Tue, Nov 28, 2017 at 11:22 AM, Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >>> Hi, >>> >>> On 11/28/2017 06:17 PM, Ted Toth wrote: >>>> I'm writing a migration utility to move data from non-rdbms data >>>> source to a postgres db. Currently I'm generating SQL INSERT >>>> statements involving 6 related tables for each 'thing'. With 100k or >>>> more 'things' to migrate I'm generating a lot of statements and when I >>>> try to import using psql postgres fails with 'out of memory' when >>>> running on a Linux VM with 4G of memory. If I break into smaller >>>> chunks say ~50K statements then thde import succeeds. I can change my >>>> migration utility to generate multiple files each with a limited >>>> number of INSERTs to get around this issue but maybe there's >>>> another/better way? >>>> >>> >>> The question is what exactly runs out of memory, and how did you modify >>> the configuration (particularly related to memory). >>> >>> regards >>> >>> -- >>> Tomas Vondra http://www.2ndQuadrant.com >>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >> >> I'm pretty new to postgres so I haven't changed any configuration >> setting and the log is a bit hard for me to make sense of :( >> > > The most interesting part of the log is this: > > SPI Proc: 2464408024 total in 279 blocks; 1672 free (1 chunks); > 2464406352 used > PL/pgSQL function context: 537911352 total in 74 blocks; 2387536 > free (4 chunks); 535523816 used > > > That is, most of the memory is allocated for SPI (2.4GB) and PL/pgSQL > procedure (500MB). How do you do the load? What libraries/drivers? > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services I'm doing the load with 'psql -f'. I'm using 9.6 el6 rpms on a Centos VM I downloaded from the postgres repo.
On 11/28/2017 07:26 PM, Ted Toth wrote: > On Tue, Nov 28, 2017 at 12:01 PM, Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> >> ... >> >> That is, most of the memory is allocated for SPI (2.4GB) and PL/pgSQL >> procedure (500MB). How do you do the load? What libraries/drivers? >> > > I'm doing the load with 'psql -f'. I using 9.6 el6 rpms on a Centos VM > I downloaded from the postgres repo. > So what does the script actually do? Because psql certainly is not running pl/pgsql procedures on it's own. We need to understand why you're getting OOM in the first place - just inserts alone should not cause failures like that. Please show us more detailed explanation of what the load actually does, so that we can try reproducing it. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 28, 2017 at 12:38 PM, Tomas Vondra wrote:
> So what does the script actually do? Because psql certainly is not
> running pl/pgsql procedures on it's own. We need to understand why
> you're getting OOM in the first place - just inserts alone should not
> cause failures like that. Please show us more detailed explanation of
> what the load actually does, so that we can try reproducing it.
Perhaps the script is one giant insert statement?
—Brian
Brian Crowell <brian@fluggo.com> writes: > On Tue, Nov 28, 2017 at 12:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> So what does the script actually do? Because psql certainly is not >> running pl/pgsql procedures on it's own. We need to understand why >> you're getting OOM in the first place - just inserts alone should not >> cause failures like that. Please show us more detailed explanation of >> what the load actually does, so that we can try reproducing it. > Perhaps the script is one giant insert statement? It's pretty clear from the memory map that the big space consumption is inside a single invocation of a plpgsql function: SPI Proc: 2464408024 total in 279 blocks; 1672 free (1 chunks); 2464406352 used PL/pgSQL function context: 537911352 total in 74 blocks; 2387536 free (4 chunks); 535523816 used So whatever's going on here, there's more to it than a giant client-issued INSERT (or COPY), or for that matter a large number of small ones. What would seem to be required is a many-megabyte-sized plpgsql function body or DO block. Actually, the truly weird thing about that map is that the "PL/pgSQL function context" seems to be a child of a "SPI Proc" context, whereas it's entirely clear from the code that it ought to be a direct child of TopMemoryContext. I have no idea how this state of affairs came to be, and am interested to find out. regards, tom lane
On Tue, Nov 28, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Brian Crowell <brian@fluggo.com> writes: >> On Tue, Nov 28, 2017 at 12:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com >>> wrote: >>> So what does the script actually do? Because psql certainly is not >>> running pl/pgsql procedures on it's own. We need to understand why >>> you're getting OOM in the first place - just inserts alone should not >>> cause failures like that. Please show us more detailed explanation of >>> what the load actually does, so that we can try reproducing it. > >> Perhaps the script is one giant insert statement? > > It's pretty clear from the memory map that the big space consumption > is inside a single invocation of a plpgsql function: > > SPI Proc: 2464408024 total in 279 blocks; 1672 free (1 chunks); 2464406352 used > PL/pgSQL function context: 537911352 total in 74 blocks; 2387536 free (4 chunks); 535523816 used > > So whatever's going on here, there's more to it than a giant client-issued > INSERT (or COPY), or for that matter a large number of small ones. What > would seem to be required is a many-megabyte-sized plpgsql function body > or DO block. > > Actually, the truly weird thing about that map is that the "PL/pgSQL > function context" seems to be a child of a "SPI Proc" context, whereas > it's entirely clear from the code that it ought to be a direct child of > TopMemoryContext. I have no idea how this state of affairs came to be, > and am interested to find out. > > regards, tom lane Yes I did generate 1 large DO block: DO $$ DECLARE thingid bigint; thingrec bigint; thingdataid bigint; BEGIN INSERT INTO thing (ltn,classification,machine,source,thgrec,flags,serial,type) VALUES ('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec INTO thingid,thingrec; INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid); INSERT INTO thingstatus (thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs) VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0); INSERT INTO thinger (thing_id,spe_key,cse_unc,lat_spd,cov,dtg,lng,spd,ave_spd,cse,tol,nrpts,lat,alpha,sigma,spd_unc,lng_spd) VALUES (thingid,-1,0.0,-6.58197336634e-08,'{4.27624291532e-09,0.0,3.07802916488e-09,0.0,4.27624291532e-09,0.0,3.07802916488e-09,4.16110417234e-08,0.0,4.16110417234e-08}',1509459163,2.21596980095,0.000226273215958,1.0,0.0,0.10000000149,121,0.584555745125,10.0,4.23079740131e-08,0.0,-2.49999881907e-10); INSERT INTO thingdata (thing_id,category,db_num,xref,org_type,trademark,shortname,fcode,platform,callsign,type,orig_xref,shipclass,home_base,uic,service,di,lngfixed,hull,precision,alert,flag,besufx,name,mmsi,catcode,ntds,imo,pn_num,chxref,threat,sconum,latfixed,db_type,pif,echelon,jtn,quantity,overwrite) VALUES (thingid,'XXX','','','','','004403704','23','','','','','UNEQUATED','','','','',0.0,'','{0,0,0,0,0}','','KS','','UNKNOWN','004403704','','','','','','AFD','',0.0,3,'','',0,0,0) RETURNING id INTO thingdataid; INSERT INTO thingnum (thingdata_id,thgnum,state,dtg,cmd) VALUES (thingdataid,'013086',0,1502970401,'FOO'); <repeated for each thing> END $$; Should I limit the number of 'thing' inserts within a DO block or wrapping each 'thing' insert in it's own DO block?
Ted Toth <txtoth@gmail.com> writes: > On Tue, Nov 28, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So whatever's going on here, there's more to it than a giant client-issued >> INSERT (or COPY), or for that matter a large number of small ones. What >> would seem to be required is a many-megabyte-sized plpgsql function body >> or DO block. > Yes I did generate 1 large DO block: Apparently by "large" you mean "hundreds of megabytes". Don't do that, at least not on a machine that hasn't got hundreds of megabytes to spare. The entire thing has to be sucked into memory and parsed before anything will happen. regards, tom lane
> > what tools / languages ate you using? > > I'm using python to read binary source files and create the text files > contains the SQL. Them I'm running psql -f <file containing SQL>. Then chunking the input should be trivial. There are a variety of techniques you can use to do things like disabling indexes during loading, etc. Maybe load them into temp tables and then insert from the temp tables into the destination tables. The point is to amortize the memory load over the entire load period. -- Steven Lembark 1505 National Ave Workhorse Computing Rockford, IL 61103 lembark@wrkhors.com +1 888 359 3508
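For illustration, a minimal psycopg2 sketch of the temp-table staging idea; the table, column, and file names here are placeholders rather than the real schema:

import psycopg2

conn = psycopg2.connect("dbname=target")
with conn:
    with conn.cursor() as cur:
        # stage the rows in an unindexed temp table first...
        cur.execute("CREATE TEMP TABLE thing_stage (LIKE thing INCLUDING DEFAULTS)")
        with open("things.tsv") as f:
            # copy_from reads tab-separated columns by default
            cur.copy_from(f, "thing_stage", columns=("ltn", "classification", "thgrec"))
        # ...then move them into the destination table in one set-based statement
        cur.execute(
            "INSERT INTO thing (ltn, classification, thgrec) "
            "SELECT ltn, classification, thgrec FROM thing_stage"
        )

The with-block on the connection commits the whole staged batch as a single transaction, so each chunk is flushed before the next one is read.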
> I'm pretty new to postgres so I haven't changed any configuration > setting and the log is a bit hard for me to make sense of :( Diving into the shark tank is a helluva way to learn how to swim :-) Are you interested in finding docs on how to deal with the tuning? -- Steven Lembark 1505 National Ave Workhorse Computing Rockford, IL 61103 lembark@wrkhors.com +1 888 359 3508
On 28/11/17, Rob Sargent (robjsargent@gmail.com) wrote: > > On 11/28/2017 10:50 AM, Ted Toth wrote: > > On Tue, Nov 28, 2017 at 11:19 AM, Rob Sargent <robjsargent@gmail.com> wrote: > > > > On Nov 28, 2017, at 10:17 AM, Ted Toth <txtoth@gmail.com> wrote: > > > > > > > > I'm writing a migration utility to move data from non-rdbms data > > > > source to a postgres db. Currently I'm generating SQL INSERT > > > > statements involving 6 related tables for each 'thing'. With 100k or > > > > more 'things' to migrate I'm generating a lot of statements and when I > > > > try to import using psql postgres fails with 'out of memory' when > > > > running on a Linux VM with 4G of memory. If I break into smaller > > > > chunks say ~50K statements then thde import succeeds. I can change my > > > > migration utility to generate multiple files each with a limited > > > > number of INSERTs to get around this issue but maybe there's > > > > another/better way? > > > what tools / languages ate you using? > > I'm using python to read binary source files and create the text files > > contains the SQL. Them I'm running psql -f <file containing SQL>. > If you're going out to the file system, I would use COPY of csv files (if > number of records per table is non-trivial). Any bulk loading python > available? psycopg2 has a copy_from function and (possibly pertinent in this case) a copy_expert function which allows the read buffer size to be specified.
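For reference, a minimal sketch of both calls (the file and table/column names here are placeholders); copy_expert takes a full COPY statement plus an optional read buffer size:

import psycopg2

conn = psycopg2.connect("dbname=target")
cur = conn.cursor()

# copy_from: tab-separated input straight into a table
with open("recnum_thing.tsv") as f:
    cur.copy_from(f, "recnum_thing", columns=("recnum", "thing_id"))

# copy_expert: any COPY syntax, plus control over the read buffer size
with open("thingdata.csv") as f:
    cur.copy_expert(
        "COPY thingdata (thing_id, category, shortname) FROM STDIN WITH (FORMAT csv)",
        f,
        size=65536,
    )

conn.commit()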
On 2017-11-29 08:32:02 -0600, Ted Toth wrote: > Yes I did generate 1 large DO block: > > DO $$ > DECLARE thingid bigint; thingrec bigint; thingdataid bigint; > BEGIN > INSERT INTO thing > (ltn,classification,machine,source,thgrec,flags,serial,type) VALUES > ('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec > INTO thingid,thingrec; > INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid); > INSERT INTO thingstatus > (thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs) > VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0); > INSERT INTO thinger > (thing_id,spe_key,cse_unc,lat_spd,cov,dtg,lng,spd,ave_spd,cse,tol,nrpts,lat,alpha,sigma,spd_unc,lng_spd) > VALUES (thingid,-1,0.0,-6.58197336634e-08,'{4.27624291532e-09,0.0,3.07802916488e-09,0.0,4.27624291532e-09,0.0,3.07802916488e-09,4.16110417234e-08,0.0,4.16110417234e-08}',1509459163,2.21596980095,0.000226273215958,1.0,0.0,0.10000000149,121,0.584555745125,10.0,4.23079740131e-08,0.0,-2.49999881907e-10); > INSERT INTO thingdata > (thing_id,category,db_num,xref,org_type,trademark,shortname,fcode,platform,callsign,type,orig_xref,shipclass,home_base,uic,service,di,lngfixed,hull,precision,alert,flag,besufx,name,mmsi,catcode,ntds,imo,pn_num,chxref,threat,sconum,latfixed,db_type,pif,echelon,jtn,quantity,overwrite) > VALUES (thingid,'XXX','','','','','004403704','23','','','','','UNEQUATED','','','','',0.0,'','{0,0,0,0,0}','','KS','','UNKNOWN','004403704','','','','','','AFD','',0.0,3,'','',0,0,0) > RETURNING id INTO thingdataid; > INSERT INTO thingnum (thingdata_id,thgnum,state,dtg,cmd) VALUES > (thingdataid,'013086',0,1502970401,'FOO'); > > <repeated for each thing> > > END $$; > > Should I limit the number of 'thing' inserts within a DO block or > wrapping each 'thing' insert in it's own DO block? I would suggest getting rid of the do block entirely if that is possible. Just create lots of insert statements. You can get the current value of a sequence with currval('sequence_name'). Alternately or in addition, since you are using python, you might want to insert directly into the database from python using psycopg2. For separate insert statements that should have about the same performance. (It is usually much faster to write to a csv file and load that with copy than to insert each row, but you don't do that and it might be difficult in your case). hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
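As a rough sketch of the second suggestion (inserting directly from the Python utility with psycopg2): read_things() below stands in for the existing binary-file reader and the column lists are abbreviated, so this is an outline rather than the actual migration code.

import psycopg2

conn = psycopg2.connect("dbname=target")
cur = conn.cursor()

for n, thing in enumerate(read_things(), start=1):
    cur.execute(
        "INSERT INTO thing (ltn, classification, thgrec) "
        "VALUES (%s, %s, %s) RETURNING id",
        (thing["ltn"], "THING", thing["thgrec"]),
    )
    thing_id = cur.fetchone()[0]   # plays the role of RETURNING ... INTO
    cur.execute(
        "INSERT INTO recnum_thing (recnum, thing_id) VALUES (%s, %s)",
        (thing["thgrec"], thing_id),
    )
    # ...remaining per-'thing' inserts go here, reusing thing_id...
    if n % 10000 == 0:
        conn.commit()              # commit in chunks rather than one huge transaction

conn.commit()

Each statement is parsed and executed on its own, so nothing the size of the whole dataset ever has to be held in server memory, and the foreign keys can still be wired up row by row.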
On Thu, Nov 30, 2017 at 4:22 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > On 2017-11-29 08:32:02 -0600, Ted Toth wrote: >> Yes I did generate 1 large DO block: >> >> DO $$ >> DECLARE thingid bigint; thingrec bigint; thingdataid bigint; >> BEGIN >> INSERT INTO thing >> (ltn,classification,machine,source,thgrec,flags,serial,type) VALUES >> ('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec >> INTO thingid,thingrec; >> INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid); >> INSERT INTO thingstatus >> (thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs) >> VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0); >> INSERT INTO thinger >> (thing_id,spe_key,cse_unc,lat_spd,cov,dtg,lng,spd,ave_spd,cse,tol,nrpts,lat,alpha,sigma,spd_unc,lng_spd) >> VALUES (thingid,-1,0.0,-6.58197336634e-08,'{4.27624291532e-09,0.0,3.07802916488e-09,0.0,4.27624291532e-09,0.0,3.07802916488e-09,4.16110417234e-08,0.0,4.16110417234e-08}',1509459163,2.21596980095,0.000226273215958,1.0,0.0,0.10000000149,121,0.584555745125,10.0,4.23079740131e-08,0.0,-2.49999881907e-10); >> INSERT INTO thingdata >> (thing_id,category,db_num,xref,org_type,trademark,shortname,fcode,platform,callsign,type,orig_xref,shipclass,home_base,uic,service,di,lngfixed,hull,precision,alert,flag,besufx,name,mmsi,catcode,ntds,imo,pn_num,chxref,threat,sconum,latfixed,db_type,pif,echelon,jtn,quantity,overwrite) >> VALUES (thingid,'XXX','','','','','004403704','23','','','','','UNEQUATED','','','','',0.0,'','{0,0,0,0,0}','','KS','','UNKNOWN','004403704','','','','','','AFD','',0.0,3,'','',0,0,0) >> RETURNING id INTO thingdataid; >> INSERT INTO thingnum (thingdata_id,thgnum,state,dtg,cmd) VALUES >> (thingdataid,'013086',0,1502970401,'FOO'); >> >> <repeated for each thing> >> >> END $$; >> >> Should I limit the number of 'thing' inserts within a DO block or >> wrapping each 'thing' insert in it's own DO block? > Thanks for the specific suggestions. > I would suggest getting rid of the do block entirely if that is > possible. Just create lots of insert statements. You can get the current > value of a sequence with currval('sequence_name'). What is the downside of using a DO block? I'd have to do a nextval on each sequence before I could use currval, right? Or I could do 'select last_value from <sequence>'. One thing that is unclear to me is when commits occur while using psql would you know where in the docs I can find information on this subject? > > Alternately or in addition, since you are using python, you might want > to insert directly into the database from python using psycopg2. For > separate insert statements that should have about the same performance. > (It is usually much faster to write to a csv file and load that with > copy than to insert each row, but you don't do that and it might be > difficult in your case). Yes, I thought about generating csv files but didn't see a way to deal with the foreign keys. > > hp > > -- > _ | Peter J. Holzer | we build much bigger, better disasters now > |_|_) | | because we have much more sophisticated > | | | hjp@hjp.at | management tools. > __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
On 2017-11-30 08:43:32 -0600, Ted Toth wrote: > Date: Thu, 30 Nov 2017 08:43:32 -0600 > From: Ted Toth <txtoth@gmail.com> > To: "Peter J. Holzer" <hjp-pgsql@hjp.at> > Cc: pgsql-general@lists.postgresql.org > Subject: Re: large numbers of inserts out of memory strategy > > On Thu, Nov 30, 2017 at 4:22 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > > On 2017-11-29 08:32:02 -0600, Ted Toth wrote: > >> Yes I did generate 1 large DO block: > >> > >> DO $$ > >> DECLARE thingid bigint; thingrec bigint; thingdataid bigint; > >> BEGIN > >> INSERT INTO thing > >> (ltn,classification,machine,source,thgrec,flags,serial,type) VALUES > >> ('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec > >> INTO thingid,thingrec; > >> INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid); > >> INSERT INTO thingstatus > >> (thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs) > >> VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0); [...] > >> > >> <repeated for each thing> > >> > >> END $$; > >> > >> Should I limit the number of 'thing' inserts within a DO block or > >> wrapping each 'thing' insert in it's own DO block? > > > > Thanks for the specific suggestions. > > > I would suggest getting rid of the do block entirely if that is > > possible. Just create lots of insert statements. You can get the current > > value of a sequence with currval('sequence_name'). > > What is the downside of using a DO block? As you discovered, the downside is that the whole do block needs to be parsed before it can be executed. If you generate a huge do block, that can be an issue. If you create lots of little do blocks, the overhead is probably not noticable. But I'd like to pose the question the other way around: What are the advantages of using a do block? If there aren't any, I wouldn't use it (keep it simple). One advantage is that you can use variables. But I don't think you need that. Other possible advantages could be speed or transactions: I don't know if these exist. > I'd have to do a nextval on each sequence before I could use currval, > right? That happens automatically if you insert the default value into a serial column. You seem to do that here: > >> INSERT INTO thing (ltn,classification,machine,source,thgrec,flags,serial,type) > >> VALUES ('T007336','THING',0,1025,7336,7,'XXX869977564',1) > >> RETURNING id,thgrec INTO thingid,thingrec; The column id is probably defined as serial (or explicitely associated with a sequence). So you can get the value you just inserted with currval('thing_id_seq') (Check the table definition to get the sequence name, but unless you explicitely specified the sequence name that's it). So you could ditch the "RETURNING ..." clause and replace the next statement with: INSERT INTO recnum_thing (recnum, thing_id) VALUES (7336, currval('thing_id_seq')); And similar for the other statements where you use thingid and thingdataid (thingrec is already known as far as i can see). Note that this doesn't work if you need two values from the same sequence. So you can't replace insert into thing(..) values(...) returning id as parent_id; insert into thing(..) values(...) returning id as child_id; insert into structure(parent, child) values(parent_id, child_id); in the same way. But you don't seem to need anything like that. 
So you can't replace insert into thing(..) values(...) returning id as parent_id; insert into thing(..) values(...) returning id as child_id; insert into structure(parent, child) values(parent_id, child_id); in the same way. But you don't seem to need anything like that. > One thing that is unclear to me is when commits occur while using psql > would you know where in the docs I can find information on this > subject? By default psql enables autocommit which causes an implicit commit after every statement. With a do block I'm not sure whether that means after the do block or after each statement within the do block. I'd just turn autocommit off and add explicit commits wherever I wanted them. > > Alternately or in addition, since you are using python, you might want > to insert directly into the database from python using psycopg2. For > separate insert statements that should have about the same performance. > (It is usually much faster to write to a csv file and load that with > copy than to insert each row, but you don't do that and it might be > difficult in your case). > > Yes, I thought about generating csv files but didn't see a way to deal > with the foreign keys. Still: Is there a reason why you use a python script to create an sql script instead of directly issuing the sql queries from your python script? hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
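To make the currval() suggestion concrete, a sketch of what the generator could emit instead of one big DO block; 'thing_id_seq' is the default sequence name for a serial id column and is an assumption about the actual schema, and proper quoting/escaping of the values is left out:

def emit_thing_sql(out, thing):
    # plain INSERTs, with currval() standing in for the RETURNING ... INTO variables
    out.write(
        "INSERT INTO thing (ltn, classification, thgrec) "
        "VALUES ('%s', 'THING', %d);\n" % (thing["ltn"], thing["thgrec"])
    )
    out.write(
        "INSERT INTO recnum_thing (recnum, thing_id) "
        "VALUES (%d, currval('thing_id_seq'));\n" % thing["thgrec"]
    )
    out.write(
        "INSERT INTO thingstatus (thing_id, nrpts) "
        "VALUES (currval('thing_id_seq'), %d);\n" % thing["nrpts"]
    )

The generated file is then just a flat stream of INSERTs that psql can execute one at a time.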
On 30 November 2017 at 05:22, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > On 2017-11-29 08:32:02 -0600, Ted Toth wrote: >> Yes I did generate 1 large DO block: >> >> DO $$ >> DECLARE thingid bigint; thingrec bigint; thingdataid bigint; >> BEGIN >> INSERT INTO thing >> (ltn,classification,machine,source,thgrec,flags,serial,type) VALUES >> ('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec >> INTO thingid,thingrec; >> INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid); >> INSERT INTO thingstatus >> (thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs) >> VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0); >> INSERT INTO thinger >> (thing_id,spe_key,cse_unc,lat_spd,cov,dtg,lng,spd,ave_spd,cse,tol,nrpts,lat,alpha,sigma,spd_unc,lng_spd) >> VALUES (thingid,-1,0.0,-6.58197336634e-08,'{4.27624291532e-09,0.0,3.07802916488e-09,0.0,4.27624291532e-09,0.0,3.07802916488e-09,4.16110417234e-08,0.0,4.16110417234e-08}',1509459163,2.21596980095,0.000226273215958,1.0,0.0,0.10000000149,121,0.584555745125,10.0,4.23079740131e-08,0.0,-2.49999881907e-10); >> INSERT INTO thingdata >> (thing_id,category,db_num,xref,org_type,trademark,shortname,fcode,platform,callsign,type,orig_xref,shipclass,home_base,uic,service,di,lngfixed,hull,precision,alert,flag,besufx,name,mmsi,catcode,ntds,imo,pn_num,chxref,threat,sconum,latfixed,db_type,pif,echelon,jtn,quantity,overwrite) >> VALUES (thingid,'XXX','','','','','004403704','23','','','','','UNEQUATED','','','','',0.0,'','{0,0,0,0,0}','','KS','','UNKNOWN','004403704','','','','','','AFD','',0.0,3,'','',0,0,0) >> RETURNING id INTO thingdataid; >> INSERT INTO thingnum (thingdata_id,thgnum,state,dtg,cmd) VALUES >> (thingdataid,'013086',0,1502970401,'FOO'); >> >> <repeated for each thing> >> >> END $$; >> >> Should I limit the number of 'thing' inserts within a DO block or >> wrapping each 'thing' insert in it's own DO block? > > I would suggest getting rid of the do block entirely if that is > possible. Just create lots of insert statements. You can get the current > value of a sequence with currval('sequence_name'). > > Alternately or in addition, since you are using python, you might want > to insert directly into the database from python using psycopg2. For > separate insert statements that should have about the same performance. > (It is usually much faster to write to a csv file and load that with > copy than to insert each row, but you don't do that and it might be > difficult in your case). Once upon a time, Slony-I used to use INSERT/UPDATE/DELETE to do all of its work, so that the replication stream consisted of gigantic numbers of INSERT/UPDATE/DELETE statements that had to be parsed and processed. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On 30 November 2017 at 05:22, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > On 2017-11-29 08:32:02 -0600, Ted Toth wrote: >> Yes I did generate 1 large DO block: >> >> DO $$ >> DECLARE thingid bigint; thingrec bigint; thingdataid bigint; >> BEGIN >> INSERT INTO thing >> (ltn,classification,machine,source,thgrec,flags,serial,type) VALUES >> ('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec >> INTO thingid,thingrec; >> INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid); >> INSERT INTO thingstatus >> (thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs) >> VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0); >> INSERT INTO thinger >> (thing_id,spe_key,cse_unc,lat_spd,cov,dtg,lng,spd,ave_spd,cse,tol,nrpts,lat,alpha,sigma,spd_unc,lng_spd) >> VALUES (thingid,-1,0.0,-6.58197336634e-08,'{4.27624291532e-09,0.0,3.07802916488e-09,0.0,4.27624291532e-09,0.0,3.07802916488e-09,4.16110417234e-08,0.0,4.16110417234e-08}',1509459163,2.21596980095,0.000226273215958,1.0,0.0,0.10000000149,121,0.584555745125,10.0,4.23079740131e-08,0.0,-2.49999881907e-10); >> INSERT INTO thingdata >> (thing_id,category,db_num,xref,org_type,trademark,shortname,fcode,platform,callsign,type,orig_xref,shipclass,home_base,uic,service,di,lngfixed,hull,precision,alert,flag,besufx,name,mmsi,catcode,ntds,imo,pn_num,chxref,threat,sconum,latfixed,db_type,pif,echelon,jtn,quantity,overwrite) >> VALUES (thingid,'XXX','','','','','004403704','23','','','','','UNEQUATED','','','','',0.0,'','{0,0,0,0,0}','','KS','','UNKNOWN','004403704','','','','','','AFD','',0.0,3,'','',0,0,0) >> RETURNING id INTO thingdataid; >> INSERT INTO thingnum (thingdata_id,thgnum,state,dtg,cmd) VALUES >> (thingdataid,'013086',0,1502970401,'FOO'); >> >> <repeated for each thing> >> >> END $$; >> >> Should I limit the number of 'thing' inserts within a DO block or >> wrapping each 'thing' insert in it's own DO block? > > I would suggest getting rid of the do block entirely if that is > possible. Just create lots of insert statements. You can get the current > value of a sequence with currval('sequence_name'). > > Alternately or in addition, since you are using python, you might want > to insert directly into the database from python using psycopg2. For > separate insert statements that should have about the same performance. > (It is usually much faster to write to a csv file and load that with > copy than to insert each row, but you don't do that and it might be > difficult in your case). A long time ago (pre version 2.2), Slony-I used to do this, having a big stream of INSERT, UPDATE, and DELETE statements. This seemed pretty OK when tuples weren't very large, but we did get some reports in from people with applications with very large tuples that would run into out-of-memory conditions if a bunch of INSERTs involving very large tuples were processed together. The "fix" we initially did was to have the bulk processing stream hold off on any Very Large Tuples, so that when we'd encounter them, we'd process the big tuples one by one. This worked relatively OK, but meant that we'd be switching logic every time there was big data, and there was a pretty painful amount of configuration to force people to worry about. The "2.2" change was to switch to COPY-based streaming. 
In our case, we put a trigger onto the single log table and have that initiate tossing data as it came in into the appropriate target table. That introduced an Amount Of Complexity (e.g. - a rather complex stored function in C/SPI), but it's notable that we got a pretty big performance boost as well as complete elimination of memory allocation worries out of turning the process into COPY streaming. It may be challenging to get a suitable set of COPY requests to do what is necessary. (It sure would be cool if it could just be one...) But there are three crucial things I'd observe: a) Loading data via COPY is *way* faster, b) Parsing INSERT statements is a *big* slowdown, c) Parsing INSERT statements means that those statements must be drawn into memory, and that chews a lot of memory if the query has very large attributes. (COPY doesn't chew that memory!) At one point, Jan Wieck was looking into an alternative COPY protocol that would have allowed more actions, notably: - You could specify multiple tables to stream into, and switch between them on a tuple by tuple basis. - You could specify actions of INSERT, UPDATE, or DELETE. It seemed like a really interesting idea at the time; it was intended to be particularly useful for Slony, but some folks designing data warehouse ETL systems observed that it would be useful to them, too. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
On Thu, Nov 30, 2017 at 11:40 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > On 2017-11-30 08:43:32 -0600, Ted Toth wrote: >> Date: Thu, 30 Nov 2017 08:43:32 -0600 >> From: Ted Toth <txtoth@gmail.com> >> To: "Peter J. Holzer" <hjp-pgsql@hjp.at> >> Cc: pgsql-general@lists.postgresql.org >> Subject: Re: large numbers of inserts out of memory strategy >> >> On Thu, Nov 30, 2017 at 4:22 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: >> > On 2017-11-29 08:32:02 -0600, Ted Toth wrote: >> >> Yes I did generate 1 large DO block: >> >> >> >> DO $$ >> >> DECLARE thingid bigint; thingrec bigint; thingdataid bigint; >> >> BEGIN >> >> INSERT INTO thing >> >> (ltn,classification,machine,source,thgrec,flags,serial,type) VALUES >> >> ('T007336','THING',0,1025,7336,7,'XXX869977564',1) RETURNING id,thgrec >> >> INTO thingid,thingrec; >> >> INSERT INTO recnum_thing (recnum,thing_id) VALUES (thingrec,thingid); >> >> INSERT INTO thingstatus >> >> (thing_id,nrpts,rmks_cs,force_type_id,ftn_cs,force_code,arr_cs,mask,toi_state,plot_id,signa_cs,lastchange,des_cs,rig_cs,ownship,correlation,maxrpts,rtn_cs,ctc_cs,group_mask,dep_cs) >> >> VALUES (thingid,121,'{0,0,0,0}',440,0,23,0,0,0,'{23,-1,3803,3805,-1,-1,0,6}',0,1509459164,0,0,0,0,1000,0,0,0,0); > [...] >> >> >> >> <repeated for each thing> >> >> >> >> END $$; >> >> >> >> Should I limit the number of 'thing' inserts within a DO block or >> >> wrapping each 'thing' insert in it's own DO block? >> > >> >> Thanks for the specific suggestions. >> >> > I would suggest getting rid of the do block entirely if that is >> > possible. Just create lots of insert statements. You can get the current >> > value of a sequence with currval('sequence_name'). >> >> What is the downside of using a DO block? > > As you discovered, the downside is that the whole do block needs > to be parsed before it can be executed. If you generate a huge do block, > that can be an issue. If you create lots of little do blocks, the > overhead is probably not noticable. > > But I'd like to pose the question the other way around: What are the > advantages of using a do block? If there aren't any, I wouldn't use it > (keep it simple). One advantage is that you can use variables. But I > don't think you need that. Other possible advantages could be speed or > transactions: I don't know if these exist. > >> I'd have to do a nextval on each sequence before I could use currval, >> right? > > That happens automatically if you insert the default value into a serial > column. You seem to do that here: > >> >> INSERT INTO thing (ltn,classification,machine,source,thgrec,flags,serial,type) >> >> VALUES ('T007336','THING',0,1025,7336,7,'XXX869977564',1) >> >> RETURNING id,thgrec INTO thingid,thingrec; > > The column id is probably defined as serial (or explicitely associated > with a sequence). So you can get the value you just inserted with > currval('thing_id_seq') (Check the table definition to get the sequence > name, but unless you explicitely specified the sequence name that's > it). > > So you could ditch the "RETURNING ..." clause and replace the next > statement with: > > INSERT INTO recnum_thing (recnum, thing_id) VALUES (7336, currval('thing_id_seq')); I only did the DO to get the DECLARE of the variables so I'll change it to use this method instead, thanks. > > And similar for the other statements where you use thingid and > thingdataid (thingrec is already known as far as i can see). > > Note that this doesn't work if you need two values from the same > sequence. 
> So you can't replace > > insert into thing(..) values(...) returning id as parent_id; > insert into thing(..) values(...) returning id as child_id; > insert into structure(parent, child) values(parent_id, child_id); > > in the same way. But you don't seem to need anything like that. > > >> One thing that is unclear to me is when commits occur while using psql >> would you know where in the docs I can find information on this >> subject? > > By default psql enables autocommit which causes an implicit commit after > every statement. With a do block I'm not sure whether that means after > the do block or after each statement within the do block. I'd just turn > autocommit off and add explicit commits wherever I wanted them. So you think I can turn off autocommit and put BEGIN/COMMITs in a large file and then postmaster won't have to parse the whole thing when I feed it to it via psql? I did change my generator to create multiple files with limited numbers of inserts. Somebody had mentioned that by chunking it this way it would be easier to know when and where something went wrong, which I thought was a good idea. > > >> > Alternately or in addition, since you are using python, you might want >> > to insert directly into the database from python using psycopg2. For >> > separate insert statements that should have about the same performance. >> > (It is usually much faster to write to a csv file and load that with >> > copy than to insert each row, but you don't do that and it might be >> > difficult in your case). >> >> Yes, I thought about generating csv files but didn't see a way to deal >> with the foreign keys. > > Still: Is there a reason why you use a python script to create an sql > script instead of directly issuing the sql queries from your python > script? I already had code that generated JSON so it was relatively easy to add code and a cmd line arg to generate SQL instead. Theoretically this will be a one-time offline operation and performance is not a big factor. Also my target machine will be significantly beefier. > > hp > > -- > _ | Peter J. Holzer | we build much bigger, better disasters now > |_|_) | | because we have much more sophisticated > | | | hjp@hjp.at | management tools. > __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
On Thu, 30 Nov 2017 08:43:32 -0600 Ted Toth <txtoth@gmail.com> wrote: > What is the downside of using a DO block? I'd have to do a nextval on > each sequence before I could use currval, right? Or I could do 'select > last_value from <sequence>'. You are creating a piece of code that has to be parsed, tokenized, and compiled prior to execution. What's biting you is that you've created a function the size of your dataset. If you like do-blocks then write a short block to insert one record using placeholders and call it a few zillion times. That or (in DBI-speak):

eval {
    $dbh->{ RaiseError } = 1;
    $dbh->{ AutoCommit } = 0;
    my $sth = $dbh->prepare( 'insert into yourtable ( field1, field2 ) values ( $1, $2 )' );
    $sth->execute( @$_ ) for @rows;
    $dbh->commit;
} or die "Failed execution: $@";

which will be nearly as efficient in the long run. That or just import the data from a csv/tsv (there are good examples of data import available in the PG docs). -- Steven Lembark 1505 National Ave Workhorse Computing Rockford, IL 61103 lembark@wrkhors.com +1 888 359 3508
On 2017-11-30 14:27:58 -0600, Ted Toth wrote: > On Thu, Nov 30, 2017 at 11:40 AM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote: > > On 2017-11-30 08:43:32 -0600, Ted Toth wrote: > >> One thing that is unclear to me is when commits occur while using psql > >> would you know where in the docs I can find information on this > >> subject? > > > > By default psql enables autocommit which causes an implicit commit after > > every statement. With a do block I'm not sure whether that means after > > the do block or after each statement within the do block. I'd just turn > > autocommit off and add explicit commits wherever I wanted them. > > So you think I can turn off autocommit and put BEGIN/COMMITs in a > large file and then postmaster won't have to parse the whole thing > when I feed it to it via psql? No, if you still had one giant do block the server would still need to parse it completely. I was thinking of the case where you have many small do blocks, each with a few insert statements (one do block per "thing"). In this case it would parse and execute each do block before moving on to the next. But I wasn't sure whether the default autocommit would mean one commit after each do block or one commit after each insert statement (David answered this - thanks), and in any case you probably wouldn't want to commit after each "thing", so I suggested turning autocommit off and adding an explicit commit at the end or possibly after every nth thing. > > Still: Is there a reason why you use a python script to create an sql > > script instead of directly issuing the sql queries from your python > > script? > > I already had code that generated JSON so it was relatively easy to > add code and a cmd line arg to generate SQL instead. Ok, that sounds like a good reason. hp -- _ | Peter J. Holzer | we build much bigger, better disasters now |_|_) | | because we have much more sophisticated | | | hjp@hjp.at | management tools. __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
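One way to get those explicit transaction boundaries without touching psql settings is to have the generator itself wrap every N 'things' in BEGIN/COMMIT; a sketch, where emit_thing_sql() is the per-'thing' writer sketched earlier and CHUNK is an arbitrary batch size:

CHUNK = 10000

def write_sql_file(out, things):
    for n, thing in enumerate(things):
        if n % CHUNK == 0:
            if n:
                out.write("COMMIT;\n")   # close the previous batch
            out.write("BEGIN;\n")
        emit_thing_sql(out, thing)
    out.write("COMMIT;\n")               # close the last open batch

Each batch then commits on its own when the file is run with psql -f, regardless of the autocommit setting.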