Thread: PostgreSQL multithreading
Hi all,

I was trying to find some way to implement multithreading into my
PostgreSQL stored functions.

The thing is that I have data stored in multiple tables - one table for
each day - and I want to write a function which selects data from these
tables and stores it into files (or just returns the data in a cursor).
Information about the data stored in these tables is kept in another
table, defined like this:

partitions_daily(from_date timestamp, to_date timestamp, table_name)

from_date and to_date tell in which table the requested data should be
found.

At the moment I have a function through which I can get the data; the
problem is that everything is processed on only one CPU core. I think it
should be very easy to make this model multithreaded and use more CPU
cores. I was trying to implement this with plperlu, but there is a
problem with using SPI in multiple threads. There is also the possibility
of doing the multithreading with some external scripts that get the data,
but that is not exactly what I want.

Can someone please help me with this? How can I force PostgreSQL to use
multiple CPU cores to run my function? If somebody can give me advice or
post some simple piece of code, that would be great.

Thank you for all replies,
Lukas Houf
postgre@seznam.cz <postgre@seznam.cz> schrieb:

> I was trying to find some way to implement multithreading into my
> PostgreSQL stored functions.
> [...]
> At the moment I have a function through which I can get the data; the
> problem is that everything is processed on only one CPU core. I think
> it should be very easy to make this model multithreaded and use more
> CPU cores.

I think the only way to achieve this is to use multiple connections to
the database: one connection to select the table_name and then, for
every table_name, its own connection to the database to select and write
the data to files.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
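A minimal sketch of that approach in C with libpq and pthreads, not the
poster's actual code: the connection string, date range, and output paths
are made up, it assumes libpq was built with --enable-thread-safety (each
thread gets its own PGconn), and server-side COPY ... TO <file> requires
superuser rights.

/*
 * Sketch: export each daily table in parallel, one libpq connection
 * (and therefore one backend, i.e. one CPU core) per thread.
 */
#include <libpq-fe.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CONNINFO   "dbname=mydb user=postgres"   /* hypothetical */
#define MAX_TABLES 64

static void *export_table(void *arg)
{
    const char *table = arg;
    char        sql[512];
    PGresult   *res;
    PGconn     *conn = PQconnectdb(CONNINFO);    /* one connection per thread */

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return NULL;
    }

    /* Server-side COPY: the backend writes the file itself. */
    snprintf(sql, sizeof(sql),
             "COPY (SELECT * FROM %s) TO '/tmp/%s.csv' WITH CSV",
             table, table);

    res = PQexec(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s: %s", table, PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return NULL;
}

int main(void)
{
    pthread_t   threads[MAX_TABLES];
    char       *names[MAX_TABLES];
    int         i, n;
    PGresult   *res;
    PGconn     *conn = PQconnectdb(CONNINFO);

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* First connection: fetch the list of daily tables. */
    res = PQexec(conn,
        "SELECT table_name FROM partitions_daily "
        "WHERE from_date >= '2008-03-01' AND to_date < '2008-04-01'");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        return 1;
    }

    n = PQntuples(res);
    if (n > MAX_TABLES)
        n = MAX_TABLES;

    /* One worker thread per table; copy the names out of the result
     * before handing them to the threads. */
    for (i = 0; i < n; i++)
    {
        names[i] = strdup(PQgetvalue(res, i, 0));
        pthread_create(&threads[i], NULL, export_table, names[i]);
    }
    for (i = 0; i < n; i++)
    {
        pthread_join(threads[i], NULL);
        free(names[i]);
    }

    PQclear(res);
    PQfinish(conn);
    return 0;
}

Compile with something like:

cc export.c -o export -I$(pg_config --includedir) -L$(pg_config --libdir) -lpq -lpthread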
> ------------ Original message ------------
> From: Andreas Kretschmer <akretschmer@spamfence.net>
> Subject: Re: [GENERAL] postgreSQL multithreading
> Date: 30.3.2008 17:39:02
> ----------------------------------------
>
> I think the only way to achieve this is to use multiple connections to
> the database: one connection to select the table_name and then, for
> every table_name, its own connection to the database to select and
> write the data to files.

What about some C dynamically loaded function in which I could open a
new connection for each thread?

Lukas Houf
postgre@seznam.cz wrote:
> What about some C dynamically loaded function in which I could open a
> new connection for each thread?

Past discussion here suggests that the backends are strictly
single-threaded. While you might be able to use multiple threads - I
don't know - I expect you'd need to protect all SPI access by a lock
that serialized everything anyway.

Doing it externally with a script / program that uses multiple
connections might just be the way. Unfortunately that means that you
don't get a single consistent snapshot - each connection will have its
own, potentially different, view of the database state.

A possible use for read-only transactions being able to share a snapshot
came up in discussion a few weeks ago. I guess this is another one.

--
Craig Ringer
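For reference, later PostgreSQL releases (9.2 and up) added exactly this
feature: snapshot export. A sketch of how it looks there, with an
illustrative snapshot ID:

-- In the controlling session:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();  -- returns an ID, e.g. '00000003-0000001B-1'

-- In each worker session, as the first thing in its transaction:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
-- All workers now read the same consistent view of the database.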
On Sun, 30.03.2008, at 18:18:45 +0200, postgre@seznam.cz wrote the
following:
> What about some C dynamically loaded function in which I could open a
> new connection for each thread?

Maybe a solution.

Please, learn to quote.

Andreas
--
Andreas Kretschmer
Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header)
GnuPG-ID: 0x3FFF606C, private 0x7F4584DA
http://wwwkeys.de.pgp.net
A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:
> Maybe a solution.
>
> Please, learn to quote.

Sorry for the quoting. I guess this probably won't be OK either :-)

The strange thing about this is that there is no simple way to use more
CPUs for one function. It can be done with external functions, but why is
there nothing similar for internal functions? There is probably no need
for such a solution on systems with many users, where a new process is
allocated for each connection, but in my case, where only a few users
will work with the db server, the CPUs sit unused.

Well, I hope I will succeed with those C functions.

Lukas Houf
Craig Ringer <craig@postnewspapers.com.au> wrote:
> Doing it externally with a script / program that uses multiple
> connections might just be the way. Unfortunately that means that you
> don't get a single consistent snapshot - each connection will have its
> own, potentially different, view of the database state.

I fill the tables with observation data, so the tables don't change
except the one for the current day. So I don't need to care about
different states of the database.

From what has been posted I think that a C function can do the work for
me. But I would still appreciate some piece of code from which I can
figure out how this can be done, because I'm new to database programming.

Thank you
Lukas Houf
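For the export itself, each worker connection only needs to run a single
statement along these lines (table name and output path are made up;
server-side COPY TO a file needs superuser rights, otherwise use
COPY ... TO STDOUT and write the file on the client):

-- one COPY per worker connection, so each runs in its own backend
COPY (SELECT * FROM data_2008_03_29) TO '/tmp/data_2008_03_29.csv' WITH CSV;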
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org On Behalf Of postgre@seznam.cz
> Sent: Sunday, March 30, 2008 10:13 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] postgreSQL multithreading
>
> Hi all,
>
> I was trying to find some way to implement multithreading into my
> PostgreSQL stored functions.

You can use pgAgent to submit a job for each thread.

Jon
Roberts Jon <Jon.Roberts@asurion.com> wrote:
> > I was trying to find some way to implement multithreading into my
> > PostgreSQL stored functions.
>
> You can use pgAgent to submit a job for each thread.

Well, that is also an interesting idea. I will definitely consider it
when writing the function. Thanks a lot.

Lukas Houf