Thread: postgreSQL multithreading

postgreSQL multithreading

From
postgre@seznam.cz
Date:
Hi all,

I was trying to find some way to implement multithreading into my postgreSQL stored functions.

The thing is, that I have data stored in multiple tables - for each day one table - and I want to write
a function which selects data from these tables and stores them into files (or just returns the data in cursor);
Information about data stored in these tables are in another table defined like this:

partitions_daily(from_date timestamp, to_date timestamp, table_name)

From_date and to_date tells in which table the requested data should be found.

In this time I have function through which I can get the data, the problem is that everything is processed with only
one CPU core. I think it should be very easy to make this model to be multithreaded and use more CPU cores.

I was trying to implement this with plperlU but there is problem with using spi in multiple threads. There is also
possibility to do multithreading with some external scripts to get the data, but this is no exactly what I want.

Can someone please help me with this? How can I force postgreSQL to use multiple CPU cores to run my function.
If somebody can give me an advice or post some simple peace of code It would be great.

Thank you for all replies,

Lukas Houf

Re: postgreSQL multithreading

From
Andreas Kretschmer
Date:
postgre@seznam.cz <postgre@seznam.cz> schrieb:

> Hi all,
>
> I was trying to find some way to implement multithreading into my postgreSQL stored functions.
>
> The thing is, that I have data stored in multiple tables - for each day one table - and I want to write
> a function which selects data from these tables and stores them into files (or just returns the data in cursor);
> Information about data stored in these tables are in another table defined like this:
>
> partitions_daily(from_date timestamp, to_date timestamp, table_name)
>
> From_date and to_date tells in which table the requested data should be found.
>
> In this time I have function through which I can get the data, the problem is that everything is processed with only
> one CPU core. I think it should be very easy to make this model to be multithreaded and use more CPU cores.

I think, the only way to achieve this is to use multiple connections to
the database. One connection to select the table_name and than, for
every table_name, a own connection to the database to select and write
the data to files.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: [GENERAL] postgreSQL multithreading

From
postgre@seznam.cz
Date:
> ------------ Původní zpráva ------------
> Od: Andreas Kretschmer <akretschmer@spamfence.net>
> Předmět: Re: [GENERAL] postgreSQL multithreading
> Datum: 30.3.2008 17:39:02
> ----------------------------------------
> postgre@seznam.cz <postgre@seznam.cz> schrieb:
>
> > Hi all,
> >
> > I was trying to find some way to implement multithreading into my postgreSQL
> stored functions.
> >
> > The thing is, that I have data stored in multiple tables - for each day one
> table - and I want to write
> > a function which selects data from these tables and stores them into files (or
> just returns the data in cursor);
> > Information about data stored in these tables are in another table defined
> like this:
> >
> > partitions_daily(from_date timestamp, to_date timestamp, table_name)
> >
> > From_date and to_date tells in which table the requested data should be
> found.
> >
> > In this time I have function through which I can get the data, the problem is
> that everything is processed with only
> > one CPU core. I think it should be very easy to make this model to be
> multithreaded and use more CPU cores.
>
> I think, the only way to achieve this is to use multiple connections to
> the database. One connection to select the table_name and than, for
> every table_name, a own connection to the database to select and write
> the data to files.
>
>
What about some C dynamically loaded function in which I could call new connection for each thread?

Lukas Houf

Re: Re: [GENERAL] postgreSQL multithreading

From
Craig Ringer
Date:
postgre@seznam.cz wrote:
>
> What about some C dynamically loaded function in which I could call new connection for each thread?
>
Past discussion here suggests that the backends are strictly single
threaded. While you might be able to use multiple threads - I don't know
- I expect you'd need to protect all SPI access by a lock that
serialized everything anyway.

Doing it externally with a script / program that uses multiple
connections might just be the way. Unfortunately that means that you
don't get a single consistent snapshot - each connection will have its
own, potentially different, view of the database state.

A possible use for read only transactions being able to share a snapshot
came up in discussion a few weeks ago. I guess this is another one.

--
Craig Ringer

Re: Re: [GENERAL] postgreSQL multithreading

From
"A. Kretschmer"
Date:
am  Sun, dem 30.03.2008, um 18:18:45 +0200 mailte postgre@seznam.cz folgendes:
> What about some C dynamically loaded function in which I could call new connection for each thread?

Maybe a solution.

Please, learn to quote.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: [GENERAL] Re: [GENERAL] postgreSQL multithreading

From
postgre@seznam.cz
Date:
A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:

> Maybe a solution.
>
> Please, learn to quote.


Sorry for the quoting. I guess this probably won't be OK too :-)

Strange thing about this is that there is no simple way to use
more CPUs for one function. It can be done with external functions
but why there is nothing similar for internal functions? There is
probably no need for such solution on systems with many
users who will allocate new process for each connection,
but in my case where only few users will work with the db
server, the CPUs are unused.

Well I hope I will succeed with those C functions.

Lukas Houf


Re: [GENERAL] Re: [GENERAL] postgreSQL multithreading

From
postgre@seznam.cz
Date:
Craig Ringer <craig@postnewspapers.com.au>  wrote:

> Past discussion here suggests that the backends are strictly single
> threaded. While you might be able to use multiple threads - I don't know
> - I expect you'd need to protect all SPI access by a lock that
> serialized everything anyway.
>
> Doing it externally with a script / program that uses multiple
> connections might just be the way. Unfortunately that means that you
> don't get a single consistent snapshot - each connection will have its
> own, potentially different, view of the database state.
>
> A possible use for read only transactions being able to share a snapshot
> came up in discussion a few weeks ago. I guess this is another one.
>
> --
> Craig Ringer

I fill tables with observation data, so the tables don't change except the
one for current day. So I don't need to care about different states
of database.

From what has been posted I think that C function can do the work for me.
But I would still appreciate some peace of code from which I can figure
out how this can be done, because I'm new to database programing.

Thank you

Lukas Houf

Re: postgreSQL multithreading

From
"Roberts, Jon"
Date:

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of postgre@seznam.cz
> Sent: Sunday, March 30, 2008 10:13 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] postgreSQL multithreading
>
> Hi all,
>
> I was trying to find some way to implement multithreading into my
> postgreSQL stored functions.
>

You can use pgAgent to submit a job for each thread.


Jon

Re: [GENERAL] postgreSQL multithreading

From
postgre@seznam.cz
Date:
Roberts Jon <Jon.Roberts@asurion.com>  wrote:

> >
> > Hi all,
> >
> > I was trying to find some way to implement multithreading into my
> > postgreSQL stored functions.
> >
>
> You can use pgAgent to submit a job for each thread.
>
>
> Jon


Well, that is also an interesting idea. I will definetely consider it when writing the function.

Thanks a lot.

Lukas Houf