Thread: Function written in C, hangs on one machine and not another...

Function written in C, hangs on one machine and not another...

From
CG
Date:
PostgreSQL 7.4 ...

Essentially, I've written a function in C for use with PostgreSQL. The debugger
shows that the program is hanging on the part of the program that is writing
data into its own STDIN.

[snip]

  // Open up and hijack STDIN
  int pipe_pair[2];
  int pipe_rv = pipe(pipe_pair);
  if (pipe_rv != 0)
  {
    // Abort! Abort! (pipe() failed, so there are no descriptors to close)
    pfree(param_1);
    pfree(param_2);
    PG_RETURN_NULL();
  }

  int newfd = dup2(pipe_pair[0], STDIN_FILENO);
  if (newfd != STDIN_FILENO)
  {
    // Abort! Abort! (close both ends of the pipe)
    close(pipe_pair[0]);
    close(pipe_pair[1]);
    pfree(param_1);
    pfree(param_2);
    PG_RETURN_NULL();
  }

  // Write param_1 to hijacked pipe
  write(pipe_pair[1], param_1, param_1_len); // Hangs here...

[/snip]

It works on the machine I use for testing from within PostgreSQL, but it
doesn't work on the machine which is the production server. I'd hate for this
to matter, but I ought to disclose that the testing machine is a 1-way AMD box with
a more recent version of the Linux 2.6 kernel, and a more recent version of
libc. The production machine is a 2-way Dell Xeon processor. Same version of
PostgreSQL, compiled with the same flags (except with debugging symbols for the
testing machine). You'd, or at least I would, think simple code like this would
compile and run on multiple platforms...

I can perform the same STDIN hijacking on both machines in a standalone
program, but it fails under PostgreSQL.

I'm completely stumped, and I need YOUR insight! Thank you!!

CGV




Re: Function written in C, hangs on one machine and not another...

From
Martijn van Oosterhout
Date:
On Fri, Oct 28, 2005 at 06:38:29AM -0700, CG wrote:
> PostgreSQL 7.4 ...
>
> Essentially, I've written a function in C for use with PostgreSQL. The debugger
> shows that the program is hanging on the part of the program that is writing
> data into its own STDIN.

Umm, what *are* you trying to do? Is this running in the backend?

Firstly, depending on the size of param_1, the write will block
because it can't write all of it (usually PIPE_BUF). Perhaps recent
kernel versions have changed to make it so no data is accepted until a
reader appears even if the data is smaller than that.

Since apparently you want the read to happen in the same process as the
write, you've just deadlocked yourself. The write won't happen till
someone reads, and the read won't happen because you're stuck
writing...

Finally, this is insane, why would you want to change STDIN?
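
A minimal sketch of the effect described above, assuming a POSIX system:
with no reader draining it, a pipe accepts only a fixed number of bytes,
after which a blocking write() stalls. The helper name pipe_capacity() is
hypothetical. It uses a non-blocking write end so that a full pipe reports
EAGAIN instead of hanging. For what it is worth, the Linux pipe(7) manual
page says a pipe held one page (4096 bytes on i386) before kernel 2.6.11
and 64 KiB from 2.6.11 on, which would be consistent with the two machines
behaving differently, although the thread does not confirm the kernel
versions involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns how many bytes an unread pipe accepts before write() would
 * block, or -1 if the pipe could not be set up. */
long pipe_capacity(void)
{
    int fds[2];
    long total = 0;
    char chunk[512] = {0};

    if (pipe(fds) != 0)
        return -1;

    /* Non-blocking, so a full pipe yields EAGAIN instead of a hang. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0)
            total += n;
        else if (n == -1 && errno == EINTR)
            continue;           /* interrupted, try again */
        else
            break;              /* EAGAIN: nobody is reading, pipe is full */
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Printing the return value on each machine would show how much data a
single process can push into its own pipe before it deadlocks itself.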
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Function written in C, hangs on one machine and not another...

From
CG
Date:
--- Martijn van Oosterhout <kleptog@svana.org> wrote:

> On Fri, Oct 28, 2005 at 06:38:29AM -0700, CG wrote:
>
> Umm, what *are* you trying to do? Is this running in the backend?

Yes, running on the back-end. I'm trying to utilize Adobe's FDF toolkit to
parse the FDF files stored in my database. They distribute a C library that can
be used to parse FDF files.

> Firstly, depending on the size of param_1, the write will block
> because it can't write all of it (usually PIPE_BUF). Perhaps recent
> kernel versions have changed to make it so no data is accepted until a
> reader appears even if the data is smaller than that.
>
> Since apparently you want the read to happen in the same process as the
> write, you've just deadlocked yourself. The write won't happen till
> someone reads, and the read won't happen because you're stuck
> writing...

So it might be a kernel thing. What is different about the function being
called from within PostgreSQL, as opposed to being called in a standalone
program?

> Finally, this is insane, why would you want to change STDIN?

Insanity? I agree completely. The major issue is that the FDF Toolkit has only
one function for reading in FDF Data:

/*
  FDFOpen: Reads an FDF file into memory. Client should call FDFClose() when
  the FDF is no longer needed. Parameters:

  - fileName: Complete pathname (in Host encoding), or  "-" to read from stdin.
  - howMany: If fileName specifies stdin, then howMany should indicate the
        number of characters to read. Otherwise, it is unused. In a web server
        environment, this is available as the value of the CONTENT_LENGTH
        environment variable. In some servers executing cgi-bin scripts, if the
        script tries to read stdin until an EOF is reached, the script hangs.
        Thus this parameter.
  - pTheFDF: If FDFOpen() returns FDFErcOK, then pTheFDF will point to an
    FDFDoc, which is needed for most other calls in the API.
  - Error codes: FDFErcBadParameter, FDFErcFileSysErr, FDFErcBadFDF,
        FDFErcInternalError
*/
FDFLIBAPI FDFErc FDFOpen(const char* fileName, ASInt32 howMany, FDFDoc*
pTheFDF);

There's no other way to load data into the toolkit! (Can you /feel/ the
insanity?)

Does this give you any more insight into an alternate method of getting this
thing done?






Re: Function written in C, hangs on one machine and not another...

From
Douglas McNaught
Date:
CG <cgg007@yahoo.com> writes:

> Does this give you any more insight into an alternate method of getting this
> thing done?

I would fork(), set up file descriptors appropriately, then have the
child call the Adobe library and the parent feed the data to it.
Once the document is loaded in the child, do whatever processing you
need to, then pass the results back to the parent via stdout or a
temporary file.

Ugly, but probably the most robust way to do it.  Make sure you don't
call any PG-internal functions in the child process, as that will
confuse things badly.

-Doug

Re: Function written in C, hangs on one machine and not another...

From
Martijn van Oosterhout
Date:
On Fri, Oct 28, 2005 at 07:24:12AM -0700, CG wrote:
> So it might be a kernel thing. What is different about the function being
> called from within PostgreSQL, as opposed to being called in a standalone
> program?

Not entirely sure, but I'm sure the size of the write matters. For
example, in your test program, did you check that the write actually
wrote everything?
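
Checking that a write "actually wrote everything" usually means looping on
the return value of write(), since it may accept fewer bytes than requested.
A minimal sketch, with write_all() as a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly len bytes to fd, retrying short writes.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything */
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```

Note that on a blocking descriptor this loop still hangs if the pipe fills
up and nothing reads from the other end; it only guards against silently
dropped data.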

> Insanity? I agree completely. The major issue is that the FDF Toolkit has only
> one function for reading in FDF Data:

<snip>

Firstly, instead of using stdin, you can pass /dev/fd/<file descriptor>
as the filename (on Linux). This avoids stuffing with stdin.

That doesn't solve the blocking problem. To do that you really need
multiple threads of execution, so either fork or threads, neither of
which are really supported.

ISTM the best idea: write the data to disk then read it back. Why be
difficult when you can do it easily...
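
The two ideas above (a /dev/fd/<file descriptor> name, and writing the data
to disk first) can be combined. The following is a sketch under stated
assumptions: parse_by_name() is a hypothetical stand-in for a library call
that only accepts a file name, as FDFOpen() does, and here it merely counts
bytes; the /tmp location and the helper names are also assumptions, and
/dev/fd is taken to exist as it does on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for a library call that only accepts a file
 * name: it just counts the bytes in the named file. */
static long parse_by_name(const char *file_name)
{
    FILE *fp = fopen(file_name, "rb");
    long count = 0;

    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}

/* Spill data to a temp file and hand it over as /dev/fd/N.
 * Returns the stand-in parser's result, or -1 on failure. */
long parse_buffer(const char *data, size_t len)
{
    char tmpl[] = "/tmp/fdf-XXXXXX";
    char fd_path[64];
    long result = -1;
    size_t off = 0;
    int fd = mkstemp(tmpl);     /* creates and opens a unique file */

    if (fd < 0)
        return -1;

    while (off < len) {
        ssize_t n = write(fd, data + off, len - off);
        if (n <= 0)
            break;              /* treat any failure as fatal here */
        off += (size_t) n;
    }

    /* Rewind in case opening /dev/fd/N shares our file offset. */
    if (off == len && lseek(fd, 0, SEEK_SET) == 0) {
        snprintf(fd_path, sizeof fd_path, "/dev/fd/%d", fd);
        result = parse_by_name(fd_path);
    }

    unlink(tmpl);               /* remove the temp file's name */
    close(fd);
    return result;
}
```

Because a regular file is not limited the way a pipe is, the write cannot
deadlock waiting for a reader.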

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: Function written in C, hangs on one machine and not another...

From
CG
Date:
--- Martijn van Oosterhout <kleptog@svana.org> wrote:

> On Fri, Oct 28, 2005 at 07:24:12AM -0700, CG wrote:
> Not entirely sure, but I'm sure the size of the write matters. For
> example, in your test program, did you check that the write actually
> wrote everything?

There's beginning and ending tokens in the FDF filespec. The toolkit complains
if the data isn't all there...

> Firstly, instead of using stdin, you can pass /dev/fd/<file descriptor>
> as the filename (on Linux). This avoids stuffing with stdin.

That's a FANTASTIC idea. I'll give it a go. We'll cross our fingers, hold our
breath, and hope that the blocking issue evaporates. :)

> ISTM the best idea: write the data to disk then read it back. Why be
> difficult when you can do it easily...

I was never supposed to have to do this sort of thing. The idea was never to
pull individual pieces of data out of the FDFs. Now, the bosses say I have to
do some usage analysis, and the data is locked up tight in an FDF. I suppose I
could write 1000000+ files to disk and read them back off and then delete them.
At the time, that seemed more insane to me than trying to pump data into stdin.
I'm not so sure anymore.......... :)






Re: Function written in C, hangs on one machine and not another...

From
Dennis Jenkins
Date:
--- CG <cgg007@yahoo.com> wrote:
>
> There's no other way to load data into the toolkit! (Can you /feel/ the
> insanity?)
>
> Does this give you any more insight into an alternate method of getting this
> thing done?
>

Write a completely separate process to process your
FDF stuff.  Have this new process expose a
communications channel (message queues, sockets,
shared memory, etc...).  Write your PostgreSQL 'C'
function to use this channel.

You'll get almost complete separation and the ability
to debug each piece independently of the other.  You can
write stubs for both ends: a fake server for testing
the PostgreSQL part, and a fake "client" for testing
the daemon that you wrote.
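
As one possible shape for such a channel, here is a sketch of
length-prefixed messages over a stream socket. Everything in it is an
assumption for illustration: the framing (a 4-byte big-endian length
followed by the payload), the function names, and the use of socketpair()
to get both ends inside one process, which is convenient for the stubs
mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* peer closed before the frame ended */
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Send one message: 4-byte big-endian length, then the payload. */
int send_frame(int fd, const void *payload, uint32_t len)
{
    uint32_t be = htonl(len);
    if (write_full(fd, &be, sizeof be) != 0)
        return -1;
    return write_full(fd, payload, len);
}

/* Receive one message into buf. Returns the payload length, or -1 on
 * error, early EOF, or a payload larger than cap bytes. */
long recv_frame(int fd, void *buf, uint32_t cap)
{
    uint32_t be, len;
    if (read_full(fd, &be, sizeof be) != 0)
        return -1;
    len = ntohl(be);
    if (len > cap || read_full(fd, buf, len) != 0)
        return -1;
    return (long) len;
}

/* Test helper: a connected pair of sockets, one per "side". */
int open_test_channel(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}
```

The backend function would call send_frame() with the FDF data and
recv_frame() for the reply; the daemon does the reverse. A real deployment
would connect to a listening socket instead of using open_test_channel().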


Dennis Jenkins

Re: Function written in C, hangs on one machine and not another...

From
Dennis Jenkins
Date:

--- Douglas McNaught <doug@mcnaught.org> wrote:

> CG <cgg007@yahoo.com> writes:
>
> > Does this give you any more insight into an alternate method of getting this
> > thing done?
>
> I would fork(), set up file descriptors appropriately, then have the
> child call the Adobe library and the parent feed the data to it.
> Once the document is loaded in the child, do whatever processing you
> need to, then pass the results back to the parent via stdout or a
> temporary file.
>
> Ugly, but probably the most robust way to do it.  Make sure you don't
> call any PG-internal functions in the child process, as that will
> confuse things badly.
>

Is it safe for the postgres engine to fork()?  Would
the child need to close down anything immediately in
its main() to avoid corrupting the parent?


Dennis Jenkins

Re: Function written in C, hangs on one machine and not another...

From
Douglas McNaught
Date:
Dennis Jenkins <dennis.jenkins@sbcglobal.net> writes:

> Is it safe for the postgres engine to fork()?  Would
> the child need to close down anything immediately in
> its main() to avoid corrupting the parent?

I *think* (Tom may correct me) that as long as you don't call into the
backend code at all in the child process, and don't write to any file
descriptors other than (properly set-up) stdin and stdout, you'd be
OK.  The safest thing to do would be to exec() a separate binary that
does the parsing, but that would incur an additional performance
penalty.

-Doug

Re: Function written in C, hangs on one machine and not another...

From
Martijn van Oosterhout
Date:
On Fri, Oct 28, 2005 at 11:59:03AM -0400, Douglas McNaught wrote:
> Dennis Jenkins <dennis.jenkins@sbcglobal.net> writes:
>
> > Is it safe for the postgres engine to fork()?  Would
> > the child need to close down anything immediately in
> > its main() to avoid corrupting the parent?
>
> I *think* (Tom may correct me) that as long as you don't call into the
> backend code at all in the child process, and don't write to any file
> descriptors other than (properly set-up) stdin and stdout, you'd be
> OK.  The safest thing to do would be to exec() a separate binary that
> does the parsing, but that would incur an additional performance
> penalty.

The things that have screwed me up in the past with pulling tricks like
this are:

1. Program has registered atexit() handlers. _exit() avoids this.
2. Pending stdio output that gets flushed. The backend doesn't use
stdio much so you might be fine here.
3. Signals. Make sure you don't get sent signals that screw state.
Might be wise to block them all, or reset them all to default.

Truly, exec() is the cleanest way to solve all this, it simply replaces
the current process, lock, stock and barrel.
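
Points 1 and 3 above can be sketched as follows, assuming a POSIX system.
The names run_isolated(), reset_signals_in_child() and sigint_is_default()
are hypothetical, and the list of signals is only a plausible selection,
not something the thread specifies. The child resets signal dispositions,
clears the signal mask, and leaves through _exit() so that inherited
atexit() handlers and stdio buffers are not touched.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Put the freshly forked child into a known signal state. */
static void reset_signals_in_child(void)
{
    static const int sigs[] = { SIGHUP, SIGINT, SIGQUIT, SIGTERM, SIGPIPE,
                                SIGALRM, SIGUSR1, SIGUSR2, SIGCHLD };
    struct sigaction dfl;
    sigset_t none;

    dfl.sa_handler = SIG_DFL;
    dfl.sa_flags = 0;
    sigemptyset(&dfl.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        sigaction(sigs[i], &dfl, NULL);

    sigemptyset(&none);
    sigprocmask(SIG_SETMASK, &none, NULL);   /* unblock everything */
}

/* Run work() in a child process. Returns the child's exit status
 * (0..255), or -1 if it could not be run or died from a signal. */
int run_isolated(int (*work)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        reset_signals_in_child();
        _exit(work() & 0xff);   /* _exit: no atexit handlers, no stdio flush */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example work function: 0 if SIGINT has its default disposition. */
int sigint_is_default(void)
{
    struct sigaction cur;
    if (sigaction(SIGINT, NULL, &cur) != 0)
        return 2;
    return cur.sa_handler == SIG_DFL ? 0 : 1;
}
```

Even if the parent ignores SIGINT, run_isolated(sigint_is_default) should
report the default disposition, because the child resets it before doing
any work.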

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: Function written in C, hangs on one machine and not another...

From
Douglas McNaught
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:

> The things that have screwed me up in the past with pulling tricks like
> this are:
>
> 1. Program has registered atexit() handlers. _exit() avoids this.
> 2. Pending stdio output that gets flushed. The backend doesn't use
> stdio much so you might be fine here.
> 3. Signals. Make sure you don't get sent signals that screw state.
> Might be wise to block them all, or reset them all to default.
>
> Truly, exec() is the cleanest way to solve all this, it simply replaces
> the current process, lock, stock and barrel.

Definitely.  It would probably also be good to close all file
descriptors (except for stdin/stdout/stderr) before exec(), just in
case the other binary does something screwy with random file
descriptors (which it obviously shouldn't).
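
One simple way to do that, sketched here under assumptions: loop over the
possible descriptor numbers and close each one. The name close_fds_from()
is hypothetical, and the cap of 65536 is an arbitrary assumption to keep
the loop short when the descriptor limit is very large; descriptors above
the cap would be left open.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Close every descriptor numbered lowest or higher. Meant to run in
 * the child, between fork() and exec(), with lowest == 3 so that
 * stdin, stdout and stderr survive. */
void close_fds_from(int lowest)
{
    long max_fd = sysconf(_SC_OPEN_MAX);

    /* Assumption: cap the scan so an unknown or very large limit does
     * not turn this into a very long loop. */
    if (max_fd < 0 || max_fd > 65536)
        max_fd = 65536;

    for (int fd = lowest; fd < max_fd; fd++)
        close(fd);              /* EBADF for unused slots is expected */
}
```

In the child this would be called as close_fds_from(STDERR_FILENO + 1),
after the dup2() calls and just before exec().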

-Doug

Re: Function written in C, hangs on one machine and not another...

From
CG
Date:

Thanks to the great suggestions I've at least gotten it to not hang...

Martijn's hint about blocking led me to open up those filehandles in a
non-blocking mode. It appears that write() will write, at a maximum, only 4096
bytes when it is called from within PostgreSQL. I've tried to push data into it
in <=4096-byte slugs, but after 4096 bytes it just won't take anymore. Since (I
think) using a non-blocking mode could cause problems with thread safety, it's
probably a lost cause.

I'm new to C, so this may seem extremely naive: I'm not sure how to use exec()
to solve this problem. Could you give me a few pointers to get me started?




Re: Function written in C, hangs on one machine and not another...

From
Douglas McNaught
Date:
CG <cgg007@yahoo.com> writes:

> Thanks to the great suggestions I've at least gotten it to not hang...
>
> Martijn's hint about blocking led me to open up those filehandles in
> a non-blocking mode. It appears that write() will write, at a
> maximum, only 4096 bytes when it is called from within
> PostgreSQL. I've tried to push data into it in <=4096-byte slugs,
> but after 4096 bytes it just won't take anymore. Since (I think)
> using a non-blocking mode could cause problems with thread safety,
> it's probably a lost cause.

It's not a thread safety issue--it's more that non-blocking I/O is
quite complicated to do properly.

> I'm new to C, so this may seem extremely naive: I'm not sure how to use exec()
> to solve this problem. Could you give me a few pointers to get me started?

The basic scheme would be:

* Write a standalone program that reads from stdin, does the
  processing/analysis using the Adobe library, and writes its results
  to stdout in some kind of easily-parseable format.
* In your backend function, create two pipes (one for input and one
  for output), call fork(), close the appropriate file descriptors in
  the parent and child, dup2() the pipe descriptors in the child onto
  stdin and stdout, then:
* In the child process, exec() your standalone program.
* In the parent process, write all the data to the output pipe, then
  read the results back from the input pipe.  (This can be problematic
  in the general case, but in yours it should be OK).
* When the child process (your program) finishes, you'll get an EOF on
  the input descriptor in the parent process.  Close the input and
  output pipes and do whatever you're supposed to do with the data you
  read.

If the above doesn't make sense to you, you need to read up on Unix
programming.  There are some good books on it, and I'm sure there's
lots of stuff on the web.  The fork()/exec() pattern is very standard
stuff, but it's a little tricky to get at first.

This is not going to be tremendously efficient, but given the crap
library you have to deal with, it's the safest way.
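
The scheme above can be sketched as follows, assuming a POSIX system.
run_filter() is a hypothetical name, and the program to run is whatever
argv names; in the usage note "wc -c" is only a stand-in for the standalone
FDF program. As noted in the list, writing everything before reading
anything is safe only when the child consumes all of its input before
producing much output. If the child can exit early, the parent should also
make sure SIGPIPE is ignored or handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Feed in[0..in_len) to the program's stdin and collect up to out_cap
 * bytes of its stdout. Returns the byte count, or -1 on failure. */
long run_filter(char *const argv[], const char *in, size_t in_len,
                char *out, size_t out_cap)
{
    int to_child[2], from_child[2];
    int status = 0, ok = 1;
    size_t got = 0;
    pid_t pid;

    if (pipe(to_child) != 0)
        return -1;
    if (pipe(from_child) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]);   close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wire the pipes onto stdin/stdout, then exec. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);   close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: keep only the ends it uses. */
    close(to_child[0]);
    close(from_child[1]);

    /* Write everything, then close so the child sees EOF. */
    while (in_len > 0) {
        ssize_t n = write(to_child[1], in, in_len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            ok = 0;
            break;
        }
        in += n;
        in_len -= (size_t) n;
    }
    close(to_child[1]);

    /* Read results until EOF or until the buffer is full. */
    while (ok && got < out_cap) {
        ssize_t n = read(from_child[0], out + got, out_cap - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            ok = 0;
            break;
        }
        if (n == 0)
            break;              /* child closed its stdout */
        got += (size_t) n;
    }
    close(from_child[0]);

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            ok = 0;
            break;
        }
    }
    if (!ok || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (long) got;
}
```

For example, with argv set to { "wc", "-c", NULL } and a five-byte input,
the collected output is the byte count printed by wc.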

-Doug