Discussion:
[SR-Users] Child process exited by a signal 11
Allen Zhang
2014-03-19 22:50:26 UTC
Hi,

Scenario:
Our Kamailio server normally runs with a debug level of 2.
The server gets a segmentation fault and dies when we run a certain demo. The error message is:
"[4097]: ALERT: <core> [main.c:789]: child process 4098 exited by a signal 11"
And we get a lot of these before it dies:
"error reading: Connection reset by peer (104)"
"ERROR: tcp_read_req: error reading"
Because of the low debug level, I have no idea of what child process 4098 was doing before it died.

Here is the weird bit:
The problem goes away if I set the debug level to 5,
but it always occurs at debug level 2.

Two questions:
1. I noticed that the main process was 4097 and that 4098 died. What is the child process created straight after main? My guess is that this 4098 child process manages TCP connections. Is this correct?

2. Why does the debug level have an impact on this? Is it because a higher debug level introduces some delay?


Regards,

Allen
Alex Balashov
2014-03-19 22:57:55 UTC
Post by Allen Zhang
1. I noticed that the main process was 4097 and that 4098 died. What is the
child process created straight after main? My guess is that this 4098 child
process manages TCP connections. Is this correct?
2. Why does the debug level have an impact on this? Is it because a higher
debug level introduces some delay?
That's hard to say. However, changing any aspect of the execution
behaviour changes the state of the program, and can certainly have an
impact on when it crashes, and whether it crashes at all.

The nature of memory bugs is that memory boundaries are often
overstepped, but this does not necessarily result in a crash. The crash
arises from the consequences of accessing that out-of-bounds memory,
such as when the program ingests garbage from that memory area because
it has been written to by something else. And, all of this behaviour
varies with the order of operations, the particular libc you are using,
its version, and the memory footprint of various other executed components.

The way to troubleshoot an issue like this is to analyse the core dump
that is generated by the process that died due to the segmentation fault
(signal 11). You should be able to find that core dump somewhere on your
system. When you do, you can read it with 'gdb':

gdb /path/to/kamailio/binary /path/to/core.4098

Note that by default, many values will be optimised out. To get a fuller
picture, you may need to compile Kamailio without -Ox compiler
optimisations, and with additional debug information, e.g. -g.
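Whether a core file is written at all depends on the process's core-size resource limit (frequently 0 on default installations) and, on Linux, on /proc/sys/kernel/core_pattern, which decides where the file goes. As a hedged illustration, this C sketch uses the standard POSIX getrlimit()/setrlimit() calls to raise the soft RLIMIT_CORE limit up to the hard limit. The function name `enable_core_dumps` is hypothetical, and Kamailio may already manage this limit itself; from a shell, `ulimit -c` in the shell that starts the server serves the same purpose.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raises this process's soft core-file size limit
 * to its hard limit so that a crash (e.g. SIGSEGV, signal 11) can leave
 * a core dump behind. Where the file lands is decided by the OS.
 * Returns 0 on success, -1 on error. */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Once a core file exists, running `bt full` at the gdb prompt prints the crashed process's call stack together with local variables.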

-- Alex
--
Alex Balashov - Principal
Evariste Systems LLC
235 E Ponce de Leon Ave
Suite 106
Decatur, GA 30030
United States
Tel: +1-678-954-0670
Web: http://www.evaristesys.com/, http://www.alexbalashov.com/
Allen Zhang
2014-03-19 23:04:37 UTC
Hi Alex,

Shouldn't the debug level only have an impact on the amount of information written to the log?
And shouldn't that only change the delay between operations?

Allen



_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-***@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
Alex Balashov
2014-03-19 23:06:27 UTC
Post by Allen Zhang
Shouldn't the debug level only have an impact on the amount of
information written to the log? And shouldn't that only change the
delay between operations?
Well, from a programmatic point of view, not necessarily. Writing debug
logs is an operation that involves buffering and parsing strings
internally, which in turn draws on stack (automatic) and heap (dynamic)
memory allocations. All of that influences the memory state of the
program, and thus has an impact on whether it'll crash, and when it will
do so.
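A hypothetical level-gated logger in C shows why the debug level changes more than timing: messages above the threshold return before any formatting happens, while enabled messages run vsnprintf() into a stack buffer and perform I/O, so different code paths and memory are exercised. This is a simplified sketch, not Kamailio's real logging macros; `log_msg`, `g_debug_level` and `g_formatted` are illustrative names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical global threshold, loosely mirroring a "debug" setting. */
static int g_debug_level = 2;

/* Hypothetical counter: how many messages were actually formatted. */
static int g_formatted = 0;

/* Formats and prints a message only if its level is at or below the
 * threshold. Returns the number of characters vsnprintf() produced,
 * or -1 if the message was filtered out before any formatting work. */
static int log_msg(int level, const char *fmt, ...)
{
    char buf[256]; /* stack buffer, written only on the formatting path */
    va_list ap;
    int n;

    assert(fmt != NULL);
    if (level > g_debug_level)
        return -1; /* filtered: no va_start, no buffer writes, no I/O */

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    g_formatted++;
    fputs(buf, stderr);
    return n;
}
```

With `g_debug_level = 2`, a level-3 call returns immediately; with `g_debug_level = 5`, the same call formats into `buf` and writes to stderr. Any latent memory bug elsewhere then runs against a different stack and heap layout, which is the effect described above.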
Allen Zhang
2014-03-19 23:10:00 UTC
Yes, this makes sense.
But a higher debug level means more writing.
So increasing the debug level should cause more problems, not hide them, because of more buffering and parsing of strings internally, which in turn draws on stack (automatic) and heap (dynamic) memory allocations. Right?

Alex Balashov
2014-03-19 23:11:56 UTC
Yes, this makes sense. But a higher debug level means more writing. So
increasing the debug level should cause more problems, not hide them,
because of more buffering and parsing of strings internally, which in
turn draws on stack (automatic) and heap (dynamic) memory allocations.
Right?
That is logical, and is probably true in many cases.

However, it all depends on the memory allocation strategy used by the
program internally, as well as on the operating system side. For
instance, more logging could trigger a larger buffer allocation or
different fragmentation, which could serve to mask the memory bug by not
creating the circumstances that lead to an acute access violation, or
not creating them in the same place or as soon.
Allen Zhang
2014-03-19 23:17:46 UTC
Um...
This makes perfect sense.
It has enhanced my understanding of memory allocation, too.
Thanks, Alex.
