A thread hangs in forkAndExec on Solaris

From Eclipsepedia

Jump to: navigation, search

A few people (including myself) encountered a problem running Hudson on Solaris 10. The symptom of the problem is that a build hangs, and when you obtain a thread dump, you see that the thread is hanging in UNIXProcess.forkAndExec:

"Executor #2 for maddie" Id=81 RUNNABLE (in native)
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.(UNIXProcess.java:53)
	at java.lang.ProcessImpl.start(ProcessImpl.java:65)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
	at hudson.Proc$LocalProc.(Proc.java:104)
	at hudson.Proc$LocalProc.(Proc.java:82)
	at hudson.Proc$LocalProc.(Proc.java:74)
	at hudson.Util.createSymlink(Util.java:725)
	at hudson.model.AbstractBuild$AbstractRunner.createLastSuccessfulLink(AbstractBuild.java:247)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:241)
	at hudson.model.Run.run(Run.java:778)
	at hudson.model.Build.run(Build.java:85)
	at hudson.model.ResourceController.execute(ResourceController.java:70)
	at hudson.model.Executor.run(Executor.java:85)

When this happens, you can see the forked process from the ps command, as below:

hudson@rator ~ $ ps -ef | grep java
noaccess   832     1   0   Jun 03 ?         285:37 /usr/java/bin/java -server -Xmx128m -XX:+BackgroundCompilation -XX:PermSize=32m
    mode 24104     1   0   Jul 01 pts/2       0:00 /usr/dist/share/java,v1.5.0_14/5.x-i86pc/bin/java -Djdk.home=/usr/dist/share/ja
  hudson 20906   536   0   Jul 02 ?           0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
  hudson  3907   536   0   Jul 02 ?           0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
  hudson   536     7   0   Jul 01 ?         393:55 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
  hudson  7398   536   0   Jul 02 ?           0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
      hg 21255 21086   0   Jun 18 ?           6:48 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=28801 -ja
  hudson   671   536   0   Jul 01 ?           0:57 /files/hudson/tools/java1.6/bin/java -Xdebug -Xrunjdwp:transport=dt_socket,serv
  hudson 15109  9416   0 16:39:50 pts/9       0:00 grep java

When you kill one of the forked process (such as 20906), it will bring down the entire Hudson server.

If you run pstack on one of the forked child process, you see the output like this, which indicates that the hang is around PKCS code.

hudson@rator ~ $ ps -ef | grep java
noaccess   832     1   0   Jun 03 ?         300:56 /usr/java/bin/java -server -Xmx128m -XX:+BackgroundCompilation -XX:PermSize=32m
  hudson  6101 15216   0   Jul 03 ?           0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
  hudson 15216     7   0   Jul 03 ?         557:02 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
    mode 24104     1   0   Jul 01 ?           0:00 /usr/dist/share/java,v1.5.0_14/5.x-i86pc/bin/java -Djdk.home=/usr/dist/share/ja
  hudson 10467 15216   0   Jul 07 ?           0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
  hudson  2565 15216   0 02:16:33 ?           0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
  hudson 27123 15216   0   Jul 03 ?           0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
      hg 21255 21086   0   Jun 18 ?           8:57 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=28801 -ja
  hudson 15707 15216   0   Jul 03 ?           3:04 /files/hudson/tools/java1.6/bin/java -Xdebug -Xrunjdwp:transport=dt_socket,serv
  hudson 10936 15216   0   Jul 05 ?           0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
  hudson 16947 15216   0   Jul 05 ?           0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx
  hudson 11346 10308   0 16:58:05 pts/15      0:00 grep java
hudson@rator ~ $ pstack 16947
16947:  java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=
 feeaf52b lwp_park (0, 0, 0)
 feea7ffe mutex_lock_internal (bfbad04, 0, 1) + 1bc
 feea843d mutex_lock_impl (bfbad04, 0) + 12d
 feea8492 mutex_lock (bfbad04) + 10
 91edd1b9 soft_delete_session (bfbad00, 1, 1) + a9
 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 8843160c, feea8492) + 49
 91ed79e0 finalize_common (1, 0) + 6c
 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 88431674, fefd314b, 8de50538) + 37
 91f0d4e1 _fini    (8de50538, 92ff0de8, feffa7d8, 8de50550, 8de50538, 92ff0de8) + 25
 fefd314b call_fini (feffa2dc, 8de50538) + f3
 fefdc08a remove_hdl (91f60cb0, 92ff0de8, 0) + 7c2
 fefd7154 dlclose_core (91f60cb0, 92ff0de8, feffa2dc) + a0
 fefd71a1 dlclose_intn (91f60cb0, 92ff0de8) + 25
 fefd720d dlclose_check (91f60cb0, 92ff0de8) + 2d
 fefd7273 dlclose  (91f60cb0) + 3b
 91f838f0 pkcs11_slottable_delete (91f966d8, 91f96000, 0, fef496c0, fede0500, 884317d4) + b4
 91f7fa3c finalize_common (0, fede0500, 91f96000, 884317e0, 884317e0, 91f7f78f) + dc
 91f7fa89 pkcs11_fini (fede0080, 884317fc, fee47282, 8d137200, fef45000, 88431820) + 41
 91f7f78f pkcs11_fork_child (8d137200, fef45000, 88431820, 8d137200, 88431820, 88431820) + 27
 fee47282 _postfork_child_handler (b, 93eac24, fef45000, 99a72e80, fef49720, fef49700) + 32
 feea167d forkx    (0) + 11d
 feea172e fork     (c45d800, f42d369c, f42d36a0, c45d800, 999c2878, 99a72cd8) + 1a
 fe287edf Java_java_lang_UNIXProcess_forkAndExec (c45d8ec, 88431940, 8843193c, 88431938, 1, 88431930) + 367
 fb20ebd2 ???????? (0, fb20b399, 99a732e0, 99a732d0, 99a732c0, 0)
 fb202de9 ???????? (0, 0, c, 99a72f48, 1, 99a72ee0)
 fbee9bf0 ???????? ()
hudson@rator ~ $ pstack 27123
27123:  java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=
 feeaf52b lwp_park (0, 0, 0)
 feea7ffe mutex_lock_internal (97e6094, 0, 1) + 1bc
 feea843d mutex_lock_impl (97e6094, 0) + 12d
 feea8492 mutex_lock (97e6094) + 10
 91edd1b9 soft_delete_session (97e6090, 1, 1) + a9
 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 8c19afbc, feea8492) + 49
 91ed79e0 finalize_common (1, 0) + 6c
 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 8c19b024, fefd314b, 8de50538) + 37
 91f0d4e1 _fini    (8de50538, 92ff0de8, feffa7d8, 8c19b024, fefd314b, 8de50538) + 25
 fefd314b call_fini () + f3
hudson@rator ~ $ pstack 2565 10467 6101
2565:   java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=
 feeaf52b lwp_park (0, 0, 0)
 feea7ffe mutex_lock_internal (cb6cb9c, 0, 1) + 1bc
 feea843d mutex_lock_impl (cb6cb9c, 0) + 12d
 feea8492 mutex_lock (cb6cb9c) + 10
 91edd1b9 soft_delete_session (cb6cb98, 1, 1) + a9
 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 902334cc, feea8492) + 49
 91ed79e0 finalize_common (1, 0) + 6c
 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 90233534, fefd314b, 8de50538) + 37
 91f0d4e1 _fini    (8de50538, 92ff0de8, feffa7d8, 8de50550, 8de50538, 92ff0de8) + 25
 fefd314b call_fini (feffa2dc, 8de50538) + f3
 fefdc08a remove_hdl (91f60cb0, 92ff0de8, 0) + 7c2
 fefd7154 dlclose_core (91f60cb0, 92ff0de8, feffa2dc) + a0
 fefd71a1 dlclose_intn (91f60cb0, 92ff0de8) + 25
 fefd720d dlclose_check (91f60cb0, 92ff0de8) + 2d
 fefd7273 dlclose  (91f60cb0) + 3b
 91f838f0 pkcs11_slottable_delete (91f966d8, 91f96000, 0, fef496c0, fede0500, 90233694) + b4
 91f7fa3c finalize_common (0, fede0500, 91f96000, 902336a0, 902336a0, 91f7f78f) + dc
 91f7fa89 pkcs11_fini (fede0080, 902336bc, fee47282, 9134b200, fef45000, 902336e0) + 41
 91f7f78f pkcs11_fork_child (9134b200, fef45000, 902336e0, 9134b200, 902336e0, 902336e0) + 27
 fee47282 _postfork_child_handler (0, b1cde73, fef45000, f30bcaa3, fef49720, fef49700) + 32
 feea167d forkx    (0) + 11d
 feea172e fork     (9133c00, f42d3664, f42d3668, 9133c00, a200aa38, 90233730) + 1a
 fe287edf Java_java_lang_UNIXProcess_forkAndExec (9133cec, 90233800, 902337fc, 902337f8, 2, 902337f0) + 367
 fb20ebd2 ???????? (0, fb20b399, 93027718, 93027708, 930276f8, 1)
 fb202de9 ???????? (1, 93027660, 1, 93027458, 2, 930273f0)
 fc91de98 ???????? ()
10467:  java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=
 feeaf52b lwp_park (0, 0, 0)
 feea7ffe mutex_lock_internal (d7ed02c, 0, 1) + 1bc
 feea843d mutex_lock_impl (d7ed02c, 0) + 12d
 feea8492 mutex_lock (d7ed02c) + 10
 91edd1b9 soft_delete_session (d7ed028, 1, 1) + a9
 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 9168b8fc, feea8492) + 49
 91ed79e0 finalize_common (1, 0) + 6c
 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 9168b964, fefd314b, 8de50538) + 37
 91f0d4e1 _fini    (8de50538, 92ff0de8, feffa7d8, 8de50550, 8de50538, 92ff0de8) + 25
 fefd314b call_fini (feffa2dc, 8de50538) + f3
 fefdc08a remove_hdl (91f60cb0, 92ff0de8, 0) + 7c2
 fefd7154 dlclose_core (91f60cb0, 92ff0de8, feffa2dc) + a0
 fefd71a1 dlclose_intn (91f60cb0, 92ff0de8) + 25
 fefd720d dlclose_check (91f60cb0, 92ff0de8) + 2d
 fefd7273 dlclose  (91f60cb0) + 3b
 91f838f0 pkcs11_slottable_delete (91f966d8, 91f96000, 0, fef496c0, fede0500, 9168bac4) + b4
 91f7fa3c finalize_common (0, fede0500, 91f96000, 9168bad0, 9168bad0, 91f7f78f) + dc
 91f7fa89 pkcs11_fini (fede0080, 9168baec, fee47282, fe45ba00, fef45000, 9168bb10) + 41
 91f7f78f pkcs11_fork_child (fe45ba00, fef45000, 9168bb10, fe45ba00, 9168bb10, 9168bb10) + 27
 fee47282 _postfork_child_handler (1, d6ba709, fef45000, 2, fef49720, fef49700) + 32
 feea167d forkx    (0) + 11d
 feea172e fork     (8da3c00, f42d3664, f42d3668, 8da3c00, a200c160, 9168bb60) + 1a
 fe287edf Java_java_lang_UNIXProcess_forkAndExec (8da3cec, 9168bc2c, 9168bc28, 9168bc24, 2, 0) + 367
 fb20ebd2 ???????? (0, fb20b399, 9658b978, 9658b968, 9658b958, 0)
 fb202de9 ???????? (0, 0, 0, 0, 2, 9658b7c0)
 fb202f0d ???????? (9658b690, 2, 0, 9658b898, c9, 9658b7c0)
 fb202da7 ???????? (9658aba8, 0, 0, f48c9b20, 9658b4b0, 9658b468)
 fb202da7 ???????? (0, 0, 9658b410, 9dab8bb0, 9168bd50, f30f4210)
 fb202da7 ???????? (9658b410, 9dab8bb0, 8da3cec, 0, 9658b410, 9658b428)
 fd058d70 ???????? (0, 0, 8da38d8, 0, 8c3e048, 0)
 00000000 ???????? (fe888b40, fe768ef0, fe7637d4, fe888b34, fe888a8c, fe888b1c)
 00000000 ???????? (fffffe18, 3badd9be, f468fe16, e78b7254, fe1dbbcc, 0)
 00000000 ???????? ()
6101:   java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=
 feeaf52b lwp_park (0, 0, 0)
 feea7ffe mutex_lock_internal (8f809a4, 0, 1) + 1bc
 feea843d mutex_lock_impl (8f809a4, 0) + 12d
 feea8492 mutex_lock (8f809a4) + 10
 91edd1b9 soft_delete_session (8f809a0, 1, 1) + a9
 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 8ce1930c, feea8492) + 49
 91ed79e0 finalize_common (1, 0) + 6c
 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 8ce19374, fefd314b, 8de50538) + 37
 91f0d4e1 _fini    (8de50538, 92ff0de8, feffa7d8, 8de50550, 8de50538, 92ff0de8) + 25
 fefd314b call_fini (feffa2dc, 8de50538) + f3
 fefdc08a remove_hdl (91f60cb0, 92ff0de8, 0) + 7c2
 fefd7154 dlclose_core (91f60cb0, 92ff0de8, feffa2dc) + a0
 fefd71a1 dlclose_intn (91f60cb0, 92ff0de8) + 25
 fefd720d dlclose_check (91f60cb0, 92ff0de8) + 2d
 fefd7273 dlclose  (91f60cb0) + 3b
 91f838f0 pkcs11_slottable_delete (91f966d8, 91f96000, 0, fef496c0, fede0500, 8ce194d4) + b4
 91f7fa3c finalize_common (0, fede0500, 91f96000, 8ce194e0, 8ce194e0, 91f7f78f) + dc
 91f7fa89 pkcs11_fini (fede0080, 8ce194fc, fee47282, 8d111a00, fef45000, 8ce19520) + 41
 91f7f78f pkcs11_fork_child (8d111a00, fef45000, 8ce19520, 8d111a00, 8ce19520, 8ce19520) + 27
 fee47282 _postfork_child_handler (0, c40195b, fef45000, 93018488, fef49720, fef49700) + 32
 feea167d forkx    (0) + 11d
 feea172e fork     (a400400, f42d369c, f42d36a0, a400400, 9b880310, 8ce1958c) + 1a
 fe287edf Java_java_lang_UNIXProcess_forkAndExec (a4004ec, 8ce19640, 8ce1963c, 8ce19638, 2, 8ce19630) + 367
 fb20ebd2 ???????? (0, fb20b399, 930191f8, 930191e8, 930191d8, 1)
 fb202de9 ???????? (1, 93019188, 1, 93018fc8, 2, 93018f58)
 fbee9bf0 ???????? ()

The source code for these code is available on OpenSolaris site: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/pkcs11/pkcs11_softtoken/common/softGeneral.c#finalize_common I found a posting on forum that looks very related. More hits in here

Various suggestions

Apply patch 118918? There's an on-going escalation?

A workaround? Apply patch 127111 -> this was obsolted by 127127.

turns out this was reported in Hudson list some time ago and in issue #2296. Fix

I attempted to apply some of the mentioned patches, but they failed to pkgadd, citing version inconsistencies.

I then applied the suggested workaround in jre/lib/security/java.security file and restart Hudson. Since then I have no longer seen a problem.