Notice: This Wiki is now read only and edits are no longer possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.
A thread hangs in forkAndExec on Solaris
A few people (including myself) encountered a problem running Hudson on Solaris 10. The symptom of the problem is that a build hangs, and when you obtain a thread dump, you see that the thread is hanging in UNIXProcess.forkAndExec:
"Executor #2 for maddie" Id=81 RUNNABLE (in native) at java.lang.UNIXProcess.forkAndExec(Native Method) at java.lang.UNIXProcess.(UNIXProcess.java:53) at java.lang.ProcessImpl.start(ProcessImpl.java:65) at java.lang.ProcessBuilder.start(ProcessBuilder.java:452) at hudson.Proc$LocalProc.(Proc.java:104) at hudson.Proc$LocalProc.(Proc.java:82) at hudson.Proc$LocalProc.(Proc.java:74) at hudson.Util.createSymlink(Util.java:725) at hudson.model.AbstractBuild$AbstractRunner.createLastSuccessfulLink(AbstractBuild.java:247) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:241) at hudson.model.Run.run(Run.java:778) at hudson.model.Build.run(Build.java:85) at hudson.model.ResourceController.execute(ResourceController.java:70) at hudson.model.Executor.run(Executor.java:85)
When this happens, you can see the forked process from the ps command, as below:
hudson@rator ~ $ ps -ef | grep java noaccess 832 1 0 Jun 03 ? 285:37 /usr/java/bin/java -server -Xmx128m -XX:+BackgroundCompilation -XX:PermSize=32m mode 24104 1 0 Jul 01 pts/2 0:00 /usr/dist/share/java,v1.5.0_14/5.x-i86pc/bin/java -Djdk.home=/usr/dist/share/ja hudson 20906 536 0 Jul 02 ? 0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hudson 3907 536 0 Jul 02 ? 0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hudson 536 7 0 Jul 01 ? 393:55 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hudson 7398 536 0 Jul 02 ? 0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hg 21255 21086 0 Jun 18 ? 6:48 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=28801 -ja hudson 671 536 0 Jul 01 ? 0:57 /files/hudson/tools/java1.6/bin/java -Xdebug -Xrunjdwp:transport=dt_socket,serv hudson 15109 9416 0 16:39:50 pts/9 0:00 grep java
When you kill one of the forked process (such as 20906), it will bring down the entire Hudson server.
If you run pstack on one of the forked child process, you see the output like this, which indicates that the hang is around PKCS code.
hudson@rator ~ $ ps -ef | grep java noaccess 832 1 0 Jun 03 ? 300:56 /usr/java/bin/java -server -Xmx128m -XX:+BackgroundCompilation -XX:PermSize=32m hudson 6101 15216 0 Jul 03 ? 0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hudson 15216 7 0 Jul 03 ? 557:02 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx mode 24104 1 0 Jul 01 ? 0:00 /usr/dist/share/java,v1.5.0_14/5.x-i86pc/bin/java -Djdk.home=/usr/dist/share/ja hudson 10467 15216 0 Jul 07 ? 0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hudson 2565 15216 0 02:16:33 ? 0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hudson 27123 15216 0 Jul 03 ? 0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hg 21255 21086 0 Jun 18 ? 8:57 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=28801 -ja hudson 15707 15216 0 Jul 03 ? 3:04 /files/hudson/tools/java1.6/bin/java -Xdebug -Xrunjdwp:transport=dt_socket,serv hudson 10936 15216 0 Jul 05 ? 0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hudson 16947 15216 0 Jul 05 ? 0:00 java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5589 -Xmx hudson 11346 10308 0 16:58:05 pts/15 0:00 grep java hudson@rator ~ $ pstack 16947 16947: java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address= feeaf52b lwp_park (0, 0, 0) feea7ffe mutex_lock_internal (bfbad04, 0, 1) + 1bc feea843d mutex_lock_impl (bfbad04, 0) + 12d feea8492 mutex_lock (bfbad04) + 10 91edd1b9 soft_delete_session (bfbad00, 1, 1) + a9 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 8843160c, feea8492) + 49 91ed79e0 finalize_common (1, 0) + 6c 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 88431674, fefd314b, 8de50538) + 37 91f0d4e1 _fini (8de50538, 92ff0de8, feffa7d8, 8de50550, 8de50538, 92ff0de8) + 25 fefd314b call_fini (feffa2dc, 8de50538) + f3 fefdc08a remove_hdl (91f60cb0, 92ff0de8, 0) + 7c2 fefd7154 dlclose_core (91f60cb0, 92ff0de8, feffa2dc) + a0 fefd71a1 dlclose_intn (91f60cb0, 92ff0de8) + 25 fefd720d dlclose_check (91f60cb0, 92ff0de8) + 2d fefd7273 dlclose (91f60cb0) + 3b 91f838f0 pkcs11_slottable_delete (91f966d8, 91f96000, 0, fef496c0, fede0500, 884317d4) + b4 91f7fa3c finalize_common (0, fede0500, 91f96000, 884317e0, 884317e0, 91f7f78f) + dc 91f7fa89 pkcs11_fini (fede0080, 884317fc, fee47282, 8d137200, fef45000, 88431820) + 41 91f7f78f pkcs11_fork_child (8d137200, fef45000, 88431820, 8d137200, 88431820, 88431820) + 27 fee47282 _postfork_child_handler (b, 93eac24, fef45000, 99a72e80, fef49720, fef49700) + 32 feea167d forkx (0) + 11d feea172e fork (c45d800, f42d369c, f42d36a0, c45d800, 999c2878, 99a72cd8) + 1a fe287edf Java_java_lang_UNIXProcess_forkAndExec (c45d8ec, 88431940, 8843193c, 88431938, 1, 88431930) + 367 fb20ebd2 ???????? (0, fb20b399, 99a732e0, 99a732d0, 99a732c0, 0) fb202de9 ???????? (0, 0, c, 99a72f48, 1, 99a72ee0) fbee9bf0 ???????? () hudson@rator ~ $ pstack 27123 27123: java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address= feeaf52b lwp_park (0, 0, 0) feea7ffe mutex_lock_internal (97e6094, 0, 1) + 1bc feea843d mutex_lock_impl (97e6094, 0) + 12d feea8492 mutex_lock (97e6094) + 10 91edd1b9 soft_delete_session (97e6090, 1, 1) + a9 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 8c19afbc, feea8492) + 49 91ed79e0 finalize_common (1, 0) + 6c 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 8c19b024, fefd314b, 8de50538) + 37 91f0d4e1 _fini (8de50538, 92ff0de8, feffa7d8, 8c19b024, fefd314b, 8de50538) + 25 fefd314b call_fini () + f3 hudson@rator ~ $ pstack 2565 10467 6101 2565: java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address= feeaf52b lwp_park (0, 0, 0) feea7ffe mutex_lock_internal (cb6cb9c, 0, 1) + 1bc feea843d mutex_lock_impl (cb6cb9c, 0) + 12d feea8492 mutex_lock (cb6cb9c) + 10 91edd1b9 soft_delete_session (cb6cb98, 1, 1) + a9 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 902334cc, feea8492) + 49 91ed79e0 finalize_common (1, 0) + 6c 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 90233534, fefd314b, 8de50538) + 37 91f0d4e1 _fini (8de50538, 92ff0de8, feffa7d8, 8de50550, 8de50538, 92ff0de8) + 25 fefd314b call_fini (feffa2dc, 8de50538) + f3 fefdc08a remove_hdl (91f60cb0, 92ff0de8, 0) + 7c2 fefd7154 dlclose_core (91f60cb0, 92ff0de8, feffa2dc) + a0 fefd71a1 dlclose_intn (91f60cb0, 92ff0de8) + 25 fefd720d dlclose_check (91f60cb0, 92ff0de8) + 2d fefd7273 dlclose (91f60cb0) + 3b 91f838f0 pkcs11_slottable_delete (91f966d8, 91f96000, 0, fef496c0, fede0500, 90233694) + b4 91f7fa3c finalize_common (0, fede0500, 91f96000, 902336a0, 902336a0, 91f7f78f) + dc 91f7fa89 pkcs11_fini (fede0080, 902336bc, fee47282, 9134b200, fef45000, 902336e0) + 41 91f7f78f pkcs11_fork_child (9134b200, fef45000, 902336e0, 9134b200, 902336e0, 902336e0) + 27 fee47282 _postfork_child_handler (0, b1cde73, fef45000, f30bcaa3, fef49720, fef49700) + 32 feea167d forkx (0) + 11d feea172e fork (9133c00, f42d3664, f42d3668, 9133c00, a200aa38, 90233730) + 1a fe287edf Java_java_lang_UNIXProcess_forkAndExec (9133cec, 90233800, 902337fc, 902337f8, 2, 902337f0) + 367 fb20ebd2 ???????? (0, fb20b399, 93027718, 93027708, 930276f8, 1) fb202de9 ???????? (1, 93027660, 1, 93027458, 2, 930273f0) fc91de98 ???????? () 10467: java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address= feeaf52b lwp_park (0, 0, 0) feea7ffe mutex_lock_internal (d7ed02c, 0, 1) + 1bc feea843d mutex_lock_impl (d7ed02c, 0) + 12d feea8492 mutex_lock (d7ed02c) + 10 91edd1b9 soft_delete_session (d7ed028, 1, 1) + a9 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 9168b8fc, feea8492) + 49 91ed79e0 finalize_common (1, 0) + 6c 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 9168b964, fefd314b, 8de50538) + 37 91f0d4e1 _fini (8de50538, 92ff0de8, feffa7d8, 8de50550, 8de50538, 92ff0de8) + 25 fefd314b call_fini (feffa2dc, 8de50538) + f3 fefdc08a remove_hdl (91f60cb0, 92ff0de8, 0) + 7c2 fefd7154 dlclose_core (91f60cb0, 92ff0de8, feffa2dc) + a0 fefd71a1 dlclose_intn (91f60cb0, 92ff0de8) + 25 fefd720d dlclose_check (91f60cb0, 92ff0de8) + 2d fefd7273 dlclose (91f60cb0) + 3b 91f838f0 pkcs11_slottable_delete (91f966d8, 91f96000, 0, fef496c0, fede0500, 9168bac4) + b4 91f7fa3c finalize_common (0, fede0500, 91f96000, 9168bad0, 9168bad0, 91f7f78f) + dc 91f7fa89 pkcs11_fini (fede0080, 9168baec, fee47282, fe45ba00, fef45000, 9168bb10) + 41 91f7f78f pkcs11_fork_child (fe45ba00, fef45000, 9168bb10, fe45ba00, 9168bb10, 9168bb10) + 27 fee47282 _postfork_child_handler (1, d6ba709, fef45000, 2, fef49720, fef49700) + 32 feea167d forkx (0) + 11d feea172e fork (8da3c00, f42d3664, f42d3668, 8da3c00, a200c160, 9168bb60) + 1a fe287edf Java_java_lang_UNIXProcess_forkAndExec (8da3cec, 9168bc2c, 9168bc28, 9168bc24, 2, 0) + 367 fb20ebd2 ???????? (0, fb20b399, 9658b978, 9658b968, 9658b958, 0) fb202de9 ???????? (0, 0, 0, 0, 2, 9658b7c0) fb202f0d ???????? (9658b690, 2, 0, 9658b898, c9, 9658b7c0) fb202da7 ???????? (9658aba8, 0, 0, f48c9b20, 9658b4b0, 9658b468) fb202da7 ???????? (0, 0, 9658b410, 9dab8bb0, 9168bd50, f30f4210) fb202da7 ???????? (9658b410, 9dab8bb0, 8da3cec, 0, 9658b410, 9658b428) fd058d70 ???????? (0, 0, 8da38d8, 0, 8c3e048, 0) 00000000 ???????? (fe888b40, fe768ef0, fe7637d4, fe888b34, fe888a8c, fe888b1c) 00000000 ???????? (fffffe18, 3badd9be, f468fe16, e78b7254, fe1dbbcc, 0) 00000000 ???????? () 6101: java -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address= feeaf52b lwp_park (0, 0, 0) feea7ffe mutex_lock_internal (8f809a4, 0, 1) + 1bc feea843d mutex_lock_impl (8f809a4, 0) + 12d feea8492 mutex_lock (8f809a4) + 10 91edd1b9 soft_delete_session (8f809a0, 1, 1) + a9 91edce89 soft_delete_all_sessions (1, 91f0d4bc, 91f24378, 91f24000, 8ce1930c, feea8492) + 49 91ed79e0 finalize_common (1, 0) + 6c 91ed7b67 softtoken_fini (feffa7d8, 91f60958, 91f0d4bc, 8ce19374, fefd314b, 8de50538) + 37 91f0d4e1 _fini (8de50538, 92ff0de8, feffa7d8, 8de50550, 8de50538, 92ff0de8) + 25 fefd314b call_fini (feffa2dc, 8de50538) + f3 fefdc08a remove_hdl (91f60cb0, 92ff0de8, 0) + 7c2 fefd7154 dlclose_core (91f60cb0, 92ff0de8, feffa2dc) + a0 fefd71a1 dlclose_intn (91f60cb0, 92ff0de8) + 25 fefd720d dlclose_check (91f60cb0, 92ff0de8) + 2d fefd7273 dlclose (91f60cb0) + 3b 91f838f0 pkcs11_slottable_delete (91f966d8, 91f96000, 0, fef496c0, fede0500, 8ce194d4) + b4 91f7fa3c finalize_common (0, fede0500, 91f96000, 8ce194e0, 8ce194e0, 91f7f78f) + dc 91f7fa89 pkcs11_fini (fede0080, 8ce194fc, fee47282, 8d111a00, fef45000, 8ce19520) + 41 91f7f78f pkcs11_fork_child (8d111a00, fef45000, 8ce19520, 8d111a00, 8ce19520, 8ce19520) + 27 fee47282 _postfork_child_handler (0, c40195b, fef45000, 93018488, fef49720, fef49700) + 32 feea167d forkx (0) + 11d feea172e fork (a400400, f42d369c, f42d36a0, a400400, 9b880310, 8ce1958c) + 1a fe287edf Java_java_lang_UNIXProcess_forkAndExec (a4004ec, 8ce19640, 8ce1963c, 8ce19638, 2, 8ce19630) + 367 fb20ebd2 ???????? (0, fb20b399, 930191f8, 930191e8, 930191d8, 1) fb202de9 ???????? (1, 93019188, 1, 93018fc8, 2, 93018f58) fbee9bf0 ???????? ()
The source code for these code is available on OpenSolaris site: http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/pkcs11/pkcs11_softtoken/common/softGeneral.c#finalize_common I found a posting on forum that looks very related. More hits in here
Various suggestions
Apply patch 118918? There's an on-going escalation?
A workaround? Apply patch 127111 -> this was obsolted by 127127.
turns out this was reported in Hudson list some time ago and in issue #2296. Fix
I attempted to apply some of the mentioned patches, but they failed to pkgadd, citing version inconsistencies.
I then applied the suggested workaround in jre/lib/security/java.security
file and restart Hudson. Since then I have no longer seen a problem.