Troubleshooting hard-to-find Windows Subsystem for Linux (WSL) crash due to issues with NTVDMx64


Recently, and without warning, my WSL instance running Ubuntu 22 would not start. Even after making sure that hardware virtualization was enabled in the UEFI setup and uninstalling VMware, which has been known to make WSL think that the processor does not support hardware virtualization (VT-x) instructions, WSL still reported “Catastrophic error 0x8000ffff”. Multiple attempts to uninstall and reinstall the Virtual Machine Platform, Windows Hypervisor Platform and Windows Subsystem for Linux components, as well as Hyper-V, did not fix the issue, although the WSL start-up error message changed to either “An illegal state change was requested” or “The remote procedure call failed”, depending on how WSL was launched.

Checking Control Panel > Windows Tools > Services, I noticed that the Windows Subsystem for Linux service was not running. I restarted the service, only for it to stop again almost immediately, which suggested a problem during initialization such as missing or corrupted files. The event log contained a single entry stating that the WSL service did not start successfully. The Details tab of that entry contained crash information pointing to a native code crash within ntdll.dll:

Faulting application name: wslinstaller.exe, version: 2.0.9.0, time stamp: 0x654ea013
Faulting module name: ntdll.dll, version: 10.0.22621.2506, time stamp: 0xbced4b82
Exception code: 0xc0000005
Fault offset: 0x00000000000a5f6e
Faulting process id: 0x0x9DC
Faulting application start time: 0x0x1DA1D129CFF6C48
Faulting application path: C:\Program Files\WindowsApps\MicrosoftCorporationII.WindowsSubsystemForLinux_2.0.9.0_x64__8wekyb3d8bbwe\wslinstaller.exe
Faulting module path: C:\windows\SYSTEM32\ntdll.dll
Report Id: 4a24781c-d5d6-4850-80ad-35fc49fdbd07
Faulting package full name: MicrosoftCorporationII.WindowsSubsystemForLinux_2.0.9.0_x64__8wekyb3d8bbwe
Faulting package-relative application ID: wsl

Exception 0xc0000005 is an access violation, which could indicate a problem with a dynamic link library (DLL), such as a corrupted or incompatible DLL file, insufficient permissions, or memory-related issues within the application or the operating system. The best way to debug further is to attach a debugger to the responsible process, wslinstaller.exe, and see whether the process prints any more meaningful error messages. Normally this is easily done using Debug > Attach to Process in Visual Studio. In this case, however, the WSL process started and stopped almost immediately, leaving me too little time to attach a debugger.
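To make the reading of the event log entry above concrete, here is a minimal Python sketch that splits such an entry into fields and names the exception code. The helper names (parse_crash_entry, describe_exception) are my own, and the lookup table only covers the two codes that appear in this article; it is not a complete NTSTATUS list.

```python
# Only the two codes that appear in this article; not a complete NTSTATUS table.
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0x80000003: "STATUS_BREAKPOINT",
}


def parse_crash_entry(text):
    """Split an 'Application Error' event text into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as paths stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_exception(fields):
    """Return a symbolic name for the 'Exception code' field, if known."""
    code = int(fields["Exception code"], 16)
    return KNOWN_CODES.get(code, "unknown (0x%08x)" % code)


SAMPLE = """Faulting application name: wslinstaller.exe, version: 2.0.9.0, time stamp: 0x654ea013
Faulting module name: ntdll.dll, version: 10.0.22621.2506, time stamp: 0xbced4b82
Exception code: 0xc0000005
Fault offset: 0x00000000000a5f6e"""

fields = parse_crash_entry(SAMPLE)
module = fields["Faulting module name"].split(",")[0]
print(module, describe_exception(fields))
```

The name alone does not identify the culprit: an access violation inside ntdll.dll only says where the process faulted, not which component set it up to fail.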

A couple of minutes later, I had another idea: attach a debugger to a command prompt, set the debugger to debug all child processes, which should include the WSL service, then start the WSL service from that command prompt and monitor the debugger output. I downloaded WinDbg and attached it to an elevated (i.e. running as administrator) command prompt. From within WinDbg, I typed .childdbg 1 to also attach to any child process of the command prompt. Finally, I started WSL from the command prompt using net start WslInstaller. WinDbg immediately stopped inside NTDLL.DLL with the following trace:

ModLoad: 00007ff8`73920000 00007ff8`73a13000   C:\windows\System32\shcore.dll
ModLoad: 00007ff8`73640000 00007ff8`736da000   C:\windows\System32\msvcp_win.dll
(4280.1f98): Break instruction exception - code 80000003 (first chance)
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\windows\SYSTEM32\ntdll.dll -
ntdll!DbgBreakPoint:
00007ff8`75a33060 cc              int     3
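The setup described above (debug a command prompt, follow its child processes, then run net start WslInstaller) can also be scripted. The sketch below is a variation, not what I actually did: instead of attaching WinDbg to an existing prompt and typing .childdbg 1, it launches the command under cdb.exe, the console debugger from the same Debugging Tools for Windows package, using its -o option to debug child processes and -logo to write a log file. It assumes cdb.exe is on the PATH; the function name and the log file name are made up for illustration.

```python
import subprocess
import sys


def build_debugger_command(service="WslInstaller", log_path="wsl-start.log"):
    """Return the argv for running 'net start <service>' under cdb.

    -o     also debug child processes (the launch-time counterpart of
           typing '.childdbg 1' after attaching)
    -logo  write the debugger output to log_path, overwriting it
    """
    return ["cdb.exe", "-o", "-logo", log_path,
            "cmd.exe", "/c", "net", "start", service]


if __name__ == "__main__":
    cmd = build_debugger_command()
    print(" ".join(cmd))
    # Only launch on Windows, from an elevated prompt, and only when asked to.
    if sys.platform == "win32" and "--run" in sys.argv:
        subprocess.run(cmd, check=False)
```

Run without arguments, the script only prints the command line so it can be inspected first. With --run on Windows, cdb stops at its initial breakpoint and waits for commands such as g, much like the interactive session described here.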

The stack trace indicated a “first chance exception” and pointed to a breakpoint interrupt (INT 3) instruction. The WSL service process was still running. The behavior was simply due to a Win32 exception thrown inside NTDLL.DLL. The exception would have been caught (i.e. handled) subsequently and was most likely not the cause of the crash. These types of exceptions are not uncommon: you have most likely seen Visual Studio report “a first chance exception has occurred…” multiple times in the console log when attaching a native debugger to any Win32 application, even those that work fine. Since this was not what I was looking for, I resumed execution by selecting Debug > Go. After repeating the process a couple of times, I noticed that the WSL service was reported to have stopped, with the following lines in the trace:

ModLoad: 00007ff8`70240000 00007ff8`70258000   C:\windows\system32\ldntvdm.dll
LDNTVDM is running inside net1.exe
Hook_IAT_x64(731D0000, ext-ms-win-kernelbase-processthread-l1-1-0.dll, BasepProcessInvalidImage, 7024459C)
Hooked 73543678: 743F9D50 -> 7024459C
LDNTVDM: BasepProcessInvalidImageReal = 743F9D50
LDNTVDM: BaseIsDosApplication = 74421F10
Hook_IAT_x64_IAT(731D0000, ntdll.dll, NtCreateUserProcess, 7024445C, 70254160)
Hooked 734313E0: 75A30D30 -> 7024445C
ldntvdm Init done (https://github.com/leecher1337/ntvdmx64)
ntdll!NtTerminateProcess+0x14:
00007ff8`75a2f8f4 c3              ret

This new finding pointed to an incompatibility between WSL and NTVDMx64, a third-party utility that allows 16-bit DOS applications to run on 64-bit versions of Windows, which I explored in a previous article. The tool hooks into various low-level Windows DLLs so that it can override how Windows handles the loading of .EXE files and let the user run 16-bit DOS apps. Because WSL is trying to do the same or something similar to allow users to run Linux apps from Windows, a conflict arises, which results in an exception that causes WSL to fail to initialize.
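When a debugger log runs to hundreds of lines, entries like the ones above are easy to miss. The following Python sketch shows one way to pull the loaded modules and the import hooks out of a saved log. The regular expressions are my own guesses based only on the lines quoted in this article; they are not a documented format of either WinDbg or NTVDMx64.

```python
import re

# Patterns inferred from the log lines quoted in this article (assumption).
MODLOAD_RE = re.compile(r"^ModLoad:\s+\S+\s+\S+\s+(?P<path>.+)$")
HOOK_RE = re.compile(
    r"^Hook_IAT_x64\w*\((?P<base>[0-9A-Fa-f]+),\s*(?P<dll>[^,]+),\s*(?P<func>\w+),"
)


def summarize_log(text):
    """Return (module paths, [(dll, hooked function), ...]) found in a log."""
    modules, hooks = [], []
    for line in text.splitlines():
        line = line.strip()
        m = MODLOAD_RE.match(line)
        if m:
            modules.append(m.group("path").strip())
            continue
        m = HOOK_RE.match(line)
        if m:
            hooks.append((m.group("dll"), m.group("func")))
    return modules, hooks


SAMPLE_LOG = r"""ModLoad: 00007ff8`70240000 00007ff8`70258000   C:\windows\system32\ldntvdm.dll
LDNTVDM is running inside net1.exe
Hook_IAT_x64(731D0000, ext-ms-win-kernelbase-processthread-l1-1-0.dll, BasepProcessInvalidImage, 7024459C)
Hooked 73543678: 743F9D50 -> 7024459C
Hook_IAT_x64_IAT(731D0000, ntdll.dll, NtCreateUserProcess, 7024445C, 70254160)
Hooked 734313E0: 75A30D30 -> 7024445C"""

modules, hooks = summarize_log(SAMPLE_LOG)
for path in modules:
    print("loaded:", path)
for dll, func in hooks:
    print("hooked:", func, "imported from", dll)
```

Any module in the list that does not ship with Windows, and any hook on a process-creation function such as NtCreateUserProcess, is worth a closer look.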

I then tried to launch various 16-bit apps from Windows 11. They also failed to start, showing similar error messages in WinDbg. I recalled that NTVDMx64 had been working until a few days earlier, when I decided to install the latest Windows updates. Most likely, the updates that I had installed broke my NTVDMx64 installation and caused WSL to crash. After trying to reinstall NTVDMx64 without any luck, I decided to simply uninstall it and test WSL again, praying that the uninstaller had indeed removed all registry customizations that might have contributed to the crash. Sure enough, after a reboot, everything was fine once again and WSL could start and work properly.

It should be noted that once I resolved this issue, VMware, VirtualBox and WSL all worked well together. Recent versions of VirtualBox and VMware have no issues with WSL, provided that you do not have Hyper-V installed on the same machine. Under Windows Features, only Windows Subsystem for Linux and Virtual Machine Platform need to be installed:

[Screenshot of the Windows Features dialog, 2024-01-16]

With this setup, both VirtualBox and VMware can co-exist with WSL while still supporting 64-bit virtual machines (which require hardware virtualization). On VMware, although you may encounter a warning related to lack of side channel mitigation, most Windows and Linux virtual machines should still work fine, albeit with a bit of degraded performance.

That’s the end of NTVDMx64 for me, until the author provides an update that makes it compatible with WSL. I hope this note will be useful to those encountering similar issues. I spent three days trying to fix this one!

See also

Fixing corrupted Code Integrity Policies (CIP) causing Windows 11 slowness on recently downloaded EXE files

ToughDev


A tough developer who likes to work on just about anything, from software development to electronics, and share his knowledge with the rest of the world.
