Troubleshooting a hard-to-find Windows Subsystem for Linux (WSL) crash caused by NTVDMx64
Recently, and without warning, my WSL instance running Ubuntu 22 would not start. Even after I made sure that hardware virtualization was enabled in the UEFI setup and uninstalled VMware, which is known to make WSL think that the processor does not support hardware virtualization (VT-x) instructions, WSL still reported “Catastrophic error 0x8000ffff”. Multiple attempts to uninstall and reinstall the Virtual Machine Platform, Windows Hypervisor Platform and Windows Subsystem for Linux components, as well as Hyper-V, did not resolve the issue, although the WSL start-up error message changed to either “An illegal state change was requested” or “The remote procedure call failed”, depending on how WSL was launched.
Checking Control Panel > Windows Tools > Services, I noticed that the Windows Subsystem for Linux service was not running. I restarted the service, only for it to stop again almost immediately, which suggested a failure during initialization, such as missing or corrupted files. The event log contained a single entry stating that the WSL service did not start successfully. The details tab of that entry contained crash information pointing to a native code crash within ntdll.dll:
Faulting application name: wslinstaller.exe, version: 2.0.9.0, time stamp: 0x654ea013
Faulting module name: ntdll.dll, version: 10.0.22621.2506, time stamp: 0xbced4b82
Exception code: 0xc0000005
Fault offset: 0x00000000000a5f6e
Faulting process id: 0x0x9DC
Faulting application start time: 0x0x1DA1D129CFF6C48
Faulting application path: C:\Program Files\WindowsApps\MicrosoftCorporationII.WindowsSubsystemForLinux_2.0.9.0_x64__8wekyb3d8bbwe\wslinstaller.exe
Faulting module path: C:\windows\SYSTEM32\ntdll.dll
Report Id: 4a24781c-d5d6-4850-80ad-35fc49fdbd07
Faulting package full name: MicrosoftCorporationII.WindowsSubsystemForLinux_2.0.9.0_x64__8wekyb3d8bbwe
Faulting package-relative application ID: wsl
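If you want to pull the useful fields out of an entry like this one, a few lines of Python will do. The following is only a minimal sketch: the helper names (parse_crash_event, describe_exception) are made up for illustration, and the small table of NTSTATUS names is deliberately incomplete. It splits the text on the field labels seen above, so it works whether or not the entry still has its line breaks.

```python
import re

# Field labels as they appear in the event text above.
FIELDS = [
    "Faulting application name",
    "Faulting module name",
    "Exception code",
    "Fault offset",
    "Faulting process id",
    "Faulting application start time",
    "Faulting application path",
    "Faulting module path",
    "Report Id",
    "Faulting package full name",
    "Faulting package-relative application ID",
]

# A few well-known NTSTATUS values; this is not an exhaustive list.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0x80000003: "STATUS_BREAKPOINT",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}

LABEL_PATTERN = re.compile("(" + "|".join(re.escape(f) for f in FIELDS) + "):")


def parse_crash_event(text):
    """Return {label: value} for every known label found in the text."""
    # re.split with a capture group yields
    # [preamble, label1, value1, label2, value2, ...]
    parts = LABEL_PATTERN.split(text)
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def describe_exception(fields):
    """Map the hexadecimal exception code to a symbolic name, if known."""
    code = int(fields["Exception code"], 16)
    return NTSTATUS_NAMES.get(code, "unknown")
```

For the entry above, describe_exception reports STATUS_ACCESS_VIOLATION, which is what exception code 0xc0000005 stands for: the process touched memory it was not allowed to access.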
A couple of minutes later, I had another idea. Maybe I could attach a debugger to a command prompt, set the debugger to debug all child processes, which should include the WSL service, start the WSL service from that command prompt, and monitor the debugger output. I then downloaded WinDbg and attached it to an elevated (i.e. running as administrator) command prompt. From within WinDbg, I typed .childdbg 1 to also attach to any child process of the command prompt. Finally, I started WSL from the command prompt using net start WslInstaller. With this, WinDbg immediately stopped inside NTDLL.DLL with the following trace:
ModLoad: 00007ff8`73920000 00007ff8`73a13000 C:\windows\System32\shcore.dll
ModLoad: 00007ff8`73640000 00007ff8`736da000 C:\windows\System32\msvcp_win.dll
(4280.1f98): Break instruction exception - code 80000003 (first chance)
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\windows\SYSTEM32\ntdll.dll -
ntdll!DbgBreakPoint:
00007ff8`75a33060 cc int 3
The trace indicated a “first chance exception” and pointed to a breakpoint interrupt (INT 3) instruction. The WSL service process was still running. This was simply a Win32 exception raised inside NTDLL.DLL that would subsequently have been caught (i.e. handled), and it was most likely not the cause of the crash. Such exceptions are not uncommon: you have most likely seen Visual Studio report “a first chance exception has occurred…” multiple times in the output log when attaching a native debugger to a Win32 application, even one that works fine. Since this was not what I was looking for, I resumed execution by selecting Debug > Go. After repeating this a couple of times, I noticed that the WSL service was reported to have stopped, with the following lines in the trace:
ModLoad: 00007ff8`70240000 00007ff8`70258000 C:\windows\system32\ldntvdm.dll
LDNTVDM is running inside net1.exe
Hook_IAT_x64(731D0000, ext-ms-win-kernelbase-processthread-l1-1-0.dll, BasepProcessInvalidImage, 7024459C)
Hooked 73543678: 743F9D50 -> 7024459C
LDNTVDM: BasepProcessInvalidImageReal = 743F9D50
LDNTVDM: BaseIsDosApplication = 74421F10
Hook_IAT_x64_IAT(731D0000, ntdll.dll, NtCreateUserProcess, 7024445C, 70254160)
Hooked 734313E0: 75A30D30 -> 7024445C
ldntvdm Init done (https://github.com/leecher1337/ntvdmx64)
ntdll!NtTerminateProcess+0x14:
00007ff8`75a2f8f4 c3 ret
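When a debugger log gets long, it can help to list mechanically which imports a third-party DLL says it has hooked. The sketch below does that for the Hook_IAT_x64 lines in the trace above. Treat it as an illustration only: hooked_functions is a hypothetical helper, and reading the arguments as (module base, DLL name, function name, ...) is my assumption based on how the lines look, not something taken from the NTVDMx64 sources.

```python
import re

# Matches both "Hook_IAT_x64(...)" and "Hook_IAT_x64_IAT(...)" lines.
# Assumed argument order: base address, DLL name, function name, ...
HOOK_CALL = re.compile(r"Hook_IAT_x64(?:_IAT)?\(\w+,\s*([^,]+),\s*(\w+),")


def hooked_functions(log_text):
    """Return a list of (dll_name, function_name) pairs found in the log."""
    return [(m.group(1).strip(), m.group(2)) for m in HOOK_CALL.finditer(log_text)]
```

Run against the trace above, this yields BasepProcessInvalidImage and NtCreateUserProcess, both of which sit on the process-creation path.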
This new finding pointed to an incompatibility between WSL and NTVDMx64, a third-party utility that allows 16-bit DOS applications to run on 64-bit versions of Windows, which I explored in my previous article. The tool hooks into various low-level Windows DLL files so that it can override how Windows handles the loading of .EXE files and allow the user to run 16-bit DOS apps. Because WSL is also trying to do the same or something similar to allow the user to run Linux apps from Windows, a conflict arises, which results in an exception that causes WSL to fail to initialize.
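To make the idea of hooking more concrete, here is a toy model in Python. It is not NTVDMx64's code and does not touch any Windows API: the import table is just a dictionary, and every name in it (create_process_real, install_hook, make_dos_aware) is invented for this illustration. It shows only the general pattern of swapping a table entry for a wrapper that handles some cases itself and forwards the rest to the original function.

```python
def create_process_real(image):
    """Stand-in for the original API: it refuses 16-bit DOS images."""
    if image.lower().endswith(".com"):
        raise OSError("invalid image: " + image)
    return "started " + image


# Stand-in for a module's import table: function name -> function.
import_table = {"CreateProcess": create_process_real}


def install_hook(table, name, make_replacement):
    """Replace table[name] with a wrapper built around the original entry."""
    original = table[name]
    table[name] = make_replacement(original)
    return original


def make_dos_aware(original):
    """Build a wrapper that intercepts DOS images and forwards the rest."""
    def hooked(image):
        if image.lower().endswith(".com"):
            return "started " + image + " in emulator"
        return original(image)
    return hooked


install_hook(import_table, "CreateProcess", make_dos_aware)

# Every caller that goes through the table now reaches the wrapper first.
print(import_table["CreateProcess"]("game.com"))
print(import_table["CreateProcess"]("notepad.exe"))
```

The point of the toy is the last two lines: once the hook is in place, every caller in that process goes through it, including software that has nothing to do with DOS. If the real hook misbehaves, for example after a Windows update changes what it expects to find, unrelated programs can be taken down with it.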
I then tried to launch various 16-bit apps on Windows 11, which also failed to start, showing similar error messages in WinDbg. I subsequently recalled that NTVDMx64 had been working until a few days earlier, when I decided to install the latest Windows updates. Most likely, the updates that I had installed broke my NTVDMx64 installation and caused WSL to crash. After trying to reinstall NTVDMx64 without any luck, I decided to simply uninstall it and test WSL again, praying that the uninstaller had indeed removed all registry customizations that might have contributed to the crash. Sure enough, after a reboot, everything was fine once again and WSL could start and work properly.
It should be noted that once I resolved this issue, VMware, VirtualBox and WSL all worked well together. Recent versions of VirtualBox and VMware have no issues with WSL, provided that you do not have Hyper-V installed on the same machine. Under Windows Features, only Windows Subsystem for Linux and Virtual Machine Platform need to be installed.
With this setup, both VirtualBox and VMware can co-exist with WSL while still supporting 64-bit virtual machines (which require hardware virtualization). On VMware, although you may encounter a warning related to lack of side channel mitigation, most Windows and Linux virtual machines should still work fine, albeit with a bit of degraded performance.
That’s the end of NTVDMx64 for me, until the author provides an update that makes it compatible with WSL. I hope this note will be useful for those encountering similar issues. I spent three days trying to fix the issue!
UPDATE (Aug 2024): This issue has been fixed by the author and NTVDMx64 now works fine with both WSL (wslservice.exe) and Windows Search (SearchIndexer.exe). The root cause was the Arbitrary Code Guard functionality introduced in recent updates for Windows 10 and 11, which prevents changes to the protection of code pages in certain executables and leads to crashes when NTVDM installs inline hooks. Read this, this and this for details.
I think I fixed this problem now:
https://github.com/leecher1337/ntvdmx64/commit/b5ebb0e2bc171d17265bd77708c3726928f5f014
Regards,
leecher1337
Hi Leecher,
Thank you for the fix. NTVDMx64 now works fine with both WSL and Windows Search. Great work!