Crash in deallocation when app launched by COM

Jul 18, 2012 at 8:03 PM

I'm seeing a crash and strange behavior when an application is launched via COM (the client is Python, but the app is launched with CoCreateInstance as a CLSCTX_LOCAL_SERVER). This does not happen when the app is launched from the command line or Explorer, or when VLD is not in use. That does not mean VLD is the cause; it could just be a trigger.

When I break in the debugger I see stacks like:

        vld_x86.dll!VisualLeakDetector::mapBlock(void * heap, const void * mem, unsigned long size, bool crtalloc, CallStack * * & ppcallstack) Line 1004        C++
        vld_x86.dll!VisualLeakDetector::AllocateHeap(tls_t * tls, void * heap, void * block, unsigned long size) Line 1586        C++
        vld_x86.dll!VisualLeakDetector::_HeapAlloc(void * heap, unsigned long flags, unsigned long size) Line 1571 + 0x15 bytes        C++
        vld_x86.dll!VisualLeakDetector::__malloc_dbg(void * (unsigned int, int, const char *, int)* p_malloc_dbg, context_t & context, bool debugRuntime, unsigned int size, int type, const char * file, int line) Line 1056 + 0x15 bytes        C++
        vld_x86.dll!CrtMfcPatch<80,1>::crtd__malloc_dbg(unsigned int size, int type, const char * file, int line) Line 198        C++
        vld_x86.dll!VisualLeakDetector::_new(void * (unsigned int)* pnew, context_t & context, bool debugRuntime, unsigned int size) Line 181 + 0x9 bytes        C++
>        vld_x86.dll!CrtMfcPatch<80,1>::mfcud_vector_new(unsigned int size) Line 1215        C++
[subsequent frames are from the application and are resolved correctly]

In the non-COM cases, the addresses that are unresolved here usually resolve to CRT/MFC allocation functions.

Usually the failure is in deallocation. Either _free_dbg_nolock might fail its _CrtIsValidHeapPointer assertion, or RtlFreeHeap might fail and throw an exception, presumably because it is trying to free a block from a heap the block wasn't allocated from. In the assertion case, the CRT shows a message box, which then deadlocks trying to load a module: another thread already holds the loader lock and wants the heap lock, while this thread holds the heap lock.

Any pointers/suggestions for investigation appreciated.

Jul 20, 2012 at 8:13 PM
Edited Jul 20, 2012 at 9:29 PM


Jul 20, 2012 at 11:33 PM

I encountered the same crash.

Jul 22, 2012 at 1:34 AM

You'll be happy to know, then, that I determined the problem, and it's not VLD's fault, at least in my case.

I did a lot of debugging and tracing and determined that, in the crashing case, LoadLibrary had just been called and VLD had done RefreshModules; immediately after the LoadLibrary, a simple new/delete would crash as above. I examined the modules detected by EnumerateLoadedModules64 and the addLoadedModules callback and compared them to the non-COM case. The COM case loaded mfc80ud.dll, msvcr80d.dll, and msvcp80d.dll again (even though the main app had already loaded and added them), enumerated them twice, and patched the second set after the LoadLibrary call (having determined they hadn't yet been patched). Further, the addresses it patched had no symbols in the debugger, whereas the addresses patched on initial load did (__imp_free_debug etc.).

Examination of the full paths of the modules showed that the first set used short names ("C:\PROGRA~1\THEVEN~1\File.dll") and the second long names ("C:\Program Files\The Vendor Name\File.dll"). (In this case, since my test machine isn't the one with Visual Studio, the MFC/CRT files are in the same folder as the program.)

This was enough to search on (this paragraph elides the days of debugging it took to figure most of this out), and I determined that Bad Things can happen if a COM server is registered using a short name (which old versions of ATL can do, apparently). Lo and behold, my app was so registered in its CLSID\LocalServer32 key. I changed it to a long name and the problem went away. (I'll also have to make sure it is registered programmatically that way.)
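For anyone repeating the comparison of module lists, here is a rough sketch of how the check could be automated once the full paths have been dumped (for example, from the enumeration callback). It is not VLD code: the function names looks_like_short_name, has_short_name_component, and modules_with_multiple_paths are made up for illustration, and the 8.3 test is only a string heuristic ("~" followed by digits in a base name of at most 8 characters), not a call into the Windows API. Note that two different directories can legitimately contain DLLs with the same file name, so a hit only means "look closer".

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Lowercase copy; Windows paths are case-insensitive.
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Split a path into components on '\' or '/'.
static std::vector<std::string> split_path(const std::string& path) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : path) {
        if (c == '\\' || c == '/') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}

// Heuristic (assumption, not an API call): a component looks like an 8.3
// short name if its base part is at most 8 characters and ends in '~'
// followed by one or more digits, e.g. "PROGRA~1".
bool looks_like_short_name(const std::string& component) {
    const std::string base = component.substr(0, component.find('.'));
    const std::size_t tilde = base.rfind('~');
    if (tilde == std::string::npos || tilde == 0 || tilde + 1 == base.size()) return false;
    if (base.size() > 8) return false;
    return std::all_of(base.begin() + static_cast<std::ptrdiff_t>(tilde) + 1, base.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// True if any component of the path looks like an 8.3 short name.
bool has_short_name_component(const std::string& path) {
    const std::vector<std::string> parts = split_path(path);
    return std::any_of(parts.begin(), parts.end(), looks_like_short_name);
}

// Group module paths by file name (case-insensitive) and keep only the
// file names that were seen under more than one distinct full path.
std::map<std::string, std::set<std::string>>
modules_with_multiple_paths(const std::vector<std::string>& paths) {
    std::map<std::string, std::set<std::string>> by_name;
    for (const std::string& p : paths) {
        const std::vector<std::string> parts = split_path(p);
        if (parts.empty()) continue;
        by_name[to_lower(parts.back())].insert(to_lower(p));
    }
    for (auto it = by_name.begin(); it != by_name.end();) {
        if (it->second.size() < 2) it = by_name.erase(it);
        else ++it;
    }
    return by_name;
}
```

Feeding it the two spellings quoted above flags File.dll as appearing under two paths, one of which has short-name components. The same has_short_name_component check could be applied to the string stored under the CLSID\LocalServer32 key.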

I'm still not sure about the true root cause of the problem (i.e., why the module appeared to be present and even to have the right imports, yet patching them corrupted a heap), but the short vs. long name loading was the key. Since that problem is known, I expect there is information elsewhere with more details for those who want it. E.g., (related but not precisely the same issue)