My First Malware Analysis: A Nymaim Sample (Part 2 - The Launcher)

Stage 2 - The Launcher

2.1 Reconnaissance

Next, let's continue with the analysis of the downloaded binary. I first examined it using some basic Unix tools:
  • file: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
  • md5sum: c967718700a2e6a801531215bf06c96c
I also examined the sample using two popular packer-detection tools:
  • PEiD: Found no evidence of packing.
  • Exeinfo PE: Identified the binary as unpacked and suggested that it may have been compiled with Microsoft Visual C++.


To get a better idea of what the binary does, I then submitted the sample to VirusTotal. I also ran it on my own Cuckoo server, but that did not yield anything useful: I had not yet realised that the malware employs various anti-analysis techniques, and these defeated my Cuckoo setup.
From the VirusTotal report, we can see that the binary contains a number of strangely-named sections with high entropy.
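
High entropy in a section usually hints at compressed or encrypted content. As a rough illustration of what such a figure measures, here is a minimal sketch of a Shannon entropy calculation over raw bytes. The function name is my own, and this is not necessarily how VirusTotal computes its numbers.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Values close to 8 mean the bytes look almost random, which is typical for encrypted or compressed data, while plain code and text tend to score noticeably lower.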



The import table of the binary also seems to be pretty lean, giving us very little information about its behaviour at this point.

However, one thing that caught my eye was this PCAP that was recorded from the malware's interaction with its Command and Control (C&C) server:


We have already seen the GET request that led to the download of this binary. It therefore seems that the binary goes on to make POST requests to another domain, perhaps to exfiltrate data or simply to check in with its C&C server. With that said, let's now analyse the binary itself.

2.2 Static Analysis

In analysing the launcher, I used both IDA Pro and the ida-x86emu plugin. The x86 emulator allowed me to step through the code just like a debugger, except that it traps and emulates API calls, allowing me to control their return values while minimising the risk of running live malware.

Diving into the assembly code, we see that the binary first runs several arithmetic routines which, as I found out later, are useless and only there to trick the analyst into studying them.


This is the decompiled C code:


Essentially, this function computes the factorial of the value in the ECX register and returns the result in the EAX register. After computing the factorial, the binary performs another round of arithmetic on ECX, but this routine is much more complicated:

Leading from the ECX factorial to another arithmetic function

This time round, even the decompiled C code does not look as straightforward.


However, after visualising the routine, I was able to simplify it into the following C code instead:


However, I found out later that the result of this function is never used, so the malware author had successfully drained some of my time.

After performing the two useless arithmetic routines, the launcher then proceeds to decrypt several strings and uses the loaded module list in the Process Environment Block (PEB) to manually locate the address of LoadLibraryA in kernel32.dll. Subsequently, the binary calls LoadLibraryA("kernel32.dll") to get a handle on the module before using its custom function to locate the addresses of the CreateThread and Sleep functions. This is seen here:


With the address of the CreateThread function, the launcher then creates 2 threads, each of which runs one of the arithmetic routines described earlier.


The launcher then continues to perform more arithmetic operations before sleeping for 5 seconds.


Finally, and this is where the bulk of the work is, the launcher decrypts more strings and locates the addresses of more functions. The decrypted strings include: MessageBoxA, CreateFileA, CloseHandle, ReadFile, GlobalAlloc, HeapFree, GetProcessHeap, GetFileSize, GetModuleFileNameA, MultiByteToWideChar, GetProcAddress, user32.dll, GetACP, CreateThread, Sleep, ntdll.dll, NtUnmapViewOfSection, VirtualFree, CreateProcessA, VirtualAlloc, GetThreadContext, ReadProcessMemory, WriteProcessMemory, VirtualAllocEx, ResumeThread, RtlFillMemory, SetThreadContext and TerminateProcess.

From the decrypted strings, we can guess that the launcher might be loading two more modules: ntdll.dll and user32.dll later on.

After decrypting the strings, the launcher manually locates the addresses of the functions named in the list above. It then calls GetModuleFileName(0, path_ptr, 0x104) to obtain the path to itself. Using this path, the process calls CreateFileA(path_ptr, GENERIC_READ, 0, 0, OPEN_EXISTING, 0, 0) to get a handle to the binary file. Note that since the share mode parameter is 0, other processes cannot open the same file for delete, read, or write access while this handle is open. It also means that the CreateFileA call itself fails if the file is already open with a conflicting share mode, which would happen if the file has already been opened by a debugger.
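
To make the share mode behaviour more concrete, here is a simplified model of the sharing check in pure Python. It only mimics the documented rule (a new open must be compatible with every open already held, in both directions) and does not call the Windows API. The function names are my own, and the real kernel check works on finer-grained access rights than the three modelled here.

```python
# Simplified access and share flags (values mirror the Win32 constants).
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
DELETE = 0x00010000
FILE_SHARE_READ = 0x1
FILE_SHARE_WRITE = 0x2
FILE_SHARE_DELETE = 0x4

# Which share flag must be present for each kind of access to coexist.
_REQUIRED_SHARE = {
    GENERIC_READ: FILE_SHARE_READ,
    GENERIC_WRITE: FILE_SHARE_WRITE,
    DELETE: FILE_SHARE_DELETE,
}


def _allowed(access: int, share: int) -> bool:
    """True if every access right requested is permitted by the share mode."""
    return all(share & flag for right, flag in _REQUIRED_SHARE.items() if access & right)


def can_open(existing, access: int, share: int) -> bool:
    """Model whether a new open succeeds, given (access, share) pairs already held."""
    return all(
        _allowed(access, held_share) and _allowed(held_access, share)
        for held_access, held_share in existing
    )
```

For example, if another tool already holds the file with (GENERIC_READ, FILE_SHARE_READ), then can_open(..., GENERIC_READ, 0) is False in this model: the launcher's share mode of 0 does not tolerate the existing reader, so its CreateFileA call would fail.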


Then, it calls GetFileSize so that it can allocate enough space in the following GlobalAlloc call.


It then calls ReadFile to load the binary file into the allocated buffer, followed by a CloseHandle call to close the file handle. Before returning, the launcher checks that the file starts with the DOS header signature "MZ" (0x5A4D as a little-endian WORD) and that the PE header starts with the signature "PE\0\0" (0x00004550 as a little-endian DWORD, i.e. the bytes 50 45 00 00 in file order).
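
The same kind of header check can be sketched in a few lines of Python. This is my own minimal version, assuming the PE header is found through the e_lfanew field at offset 0x3C of the DOS header, as in any standard PE file. The launcher's actual assembly may be organised differently.

```python
import struct


def has_valid_pe_signatures(image: bytes) -> bool:
    """Check the DOS signature ("MZ") and the PE signature of an in-memory file."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        return False
    # e_lfanew (offset 0x3C) holds the file offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    # A slice past the end of the buffer is simply shorter and fails the comparison.
    return image[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```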

With the binary file copied into memory, the launcher then splits the binary data using the string "*#/*&" as the delimiter and discards the first and last chunks. In IDA Pro, the assembly blocks look like this:



In Python code, this might look something like:


with open('launcher.exe', 'rb') as f:
    # The file is read as bytes, so the delimiter must be a bytes literal
    chunks = f.read().split(b"*#/*&")[1:-1]

In total, 9 new chunks of bytes are created, and each of these chunks is copied into memory allocated from the heap using the GlobalAlloc function. The 5th chunk is then used as a key to decrypt the 2nd, 8th and 9th chunks using another custom decryption routine.
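
The chunk bookkeeping can be sketched as follows. Note that placeholder_decrypt is purely a stand-in (a repeating-key XOR that I chose for illustration) and is NOT the launcher's custom decryption routine, which is not reproduced here. The function names are hypothetical as well.

```python
from itertools import cycle

DELIMITER = b"*#/*&"


def split_chunks(blob: bytes) -> list[bytes]:
    """Split on the delimiter and drop the first and last pieces."""
    return blob.split(DELIMITER)[1:-1]


def placeholder_decrypt(data: bytes, key: bytes) -> bytes:
    """Stand-in cipher (repeating-key XOR); NOT the launcher's real routine."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def decrypt_payload_chunks(blob: bytes) -> dict[int, bytes]:
    """Return the decrypted 2nd, 8th and 9th chunks, keyed by 1-based position."""
    chunks = split_chunks(blob)
    key = chunks[4]  # the 5th chunk (1-based) serves as the key
    return {n: placeholder_decrypt(chunks[n - 1], key) for n in (2, 8, 9)}
```

The point of the sketch is only the indexing: positions in the post are 1-based, so the 5th chunk is chunks[4] and the 2nd chunk, which later turns out to hold a PE image, is chunks[1].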

Lastly, the launcher again obtains its own path via a call to GetModuleFileName, checks if the 2nd chunk contains the correct DOS and PE header signatures, and calls CreateProcessA on its own path to create a suspended process from the same launcher binary.


Next, VirtualAlloc is called to allocate a page of memory, which is then used to store the ThreadContext of the new process' main thread, obtained using the GetThreadContext call.


Since the ThreadContext contains the EBX value of the new process, which now points to the new process' PEB, the launcher uses [EBX+8] to obtain the base address of the new suspended process. 4 bytes are then read from the base address of this new process using ReadProcessMemory and compared with the 4 bytes read from the original image base.
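
As a small sketch of that lookup: in the 32-bit PEB, the ImageBaseAddress field sits at offset 8, so the base address is the little-endian DWORD stored at [EBX+8]. In the code below, read_memory is a hypothetical callback standing in for ReadProcessMemory on the child process, and the function name is my own.

```python
import struct
from typing import Callable

PEB_IMAGE_BASE_OFFSET = 8  # ImageBaseAddress field in the 32-bit PEB


def read_image_base(ebx: int, read_memory: Callable[[int, int], bytes]) -> int:
    """Read the 4-byte image base stored at [EBX+8] of a suspended process."""
    raw = read_memory(ebx + PEB_IMAGE_BASE_OFFSET, 4)
    (image_base,) = struct.unpack("<I", raw)  # little-endian DWORD
    return image_base
```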


The original image mapped in the new process is then unmapped using NtUnmapViewOfSection (the call removes the whole mapped view of the image, not just its .text section), and VirtualAllocEx is called to allocate memory for the image found in the decrypted 2nd chunk.


Then, the parent process calls WriteProcessMemory to copy the PE headers from the start of the 2nd decrypted chunk into the newly allocated memory space of the child process.



This is then repeated for each of the sections until all sections are written to the child process. Once this is done, the parent process changes the base address of the child process by writing 4 bytes to [EBX+8] in the child process, i.e. the image base field of its PEB.


The address of the new entry point is then calculated and replaced in the original ThreadContext of the child process using SetThreadContext. Finally, the child process is resumed using ResumeThread and VirtualFree is called to free the memory that was used to store the 2nd decrypted chunk in the parent process.
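
The header and section writes described above can be planned from the PE headers alone. The sketch below is my own reconstruction for a PE32 image, not code taken from the launcher: it lists the (destination, source offset, size) triples that a series of WriteProcessMemory calls would use, and it assumes the new entry point is computed in the usual way, as the remote base plus the AddressOfEntryPoint field. The function name and the remote_base parameter are hypothetical.

```python
import struct


def plan_hollowing_writes(image: bytes, remote_base: int):
    """Return (writes, entry_point) for mapping a PE32 image at remote_base.

    Each write is a tuple (destination_address, source_offset, size).
    """
    (pe_off,) = struct.unpack_from("<I", image, 0x3C)            # e_lfanew
    (num_sections,) = struct.unpack_from("<H", image, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", image, pe_off + 20)   # SizeOfOptionalHeader
    opt_off = pe_off + 24                                         # optional header start
    (entry_rva,) = struct.unpack_from("<I", image, opt_off + 16)  # AddressOfEntryPoint
    (headers_size,) = struct.unpack_from("<I", image, opt_off + 60)  # SizeOfHeaders

    writes = [(remote_base, 0, headers_size)]                     # PE headers first
    section_table = opt_off + opt_size
    for i in range(num_sections):
        hdr = section_table + i * 40                              # 40-byte section header
        # VirtualAddress, SizeOfRawData, PointerToRawData
        virt_addr, raw_size, raw_ptr = struct.unpack_from("<III", image, hdr + 12)
        writes.append((remote_base + virt_addr, raw_ptr, raw_size))
    return writes, remote_base + entry_rva
```

Running this over the decrypted 2nd chunk with the address returned by VirtualAllocEx would give one write for the headers, one per section, and the value to place into the thread context before ResumeThread.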


Finally, the parent process finishes executing and leaves the child process running.

2.3 Behaviour of the Launcher in a Nutshell

  1. The launcher decrypts strings such as LoadLibraryA, kernel32.dll, CreateThread and Sleep
  2. It locates and stores the addresses to the LoadLibraryA, CreateThread and Sleep APIs
  3. It creates 2 threads which quickly die after finishing their arithmetic operations
  4. The main thread sleeps for 5 seconds
  5. It calls GetModuleFileName to obtain the file path to itself
  6. It calls CreateFileA to obtain a handle to the file
  7. It calls GetFileSize before using this size in a GlobalAlloc call
  8. It uses ReadFile to load the binary into memory before calling CloseHandle on the file handle
  9. It validates the loaded DOS and PE header signatures
  10. It splits the binary file using the string "*#/*&" as the delimiter, producing 9 new chunks in between these delimiters
  11. Each of the new chunks is copied into its own memory space allocated using GlobalAlloc
  12. The 2nd, 8th and 9th chunks are then decrypted using the 5th chunk as the key
  13. The process calls GetModuleFileName to again obtain the path to itself
  14. It checks if the 2nd decrypted chunk contains the correct DOS and PE header signatures
  15. It calls CreateProcessA on its own path to create a suspended process
  16. It calls VirtualAlloc to allocate memory to store the ThreadContext of the child process obtained using the GetThreadContext call
  17. It calls ReadProcessMemory to validate the first 4 bytes of the child process from the base address
  18. It unmaps the original image of the child process using NtUnmapViewOfSection and calls VirtualAllocEx to allocate enough memory in the child process to store the 2nd decrypted chunk
  19. It copies the header, followed by all the sections in the 2nd decrypted chunk, into the allocated memory in the child using WriteProcessMemory
  20. It amends the base address of the child process using WriteProcessMemory
  21. It changes the entry point of the child process using SetThreadContext
  22. The child process is resumed using ResumeThread
  23. The parent calls VirtualFree to free the memory occupied by the 2nd decrypted chunk
  24. The parent finishes execution

2.4 Anti-Analysis Techniques

In the launcher, I discovered the following anti-analysis techniques:
  • Executing useless but complicated arithmetic functions
  • Encrypted strings (containing function and library names)
  • Hiding module and function dependencies from the import table
  • Calling CreateFileA on itself with restrictive share mode to prevent execution if the file is being debugged
Overcoming the last hurdle was pretty straightforward though: I merely created a copy of the binary file and changed the paths used in the CreateFileA calls to point to this copy instead of the original.

Having wrapped up the analysis of the launcher, we will analyse the third-stage payload in the next post, which might be more exciting. Stay tuned!
