Although numerous articles have already been posted about it, I thought I would write about my recent presentation at the RSA Conference on the subject of touchlogging.
Since many people have asked, I got the term touchlogging from this paper. I do not know if it has been used before, but I decided it was a good name for my presentation.
The idea for the project came from a penetration testing engagement in which we compared financial malware on the Windows platform with (potential) malware on mobile platforms. The goal was to identify the components that allow the malware to capture financial data and see whether they could be ported to the mobile platforms. We quickly realized that the key component was the keylogging mechanism.
A lot of apps on mobile platforms already avoid using the built-in keyboard so a keylogger for mobile cannot simply hook the keyboard. Instead, all touch events need to be recorded. It should be noted that hooking the keyboard on jailbroken iOS is not very difficult. Well, I guess nothing is difficult when you know what to do, have the skills to do it and have root access to the device!
The difference for me between keylogging and touchlogging is that when you are touchlogging, you record the X and Y coordinates of where the touch occurred on the screen. If you are keylogging, you get the actual key that was pressed. This means that touchlogging requires an additional step to figure out what was pressed, since the coordinates themselves cannot give you that information. This can be done either by combining the coordinates with screenshots or by using some other logic. A simple example is that if the device has not been used for some time, one can assume that the first thing entered is the PIN code to unlock the device.
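To make that extra step concrete, here is a minimal sketch of turning coordinates back into keys. Everything in it is an assumption for illustration: the keypad geometry (origin, key width and height), the `Key` class and the `build_keypad` and `key_at` helpers are hypothetical and do not reflect any real device layout or the code from my presentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Key:
    """One on-screen key, described by its label and bounding box."""
    label: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        # A touch belongs to this key if it falls inside the bounding box.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def build_keypad(origin_x: float, origin_y: float,
                 key_w: float, key_h: float) -> List[Key]:
    """Build a hypothetical 3x4 numeric keypad (layout is an assumption)."""
    rows = ["123", "456", "789", " 0 "]  # a space marks an empty cell
    keys = []
    for r, row in enumerate(rows):
        for c, label in enumerate(row):
            if label != " ":
                keys.append(Key(label,
                                origin_x + c * key_w,
                                origin_y + r * key_h,
                                key_w, key_h))
    return keys


def key_at(keys: List[Key], px: float, py: float) -> Optional[str]:
    """Return the label of the key under the touch, or None if no key matches."""
    for key in keys:
        if key.contains(px, py):
            return key.label
    return None


# Assumed geometry: keypad starts at (0, 200), each key is 100 wide and 80 tall.
keypad = build_keypad(0, 200, 100, 80)
touches = [(50, 240), (150, 320), (250, 400), (150, 480)]
decoded = "".join(key_at(keypad, x, y) or "?" for x, y in touches)
print(decoded)  # prints "1590"
```

In practice the layout would have to be recovered from screenshots or from knowledge of the app, which is exactly why the coordinates alone are not enough.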
On mobile platforms, touchlogging has a great benefit over keylogging: it cannot be bypassed by using custom keyboards or gestures. The user must input the information through the touchscreen, which means the touchlogger will capture it (I know there are other ways, but the vast majority of people input all information through the touchscreen).
So, how do we actually accomplish this on iOS? How do we get the X and Y coordinates of the touch event?
For jailbroken iOS, method swizzling is a good way. A semi-accurate explanation of method swizzling is that it is like a man-in-the-middle attack, but for methods rather than network traffic. Swizzling just three methods from the UIResponder class provides the X and Y coordinates for most touch events on the device. These can either be logged or sent to a remote server. They can of course also be combined with screenshots. I actually wrote a server that takes screenshots on one port, collects coordinates on another and then overlays the screenshots with the coordinates.
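The man-in-the-middle idea can be illustrated outside of iOS. The sketch below is a Python analogy only, not Objective-C and not the code I used: the `Responder` class, its `touches_began` method and the `swizzle` helper are hypothetical names invented for this example. It shows a replacement method that sits in front of the original, observes the arguments and then forwards the call so the caller notices nothing.

```python
class Responder:
    """Stand-in for a class whose method receives touch coordinates."""

    def touches_began(self, x, y):
        return f"handled touch at ({x}, {y})"


recorded = []  # coordinates observed by the interposed method


def swizzle(cls, name, make_replacement):
    """Replace cls.<name> with a wrapper built around the original method."""
    original = getattr(cls, name)
    setattr(cls, name, make_replacement(original))
    return original


def make_logging_version(original):
    def replacement(self, x, y):
        recorded.append((x, y))        # observe the call in the middle
        return original(self, x, y)    # then forward to the original
    return replacement


original_impl = swizzle(Responder, "touches_began", make_logging_version)

result = Responder().touches_began(12, 34)
print(result)    # prints "handled touch at (12, 34)"
print(recorded)  # prints "[(12, 34)]"
```

The important property is that the original behaviour is preserved, which is also what makes this kind of interposition hard to notice from the outside.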
I never really tried doing the same on non-jailbroken devices, thinking it would be close to impossible to get it through App Store review. However, FireEye proved me wrong.
In the end, I think there are more ways of recording touch events than what I have shown here. FireEye of course found a way that I had missed, and I would not be surprised if new ways are found in the future. It should be noted that the purpose of my work was to show this attack vector, so that people/companies with high security requirements are aware of it. I did not try to weaponize the attack or make it stealthy.
If you are an app developer and want to defend against this type of attack, it is possible to detect swizzled methods. I believe this is something all apps with high security requirements should do. Detecting swizzled methods will not only protect against this touchlogging attack but also other attacks which are possible through method swizzling.
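As a rough illustration of the detection idea, again as a Python analogy rather than iOS code, the sketch below takes a snapshot of the trusted implementations at startup and later checks whether any of them has been replaced. The `Responder` class, the `TRUSTED` table and `find_replaced_methods` are hypothetical names for this example. On iOS the equivalent check would have to go through the Objective-C runtime, which this sketch does not attempt to show.

```python
class Responder:
    """Stand-in for a class whose methods an attacker might replace."""

    def touches_began(self, x, y):
        return (x, y)


# Snapshot of the implementations we trust, taken before anything else runs.
TRUSTED = {"touches_began": Responder.__dict__["touches_began"]}


def find_replaced_methods(cls, trusted):
    """Return the names whose current implementation differs from the snapshot."""
    return [name for name, impl in trusted.items()
            if cls.__dict__.get(name) is not impl]


before = find_replaced_methods(Responder, TRUSTED)


def intruder(self, x, y):
    # Simulated replacement installed by someone else.
    return (x, y)


Responder.touches_began = intruder
after = find_replaced_methods(Responder, TRUSTED)

print(before)  # prints "[]"
print(after)   # prints "['touches_began']"
```

A check like this is only as trustworthy as the snapshot it compares against, so it should run as early as possible and be treated as one layer of defense rather than a guarantee.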