Bots, bots, bots. They’re everywhere, apparently. They are becoming more complex and are causing havoc for customer-facing identity management systems, IoT devices and more. Fake accounts. Dummy accounts. Redundant accounts. Orphan accounts. Fraudulent accounts. Not to mention DDoS (Distributed Denial of Service) attacks. It’s a minefield.
Well, there are certainly some basic steps that can be taken to help identify and prevent bot usage of the key identity management services that many public-facing APIs and applications expose. Firstly, let’s describe some of the main functional areas bots are likely to attack.
The Identity Attack Vector
Any public-facing service or API will expose several identity-related endpoints. If you think of the full identity life cycle, take the following as a basic list of expected services: account signup; progressive profiling; social signup; profile management; device registration/device pairing; forgotten password/username; sign-in; MFA sign-in; MFA enrolment; and probably account deletion/RTBF (right to be forgotten), if the service provider is being considerate.
It seems likely that the main areas of interest to a bot would be account signup, account sign-in and potentially password reset.
Account Sign-up, then Clean-up
Without an account, you can’t access a service. So it seems likely this would be the first entry point into the application that a bot would look to test and use. A few noddy steps here could help. Firstly, leverage a CAPTCHA system. The lovely long acronym (standing for Completely Automated Public Turing test to tell Computers and Humans Apart) is a simple way of adding a little barrier to a flow to reduce automation. For the geeks, it’s actually a reverse Turing test, but we won’t go there just now. Google’s reCAPTCHA is pretty simple to integrate, and there are numerous other options available. A CAPTCHA step should certainly come early on in the signup flow. What else? Well, clearly some sort of input validation would be useful during signup. So, perhaps client-side libraries to perform syntax checking for things like email address, username and so on. If exposing APIs, a simple server-side validation engine would be needed too, since a bot calling the API directly never runs the client-side checks.
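To make the server-side validation idea concrete, here is a minimal sketch in Python. The field names, the regular expressions and the `validate_signup` function are all assumptions for illustration; a real service would likely use a fuller email check and its own username policy.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not a full RFC 5322 check).
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")
USERNAME_RE = re.compile(r"^[A-Za-z0-9_.-]{3,32}$")


def validate_signup(payload: dict) -> list:
    """Return a list of validation errors; an empty list means the payload passed."""
    errors = []
    email = payload.get("email", "")
    username = payload.get("username", "")
    # Reject anything that is not a string before applying the syntax check.
    if not isinstance(email, str) or not EMAIL_RE.fullmatch(email):
        errors.append("email: invalid syntax")
    if not isinstance(username, str) or not USERNAME_RE.fullmatch(username):
        errors.append("username: must be 3-32 letters, digits, '_', '.' or '-'")
    return errors
```

Running the same rules on the server that the client-side library applies means a scripted request that skips the browser still gets rejected.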
Another part of the general bot-hygiene process is constantly reviewing and cleansing existing data. Can your internal processes identify accounts that haven’t been logged in to recently? Can you identify the last time they changed their password? Can those accounts be disabled and made inactive?
So let’s skip forward a little. At account sign-in time, there are several steps that should be considered. We know multi-factor authentication is omnipresent, and also perhaps coming to the end of its useful life, with more modern, flexible and fine-grained approaches to authentication being needed.
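A clean-up job along these lines might look like the sketch below. The `Account` record, the 90-day idle window and the fallback to the password-change date for accounts that have never logged in are all assumptions made for the example, not a recommendation for any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Account:
    # Hypothetical account record; real systems will store more than this.
    username: str
    last_login: Optional[datetime]  # None means the account has never logged in
    password_changed: datetime
    active: bool = True


def find_stale(accounts: List[Account], now: datetime,
               max_idle: timedelta = timedelta(days=90)) -> List[Account]:
    """Return active accounts with no activity inside the max_idle window."""
    stale = []
    for acct in accounts:
        # Fall back to the password-change date when there has been no login.
        last_seen = acct.last_login or acct.password_changed
        if acct.active and now - last_seen > max_idle:
            stale.append(acct)
    return stale


def disable(accounts: List[Account]) -> None:
    """Mark the given accounts as inactive rather than deleting them."""
    for acct in accounts:
        acct.active = False
```

Disabling rather than deleting keeps the option of reactivating an account if a legitimate owner turns up later.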
So we need to think about a few non-identity related things. Firstly, can we track which devices the account credentials are being sent from? This allows for device printing (fingerprinting). Not only is this useful for pairing legitimate account credentials to the owner’s trusted device to prevent credential theft, but it could also help with identifying whether the same device has been used multiple times with different credentials: a classic sign of a bot running automation. Identifying a device from the service side is quite a coarse-grained approach, but by analysing user agents, IP addresses, JSESSIONIDs, browser characteristics, plugins and so on, a basic picture of device uniqueness can be acquired.
Whilst many bots may work in pools, it is quite common for one bot to create an account in one region and then, moments later, for another bot in the same pool but in a different location to attempt the sign-in. To mitigate this sort of attack, geo-velocity checking is another quick win. Geo-velocity analysis during sign-in takes the distance between two locations (either the registration location and the current one, or the last login location and the current one), divides it by the time elapsed, and applies a miles-per-hour (or km/h) threshold to the result.
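As a rough illustration of service-side device printing, the sketch below hashes a handful of request attributes into a fingerprint and counts how many distinct usernames each fingerprint has presented. The attribute set, the `DeviceTracker` class and the threshold of three accounts per device are assumptions; commercial device-printing tools use far richer signals.

```python
import hashlib
from collections import defaultdict


def device_print(user_agent: str, ip: str, extras: dict) -> str:
    """Hash coarse request attributes into a stable fingerprint string."""
    # Sort the extra attributes so the same device always hashes the same way.
    parts = [user_agent, ip] + [f"{key}={extras[key]}" for key in sorted(extras)]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


class DeviceTracker:
    """Tracks which usernames have been presented from each fingerprint."""

    def __init__(self, max_accounts_per_device: int = 3):
        self.max_accounts = max_accounts_per_device
        self._seen = defaultdict(set)

    def record(self, fingerprint: str, username: str) -> bool:
        """Record a sign-in attempt; return True if the device looks suspicious."""
        self._seen[fingerprint].add(username)
        return len(self._seen[fingerprint]) > self.max_accounts
```

A `True` result need not block the request outright; it could simply trigger a CAPTCHA or a step-up authentication challenge.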
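A geo-velocity check can be sketched in a few lines. The example below uses the haversine formula for great-circle distance; the 900 km/h default threshold (roughly airliner speed) and the function names are assumptions, and in practice IP-based geolocation is imprecise enough that some tolerance is needed.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_impossible_travel(prev: tuple, curr: tuple, max_kph: float = 900.0) -> bool:
    """prev and curr are (lat, lon, datetime) tuples for two events on one account."""
    distance = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]).total_seconds() / 3600
    if hours <= 0:
        # No time has passed, so any movement at all is implausible.
        return distance > 0
    return distance / hours > max_kph
```

Here `prev` would be the registration or last login event and `curr` the sign-in being evaluated.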
Throttling, Analytics & Machine Learning
Another big risk of bot infestation is DDoS. So a basic stopper can be throttling. Applying limits to the number of times a device calls a particular endpoint seems a no-brainer. The throttling is generally tied to things like the servlet session ID or perhaps the IP address. Whilst none of these measures is insurmountable, they all add to the defence-in-depth approach.
Clearly there are specialist third-party systems that can analyse bot traffic patterns, origins, behaviour, ASN, header data and so on. Integrating these systems into the sign-up and sign-in processes would certainly provide a further protection step. It’s important to have many hook or integration points during the sign-up and sign-in flows, in order to analyse this non-identity related data. This “broad spectrum” approach is now commonplace, and whilst it doesn’t require all systems to be in one place, it does require nice integration interfaces, better coupling and better data flows.
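One simple way to throttle is a sliding-window counter keyed on whatever identifies the caller, such as a session ID or IP address. The sketch below is an in-memory, single-process illustration; the class name and limits are assumptions, and a production deployment would normally keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_calls per key within any window_seconds period."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Return True and record the call if the key is under its limit."""
        now = self.clock()
        calls = self._calls[key]
        # Discard timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True
```

For example, `SlidingWindowLimiter(max_calls=5, window_seconds=60)` with the client IP as the key would cap each address at five calls a minute to the sign-in endpoint.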
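The idea of hook points can be sketched as a small registry of checks per flow stage. Everything here is hypothetical: the `FlowHooks` class, the stage names, the 0.0 to 1.0 risk scale and the example ASN (taken from the range reserved for documentation) are assumptions chosen only to show the shape of such an interface.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]
Check = Callable[[Context], float]  # each check returns a risk score from 0.0 to 1.0


class FlowHooks:
    """Registry of pluggable risk checks, grouped by stage of the identity flow."""

    def __init__(self) -> None:
        self._checks: Dict[str, List[Check]] = {}

    def register(self, stage: str, check: Check) -> None:
        """Attach a check to a stage such as 'signup' or 'signin'."""
        self._checks.setdefault(stage, []).append(check)

    def risk(self, stage: str, context: Context) -> float:
        """Return the highest score reported by any check registered for the stage."""
        scores = [check(context) for check in self._checks.get(stage, [])]
        return max(scores, default=0.0)


# Hypothetical check: flag traffic from ASNs associated with hosting providers.
HOSTING_ASNS = {"AS64500"}  # documentation-range ASN, used purely as a placeholder


def hosting_asn_check(context: Context) -> float:
    return 0.8 if context.get("asn") in HOSTING_ASNS else 0.0
```

A third-party bot detection service would slot in as just another registered check, which keeps the flow itself unaware of which vendors are involved.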
Machine learning seems to be the flavour of the month when it comes to cyber security in general. Whilst it seems relatively early days with respect to ML or AI best practices on this topic, being able to easily leverage as-a-service machine learning platforms (for example, AWS offerings built around frameworks such as Apache MXNet), configured to consume the activity and log data collected across the identity lifecycle, would seem a nice weapon in the bot-fighting arsenal.