Thursday, 21 May 2015

Thoughts on Security Authentication and on adding security into an SDL

Here is a (slightly edited) 'brain dump' I just wrote on the topic of Authorisation and SDL.

Let me know your views on the ideas presented below:


The need for a strong Auth strategy

Knowing 'who is talking to whom' is a key pillar of security. Since there are going to be a number of parties and players involved, it will not be possible to have a one-size-fits-all Authentication technology/workflow (especially when dealing with the partner's systems and existing SSO technology).

The Auth solution should be based on the following properties:
  • All parties should be able to login using:
    • Username/email and password (Forms Auth)
    • OAuth from trusted OAuth provider (Google, Twitter, Facebook, GitHub)
    • Custom OAuth based on the user's provider
    • LDAP (which used to mean direct access to an AD/LDAP server, but these days could be provided via Azure or a federated service)
    • Other Token based SSO auth system 
      • based on existing system
      • WIF (Windows Identity Foundation)
      • SAML Tokens
      • SWT (Simple Web Tokens)
      • JWT (JSON Web Tokens)
      • X.509 Client/Server Certs
    • Mobile based Authenticator (like the Google Authenticator app)
    • Custom Auth API and Tokens (for server-to-server communication and high-value users)
  • In order to provide a good customer experience, SSO should be used as transparently as possible (which introduces some security challenges)
  • To increase security, the previous methods should have the option to be complemented with 2FA (either via SMS or info provided by hardware tokens)
    • 2FA can also be used to reduce risk by requiring '2FA checks' only before certain operations (like asking for a loan)
  • All data-flows and transactions need to be strongly authenticated and their 'risk profile' should be mapped according to the authentication method used. Here are some of the flows that will need to be protected:
    • User vs Website 
    • Partner admin user vs Website 
    • Partner API vs API 
    • Public Web/API tier vs internal REST APIs (and DBs)
  • The key questions to ask of the 8 flows above (the 4 listed flows, 2 directions per flow) are:
    • 'How does A know that it is talking to B': For example when 'user A uses the API to connect to the API', how does user A's code know it is talking to the app's servers, and how do those servers know that they are talking with user A's code?
    • 'How far does that identity flow': for example does the data layer have access to who actually made the request, or does it just know that 'record XYZ should be updated with data ABC'? The rule of thumb is that the entity making a security decision should have access to: 'Who is making the request?', 'Who originated the request?' and 'What is being requested?' (only then can a good security decision be made)
    • 'Does the server-side code executing the user request have access to more data than what should be exposed to the current user' - For example if the user should only have access to accounts A, B and C, and the server executing the user's request tries to access account D's data (via a vulnerability or back-door), will it work? (i.e. does that code at that moment in time have access to Account D)
  • Provide 'reference implementations' for all the supported authentication mechanisms
    • From a technical/business point of view, don't position the support of all these different auth techniques as a 'burden' but as a 'key business requirement' which will not only allow easy customer onboarding, but will also make the business more resilient to competition.
    • By providing reference implementations of 'how to talk with the web services', not only are we making life easier for the partners, but we will be creating a much stronger connection with the client, while having some control over how the implementation is happening on the user's side (of course, more sophisticated and technologically savvy customers will choose to use the APIs directly, but for most, the reference implementation will make a massive difference)
  • For the above to be possible, an SDL and CI environment must be in place that is able to ship code every week (if not every day).
    • This will allow for very quick turn around based on customer requirements and requests
    • Also note that in the real world, there are all sorts of 'problems' which can be easily addressed if one is able to quickly turn around fully working (and tested) code
    • Note that this is not defending 'push to production in a cowboy style'. See the notes below on the SDL workflow that needs to be in place in order to make this happen in a smooth, controlled and high quality way
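
To make the 'risk profile mapped to the authentication method' and '2FA only before certain operations' ideas above more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the assurance levels, the method and operation names, the rule that a passed 2FA check raises assurance by one level, and the `RequestContext` shape (which carries the 'who is making the request', 'who originated the request' and 'what is being requested' values).

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Assurance(IntEnum):
    """Hypothetical assurance levels; higher means stronger authentication."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Assumed mapping from authentication method to assurance (illustrative only).
METHOD_ASSURANCE = {
    "forms": Assurance.LOW,
    "oauth": Assurance.MEDIUM,
    "saml": Assurance.MEDIUM,
    "client_cert": Assurance.HIGH,
}

# Assumed minimum assurance per operation (illustrative only).
REQUIRED_ASSURANCE = {
    "view_balance": Assurance.LOW,
    "request_loan": Assurance.HIGH,
}


@dataclass(frozen=True)
class RequestContext:
    caller: str        # who is making the request (e.g. the web tier)
    originator: str    # who originated the request (e.g. the end user)
    operation: str     # what is being requested
    auth_method: str   # how the originator authenticated
    second_factor_ok: bool = False  # True once a 2FA check has passed


def effective_assurance(ctx: RequestContext) -> Assurance:
    """Assurance of the auth method, raised one level by a passed 2FA check."""
    base = METHOD_ASSURANCE.get(ctx.auth_method, Assurance.LOW)
    if ctx.second_factor_ok:
        return Assurance(min(base + 1, Assurance.HIGH))
    return base


def decide(ctx: RequestContext) -> str:
    """Return 'allow', 'step_up' (ask for 2FA) or 'deny' for this request."""
    required = REQUIRED_ASSURANCE.get(ctx.operation)
    if required is None:
        return "deny"  # unknown operations are denied by default
    if effective_assurance(ctx) >= required:
        return "allow"
    # Only ask for 2FA when passing it would actually be enough.
    if not ctx.second_factor_ok:
        with_2fa = replace(ctx, second_factor_ok=True)
        if effective_assurance(with_2fa) >= required:
            return "step_up"
    return "deny"
```

With these assumed tables, an OAuth user viewing a balance is allowed straight away, the same user asking for a loan is sent to a 2FA check first, and a Forms Auth user asking for a loan is denied (since even with 2FA the assurance would be too low).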

The need for 'one stack per customer' deployment model 

One of the areas that is going to have a massive impact on the security requirements of the code under development is the question of Single vs Multi Tenant architecture.

Basically, will each customer exist on its own VMs (cloud or locally) and share NO infrastructure with the others, or will all customers share the same infrastructure?

Although on PaaS solutions it is very tempting to reuse the infrastructure (and code) for all/most customers, this introduces a massive security requirement on every line of code written (since one mistake would allow a user from Company A to see data from Company B)

Due to the nature of the development stack, it is possible to adopt a Micro-Services architecture and create a technological stack that can be assembled on-demand (just like Lego)

This would dramatically simplify the code, where the focus would be on ensuring that User A could not have access to User B's data (which could be achieved using impersonation and use of internal auth providers)

Other areas where this micro-services model would have a massive impact would be on testing and client-specific customisations. Being able to quickly create disposable test environments is one of the key capabilities for a strong CI environment
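
As a rough illustration of the 'User A cannot access User B's data' property mentioned above, here is a minimal sketch of code that is only ever handed the records the current user is authorised for. The `ScopedAccountStore` class and the sample accounts are hypothetical, and in a real system this scoping would more likely be enforced by the data tier (for example via impersonation) than by an in-memory wrapper.

```python
class AccountAccessError(Exception):
    """Raised when code tries to read an account outside the current scope."""


class ScopedAccountStore:
    """Hypothetical wrapper exposing only the accounts one user may access."""

    def __init__(self, all_accounts: dict, allowed_ids: set):
        # Keep only the authorised records, so the others cannot be reached
        # through this object even if the calling code is buggy.
        self._accounts = {
            account_id: all_accounts[account_id]
            for account_id in allowed_ids
            if account_id in all_accounts
        }

    def get(self, account_id: str) -> dict:
        if account_id not in self._accounts:
            raise AccountAccessError(f"account {account_id!r} is not in scope")
        return self._accounts[account_id]


# Illustrative data: user A is authorised for accounts A, B and C only.
ALL_ACCOUNTS = {
    "A": {"balance": 10},
    "B": {"balance": 20},
    "C": {"balance": 30},
    "D": {"balance": 40},
}
store_for_user_a = ScopedAccountStore(ALL_ACCOUNTS, {"A", "B", "C"})
```

Here `store_for_user_a.get("A")` returns the record, while `store_for_user_a.get("D")` raises `AccountAccessError`, which is the behaviour we want when a vulnerability makes the request-handling code reach for account D.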

The need to support onsite and Windows/Linux deployments

One way to 'force' the micro-services model (described above) is to make it a 'key development' requirement that the code developed should be able to run on both Windows/Linux and onsite at the client's own datacenter (for 'specific data sensitive customers').

Not only will this increase competitiveness, it will also create a development environment with few (if any) 'hard-coded dependencies'

The need for Data encryption (especially at rest)

Due to the sensitive nature of the data harvested, great care must be placed into how the data is stored. 

The best model is one where all data that is transferred and stored is either encrypted or anonymised, so that in case of disclosure there is little (or no) business impact.

This is one of those areas that, if done correctly, is invisible to most developers most of the time (since it should be done at 'platform level' and not in day-to-day code)
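
Here is a minimal sketch of what a 'platform level' helper could look like, so that day-to-day code never handles the raw values. Note the assumptions: this shows keyed-hash pseudonymisation (HMAC-SHA256 from the Python standard library), which is only one of the options and is neither encryption nor full anonymisation; the field names and the hard-coded key are made up for the example, and real encryption at rest would need a vetted crypto library and proper key management.

```python
import hashlib
import hmac

# Assumption: in a real deployment this key would come from a key-management
# service, never from source code. It is hard-coded only for illustration.
PSEUDONYM_KEY = b"example-key-do-not-use"

# Assumed set of fields treated as sensitive (hypothetical names).
SENSITIVE_FIELDS = {"email", "national_id"}


def pseudonymise(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a value with its keyed HMAC-SHA256 digest (hex encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_record(record: dict) -> dict:
    """Return a copy of the record with sensitive fields pseudonymised."""
    return {
        name: pseudonymise(value) if name in SENSITIVE_FIELDS else value
        for name, value in record.items()
    }
```

The same input always maps to the same token (so records can still be joined), but without the key the stored value cannot simply be recomputed from a guessed input.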

The need for reducing the amount of data known 

Another key protection element is to reduce the amount of data collected from the users and partners. 

There is always a big temptation from business to "collect as much data as we can, since even if we don't need it today, we might in the future"

The problem is that the more data is captured, the more data will need to be protected (encrypted or tokenised). Ideally, data would be consumed 'on demand' with the business only having access to a limited data set. In fact, the best design is one where 95% of the code/micro-services doesn't even know:
 a) who the user is?
 b) what data is being served?
 c) who are we talking to?
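
One simple way to picture this is a per-service allow-list, where each micro-service receives only the fields it needs. The sketch below is an assumption-heavy illustration: the service names, field names and the `minimise` helper are all hypothetical.

```python
# Assumed allow-lists: which fields each (hypothetical) service may receive.
SERVICE_FIELDS = {
    "pricing": {"amount", "currency"},
    "notifications": {"contact_token"},
}


def minimise(record: dict, service: str) -> dict:
    """Return only the fields the named service is allowed to see."""
    allowed = SERVICE_FIELDS.get(service, set())  # unknown services get nothing
    return {name: value for name, value in record.items() if name in allowed}


# Illustrative request data.
request = {
    "user_id": "u-123",
    "contact_token": "tok-9f2c",
    "amount": 2500,
    "currency": "GBP",
}
```

In this sketch the 'pricing' service sees an amount and a currency but never learns who the user is, and the 'notifications' service only sees an opaque token.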

Map attack surface and create Security-focused Tests

Even in a development environment with a good test culture, there is usually a focus on the 'Happy path', where what is tested is the behaviour that normal users (or API consumers) would perform (not what is possible).

Unfortunately, from a security point of view, we must also think about the abuse cases (i.e. what else will the application do based on user input/data).

For example it is critical that the application's attack surface is mapped (programmatically) and fed into tests which will confirm that the expected security properties are still in place. Basically we need assurance that only the authorised users will be able to access the authorised resources.

This 'Attack Surface mapping' is basically a matrix that maps:

 - All urls
 - All values that can be submitted via those urls
 - All types of user roles that exist
 - Special scenarios where multiple users, with multiple levels of auth, need to access the same resource (i.e. url)
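
The matrix above can be sketched as a test that walks every (url, role) combination and compares the application's answer with the expected one. The urls, roles and the `is_allowed` function below are hypothetical stand-ins; a real project would pull the urls from the web framework's routing table and call the application's real access check.

```python
from itertools import product

# Hypothetical urls and roles, listed by hand only for this sketch.
URLS = ["/account", "/admin/users", "/partner/report"]
ROLES = ["anonymous", "user", "partner_admin"]

# Expected security properties: the (url, role) pairs that must be allowed.
# Every other combination is expected to be denied.
EXPECTED_ALLOWED = {
    ("/account", "user"),
    ("/account", "partner_admin"),
    ("/partner/report", "partner_admin"),
}


def is_allowed(url: str, role: str) -> bool:
    """Stand-in for the application's real access check (assumption)."""
    if role == "anonymous":
        return False
    if url == "/account":
        return True
    if url == "/partner/report":
        return role == "partner_admin"
    return False  # nothing in this sketch grants access to /admin/users


def find_mismatches() -> list:
    """Compare every (url, role) cell of the matrix with the expectation."""
    mismatches = []
    for url, role in product(URLS, ROLES):
        expected = (url, role) in EXPECTED_ALLOWED
        if is_allowed(url, role) != expected:
            mismatches.append((url, role))
    return mismatches
```

A security-focused test would then simply assert that `find_mismatches()` returns an empty list, so any change that opens (or closes) a cell in the matrix shows up as a failing test.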

TDD and High-Code-Coverage (90% to 95%) are not 'optional' 

In order to achieve strong assurance (and quality) it is critical that code-coverage (of app and tests) is not viewed as a 'chore' or 'pointless metric'

There are numerous ways to create environments which promote highly productive TDD workflows (with high code coverage), and tools like NCrunch really make a massive difference.

For this to happen it is critical that senior devs, engineers and architects are given time (and focus) to spend on test architecture and APIs.

A really powerful property that happens in these environments is that it is possible to 'review code by fixing unit tests' and to 'detect security vulnerabilities via security-focused test breakage' (for example tests can be written that break every time the application's attack surface is changed) 
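
As a minimal sketch of a test that 'breaks every time the attack surface is changed', the code below fingerprints a route table and compares it with an approved value. The route table and the helper names are assumptions made for the example; in a real suite the approved fingerprint would be stored in the repository and only updated as part of a reviewed change.

```python
import hashlib
import json


def surface_fingerprint(routes: dict) -> str:
    """Hash a canonical JSON form of the routes, so any change alters it."""
    canonical = json.dumps(routes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attack surface: each url with its accepted parameters.
REVIEWED_ROUTES = {
    "/account": ["id"],
    "/partner/report": ["from", "to"],
}

# Computed here for brevity; a real test would keep this as a stored value.
APPROVED_FINGERPRINT = surface_fingerprint(REVIEWED_ROUTES)


def surface_changed(current_routes: dict) -> bool:
    """True when the current attack surface differs from the approved one."""
    return surface_fingerprint(current_routes) != APPROVED_FINGERPRINT
```

Adding a new url, or a new parameter to an existing url, makes `surface_changed` return True, which is the cue for somebody to review the change before approving the new fingerprint.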

Strong CI and Ship-to-production button 

With high code-coverage and strong TDD it is possible to have the 'ship-to-production' button (i.e. commit to branch) which dramatically increases the developer's productivity and allows for quick turn-around on ideas, customer requests, bugs and security fixes.

Note that 'ship-to-production' doesn't mean that it is live/visible for customers to see it. It means that the code is deployed into 'production-like servers' after a number of QA tests have been successfully executed (i.e. the code changes did not break anything). See multiple presentations by GitHub or Etsy on this topic.

Security reviews should happen after Scrum cycles (or Kanban completed cycles)

Since the sooner bugs and issues are detected, the easier and cheaper it will be to fix them, the best model is to introduce security reviews after the features are completed (and before they hit production).

In a Scrum or Kanban development environment, there will be new code releases at regular intervals, which will represent, 'by definition', smaller code changes that are easier to review

Note that once this workflow is established, it is a bad sign if security issues are discovered at this stage, since they should have been picked up much earlier by threat models (see below)

Threat models should be done for new features (before they are implemented)

In order to prevent the whack-a-mole cycle (where vulnerabilities are discovered after they are coded) threat models and security reviews should be performed before features are developed.

This usually also helps developers get much better briefs, since the threat model will ask a number of questions about 'how that new XYZ feature will actually work, what resources it will use and how it will affect the attack surface'

Use JIRA as a way to capture security knowledge and make security decisions

Create a separate project for 'application security' focused JIRA tickets, which will be used to track existing issues and to 'force' business owners into 'accepting risk' (for the features they own)

Use of Static and Dynamic Analysis tools to scale up security knowledge

There are a number of commercial tools which are able to scan the code for security vulnerabilities, analyse the application at run-time or try to exploit a live application.

With the caveat that most of these tools really suck before they are customised (and if the tools are able to find anything out of the box, there is really a development/security problem), once they are customised, these tools are able to enforce an application's security requirements, and are a critical piece of the 'how to scale application security knowledge' puzzle.
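
To give a feel for what 'customising' means, here is a tiny hand-rolled check written with Python's standard `ast` module. It is not the rule format of any particular commercial tool, and the list of banned calls is an assumed, application-specific requirement picked for the example.

```python
import ast

# Assumed project rule: these built-in calls are banned (illustrative list).
BANNED_CALLS = {"eval", "exec"}


def find_banned_calls(source: str) -> list:
    """Return (line, name) for each direct call to a banned built-in."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


# Illustrative input: the second line breaks the assumed rule.
sample = "x = 1\ny = eval(user_input)\nprint(x)\n"
```

Running `find_banned_calls(sample)` reports the `eval` call on line 2. The value comes from the rule encoding something specific to the application, which is what allows the knowledge of one security review to be enforced on every future commit.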

Provide Developer Security Guidance (and technology/app specific training)

Developers should have easy access to focused security guidance (with code samples specific to what they are working on), and security-training courses are a great way to raise everybody's understanding of the security implications of insecure code

Appoint Security Champions per team

Each team needs to have a security champion who will support the security practices described above.

Have an Application Security team

Finally (but no less important), a key driver for all the issues mapped above is the Application Security Team.

This team (or individual) is the one that will have the mandate, power and accountability for making this happen.

See references to SSG (Software Security Groups) in the BSIMM (Building Security In Maturity Model)
