Some time ago I was asked to design an Azure Defined Datacentre.
The overriding design principle was that its footprint must resemble what one would typically find in a traditional ‘terrestrial’ deployment with a large dependency on developing in-house line-of-business (LOB) applications. That is to say, it had to consist of four independent environments, or tenants: development, testing, staging, and production.
With today’s emphasis on continuous integration and agile development, this concept of segregating four identical environments can be cumbersome for application lifecycle and deployment management, and there are perhaps better models to follow. However, the decision had been made.
Sometimes ours is not to reason why, ours is but to do and design. And so the stage had been set.
At the highest level the business requirements were:
- Extend the current terrestrial environment into Azure
- Create four logically separated environments as listed above
- Security and billing boundaries must exist between the environments for auditing and consumption control
- It must allow for resiliency and failover by stretching the solution across two geographic Azure regions
Keep in mind that all of this was prior to the general availability (GA) of some of the capabilities of Azure Resource Manager, so I was forced to make certain decisions then that I probably wouldn’t make now. But such is the case when you find yourself living with the breakneck cadence of change that is inherent when working with the greatest public cloud platform on earth (MSFT, you can find my billing details under my Xbox account to reward me for that shout out). Or any public cloud offering, for that matter… my colleagues who are devoted to AWS and OpenStack face, I am sure, the same realities.
The subscription taxonomy was most notably affected. At the time I was unable to use tagging or implement Resource Groups in order to achieve requirement #3 above. The unavailability of User Defined Routing (UDR) also posed a challenge, but I was able to get around that with the help of a well-known third-party virtual appliance provi… oh, hell, I’ll just say it: it was my good friends at Barracuda Networks who helped me get around this issue. We needed a way to ensure that traffic amongst subscriptions and their respective VNets stayed within the fabric and did not have to ‘bounce down and up’ an ExpressRoute circuit in order to be routed.
And so it began with this high concept scribble (on my Surface Pro 3, of course).
Not much to look at, but it gave a 40,000-foot view and provided two things:
- I didn’t have to exercise my Visio OCD and spend hours getting boxes to line up perfectly (yeah, I know, there are only four, but those of you who would understand – understand).
- I quickly realised that even though these environments were to be logically separated, some services (name resolution, authentication and authorisation, certain file services, etc.) would need to be shared. After all, it would be a nightmare to deploy four separate forests and their dependencies in order to support the solution.
So far so good. But how do we tether back to HQ?
Fortunately, as I was sweating over site-to-site VPNs, gateways, and the like, MSFT made an important announcement: ExpressRoute (#XR) now supported multiple subscriptions hanging off of the same dedicated circuit, albeit in a piggyback fashion.
This would allow us to carve up our 1 Gbps connection into #XR ‘lanes’ for the purposes of bandwidth assignment. Whilst it’s beyond the scope of this ramble to dive into #XR at length, there are essentially two options for connecting it: through a network service provider (NSP) or through an exchange provider (amazingly, I have no acronym for this one). Depending on your points of termination within your terrestrial datacentres you’ll have the option to go with one or the other, or both. The primary differentiator is the bandwidth that will be available to you. We had to go through an NSP and so had a maximum of 1 Gbps to work with.
We now had the subscription taxonomy, knew that we were going to share some core services, and use #XR to connect back into our two terrestrial datacentres. It was time to go back to the drawing board and further flesh things out.
For the next evolution I started to drill down into more specifics as I began to think about the wider ecosystem, other ongoing efforts that were in motion, and how they would fit into the picture: O365 adoption and SSO for third-party services, for example.
Besides being wonderfully colourful the scribble to the left revealed a number of things:
- Per the requirements it captured the four environments and their associated subscriptions: SubProd, SubStage, SubTest, and SubDev
- It introduced the concept of wrapping each subscription around its own virtual network (VNet), or Super VNets as they would later become known
- It depicted an area for core services mentioned above although these seem to be ‘hanging’ in the ether at the moment
- I also realised that we would need some form of DMZ (duh), not unlike the traditional terrestrial DMZs we all know and love, to house things like a web application proxy (WAP) or a *spoiler alert* web application firewall (WAF)
- Azure Active Directory (AAD) had to be included because I knew that O365 was coming down the pike and there were murmurs of Power BI
- Ancillary capabilities like Azure SSO and MFA were called out to light up third-party web-based applications like Salesforce, SAP, and the like.
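To make the ‘Super VNet per subscription’ idea above a little more concrete, here is a minimal address-planning sketch. Every range and name in it is hypothetical (nothing here comes from the actual deployment); the point it illustrates is that the four Super VNets, and their mirrors in the second region, must sit in non-overlapping address spaces if traffic is ever to be routed between them within the fabric:

```python
import ipaddress

# Hypothetical supernets, one per Azure region (Region II mirrors Region I).
regions = {
    "region-1": ipaddress.ip_network("10.1.0.0/16"),
    "region-2": ipaddress.ip_network("10.2.0.0/16"),
}

environments = ["SubDev", "SubTest", "SubStage", "SubProd"]

# Carve each regional /16 into four /18 Super VNets, one per subscription.
plan = {}
for region, supernet in regions.items():
    subnets = supernet.subnets(new_prefix=18)
    plan[region] = dict(zip(environments, subnets))

for region, vnets in plan.items():
    for env, vnet in vnets.items():
        print(f"{region}  {env:9s} {vnet}")

# Sanity check: no two Super VNets may overlap, or routing between them breaks.
all_vnets = [v for vnets in plan.values() for v in vnets.values()]
for i, a in enumerate(all_vnets):
    for b in all_vnets[i + 1:]:
        assert not a.overlaps(b)
```

Get this carve-up wrong on day one and you will pay for it later: overlapping VNets cannot be connected, and renumbering a live subscription is nobody’s idea of fun.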
This was only half of the picture, of course, as there was to be another mirrored deployment across the English Channel to fulfil the resiliency, high-availability, and business continuity requirements.
Now that we had all or most of the components represented I had to somehow string them all together in a fashion that provided the following:
- Efficiency in routing for both performance and cost control. I knew that any egress traffic out of the Azure datacentres would incur costs as well as any traffic flowing between regions. For this reason, it was imperative that intra-region, inter-subscription traffic be confined to the fabric and not require traversal of #XR to reach its intended target.
- Robust network analytics must be made available regardless of where traffic was being generated or consumed
- A security model that could be easily surfaced and ideally visually represented
- Management that could quickly be adopted by traditional network engineers
This posed, and still poses in some cases, manageability issues with regards to Azure’s data and control planes. While Network Security Groups (NSGs) and Access Control Lists (ACLs) can do much in the way of network segregation, there is no easy way to fully configure them outside of the beloved PowerShell. Even then, the management and operational handover of the architecture is a challenge: unless you have a network engineer who is keen to learn PowerShell (anyone?) or dive into Azure-style User Defined Routing (UDR), it will be difficult to transfer ownership in such a way as to achieve Operational Acceptance (OA).
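Part of what that network engineer has to internalise is the NSG evaluation model: rules are processed in ascending priority order (lower number wins) and the first match decides the flow’s fate, ending in an implicit deny. The toy evaluator below models those semantics only; the rules themselves are hypothetical examples, not the real deployment’s rule set:

```python
# Toy model of NSG inbound-rule evaluation: ascending priority, first match
# wins. Rules below are hypothetical examples for illustration only.
import ipaddress

rules = [
    # (priority, name, source_prefix, dest_port_or_None_for_any, access)
    (100, "allow-https", "0.0.0.0/0", 443, "Allow"),
    (200, "allow-rdp-from-mgmt", "10.1.200.0/24", 3389, "Allow"),
    (4096, "deny-all", "0.0.0.0/0", None, "Deny"),  # stand-in for the default rule
]

def evaluate(source_ip: str, dest_port: int) -> str:
    """Return 'Allow' or 'Deny' for a flow, NSG-style: first match wins."""
    src = ipaddress.ip_address(source_ip)
    for priority, name, prefix, port, access in sorted(rules):
        if src in ipaddress.ip_network(prefix) and port in (None, dest_port):
            return access
    return "Deny"  # NSGs end in an implicit DenyAll anyway

print(evaluate("203.0.113.7", 443))    # Allow  (hits allow-https)
print(evaluate("203.0.113.7", 3389))   # Deny   (only the mgmt subnet may RDP)
print(evaluate("10.1.200.10", 3389))   # Allow  (mgmt subnet rule)
```

Simple enough on three rules; far less so across four subscriptions’ worth of them with no visual surface to reason over, which is exactly the handover problem described above.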
Analytics also posed a challenge. This may have changed by now, but at the time there was no way to robustly inspect traffic to and from VNets and/or subscriptions in the way one might with any number of physical devices. Never forget your network team.
Enter the virtual network appliance:
Widely used interface that is familiar to network engineers and architects? Check.
Easily mapped visual of the underlying architecture? Check.
Robust analytics for security auditing and anomaly detection? Check.
Able to route entirely within the fabric? Check.
And so we arrive at the final evolution. No longer a scribble but an actual Visio diagram.
If you are going through a fast-paced sprint into the cloud and you don’t have a visual representation of what your cloud looks like, get one. I cannot emphasise this enough. A picture truly is worth a thousand words… or, in the case of an Excel spreadsheet, 10,000 rows.
So here are the characteristics of my blueprint and what will eventually be built:
- Azure Core Virtualised Network (ACVN) for Azure Region I which will more or less be duplicated in Azure Region II
- Four Azure subscriptions, each ‘wrapped’ by a Super VNet, representing development, test, staging, and production
- Each subscription and its corresponding VNet have dedicated subnets (vSnets) which house virtual network appliances that will govern traffic to and from subscriptions, provide analytics, and provide a familiar interface for my network engineers
- vNetProd contains a dedicated vSnet for a DMZ that will at first house a WAP, or WAF, to facilitate ADFS and other services
- vNetProd also contains a dedicated Service Layer vSnet which houses our virtual network appliances. To start with these are firewalls, but the service layer is capable of expanding to include other appliances, such as WAN optimisation products (Riverbed, for example).
- The cornerstone of each subscription is a network appliance which allows for intra-regional traffic to be contained within the fabric, and within its own subscription if so desired, without the need to traverse the #XR circuit (disregard any model numbers as they were used as placeholders)
- vNetProd contains a dedicated vSnet to house core, or foundation, services like domain controllers, ADFS servers, file services, or anything else that makes sense to share amongst subscriptions.
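The routing behaviour that makes the appliance the cornerstone can be sketched in a few lines. Azure selects the route with the longest matching prefix, and a user-defined route beats a system route of the same length; point the peer-VNet prefix at the appliance and that traffic never touches the #XR circuit. All addresses and next hops below are hypothetical stand-ins:

```python
# Toy illustration of route selection: longest matching prefix wins, and a
# user-defined route outranks a system route on a tie. This is why pointing
# peer-VNet prefixes at the appliance keeps traffic inside the fabric while
# everything else still defaults back to HQ. Addresses are hypothetical.
import ipaddress

ORIGIN_RANK = {"user": 2, "system": 1}

routes = [
    # (prefix, next_hop, origin)
    ("10.1.0.0/18", "VNetLocal", "system"),    # this Super VNet
    ("0.0.0.0/0", "ExpressRoute", "system"),   # default path back to HQ
    ("10.1.64.0/18", "10.1.0.4", "user"),      # peer Super VNet via appliance
]

def next_hop(dest_ip: str) -> str:
    """Pick the effective route for a destination, Azure-style."""
    dst = ipaddress.ip_address(dest_ip)
    best = None
    for prefix, hop, origin in routes:
        net = ipaddress.ip_network(prefix)
        if dst in net:
            key = (net.prefixlen, ORIGIN_RANK[origin])
            if best is None or key > best[0]:
                best = (key, hop)
    return best[1]

print(next_hop("10.1.64.20"))   # 10.1.0.4 — peer VNet, stays in the fabric
print(next_hop("192.168.1.5"))  # ExpressRoute — everything else heads to HQ
```

Remove the appliance route and the peer-VNet traffic falls through to the 0.0.0.0/0 default, i.e. it ‘bounces down and up’ the circuit: exactly the behaviour the design set out to avoid.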
And there you have it! From scribbles to reality.
Clearly the behind-the-scenes story was much more involved: countless discussions, heated debates, loathed politics, the involvement of my friends at both Microsoft and Barracuda (as well as other big players in the network space), collaboration between myself and my colleagues at Network-Insight and MonoConsultancy, and of course the direction of the world-class Azure Programme team in Redmond and around the world.
All in all it was a great journey, and I look forward to the next one, which is already underway… so stay tuned AND STAY TETHERED.