We will soon give our customers a way to deploy connectivity through the Portal (the web interface we offer them) and programmatically through our APIs, by leveraging the capabilities of our unique infrastructure and of the Cloud Service Providers’ APIs. Our vision is straightforward: we want to give networking teams the same agility that cloud technologies give software development teams. We also want to do this in a secure, segmented way, with guaranteed bandwidth and premium network KPIs backed by SLAs. As a result, you will be able to fold networking requirements into your CI/CD pipelines. So much for the marketing pitch. What does it imply for InterCloud at the technical level? Let’s take a quick, broad look.
Our network infrastructure
It all starts with our network infrastructure. Over the years, we have deployed a global, private, and secure MPLS network, with IS-IS as our IGP and BGP for inter-domain routing. We use the BGP-LS NLRI extension to share traffic engineering and link-state information outside the IGP domains. Our clients can connect their infrastructures to our network through cross-connections at our points of presence, and we can even take care of the local loop for them if needed.
Diagram of a client connected through InterCloud to AWS and Azure
We connect to all the major Cloud Service Providers (AWS, Azure, Google, Alibaba, IBM Cloud…) at their on-ramp locations, through private connectivity with cross-connections over optical fiber or Ethernet. Every provider has its own name for its private connectivity service. To name a few: for AWS, it is Direct Connect; for Microsoft Azure, it is ExpressRoute; and for Google Cloud Platform, it is Cloud Interconnect.
Beyond the marketing names, the technical reality and the operational processes to establish a connection also differ. However, the industry is converging on BGP routing over one or more 802.1Q circuits. For some CSPs, this means creating one VLAN and one BGP session, but for others, like Azure, it means two connections, a primary and a secondary, each with its own BGP session.
We have decided to normalize and abstract this complexity through our Connector model. For a client, a connection to a CSP or to their enterprise network goes through a Connector, and the client can decide to link different Connectors together, in a secure, segmented way. Of course, we provide our clients with monitoring on every Link for the network KPIs that we back with SLAs: packet loss, latency, and jitter. We also monitor our Connectors for bitrates and our edges for BGP session state. This abstraction allows us to use the Connectors as building blocks that our Portal and APIs can leverage. But more on that later. Let’s talk a bit about the CSP side.
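To make the Connector and Link abstraction more concrete, here is a minimal sketch of how it could be modeled. The class names, fields, and sample values are assumptions made for illustration only; they are not InterCloud's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Circuit:
    """One 802.1Q VLAN carrying one BGP session (hypothetical shape)."""
    vlan_id: int
    peer_asn: int


@dataclass
class Connector:
    """A normalized attachment point: a CSP or an enterprise network."""
    name: str
    provider: str  # e.g. "aws", "azure", "enterprise"
    circuits: List[Circuit] = field(default_factory=list)


@dataclass
class Link:
    """A segmented path between two Connectors, monitored for the SLA KPIs."""
    a: Connector
    b: Connector
    kpis: tuple = ("packet_loss", "latency", "jitter")


# Placeholder VLAN tags and ASNs, chosen only for the example.
aws = Connector("aws-paris", "aws", [Circuit(vlan_id=101, peer_asn=64512)])
azure = Connector(
    "azure-london",
    "azure",
    # Azure expects a primary and a secondary connection, each with its own BGP session.
    [Circuit(vlan_id=201, peer_asn=65010), Circuit(vlan_id=202, peer_asn=65010)],
)
link = Link(aws, azure)
print(f"{link.a.name} <-> {link.b.name}, monitored KPIs: {', '.join(link.kpis)}")
```

The point of the sketch is that the per-provider differences (one circuit for AWS, two for Azure) stay inside the Connector, while a Link only ever deals with two Connectors.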
The Cloud Service Providers side
So, let’s imagine we have an EC2 instance in a VPC (Virtual Private Cloud) in AWS Paris and a VM in a VNet (Virtual Network) in Azure London, each with its own private (RFC 1918) address space. We want them to be able to talk to each other without ever using the Internet. We can achieve this by connecting the VPC to Direct Connect on the AWS side and the VNet to ExpressRoute on the Azure side, and by setting up InterCloud in between, with two Connectors and a Link. It sounds simple, but how does it work in practice? Bear with me.
AWS Direct Connect diagram (from User Guide)
AWS allows for three connection models: the dedicated connection, the hosted connection, and the legacy hosted virtual interface (VIF) model, which is in the process of being deprecated.
To keep things simple, we’ll work here with the legacy hosted VIF model. In this model, an AWS partner such as InterCloud, connected to AWS through Direct Connect at a Direct Connect location, must set up a connection on AWS (AWS calls it a dxcon) with a specific port speed. The partner then creates a VIF (dxvif) with a unique VLAN tag and the other required networking information (the private RFC 6996 ASN for BGP, the address family, the peering addresses chosen from a /31 RFC 3927 subnet…) and configures its own networking devices accordingly. Finally, the partner offers the hosted VIF to the client’s AWS account.
Once the client accepts the VIF, it is activated, and BGP advertises the client’s VPC routes to the partner’s router. As a side note, the process differs in the hosted connection model: the partner sets up its networking devices and offers a dxcon to the client’s AWS account. The client must accept it and then create the dxvif on this hosted dxcon with the information provided by the partner.
Now, as we saw earlier, Azure has its own private connection service called ExpressRoute, and things differ both in the process of establishing the connection and at the technical level. Indeed, Azure enforces redundancy at the peering location with two connections.
Here, the client has to create an ExpressRoute Circuit, specifying the bandwidth they wish to allocate to this connection, the partner they want to connect with, and the peering location. When the client creates an ExpressRoute Circuit, Azure grants them a Service Key that they have to give to the partner to finish the configuration. Once the partner has the Service Key, it finishes the provisioning by configuring the Microsoft Azure side with the appropriate networking information. The quirk here is that there are two distinct peering addresses, chosen from an RFC 3927 /30 subnet.
To accept hosted VIFs (or to accept hosted connections and create a VIF) in their AWS environment, or to create an ExpressRoute circuit in their Azure environment, the client can either do it manually or delegate this responsibility to InterCloud by providing us with the appropriate credentials.
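As an illustration of the networking information mentioned above, the following sketch checks a VLAN tag, a private ASN, and a /31 link-local peering subnet using only the Python standard library. The function names and the dictionary keys are hypothetical; they are not the field names of the AWS API.

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")  # RFC 3927 link-local range
PRIVATE_ASN_16 = range(64512, 65535)                 # RFC 6996: 64512..65534
PRIVATE_ASN_32 = range(4200000000, 4294967295)       # RFC 6996: 4200000000..4294967294


def is_private_asn(asn: int) -> bool:
    """True if the ASN falls in one of the RFC 6996 private-use ranges."""
    return asn in PRIVATE_ASN_16 or asn in PRIVATE_ASN_32


def peering_pair(subnet: str):
    """Return the two addresses of a /31 taken from the link-local range."""
    net = ipaddress.ip_network(subnet)
    if net.prefixlen != 31 or not net.subnet_of(LINK_LOCAL):
        raise ValueError(f"{subnet} is not a /31 inside {LINK_LOCAL}")
    first, second = list(net)  # a /31 holds exactly two addresses
    return first, second


def hosted_vif_params(vlan: int, asn: int, subnet: str) -> dict:
    """Bundle the validated values; the key names are illustrative only."""
    if not 1 <= vlan <= 4094:
        raise ValueError("an 802.1Q VLAN tag must be between 1 and 4094")
    if not is_private_asn(asn):
        raise ValueError(f"ASN {asn} is not in an RFC 6996 private range")
    side_a, side_b = peering_pair(subnet)
    return {
        "vlan": vlan,
        "asn": asn,
        "address_family": "ipv4",
        "peer_addresses": (f"{side_a}/31", f"{side_b}/31"),
    }


print(hosted_vif_params(vlan=101, asn=64512, subnet="169.254.10.0/31"))
```

Which side of the peering takes which address is deliberately left out, since that depends on the agreement between the partner and the provider.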
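The Service Key handshake can be pictured with a small simulation. This is only a sketch under stated assumptions: the class, the function, and the sample values are invented for the example, and nothing here calls the real Azure API.

```python
import ipaddress
import uuid
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class ExpressRouteRequest:
    """What the client declares when creating the circuit (hypothetical shape)."""
    bandwidth_mbps: int
    provider: str
    peering_location: str
    # Stand-in for the Service Key that Azure hands back to the client.
    service_key: str = field(default_factory=lambda: str(uuid.uuid4()))
    peering: Optional[Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]] = None


def partner_provision(request: ExpressRouteRequest, presented_key: str, subnet: str):
    """Partner-side step: only possible with the key given by the client."""
    if presented_key != request.service_key:
        raise PermissionError("service key does not match this circuit")
    net = ipaddress.ip_network(subnet)
    if net.prefixlen != 30:
        raise ValueError("expected a /30 peering subnet")
    # A /30 has four addresses; the two usable ones are the peering addresses.
    first, second = net.hosts()
    request.peering = (first, second)
    return request


circuit = ExpressRouteRequest(1000, "InterCloud", "London")
partner_provision(circuit, circuit.service_key, "169.254.20.0/30")
print(circuit.peering)
```

The ordering is the part worth noticing: the client acts first, and the partner cannot finish the provisioning until the key has changed hands.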
At this point, the astute and impish reader understands that every cloud provider has its own custom interfaces. Well, it is a jungle out there. It is an exciting one, but a jungle nonetheless. We can automate some CSPs, such as Azure and AWS, through their APIs. However, some CSPs require a request and ticketing process. To make automated deployment possible and to handle the situations where the CSP only allows for a semi-automated deployment, we had to factor all these different scenarios and their quirks into our state machine. But that’s what we do: we simplify and streamline. Let’s have a look at our software stack.
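A state machine that covers both the automated and the semi-automated case could look roughly like the sketch below. The state names and transitions are assumptions made for illustration, not a description of InterCloud's actual state machine.

```python
from enum import Enum, auto


class State(Enum):
    REQUESTED = auto()
    VALIDATED = auto()
    WAITING_ON_CSP = auto()  # semi-automated path: a ticket is open with the CSP
    CONFIGURING = auto()
    DEPLOYED = auto()
    FAILED = auto()


# Allowed transitions; anything not listed here is rejected.
TRANSITIONS = {
    State.REQUESTED: {State.VALIDATED, State.FAILED},
    State.VALIDATED: {State.CONFIGURING, State.WAITING_ON_CSP, State.FAILED},
    State.WAITING_ON_CSP: {State.CONFIGURING, State.FAILED},
    State.CONFIGURING: {State.DEPLOYED, State.FAILED},
    State.DEPLOYED: set(),
    State.FAILED: set(),
}


class Deployment:
    """Tracks one Connector deployment through the states above."""

    def __init__(self, csp_has_api: bool):
        self.csp_has_api = csp_has_api
        self.state = State.REQUESTED
        self.history = [self.state]

    def advance(self, target: State) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {target.name}")
        if target is State.WAITING_ON_CSP and self.csp_has_api:
            raise ValueError("CSPs with an API skip the ticketing state")
        self.state = target
        self.history.append(target)


# A CSP that needs a ticket takes the longer path.
manual = Deployment(csp_has_api=False)
for step in (State.VALIDATED, State.WAITING_ON_CSP, State.CONFIGURING, State.DEPLOYED):
    manual.advance(step)
print([s.name for s in manual.history])
```

Keeping the ticketing wait as an explicit state means the rest of the flow stays identical whether or not the CSP exposes an API.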
What we provide: Portal and APIs
So we have a Portal (not related to whether the cake is a lie) allowing our customers to retrieve information, such as networking KPIs, and to manage their resources. As we use an API-first approach, whatever the Portal does, it does through our REST APIs. In other words, we are ourselves clients of our APIs. Our customers can use the Portal, the APIs, or both, depending on their automation needs.
We built our software architecture around containerized microservices orchestrated with Docker Swarm. The Portal itself uses React, and we write our microservices in Go and expose them through an API Gateway. We have chosen the Kong API Gateway, which offers a rich list of plugins. It allows us to centralize and orchestrate functionalities such as authentication and logging for all our endpoints. As we’re keen on using industry standards, we use the OpenAPI v2 (Swagger) specification to document our API endpoints.
Simplified diagram of InterCloud’s software stack
As for the deployment, we deploy our Portal and APIs infrastructure on AWS with Terraform, and we ensure their redundancy by using multiple AWS Availability Zones. We like Terraform, and in fact, we love what the folks at HashiCorp are doing, so we also use Consul for service discovery and configuration. We use some services provided by AWS: CloudWatch to monitor deployed services, Amazon MQ to mediate communication between the API and the backend workers, RDS, well, to store configuration and users, and Route 53 for fast, highly available domain name resolution.
That’s a quick snapshot of our ever-evolving software stack. The bottom line is: we use the tools we need when we need them, and we are not afraid to change them for better ones if required. We always challenge our past choices and assumptions, and we move fast, in an agile way, with short sprints to deploy features as soon as they are ready.
An epic user story
Now that we’ve got the building blocks, let’s have fun and plug everything together. A client of InterCloud has bought a Connector and wants to deploy it using the Portal. They log in to the Portal and ask for the deployment of this Connector. The client then chooses where to deploy the Connector. For instance, if it is an AWS Europe Connector, they may deploy it on the on-ramps relevant to this particular Cloud Service Provider.
A service delivery process starts, asking the client to provide InterCloud with the required technical information. When the user provides this information, the APIs receive it and validate it. The service delivery process then continues, and our backend selects an appropriate edge and generates a configuration, taking into account the state of our routers and the previously deployed configurations. If everything looks fine, the backend puts messages in the service queue, destined for the workers that configure networking on both our routers and the AWS side. The workers pull their jobs and deploy the configuration. A different process then validates the generated configuration (better safe than sorry) and commits it.
The client goes through the same process for an Azure Connector and then sets up a Link between the two Connectors. And there it is! Automated provisioning of connectivity. The magic of networking does the rest.
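On the backend side, the steps of this story (validate, pick an edge, generate a configuration, queue jobs, let workers pull them, then validate and commit) can be sketched with an in-process queue. Everything here is a simplified assumption: the edge names, the field names, and the use of Python's `queue` module stand in for the real message broker and workers.

```python
import queue

# Hypothetical edges with the number of Connectors already deployed on each.
EDGES = {"edge-par-1": 12, "edge-par-2": 7}
REQUIRED_FIELDS = {"connector_id", "vlan", "asn"}


def validate_request(req: dict) -> None:
    """API-side check of the technical information sent by the client."""
    missing = REQUIRED_FIELDS - set(req)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not 1 <= req["vlan"] <= 4094:
        raise ValueError("VLAN tag out of range")


def generate_config(req: dict, edges: dict) -> dict:
    """Pick the least-loaded edge and attach it to the request data."""
    edge = min(edges, key=edges.get)
    return {"edge": edge, **req}


def enqueue_jobs(config: dict, jobs: queue.Queue) -> None:
    """One job for our routers, one for the CSP side."""
    for target in ("router", "csp"):
        jobs.put({"target": target, "config": config})


def run_worker(jobs: queue.Queue) -> list:
    """Pull jobs until the queue is empty; return the targets handled."""
    handled = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return handled
        handled.append(job["target"])  # a real worker would push config here
        jobs.task_done()


def validate_and_commit(config: dict, committed: list) -> None:
    """Separate final check before the configuration is committed."""
    if config["edge"] not in EDGES:
        raise ValueError("unknown edge in generated configuration")
    committed.append(config)


request = {"connector_id": "conn-aws-eu", "vlan": 101, "asn": 64512}
validate_request(request)
config = generate_config(request, EDGES)
jobs = queue.Queue()
enqueue_jobs(config, jobs)
handled = run_worker(jobs)
committed = []
validate_and_commit(config, committed)
print(config["edge"], handled, len(committed))
```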
Sample client-side Python API script
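As an illustration of the flow in the user story, a client-side script might look something like the sketch below. The base URL, endpoint paths, payload fields, and token are all hypothetical and do not reflect InterCloud's documented API. The script only builds the HTTP requests; the `send` helper that would actually perform them is defined but never called.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical base URL
TOKEN = "REPLACE_ME"                      # hypothetical bearer token


def build_request(method: str, path: str, payload: dict) -> urllib.request.Request:
    """Prepare an authenticated JSON request without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


def send(req: urllib.request.Request) -> dict:
    """Perform the call and decode the JSON body (not invoked in this sketch)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# The three steps of the story: deploy two Connectors, then link them.
steps = [
    build_request("POST", "/connectors/aws-paris/deploy",
                  {"location": "paris", "aws_account_id": "123456789012"}),
    build_request("POST", "/connectors/azure-london/deploy",
                  {"location": "london", "service_key": "<service-key>"}),
    build_request("POST", "/links",
                  {"from": "aws-paris", "to": "azure-london"}),
]

for req in steps:
    print(req.get_method(), req.full_url, json.loads(req.data))
```

Building the requests separately from sending them makes it easy to review, in a CI/CD pipeline for instance, what a script is about to change before it changes it.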
We learned a lot while building an automated provisioning feature for our clients. In fact, we are learning every day, and we are always looking for areas to improve so that we can provide the best possible service to our customers. This article paints a broad picture, but every single technical brush stroke here is the result of a lot of trials, errors, and hard-won wisdom. That is a story for our awesome technical teams to tell in future articles.
Oh, and by the way, if you want to work with us, we’re hiring! 💪