Two platitudes reign supreme in tech: “Change is the only constant” and “Hindsight is 20/20.” As our industry has evolved over the last few decades, things have moved from imperative to object-oriented and on to functional programming; from waterfall to agile processes; from centralized to distributed version control systems; and from desktop GUIs to web and then mobile apps. Even the way we develop, push, build, test, deploy and monitor our precious code continues to evolve, with a strong focus on cloud-based solutions and continuous delivery. As these new technologies have emerged, what’s interesting is how obvious they seem by the time they reach wide adoption, but only in retrospect. Until that point, there’s often uncertainty, speculation, and debate.
This uncertainty is what separates the early and late adopters (well, and those who adopt the wrong technologies; sorry, Betamaxers). Early adopters of new technologies exploit their cutting-edge benefits much sooner, but at the cost of significant resources spent perfecting them. Late adopters, on the other hand, can avoid the pitfalls thanks to those who have already paved the way for them, though they risk falling behind the competition in the meantime.
If you are a Google-sized player, you can easily absorb the costs of being an early adopter and enjoy its benefits. For smaller companies, these costs may be prohibitive or the risks involved simply too high to cope with.
So, if you’re a smaller company, how can you gain the benefits of being an early adopter while avoiding most of the risks? Well, at Medallia, we decided we needed to be a smart early adopter. What does being a smart early adopter mean? It means looking ahead into the future, gathering data, to figure out which tech solutions will seem obvious in the years ahead. Our ongoing adoption of microservice architectures highlights this approach.
What are microservice architectures? They’re the successors of service-oriented architectures (SOAs), and they have been slowly gaining traction over the last few years. When we hear about SOAs, many of us think of the big enterprise push from software giants, products such as IBM’s WebSphere, and myriad XML and WS-* protocols. Microservice architectures, however, are a more recent generation of SOAs with grassroots origins. These architectures embrace principles like clearly defined APIs, unified service discovery, and continuous deployment. None of these principles is particularly new, but microservice architectures combine them in such a natural and symbiotic way that these ideas have recently gained new relevance.
The Medallia Engineering Team saw a big opportunity in this emerging technology. To be agile early adopters with less risk, we spoke with engineering leaders at some of the world’s biggest and most successful companies to better understand microservice architectures. Here’s what we found:
Synchronous Inter-service Communication
One aspect that we covered during our conversations was synchronous inter-service communication. We discussed HTTP/JSON versus binary solutions such as Thrift or Protocol Buffers. While pretty much everyone uses HTTP for public APIs, there is great diversity in how internal communication is handled. Using Thrift or Protobuf internally has the advantages of rock-solid schema enforcement, backwards/forwards compatibility, and better out-of-the-box performance compared to HTTP. On the other hand, using HTTP both internally and externally has the advantage of uniformity, with the implied benefits in developer agility.
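Part of the performance gap comes from JSON being self-describing: field names travel with every message, while a schema-based binary protocol only puts values on the wire. A rough sketch of this difference (the record and byte layout below are illustrative, not Thrift's or Protobuf's actual wire format):

```python
import json
import struct

# Hypothetical account record; field names are illustrative.
username, role_id = "yolanda", 33

# Self-describing HTTP/JSON payload: field names repeat in every message.
json_payload = json.dumps({"username": username, "role": role_id}).encode("utf-8")

# Schema-based binary payload, sketched with struct: the field meanings live
# in a shared schema, so only the values go on the wire.
# Layout: 1-byte name length, name bytes, 4-byte unsigned role id.
name_bytes = username.encode("utf-8")
binary_payload = struct.pack(f"!B{len(name_bytes)}sI", len(name_bytes), name_bytes, role_id)

print(len(json_payload), len(binary_payload))  # the binary form is noticeably smaller
```

The flip side, of course, is that the binary payload is meaningless without the schema, which is exactly where the schema-enforcement discipline of Thrift/Protobuf comes from.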
Regardless of the choice of technology, many must-have components need to be manually added on top of either HTTP or binary protocols. Libraries such as Hystrix and Finagle emerged to deal with client robustness, Servo and Zipkin for monitoring, and the list goes on.
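The client-robustness idea behind libraries like Hystrix can be illustrated with a minimal circuit breaker. The class below is a Python sketch, not Hystrix's actual API, and the thresholds are made up:

```python
import time

class CircuitBreaker:
    """Trips open after consecutive failures; callers then fail fast
    instead of piling up requests against a struggling service."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds before retrying a tripped circuit
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

Real libraries layer timeouts, bulkheads, and fallbacks on top of this basic pattern, but the fail-fast core is the same.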
In an app-driven world, many microservices need to be at least partially reachable from the outside world. The simplest approach is to allow each external request (e.g., coming from an external-facing API) to go directly to the service that it’s trying to reach. This implies that the logic for securing requests (e.g., authenticating the user or app key) must be replicated on every service.
Most of the companies we spoke with agreed that this is probably not the best way to go about it; they found it much more effective to establish a clear demarcation layer. External requests get properly authenticated and routed to the appropriate service. Once secured, the external request is converted into an internal request that gets treated much as if it had originated from another internal service. This demarcation layer acts as a reverse proxy and allows us to keep the logic for securing external requests in a single place.
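The demarcation layer can be sketched as a thin gateway that authenticates once and then forwards an internal request. The routing table, API keys, and service names below are hypothetical stand-ins for a real key store and service registry:

```python
# Hypothetical API keys and routing table; a real gateway would back these
# with a key store and a service discovery mechanism.
VALID_API_KEYS = {"app-key-123": "mobile-app"}
ROUTES = {"/accounts": "account-service", "/roles": "role-service"}

def handle_external_request(path, api_key, payload):
    # 1. Authenticate at the demarcation layer, in one place only.
    app = VALID_API_KEYS.get(api_key)
    if app is None:
        return {"status": 401, "error": "invalid API key"}
    # 2. Route to the service that owns this path.
    service = ROUTES.get(path)
    if service is None:
        return {"status": 404, "error": "unknown path"}
    # 3. Convert to an internal request; downstream services never see
    #    the external API key, only the already-authenticated caller.
    internal_request = {"service": service, "caller": app, "payload": payload}
    return {"status": 200, "forwarded": internal_request}
```

Every service behind the gateway can then trust that incoming internal requests have already been authenticated.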
A different problem is raised by the variety of request formats. External requests are most likely to be HTTP/JSON, which is the ubiquitous technology at that level. Internal requests, however, may be using a different technology such as Thrift. Will the demarcation layer be responsible for translating the HTTP/JSON request into a Thrift equivalent? Or will services need to understand both Thrift for internal requests and HTTP/JSON for external ones?
While pondering these alternatives we realised that external apps are unlikely to consume services in the same way that internal services do. While a uniform external/internal API seemed the most elegant way to go, treating internal and external requests the same way is hardly ever the right choice. Internal requests may handle a lot of IDs or data that is not ready to be consumed by an external party. For example, let’s say we have an Account service and a Roles service. An account has a role, stored as an internal integer ID for that role. So account ‘yolanda’ may have role ‘33’. When exposing this internally in an API, it is perfectly fine to return ‘username: yolanda, role: 33’. However, if we were to expose this externally, we might instead replace role ‘33’ with its role name by querying the Roles service, yielding ‘username: yolanda, role: manager’ to an external app.
Performing these service compositions on our side, as opposed to leaving it up to the client apps, has quite interesting advantages:
- Efficiency. External request roundtrips are quite expensive, so if an app has to fetch ‘username: yolanda, role: 33’ only to make a second query for the name of role ‘33’, that’s twice as many requests as performing the composition on our side, where internal network calls are orders of magnitude faster than calls over the public network.
- Consistency. If we leave the composition up to each app, then we are bound to have code duplication: each app needs to perform the composition itself. Even slight variations in how these apps handle the composition can yield inconsistencies in the user experience. For example, while most apps will request the name of role ‘33’, a few of them might instead ask for its short name, yielding ‘mgr’ instead of ‘manager’.
- Security. In many cases we don’t want to expose a full object with all its complexity or internal fields to the outside world. If we have the same API for internal and external requests, there is no way to filter out the fields we show in each case. On the other hand, by performing compositions on our side we can craft the returned objects so that their visibility is limited to what we decide to show to external apps. For example, the Account service may also expose the e-mail address internally, but we may strip that out at the composition layer in order to protect our customers’ PII.
- Plurality. The composition layer is the natural place to translate an HTTP/JSON request into several requests performed internally via Thrift (or other transports). This layer should be absolutely stateless, and there is a wide range of programming languages that can be used here, node.js being a salient example.
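The ‘yolanda’ example above can be sketched as a stateless composition step. The two service stubs below are stand-ins for real Account and Roles services:

```python
# Stand-ins for the internal Account and Roles services.
def account_service_get(username):
    return {"username": username, "role": 33, "email": "yolanda@example.com"}

def role_service_get_name(role_id):
    return {33: "manager"}.get(role_id, "unknown")

def compose_external_account(username):
    """Build the externally visible view: resolve the role ID and strip PII."""
    account = account_service_get(username)
    return {
        "username": account["username"],
        # Efficiency: one fast internal hop resolves the ID server-side,
        # sparing the app a second roundtrip over the public network.
        "role": role_service_get_name(account["role"]),
        # Security: the internal-only email field is deliberately dropped.
    }

print(compose_external_account("yolanda"))
# → {'username': 'yolanda', 'role': 'manager'}
```

The internal API keeps returning the raw ‘role: 33’, while every external app sees the same composed, filtered view.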
Securing inter-service communication seems to be an area where much remains to be improved. During our exploration phase we discovered that while most companies secure external requests via API keys and mechanisms such as OAuth2, internal communication security is very thin and only applied where strictly required for compliance. In our case, we envision third-party services running in our infrastructure, so we must think about security for internal requests from the get-go. However, traditional mechanisms such as HTTPS with client- and server-side authentication via certificates carry significant overhead.
Service discovery is an area where we found various technologies, such as Eureka and ZooKeeper, competing for adoption. While these technologies differ in how they are used and maintained, the general concept of service discovery is pretty similar in either case: clients need to contact services, and a naming mechanism is used, optionally parameterized by version or other elements such as tenant. A service can be running as many instances, so load balancing comes into play. The most flexible approach we found so far is to have the service discovery mechanism return a ‘service contacting policy’ that indicates the different URLs where the service can be found, their health status, their relative load-balancing weights, and other elements such as overrides for steering specific user requests to specific instances (particularly useful for rolling out features to a small, fixed set of users). The client needs to make sense of this policy and honor it, usually with the help of a client library, so that the policy-honoring logic does not have to be re-implemented by each service.
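A ‘service contacting policy’ of this kind might look like the sketch below. The field names, URLs, and the weighted pick are illustrative, not the format of any specific discovery product:

```python
import random

# Illustrative policy as a discovery mechanism might return it.
policy = {
    "service": "account-service",
    "instances": [
        {"url": "http://10.0.0.1:8080", "healthy": True, "weight": 3},
        {"url": "http://10.0.0.2:8080", "healthy": True, "weight": 1},
        {"url": "http://10.0.0.3:8080", "healthy": False, "weight": 1},
    ],
    # Steering override: route these users to a specific (e.g., canary) instance.
    "user_overrides": {"yolanda": "http://10.0.0.2:8080"},
}

def pick_instance(policy, user=None):
    """Honor the policy: overrides first, then a weighted choice
    among the healthy instances."""
    override = policy["user_overrides"].get(user)
    if override is not None:
        return override
    healthy = [i for i in policy["instances"] if i["healthy"]]
    urls = [i["url"] for i in healthy]
    weights = [i["weight"] for i in healthy]
    return random.choices(urls, weights=weights, k=1)[0]
```

In practice this logic lives in the shared client library, so individual services never re-implement weight handling or override steering.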
For asynchronous communication we’ve learned of various approaches, ranging from polling mechanisms all the way up to full-fledged async solutions based on Kafka. We believe that a good philosophy for async communication is to use it almost exclusively for notifications with minimal data. For instance, services notify other services when the data or resources they own change: the Account service will notify other services if ‘yolanda’ changes her role. These notifications should contain only the primary key of the elements that have changed, so they can be really short and generally won’t carry sensitive information. If a service wants to learn more after having received an async invalidation message, it can fetch more info via synchronous communication as described above. This lets us focus on securing synchronous messages only and avoids duplicate effort in that regard.
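The invalidation pattern can be sketched as follows, with an in-memory queue standing in for a broker such as Kafka (all names here are hypothetical):

```python
from collections import deque

# In-memory queue standing in for a message broker like Kafka.
notifications = deque()

# Authoritative state owned by the Account service.
ROLES = {"yolanda": "member"}

def account_service_get_role(username):
    return ROLES[username]

# A subscriber keeps a local cache and treats notifications as invalidations.
cache = {"yolanda": "member"}

def publish_role_change(username, new_role):
    ROLES[username] = new_role
    # The notification carries only the primary key: short, and no
    # sensitive data travels over the async channel.
    notifications.append({"event": "role_changed", "username": username})

def consume_one():
    event = notifications.popleft()
    username = event["username"]
    # Fetch the details via the synchronous API described above.
    cache[username] = account_service_get_role(username)

publish_role_change("yolanda", "manager")
consume_one()
print(cache["yolanda"])  # → manager
```

Because the payload is just a key, the async channel needs no extra security work; all sensitive data flows over the already-secured synchronous path.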
Monitoring and Debugging
A few important areas where microservice architectures need careful consideration are monitoring and debugging. In a traditional standalone system, monitoring is attached to a few vitals and KPIs, and debugging is done after the fact by inspecting the logs. When dealing with hundreds of services, though, it is imperative to have uniform mechanisms to ensure monitoring is done in a consistent and pervasive manner. Also, logs need to be aggregated, ideally with the ability to rapidly trace individual requests and understand their ‘communication tree’.
Most of the challenges are around dealing with scale: a multiplicity of services with tons of requests that bounce back and forth between them. Standardizing the instrumentation mechanisms, as well as monitoring the network itself (and not just what happens inside each service), are key ingredients in the solutions we’ve discussed.
A key insight is to think of the log and its messages as a stream of real-time events, and not so much as just a text file to be stored on disk.
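Treating the log as an event stream suggests emitting structured records with a propagated request ID, so individual requests can be traced across services and their ‘communication tree’ reconstructed. The field names below are illustrative:

```python
import json
import time
import uuid

def log_event(service, request_id, message, **fields):
    """Emit one structured event as JSON; in production this line would
    be shipped to a log aggregation pipeline rather than returned."""
    event = {
        "ts": time.time(),
        "service": service,
        "request_id": request_id,  # propagated unchanged across service hops
        "message": message,
        **fields,
    }
    return json.dumps(event)

# The same request_id ties together events emitted by different services.
request_id = str(uuid.uuid4())
line1 = log_event("gateway", request_id, "external request accepted", path="/accounts")
line2 = log_event("account-service", request_id, "account lookup", username="yolanda")
```

An aggregator can then filter the whole stream by `request_id` and replay one request's journey through the architecture, which is essentially what tracing systems like Zipkin build on.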
Finally, a common theme among companies adopting microservice architectures is that they acknowledged devoting a big chunk of their engineering effort to building and maintaining core infrastructure such as service discovery or inter-service communication. For big companies this is manageable, but it remains to be seen how well it can be replicated in smaller, more product-driven companies.
In short, microservice architectures are still far from being a turn-key solution. While success stories abound, there is a big price tag attached in terms of engineering effort and innumerable failed attempts. At Medallia, we’re at the early days of something bigger than ourselves. We are in this to learn from the early adopters — and in doing so, become a smarter early adopter ourselves.