In this article we will tell you why and how we switched from REST to GraphQL in combination with Apollo Federation and how our lives have improved ever since.
MOIA Operations — what’s that?
As you can imagine, MOIA is not only about booking trips and pooling passengers. A big part of our daily work happens in the background: getting vehicles and their drivers ready to be out on the street. We call this part of MOIA “Operations”, and our mission is to build a digital product around this whole lifecycle. This involves the management of hubs, employees, and our beloved battery-electric vehicles.
Our drivers get an app that lets them communicate with their managers, look at their shifts, check in at the hub, and open the vehicle they will drive throughout the day.
At the same time, we try to provide our hub agents with the best possible overview of what is going on not only at their hubs, but also in the field.
Several teams are involved in developing these products. One team focuses on employee management, another on service areas, hubs, and break locations, and a third on the vehicles: for example, how to open and close them, their state of charge, and their location updates.
A classic challenge — distributed services and how they communicate
As we at MOIA are firm believers in decoupling and team autonomy, all these teams can decide themselves how they want to handle their entities, but they need to be able to interact with each other in a defined way.
Now if you think about the fleet map our hub agents like to look at, you will realize that we need to query and join data from all these teams via their REST APIs: we need a list of vehicles for the service area we are looking at, and then for each of these vehicles we want to know who is driving it, which hub it started from, and which hub it will return to. Obviously, you could make sequential calls to different endpoints, each of which may have a different API and a different way of documenting said API. However, this approach puts a lot of effort and responsibility on the UI and is somewhat brittle.
“A trap is only a trap if you don’t know about it. If you know about it, it’s a challenge.”
― China Miéville
Enter GraphQL. GraphQL is an alternative to the REST API approach that we were using. The big advantage of GraphQL is its schema-first approach. It standardises the way APIs are designed and documented. The schema also makes the API much more approachable. In our REST API solution, API documentation was diverse: it ranged from OpenAPI specifications and copy text to Postman collections and self-documenting code. So a developer building a piece of UI had to deal not only with the different API endpoints of different services but also with a diverse array of documentation. With GraphQL this developer experience can be significantly improved.
But GraphQL alone does not solve the main problem. Different services still expose different GraphQL endpoints with different schemas, and these schemas are not linked or connected in any way.
So while GraphQL by itself comes with a multitude of useful features, they were not attractive enough for most of our teams to warrant a migration of our entire microservice landscape from REST. After all, we had many services that communicated with browser and mobile applications, so adapting all of them to use GraphQL would have required a significant effort.
Meet Apollo Federation, a graph of graphs
Apollo Federation changed this perspective. With Apollo Federation, clients communicate with a central gateway instead of with the individual downstream services, each of which provides a graph (also known as a subgraph). Additionally, the schema of each subgraph is extended by Federation-specific types, queries, and directives to facilitate communication between the gateway and the subgraphs. This comes with several advantages:
- Instead of having to query downstream services specifically, only one endpoint (the gateway) needs to be queried. Clients do not need to know the endpoints of every microservice.
- The gateway provides a unified interface defined by a supergraph composed of all connected subgraphs. This way, clients can interface with every service within the composed graph in a single request.
All these details are transparent to clients. To them the gateway is a regular GraphQL service following the standard protocol, thus no changes are necessary on the client side to support Federation. If you are interested, further technical details of Apollo Federation can be found in the specification document.
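From the client's perspective, a federated query is an ordinary GraphQL request sent to one URL. The following TypeScript sketch builds such a request. The gateway URL, the `vehicleInteractions` field, and its sub-fields are hypothetical names chosen for illustration and are not taken from our actual schema.

```typescript
// Hypothetical gateway endpoint; clients only ever need this one URL.
const GATEWAY_URL = "https://gateway.example.com/graphql";

// One query spanning data that lives in three different services.
// All field names are assumptions made for this sketch.
const VEHICLE_INTERACTIONS_QUERY = `
  query VehicleInteractions($vehicleId: ID!) {
    vehicleInteractions(vehicleId: $vehicleId) {
      action
      timestamp
      employee { name }
      hub { name }
    }
  }
`;

interface GraphQLRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a plain GraphQL POST request: a JSON body with `query` and `variables`.
function buildVehicleInteractionsRequest(vehicleId: string): GraphQLRequest {
  return {
    url: GATEWAY_URL,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: VEHICLE_INTERACTIONS_QUERY,
      variables: { vehicleId },
    }),
  };
}

// Example: the resulting parts can be handed to any HTTP client.
const gatewayRequest = buildVehicleInteractionsRequest("vehicle-1");
```

Nothing in this request is Federation-specific, which is the point: the client does not need to know which subgraph serves which field.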
The following example attempts to illustrate this further. At MOIA, employees can open and close vehicles via an app or a web interface. For daily operations it is therefore of interest to query vehicle interactions. We need to know who opened or closed a vehicle at which hub and at which point in time.
In a non-federated setup, the first step would be to query the fleet service for the vehicle interactions of a specific vehicle. The fleet service then responds with a list of vehicle interactions that also contains employee and hub IDs. The client can use these IDs to query further information about the employees and the hubs. As you can see, the client would have to query three services and join this information manually.
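To make the manual join concrete, here is a minimal TypeScript sketch of what the client has to do in the non-federated case. The three services are replaced by in-memory, synchronous stand-ins so the example runs on its own; in reality each lookup would be an HTTP call. All type names, function names, and sample data are assumptions for illustration.

```typescript
interface VehicleInteraction {
  action: "OPEN" | "CLOSE";
  timestamp: string;
  employeeId: string;
  hubId: string;
}
interface Employee { id: string; name: string; }
interface Hub { id: string; name: string; }

// In-memory stand-ins for three separate REST services (hypothetical data).
const fleetData: Record<string, VehicleInteraction[]> = {
  "vehicle-1": [
    { action: "OPEN", timestamp: "2021-06-01T08:00:00Z", employeeId: "e-1", hubId: "h-1" },
    { action: "CLOSE", timestamp: "2021-06-01T16:00:00Z", employeeId: "e-2", hubId: "h-1" },
  ],
};
const employeeData: Record<string, Employee> = {
  "e-1": { id: "e-1", name: "Alex" },
  "e-2": { id: "e-2", name: "Sam" },
};
const hubData: Record<string, Hub> = {
  "h-1": { id: "h-1", name: "Hub North" },
};

function getInteractions(vehicleId: string): VehicleInteraction[] {
  return fleetData[vehicleId] || [];
}
function getEmployee(id: string): Employee | undefined {
  return employeeData[id];
}
function getHub(id: string): Hub | undefined {
  return hubData[id];
}

interface InteractionRow {
  action: string;
  timestamp: string;
  employeeName: string | null;
  hubName: string | null;
}

// The client-side join: one call to the fleet service, then one lookup
// per interaction against the employee service and the hub service.
function loadInteractionRows(vehicleId: string): InteractionRow[] {
  return getInteractions(vehicleId).map((interaction) => {
    const employee = getEmployee(interaction.employeeId);
    const hub = getHub(interaction.hubId);
    return {
      action: interaction.action,
      timestamp: interaction.timestamp,
      employeeName: employee ? employee.name : null,
      hubName: hub ? hub.name : null,
    };
  });
}

const rows = loadInteractionRows("vehicle-1");
```

Every UI that needs this view has to repeat this orchestration and know about all three APIs.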
With Federation things get simpler: the schema of our Fleet service denotes that the employee ID in the vehicle interaction response can be resolved to an employee object by an external service. In turn, the schema of our Employee service contains a directive that announces its capability to resolve employee IDs to employee objects. The gateway, aware of all subgraph schemas, is then able to plan and execute queries across the subgraphs to automatically resolve federated objects for clients. This reduces complexity on the client side immensely.
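The following TypeScript sketch illustrates this division of labour under some assumptions. The two schema strings use Federation 1 style directives (`@key`, `@external`), and all type and field names are hypothetical. The gateway function is a toy stand-in for the real query planner: it only mimics the two-step idea of fetching references from one subgraph and resolving them in another, similar in spirit to Federation's `_entities` query.

```typescript
// Fleet subgraph schema (illustrative). It only knows an employee's ID and
// declares Employee as an entity that is owned by another subgraph.
const fleetSchema = `
  type VehicleInteraction {
    action: String!
    timestamp: String!
    employee: Employee
  }
  extend type Employee @key(fields: "id") {
    id: ID! @external
  }
  extend type Query {
    vehicleInteractions(vehicleId: ID!): [VehicleInteraction!]!
  }
`;

// Employee subgraph schema (illustrative). The @key directive announces that
// this service can resolve an Employee from its id.
const employeeSchema = `
  type Employee @key(fields: "id") {
    id: ID!
    name: String!
  }
`;

interface EmployeeRef { __typename: "Employee"; id: string; }
interface Employee extends EmployeeRef { name: string; }
interface FleetInteraction { action: string; timestamp: string; employee: EmployeeRef; }
interface ResolvedInteraction { action: string; timestamp: string; employee: Employee | null; }

// Fleet subgraph: returns interactions with employee references only.
function fleetVehicleInteractions(vehicleId: string): FleetInteraction[] {
  const data: Record<string, FleetInteraction[]> = {
    "vehicle-1": [
      { action: "OPEN", timestamp: "2021-06-01T08:00:00Z", employee: { __typename: "Employee", id: "e-1" } },
      { action: "CLOSE", timestamp: "2021-06-01T16:00:00Z", employee: { __typename: "Employee", id: "e-2" } },
    ],
  };
  return data[vehicleId] || [];
}

// Employee subgraph: turns a batch of references into full objects.
function employeeResolveEntities(refs: EmployeeRef[]): (Employee | null)[] {
  const names: Record<string, string> = { "e-1": "Alex", "e-2": "Sam" };
  return refs.map((ref): Employee | null => {
    const employeeName = names[ref.id];
    return employeeName ? { __typename: "Employee", id: ref.id, name: employeeName } : null;
  });
}

// Toy gateway: step 1 asks Fleet, step 2 resolves all references in one
// batched call to the Employee subgraph, step 3 stitches the results.
function gatewayVehicleInteractions(vehicleId: string): ResolvedInteraction[] {
  const interactions = fleetVehicleInteractions(vehicleId);
  const employees = employeeResolveEntities(interactions.map((i) => i.employee));
  return interactions.map((interaction, index) => ({
    action: interaction.action,
    timestamp: interaction.timestamp,
    employee: employees[index],
  }));
}

const resolved = gatewayVehicleInteractions("vehicle-1");
```

The join logic from the non-federated case has not disappeared. It has moved from every client into the gateway, driven by the schemas.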
Another nifty feature that Apollo Federation offers is type extension. It allows service A to extend types defined by service B. Imagine the Fleet service maintaining a list of vehicles and their current position. With type extension it can extend the Hub type by a list of vehicles currently parked there. Clients are then not only able to query hubs but also the vehicles parked at each hub. Behind the scenes the gateway will request all hubs from the Hubs service and subsequently collect the parked vehicles by communicating with the Fleet service. The result is joined and returned to the client in a transparent manner.
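A sketch of how such a type extension could look, again with Federation 1 style schema strings and hypothetical names (`parkedVehicles`, `Vehicle`, and the sample data are assumptions). The gateway function is a simplified simulation of the two-step resolution described above, not Apollo's actual implementation.

```typescript
// Hubs subgraph (illustrative): owns the Hub type.
const hubsSchema = `
  type Hub @key(fields: "id") {
    id: ID!
    name: String!
  }
  extend type Query {
    hubs: [Hub!]!
  }
`;

// Fleet subgraph (illustrative): adds a field to a type it does not own.
const fleetExtensionSchema = `
  type Vehicle {
    id: ID!
  }
  extend type Hub @key(fields: "id") {
    id: ID! @external
    parkedVehicles: [Vehicle!]!
  }
`;

interface Hub { id: string; name: string; }
interface Vehicle { id: string; }
interface HubWithVehicles extends Hub { parkedVehicles: Vehicle[]; }

// Hubs subgraph: resolver for Query.hubs.
function hubsQuery(): Hub[] {
  return [
    { id: "h-1", name: "Hub North" },
    { id: "h-2", name: "Hub South" },
  ];
}

// Fleet subgraph: resolver for Hub.parkedVehicles, keyed by the hub's id.
function fleetParkedVehicles(hubId: string): Vehicle[] {
  const parkedByHub: Record<string, Vehicle[]> = {
    "h-1": [{ id: "vehicle-1" }, { id: "vehicle-2" }],
  };
  return parkedByHub[hubId] || [];
}

// Toy gateway: fetch hubs first, then let Fleet contribute its extra field.
function gatewayHubsWithVehicles(): HubWithVehicles[] {
  return hubsQuery().map((hub) => ({
    id: hub.id,
    name: hub.name,
    parkedVehicles: fleetParkedVehicles(hub.id),
  }));
}

const hubsWithVehicles = gatewayHubsWithVehicles();
```

Note that the Hubs service needs no change at all for clients to see the new field. Only the Fleet schema grows.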
Our journey towards GraphQL and Apollo Federation was not frictionless. We ran into a few problem areas that we would also like to share here.
Whaddayaknow — naming is important
When developers define a schema in the context of a single service, they always need to be aware of the global context their API will be used in. A query called “findByStatus” might make sense in the context of a service managing vehicle data. But in the global context the name is too generic.
We learned that it is essential to think about naming conventions for types and queries upfront. The first schemas in particular should be designed and reviewed carefully, so they can serve as a good example for later additions. Naming flaws are hard to fix once they are in production.
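One way to picture the problem: in the composed schema, the query fields of all services end up side by side. The small TypeScript sketch below is a toy review aid, not part of Apollo's composition, and its subgraph and field names are made up. It lists query names that more than one service would like to claim.

```typescript
// Hypothetical query field names, grouped by the subgraph that defines them.
const queryFieldsBySubgraph: Record<string, string[]> = {
  fleet: ["findByStatus", "vehicle"],
  employees: ["findByStatus", "employee"],
};

// Returns each query field name that is defined by more than one subgraph,
// together with the subgraphs that define it.
function findNameCollisions(fields: Record<string, string[]>): Record<string, string[]> {
  const owners: Record<string, string[]> = {};
  for (const subgraph of Object.keys(fields)) {
    for (const fieldName of fields[subgraph]) {
      owners[fieldName] = (owners[fieldName] || []).concat(subgraph);
    }
  }
  const collisions: Record<string, string[]> = {};
  for (const fieldName of Object.keys(owners)) {
    if (owners[fieldName].length > 1) {
      collisions[fieldName] = owners[fieldName];
    }
  }
  return collisions;
}

const collisions = findNameCollisions(queryFieldsBySubgraph);
```

Renaming the two queries to something like `vehiclesByStatus` and `employeesByStatus` removes the clash and tells the reader of the global schema what each query returns.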
Leaving the happy path
Another topic we struggled with in the beginning is error handling. When a query hits multiple services, it can easily run into an error in one of them. In this case Apollo Federation will still try to resolve the remaining part of the requested data graph. For example, a user querying for vehicle interactions might not have the permissions to query employees, and thus the employee service refuses to resolve the data. The gateway will still resolve the rest of the data and return it to the client. This can only happen when the schema allows for it: the employee field on the VehicleInteraction type needs to be optional (nullable). Otherwise, the complete query will fail due to a schema violation. The frontend code also needs to be forgiving in such situations. It should just skip rendering missing data instead of failing completely.
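A minimal TypeScript sketch of the forgiving frontend side. It assumes the standard GraphQL response shape with `data` and an optional `errors` list. The field names, the error message, and the sample payload are invented for illustration.

```typescript
interface GraphQLError {
  message: string;
  path?: (string | number)[];
}
interface Employee { name: string; }
interface VehicleInteraction {
  action: string;
  timestamp: string;
  employee: Employee | null; // nullable in the schema, so it may be missing
}
interface InteractionsResponse {
  data: { vehicleInteractions: VehicleInteraction[] } | null;
  errors?: GraphQLError[];
}

// A partial response: the interaction resolved, the employee did not.
const partialResponse: InteractionsResponse = {
  data: {
    vehicleInteractions: [
      { action: "OPEN", timestamp: "2021-06-01T08:00:00Z", employee: null },
    ],
  },
  errors: [
    { message: "Not authorized to query employees", path: ["vehicleInteractions", 0, "employee"] },
  ],
};

// Renders one line per interaction and simply omits the employee when absent.
function renderInteraction(interaction: VehicleInteraction): string {
  const who = interaction.employee ? ` by ${interaction.employee.name}` : "";
  return `${interaction.action} at ${interaction.timestamp}${who}`;
}

// Tolerates both a missing employee and a completely missing data object.
function renderResponse(response: InteractionsResponse): string[] {
  const interactions = response.data ? response.data.vehicleInteractions : [];
  return interactions.map(renderInteraction);
}

const renderedLines = renderResponse(partialResponse);
```

The `errors` entries can still be logged or surfaced as a hint, but they no longer decide whether the rest of the view renders.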
One endpoint to fail them all
From an architectural point of view, the Apollo Federation gateway is a single point of failure, similar to a load balancer or proxy. Our previous approach with separate, independent APIs was more resilient in that regard. We accepted this downside consciously because we think that the benefits outlined above outweigh it. The gateway has also proved to be robust and reliable.
The move towards GraphQL and Apollo Federation helped us to make our API more approachable and concise. With the single federated schema, we can now connect and enrich the separate APIs of different backend services. This is especially beneficial for UI development.
This effect is further amplified by the good tooling support on the client side (e.g., generating TypeScript from the schema).
Overall, the introduction of GraphQL and Apollo Federation was a significant improvement to our development process and was well received by our developer community. Both the ability to easily query data from various services and the implicit contracts that the schema provides help us every day. We have also started to use GraphQL subscriptions, but that will be a topic for a whole new blog post.