Web services, SOAP, REST API: my notes
Building web services can increase reusability and give a higher level of abstraction, but it can also add complexity and increase cost.
Designing web services:
The first approach is to build the app and then add web services functionality on top of it. These apps are developed using an MVC framework, and there is no specific web services layer, just a separate controller to handle the connection between the system and third parties. It can be used with early stage products as it is fast to build. Also, not all apps need APIs, and it takes time to define the abstraction for the web service layer and extra time to debug it.
This monolith approach makes it harder to get people working on the system as the team grows, since they have to understand a lot of the system to make changes. We need to spend our time either building abstractions between different systems or building features.
API First Approach:
We start by building the API contract first, allowing each team to develop their logic, UI, and system separately. This approach helps avoid issues when integrating with third-party systems, mobile apps, and web applications. If we were to build the system first, we might encounter repetitive logic across different clients.
By creating an abstraction layer (our web service API) we provide a unified interface for others to use. This also makes scaling easier, as all scaling happens behind the scenes while clients continue to interact with the API without any changes.
The API first approach is not that easy to implement. We might over-engineer, and we might not provide the right thing for our teams and customers, as things change and we do not know the future.
It can be good for a large, stable system, but not really for a fast-growing, early stage one.
Pragmatic Approach:
Use both approaches, web service and service oriented: only go for APIs when you are really sure that other systems will use them.
While building something for users, try to get as much as possible out and let users try it instead of over-engineering. This way you can understand users better and might create a different or a better product.
I go with a service oriented approach. If I am building a system to support other existing systems, or a product that has a lot of funding and a high chance of getting a paying user base, I will take the API first approach.
Types of web services:
Function Centric:
The ability of a machine to call a function on another machine: it sends the data, the other machine processes it and sends a response back. These services had to deal with network latency, locking, and different ways of serializing and deserializing the data.
There were many protocols for communication between services, such as:
- Common Object Request Broker Architecture (CORBA)
- Extensible Markup Language — Remote Procedure Call (XML-RPC)
- Distributed Component Object Model (DCOM)
- Simple Object Access Protocol (SOAP)
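To make the function centric idea concrete, here is a minimal sketch using XML-RPC from the list above, with Python's standard `xmlrpc.client` module. It only shows the serialize, dispatch, deserialize round trip, with no network involved. The `add` function and the `FUNCTIONS` registry are assumptions made up for the example.

```python
import xmlrpc.client

# Client side: serialize a call to a hypothetical remote function add(2, 3).
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# Server side: a hypothetical registry of functions that can be called remotely.
def add(a, b):
    return a + b

FUNCTIONS = {"add": add}

# Server side: deserialize the request, run the function, serialize the reply.
params, method_name = xmlrpc.client.loads(request_xml)
result = FUNCTIONS[method_name](*params)
response_xml = xmlrpc.client.dumps((result,), methodresponse=True)

# Client side again: deserialize the response.
(reply,), _ = xmlrpc.client.loads(response_xml)
print(reply)  # 5
```

Note that both the function name and its arguments travel inside the XML body, not in the URL.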
SOAP dominated as it was backed by big tech companies like IBM, Oracle, Sun, BEA, and Microsoft. One of its advantages is that it allows us to define the contract and generate the code from it, using XML for serialization and HTTP to transport data.
To use this, we first prepare our Web Services Description Language (WSDL) file, which describes the available methods and endpoints. We also define the data schemas using XML Schema Definition (XSD). A tool can then be used to convert this contract into a library that encapsulates everything. The client application will interact only with this generated library, which is derived from the WSDL and XML schemas.
SOAP features got extended by a set of specs called WS-*, with features like transactions, support for multi-phase commits, and other authentication and encryption methods, which helped a lot in distributed systems. Examples of WS-* are WS-Security and WS-Federation. These rich features come at a cost: they made integration between systems harder, as each provider or system has its own support for and version of WS-*.
Developing systems or services with SOAP in Python, Perl, or PHP was not practical, as these languages did not have enough support and tooling for it. Their use on the client side did not go well either, as they ran into integration issues for the same reasons. This led to the rise of Representational State Transfer (REST) APIs with JavaScript Object Notation (JSON), which allowed easier and cheaper implementation and integration.
SOAP also has a disadvantage when it comes to scalability: an HTTP proxy that caches requests based on the URL and headers cannot cache SOAP requests, because the data and method names are sent inside the XML body.
Another scalability challenge arises when we transition our services from stateless to stateful. Features like transactions and security require the server to remember the state of previous requests. For example, in a payment system, multiple steps are involved in verification, and the system needs to track the last completed step. This introduces complexity in distributed systems, making it harder to add new servers, distribute the load freely, and implement effective load balancing.
Resource Centric Services:
Web services based on functions take a set of arguments and produce data, while web services based on resources treat resources as objects with a limited set of operations that can be performed on them in a standardized way.
REST is a resource centric approach that uses the URL as a unique identifier for each resource, making it easier to use than SOAP, where you need to deal with different WS-* standards for each service you want to integrate with.
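A rough sketch of the resource centric idea, not tied to any real framework: every URL names one resource, and only a small fixed set of operations is allowed on it. The `handle` function, the in-memory `resources` dict, and the `/users/42` URL are all assumptions for illustration.

```python
# Hypothetical in-memory store: each URL identifies exactly one resource.
resources = {}

def handle(method, url, body=None):
    """Apply one of a small, fixed set of operations to the resource at `url`."""
    if method == "GET":
        return (200, resources[url]) if url in resources else (404, None)
    if method == "PUT":
        # Create or fully replace the resource; repeating the call is harmless.
        resources[url] = body
        return 200, body
    if method == "DELETE":
        # Deleting an already deleted resource leaves the same final state.
        resources.pop(url, None)
        return 204, None
    # Anything else is not part of the standardized interface.
    return 405, None

handle("PUT", "/users/42", {"name": "Ada"})
print(handle("GET", "/users/42"))  # (200, {'name': 'Ada'})
```

Compared with the function centric style, a client does not need to learn a new set of function names per service, only the URLs of the resources.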
One thing REST does not handle by itself is exactly-once delivery: you need to design your system to have idempotent operations where repeating the same operation will not affect the final state, add support for transactions, and implement a stateful mechanism to track the processed messages and avoid reprocessing them again.
Now, SOAP can handle exactly-once delivery when it’s set up with the right extensions, but it’s worth noting this isn’t automatic — you need to specifically configure it using something called WS-ReliableMessaging.
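For the REST side, one way to sketch the "track processed messages" idea is to have the client send a unique message id with each request and have the server remember which ids it has already applied. The names `apply_deposit`, `processed_ids`, and the account data are assumptions for the example; in a real system the set of ids would live in a shared, durable store.

```python
balances = {"acct-1": 100}   # hypothetical account state
processed_ids = set()        # stand-in for a shared, durable store of seen ids

def apply_deposit(message_id, account, amount):
    """Apply a deposit at most once, even if the same message arrives again."""
    if message_id in processed_ids:
        # Duplicate delivery (for example a client retry): do nothing.
        return balances[account]
    balances[account] += amount
    processed_ids.add(message_id)
    return balances[account]

apply_deposit("msg-001", "acct-1", 50)
apply_deposit("msg-001", "acct-1", 50)   # retry after a timeout
print(balances["acct-1"])  # 150
```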
When it comes to scalability, REST is better as it allows HTTP caching of GET requests using only the URL. Requests that go through proxies get cached there instead of going to the servers, which reduces the load on the main servers.
Scaling REST web services:
To scale, we need to keep our web service machines stateless, which means that we keep the logic on the servers and keep data and state in shared stores such as databases, caches, and message queues. This gives us a set of advantages:
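A toy illustration of why the URL matters for caching. This is a heavily simplified sketch: a real HTTP cache also looks at headers and expiry, and the `proxy` and `origin` functions and the URLs are made up for the example.

```python
origin_hits = 0      # counts requests that actually reach the origin servers
proxy_cache = {}     # hypothetical proxy cache, keyed by URL only

def origin(method, url, body=None):
    global origin_hits
    origin_hits += 1
    return f"response for {method} {url}"

def proxy(method, url, body=None):
    """Serve GET responses from the cache; pass everything else through."""
    if method != "GET":
        # SOAP style: a POST where the real operation hides in the XML body.
        return origin(method, url, body)
    if url not in proxy_cache:
        proxy_cache[url] = origin(method, url)
    return proxy_cache[url]

for _ in range(3):
    proxy("GET", "/users/42")                               # REST style read
for _ in range(3):
    proxy("POST", "/soap-endpoint", "<getUser id='42'/>")   # same URL for every call

print(origin_hits)  # 4: one GET reached the origin, all three POSTs did
```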
- Distribute traffic freely among web service machines.
- Ability to remove machines from the load balancing pool in case of failure, to prevent users from getting timed-out requests.
- Graceful removal: the ability to take a machine out of the server pool so that new connections are not sent to it. You then wait until the old connections are closed before shutting the server down for maintenance or other things.
- Zero-downtime updates: we can update server by server until everything is updated.
- Adding more clones and machines for more computing power.
- Using cloud services, we are able to add more machines and scale easily.
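The graceful removal point from the list above can be sketched with a toy load balancer pool. This is only an illustration under simple assumptions: round-robin routing, connection counts kept in memory, and hypothetical server names. It works only because the servers are stateless, so any request can go to any remaining machine.

```python
import itertools

class ServerPool:
    """Toy load balancer pool; the server names are hypothetical."""

    def __init__(self, servers):
        self.active = list(servers)
        self.connections = {s: 0 for s in servers}
        self._counter = itertools.count()

    def route(self):
        # Round-robin over the servers that still accept new connections.
        server = self.active[next(self._counter) % len(self.active)]
        self.connections[server] += 1
        return server

    def finish(self, server):
        self.connections[server] -= 1

    def drain(self, server):
        # Stop sending new connections; existing ones keep running.
        self.active.remove(server)

    def can_shut_down(self, server):
        return server not in self.active and self.connections[server] == 0

pool = ServerPool(["web-1", "web-2", "web-3"])
first = pool.route()                      # goes to web-1
pool.drain("web-1")                       # take web-1 out for maintenance
routed = [pool.route() for _ in range(4)] # only web-2 and web-3 from now on
print(pool.can_shut_down("web-1"))        # False: one connection is still open
pool.finish("web-1")
print(pool.can_shut_down("web-1"))        # True
```

Repeating drain, update, and re-add one server at a time is one way to get the zero-downtime update from the list.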
Cache can be stored locally on different machines and services. However, ensuring data consistency across multiple services can reduce availability and introduce latency, making it challenging to maintain a truly stateless architecture.
Sometimes we really have to share some state across the web service machines, as in the case of authentication. The only state that we keep between requests is the token. A shared object cache stores these tokens with their permissions and authorization data, and the cached object gets invalidated with each change.
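A minimal sketch of the token idea, where a plain dict stands in for the shared object cache. The function names `login`, `authorize`, and `invalidate_user` are assumptions for the example, not part of any real library.

```python
import secrets

shared_cache = {}   # stand-in for a shared object cache reachable by all machines

def login(user, permissions):
    token = secrets.token_hex(16)
    shared_cache[token] = {"user": user, "permissions": set(permissions)}
    return token

def authorize(token, permission):
    """Any web service machine can run this; it keeps no local session state."""
    session = shared_cache.get(token)
    return session is not None and permission in session["permissions"]

def invalidate_user(user):
    """Drop cached entries when the user's permissions change."""
    for token in [t for t, s in shared_cache.items() if s["user"] == user]:
        del shared_cache[token]

token = login("ada", ["read"])
print(authorize(token, "read"))   # True
invalidate_user("ada")
print(authorize(token, "read"))   # False
```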
Another challenge when building stateless web services is managing access to shared resources. Typically, this involves implementing locks to control how services access a specific resource, which requires storing the state of these locks in a dedicated data store. An alternative solution is to use optimistic concurrency control. In this approach, a process reads the resource's state without taking a lock, does its work, and checks just before writing that the state has not changed in the meantime. If the state has changed, the process re-reads the resource and retries, or it may abort the action altogether. Although this method reduces overhead and resource consumption, under heavy contention a process can end up retrying indefinitely.
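One common way to implement optimistic concurrency control is a version number that is checked at write time. The sketch below assumes that form; the `store` dict, `compare_and_set`, and the stock key are made up for the example, and in a real data store the compare-and-set step would have to be atomic.

```python
store = {"stock:item-1": (0, 10)}   # key -> (version, value), hypothetical data

def read(key):
    return store[key]

def compare_and_set(key, expected_version, new_value):
    """Write only if nobody else has written since we read."""
    version, _ = store[key]
    if version != expected_version:
        return False
    store[key] = (version + 1, new_value)
    return True

def decrement(key, max_attempts=5):
    """Read, compute, try to write; on conflict re-read and retry, then give up."""
    for _ in range(max_attempts):
        version, value = read(key)
        if compare_and_set(key, version, value - 1):
            return True
    return False

version, value = read("stock:item-1")    # process A reads version 0
decrement("stock:item-1")                # process B gets its update in first
print(compare_and_set("stock:item-1", version, value - 1))  # False: A's write is stale
print(read("stock:item-1"))              # (1, 9)
```

The `max_attempts` bound is one way to avoid retrying forever when a resource is under heavy contention.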
When using locks, it is important to follow a consistent locking strategy. For example, consider a banking system where two processes, P1 and P2, want to lock two accounts, A1 and A2. If the accounts are locked in arbitrary order, say P1 locks A1 and P2 locks A2, then if P1 needs access to A2 and P2 needs access to A1, both processes will wait for each other indefinitely, resulting in a deadlock. To prevent this, you could enforce a rule where processes always lock resources in a consistent order based on a stable key, such as locking the account with the lower account number first. Other processes that need to lock the same account would wait until the first process releases its lock.
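A small sketch of consistent lock ordering with Python threads. It orders the locks by account identifier, an assumption chosen for the example because identifiers never change and are never equal. The account names follow the A1 and A2 example above; `transfer` and `worker` are hypothetical.

```python
import threading

locks = {"A1": threading.Lock(), "A2": threading.Lock()}
balances = {"A1": 100, "A2": 100}

def transfer(src, dst, amount):
    # Always acquire locks in the same global order, whatever the direction.
    first, second = sorted([src, dst])
    with locks[first], locks[second]:
        balances[src] -= amount
        balances[dst] += amount

def worker(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)

# P1 moves money A1 -> A2 while P2 moves money A2 -> A1.
threads = [
    threading.Thread(target=worker, args=("A1", "A2")),
    threading.Thread(target=worker, args=("A2", "A1")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balances)  # {'A1': 100, 'A2': 100}
```

If each thread locked `src` first and `dst` second instead, the two threads could each hold one lock and wait forever for the other.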
Additionally, it is important to consider that fine-grained locks, while allowing higher concurrency, can overwhelm the data store with requests. In contrast, coarse-grained locks reduce concurrency. Balancing these approaches is essential, and trade-offs must be carefully evaluated.
One of the principles of building scalable systems is to make each machine or service as independent as possible, but using locks can work against this principle.
When a resource is locked, other processes must wait, potentially slowing down the entire system. Moreover, any failure in the process holding the lock can negatively impact the server. Therefore, while using locks may be acceptable for background tasks — such as workers, scheduled jobs, or cron jobs — it is best to avoid them in request-response interactions between services.
Another challenge when scaling a stateless service is distributed transactions. They are similar to database transactions, but multiple services are involved: if one service, or one operation inside a service, fails, the whole transaction fails and a rollback happens. The two-phase commit algorithm is the way to implement this, and it is very hard to scale when multiple machines and services are involved.
We can skip implementing these transactions entirely to get simpler development, higher availability, and better scalability. An example from social media: if the search engine does not receive the event for a person's like, it will not be able to index that like, but this is not a critical failure.
Or we can choose a service to be a leader, so that when an operation fails, the service where the failure happened notifies the leader, which notifies the others to roll back. This is part of a compensating transaction, where the system provides a way to roll back or undo the operations that already succeeded when a failure occurs in one service while processing a request, without the need for locks.
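A very simplified sketch of the two phases, all in one process. Real two-phase commit also needs durable logs, timeouts, and handling of coordinator failure, none of which is shown here. The `Participant` class and the service names are assumptions for the example.

```python
class Participant:
    """Hypothetical service taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes or no on whether the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if everyone voted yes, otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False

ok = [Participant("orders"), Participant("payments")]
print(two_phase_commit(ok))       # True

failing = [Participant("orders"), Participant("payments", can_commit=False)]
print(two_phase_commit(failing))  # False
print([p.state for p in failing]) # ['rolled_back', 'rolled_back']
```

Every participant has to stay available and hold its prepared state until the coordinator decides, which hints at why this gets hard to scale.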
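The compensating transaction idea can be sketched as a list of steps where each step comes with its own undo action. This leaves out the leader and the notifications between services and only shows the undo logic; the step names and `run_with_compensation` are made up for the example.

```python
def run_with_compensation(steps):
    """steps is a list of (action, compensation) pairs of callables."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        # Undo the steps that already succeeded, most recent first.
        for compensate in reversed(done):
            compensate()
        return False
    return True

log = []

def fail_shipping():
    raise RuntimeError("shipping service unavailable")

steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (fail_shipping, lambda: log.append("cancel shipment")),
]

print(run_with_compensation(steps))  # False
print(log)  # ['reserve stock', 'charge card', 'refund card', 'release stock']
```

No locks are held between steps; each service commits its own work right away and undoes it later if needed.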
Functional partitioning:
Splitting our service into smaller ones that can run as independently as possible, and using a smaller machine for each service instead of one powerful machine. Each service must group functionality that is related.
Decoupling the logic, responsibilities, database, and team.
With this, each service can be scaled differently based on how much data it receives and stores and how many requests it gets. We are able to distribute data independently for each one, and the same goes for the caching and indexing strategy.
As the system grows, things like reusability, encapsulation, and single responsibility get applied at the service level.
A problem that we might face is when a new requirement arises that needs data and logic from different services.
Scalable services start caching aggressively and push state outside of the services as much as possible.