% Andre Henriques
\section{Service Analysis and Requirements} \label{sec:sanr}
Understanding the project that is being built is a critical step in the software deployment process.
This section discusses the requirements that the service needs to meet for the project to be considered a success.

As a software as a service project, there are some parts that the project is required to have:
\begin{itemize}
\item A way for the user to interact with the system
\item A way for programs to interact with the system
\item Management of images
\item Management of models
\item Management of compute resources
\end{itemize}

\subsection{Service Structure}
The service has to be structured so that users can interact with it in two ways.

The first way is for the user to interact with the system directly through a user interface.
This interface does not have any strict form requirements: it could be a desktop application, a web application, or even a command line application.
The main objective of this interface is for the user to quickly understand what the system is doing with their data, and whether they can use the model they created to evaluate images.

The second way for the user to interact with the system needs to be an API.
This is required because it would not make sense for users to be able to quickly generate image classification models if they still had to evaluate all the images manually.
Therefore, there needs to be a way for the user's product to connect to the system, and the API provides exactly that.

The system should also be structured in a way that allows easy scalability, so that it can handle many requests at the same time.
This could be achieved in many ways.
One way is to allow the service to act as a cluster, where multiple instances of the same application are running and a load balancer distributes the load between them.
Another way is for the service to behave as a distributed system, where the service is split into smaller modules and those modules can be replicated.
Independently of how the system scales, it has to handle the fact that the data it uses might not be available everywhere.

As a machine learning solution, the service requires the necessary computational power to handle the training of the models.
This means that the system needs to be structured in a way that decouples the training process from the main process, which guarantees that the compute requirements for training a model do not affect the main server.
Ideally, the service should be able to separate the tasks that require the GPU from the tasks that require the CPU.

\subsection{Resources}
As a machine learning image classification service, the service has to manage various types of resources.

\subsubsection{Compute Resources}
As mentioned before, the service needs to be able to manage its compute resources.
For example, if the system starts training a new model and that training uses all the GPU resources, it would impair the ability of the service to evaluate images for other users.
As this example demonstrates, the system needs to keep track of the GPU capacity that is available, so that it can schedule the actions it has to take accordingly.
Therefore, for optimal functionality, the service requires the management of various compute resources.
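
As an illustrative sketch of this bookkeeping (the symbols below are assumptions introduced for this example and are not part of the requirements), let $G$ be the total GPU capacity of a machine, for instance its GPU memory, and let $g_t$ be the capacity that a task $t$ needs.
If $R$ is the set of tasks that are currently running and a share $G_{\mathrm{eval}}$ of the capacity is kept free for image evaluation, a new training task $t'$ would only be started when
\[
    g_{t'} + \sum_{t \in R} g_t \leq G - G_{\mathrm{eval}} .
\]
For example, with $G = 24$, $G_{\mathrm{eval}} = 4$ and running tasks that use $10$ and $6$ units, a new task that needs $6$ units would have to wait, since $6 + 16 = 22 > 20$, while a task that needs $4$ units could start.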

There should be a separation between the different kinds of compute power, of which there are two: CPU and GPU.
The CPU is needed to handle the multiple requests that the API might have to answer at the same time, and the GPU resources are required to train models and evaluate images.

As a result, the service needs a system to distribute these compute tasks.
The tasks have to be distributed between the application that is running the API and the various other places where that compute can happen.

An ideal system would distribute the tasks intelligently, to maximise the use of the available resources.
An example of this would be running two image classification tasks that use the same model in the same place, which would allow the model to stay in memory instead of being reloaded from disk.
These kinds of optimisations would help the system to be more efficient and less wasteful.
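
One hypothetical way to express such a distribution rule (the cost terms are assumptions made for illustration, not a prescribed design) is to send a task that uses model $m$ to the worker $w^{*}$ with the lowest estimated completion cost,
\[
    w^{*} = \arg\min_{w \in W} \left( q_w + \ell_m \, [\, m \notin M_w \,] \right),
\]
where $W$ is the set of available workers, $q_w$ is the estimated time until worker $w$ becomes free, $\ell_m$ is the time needed to load model $m$ from disk, $M_w$ is the set of models that worker $w$ currently holds in memory, and the bracket is $1$ when its condition is true and $0$ otherwise.
With $\ell_m = 5$ seconds, a worker with $q_w = 2$ that already holds $m$ has a cost of $2$, while an idle worker that does not hold $m$ has a cost of $5$, so the task is sent to the first worker and the model is not reloaded.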

Another way to reduce the load on the system is to allow users to add their own compute power to the system.
That compute power would only use images and models that are owned by the user.
While allowing user-provided compute power to run any image or model in the system would make the system even more scalable, it would be a serious violation of privacy and security, as it would give outsiders access to possibly sensitive information.
This makes the idea of a completely distributed network of user-provided compute power not viable.

\subsubsection{Storage}
Another resource that the service has to handle is storage.
As the service accepts user-uploaded images, it has to monitor how much storage those images take.
The service will need systems to handle the case where the user-uploaded images take up too much space.
There are many ways of handling this, such as allowing the user to store their images, compressing the images, deleting images that the system might no longer need, or using dynamic storage services such as object buckets.

If there is not enough space to store all the images from all the models and the service needs to delete images, there should be a system that removes the images in the manner that causes the least harm.
An example of this would be deleting images in a way that keeps the dataset balanced.
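
As a sketch of what a balanced deletion could look like (one possible policy under stated assumptions, not a required one), let $n_c$ be the number of images in class $c$ and let $k$ be the number of images that have to be deleted.
Removing images only from the largest classes corresponds to choosing a cap $\tau$ such that, up to rounding,
\[
    \sum_{c} \max(n_c - \tau, 0) = k ,
\]
and keeping $\min(n_c, \tau)$ images in each class.
For example, with class sizes $(500, 300, 200)$ and $k = 300$, the cap is $\tau = 250$, since $250 + 50 + 0 = 300$, and the class sizes become $(250, 250, 200)$, which is more balanced than the original dataset.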

If the place where compute happens differs from the place where the data is stored, the system needs to be able to handle that.
This can be accomplished in various ways.
The most aggressive one is to not allow compute resources to access data that is far away.
The less aggressive and smarter way is to allow the system to move data to the optimal place.

\subsection{User interface}
A service such as this requires a way for the users to quickly get an understanding of how their data is being used and how they can perform the actions they want.

As previously mentioned, this application can take multiple forms, from web apps to command line applications, as long as it is easy to use and allows the user to perform the required tasks:
\begin{itemize}
\item{Configure model}
\item{Upload images}
\item{Manage images}
\item{Request model training}
\item{Request image evaluation}
\item{Configure access}
% \item{See API usage}
%TODO write more
\end{itemize}

The application should communicate with the service via the API.
If physical access to the computer that the service is running on were required, it would defeat the purpose of this project.
Therefore, controlling the service via the API is the most reasonable option.
A second system could be developed that allows the application to control the service, but that would be inefficient, as two control paths would have to be maintained.
Allowing the application to control the system via the API also improves the API, as the API gains more features.

The application should also allow administrators of the service to control the resources that are available to the system, and to see whether more resources need to be added.

\subsection{API} \label{sec:anal-api}
As a SaaS platform, most of the requests made to the service would be made via the API, not the user interface.
This is the case because the users who need this service would set up the model using the user interface and then make the image classification requests via the API.

While there are no hard requirements for the user interface, that is not the case for the API.
The API must be implemented as an HTTPS REST API, because most of the APIs that currently exist online are HTTPS REST APIs \cite{json-api-usage-stats}.
If the service is to be easy to use, it needs to be implemented in a way that gives it the lowest barrier to entry.
Making the type of the API a requirement guarantees that the service is compatible with as many existing systems as possible.
The API also needs to be able to perform all the tasks that the application can perform, so that a user who wants to interact with the service exclusively via the API is able to do so.
The API also requires authentication, because without it users with malicious intent would be able to:
\begin{itemize}
\item{Modify system settings}
\item{Access other users' data}
\end{itemize}

Allowing such actions would be incredibly damaging for the system.
Therefore, the API must implement authentication methods to prevent those kinds of actions from happening.

\subsection{Data Management}
The service will store a large amount of user data.
This includes user information, user images, and user models.

\subsubsection*{User Information}
There are no hard requirements on how the user information is stored, as long as it is stored securely.
User information includes personally identifiable information, such as username and email, and secret information, such as passwords and access tokens.

Future versions of the service could possibly also store more sensitive information about the user, such as payment information and addresses.
Such information is required if the user needs to be charged, but payment for the services provided is outside the scope of this project.

\subsubsection*{User Images}
Images are another kind of information that has to be stored.
As mentioned before, the system has to keep track of the images and the space they use.
The system should also guarantee some level of security for access to the images that were uploaded to the service.

\subsubsection*{Models}
The last kind of data that the service has to keep track of is model data.
Once a model is trained, it has to be saved to disk.
The service should implement a system that manages where the models are stored.
This is similar to the situation with images: a model should be as close as possible to the compute resource that is going to use it, even if this requires copying the model.

\subsection{Summary}
This section shows that there are requirements that need to be met for the system to work as intended.
These requirements range from usability requirements and implementation details to system-level resource management requirements.
The most important requirement is for the system to be easy to use, because if it is difficult to use, the service already fails one of its objectives.
The other requirements are significant as well, as without them the quality of the service would be severely degraded.
Even if the service were effortless to use, it would be as bad as a service that is difficult to use if it could not process images in a reasonable amount of time.
The next chapter will describe a design that meets a subset of these requirements.
\pagebreak