chore: more work on the report
Andre Henriques 2024-03-11 23:56:43 +00:00
parent db08e53a8f
commit a02acd9b4f
2 changed files with 117 additions and 105 deletions

View File

@ -1,5 +1,7 @@
User.shape: Person
Server.shape: Cloud
Server: "User Server" {
shape: Cloud
}
Proxy.shape: Hexagon
@ -16,14 +18,16 @@ database: "Database" {
shape: cylinder
}
User->Proxy
Server->Proxy
User<->Proxy
Server<->Proxy
Proxy->Api
Proxy->Web
Proxy<->Api
Proxy<->Web
Api->Database
Api->Runner
Api->Train
Api<->Database
Api<->Runner
Api<->Train
Database<->Runner
Database<->Train

View File

@ -34,6 +34,8 @@
\renewcommand{\footrulewidth}{0pt} % Remove footer underlines
\setlength{\headheight}{13.6pt}
\newcommand*\NewPage{\newpage\null\thispagestyle{empty}\newpage}
% numeric
\usepackage[bibstyle=ieee, citestyle=numeric, sorting=none,backend=biber]{biblatex}
\addbibresource{../main.bib}
@ -49,6 +51,8 @@
%%% Begin document
\begin{document}
\pagenumbering{gobble}
\maketitle
\begin{center}
@ -59,8 +63,8 @@
\today
\end{center}
\newpage
\newpage
\NewPage
\pagenumbering{arabic}
\begin{center}
\vspace*{\fill}
@ -73,13 +77,12 @@
plagiarised will be penalised.
\vspace*{\fill}
\end{center}
\newpage
\newpage
\NewPage
\begin{center}
\vspace*{\fill}
\section*{Acknowledgements}
I would like to take this opportunity to thank my supervisor Rizwan Asghar that helped me from the
I would like to take this opportunity to thank my supervisor, Rizwan Asghar, who helped me from the
start of the project until the end.
I am sincerely thankful to him for sharing his honest and educational views on several issues related
to this report.
@ -87,8 +90,7 @@
encouragement from the first day of the university.
\vspace*{\fill}
\end{center}
\newpage
\newpage
\NewPage
\begin{center}
\vspace*{\fill}
@ -98,11 +100,10 @@
The project goes through all the requirements for creating a platform service, and all of its needs.
\vspace*{\fill}
\end{center}
\newpage
\newpage
\NewPage
\tableofcontents
\newpage
\newpage
\section{Introduction}
% This section should contain an introduction to the problem aims and objectives (0.5 page)
@ -219,82 +220,130 @@
% There are also unsupervised learning methods that do not have a fixed number of classes. While this method would work as an expandable model method, it would not work for the purpose of this project. This project requires that the model has a specific set of labels which does not work with unsupervised learning which has unlabelled data. Some technics that are used for unsupervised learning might be useful in the process of creating expandable models.
\pagebreak
\section{System Analysis}
\section{Service Analysis and Requirements}
Understanding the project that is being built is critical in the software deployment process. This section looks into the parts required for the project to work.
As a SaaS project, there are some required parts that the project needs to have:
\begin{itemize}
\item{Web App}
\item{JSON API}
\item{API}
\item{Server Management}
\item{Dataset Management}
\item{Model Management}
\end{itemize}
\subsection{Overall Structure of the Project}
The service should be able to respond to any load that is givien to it. This will require the ability to scale depending on the amount of requests that the service is recieving.
Therefore the service requires some level of distributivity.
\subsection{Service Structure}
The service should be able to respond to any load that is given to it. This will require the ability to scale depending on the number of requests that the service is receiving.
Therefore, the service needs to be distributed to some degree.
The service because of the machine learning tasks, also requires to able to have acess to machines that can use GPUs.
As the machines that have
Because of the machine learning tasks, the service also requires access to machines that can use GPUs.
As the machines that have.
The service needs to be distributed to some degree; this requirement exists because of the expensive nature of machine learning training.
It would be unwise to perform machine learning training on the same machine that the main web server is running, as it would starve that server of resources.
For a separation of concerns data should also be in a different server.
For a separation of concerns, data should also be on a different server.
\subsection{Resources}
As the service contains more than one resourse to manage, it should be able to track what are the resourses it has available and distribute the load acordingly.
As the service contains more than one resource to manage, it should be able to track which resources it has available and distribute the load accordingly.
One example of this would be a service that has two servers with GPUs available to them.
One of the servers contains a more capable GPU that server should be used to train models as that requires more computational power.
One of the servers contains a more capable GPU; that server should be used to train models, as training requires more computational power.
Storage is another resourse that the service will have to handle.
Storage is another resource that the service will have to handle.
The service needs to keep track of the model files and uploaded files.
Alternatively it be able to mount other servers disks and get the images directerly from the other service.
Alternatively, the service should be able to mount other servers' disks and get the images directly from the other service.
\subsection{Web App}
The user of the application should be able to interact with the platform using a graphical user interface (GUI).
There are multiple possible ways for the user to interact with services, such as web, mobile, or desktop applications.
A web application is the most reasonable solution for this service.
The main way to interact with this service would be via an API. The API that the system will provide is an HTTPS API (see Section~\ref{sec:anal-api}); since the service already has a web-oriented API, it makes the most sense for the GUI to be web based as well.
The web app is where users can interact with the service.
Users should be able to manage models, see uploaded data.
The user will access the web app and configure the model, and manage that data set.
Users should be able to manage models, model data, API keys, and API usage.
\subsection{JSON API}
The user should be able to access the web app and use it to:
\begin{itemize}
\item{Configure model}
\item{Manage datasets}
\item{Configure API tokens}
\item{See API usage}
%TODO write more
\end{itemize}
A big part of a SaaS is the ability to communicate with other services, nowadays, the way that systems communicate with each other is using mostly JSON and Rest APIs \cite{json-api-usage-stats}. Since the system will need to communicate with other services to work as intended.
For administrative purposes, the web application should also allow the management of the compute resources available to the system.
\subsection{Server Management}
Since AI training is notoriously expensive, the system cannot run on one server alone, as this would put too much strain on that server.
The system needs to be able to distribute the load between the multiple servers.
For that reason, the service needs to both be able to send training and prediction jobs to servers that have the resources to train models or predict classes from images.
\subsection{API} \label{sec:anal-api}
The system has to be able to choose the servers to run the models in an optimized way.
For example, when training, send training jobs to the same server to prevent the server from having to reload the data again.
As this is a software as a service platform, users will mainly interact with it via the API.
The user would set up the machine learning model using the web interface and then configure their application to use a token to interact securely with the API.
\subsection{Dataset Management}
Without data, the system cannot train models. And management of data is important as this data might contain some private data.
Such as biometrics, the system will need to be able to safely handle this data.
The system will also have to decide when to clear data, since storage space is also a resource that the system needs to manage.
There exist multiple architectural styles for APIs. A REST API is the appropriate choice, as it is the most common style \cite{json-api-usage-stats}, allowing for the most compatibility with other services.
\subsection{Model Management}
Once the model has been created, the system has to keep track of the model, as well as the actual accuracy of the model.
It has to record how much the model used so it can distribute the load from in different GPU servers.
The API should allow users to access the most used features of the app, such as:
\begin{itemize}
\item{Uploading new images for the dataset}
    \item{Requesting training of the model}
\item{Running an image in the model}
\item{Marking previous predictions as incorrect}
%TODO write more
\end{itemize}
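As an illustration of the token-based interaction described above, the following is a minimal sketch of how a client application might build a request that runs an image through a model. The base URL, the endpoint path, the JSON field names, and the use of a Bearer token header are all assumptions made for this example; the report does not specify them.

```python
import json
import urllib.request

# Hypothetical base URL; the real service address is not given in the report.
API_BASE = "https://api.example.com"


def build_classify_request(token: str, model_id: str, image_b64: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the service to classify an image.

    The endpoint path and payload fields are illustrative assumptions.
    """
    body = json.dumps({"model_id": model_id, "image": image_b64}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/models/{model_id}/classify",
        data=body,
        headers={
            # Assumed scheme: the API token is sent as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request object could then be passed to \texttt{urllib.request.urlopen} once a real endpoint is available.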
\subsection{Resource Management}
For optimal functionality, the service requires the management of various compute resources.
This separation of compute resources is required because machine learning is compute and memory intensive.
Running these resource-intensive operations on the same server that is running the main API could cause increased latency or downtime in the API, which would not be ideal.
The service should be able to decide where to distribute tasks.
The tasks should be distributed according to the resources that the task needs.
The tasks need to be submitted to servers in an organized manner.
Repeated tasks should be sent to the same server to optimize the usage of the resources, as this would improve the efficiency of the service by preventing, for example, the reloading of data.
For example, a training workload could be sent to a server that has more GPU resources available to it, while slower GPU servers run the models for prediction.

The service should also keep track of the space available to it.
The service must decide which of its managed images to keep and which ones to delete.
It should also keep track of other services' images, control access to them, and guarantee that the server that is closest to the resources is the one that has priority on tasks related to those resources.
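
The distribution rules described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation used by the service: the \texttt{Worker} class, the \texttt{gpu\_score} field, and the \texttt{pick\_worker} function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """A GPU server known to the service (hypothetical structure)."""
    name: str
    gpu_score: int  # assumed relative measure of GPU capability
    cached_datasets: set = field(default_factory=set)


def pick_worker(workers: list, task_kind: str, dataset_id: str) -> Worker:
    """Choose a worker for a task.

    1. Prefer workers that already hold the dataset, to avoid reloading data.
    2. Among the candidates, send training to the most capable GPU and
       prediction to the least capable one.
    """
    if not workers:
        raise ValueError("no workers available")
    cached = [w for w in workers if dataset_id in w.cached_datasets]
    candidates = cached or workers
    if task_kind == "train":
        chosen = max(candidates, key=lambda w: w.gpu_score)
    else:
        chosen = min(candidates, key=lambda w: w.gpu_score)
    # Record that the chosen worker now holds the dataset.
    chosen.cached_datasets.add(dataset_id)
    return chosen
```

In this sketch a repeated task for the same dataset returns to the server that already loaded it, even if a more capable server is idle; a real scheduler would likely also weigh the current load of each server.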
\subsection{Data Management}
The service needs to manage various kinds of data.
The first kind of data the service needs to manage is user data.
This is data that identifies a user and allows the user to authenticate with the service.
A future version of this service could possibly also store payment information.
This information would be used to charge for the usage of the service, although this is outside the scope of this project.
The second kind of data that has to be managed is the user images.
These images could be either uploaded to the service, or stored on the users' devices.
The service should manage access to remote images, and keep track of local images.
The last kind of data that the service has to keep track of is model definitions and model weights.
These can be sizable files, which makes it important for the system to place them carefully, keeping the files close to the servers that need them the most.
\pagebreak
\section{System Design}
\section{Service Design}
This section will discuss the design of the system.
The section will discuss the inter application interface, control platform, and server, dataset, and model management.
\subsection{Structure of the Service}
\begin{figure}
\begin{center}
\includegraphics{system_diagram}
\end{center}
\caption{Simplified diagram of the service}\label{fig:simplified_service_diagram}
\begin{figure}[h!]
\centering
\includegraphics[height=0.4\textheight]{system_diagram}
\caption{Simplified diagram of the service}
\label{fig:simplified_service_diagram}
\end{figure}
The service is designed to be a 4 tier structure:
@ -311,12 +360,12 @@
The presentation layer consists of a webpage that interacts with the API layer, to manage both the resources allocated to users and administrators of the system.
More details of the implementation can be found in \ref{web-app-design}.
The API layer, controls the system, it's the interface that both the webpage and users' servers use to interact with the system.
The API layer controls the system; it is the interface that both the webpage and users' servers use to interact with the system.
The Worker layer consists of a set of servers available to perform GPU loads.
\subsection{Inter Application Interface}
\subsection{Application Programming Interface}
As a software as a service, one of the main requirements is to be able to communicate with other services.
The current main way that servers communicate over the internet is using HTTPS and a REST JSON API~\cite{json-api-usage-stats}.
@ -337,56 +386,8 @@
\item{See the performance of the model.} %TODO add image for proff
\end{itemize}
\subsection{Aplaication Programming Interface}
\pagebreak
\section{Design Choices}
\subsection{Structure of the Service}
The system has to manage:
\begin{itemize}
\item{User data}
\item{Uploaded User Images / Remote User Images}
\item{User models}
\item{Generation of models}
\item{Training of models}
\item{Running of models}
\end{itemize}
The system is designed with a semi-monolithic approach \ref{fig:expandable_models_simple}. The management of the data, and generation of the models will be done in the monolith while the training/running of the models will be done in GPU dedicated nodes.
The overall workflow of a user who wants a model created would be:
\begin{itemize}
\item{The user requests the server to create a model with some base images and classes.}
\item{The system creates a model}
\item{The user requests the classification or confirmation of an image}
\end{itemize}
\subsection{Web app}
\subsection{Web App}
The goal of the project is to provide a software as a service platform for classification tasks. With that in mind, the service needs to have a way of controlling it. This will be achieved with a web interface.
@ -426,7 +427,14 @@
During the training process, the TP needs to cache the dataset being used. This is because, to create one model, the system might have to generate and train more than one model; if the dataset is not cached, time is spent reloading the dataset into memory.
\pagebreak
\section{Service Implementation}
\pagebreak
\section{Legal and Ethical Issues}
\pagebreak
\section{Results} % TODO change this