%%% Preamble
\documentclass[11pt, a4paper]{article}
\usepackage[english]{babel} % English language/hyphenation
\usepackage{url}
\usepackage{tabularx}
\usepackage{pdfpages}
\usepackage{float}
\usepackage{longtable}
\usepackage{graphicx}
\usepackage{svg}
\graphicspath{ {../images for report/} }
\usepackage[margin=2cm]{geometry}
\usepackage{datetime}
\newdateformat{monthyeardate}{\monthname[\THEMONTH], \THEYEAR}
\usepackage{hyperref}
\hypersetup{
colorlinks,
citecolor=black,
filecolor=black,
linkcolor=black,
urlcolor=black
}
\usepackage{cleveref}
%%% Custom headers/footers (fancyhdr package)
\usepackage{fancyhdr}
\pagestyle{fancyplain}
\fancyhead{} % No page header
\fancyfoot[L]{} % Empty
\fancyfoot[C]{\thepage} % Pagenumbering
\fancyfoot[R]{} % Empty
\renewcommand{\headrulewidth}{0pt} % Remove header underlines
\renewcommand{\footrulewidth}{0pt} % Remove footer underlines
\setlength{\headheight}{13.6pt}
\newcommand*\NewPage{\newpage\null\thispagestyle{empty}\newpage}
% numeric
\usepackage[bibstyle=ieee, citestyle=numeric, sorting=none,backend=biber]{biblatex}
\addbibresource{../main.bib}
% Write the approved title of your dissertation
\title{Classify: Image Classification as a Software Platform}
% Write your full name, as in University records
\author{Andre Henriques\\University of Surrey}
\date{}
%%% Begin document
\begin{document}
\pagenumbering{gobble}
\maketitle
\begin{center}
\includegraphics[height=0.5\textheight]{uni_surrey}
\end{center}
\begin{center}
\monthyeardate\today
\end{center}
\NewPage
\pagenumbering{arabic}
\begin{center}
\vspace*{\fill}
\section*{Declaration of Originality}
I confirm that the submitted work is my own work and that I have clearly identified and fully
acknowledged all material that is entitled to be attributed to others (whether published or
unpublished) using the referencing system set out in the programme handbook. I agree that the
University may submit my work to means of checking this, such as the plagiarism detection service
Turnitin® UK. I confirm that I understand that assessed work that has been shown to have been
plagiarised will be penalised.
\vspace*{\fill}
\end{center}
\NewPage
\begin{center}
\vspace*{\fill}
\section*{Acknowledgements}
I would like to take this opportunity to thank my supervisor, Rizwan Asghar, who helped me from the
start of the project until the end.
I am sincerely thankful to him for sharing his honest and educational views on several issues related
to this report.
Additionally, I would like to thank my parents and friends for their continued support and
encouragement since my first day at university.
\vspace*{\fill}
\end{center}
\NewPage
\begin{center}
\vspace*{\fill}
\section*{Abstract}
Currently there are few automatic image classification platforms.
This project aims to serve as a guide for creating a new automatic image classification platform.
The project goes through the requirements for creating such a platform as a service.
\vspace*{\fill}
\end{center}
\NewPage

\tableofcontents
\newpage

\section{Introduction} \label{sec:introduction}
% This section should contain an introduction to the problem aims and obectives (0.5 page)
This project designs and creates a new software as a service platform where users with no experience in machine learning or data analysis can create machine learning models to process their data.
In this project, the platform is scoped to image classification, with the ability to be extended later with more model types.
To be easy to use, the platform needs to handle: image uploads, processing, and verification; model creation, management, and expansion; and image classification.
This report gives a brief analysis of current image classification systems, followed by an overview of the design of the system and its implementation details. The report finishes with an analysis of legal, ethical and societal issues, and an evaluation of the results and objectives.
\subsection{Project Motivations}
Currently, many classification tasks are done manually.
Thousands of man-hours are spent classifying images, a task that can be automated.
There are few easy-to-use image classification systems that require little to no knowledge of image classification.
This project aims to fill that gap and provide an easy-to-use system that anyone without knowledge of image classification can use.
% These tasks could be done more effectively if there was tooling that would allow the easy creation of classification models, without the knowledge of data analysis and machine learning models creation.
% The aim of this project is to create a classification service that requires zero user knowledge about machine learning, image classification or data analysis.
% The system should allow the user to create a reasonable accurate model that can satisfy the users' need.
% The system should also allow the user to create expandable models; models where classes can be added after the model has been created. % hyperparameters, augmenting the data.
\subsection{Project Aim}
The project aims to create an easy-to-use platform where users can create different types of classification models without any knowledge of image classification.
\subsection{Project Objectives}
This project's primary objectives are to design and implement:
\begin{itemize}
\item a platform where the users can create and manage their models.
\item a system to automatically create and train models.
% \item a system to automatically expand and reduce models without fully retraining the models.
\item a system to automatically expand models without fully retraining the models.
\item an API through which users can interact programmatically with the system.
\end{itemize}
This project's extended objectives are to:
\begin{itemize}
% \item Create a system to automatically to merge modules to increase efficiency.
\item Create a system to distribute the load of training the models among multiple services.
\end{itemize}
\subsection{Success Criteria}
This project can be considered successful when:
\begin{itemize}
\item A user can upload images, train a model on those images, and evaluate images using the web interface.
\item A user can perform the same tasks, via the API service.
\end{itemize}
\subsection{Project Structure}
This report presents the design and development stages of the project, with each section addressing a part of the design and development process.
\begin{longtable}{rp{0.35\textwidth} rp{0.45\textwidth}}
\hyperref[sec:introduction]{Introduction} & This section gives a brief introduction to the project and its objectives. \\
\hyperref[sec:lit-tech-review]{Literature and Technical Review} & This section introduces existing projects that are similar to this one, and technologies that can be used to implement this project. \\
\hyperref[sec:sanr]{Service Analysis and Requirements} & This section analyses the project requirements and defines the design requirements that the service needs to implement to achieve the goals that were set. \\
\hyperref[sec:sdai]{Service Design and Implementation} & This section discusses how the requirements defined in the previous section are turned into a working application. \\
\hyperref[sec:lsec]{Legal, Societal, and Ethical Considerations} & This section covers potential legal, societal and ethical issues that might arise from the service and how they are mitigated. \\
\caption{Project structure}
\label{tab:project-structure}
\end{longtable}
\pagebreak
\section{Literature and Technical Review} \label{sec:lit-tech-review}
This section reviews existing platforms on the market that perform image classification. It also reviews current image classification techniques that meet the requirements of the project. Finally, it analyses methods for distributing training across multiple physical machines, and for spreading the load so that minimal reloading of models is required when running them.
\subsection{Existing Classification Platforms}
There are existing software as a service (SaaS) platforms that provide services similar to the ones this project will provide.
%Amazon provides bespoque machine learning services that if were contacted would be able to provide image classification services. Amazon provides general machine learning services \cite{amazon-machine-learning}.
Amazon provides an image classification service called ``Rekognition'' \cite{amazon-rekognition}. It offers multiple services, including face recognition, celebrity recognition and object recognition. One of these services, custom labels \cite{amazon-rekognition-custom-labels}, is the most similar to the one this project is about. The custom labels service allows users to provide custom datasets and labels; using AutoML, Rekognition then generates a model that classifies images according to those labels.

The models generated using Amazon's Rekognition do not provide a way to change the set of labels without creating a new project. This involves retraining a large part of the model, which means a long delay before new classes can be used. Training a model can take from 30 minutes to 24 hours \cite{amazon-rekognition-custom-labels-training}, which could result in up to 24 hours of lag between needing a new label and being able to classify with it. A problem also arises when the user needs to add more than one label in quick succession. For example, the user sees the need for a new label and starts training a new model, but while the model is training another label is needed. The user must now either stop the current training and start again, or wait until the current training finishes and then train another model. If new classes are required frequently, this might not be the best platform to choose.
%https://aws.amazon.com/machine-learning/ml-use-cases/
%https://aws.amazon.com/rekognition/image-features/
Similarly, Google has the ``Cloud Vision API'' \cite{google-vision-api}, which provides services similar to Amazon's Rekognition. However, Google's Vision API appears to be more targeted at videos than images, as indicated by its price sheet \cite{google-vision-price-sheet}. It offers tag and product identifiers, where every image has only one tag or product. The product identifier system seems to work differently from Amazon's Rekognition: it appears to be based on a nearest-neighbour search that returns similar products rather than classification labels \cite{google-vision-product-recognizer-guide}.

This method is more effective at allowing users to add new types of products, but as it does not output defined classes, it does not provide the functionality that this project is aiming to achieve.
\subsection{Requirements of Image Classification Models}

One of the main objectives of this project is to create models that can assign a class to an image for any dataset, which means that there is no ``one solution fits all'' approach to the problem. While the most complex way to solve a problem would most likely succeed, it might not be the most efficient way to achieve the results.

This section analyses the kinds of models that would obtain the best results. The models for this project have to be as efficient as possible while achieving the best possible accuracy.

A classical example is the MNIST dataset \cite{mnist}. Models for classifying the MNIST dataset can be either simple or extremely complex, and achieve different levels of accuracy.
For example, in \cite{mist-high-accuracy} an accuracy of $99.91\%$ was achieved by combining 3 Convolutional Neural Networks (CNNs) with different kernel sizes, tuning hyperparameters and augmenting the data, while in \cite{lecun-98} an accuracy of $95\%$ was achieved using a 2-layer neural network with 300 hidden nodes. Both models achieve the accuracy required for this project, but the model in \cite{mist-high-accuracy} is more computationally intensive to run. When deciding which model to create, the system should choose the one that can achieve the required accuracy while taking the least effort to train.
% TODO fix the inglish in these sentance
The models for this system to work as intended should be as small as possible while still reaching the accuracy required for the classification task.

As the service might need to handle many requests, the models need to be cheap to run. Smaller models are easier to run; therefore, the system requires a balance between size and accuracy.
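
The balance between size and accuracy can be written down as a simple selection rule. The following is only a sketch under assumed notation that is not defined elsewhere in this report: $\mathcal{M}$ is the set of candidate models, $\mathrm{acc}(m)$ is the validation accuracy of model $m$, $\mathrm{cost}(m)$ is a measure of its training and inference cost, and $a_{\min}$ is the accuracy required for the task.
\[
m^{*} = \arg\min_{m \in \mathcal{M},\ \mathrm{acc}(m) \geq a_{\min}} \mathrm{cost}(m)
\]
Under this rule, if both MNIST models discussed above satisfy the accuracy requirement, the smaller 2-layer network would be selected because it is cheaper to train and run.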
% TODO talk about storage
\subsection{Method of Image Classification Models}

There are multiple ways of achieving image classification. The requirement of the system is that it returns the class that an image belongs to, which means that supervised classification methods will be used, as these are the ones that meet this requirement.
% TODO find some papers to proff this
The system will use supervised models to classify images, using a combination of different types of models: neural networks, convolutional neural networks, deep neural networks and deep convolutional neural networks.

These types were chosen because they have been very successful in other image classification challenges, for example the ImageNet challenges \cite{imagenet}, which ranked different models on classifying 14 million images. The contest ran from 2010 to 2017.

The models that participated in the contest increasingly used deep convolutional neural networks. Among the various models there are a few landmark models that achieved high accuracies, including AlexNet \cite{krizhevsky2012imagenet}, ResNet-152 \cite{resnet-152}, and EfficientNet \cite{efficientnet}.
% TODO find vgg to cite

These models can be used in two ways in the system: to generate models via transfer learning, or as a structural basis for generating a completely new model.
\subsection{Well-known models}
% TODO compare the models

This section compares the different models that did well in the ImageNet challenge.
AlexNet \cite{krizhevsky2012imagenet} is a deep convolutional neural network that was evaluated on the ImageNet ILSVRC-2010 dataset, where it achieved a top-1 error rate of $37.5\%$ and a top-5 error rate of $17.0\%$. A variant of this model participated in the ImageNet ILSVRC-2012 contest and achieved a top-5 error rate of $15.3\%$. The architecture of AlexNet consists of 5 convolutional layers followed by 3 dense layers, with some of the convolutional layers followed by max pooling. Training was done using multiple GPUs: each GPU ran part of each layer, and only some layers were connected across GPUs. To reduce overfitting, training also used label-preserving data augmentation and dropout.
While using AlexNet would probably yield the desired results, it would complicate the other parts of the service. As a platform as a service, the system needs to manage the resources available to it, and requiring 2 GPUs to train a single model would effectively halve the resources available to the system.
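
For reference, the top-$k$ error rates quoted in this section can be defined as follows. This is the standard definition rather than something specific to this project, and the symbols are assumptions introduced here: $N$ is the number of test images, $y_i$ is the true label of image $x_i$, and $T_k(x_i)$ is the set of the $k$ labels to which the model assigns the highest scores.
\[
\mathrm{err}_k = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[ y_i \notin T_k(x_i) \right]
\]
The top-1 error is the case $k = 1$, where only the highest-scoring label counts. The top-5 error is the case $k = 5$, where a prediction is counted as correct if the true label is among the five highest-scoring labels.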
% TODO talk more about this
ResNet \cite{resnet} is a deep convolutional neural network that participated in the ImageNet ILSVRC-2015 contest, where it achieved a top-1 error rate of $21.43\%$ and a top-5 error rate of $5.71\%$. ResNet was created to solve the problem of training accuracy degrading as models get deeper. Around the time the ResNet paper was released, there was evidence that deeper networks give higher accuracy \cite{going-deeper-with-convolutions, very-deep-convolution-networks-for-large-scale-image-recognition}, but increasing the depth of a network resulted in training accuracy degradation.
% This needs some work in terms of gramar
ResNet works by creating shortcuts between sets of layers; the shortcuts allow values from earlier layers to be added to the output of later layers, so the layers in between only have to learn a residual. The hypothesis is that it is easier to optimize the residual mappings than the original, unreferenced mappings.
The results of the challenge showed that using residual connections improved the training of the model.
It is important to note that residual networks tend to give better results the more layers the model has. While this could have a negative impact on performance, the number of parameters per layer does not grow as steeply in ResNet as in other architectures, because it uses optimizations such as $1 \times 1$ kernels, which are more space efficient. Even with these optimizations, it can still achieve very good results, which might make it a good candidate to be one of the predefined models that the service uses when creating machine learning models.
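
The shortcut connections described above can be summarised by the residual formulation used in \cite{resnet}. In the notation of that paper, $x$ is the input to a block of layers, $\mathcal{F}(x, \{W_i\})$ is the residual mapping learned by those layers with weights $W_i$, and $y$ is the output of the block:
\[
y = \mathcal{F}(x, \{W_i\}) + x
\]
If the mapping the block should ideally compute is $\mathcal{H}(x)$, the stacked layers only need to learn $\mathcal{F}(x) = \mathcal{H}(x) - x$. When an identity mapping is close to optimal, the layers can drive $\mathcal{F}$ towards zero instead of having to fit the identity through several non-linear layers.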

% MobileNet
% EfficientNet
EfficientNet \cite{efficient-net} is a deep convolutional neural network that was able to achieve $84.3\%$ top-1 accuracy while being ``$8.4\times$ smaller and $6.1\times$ faster on inference than the best existing ConvNet''. EfficientNets\footnote{The family of models that use the techniques described in \cite{efficient-net}.} are models in which, instead of increasing only the depth or the width of the model, the depth, width and input resolution are all scaled together using a fixed set of scaling coefficients. By not scaling only depth, EfficientNets can capture more information from the images, especially because the image size is also taken into account.
To test their results, the EfficientNet team created a baseline model that uses the mobile inverted bottleneck MBConv \cite{inverted-bottleneck-mobilenet} as its building block. The baseline model was then scaled using the compound method, which resulted in better top-1 and top-5 accuracy.
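
The compound scaling method can be stated compactly. Following the formulation in \cite{efficient-net}, a single coefficient $\phi$ controls how the depth $d$, width $w$ and input resolution $r$ grow together, with constants $\alpha$, $\beta$ and $\gamma$ found by a small grid search on the baseline model:
\[
d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi}
\]
\[
\alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad \alpha \geq 1,\ \beta \geq 1,\ \gamma \geq 1
\]
Because the cost of a convolutional network grows roughly in proportion to $d \cdot w^{2} \cdot r^{2}$, the constraint means that the total number of FLOPS increases by approximately $2^{\phi}$.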
While EfficientNets are smaller than their non-EfficientNet counterparts, they are more computationally intensive: a ResNet-50 scaled using the EfficientNet compound scaling method is $3\%$ more computationally intensive than a ResNet-50 scaled using only depth, while improving the top-1 accuracy by $0.7\%$.
As the models will be trained and run many times, decreasing the computational cost might be a better overall target for sustainability than offering higher accuracy.
Even though scaling with the EfficientNet compound method might not be the best choice for this system, using some of the EfficientNets that were already optimized by the EfficientNet team could be a good option. For example, EfficientNet-B1 is both small and efficient while still obtaining $79.1\%$ top-1 accuracy on ImageNet, and realistically the datasets that this system will process will be smaller and more scope specific than ImageNet.

% \subsection{Efficiency of transfer learning}
% \subsection{Creation Models}
% The models that I will be creating will be Convolutional Neural Network(CNN) \cite{lecun1989handwritten,fukushima1980neocognitron}.
% The system will be creating two types of models that cannot be expanded and models that can be expanded. For the models that can be expanded, see the section about expandable models.
% The models that cannot be expanded will use a simple convolution blocks, with a similar structure as the AlexNet \cite{krizhevsky2012imagenet} ones, as the basis for the model. The size of the model will be controlled by the size of the input image, where bigger images will generate more deep and complex models.
% The models will be created using TensorFlow \cite{tensorflow2015-whitepaper} and Keras \cite{chollet2015keras}. These theologies are chosen since they are both robust and used in industry.
% \subsection{Expandable Models}
% The current most used approach for expanding a CNN model is to retrain the model. This is done by, recreating an entire new model that does the new task, using the older model as a base for the new model \cite{amazon-rekognition}, or using a pretrained model as a base and training the last few layers.
% There are also unsupervised learning methods that do not have a fixed number of classes. While this method would work as an expandable model method, it would not work for the purpose of this project. This project requires that the model has a specific set of labels which does not work with unsupervised learning which has unlabelled data. Some technics that are used for unsupervised learning might be useful in the process of creating expandable models.
\subsection{Conclusion}
The technical review of current systems reveals that there are existing systems that can perform image classification tasks, but they do not make it easy to expand existing models.
The current methods for image classification seem to have reached a level of accuracy and efficiency that makes a project like this feasible.
% TODO talk about web serving thechnlogies
\pagebreak

\section{Service Analysis and Requirements} \label{sec:sanr}
Understanding the project that is being built is critical in the software development process. This section looks into the parts required for the project to work.
As a SaaS project, there are some parts that the project needs to have:
\begin{itemize}
\item{Web App}
\item{API}
\item{Server Management}
\item{Dataset Management}
\item{Model Management}
\end{itemize}

\subsection{Service Structure}
The service should be able to respond to any load that is given to it. This requires the ability to scale depending on the number of requests that the service is receiving.
Therefore, the service needs to be distributed to some degree.

Because of the machine learning tasks, the service also needs access to machines with GPUs.
Distribution is also required because of the expensive nature of machine learning training.
It would be unwise to perform machine learning training on the same machine that the main web server is running on, as it would starve that server of resources.

For separation of concerns, data should also be stored on a different server.

\subsection{Resources}

As the service has more than one resource to manage, it should be able to track which resources it has available and distribute the load accordingly.
For example, suppose the service has two servers with GPUs available to them.
If one of the servers has a more capable GPU, that server should be used to train models, as training requires more computational power.

Storage is another resource that the service will have to handle.
The service needs to keep track of the model files and uploaded files.
Alternatively, the service should be able to mount other servers' disks and get the images directly from them.
\subsection{Web App}
The user of the application should be able to interact with the platform using a graphical user interface (GUI).
There are multiple ways for the user to interact with a service, such as web, mobile or desktop applications.
A web application is the most reasonable solution for this service.
The main way to interact with this service is via an API. The API that the system provides is an HTTPS API (see Section~\ref{sec:anal-api}); since the service already has a web-oriented API, it makes the most sense for the GUI to be web based as well.
The web app is where users interact with the service.
Users should be able to manage models, model data, API keys, and API usage.

The user should be able to access the web app and use it to:
\begin{itemize}
\item{Configure model}
\item{Manage datasets}
\item{Configure API tokens}
\item{See API usage}
%TODO write more
\end{itemize}
For administration purposes, the web application should also allow the management of the compute resources available to the system.
\subsection{API} \label{sec:anal-api}
As a software as a service platform, the users of the platform will mainly interact with it via the API.
The user would set up the machine learning model using the web interface and then configure their application to use a token to securely interact with the API.
There exist multiple architectural styles for APIs. A REST API is the most appropriate choice as it is the most common style \cite{json-api-usage-stats}, allowing for the most compatibility with other services.
The API should give users access to the most used features of the app, such as:
\begin{itemize}
\item{Uploading new images for the dataset}
\item{Request training of the model}
\item{Running an image in the model}
\item{Marking previous predictions as incorrect}
%TODO write more
\end{itemize}
\subsection{Resource Management}

For optimal functionality, the service requires the management of various compute resources.

This separation of compute resources is required because machine learning is compute and memory intensive.
Running these resource-intensive operations on the same server that is running the main API could cause increased latency or downtime in the API, which would not be ideal.

The service should be able to decide where to distribute tasks.
The tasks should be distributed according to the resources that each task needs.
For example, a training workload should be sent to a server that has more GPU resources available to it, while servers with slower GPUs run the models for prediction.
The tasks need to be submitted to servers in an organized manner.
Repeated tasks should be sent to the same server, as this improves the efficiency of the service by preventing, for example, the reloading of data.

The service should also keep track of the storage space available to it.
The service must decide which of the images that it manages to keep and which ones to delete.
It should also keep track of images held by other services, control access to them, and guarantee that the server that is closest to the resources has priority on tasks related to those resources.
\subsection{Data Management}
The service needs to manage various kinds of data.
The first kind of data the service needs to manage is user data.
This is data that identifies a user and allows the user to authenticate with the service.
A future version of this service could possibly also store payment information.
This information would be used to charge for the usage of the service, although this is outside the scope of this project.
The second kind of data that has to be managed is the user images.
These images could be either uploaded to the service, or stored on the users' devices.
The service should manage access to remote images, and keep track of local images.
The last kind of data that the service has to keep track of is model definitions and model weights.
These can be sizable files, which makes it important for the system to place them carefully, keeping the files close to the servers that need them the most.

\subsection{Conclusion}
This section shows that there are requirements that need to be met for the system to work as intended. These requirements range from usability requirements to system-level resource management requirements.

The service needs to be easy to use, while being able to handle loads from both the website and API requests.
The service must be able to scale to the load that it is subjected to, and to keep track of and manage the resources that the user or the service created.
It also needs to keep track of the computational resources that are available to it, so that it does not cause deadlocks, for example, by using all of its GPU resources to train a model while there are classification tasks waiting to be done.
The next section goes through the implementation of an application that implements a subset of these design requirements, with some limitations that will be explained.
\pagebreak
\section{Service Design and Implementation} \label{sec:sdai}

This section discusses the design of the service, including the design decisions for the web application and the API.

\subsection{Structure of the Service}
\begin{figure}[h!]
    \centering
    \includegraphics[height=0.4\textheight]{system_diagram}
    \caption{Simplified diagram of the service}
    \label{fig:simplified_service_diagram}
\end{figure}
The service is designed as a four-tier structure:
\begin{itemize}
    \item{Presentation Layer}
    \item{API Layer}
    \item{Worker Layer}
    \item{Database Layer}
\end{itemize}

This structure was selected because it allows separation of concerns based on the resources required by each layer.

The presentation layer requires interaction with the user; therefore, it needs to be accessible from the outside and simple to use.
The presentation layer consists of a webpage that interacts with the API layer to manage the resources allocated to both users and administrators of the system.
More details of the implementation can be found in Section~\ref{web-app-design}.

The API layer controls the system; it is the interface that both the webpage and the users' servers use to interact with the system.

The Worker layer consists of a set of servers available to perform GPU loads.

\subsection{Web application} \label{web-app-design}

The web application (Web App) is the chosen GUI to control the service.

This subsection discusses the workflows and implementation details of the application.

\subsubsection*{Implementation Details}

The Web App is a single-page application (SPA).
The SPA architecture is one of the most prevalent web application architectures in use today.
It allows fast transitions between pages without a full page reload in the browser.

Since the API and the Web App are separated in this project, server-side rendering would be more complex and less efficient,
as the server would first have to request information from the API to build the web page and then send it to the user's device.
Therefore, the system will use client-side rendering only, allowing the user's device to request more information directly from the API.

There are currently many frameworks for creating SPAs.
I selected Svelte \cite{svelte} for this project.
I selected Svelte because it has been one of the most liked frameworks to work with in recent years, according to the State of JS survey \cite{state-of-js-2022}.
It is also one of the best-performing frameworks currently available \cite{js-frontend-frameworks-performance}.
I also already have experience with Svelte.

I will be using Svelte with the SvelteKit framework \cite{svelte-kit}, which greatly improves the developer experience.

SvelteKit allows for the easy creation of SPAs with a good default web router.
The static adapter will be used to generate static HTML and JavaScript files, which will be hosted by an NGINX proxy \cite{nginx}.
The web application uses the API to control the functionality of the service. This design is advantageous:
it allows users to do everything that the web application does using the API directly, which is ideal in a SaaS project.
\subsubsection*{Service authentication}
\begin{figure}[h!]
    \centering
    \includegraphics[width=\textwidth]{service_authentication}
    \caption{Simplified Diagram of User Authentication}
    \label{fig:simplified_auth_diagram}
\end{figure}
The user uses an email and password to sign in or register with the application.
These credentials are sent to the server and stored in a user account.
The password is stored hashed using bcrypt \cite{bycrpt}.
In the future, other methods of authentication might be provided, such as Google's OAuth.
Once logged in, the user will be able to use the application and manage the tokens that were issued for this application.
This allows the user to manage which services have access to the account and to see the usage of those services.
The user can issue new tokens that can be used in the user's services to request the classification of images.
\subsubsection*{Model Management}
\begin{figure}[h!]
    \centering
    \includegraphics[width=\textwidth]{models_flow}
    \caption{Simplified Diagram of Model management}
    \label{fig:simplified_model_diagram}
\end{figure}
Figure~\ref{fig:simplified_model_diagram} shows the steps that the user takes to use a model.

First, the user creates the model.
In this step, the user uploads a sample image of what the model will be handling.
This image is used to define the kind of images that the model will be able to accept.
Currently, during evaluation, the system does not resize images whose dimensions differ from those of the image uploaded at this step.
This was done to guarantee that the image that the user wants to classify is unmodified, moving the responsibility of cropping and resizing to the user.
In the future, systems could be implemented that allow the user to select how an image is cropped.

The second step is uploading the rest of the dataset.
The user uploads a zip file that contains a set of classes and the images corresponding to each class.
That zip file is processed, and the images and classes are created.
Alternatively, the user can use the API to create new classes and upload images.

After all the images that are required for training are uploaded, the user can go to the training step.
During this step, the system automatically trains the model.
After the system trains a model that meets the specifications set by the user, the system makes the model available for the user to use.
When the model has finished training, the user can use the model to run inference tasks on images.
\subsubsection*{Advanced Model Management}
\begin{figure}[h!]
\centering
\includegraphics[width=\textwidth]{models_advanced_flow}
\caption{Simplified Diagram of Advanced Model management}
\label{fig:simplified_model_advanced_diagram}
\end{figure}
Figure~\ref{fig:simplified_model_advanced_diagram} shows the steps that the user takes to use a model.
The steps are very similar to those of the normal model management.
There is an additional step in which the user can upload new images and create new classes, and then request the retraining of the model.
While the model is being expanded and trained on the new classes, the user can still use the inference step.
\subsection{API}
As a software as a service (SaaS) product, one of the main requirements is to be able to communicate with other services.
The API provides the simplest way for other services to interact with this service.
The API provides various HTTPS REST JSON endpoints that allow any user of the service to fully control their models using only the API.
\subsubsection*{Implementation Details}
The API will run a Go \cite{go} HTTP server.
This server accepts JSON and multipart form data requests, processes them, and responds with JSON.
The multipart requests are required because JSON cannot carry binary data directly, which would make the uploading of images inefficient.
The images would have to be embedded in the JSON document either as an array of byte values or as a base64-encoded string.
Either of those options is inefficient; base64 encoding, for example, increases the size of the payload by about a third.
Therefore, multipart form requests are used to allow the easy uploading of binary files.

Go was selected as the language to implement the backend due to several of its advantages.
Go has extremely fast compilation, which allows for rapid development and iteration.
It has a minimal runtime, which allows it to be faster than languages with heavier runtimes, such as JavaScript.
It is also a simple language, which helps with maintaining the codebase.

The Go language integrates well with C libraries, which gives it access to machine learning libraries such as TensorFlow.
\subsubsection*{Authentication}
For a user to be authenticated with the server, they must first log in.
During the login process, the service checks whether the user is registered and whether the password provided matches the stored hash.

Upon verifying the user, a token is issued.
That token can then be sent in the ``token'' header as proof that the user is authenticated.
\subsection{Generation of Models}
The service requires the generation of models (see Figure~\ref{fig:expandable_models_generator}).
\subsubsection*{Implementation Details}
The model definitions are generated in the Go API and then stored in the database.
The runner then loads the definition from the API and creates a model based on it.
\subsubsection*{Model Generation}
Generating all models based on one single model would decrease the complexity of the system, but it would not guarantee success.
The system needs to generate successful models. To achieve this, the system will use two approaches:
\begin{itemize}
\item{Database search}
\item{AutoML (secondary goal)}
\end{itemize}

The database search consists of trying both previous models that are known to work on similar inputs, whether they were previously generated by the system or are known good models, and known base architectures that are modified to match the size of the input images.

An example of the first approach would be to try the ResNet model, while the second approach would use the architecture of ResNet and configure it so that it is better optimized for the input images.

The AutoML approach would consist of using an AutoML system to generate new models that match the task at hand.

Since the AutoML approach would be more computationally intensive, it would be less desirable to run. Therefore, the database search happens first, where known, possibly good models are tested. If a good model is found, the search stops; if no model is found, the system resorts to AutoML to find a suitable model.
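
The search order described above can be sketched in Go as follows. This is an illustration under assumptions, not the generator of the service: the \texttt{Candidate} type, the \texttt{findModel} function, the accuracy target, and the fixed accuracies returned by the stand-in training functions are all hypothetical.

```go
package main

import "fmt"

// Candidate is a hypothetical model definition together with a function that
// trains it and returns the accuracy reached on the validation data.
type Candidate struct {
	Name  string
	Train func() float64
}

// findModel tries the known candidates (the database search) in order and
// stops at the first one that reaches the target accuracy. Only if none does
// is the more expensive AutoML fallback invoked.
func findModel(known []Candidate, autoML func() Candidate, target float64) (string, bool) {
	for _, c := range known {
		if c.Train() >= target {
			return c.Name, true
		}
	}
	if autoML != nil {
		c := autoML()
		if c.Train() >= target {
			return c.Name, true
		}
	}
	return "", false
}

// fixed returns a stand-in training function that always reports acc.
func fixed(acc float64) func() float64 { return func() float64 { return acc } }

func main() {
	known := []Candidate{
		{Name: "previous-model", Train: fixed(0.80)},
		{Name: "resized-resnet", Train: fixed(0.96)},
	}
	autoML := func() Candidate { return Candidate{Name: "automl-model", Train: fixed(0.97)} }

	name, ok := findModel(known, autoML, 0.95)
	fmt.Println(name, ok) // the AutoML fallback is never called here
}
```

Passing the AutoML step as a function means it is only evaluated when every known candidate has failed, which mirrors the cost argument made in the text.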
\subsection{Models Training}
% The Training process follows % TODO have a flow diagram
The training of the models happens in a secondary Training Process (TP).
Once a model candidate is generated, the main process informs the TP of the new model.
The TP obtains the dataset and starts training.
Once the model has finished training, the TP reports the results to the main process.
The main process then decides if the model matches the requirements.
If that is the case, then the main process goes to the next steps; otherwise, the service moves on to the next model that requires training.

While training the model, the TP decides when the training is finished; this could be when the allotted training time has elapsed or when the model accuracy has not substantially increased over the last training rounds.

During the training process, the TP needs to cache the dataset being used.
This is because, to create one model, the service might have to generate and train more than one candidate model; if the dataset is not cached during this process, time is spent reloading the dataset into memory.

\subsection{Conclusion}
This section discussed the design and implementation specifications for the system.
While there were some areas where the requirements were not met completely, due to scope constraints, the implementation allows the missing parts of the design to be implemented at a later time.
The implementation follows the requirements with the adjusted scope.
The results of the implementation will be tested in a future section.
\pagebreak
\section{Legal, Societal, and Ethical Considerations} \label{sec:lsec}
This section will address possible legal, societal, and ethical issues that might arise from the deployment of the software being designed.
The Self-Assessment for Governance and Ethics (SAGE) form has been completed, and it is submitted along with the report.
\subsection{Legal Issues}
Legal issues can occur due to the data being stored by the service.
The service collects the least amount of sensitive information possible from the users who directly use the service.
The data that is collected, such as name, email, and password, is sensitive, but it is required to authenticate the user.
To safeguard that information, the system will follow industry standards for data security.
Legal issues might also occur due to the uploaded images. For example, those images could be copyrighted or confidential. The service is designed to allow users to host their images themselves, without the service having to host them, which moves the legal responsibility for managing that data to the user of the system.
\subsubsection{GDPR}
The General Data Protection Regulation (GDPR) (GDPR, 2018) is a data protection and privacy law in the European Union and the European Economic Area that has also been implemented into British law.
The main objective of the GDPR is to minimise the collection of data that is not needed for the purposes of the application, as well as to give users the right to be forgotten.
The application collects only the personal data needed to authenticate the user, and data that is generated during the normal usage of the application.
All the data that is related to a user can be deleted.
When deletion is requested, the system will prevent any new work related to that data from starting.
Once no ongoing work requires the data, the system will remove all relevant identifiable references to it.
\subsection{Social Issues}
The web application was designed to be easy to use and tries to consider all accessibility requirements.
% TODO talk about this
% The service itself could raise issues of taking jobs that are currently done by humans.
% This is less problematic as time has shown that the jobs just change, instead of manually classifying the images, the job transforms from the classifying all the images that are needed to maintain and verifying that the data being input to the model is correct.
\subsection{Ethical Issues}
While the service itself does not raise any ethical concerns, the data that the service processes could raise ethical complications.
For example, the service could be acquired by a company that wants to use the data provided to the system for other purposes.
\pagebreak
\section{Evaluating the Service}
This section discusses how the service can be evaluated from a technical standpoint, and the results of that evaluation.
Given the goals of the project, there are two kinds of tests that need to be accounted for:
user tests, which relate to the experience of the user while using the project, and tests that quantitatively evaluate the project,
such as the accuracy of the generated models and the response time to queries.
\subsection{Testing the model}
To test the system, a few datasets were selected.
The datasets were selected to represent different possible sizes of models and different numbers of output labels.
ImageNet~\cite{imagenet} was not selected as one of the datasets to be tested, as it does not represent the target problem that this project is trying to tackle.
The tests will measure:
\begin{itemize}
\item Time to process and validate the entire dataset upon upload
    \item Time to train the model on the dataset
    \item Time to classify an image once the model has been trained
\item Time to extend the model
\item Accuracy of the newly created model
\end{itemize}
The results are presented in Table~\ref{tab:eval-results}.
\subsubsection{MNIST}
The MNIST\cite{mnist} dataset was selected due to its size. It's a small dataset that can be trained quickly and can be used to verify other internal systems of the service.
\subsection{Results}
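\begin{longtable}{ | c | c | c | c | c | c |}
    \hline
    Dataset & Upload processing time & Training time & Classification time & Extension time & Accuracy \\ \hline
    MNIST & 0ms & 0ms & 0ms & 0ms & $98\%$ \\ \hline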
\caption{Evaluation Results}
\label{tab:eval-results}
\end{longtable}
\pagebreak
\section{Results} % TODO change this
As stated in the introduction, this project has multiple objectives.
\subsection{Platform where users can manage their models}
This goal was achieved: a web-based platform was created to manage and control the models.
\subsection{A system to automatically train and create models}
This goal was achieved: there is currently a system to automatically create and train models.
The system that trains models needs some improvement, as it is still partially inefficient at managing the system load while training.
\subsection{An API that users can interact with programmatically}
This goal was achieved: there is currently a working API that users can use to control the models and run inference tasks.
\pagebreak
\section{Appendix}
\begin{figure}[h!]
    \begin{center}
        \includegraphics[height=0.8\textheight]{expandable_models_simple}
    \end{center}
    \caption{Contains an overall view of the entire system}\label{fig:expandable_models_simple}
\end{figure}
\begin{figure}
    \begin{center}
        \includegraphics[height=0.8\textheight]{expandable_models_generator}
    \end{center}
    \caption{Contains an overall view of the model generation system}\label{fig:expandable_models_generator}
\end{figure}
\pagebreak
\section{References}
\printbibliography[heading=none]
% TODO add my job title
\end{document}