
%%% Preamble
\documentclass[11pt, a4paper]{article}
\usepackage[english]{babel} % English language/hyphenation
\usepackage{url}
\usepackage{tabularx}
\usepackage{pdfpages}
\usepackage{float}
\usepackage{graphicx}
\usepackage{svg}
\graphicspath{ {../images for report/} }
\usepackage[margin=2cm]{geometry}
\usepackage{hyperref}
\hypersetup{
colorlinks,
citecolor=black,
filecolor=black,
linkcolor=black,
urlcolor=black
}
\usepackage{cleveref}
%%% Custom headers/footers (fancyhdr package)
\usepackage{fancyhdr}
\pagestyle{fancyplain}
\fancyhead{} % No page header
\fancyfoot[L]{} % Empty
\fancyfoot[C]{\thepage} % Pagenumbering
\fancyfoot[R]{} % Empty
\renewcommand{\headrulewidth}{0pt} % Remove header underlines
\renewcommand{\footrulewidth}{0pt} % Remove footer underlines
\setlength{\headheight}{13.6pt}
% numeric
\usepackage[bibstyle=ieee, citestyle=numeric, sorting=none,backend=biber]{biblatex}
\addbibresource{../main.bib}
% Write the approved title of your dissertation
\title{Classify: Image Classification as a Software Platform}
% Write your full name, as in University records
\author{Andre Henriques, 6644818}
\date{}
%%% Begin document
\begin{document}
\maketitle
\begin{center}
\includegraphics[height=0.5\textheight]{uni_surrey}
\end{center}
\begin{center}
\today
\end{center}
\newpage
\newpage
\begin{center}
\vspace*{\fill}
\section*{Declaration of Originality}
I confirm that the submitted work is my own work and that I have clearly identified and fully
acknowledged all material that is entitled to be attributed to others (whether published or
unpublished) using the referencing system set out in the programme handbook. I agree that the
University may submit my work to means of checking this, such as the plagiarism detection service
Turnitin® UK. I confirm that I understand that assessed work that has been shown to have been
plagiarised will be penalised.
\vspace*{\fill}
\end{center}
\newpage
\newpage
\begin{center}
\vspace*{\fill}
\section*{Acknowledgements}
I would like to take this opportunity to thank my supervisor, Rizwan Asghar, who helped me from the
start of the project until the end.
I am sincerely thankful to him for sharing his honest and educational views on several issues related
to this report.
Additionally, I would like to thank my parents and friends for their continued support and
encouragement from my first day at university.
\vspace*{\fill}
\end{center}
\newpage
\newpage
\begin{center}
\vspace*{\fill}
\section*{Abstract}
Currently, there are few automatic image classification platforms.
This project hopes to work as a guide for creating a new automatic image classification platform.
The project goes through all the requirements for creating such a platform service, and all of its needs.
\vspace*{\fill}
\end{center}
\newpage
\newpage
\tableofcontents
\newpage
\section{Introduction}
% This section should contain an introduction to the problem aims and objectives (0.5 page)
Currently, many classification tasks are done manually. These tasks could be done more effectively if there were tooling that allowed the easy creation of classification models, without requiring knowledge of data analysis or of how machine learning models are created.
The aim of this project is to create a classification service that requires zero user knowledge about machine learning, image classification or data analysis.
The system should allow the user to create a reasonably accurate model that satisfies the user's needs.
The system should also allow the user to create expandable models: models to which classes can be added after the model has been created. % hyperparameters, augmenting the data.
\subsection{Project Aim}
The project aims to create a platform where users can create different types of classification models without requiring any knowledge of image classification.
\subsection{Project Objectives}
This project's primary objectives are to create:
\begin{itemize}
\item a platform where the users can create and manage their models.
\item a system to automatically create and train models.
\item a system to automatically expand and reduce models without fully retraining the models.
\item an API so that users can interact programmatically with the system.
\end{itemize}
This project's extended objectives are to:
\begin{itemize}
\item{Create a system to automatically merge models to increase efficiency.}
\item{Create a system to distribute the load of training models among multiple servers.}
\end{itemize}
\pagebreak
\section{Literature and Technical Review}
This section reviews existing technologies in the market that do image classification. It also reviews current image classification technologies that meet the requirements of the project. The review also analyses methods that are used to distribute learning between various physical machines, and how to spread the load so that minimal reloading of models is required when running them.
\subsection{Existing Classification Platforms}
There are currently some existing software as a service (SaaS) platforms that provide services similar to the ones this project will be providing.
%Amazon provides bespoque machine learning services that if were contacted would be able to provide image classification services. Amazon provides general machine learning services \cite{amazon-machine-learning}.
Amazon provides an image classification service called ``Rekognition'' \cite{amazon-rekognition}. This service offers multiple capabilities, from face recognition and celebrity recognition to object recognition and others. One of these capabilities, called custom labels \cite{amazon-rekognition-custom-labels}, provides the service most similar to the one this project is about. The custom labels service allows users to provide custom datasets and labels, and using AutoML the Rekognition service generates a model that allows users to classify images according to those labels.
The models generated using Amazon's Rekognition do not provide a way to update the set of labels without generating a new project. This involves retraining a large part of the model, which leads to significant downtime before new classes can be classified. Training a model can also take from 30 minutes to 24 hours \cite{amazon-rekognition-custom-labels-training}, which could result in up to 24 hours of lag between the need to create a new label and being able to classify that label. A problem also arises when the user needs to add more than one label at the same time. For example, the user sees the need to create a new label and starts a new model training, but while the model is training another new label is also needed. The user must now either stop the training of the new model and retrain a new one, or wait until the current training finishes and then train a new one. If new classification classes are required frequently, this might not be the best platform to choose.
%https://aws.amazon.com/machine-learning/ml-use-cases/
%https://aws.amazon.com/rekognition/image-features/
Similarly, Google has the ``Cloud Vision API'' \cite{google-vision-api}, which provides services similar to Amazon's Rekognition. However, Google's Vision API appears to be more targeted at videos than images, as indicated by their price sheet \cite{google-vision-price-sheet}. They have tag and product identifiers, where every image has only one tag or product. The product identifier system seems to work differently from Amazon's Rekognition: it works based on k-nearest neighbours, giving the user similar products rather than classification labels \cite{google-vision-product-recognizer-guide}.
This method is more effective at allowing users to add new types of products, but as it does not give defined classes as the output, the system does not give the target functionality that this project is aiming to achieve.
\subsection{Requirements of Image Classification Models}
One of the main objectives of this project is to be able to create models that can assign a class to any given image for any dataset, which means there will be no ``one solution fits all'' to the problem. While the most complex way to solve a problem would most likely result in success, it might not be the most efficient way to achieve the results.
This section will analyse possible models that would obtain the best results. The models for this project have to be as efficient as possible while achieving the best possible accuracy.
A classical example is the MNIST dataset \cite{mnist}. Models for the classification of the MNIST dataset can be either simple or extremely complex, and achieve different levels of accuracy.
For example, in \cite{mist-high-accuracy} an accuracy of $99.91\%$ was achieved by combining 3 Convolutional Neural Networks (CNNs) with different kernel sizes, changing hyperparameters and augmenting the data, while in \cite{lecun-98} an accuracy of $95\%$ was achieved using a 2-layer neural network with 300 hidden nodes. Both these models achieve the accuracy that is required for this project, but the models in \cite{mist-high-accuracy} are more computationally intensive to run. When deciding which models to create, the system should choose the model that can achieve the required accuracy while taking the least amount of effort to train.
For this system to work as intended, the models should be as small as possible while obtaining the accuracy required to classify the given classes.
As the service might need to handle many requests, the models need to be easy to run, and smaller models are easier to run; therefore, the system requires a balance between size and accuracy.
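As a rough illustration of how model size can be reasoned about, the parameter count of the 2-layer, 300-hidden-node network from \cite{lecun-98} on $28 \times 28$ MNIST inputs can be computed directly (the helper function below is purely illustrative):

```python
# Back-of-envelope parameter count for a 2-layer fully connected network
# on MNIST: 28x28 inputs, 300 hidden units, 10 output classes.
def dense_params(n_in, n_out):
    # one weight per input-output pair, plus one bias per output unit
    return n_in * n_out + n_out

hidden = dense_params(28 * 28, 300)   # 784 * 300 + 300 = 235,500
output = dense_params(300, 10)        # 300 * 10 + 10   =   3,010
total = hidden + output               # 238,510 parameters in total
print(total)
```

Even this deliberately small network has over 238{,}000 parameters, which shows why the choice of architecture dominates the memory and compute cost of serving many models at once.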
% TODO talk about storage
\subsection{Method of Image Classification Models}
There are multiple ways of achieving image classification. The requirement of the system is that it should return the class that an image belongs to, which means that supervised classification methods will be used, as these are the ones that meet the requirements of the system.
% TODO find some papers to proff this
The system will use supervised models to classify images, using a combination of different types of models: neural networks, deep neural networks, convolutional neural networks and deep convolutional neural networks.
These types were chosen because they have had large success in past image classification challenges, for example the ImageNet challenges \cite{imagenet}, which ranked different models on classifying a dataset of 14 million images. The contest ran from 2010 to 2017.
The models that participated in the contest tended to use more and more deep convolutional neural networks. Out of the various models that were generated, a few landmark models were able to achieve high accuracies, including AlexNet \cite{krizhevsky2012imagenet}, ResNet-152 \cite{resnet-152} and EfficientNet \cite{efficientnet}.
% TODO find vgg to cite
These models can be used in two ways in the system: to generate models via transfer learning, or by using the model's architecture as the basis for a completely new model.
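The difference between the two approaches can be sketched as follows; this is a framework-free illustration, and the class and function names are invented for the example rather than taken from any real ML library:

```python
# Illustrative sketch of the two reuse strategies (no real ML framework).

class Layer:
    def __init__(self, name, pretrained=False):
        self.name = name
        self.trainable = True
        self.pretrained = pretrained  # True if weights come from a prior task

def transfer_learning(base_layers):
    """Keep the pretrained weights, freeze them, and train only a new head."""
    for layer in base_layers:
        layer.trainable = False          # frozen pretrained feature extractor
    return base_layers + [Layer("new_classifier")]

def architecture_as_basis(base_layers):
    """Reuse only the layer structure; all weights are reinitialized."""
    return [Layer(l.name, pretrained=False) for l in base_layers] + [Layer("new_classifier")]

base = [Layer(f"conv_{i}", pretrained=True) for i in range(3)]
tl = transfer_learning(base)          # cheap to train: only the head learns
scratch = architecture_as_basis(base) # expensive: everything learns again
```

Transfer learning trades training cost for dependence on the pretrained features, while rebuilding from the architecture gives full flexibility at full training cost.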
\subsection{Well-known models}
% TODO compare the models
This section will compare the different models that did well in the ImageNet challenge.
AlexNet \cite{krizhevsky2012imagenet} is a deep convolutional neural network that participated in the ImageNet ILSVRC-2010 contest, where it achieved a top-1 error rate of $37.5\%$ and a top-5 error rate of $17.0\%$. A variant of this model participated in the ImageNet ILSVRC-2012 contest and achieved a top-5 error rate of $15.3\%$. The architecture of AlexNet consists of 5 convolutional layers followed by 3 dense layers, with some layers followed by max pooling. Training was done using multiple GPUs: each GPU ran a part of each layer, and some layers were connected between GPUs. During training, the model also used regularization techniques such as label-preserving data augmentation and dropout.
While using AlexNet would probably yield the desired results, it would complicate the other parts of the service. As a platform as a service, the system needs to manage the resources available, and requiring 2 GPUs to train a single model would halve the number of models the system could train concurrently.
% TODO talk more about this
ResNet \cite{resnet} is a deep convolutional neural network that participated in the ImageNet ILSVRC-2015 contest, where it achieved a top-1 error rate of $21.43\%$ and a top-5 error rate of $5.71\%$. ResNet was created to solve the problem of the degradation of training accuracy when using deeper models. Close to the release of the ResNet paper, there was evidence that deeper networks result in higher accuracy \cite{going-deeper-with-convolutions, very-deep-convolution-networks-for-large-scale-image-recognition}, but increasing the depth of the network resulted in training accuracy degradation.
ResNet works by creating shortcuts between sets of layers; the shortcuts allow residual values from previous layers to be used in the upper layers. The hypothesis is that it is easier to optimize the residual mappings than the original, unreferenced mappings.
The results of the challenge show that using the residual values improved the training of the model.
It is important to note that residual networks tend to give better results the more layers the model has. While this could have a negative impact on performance, the number of parameters per layer does not grow as steeply in ResNet as in other architectures, since it uses optimizations such as $1 \times 1$ kernels, which are more space efficient. Even with these optimizations, it can still achieve impressive results, which might make it a good contender to be used in the service as one of the predefined models for creating the machine learning models.
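The shortcut idea can be written out in a few lines: instead of learning the full mapping $H(x)$ directly, the block learns the residual $F(x) = H(x) - x$, and the shortcut adds the input back. The sketch below is a plain-Python illustration of that addition, not real network code:

```python
# Minimal sketch of a residual shortcut: the block computes H(x) = F(x) + x,
# where f is the learned residual function operating on the vector x.
def residual_block(x, f):
    return [fi + xi for fi, xi in zip(f(x), x)]

# If the optimal mapping is close to the identity, the block only has to
# push F(x) towards zero, which is easier than learning the identity from
# scratch through a stack of nonlinear layers.
identity_like = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
print(identity_like)  # [1.0, 2.0, 3.0]
```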
% MobileNet
% EfficientNet
EfficientNet \cite{efficient-net} is a deep convolutional neural network that was able to achieve $84.3\%$ top-1 accuracy while being ``$8.4x$ smaller and $6.1x$ faster on inference than the best existing ConvNet''. EfficientNets\footnote{The family of models that use the techniques described in \cite{efficient-net}.} are models that, instead of increasing just the depth or the width of the model, increase all of these parameters at the same time by constant coefficients. By not scaling only depth, EfficientNets can acquire more information about the images, especially because the image size is also considered.
To test their results, the EfficientNet team created a baseline model that used the mobile inverted bottleneck MBConv \cite{inverted-bottleneck-mobilenet} as a building block. The baseline model was then scaled using the compound method, which resulted in better top-1 and top-5 accuracy.
While EfficientNets are smaller than their non-EfficientNet counterparts, they are more computationally intensive: a ResNet-50 scaled using the EfficientNet compound scaling method is $3\%$ more computationally intensive than a ResNet-50 scaled using only depth, while improving the top-1 accuracy by $0.7\%$.
As the models will be trained and run multiple times, decreasing the computational cost might be a better overall target for sustainability than being able to offer higher accuracies.
Even though scaling using the EfficientNet compound method might not yield the best results, using some of the EfficientNet models that were optimized by the team would be practical; for example, EfficientNet-B1 is both small and efficient while still obtaining $79.1\%$ top-1 accuracy on ImageNet, and realistically the datasets that this system will process will be smaller and more scope-specific than ImageNet.
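The compound scaling rule itself is simple arithmetic. The sketch below uses the base coefficients reported in the EfficientNet paper ($\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$, found by grid search) and is only an illustration of the scaling relationship, not a model implementation:

```python
# Compound scaling per the EfficientNet paper: depth, width and resolution
# are scaled together by a single coefficient phi.
alpha, beta, gamma = 1.2, 1.1, 1.15  # base coefficients from the paper

def compound_scale(phi):
    depth = alpha ** phi        # multiplier on the number of layers
    width = beta ** phi         # multiplier on channels per layer
    resolution = gamma ** phi   # multiplier on input image size
    return depth, width, resolution

# The constraint alpha * beta^2 * gamma^2 ~= 2 means each unit increase
# of phi roughly doubles the FLOPS of the network.
flops_factor = alpha * beta**2 * gamma**2
print(round(flops_factor, 2))  # ~1.92, close to 2
```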
% \subsection{Efficiency of transfer learning}
% \subsection{Creation Models}
% The models that I will be creating will be Convolutional Neural Network(CNN) \cite{lecun1989handwritten,fukushima1980neocognitron}.
% The system will be creating two types of models that cannot be expanded and models that can be expanded. For the models that can be expanded, see the section about expandable models.
% The models that cannot be expanded will use a simple convolution blocks, with a similar structure as the AlexNet \cite{krizhevsky2012imagenet} ones, as the basis for the model. The size of the model will be controlled by the size of the input image, where bigger images will generate more deep and complex models.
% The models will be created using TensorFlow \cite{tensorflow2015-whitepaper} and Keras \cite{chollet2015keras}. These theologies are chosen since they are both robust and used in industry.
% \subsection{Expandable Models}
% The current most used approach for expanding a CNN model is to retrain the model. This is done by, recreating an entire new model that does the new task, using the older model as a base for the new model \cite{amazon-rekognition}, or using a pretrained model as a base and training the last few layers.
% There are also unsupervised learning methods that do not have a fixed number of classes. While this method would work as an expandable model method, it would not work for the purpose of this project. This project requires that the model has a specific set of labels which does not work with unsupervised learning which has unlabelled data. Some technics that are used for unsupervised learning might be useful in the process of creating expandable models.
\pagebreak
\section{System Analysis}
\subsection{Introduction}
Understanding the project that is being built is critical in the software development process. This section will look into the parts required for the project to work.
As a SaaS project, there are some required parts that the project needs to have:
\begin{itemize}
\item{Web platform}
\item{JSON API}
\item{Server Management}
\item{Dataset Management}
\item{Model Management}
\end{itemize}
\subsection{Overall structure}
The system needs to have some level of distribution. This requirement exists because of the expensive nature of machine learning training:
it would be unwise to perform machine learning training on the same machine that the main web server runs on, as it would starve that server of resources.
\subsection{Resources}
The system has to manage what servers are available to do machine learning tasks.
The system has to be aware of and manage all GPU servers: servers that have GPUs available and can run the models.
\subsection{Web platform}
The web platform is where users manage their models and data. The user will access the web platform to configure the model and manage the dataset.
\subsection{JSON API}
A big part of a SaaS offering is the ability to communicate with other services. Nowadays, systems mostly communicate with each other using JSON over REST APIs \cite{json-api-usage-stats}, so the system will provide a JSON API in order to communicate with other services as intended.
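As a sketch of what such communication could look like, the snippet below shows a hypothetical request and response body for a classification call; the field names and values are illustrative assumptions, not a finalized API:

```python
import json

# Hypothetical request/response shapes for a classification endpoint.
request_body = {
    "model_id": "7f3b2c",                        # which model to run (example id)
    "image_url": "https://example.com/cat.png",  # image to classify
}

response_body = {
    "status": "ok",
    "class": "cat",        # predicted label
    "confidence": 0.93,    # model confidence for that label
}

# JSON is the wire format, so both sides serialize and deserialize:
wire = json.dumps(request_body)
assert json.loads(wire) == request_body
```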
\subsection{Server Management}
Since AI training is notoriously expensive, the system cannot run on one server alone, as this would put too much strain on that server.
The system needs to be able to distribute the load between the multiple servers.
For that reason, the service needs to be able to send training and prediction jobs to servers that have the resources to train models or predict classes from images.
The system has to be able to choose the servers to run the models in an optimized way.
For example, when training, send training jobs to the same server to prevent the server from having to reload the data again.
\subsection{Dataset Management}
Without data, the system cannot train models, and the management of that data is important, as it might contain private data such as biometrics; the system will need to be able to handle this data safely.
The system will also have to decide when to clear data, since storage space is also a resource that the system needs to manage.
\subsection{Model Management}
Once the model has been created, the system has to keep track of the model, as well as its actual accuracy.
It also has to record how much the model is used, so it can distribute the load between the different GPU servers.
\pagebreak
\section{System Design}
This section will discuss the design of the system.
The section will discuss the inter application interface, control platform, and server, dataset, and model management.
\subsection{Structure of the Service}
\begin{figure}
\begin{center}
\includegraphics{system_diagram}
\end{center}
\caption{Simplified diagram of the service}\label{fig:simplified_service_diagram}
\end{figure}
The service is designed as a 4-tier structure:
\begin{itemize}
\item{Presentation Layer}
\item{API Layer}
\item{Work Layer}
\item{Database Layer}
\end{itemize}
This structure was selected because it allows a separation of concerns based on the resources required by each layer.
The presentation layer requires interactivity with the user; therefore, it needs to be accessible from the outside and be simple to use.
The presentation layer consists of a webpage that interacts with the API layer to manage the resources allocated to both users and administrators of the system.
More specific details of the implementation can be found in \ref{web-app-design}.
The API layer controls the system; it is the interface that both the webpage and customer servers use to interact with the system.
\subsection{Inter Application Interface}
As a software as a service, one of the main requirements is to be able to communicate with other services.
The current main way that servers communicate over the internet is using HTTPS and a REST JSON API \cite{json-api-usage-stats}.
\subsection{Web application} \label{web-app-design}
Why use a web application to control the system?
The main purpose of the web application is to let users visually manage their accounts and their models.
The web interface allows the user to:
\begin{itemize}
\item{Manage data before training.} %TODO add image for proof
\item{Start the training process.} %TODO add image for proff
\item{Visualize the model training.} %TODO add image for proff
\item{Run images through the model.} %TODO add image for proff
\item{Expand the model with new classes.} %TODO add image for proff
\item{See the performance of the model.} %TODO add image for proff
\end{itemize}
\pagebreak
\section{Design Choices}
\subsection{Structure of the Service}
The system has to manage:
\begin{itemize}
\item{User data}
\item{Uploaded User Images / Remote User Images}
\item{User models}
\item{Generation of models}
\item{Training of models}
\item{Running of models}
\end{itemize}
The system is designed with a semi-monolithic approach (see \cref{fig:expandable_models_simple}). The management of the data and the generation of the models will be done in the monolith, while the training and running of the models will be done on GPU-dedicated nodes.
The overall workflow of a user who wants a model created would be:
\begin{itemize}
\item{The user requests the server to create a model with some base images and classes.}
\item{The system creates a model.}
\item{The user requests the classification or confirmation of an image.}
\end{itemize}
\subsection{Web app}
The goal of the project is to provide a software as a service platform for classification tasks. With that in mind, the service needs a way to be controlled, which will be achieved with a web interface.
The web interface will have to manage:
\begin{itemize}
\item{User Data}
\item{Model Data}
\item{Dataset Data}
% TODO maybe resourse data e.i. the resourses the system is using to manage everthing
\end{itemize}
\subsection{Generation Models}
The system requires the generation of models (see \cref{fig:expandable_models_generator}). Generating all models based on one single model would decrease the complexity of the system, but it would not guarantee success.
The system needs to generate successful models, to achieve this, the system will be performing two approaches:
\begin{itemize}
\item{Database search}
\item{AutoML (secondary goal)}
\end{itemize}
The database search will consist of trying previous models that are known to work on similar inputs: either models that were previously generated by the system or known-good models, as well as known base architectures that are modified to match the size of the input images.
An example of the first approach would be to try the ResNet model directly, while the second approach would be using the architecture of ResNet and configuring it so it is more optimized for the input images.
The AutoML approach would consist of using an AutoML system to generate new models that match the task at hand.
Since the AutoML approach is more computationally intensive, it is less desirable to run. Therefore, the database search happens first, where known possibly-good models are tested. If a good model is found, the search stops; if no model is found, the system resorts to AutoML to find a suitable model.
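This search-then-fallback strategy can be sketched as follows; `train_and_evaluate` and `automl_search` are stand-ins for the real training and AutoML subsystems, introduced only for this illustration:

```python
# Sketch of the model-selection strategy: try known-good candidate
# architectures first, and fall back to the (more expensive) AutoML
# search only when none of them reaches the target accuracy.
def find_model(candidates, target_accuracy, train_and_evaluate, automl_search):
    for architecture in candidates:           # cheap database search first
        accuracy = train_and_evaluate(architecture)
        if accuracy >= target_accuracy:
            return architecture               # good enough, stop searching
    return automl_search(target_accuracy)     # expensive fallback

# Toy usage with stubbed-out training results:
scores = {"resnet": 0.80, "efficientnet-b1": 0.92}
best = find_model(
    ["resnet", "efficientnet-b1"],
    target_accuracy=0.90,
    train_and_evaluate=scores.get,
    automl_search=lambda target: "automl-model",
)
print(best)  # efficientnet-b1
```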
\subsection{Models Training}
% The Training process follows % TODO have a flow diagram
The training of the models happens in a secondary Training Process (TP).
Once a model candidate is generated, the main process informs the TP of the new model. The TP obtains the dataset and starts training. Once the model has finished training, the TP reports to the main process with the results. The main process then decides if the model matches the requirements. If that is the case, the main process moves to the next steps; otherwise, the system moves on to the next model that requires training.
While training the model, the TP decides when the training is finished; this could be when the training time has run out or when the model's accuracy has not been substantially increasing over the last training rounds.
During the training process, the TP needs to cache the dataset being used. This is because, to create one model, the system might have to generate and train more than one candidate, and if the dataset is not cached during this process, time is wasted reloading the dataset into memory.
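The TP's stopping rule can be sketched as a small decision function; the thresholds (`patience`, `min_delta`) are illustrative values, not part of the implemented system:

```python
# Sketch of the TP's stopping rule: stop when the time budget is used up,
# or when accuracy has not improved by at least min_delta over the last
# `patience` training rounds (a plateau).
def should_stop(accuracies, elapsed, time_budget, patience=3, min_delta=0.005):
    if elapsed >= time_budget:
        return True                            # training time exhausted
    if len(accuracies) <= patience:
        return False                           # not enough history yet
    recent_best = max(accuracies[-patience:])
    earlier_best = max(accuracies[:-patience])
    return recent_best - earlier_best < min_delta  # accuracy has plateaued

history = [0.60, 0.72, 0.78, 0.781, 0.781, 0.782]
print(should_stop(history, elapsed=10, time_budget=60))  # True: plateau
```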
\pagebreak
\section{Results} % TODO change this
% technological free overview
% \subsection{Web Interface}
% The user will interact with the platform form via a web portal. % why the web portal
% The web platform will be designed using HTML and a JavaScript library called HTMX \cite{htmx} for the reactivity that the pagers requires.
% The web server that will act as controller will be implemented using go \cite{go}, due to its ease of use.
% Go was chosen has the programming language used in the server due to its performance, i.e. \cite{node-to-go}, and ease of implementation. As compiled language go, outperforms other server technologies such as Node.js.
% Go also has easy support for C ABI, which might be needed if there is a need to interact with other tools that are implemented using C.
% The web server will also interact with python to create models. Then to run the models, it will use the libraries that are available to run TensorFlow \cite{tensorflow2015-whitepaper} models for that in go.
% \subsection{Creating Models}
% The models will be created using TensorFlow \cite{tensorflow2015-whitepaper}.
% TensorFlow was chosen because, when using frameworks like Keras \cite{chollet2015keras}, it allows the easy development of machine learning models with little code. While tools like PyTorch might provide more advanced control options for the model, like dynamic graphs, it comes at the cost of more complex python code. Since that code is generated by the go code, the more python that needs to be written, the more complex the overall program gets, which is not desirable.
% The original plan was to use go and TensorFlow, but the go library was lacking that ability. Therefore, I chose to use python to create the models.
% The go server starts a new process, running python, that creates and trains the TensorFlow model. Once the training is done, the model is saved to disk which then can be loaded by the go TensorFlow library.
% \subsection{Expandable Models}
% The approach would be based on multiple models. The first model is a large model that will work as a feature traction model, the results of this model are then given to other smaller models. These model's purpose is to classify the results of the feature extraction model into classes.
% The first model would either be an already existent pretrained model or a model that is automatically created by the platform.
% The smaller models would all be all generated by the platform, this model's purpose would be actually classification.
% This approach would offer a lot of expandability, as it makes the addition of a new class as easy as creating a new small model.
\pagebreak
\section{Appendix}
\begin{figure}
\begin{center}
\includegraphics[height=0.9\textheight]{expandable_models_simple}
\end{center}
\caption{Contains an overall view of the entire system}\label{fig:expandable_models_simple}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[height=0.9\textheight]{expandable_models_generator}
\end{center}
\caption{Contains an overall view of the model generation system}\label{fig:expandable_models_generator}
\end{figure}
\pagebreak
\section{References}
\printbibliography[heading=none]
% TODO add my job title
\end{document}