% fyp-report/report/report.tex
% Last commit: 11b956894c by Andre Henriques, 2023-12-20 16:07:27 +00:00 ("Talking about the google vision")
%%% Preamble
\documentclass[11pt, a4paper]{article}
\usepackage[english]{babel} % English language/hyphenation
\usepackage{url}
\usepackage{tabularx}
\usepackage{pdfpages}
\usepackage{float}
\usepackage{graphicx}
\usepackage{svg}
\graphicspath{ {../images for report/} }
\usepackage[margin=2cm]{geometry}
\usepackage{hyperref}
\hypersetup{
colorlinks,
citecolor=black,
filecolor=black,
linkcolor=black,
urlcolor=black
}
\usepackage{cleveref}
%%% Custom headers/footers (fancyhdr package)
\usepackage{fancyhdr}
\pagestyle{fancyplain}
\fancyhead{} % No page header
\fancyfoot[L]{} % Empty
\fancyfoot[C]{\thepage} % Pagenumbering
\fancyfoot[R]{} % Empty
\renewcommand{\headrulewidth}{0pt} % Remove header underlines
\renewcommand{\footrulewidth}{0pt} % Remove footer underlines
\setlength{\headheight}{13.6pt}
% numeric
\usepackage[style=ieee,sorting=none,backend=biber]{biblatex}
\addbibresource{../main.bib}
% Write the approved title of your dissertation
\title{Automated image classification service}
% Write your full name, as in University records
\author{Andre Henriques, 6644818}
\date{}
%%% Begin document
\begin{document}
\maketitle
\newpage
\tableofcontents
\newpage
\section{Introduction}
% This section should contain an introduction to the problem aims and objectives (0.5 page)
Many classification tasks are currently done manually. These tasks could be done more effectively if there were tooling that allowed classification models to be created easily, without knowledge of data analysis or machine learning model creation.
The aim of this project is to create a classification service that requires zero user knowledge about machine learning, image classification or data analysis.
The system should allow the user to create a reasonably accurate model that satisfies the user's needs.
The system should also allow the user to create expandable models, i.e. models to which classes can be added after the model has been created.
\subsection{Project Aim}
The project aims to create a platform where users can create different types of classification models without needing any knowledge of image classification.
\subsection{Project Objectives}
This project's primary objectives are to:
\begin{itemize}
\item Create a platform where the users can create and manage their models.
\item Create a system to automatically create and train models.
\item Create a system to automatically expand and reduce models without fully retraining the models.
\item Create an API so that users can interact programmatically with the system.
\end{itemize}
This project's extended objectives are to:
\begin{itemize}
\item Create a system to automatically merge models to increase efficiency.
\item Create a system to distribute the load of training the models among multiple services.
\end{itemize}
\section{Literature and Technical Review}
% 1 page of background and literature review. Here you will need to references things. Gamal et al.~\cite{gamal} introduce the concept of \ldots
\subsection{Introduction}
This section reviews existing technologies on the market that perform image classification. It also reviews current image classification techniques and which of them meet the requirements of this project. Finally, it analyses methods used to distribute training across multiple machines, and how to spread the load so that minimal reloading of models is required when running them.
\subsection{Current existing classification platforms}
There are currently some software as a service (SaaS) platforms that provide services similar to the ones this project will provide.
%Amazon provides bespoke machine learning services that, if contracted, would be able to provide image classification services. Amazon provides general machine learning services \cite{amazon-machine-learning}.
Amazon provides an image classification service called ``Rekognition'' \cite{amazon-rekognition}. This service offers multiple features, such as face recognition, celebrity recognition and object recognition. One of these features, Custom Labels \cite{amazon-rekognition-custom-labels}, provides the service most similar to the one this project targets. Custom Labels allows users to provide their own datasets and labels; using AutoML, Rekognition then generates a model that classifies images according to those labels.
Models generated with Rekognition do not provide a way to change the set of labels that were originally created without creating a new project. This involves retraining a large part of the model, which leads to long delays before new classes can be used. Training can take from 30 minutes to 24 hours \cite{amazon-rekognition-custom-labels-training}, which could result in up to 24 hours of lag between the need for a new label and being able to classify it. A further problem arises when the user needs to add more than one label in quick succession. For example, the user creates a new label and starts training a new model; if another label becomes necessary while that model is training, the user must either stop the current training and start again, or wait for it to finish and then train yet another model. If new classes are required frequently, this might not be the best platform to choose.
%https://aws.amazon.com/machine-learning/ml-use-cases/
%https://aws.amazon.com/rekognition/image-features/
Similarly, Google offers the ``Cloud Vision API'' \cite{google-vision-api}, which provides services similar to Amazon's Rekognition. However, Google's Vision API appears to be more targeted at videos than images, as indicated by its price sheet \cite{google-vision-price-sheet}. It offers tag and product identifiers, where every image has only one tag or product. The product identifier system seems to work differently from Amazon's Rekognition: it is based on k-nearest neighbours, giving the user similar products rather than classification labels \cite{google-vision-product-recognizer-guide}.
This method makes it easier for users to add new types of products, but since it does not output defined classes, it does not provide the functionality this project aims to achieve.
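The difference between the two approaches can be illustrated with a minimal Python sketch. It does not use Google's API: the catalogue, the embedding vectors and the function names (\texttt{nearest\_products}, \texttt{cosine}) are hypothetical, and cosine similarity is assumed as the similarity measure. A new product is added by inserting one vector, with no retraining, but a query returns a ranked list of similar items rather than a class label.

\begin{verbatim}
import math

# Hypothetical catalogue: product id -> embedding vector from some
# unspecified image feature extractor (values are illustrative only).
CATALOGUE = {
    "mug-red": [0.9, 0.1, 0.0],
    "mug-blue": [0.8, 0.2, 0.1],
    "shoe-running": [0.1, 0.9, 0.3],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_products(query, k=2):
    """Return the ids of the k catalogue entries most similar to query.

    Adding a product only means adding a vector to CATALOGUE (no
    retraining), but the result is a ranked list, not a class label.
    """
    ranked = sorted(CATALOGUE, key=lambda pid: cosine(query, CATALOGUE[pid]),
                    reverse=True)
    return ranked[:k]
\end{verbatim}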
\subsection{Alternatives to my Project}
There currently exist systems that perform image classification, such as Google Vision AI \cite{google-vision-api} and Amazon's Rekognition \cite{amazon-rekognition}.
While their tools provide services similar to what my project aims to do, they mostly focus on general image classification rather than specific image classification, e.g.\ car vs.\ boat, as opposed to car model X vs.\ car model Y.
\subsection{Model Creation}
The models I will be creating will be Convolutional Neural Networks (CNNs) \cite{lecun1989handwritten,fukushima1980neocognitron}.
The system will create two types of models: models that cannot be expanded and models that can be expanded. Models that can be expanded are discussed in the section on expandable models.
The models that cannot be expanded will use simple convolution blocks, with a structure similar to that of AlexNet \cite{krizhevsky2012imagenet}, as the basis of the model. The size of the model will be controlled by the size of the input image, where bigger images will generate deeper and more complex models.
The models will be created using TensorFlow \cite{tensorflow2015-whitepaper} and Keras \cite{chollet2015keras}. These technologies were chosen since they are both robust and widely used in industry.
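As an illustration of how the input size could drive the depth of the model, the following Python sketch derives a list of AlexNet-like convolution blocks from the image dimensions. The sizing rule (add a $3\times3$ convolution and $2\times2$ max-pooling block while the pooled feature map stays at least \texttt{min\_size} pixels wide, doubling the number of filters each time) and all parameter values are assumptions made for this example, not a finalised design. Each tuple could then be turned into Keras \texttt{Conv2D} and \texttt{MaxPooling2D} layers.

\begin{verbatim}
def alexnet_like_blocks(height, width, min_size=8,
                        base_filters=32, max_filters=512):
    """Return a list of (filters, kernel_size, pool_size) conv blocks.

    Hypothetical sizing rule: keep adding a 3x3 conv + 2x2 max-pool
    block while the pooled feature map is at least min_size wide,
    doubling the filters each time (capped at max_filters). Larger
    inputs therefore produce deeper models.
    """
    blocks = []
    size = min(height, width)
    filters = base_filters
    while size // 2 >= min_size:
        blocks.append((filters, 3, 2))
        size //= 2
        filters = min(filters * 2, max_filters)
    return blocks
\end{verbatim}

With these defaults, a $32\times32$ input yields two blocks, while a $224\times224$ input yields four.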
\subsection{Expandable Models}
Currently, the most common approach for expanding a CNN model is to retrain it. This is done either by recreating an entire new model for the new task, using the older model as a base \cite{amazon-rekognition}, or by using a pretrained model as a base and training only the last few layers.
There are also unsupervised learning methods that do not have a fixed number of classes. While such methods would work as a way of expanding models, they would not fit the purpose of this project: the project requires the model to have a specific set of labels, which is incompatible with unsupervised learning, since it works on unlabelled data. Some techniques used in unsupervised learning might nevertheless be useful when creating expandable models.
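A minimal sketch of the ``pretrained base, new last layer'' idea is shown below, assuming a frozen, pretrained feature extractor that already maps each image to a fixed-length feature vector. Instead of training the last layers with gradient descent, it uses a nearest-centroid classifier, a deliberately simple technique swapped in for illustration: each class is stored as the mean of its feature vectors, so classes can be added or removed without retraining the existing ones. The class \texttt{CentroidHead} is hypothetical.

\begin{verbatim}
class CentroidHead:
    """Hypothetical classification head on top of a frozen extractor.

    Each class is represented by the mean of its training feature
    vectors, so adding a class only requires computing one new mean;
    existing classes and the (assumed) pretrained extractor are untouched.
    """

    def __init__(self):
        self.centroids = {}

    def add_class(self, label, features):
        """Register a class from a non-empty list of feature vectors."""
        n = len(features)
        self.centroids[label] = [sum(col) / n for col in zip(*features)]

    def remove_class(self, label):
        """Drop a class; the remaining classes are unaffected."""
        self.centroids.pop(label, None)

    def predict(self, feature):
        """Return the label whose centroid is closest (squared Euclidean)."""
        def dist(label):
            centroid = self.centroids[label]
            return sum((a - b) ** 2 for a, b in zip(feature, centroid))
        return min(self.centroids, key=dist)
\end{verbatim}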
\section{Problem analysis \& design choices}
\subsection{Structure of the service}
The system needs to manage:
\begin{itemize}
\item{User data}
\item{Uploaded User Images / Remote User Images}
\item{User models}
\item{Generation of models}
\item{Training of models}
\item{Running of models}
\end{itemize}
The system is designed with a semi-monolithic approach. The management of the data and the generation of the models will be done in the monolith, while the training and running of the models will be done on dedicated GPU nodes.
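The split between the monolith and the GPU nodes implies some scheduling logic. The Python sketch below shows one possible policy, assuming each node has a fixed number of job slots: prefer a node that already has the requested model loaded, to avoid reloading it, and otherwise pick the least busy node with free capacity. \texttt{GpuNode} and \texttt{pick\_node} are hypothetical names; networking, job queues and releasing slots when jobs finish are left out.

\begin{verbatim}
from dataclasses import dataclass, field


@dataclass
class GpuNode:
    """Hypothetical view of a GPU node as seen by the monolith."""
    name: str
    max_jobs: int
    running: int = 0
    loaded_models: set = field(default_factory=set)


def pick_node(nodes, model_id):
    """Assign a job for model_id to a node, or return None if all are full.

    Nodes that already hold the model are preferred (no reload needed);
    otherwise the least busy node with a free slot is chosen.
    """
    free = [n for n in nodes if n.running < n.max_jobs]
    if not free:
        return None
    warm = [n for n in free if model_id in n.loaded_models]
    node = min(warm or free, key=lambda n: n.running)
    node.running += 1
    node.loaded_models.add(model_id)
    return node
\end{verbatim}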
The overall workflow of a user who wants a model created would be:
\begin{itemize}
\item{The user requests the server to create a model with some base images and classes.}
\item{The system creates a model.}
\item{The user requests the classification or confirmation of an image.}
\end{itemize}
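The workflow above can be seen as a small state machine for each model. The following Python sketch is one hypothetical way to encode it; the state names and allowed transitions are assumptions for illustration, including a \texttt{FAILED} state for when no suitable model is found.

\begin{verbatim}
from enum import Enum


class ModelState(Enum):
    CREATED = "created"    # user supplied base images and classes
    TRAINING = "training"  # system is generating/training the model
    READY = "ready"        # model can classify images
    FAILED = "failed"      # no suitable model could be produced


# Allowed transitions in this hypothetical model lifecycle.
TRANSITIONS = {
    ModelState.CREATED: {ModelState.TRAINING},
    ModelState.TRAINING: {ModelState.READY, ModelState.FAILED},
    ModelState.READY: {ModelState.TRAINING},   # e.g. after classes change
    ModelState.FAILED: {ModelState.TRAINING},  # retry with other settings
}


def advance(current, target):
    """Move to target, rejecting transitions the lifecycle does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def can_classify(state):
    """Only a trained model may serve classification requests."""
    return state is ModelState.READY
\end{verbatim}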
% TODO add diagram!
\includegraphics[height=\textheight]{expandable_models_simple}
\subsection{Generation of models}
The system requires the generation of models. Generating all models from a single base model would decrease the complexity of the system, but it would not guarantee success, as a single architecture is unlikely to suit every dataset.
The system needs to generate successful models; to achieve this, it will use two approaches:
\begin{itemize}
\item{Database search}
\item{AutoML (secondary goal)}
\end{itemize}
The database search will consist of trying both previous models that are known to work on similar inputs (either models previously generated by the system or known good models) and known base architectures that are modified to match the size of the input images.
An example of the first approach would be to try the ResNet model, while the second approach would be to take the ResNet architecture and configure it so that it is better suited to the input images.
The AutoML approach would consist of using an AutoML system to generate new models that match the task at hand.
Since the AutoML approach is more computationally intensive, it is less desirable to run. Therefore, the database search happens first, testing known, possibly good models. If a good model is found, the search stops; if no model is found, the system resorts to AutoML to find a suitable model.
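The search order described above can be sketched in Python as follows. The callables \texttt{train\_and\_evaluate} and \texttt{run\_automl}, and the default accuracy threshold of 0.9, are hypothetical placeholders for the real training code; the sketch only shows the control flow: try candidates in order, stop at the first good enough model, and fall back to AutoML only when none is found.

\begin{verbatim}
def find_model(candidates, train_and_evaluate, run_automl,
               target_accuracy=0.9):
    """Try known candidate models first; use AutoML only as a fallback.

    candidates: model descriptions ordered from most to least promising
        (previously generated models, then resized base architectures).
    train_and_evaluate: callable(candidate) -> validation accuracy.
    run_automl: callable() -> (model, accuracy); the expensive fallback.
    """
    best, best_acc = None, -1.0
    for candidate in candidates:
        acc = train_and_evaluate(candidate)
        if acc >= target_accuracy:
            return candidate, acc  # good enough: stop searching early
        if acc > best_acc:
            best, best_acc = candidate, acc
    auto_model, auto_acc = run_automl()
    if auto_acc > best_acc:
        return auto_model, auto_acc
    return best, best_acc
\end{verbatim}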
% TODO add diagram
\includegraphics[height=\textheight]{expandable_models_generator}
% technological free overview
% \subsection{Web Interface}
% The user will interact with the platform via a web portal. % why the web portal
% The web platform will be designed using HTML and a JavaScript library called HTMX \cite{htmx} for the reactivity that the pages require.
% The web server that will act as the controller will be implemented using Go \cite{go}, due to its ease of use.
% Go was chosen as the programming language for the server due to its performance, e.g. \cite{node-to-go}, and ease of implementation. As a compiled language, Go outperforms other server technologies such as Node.js.
% Go also has easy support for the C ABI, which might be needed to interact with other tools that are implemented in C.
% The web server will also interact with Python to create models. To run the models, it will use the Go libraries that are available for running TensorFlow \cite{tensorflow2015-whitepaper} models.
% \subsection{Creating Models}
% The models will be created using TensorFlow \cite{tensorflow2015-whitepaper}.
% TensorFlow was chosen because, when used with frameworks like Keras \cite{chollet2015keras}, it allows the easy development of machine learning models with little code. While tools like PyTorch might provide more advanced control options for the model, like dynamic graphs, this comes at the cost of more complex Python code. Since that code is generated by the Go code, the more Python that needs to be written, the more complex the overall program gets, which is not desirable.
% The original plan was to use Go and TensorFlow, but the Go library lacked that ability. Therefore, I chose to use Python to create the models.
% The Go server starts a new process, running Python, that creates and trains the TensorFlow model. Once the training is done, the model is saved to disk, from where it can be loaded by the Go TensorFlow library.
% \subsection{Expandable Models}
% The approach would be based on multiple models. The first model is a large model that will work as a feature extraction model; the results of this model are then given to other, smaller models. These models' purpose is to classify the results of the feature extraction model into classes.
% The first model would either be an already existing pretrained model or a model that is automatically created by the platform.
% The smaller models would all be generated by the platform; their purpose would be the actual classification.
% This approach would offer a lot of expandability, as it makes the addition of a new class as easy as creating a new small model.
\appendix
\newpage
\section{References}
\printbibliography[heading=none]
% TODO add my job title
\end{document}