I selected Svelte because it has been one of the most liked frameworks to work with in recent years, according to the State of JS survey \cite{state-of-js-2022}.
It is also one of the best-performing frameworks currently available \cite{js-frontend-frameworks-performance}.
This is done on the page shown in Figure \ref{fig:create_model}: the user provides a name and an image for the model, and then presses the ``Create'' button.
The ``Model'' tab contains only the actions that are relevant to the user.
In Figure \ref{fig:model_page}, the user has created a model but has not added training data, so the page shows a section where the user can input training data.
The user is given instructions on how to create the zip file so that the system can easily process the data; the upload section can be seen in Figure \ref{fig:upload_data_section}.
This process was originally slow, as the system could not parallelize the import of the images; once parallel importing was implemented, import times improved considerably.
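As an illustration, the parallel import can be sketched as a Go worker pool; \texttt{importImage}, the channel layout, and the worker count here are simplified stand-ins, not the actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// importImage stands in for the real decode-and-store step, which would
// read an image from the extracted zip and record it in the database.
func importImage(path string) string {
	return "imported:" + path
}

// importAll fans the image paths out to a fixed pool of workers so that
// imports run concurrently instead of one at a time.
func importAll(paths []string, workers int) []string {
	jobs := make(chan string)
	results := make(chan string, len(paths))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				results <- importImage(p)
			}
		}()
	}

	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(len(importAll([]string{"a.png", "b.png", "c.png"}, 2)))
}
```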
This step will appear both in the main tab of the model page and in the dataset tab. Once the user instructs the system to start training, the model page will become the training page, and it will show the progress of the training of the model.
During the entire process of creating new classes in the model and retraining the model, the user can still perform all the classification tasks they desire.
On the administrator pages, users can change the status of tasks as well as see a more comprehensive view of how the tasks are performing.
Administrator users can see the current status of runners, as well as which task each runner is performing; Figure \ref{fig:runner_page} shows the runner visualisation page.
After connecting to the database, the application performs pre-startup checks to ensure that tasks interrupted by a server restart were not left in an unrecoverable state.
Once the checks are done, the application creates the workers, which are explained in section \ref{impl:runner}; when this is complete, the API server is finally started.
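A minimal sketch of the interruption check, assuming tasks carry a status field in the database (the names and statuses here are illustrative):

```go
package main

import "fmt"

// Task statuses are illustrative assumptions; the real system stores task
// state in the database.
const (
	statusRunning = "running"
	statusQueued  = "queued"
)

// recoverInterrupted re-queues any task that was still marked as running
// when the server stopped, so a restart never leaves work in an
// unrecoverable state.
func recoverInterrupted(tasks map[string]string) {
	for id, status := range tasks {
		if status == statusRunning {
			tasks[id] = statusQueued
		}
	}
}

func main() {
	tasks := map[string]string{"t1": statusRunning, "t2": "done"}
	recoverInterrupted(tasks)
	fmt.Println(tasks["t1"], tasks["t2"]) // queued done
}
```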
Information about the API is shown throughout the web page, right next to where the user would normally perform the corresponding action, providing a helpful user interface.
Model generation happens on the API server: the server analyses the image that was provided and generates several model candidates accordingly.
The model generation subsystem decides the structure of the model candidates based on the image size: it prioritises smaller models for smaller images and convolutional networks for bigger images.
The depth is controlled by both the image size and the number of outputs; model candidates that need to be expanded are generated with larger values to account for possible new classes.
If only one model is requested, the generator tries to produce the optimal size.
If more than one is requested, the generator produces models of various types and sizes, so that if a smaller model is feasible it will also be tested.
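The candidate-generation heuristic above can be sketched as follows; the \texttt{Candidate} struct, the size thresholds, and the layer arithmetic are all hypothetical, chosen only to illustrate the idea of scaling depth with image size and trying smaller variants:

```go
package main

import "fmt"

// Candidate describes one model architecture to try. The field names are
// hypothetical; the real generator builds full network definitions.
type Candidate struct {
	Convolutional bool
	Layers        int
}

// generateCandidates sketches the heuristic described above: small dense
// models for small images, deeper convolutional models for larger ones,
// and several sizes when more than one candidate is requested.
func generateCandidates(imageSize, classes, count int) []Candidate {
	base := Candidate{Convolutional: imageSize > 64, Layers: 2 + classes/10}
	if imageSize > 64 {
		base.Layers += imageSize / 64 // deeper networks for bigger images
	}
	out := []Candidate{base}
	for i := 1; i < count; i++ {
		// Also try smaller variants, so a cheaper model can win if it
		// reaches the same accuracy.
		smaller := base
		smaller.Layers = base.Layers - i
		if smaller.Layers < 1 {
			smaller.Layers = 1
		}
		out = append(out, smaller)
	}
	return out
}

func main() {
	fmt.Println(generateCandidates(128, 10, 3))
}
```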
When the runner needs to perform training, it generates a Python script tailored to the model candidate being trained, runs that script, and monitors its results.
If the difference in accuracy between the best and worst model candidates becomes too large, the system may decide not to continue training a given candidate and to focus the training resources on the candidates that are performing better.
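The pruning decision can be sketched as below; \texttt{pruneCandidates} and the accuracy-gap threshold are hypothetical names and values, not the real configuration:

```go
package main

import "fmt"

// pruneCandidates drops candidates whose accuracy falls too far behind the
// current best, so training resources focus on the promising ones. The
// maxGap threshold is an assumption; the real value is a system setting.
func pruneCandidates(accuracy map[string]float64, maxGap float64) []string {
	best := 0.0
	for _, a := range accuracy {
		if a > best {
			best = a
		}
	}
	var keep []string
	for id, a := range accuracy {
		if best-a <= maxGap {
			keep = append(keep, id)
		}
	}
	return keep
}

func main() {
	acc := map[string]float64{"c1": 0.91, "c2": 0.55, "c3": 0.88}
	fmt.Println(pruneCandidates(acc, 0.10)) // c2 is dropped
}
```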
Expandable models follow mostly the same process as the normal models.
First, bigger model candidates are generated.
Then the models are trained using the same technique.
At the end, after the model candidate has been promoted to the full model, the system starts another Python process that loads the newly generated model and splits it into a base model and a head model.
During the expanding process, the generation system creates a new head candidate that matches the newly added classes.
The base model, that was created in the original training process, is used with all available data to train the head candidate to perform the classification tasks.
The training process is similar to the normal one, but uses a different training script.
Once the model has finished training and meets the accuracy requirements, the system makes the new head available for classification.
The user then can finally use the API to obtain the results of the model.
\subsubsection*{Expandable Models}
For expandable models, the inference step is very similar to the normal models.
The runner first loads the base model and runs the image through it; the resulting features are then stored.
Once the features are obtained, the system runs them through the various possible heads; the results are matched with the stored classes, and the class with the highest probability is selected.
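The head-selection step can be sketched as below; representing each head as a plain function from features to a single class probability is a deliberate simplification (the real heads are small trained networks covering sets of classes):

```go
package main

import "fmt"

// classify runs the stored base-model features through every available
// head and returns the class with the highest probability.
func classify(features []float64, heads map[string]func([]float64) float64) string {
	bestClass, bestProb := "", -1.0
	for class, head := range heads {
		if p := head(features); p > bestProb {
			bestClass, bestProb = class, p
		}
	}
	return bestClass
}

func main() {
	// Toy heads with fixed probabilities, standing in for trained networks.
	heads := map[string]func([]float64) float64{
		"cat": func(f []float64) float64 { return 0.2 },
		"dog": func(f []float64) float64 { return 0.9 },
	}
	fmt.Println(classify([]float64{0.1, 0.3}, heads)) // dog
}
```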
``Runners'' is the name given to the software components that perform CPU- or GPU-intensive tasks without halting the main API server.
Architecturally, they are implemented using a controller and worker pattern.
When the main application starts, the system creates an orchestrator: a piece of software that decides what work each runner does.
The orchestrator runs in a goroutine created at startup.
During startup, the orchestrator configures itself by reading values from the configuration file.
Those values define the number of local runners.
These runners, which are started up by the orchestrator, act as local runners: runners that run on the same machine as the main server.
Local runners are useful when running the main server on a machine that also has GPU power available to it, and in testing.
Local runners run inside goroutines, which allows the runners and the orchestrator to communicate using Go channels, the idiomatic way to communicate between two goroutines.
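A minimal sketch of this channel-based arrangement, with hypothetical type and function names (\texttt{Result}, \texttt{startLocalRunner}) standing in for the real implementation:

```go
package main

import "fmt"

// Result is the message a runner sends back to the orchestrator when a
// task finishes; the field names are assumptions for this sketch.
type Result struct {
	RunnerID int
	TaskID   string
	Err      error
}

// startLocalRunner launches a runner in its own goroutine. The orchestrator
// hands it tasks on the tasks channel and hears back on the results
// channel, so the two sides never share memory directly.
func startLocalRunner(id int, tasks <-chan string, results chan<- Result) {
	go func() {
		for t := range tasks {
			// The real runner would do the CPU/GPU-heavy work here.
			results <- Result{RunnerID: id, TaskID: t, Err: nil}
		}
	}()
}

func main() {
	tasks := make(chan string)
	results := make(chan Result)
	startLocalRunner(1, tasks, results)
	tasks <- "train-model-42"
	r := <-results
	fmt.Println(r.RunnerID, r.TaskID)
}
```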
The orchestrator constantly waits to receive either a timer-based event or a runner-based event.
Timer-based events happen when the orchestrator's internal clock informs it that it needs to check whether tasks are available to run.
The interval at which this clock ticks is configured in the application settings.
Upon receiving a timer-based event, the orchestrator checks whether there is a new task available to run and whether any runners are available to run it on.
If there are, the orchestrator instructs a runner to pick up the task and run it.
Runner-based events happen when a runner finishes running a task or crashes while trying to do it.
Upon receiving a runner event, the orchestrator checks if it is a success or a failure message.
If it is a failure message and the runner is a local runner, then the orchestrator just restarts the runner.
Upon restart, it adds the runner to the list of available runners.
If the runner is a remote runner, the orchestrator marks the runner as failed and stops sending messages to it until the runner informs the service again that it is available.
If the message indicates success, the orchestrator simply adds the runner back to the list of available runners, regardless of whether the runner is remote.
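The event loop described above maps naturally onto a Go \texttt{select} over a ticker channel and a runner-event channel. The sketch below is illustrative only: \texttt{runnerEvent}, \texttt{orchestrate}, the tick interval, and the hard-coded runner pool are all assumptions, not the real code.

```go
package main

import (
	"fmt"
	"time"
)

// runnerEvent is the message a runner sends when it finishes or fails;
// the field names are assumptions for this sketch.
type runnerEvent struct {
	RunnerID int
	Failed   bool
	Remote   bool
}

// orchestrate runs the loop described above: timer ticks make it dispatch
// pending tasks to available runners, and runner events return runners to
// (or keep them out of) the available pool.
func orchestrate(
	pending []string,
	dispatch func(runner int, task string),
	events <-chan runnerEvent,
	stop <-chan struct{},
) {
	available := []int{1, 2} // local runner IDs created at startup
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// Timer-based event: dispatch while both a task and a runner exist.
			for len(pending) > 0 && len(available) > 0 {
				r := available[len(available)-1]
				available = available[:len(available)-1]
				dispatch(r, pending[0])
				pending = pending[1:]
			}
		case ev := <-events:
			if ev.Failed && ev.Remote {
				continue // failed remote runners wait until they report back in
			}
			// Local runners are restarted on failure, then rejoin the pool.
			available = append(available, ev.RunnerID)
		case <-stop:
			return
		}
	}
}

func main() {
	events := make(chan runnerEvent)
	stop := make(chan struct{})
	done := make(chan string, 2)
	go orchestrate([]string{"t1", "t2"}, func(r int, t string) { done <- t }, events, stop)
	fmt.Println(<-done, <-done)
	close(stop)
}
```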
The design was envisioned as the best possible version of this service, but the scope was constrained to the necessities of the system while it was being developed.
Features that would bring the implemented application closer to the ideal design could have been added had there been a greater need during the development timeline.
\section{Legal, Societal, Ethical and Professional Considerations}\label{sec:lsec}
This section will address possible legal, societal, ethical and professional issues that might arise from the deployment of the software being designed.
Legal issues might occur due to the images that users upload. For example, those images could be copyrighted, or they could be confidential. The service is designed to let users host their own images without the service having to host them itself, moving the legal responsibility for the management of the data to the user of the system.
The General Data Protection Regulation (GDPR) (GDPR, 2018) is a data protection and privacy law in the European Union and the European Economic Area, that has also been implemented into British law.
The main objectives of the GDPR are to minimise the data collected by an application to what is actually used by that application, and to give users the right to be forgotten.
Once there is no more work that requires the data, the system removes all relevant identifiable references to it.
\subsection{Social Issues}
The web application was designed to be easy to use and tries to meet all accessibility requirements.
The service itself could raise concerns about replacing jobs that are currently done by humans.
This is less problematic than it may seem, as time has shown that such jobs tend to change rather than disappear: instead of manually classifying all the images, the work shifts to maintaining the system and verifying that the data being fed to the model is correct.
\subsection{Ethical Issues}
While the service itself does not raise any ethical concerns, the data that the service processes could raise ethical complications.
As a member of the British Computer Society (BCS), it is important to follow the Code of Conduct. The Code of Conduct contains four key principles.
\subsubsection*{Public interest}
This project tries to consider the public health, privacy, and security of third parties and therefore follows the principle of public interest.
\subsubsection*{Professional Competence and Integrity}
This project has been an enormous undertaking that pushed the limits of my capabilities.
I am glad that I was able to use this opportunity to learn about distributed systems, image classification, go, and Svelte.
During this project, I also followed the best practices of software development, such as using source control software and keeping an audit trail of tasks and issues.
For the duration of the project, all the guidelines provided by the University of Surrey were followed.
\subsubsection*{Duty to the Profession}
During the research, design, implementation, and report stages, all interactions with the project supervisor were professional, respectful, and honest.
To the best of my ability, I tried to design a system that would contribute to the field.