Curating data can be a tedious job, but it needs to be done. It also involves two of the fundamental problems of dealing with esports data: accuracy and uniformity.
Ideally, we want to create an environment that is easy to access and use, whilst also bringing the data into a unified format that can be assessed and monitored for accuracy. An easy-to-use environment also supports our plan to have users enter their information for us directly.
Why do we need it?
The need for this sort of service exists throughout esports, but also within Bayes itself, where data curation currently relies entirely on human operators.
In order to unify a collection of data, it is necessary to clean it centrally and bring it into a single, consistent format. This is a tedious, repetitive task that demands attention to detail and a firm, logical grasp of the somewhat chaotic input that esports data tends to be.
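To make the cleaning step concrete, here is a toy sketch of what centralised normalisation can look like. The alias table and team names are purely illustrative, not our actual data: in practice such a table would be far larger and maintained by the curators themselves.

```python
import re

# Hypothetical alias table: raw spellings seen in source feeds mapped
# to one canonical team name. Illustrative entries only.
ALIASES = {
    "sk telecom t1": "T1",
    "skt t1": "T1",
    "t1": "T1",
}

def normalize(raw: str) -> str:
    """Collapse whitespace and casing, then look up a canonical name.

    Unknown values pass through unchanged, ready for human review.
    """
    key = re.sub(r"\s+", " ", raw.strip()).lower()
    return ALIASES.get(key, raw.strip())

print(normalize("  SK Telecom   T1 "))  # → T1
print(normalize("Unknown Team"))        # → Unknown Team
```

Even this trivial version shows why the work is repetitive: every new misspelling from every new source needs a fresh entry, which is exactly the kind of pattern a learning system should pick up on its own.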
Machine learning is an obvious candidate for the job, but it leaves open a fundamental question: what does correct data look like?
To tackle this problem, we needed more than a uniform look for the data; we needed something that could learn from our operators. A smart, learning service that can present accurate data, the way we like it. And that's exactly where the Recommendation Service comes in.
It was envisioned as something that works like Google's auto-complete search, using machine learning to prioritize the information most likely to be sought out.
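The auto-complete idea can be sketched in a few lines. This is a deliberately simplified stand-in, not our implementation: the class, the team names, and the ranking rule (rank prefix matches by how often operators previously picked them) are all assumptions made for illustration.

```python
from collections import Counter

class Suggester:
    """Toy prefix suggester: ranks candidates by how often operators
    have previously picked them, mimicking an auto-complete box."""

    def __init__(self, vocabulary):
        self.counts = Counter({name: 0 for name in vocabulary})

    def record_pick(self, name):
        # Operator feedback: chosen entries rise in the ranking.
        self.counts[name] += 1

    def suggest(self, prefix, k=3):
        matches = [n for n in self.counts
                   if n.lower().startswith(prefix.lower())]
        return sorted(matches, key=lambda n: -self.counts[n])[:k]

s = Suggester(["Fnatic", "FlyQuest", "FunPlus Phoenix"])
s.record_pick("Fnatic")
s.record_pick("Fnatic")
s.record_pick("FunPlus Phoenix")
print(s.suggest("f"))  # Fnatic first: it was picked most often
```

The real service replaces the raw pick counter with learned models, but the principle is the same: what operators choose most often surfaces first.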
How does it work?
The Recommendation Service is based on a two-pronged approach. Firstly, it is meant to ease manual curation by creating a process that can be automated piece by piece, relieving operators of tedious work.
Part of easing the manual work is simply visualizing the process: we can see the data in a more visual way instead of plain Excel tables. Color cues help people spot and react to patterns more quickly, so they are an important part of the Recommendation Service, making the most of the human side of the process.
The second prong is effective, accurate automated curation. This would not eliminate human operation, but it would aid the move away from manual work. Supervised classifiers, improving through machine learning, would allow the Recommendation Service to be taught to make the same logical decisions when cleaning and providing accurate data, in the same reliable manner as manual curation.
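The supervised idea, in miniature: every manual correction an operator makes becomes a labelled training example. The sketch below memorises and replays the most frequent correction per raw value; a real classifier would generalise to unseen values via features, but the training signal is the same. All names here are hypothetical.

```python
from collections import defaultdict, Counter

class CorrectionLearner:
    """Minimal stand-in for a supervised classifier: it memorises which
    correction operators applied to each raw value and replays the most
    frequent one. A production system would generalise with ML features."""

    def __init__(self):
        self.history = defaultdict(Counter)

    def observe(self, raw, corrected):
        # Each manual curation step becomes a labelled training example.
        self.history[raw][corrected] += 1

    def predict(self, raw):
        if raw not in self.history:
            return None  # unseen value: leave it to the human operator
        return self.history[raw].most_common(1)[0][0]

learner = CorrectionLearner()
learner.observe("g2 esports", "G2 Esports")
learner.observe("g2 esports", "G2 Esports")
print(learner.predict("g2 esports"))  # → G2 Esports
print(learner.predict("new team"))    # → None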
Of course, the overarching goal is still to eventually move away from manual curation and have a reliable, smart machine in place to take over this tedious but important task.
What are some of the hurdles we face?
Of course, with an unprecedented attempt like this, errors are bound to come in all shapes and sizes. With nothing to follow, a lot of improvisation is part of the learning process. Much of the work is trial and error; there is no way to predict in advance which parts will break or what will cause errors.
As an example, if a source provides very inaccurate or inconsistent data, there is no precedent for how the Recommendation Service will interpret or process those inaccuracies. Establishing good, serviceable recommendations is therefore a slow process that requires a lot of attention to detail.
Finishing the Recommendation Service, or rather an early version for human trials, depends heavily on feedback to fine-tune specific problems and unexpected behaviour. There is no way around this, and no skipping past problems: each one needs to be addressed and adjusted so that we can build a truly dependable service, one whose decisions can be trusted as much as a human operator's.
While the process cannot, in the foreseeable future, be entirely automated, we also intend to make the handling of the curation process as easy and pleasant for the human curator as possible. By providing tools and a user interface, as well as working heavily with feedback from the curators, we strive to improve the process every day.
What do we have in mind?
Gradually, we’re removing tasks from the human operator, whilst using the curators’ feedback to make sure that the machine taking over those tasks performs them to the best of the curators’ knowledge and ability. This vital feedback allows the machine to be trained to make the same decisions - based on the correct patterns set by dedicated curators.
Not only does this maximize the machine’s ability to learn, but it also helps minimize human error; something that is essential to the process of providing unified, correct data.
Eventually, the machine will give feedback as well, enabling curators to see new input patterns and perhaps new avenues for improvement, creating a symbiosis between the learning machine and the human curators.
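One hypothetical shape for that symbiosis is confidence-based routing: the machine suggests the majority correction, auto-applies it only when enough operator decisions agree, and hands everything else back to a curator. The threshold, function, and data below are illustrative assumptions, not our production logic.

```python
from collections import Counter

def decide(votes, threshold=0.9):
    """Hypothetical feedback loop: suggest the majority correction, but
    only auto-apply it when the share of operator decisions agreeing
    with it passes a confidence threshold."""
    label, count = Counter(votes).most_common(1)[0]
    confidence = count / len(votes)
    action = "auto-apply" if confidence >= threshold else "human-review"
    return label, round(confidence, 2), action

print(decide(["G2 Esports"] * 9 + ["G2"]))  # confident → auto-apply
print(decide(["T1", "SKT T1", "T1"]))       # disputed → human-review
```

Records that land in "human-review" generate fresh training examples, so the machine's confidence on those cases grows with every curator decision.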