
Juraj Bezdek

Welcome to Labelator.io, a no-code, data-centric NLP machine learning platform that helps you from the very beginning, when you barely have enough data to train a model, all the way to managing and continuously improving models in production.

Is Labelator.io for me?

Labelator.io will help you build a production-ready AI solution FAST. It helps you collect and label the data, and then train and deploy an AI model that automates the task.

It's the perfect platform for those of you who want to leverage the power of AI to automate any task that requires understanding text and making a decision based on it.

The platform is designed to make it extremely easy to train, deploy and manage AI models, without having to worry about the details of data preparation, model selection, or infrastructure.

The best part, however, comes when you realize that you can never collect data for every situation that might happen. This is where a typical AI model built with traditional tools fails, because the model has never seen such data and was therefore never trained to handle these situations. Labelator.io offers unique and comprehensive tools to handle these cases.

How can Labelator.io help you get started?

Do you have labeled data? Great, just upload them and train the model. Labelator.io will help you find potential problems in your data, such as mislabels, and ensure that your model is trained on high-quality data.

But what if you have some samples, but they are not labeled yet? Even better. Labelator.io includes a powerful similarity-based bulk labeling tool that helps you annotate your data faster than anything else. You can use this tool to quickly label a large number of samples at once, even if you don't have an existing dataset to train on.

And what if you don't have any data yet? No problem. Labelator.io will help you collect your first data and train an AI model to automate the manual work as you grow.

How Labelator.io helps you run your models

Deploying models has never been this easy: with just a few clicks, you can deploy your model or a new version of it and have it available as an API.

Managing and monitoring models is hard. Handling edge cases and targeting weak spots is even harder. Labelator.io helps you solve the problems you will encounter when running an AI model in production. Our platform includes tools for monitoring model performance, identifying issues, and troubleshooting problems. We also provide a variety of tools for managing and deploying models, so you can easily deploy your model to production, test it in different environments, and monitor its performance over time.

Overall, Labelator.io is a powerful and easy-to-use platform that helps you every step of the way, from data collection to model deployment. Whether you're a beginner or an experienced machine learning practitioner, Labelator.io has everything you need to train and deploy your models.

Unique Labelator.io features

Similarity-based bulk labeling

Similarity-based bulk labeling is a powerful feature that makes the data labeling process much more efficient. With this feature, Labelator.io automatically shows you how many similar documents have not been labeled yet. This can greatly improve labeling speed because, unlike other annotation software, Labelator.io lets you label tens or even hundreds of similar documents with a single click. This saves a significant amount of time and effort and lets you focus on the most important task: providing accurate labels for your data.
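The core idea can be sketched in a few lines of Python. This is a minimal illustration, not Labelator.io's implementation: it assumes every document already has an embedding vector, uses cosine similarity with a hypothetical fixed threshold of 0.9, and the names `similar_unlabeled` and `bulk_label` are made up for this example.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_unlabeled(
    anchor: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    labels: Dict[str, Optional[str]],
    threshold: float = 0.9,  # hypothetical cut-off, tune per dataset
) -> List[str]:
    """Ids of unlabeled documents whose similarity to the anchor reaches the threshold."""
    return [
        doc_id
        for doc_id, vec in embeddings.items()
        if labels.get(doc_id) is None and cosine(anchor, vec) >= threshold
    ]


def bulk_label(doc_ids: List[str], label: str, labels: Dict[str, Optional[str]]) -> int:
    """Assign one label to every listed document (the "single click"); return how many."""
    for doc_id in doc_ids:
        labels[doc_id] = label
    return len(doc_ids)


# Toy 2-D vectors: d2 is almost identical to the labeled d1, d3 is unrelated.
embeddings = {"d1": [1.0, 0.0], "d2": [0.98, 0.05], "d3": [0.0, 1.0]}
labels: Dict[str, Optional[str]] = {"d1": "refund", "d2": None, "d3": None}

candidates = similar_unlabeled(embeddings["d1"], embeddings, labels)
print(candidates)  # ['d2']
bulk_label(candidates, "refund", labels)
```

Showing the number of `candidates` before applying the label is what lets a human confirm a whole group at once instead of document by document.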

Data observability

Another important problem Labelator.io helps solve is identifying potential problems in your data. The platform's built-in tools help you identify mislabels, inconsistencies and other issues that could negatively impact your model's performance. We do this by using the same backbone model that is used for prediction to generate semantic vectors that represent each training example the same way the model itself understands it. This lets you explore the data and see it through the lens of the AI.
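One common way to use such vectors for spotting mislabels is a nearest-neighbour consistency check: if most of an example's closest neighbours carry a different label, the example is worth a second look. The sketch below illustrates that heuristic on precomputed vectors; it is an assumption about how such a check could work, not a description of Labelator.io's internals, and `flag_suspect_labels` and `k` are illustrative names.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_suspect_labels(
    embeddings: Dict[str, Sequence[float]],
    labels: Dict[str, str],
    k: int = 3,
) -> List[Tuple[str, str, str]]:
    """Flag examples whose label differs from the majority label of their k nearest neighbours.

    Returns (doc_id, given_label, neighbour_majority_label) tuples.
    """
    suspects = []
    for doc_id, vec in embeddings.items():
        # Brute-force ranking of all other examples by similarity (fine for small datasets).
        neighbours = sorted(
            (other for other in embeddings if other != doc_id),
            key=lambda other: cosine(vec, embeddings[other]),
            reverse=True,
        )[:k]
        majority, _ = Counter(labels[n] for n in neighbours).most_common(1)[0]
        if majority != labels[doc_id]:
            suspects.append((doc_id, labels[doc_id], majority))
    return suspects


# Toy 2-D vectors: a4 sits among the billing examples but is labeled "shipping".
embeddings = {
    "a1": [1.0, 0.0], "a2": [0.95, 0.1], "a3": [0.9, 0.2], "a4": [0.97, 0.05],
    "b1": [0.0, 1.0], "b2": [0.1, 0.95], "b3": [0.05, 0.9],
}
labels = {
    "a1": "billing", "a2": "billing", "a3": "billing", "a4": "shipping",
    "b1": "shipping", "b2": "shipping", "b3": "shipping",
}
suspects = flag_suspect_labels(embeddings, labels, k=3)
print(suspects)  # [('a4', 'shipping', 'billing')]
```

A flagged example is not necessarily wrong; the check only surfaces candidates for a human to review.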

Microtopic exploration and prediction analysis

Labelator.io automatically generates clusters of similar documents, called microtopics, across the entire semantic space. This feature is extremely helpful in understanding the content and labeling accuracy of your dataset. By using microtopics, you can explore what kind of data is in your dataset, and what the accuracy and labels are within each cluster. This allows you to quickly and easily identify any errors or inconsistencies in your data labels, and make the necessary adjustments.
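To make the idea concrete, here is a rough sketch of microtopic-style analysis. Labelator.io's actual clustering method is not described in this post; as a stand-in, the sketch uses simple greedy "leader" clustering on precomputed embedding vectors with a hypothetical similarity threshold, then reports the size, label counts and prediction accuracy of each cluster.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def leader_clusters(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.9
) -> Dict[int, List[str]]:
    """Single pass: join the first cluster whose leader is similar enough, else start a new one."""
    leaders: List[Sequence[float]] = []
    clusters: Dict[int, List[str]] = defaultdict(list)
    for doc_id, vec in embeddings.items():
        for idx, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[idx].append(doc_id)
                break
        else:
            # No existing cluster is close enough: this document becomes a new leader.
            leaders.append(vec)
            clusters[len(leaders) - 1].append(doc_id)
    return dict(clusters)


def microtopic_report(
    clusters: Dict[int, List[str]], labels: Dict[str, str], predictions: Dict[str, str]
) -> Dict[int, dict]:
    """Per cluster: size, label counts, and share of predictions that match the label."""
    report = {}
    for idx, members in clusters.items():
        correct = sum(1 for m in members if predictions[m] == labels[m])
        report[idx] = {
            "size": len(members),
            "labels": dict(Counter(labels[m] for m in members)),
            "accuracy": correct / len(members),
        }
    return report


embeddings = {"d1": [1.0, 0.0], "d2": [0.95, 0.1], "d3": [0.0, 1.0], "d4": [0.1, 0.95]}
labels = {"d1": "billing", "d2": "billing", "d3": "shipping", "d4": "shipping"}
predictions = {"d1": "billing", "d2": "shipping", "d3": "shipping", "d4": "shipping"}

clusters = leader_clusters(embeddings)
report = microtopic_report(clusters, labels, predictions)
print(report[0]["accuracy"], report[1]["accuracy"])  # 0.5 1.0
```

A cluster with low accuracy, or with a surprising mix of labels, is a natural place to start reviewing.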

Smart routing

One of the most powerful features of Labelator.io is its smart routing feature, which enables you to target potential problems when running a model in production.

This feature lets you decide how each request is handled by setting a threshold on its semantic similarity to a certain topic, or to the training data the model was trained on.

This allows you to automate the cases the model handles well and handle its weak spots manually, so you can focus your effort where it is needed most.
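A minimal routing rule in this spirit might look like the following. It assumes requests and training examples are already embedded as vectors, compares only against the training data (not against topics), and uses a hypothetical `min_similarity` of 0.8; `route_request` and `RoutingDecision` are illustrative names, not part of Labelator.io.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class RoutingDecision:
    route: str         # "automate" or "review"
    similarity: float  # highest similarity to any training example


def route_request(
    request_vec: Sequence[float],
    training_vecs: List[Sequence[float]],
    min_similarity: float = 0.8,  # hypothetical threshold
) -> RoutingDecision:
    """Automate a request only if it resembles something the model was trained on."""
    best = max((cosine(request_vec, t) for t in training_vecs), default=0.0)
    route = "automate" if best >= min_similarity else "review"
    return RoutingDecision(route=route, similarity=best)


training = [[1.0, 0.0], [0.9, 0.2]]
familiar = route_request([0.95, 0.1], training)  # close to the training data
novel = route_request([0.0, 1.0], training)      # unlike anything seen in training
print(familiar.route, novel.route)  # automate review
```

Raising `min_similarity` sends more traffic to manual review; lowering it automates more at the risk of letting unfamiliar inputs through.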

Backlog for reviewing edge cases from production

Based on your smart routing settings and thresholds, the data selected for review are shown, together with their predictions, in the backlog, where an operator can confirm or correct them. The traditional approach here is to select a random sample of production data from the last period, review the predictions within that sample, and retrain the model. After evaluating the new model on a benchmark dataset, you would deploy this version into production.

There are two main problems with this approach:

  • The review process is extremely inefficient. Let's say your model's real accuracy is 80%. A random sample would then consist of about 80% correctly predicted data, from which the model won't learn anything new. Thanks to smart routing, Labelator.io lets you target just those 20% (or rather about 30%, since the selected cases overlap with some correct predictions) that are very likely to be mishandled by the AI, and by focusing purely on those, you improve your model much faster!
  • Once you've corrected the 20% of wrongly predicted data, you have most likely corrected them only in your dataset. But these are real data from the production system, so they are still incorrect in your master system, and you should correct them there as well. Unfortunately, this very often never happens due to technical and/or privacy constraints. Labelator.io's integration options include webhooks that you can connect to your master system, so once the labels in your dataset are corrected, you can hook onto these changes and correct the labels or decisions made in your primary system.
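On the receiving side, a webhook consumer could look roughly like the sketch below. The post does not describe the webhook payload, so the `event`, `record_id` and `new_label` fields, the `label.corrected` event name, and the in-memory `master_records` store are all hypothetical placeholders for your own master system.

```python
import json
from typing import Dict


def handle_label_webhook(body: str, master_records: Dict[str, Dict[str, str]]) -> bool:
    """Apply a label correction from a (hypothetical) webhook payload to the master system.

    Returns True if a record was updated, False if the event was ignored.
    """
    payload = json.loads(body)
    # React only to the assumed "label corrected" event; ignore anything else.
    if payload.get("event") != "label.corrected":
        return False
    record = master_records.get(payload.get("record_id", ""))
    if record is None:
        # The corrected sample has no counterpart in the master system.
        return False
    record["decision"] = payload["new_label"]
    return True


master = {"ticket-42": {"decision": "billing"}}
body = json.dumps(
    {"event": "label.corrected", "record_id": "ticket-42", "new_label": "shipping"}
)
updated = handle_label_webhook(body, master)
print(updated, master["ticket-42"]["decision"])  # True shipping
```

In a real deployment this function would be called from whatever HTTP endpoint you register as the webhook URL, ideally with authentication and idempotency checks added.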