Data Labeling: A Potential and Problematic Industry Behind AI

Wednesday, September 23, 2020 2:45 AM
Company Update

NEW YORK, NY / ACCESSWIRE / September 23, 2020 / Data labeling is not as mysterious as AI. Put simply, it applies labeling tools to process data, the basic element of AI, so as to make that data understandable to computer vision and to "teach" AI to identify, judge, and act like human beings. If data is the oil of AI, data labeling is the refinery that turns crude oil into gasoline.

At present, data labeling powers industries such as autonomous driving, agriculture, healthcare, and retail, making them more efficient through the AI revolution.

For example, Baidu's AI data annotation center completed a labeling project for masked-face recognition during the COVID-19 period. Data labelers mark key points on the eyebrows, eyes, and cheekbones so that AI scanners can identify human faces and measure body temperature even when people wear masks.

According to Fractovia, the data annotation tools market was valued at $650 million in 2019 and is projected to surpass $5 billion by 2026. Another report, released by McKinsey in April 2017, estimates that the total market for AI applications may reach $127 billion by 2025. The expected growth reflects the increasing demand for high-quality data labeling services as the AI industry develops.

However, compared with the glamour of high-tech AI, data labeling is labor-intensive in essence. Given their contribution to fueling the AI industry, data labelers deserve more attention to improve their pay and social status. The number of full-time data labelers in China has reached roughly 100,000, with part-time labelers totaling almost 1 million. An ordinary data labeler in Baidu's AI center labels 1,300 images and earns less than 25 dollars a day, which is still much better than the cheaper labor of small labeling teams in less developed counties and villages in China.

The data labeling industry has a low barrier to entry, and work is often subcontracted through middlemen at every level because of its huge volume, tight costs, and tight schedules. Middlemen tend to push costs down to seek higher profits. For a typical small labeling team of 20 staff, the labor cost is about $15-$25 per person per day. Unfortunately, such small teams cannot guarantee data quality or on-time delivery due to incompetence, miscommunication, poor regulation, and dysfunctional competition, which in turn wastes AI companies' money and time.

"We are eager to find reliable and cost-effective data labeling teams. The accuracy and quality of the processed data determine the outcome of our machine learning training and the final product," says Mr. Wang, a project manager at an AI company. ByteBridge, a blockchain-driven data company, has recognized these urgent problems in the data labeling industry and committed itself to powering AI development through its automated data labeling platform.

Developers can create their data collection and labeling projects on ByteBridge's dashboard. The automated platform lets developers customize labeling projects: they can specify their requirements, upload a raw dataset, and manage the labeling process in a transparent and dynamic way. Developers can check the processed data, speed, estimated price, and estimated time at any moment, from anywhere.
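As a rough illustration, a project created this way might be described by a JSON payload like the one below. The field names (`consensus_threshold`, `rounds`, and so on) are hypothetical stand-ins for this sketch, not ByteBridge's actual schema.

```python
import json

def build_labeling_project(name, items, requirements,
                           consensus_threshold=0.9, rounds=1):
    """Assemble a JSON payload describing a labeling project.

    All field names here are illustrative placeholders,
    not the platform's real schema.
    """
    return json.dumps({
        "name": name,
        "requirements": requirements,  # free-text instructions for labelers
        "consensus_threshold": consensus_threshold,
        "rounds": rounds,
        "items": [{"id": i, "data": d} for i, d in enumerate(items)],
    })
```

A developer would then submit such a payload through the dashboard or API and track the project's progress from there.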

To reduce data training time and cost on complicated tasks, ByteBridge has built consensus rules into the labeling system. Before a task is distributed, a consensus index, such as 90%, is set for it. If 90% of the labelers' results are essentially the same, the system considers that a consensus has been reached. In this way, the platform can produce a large amount of accurate data in a short time. If the machine learning model demands higher annotation accuracy, for example 99%, developers can use "multi-round consensus" to repeat the task until the accuracy of the final delivery improves. The consensus mechanism not only guarantees data quality efficiently but also saves budget by cutting out middlemen and optimizing the workflow with AI technology.
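The consensus rule described above amounts to thresholded majority voting, optionally repeated over several rounds. The sketch below is a minimal illustration of that idea, not ByteBridge's actual implementation.

```python
from collections import Counter

def consensus_label(labels, threshold=0.9):
    """Return the majority label if agreement meets the threshold, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= threshold else None

def multi_round_consensus(rounds, threshold=0.9):
    """Accept an item only if every labeling round reaches consensus
    on the same label ("multi-round consensus")."""
    results = [consensus_label(r, threshold) for r in rounds]
    if None in results or len(set(results)) != 1:
        return None
    return results[0]
```

Running an item through additional rounds trades labeling cost for confidence: each extra round that must agree lowers the chance that a wrong label slips through.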

ByteBridge's easy-to-integrate API enables continuous feeding of high-quality data into machine learning systems. Data can be processed 24/7 by global partners and in-house experts, who are assigned work through a distribution mechanism based on education level, language capability, and other parameters.
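In simplified form, such a distribution mechanism filters the worker pool against a task's requirements. The worker-profile fields below are hypothetical, chosen only to mirror the parameters the article mentions.

```python
def route_task(task, workers):
    """Assign a task to workers whose (hypothetical) profile matches its
    language and minimum education-level requirements."""
    return [
        w for w in workers
        if task["language"] in w["languages"]
        and w["education_level"] >= task.get("min_education", 0)
    ]
```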

With no middlemen, complete automation, access to a global 24/7 workforce, and more control over project status, the platform, built by developers for developers, cuts out intermediary costs and enables AI companies to get projects done cost-effectively with high-quality data services.

In fierce market competition, only companies that focus on quality and service, with their own complete and independent set of resources and technology, can ultimately survive. ByteBridge is one such company in the data labeling industry, determined to accelerate the AI revolution.


Contact: [email protected]
Company: ByteBridge
Phone: 010-53673971

SOURCE: TTC Foundation
