Datasets are the lifeblood of machine learning algorithms — they “teach” artificial intelligence (AI) facts about the world, in a manner of speaking. And in domains such as autonomous driving, it’s vitally important they’re of the highest quality.

That’s why Scale, a San Francisco-based data labeling startup, today announced the release of a self-driving dataset called nuScenes that it claims surpasses in size and accuracy public datasets like KITTI, Baidu’s ApolloScape, and the Udacity Self-Driving Car library.

“We’re proud to provide the annotations … as the most robust open source multi-sensor self-driving dataset ever released,” said Scale CEO Alexandr Wang. “We believe this will be an invaluable resource for researchers developing autonomous vehicle systems, and one that will help to shape and accelerate their production for years to come.”

Scale partnered with autonomous car startup nuTonomy to compile more than 1,000 scenes containing 1.4 million images, 400,000 lidar sweeps (lidars are laser-based systems that judge the distance between objects), and 1.1 million three-dimensional bounding boxes (objects detected with a combination of RGB cameras, radar, and lidar). They’ve been meticulously labeled through Scale’s Sensor Fusion Annotation API, which taps AI and teams of humans for data annotation, and they are open-sourced starting this week.

Self-driving car datasets aren’t exactly a rare commodity — just this summer, Oregon-based Flir Systems released 10,000 labeled photos captured by its thermal camera system, Mapillary published 25,000 street-level images, and the University of California Berkeley uploaded 100,000 video sequences captured by RGB cameras. But Scale and nuTonomy claim that nuScenes is more comprehensive than any similar dataset that’s come before it.

As Scale explains on its website, it used a combination of six cameras, one lidar, five radars, GPS, and an inertial measurement sensor to capture the nuScenes data. And driving routes in Singapore and Boston were specifically chosen to showcase “challenging” locations, times, and weather conditions.

“Scale has been the ideal partner for us in the production of the annotations for the nuScenes open source lidar, radar, and camera image dataset,” said Oscar Beijbom, machine learning lead at nuTonomy. “Scale’s outstanding agility, tooling, scalability, and quality made them our preferred partner and the natural choice for annotation-partner.”

Scale, which competes against the likes of Mighty AI, Appen, Cloud Factory, Samasource, and Amazon’s Mechanical Turk, has labeled more than 200,000 miles of driving data for clients that include Lyft, Voyage, General Motors, Zoox, and Embark since its founding in 2016. It recently expanded its work into robotics, drones, virtual assistants, and “other solutions” that depend heavily on AI, and in August Scale announced an $18 million funding round led by Index Ventures, with participation from Accel and Y Combinator.

The startup has raised $22.7 million to date and reports that revenue grew 15 times over the past year.


