Nexar, an Israeli AI-dashcam company, has just announced a major tie-up with the University of California, Berkeley to release the world’s largest dataset of annotated visual driving scenes that could help researchers and companies train self-driving cars. Nexar says the BDD100K dataset is the largest and most diverse open driving dataset for computer vision research, consisting of 100,000 videos.
The videos come from tens of thousands of rides by ordinary drivers. They contain not only high-resolution images but also location and movement measurements that record the cars' trajectories. In total, the dataset has 100,000 driving videos collected from more than 50,000 rides, covering New York, the San Francisco Bay Area, and other regions. Each video is 40 seconds long. The videos span diverse scene types, such as city streets, residential areas, and highways, and were recorded in varied weather conditions (sunny/rainy/snowy) and at different times of day (daytime/nighttime/dusk/dawn). Overall, there are almost one million cars observable in the images.
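For researchers, diversity attributes like these are typically what you filter and sample on when assembling a training split. As a minimal sketch, the snippet below tallies clips by scene attribute; the field names (`weather`, `scene`, `timeofday`) and the in-memory records are hypothetical stand-ins for parsed label files, not the dataset's confirmed schema.

```python
from collections import Counter

# Hypothetical stand-in for parsed per-clip label records. The attribute
# categories mirror those described in the article (weather, scene type,
# time of day); the exact field names are an assumption for illustration.
labels = [
    {"name": "clip_0001.mov",
     "attributes": {"weather": "clear", "scene": "city street", "timeofday": "daytime"}},
    {"name": "clip_0002.mov",
     "attributes": {"weather": "rainy", "scene": "highway", "timeofday": "night"}},
    {"name": "clip_0003.mov",
     "attributes": {"weather": "snowy", "scene": "residential", "timeofday": "dawn/dusk"}},
    {"name": "clip_0004.mov",
     "attributes": {"weather": "clear", "scene": "highway", "timeofday": "night"}},
]

def count_by(records, attribute):
    """Tally how many clips carry each value of a given scene attribute."""
    return Counter(r["attributes"][attribute] for r in records)

print(count_by(labels, "weather"))
# Counter({'clear': 2, 'rainy': 1, 'snowy': 1})
```

The same one-liner generalizes to any attribute (`count_by(labels, "timeofday")`), which is how a researcher might verify that a sampled subset preserves the full dataset's weather and time-of-day balance.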
Driving datasets have received increasing attention in recent years due to the rise of autonomous vehicle technology. What makes this dataset so unique and valuable for researchers, Nexar and Berkeley say, is that it is large-scale, diverse (in terms of location, weather, and time of day), and captured on real-world roads from crowdsourced dashcams, so the driving scenarios are realistic. This is important for creating robust perception algorithms, a key element in training semi-autonomous and fully autonomous cars not to crash into humans or each other. Israel is a significant player in the emerging auto-tech market: according to Start-Up Nation Central, Israeli auto-tech startups raised almost as much financing as similar U.S. companies last year, taking in $814 million, triple the 2015 level, and $182 million in the first quarter of 2018, in line with last year's pace.
Nexar, one of Israel’s promising auto-tech and mobility startups, says the release of the dataset is a key milestone in autonomous and assisted driving research. We go under the hood of this partnership with Nexar Co-Founder and CTO Bruno Fernandez-Ruiz.
Q: What’s the context of the Nexar-Berkeley relationship?
Fernandez-Ruiz: Nexar joined the Berkeley Deep Drive Industrial Consortium in August 2016. Nexar has been working closely with the research team at BDD and other BDD partners on the development of deep automotive applications. Since January 2018, Prof. Trevor Darrell of the EECS Department has been Nexar’s Chief Scientist. Prof. Darrell is the Director of BDD and Co-Director of Berkeley Artificial Intelligence Research (BAIR), as well as Faculty Director of California PATH (Partners for Advanced Transportation Technology). In addition, other members of the UC Berkeley EECS faculty, such as Dr. Fisher Yu, have also become consultants to Nexar. This current dataset isn’t the first we’ve released with BDD; last year we released a 55K dataset of street-level images. The interest in that first dataset was significant, and we are thrilled to see similar excitement now as well.
Q: How is the BDD work different from what Tesla/Waymo/Uber/Mobileye/HERE and others have out there?
Fernandez-Ruiz: BDD100K is an anonymized dataset collected from phone cameras at very high volume. To date, drivers using Nexar have driven more than 150 million miles. The accessibility of the phone and the low cost of the unit make it easy to collect incredibly large volumes of real-world data and to observe and learn from as many corner cases and edge conditions as possible. By corner cases and edge conditions, I mean highly unusual events that happen on the road or conditions that aren’t standard. These could range from unusual and extreme weather to collapsed power lines and more. In contrast, the solutions from others in the industry focus on high-fidelity data collection, and as a result are not as easily accessible, are not widely deployed, and observe and learn from a limited dataset, ten times smaller than Nexar’s, that exhibits reduced variation. This is why those companies end up using simulation to develop and test their algorithms, while Nexar and BDD100K can work with actual field data.