New cameras on Rover!

Hey fam, I am really excited to share the latest on Rover. I’m a little behind on my YouTube releases, so the next video won’t mention this yet, but I got new cameras for Rover!

This is the camera system:

It’s four 4K cameras with hardware-synchronized shutters, and the whole system can stream all four simultaneously at 4K 30 fps! I’m actually going to be able to incorporate video from Rover’s point of view in my videos, which I’m stoked about!

The point of these cameras is to finally give Rover a suitable sensor system for autonomous driving. I’ve worked with lidar before, and it’s a very expensive sensor. I’ve long been interested in doing more with only cameras, and in fact I built Rover specifically so I could explore camera-based outdoor navigation. I’m finally getting there!

Below is a composite image from Rover’s two front cameras. To really appreciate the resolution, download the image or open it in a new window and zoom in! Here is a direct link to the image.


The cameras are positioned so they have just a little bit of stereo overlap at the front and rear. That stereo overlap can help bootstrap the structure-from-motion learning, using algorithms like the ones you’d see here:
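To give a feel for what that stereo overlap buys you: matched features in the overlapping views have a pixel disparity, and the classic pinhole stereo relation turns disparity into depth. A minimal sketch below; the focal length and baseline are made-up placeholder numbers, not Rover’s real calibration.

```python
# Sketch of how stereo overlap yields depth via Z = f * B / d.
# The focal length and baseline here are hypothetical, not Rover's.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Classic pinhole stereo relation: depth Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Example: 1400 px focal length, 10 cm baseline, 20 px disparity
z = depth_from_disparity(20.0, 1400.0, 0.10)
print(f"{z:.2f} m")  # a feature with 20 px disparity sits about 7 m away
```

Nearby objects get large disparities (accurate depth); far objects get tiny ones, which is why even a small overlap region is mostly useful for nearby structure.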

And here’s Rover looking mighty fine!


Images licensed CC0 so do what you want with them.

For now I need to do some basic camera calibration and start to get a pipeline going. In time I will share a video with more details but I wanted to share a couple of awesome images I’ve gotten so far.
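For anyone curious what “basic camera calibration” means here: calibration fits a pinhole model (focal lengths, principal point, plus distortion terms) so that projected 3D points land on the right pixels. Here’s a tiny sketch of just the projection step; the intrinsic values are made-up placeholders, not Rover’s actual calibration.

```python
# Minimal pinhole projection -- the model that calibration fits.
# The intrinsics below are placeholder values, not Rover's real ones.

def project(point_xyz, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    X, Y, Z = point_xyz
    if Z <= 0:
        raise ValueError("point must be in front of the camera")
    return (fx * X / Z + cx, fy * Y / Z + cy)

# A point 2 m ahead and 0.5 m to the right lands right of image center:
u, v = project((0.5, 0.0, 2.0), fx=1400.0, fy=1400.0, cx=1920.0, cy=1080.0)
print(u, v)  # u = 1400 * 0.25 + 1920 = 2270.0, v = 1080.0
```

Calibration itself is the inverse problem: photograph a checkerboard from many angles and solve for the parameters that minimize the reprojection error of the detected corners.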

Love yall.

Last night I finally got GPU transcoding working, which makes it much faster and easier to work with these massive video files! I downsampled and transcoded Rover’s four 4K streams to four 720p streams in H.265. Now I can open the files, easily seek through them, and pull out images. I’m not going to use full resolution for much, so 720p is fine for now, and I can also efficiently merge the four 4K streams if I’d like.
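For reference, here’s roughly the kind of ffmpeg invocation I mean, built as a command string in Python. The flags assume an NVIDIA-enabled ffmpeg build (`hevc_nvenc` is ffmpeg’s NVENC H.265 encoder); the filenames are placeholders, and your exact flags may differ depending on your ffmpeg build.

```python
import shlex

def transcode_cmd(src, dst, height=720):
    """Build an ffmpeg command that decodes with CUDA and encodes
    H.265 on the GPU via NVENC. Assumes an NVIDIA-enabled ffmpeg."""
    args = [
        "ffmpeg",
        "-hwaccel", "cuda",            # GPU-accelerated decode
        "-i", src,
        "-vf", f"scale=-2:{height}",   # downsample, keep aspect ratio
        "-c:v", "hevc_nvenc",          # NVENC H.265 encoder
        "-preset", "slow",
        dst,
    ]
    return shlex.join(args)

print(transcode_cmd("cam0_4k.mp4", "cam0_720p.mp4"))
```

Run one of these per camera stream; H.265 at 720p gives small, easily seekable files while the original 4K masters stay untouched.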

Next step is setting up image segmentation for categories like “terrain, trail, sidewalk, road, building, car, person, animal, traffic cone, ball”. I’ll implement trail following routines as well as games that can be played in a field (set up cones as a perimeter and play chase and be chased, like I do with a dog I love).
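Once segmentation produces a per-pixel class mask, a trail-following routine can be surprisingly simple. Here’s a toy sketch of the idea: steer toward the horizontal center of the “trail” pixels near the bottom of the mask. The class id, the mask, and the whole routine are hypothetical illustrations, not Rover’s actual code.

```python
# Toy trail-following: steer toward the horizontal center of "trail"
# pixels in the bottom rows of a segmentation mask. The class id and
# mask below are made up for illustration.

TRAIL = 1  # hypothetical class id

def steer(mask, band=3):
    """mask: list of rows of class ids. Returns steering in [-1, 1]
    (negative = left) from trail pixels in the bottom `band` rows,
    or None if no trail is visible."""
    width = len(mask[0])
    cols = [x for row in mask[-band:] for x, c in enumerate(row) if c == TRAIL]
    if not cols:
        return None  # no trail in view: stop, or rotate to search
    center = sum(cols) / len(cols)
    half = (width - 1) / 2
    return (center - half) / half

# Trail occupies the right half of an 8-wide mask's bottom rows,
# so the robot should steer right (positive):
mask = [[0] * 8 for _ in range(6)]
for row in mask[3:]:
    for x in range(4, 8):
        row[x] = TRAIL
print(round(steer(mask), 2))  # 0.57
```

Looking only at the bottom band of rows keeps the controller reacting to the trail immediately in front of the robot rather than distant curves.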

Here are some downsampled images (4×720p) from Rover’s four-camera 4K system:








(Yes on that last one Rover went ice skating with other robots!)

At some point I’ll compile these clips into a video, but for now I wanted to share these samples. :slight_smile:

This video shows the kind of results I am hoping to achieve with trail segmentation.

Okay so the last few days I’ve been playing with some different image segmentation libraries, and I’m starting to get some good results!

The trick is that Rover has a powerful machine learning computer (a Jetson Xavier), which runs fastest using a library called TensorRT from NVIDIA. TensorRT converts a neural network from a framework like TensorFlow or PyTorch into binary code that runs really fast on the Jetson. But not all neural network architectures are supported. For example, Facebook’s Detectron2 uses some operations that TensorRT does not support, so it won’t run as fast on the Xavier as a fully supported network would. I want to take full advantage of Rover’s acceleration, so I need machine learning code that is both easy for a noob like me to understand and able to compile to TensorRT.
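The core issue is that a converter like TensorRT implements a fixed set of operations, and one exotic layer can block the whole network. A toy sketch of that compatibility check is below; the op names and the “supported” list are illustrative placeholders, not the real TensorRT support matrix.

```python
# Why some networks won't convert: an accelerator backend implements
# a fixed subset of ops, and any op outside it blocks conversion.
# The op lists here are made-up placeholders, not TensorRT's real matrix.

SUPPORTED = {"Conv", "Relu", "MaxPool", "Add", "Resize", "Softmax"}

def unsupported_ops(model_ops):
    """Return the ops in a model that the (hypothetical) backend lacks."""
    return sorted(set(model_ops) - SUPPORTED)

# e.g. a model using deformable convolutions would fail to convert:
print(unsupported_ops(["Conv", "Relu", "DeformConv", "Softmax"]))
# ['DeformConv']
```

This is essentially what happens with Detectron2: most of the network is ordinary convolutions, but a handful of custom ops force those pieces to fall back to slower execution paths instead of one fully compiled engine.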

NVIDIA does have some examples, but again, since I’m a newbie I had some confusion with their example code. I finally landed on the Bonnetal library, which is designed specifically for deployment on Jetson-class computers like Rover’s.

I labeled about 30 images using coco-annotator.

I will slowly build a bigger dataset, but 30 is enough to start if you use a network that has been pre-trained on another segmentation dataset. Bonnetal pulls in pre-trained network weights automatically when training starts.

Here’s an example of a labeled image.

The colors are chosen at random but in this image green is “trail”, blue is “person”, yellow is “terrain” and pink is “buildings”.

I used ERFNet in Bonnetal to train on my images. Hopefully I can upload my code to GitHub in the future, since you do need to modify the Python somewhat for your own dataset. But last night I trained it, and it’s already showing promising results. Here’s an example result and its associated image:



Finally, here is the image with the mask partially overlaid.


It looks pretty good! In the top half of the segmentation mask you can see it doesn’t quite get the trail all the way, so it’s not perfect. Of course I will make a much larger dataset in time, and perhaps there are some training settings I can use to improve that. I like that it marks the sign as “building” even though I wasn’t labeling signs as such. I should start labeling signs as their own class, but this was all just a quick test to see if I can get it working.

What’s really important here is I’ve got a workflow I can use now to train a neural net I think will run on Rover’s accelerated computer. That will be the next step for me - getting it to run on Rover’s Jetson Xavier. Once I get to the point where I’m getting results in real time on Rover I can start to write a trail following algorithm! Some day that will be a neural net but to start I’ll write the algorithm by hand. I feel like I will get trail following working in the next couple of months and I’m super excited!


I’m working on some cool stuff with photogrammetry, and I recently added a spherical camera to Rover’s chassis. And by “added a spherical camera” I mean duct-taped a GoPro to it.

I recorded video from Rover’s four 4K cameras with shutter synchronization on, and also captured 18 MP spherical photos at 0.5 s intervals. I also recorded some 5.2K spherical video, though I see the camera has new firmware for 5.6K video.

This is all a crash course in camera lens calibration and lots of computationally intensive stuff. It’s great!
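One piece of that lens math for the spherical camera: an equirectangular image maps pixels to longitude/latitude, so each pixel converts to a 3D view ray, which is what photogrammetry needs. A small sketch, assuming a full 360°×180° equirectangular frame (the image dimensions below are placeholders).

```python
import math

def pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v), origin at top-left, to a
    unit view ray. Assumes a full 360 x 180 degree equirectangular image."""
    lon = (u / width) * 2 * math.pi - math.pi        # -pi .. pi
    lat = math.pi / 2 - (v / height) * math.pi       # +pi/2 (up) .. -pi/2
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

# The center pixel looks straight ahead (+z):
print(pixel_to_ray(2000, 1000, 4000, 2000))  # ~ (0.0, 0.0, 1.0)
```

A real GoPro spherical capture is two fisheye lenses stitched together, so in practice there are per-lens distortion terms on top of this ideal model, which is exactly the calibration rabbit hole I’m in.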

I have not gotten good photogrammetry results with the spherical camera yet, but I’ve seen that it can be done. I’ve reached out to one researcher who has done exactly what I’d like. Check out this great photogrammetry capture that researcher Maxime Lhuillier created.

I hope to produce similar results. Then I’m going to build it into a video game for training a Rover reinforcement learning agent. I am currently learning Blender and the Godot game engine to hopefully make that simulation/game work. Fun stuff!

Here’s Rover on the trail today lookin mighty fine.



I love you.