AI as a Virtual Aerial Tour Guide

Chi Po-Lin’s Beyond Beauty: Taiwan from Above offered a new perspective on our familiar island. In that spirit, we are designing an AI-powered smart city service that explores the surroundings of our daily life from an unusual angle. To build a prototype, we flew drones over the campus of National Taiwan University and over Anping District, Tainan City, recorded street-view videos with a 360 camera, and built an aerial tour application. A 360 camera captures every direction at once, so the viewing angle can be chosen freely when the videos are processed. As a result, we only need to fly over a road once to present a variety of flying experiences. Eco-House, the NTU Library, and Drunken Moon Lake may be familiar to you, but have you ever seen them from the sky? We invite you to visit the NTU campus from a drone’s-eye view.

text-to-speech attraction introduction

We integrate AILabs’s technology into our service. With text-to-speech, a virtual tour guide vividly introduces the attractions to users. Videos with AI-adjusted color, tone, brightness, and contrast present a more beautiful scene. Drawing-style videos show visitors a distinctive view of the city. Beyond the aerial tour application, object detection algorithms find potential urban problems in drone footage. Meanwhile, we are building an AI-driven smart city system that customizes our services for residents, government, visitors, and everyone else. Below, we introduce the AI technology behind the scenes.

style transferred Anping scene

Stability is a major challenge when processing videos taken by drones. Because the propellers rotate at high speed, the raw videos shake severely. Common video stabilization algorithms, however, only work on videos with a normal field of view, and 360 video stabilization is a less-researched topic. We take advantage of the wide field of view to build an accurate sequence of 360 camera positions along the footage. Once the camera positions are known, we can offset the shake and stabilize the videos. Stable 360 videos ensure a pleasant viewing experience for users.
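As a rough illustration of the last step, the sketch below re-renders one equirectangular frame under a corrective rotation. It assumes that the camera orientation of each frame has already been estimated as a 3x3 camera-to-world rotation matrix and that a smoothed target orientation is available. The function names, the axis convention (y up, z forward), and the nearest-neighbor sampling are illustrative assumptions, not the pipeline used in our service.

```python
import numpy as np

def equirect_directions(height, width):
    """Unit viewing direction for every pixel center of an equirectangular frame."""
    lon = (np.arange(width) + 0.5) / width * 2 * np.pi - np.pi   # [-pi, pi)
    lat = np.pi / 2 - (np.arange(height) + 0.5) / height * np.pi  # top row is up
    lon, lat = np.meshgrid(lon, lat)
    return np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)

def rotate_equirect(frame, rotation):
    """Resample `frame` so output direction d shows the input at `rotation @ d`."""
    height, width = frame.shape[:2]
    dirs = equirect_directions(height, width) @ rotation.T  # row vector d -> R d
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))
    cols = np.floor((lon + np.pi) / (2 * np.pi) * width).astype(int) % width
    rows = np.floor((np.pi / 2 - lat) / np.pi * height).astype(int)
    rows = np.clip(rows, 0, height - 1)
    return frame[rows, cols]  # nearest-neighbor lookup (assumption: no interpolation)

def stabilize_frame(frame, camera_rotation, target_rotation):
    """Re-render a shaky frame as if it had been shot at `target_rotation`.

    Both arguments are hypothetical camera-to-world rotation matrices:
    the estimated (shaky) pose and the smoothed pose we want instead.
    """
    # Output direction d (target camera) -> world -> actual camera coordinates.
    return rotate_equirect(frame, camera_rotation.T @ target_rotation)
```

Because a 360 frame covers the whole sphere, this correction only moves pixels around and never exposes empty borders, unlike stabilization of normal field-of-view video.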

Color inconsistency between videos is another serious issue for our platform. Because the videos were taken on different days, the lighting and weather conditions differ considerably, and an abrupt color change in the scene makes users uncomfortable. We tried many approaches to overcome this problem. Prevalent style transfer models fail because their results are not photorealistic. In the end, we use deep learning image features to match semantically similar regions across images and transfer the color between corresponding regions. For example, the color of a tree is transferred to a tree, and the color of a car to a car. In this way, videos taken on different days keep a consistent color, which helps immerse users in the virtual world.

left: original image, right: color transferred image
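The idea of transferring color only between corresponding regions can be sketched as follows. This is a simplified stand-in, not our actual model: it assumes that per-pixel semantic label maps for both images are already available (for example from a segmentation network), and it swaps in plain mean and standard deviation matching in RGB for the deep-feature matching described above. All names are hypothetical.

```python
import numpy as np

def transfer_color_by_region(source, source_labels, reference, reference_labels):
    """Match the color statistics of each semantic region to its counterpart.

    source, reference: uint8 images of shape (H, W, 3).
    source_labels, reference_labels: integer class maps of shape (H, W)
    (assumed to be given; producing them is outside this sketch).
    """
    result = source.astype(np.float64)
    for label in np.unique(source_labels):
        src_mask = source_labels == label
        ref_mask = reference_labels == label
        if not ref_mask.any():
            continue  # no counterpart in the reference: leave region unchanged
        src_pixels = result[src_mask]                      # (N, 3) copy
        ref_pixels = reference[ref_mask].astype(np.float64)
        src_mean, src_std = src_pixels.mean(axis=0), src_pixels.std(axis=0)
        ref_mean, ref_std = ref_pixels.mean(axis=0), ref_pixels.std(axis=0)
        # Rescale the spread, then shift to the reference mean, per channel.
        scale = ref_std / np.maximum(src_std, 1e-6)
        result[src_mask] = (src_pixels - src_mean) * scale + ref_mean
    return np.clip(np.rint(result), 0, 255).astype(np.uint8)
```

Restricting the statistics to matching classes is what keeps, say, the green of the trees from leaking into the road, which a single global color transfer would not prevent.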

In addition to color transfer, a sky replacement algorithm helps reduce the color inconsistency between videos. Some videos were shot on sunny days, but most on cloudy days, and the sky conditions seriously affect video quality. We developed a sky replacement algorithm that replaces the cloudy sky with a clear one. First, we use semantic segmentation models to detect the coarse skyline and a matting algorithm to refine the details. Then we build the sequence of 360 camera positions, as in 360 video stabilization, and rotate the sky image according to these camera positions to simulate a new sunny sky. Finally, we composite the new sky with the original videos to generate appealing street-view videos. This sky replacement lets users imagine a virtual world never seen before.

top: original image, bottom: sky-replaced image
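The final compositing step can be sketched in a few lines. The example assumes that the earlier stages have already produced a soft sky matte (values in [0, 1], where 1 means pure sky) and a replacement sky image that is rotated to match the camera for this frame; the function and parameter names are hypothetical.

```python
import numpy as np

def composite_sky(frame, new_sky, sky_alpha):
    """Alpha-blend a replacement sky into a frame.

    frame, new_sky: uint8 images of shape (H, W, 3).
    sky_alpha: float matte of shape (H, W); 1.0 = sky, 0.0 = foreground
    (assumed to come from segmentation followed by matting).
    """
    alpha = np.clip(sky_alpha.astype(np.float64), 0.0, 1.0)[..., None]
    blended = alpha * new_sky + (1.0 - alpha) * frame
    return np.rint(blended).astype(np.uint8)
```

Fractional matte values along the skyline are what let fine structures such as tree branches and cables blend into the new sky without a hard edge.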


Object detection algorithms provide an accurate way to inspect the city in detail. They can answer questions about traffic density, areas prone to traffic incidents, urban green coverage, and more. Object detection on normal field-of-view video is a mature technology in industry, but object detection on 360 videos is still at an early stage. Current object detection methods seldom handle the distorted objects in 360 videos. We are building an object detection model for 360 videos based on a spherical convolutional neural network. This network reshapes the convolution kernel as it scans across different viewing angles of a 360 video, which overcomes the severe image distortion.

object detection on 360 videos
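To make the idea of a reshaped kernel concrete, the sketch below computes where the taps of a 3x3 kernel would sample an equirectangular image at a given latitude. It uses one common construction, which we assume here purely for illustration: the taps are laid out on the plane tangent to the sphere at the kernel center and mapped back with the inverse gnomonic projection. The function name and parameters are hypothetical, and the code does not describe our network.

```python
import numpy as np

def kernel_sampling_angles(center_lon, center_lat, angular_step, size=3):
    """Longitude/latitude (radians) sampled by each tap of a size x size kernel.

    The taps form a regular grid on the tangent plane at the kernel center,
    spaced `angular_step` radians apart along the central row and column.
    """
    half = size // 2
    offsets = np.tan(np.arange(-half, half + 1) * angular_step)
    x, y = np.meshgrid(offsets, offsets[::-1])  # first row is the top of the kernel
    norm = np.sqrt(1.0 + x**2 + y**2)
    # Inverse gnomonic projection of tangent-plane point (x, y).
    sin_lat = (np.sin(center_lat) + y * np.cos(center_lat)) / norm
    lat = np.arcsin(np.clip(sin_lat, -1.0, 1.0))
    lon = center_lon + np.arctan2(x, np.cos(center_lat) - y * np.sin(center_lat))
    return lon, lat

# Compare the horizontal footprint at the equator and at 60 degrees latitude.
lon_eq, _ = kernel_sampling_angles(0.0, 0.0, angular_step=0.05)
lon_hi, _ = kernel_sampling_angles(0.0, np.pi / 3, angular_step=0.05)
print(lon_eq[1, 2], lon_hi[1, 2])  # the tap sits farther away in longitude near the pole
```

Near the equator the sampling pattern is close to an ordinary square grid, while toward the poles it widens in longitude, which compensates for the horizontal stretching of objects in the equirectangular image.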

This is where we currently stand in building an AI-powered aerial tour service. We continue to explore interesting and innovative ways to bring a unique perspective on our beautiful home.