Privacy Preserving Visual SLAM

European Conference on Computer Vision 2020

Mikiya Shibuya* 1,2

Shinya Sumikura* 1

Ken Sakurada* 1

* These authors contributed equally and share first authorship.

1 National Institute of Advanced Industrial Science and Technology (AIST)

2 Tokyo Institute of Technology

Abstract

This study proposes a privacy-preserving Visual SLAM framework that estimates camera poses and performs bundle adjustment with mixed line and point clouds in real time. Previous studies have proposed localization methods that use a line-cloud map to estimate a camera pose for a single image or for a reconstructed point cloud. These methods protect scene privacy against inversion attacks, which reconstruct scene images from a point cloud, by converting the point cloud into a line cloud. However, they are not directly applicable to a video sequence because they do not address computational efficiency, which is critical for estimating camera poses and performing bundle adjustment with mixed line and point clouds in real time. Moreover, no method has been studied for optimizing a server's line-cloud map with a point cloud reconstructed from a client video, since no observation points in image coordinates are available; withholding them is what prevents inversion attacks, i.e., the reversibility of the 3D lines. Experimental results with synthetic and real data show that our Visual SLAM framework achieves the intended privacy-preserving formulation and real-time performance using a line-cloud map.

Preliminary

Inversion Attack

Pittaluga et al. showed that detailed images at arbitrary viewpoints can be restored from only a sparse point cloud and its optional attributes [Pittaluga et al., CVPR'19]. They referred to this restoration as the inversion attack. Thus, in AR/MR applications, there is a risk of privacy leakage caused by restoring confidential information from a shared point cloud via the inversion attack.

3D Line Cloud

To prevent the inversion attack, a map representation based on a 3D line cloud has been proposed [Speciale et al., CVPR'19]. The line cloud is built by converting each 3D point into a 3D line that has a random orientation and passes through the original point. Directly restoring the original point cloud from the line cloud is difficult because each point's coordinates can be reparameterized arbitrarily along the corresponding line.


[Figure: a point cloud converted into a 3D line cloud, quoted from Speciale et al., CVPR'19]
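
To make the conversion concrete, the following is a minimal sketch (in Python/NumPy) of the point-to-line lifting. It assumes the line cloud is stored as (origin, direction) pairs; the function name and storage format are our illustration, not necessarily the representation used by Speciale et al.

import numpy as np

def lift_to_line_cloud(points, rng=None):
    """Replace each 3D point p with the line {p + s * d}, where d is
    a uniformly random unit direction. points: (N, 3) array.
    Returns (origins, directions), both of shape (N, 3)."""
    rng = np.random.default_rng() if rng is None else rng
    # Random unit directions, uniform on the sphere.
    d = rng.normal(size=points.shape)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    # Slide each stored origin by a random offset along its own line,
    # so the stored origin itself does not reveal the original point.
    s = rng.normal(size=(len(points), 1))
    return points + s * d, d

Sliding the origin along the line changes nothing geometrically: every point on a line is an equally valid anchor, which is exactly why the original coordinates cannot be recovered.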

They also formulated a method for localizing a single image in a prebuilt line cloud. However, there has been no Visual SLAM algorithm that can estimate camera poses continuously and in real time using a line cloud.
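
The core geometric constraint behind such localization is that each 2D keypoint must lie on the image projection of its matched 3D line. The sketch below is our illustration of that point-to-line reprojection residual, assuming a pinhole camera with intrinsics K; it is not the p6L solver itself.

import numpy as np

def point_to_line_residual(R, t, K, p, d, x):
    """Signed pixel distance from observed keypoint x (2-vector) to the
    projection of the 3D world line (point p, unit direction d) under
    camera pose (R, t) and intrinsics K."""
    # Project two distinct points of the 3D line (homogeneous pixels).
    a = K @ (R @ p + t)
    b = K @ (R @ (p + d) + t)
    # Homogeneous 2D line through the two projections.
    l = np.cross(a, b)
    # Normalized point-to-line distance in pixels.
    return (l @ np.array([x[0], x[1], 1.0])) / np.linalg.norm(l[:2])

A pose solver searches for the (R, t) that drives these residuals to zero over all matches; six 2D-point-to-3D-line correspondences suffice for the minimal calibrated-pose problem, hence the name p6L.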

LC-VSLAM

We propose a Visual SLAM framework for real-time relocalization, tracking, and bundle adjustment (BA) on a map that mixes lines and points, which we call Line-Cloud Visual SLAM (LC-VSLAM). The main contributions of this study are three-fold: real-time camera pose estimation and BA with mixed line and point clouds, optimization of a server-side line-cloud map with a point cloud reconstructed from a client video, and an experimental validation on synthetic and real data.
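
As a flavor of what BA with a mixed map involves, the sketch below shows one frame's residuals under our own hypothetical variable packing: landmarks originating from the privacy-preserving line cloud stay constrained to their 3D lines and each contributes a single scalar unknown, while newly mapped landmarks keep ordinary 3-DoF coordinates. The paper's actual optimizer and parameterization may differ, and camera poses are held fixed here for brevity.

import numpy as np

def project(K, R, t, P):
    """Pinhole projection of a world point P to pixel coordinates."""
    q = K @ (R @ P + t)
    return q[:2] / q[2]

def mixed_residuals(s, K, R, t, origins, dirs, line_obs,
                    free_points, point_obs):
    """Stacked reprojection residuals for one frame.
    s: (L,) scalars locating each line-cloud landmark on its 3D line;
    free_points: (M, 3) ordinary landmarks (held fixed for brevity)."""
    res = []
    for j in range(len(origins)):
        # Line-cloud landmark: 1 DoF, always stays on its line.
        P = origins[j] + s[j] * dirs[j]
        res.append(project(K, R, t, P) - line_obs[j])
    for P, obs in zip(free_points, point_obs):
        # Newly mapped landmark: ordinary 3-DoF point.
        res.append(project(K, R, t, P) - obs)
    return np.concatenate(res)

Restricting map landmarks to one degree of freedom keeps the optimizer from ever materializing the original 3D points, while the reprojection errors can still be minimized, e.g., with scipy.optimize.least_squares over s.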

Experimental Results

These are qualitative evaluations in which LC-VSLAM is applied to fisheye and equirectangular videos. Each dataset contains two videos: one is used for building a line cloud, and the other for tracking in the prebuilt line cloud while mapping unobserved areas.
Please see the teaser video for the perspective-camera results.

NOTE: In the videos, lines are drawn as line "segments" for better visibility, but they are actually infinite lines without endpoints.

Fisheye (CARLA)

Equirectangular (Campus)

Quantitative Evaluation

Localization

This is a comparison of one-shot localization performance in terms of per-frame tracking time and absolute pose error (APE). LC-VSLAM outperforms the previous method in both computation time and localization accuracy.

Method                 Tracking time [ms]   APE trans [m]   APE rot [deg]
p6L [Speciale et al.]  140.3                0.7815          0.5896
LC-VSLAM (ours)        31.09                0.1979          0.2841
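
For reference, APE numbers of this kind can be computed roughly as follows, assuming the estimated and ground-truth trajectories are already associated and aligned (e.g., by Umeyama alignment); the exact evaluation protocol behind the table may differ.

import numpy as np

def ape_rmse(R_est, t_est, R_gt, t_gt):
    """APE RMSEs for pre-aligned trajectories.
    R_*: (N, 3, 3) rotation matrices, t_*: (N, 3) translations.
    Returns (translational RMSE, rotational RMSE in degrees)."""
    t_err = np.linalg.norm(t_est - t_gt, axis=1)
    # Per-frame rotation error angle from trace(R_gt^T R_est).
    R_rel = np.transpose(R_gt, (0, 2, 1)) @ R_est
    cos_a = (np.trace(R_rel, axis1=1, axis2=2) - 1.0) / 2.0
    ang = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    return np.sqrt(np.mean(t_err ** 2)), np.sqrt(np.mean(ang ** 2))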

Tracking Accuracy

This is a comparison of APEs between the two types of prebuilt map: a 3D point cloud and a 3D line cloud. Even with a 3D line cloud as the prebuilt map, LC-VSLAM achieves performance comparable to the conventional approach using a 3D point cloud.

APE trans [m] / rot [deg]   CARLA Perspective   CARLA Fisheye    CARLA Equirectangular   KITTI
point-cloud map             3.290 / 0.6273      2.883 / 0.4402   3.079 / 0.2375          3.801 / 1.012
line-cloud map (ours)       3.651 / 0.8416      3.177 / 0.5941   3.075 / 0.2766          4.488 / 1.309

Map Optimization Efficiency

We also confirm the performance of the proposed global map optimization. APEs are compared under three conditions: without global optimization, with pose-graph optimization (PGO), and with both PGO and global BA. PGO and global BA efficiently reduce the errors of the estimated trajectories; a minimal PGO sketch follows the table below.

APE trans [m] / rot [deg]   CARLA Perspective   CARLA Fisheye    CARLA Equirectangular
None                        24.06 / 1.292       10.16 / 1.064    14.28 / 3.682
w/ PGO                      3.301 / 1.151       1.670 / 0.8039   9.640 / 2.790
w/ PGO & global BA          3.018 / 1.100       1.593 / 0.8525   8.320 / 2.404
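
As background on what the PGO stage does, here is a self-contained 2D toy example (our illustration; the actual system optimizes 6-DoF keyframe poses): relative-pose measurements tracing a square loop are turned into residuals and minimized with nonlinear least squares. With real odometry drift, the loop-closure edge would redistribute the accumulated error over the whole trajectory.

import numpy as np
from scipy.optimize import least_squares

def pgo_residuals(x, edges):
    """2D pose-graph residuals. Poses are (x, y, theta) triples; each
    edge (i, j, dx, dy, dth) measures pose j in the frame of pose i."""
    poses = x.reshape(-1, 3)
    res = []
    for i, j, dx, dy, dth in edges:
        xi, yi, thi = poses[i]
        xj, yj, thj = poses[j]
        c, s = np.cos(thi), np.sin(thi)
        # Predicted relative translation, rotated into frame i.
        res.append(c * (xj - xi) + s * (yj - yi) - dx)
        res.append(-s * (xj - xi) + c * (yj - yi) - dy)
        # Wrapped relative-angle error.
        res.append((thj - thi - dth + np.pi) % (2 * np.pi) - np.pi)
    res.extend(poses[0])  # gauge fixing: pin the first pose
    return np.array(res)

# Four odometry edges tracing a unit square, plus the loop closure 3->0.
edges = [(0, 1, 1, 0, np.pi / 2), (1, 2, 1, 0, np.pi / 2),
         (2, 3, 1, 0, np.pi / 2), (3, 0, 1, 0, np.pi / 2)]
sol = least_squares(pgo_residuals, np.zeros(12), args=(edges,))
print(sol.x.reshape(-1, 3))  # recovered square-corner poses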

Citation

@inproceedings{shibuya2020privacy,
  title = {Privacy Preserving Visual {SLAM}},
  author = {Mikiya Shibuya and Shinya Sumikura and Ken Sakurada},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020}
}

Contact

  • Mikiya Shibuya: shibuya.m.ab <at> m.titech.ac.jp
  • Ken Sakurada: k.sakurada <at> aist.go.jp