Learning an optimisable semantic segmentation map with image conditioned variational autoencoder

Pengcheng Zhuang, Yusuke Sekikawa, Kosuke Hara, Hideo Saito

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

Recent semantic segmentation systems have improved significantly by training pixel-wise on hierarchical features with deep convolutional neural network models. Because this training usually requires pixel-level annotated images, and finely labeled data is difficult to obtain in sufficient quantity, training sets tend to be limited, often to the order of thousands of images. As a result, the top-performing methods on a given dataset may be tuned to that specific setting, leaving their generalization ability unclear. In real-world applications such as self-driving systems, ambiguous regions or a lack of context information can cause errors in the predicted results, and resolving such ambiguities is crucial for subsequent operations to be performed safely. Inspired by CodeSLAM, which learns an optimizable pixel-wise depth representation, we adapt its regression method to the pixel-wise classification problem. By training a variational autoencoder network conditioned on a color image, we obtain a latent space that serves as a low-dimensional representation of the semantic segmentation and can be optimized efficiently. Consequently, our model can correct errors or ambiguities in its prediction during the inference phase when given useful scene information. We demonstrate the approach by providing partial ground truth for the scene and optimizing the latent variable.
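The inference-time refinement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the deep image-conditioned decoder is replaced here by a toy linear map, and the names `decode`, `refine_latent`, `masked_cross_entropy`, `W`, `b`, `prior_weight`, and all hyperparameters are assumptions chosen for illustration. The only idea taken from the abstract is that the latent code is optimized so the decoded segmentation agrees with the pixels where ground truth is available.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def decode(z, W, b):
    """Toy stand-in for a decoder: latent code z (D,) -> class logits (P, C).

    Assumption: W (P, C, D) is a fixed linear map and b (P, C) plays the role
    of the image-conditioned part of the prediction.
    """
    return np.einsum("pcd,d->pc", W, z) + b


def masked_cross_entropy(z, W, b, labels, mask):
    """Mean cross-entropy over the pixels whose label is known (mask == 1)."""
    probs = softmax(decode(z, W, b))
    picked = probs[np.arange(labels.shape[0]), labels]
    n_obs = max(float(mask.sum()), 1.0)
    return float(-(mask * np.log(picked + 1e-12)).sum() / n_obs)


def refine_latent(z0, W, b, labels, mask, steps=300, lr=0.1, prior_weight=1e-3):
    """Gradient descent on z using only the observed pixels.

    A small L2 penalty keeps z close to the zero-mean prior (an assumption).
    """
    z = z0.astype(float).copy()
    onehot = np.eye(b.shape[1])[labels]
    n_obs = max(float(mask.sum()), 1.0)
    for _ in range(steps):
        probs = softmax(decode(z, W, b))
        # d(cross-entropy)/d(logits) is (probs - onehot); unobserved pixels get zero weight
        grad_logits = (probs - onehot) * mask[:, None] / n_obs
        grad_z = np.einsum("pc,pcd->d", grad_logits, W) + prior_weight * z
        z -= lr * grad_z
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, C, D = 64, 3, 8                      # pixels, classes, latent size (toy values)
    W = rng.normal(size=(P, C, D)) * 0.5
    b = rng.normal(size=(P, C))
    labels = decode(rng.normal(size=D), W, b).argmax(axis=-1)
    mask = np.zeros(P)
    mask[:16] = 1.0                         # only the first 16 pixels are labeled
    z0 = np.zeros(D)                        # start from the prior mean
    z = refine_latent(z0, W, b, labels, mask)
    print("loss before:", masked_cross_entropy(z0, W, b, labels, mask))
    print("loss after: ", masked_cross_entropy(z, W, b, labels, mask))
```

Because the whole segmentation is decoded from the same low-dimensional code, updating `z` from a few labeled pixels also changes the prediction at unlabeled pixels. In the sketch the optimization runs over only `D` values rather than over every pixel's logits.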

Original language: English
Title of host publication: Image Analysis and Processing – ICIAP 2019 - 20th International Conference, Proceedings
Editors: Elisa Ricci, Nicu Sebe, Samuel Rota Bulò, Cees Snoek, Oswald Lanz, Stefano Messelodi
Publisher: Springer Verlag
Pages: 379-389
Number of pages: 11
ISBN (Print): 9783030306441
DOIs: https://doi.org/10.1007/978-3-030-30645-8_35
Publication status: Published - 2019 Jan 1
Event: 20th International Conference on Image Analysis and Processing, ICIAP 2019 - Trento, Italy
Duration: 2019 Sep 9 – 2019 Sep 13

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11752 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Image Analysis and Processing, ICIAP 2019
Country: Italy
City: Trento
Period: 19/9/9 – 19/9/13

Keywords

  • Optimization
  • Semantic segmentation
  • Variational autoencoder

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Zhuang, P., Sekikawa, Y., Hara, K., & Saito, H. (2019). Learning an optimisable semantic segmentation map with image conditioned variational autoencoder. In E. Ricci, N. Sebe, S. Rota Bulò, C. Snoek, O. Lanz, & S. Messelodi (Eds.), Image Analysis and Processing – ICIAP 2019 - 20th International Conference, Proceedings (pp. 379-389). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11752 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-30645-8_35

@inproceedings{37f78f2dcdd2496c8035f51ab8c537ab,
title = "Learning an optimisable semantic segmentation map with image conditioned variational autoencoder",
keywords = "Optimization, Semantic segmentation, Variational autoencoder",
author = "Pengcheng Zhuang and Yusuke Sekikawa and Kosuke Hara and Hideo Saito",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-30645-8_35",
language = "English",
isbn = "9783030306441",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "379--389",
editor = "Elisa Ricci and Nicu Sebe and {Rota Bul{\`o}}, Samuel and Cees Snoek and Oswald Lanz and Stefano Messelodi",
booktitle = "Image Analysis and Processing – ICIAP 2019 - 20th International Conference, Proceedings",
address = "Germany",

}

TY - GEN

T1 - Learning an optimisable semantic segmentation map with image conditioned variational autoencoder

AU - Zhuang, Pengcheng

AU - Sekikawa, Yusuke

AU - Hara, Kosuke

AU - Saito, Hideo

PY - 2019/1/1

Y1 - 2019/1/1

KW - Optimization

KW - Semantic segmentation

KW - Variational autoencoder

UR - http://www.scopus.com/inward/record.url?scp=85072901121&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072901121&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-30645-8_35

DO - 10.1007/978-3-030-30645-8_35

M3 - Conference contribution

AN - SCOPUS:85072901121

SN - 9783030306441

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 379

EP - 389

BT - Image Analysis and Processing – ICIAP 2019 - 20th International Conference, Proceedings

A2 - Ricci, Elisa

A2 - Sebe, Nicu

A2 - Rota Bulò, Samuel

A2 - Snoek, Cees

A2 - Lanz, Oswald

A2 - Messelodi, Stefano

PB - Springer Verlag

ER -