Sigsoftmax: Reanalysis of the softmax bottleneck

Sekitoshi Kanai, Yuki Yamanaka, Yasuhiro Fujiwara, Shuichi Adachi

Research output: Contribution to journal › Conference article

3 Citations (Scopus)

Abstract

Softmax is an output activation function for modeling categorical probability distributions in many applications of deep learning. However, a recent study revealed that softmax can be a bottleneck of the representational capacity of neural networks in language modeling (the softmax bottleneck). In this paper, we propose an output activation function for breaking the softmax bottleneck without additional parameters. We re-analyze the softmax bottleneck from the perspective of the output set of log-softmax and identify the cause of the softmax bottleneck. On the basis of this analysis, we propose sigsoftmax, which is composed of a multiplication of an exponential function and a sigmoid function. Sigsoftmax can break the softmax bottleneck. Experiments on language modeling demonstrate that sigsoftmax and mixture of sigsoftmax outperform softmax and mixture of softmax, respectively.
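
The abstract describes sigsoftmax only at a high level: an exponential multiplied by a sigmoid, used in place of the exponential in softmax. A minimal NumPy sketch of that idea, assuming the product exp(z_i)·sigmoid(z_i) is normalized over the vocabulary to form probabilities, might look like the following; the function names and example logits are illustrative, not taken from the authors' code.

```python
import numpy as np

def softmax(z):
    """Standard softmax: exp(z_i) normalized over all logits."""
    e = np.exp(z - np.max(z))          # subtract max for numerical stability
    return e / e.sum()

def sigsoftmax(z):
    """Sketch of sigsoftmax: exp(z_i) * sigmoid(z_i), normalized to sum to 1.

    Shifting z inside exp() by its max cancels between numerator and
    denominator, so the normalized result is unchanged but overflow is avoided.
    """
    g = np.exp(z - np.max(z)) * (1.0 / (1.0 + np.exp(-z)))
    return g / g.sum()

logits = np.array([2.0, 1.0, -1.0])
print(softmax(logits))     # roughly [0.705, 0.259, 0.035]
print(sigsoftmax(logits))  # also a valid probability vector, but its log is
                           # no longer a purely affine function of the logits
```

Both functions return a probability vector over the same logits; the difference motivating the paper is that the log of sigsoftmax is not restricted in the same way as log-softmax, which is what the authors argue lets it break the softmax bottleneck without adding parameters.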

Original language: English
Pages (from-to): 286-296
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
Publication status: Published - 2018 Jan 1
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: 2018 Dec 2 - 2018 Dec 8

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Kanai, S., Yamanaka, Y., Fujiwara, Y., & Adachi, S. (2018). Sigsoftmax: Reanalysis of the softmax bottleneck. Advances in Neural Information Processing Systems, 2018-December, 286-296.
