Statistical analysis via the curvature of data space

Kei Kobayashi, Orita Mitsuru, Henry P. Wynn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

It is known that the curvature of data spaces plays a role in data analysis. For example, the Fréchet mean (intrinsic mean) always exists uniquely for a probability measure on a non-positively curved metric space. In this paper, we use the curvature of data spaces in a novel manner. A methodology is developed for data analysis based on empirically constructed geodesic metric spaces. The population version defines distance by the amount of probability mass accumulated in travelling between two points, and a geodesic metric arises from the shortest-path version. Such metrics are then transformed in a number of ways to produce families of geodesic metric spaces. Empirical versions of the geodesics allow computation of intrinsic means and associated measures of dispersion. A version of the empirical geodesic is introduced based on metric graphs computed from the sample points. For certain parameter ranges the spaces become CAT(0) spaces and the intrinsic means are unique. In the graph case, a minimal spanning tree obtained as a limiting case is CAT(0). In other cases, the aggregate squared distance from a test point has local minima that yield information about clusters. This is particularly relevant for metrics based on so-called metric cones, which allow extensions to CAT(κ) spaces. We show how our methods work on some actual data. This paper is a summary of a longer version [5]; see that version for proofs of the theorems and further details.
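The graph-based empirical geodesic and the intrinsic mean described in the abstract can be illustrated with a small sketch. It is not the authors' construction: the abstract does not say how the metric graphs are built, so the epsilon-neighbourhood rule in `build_graph`, the restriction of candidate means to sample vertices, and all function names here are assumptions made for illustration. The sketch computes shortest-path distances on the graph and evaluates the aggregate squared distance at every vertex; its global minimiser plays the role of an intrinsic mean, and on less regular data its local minima are the quantities the abstract relates to clusters.

```python
import heapq
import math


def build_graph(points, eps):
    """Join points whose Euclidean distance is at most eps.

    Assumed construction: the abstract only says "metric graphs computed
    from the sample points", so this rule is purely illustrative.
    """
    adj = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d <= eps:
                adj[i].append((j, d))
                adj[j].append((i, d))
    return adj


def shortest_paths(adj, source):
    """Dijkstra's algorithm: graph (empirical geodesic) distances from source."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def frechet_function(adj):
    """Aggregate squared graph distance from each vertex to all vertices."""
    return {v: sum(d * d for d in shortest_paths(adj, v).values()) for v in adj}


def vertex_frechet_mean(adj):
    """Vertex minimising the aggregate squared distance (candidates: vertices only)."""
    f = frechet_function(adj)
    return min(f, key=f.get)


# Five equally spaced points on a line; eps = 1.5 links only neighbours.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
graph = build_graph(points, eps=1.5)
mean_vertex = vertex_frechet_mean(graph)
print(mean_vertex)  # index of the middle point
```

If `eps` is too small the graph becomes disconnected and some distances are infinite, so a real analysis would need to choose the graph parameter with care.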

Original language: English
Title of host publication: Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014
Editors: Ali Mohammad-Djafari, Frederic Barbaresco
Publisher: American Institute of Physics Inc.
Pages: 97-104
Number of pages: 8
ISBN (Electronic): 9780735412804
DOI: https://doi.org/10.1063/1.4905968
Publication status: Published - 2015 Jan 1
Externally published: Yes
Event: 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014 - Amboise, France
Duration: 2014 Sep 21 → 2014 Sep 26

Publication series

Name: AIP Conference Proceedings
Volume: 1641
ISSN (Print): 0094-243X
ISSN (Electronic): 1551-7616

Conference

Conference: 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014
Country: France
City: Amboise
Period: 14/9/21 → 14/9/26

Fingerprint

statistical analysis
metric space
curvature
cones
theorems
methodology

Keywords

  • CAT(0)
  • cluster analysis
  • curvature
  • extrinsic mean
  • intrinsic mean
  • metric cone
  • nonparametric analysis

ASJC Scopus subject areas

  • Physics and Astronomy(all)

Cite this

Kobayashi, K., Mitsuru, O., & Wynn, H. P. (2015). Statistical analysis via the curvature of data space. In A. Mohammad-Djafari & F. Barbaresco (Eds.), Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014 (pp. 97-104). (AIP Conference Proceedings; Vol. 1641). American Institute of Physics Inc. https://doi.org/10.1063/1.4905968
