### Abstract

It is known that the curvature of data spaces plays a role in data analysis. For example, the Fréchet mean (intrinsic mean) of a probability measure with finite second moment on a complete non-positively curved (CAT(0)) metric space always exists and is unique. In this paper, we use the curvature of data spaces in a novel manner. A methodology is developed for data analysis based on empirically constructed geodesic metric spaces. The population version defines distance by the amount of probability mass accumulated on travelling between two points, and a geodesic metric arises from the shortest-path version. Such metrics are then transformed in a number of ways to produce families of geodesic metric spaces. Empirical versions of the geodesics allow computation of intrinsic means and associated measures of dispersion. A version of the empirical geodesic is introduced based on metric graphs computed from the sample points. For certain parameter ranges the spaces become CAT(0) spaces and the intrinsic means are unique. In the graph case, a minimal spanning tree obtained as a limiting case is CAT(0). In other cases, the aggregate squared distance from a test point has local minima that yield information about clusters. This is particularly relevant for metrics based on so-called metric cones, which allow extensions to CAT(κ) spaces. We illustrate the methods on real data. This paper is a summary of a longer version [5]; see [5] for proofs of theorems and further details.
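The graph-based empirical geodesic and the aggregate-squared-distance criterion can be illustrated with a small sketch. It is not the authors' construction: it assumes a complete graph on the sample whose edge weights are Euclidean distances raised to a power `alpha` (a hypothetical transformation chosen for illustration), computes all-pairs shortest paths with Floyd-Warshall, and restricts the intrinsic-mean search to sample points. That last step is a simplification, since on a metric graph the minimiser may lie inside an edge. All function names are hypothetical, and the "local minimum" test against the `k` nearest neighbours is a heuristic stand-in for the cluster information mentioned above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def graph_geodesic_metric(points: Sequence[Point], alpha: float = 2.0) -> List[List[float]]:
    """All-pairs shortest-path distances on the complete graph over the sample.

    Edge (i, j) has weight ||x_i - x_j|| ** alpha; this weighting is a
    hypothetical choice made for illustration.
    """
    n = len(points)
    d = [[math.dist(points[i], points[j]) ** alpha for j in range(n)] for i in range(n)]
    # Floyd-Warshall: allow every vertex k in turn as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


def aggregate_sq_distance(d: List[List[float]], i: int) -> float:
    """Sum of squared geodesic distances from sample point i to all sample points."""
    return sum(x * x for x in d[i])


def sample_intrinsic_mean(d: List[List[float]]) -> int:
    """Index of the sample point minimising aggregate squared distance.

    Simplification: only vertices are searched, whereas the intrinsic mean
    on a metric graph may lie in the interior of an edge.
    """
    return min(range(len(d)), key=lambda i: aggregate_sq_distance(d, i))


def local_minima(d: List[List[float]], k: int = 2) -> List[int]:
    """Sample points whose aggregate squared distance is no larger than that
    of their k nearest neighbours under the geodesic metric (a heuristic)."""
    n = len(d)
    agg = [aggregate_sq_distance(d, i) for i in range(n)]
    minima = []
    for i in range(n):
        # Other sample points ordered by geodesic distance from point i.
        others = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        if all(agg[i] <= agg[j] for j in others[:k]):
            minima.append(i)
    return minima


if __name__ == "__main__":
    # Two well-separated clusters on a line (toy data, not from the paper).
    pts = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (5.3,)]
    d = graph_geodesic_metric(pts, alpha=2.0)
    print(sample_intrinsic_mean(d))  # 3, i.e. the point 5.0
    print(local_minima(d, k=2))      # [2, 3]: one local minimum per cluster
```

With `alpha = 2` the toy data yield two local minima of the aggregate squared distance, one in each cluster (the points 0.2 and 5.0), while the global minimiser sits on the inner edge of the larger cluster. Raising `alpha` above 1 makes shortest paths prefer chains of short edges through dense regions; with `alpha = 1` the graph geodesic reduces to the Euclidean distance.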

Original language | English |
---|---|
Title of host publication | Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014 |
Editors | Ali Mohammad-Djafari, Frederic Barbaresco |
Publisher | American Institute of Physics Inc. |
Pages | 97-104 |
Number of pages | 8 |
ISBN (Electronic) | 9780735412804 |
DOIs | https://doi.org/10.1063/1.4905968 |
Publication status | Published - 2015 Jan 1 |
Externally published | Yes |
Event | 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014 - Amboise, France. Duration: 2014 Sep 21 → 2014 Sep 26 |

### Publication series

Name | AIP Conference Proceedings |
---|---|
Volume | 1641 |
ISSN (Print) | 0094-243X |
ISSN (Electronic) | 1551-7616 |

### Conference

Conference | 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014 |
---|---|
Country | France |
City | Amboise |
Period | 2014 Sep 21 → 2014 Sep 26 |

### Keywords

- CAT(0)
- cluster analysis
- curvature
- extrinsic mean
- intrinsic mean
- metric cone
- nonparametric analysis

### ASJC Scopus subject areas

- Physics and Astronomy (all)

### Cite this

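Kobayashi, K., Mitsuru, O., & Wynn, H. P. (2015). Statistical analysis via the curvature of data space. In A. Mohammad-Djafari, & F. Barbaresco (Eds.), *Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014* (pp. 97-104). (AIP Conference Proceedings; Vol. 1641). American Institute of Physics Inc. https://doi.org/10.1063/1.4905968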

**Statistical analysis via the curvature of data space.** / Kobayashi, Kei; Mitsuru, Orita; Wynn, Henry P.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Kobayashi, K, Mitsuru, O & Wynn, HP 2015, Statistical analysis via the curvature of data space. in A Mohammad-Djafari & F Barbaresco (eds), *Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014.* AIP Conference Proceedings, vol. 1641, American Institute of Physics Inc., pp. 97-104, 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014, Amboise, France, 2014 Sep 21. https://doi.org/10.1063/1.4905968

TY - GEN

T1 - Statistical analysis via the curvature of data space

AU - Kobayashi, Kei

AU - Mitsuru, Orita

AU - Wynn, Henry P.

PY - 2015/1/1

Y1 - 2015/1/1

AB - It is known that the curvature of data spaces plays a role in data analysis. For example, the Fréchet mean (intrinsic mean) of a probability measure with finite second moment on a complete non-positively curved (CAT(0)) metric space always exists and is unique. In this paper, we use the curvature of data spaces in a novel manner. A methodology is developed for data analysis based on empirically constructed geodesic metric spaces. The population version defines distance by the amount of probability mass accumulated on travelling between two points, and a geodesic metric arises from the shortest-path version. Such metrics are then transformed in a number of ways to produce families of geodesic metric spaces. Empirical versions of the geodesics allow computation of intrinsic means and associated measures of dispersion. A version of the empirical geodesic is introduced based on metric graphs computed from the sample points. For certain parameter ranges the spaces become CAT(0) spaces and the intrinsic means are unique. In the graph case, a minimal spanning tree obtained as a limiting case is CAT(0). In other cases, the aggregate squared distance from a test point has local minima that yield information about clusters. This is particularly relevant for metrics based on so-called metric cones, which allow extensions to CAT(κ) spaces. We illustrate the methods on real data. This paper is a summary of a longer version [5]; see [5] for proofs of theorems and further details.

KW - CAT(0)

KW - cluster analysis

KW - curvature

KW - extrinsic mean

KW - intrinsic mean

KW - metric cone

KW - nonparametric analysis

UR - http://www.scopus.com/inward/record.url?scp=85063830092&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063830092&partnerID=8YFLogxK

U2 - 10.1063/1.4905968

DO - 10.1063/1.4905968

M3 - Conference contribution

T3 - AIP Conference Proceedings

SP - 97

EP - 104

BT - Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt 2014

A2 - Mohammad-Djafari, Ali

A2 - Barbaresco, Frederic

PB - American Institute of Physics Inc.

ER -