We propose a cross-media music retrieval system that provides users with a toolkit for formulating their own queries, in which desired changes in the mood of music are described as a sequence of images. To interpret a query, the system uses a metric space to convert the color changes between images into continuous tonal changes in music, and vice versa. The system provides delta functions for music and images that compute these changes as distance values in the metric space. It then calculates a sentiment-oriented relevance score between the query and a piece of music by comparing their computed distance values. The advantage of the system is that it bridges heterogeneous image and music criteria by converting the visual impressions of images into the time-oriented, non-visual impressions of music. This bridging operation is the foundation of our visual query construction method, which allows users to search for music in a trial-and-error manner, subtly reshaping a query while observing the moods of music as color changes in images. The method is well suited to discovering unknown music that matches user preferences in web-based music resources that store many types of music but provide no publishing information about it.
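
The scoring idea above can be sketched minimally as follows. This is a hypothetical illustration, not the paper's actual implementation: it assumes images and music segments have already been mapped to points in a shared metric space, uses the Euclidean metric, and compares the two delta sequences by summed absolute difference; all names are illustrative.

```python
import math

def delta(points):
    """Delta function: distances between consecutive points in the metric space."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]

def relevance(image_points, music_points):
    """Sentiment-oriented relevance score (higher = better match).

    Compares the distance-value sequence of the image query with that of
    a piece of music; identical change patterns score 0, and the score
    decreases as the patterns diverge.
    """
    dq, dm = delta(image_points), delta(music_points)
    n = min(len(dq), len(dm))
    return -sum(abs(a - b) for a, b in zip(dq[:n], dm[:n]))
```

Under this sketch, a query whose color changes trace the same trajectory magnitudes as a track's tonal changes receives the maximum score of 0, so ranking candidates by this score surfaces the music whose mood progression best mirrors the image sequence.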