Sunday, March 29, 2015

Scale Tonics and Strange Circles

As described in this previous post, the text below is a draft of one of several "interludes" to be included in a book that I am working on concerned with music and artificial neural networks.



Figure I-2. The pattern of weights from the input units of the scale tonic perceptron to one of its output units is presented at the top.  The bottom illustrates the two circles of major seconds.

An earlier interlude noted that interpretations of musical networks often reveal strange circles.  However, in many respects the interpretation of the weights of the scale tonic perceptron (a pattern like the upper part of Figure I-2) seems very conventional.  It was argued (using Figures 3-5 and 3-6) that the weights in the upper part of Figure I-2 contain the pattern of musical intervals that defines a major scale as well as the pattern of musical intervals that defines a harmonic minor scale.

In this interlude, we discover that from another perspective the weights of the scale tonic perceptron also lead to strange circles.  An analysis of these weights reveals the presence of the two circles of major seconds that are presented in the bottom half of Figure I-2.
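For readers who want to see the two circles spelled out, here is a minimal sketch that builds them by repeatedly stepping up a major second (two semitones).  The pitch-class names (sharps rather than flats), the numbering with C as 0, and the function name are my own illustrative choices; they are not taken from Figure I-2.

```python
# Pitch-class names, indexed 0-11 with C = 0 (an illustrative convention).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def circle_of_major_seconds(start):
    """Step upward by a major second (2 semitones) until we return to start."""
    circle = []
    pc = start
    while True:
        circle.append(PITCH_CLASSES[pc])
        pc = (pc + 2) % 12
        if pc == start:
            break
    return circle


circle_from_c = circle_of_major_seconds(0)
circle_from_c_sharp = circle_of_major_seconds(1)

print(circle_from_c)        # ['C', 'D', 'E', 'F#', 'G#', 'A#']
print(circle_from_c_sharp)  # ['C#', 'D#', 'F', 'G', 'A', 'B']
```

Each circle contains six pitch-classes, and together the two circles cover all twelve without overlap.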

The tonal hierarchy (Krumhansl, 1990) discussed in Chapter 1 was a theory about the relationships between pitch-classes.  Krumhansl explored the structure of the tonal hierarchy using a statistical technique called multidimensional scaling or MDS (Kruskal & Wish, 1978).  MDS takes a matrix of similarity relationships between objects and creates a map.  In this map, each object is represented by a point at particular coordinates of the space.  The distance between a pair of points in the space indicates the similarity between the two objects that they represent.  The closer two points are, the more similar the two objects are.  MDS attempts to build a map involving all of the objects, placing them at locations that provide the best fit to the raw similarity measures between them.

The scale tonic perceptron has twelve different output units, each representing a pitch-class.  Each of these units has its own pattern of weights between it and the twelve input units used to present scales to the network.  It stands to reason that if two different output units represent scale tonics that are related in some way, then they should have a similar pattern of connection weights.
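To make the input and output patterns concrete, the sketch below shows one plausible way to present a scale to twelve input units and to represent its tonic on twelve output units.  It assumes a binary encoding (an input unit is 1 if its pitch-class belongs to the scale, 0 otherwise) and a one-hot target for the tonic; this encoding, the C = 0 numbering, and the function names are assumptions made for illustration, not a description of the actual network's training set.

```python
import numpy as np

# Semitone steps between successive degrees of each scale type.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]
HARMONIC_MINOR_STEPS = [2, 1, 2, 2, 1, 3, 1]


def encode_scale(tonic, steps):
    """Return a 12-element 0/1 vector marking the pitch-classes in the scale."""
    vec = np.zeros(12, dtype=int)
    pc = tonic
    for step in steps:
        vec[pc] = 1
        pc = (pc + step) % 12
    return vec


def tonic_target(tonic):
    """Return a one-hot 12-element vector with the tonic's output unit on."""
    target = np.zeros(12, dtype=int)
    target[tonic] = 1
    return target


c_major = encode_scale(0, MAJOR_STEPS)                    # C D E F G A B
a_harmonic_minor = encode_scale(9, HARMONIC_MINOR_STEPS)  # A B C D E F G#

print(c_major)           # [1 0 1 0 1 1 0 1 0 1 0 1]
print(a_harmonic_minor)  # [1 0 1 0 1 1 0 0 1 1 0 1]
print(tonic_target(9))   # [0 0 0 0 0 0 0 0 0 1 0 0]
```

Under this assumed encoding, twelve major scales and twelve harmonic minor scales give twenty-four input patterns, two for each tonic.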

One can use connection weights to compute a measure of similarity between two output units in this network.  Each output unit is a point in a twelve-dimensional space; its coordinates are given by its twelve connection weights.  The similarity between two output units is determined by measuring the distance between the two points in this twelve-dimensional space.  This can be done by computing the Euclidean distance between the coordinates of the two points.
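As a rough illustration of this computation, the sketch below turns a 12 x 12 weight matrix into a 12 x 12 matrix of Euclidean distances between output units.  The weights here are random placeholders, not the trained network's values, and the layout (one row per output unit, one column per input unit) is an assumption.

```python
import numpy as np

# Placeholder weights for illustration only: NOT the trained network's values.
# Assumed layout: row i holds the twelve weights feeding output unit i.
rng = np.random.default_rng(0)
weights = rng.normal(size=(12, 12))


def pairwise_euclidean(points):
    """Return the matrix of Euclidean distances between all pairs of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


distances = pairwise_euclidean(weights)

# The result is symmetric with zeros on the diagonal; smaller entries mean
# that two output units have more similar patterns of connection weights.
print(distances.shape)  # (12, 12)
```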

A twelve-dimensional space is too large to understand; it would be better if a smaller space could be used to render the similarity relationships between the output units more easily understandable.  This is the purpose of MDS.

We provided the matrix of connection weights from the network to the statistical programming language R.  R converted this matrix into a matrix of distances between pairs of output units by computing Euclidean distances between twelve-dimensional coordinates.  R was then used to perform MDS on these distance data.  The best-fitting solution to these data requires a six- or seven-dimensional space.  However, the first three dimensions of the MDS solution (which capture more structure in the distances than do later dimensions) reveal some very interesting structure.
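The analysis itself was run in R, and the text above does not say which MDS routine was used.  For readers who prefer to see the idea in code, here is a minimal Python sketch of classical (Torgerson) MDS, which is one common variant and may differ from the procedure actually applied.  The weights are again random placeholders, and the function name is hypothetical.

```python
import numpy as np


def classical_mds(distances, n_dims):
    """Classical (Torgerson) MDS: embed points so that their Euclidean
    distances approximate the given distance matrix."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances to recover a Gram matrix.
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    top = np.clip(eigvals[:n_dims], 0.0, None)  # guard against tiny negatives
    coords = eigvecs[:, :n_dims] * np.sqrt(top)
    return coords, eigvals


# Placeholder weights for illustration only: NOT the trained network's values.
rng = np.random.default_rng(0)
weights = rng.normal(size=(12, 12))
diff = weights[:, None, :] - weights[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

coords_3d, eigvals = classical_mds(distances, n_dims=3)

# Share of structure captured by each dimension; earlier dimensions
# capture more than later ones.
positive = np.clip(eigvals, 0.0, None)
explained = positive / positive.sum()
print(coords_3d.shape)  # (12, 3)
```

Plotting the first two or three columns of the coordinates produces a map of the output units in which nearby points correspond to units with similar weights.  With random placeholder weights the map has no musical structure, of course; the interesting patterns only appear with the trained network's weights.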
 
Figure I-3.  The plot of the two-dimensional MDS analysis of the scale tonic perceptron weights.

Figure I-3 presents the plot of the two-dimensional MDS of the scale tonic perceptron’s weights.  One obvious property of this graph is that the output unit pitch-classes are clearly organized into two different groups, one to the left of the dashed line in the middle of the plot and the other to its right.  An examination of these two groups indicates a striking musical property: all of the pitch-classes on one side of the plot belong to one of the circles of major seconds in Figure I-2, while those on the other side of the plot belong to the other circle of major seconds.

A better fit to the data requires performing a higher-dimensional MDS analysis.  Figure I-4 presents the three-dimensional analysis.  Note that the first two dimensions of this solution (provided by the x and y axes) pull the pitch-classes apart in terms of membership in the two circles of major seconds.  The position of the pitch-classes in the third dimension arranges them into patterns that are very suggestive of the two circles in Figure I-2, though the correspondence between the two figures is not perfect.


Figure I-4.  The plot of the three-dimensional MDS analysis of the scale tonic perceptron weights.

In summary, the pattern of connection weights that emerges in the scale tonic perceptron contains some interesting musical relationships that cannot be understood simply by inspecting a table of connection weights.  There is systematic structure that suggests that output units that represent tonics that belong to the same circle of major seconds are more similar to one another than to an output unit that represents a tonic that belongs to the other circle.  Furthermore, the closer two tonics are to one another in the same circle of major seconds, the more similar they are to one another.  In short, multivariate analyses of the scale tonic perceptron reveal that it uses strange circles to organize musical inputs.
 

Sunday, March 22, 2015

Shallow Networks for Deeper Understanding?

As described in this previous post, the text below is a draft of one of several "interludes" to be included in a book that I am working on concerned with music and artificial neural networks.

In the first half of the 20th century, the notion of an artificial neural network composed of many different layers of processors was born (McCulloch & Pitts, 1943).  These networks were very powerful, but had to be hand wired because a learning rule capable of training them had not yet been invented.

The first learning rule for artificial neural networks was discovered around the time of the cognitive revolution (Rosenblatt, 1958, 1962).  However, this rule could not train networks that contained hidden units.  As a result, it could only train perceptrons, which are networks of limited capability (Minsky & Papert, 1969).

The rise of modern connectionism began with the discovery of supervised learning rules for networks with hidden units (Ackley, Hinton, & Sejnowski, 1985; Amari, 1967; Anderson, 1995; Rumelhart, Hinton, & Williams, 1986; Werbos, 1994).  Researchers could now teach networks that had enormous computational power (in principle).  Networks like the multilayer perceptron became the staple of connectionist cognitive science.

In the early decades of the 21st century, some researchers expressed concern with the limitations of the supervised training of multilayer perceptrons.  While such networks can learn to perform a variety of complicated tasks, researchers often encounter practical problems in their use.  Some have pointed out that the incredible power of the human brain arises from its use of many, many different layers of hidden neurons (Bengio, 2009).  However, when 20th century supervised learning rules are used, networks of many layers are enormously difficult to train.  The old approaches to network training face practical obstacles that prevent the in-principle power of multilayer networks from being exploited.

Modern researchers have discovered new types of learning rules that permit networks with many layers of hidden units to be trained (Bengio, Courville, & Vincent, 2013; Hinton, 2007; Hinton, Osindero, & Teh, 2006; Hinton & Salakhutdinov, 2006; Larochelle, Mandel, Pascanu, & Bengio, 2012).  These new rules, often called deep learning, now permit researchers to train deep belief networks to accomplish tasks far beyond the capabilities of shallow, late 20th century networks.  Deep learning has produced networks for tasks involving natural language, image classification, and the processing of sound (Hinton, 2007; Hinton et al., 2006; Mohamed, Dahl, & Hinton, 2012; Sarikaya, Hinton, & Deoras, 2014).  Daily news reports reveal that deep learning applications are being employed by various companies such as Google, Facebook and PayPal; deep learning rules are widely available (Fischer & Igel, 2014; Testolin, Stoianov, De Grazia, & Zorzi, 2013).

The networks studied in the current book are clearly antiquated in comparison to modern deep belief networks.  What is the point of using older, less powerful, networks to investigate music?

The primary motivation for exploring music with older architectures is the frequent disconnect between the technology of neural networks and the cognitive science of neural networks (Dawson & Shamanski, 1994).  The development of artificial neural networks occurs in many different disciplines, and these different disciplines often have different goals.  For instance, deep learning is emerging from computer science, and current research on it focuses on developing new procedures for accomplishing deep learning efficiently (Bengio, 2009).  In other words, deep learning is being developed from a technological perspective; its developers are concerned with successfully training networks to perform extremely complex pattern classification tasks.

The cognitive science of deep learning is lagging far behind its technology.  Some researchers have expressed concern that while deep learning produces networks that solve problems worthy of human neural processing, these networks are not themselves providing any insight about the workings of the human brain or the human mind.

One reason for this is that most deep learning advances are currently quantitative, not qualitative (Erhan, Courville, & Bengio, 2010).  Techniques for interpreting the internal structure of deep belief networks are in their infancy.  If a network cannot be interpreted, then it likely cannot contribute to cognitive science (McCloskey, 1991).  Without interpretation, deep belief networks are magnificent artifacts, but are neither cognitive nor biological theories.

Of course, this is not to say that researchers are not interested in interpreting the internal structure of deep belief networks (Erhan et al., 2010; Hinton et al., 2006).  For instance, in the very first publication describing a method for deep learning, Hinton et al. (2006) look into a network’s “mind” by observing responses of network processors to various stimuli in the hope of discovering the abstract features that are detected by hidden layers.  However, few sophisticated techniques for interpreting deep networks exist.  Erhan et al. (2010) observe that typically researchers only visually examine the receptive fields (i.e. the connection weights) that feed into processors in the first hidden layer of a deep belief network.

One reason to explore older architectures in the current book is that many more procedures exist for interpreting their internal structure.  This in turn makes them more likely to contribute to a cognitive science of music.

A second reason to focus on older artificial neural network architectures is the goal of seeking the simplest network that is required to solve a particular task.  For example, in the next chapter we will see that no hidden units are required at all to identify the tonic of a scale.  If such a simple network can accomplish this task, then why would we examine it with a deep belief network?  Indeed, though very old architectures like the perceptron are extraordinarily simple, they can easily be used to contribute to a variety of topics in modern cognitive science (Dawson, 2008; Dawson & Dupuis, 2012; Dawson, Dupuis, Spetch, & Kelly, 2009; Dawson, Kelly, Spetch, & Dupuis, 2010).

Of course, the proof of the pudding is in the eating.  Thus in order to defend the claim that older network architectures can contribute to musical cognition, we must actually demonstrate their utility.  The goal of the remaining chapters in this book is to do exactly that.  Can we show that training shallow networks can provide a deeper understanding of music?

 Cited Literature:
 
  • Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147-169.
  • Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, EC-16(3), 299-307.
  • Anderson, J. A. (1995). An Introduction to Neural Networks. Cambridge, Mass.: MIT Press.
  • Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1-127.
  • Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798-1828.
  • Dawson, M. R. W. (2008). Connectionism and classical conditioning. Comparative Cognition and Behavior Reviews, 3 (Monograph), 1-115.
  • Dawson, M. R. W., & Dupuis, B. (2012). Equilibria of perceptrons for simple contingency problems. IEEE Transactions On Neural Networks And Learning Systems, 23(8), 1340-1344.
  • Dawson, M. R. W., Dupuis, B., Spetch, M. L., & Kelly, D. M. (2009). Simple artificial networks that match probability and exploit and explore when confronting a multiarmed bandit. IEEE Transactions on Neural Networks, 20(8), 1368-1371.
  • Dawson, M. R. W., Kelly, D. M., Spetch, M. L., & Dupuis, B. (2010). Using perceptrons to explore the reorientation task. Cognition, 114(2), 207-226.
  • Dawson, M. R. W., & Shamanski, K. S. (1994). Connectionism, confusion and cognitive science. Journal of Intelligent Systems, 4, 215-262.
  • Erhan, D., Courville, A., & Bengio, Y. (2010). Understanding Representations Learned in Deep Architectures. Technical Report 1355: Département d’Informatique et Recherche Opérationnelle, Université de Montréal.
  • Fischer, A., & Igel, C. (2014). Training restricted Boltzmann machines: An introduction. Pattern Recognition, 47(1), 25-39.
  • Hinton, G. E. (2007). Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10), 428-434.
  • Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
  • Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
  • Larochelle, H., Mandel, M., Pascanu, R., & Bengio, Y. (2012). Learning algorithms for the classification restricted Boltzmann machine. Journal of Machine Learning Research, 13, 643-669.
  • McCloskey, M. (1991). Networks and theories: The place of connectionism in cognitive science. Psychological Science, 2, 387-395.
  • McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
  • Minsky, M. L., & Papert, S. (1969). Perceptrons: An Introduction To Computational Geometry (1st ed.). Cambridge, Mass.: MIT Press.
  • Mohamed, A., Dahl, G. E., & Hinton, G. E. (2012). Acoustic modeling using deep belief networks. IEEE Transactions on Audio Speech and Language Processing, 20(1), 14-22.
  • Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408.
  • Rosenblatt, F. (1962). Principles Of Neurodynamics. Washington: Spartan Books.
  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.
  • Sarikaya, R., Hinton, G. E., & Deoras, A. (2014). Application of deep belief networks for natural language understanding. IEEE/ACM Transactions on Audio Speech and Language Processing, 22(4), 778-784.
  • Testolin, A., Stoianov, I., De Grazia, M., & Zorzi, M. (2013). Deep unsupervised learning on a desktop PC: a primer for cognitive scientists. Frontiers in Psychology, 4.
  • Werbos, P. J. (1994). The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. New York: Wiley.