Cross-cultural analysis of song
It is unclear whether there are universal patterns to music across cultures. Mehr et al. examined ethnographic data and observed music in every society sampled (see the Perspective by Fitch and Popescu). For songs specifically, three dimensions characterize more than 25% of the performances studied: formality of the performance, arousal level, and religiosity. There is more variation in musical behavior within societies than between societies, and societies show similar levels of within-society variation in musical behavior. At the same time, one-third of societies significantly differ from average for any given dimension, and half of all societies differ from average on at least one dimension, indicating variability across cultures.
Music is often assumed to be a human universal, emerging from an evolutionary adaptation specific to music and/or a by-product of adaptations for affect, language, motor control, and auditory perception. But universality has never actually been systematically demonstrated, and it is challenged by the vast diversity of music across cultures. Hypotheses of the evolutionary function of music are also untestable without comprehensive and representative data on its forms and behavioral contexts across societies.
We conducted a natural history of song: a systematic analysis of the features of vocal music found worldwide. It consists of a corpus of ethnographic text on musical behavior from a representative sample of mostly small-scale societies, and a discography of audio recordings of the music itself. We then applied tools of computational social science, which minimize the influence of sampling error and other biases, to answer six questions. Does music appear universally? What kinds of behavior are associated with song, and how do they vary among societies? Are the musical features of a song indicative of its behavioral context (e.g., infant care)? Do the melodic and rhythmic patterns of songs vary systematically, like those patterns found in language? And how prevalent is tonality across musical idioms?
Analysis of the ethnography corpus shows that music appears in every society observed; that variation in song events is well characterized by three dimensions (formality, arousal, religiosity); that musical behavior varies more within societies than across them on these dimensions; and that music is regularly associated with behavioral contexts such as infant care, healing, dance, and love. Analysis of the discography corpus shows that identifiable acoustic features of songs (accent, tempo, pitch range, etc.) predict their primary behavioral context (love, healing, etc.); that musical forms vary along two dimensions (melodic and rhythmic complexity); that melodic and rhythmic bigrams fall into power-law distributions; and that tonality is widespread, perhaps universal.
Music is in fact universal: It exists in every society (both with and without words), varies more within than between societies, regularly supports certain types of behavior, and has acoustic features that are systematically related to the goals and responses of singers and listeners. But music is not a fixed biological response with a single prototypical adaptive function: It is produced worldwide in diverse behavioral contexts that vary in formality, arousal, and religiosity. Music does appear to be tied to specific perceptual, cognitive, and affective faculties, including language (all societies put words to their songs), motor control (people in all societies dance), auditory analysis (all musical systems have signatures of tonality), and aesthetics (their melodies and rhythms are balanced between monotony and chaos). These analyses show how applying the tools of computational social science to rich bodies of humanistic data can reveal both universal features and patterns of variability in culture, addressing long-standing debates about each.
The illustration depicts the sequence from acts of singing to the ethnography corpus. (A) People produce songs in conjunction with other behavior, which scholars observe and describe in text. These ethnographies are published in books, reports, and journal articles and then compiled, translated, cataloged, and digitized by the Human Relations Area Files organization. (B) We conduct searches of the online eHRAF corpus for all descriptions of songs in the 60 societies of the Probability Sample File and annotate them with a variety of behavioral features. The raw text, annotations, and metadata together form the NHS Ethnography. Codebooks listing all available data are in tables S1 to S6; a listing of societies and locations from which texts were gathered is in table S12.
The NHS Ethnography, it turns out, includes examples of songs in all 60 societies. Moreover, each society has songs with words, as opposed to just humming or nonsense syllables (which are reported in 22 societies). Because the societies were sampled independently of whether their people were known to produce music, in contrast to prior cross-cultural studies (10, 53, 54), the presence of music in each one—as recognized by the anthropologists who embedded themselves in the society and wrote their authoritative ethnographies—constitutes the clearest evidence supporting the claim that song is a human universal. Readers interested in the nature of the ethnographers’ reports, which bear on what constitutes “music” in each society [see (27)], are encouraged to consult the interactive NHS Ethnography Explorer at http://themusiclab.org/nhsplots.
Musical behavior worldwide varies along three dimensions
How do we reconcile the discovery that song is universal with the research from ethnomusicology showing radical variability? We propose that the music of a society is not a fixed inventory of cultural behaviors, but rather the product of underlying psychological faculties that make certain kinds of sound feel appropriate to certain social and emotional circumstances. These include entraining the body to acoustic and motoric rhythms, analyzing harmonically complex sounds, segregating and grouping sounds into perceptual streams (6, 7), parsing the prosody of speech, responding to emotional calls, and detecting ecologically salient sounds (8, 9). These faculties may interact with others that specifically evolved for music (4, 5). Musical idioms differ with respect to which acoustic features they use and which emotions they engage, but they all draw from a common suite of psychological responses to sound.
If so, what should be universal about music is not specific melodies or rhythms but clusters of correlated behaviors, such as slow soothing lullabies sung by a mother to a child or lively rhythmic songs sung in public by a group of dancers. We thus asked how musical behavior varies worldwide and how the variation within societies compares to the variation between them.
Reducing the dimensionality of variation in musical behavior
To determine whether the wide variation in the annotations of the behavioral context of songs in the database (Text S1.1) falls along a smaller number of dimensions capturing the principal ways that musical behavior varies worldwide, we used an extension of Bayesian principal components analysis (84). In addition to reducing dimensionality, this method handles missing data in a principled way and provides a credible interval for each observation’s coordinates in the resulting space. Each observation is a “song event,” namely, a description in the NHS Ethnography of a song performance, a characterization of how a society uses songs, or both.
We found that three latent dimensions is the optimum number, explaining 26.6% of variability in NHS Ethnography annotations. Figure 2 depicts the space and highlights examples from excerpts in the corpus; an interactive version is available at http://themusiclab.org/nhsplots. (See Text S2.1 for details of the model, including the dimension selection procedure, model diagnostics, a test of robustness, and tests of the potential influence of ethnographer characteristics on model results.) To interpret the space, we examined annotations that load highly on each dimension; to validate this interpretation, we searched for examples at extreme locations and examined their content. Loadings are presented in tables S13 to S15; a selection of extreme examples is given in table S16.
Density plots for each society show the distributions of musical performances on each of the three principal components (Formality, Arousal, Religiosity). Distributions are based on posterior samples aggregated from corresponding ethnographic observations. Societies are ordered by the number of available documents in the NHS Ethnography (the number of documents per society is displayed in parentheses). Distributions are color-coded according to their mean distance from the global mean (in z-scores; redder distributions are farther from 0). Although some societies’ means differ significantly from the global mean, the mean of each society’s distribution is within 1.96 standard deviations of the global mean of 0. One society (Tzeltal) is not plotted because it has insufficient observations for a density plot. Asterisks denote society-level mean differences from the global mean. *P < 0.05, **P < 0.01, ***P < 0.001.
We also applied a comparison that is common in studies of genetic diversity (85) and that has been performed in a recent cultural-phylogenetic study of music (86). It revealed that typical within-society variation is approximately six times the between-society variation. Specifically, the ratios of within- to between-society variances were 5.58 for Formality [95% Bayesian credible interval, (4.11, 6.95)]; 6.39 (4.72, 8.34) for Arousal; and 6.21 (4.47, 7.94) for Religiosity. Moreover, none of the 180 mean values for the 60 societies over the three dimensions deviated from the global mean by more than 1.96 times the standard deviation of the principal components scores within that society (fig. S3 and Text S2.1.9).
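The within- versus between-society comparison can be illustrated with a minimal sketch. It assumes plain descriptive variances on one dimension, whereas the estimates reported here come from posterior samples of the Bayesian model; the society labels, scores, and function names are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def variance_ratio(scores_by_society):
    """Mean within-society variance divided by the variance of society means."""
    within = mean([pvariance(s) for s in scores_by_society.values()])
    society_means = [mean(s) for s in scores_by_society.values()]
    between = pvariance(society_means)
    return within / between


def mean_within_band(scores, global_mean=0.0, k=1.96):
    """True if a society's mean lies within k of its own SDs of the global mean."""
    return abs(mean(scores) - global_mean) <= k * pstdev(scores)


# Hypothetical song-event scores on one dimension for three societies.
toy_scores = {
    "A": [-2.0, 0.0, 2.0],
    "B": [-1.0, 1.0, 3.0],
    "C": [-3.0, -1.0, 1.0],
}

ratio = variance_ratio(toy_scores)
all_within = all(mean_within_band(s) for s in toy_scores.values())
```

A ratio above 1 means that song events differ more within a typical society than society averages differ from one another.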
These findings demonstrate global regularities in musical behavior, but they also reveal that behaviors vary quantitatively across societies, consistent with the long-standing conclusions of ethnomusicologists. For instance, the Kanuri’s musical behaviors are estimated to be less formal than those of any other society, whereas those of the Akan are estimated to be the most religious (in both cases, significantly different from the global mean on average). Some ethnomusicologists have attempted to explain such diversity, noting, for example, that more formal song performances tend to be found in more socially rigid societies (10).
Despite this variation, a song event of average formality would appear unremarkable in the Kanuri’s distribution of songs, as would a song event of average religiosity in the Akan. Overall, we find that for each dimension, approximately one-third of all societies’ means significantly differed from the global mean, and approximately half differed from the global mean on at least one dimension (Fig. 3). But despite variability in the societies’ means on each dimension, their distributions overlap substantially with one another and with the global mean. Moreover, even the outliers in Fig. 3 appear to represent not genuine idiosyncrasy in some cultures but sampling error: The societies that differ more from the global mean on some dimension are those with sparser documentation in the ethnographic record (fig. S6 and Text S2.1.10). To ensure that these results are not artifacts of the statistical techniques used, we applied them to a structurally analogous dataset whose latent dimensions are expected to vary across countries, namely climate features (for instance, temperature is related to elevation, which certainly is not universal); the results were entirely different from what we found when analyzing the NHS Ethnography (figs. S7 and S8 and Text S2.1.11).
The results suggest that societies’ musical behaviors are largely similar to one another, such that the variability within a society exceeds the variability between them (all societies have more soothing songs, such as lullabies; more rousing songs, such as dance tunes; more stirring songs, such as prayers; and other recognizable kinds of musical performance), and that the appearance of uniqueness in the ethnographic record may reflect underreporting.
Associations between song and behavior, corrected for bias
Ethnographic descriptions of behavior are subject to several forms of selective nonreporting: Ethnographers may omit certain kinds of information because of their academic interests (e.g., the author focuses on farming and not shamanism), implicit or explicit biases (e.g., the author reports less information about the elderly), lack of knowledge (e.g., the author is unaware of food taboos), or inaccessibility (e.g., the author wants to report on infant care but is not granted access to infants). We cannot distinguish among these causes, but we can discern patterns of omission in the NHS Ethnography. For example, we found that when the singer’s age is reported, the singer is likely to be young, but when the singer’s age is not reported, other annotations (such as the fact that a song is ceremonial) tend to indicate that the singer is old. Such correlations, between the absence of certain values of one variable and the reporting of particular values of others, were aggregated into a model of missingness (Text S2.1.12) that forms part of the Bayesian principal components analysis reported above. This allowed us to assess variation in musical behavior worldwide while accounting for reporting biases.
Next, to test hypotheses about the contexts with which music is strongly associated worldwide in a similarly robust fashion, we compared the frequency with which a particular behavior appears in text describing song with the estimated frequency with which it appears in all the text written by that ethnographer about that society. The latter can be treated as the null distribution for that behavior. If a behavior is systematically associated with song, then its frequency in ethnographic descriptions of songs should exceed its frequency in that null distribution, which we estimated by randomly drawing the same number of passages from the same documents [see Text S2.2 for full model details].
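The logic of this comparison can be sketched as a simple resampling test. This is a simplified illustration, not the full model of Text S2.2: it assumes each passage is reduced to a 0/1 flag for whether it mentions the behavior, and the function name, toy counts, and number of draws are all hypothetical.

```python
import random


def association_p_value(song_flags, all_flags, n_draws=2000, seed=0):
    """Share of random draws with at least as many mentions as the song passages.

    song_flags: 0/1 flags for passages that describe song.
    all_flags:  0/1 flags for every passage in the same documents.
    """
    rng = random.Random(seed)
    observed = sum(song_flags)
    k = len(song_flags)
    hits = 0
    for _ in range(n_draws):
        # Draw the same number of passages from the same documents.
        if sum(rng.sample(all_flags, k)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_draws + 1)


# Hypothetical corpus: the behavior appears in 10% of all passages.
all_passages = [1] * 20 + [0] * 180
# It appears in 40% of song passages (associated) or 10% (not associated).
p_associated = association_p_value([1] * 8 + [0] * 12, all_passages)
p_unassociated = association_p_value([1] * 2 + [0] * 18, all_passages)
```

A small value indicates that the behavior is mentioned in song passages more often than would be expected from the ethnographer's writing as a whole.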
We generated a list of 20 hypotheses about universal or widespread contexts for music (Table 1) from published work in anthropology, ethnomusicology, and cognitive science (4, 5, 40, 54, 58–60), together with a survey of nearly 1000 scholars that solicited opinions about which behaviors might be universally linked to music (Text S1.4.1). We then designed two sets of criteria for determining whether a given passage of ethnography represented a given behavior in this list. The first used human-annotated identifiers, capitalizing on the fact that every paragraph in the Probability Sample File comes tagged with one of more than 750 identifiers from the Outline of Cultural Materials (OCM), such as MOURNING, INFANT CARE, or WARFARE.
The second set of criteria was needed because some hypotheses corresponded only loosely to the OCM identifiers (e.g., “love songs” is only a partial fit to ARRANGING A MARRIAGE and not an exact fit to any other identifier), and still others fit no identifier at all [e.g., “music perceived as art or as a creation” (59)]. So we designed a method that examined the text directly. Starting with a small set of seed words associated with each hypothesis (e.g., “religious,” “spiritual,” and “ritual” for the hypothesis that music is associated with religious activity), we used the WordNet lexical database (87) to automatically generate lists of conceptually related terms (e.g., “rite” and “sacred”). We manually filtered the lists to remove irrelevant words and homonyms and add relevant keywords that may have been missed, then conducted word stemming to fill out plurals and other grammatical variants (full lists are in table S19). Each method has limitations: Automated dictionary methods can erroneously flag a passage containing a word that is ambiguous, whereas the human-coded OCM identifiers may miss a relevant passage, misinterpret the original ethnography, or paint with too broad a brush, applying a tag to a whole paragraph or to several pages of text. Where the two methods converge, support for a hypothesis is particularly convincing.
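The dictionary method can be illustrated with a minimal sketch. It assumes the keyword list has already been expanded and filtered, so the WordNet step is not shown, and it substitutes a crude suffix rule for a real stemmer; the function names and example passages are hypothetical.

```python
import re


def crude_stem(word):
    """Strip simple plural endings; a stand-in for a real stemming algorithm."""
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def matches_hypothesis(passage, keywords):
    """True if any word in the passage shares a stem with a keyword."""
    stems = {crude_stem(k) for k in keywords}
    tokens = re.findall(r"[a-z]+", passage.lower())
    return any(crude_stem(t) in stems for t in tokens)


# Seed words and related terms quoted in the text for the religion hypothesis.
religion_terms = ["religious", "spiritual", "ritual", "rite", "sacred"]

flagged = matches_hypothesis("The elders sang during the funeral rites.", religion_terms)
not_flagged = matches_hypothesis("Children sang while herding goats.", religion_terms)
```

Because matching is purely lexical, a sketch like this shares the limitation noted above: an ambiguous word can flag a passage that is unrelated to the hypothesis.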
After controlling for ethnographer bias via the method described above, and adjusting the P values for multiple hypotheses (88), we found support from both methods for 14 of the 20 hypothesized associations between music and a behavioral context, and support from one method for the remaining six (Table 1). To verify that these analyses specifically confirmed the hypotheses, as opposed to being an artifact of some other nonrandom patterning in this dataset, we reran them on a set of additional OCM identifiers matched in frequency to the ones used above [see Text S2.2.2 for a description of the selection procedure]. They covered a broad swath of topics, including DOMESTICATED ANIMALS, POLYGAMY, and LEGAL NORMS, that were not hypothesized to be related to song (the full list is in table S20). We found that only one appeared more frequently in song-related paragraphs than in the simulated null distribution (CEREAL AGRICULTURE; see table S20 for full results). This contrasts sharply with the associations reported in Table 1, suggesting that they represent bona fide regularities in the behavioral contexts of music.
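Adjusting P values for multiple hypotheses can be sketched as follows. The text does not name the procedure here, so this example assumes the Benjamini-Hochberg false discovery rate adjustment, one common choice; the function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four hypotheses.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
supported = [p < 0.05 for p in adj]
```

In this toy case only the first hypothesis stays below 0.05 after adjustment, even though three of the four raw P values were below that threshold.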
Universality of musical forms
We now turn to the NHS Discography to examine the musical content of songs in four behavioral contexts (dance, lullaby, healing, and love; Fig. 4A), selected because each appears in the NHS Ethnography, is widespread in traditional cultures (59), and exhibits shared features across societies (54). Using predetermined criteria based on liner notes and supporting ethnographic text (table S21), and seeking recordings of each type from each of the 30 geographic regions, we found 118 songs of the 120 possibilities (4 contexts × 30 regions) from 86 societies (Fig. 4B). This coverage underscores the universality of these four types; indeed, in the two possibilities we failed to find (healing songs from Scandinavia and from the British Isles), documentary evidence shows that both existed (89, 90) despite our failure to find audio recordings of the practice. The recordings may be unavailable because healing songs were rare by the early 1900s, roughly when portable field recording became feasible.
(A) In a massive online experiment (N = 29,357), listeners categorized dance songs, lullabies, healing songs, and love songs at rates higher than chance level of 25%, but their responses to love songs were by far the most ambiguous (the heat map shows average percent correct, color-coded from lowest magnitude, in blue, to highest magnitude, in red). Note that the marginals (below the heat map) are not evenly distributed across behavioral contexts: Listeners guessed “healing” most often and “love” least often despite the equal number of each in the materials. The d-prime scores estimate listeners’ sensitivity to the song-type signal independent of this response bias. (B) Categorical classification of the behavioral contexts of songs, using each of the four representations in the NHS Discography, is substantially above the chance performance level of 25% (dotted red line) and is indistinguishable from the performance of human listeners, 42.4% (dotted blue line). The classifier that combines expert annotations with transcription features (the two representations that best ignore background sounds and other context) performs at 50.8% correct, above the level of human listeners. (C) Binary classifiers that use the expert annotation + transcription feature representations to distinguish pairs of behavioral contexts [e.g., dance from love songs, as opposed to the four-way classification in (B)] perform above the chance level of 50% (dotted red line). Error bars represent 95% confidence intervals from corrected resampled t tests (94).
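The role of d-prime in panel (A) can be illustrated with a minimal sketch. It assumes the standard equal-variance signal detection formula, with a hit defined as answering a given song type when the song is of that type and a false alarm as giving that answer for any other type. The exact computation used in the study is not described here, and the rates below are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) minus z(false alarm rate).

    Rates must lie strictly between 0 and 1, because inv_cdf is
    undefined at the endpoints.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical listeners with the same hit rate but different response bias.
cautious = d_prime(0.60, 0.20)   # rarely gives this answer to other songs
liberal = d_prime(0.60, 0.40)    # often gives this answer to other songs
guessing = d_prime(0.50, 0.50)   # no sensitivity at all
```

Two listeners with the same percent correct for a song type can therefore differ in sensitivity if one of them simply chooses that answer more often.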
Are accurate identifications of the contexts of culturally unfamiliar songs restricted to listeners with musical training or exposure to world music? In a regression analysis, we found that participants’ categorization accuracy was statistically related to their self-reported musical skill [F(4,16245) = 2.57, P = 0.036] and their familiarity with world music [F(3,16167) = 36.9, P < 0.001; statistics from linear probability models], but with small effect sizes: The largest difference was a 4.7–percentage point advantage for participants who reported that they were “somewhat familiar with traditional music” relative to those who reported that they had never heard it, and a 1.3–percentage point advantage for participants who reported that they have “a lot of skill” relative to “no skill at all.” Moreover, when limiting the dataset to listeners with “no skill at all” or listeners who had “never heard traditional music,” mean accuracy was almost identical to the overall cohort. These findings suggest that although musical experience enhances the ability to detect the behavioral contexts of songs from unfamiliar cultures, it is not necessary.