Languages and genes don’t always match — part 2

Feb 5, 2011 by

In the previous posting, I noted that a commonly held assumption doesn’t always hold: that if the languages of two groups are related, the peoples must be related as well. As you can see from the chart below, the genetic tree of populations (on the left) often does not match the linguistic tree (on the right). For example, while the inhabitants of northern and southern India are genetically close, they speak languages from two distinct language families: Indo-European and Dravidian, respectively. Conversely, Ethiopians and Berbers are not genetically close (in fact, they are separated by the highest-order split on the genetic side of the tree!), but they speak languages from the same family: Afro-Asiatic.

Two other particularly interesting mismatches between genetic and linguistic classification come from Africa as well. The first one involves the Pygmies (i.e., Mbuti Pygmy at the top of the chart above). Genetically, they are quite distinct from other African peoples; their distinctive physical appearance suggests as much (see picture below). In fact, the Pygmies represent the most ancient divergence after that of the Khoisan people. The overall gene pool of the Pygmies includes a very high frequency of Y-DNA haplogroup B (which is localized to sub-Saharan Africa, especially to the tropical forests of West-Central Africa, where the Pygmies live) and of the mtDNA haplogroup L1 (i.e., the oldest divergent lineage).

Yet, despite their physical and genetic distinctiveness, linguistically the Pygmies are part of the Bantu-speaking sub-Saharan Africa — just as the name Mbuti (with its prenasalized consonant) suggests. Most researchers agree that originally the Pygmies must have spoken a different language, by now completely lost; that’s why the chart above states “unknown” for their linguistic affiliation. Thus, the Pygmies are a good reminder to us all that a group’s linguistic affiliation says little about its genetic origin.

Another example of a mismatch between DNA and language involves the Hadza and the Sandawe, two Khoisan-speaking groups living in northern Tanzania, in an area where mostly Bantu languages are spoken (marked with orange on the map below). Their closest linguistic relatives, other Khoisan groups, live thousands of miles to the south.

Why did the Hadza and Sandawe end up living so far away from their linguistic brethren? And what about their genetics? Two explanations are possible for the geographic discontinuity between the Hadza and Sandawe, on the one hand, and other Khoisan speakers, on the other. The first theory takes Hadza and Sandawe speakers to be originally Bantu speakers who switched to Khoisan languages. The second holds that the small pockets of Hadza and Sandawe are remnants of an earlier, pre-Bantu-migration Khoisan population in East Africa, now surrounded by Bantu speakers as a result of the Bantu expansion.

At first glance, genetic studies seem to support the first theory: the Hadza have been shown to be genetically closer to the Pygmies of Central Africa, and the Sandawe closer to the Bantu, than either group is to Khoisan speakers. Curiously, the Sandawe are not related to the Hadza, despite their geographical proximity. But the first theory leaves unexplained why these two groups would switch to a Khoisan language if the closest Khoisan speakers are so far from them.

So at present, it is the second theory that has the most proponents. According to this theory, the Hadza and Sandawe are originally Khoisan and have maintained their languages, but their genes have changed over the centuries. How could their genes have changed? Well, note that the Hadza and Sandawe populations are relatively small: today, there are only 800 Hadza and 40,000 Sandawe. Thus, geneticists hypothesize that there has been a great deal of intermarriage of the Hadza and Sandawe with Bantu groups, which washed out their distinctive Khoisan gene pool. However, being hunter-gatherers, the Hadza and Sandawe were separated from Bantu farmers by socioeconomic factors, so they managed to preserve their languages but not to prevent genetic exchange.

The take-home message: while language may tell us who does or doesn’t belong to “our tribe”, large-scale linguistic groupings often conceal rather than reveal the genetic groupings of various populations.



  • Martin W. Lewis

    But isn't there some doubt that Hadza and Sandawe really are Khoisan languages, or, alternatively, isn't there some doubt that Khoisan is a valid language family? Some have suggested that there is little beyond click sounds to tie Hadza and Sandawe to Khoisan, and clicks may have been a common feature of many early languages. Could Hadza be a survivor of what was once a large central African language family that included the original languages of the "Pygmies?"

  • Asya Pereltsvaig

    Martin, you are right: as with many other proposed language families whose past is not (well) known, there are doubts about the validity of the Khoisan family as a whole and about the attribution of Hadza and Sandawe to this family in particular. Since we don't know much about the past of these languages, we have to make intelligent guesses, and there are concerns about how "intelligent" some of these guesses have been. The Khoisan proposal goes back to the work of Joseph Greenberg, and it wouldn't be the first of his proposals to be discredited. Some people now doubt that Khoisan forms one family related through descent, while others still see it as a valid family. When it comes to Hadza and Sandawe, the Khoisan theory is even more questionable since it doesn't receive direct support from genetics. There are some similarities between the Hadza and Sandawe languages, but no genetic relationship between the two peoples. The Hadza may indeed be a fragment of an old Central African population with a distinct genetic profile and a distinct language family which is by now lost, except for Hadza. Their genetic similarity to the Pygmies suggests as much. The Sandawe language, on the other hand, is more closely related to the Khoi grouping within Khoisan (if the latter is a family at all). Yet the genes of the Sandawe are very much Bantu-like.

    As for click sounds, they are not so much a typical feature of an "old language" as an areal feature: several Bantu languages have clicks too, including Zulu, Xhosa, etc.

  • aron

    A new population genetic study found a genetic link between the Hadza/Sandawe and Khoisan speakers of Southern Africa.

    http://www.pnas.org/content/early/2011/03/01/1017511108.full.pdf+html

    The Sandawe aren't particularly Bantu-like at all as you suggested, they are rather distinct from them.

    See the PCA plots on page 11, 12 and 13 of the supplementary files:

    http://www.pnas.org/content/suppl/2011/03/01/1017511108.DCSupplemental/sapp.pdf

  • aron

    The reason Ethiopians and Berbers are separated by the highest-order split on that particular phylogenetic tree is the Eurasian-African split.

    Ethiopians are actually genetically intermediate between Sub-Saharan Africans and Eurasians, while Berbers are majority Eurasian (of West Asian origin in particular) with only minor Sub-Saharan African ancestry.

    That phylotree does not show genetic relationships all too well.

    Ethiopians and Berbers both share many haplogroups, in particular haplogroup E1b1b:

    http://www.thegeneticatlas.com/E1b1b1.png

    And maternal haplogroup M1

    http://img852.imageshack.us/img852/1271/haplogroupm1.jpg

    Paternal haplogroup J1 and maternal haplogroups R0a, U6a are also common both in Berbers and Ethiopians.

    On global genetic autosomal (data from 22 chromosomes) PCA plots, Ethiopians and Berbers cluster relatively close:

    http://3.bp.blogspot.com/_Ish7688voT0/TPZ9Kigo5ZI/AAAAAAAAC90/y_TOcj02A4w/s1600/1_2.png

    There’s actually a pretty strong genetic link between Ethiopians and Berbers.

  • Asya Pereltsvaig

    Thank you for the helpful comments, Aron!

  • Kevin Borland

    While my mini-article is about the Maasai people, I think some of the paragraphs toward the bottom give a plausible explanation regarding the origins of the Sandawe and Hadza.

    http://www.flickr.com/photos/kevinborland/2283318590/in/photostream

  • Asya Pereltsvaig

    @Kevin Borland: Thank you for your comment and the link. You provide very interesting genetic data in your article. The problem is that the Hadza and Sandawe languages have been shown to be related to (other) Khoisan languages on more grounds than just having clicks. It is quite likely that the Pygmies spoke a different type of language altogether (none of them do anymore, though, as far as I know). So overall, the Hadza-Sandawe linguistic problem remains unsolved even if the genetic picture becomes clearer…
