Enzymes and Protein Diversity in Human Populations

2.1 Introduction
2.2 Genes and Isozymes
2.3 Genetic Variation of Red Cell (Erythrocyte) Enzymes and Serum Proteins
2.4 Haemoglobins: Normal and Abnormal Haemoglobins
2.5 Structural Variation in Haemoglobin (Haemoglobin Variants)
2.6 Quantitative Variation in Haemoglobin (Thalassaemias)
2.7 Geographic Distribution of HB*S, HB*D and HB*E in Indian Populations

2.1 INTRODUCTION

Human beings are exceedingly diverse. They differ from one another in their normal physical, physiological and behavioural attributes. These variations are caused partly by differences in the environmental conditions in which they live but, more importantly, they also depend on inborn (genetic) differences. Human variation can be visible (e.g., differences in skin colour, hair colour and form, or head shape) or invisible (biochemical differences, e.g., blood groups, blood protein/red cell enzyme polymorphisms or DNA markers). At the beginning of the 20th century, Landsteiner discovered the ABO blood groups, and Hirschfeld and Hirschfeld (1919) suggested that these could be used to delineate biochemical races. Polymorphism of the blood protein haemoglobin, including the gene for sickle cell anaemia, was reported by the middle of the century, followed by the serum protein haptoglobin (HP) polymorphism in the mid-1950s; by the mid-1970s most red cell enzyme and blood protein polymorphisms had been discovered. Anthropologists studied these genetic markers with the primary aim of documenting genetic differences among populations inhabiting different parts of the world, and also for human racial classification. Thus the existence of Mendelian genetic traits, demonstrated from human blood in the 20th century, provided powerful tools for investigating biological variation in humans, alongside traditional anthropometric/morphological and dermatoglyphic traits, among others.

2.2 GENES AND ISOZYMES

A gene can be defined as a unit of information and corresponds to a discrete segment of DNA that encodes the amino acid sequence of a polypeptide; within a gene, the coding segments (exons) are interrupted by noncoding segments (introns). Human cells contain about 25,000 genes, which are dispersed on chromosomes and separated by noncoding intergenic DNA. Isozymes result from point mutations or from insertion-deletion (indel) events that affect the DNA coding sequence of the gene.

The term isozyme was coined by Hunter and Markert (1957), who defined isozymes as different variants of the same enzyme having identical functions and present in the same individual. More generally, isozymes are different molecular forms of an enzyme that catalyse the same biochemical reaction but differ in their electrophoretic properties. The existence of isozymes permits the fine-tuning of metabolism to meet the particular needs of a given tissue or developmental stage.

2.3 GENETIC VARIATION OF RED CELL (ERYTHROCYTE) ENZYMES AND SERUM PROTEINS

A variety of different enzymes and proteins are synthesized in the human body, and the primary amino acid sequence of each of their distinctive polypeptide chains is coded in the DNA of a separate gene locus. Blood proteins, including red cell (erythrocyte) enzymes, are composed of amino acids joined by covalent peptide bonds to form polypeptides. The sequences or “primary structures” are genetically determined. Each of the 20 amino acids has a unique side chain, characterized by its shape, size and charge. The side chains of lysine, arginine and histidine are positively charged (NH3+) and thus basic; those of aspartic acid and glutamic acid are negatively charged (COO-) and thus acidic.

The charged side chains are responsible for the movement of proteins or enzymes through a gel matrix during electrophoresis. In an electric field, the anions (i.e. negatively charged molecules) migrate towards the anode (i.e. positive electrode), and the cations (i.e. positively charged molecules) migrate towards the cathode (i.e. negative electrode). The speed of migration depends primarily on the net charge of the protein or enzyme molecule and on the strength of the electric field applied through the electrodes. Thus proteins/enzymes, the main products of genes, move in an electric field with a mobility that depends on their chemical structure. The amino acid sequences of proteins are changed by mutations in the encoding DNA locus. Such mutations may alter shape and net charge, and electrophoresis aims to reveal as many of these changes as possible.
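The link between charged side chains and the direction of migration can be illustrated with a rough sketch. It assumes a simple integer charge model (+1 for lysine, arginine and histidine; -1 for aspartic and glutamic acid) and ignores buffer pH, terminal groups and partial ionisation. The peptides and function names are hypothetical and serve only as illustration.

```python
# Simplified charge model based on the text: basic side chains count +1,
# acidic side chains count -1. Real net charge depends on buffer pH and
# pKa values, which this sketch deliberately ignores.
BASIC = set("KRH")   # lysine, arginine, histidine (one-letter codes)
ACIDIC = set("DE")   # aspartic acid, glutamic acid


def net_charge(sequence: str) -> int:
    """Return the approximate net charge of a one-letter amino acid sequence."""
    seq = sequence.upper()
    positive = sum(1 for aa in seq if aa in BASIC)
    negative = sum(1 for aa in seq if aa in ACIDIC)
    return positive - negative


def migration_direction(sequence: str) -> str:
    """Anions migrate towards the anode, cations towards the cathode."""
    charge = net_charge(sequence)
    if charge < 0:
        return "anode"
    if charge > 0:
        return "cathode"
    return "no net migration"


# Hypothetical peptides, chosen only for illustration.
print(net_charge("AEEDK"), migration_direction("AEEDK"))
print(net_charge("AKKRG"), migration_direction("AKKRG"))
```

In this model a mutation that swaps an acidic residue for a neutral or basic one changes the net charge, which is the kind of difference electrophoresis can reveal.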

The electrophoretic enzyme pattern in homozygous individuals usually shows one major zone of enzyme activity, which may be accompanied by minor zones of secondary isozymes (Shaw, 1969). The pattern in heterozygous individuals may, however, show different degrees of complexity, i.e. it shows one or more components which are not present in either homozygote. These extra enzyme components found only in heterozygotes are referred to as “hybrid enzymes” and show an electrophoretic mobility intermediate between those of the parental homozygote enzymes. In a heterozygote, a dimeric enzyme gives a three-banded pattern, a trimeric enzyme a four-banded pattern and a tetrameric enzyme a five-banded pattern. In each case the outermost bands correspond to the two parental homozygotes, and the band(s) with intermediate mobilities represent the “hybrid enzyme(s)”.
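The band counts above follow a simple rule: an enzyme built from n subunits can contain 0 to n copies of one allelic subunit, giving n + 1 bands in a heterozygote. The sketch below encodes that rule. The relative intensities it prints are an added assumption (equal production of both subunit types and random association) and are not stated in the text; the function names are illustrative.

```python
from math import comb


def heterozygote_bands(subunits: int) -> int:
    """Number of bands expected in a heterozygote for an enzyme of n subunits."""
    return subunits + 1


def band_ratios(subunits: int) -> list:
    """Relative band intensities, ASSUMING both allelic subunits are produced
    in equal amounts and associate at random (binomial coefficients)."""
    return [comb(subunits, k) for k in range(subunits + 1)]


for name, n in [("dimer", 2), ("trimer", 3), ("tetramer", 4)]:
    print(name, heterozygote_bands(n), band_ratios(n))
```

The first and last entries of each ratio list correspond to the two parental homozygote bands; the entries in between are the hybrid enzymes.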

Electrophoretic investigations of a variety of different enzymes and proteins have led to the discovery of a large number of structurally variant forms which are genetically controlled. The inherited variants of enzymes/proteins which we find in human populations today must be attributed to specific gene mutations which occurred in single individuals among our ancestors in earlier generations. Thus it appears that at such loci a large number of different mutant alleles, each generated by a separate mutation, actually exist among living members of our species. The incidence of the majority of such mutant alleles is rather low in human populations. Occasionally, some occur with a polymorphic frequency (≥1%), giving rise to the well-known phenomenon of genetic polymorphism, in which the individual members of a population are sharply classified into two or more relatively common genetically determined phenotypes due to the occurrence together of two or more alleles at a particular locus. The first example of an electrophoretically detectable biochemical polymorphism in humans was that of the well-known blood protein haemoglobin (HB) (Pauling et al., 1949).

In man, both proteins (present in serum portion of the blood) and enzymes (present within the red blood cells/erythrocytes) have been screened for polymorphisms. Generally, the discovery of polymorphisms of serum protein preceded that of red cell enzymes in man. The first to be discovered was haptoglobin (HP) reported by Smithies (1955), followed by Transferrin (TF) (Poulik and Smithies, 1958), group specific component (GC) (Hirschfeld, 1959), complement component 3 (C3) (Wieme and Demeulenaere, 1967) and properdin factor B (BF) (Alper et al., 1971), among others.

Using the technique of starch gel electrophoresis pioneered by Smithies (1955) and very sensitive biochemical staining reactions (Hunter and Markert, 1957), polymorphisms of red cell enzymes began to be reported from the early 1960s. The first example was that of acid phosphatase locus 1 (ACP1) discovered by Hopkinson et al. (1963), followed by phosphoglucomutase locus 1 (PGM1) (Spencer et al., 1964), adenylate kinase locus 1 (AK1) (Fildes and Harris, 1966), adenosine deaminase (ADA) (Spencer et al., 1968), phosphohexose isomerase (PHI) (Detter et al., 1968), esterase D (ESD) (Hopkinson et al., 1973) and glyoxalase locus 1 (GLO1) (Kompf et al., 1975), among others. Harris et al. (1977) observed that of the 104 different loci coding for enzymes that had been investigated, 33 were found to exhibit an electrophoretically detectable polymorphism. In other words, roughly one out of every three human loci coding for enzymes is polymorphic. It is important to mention that although an electrophoretic difference indicates a difference in structure between the polymorphic forms of an enzyme/protein, it does not in general provide information about any possible functional difference.

Like the different blood groups, which are serological markers, the various red cell enzyme and serum protein polymorphisms are biochemical markers, and along with the former they help in characterizing human populations genetically. A large body of data on serum protein and red cell enzyme polymorphisms has been generated for world populations, and these data have been compiled into authoritative accounts of their distribution (Mourant et al., 1976). For the people of India, such an exercise was performed by Bhasin et al. (1992, 1994a), and a brief account of the distribution of different biochemical genetic markers in Indian populations is given below.

In serum protein haptoglobin polymorphism, the average frequency of HP*1 is 0.160 in Indian populations; in general, the frequency of this allele is higher in people of North India (0.208) than in South India (0.131). The frequency is also high in people of West India (0.215), followed by East India (0.192).

In transferrin polymorphism, the average frequency of TF*C is 0.991 in Indian populations with a range of 0.898 to 1. The frequencies of the other alleles of this serum protein polymorphism, such as TF*D and TF*B, are very low. The frequency of TF*D is highest in peninsular (South) Indian populations and decreases towards the Indus-Ganga-Brahmaputra plains of North India (0.005).

In serum protein group specific component polymorphism, the average frequency of GC*2 in people of India is 0.253 with a range of 0.089 to 0.409. The frequency of the allele is low in West India (0.224) and high in North India (0.277). In the Himalayas, the frequency is nearly the same in the Eastern and Western regions (0.244 and 0.248, respectively), while in the Central Himalayan region it is comparatively higher (0.312).

In the acid phosphatase locus 1 system, as in other world populations, the ACP1*B allele is preponderant in Indian populations (0.756), followed by ACP1*A (0.242), while the third allele of the system, ACP1*C, has a trace frequency (0.002). The frequency of ACP1*A in the people of India ranges from 0.035 to 0.467. The distribution of this red cell enzyme polymorphism in five geographical regions of the country, i.e. North, West, Central, East and South India, was studied by Chahal et al. (1985), who observed that the populations of North India are characterized by relatively high frequencies of ACP1*A and ACP1*C, which differentiate them from populations of all other regions. It was also observed that the incidence of both these alleles is generally higher in the non-tribal populations than in the tribals of India. Indeed, as stated by Roberts et al. (1980), “the distribution of ACP1*C is essentially non-tribal in India.”

In red cell enzyme phosphoglucomutase locus 1 polymorphism, the average frequency of PGM1*2 is 0.300 in Indian populations, varying from 0.05 in Chaudhuri of West India to 0.558 in Kurumba of South India. In populations with Mongoloid affinities inhabiting Eastern Himalayas the frequency of the allele is somewhat higher (0.309) compared to those of South India (0.298). In general, the frequency of PGM1*2 is low in North India and it gradually starts increasing in West and East India while it decreases in East, Central and South India.

In adenylate kinase locus 1 polymorphism, the average frequency of AK1*2 in Indian populations is 0.076 with a range of nil to 0.205. The frequency of the allele is low in the people of the Himalayas having varying degrees of Mongoloid admixture (0.058), but in populations of peninsular (South) India it is comparatively high (0.082). Chahal et al. (1986a) studied the distribution pattern of this red cell enzyme polymorphism in the five geographical regions of India and found a clear distinction between non-tribal (range 0.086-0.099) and tribal (range 0.042-0.064) populations inhabiting these regions. Thus the tribal populations of India are characterized by a comparatively much lower AK1*2 frequency.

In the red cell enzyme adenosine deaminase system, the average frequency of ADA*2 is 0.118 in various populations of India, varying from 0.015 in Jalari of Andhra Pradesh to 0.5 in Muslims and Kacharis of Assam. However, most Indian populations fall in the range 0.015-0.214 for this allele. The frequency of the allele is lowest in the populations with Mongoloid ethnicity inhabiting the Himalayas (0.087), followed by the peninsular populations of South India with Australoid (Pre-Dravidian) and Caucasoid (Dravidian) racial affinities, and is highest among the people of the Indus-Ganga-Brahmaputra plains of North India with Caucasoid (Aryan) affinities (0.151).

In the esterase D system, the average frequency of ESD*2 in India is 0.271, ranging from as low as 0.022 in Gaddi of Kangra in Himachal Pradesh to 0.582 in people of Andhra Pradesh. The frequency is low in the people of the Himalayas (0.253) and the Indus-Ganga-Brahmaputra plains (0.256), while it is comparatively higher in those of peninsular India (0.322). In fact, there is a north-south cline of increasing ESD*2 frequency for this red cell enzyme polymorphism in India (Chahal et al., 1986b).

The data on red cell enzyme glyoxalase I polymorphism in Indian populations are rather limited, especially among tribals. Apparently there are no great differences in GLO1*1 frequency among them; it ranges from about 0.2 to 0.3 in most populations. Chahal et al. (1986) found a somewhat elevated frequency of the allele in people of West India and attributed it mainly to the inclusion of two immigrant populations, the Parsi and the Irani.

In the red cell enzyme glucose phosphate isomerase (phosphohexose isomerase) system, the frequency of the GPI*1 allele is unity in most world populations, except Asiatic Indians and Japanese, in which the rare alleles GPI*3 and GPI*4, respectively, are present. Various Indian populations have been screened for variants of GPI (Papiha and Chahal, 1984) and the rare allele GPI*3 attains polymorphic proportions in many of them. With a frequency of the allele as high as 0.110, the Bhotia of Chamoli district in the Garhwal region of Uttarakhand stand out from all other populations of India (Chahal et al., 2008). In addition to GPI*3, other variant alleles reported from the country include GPI*2, GPI*4, GPI*5, GPI*7, GPI*8 and GPI*9, although some of them are limited to specific ethnic groups or geographical regions.

2.4 HAEMOGLOBINS: NORMAL AND ABNORMAL HAEMOGLOBINS

Haemoglobin (HB), the red respiratory protein found in mammalian erythrocytes, is one of the most informative molecules in primate blood. It is the best known blood protein and gives rise to the most thoroughly studied genetic polymorphism in man, the polymorphism that includes the gene for sickle cell anaemia (Cavalli-Sforza and Bodmer, 1971). The study of haemoglobin has made several significant contributions towards the development of molecular biology.

Normal adult human haemoglobin is composed of three different types which can be separated by electrophoresis. They are all tetramers having the general formula X2Y2, where X2 refers to a pair of α polypeptide chains and Y2 to a pair of β, γ, δ or ε chains (Giblett, 1969). For example, HBA, the major haemoglobin in the blood of adults, has the molecular formula αA2βA2. The sequence of the 141 amino acid residues in the α chain and of the 146 residues in the β and other non-α chains has been determined. While most of the haemoglobin in normal human subjects past infancy is HBA (95-98%), HBA2 is also present, but in a concentration of only 1.5-3% of the total haemoglobin content. Its molecular formula is αA2δ2, indicating that the two α chains are identical with those of HBA, but that the other two identical chains are sufficiently different from β chains to warrant the designation δ. In fact, the δ chain has the same number of amino acid residues as the β chain and there are only 10 differences in the sequence. Another example of normal haemoglobin is HBF. It has a concentration of less than 0.5% after the first few years of life in normal subjects, but it is the major haemoglobin component during fetal development. Like HBA and HBA2, HBF contains two αA chains, but a different pair of chains, called γ, completes the tetramer, so the molecular formula is αA2γF2. The γ chain differs from the β chain in 39 amino acid residues, although the total number of residues is the same.

The abnormal haemoglobins differ from normal haemoglobins in molecular structure. Their structure is similar to that of normal haemoglobin except for a slight alteration in the sequence of amino acids (usually in the β chain), and hence they may be designated mutant or variant haemoglobins. Most of the abnormal haemoglobins appear to be products of point mutations with a single amino acid substitution in the α, β, γ or δ peptide chain. The resultant change in the whole haemoglobin molecule may be so benign that no physiological effect is detectable and even the rate of synthesis remains normal. A large number of these apparently ‘benign’ abnormal haemoglobins have so far been detected in humans. In some instances, these variants have been observed in combination with thalassaemia, which depresses the synthesis of either the α or the β chain product of the homologous locus. Structurally abnormal haemoglobin is usually detected by electrophoresis.

The term haemoglobinopathies covers a group of hereditary abnormalities in which either (a) the haemoglobin structure is altered (as in HBS, HBC, HBD, HBE etc.) or (b) there is a defect in the synthesis of one of the normal globin chains (i.e. α or β) (thalassaemias). The term thalassaemia is applied to a group of inherited disorders in which there is a variable decrease in net synthesis of a particular globin chain without an associated change in the structure of that chain. There are two principal types of thalassaemia, namely alpha (α) and beta (β). β-thalassaemia is of two subtypes, viz. β-thalassaemia major (homozygous) and β-thalassaemia minor (heterozygous). α-thalassaemia is present in two forms: haemoglobin Barts (four γ chains, no α chain produced) and haemoglobin H (excess β chains form an unstable tetramer). Haemoglobinopathies are the most frequent genetic disease, affecting approximately 7 per cent of the world population.

2.5 STRUCTURAL VARIATION IN HAEMOGLOBIN (HAEMOGLOBIN VARIANTS)

Variant haemoglobins such as HBS, HBC, HBD and HBE, among others, are the products of point mutations, and their detection depends on their altered electrophoretic mobility. Worldwide, more than 470 genetically controlled haemoglobin variants, mostly of the β chain, are known (Honig and Adams III, 1986). Haemoglobin variants may occur independently or in association with thalassaemia.

Haemoglobin S (HBS)

One of the most interesting and widespread abnormal human haemoglobins, first described by Pauling et al. (1949), is designated haemoglobin S (HBS) (αA2βS2) or, more specifically, αA2β2(6 Glu→Val). The notation signifies that HBS differs from its normal counterpart HBA by a single amino acid substitution, i.e. valine replaces glutamic acid at position six in the β polypeptide chain (Ingram, 1957). The mutation affects only the β chain; the α chain is normal. Haemoglobin S may be distinguished from haemoglobin A by electrophoresis or by performing a sickling test on fresh blood. Individuals who have one sickle mutant gene and one normal beta gene, i.e. heterozygotes (HBAS), are said to have sickle cell trait, which is benign. The sickling of red cells in the homozygote (HBSS) is more severe than in the heterozygote (HBAS). Haemoglobin S is also known to occur in combination with other abnormal haemoglobins, for example, sickle cell HBC (HBS/HBC), sickle cell HBD (HBS/HBD), sickle cell HBE (HBS/HBE) and sickle cell thalassaemia (HBS/HBTh), among others.

Sickle cell trait occurs with highest frequency in tropical Africa (0.1-0.4), with high frequency in India, Greece and Southern Turkey (0.05-0.1) and less than 0.1 frequency in populations inhabiting Palestine, Tunisia, Algeria and Sicily, situated around the Mediterranean Sea.

Haemoglobin D (HBD)

Like HBS, HBD (also known as HBD-Punjab) is a β chain variant of haemoglobin, in which glutamine replaces glutamic acid at position 121 in the β polypeptide chain, αA2β2(121 Glu→Gln). Both homozygote (HBDD) and heterozygote (HBAD) phenotypes have been reported among individuals in reasonable health. Haemoglobin D occurs mainly in North-West India, Pakistan and Iran. The electrophoretic mobility of HBAD is identical to that of HBAS at alkaline pH on cellulose acetate; to distinguish these variants, separation may be carried out in agarose gel using a different buffer system.

Haemoglobin E (HBE)

Like HBS and HBD, HBE is an abnormal haemoglobin with a single point mutation in the β polypeptide chain, i.e. at position 26 in the chain glutamic acid is replaced by lysine, αA2β2(26 Glu→Lys). The mutation is estimated to have arisen within the last 5,000 years, resulting in the second most common haemoglobin variant in the world. Persons homozygous for haemoglobin E (HBEE) have a mild haemolytic anaemia and mild splenomegaly; the heterozygous state (HBAE) is benign. Haemoglobin E is extremely common in Southeast Asia (Thailand, Myanmar, Cambodia and Laos, among others), where its prevalence can reach 30-40% and in some areas equals the haemoglobin A frequency. In Thailand the mutation can reach a frequency as high as 50-70%. In India, this abnormal haemoglobin is most prevalent in the North-East region, where in certain areas the carrier (HBAE) incidence reaches as high as 60% of the population.
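The electrophoretic behaviour of these three β chain variants can be related to the charge change each substitution introduces. The sketch below assumes a simple integer charge model (glutamic acid -1, lysine +1, valine and glutamine 0) that ignores pH and protein conformation; the dictionary and function names are illustrative, and the substitutions are those described in the text.

```python
# Approximate side-chain charges; an illustrative simplification.
SIDE_CHAIN_CHARGE = {"Glu": -1, "Asp": -1, "Lys": +1, "Arg": +1, "Val": 0, "Gln": 0}

# Beta chain substitutions as given in the text: (position, original, replacement).
VARIANTS = {
    "HBS": (6, "Glu", "Val"),
    "HBD": (121, "Glu", "Gln"),
    "HBE": (26, "Glu", "Lys"),
}


def charge_shift(original: str, replacement: str) -> int:
    """Change in charge per beta chain caused by one substitution."""
    return SIDE_CHAIN_CHARGE[replacement] - SIDE_CHAIN_CHARGE[original]


shifts = {name: charge_shift(old, new) for name, (_, old, new) in VARIANTS.items()}
for name, shift in shifts.items():
    position, old, new = VARIANTS[name]
    print(f"{name}: beta {position} {old}->{new}, charge shift per chain {shift:+d}")
```

Under this model HBS and HBD each lose one negative charge per β chain, which is consistent with their identical mobility at alkaline pH, whereas HBE shifts by two charge units.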

2.6 QUANTITATIVE VARIATION IN HAEMOGLOBIN (THALASSAEMIAS)

Quite a number of inherited abnormalities are known in man in which the central defect is the deficiency of a particular protein, and many of these are probably due to mutations which result in a gross but specific reduction in the rate of synthesis of one or more polypeptide chains. The most extensively studied of such conditions are those which involve defects in the synthesis of haemoglobin. Among them are a series of chronic haemolytic anaemias collectively known as the thalassaemias. They appear to be determined by a series of distinct abnormal genes in different heterozygous and homozygous combinations. It is useful to classify the various thalassaemias according to the polypeptide chain primarily concerned in causing the haemoglobin deficit (Ingram and Stretton, 1959). Thus in β-thalassaemia there is defective synthesis of β chains, whereas in α-thalassaemia there is defective synthesis of α chains. β-thalassaemia is characterized in the heterozygote by a variable depression in β chain synthesis, a moderate increase in HBA2 (6-7%) and, in about half the cases, a slight increase in HBF (1-5%). Defective synthesis of α chains, i.e. complete or nearly complete absence of the α chain, characterizes α-thalassaemia, which is incompatible with life because α chains are required for all of the normal haemoglobins and their structural variants.

Thalassaemia (from thalassa, the Greek word for the sea) appears in two clinical states: thalassaemia major and thalassaemia minor. The former, also known as Cooley’s anaemia or Mediterranean anaemia, is a severe haemolytic anaemia, and afflicted individuals cannot survive unless they receive blood transfusions regularly or undergo bone marrow transplantation. Thalassaemia major is a clinical entity found in individuals homozygous for an α or β chain defect, i.e. HBαTh/HBαTh or HBβTh/HBβTh, where Th denotes thalassaemia. Thalassaemia minor occurs in individuals heterozygous for either an α or a β chain defect, i.e. HBαN/HBαTh or HBβN/HBβTh, where N denotes normal production of the α or β chain.

In India, the occurrence of both α and β thalassaemias has been reported. β-thalassaemia major has been reported predominantly from East India, while cases of it, both major and minor, have been reported from other parts of India, including Uttar Pradesh, Punjab, Gujarat and Kerala. Various haemoglobin variants, most frequently HBE, occur in combination with β-thalassaemia. In α-thalassaemia, haemoglobin Barts was detected in 2.4% of a Bengali population, while haemoglobin H (HBH) cases have been reported from Calcutta in East India and from West India. Sukumaran (1974) observed that β-thalassaemia is probably the commonest inherited haemoglobin disorder in the Indian subcontinent.

2.7 GEOGRAPHIC DISTRIBUTION OF HB*S, HB*D AND HB*E IN INDIAN POPULATIONS

HB*S

HB*S is the most common variant haemoglobin allele in India and is present mostly in South, West and Central India, regions inhabited by sizeable autochthonous tribal populations. The allele has not been reported from North India, except in Chandigarh with a frequency of 0.011. HB*S shows polymorphic proportions in the states of Karnataka (0.082), Andhra Pradesh (0.038), Tamil Nadu (0.047), Kerala (0.045), Gujarat (0.065), Maharashtra (0.023), Dadra and Nagar Haveli (0.08) and Madhya Pradesh (0.061). The incidence of the allele for the whole of India has been estimated at 0.031 (Bhasin et al., 1994).

HB*D

Though a distinct characteristic of populations originating from the North-West border Indian state of Punjab, this variant allele of haemoglobin occurs sporadically almost all over North India. Bird et al. (1956) found 5 out of 279 Sikh subjects and one out of 13 Punjabi Hindu subjects to carry haemoglobin D. Five of these had haemoglobins A and D in heterozygous form (HBAD) and one had homozygous haemoglobin D (HBDD), giving a value of 2.05% for haemoglobin D variants and a frequency of 0.012 for HB*D in Punjab. The authors concluded “It is clear, however, that haemoglobin D, though not common, is something more than a sporadic mutant as had been suspected when it had been found only in one Sikh” (Bird et al., 1955). In addition, this variant haemoglobin has also been reported in Audich Brahmin of Gujarat in 2 out of 200 subjects, giving a value of 1% for the HBD variant and, if both carriers were heterozygotes, a frequency of 0.005 for HB*D (Parikh et al., 1969).
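The Punjab figures can be reproduced by direct gene counting: each heterozygote contributes one copy of the variant allele, each homozygote two, and every subject carries two alleles in total. The following is a minimal sketch of that arithmetic using the counts quoted above; the function names are illustrative.

```python
def variant_percentage(heterozygotes: int, homozygotes: int, sample_size: int) -> float:
    """Percentage of subjects carrying the variant haemoglobin."""
    return 100 * (heterozygotes + homozygotes) / sample_size


def allele_frequency(heterozygotes: int, homozygotes: int, sample_size: int) -> float:
    """Gene-counting estimate: variant alleles divided by total alleles (2 per person)."""
    return (heterozygotes + 2 * homozygotes) / (2 * sample_size)


# Punjab sample from the text: 279 Sikh + 13 Punjabi Hindu subjects,
# of whom 5 were HBAD heterozygotes and 1 was an HBDD homozygote.
sample = 279 + 13
pct = variant_percentage(5, 1, sample)
freq = allele_frequency(5, 1, sample)
print(round(pct, 2), round(freq, 3))  # 2.05 0.012
```

The same two functions can be applied to any sample for which the numbers of heterozygotes and homozygotes are known.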

HB*E

HB*E is the second most common variant haemoglobin allele present in the people of India. However, the distribution of the allele in the country is essentially limited to East India, where it has been reported with a frequency of 0.303 in Assam, 0.066 in Manipur, 0.363 in Meghalaya, 0.006 in Sikkim and 0.007 in West Bengal, the allele being polymorphic in the first three states. Thus, in general, HB*E is confined to the people of East India, particularly the Eastern Himalayan populations with Mongoloid affinities. The allele is totally absent from West, Central and South India, and from North India there is a solitary report from Uttar Pradesh.

Like any other tropical country of the world, India is a “great reservoir for abnormal haemoglobins” (Saha and Banerjee, 1971), and further studies are therefore desirable to find out the true incidence of haemoglobin variants in this megadiverse country. Needless to say, electrophoresis is the technique of choice for such investigations.

Sample Questions

1) List different serum protein and red cell enzyme polymorphisms discovered in man. What are the uses of these genetic markers in anthropology?
2) What are normal and abnormal haemoglobins and what techniques are used to study them?
3) Discuss the two major components of haemoglobinopathies in man.
4) Give an account of the distribution of variant haemoglobin alleles in Indian populations.