Proteomics – Anthropology by K.V RAMESH

The term ‘proteome’ was coined by Marc Wilkins in 1994 as the linguistic
equivalent of ‘genome’. Proteomics deals with the large-scale analysis of the
complete, or at least the major, set of proteins. The proteome can be defined
in terms of the sequence, structure, abundance, localisation, modification,
interaction and biochemical function of its components.
Importance of proteomics
1) Genes are instruction carriers, while proteins are the functional molecules
of the cell, so a true understanding of the cell can come only from the direct
study of proteins.
2) Unlike the genome, whose content with few exceptions remains the same
irrespective of cell type or environmental conditions, the proteome is dynamic.
Its content varies under different conditions owing to the regulation of
transcription, RNA processing, protein synthesis and protein modification.
Study of the proteome can therefore provide a glimpse of the cell in action.

3) A good understanding of the structure and function of a protein may suggest
which mutations to introduce in order to probe its function further.
4) The transcriptome may not give a true picture of the proteome, because not
all mRNAs in the cell are translated, and rates of protein synthesis and protein
turnover vary among transcripts.
5) Differences in the stability of mRNA and in the efficiency of translation
affect the generation of new proteins. Some transcripts give rise to multiple
proteins. For instance, 22 different forms of alpha-1-antitrypsin have been
observed in plasma. The individual functions of such proteins can be studied
at the protein level only.
6) In certain body fluids such as serum, cerebrospinal fluid and urine, where
nucleic acids are not represented, only proteins provide information about
determinants of disease progression. Where nucleic acids are degraded or
cross-linked, as in fixed biological specimens, protein may be the only
source material for further study. In many diseases proteins are the drug
targets.
7) Proteomics attempts to bridge the gap between our understanding of genome
sequence and cellular behaviour.
A brief account of proteins follows:
Proteins: The term ‘protein’ is derived from the Greek word ‘proteios’, meaning
of the first order. J.J. Berzelius coined the term in 1838 to describe a
class of macromolecules abundant in living organisms. Proteins constitute about
50% of the dry weight of the cell. They are built from amino acids, traditionally
counted as 20; more recently selenocysteine and pyrrolysine were added to the
list. With few exceptions, such as some marine microorganisms in which both D-
and L-amino acids have been found, proteins contain L-amino acids only. D and L
denote the two mirror-image configurations of an amino acid, which rotate
plane-polarized light in opposite directions. Amino acids form peptide bonds
between the carboxyl group of one amino acid and the amino group of the next.
The amino acids so linked are called amino acid residues. Depending on the
number of residues, peptides are called di-, tri-, tetrapeptides and so on. A
protein may have one or more polypeptides (strings of amino acids), folded
mostly into either a globular or a fibrous form.
Proteins are synthesized by the translation of mRNA into polypeptides on
ribosomes. After translation they can undergo some 400 types of reversible and
irreversible chemical modifications, such as glycosylation and phosphorylation,
which are collectively called post-translational modifications. At any given
time the level of a protein in a cell depends on the rate of transcription of
its gene, the efficiency of translation of the mRNA into protein and the rate of
degradation of the protein. Agents such as oxidants, radiation and chemicals
modify proteins in ways that lead to their degradation. Phosphorylation
accompanied by conjugation with ubiquitin, as well as lysosomal enzymes, also
brings about the degradation of proteins.
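The dependence of protein level on transcription, translation and degradation can be illustrated with a minimal sketch. It assumes a simple two-step model that is not part of the original text: mRNA is produced at a constant rate and decays in proportion to its amount, and protein is made in proportion to mRNA and decays in proportion to its own amount. All parameter names (`k_tx`, `k_mdeg`, `k_tl`, `k_pdeg`) and the numerical values are hypothetical.

```python
def steady_state_protein(k_tx, k_mdeg, k_tl, k_pdeg):
    """Steady-state protein level for an assumed two-step expression model.

    k_tx   : transcription rate (mRNA made per unit time)
    k_mdeg : mRNA degradation rate constant
    k_tl   : translation rate (protein made per mRNA per unit time)
    k_pdeg : protein degradation rate constant
    """
    mrna = k_tx / k_mdeg          # mRNA level where synthesis balances decay
    return k_tl * mrna / k_pdeg   # protein level where synthesis balances decay


def simulate(k_tx, k_mdeg, k_tl, k_pdeg, t_end=50.0, dt=0.01):
    """Euler integration of
    d(mRNA)/dt = k_tx - k_mdeg * mRNA and
    d(protein)/dt = k_tl * mRNA - k_pdeg * protein,
    starting from an empty cell."""
    mrna = protein = 0.0
    for _ in range(int(t_end / dt)):
        d_mrna = k_tx - k_mdeg * mrna
        d_protein = k_tl * mrna - k_pdeg * protein
        mrna += d_mrna * dt
        protein += d_protein * dt
    return mrna, protein


# Hypothetical rates, chosen only for illustration.
rates = dict(k_tx=2.0, k_mdeg=1.0, k_tl=10.0, k_pdeg=0.5)
print(steady_state_protein(**rates))  # 40.0
```

In this sketch, doubling the degradation constant `k_pdeg` halves the steady-state protein level even though the mRNA level is unchanged, which is one way to see why transcript abundance alone need not predict protein abundance.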
Structure: The structure of protein can be explained at four levels.
a) Primary: This is the linear sequence of amino acids, which are joined
by covalent peptide bonds or linkages. The primary structure determines the
function of the protein, and its amino acid composition is responsible for the
physical and chemical properties.

b) Secondary: This is formed by the twisting of the polypeptide chain, which
gives the protein its local spatial arrangement. Secondary structure depends on
the pattern of hydrogen bonds between the amide (N-H) and carbonyl (C=O) groups
of the backbone. The α helix and the β sheet are the two main types of
secondary structure. The α helix is a rigid arrangement of a polypeptide chain
in which the amino acid side chains extend outward from the central axis. It is
stabilized by extensive hydrogen bonding. In β sheets, hydrogen bonds are formed
between neighbouring segments of polypeptide chains. The arrangement of the
polypeptide segments in β sheets is either parallel (same direction) or
anti-parallel (opposite direction).
c) Tertiary: This is the three-dimensional structure, which gives the protein
its stability. In it, hydrophobic side chains are held in the interior and
hydrophilic groups are found on the surface of the protein.
d) Quaternary: It is the spatial arrangement of subunits (polypeptide
chains). These subunits are held together by noncovalent bonds such as hydrogen
bonds, hydrophobic interactions and ionic bonds. Depending on the number
of polypeptides, such proteins are known as mono-, di-, tri- or tetramers. If
the subunits are identical they are called homo-oligomers; if they differ they
are known as hetero-oligomers.

Subfields of Proteomics
1) Sequence and Structural proteomics: Protein sequences allow the design
of probes or primers that can be used to isolate the cDNA or genomic
sequence. It is the protein sequence that acts as a bridge between the activity
of a protein and the genetic basis of a particular phenotype. The growing
deposition of protein sequences and the consequent development of statistical
techniques are facilitating the comparison of proteins. The three primary
sequence databases, GenBank, EMBL (European Molecular Biology Laboratory) and
DDBJ (DNA Data Bank of Japan), provide protein sequences translated
from DNA sequences, whereas SWISS-PROT is a dedicated protein sequence
data bank.
Similar sequences may give rise to similar structures, and this idea has given
birth to a new branch of proteomics known as structural proteomics. It has
paved the way for the storage, presentation and comparison of protein
structures, the inference of evolutionary relationships and the prediction of
theoretical protein models, a boon for drug discovery research in the
absence of crystallographic protein structures.

2) Expression proteomics: It is concerned with the analysis of protein
abundance, the separation of protein mixtures, the identification of individual
components and their systematic quantitative analysis. This sub-branch lays
emphasis on differences between alternative states, such as health and disease,
and on the characterisation of post-translational modifications. The key tools
used in such investigations are 2D gel electrophoresis, mass spectrometry,
multidimensional chromatography and protein chips.
3) Interaction proteomics: It deals with the genetic and physical interactions
among proteins as well as interactions between proteins and nucleic acids or
small molecules. The study of protein interactions provides insight not only
into the function of individual proteins but also into how proteins function in
pathways, networks and complexes. It seeks to create a proteome linkage
map based on binary interactions between individual proteins and on higher
order interactions determined by the systematic analysis of protein complexes.
Interactions between proteins and nucleic acids shed light on processes such
as gene regulation, while interactions of proteins with small molecules
shed light on the binding of enzymes to substrates and of receptors to
their ligands, and may play an important role in the drug development process.
The key approaches used in studies of this kind are the yeast two-hybrid
system, mass spectrometry, biochemical assays and X-ray
crystallography.
4) Functional Proteomics: This lays emphasis on testing protein functions on
a large scale such as testing expressed proteins for different enzymatic
activities.
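The proteome linkage map described under interaction proteomics can be sketched as an undirected graph built from binary interactions. The example below is only an illustration: the protein names `P1` to `P5` and the interaction pairs are hypothetical, and treating a connected group of proteins as a candidate complex or pathway is a simplifying assumption rather than an established method.

```python
from collections import defaultdict


def build_linkage_map(binary_interactions):
    """Build an undirected interaction graph from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in binary_interactions:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def connected_group(graph, start):
    """Return every protein reachable from `start` through a chain of interactions."""
    seen = {start}
    stack = [start]
    while stack:
        protein = stack.pop()
        for partner in graph.get(protein, ()):
            if partner not in seen:
                seen.add(partner)
                stack.append(partner)
    return seen


# Hypothetical binary interactions, e.g. as a two-hybrid screen might report them.
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
linkage_map = build_linkage_map(interactions)
print(sorted(connected_group(linkage_map, "P1")))  # ['P1', 'P2', 'P3']
```

Storing each interaction in both directions keeps lookups symmetric, so a query starting from any member of a group returns the same set.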
Tools used in Proteomics
1) Databases: Protein, expressed sequence tag (EST) and complete genome
sequence databases provide information on all the expressed proteins in
organisms.
2) Mass Spectrometry: It provides information on the molecular mass
(>100 kDa) and sequence of proteins.
3) Software tools: These tools determine the sequence of a protein with the
aid of specialised algorithms and search large amounts of mass spectrometry
data automatically for protein sequence matches.
4) Protein Separation technologies: They resolve complex protein mixtures
into individual proteins and permit comparison of protein levels
between two samples. The key technologies include two-dimensional gel
electrophoresis, SDS-polyacrylamide gel electrophoresis, high performance
liquid chromatography, capillary electrophoresis, and affinity and ion exchange
chromatography.
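The kind of matching that the software tools perform between mass spectrometry data and protein sequences can be sketched as follows. This is a simplified illustration of peptide mass matching, not the algorithm of any particular tool. It assumes digestion with trypsin (cleavage after K or R unless the next residue is P), which the text above does not specify, and it compares neutral monoisotopic masses rather than the m/z values an instrument actually reports. The sequence, the observed mass and the tolerance are hypothetical.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def tryptic_digest(protein):
    """Cut after K or R, except when the following residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def match_masses(observed_masses, protein, tolerance=0.01):
    """Return the theoretical peptides whose mass lies within `tolerance`
    daltons of at least one observed mass."""
    matches = {}
    for peptide in tryptic_digest(protein):
        mass = peptide_mass(peptide)
        if any(abs(mass - obs) <= tolerance for obs in observed_masses):
            matches[peptide] = mass
    return matches


# Hypothetical sequence and observed mass, for illustration only.
print(tryptic_digest("AGKPLRMK"))  # ['AGKPLR', 'MK']
print(list(match_masses([277.146], "AGKPLRMK")))  # ['MK']
```

In practice a search of this kind would be run against every entry in a sequence database, with the protein that explains the most observed masses ranked highest.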
Applications
1) Identification and cataloguing of proteins.
2) Identification of the proteins present in a sample at a given stage of
differentiation or development, in a disease state, or after exposure to a
drug, chemical or physical stimulus.
3) Determination of how proteins interact with each other in living systems and
characterisation of proteins in more complex networks.
4) Mapping of post-translational modifications of proteins.
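Comparing the proteins of two states, as in the second application above, is often summarised as a fold change in abundance. The sketch below assumes that each sample has already been reduced to a table of abundance values per protein. The protein names, the values and the pseudocount are hypothetical, and the function is one simple way of expressing the comparison, not a prescribed method.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated to control abundance for proteins found in both.

    The pseudocount avoids division by zero when a protein is undetected
    in one sample.
    """
    shared = control.keys() & treated.keys()
    return {
        protein: math.log2(
            (treated[protein] + pseudocount) / (control[protein] + pseudocount)
        )
        for protein in shared
    }


# Hypothetical abundance values (arbitrary units) for two samples.
healthy = {"P1": 3.0, "P2": 7.0}
diseased = {"P1": 7.0, "P2": 7.0}
changes = log2_fold_change(healthy, diseased)
print(changes["P1"], changes["P2"])  # 1.0 0.0
```

A value of 1.0 means the protein is twice as abundant in the second sample, 0.0 means no change, and negative values indicate a decrease.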

Challenges: No single technological approach is suitable for every application.
The challenges before the proteomics community are the integration and
automation of these approaches, the use of better materials, and advances in
instrument design and methodology that improve sensitivity, resolution and
repeatability, so that complex biological systems can be analysed
comprehensively.