This lesson will take place in a computer lab, guided
by your teacher.
See also the information for teachers at the end of this lesson.
A brief definition of Bioinformatics that I like
is given by
Nilges and Linge (Institut Pasteur, France):
"Bioinformatics derives knowledge from computer analysis
of biological data".
It concerns large volumes of biological information, in recent years
genomic sequences, gene expression data from microarrays,
protein interactions, and three-dimensional ("3D") macromolecular
structures, but in a broader sense it
includes various other sources such as clinical
trial data, neural networks, or the scientific literature.
Bioinformatics encompasses research with, and applications of,
such information, as well as the development of the supporting
computational methods and tools.
As bioinformatics is a vast realm, we will deal with only a few topics
in this class, selected because your teacher knows something about them,
and for relevance to immunology.
- To learn how to find macromolecular structures at the
Protein Data Bank (www.pdb.org).
- To learn how to use Protein Explorer to investigate any
published protein structure.
- To understand the structures of antibodies (1IGT, 1FDL).
- To locate the conserved vs. highly mutated residues
in an antibody (using ConSurf on 1FDL).
- Optional: To understand the structures of MHC class I (2VAA, 2VAB)
and II (1F3J).
- Searching for Macromolecular 3D Structures at the Protein Data Bank
- Open Netscape or Internet Explorer, and go to the
Protein Data Bank (www.pdb.org).
- Search for "immunoglobulin".
- How many hits did you get?
- Return to the main page, and search for "intact antibody".
- How many hits did you get?
- Click on the EXPLORE link for 1IGT.
- What species did 1IGT come from?
- What is the isotype of 1IGT?
- Using Protein Explorer to answer questions about 3D structure
- Run Protein Explorer (PE)
- Intact immunoglobulin.
Enter 1IGT in the slot on PE's FrontDoor, and press Enter.
In class, you will be taught how to answer the following questions.
If you are working alone outside of class, follow the QuickTour linked to
PE's FrontDoor, and afterwards use the methods you learn there to answer
the following questions.
- How many chains does intact IgG have?
- Does the published 1IGT structure include water?
- What is reported when you click on the blue chain?
- What is the "ligand"?
Covalently-attached carbohydrates are usually "N linked"
(via the sidechain Nitrogen in asparagine -- single letter amino acid
- Which atom of which amino acid is covalently linked to the carbohydrate?
Use the following steps to answer this question:
- SELECT Ligand
- DISPLAY Ball & Stick
- DISPLAY Contacts
- Step-by-step, complete all steps, using the
[Zoom+] button as needed.
- In the left middle frame, scroll down to
"Controls for Contact Surfaces", and click
Hide Non-contact surface regions.
- Hide Atoms inside surface.
- Examine the inside of the surface, looking for an atom that penetrates
through the surface. This means it is close enough to ligand to be
covalently bound. Click on it to identify it.
- What carbohydrate residue is closest to (covalently bound to)
the protein?
Click to show Bonded atoms inside surface.
(This software does not show the covalent bond between the protein and
carbohydrate as a stick -- but it should.)
- What does NAG stand for? Hint: click the "Ligand" button
and read the left middle frame to learn how to get full names
for ligand ("hetero" atom) abbreviations.
- What secondary structures make up antibody?
- Click Reset View.
- Return to QuickViews.
- Click the [2o] button.
- Which chains are the light chains?
- Click Reset View.
- Return to QuickViews.
- Click the light chains and notice their one-letter names reported
in the message box.
- How many Ig domains are in a light chain?
- Show only chain A: SELECT Chain A, DISPLAY Only.
- Press the "Center" button, and cancel to center all selected atoms.
- Now you can see that chain A has two immunoglobulin (Ig) domains.
Each Ig domain consists of two beta-sheets "stapled" together with
a disulfide bond.
The 2 domains are connected by a linker chain.
- How many Ig domains are in a heavy chain?
- How many Ig domains are in an intact IgG?
- Approximately how many amino acids are in one Ig domain?
It may help to SELECT All, COLOR N->C Rainbow.
- Where is the Fc?
- How many Ig domains are in the Fc?
- What is the function of the carbohydrate in the Fc?
- How many epitope-binding sites are on an IgG, and where are they?
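Several of the questions above (how many chains, whether water is present, what the ligand is) can also be checked directly against the text of a PDB file. The following is a minimal sketch, assuming the standard fixed-width PDB coordinate layout (record name in columns 1-6, residue name in columns 18-20, chain identifier in column 22, residue number in columns 23-26). The helper names and the toy records are hypothetical and are not taken from 1IGT.

```python
from collections import Counter


def make_record(record, serial, atom, res_name, chain, res_seq):
    """Build one fixed-width PDB coordinate line (coordinates omitted)."""
    return f"{record:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}"


def summarize(pdb_text):
    """Count chains, water molecules, and other hetero groups in PDB text."""
    chains, waters, hetero = set(), set(), set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM") or len(line) < 26:
            continue
        res_name = line[17:20].strip()
        # A residue is identified by chain, residue number, and residue name.
        residue = (line[21], line[22:26].strip(), res_name)
        if record == "ATOM":
            chains.add(line[21])
        elif res_name == "HOH":          # HOH is the PDB residue name for water
            waters.add(residue)
        else:
            hetero.add(residue)
    return {
        "chains": sorted(chains),
        "waters": len(waters),
        "hetero": dict(Counter(name for _, _, name in hetero)),
    }


# Toy data for illustration only; NOT the contents of 1IGT.
TOY_PDB = "\n".join([
    make_record("ATOM", 1, "N", "ALA", "A", 1),
    make_record("ATOM", 2, "CA", "ALA", "A", 1),
    make_record("ATOM", 3, "N", "GLY", "B", 1),
    make_record("HETATM", 4, "C1", "NAG", "B", 500),
    make_record("HETATM", 5, "C2", "NAG", "B", 500),
    make_record("HETATM", 6, "O", "HOH", "B", 601),
])

print(summarize(TOY_PDB))
```

To try it on a real structure, read a downloaded PDB file into a string and pass it to `summarize`. Protein Explorer reports the same kind of information interactively.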
- Fab : lysozyme complex.
Bring the FrontDoor window to the foreground, enter 1FDL in the slot,
and press Enter.
1FDL contains a single Fab bound to its cognate antigen, lysozyme.
- What are the names of the chains that make up the Fab?
- What is the secondary structure of lysozyme?
- What kinds of bonds hold the antigen to the antibody?
- What kinds of bonds hold the two chains of the Fab together?
Single Immunoglobulin Domain
- What is the secondary structure of the constant domain of the
light chain?
- Reset View
- Click on the light chain to find the range of amino acid sequence
numbers for the constant domain.
- In QuickViews, DISPLAY Sequences, and select Seq3D.
- Select the Range option in Seq3D.
- Select your range in chain L (approx. 107-214), then close Seq3D.
- DISPLAY Only.
- Center, cancel.
- DISPLAY Cartoon.
- COLOR Structure.
- How are hydrophobic and polar residues arranged in an Ig domain?
- DISPLAY Spacefill.
- COLOR Polarity3.
- Slab. (Align the Ig domain with its long axis perpendicular
to the screen, and slide the slab plane in and out by holding
down the Ctrl key while dragging up and down.)
- What is the explanation of the largest hydrophobic patch
on the surface of this CL domain?
- Slab off.
- SELECT Chain H.
- DISPLAY Backbone.
Epitope : Paratope Contacts
- Are the bonds between the antibody and the antigen
hydrophobic, polar, or both?
- Reset View.
- SELECT Chain Y
- DISPLAY Contacts, step-by-step: After step 5, notice the
elements that make up the noncovalently bonded atoms.
- Complete step 8.
- Do both heavy and light chains bind to the antigen?
- Scroll down in the left middle frame
to "Controls for Contact Surfaces", and click "Atoms outside surface,
- Center, cancel (to center the bonded atoms).
- SELECT Protein.
- Use QuickViews Plus to change the DISPLAY mode so that new displays
are added to existing ones (a red plus should appear above the DISPLAY
- DISPLAY Cartoon.
- COLOR Chain.
- Do the beta strands of the variable Ig domains bind directly to antigen?
- Where are the CDR3's of the H and L chains bound to the antigen epitope?
- COLOR Element (CPK).
- (QuickViews Plus: new display is hidden) -DISPLAY Cartoon.
- (QuickViews Plus: new display is added) +DISPLAY Backbone.
- QuickViews Plus: Backbone thickness 0.01.
- Use Seq3D to color selected ranges green with
no change to the display. Color chain L 90-97
and chain H 98-105.
- Contact controls: Atoms outside surface Bonded
- DISPLAY Spacefill.
- Contact controls: Surface Transparent
(If you are interested, CDR1's are L 24-34 and H 31-35;
CDR2's are L 50-56 and H 50-66. CDR's have approximately the same
locations in different antibodies, but the exact residues where
they begin and end must be looked up in
a reference source for each antibody amino acid sequence.)
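The CDR ranges quoted for 1FDL can be kept in a small lookup table, which makes it easy to ask whether a residue you clicked on lies in a CDR. This is only a sketch: the ranges are the ones given in this lesson (the CDR3 ranges are those used in the green coloring step above), they apply to 1FDL only, and the names `CDR_RANGES` and `cdr_of` are hypothetical.

```python
# CDR ranges for 1FDL as quoted in this lesson: (chain, CDR name) -> (first, last).
CDR_RANGES = {
    ("L", "CDR1"): (24, 34),
    ("L", "CDR2"): (50, 56),
    ("L", "CDR3"): (90, 97),
    ("H", "CDR1"): (31, 35),
    ("H", "CDR2"): (50, 66),
    ("H", "CDR3"): (98, 105),
}


def cdr_of(chain, residue_number):
    """Return the CDR containing this residue, or None for a framework residue."""
    for (cdr_chain, name), (first, last) in CDR_RANGES.items():
        if cdr_chain == chain and first <= residue_number <= last:
            return name
    return None


print(cdr_of("L", 92))   # inside the L-chain range 90-97
print(cdr_of("H", 40))   # between the H-chain CDR1 and CDR2 ranges
```

For any other antibody the table would have to be rebuilt from a reference source, as noted above.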
Sizes: macromolecules vs. cells
- What is the approximate diameter of intact IgG?
Hint: DISPLAY Distances.
- If a lymphocyte has a diameter of 10 micrometers,
what is the diameter ratio for lymphocyte/antibody?
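The ratio is a unit-conversion exercise: distances reported for molecules are typically in Angstroms, while the cell diameter is in micrometers. A minimal sketch follows, assuming your measured distance is in Angstroms. The molecule diameter used in the example call is a deliberate placeholder, not the measured value for IgG.

```python
ANGSTROMS_PER_MICROMETER = 10_000  # 1 micrometer = 1e-6 m, 1 Angstrom = 1e-10 m


def diameter_ratio(cell_micrometers, molecule_angstroms):
    """Return cell diameter divided by molecule diameter, in matching units."""
    cell_angstroms = cell_micrometers * ANGSTROMS_PER_MICROMETER
    return cell_angstroms / molecule_angstroms


# Placeholder molecule diameter of 50 Angstroms; substitute your own measurement.
print(diameter_ratio(10, 50))
```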
Finding Patterns of Conservation and Mutation with ConSurf
- Meaning of conservation in 3D protein structure
Patches of amino acids on the surface of a protein that are highly
conserved in evolution often represent patches responsible for critical
functions of the protein. For example, the catalytic site of an enzyme is
typically more conserved in evolution than are other regions of the protein.
In contrast, there are cases where
regions of proteins undergo unusually high mutation rates
in order to support the function of that protein.
- Can you think of a protein,
related to infection or defense,
where a high mutation rate
in some residues is beneficial?
The portions of amino acid sequence that perform a function, such as
enzymatic catalysis, are often scattered in multiple sites in the linear
sequence. When the protein folds, these sites come together to form a
functional 3D structure. Since protein folding cannot be predicted reliably
from sequence alone, locating the most conserved residues in a sequence
does not tell you whether they fold together to form a functional site. On
the other hand, when contiguous patches of conserved residues are
identified in a known 3D protein structure, most likely they
are functionally important.
The molecule at left is enolase, an enzyme in energy metabolism. The
dark residues are highly conserved in evolution from bacteria
through yeast, insects and humans. They form a depression ringing the
catalytic site. (The PDB file is 4ENL. For more information, see
In order to find conserved patches, one must compare
sequences for the same molecule from different species, or different
individuals of the same species. The sequences must be found, selected, and
aligned intelligently, producing a multiple protein sequence alignment.
An example of a portion of a sequence alignment for enolase is below, using
single-letter amino acid code (listed in Protein Explorer's
Help/Index/Glossary under Amino Acids). Bold residues are
totally conserved; CAPITALIZED residues have mutations to residues
with chemically similar sidechains; lower-case residues are mutated to
residues with chemically different sidechains.
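The three categories described above can be worked out column by column from an alignment. The sketch below is illustrative only: the grouping of amino acids into chemically similar sets is an assumption (other groupings are in common use), the sequences are made up and are not enolase, and since plain text cannot show bold, totally conserved columns are marked with an asterisk instead.

```python
# Assumed grouping of amino acids with chemically similar sidechains.
SIMILARITY_GROUPS = ["AVLIM", "FWY", "STNQ", "KRH", "DE", "C", "G", "P"]
GROUP_OF = {aa: index for index, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def classify_column(column):
    """Classify one alignment column as 'identical', 'similar', or 'different'."""
    residues = set(column)
    if "-" in residues:
        return "different"            # treat any gap as a difference
    if len(residues) == 1:
        return "identical"
    groups = {GROUP_OF.get(aa) for aa in residues}
    if None not in groups and len(groups) == 1:
        return "similar"
    return "different"


def annotate(alignment):
    """Return the first sequence re-cased by column class, plus a marker line."""
    letters, marks = [], []
    for column in zip(*alignment):    # assumes all sequences have equal length
        kind = classify_column(column)
        first = column[0]
        letters.append(first.lower() if kind == "different" else first.upper())
        marks.append("*" if kind == "identical" else " ")
    return "".join(letters), "".join(marks)


# Made-up toy alignment, for illustration only.
toy = ["GKLDE", "GRIDA", "GKVDE"]
sequence_line, marker_line = annotate(toy)
print(sequence_line)
print(marker_line)
```

In the toy alignment, columns 1 and 4 are totally conserved, columns 2 and 3 vary only within one similarity group, and column 5 mixes an acidic residue with alanine.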
- Can you think of immune system proteins
where it would be useful to compare the different protein molecules
within the same individual?
- Using ConSurf to Identify Conserved Patches
Until 2001, identifying conserved patches of residues
in 3D protein structures was a laborious process. You had to
search for and select amino acid sequences, align them, and use the
alignment to color a 3D structure.
(Instructions for making custom
alignments using Biology Workbench are available within
Protein Explorer: go to QuickViews, then Advanced Explorer, then
The ConSurf Server, which became available in 2001, makes this process
much easier, and uses the most sophisticated methods.
It is the first server to automate the entire process.
ConSurf does the following steps completely automatically:
- Obtains the amino acid sequence of the specified
3D chain in the specified PDB file.
- Searches the
SWISS-PROT protein sequence database
(using PSI-BLAST) to find the most similar sequences.
SWISS-PROT is probably the highest quality protein sequence database.
In 2002, the US NIH announced funding to make it the centerpiece of
a central international facility named UniProt.
By default, it uses up to 50 sequences with
E < 0.001.
The Expectation value E is the number of hits Expected by chance with
the sequence matching level observed, taking into account
the size of the sequence database and the length of the
query sequence. The lower the value of E, the more significant the match.
- Aligns sequences (with the CLUSTAL-W algorithm).
Optionally, the user may supply the sequence alignment.
- Constructs a phylogenetic tree from sequence alignment.
- Calculates conservation grades for each residue in the 3D chain
using a new maximum likelihood algorithm.
- Displays the 3D molecule colored by conservation grades in Protein Explorer.
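The homologue-selection step above can be sketched in a few lines. This is not ConSurf's code. The first function uses the standard BLAST relation between a bit score and the expectation value, E = m * n * 2^(-S), where m is the query length and n the database length (real BLAST uses corrected "effective" lengths). The second applies the defaults quoted above (E < 0.001, at most 50 sequences). Keeping the hits with the lowest E first is an assumption, and all names are hypothetical.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score."""
    return query_length * database_length * 2.0 ** (-bit_score)


def select_homologues(hits, max_e=0.001, max_sequences=50):
    """Keep hits with E below max_e, most significant first, up to max_sequences.

    hits is a list of (sequence_name, e_value) pairs.
    """
    significant = [hit for hit in hits if hit[1] < max_e]
    significant.sort(key=lambda hit: hit[1])
    return significant[:max_sequences]


# The same score is less significant in a larger database:
print(expect_value(40, 200, 1_000_000))
print(expect_value(40, 200, 100_000_000))

# Made-up hits, for illustration only.
hits = [("seqA", 1e-30), ("seqB", 0.5), ("seqC", 1e-5)]
print(select_homologues(hits))
```

The two `expect_value` calls show why E takes the database size into account: the score is unchanged, but E grows in proportion to the number of residues searched.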
The maximum likelihood method employed by ConSurf is superior to
previous methods, such as maximum parsimony.
It is less sensitive to high redundancy or a few
outliers in the alignment, giving a more robust result.
Mutations in distantly related sequences are devalued,
since more mutations are expected between long branches of the
phylogenetic tree.
Thus, conservation will be detected despite inclusion of distantly
related sequences.
Yet conservation in closely related
sequences is devalued, since it is expected.
(At ConSurf, see "Maximum Likelihood Method" and
"Advantages of Using Phylogenetic Trees".)
- Conservation of MHC (ConSurf Gallery)
- What domain of MHC class I has the most conserved patches?
Domains are named starting with the amino terminus: alpha-1,
alpha-2 (these two form the groove), alpha-3 (pairing with
- If you are running Protein Explorer, in QuickViews, DISPLAY Evolution.
- Otherwise go to consurf.tau.ac.il.
- Look in ConSurf's GALLERY.
- Click on Example 1. MHC Class I Heavy Chain
to see the result in Protein Explorer.
- What function must be served by at least one of the conserved patches
in the alpha-3 domain?
- What accounts for the high mutation rate in the groove?
- What function do you suppose is served by the highly conserved pit in
the floor of the groove?
- Conservation of Antibody
- What is the pattern of conservation for the variable domain of
ConSurf has a tendency to find alignments for only one domain of a
multidomain chain. For many antibody heavy and light chains (e.g. 1FDL)
it happens to align primarily the constant domains. One way to force
ConSurf to process an antibody variable domain is
to run a PDB file that contains only a single variable domain.
1IVL contains only the variable domain of an antibody light chain,
- Display 1IVL in Protein Explorer. The two domains are copies
of the same VL (which you could verify with DISPLAY Sequences).
- DISPLAY Evolution.
- At ConSurf, in the Chain Identifier slot,
specify either chain (A or B).
- Increase the Max. Number of Homologues to 150.
- To avoid overwhelming the ConSurf server with dozens of requests
to process the same job, here is a
link to the result from a previous
submission.
If you were submitting a different job, at this point you would press
the Submit button. You should then bookmark the job page
so you can come back to it later without recalculating the job.
- After the job is completed (or at the result of the previous
submission), examine the Multiple Sequence Alignment.
Look for regions where 1IVL has residues (not dashes), but
only a few sequences are aligned out of the total
of 150. The conservation scores for such regions are unreliable.
In this case, there are none. If you run 1FDL, the entire variable domain
It is planned that a
future version of ConSurf will identify these unreliable
regions in the 3D display.
- At the results for 1IVL, click View ConSurf Results with
- What are the three ranges of residues with high mutation
- Click ConSurf Seq3D.
- In the Seq3D window, touch the first and last turquoise (least conserved)
residues in each of the 3 ranges of turquoise residues,
noting their sequence number ranges.
- How do these 3 high-mutation sequence ranges compare with
the ranges given above for the 3 CDR's of the VL of 1FDL?
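The reliability check described earlier (finding positions where the query has a residue but only a few homologues are aligned) can be automated once the multiple sequence alignment is in hand. A minimal sketch follows, assuming the alignment is available as equal-length strings with dashes for gaps. The threshold `min_aligned` is a hypothetical choice, since the lesson says only "a few", and the sequences are made up.

```python
def low_coverage_positions(query, homologues, min_aligned=5):
    """Return 1-based query positions aligned to fewer than min_aligned homologues.

    Positions are counted along the query sequence, skipping its gaps, so they
    may differ from the residue numbers used in the PDB file.
    """
    flagged = []
    position = 0
    for column, query_residue in enumerate(query):
        if query_residue == "-":
            continue                  # query has no residue in this column
        position += 1
        aligned = sum(1 for seq in homologues if seq[column] != "-")
        if aligned < min_aligned:
            flagged.append(position)
    return flagged


# Made-up toy alignment, for illustration only.
query = "AC-DE"
homologues = ["AC-D-", "A-GD-", "ACGD-"]
print(low_coverage_positions(query, homologues, min_aligned=2))
```

In the toy alignment only the last query residue is flagged, because none of the three homologues has a residue in that column.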
Optional: If you have time, use Protein Explorer to answer the following questions.
- Major Histocompatibility Class I.
At PE's FrontDoor, enter 2VAA in the slot, and press Enter.
- How many chains are present?
- What is each chain called?
- Where is the transmembrane region?
- COLOR N->C terminus
- Mol Info, PDB Header, click Other Sources, then the small
All under the Sequence Data block.
Notice that under SwissProt, the description of ha1b_mouse
agrees with our structure, namely H-2K(b). Click it,
going to NiceProt.
- Scroll way down, looking for a graphic at the right edge named
Feature aligner. Click it.
This should take you to
- Note that residues 1-21 are the signal sequence. Subtract 21
from the Feature aligner's sequence numbers to get the
mature protein numbers, e.g. the final Trp (W)
in the extracellular alpha-3 domain is 295 - 21 = 274.
Compare this to the C-terminal residue in 2VAA.
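The renumbering is a fixed offset, so it can be written as a pair of small helper functions. This sketch uses the offset of 21 given above for this protein; the function names are hypothetical, and other proteins will have signal sequences of different lengths.

```python
SIGNAL_OFFSET = 21  # offset given in this lesson for ha1b_mouse


def mature_number(precursor_number, offset=SIGNAL_OFFSET):
    """Convert a precursor (Feature aligner) number to a mature protein number."""
    if precursor_number <= offset:
        raise ValueError("residue lies within the signal sequence")
    return precursor_number - offset


def precursor_number(mature, offset=SIGNAL_OFFSET):
    """Convert a mature protein number back to the precursor numbering."""
    return mature + offset


print(mature_number(295))     # the final Trp of alpha-3, as in the example above
print(precursor_number(274))
```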
- Which domain faces away from the membrane of the antigen-presenting
cell, facing the T lymphocyte?
- Does MHC class I contain any Ig domains?
- What is the secondary structure of the peptide-binding domain?
- Which chains are coded by the host genome, and which by a foreign
- Do both host chains have allelic variability?
- In the chain with allelic variability, where do most of the allelic
differences cluster in the 3D structure?
- If your teacher has recently run 2VAA in ConSurf, use the
URL for the completed job to reduce strain on the ConSurf server.
- Click on Mol Info, and select ConSurf.
- At the ConSurf site, enter 2VAA, put A in the Chain slot,
and Submit the job.
- Residues critical for binding of CD8 are chain A, 223-229.
Where are these located?
Peptide and Presentation to T Cells
- How many residues are in the peptide?
- Which chains does the peptide contact?
- What would the T cell recognize as the 2VAA peptide's contribution to
- At PE's FrontDoor, under the slot, click on Comparator.
Enter into the two slots 2VAA and 2VAB and start
the Comparator session.
Do the following for each molecule:
- SELECT Chain B.
- DISPLAY Hide Sel.
- SELECT Chain A.
- DISPLAY Surface.
- SELECT Chain P.
- DISPLAY Spacefill.
- COLOR Polarity5.
- Answer the previous question for the 2VAB peptide's epitope.
- How does the groove accommodate the ends of the peptide?
- What are the origins of the two peptides?
Hint: Mol Info, Header.
Major Histocompatibility Class II.
At PE's FrontDoor, enter 1F3J in the slot, and press Enter.
Hint: Mol Info, PQS, from now on work with 1f3j_1!
- How does the secondary structure of class II compare with that
of class I?
- What are the ligands?
- What chains does the peptide bind to?
- Which chains have allelic variability?
- How long is the peptide?
- How does the groove accommodate the ends of the peptide?
- What is the origin of the peptide?
- How many MHC II molecules are present?
Other molecules of immunological interest:
- See the Immune System section of the Atlas of
Macromolecules available from the FrontDoor page of Protein Explorer
CD4 (human) complexed to HIV gp120 and Fab (human)
CD8 complexed to MHC class I,
(1BQH). Only the terminal domains of the CD8 alpha-alpha homodimer are present.
H-2Kb with VSV8 peptide (same as 2VAA).
T Cell Receptor,
complexed to HTLV1 viral peptide presented by HLA-A2 (1AO7).
Interferon Gamma, bovine, expressed in E. coli
Interleukin-4 complexed to alpha-chain of receptor, human, expressed in E. coli
Interleukin-1 receptor complexed to IL-1 receptor antagonist, human, expressed in E. coli
Chemokine IL-8, human (mutated), expressed in E. coli
Miscellaneous molecules including hemoglobin, lipid bilayers, the
rhinovirus capsid, ATP, olestra, NaCl, etc.
Information for Teachers of this lesson:
This lesson is designed for two 3-hour class meetings in a computer
lab. The topic is immunology, and since the QuickTour uses Gal4 (of
no immunological interest) as the main example,
this lesson bypasses the QuickTour.
Instead, the teacher should guide the students in how to use
PE, while using the antibody structures as examples. In my experience,
the best way to do this is to show one feature or tool, and then immediately give the
students a few minutes to try it out. This cycle should be repeated
for each new feature. This alternating cycle works much better than showing everything in
one long continuous introductory demonstration, and then expecting the students
to do it on their own afterwards.
If sufficient time remains (or for students with previous PE experience),
the students may be invited to tackle the questions on MHC using PE on
By design, this lesson plan does not list "click by click" instructions
for every question.
This design is intended to convey to the students that
PE is by nature a largely self-explanatory exploration tool, and
that you do not need detailed instructions in order to use it.
The goal is to give the students a sense of empowerment for future
use of PE on molecules that come up in their studies.
If you use this lesson plan in your class, please collect
students' assessments of their learning gains.
Teachers having no experience with PE should first do the QuickTour (allow
2-3 hours for a thorough trip),
and then use the principles learned there to work out how to demonstrate
the key points of antibody structure in a QuickTour-like sequence for
the students. (Allow another couple of hours for this.)
Not every point in the QuickTour needs to be
Answers to the questions are available by email request from teachers
who agree not to post them on the web. Contact
and include evidence of your status as a teacher, such as the URL for