Bioinformatics:
Visualization of 3D Antibody Structure and Evolution

by Eric Martz for Immunology Lab (Microbiology 542)
To use the links in this document, go to www.bio.umass.edu/immunology/molvis.htm

This lesson will take place in a computer lab, guided by your teacher. Here is information for teachers.

Objectives:

A brief definition of bioinformatics that I like is given by Nilges and Linge (Institut Pasteur, France): "Bioinformatics derives knowledge from computer analysis of biological data". It concerns large volumes of biological information, recently genomic sequences, gene expression data from microarrays, protein interactions, and three-dimensional ("3D") macromolecular structure, but in a broader sense it includes various other sources such as clinical trial data, neural networks, or the scientific literature. Bioinformatics encompasses research with, and applications of, such information, as well as the development of the supporting computational methods and tools. Other definitions: NIH; bioinformatics.org. As bioinformatics is a vast realm, we will deal with only a few topics in this class, selected because your teacher knows something about them and for their relevance to immunology.
  1. Searching for Macromolecular 3D Structures at the Protein Data Bank
    1. Open Netscape or Internet Explorer, and go to the Protein Data Bank (www.pdb.org).
    2. Search for "immunoglobulin".
      1. How many hits did you get?
    3. Return to the main page, and search for "intact antibody".
      1. How many hits did you get?
    4. Click on the EXPLORE link for 1IGT.
      1. What species did 1IGT come from?
      2. What is the isotype of 1IGT?

  2. Using Protein Explorer to answer questions about 3D structure
    1. Run Protein Explorer (PE) (proteinexplorer.org).
    2. Intact immunoglobulin. Enter 1IGT in the slot on PE's Frontdoor, and press Enter.

      In class, you will be taught how to answer the following questions. If you are working alone outside of class, follow the QuickTour linked to PE's FrontDoor, and afterwards use the methods you learn there to answer the following questions.
        In FirstView:
      1. How many chains does intact IgG have?
      2. Does the published 1IGT structure include water?
      3. What is reported when you click on the blue chain?

        In QuickViews:

        Ligand

      4. What is the "ligand"?
      5. Covalently attached carbohydrates are usually "N linked" (via the sidechain nitrogen in asparagine -- single-letter amino acid code "N"). Which atom of which amino acid is covalently linked to the carbohydrate? Use the following steps to answer this question:
        1. SELECT Ligand
        2. DISPLAY Ball & Stick
        3. DISPLAY Contacts
        4. Step-by-step, complete all steps, using the [Zoom+] button as needed.
        5. In the left middle frame, scroll down to "Controls for Contact Surfaces", and click Hide Non-contact surface regions.
        6. Hide Atoms inside surface.
        7. Examine the inside of the surface, looking for an atom that penetrates through the surface. This means it is close enough to the ligand to be covalently bound. Click on it to identify it.
      6. What carbohydrate residue is closest to (covalently bound to) the Asn?
          Click to show Bonded atoms inside surface.
          (This software does not show the covalent bond between the protein and carbohydrate as a stick -- but it should.)
      7. What does NAG stand for? Hint: click the "Ligand" button and read the left middle frame to learn how to get full names for ligand ("hetero" atom) abbreviations.

        Secondary Structure

      8. What secondary structures make up the antibody?
        1. Click Reset View.
        2. Return to QuickViews.
        3. Click the [2o] button.

        Immunoglobulin domains

      9. Which chains are the light chains?
        1. Click Reset View.
        2. Return to QuickViews.
        3. Click the light chains and notice their one-letter names reported in the message box.
      10. How many Ig domains are in a light chain?
        1. Show only chain A: SELECT Chain A, DISPLAY Only.
        2. Press the "Center" button, and cancel to center all selected atoms.
        3. Zoom!
        4. Now you can see that chain A has two immunoglobulin (Ig) domains. Each Ig domain consists of two beta-sheets "stapled" together with a disulfide bond. The 2 domains are connected by a linker chain.
      11. How many Ig domains are in a heavy chain?
      12. How many Ig domains are in an intact IgG?
      13. Approximately how many amino acids are in one Ig domain?
        It may help to SELECT All, COLOR N->C Rainbow.

        Fc

      14. Where is the Fc?
      15. How many Ig domains are in the Fc?
      16. What is the function of the carbohydrate in the Fc?

        Function

      17. How many epitope-binding sites are on an IgG, and where are they?

    3. Fab : lysozyme complex. Bring the FrontDoor window to the foreground, enter 1FDL in the slot, and press Enter.

        1FDL contains a single Fab bound to its cognate antigen, lysozyme.
      1. What are the names of the chains that make up the Fab?
      2. What is the secondary structure of lysozyme?
      3. What kinds of bonds hold the antigen to the antibody?
      4. What kinds of bonds hold the two chains of the Fab together?

        Single Immunoglobulin Domain

      5. What is the secondary structure of the constant domain of the light chain?
        1. Reset View
        2. Click on the light chain to find the range of amino acid sequence numbers for the constant domain.
        3. In QuickViews, DISPLAY Sequences, and select Seq3D.
        4. Select the Range option in Seq3D.
        5. Select your range in chain L (approx. 107-214), then close Seq3D.
        6. DISPLAY Only.
        7. Center, cancel.
        8. Zoom.
        9. DISPLAY Cartoon.
        10. COLOR Structure.
      6. How are hydrophobic and polar residues arranged in an Ig domain?
        1. DISPLAY Spacefill.
        2. COLOR Polarity3.
        3. Slab. (Align the Ig domain with its long axis perpendicular to the screen, and slide the slab plane in and out by holding down the Ctrl key while dragging up and down.)
      7. What is the explanation of the largest hydrophobic patch on the surface of this CL domain?
        1. Slab off.
        2. SELECT Chain H.
        3. DISPLAY Backbone.

        Epitope : Paratope Contacts

      8. Are the bonds between the antibody and the antigen hydrophobic, polar, or both?
        1. Reset View.
        2. SELECT Chain Y
        3. DISPLAY Contacts, step-by-step: After step 5, notice the elements that make up the noncovalently bonded atoms.
        4. Complete step 8.
      9. Do both heavy and light chains bind to the antigen?
        1. Scroll down in the left middle frame to "Controls for Contact Surfaces", and click "Atoms outside surface, 7 Å".
        2. Center, cancel (to center the bonded atoms).
        3. SELECT Protein.
        4. Use QuickViews Plus to change the DISPLAY mode so that new displays are added to existing ones (a red plus should appear above the DISPLAY menu).
        5. DISPLAY Cartoon.
        6. COLOR Chain.
      10. Do the beta strands of the variable Ig domains bind directly to antigen?
        COLOR Structure.
      11. Where are the CDR3's of the H and L chains bound to the antigen epitope?
        1. COLOR Element (CPK).
        2. (QuickViews Plus: new display is hidden) -DISPLAY Cartoon.
        3. (QuickViews Plus: new display is added) +DISPLAY Backbone.
        4. (QuickViews Plus: Backbone thickness 0.01.)
        5. Use Seq3D to color selected ranges green with no change to the display. Color chain L 90-97 and chain H 98-105.
        6. Contact controls: Atoms outside surface Bonded
        7. DISPLAY Spacefill.
        8. Contact controls: Surface Transparent
          (If you are interested, CDR1's are L 24-34 and H 31-35; CDR2's are L 50-56 and H 50-66. CDR's have approximately the same locations in different antibodies, but the exact residues where they begin and end must be looked up in a reference source for each antibody amino acid sequence.)

        Sizes: macromolecules vs. cells

      12. What is the approximate diameter of intact IgG?
          Hint: DISPLAY Distances.
      13. If a lymphocyte has a diameter of 10 micrometers, what is the diameter ratio for lymphocyte/antibody?
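For the ratio question above, both diameters must be expressed in the same unit before dividing. Here is a minimal Python sketch of that unit conversion, assuming your Protein Explorer measurement is in ångströms. The function name is hypothetical, and the 50 Å in the usage line is only a placeholder, not the measured size of IgG; substitute your own measurement.

```python
ANGSTROMS_PER_MICROMETER = 10_000  # 1 micrometer = 1e-6 m; 1 angstrom = 1e-10 m


def diameter_ratio(cell_diameter_um, molecule_diameter_angstrom):
    """Return the cell diameter divided by the molecule diameter (unitless)."""
    if molecule_diameter_angstrom <= 0:
        raise ValueError("molecule diameter must be positive")
    # Convert the cell diameter to angstroms so both values share one unit.
    cell_diameter_angstrom = cell_diameter_um * ANGSTROMS_PER_MICROMETER
    return cell_diameter_angstrom / molecule_diameter_angstrom


# Placeholder measurement only: replace 50.0 with your own value in angstroms.
print(diameter_ratio(10.0, 50.0))
```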

  3. Finding Patterns of Conservation and Mutation with ConSurf
    1. Meaning of conservation in 3D protein structure

        Patches of amino acids on the surface of a protein that are highly conserved in evolution often represent patches responsible for critical functions of the protein. For example, the catalytic site of an enzyme is typically more conserved in evolution than are other regions of the protein. In contrast, there are cases where regions of proteins undergo unusually high mutation rates in order to support the function of that protein.

      1. Can you think of a protein, related to infection or defense, where a high mutation rate in some residues is beneficial?

        The portions of amino acid sequence that perform a function, such as enzymatic catalysis, are often scattered in multiple sites in the linear sequence. When the protein folds, these sites come together to form a functional 3D structure. Since protein folding cannot be predicted reliably from sequence alone, locating the most conserved residues in a sequence does not tell you whether they fold together to form a functional site. On the other hand, when contiguous patches of conserved residues are identified in a known 3D protein structure, most likely they are functionally important.

        The molecule at left is enolase, an enzyme in energy metabolism. The dark residues are highly conserved in evolution from bacteria through yeast, insects and humans. They form a depression ringing the catalytic site. (The PDB file is 4ENL. For more information, see http://molvis.sdsc.edu/protexpl/pv_msa3d.htm)

        In order to find conserved patches, one must compare sequences for the same molecule from different species, or different individuals of the same species. The sequences must be found, selected, and aligned intelligently producing a multiple protein sequence alignment. An example of a portion of a sequence alignment for enolase is below, using single-letter amino acid code (listed in Protein Explorer's Help/Index/Glossary under Amino Acids). Bold residues are totally conserved; CAPITALIZED residues have mutations to residues with chemically similar sidechains; lower-case residues are mutated to residues with chemically different sidechains.
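Separately from the enolase example, the three-way marking described above (totally conserved, mutated to similar sidechains, mutated to different sidechains) can be sketched in Python. This is an illustration only. The grouping of amino acids into chemically similar classes is a simplification assumed for this sketch, not the scheme used to prepare the enolase alignment, and the three short sequences are invented.

```python
# Assumed, simplified grouping of amino acids by sidechain chemistry.
SIMILAR_GROUPS = [
    set("AVLIMC"),  # aliphatic / hydrophobic
    set("FWY"),     # aromatic
    set("ST"),      # small, hydroxyl
    set("NQ"),      # amide
    set("DE"),      # acidic
    set("KRH"),     # basic
    set("G"),
    set("P"),
]


def classify_column(residues):
    """Classify one alignment column as identical, similar, different, or gap."""
    distinct = {r.upper() for r in residues}
    if "-" in distinct:
        return "gap"
    if len(distinct) == 1:
        return "identical"
    if any(distinct <= group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"


def classify_alignment(sequences):
    """Classify every column of an alignment given as equal-length strings."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [classify_column(column) for column in zip(*sequences)]


# Invented toy alignment (not enolase).
toy = ["GDSKA", "GESRT", "GDSKW"]
print(classify_alignment(toy))
```

A real alignment viewer would typically use a substitution matrix rather than fixed groups; the fixed groups keep the sketch short.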

      2. Can you think of immune system proteins where it would be useful to compare the different protein molecules within the same individual?

    2. Using ConSurf to Identify Conserved Patches

        Until 2001, identifying conserved patches of residues in 3D protein structures was a laborious process. You had to search for and select amino acid sequences, align them, and use the alignment to color a 3D structure. (Instructions for making custom alignments using Biology Workbench are available within Protein Explorer: go to QuickViews, then Advanced Explorer, then MSA3D.)

        The ConSurf Server, which became available in 2001, makes this process much easier, and uses the most sophisticated methods. It is the first server to automate the entire process. ConSurf does the following steps completely automatically:

        1. Obtains the amino acid sequence of the specified 3D chain in the specified PDB file.
        2. Searches the SWISS-PROT protein sequence database (using PSI-BLAST) to find the most similar sequences.
            SWISS-PROT is probably the highest quality protein sequence database. In 2002, the US NIH announced funding to make it the centerpiece of a central international facility named UniProt.
          By default, it uses up to 50 sequences with E < 0.001.
            The Expectation value E is the number of hits Expected by chance with the sequence matching level observed, taking into account the size of the sequence database and length of the query sequence. Low values of E mean increasing significance of the match. (NCBI BLAST FAQ)
        3. Aligns sequences (with the CLUSTAL-W algorithm).
          Optionally, the user may supply the sequence alignment.
        4. Constructs a phylogenetic tree from the sequence alignment.
        5. Calculates conservation grades for each residue in the 3D chain
          using a new maximum likelihood algorithm.
        6. Displays the 3D molecule colored by conservation grades in Protein Explorer.
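The default selection rule in step 2 (up to 50 sequences with E < 0.001) can be sketched in Python. The sketch assumes the textbook relation between a BLAST bit score S' and the expectation value, E = m * n * 2^(-S'), where m is the query length and n is the total length of the database. PSI-BLAST applies further corrections, so the numbers are approximate. The function names, hit names, scores, and database size are invented for illustration.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def select_homologues(hits, query_length, database_length,
                      max_e=0.001, max_sequences=50):
    """Keep hits with E below max_e, most significant first.

    hits is a list of (name, bit_score) pairs.
    Returns a list of (name, E) pairs, at most max_sequences long.
    """
    scored = [(name, expect_value(score, query_length, database_length))
              for name, score in hits]
    kept = [hit for hit in scored if hit[1] < max_e]
    kept.sort(key=lambda hit: hit[1])  # lowest E (most significant) first
    return kept[:max_sequences]


# Invented hits: (name, bit score).
hits = [("seq_a", 60.0), ("seq_b", 30.0), ("seq_c", 45.0)]
for name, e in select_homologues(hits, query_length=200,
                                 database_length=40_000_000):
    print(name, f"{e:.2e}")
```

Note how the same bit score gives a larger E in a larger database: more sequence to search means more matches expected by chance.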

        The maximum likelihood method employed by ConSurf is superior to previous methods, such as maximum parsimony. It is less sensitive to high redundancy or a few outliers in the alignment, giving a more robust result. Mutations in distantly related sequences are devalued since more mutations are expected between long branches of the phylogenetic tree. Thus, conservation will be detected despite inclusion of distantly related sequences. Yet conservation in closely related sequences is devalued, since it is expected. (At ConSurf, see Maximum Likelihood Method and Advantages of Using Phylogenetic Trees.)

    3. Conservation of MHC (ConSurf Gallery)
      1. What domain of MHC class I has the most conserved patches?
        Domains are named starting with the amino terminus: alpha-1, alpha-2 (these two form the groove), alpha-3 (pairing with beta-2 microglobulin).
        1. If you are running Protein Explorer, in QuickViews, DISPLAY Evolution.
        2. Otherwise go to consurf.tau.ac.il.
        3. Look in ConSurf's GALLERY.
        4. Click on Example 1. MHC Class I Heavy Chain to see the result in Protein Explorer.
      2. What function must be served by at least one of the conserved patches in the alpha-3 domain?
      3. What accounts for the high mutation rate in the groove?
      4. What function do you suppose is served by the highly conserved pit in the floor of the groove?

    4. Conservation of Antibody
      1. What is the pattern of conservation for the variable domain of immunoglobulin?

        ConSurf has a tendency to find alignments for only one domain of a multidomain chain. For many antibody heavy and light chains (e.g. 1FDL) it happens to align primarily the constant domains. One way to force ConSurf to process an antibody variable domain is to run a PDB file that contains only a single variable domain. 1IVL contains only the variable domain of an antibody light chain, VL.

        1. Display 1IVL in Protein Explorer. The two domains are copies of the same VL (which you could verify with DISPLAY Sequences).
        2. DISPLAY Evolution.
        3. At ConSurf, in the Chain Identifier slot, specify either chain (A or B).
        4. Increase the Max. Number of Homologues to 150.
        5. To avoid overwhelming the ConSurf server with dozens of requests to process the same job, here is a link to the result from a previous submission.
            If you were submitting a different job, at this point you would press the Submit button. You should then bookmark the job page so you can come back to it later without recalculating the job.
        6. After the job is completed (or at the result of the previous submission), examine the Multiple Sequence Alignment. Look for regions where 1IVL has residues (not dashes), but only a few sequences are aligned out of the total of 150. The conservation scores for such regions are unreliable. In this case, there are none. If you run 1FDL, the entire variable domain is unreliable! (Example.) It is planned that a future version of ConSurf will identify these unreliable regions in the 3D display.
        7. At the results for 1IVL, click View ConSurf Results with Protein Explorer.
      2. What are the three ranges of residues with high mutation rates?
        1. Click ConSurf Seq3D.
        2. In the Seq3D window, touch the first and last turquoise (least conserved) residues in each of the 3 ranges of turquoise residues, noting their sequence number ranges.
      3. How do these 3 high-mutation sequence ranges compare with the ranges given above for the 3 CDR's of the VL of 1FDL?
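The reliability check described in step 6 of the 1IVL procedure above (positions where the query has a residue but few homologues are aligned) can be sketched in Python. The function name, the default threshold, and the toy alignment are assumptions for illustration; ConSurf's own criteria may differ. Residue numbers here simply count along the query sequence from 1 and need not match the numbering in the PDB file.

```python
def sparsely_aligned_positions(query, homologues, min_fraction=0.1):
    """Return query residue numbers where few homologues are aligned.

    query and homologues are rows of one alignment (equal-length strings,
    '-' for gaps). A query residue is flagged when the fraction of
    homologues with a residue in that column is below min_fraction.
    """
    if not homologues:
        raise ValueError("at least one homologue is required")
    if any(len(h) != len(query) for h in homologues):
        raise ValueError("all rows of the alignment must have equal length")

    flagged = []
    residue_number = 0  # counts only non-gap query positions, starting at 1
    for column, query_char in enumerate(query):
        if query_char == "-":
            continue  # the query has no residue here, so nothing to score
        residue_number += 1
        aligned = sum(1 for h in homologues if h[column] != "-")
        if aligned / len(homologues) < min_fraction:
            flagged.append(residue_number)
    return flagged


# Invented toy alignment: the last query residue has no aligned homologues.
query = "MK-TAY"
homologues = ["MKQTA-", "MK-T--", "-K-TA-", "MR-T--"]
print(sparsely_aligned_positions(query, homologues, min_fraction=0.5))
```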


    Optional: If you have time, use Protein Explorer to answer the following questions.

    1. Major Histocompatibility Class I. At PE's FrontDoor, enter 2VAA in the slot, and press Enter.
      1. How many chains are present?
      2. What is each chain called?
      3. Where is the transmembrane region?
        1. COLOR N->C terminus
        2. Mol Info, PDB Header, click Other Sources, then the small All under the Sequence Data block. Notice that under SwissProt, the description of ha1b_mouse agrees with our structure, namely H-2K(b). Click it, going to NiceProt.
        3. Scroll way down, looking for a graphic at the right edge named Feature aligner. Click it. This should take you to http://us.expasy.org/cgi-bin/ft_aligner?P01901.
        4. Note that residues 1-21 are the signal sequence. Subtract 21 from the Feature aligner's sequence numbers to get the mature protein numbers, e.g. the final Trp (W) in the extracellular alpha-3 domain is 295 - 21 = 274. Compare this to the C-terminal residue in 2VAA.
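The renumbering in the step above is a fixed offset between precursor numbering (signal sequence included) and mature protein numbering. Here is a minimal Python sketch, using the offset of 21 from the text; the function name is hypothetical.

```python
SIGNAL_OFFSET = 21  # the value subtracted in the lesson text


def mature_number(precursor_number, offset=SIGNAL_OFFSET):
    """Convert a precursor sequence number to mature protein numbering."""
    if precursor_number <= offset:
        raise ValueError("position lies within the signal sequence")
    return precursor_number - offset


print(mature_number(295))  # 274, the Trp example from the text
```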
      4. Which domain faces away from the membrane of the antigen-presenting cell, facing the T lymphocyte?
      5. Does MHC class I contain any Ig domains?
      6. What is the secondary structure of the peptide-binding domain?
      7. Which chains are coded by the host genome, and which by a foreign microbe?

        Alleles

      8. Do both host chains have allelic variability?
      9. In the chain with allelic variability, where do most of the allelic differences cluster in the 3D structure?
        1. If your teacher has recently run 2VAA in ConSurf, use the URL for the completed job to reduce strain on the ConSurf server.
          Otherwise:
        2. Click on Mol Info, and select ConSurf.
        3. At the ConSurf site, enter 2VAA, put A in the Chain slot, and Submit the job.

        CD8

      10. Residues critical for binding of CD8 are chain A, 223-229. Where are these located?

        Peptide and Presentation to T Cells

      11. How many residues are in the peptide?
      12. Which chains does the peptide contact?
      13. What would the T cell recognize as the 2VAA peptide's contribution to the epitope?
        1. At PE's FrontDoor, under the slot, click on Comparator. Enter into the two slots 2VAA and 2VAB and start the Comparator session.
          Do the following for each molecule:
        2. SELECT Chain B.
        3. DISPLAY Hide Sel.
        4. SELECT Chain A.
        5. DISPLAY Surface.
        6. SELECT Chain P.
        7. DISPLAY Spacefill.
        8. COLOR Polarity5.
      14. Answer the previous question for the 2VAB peptide's epitope.
      15. How does the groove accommodate the ends of the peptide?
      16. What are the origins of the two peptides?
          Hint: Mol Info, Header.

    2. Major Histocompatibility Class II. At PE's FrontDoor, enter 1F3J in the slot, and press Enter.
      1. How many MHC II molecules are present?
          Hint: Mol Info, PQS, from now on work with 1f3j_1!
      2. How does the secondary structure of class II compare with that of class I?
      3. What are the ligands?
      4. What chains does the peptide bind to?
      5. Which chains have allelic variability?
      6. How long is the peptide?
      7. How does the groove accommodate the ends of the peptide?
      8. What is the origin of the peptide?

Other molecules of immunological interest:

 
Information for Teachers of this lesson:

This lesson is designed for two 3-hour class meetings in a computer lab. The topic is immunology, and since the QuickTour uses Gal4 (of no immunological interest) as the main example, this lesson bypasses the QuickTour. Instead, the teacher should guide the students in how to use PE, while using the antibody structures as examples. In my experience, the best way to do this is to show one feature or tool, and then immediately give the students a few minutes to try it out. This cycle should be repeated for each new feature. This alternating cycle works much better than showing everything in one long continuous introductory demonstration, and then expecting the students to do it on their own afterwards.

If sufficient time remains (or for students with previous PE experience), the students may be invited to tackle the questions on MHC using PE on their own.

By design, this lesson plan does not list "click by click" instructions for every question. This design is intended to convey to the students that PE is by nature a largely self-explanatory exploration tool, and that you do not need detailed instructions in order to use it. The goal is to give the students a sense of empowerment for future use of PE on molecules that come up in their studies.

If you use this lesson plan in your class, please collect the students' assessments of their learning gains.

Teachers having no experience with PE should first do the QuickTour (allow 2-3 hours for a thorough trip), and then use the principles learned there to work out how to demonstrate the key points of antibody structure in a QuickTour-like sequence for the students. (Allow another couple of hours for this.) Not every point in the QuickTour needs to be covered.

Answers to the questions are available by email request from teachers who agree not to post them on the web. Contact emartz@microbio.umass.edu and include evidence of your status as a teacher, such as the URL for your faculty website.