NucleaRDB: Correlated mutation analysis (CMA)

Copyright (C) 2002, NucleaRDB.

Remark:
This document was written for the GPCRDB. Although CMA is not molecule-specific, some parts of the text below are GPCR-specific (e.g. the snakes are 2D diagrams used to visualize GPCRs). In the NucleaRDB, positions detected by CMA can be visualized via multiple sequence alignments in an Mview-like format.


Introduction to Correlated Mutational Analysis

Correlated mutation analysis (CMA for short) is a powerful technique to determine 'important' residues if you have a multiple sequence alignment available. The first two questions will of course be "What do you mean by important?" and "How and why does it work?"

Before explaining CMA, we will explain how you can pragmatically use this information.

Suppose you have some ideas about mutating residues that influence the dimerisation of a receptor. Unfortunately, it looks as if 22 mutations are all equally likely candidates, but the student has time to make at most six more mutations before graduation. In that case you should mutate the six residues with the highest CMA score.
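
The selection step described above can be sketched in a few lines of Python. The position numbers and scores below are invented for illustration only, and the function name top_candidates is hypothetical; real CMA scores would come from the database.

```python
def top_candidates(cma_scores, n=6):
    """Return the n candidate positions with the highest CMA score."""
    ranked = sorted(cma_scores.items(), key=lambda item: item[1], reverse=True)
    return [position for position, _ in ranked[:n]]


# Hypothetical candidate positions and scores (not taken from any real analysis).
example_scores = {
    101: 0.62, 136: 1.00, 137: 0.41, 210: 0.88,
    273: 0.97, 274: 0.55, 301: 0.73, 322: 0.91,
}
shortlist = top_candidates(example_scores)
print(shortlist)
```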

An example

Let's take a look at the CMA snake below. Don't try to click on this plot: that does not work here, but all the real CMA snakes in the GPCRDB are clickable.

A CMA snake

You see several coloured residues. In a real CMA snake, you can click on those, and that will take you directly to the corresponding location in the multiple sequence alignment. So, clicking one after the other on the three red residues, you would see:

    64     0  Q   Q      3.05 0.10        27   90  QQQQQQQQQQQQQQEEEEEQEEQQQEE
    65     0  H   H      3.11 0.20        27  100  HHHHHHHHHHHHHHHHHHHHHHHHHHH
    66     0  K   K      3.12 0.20        27  100  KKKKKKKKKKKKKKKKKKKKKKKKKKK
    67     0  K   K      3.15 0.21        27  100  KKKKKKKKKKKKKKKKKKKKKKKKKKK
    68     0  L   L      3.12 0.20        27  100  LLLLLLLLLLLLLLLLLLLLLLLLLLL
    69     0  R   R      3.11 0.17        27  100  RRRRRRRRRRRRRRRRRRRRRRRRRRR
    70     0  T   T      3.03 0.10        27   93  TTTTTTTTTSTTTTTTTTTTTTQTTTT

   136   341  Y   Y      8.00 2.00        27   80  YYYYYYYYYYYYYYWWWWWYWWYYYWW
   137     0  V   V      2.94 0.05        25   70  --VVVVVVIIVVIIMVMVVVMLMIILV
   138     0  V   V      3.11 0.14        27  100  VVVVVVVVVVVVVVVVVVVVVVVVVVV
   139     0  V   V      3.04 0.10        27   90  VVIVVVVVVVVVVVVVVVVIVVVIIVV
   140     0  C   C      3.15 0.21        27  100  CCCCCCCCCCCCCCCCCCCCCCCCCCC
   141     0  K   K      3.12 0.14        27  100  KKKKKKKKKKKKKKKKKKKKKKKKKKK
   142     0  P   P      3.05 0.11        27  100  PPPPPPPPPPPPPPPPPPPPPPPPPPP

   273   626  F   F      8.00 2.00        27   80  FFFFFFFMFFFFFFWWWWWFWWFFFWW
   274   627  Y   Y      3.06 0.15        27   96  YYYYYYYYYYYYYYYYYYWYYYYYYYY
   275   628  I   I      3.11 0.11        27  100  IIIIIIIIIIIIIIIIIIIIIIIIIII
   276     0  F   F      3.07 0.11        27  100  FFFFFFFFFFFFFFFFFFFFFFFFFFF
   277     0  T   T      2.91 0.07        27   87  TTTTTTTTTSTSTTTTTTTTTTITTST
   278     0  H   H      2.86 0.05        27   74  HHHHHHHHHNNNHHHHHNNHHHNHHNH
   279     0  Q   Q      3.02 0.15        27  100  QQQQQQQQQQQQQQQQQQQQQQQQQQQ
The columns mean:
    |     |   |   |        |    |          |   |      |
    |     |   |   |        |    |          |   |       \--> These are the
    |     |   |   |        |    |          |   |            residues at position
    |     |   |   |        |    |          |   |            279 in the 27 aligned
    |     |   |   |        |    |          |   |            sequences.
    |     |   |   |        |    |          |   |
    |     |   |   |        |    |          |    \--> The variability at this 
    |     |   |   |        |    |          |         position in the alignment.
    |     |   |   |        |    |          |         100 means totally conserved.
    |     |   |   |        |    |          |         Below 40, things are doubtful.
    |     |   |   |        |    |          |    
    |     |   |   |        |    |           \--> The number of sequences in the   
    |     |   |   |        |    |                alignment that has a residue at
    |     |   |   |        |    |                this position.
    |     |   |   |        |    |
    |     |   |   |        |     \--> The gap elongation penalty used at this
    |     |   |   |        |          position in the alignment
    |     |   |   |        |     
    |     |   |   |         \--> The gap open penalty used at this position 
    |     |   |   |              in the alignment
    |     |   |   |             
    |     |   |    \--> This column is of no value (yet)
    |     |   |   
    |     |    \--> The consensus sequence at this position in the alignment   
    |     |      
    |      \--> The so called arbitrary sequence number (the same position in      
    |           any GPCR alignment always gets this same number, or in other words
    |           every Arg in every DRY motif, for example, always gets number 340).
    |  
     \--> This is simply the sequential number in the alignment. This number has no
          value whatsoever and is only needed if you want to compare the profile
          with the alignment.
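
If you want to process such profile lines yourself, the column description above translates into a small parser. This is a minimal sketch that assumes the nine columns are separated by whitespace and that gaps are written as '-' (as in the rows shown); the names ProfileRow and parse_profile_row are hypothetical and not part of any GPCRDB tool.

```python
from dataclasses import dataclass


@dataclass
class ProfileRow:
    alignment_index: int     # sequential number in the alignment
    arbitrary_number: int    # arbitrary sequence number (0 in several rows above)
    consensus: str           # consensus residue at this position
    unused: str              # the column described as "of no value (yet)"
    gap_open: float          # gap open penalty
    gap_elongation: float    # gap elongation penalty
    n_sequences: int         # sequences that have a residue at this position
    variability: int         # 100 means totally conserved
    residues: str            # one character per aligned sequence


def parse_profile_row(line: str) -> ProfileRow:
    """Split one whitespace-separated profile line into named fields."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    return ProfileRow(
        int(fields[0]), int(fields[1]), fields[2], fields[3],
        float(fields[4]), float(fields[5]), int(fields[6]), int(fields[7]),
        fields[8],
    )


row = parse_profile_row(
    "   136   341  Y   Y      8.00 2.00        27   80  YYYYYYYYYYYYYYWWWWWYWWYYYWW"
)
print(row.arbitrary_number, row.consensus, row.variability)
```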

Things get clearer when we put the three red residues directly underneath each other:

                                                   123456789012345678901234567890
    64     0  Q   Q      3.05 0.10        27   90  QQQQQQQQQQQQQQEEEEEQEEQQQEE
   136   341  Y   Y      8.00 2.00        27   80  YYYYYYYYYYYYYYWWWWWYWWYYYWW
   273   626  F   F      8.00 2.00        27   80  FFFFFFFMFFFFFFWWWWWFWWFFFWW
Now you see why we call this correlated. Let's forget the M at position 626 in sequence 8 and the entire contribution of sequence 20. You then see that these three sequence positions are not conserved, but if one position is different between any two sequences, all three are different, and on top of that, the changes going from one sequence to the other are always between the same residue types.
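
One way to make this visible is to count which residue combinations occur across the 27 sequences. The sketch below copies the three residue strings from the rows above; the variable names are only illustrative.

```python
from collections import Counter

# Residue strings copied from the three rows above (one character per sequence).
row_64 = "QQQQQQQQQQQQQQEEEEEQEEQQQEE"
row_136 = "YYYYYYYYYYYYYYWWWWWYWWYYYWW"
row_273 = "FFFFFFFMFFFFFFWWWWWFWWFFFWW"

# Each sequence contributes one (residue, residue, residue) combination.
combos = Counter(zip(row_64, row_136, row_273))
for combo, count in combos.most_common():
    print("".join(combo), count)
```

Only three combinations appear: Q/Y/F, E/W/W, and the single Q/Y/M caused by the methionine in sequence 8.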

Before you decide on trusting these CMA results, a few critical notes about them.

CMA explained

The text listed below is an early version of:

Sequence-function correlation in G protein-coupled receptors. W. Kuipers, L. Oliveira, A.C.M. Paiva, F. Rippmann, C. Sander, G. Vriend, C.G. Kruese, I. van Wijngaarden, A.P. IJzerman. In: Membrane protein models. Chapter 2. Eds J. Findlay (1995) BIOS Sci. Pub. Ltd.




ABSTRACT

G protein-coupled receptors (GPCRs) perform several functions such as ligand binding, signal transduction and G protein activation. A computational method is presented that determines which residues are important for these functions. The method uses correlation analysis to recognise residue patterns that correspond with functions. The basic idea is very simple: if, for example, a residue is conserved in all proteins that bind to the same agonist, then it could be involved in this binding.

This sequence pattern correlation technique was used to find the residues that determine the optimal wavelength of the photon absorbed by the retinal in opsins. The technique was also useful in the analysis of residues which play a role in ligand binding in several classes of biogenic amine receptors. An important aim of this work is to automatically detect residues that may be interesting targets for future mutagenesis studies.

INTRODUCTION

G protein-coupled receptors (GPCRs) form a large superfamily of proteins that transduce signals across the cell membrane. At the external side they receive a ligand, or, in the case of opsins, a photon, and at the cytosolic side they activate a G protein. GPCRs can be divided into three main families: rhodopsin-like, glucagon/secretin-like and metabotropic-like receptors (Oliveira et al. 1993, Kolakowsky 1993). All GPCRs consist of one single protein chain that crosses the membrane seven times, similar to bacteriorhodopsin (Henderson et al. 1990). Most ligands bind between the membrane helices, presumably at a position similar to that of the retinal in bacteriorhodopsin. Sometimes the periplasmic loops are also involved in ligand recognition. The second and third cytosolic loops and part of the (cytosolic) C-terminal end of the receptors are involved in G protein recognition and binding. Hundreds of GPCRs exist, and they can be activated by a multitude of agonists. Five steps can be observed in the process of GPCR activation:

Most experimental data available to date relate to either ligand binding or G protein interaction. It was shown that these two functions involve distinctly different sets of residues (Oliveira et al., 1994).

One of the central paradigms in protein research is that the sequence determines the structure, and the structure the function. Many G protein-coupled receptor sequences have been determined, and much functional data is available. Structural data, however, is not available, and modelling studies have not yet yielded adequately accurate models to allow for the inference of function. It would therefore be useful if in the sequence -> structure -> function pathway the structure could be skipped, or in other words if functional data could be abstracted directly from the sequence without the need for structure determination or modelling studies. Multiple sequence alignments provide a powerful tool for such analyses (Casari et al. 1994).

Ever since Eck and Dayhoff (Eck and Dayhoff 1969) published how evolutionary changes in proteins could be described by a residue exchange matrix, the question of which matrix is best has remained under debate. Eck and Dayhoff determined a matrix that describes the likelihood of mutations during evolution. This matrix can be modified for use as a scoring matrix in sequence alignment procedures. In this case the matrix is used to determine the likelihood that two residues occur at equivalent positions in a sequence alignment.

However, at different positions in a protein the same mutations are not equally likely to occur, and one single scoring matrix for all positions in the sequences to be aligned is often not adequate. Overington et al. (1992) extended Dayhoff's idea by using multiple matrices: one for residues in a helix, one for residues in a beta strand, etc. Scharf (1988) and Bowie et al. (1991) went one step further and created one exchange matrix for each position in the sequence. These so-called structure-based profiles can of course only be made if a three-dimensional structure for at least one of the family members is available. Sander and Schneider (1991) described structure-based multiple sequence alignments, which are sequence profiles for protein structures that are produced from multiple sequence alignments.

GPCRs partly reside in an aqueous environment, and are partly embedded in a lipid membrane. Thus, very different physicochemical constraints apply to the residues, depending on their spatial location. Consequently, evolutionary pressure is different for different positions, and one single Dayhoff-type matrix is inadequate for GPCR sequence alignment. We therefore use an iterative sequence alignment similar to the one described by Sander and Schneider (Sander & Schneider, 1991). We do not use a scoring matrix, but make the profile relate directly to the frequencies of the residues at each position in the multiple sequence alignment.

The main aim of most GPCR modelling studies in medicinal research is the design of (specific) agonists or antagonists. For these studies, knowledge of the residues involved in ligand binding is very important. Much indirect evidence points to the idea that the same drug often binds in structurally related receptors in a similar manner. In these cases, a correlation between the presence or absence of certain key receptor residues and the affinity for a specific drug can be expected.

Biogenic amine receptors are the best-studied class among the GPCRs. The abundance of experimental receptor binding data that emerged from these studies forms the basis for our analyses, in which we perform a correlated mutation analysis between ligand affinities and receptor sequences. The many available mutation data enable verification of the results of the analyses.

For many GPCRs residues have been experimentally identified that are involved in agonist binding. In contrast to all other GPCRs, opsins are not activated by an agonist, but by a photon that triggers a cis-trans isomerization in the covalently bound retinal. This retinal is located between the transmembrane helices, roughly at the same position as where the ligands are believed to bind in many other GPCRs. It seems likely that characteristics of the cis-trans isomerization, such as the optimal wavelength of the photon that causes this isomerization, are in part determined by residues in the vicinity of the retinal.

We present a simple method to analyse sequence - function relationships of GPCRs. The method searches for correlated mutations in multiple sequence alignments. We used the method to search for residues that determine the optimal wavelength for retinal activation in opsins, and to analyse which residues are important for agonist or antagonist binding in serotonin receptors and adrenoceptors.

DATA AND METHODS

Sequences

In this article many different GPCR sequences are discussed. We use one numbering scheme for all sequences. Figure 1 shows a multiple sequence alignment for many of the GPCRs used in this study. In this figure the unified numbering scheme is indicated. This scheme is similar to the one used by Hibert (Hibert et al., 1991), and was endorsed by participants of the second EMBL GPCR workshop.

To enhance the signal-to-noise ratio in our analyses we used as many sequences as possible. We thus included sequences from many species. In a few cases the sequence differences between distantly related species (e.g. mammals and insects) were too large, and the study was restricted to mammals.

Sequences were extracted from the Swissprot database (Bairoch), the PIR database (PIR), the translated GenBank (NCI) and from the GCRDb database (Kolakowsky, 1993). The program WHAT IF (Vriend, 1990) was used for database searches, multiple sequence alignments and correlated mutation analyses. This program is available from one of us (GV) for a minimal fee.

Iterative unbiased profile alignment of sequences

We use a profile-based alignment method similar to the one described by Sander and Schneider (1991). All biases that could be introduced by the use of a scoring matrix are removed by making profiles relate directly to the frequency of occurrence of residue types at each position in the protein. Sequences are aligned against this profile, rather than against all other sequences.
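
The idea of a profile that relates directly to residue frequencies can be illustrated as follows. This is a simplified sketch, not the WHAT IF implementation: it assumes the sequences are already placed in columns (gaps written as '-'), skips the actual alignment step with its gap penalties, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def build_profile(aligned_sequences):
    """Per-position residue frequencies from equal-length aligned sequences."""
    profile = []
    for column in zip(*aligned_sequences):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        # An all-gap column yields an empty frequency table.
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


def score_against_profile(sequence, profile):
    """Sum of the profile frequencies of the residues found in the sequence."""
    return sum(freqs.get(res, 0.0) for res, freqs in zip(sequence, profile))


# Toy alignment, invented for illustration.
toy_alignment = ["ACDE", "ACDF", "ACEE"]
toy_profile = build_profile(toy_alignment)
toy_score = score_against_profile("ACDE", toy_profile)
print(toy_profile[3], round(toy_score, 2))
```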

This creates a problem: in order to do the alignment, we need a profile, and in order to get a profile, we need an alignment. To solve this problem, a multiple sequence alignment is first made on a subset of the sequences that show high pairwise homologies. From this initial (very easy to perform) alignment a profile is created. This profile is then used to align all sequences of interest. The aligned sequences are sorted according to their similarity to the profile, and a new profile is made from the highest scoring sequences. This process is repeated until all sequences are incorporated. In every iteration more sequences are incorporated than in the previous one. Sequences that show too little similarity with the consensus sequence of the profile are not used, because there is no guarantee that they belong to the same structural family. After incorporating all sequences, the alignment procedure is iterated a few more steps. In practice, this iterative alignment method has been shown to normally converge to a satisfactory solution for multiple sequence alignments in fewer than 5 cycles, provided enough sequences are available (typically 20-30 sequences are enough).
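
The bootstrapping loop can be sketched as below. Several simplifications are assumed here and are not taken from the original method: sequences are treated as already placed in columns (no realignment per cycle), similarity is the mean profile frequency of a sequence's residues, and the parameters batch and min_similarity are hypothetical.

```python
from collections import Counter


def profile_from(members):
    """Residue counts per alignment column."""
    return [Counter(column) for column in zip(*members)]


def similarity(sequence, profile, n_members):
    """Mean frequency, in the profile, of the residues of this sequence."""
    hits = sum(column[res] for res, column in zip(sequence, profile))
    return hits / (n_members * len(sequence))


def iterative_profile(seed, candidates, batch=2, min_similarity=0.5):
    """Grow the member set from a seed alignment, best-scoring sequences first."""
    members, remaining = list(seed), list(candidates)
    while remaining:
        profile = profile_from(members)
        n = len(members)
        ranked = sorted(remaining, key=lambda s: similarity(s, profile, n), reverse=True)
        accepted = [s for s in ranked[:batch] if similarity(s, profile, n) >= min_similarity]
        if not accepted:
            remaining = ranked
            break  # everything left is too dissimilar and stays excluded
        members.extend(accepted)
        remaining = ranked[len(accepted):]
    return members, remaining


# Toy data, invented for illustration.
members, excluded = iterative_profile(
    seed=["ACDE", "ACDE", "ACDF"],
    candidates=["ACEE", "WWWW", "ACDE"],
)
print(members, excluded)
```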

Analysis of correlated mutations

Correlated mutational behaviour can be defined as the tendency of residues to stay conserved or to mutate in tandem between (sets of) sequences. Several techniques have been described to analyse correlated mutations (e.g. Goebel et al., 1993). Goebel et al. found that the correlation coefficient is related to the chance that the residue pair is in contact in the structure. In their method completely conserved residues give rise to a high correlation, indicating their importance for structural integrity. This procedure works well if structure analysis or structure prediction is the primary aim. To determine functional correlations, however, a slightly different approach is required (Casari et al., 1994; Oliveira et al. 1993). Oliveira et al. (1993) used an approach similar to that of Goebel et al. (1993) to determine the correlation between residue positions in GPCRs, but always required a certain degree of variability at the positions that are analysed for correlating behaviour. Figure 2 shows an example of correlated mutational behaviour.

Figure 2. Example of correlated mutations.

Seq.     5   10            Sequence position
 #       |    |   
  1  AAAASSSSTTTT          Positions 2 and 3 are compared with this one
  2  RRRRPPPPHHHH  66 1.00 All residues correlate perfectly with position 1
  3  TTTTGGGGEEED  63 0.95 One residue is not correlated with position 1
In this hypothetical multiple sequence alignment, 2 residue positions are compared with the first position. For each pair of positions, the residues are compared between all pairs of sequences, leading to 12*(12-1)/2=66 comparisons. If, for a given pair of sequences, the residues at both positions are either conserved or mutated, a score of 1 is given. The correlation between a pair of positions is defined as the score divided by the maximal score (66). The sequences are given vertically, aligned residues in these sequences horizontally. So, three sequence positions are given in twelve sequences.
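
The scoring of Figure 2 can be reproduced with a short Python sketch. It implements only the rule stated in the caption (score 1 when both positions are conserved or both are mutated between two sequences); the function name is hypothetical, and the variability requirement mentioned in the text is not included.

```python
from itertools import combinations


def pairwise_correlation(pos_a, pos_b):
    """Score two alignment positions over all pairs of sequences."""
    pairs = list(combinations(range(len(pos_a)), 2))
    score = 0
    for i, j in pairs:
        conserved_a = pos_a[i] == pos_a[j]
        conserved_b = pos_b[i] == pos_b[j]
        if conserved_a == conserved_b:  # both conserved or both mutated
            score += 1
    return score, score / len(pairs)


position_1 = "AAAASSSSTTTT"
position_2 = "RRRRPPPPHHHH"
position_3 = "TTTTGGGGEEED"

print(pairwise_correlation(position_1, position_2))
print(pairwise_correlation(position_1, position_3))
```

For position 3, the three sequence pairs that combine sequence 12 with sequences 9, 10 and 11 are conserved at position 1 but mutated at position 3, which gives 66 - 3 = 63.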

Using this method many functional residues in GPCRs were detected (Oliveira et al., 1993), and a hypothesis for the signal transduction pathway could be formulated (Oliveira et al., 1994).

However, rather than analysing if residue positions show pairwise correlated mutational behaviour, residues can also be correlated with sequence-related parameters. Parameters that can be used are, for example, the binding constant for a certain ligand or the sub-class of receptors the sequence belongs to. These parameters can be encoded in a single character called a pseudo residue. This pseudo residue can be used in the correlation analysis as if it were a normal residue.

We tried several simple scoring schemes to determine correlations between pseudo residues and real residues. Figure 3 shows the scheme that appeared to be most useful when analysing functional characteristics. This scheme is very similar to the one described in figure 2. The main difference is that residue positions are not compared pairwise, but only with the pseudo residue.

A residue is called uncorrelated if it is different from the majority of residues that correspond with the same pseudo residue. If it is the same as the majority of residues corresponding to another pseudo residue, it is called anti-correlated.

Figure 3. Example of scoring correlated mutations.

         5   10 
         |    |   
 -1  111122223333          Pseudo residue
  1  RRRRPPPPHHHH  12 1.00 All residues correlate with the pseudo residue
  2  TTTTGGGGEEED  11 0.92 The D does not correlate
  3  LLLLAAAAYLYY  10 0.83 The L is anti-correlated
In this hypothetical multiple sequence alignment, 3 residue positions are compared with a pseudo residue. If two residues are the same and the corresponding pseudo residues are the same, a score of 1 is given. If a residue is not correlated with the pseudo residue the score is 0 (e.g. residue 2 in sequence 12). Anti-correlated residues score -1 (e.g. residue 3 in sequence 10). The correlation between a residue position and the pseudo residue is defined as the score divided by the maximal score (12). Residue -1 is the pseudo residue. The sequences are given vertically, aligned residues in these sequences horizontally. So, three sequence positions are given in twelve sequences.
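
The scores 12, 11 and 10 of Figure 3 can be reproduced with the sketch below. It assumes that "the majority" means the most frequent residue within each pseudo-residue group (ties are broken arbitrarily here), and the function name is hypothetical.

```python
from collections import Counter


def pseudo_residue_score(pseudo, residues):
    """Score one residue position against the pseudo residue."""
    # Most frequent residue within each pseudo-residue group.
    majority = {}
    for label in set(pseudo):
        group = [r for p, r in zip(pseudo, residues) if p == label]
        majority[label] = Counter(group).most_common(1)[0][0]

    score = 0
    for p, r in zip(pseudo, residues):
        other_majorities = {m for label, m in majority.items() if label != p}
        if r == majority[p]:
            score += 1          # correlated
        elif r in other_majorities:
            score -= 1          # anti-correlated
        # otherwise uncorrelated: contributes 0
    return score, score / len(pseudo)


pseudo = "111122223333"
for position in ("RRRRPPPPHHHH", "TTTTGGGGEEED", "LLLLAAAAYLYY"):
    print(position, pseudo_residue_score(pseudo, position))
```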

Except for the endogenous agonists, the affinity for chemical compounds is not a natural property of the receptor, and hence is not the result of an evolutionary process. These ligands can interact with residues that are conserved in a number of receptors, but may just as well bind to residues that show high variability. We thus want to find residues that are conserved in the receptors that bind the exogenous ligand, but are absent in the receptors to which the exogenous ligand does not bind. In the receptors with low affinity, any residue is allowed, provided it is different from that in receptors with high affinity. A very simple scoring scheme can be used to find such residues. Receptors that bind the ligand well get pseudo residue "+"; receptors with low affinity get "-". For the receptors with pseudo residue "+", a residue has to be identical to the majority of the "+" coded residues in the same position, in which case a score of 1 is given. If a residue in the "-" coded receptors belongs to this same majority the score is -1. The final score is divided by the maximal achievable score.
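
A minimal sketch of this "+"/"-" scheme is given below. It assumes that the maximal achievable score equals the number of "+" coded receptors, which the text does not state explicitly; the function name and the two toy columns are invented for illustration.

```python
from collections import Counter


def affinity_score(affinity, residues):
    """Score one residue position against '+'/'-' affinity codes."""
    plus_residues = [r for a, r in zip(affinity, residues) if a == "+"]
    majority = Counter(plus_residues).most_common(1)[0][0]

    score = 0
    for a, r in zip(affinity, residues):
        if r == majority:
            score += 1 if a == "+" else -1
    # Assumption: the maximal achievable score is the number of '+' receptors.
    return score / len(plus_residues)


# Toy columns: the first discriminates perfectly, the second does not.
print(affinity_score("+++--", "NNNTA"))
print(affinity_score("+++--", "NNTNA"))
```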

Within a given data set the selectivity of a compound may depend on the absence or presence of more than just one residue. For example, binding can be abolished for steric reasons if an alanine is mutated into a leucine, or for energetic reasons if a hydrogen bond is lost because of a threonine to valine mutation. Therefore, a scoring scheme was designed that reflects a dependence on a combination of residues. In this scheme individual residues that are involved in binding may also be present in "-" coded receptors, but of every pair of positions at least one should not be the same as the majority of the "+" coded residues. Figure 4 shows a hypothetical example explaining the different scoring schemes.

Figure 4. A hypothetical set of ten sequences of ten residues each.

         sequence position      Compound
     1 2 3 4 5 6 7 8 9 10     1 2 3

 1   F I A V H F A A A G     + + -
 2   F I V V H G S A A G     + + +
 3   F V I D H S S G A G     + + +
 4   G I L R H T S G A G     - - -
 5   F I Y I H V T S A F     + + -
 6   F I G L W V A S A F     + - -
 7   F A T C W W A T A F     + - -
 8   G S I C W W S V A L     - - -
 9   F I A S W I S I A L     + - -
10  F I C T W L S L L L     + - -
The affinities of three hypothetical compounds for each receptor are coded with pseudo-residue "+" for high affinity, and "-" for low affinity. Binding of compound 1 is only correlated with the presence of a phenylalanine at position 1. Our simplest scoring scheme detects such cases. Compound 2 correlates with the presence of a phenylalanine at position 1 AND a histidine at position 5. The scoring scheme that is based on a comparison of pairs of residue positions with the pseudo residue will detect such cases. Compound 3 depends on the combined presence of phenylalanine 1, histidine 5 and serine 7. At present we cannot yet detect such cases.
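
The pair-based scheme can be tried on the data of Figure 4. The exact scoring formula is not given in the text, so the sketch below makes an assumption: a receptor counts only when it matches the "+" majority at both positions (+1 for "+" receptors, -1 for "-" receptors), and the total is divided by the number of "+" receptors. Function names are hypothetical, and positions are 0-based in the code.

```python
from collections import Counter
from itertools import combinations

# Sequences and affinity codes transcribed from Figure 4.
sequences = [
    "FIAVHFAAAG", "FIVVHGSAAG", "FVIDHSSGAG", "GILRHTSGAG", "FIYIHVTSAF",
    "FIGLWVASAF", "FATCWWATAF", "GSICWWSVAL", "FIASWISIAL", "FICTWLSLLL",
]
compound_2 = "+++-+-----"
compound_3 = "-++-------"


def pair_score(affinity, seqs, i, j):
    """Score a pair of positions (0-based) against '+'/'-' affinity codes."""
    plus = [s for a, s in zip(affinity, seqs) if a == "+"]
    major_i = Counter(s[i] for s in plus).most_common(1)[0][0]
    major_j = Counter(s[j] for s in plus).most_common(1)[0][0]
    score = 0
    for a, s in zip(affinity, seqs):
        if s[i] == major_i and s[j] == major_j:
            score += 1 if a == "+" else -1
    return score / len(plus)


def best_pair(affinity, seqs):
    """Return (score, (i, j)) for the highest scoring pair of positions."""
    length = len(seqs[0])
    return max(
        (pair_score(affinity, seqs, i, j), (i, j))
        for i, j in combinations(range(length), 2)
    )


print(best_pair(compound_2, sequences))
print(best_pair(compound_3, sequences)[0])
```

Under these assumptions the only perfectly scoring pair for compound 2 is (0, 4), i.e. positions 1 and 5, while no pair reaches a perfect score for compound 3.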

RESULTS AND DISCUSSION

Opsin residues determining the wavelength of photons absorbed by retinal

The optimal wavelength of the incoming photon is determined by the precise chemical environment of the retinal, and it can therefore safely be assumed that some residues are important for determining this optimal wavelength. Recently, it has been shown experimentally that His440 and Lys443 (in the periplasmic loop between helices IV and V) play a role in the light absorption of green and red visual pigments (Wang et al., 1993). These two charged residues form a chloride binding site. From the presence of a disulphide bond between the cysteine in loop IV-V and Cys315 at the external side of helix III (Karnik and Khorana, 1990) it can be inferred that His440 and Lys443 are positioned in the vicinity of Glu318, the residue assumed to be a counterion for the retinylidene Schiff's base (Zhukovsky and Oprian, 1989; Sakmar et al., 1989; 1991). Under physiological conditions, chloride ions present in the periplasmic medium saturate this binding site, leading to a red shift in the retinal absorption maximum. In the absence of those two residues (especially His440), red or green pigments behave like other short wavelength absorbing rhodopsins (Wang et al., 1993).

To determine which residues influence the optimal wavelength in vertebrate opsins, we looked for all residue positions that display perfectly correlated mutational behaviour. The results of this analysis are tabulated in figure 5. Nine residue positions form a network of perfect pairwise correlations. At these positions all blue/violet opsins and the green pigments of goldfish and chicken have the same residue type, whereas the other opsins (red/green) systematically have another conserved residue at this position. One of the two residues from the chloride site (Lys443) is part of this network. His440 is not perfectly correlated because sheep rhodopsins have a glutamine instead of a glutamic acid at this position. The sequences used in this study can be obtained from the TM7 file server (Vriend 1994).

Figure 5. Wavelength determining residues in opsins.

             103 126 337 443 444 452 517 531 738
Blue/Violet   Q   G   A   Q   C   T   F   L   K
Red/Green     N   S   S   K   T   S   C   V   R
All residues that form a network of perfectly correlating mutations are shown.

Residues important for ligand specificity in biogenic amine receptors

Many receptors, such as serotonin, adrenergic, muscarinic, dopamine, and histamine receptors, interact with ligands that contain a positively charged nitrogen atom. In figure 6 in the original article several of the endogenous agonists (neurotransmitters) for these receptors are shown. The structure of acetylcholine, which activates muscarinic receptors, is rather different from other aminergic neurotransmitters. The aromatic ring system which is present in most endogenous amine agonists is replaced by a polar non-aromatic acetyl group in acetylcholine. Furthermore, it contains a quaternary ammonium group with three methyl substituents instead of a primary or secondary amine group as in other aminergic neurotransmitters. This raises the question whether any residues exist that correlate with these structural differences. We compared 32 muscarinic receptor sequences with 144 other biogenic amine receptor sequences. The seven residues that discriminate best between these two classes are shown in figure 7. They are all located in or close to the putative ligand binding pocket for biogenic amine receptors. The residue positions 327 and 330 are in the middle of helix III, close to the conserved aspartate residue (Asp322) that interacts with the ligand's ammonium group. Residue position 231 is near Asp322 in most GPCR models. The conserved mutations in positions 621 and 622, in the middle of helix VI, are of particular interest. At position 621, a highly conserved Phe in serotonin, dopamine and adrenergic receptors is replaced by an equally conserved Tyr in muscarinic and some histamine receptors. The more polar character of Tyr, and its capability of forming hydrogen bonds, is in good agreement with the more polar character of their endogenous agonists. The importance of residue 621 for muscarinic agonist affinity was confirmed by the mutation Tyr621Phe in M3 muscarinic receptors, which decreases agonist binding (Wess et al., 1992).

The highly conserved Phe622 in all other amine receptors is an Asn in all muscarinic receptors. This mutation agrees with the structural differences between the endogenous agonists of these receptors. Probably, Asn622 is capable of forming hydrogen bonds with the polar acetylcholine, whereas Phe622 has an aromatic-aromatic interaction with the other amine neurotransmitters. The importance of Phe622 for agonist affinity was confirmed by mutation studies in the α2-adrenoceptor and the 5-HT2A receptor (Strader et al., 1989; Choudhary et al., 1993). Residue 722 is adjacent to positions claimed to bind acetylcholine in muscarinic receptors (Wess, 1992).

Figure 7. Residues that discriminate between muscarinic and other biogenic amine receptors.

              231 233 327 330 621 622 722
Muscarinic     S   N   N   V   Y   N   C
Other amine    V   P   T   I   F   F   G
              (5I)(1S)    (1H)(1A)
A listing of the 176 sequences that were used in this study can be obtained from the TM7 file server. 32 muscarinic receptors were compared with 144 other amine receptors. Four of these residue positions are not 100% conserved in the non-muscarinic aminergic receptors. The alternative residues and their frequency of occurrence are indicated in brackets.

Figure 9. Affinities for propranolol and pindolol, represented by pseudo residues, and receptor sequences that were used for correlation analysis.

accession      pseudo residues*     receptor type         species
number    pindolol   propranolol
 
 P19327         +      +        SEROTONIN 1A               rat 
 P08908         +      ND       SEROTONIN 1A               human    
 P28564         +      +        SEROTONIN 1B               rat
 P28334         +      ND       SEROTONIN 1B               mouse
 P07700         +      +        BETA-1 ADRENOCEPTOR        turkey
 P18090         +      +        BETA-1 ADRENOCEPTOR        rat  
 P10608         +      +        BETA-2 ADRENOCEPTOR        rat  
 P28222         -      -        SEROTONIN 1B               human
 P28565         -      -        SEROTONIN 1D               rat
 P28221         -      -        SEROTONIN 1D               human
 P28566         -      ND       SEROTONIN 1E               human
 P30939         -      ND       SEROTONIN 1F               human
 P30940         -      ND       SEROTONIN 1F               rat
 Q02284         -      ND       SEROTONIN 1F               mouse
 P14842         -      -        SEROTONIN 2A               rat
 P08909         -      ND       SEROTONIN 2C               rat
 P30966         -      -        SEROTONIN 5A               mouse
 P35364         -      -        SEROTONIN 5A               rat
 P31387         -      -        SEROTONIN 5B               mouse
 P35365         -      -        SEROTONIN 5B               rat
 P31388         -      ND       SEROTONIN 6                mouse
 P32304         ND     -        SEROTONIN 7                mouse
 P32305         ND     -        SEROTONIN 7                rat
 P34969         ND     -        SEROTONIN 7                human
 P20288         ND     -        DOPAMINE 2                 bovine  
 P13953         -      -        DOPAMINE 2                 rat    
 P23944         ND     -        ALPHA-1A   ADRENOCEPTOR    rat
 P15823         ND     -        ALPHA-1B   ADRENOCEPTOR    rat  
 P18130         ND     -        ALPHA-1C   ADRENOCEPTOR    bovine  
 P08913         ND     -        ALPHA-2A   ADRENOCEPTOR    human
 P22909         ND     -        ALPHA-2A   ADRENOCEPTOR    rat  
 P18089         ND     -        ALPHA-2B   ADRENOCEPTOR    human
 P19328         ND     -        ALPHA-2B   ADRENOCEPTOR    rat  
 P18825         ND     -        ALPHA-2C-1 ADRENOCEPTOR    human
 P22086         ND     -        ALPHA-2C   ADRENOCEPTOR    rat
 P35369         ND     -        ALPHA-2C-2 ADRENOCEPTOR    human
 P08482         -      -        ACETYLCHOLINE 1            rat
 P10980         -      -        ACETYLCHOLINE 2            rat
 P08483         -      -        ACETYLCHOLINE 3            rat
 P08485         -      -        ACETYLCHOLINE 4            rat
 P08911         -      -        ACETYLCHOLINE 5            rat
 P31389         ND     -        HISTAMINE 1                guinea pig   
 P35367         ND     -        HISTAMINE 1                human 
 P31390         ND     -        HISTAMINE 1                rat
* Receptor affinity values were taken from Boess and Martin (1993) and Seeman (1993). All receptors with high affinity have pKi>=7.0 and were coded "+". The receptors with low affinity all have pKi<6.0 and were coded "-". ND indicates that no data is available. 27 sequences were used in the pindolol analysis; 36 sequences were used in the propranolol analysis.
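
The coding rule in this footnote can be written as a small function. How to treat a pKi between 6.0 and 7.0 is not specified (no such receptor occurs in the table), so leaving it uncoded as "ND" is an assumption of this sketch; the function name and the example pKi values are hypothetical.

```python
def pseudo_residue_from_pki(pki):
    """Translate a pKi value into the pseudo residue used in Figure 9."""
    if pki is None:
        return "ND"   # no binding data available
    if pki >= 7.0:
        return "+"    # high affinity
    if pki < 6.0:
        return "-"    # low affinity
    return "ND"       # assumption: intermediate affinities are left uncoded


# Hypothetical pKi values, for illustration only.
for value in (8.2, 5.1, 6.4, None):
    print(value, pseudo_residue_from_pki(value))
```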

Figure 10. Results of correlation analyses of propranolol and pindolol affinities for the sequences shown in figure 9 for which binding data is available.

Pindolol
             10        20       
     123456789012345678901234567
     +++++++--------------------
     NNNNNNNTTTTAAAVVLLLLTTYYYYY

Propranolol
             10        20        30     
     123456789012345678901234567890123456
     +++++-------------------------------
     NNNNNTTTVLLLLLLLTTFFFFFFFFFFYYYYYIII
The top lines indicate the sequence numbers in the same order as in figure 9. + and - indicate high and low affinity respectively. The bottom line represents the amino acid at position 719 in the corresponding sequence.

The aryloxypropanolamine binding site

Pindolol and propranolol are β-adrenoceptor antagonists belonging to the class of the aryloxypropanolamines. These antagonists are not selective for this class, as they display considerable affinity for rodent 5-HT1B and all 5-HT1A receptors (Boess and Martin, 1993). To identify the residue(s) important for binding compounds of this class, we analysed affinities for pindolol and propranolol. Because these compounds were not always tested on the same receptors, two different sets of receptor-affinity data were used (see figure 9). All receptors with high affinity (all β1 and β2 adrenoceptors, 5-HT1A and rodent 5-HT1B) were given pseudo residue "+", and those having low affinity were given pseudo residue "-". Figure 10 shows that Asn719 displays a 100% correlation with affinity for both compounds, despite the fact that two different sets of receptor-affinity data were used.
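
The 100% correlation in figure 10 can be checked with a few lines of code. The sketch below uses the pseudo residues and position-719 residues exactly as listed in figure 10. The function name and the strict criterion (no residue type may occur with both "+" and "-") are assumptions made for illustration and are not necessarily the scoring scheme of the original program.

```python
from collections import defaultdict

# Data copied from figure 10.
PINDOLOL_PSEUDO = "+" * 7 + "-" * 20
PINDOLOL_719 = "NNNNNNNTTTTAAAVVLLLLTTYYYYY"
PROPRANOLOL_PSEUDO = "+" * 5 + "-" * 31
PROPRANOLOL_719 = "NNNNNTTTVLLLLLLLTTFFFFFFFFFFYYYYYIII"


def is_perfectly_correlated(pseudo: str, column: str) -> bool:
    """True if no residue type occurs with both '+' and '-'."""
    if len(pseudo) != len(column):
        raise ValueError("pseudo residues and column differ in length")
    classes = defaultdict(set)
    for affinity, residue in zip(pseudo, column):
        classes[residue].add(affinity)
    return all(len(found) == 1 for found in classes.values())


print(is_perfectly_correlated(PINDOLOL_PSEUDO, PINDOLOL_719))        # True
print(is_perfectly_correlated(PROPRANOLOL_PSEUDO, PROPRANOLOL_719))  # True
```

Note that under this strict criterion a column in which every sequence carries a different residue would also pass, so in practice a hit still needs to be inspected by eye, as is done in figure 10.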

The importance of Asn719 for binding pindolol and propranolol has been confirmed in a number of cases. The Asn719Val mutation in the 5-HT1A receptor decreases the receptor's affinity for pindolol, but has no effect on the affinity for the natural agonist serotonin (Guan et al., 1992). The Phe719Asn mutation in the α2-adrenoceptor increases affinity for these antagonists considerably (Suryanarayana et al., 1991). The Thr719Asn mutation in human 5-HT1B and 5-HT1Dβ receptors increases their affinities for pindolol and propranolol. The studies reporting this mutation showed that the Thr versus Asn difference at position 719 is completely responsible for the receptor-binding differences between rodent 5-HT1B receptors and other 5-HT1B and 5-HT1D receptors (Parker et al., 1993; Oksenberg et al., 1992).

Binding site of 5-carboxamidotryptamine in serotonin receptors

Serotonin displays a high affinity for almost all serotonin receptors (for a review see Boess and Martin, 1993). Replacement of the 5-hydroxy group of serotonin with a carboxamido group, yielding 5-carboxamidotryptamine (5-CT), increases the affinity for 5-HT1A,B,D, 5-HT5 and 5-HT7 receptors by a factor of 5-10 (Boess and Martin, 1993). In contrast, the affinity for 5-HT1E,F, 5-HT2 and 5-HT6 receptors decreases by a factor of 10-100. Like serotonin, 5-CT was shown to act as an agonist on 5-HT1A receptors, suggesting a similar binding mode for both compounds on these two receptors.

We performed a sequence analysis, using pseudo residue "+" for 5-HT1A,B,D, 5-HT5 and 5-HT7 receptors, and "-" for 5-HT1E,F, 5-HT2 and 5-HT6 receptors (see figure 11). Figure 12 shows that one single residue, proline 629, is correlated with high affinity for 5-CT. This proline is located two turns above (i.e. closer to the extracellular side) the 'PFF' motif, of which the second Phe (at position 622) has been shown to interact with agonists in the 5-HT2A receptor and the α2-adrenoceptor (Strader et al., 1989; Choudhary et al., 1993). This proline enables the formation of a hydrogen bond with the unsatisfied backbone C=O of residue 625. Thus, 5-CT may form a hydrogen bond with this backbone C=O, which is located only one helical turn above the putative binding site for agonists. In some models of the complexes of serotonin and 5-HT1A and 5-HT2A receptors, the 5-hydroxy group in serotonin is directed towards the backbone of helix VI (Trumpp-Kallmeyer et al., 1991; Kuipers et al., 1994). These models suggest that a hydrogen bond between the NH2 in 5-CT and the C=O of residue 625 may be formed. Thus, the proline in position 629 may account for the high affinity of 5-CT for Pro629-containing serotonin receptors, which makes this residue an interesting target for future mutagenesis studies.

Figure 11. Affinity values for 5-carboxamidotryptamine (5-CT), represented by a pseudo residue, and the corresponding receptors that were used in the analysis.

 access. pseudo         receptor type  species
 number  residue*
 P08908;   +            SEROTONIN 1A   human
 P28222;   +            SEROTONIN 1B   human
 P28564;   +            SEROTONIN 1B   rat
 P28334;   +            SEROTONIN 1B   mouse
 P11614;   +            SEROTONIN 1D   dog
 P28221;   +            SEROTONIN 1D   human
 P28565;   +            SEROTONIN 1D   rat
 P28566;   -            SEROTONIN 1E   human
 P30939;   -            SEROTONIN 1F   human
 Q02284;   -            SEROTONIN 1F   mouse
 P30940;   -            SEROTONIN 1F   rat
 P28223;   -            SEROTONIN 2A   human
 P14842;   -            SEROTONIN 2A   rat
 P30994;   -            SEROTONIN 2B   rat
 P08909;   -            SEROTONIN 2C   rat
 P30966;   +            SEROTONIN 5A   mouse
 P35364;   +            SEROTONIN 5A   rat
 P31387;   +            SEROTONIN 5B   mouse
 P35365;   +            SEROTONIN 5B   rat
 P31388;   -            SEROTONIN 6    rat
 P34969;   +            SEROTONIN 7    human
 P32304;   +            SEROTONIN 7    mouse
 P32305;   +            SEROTONIN 7    rat
* Affinity values were taken from Boess and Martin (1993). Receptors with values pKi<7 are coded "-", receptors with pKi>7 are coded "+". In addition, "+"-coded receptors displayed affinity 5-CT>5-HT, while receptors coded "-" displayed affinity 5-CT<5-HT.

Figure 12. Correlation analysis of affinity of 5-CT for serotonin receptors shown in figure 11.

             10        20   
     12345678901234567890123
     +++++++--------++++-+++
     PPPPPPPGNNNVVLVPPPPAPPP
The top lines indicate the sequences in the same order as in figure 11. + and - indicate affinity differences as described in figure 11. The bottom line represents the amino acid at position 629 in the corresponding sequence.
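
A simple way to inspect such a column is to tabulate how often each residue type occurs with each pseudo residue. The sketch below does this for the figure 12 data. The variable names are assumptions made for illustration; the strings are copied from the figure.

```python
from collections import Counter

# Data copied from figure 12.
PSEUDO_5CT = "+" * 7 + "-" * 8 + "+" * 4 + "-" + "+" * 3
POS_629 = "PPPPPPPGNNNVVLVPPPPAPPP"

# Count every (pseudo residue, amino acid) combination in the column.
table = Counter(zip(PSEUDO_5CT, POS_629))

for (affinity, residue), count in sorted(table.items()):
    print(affinity, residue, count)

# Proline is found in all "+" sequences and in none of the "-" sequences.
print(table[("+", "P")], table[("-", "P")])  # 14 0
```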

Mesulergine binding site

Mesulergine has been classified as a 5-HT2C antagonist, but it also displays affinity for rat 5-HT2A and all reported 5-HT7 receptors (Boess and Martin, 1993). Its affinity for human 5-HT2A, 5-HT1 and 5-HT6 receptors is much lower (Boess and Martin, 1993; Pazos et al., 1985). We compared the sequences of the serotonin receptors for which the affinity for mesulergine has been determined. We used pseudo-residue "+" for rat 5-HT2A and all 5-HT7 receptors, and "-" for human 5-HT2A and all 5-HT1 and 5-HT6 receptors (see figure 13). Unlike for pindolol and propranolol, no single amino acid residue showed 100% correlation with affinity for mesulergine. Apparently, the affinity of mesulergine in the given data set depends on the presence or absence of more than just one residue. By analysing the correlation between the pseudo residue and two real residues at the same time, we were able to identify combinations of residues that might be involved in mesulergine affinity.

Figure 14 shows that the low affinity of mesulergine for 5-HT1 and 5-HT6 receptors may be explained by the presence of different residues in positions 127, 129, 223, 337 and 524, when compared to 5-HT2 and 5-HT7 receptors. The human 5-HT2A receptor, which has low affinity for mesulergine, differs from other 5-HT2 and 5-HT7 receptors only in that residue 516 is a serine. Receptors with high affinity all contain an alanine at this position. The importance of this residue was confirmed by an Ala516Ser mutation in the rat 5-HT2A receptor, which decreases the affinity for mesulergine as well as for a number of other N1-alkyl substituted ergolines and tryptamines (figure 8; Johnson et al., 1993). In contrast, the affinity of the unsubstituted N1-H compounds increases as a result of the Ala516Ser mutation. Apparently, ergolines and tryptamines with an N1-alkyl substituent prefer an alanine at position 516, whereas the free N1-H compounds prefer a serine. These findings suggest a direct interaction between the N1-substituent and residue 516. Similarly, the mutation Ala516Thr in the rat 5-HT2A receptor diminishes affinity for N1-alkyl substituted compounds (Johnson et al., 1993). This explains the low affinity of mesulergine for the 5-HT6 receptor, which has a threonine at position 516. Figure 14 shows that residues in other positions, like 127, 129 and 223, may also contribute to the low affinity. These residues, as well as residue 524, are interesting candidates for future mutagenesis studies in 5-HT1, 5-HT5 and 5-HT6 receptors.

Figure 13. Affinity values for mesulergine, represented by a pseudo residue, and the corresponding receptor sequences which were used in the correlation analysis.

 access. pseudo         receptor type  species
 number  residue*
 P14842    +            SEROTONIN 2A   rat
 P28335    +            SEROTONIN 2C   human
 P08909    +            SEROTONIN 2C   rat
 P30994    +            SEROTONIN 2B   rat
 P32304    +            SEROTONIN 7    mouse
 P32305    +            SEROTONIN 7    rat
 P34969    +            SEROTONIN 7    human
 P28223    -            SEROTONIN 2A   human
 P31388    -            SEROTONIN 6    rat
 P28564    -            SEROTONIN 1B   dog
 P28222    -            SEROTONIN 1B   human
 P28334    -            SEROTONIN 1B   mouse
 P28566    -            SEROTONIN 1E   human
 Q02284    -            SEROTONIN 1F   mouse
 P30939    -            SEROTONIN 1F   human
 P30940    -            SEROTONIN 1F   rat
 P30966    -            SEROTONIN 5A   mouse
 P35364    -            SEROTONIN 5A   rat
 P31387    -            SEROTONIN 5B   mouse
 P35365    -            SEROTONIN 5B   rat
* Affinity values were taken from Seeman (1993), and Boess and Martin (1993). Values pKi>7.0 are coded "+", values pKi<6 are coded "-".

Figure 14. Residue positions that show combined correlation with affinity for mesulergine in serotonin receptors.

 residue     sequence number
 pos.                10        20
             12345678901234567890
             +++++++-------------
  127        IIIIIIIIATTTTTTTFFFF
  129        GGGGGGGGASSSLIIIWWWW
  223        AAAAAAAASTTTTTTTSSSS
  337        SSSSSSSSSAAAAAAAAAAA
  516        AAAAAAASTAAAAAAAAAAA
  524        MMMMMMMMILLLIIIIVVVV
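
The combined analysis can be reproduced on the figure 14 data with the sketch below. It tests every single position and every pair of positions for a strict separation of "+" from "-" sequences. The function name and the strict criterion (no residue combination may occur in both classes) are assumptions made for illustration; the scoring actually used by the original program may differ.

```python
from itertools import combinations

# Data copied from figure 14.
PSEUDO = "+" * 7 + "-" * 13
COLUMNS = {
    127: "IIIIIIIIATTTTTTTFFFF",
    129: "GGGGGGGGASSSLIIIWWWW",
    223: "AAAAAAAASTTTTTTTSSSS",
    337: "SSSSSSSSSAAAAAAAAAAA",
    516: "AAAAAAASTAAAAAAAAAAA",
    524: "MMMMMMMMILLLIIIIVVVV",
}


def separates(pseudo: str, *columns: str) -> bool:
    """True if no residue combination occurs with both '+' and '-'."""
    if any(len(col) != len(pseudo) for col in columns):
        raise ValueError("all columns must be as long as the pseudo residues")
    seen = {}
    for affinity, combo in zip(pseudo, zip(*columns)):
        # Remember the class first seen for this combination; fail on a clash.
        if seen.setdefault(combo, affinity) != affinity:
            return False
    return True


singles = [pos for pos, col in COLUMNS.items() if separates(PSEUDO, col)]
pairs = [
    (a, b)
    for a, b in combinations(sorted(COLUMNS), 2)
    if separates(PSEUDO, COLUMNS[a], COLUMNS[b])
]
print(singles)  # []
print(pairs)    # [(127, 516), (129, 516), (223, 516), (337, 516), (516, 524)]
```

No single position separates the two classes, and every separating pair contains position 516. The reason is visible in the figure: sequences 7 ("+") and 8 ("-") are identical at the five other positions and differ only at 516.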

8-OH-DPAT, DOI, flesinoxan and chlorpromazine

The compounds 8-OH-DPAT and flesinoxan are highly selective full agonists for the 5-HT1A receptor. Correlation analysis yields not only the residues important for binding, but all residues that distinguish the 5-HT1A receptor from other amine receptors. Of the 27 residues that correlate well with high affinity for 8-OH-DPAT, 15 are in loop regions and are not likely to be involved in agonist binding. The same problem occurred with the 5-HT2-selective agonist DOI, which yielded 17 residues that correlate with high affinity, of which 5 were in loop regions. Thus, for these highly selective compounds that only bind to one specific subclass of receptors, the signal from ligand binding is severely obscured by the signal created by other residue positions that distinguish this subclass from other classes and subclasses.

Another problem occurred with the antagonist chlorpromazine, which is reported to display affinity for 5-HT2A, 5-HT2C, dopamine 2, 3 and 4 receptors and α1-adrenoceptors (Boess and Martin, 1993; Seeman, 1993). A perfect correlation could not be observed when all the binding constants were included in one analysis. Apparently, the complete set is too complicated to be analysed with our method. Division of the set into subsets yielded too many hits for each analysis. Possibly, the introduction of a scoring scheme that can correlate the pseudo-residue with three or four residues at the same time could solve this problem. On the other hand, it is doubtful whether the number of sequences (20-30 per compound) would be sufficient to give a clear signal with such a scoring scheme. It seems likely that in a number of cases the receptor diversity is too large given the amount of available receptor binding data. This problem, however, is restricted to the use of receptor binding data and does not show up in functional classification, for which all available sequences can be used.

It should also be kept in mind that the method contains a number of simplifications that may reduce its capability to extract correlations from the available data. For instance, affinities are coded "+" or "-" without any values in between. Also, only residue identities are used; physicochemical similarities between different amino acid types are not taken into account. Thus a serine to threonine mutation is penalised as heavily as a serine to tryptophan mutation. The quality of the data may also cause problems. For instance, antagonists label high and low affinity sites, while agonists only label the high affinity sites. When an antagonist is used as radioligand, a biphasic curve for the displacement with an agonist is obtained, reflecting the affinities for both states. If the number of receptors in the high-affinity state is low, it may be difficult to obtain a clear signal from high affinity agonist binding. As a result, reported agonist affinity values may be too low when determined with an antagonist as radiolabel. Another problem is the fact that many receptor binding tests include mixtures of receptors or receptor subtypes, which makes it difficult to attribute the results of a test to the individual sequences. Of course, one of the major assumptions is the idea that a certain ligand with affinity for a number of receptors addresses an equivalent binding site on these receptors. Although our study shows that this is often a reasonable assumption, problems may be expected for compounds that do not obey this rule.
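
The identity-only simplification can be made concrete with a small comparison. In the sketch below, the residue grouping and the penalty values 0.5 and 1.0 are purely illustrative assumptions and are not part of the method described here. The sketch only shows how a similarity-aware penalty would treat Ser to Thr differently from Ser to Trp, whereas the identity rule does not.

```python
# Hypothetical physicochemical grouping, chosen for illustration only.
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "hydroxyl": "ST",
    "amide": "NQ",
    "acidic": "DE",
    "basic": "KRH",
    "special": "CGP",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def identity_penalty(a: str, b: str) -> float:
    """Penalty as described in the text: every change counts the same."""
    return 0.0 if a == b else 1.0


def grouped_penalty(a: str, b: str) -> float:
    """Assumed alternative: changes within a group are penalised less."""
    if a == b:
        return 0.0
    return 0.5 if GROUP_OF[a] == GROUP_OF[b] else 1.0


for new in ("T", "W"):
    print("S ->", new, identity_penalty("S", new), grouped_penalty("S", new))
```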

Despite these problems, the method in its present form provides a tool for the unbiased combination of sequence data with experimentally determined ligand affinities. In a number of cases it was a useful tool in the identification of residues involved in ligand binding. Many residues for which the role in ligand binding was experimentally determined were detected, and several new suggestions for future mutagenesis experiments were derived.

CONCLUSIONS

We have presented a method to analyse sequence patterns in a multiple sequence family. Pairwise comparisons of sequence positions can be used to search for functionally important residues without any prior knowledge or expectation. Residues can also be compared with properties of the sequences that are coded in so-called pseudo residues. These comparisons can be used to prove or falsify hypotheses, or to search for residues responsible for specific characteristics of the sequences. For several ligand binding studies our analyses led to a better understanding of the receptor models. In other cases this correlation analysis has helped us circumvent the structure step in the central protein research paradigm: sequence -> structure -> function.

ACKNOWLEDGEMENTS

The authors thank Alfonso Valencia, Ulrike Goebel, Rob Hooft, Mike Singer, Georg Casari and Reinhard Schneider for many helpful discussions. FAPESP and IQCP provided financial support.

REFERENCES

Oliveira, L., Paiva, A.C.M. and Vriend, G. (1993) J.Comp.-Aid.Mol.Des. 7, 649-658. A common motif in G-protein coupled seven transmembrane helix receptors.

Oliveira, L., Paiva, A.C.M., Sander, C. and Vriend, G. (1994) TIPS 15, 170-172. A common step for signal transduction in G protein-coupled receptors.

Savarese, T.M., Fraser, C.M. (1992) Biochem. J., 283, 1-19. In vitro mutagenesis and the search for structure-function relationships among G protein-coupled receptors.

Boess, F.G., Martin, I.L. (1993) Neuropharmacology, 33, 275-317. Molecular Biology of 5-HT Receptors.

Guan, X., Peroutka, S.J. and Kobilka, B.K. (1992) Mol. Pharmacol., 41, 695-698. Identification of a single amino acid residue responsible for the binding of a class of β-adrenergic receptor antagonists to 5-hydroxytryptamine1A receptors.

Suryanarayana, S., Daunt, D.A., Zastrow, M. von and Kobilka, B.K. (1991) J. Biol. Chem., 266, 15488-15492. A point mutation in the seventh hydrophobic domain of the α2 adrenergic receptor increases its affinity for a family of β-receptor antagonists.

Parker, E.M., Grisel, D.A., Iben, L.G., Shapiro, R.A. (1993) J. Neurochem., 60, 380-383. A Single Amino Acid Difference Accounts for the Pharmacological Distinctions Between the Rat and Human 5-Hydroxytryptamine1B Receptors.

Oksenberg, D., Marsters, S.A., O'Dowd, B.F., Jin, H., Havlik, S., Peroutka, S.J., Ashkenazi, A. (1992) Nature, 360, 161-163. A single amino-acid difference confers major pharmacological variation between human and rodent 5-HT1B receptors.

Pazos, A., Hoyer, D., Palacios, J.M. (1985) Eur. J. Pharmacol., 106, 531-538. Mesulergine, a selective serotonin-2 ligand in the rat cortex, does not label these receptors in porcine and human cortex: evidence for species differences in brain serotonin-2-receptors.

Johnson, M.P., Longcharich, R.J., Baez, M., Nelson, D.L. (1993) Mol. Pharmacol., 45, 277-286. Species Variations in Transmembrane Region V of the 5-Hydroxytryptamine Type 2A Receptor Alter the Structure-Activity Relationship of Certain Ergolines and Tryptamines.

Kuipers, W., Wijngaarden, I. van, IJzerman, A.P. (1994) Drug Design Disc., 11, 231-249. A Model of the Serotonin 5-HT1A Receptor. Agonist and Antagonist Binding Sites.

Hibert, M.F., Trumpp-Kallmeyer, S., Bruinvels, A., Hoflack, J. (1991) Mol. Pharmacol., 40, 8-15. Three-dimensional models of neurotransmitter G-binding protein-coupled receptors.

Trumpp-Kallmeyer, S., Hoflack, J., Bruinvels, A., Hibert, M. (1992) J. Med. Chem., 35, 3448-3462. Modeling of G-protein coupled receptors: Application to dopamine, adrenaline, serotonin, acetylcholine and mammalian opsin receptors.

Seeman, P.M.D. (1993) RECEPTOR TABLES, vol.2: Drug Dissociation Constants For Neuroreceptors and Transporters, Printed by SZ Research, Toronto, Canada.

Bowie, J.B., Lüthy, R., Eisenberg, D. (1991) Science, 253, 164-170. A method to identify protein sequences that fold into a known three-dimensional structure.

Casari, G., Sander, C., Valencia, A. (1994) Submitted. A method to predict functional residues in proteins.

Goebel, U., Sander, C., Schneider, R., Valencia, A. (1994) PROTEINS, 18, 309-317. Correlated mutations and residue contacts in proteins.

Scharf, M., Thesis, Univ. Heidelberg (1989).

Henderson, R., Baldwin, J.M., Ceska, T.A., Zemlin, F., Beckmann, E., Downing, K.H. (1990) J.Mol.Biol., 212, 899-929. Model of the structure of bacteriorhodopsin based on high-resolution cryo-microscopy.

Overington, J., Donnelly, D., Johnson, M.S., Sali, A., Blundell, T.L. (1992) Prot.Sci., 1, 216-226. Environment-specific amino acid substitution tables: Tertiary templates and prediction of protein folds.

Eck, R.V., Dayhoff, M.O., In "Atlas of protein sequence and structure" (ed. M.O.Dayhoff), pp 33-41, Washington DC. (Natl. Biomed. Res. Found., Washington, DC) 1969.

A. Bairoch, Dept Biochimie Medicinale, Centre Medicinal Universitaire, 1211 Geneva 4, Switzerland, and SWISS-PROT Protein sequence database, EMBL Data Library D-69117 Heidelberg Germany.

PIR, NBRF, Georgetown University Medical Center, 3900 Reservoir Road N.W., Washington, DC.

NCI-FCRDC Frederick Biomedical Supercomputing Center, PO Box B, Frederick, MD 21702-1201.

Kolakowsky, L.F., (1994) Receptors and Channels 2 1-7. GCRDb: A G-protein coupled receptor database.

Wang Z, Asenjo, AB, Oprian, DD. (1993) Biochemistry, 32 2125-2130. Identification of the Cl--Binding Site in the Human Red and Green Color Vision Pigments.

Strader, CO, Sigal IS, Dixon, RAF. (1989) FASEB J., 3: 1825-1832. Structural Basis of Beta-adrenergic Function.

Choudhary, MS, Craigo, S, Roth, BL. (1993) Mol. Pharmacol., 43 755-761. A Single Point Mutation (Phe340-Leu340) of a Conserved Phenylalanine Abolishes 4-(125)Iodo-(2,5)-Dimethoxy)-Phenylisopropylamine and (3H)-Mesulergine but Not (3H)Ketanserin Binding to 5-Hydroxytryptamine 2 Receptors.

Wess J, Gdula D, Brann MR. (1992) EMBO J., 10 3728-3734. Site-Directed Mutagenesis of the muscarinic receptors: Identification of a Series of Threonine and Tyrosine Residues Involved in Agonist but Not Antagonist Binding.

Zhukovsky E, Oprian DD. (1989) Science, 246 928-930. Effect of Carboxylic Acid Chains on the Absorption Maximum of Visual Pigments.

Sakmar TP, Franke RR, Khorana HG. (1989) Proc. Natl. Acad. Sci. USA, 86 8309-8313. Glutamic acid-113 serves as the retinylidene Schiff base counterion.

Sakmar TP, Franke RR, Khorana HG. (1991) Proc. Natl. Acad. Sci. USA, 88 3079-3083. The role of the retinylidene Schiff base counterion in rhodopsin in determining wavelength absorbance and Schiff base pKa.

Karnik SS, Khorana HG. (1990) J. Biol. Chem., 265 17520-17524. Assembly of Functional Rhodopsin Requires a Disulfide Bond between Cysteine Residues 110 and 187.

Vriend, G. (1990) J.Mol.Graph. 8, 52-56. WHAT IF: A molecular modeling and drug design program.


GV 15-jan-1998, FH 23-Jan-2002