2004, PrionDB.
Before explaining CMA, we will explain how you can pragmatically use this information.
Suppose you have some ideas about mutating residues that influence the dimerisation of a GPCR. Unfortunately, it looks as if 22 mutations are all equally likely candidates, but the student only has time to make six more mutations before graduating. In that case you should mutate the six residues with the highest CMA score.
You see several coloured residues. In a real CMA snake, you can click on these residues, and each click takes you directly to the corresponding location in the multiple sequence alignment. So, clicking one after the other on the three red residues, you would see:
64 0 Q Q 3.05 0.10 27 90 QQQQQQQQQQQQQQEEEEEQEEQQQEE
65 0 H H 3.11 0.20 27 100 HHHHHHHHHHHHHHHHHHHHHHHHHHH
66 0 K K 3.12 0.20 27 100 KKKKKKKKKKKKKKKKKKKKKKKKKKK
67 0 K K 3.15 0.21 27 100 KKKKKKKKKKKKKKKKKKKKKKKKKKK
68 0 L L 3.12 0.20 27 100 LLLLLLLLLLLLLLLLLLLLLLLLLLL
69 0 R R 3.11 0.17 27 100 RRRRRRRRRRRRRRRRRRRRRRRRRRR
70 0 T T 3.03 0.10 27 93 TTTTTTTTTSTTTTTTTTTTTTQTTTT

136 341 Y Y 8.00 2.00 27 80 YYYYYYYYYYYYYYWWWWWYWWYYYWW
137 0 V V 2.94 0.05 25 70 --VVVVVVIIVVIIMVMVVVMLMIILV
138 0 V V 3.11 0.14 27 100 VVVVVVVVVVVVVVVVVVVVVVVVVVV
139 0 V V 3.04 0.10 27 90 VVIVVVVVVVVVVVVVVVVIVVVIIVV
140 0 C C 3.15 0.21 27 100 CCCCCCCCCCCCCCCCCCCCCCCCCCC
141 0 K K 3.12 0.14 27 100 KKKKKKKKKKKKKKKKKKKKKKKKKKK
142 0 P P 3.05 0.11 27 100 PPPPPPPPPPPPPPPPPPPPPPPPPPP

273 626 F F 8.00 2.00 27 80 FFFFFFFMFFFFFFWWWWWFWWFFFWW
274 627 Y Y 3.06 0.15 27 96 YYYYYYYYYYYYYYYYYYWYYYYYYYY
275 628 I I 3.11 0.11 27 100 IIIIIIIIIIIIIIIIIIIIIIIIIII
276 0 F F 3.07 0.11 27 100 FFFFFFFFFFFFFFFFFFFFFFFFFFF
277 0 T T 2.91 0.07 27 87 TTTTTTTTTSTSTTTTTTTTTTITTST
278 0 H H 2.86 0.05 27 74 HHHHHHHHHNNNHHHHHNNHHHNHHNH
279 0 Q Q 3.02 0.15 27 100 QQQQQQQQQQQQQQQQQQQQQQQQQQQ
The columns mean (taking the line for alignment position 279 as the example):

279 0 Q Q 3.02 0.15 27 100 QQQQQQQQQQQQQQQQQQQQQQQQQQQ

Column 1 (279): This is simply the sequential number in the alignment. This number has no value whatsoever and is only needed if you want to compare the profile with the alignment.
Column 2 (0): The so-called arbitrary sequence number. The same position in any GPCR alignment always gets this same number; in other words, every Arg in every DRY motif, for example, always gets number 340.
Column 3 (Q): The consensus sequence at this position in the alignment.
Column 4 (Q): This column is of no value (yet).
Column 5 (3.02): The gap open penalty used at this position in the alignment.
Column 6 (0.15): The gap elongation penalty used at this position in the alignment.
Column 7 (27): The number of sequences in the alignment that have a residue at this position.
Column 8 (100): The variability at this position in the alignment. 100 means totally conserved; below 40, things are doubtful.
Column 9: The residues at position 279 in the 27 aligned sequences.
Things get clearer when we put the three red residues directly underneath each other:

                              123456789012345678901234567
  64   0 Q Q 3.05 0.10 27  90 QQQQQQQQQQQQQQEEEEEQEEQQQEE
 136 341 Y Y 8.00 2.00 27  80 YYYYYYYYYYYYYYWWWWWYWWYYYWW
 273 626 F F 8.00 2.00 27  80 FFFFFFFMFFFFFFWWWWWFWWFFFWW
Now you see why we call this correlated. Let's forget the M at position 626 in sequence 8 and the entire contribution of sequence 20. You then see that these three sequence positions are not conserved, but if one position is different between any two sequences, all three are different; on top of that, the changes going from one sequence to another are always between the same residue types (Q/E, Y/W and F/W).
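The co-variation described above can be checked mechanically. The sketch below is only an illustration, not the CMA implementation in WHAT IF: it takes the three red columns exactly as printed (keyed by their sequential alignment numbers) and asks whether, for every pair of sequences, the three positions are either all identical or all different. The function names and the `ignore` parameter (1-based sequence numbers to leave out) are hypothetical.

```python
from itertools import combinations

# The three "red" columns shown above, one character per aligned sequence,
# keyed by their sequential alignment numbers (64, 136/341 and 273/626).
COLUMNS = {
    64: "QQQQQQQQQQQQQQEEEEEQEEQQQEE",
    136: "YYYYYYYYYYYYYYWWWWWYWWYYYWW",
    273: "FFFFFFFMFFFFFFWWWWWFWWFFFWW",
}


def covary_perfectly(columns, ignore=()):
    """Return True if, for every pair of kept sequences, the columns are
    either all identical or all different (hypothetical helper)."""
    cols = list(columns)
    n = len(cols[0])
    keep = [i for i in range(n) if i + 1 not in ignore]  # ignore is 1-based
    for i, j in combinations(keep, 2):
        verdicts = {col[i] == col[j] for col in cols}
        if len(verdicts) > 1:  # some columns agree while others differ
            return False
    return True


def residue_types(column, ignore=()):
    """Residue types observed in one column, skipping ignored sequences."""
    return {aa for k, aa in enumerate(column, start=1) if k not in ignore}


print(covary_perfectly(COLUMNS.values()))                  # False: the M in sequence 8
print(covary_perfectly(COLUMNS.values(), ignore={8, 20}))  # True
for pos, col in COLUMNS.items():
    print(pos, sorted(residue_types(col, ignore={8, 20})))
```

The last loop lists the two residue types exchanged in each column, which is the "always between the same residue types" observation in compact form.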
Before you decide to trust these CMA results, a few critical notes about them are in order.
Sequence-function correlation in G protein-coupled receptors. W. Kuipers, L. Oliveira, A.C.M. Paiva, F. Rippmann, C. Sander, G. Vriend, C.G. Kruse, I. van Wijngaarden, A.P. IJzerman. In: Membrane protein models. Chapter 2. Ed. J. Findlay (1995) BIOS Sci. Pub. Ltd.
G protein-coupled receptors (GPCRs) form a large superfamily of proteins that transduce signals across the cell membrane. At the external side they receive a ligand or, in the case of opsins, a photon, and at the cytosolic side they activate a G protein. GPCRs can be divided into three main families: rhodopsin-like, glucagon/secretin-like and metabotropic-like receptors (Oliveira et al. 1993, Kolakowski 1993). All GPCRs consist of a single protein chain that crosses the membrane seven times, similar to bacteriorhodopsin (Henderson et al. 1990). Most ligands bind between the membrane helices, presumably at a position similar to that of the retinal in bacteriorhodopsin. Sometimes the extracellular loops are also involved in ligand recognition. The second and third cytosolic loops and part of the (cytosolic) C-terminal end of the receptors are involved in G protein recognition and binding. Hundreds of GPCRs exist, and they can be activated by a multitude of agonists. Five steps can be observed in the process of GPCR activation:
One of the central paradigms in protein research is that the sequence determines the structure, and the structure the function. Many G protein-coupled receptor sequences have been determined, and much functional data is available. Structural data, however, is not available, and modelling studies have not yet yielded models accurate enough to allow the inference of function. It would therefore be useful if the structure could be skipped in the sequence -> structure -> function pathway, or in other words if functional information could be derived directly from the sequence without the need for structure determination or modelling studies. Multiple sequence alignments provide a powerful tool for such analyses (Casari et al. 1994).
Ever since Eck and Dayhoff (Eck and Dayhoff 1969) published how evolutionary changes in proteins could be described by a residue exchange matrix, the question of which matrix is best has remained under debate. Eck and Dayhoff determined a matrix that describes the likelihood of mutations during evolution. This matrix can be modified for use as a scoring matrix in sequence alignment procedures. In this case the matrix is used to determine the likelihood that two residues occur at equivalent positions in a sequence alignment.
However, the same mutations are not equally likely to occur at different positions in a protein, and a single scoring matrix for all positions in the sequences to be aligned is often not adequate. Overington et al. (1992) extended Dayhoff's idea by using multiple matrices: one for residues in a helix, one for residues in a beta strand, etc. Scharf (1988) and Bowie et al. (1991) went one step further and created one exchange matrix for each position in the sequence. These so-called structure-based profiles can of course only be made if a three-dimensional structure is available for at least one of the family members. Sander and Schneider (1991) described structure-based multiple sequence alignments, which are sequence profiles for protein structures that are produced from multiple sequence alignments.
GPCRs partly reside in an aqueous environment and are partly embedded in a lipid membrane. Thus, very different physicochemical constraints apply to the residues, depending on their spatial location. Consequently, evolutionary pressure differs between positions, and one single Dayhoff-type matrix is inadequate for GPCR sequence alignment. We therefore use an iterative sequence alignment similar to the one described by Sander and Schneider (Sander & Schneider, 1991). We do not use a scoring matrix, but make the profile relate directly to the frequencies of the residues at each position in the multiple sequence alignment.
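A profile built directly from residue frequencies, without a substitution matrix, can be sketched as follows. This is a minimal illustration under stated assumptions: the sequences are already aligned to equal length, gaps are treated as an ordinary symbol, and the scoring rule (summing the frequency of each residue in its column) is a hypothetical choice for illustration, not necessarily the rule used in WHAT IF.

```python
from collections import Counter


def frequency_profile(aligned):
    """Per-column residue frequencies for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        profile.append({aa: n / len(column) for aa, n in counts.items()})
    return profile


def profile_score(profile, seq):
    """Hypothetical score: sum of the column frequencies of seq's residues
    (a residue never seen in a column contributes 0)."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, seq))


aligned = ["AQY", "AEY", "SQW", "AQW"]
profile = frequency_profile(aligned)
print(profile[0])                     # {'A': 0.75, 'S': 0.25}
print(profile_score(profile, "AQY"))  # 2.0
print(profile_score(profile, "SEW"))  # 1.0
```

Because every column carries its own frequency table, the same exchange (say Q to E) can be common in one column and rare in another, which is the position dependence argued for above.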
The main aim of most GPCR modelling studies in medicinal research is the design of (specific) agonists or antagonists. For these studies, knowledge of the residues involved in ligand binding is very important. Much indirect evidence points to the idea that the same drug often binds in structurally related receptors in a similar manner. In these cases, a correlation between the presence or absence of certain key receptor residues and the affinity for a specific drug can be expected.
Biogenic amine receptors are the best-studied class among the GPCRs. The abundance of experimental receptor binding data that emerged from these studies forms the basis for our analyses, in which we perform a correlated mutation analysis between ligand affinities and receptor sequences. The many available mutation data make it possible to verify the results of these analyses.
For many GPCRs, residues involved in agonist binding have been identified experimentally. In contrast to all other GPCRs, opsins are not activated by an agonist, but by a photon that triggers a cis-trans isomerization in the covalently bound retinal. This retinal is located between the transmembrane helices, roughly at the same position where the ligands are believed to bind in many other GPCRs.

It seems likely that characteristics of the cis-trans isomerization, such as the optimal wavelength of the photon that causes this isomerization, are in part determined by residues in the vicinity of the retinal.
We present a simple method to analyse sequence-function relationships of GPCRs. The method searches for correlated mutations in multiple sequence alignments. We used the method to search for residues that determine the optimal wavelength for retinal activation in opsins, and to analyse which residues are important for agonist or antagonist binding in serotonin receptors and adrenoceptors.
To enhance the signal to noise ratio in our analyses we used as many sequences as possible. We thus included sequences from many species. In a few cases the sequence differences between very unrelated species (e.g. mammals and insects) were too large, and the study was restricted to mammals.
Sequences were extracted from the Swissprot database (Bairoch), the PIR database (PIR), the translated GenBank (NCI) and the GCRDb database (Kolakowski, 1993). The program WHAT IF (Vriend, 1990) was used for database searches, multiple sequence alignments and correlated mutation analyses. This program is available from one of us (GV) for a minimal fee.
This creates a problem: to do the alignment we need a profile, and to get a profile we need an alignment. To solve this problem, a multiple sequence alignment is first made of a subset of the sequences that show high pairwise similarity. From this initial (very easy to perform) alignment a profile is created. This profile is then used to align all sequences of interest. The aligned sequences are sorted according to their similarity to the profile, and a new profile is made from the highest-scoring sequences. This process is repeated until all sequences are incorporated, with more sequences incorporated in every iteration than in the previous one. Sequences that show too little similarity with the consensus sequence of the profile were not used, because there is no guarantee that they belong to the same structural family. After all sequences have been incorporated, the alignment procedure is iterated a few more times. In practice, this iterative alignment method normally converges to a satisfactory multiple sequence alignment in fewer than five cycles, provided enough sequences are available (typically 20-30 sequences are enough).
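The bootstrap loop described above (seed profile, score, sort, add the best, rebuild) can be sketched as follows. This is a deliberately simplified sketch: it assumes all sequences are already aligned to the same length, so the re-alignment step of the real procedure is left out, and the doubling batch size, the `min_score` cut-off and `max_cycles` are hypothetical parameters. Exact fractions are used only to keep the toy example free of rounding effects.

```python
from collections import Counter
from fractions import Fraction


def build_profile(seqs):
    """Column-wise residue frequencies of equal-length sequences."""
    return [{aa: Fraction(n, len(col)) for aa, n in Counter(col).items()}
            for col in zip(*seqs)]


def score(profile, seq):
    """Mean column frequency of seq's residues, between 0 and 1."""
    return sum(col.get(aa, 0) for col, aa in zip(profile, seq)) / len(seq)


def iterative_profile(seed, pool, batch=2, min_score=Fraction(1, 2), max_cycles=10):
    """Grow a profile from a seed of closely related sequences.

    Each cycle rebuilds the profile from the current members, ranks the
    remaining sequences against it and admits the best `batch` of them that
    reach `min_score`; the batch doubles every cycle. Sequences that never
    reach the cut-off are returned separately as left out."""
    members = list(seed)
    remaining = [s for s in pool if s not in members]
    for _ in range(max_cycles):
        profile = build_profile(members)
        ranked = sorted(remaining, key=lambda s: score(profile, s), reverse=True)
        accepted = [s for s in ranked[:batch] if score(profile, s) >= min_score]
        if not accepted:
            break
        members.extend(accepted)
        remaining = [s for s in remaining if s not in accepted]
        batch *= 2
    return members, remaining


members, left_out = iterative_profile(
    seed=["AQYK", "AQYR"], pool=["AEYK", "AEWK", "GGGG"])
print(members)   # ['AQYK', 'AQYR', 'AEYK', 'AEWK']
print(left_out)  # ['GGGG']
```

In this toy run AEWK is rejected in the first cycle but accepted in the second, once AEYK has widened the profile; GGGG never qualifies, mirroring the sequences that were not used because they show too little similarity.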
Figure 2. Example of correlated mutations.

Pos.       5   10   (sequence number)
           |    |
  1    AAAASSSSTTTT           Positions 2 and 3 are compared with this one
  2    RRRRPPPPHHHH  66 1.00  All residues correlate perfectly with position 1
  3    TTTTGGGGEEED  63 0.95  One residue is not correlated with position 1

In this hypothetical multiple sequence alignment, 2 residue positions are compared with the first position. The sequences are given vertically and the aligned residues in these sequences horizontally, so three sequence positions are given in twelve sequences. For every pair of sequences, the residues at the two positions under comparison are examined, leading to 12*(12-1)/2 = 66 comparisons. If the residues are either conserved at both positions or mutated at both positions, a score of 1 is given. The correlation between two positions is defined as the score divided by the maximal score (66). For position 3, the D in sequence 12 breaks the correlation in the three sequence pairs it forms with sequences 9-11 (which share the T at position 1), giving 63/66 = 0.95.
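The Figure 2 scheme translates almost line by line into code. In this sketch (the function name is hypothetical), each position is a string with one residue per sequence; for every pair of sequences it checks whether the two positions agree on "same residue" versus "different residue", and it reproduces the 66 and 63 of the figure.

```python
from itertools import combinations


def pair_correlation(pos_a, pos_b):
    """Figure 2 scheme: over all pairs of sequences, score 1 when both
    positions are conserved or both are mutated; return (score, fraction)."""
    if len(pos_a) != len(pos_b):
        raise ValueError("positions must cover the same sequences")
    pairs = list(combinations(range(len(pos_a)), 2))  # n*(n-1)/2 pairs
    score = sum((pos_a[i] == pos_a[j]) == (pos_b[i] == pos_b[j]) for i, j in pairs)
    return score, score / len(pairs)


position1 = "AAAASSSSTTTT"
position2 = "RRRRPPPPHHHH"
position3 = "TTTTGGGGEEED"
print(pair_correlation(position1, position2))  # (66, 1.0)
score, fraction = pair_correlation(position1, position3)
print(score, round(fraction, 2))               # 63 0.95
```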
Using this method many functional residues in GPCRs were detected (Oliveira et al., 1993), and a hypothesis for the signal transduction pathway could be formulated (Oliveira et al., 1994).
However, rather than analysing whether residue positions show pairwise correlated mutational behaviour, one can also correlate residues with sequence-related parameters. Parameters that can be used are, for example, the binding constant for a certain ligand or the receptor subclass the sequence belongs to. These parameters can be encoded in a single character called a pseudo residue. This pseudo residue can be used in the correlation analysis as if it were a normal residue.
We tried several simple scoring schemes to determine correlations between pseudo residues and real residues. Figure 3 shows the scheme that appeared to be most useful when analysing functional characteristics. This scheme is very similar to the one described in figure 2. The main difference is that residue positions are not compared pairwise, but only with the pseudo residue.
A residue is called uncorrelated if it is different from the majority of residues that correspond with the same pseudo residue. If it is the same as the majority of residues corresponding to another pseudo residue, it is called anti-correlated.
Figure 3. Example of scoring correlated mutations.

         5   10
         |    |
 -1  111122223333           Pseudo residue
  1  RRRRPPPPHHHH  12 1.00  All residues correlate with the pseudo residue
  2  TTTTGGGGEEED  11 0.92  The D does not correlate
  3  LLLLAAAAYLYY  10 0.83  The L is anti-correlated

In this hypothetical multiple sequence alignment, 3 residue positions are compared with a pseudo residue (labelled -1). The sequences are given vertically and the aligned residues in these sequences horizontally, so three sequence positions are given in twelve sequences. A residue that is the same as the majority of the residues sharing its pseudo residue scores 1. A residue that is not correlated with the pseudo residue scores 0 (e.g. position 2 in sequence 12). Anti-correlated residues score -1 (e.g. position 3 in sequence 10). The correlation between a residue position and the pseudo residue is defined as the score divided by the maximal score (12); for position 3 this gives (11 - 1)/12 = 0.83.
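The Figure 3 rules (correlated +1, uncorrelated 0, anti-correlated -1) can be sketched as below. The helper names are hypothetical, and breaking ties in a group majority by first occurrence is an assumption of this sketch; the figure has no ties.

```python
from collections import Counter


def group_majorities(position, pseudo):
    """Most common residue among sequences that share each pseudo residue.
    Ties are broken by first occurrence (an assumption of this sketch)."""
    groups = {}
    for aa, p in zip(position, pseudo):
        groups.setdefault(p, []).append(aa)
    return {p: Counter(aas).most_common(1)[0][0] for p, aas in groups.items()}


def pseudo_correlation(position, pseudo):
    """Figure 3 scheme: +1 correlated, 0 uncorrelated, -1 anti-correlated."""
    majority = group_majorities(position, pseudo)
    score = 0
    for aa, p in zip(position, pseudo):
        if aa == majority[p]:
            score += 1  # same as the majority for its own pseudo residue
        elif any(aa == m for q, m in majority.items() if q != p):
            score -= 1  # same as the majority for another pseudo residue
    return score, score / len(position)


PSEUDO = "111122223333"
for pos in ("RRRRPPPPHHHH", "TTTTGGGGEEED", "LLLLAAAAYLYY"):
    score, fraction = pseudo_correlation(pos, PSEUDO)
    print(pos, score, round(fraction, 2))
```

The loop prints scores of 12, 11 and 10 (fractions 1.0, 0.92 and 0.83), matching the three rows of the figure.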
Except for the endogenous agonists, the affinity for chemical compounds is not a natural property of the receptor, and hence is not the result of an evolutionary process. These ligands can interact with residues that are conserved in a number of receptors, but may just as well bind to residues that show high variability. We thus want to find residues that are conserved in the receptors that bind the exogenous ligand, but are absent in the receptors to which the exogenous ligand does not bind. In the receptors with low affinity, any residue is allowed, provided it is different from that in receptors with high affinity. A very simple scoring scheme can be used to find such residues. Receptors that bind the ligand well get pseudo residue "+"; receptors with low affinity get "-". For the receptors with pseudo residue "+", a residue has to be identical to the majority of the "+" coded residues in the same position, in which case a score of 1 is given. If a residue in the "-" coded receptors belongs to this same majority the score is -1. The final score is divided by the maximal achievable score.
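A minimal sketch of this "+"/"-" scheme, assuming one string per alignment position (one residue per receptor) and one string of pseudo residues in the same order; receptors coded anything other than "+" or "-" (such as "ND") are skipped. Taking the maximal achievable score to be the number of "+" receptors is our reading of the text, and the function name is hypothetical. The example data are positions 1 and 5 of the hypothetical Figure 4 and position 719 of Figure 10.

```python
from collections import Counter


def plus_minus_score(position, affinity):
    """+1 for each '+' receptor carrying the '+' majority residue, -1 for each
    '-' receptor carrying that same residue; divided by the maximal
    achievable score (the number of '+' receptors)."""
    plus = [aa for aa, a in zip(position, affinity) if a == "+"]
    if not plus:
        raise ValueError("at least one '+' receptor is needed")
    majority = Counter(plus).most_common(1)[0][0]
    score = 0
    for aa, a in zip(position, affinity):
        if aa == majority and a == "+":
            score += 1
        elif aa == majority and a == "-":
            score -= 1
    return majority, score / len(plus)


# Figure 4: residues at positions 1 and 5 of sequences 1-10.
pos1, pos5 = "FFFGFFFGFF", "HHHHHWWWWW"
compound1, compound2 = "+++-+++-++", "+++-+-----"
print(plus_minus_score(pos1, compound1))  # ('F', 1.0)
print(plus_minus_score(pos1, compound2))  # ('F', 0.0)
print(plus_minus_score(pos5, compound2))  # ('H', 0.75)

# Figure 10: pindolol affinities and the residues at position 719.
print(plus_minus_score("NNNNNNNTTTTAAAVVLLLLTTYYYYY",
                       "+++++++--------------------"))  # ('N', 1.0)
```

The single-position scheme detects compound 1 but not compound 2 (0.0 at position 1, 0.75 at position 5), which is the motivation for the pair scheme that follows.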
Within a given data set, the selectivity of a compound may depend on the absence or presence of more than just one residue. For example, binding can be abolished for steric reasons if an alanine is mutated into a leucine, or for energetic reasons if a hydrogen bond is lost because of a threonine to valine mutation. Therefore, a scoring scheme was designed that reflects a dependence on a combination of residues. In this scheme, individual residues that are involved in binding may also be present in "-" coded receptors, but of every pair of positions at least one should not be the same as the majority of the "+" coded residues.

Figure 4 shows a hypothetical example explaining the different scoring schemes.
Figure 4. A hypothetical set of ten sequences of ten residues each.

Sequence   Residue at position             Compound
           1  2  3  4  5  6  7  8  9 10    1  2  3
   1       F  I  A  V  H  F  A  A  A  G    +  +  -
   2       F  I  V  V  H  G  S  A  A  G    +  +  +
   3       F  V  I  D  H  S  S  G  A  G    +  +  +
   4       G  I  L  R  H  T  S  G  A  G    -  -  -
   5       F  I  Y  I  H  V  T  S  A  F    +  +  -
   6       F  I  G  L  W  V  A  S  A  F    +  -  -
   7       F  A  T  C  W  W  A  T  A  F    +  -  -
   8       G  S  I  C  W  W  S  V  A  L    -  -  -
   9       F  I  A  S  W  I  S  I  A  L    +  -  -
  10       F  I  C  T  W  L  S  L  L  L    +  -  -
The affinities of three hypothetical compounds for each receptor are coded with pseudo residue "+" for high affinity and "-" for low affinity. Binding of compound 1 is only correlated with the presence of a phenylalanine at position 1; our simplest scoring scheme detects such cases. Compound 2 correlates with the presence of a phenylalanine at position 1 AND a histidine at position 5; the scoring scheme that is based on a comparison of pairs of residue positions with the pseudo residue will detect such cases. Compound 3 depends on the combined presence of phenylalanine 1, histidine 5 and serine 7. At present we cannot yet detect such cases.
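The pair scheme can be sketched in the same style. The text does not spell out the exact scoring, so this sketch assumes the most direct reading: a "+" receptor scores +1 when it carries the "+" majority residue at both positions, a "-" receptor scores -1 only when it carries both, and the total is divided by the number of "+" receptors. The data are positions 1, 5 and 7 of Figure 4; the function names are hypothetical.

```python
from collections import Counter

# Figure 4: residues at three positions of sequences 1-10, and compound codes.
POSITIONS = {1: "FFFGFFFGFF", 5: "HHHHHWWWWW", 7: "ASSSTAASSS"}
COMPOUNDS = {1: "+++-+++-++", 2: "+++-+-----", 3: "-++-------"}


def majority(residues):
    return Counter(residues).most_common(1)[0][0]


def pair_plus_minus_score(pos_a, pos_b, affinity):
    """Assumed pair scheme: '-' receptors may share either residue with the
    '+' majority, but sharing both costs -1."""
    plus = [k for k, a in enumerate(affinity) if a == "+"]
    maj_a = majority([pos_a[k] for k in plus])
    maj_b = majority([pos_b[k] for k in plus])
    score = 0
    for k, a in enumerate(affinity):
        both = pos_a[k] == maj_a and pos_b[k] == maj_b
        if both and a == "+":
            score += 1
        elif both and a == "-":
            score -= 1
    return score / len(plus)


# Compound 2 needs Phe1 AND His5: the pair (1, 5) correlates perfectly.
print(pair_plus_minus_score(POSITIONS[1], POSITIONS[5], COMPOUNDS[2]))  # 1.0

# Compound 3 needs three residues at once; no pair reaches 1.0.
for a, b in [(1, 5), (1, 7), (5, 7)]:
    print(a, b, pair_plus_minus_score(POSITIONS[a], POSITIONS[b], COMPOUNDS[3]))
```

For compound 3 the three pairs score 0.0, 0.0 and 0.5, which illustrates why a three-residue dependence escapes a pairwise scheme.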
To determine which residues influence the optimal wavelength in vertebrate opsins, we looked for all residue positions that display perfectly correlated mutational behaviour. The results of this analysis are tabulated in figure 5. Nine residue positions form a network of perfect pairwise correlations. At these positions all blue/violet opsins and the green pigments of goldfish and chicken have the same residue type, whereas the other opsins (red/green) systematically have another conserved residue at this position. One of the two residues from the chloride site (Lys443) is part of this network. His440 is not perfectly correlated because sheep rhodopsins have a glutamine instead of a glutamic acid at this position. The sequences used in this study can be obtained from the TM7 file server (Vriend 1994).
Figure 5. Wavelength determining residues in opsins.

              103 126 337 443 444 452 517 531 738
Blue/Violet    Q   G   A   Q   C   T   F   L   K
Red/Green      N   S   S   K   T   S   C   V   R

All residues that form a network of perfectly correlating mutations are shown.
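Finding such a network amounts to linking every pair of variable positions whose Figure 2 correlation is 1.00 and collecting the connected groups. The sketch below does this with a small union-find; the alignment is a hypothetical toy (three "blue" and three "red" sequences), not the opsin data, and the position numbers are only labels.

```python
from itertools import combinations


def perfectly_correlated(pos_a, pos_b):
    """True if, for every pair of sequences, both positions are conserved or
    both are mutated (a Figure 2 score of 1.00)."""
    return all((pos_a[i] == pos_a[j]) == (pos_b[i] == pos_b[j])
               for i, j in combinations(range(len(pos_a)), 2))


def correlation_networks(columns):
    """Connected groups of variable positions linked by perfect correlation."""
    variable = [p for p, col in columns.items() if len(set(col)) > 1]
    parent = {p: p for p in variable}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(variable, 2):
        if perfectly_correlated(columns[a], columns[b]):
            parent[find(a)] = find(b)

    groups = {}
    for p in variable:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy alignment: sequences 1-3 "blue", 4-6 "red".
TOY = {
    103: "QQQNNN",
    126: "GGGSSS",
    200: "AAAAAA",  # conserved: not part of any network
    300: "LIVLIV",  # variable but uncorrelated
    443: "QQQKKK",
    500: "STSTST",
    501: "DEDEDE",
}
print(correlation_networks(TOY))  # [[103, 126, 443], [500, 501]]
```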
Figure 7. Residues that discriminate between muscarinic and other biogenic amine receptors.
231 233 327 330 621 622 722
Muscarinic S N N V Y N C
Other amine V P T I F F G
(5I)(1S) (1H)(1A)
A listing of the 176 sequences that were used in this study can be obtained from the TM7 file server. 32 muscarinic receptors were compared with 144 other amine receptors. Four of these residue positions are not 100% conserved in the non-muscarinic aminergic receptors. The alternative residues and their frequency of occurrence are indicated in brackets.
Figure 9. Affinities for propranolol and pindolol, represented by pseudo residues, and receptor sequences which were used for correlation analysis.

Accession  Pindolol* Propranolol* Receptor type             Species
P19327     +         +            SEROTONIN 1A              rat
P08908     +         ND           SEROTONIN 1A              human
P28564     +         +            SEROTONIN 1B              rat
P28334     +         ND           SEROTONIN 1B              mouse
P07700     +         +            BETA-1 ADRENOCEPTOR       turkey
P18090     +         +            BETA-1 ADRENOCEPTOR       rat
P10608     +         +            BETA-2 ADRENOCEPTOR       rat
P28222     -         -            SEROTONIN 1B              human
P28565     -         -            SEROTONIN 1D              rat
P28221     -         -            SEROTONIN 1D              human
P28566     -         ND           SEROTONIN 1E              human
P30939     -         ND           SEROTONIN 1F              human
P30940     -         ND           SEROTONIN 1F              rat
Q02284     -         ND           SEROTONIN 1F              mouse
P14842     -         -            SEROTONIN 2A              rat
P08909     -         ND           SEROTONIN 2C              rat
P30966     -         -            SEROTONIN 5A              mouse
P35364     -         -            SEROTONIN 5A              rat
P31387     -         -            SEROTONIN 5B              mouse
P35365     -         -            SEROTONIN 5B              rat
P31388     -         ND           SEROTONIN 6               mouse
P32304     ND        -            SEROTONIN 7               mouse
P32305     ND        -            SEROTONIN 7               rat
P34969     ND        -            SEROTONIN 7               human
P20288     ND        -            DOPAMINE 2                bovine
P13953     -         -            DOPAMINE 2                rat
P23944     ND        -            ALPHA-1A ADRENOCEPTOR     rat
P15823     ND        -            ALPHA-1B ADRENOCEPTOR     rat
P18130     ND        -            ALPHA-1C ADRENOCEPTOR     bovine
P08913     ND        -            ALPHA-2A ADRENOCEPTOR     human
P22909     ND        -            ALPHA-2A ADRENOCEPTOR     rat
P18089     ND        -            ALPHA-2B ADRENOCEPTOR     human
P19328     ND        -            ALPHA-2B ADRENOCEPTOR     rat
P18825     ND        -            ALPHA-2C-1 ADRENOCEPTOR   human
P22086     ND        -            ALPHA-2C ADRENOCEPTOR     rat
P35369     ND        -            ALPHA-2C-2 ADRENOCEPTOR   human
P08482     -         -            ACETYLCHOLINE 1           rat
P10980     -         -            ACETYLCHOLINE 2           rat
P08483     -         -            ACETYLCHOLINE 3           rat
P08485     -         -            ACETYLCHOLINE 4           rat
P08911     -         -            ACETYLCHOLINE 5           rat
P31389     ND        -            HISTAMINE 1               guinea pig
P35367     ND        -            HISTAMINE 1               human
P31390     ND        -            HISTAMINE 1               rat

* Receptor affinity values were taken from Boess and Martin (1993) and Seeman (1993). All receptors with high affinity have pKi>=7.0 and were coded "+". The receptors with low affinity all have pKi<6.0 and were coded "-". ND indicates that no data is available. 27 sequences were used in the pindolol analysis; 36 sequences were used in the propranolol analysis.
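The coding rule in the footnote can be written as a small function. The thresholds (pKi >= 7.0 for "+", pKi < 6.0 for "-") come from the footnote; how receptors with an intermediate pKi are handled is not stated, so this sketch assumes they are left out, just like receptors without data ("ND"). The function names and the pKi values in the usage line are made up for illustration.

```python
def encode_affinity(pki, high=7.0, low=6.0):
    """Pseudo residue for one receptor, or None if it is left out."""
    if pki is None:  # no binding data ("ND")
        return None
    if pki >= high:
        return "+"
    if pki < low:
        return "-"
    return None      # intermediate affinity: assumed to be excluded


def pseudo_residue_string(pkis):
    """Indices of the receptors that are kept, and their pseudo residues."""
    coded = [(i, encode_affinity(v)) for i, v in enumerate(pkis)]
    kept = [(i, c) for i, c in coded if c is not None]
    return [i for i, _ in kept], "".join(c for _, c in kept)


# Hypothetical pKi values for five receptors.
print(pseudo_residue_string([8.1, None, 5.2, 6.5, 7.0]))  # ([0, 2, 4], '+-+')
```

Returning the kept indices alongside the string makes it easy to select the matching sequences from the alignment, which is how 27 of the 44 listed receptors end up in the pindolol analysis.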
Figure 10. Results of correlation analyses of propranolol and pindolol affinities for the sequences shown in figure 9 for which binding data is available.

Pindolol
        10        20
123456789012345678901234567
+++++++--------------------
NNNNNNNTTTTAAAVVLLLLTTYYYYY

Propranolol
        10        20        30
123456789012345678901234567890123456
+++++-------------------------------
NNNNNTTTVLLLLLLLTTFFFFFFFFFFYYYYYIII

The top lines indicate the sequence numbers, in the same order as in figure 9. + and - indicate high and low affinity, respectively. The bottom line represents the amino acid at position 719 in the corresponding sequence.
The importance of Asn719 for binding pindolol and propranolol has been confirmed in a number of cases. The mutation Asn719Val in the 5-HT1A receptor decreases the receptor's affinity for pindolol, but has no effect on the affinity for the natural agonist, serotonin (Guan et al., 1992). The Phe719Asn mutation in the α2-adrenoceptor increases the affinity for these antagonists considerably (Suryanarayana et al., 1991). The Thr719Asn mutation in human 5-HT1B and 5-HT1Dβ receptors increases their affinities for pindolol and propranolol. In these last two studies, it was shown that the Thr versus Asn difference at position 719 was completely responsible for the receptor-binding differences between rodent 5-HT1B receptors and other 5-HT1B and 5-HT1D receptors (Parker et al., 1993; Oksenberg et al., 1992).
We performed a sequence analysis of 5-CT affinity, using pseudo residue "+" for the 5-HT1A, 5-HT1B, 5-HT1D, 5-HT5 and 5-HT7 receptors, and "-" for the 5-HT1E, 5-HT1F, 5-HT2 and 5-HT6 receptors (see figure 11).
Figure 12 shows that one single residue, proline 629, is correlated with high affinity for 5-CT. This proline is located two turns above (i.e. closer to the extracellular side than) the 'PFF' motif, of which the second Phe (at position 622) has been shown to interact with agonists in the 5-HT2A receptor and the α2-adrenoceptor (Strader et al., 1989; Choudhary et al., 1993). Because proline lacks a backbone NH group, the backbone C=O of residue 625, one helical turn earlier, is left without its usual helical hydrogen-bond partner and is available for a hydrogen bond. Thus, 5-CT may form a hydrogen bond with this backbone C=O, which is located only one helical turn above the putative binding site for agonists. In some models of the complexes of serotonin with the 5-HT1A and 5-HT2A receptors, the 5-hydroxy group in serotonin is directed towards the backbone of helix VI (Trumpp-Kallmeyer et al. 1991; Kuipers et al. 1994). These models suggest that a hydrogen bond may be formed between the NH2 of 5-CT and the C=O of residue 625. Thus, the proline at position 629 may account for the high affinity of 5-CT for Pro629-containing serotonin receptors, which makes this residue an interesting target for future mutagenesis studies.
Figure 11. Affinity values for 5-carboxamidotryptamine (5-CT), represented by a pseudo residue, and the corresponding receptors that were used in the analysis.

Accession  Pseudo residue*  Receptor type  Species
P08908     +                SEROTONIN 1A   human
P28222     +                SEROTONIN 1B   human
P28564     +                SEROTONIN 1B   rat
P28334     +                SEROTONIN 1B   mouse
P11614     +                SEROTONIN 1D   dog
P28221     +                SEROTONIN 1D   human
P28565     +                SEROTONIN 1D   rat
P28566     -                SEROTONIN 1E   human
P30939     -                SEROTONIN 1F   human
Q02284     -                SEROTONIN 1F   mouse
P30940     -                SEROTONIN 1F   rat
P28223     -                SEROTONIN 2A   human
P14842     -                SEROTONIN 2A   rat
P30994     -                SEROTONIN 2B   rat
P08909     -                SEROTONIN 2C   rat
P30966     +                SEROTONIN 5A   mouse
P35364     +                SEROTONIN 5A   rat
P31387     +                SEROTONIN 5B   mouse
P35365     +                SEROTONIN 5B   rat
P31388     -                SEROTONIN 6    rat
P34969     +                SEROTONIN 7    human
P32304     +                SEROTONIN 7    mouse
P32305     +                SEROTONIN 7    rat

* Affinity values were taken from Boess and Martin (1993). Receptors with pKi<7 are coded "-", receptors with pKi>7 are coded "+". In addition, "+" coded receptors displayed the affinity order 5-CT>5-HT, while "-" coded receptors displayed 5-CT<5-HT.
Figure 12. Correlation analysis of the affinity of 5-CT for the serotonin receptors shown in figure 11.

        10        20
12345678901234567890123
+++++++--------++++-+++
PPPPPPPGNNNVVLVPPPPAPPP

The top lines indicate the sequences in the same order as in figure 11. + and - indicate affinity differences as described in figure 11. The bottom line represents the amino acid at position 629 in the corresponding sequence.
Figure 14 shows that the low affinity of mesulergine for 5-HT1 and 5-HT6 receptors may be explained by the presence of different residues at positions 127, 129, 223, 337 and 524, when compared to 5-HT2 and 5-HT7 receptors. The human 5-HT2A receptor, which has low affinity for mesulergine, differs from the other 5-HT2 and 5-HT7 receptors only in that residue 516 is a serine; receptors with high affinity all contain an alanine at this position. The importance of this residue was confirmed by an Ala516Ser mutation in the rat 5-HT2A receptor, which decreases the affinity for mesulergine as well as for a number of other N1-alkyl substituted ergolines and tryptamines (figure 8; Johnson et al., 1993). In contrast, the affinity of the unsubstituted N1-H compounds increases as a result of the Ala516Ser mutation. Apparently, ergolines and tryptamines with an N1-alkyl substituent prefer an alanine at position 516, whereas the free N1-H compounds prefer a serine. These findings suggest a direct interaction between the N1 substituent and residue 516.

Similarly, the mutation Ala516Thr in the rat 5-HT2A receptor diminishes the affinity for N1-alkyl substituted compounds (Johnson et al., 1993). This explains the low affinity of mesulergine for the 5-HT6 receptor, which has a threonine at position 516.

Figure 14 shows that residues at other positions, like 127, 129 and 223, may also contribute to the low affinity. These residues, as well as residue 524, are interesting candidates for future mutagenesis studies in 5-HT1, 5-HT5 and 5-HT6 receptors.
Figure 13. Affinity values for mesulergine, represented by a pseudo residue, and the corresponding receptor sequences which were used in the correlation analysis.

Accession  Pseudo residue*  Receptor type  Species
P14842     +                SEROTONIN 2A   rat
P28335     +                SEROTONIN 2C   human
P08909     +                SEROTONIN 2C   rat
P30994     +                SEROTONIN 2B   rat
P32304     +                SEROTONIN 7    mouse
P32305     +                SEROTONIN 7    rat
P34969     +                SEROTONIN 7    human
P28223     -                SEROTONIN 2A   human
P31388     -                SEROTONIN 6    rat
P28564     -                SEROTONIN 1B   dog
P28222     -                SEROTONIN 1B   human
P28334     -                SEROTONIN 1B   mouse
P28566     -                SEROTONIN 1E   human
Q02284     -                SEROTONIN 1F   mouse
P30939     -                SEROTONIN 1F   human
P30940     -                SEROTONIN 1F   rat
P30966     -                SEROTONIN 5A   mouse
P35364     -                SEROTONIN 5A   rat
P31387     -                SEROTONIN 5B   mouse
P35365     -                SEROTONIN 5B   rat

* Affinity values were taken from Seeman (1993), and Boess and Martin (1993). Values pKi>7.0 are coded "+", values pKi<6 are coded "-".
Figure 14. Residue positions that show combined correlation with affinity for mesulergine in serotonin receptors.

Residue   Sequence number
position          10        20
          12345678901234567890
          +++++++-------------
127       IIIIIIIIATTTTTTTFFFF
129       GGGGGGGGASSSLIIIWWWW
223       AAAAAAAASTTTTTTTSSSS
337       SSSSSSSSSAAAAAAAAAAA
516       AAAAAAASTAAAAAAAAAAA
524       MMMMMMMMILLLIIIIVVVV
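One way to read Figure 14 is to list, for every receptor, the positions at which it departs from the consensus of the high-affinity ("+") receptors. The sketch below (helper names hypothetical) does this for the six positions of the figure; it is a reading aid, not the scoring scheme used to produce the figure.

```python
from collections import Counter

# Figure 14: residues of the 20 receptors of figure 13 at six positions.
FIG14 = {
    127: "IIIIIIIIATTTTTTTFFFF",
    129: "GGGGGGGGASSSLIIIWWWW",
    223: "AAAAAAAASTTTTTTTSSSS",
    337: "SSSSSSSSSAAAAAAAAAAA",
    516: "AAAAAAASTAAAAAAAAAAA",
    524: "MMMMMMMMILLLIIIIVVVV",
}
MESULERGINE = "+++++++-------------"


def deviations(columns, affinity):
    """'+' consensus per position, and for each receptor (1-based) the
    positions where its residue differs from that consensus."""
    plus = [k for k, a in enumerate(affinity) if a == "+"]
    consensus = {p: Counter(col[k] for k in plus).most_common(1)[0][0]
                 for p, col in columns.items()}
    report = {k + 1: [p for p, col in columns.items() if col[k] != consensus[p]]
              for k in range(len(affinity))}
    return consensus, report


consensus, report = deviations(FIG14, MESULERGINE)
print(consensus)  # {127: 'I', 129: 'G', 223: 'A', 337: 'S', 516: 'A', 524: 'M'}
print(report[8])  # [516]  (human 5-HT2A: only the Ala516Ser difference)
print(report[9])  # [127, 129, 223, 516, 524]  (rat 5-HT6)
```

Every "+" receptor has an empty list and every "-" receptor has at least one deviating position, which is what "combined correlation" means for this data set.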
Another problem occurred with the antagonist chlorpromazine, which is reported to display affinity for 5-HT2A and 5-HT2C receptors, dopamine D2, D3 and D4 receptors and α1-adrenoceptors (Boess and Martin, 1993; Seeman, 1993). A perfect correlation could not be observed when all the binding constants were included in one analysis. Apparently, the complete set is too complicated to be analysed with our method. Division of the set into subsets yielded too many hits for each analysis. Possibly, introducing a scoring scheme that can correlate the pseudo residue with three or four residues at the same time could solve this problem. On the other hand, it is doubtful whether the number of sequences (20-30 per compound) would be sufficient for a clear signal with such a scoring scheme. It seems likely that in a number of cases the receptor diversity is too large given the number of available receptor binding data. This problem, however, is restricted to the use of receptor binding data and does not arise in functional classification, for which all available sequences can be used.
It should also be kept in mind that the method contains a number of simplifications that may reduce its ability to extract correlations from the available data. For instance, affinities are coded "+" or "-" without any values in between. Also, only residue identities are used, and physicochemical similarities between different amino acid types were not taken into account; thus a serine to threonine mutation is penalised as heavily as a serine to tryptophan mutation. The quality of the data may also cause problems. For instance, antagonists label both high- and low-affinity sites, while agonists only label the high-affinity sites. When an antagonist is used as radioligand, a biphasic curve is obtained for the displacement with an agonist, reflecting the affinities for both states. If the number of receptors in the high-affinity state is low, it may be difficult to obtain a clear signal from high-affinity agonist binding. As a result, reported agonist affinity values may be too low when determined with an antagonist as radiolabel. Another problem is the fact that many receptor binding tests include mixtures of receptors or receptor subtypes, which makes it difficult to attribute the results of a test to individual sequences. Of course, one of the major assumptions is that a ligand with affinity for a number of receptors addresses an equivalent binding site on these receptors. Although our study shows that this is often a reasonable assumption, problems may be expected for compounds that do not obey this rule.
Despite these problems, the method in its present form provides a tool for the unbiased combination of sequence data with experimentally determined ligand affinities. In a number of cases it was useful in the identification of residues involved in ligand binding. Many residues whose role in ligand binding had been determined experimentally were detected, and several new suggestions for future mutagenesis experiments were derived.
Oliveira, L., Paiva, A.C.M. and Vriend, G.
(1993) J.Comp.-Aid.Mol.Des. 7, 649-658. A common
motif in G-protein coupled seven transmembrane helix receptors.
Oliveira, L., Paiva, A.C.M., Sander, C. and Vriend,
G. (1994) TIPS 15, 170-172.
A common step for signal transduction in G protein-couple receptors.
Savarese, T.M., Fraser, C.M.
(1992) Biochem. J., 283, 1-19.In vitro mutagenesis
and the search for structure-function relationships among G protein-coupled
receptors.
Boess, F.G., Martin, I.L.
(1993) Neuropharmacology, 33, 275-317. Molecular
Biology of 5-HT Receptors.
Guan, X.. Peroutka, S.J. and Kobilka, B.K.
(1992) Mol. Pharmacol., 41, 695-698. Identification
of a single amino acid residue responsible for the binding of
a class of b-adrenergic
receptor antagonists to 5-hydroxytryptamine1A receptors.
Suryanarayana, S., Daunt, D.A., von Zastrow, M. and Kobilka, B.K. (1991) J. Biol. Chem. 266, 15488-15492. A point mutation in the seventh hydrophobic domain of the α2-adrenergic receptor increases its affinity for a family of β-receptor antagonists.
Parker, E.M., Grisel, D.A., Iben, L.G. and Shapiro, R.A. (1993) J. Neurochem. 60, 380-383. A Single Amino Acid Difference Accounts for the Pharmacological Distinctions Between the Rat and Human 5-Hydroxytryptamine1B Receptors.
Oksenberg, D., Marsters, S.A., O'Dowd, B.F., Jin, H., Havlik, S., Peroutka, S.J. and Ashkenazi, A. (1992) Nature 360, 161-163. A single amino-acid difference confers major pharmacological variation between human and rodent 5-HT1B receptors.
Pazos, A., Hoyer, D. and Palacios, J.M. (1985) Eur. J. Pharmacol. 106, 531-538. Mesulergine, a selective serotonin-2 ligand in the rat cortex, does not label these receptors in porcine and human cortex: evidence for species differences in brain serotonin-2 receptors.
Johnson, M.P., Loncharich, R.J., Baez, M. and Nelson, D.L. (1993) Mol. Pharmacol. 45, 277-286. Species Variations in Transmembrane Region V of the 5-Hydroxytryptamine Type 2A Receptor Alter the Structure-Activity Relationship of Certain Ergolines and Tryptamines.
Kuipers, W., Wijngaarden, I. van and IJzerman, A.P. (1994) Drug Design Disc. 11, 231-249. A Model of the Serotonin 5-HT1A Receptor: Agonist and Antagonist Binding Sites.
Hibert, M.F., Trumpp-Kallmeyer, S., Bruinvels, A. and Hoflack, J. (1991) Mol. Pharmacol. 40, 8-15. Three-dimensional models of neurotransmitter G-binding protein-coupled receptors.
Trumpp-Kallmeyer, S., Hoflack, J., Bruinvels, A. and Hibert, M. (1992) J. Med. Chem. 35, 3448-3462. Modeling of G-protein coupled receptors: Application to dopamine, adrenaline, serotonin, acetylcholine and mammalian opsin receptors.
Seeman, P.M.D. (1993) Receptor Tables, vol. 2: Drug Dissociation Constants for Neuroreceptors and Transporters. Printed by SZ Research, Toronto, Canada.
Bowie, J.U., Lüthy, R. and Eisenberg, D. (1991) Science 253, 164-170. A method to identify protein sequences that fold into a known three-dimensional structure.
Casari, G., Sander, C. and Valencia, A. (1994) Submitted. A method to predict functional residues in proteins.
Goebel, U., Sander, C., Schneider, R. and Valencia, A. (1994) Proteins 18, 309-317. Correlated mutations and residue contacts in proteins.
Scharf, M. (1989) Thesis, Univ. Heidelberg.
Henderson, R., Baldwin, J.M., Ceska, T.A., Zemlin, F., Beckmann, E. and Downing, K.H. (1990) J. Mol. Biol. 213, 899-929. Model for the structure of bacteriorhodopsin based on high-resolution electron cryo-microscopy.
Overington, J., Donnelly, D., Johnson, M.S., Sali, A. and Blundell, T.L. (1992) Protein Sci. 1, 216-226. Environment-specific amino acid substitution tables: Tertiary templates and prediction of protein folds.
Eck, R.V. and Dayhoff, M.O. (1969) In "Atlas of Protein Sequence and Structure" (ed. M.O. Dayhoff), pp. 33-41. Natl. Biomed. Res. Found., Washington, DC.
Bairoch, A. Dept. Biochimie Médicale, Centre Médical Universitaire, 1211 Geneva 4, Switzerland, and SWISS-PROT Protein Sequence Database, EMBL Data Library, D-69117 Heidelberg, Germany.
PIR, NBRF, Georgetown University Medical Center, 3900 Reservoir Road N.W., Washington, DC.
NCI-FCRDC Frederick Biomedical Supercomputing Center, PO Box B, Frederick, MD 21702-1201.
Kolakowski, L.F. (1994) Receptors Channels 2, 1-7. GCRDb: A G-protein coupled receptor database.
Wang, Z., Asenjo, A.B. and Oprian, D.D. (1993) Biochemistry 32, 2125-2130. Identification of the Cl⁻-Binding Site in the Human Red and Green Color Vision Pigments.
Strader, C.D., Sigal, I.S. and Dixon, R.A.F. (1989) FASEB J. 3, 1825-1832. Structural Basis of β-Adrenergic Receptor Function.
Choudhary, M.S., Craigo, S. and Roth, B.L. (1993) Mol. Pharmacol. 43, 755-761. A Single Point Mutation (Phe340→Leu340) of a Conserved Phenylalanine Abolishes 4-[125I]Iodo-(2,5-Dimethoxy)Phenylisopropylamine and [3H]Mesulergine but Not [3H]Ketanserin Binding to 5-Hydroxytryptamine2 Receptors.
Wess, J., Gdula, D. and Brann, M.R. (1992) EMBO J. 10, 3728-3734. Site-Directed Mutagenesis of the Muscarinic Receptors: Identification of a Series of Threonine and Tyrosine Residues Involved in Agonist but Not Antagonist Binding.
Zhukovsky, E. and Oprian, D.D. (1989) Science 246, 928-930. Effect of Carboxylic Acid Side Chains on the Absorption Maximum of Visual Pigments.
Sakmar, T.P., Franke, R.R. and Khorana, H.G. (1989) Proc. Natl. Acad. Sci. USA 86, 8309-8313. Glutamic acid-113 serves as the retinylidene Schiff base counterion.
Sakmar, T.P., Franke, R.R. and Khorana, H.G. (1991) Proc. Natl. Acad. Sci. USA 88, 3079-3083. The role of the retinylidene Schiff base counterion in rhodopsin in determining wavelength absorbance and Schiff base pKa.
Karnik, S.S. and Khorana, H.G. (1990) J. Biol. Chem. 265, 17520-17524. Assembly of Functional Rhodopsin Requires a Disulfide Bond between Cysteine Residues 110 and 187.
Vriend, G. (1990) J. Mol. Graph. 8, 52-56. WHAT IF: A molecular modeling and drug design program.