Upon activation, the epidermal growth factor receptor (EGFR) phosphorylates tyrosine residues in its cytoplasmic tail, which triggers the binding of Src Homology 2 (SH2) and Phosphotyrosine Binding (PTB) domains and initiates downstream signaling. The sequences flanking the tyrosine residues (referred to as "phosphosites") must be compatible with phosphorylation by the EGFR kinase domain and the recruitment of adapter proteins, while minimizing phosphorylation that would reduce the fidelity of signal transmission. To understand how phosphosite sequences encode these functions within a small set of residues, we carried out high-throughput mutational analysis of three phosphosite sequences in the EGFR tail. We used bacterial surface display of peptides, coupled with deep sequencing, to monitor phosphorylation efficiency and the binding of the SH2 and PTB domains of the adapter proteins Grb2 and Shc1, respectively. We found that the sequences of phosphosites in the EGFR tail are restricted to a subset of the range of sequences that can be phosphorylated efficiently by EGFR. Although efficient phosphorylation by EGFR can occur with either acidic or large hydrophobic residues at the −1 position with respect to the tyrosine, hydrophobic residues are generally excluded from this position in tail sequences. The mutational data suggest that this restriction results in weaker binding to adapter proteins, but also disfavors phosphorylation by the cytoplasmic tyrosine kinases c-Src and c-Abl. Our results show how EGFR-family phosphosites achieve a trade-off between minimizing off-pathway phosphorylation and maintaining the ability to recruit the diverse complement of effectors required for downstream pathway activation.
Figure 1 - Overview of EGFR signal transduction at the membrane and a bacterial surface display scheme to analyze specificity of tyrosine kinases and phosphotyrosine-binding proteins.
A. Illustration of membrane-proximal EGFR signaling components.
Autophosphorylation of the tyrosine phosphosites in the C-terminal cytoplasmic tail (red circles) by the activated kinase domain produces binding sites for many downstream effectors, a subset of which are depicted. These effectors go on to activate second-messenger pathways, also depicted. Grb2, growth factor receptor-bound protein 2; MAPK, mitogen-activated protein kinases; PI3K, phosphoinositide 3-kinase regulatory subunit; PKC, protein kinase C; Plcγ1, phospholipase C-gamma-1; Shc1, Src homology 2 domain-containing-transforming protein C1.
B. Workflow for determining phosphosite specificity profiles of tyrosine kinases and phosphotyrosine-binding proteins by bacterial surface-display coupled with FACS and deep sequencing. Phosphotyrosine on the surface of the cells is detected either by immunostaining with an anti-phosphotyrosine antibody or, for binding profiles, with a tandem SH2– or PTB–GFP construct. The enrichment of each peptide-coding sequence in the highly phosphorylated population, and thus the relative efficiency of phosphorylation or binding for each peptide, is determined by counting the number of sequencing reads for each peptide in the sorted and unsorted populations.
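The read-counting step in panel B can be illustrated with a minimal sketch. It assumes enrichment is the log-two ratio of a peptide's read frequency in the sorted population to its frequency in the unsorted (input) population. The function name, the peptide keys, and the pseudocount are illustrative assumptions and are not taken from the study.

```python
import math


def read_frequency_ratio(sorted_counts, input_counts, pseudocount=0.5):
    """Return log2(sorted frequency / input frequency) for each peptide.

    sorted_counts, input_counts: dicts mapping peptide -> read count.
    pseudocount: assumed small constant that avoids log(0) for peptides
    with no reads; it is an illustrative choice, not the authors' setting.
    """
    sorted_total = sum(sorted_counts.values())
    input_total = sum(input_counts.values())
    enrichment = {}
    for peptide, n_input in input_counts.items():
        n_sorted = sorted_counts.get(peptide, 0)
        # Normalize counts to frequencies so sequencing depth cancels out.
        f_sorted = (n_sorted + pseudocount) / sorted_total
        f_input = (n_input + pseudocount) / input_total
        enrichment[peptide] = math.log2(f_sorted / f_input)
    return enrichment
```

A peptide that makes up a larger share of the sorted reads than of the input reads gets a positive value, and one that is depleted by sorting gets a negative value.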
Figure 2 - Comparison of intrinsic EGFR and c-Src substrate specificity with EGFR-family phosphosite sequences.
A. Histogram of peptide read frequency ratios from EGFR phosphorylation of a library of human phosphosites obtained by bacterial surface-display and deep sequencing. The distributions of ratios of read frequencies for input and sorted samples are plotted for two replicate experiments.
B. Read-frequency ratios for two replicate Human-pTyr library phosphorylation experiments plotted against each other. Peptides with ratios above the 75th percentile in both replicates (gray box) were counted as highly phosphorylated in C.
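The selection rule in panel B (above the 75th percentile in both replicates) can be sketched as below. The percentile is computed separately for each replicate over the peptides present in both, with linear interpolation. Both choices, and all names, are assumptions made for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def highly_phosphorylated(rep1, rep2, q=75.0):
    """Return peptides whose ratio exceeds the q-th percentile in both replicates.

    rep1, rep2: dicts mapping peptide -> read-frequency ratio.
    """
    shared = rep1.keys() & rep2.keys()
    cut1 = percentile([rep1[p] for p in shared], q)
    cut2 = percentile([rep2[p] for p in shared], q)
    # A peptide must pass the cutoff in each replicate independently.
    return {p for p in shared if rep1[p] > cut1 and rep2[p] > cut2}
```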
C. Phosphorylation probability logo (phospho-pLogo) of highly phosphorylated peptides for EGFR in the bacterial surface display experiment. The height of each letter corresponds to the negative log-odds ratio of binomial probabilities of finding a given amino acid residue at a particular sequence position at higher versus lower frequencies than the expected positional frequency for all peptides in the library.
Higher values indicate an enrichment of a residue versus the background distribution. Red lines indicate the log-odds ratio values for a significance level of 0.05, as defined in ref. 44.
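The letter height described in panel C can be sketched as follows. The sketch assumes the standard pLogo form, in which the height is the negative log10 of the ratio between the binomial probability of seeing at least k occurrences of a residue and the probability of seeing at most k. Function names are illustrative. Summing the probability mass function directly is adequate only for modest library sizes.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def plogo_height(k, n, p):
    """Negative log10 odds of over- versus under-representation.

    k: observed count of the residue at the position in the selected set.
    n: number of peptides in the selected set.
    p: background frequency of the residue at that position (0 < p < 1).
    """
    upper = sum(binom_pmf(i, n, p) for i in range(k, n + 1))  # Pr(K >= k)
    lower = sum(binom_pmf(i, n, p) for i in range(0, k + 1))  # Pr(K <= k)
    return -math.log10(upper / lower)
```

A residue seen more often than the background predicts gives a positive height, and a residue seen less often gives a negative height.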
D. Sequence probability logo (sequence-pLogo) of EGFR-family C-terminal tail tyrosines. Sequence segments surrounding tyrosine were extracted from the regions C-terminal to the kinase domain for metazoan EGFR-family protein sequences. The positional amino acid frequency in these segments was compared to the frequency in metazoan intracellular and transmembrane proteins and was plotted as a pLogo.
E, F. Phospho-pLogos of highly phosphorylated sequences from c-Src (E) and c-Abl (F) phosphorylation of the Human-pY library (raw data are from ref. 32). Sequences above the 75th percentile in three replicates are included in the highly phosphorylated set.
Figure 3 - Effect of single amino acid substitutions on phosphorylation of three EGFR phosphosite peptides by EGFR.
A. Sequences of three human EGFR C-terminal tail phosphosites.
B. Heat maps showing the effect of all single amino acid substitutions (except tyrosine and cysteine) on the phosphorylation level of three EGFR phosphosite peptides relative to the wild-type peptide upon phosphorylation by EGFR, measured by bacterial surface display and deep sequencing. Squares for each substitution x of each wild-type position i are colored as log-two fold-enrichment relative to wild-type (ΔExi), calculated from read frequency ratios of sorted and input samples. Wild-type residues (ΔEwti = 0 by definition) are indicated by gray squares. The ΔE scales for each peptide, displayed in the top right corner of each heat map, are not directly comparable because different optimized cell sorting parameters were used for each peptide. Red and blue colors indicate variants that were phosphorylated more or less, respectively, than the wild-type sequence. Row and column mean ΔE values are displayed separately. Data are the variant-wise mean of at least two replicates.
C. Enrichment values for the −2 column (ΔEx-2) for each peptide. Error bars indicate the SEM.
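The quantities plotted in panels B and C can be sketched as below. The sketch assumes ΔE for a variant is the log-two ratio of its sorted/input read-frequency ratio to that of the wild-type peptide, and that the SEM is the sample standard deviation divided by the square root of the replicate count. Function names are illustrative.

```python
import math
import statistics


def delta_e(variant_ratio, wild_type_ratio):
    """Log2 fold-enrichment of a variant relative to wild-type.

    Both arguments are sorted/input read-frequency ratios, so the
    wild-type peptide itself gives 0 by definition.
    """
    return math.log2(variant_ratio / wild_type_ratio)


def mean_and_sem(replicate_values):
    """Mean and standard error of the mean over two or more replicates."""
    mean = statistics.mean(replicate_values)
    sem = statistics.stdev(replicate_values) / math.sqrt(len(replicate_values))
    return mean, sem
```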
Figure 4 - Comparison of c-Src and EGFR specificity with respect to EGFR substrates.
A. Relative phosphorylation of EGFR-family phosphosites and reported cytoplasmic EGFR substrates by c-Src versus EGFR. Log-two fold-enrichment values relative to a control peptide containing no tyrosine were calculated from peptide read frequencies in sorted and input samples. These enrichment values (denoted ΔE*) were corrected by the relative expression level measured for each peptide by cell sorting and deep sequencing. ΔE* values are not comparable on the same scale between kinases. The mean of three replicates with 95% confidence intervals for each kinase is plotted.
B. Venn diagram showing membership of peptides in the top quartile of ΔE* values for each kinase.
C. Specific activities measured for c-Src and EGFR by NADH-coupled assay against selected peptides at 0.5 mM peptide. Three EGFR C-terminal tail phosphosites and one c-Src substrate, noted below each set of bars, were measured. Error bars, 95% confidence interval of the mean.
D. Heat map showing the effect of single amino acid substitutions on the phosphorylation level of the EGFR Tyr 1086 phosphosite relative to wild-type upon phosphorylation by c-Src, measured by bacterial surface display and deep sequencing. ΔExi is displayed as a heat map as described in Figure 3B.
Figure 5 - Effect of single amino acid substitutions on the binding of the Grb2 SH2 domain and Shc1 PTB domain to two EGFR phosphosites.
Log-two fold-changes in read frequency ratios relative to wild-type (ΔExi) were determined by cell sorting after labeling phosphorylated bacteria displaying peptides with tandem copies of the Shc1 PTB domain (A) or the Grb2 SH2 domain (B) fused to GFP. ΔExi values for single amino acid substitutions are displayed as heat maps, as described in Figure 3B.
Supplementary Figure 1 - Kinase activity of a dimerized EGFR kinase measured with enzyme-coupled assays and bacterial surface-display coupled with deep sequencing.
A. The two measurements of enzymatic activity of EGFR for 21-residue peptides corresponding to the indicated EGFR family tail phosphosites were compared. The EGFR protein used in both methods consisted of an equimolar mixture of N-terminal FKBP and FRB fusions of human EGFR residues 663–1186, in the presence of excess rapamycin. The activity on the x-axis was measured with a continuous, homogeneous assay wherein the generation of ADP upon phosphorylation of a purified peptide is enzymatically coupled to the oxidation of reduced β-nicotinamide adenine dinucleotide (NADH), with a corresponding decrease in absorbance of NADH. Peptides were present at 0.5 mM, below the expected KM, and EGFR dimers were present at 0.2 μM. Activity reported on the y-axis was measured with the bacterial surface-display and deep sequencing assay. For this experiment, peptides displayed on the surface of E. coli as part of a larger library were subjected to phosphorylation by dimerized EGFR at 0.1 μM dimer for 15 minutes at room temperature, to produce a phosphorylation level of ~1/3 the maximum obtained by long incubation in the presence of a high concentration of kinases. The highly phosphorylated cells were collected by fluorescence-activated cell sorting, and the peptide-coding portion of the surface-display gene of these cells was sequenced, along with that of the input population. The read frequencies were normalized and plotted as a log-fold-change relative to a negative control peptide containing no Tyr residue and corrected for the separately measured surface-display level. Error bars indicate standard error of the mean from three replicates in each experimental method.
B. Effect of forced dimerization on EGFR intracellular module kinase activity. The generic Tyr kinase substrate poly(Glu4Tyr)n at 1 mg/ml was subjected to phosphorylation by 50 nM FKBP– and FRB–EGFR dimers in the presence and absence of 1 μM rapamycin. Phosphorylation at various time points was detected by ADP production enzymatically coupled to production of resorufin. The slope of fluorescence change over time for the linear reaction progress curve is plotted with standard error of the mean, and the fold-increase in rate with the addition of rapamycin is noted.
Supplementary Figure 2 - Sequence content of high-efficiency peptide substrates of EGFR from the human proteome, including peptides with more than one Tyr residue.
A phospho-pLogo of peptide sequences in the top quartile of read frequency ratios for EGFR, according to the bacterial surface-display/deep sequencing experiment with the Human-pTyr library. This pLogo was generated from the same raw dataset as Figure 2C, but with peptides containing more than one Tyr residue included in the analysis. Tyr residues appear at multiple positions, but it is not known whether this results from multiple Tyr residues becoming phosphorylated and detected by the antibody during the experiment, or from an improvement in catalytic efficiency for the central Tyr residue when other Tyr residues are present in the peptide.
Supplementary Figure 3 - Comparison of relative enrichment differences between variants due to phosphorylation and surface-display level for the Tyr 1114 phosphosite peptide library.
The contribution of expression level differences between variants to the calculated enrichment due to phosphorylation by EGFR, ΔE, was estimated by measuring the relative surface-display level of each variant in the Tyr 1114 library. Cells displaying the Tyr 1114 library were labeled with a fluorescent anti-Strep tag antibody targeting the surface-display scaffold. These cells were sorted by fluorescence-activated cell sorting into six bins spanning the distribution of fluorescence values. The abundance of each peptide in each bin relative to the wild-type peptide was inferred from read frequencies, as measured by Illumina sequencing in the same manner used for phosphorylation level determination. The log-fold differences in expression level relative to the wild-type peptide in the library, Cxi (panel B), were plotted on the same scale as the log-fold differences in phosphorylation level for the Tyr 1114 library (panel A, reproduced from Figure 3B). White squares indicate minimal differences in expression level for a variant relative to the wild-type peptide, and thus indicate minimal contribution to the phosphorylation enrichment value for that variant, ΔExi.
Supplementary Figure 4 - Kinase activity of EGFR against EGFR Tyr 992 phosphosite peptide mutants.
Relative kinase activity was measured with an NADH-coupled enzyme assay with purified EGFR intracellular module and purified 21-mer peptides corresponding to the EGFR Tyr 992 phosphosite, with and without the noted substitutions to the wild-type sequence. Steady-state rates were measured in triplicate and plotted as the negative slope of the linear portion of the enzyme progress curve of absorbance at 340 nm over time. Error bars, 95% confidence interval.
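The rate readout described above can be sketched as an ordinary least-squares fit. The sketch assumes the caller has already restricted the data to the linear portion of the progress curve. The optional conversion to a concentration rate uses the standard NADH extinction coefficient at 340 nm (6220 M⁻¹ cm⁻¹) and an assumed 1 cm path length. The function names and the path length are illustrative assumptions.

```python
def negative_slope(times, absorbances):
    """Least-squares slope of absorbance versus time, with the sign flipped.

    NADH consumption lowers the absorbance at 340 nm, so the raw slope is
    negative and the reported rate is its negative.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_a = sum(absorbances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times, absorbances))
    return -sxy / sxx


def molar_rate(rate_abs_per_s, path_cm=1.0, epsilon=6220.0):
    """Convert an absorbance rate to M/s via Beer-Lambert (assumed path length)."""
    return rate_abs_per_s / (epsilon * path_cm)
```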
Supplementary Figure 5 - Backbone conformation of the −1 and −2 residues of the EGFR Tyr 1114 phosphosite peptide during molecular dynamics simulations.
A. Ramachandran diagrams showing the backbone dihedral angles of the −1 Glu (left panel) and −2 Pro (right panel) residues during a representative molecular dynamics trajectory. Each point represents a frame from the 200 ns trajectory, sampled every 10 picoseconds. The points are colored based on agglomerative clustering performed on four angles, the φ and ψ angles of the −1 and −2 residues.
B. ψ angles of the −1 and −2 residues over the time course of the simulation, sampled every 10 picoseconds and colored as in A. ψ angles between approximately 110° and 180° are considered to represent the β conformation, and angles between approximately −50° and 50° are considered to represent the α conformation.
C. Fractional occupancy of the five clusters of −1 and −2 dihedral angles generated for trajectory frames sampled every 10 picoseconds. “Lα” indicates the left-handed α-helical region of the Ramachandran diagram.
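The ψ-angle ranges given in panel B and the occupancy calculation in panel C can be sketched as below. This is a simplification: it labels each frame by a single ψ angle using the approximate thresholds from the legend, whereas the clusters in the figure come from agglomerative clustering on four angles. Function names and the "other" label are illustrative assumptions.

```python
from collections import Counter


def psi_state(psi_deg):
    """Label a psi angle (degrees) using the approximate ranges in the legend."""
    if 110.0 <= psi_deg <= 180.0:
        return "beta"
    if -50.0 <= psi_deg <= 50.0:
        return "alpha"
    return "other"  # assumed catch-all for angles outside both ranges


def fractional_occupancy(labels):
    """Fraction of trajectory frames assigned to each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}
```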
Supplementary Figure 6 - Structural explanation for alternative sequence preferences of EGFR at the −2 position.
Selected snapshots from molecular dynamics simulations of an EGFR Tyr 1114 peptide docked onto a peptide-bound crystal structure of the EGFR kinase domain (PDB 2GS6). Two snapshots are shown, with the −1 and −2 residues of the peptide in either the β conformation (A) or the α conformation (B). Interactions between the −1 and −2 peptide residues and selected residues on the kinase domain are highlighted. Diagrams illustrating the different interactions available between kinase domain residues and a substrate peptide depending on the orientation of the −1 and −2 residues are shown below each zoomed-in view of the active site. A peptide with a −2 Pro and a β conformation of the −1 residue is diagrammed in A, while a peptide with a −2 glutamic acid and an α conformation of the −1 residue is diagrammed in B.
Supplementary Figure 7 - Comparison of relative enrichments for EGFR (panel A) and c-Src (panel B) against the Tyr 1086 phosphosite peptide library.
These data are reproduced from main text Figs. 3B and 4D, respectively.
Supplementary Figure 8 - Alignment of EGFR Tyr 1086 phosphosite sequences in the clade after the split between EGFR and Her2.
The sequences were aligned with the MAFFT global homology algorithm and are labeled with common name, NCBI taxid, Ensembl translation accession number, and residue boundaries (including signal sequences). The phylogenetic relationship for the corresponding full-length EGFR sequences, taken from the EggNOG database, is shown on the left. A His residue at the +1 position of the Tyr 1086 phosphosite is a conserved feature of mammalian EGFR sequences. No phosphosites contain a −1 acidic or +1 hydrophobic residue.
Supplementary Figure 9 - Flow cytometry histogram of fully-phosphorylated bacteria displaying mutagenesis libraries.
A sample of the bacteria that served as an input to the surface display binding experiments presented in Figure 5 was stained with an anti-phosphotyrosine antibody (4G10) and analyzed by flow cytometry. Site-saturation mutagenesis libraries corresponding to the EGFR Tyr 1086 and 1114 phosphosites were either treated with a mixture of EGFR, c-Src, and c-Abl kinases for 1 hour at room temperature (“kinase treated”) or incubated in the absence of kinases (“untreated”). The single, narrow main peaks in the histogram indicate that the libraries were uniformly phosphorylated.