Helen Hobbs

Saturation mutagenesis of a predicted ancestral Syk-family kinase

Helen T. Hobbs, Neel H. Shah, Sophie R. Shoemaker, Jeanine F. Amacher, Susan Marqusee, and John Kuriyan

Protein Science 2022 31(10):e4411. doi: 10.1002/pro.4411.
BioRχiv 2022.04.24.489292; doi: https://doi.org/10.1101/2022.04.24.489292


Many tyrosine kinases cannot be expressed readily in E. coli, limiting facile production of these proteins for biochemical experiments. We used ancestral sequence reconstruction to generate a spleen tyrosine kinase (Syk) variant that can be expressed in bacteria and purified in soluble form, unlike the human members of this family (Syk and ZAP-70). The catalytic activity, substrate specificity, and regulation by phosphorylation of this Syk variant are similar to the corresponding properties of human Syk and ZAP-70. Taking advantage of the ability to express this novel Syk-family kinase in bacteria, we developed a two-hybrid assay that couples the growth of E. coli in the presence of an antibiotic to successful phosphorylation of a bait peptide by the kinase. Using this assay, we screened a site-saturation mutagenesis library of the kinase domain of this reconstructed Syk-family kinase. Sites of loss-of-function mutations identified in the screen correlate well with residues established previously as critical to function and/or structure in protein kinases. We also identified activating mutations in the regulatory hydrophobic spine and activation loop, which are within key motifs involved in kinase regulation. Strikingly, one mutation in an ancestral Syk-family variant increases the soluble expression of the protein by 75-fold. Thus, through ancestral sequence reconstruction followed by deep mutational scanning, we have generated Syk-family kinase variants that can be expressed in bacteria with very high yield.

Figures from the paper

Figure 1 from paper

Figure 1 - The Syk-family kinases.

(A) The structure of full-length auto-inhibited human ZAP-70 kinase (PDB: 4K2R)
(B) Activation of the Syk-family kinases by a conformational change in the tSH2 module and phosphorylation of key tyrosine residues (indicated with red circles). The tSH2 module binds to a doubly-phosphorylated immunoreceptor tyrosine-based activation motif (ITAM) and phosphorylation of the tSH2-kinase linker stabilizes the open, active conformation. Maximal activation occurs when the activation loop is also phosphorylated.
(C) His-tagged human kinase domains expressed in E. coli co-expressing the YopH phosphatase (MW = 45 kDa) and enriched over nickel columns. The expected molecular weight of each tyrosine kinase domain falls within the red box (between 25 and 37 kDa). A strong band in this box indicates soluble expression. Human ZAP-70 and Syk, in the last two lanes, show no soluble expression in E. coli with YopH.
Figure 2 from paper

Figure 2 - Reconstructed ancestral Syk-family kinases.

(A) The phylogenetic tree used in the ancestral sequence reconstruction. The nodes corresponding to AncS, AncZ, and AncSZ are marked with green circles.
(B) The pairwise sequence identities for the full-length and kinase domains of human ZAP-70, AncZ, AncSZ, AncS, and human Syk.
(C) A sequence alignment of the kinase domains of the human and reconstructed Syk-family kinases. Residues are shaded according to percent identity. Numbered according to AncSZ.
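
The pairwise identities in panel (B) can be illustrated with a minimal Python sketch. It assumes two sequences that are already aligned to equal length and counts identity only over columns where neither has a gap; the paper's exact convention is not stated here, so that choice is an assumption. The function name and the toy fragments are hypothetical and are not taken from the alignment in panel (C).

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy fragments, for illustration only.
toy_a = "MKV-LGSG"
toy_b = "MKVALGAG"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions (for example, dividing by the length of the shorter sequence) give slightly different numbers, so the denominator should be reported alongside any identity value.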
Figure 3 from paper

Figure 3 - Characterization of the bacterially-expressed AncSZ and other Syk-family kinases.

(A) Top, SDS-PAGE gel for purification of AncSZ expressed in E. coli co-expressing the protein tyrosine phosphatase YopH. The band corresponding to YopH (45 kDa) is observed in the lanes corresponding to the anion exchange (Q) column washes. A band at the expected molecular weight of AncSZ (~32 kDa) is observed in fractions taken during the elution of AncSZ from the Q column with increasing concentration of NaCl. The yield from this purification was approximately 2 mg of protein from one liter of E. coli culture. Bottom, a similar gel from the purification of AncSZ*. The yield for AncSZ* was approximately 150 mg of protein from one liter of E. coli culture, a substantial increase over what was observed for AncSZ.
(B) The initial velocities for enzymatic reaction of ancestral and extant Syk-family kinases with LAT214-233 as the peptide substrate (n=3), as reported by ATP hydrolysis. For the pre-phosphorylated samples, kinases were incubated with purified Lck kinase and ATP for one hour prior to measurement. Lck had negligible activity towards the LAT214-233 peptide (far left bar), and it was present at 10-fold lower concentrations than the Syk-family kinase in the phosphorylation reactions.
Figure 3 from paper

Figure 3 continued - Characterization of the bacterially-expressed AncSZ and other Syk-family kinases.

(C) Reaction progress curves for full-length Syk, ZAP-70, and AncSZ phosphorylation of LAT214-233, measured in an enzymatic assay where ADP production is coupled to NADH oxidation and a loss of absorbance at 340 nm. The black lines in each graph track the initial velocities. Both Syk and AncSZ can auto-activate, as indicated by the increasing reaction rate for the unphosphorylated samples as a function of time. ZAP-70, however, cannot do so.
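
As a rough illustration of how an initial velocity can be read off a progress curve like those in panel (C), the Python sketch below fits a straight line to the early part of an A340 trace and converts the slope to a rate with the Beer-Lambert law. The NADH molar absorptivity at 340 nm (6220 M^-1 cm^-1) is a standard value. The 1:1 stoichiometry between ADP produced and NADH oxidized, the 1 cm path length, the fitting window, and the synthetic trace are all assumptions made for the example, not details taken from the paper.

```python
NADH_EXTINCTION_M_CM = 6220.0  # molar absorptivity of NADH at 340 nm, M^-1 cm^-1


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def initial_velocity(times_s, a340, window_s, path_cm=1.0):
    """Rate in M/s from the early linear decrease in A340.

    Assumes one NADH is oxidized per ADP produced (an assumption of this sketch).
    """
    early = [(t, a) for t, a in zip(times_s, a340) if t <= window_s]
    ts = [t for t, _ in early]
    absorbances = [a for _, a in early]
    return -least_squares_slope(ts, absorbances) / (NADH_EXTINCTION_M_CM * path_cm)


# Synthetic, perfectly linear trace (hypothetical data).
times = [0, 10, 20, 30, 40, 50, 60]
trace = [1.0 - 6.22e-4 * t for t in times]
rate = initial_velocity(times, trace, window_s=30)
print(f"{rate:.3e} M/s")
```

For a kinase that auto-activates, the curve steepens over time, so the choice of window matters: a window that is too long will overestimate the initial rate.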
Figure 4 from paper

Figure 4 - The predicted ancestral Syk-family kinases retain the substrate specificity profile of the human Syk-family kinases.

(A) E. coli are transformed with a plasmid library of ~3000 diverse peptides fused to a bacterial surface-display scaffold. Displayed peptides are then phosphorylated by the addition of the kinase variant of interest and labeled with a fluorescent pan-phosphotyrosine antibody. Labeled cells are sorted according to fluorescence and peptide-encoding plasmids from cells in the selected and unselected populations are isolated for deep sequencing, allowing for the calculation of enrichment at each position in the peptide.
(B) The enrichment of amino acids at positions (-1, +1, +3) in the substrate that are known determinants of substrate specificity in Syk-family kinases.
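
The per-position enrichment described in panel (A) can be sketched in Python as follows. The caption does not give the formula, so this example assumes a log2 ratio of read-weighted amino-acid frequencies in the selected versus unselected populations, with positions numbered relative to the central tyrosine. The function names, the three-residue toy peptides, and the read counts are hypothetical.

```python
import math
from collections import defaultdict


def position_frequencies(peptide_counts, center):
    """Read-weighted amino-acid frequency at each position relative to `center`."""
    totals = defaultdict(float)  # position -> total reads
    counts = defaultdict(float)  # (position, residue) -> reads
    for peptide, reads in peptide_counts.items():
        for i, aa in enumerate(peptide):
            pos = i - center
            totals[pos] += reads
            counts[(pos, aa)] += reads
    return {key: counts[key] / totals[key[0]] for key in counts}


def log2_enrichment(selected, unselected, center):
    """log2(selected frequency / unselected frequency) per (position, residue).

    Residues never seen at a position in either population are skipped; a real
    pipeline might use pseudocounts instead (assumption of this sketch).
    """
    f_sel = position_frequencies(selected, center)
    f_uns = position_frequencies(unselected, center)
    return {k: math.log2(f_sel[k] / f_uns[k]) for k in f_sel if k in f_uns}


# Hypothetical toy library: tyrosine at index 1 of each peptide.
unselected = {"DYE": 100, "KYP": 100}
selected = {"DYE": 300, "KYP": 100}
scores = log2_enrichment(selected, unselected, center=1)
print(scores[(-1, "D")], scores[(-1, "K")])
```

In this toy case an acidic residue at position -1 is enriched and a lysine is depleted, which is the kind of signal panel (B) summarizes for positions -1, +1, and +3.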

Figure 5 from paper

Figure 5 - A bacterial two-hybrid assay for kinase activity.

(A) The protein tyrosine kinase phosphorylates a tyrosine on a peptide substrate, the “bait” protein, fused to the N-terminal domain of the transcription factor λ-cI, which binds to an operator sequence upstream of the phage λ-promoter (PRM). The “prey” protein, the Grb2 SH2 domain, binds the phosphorylated peptide, thereby recruiting the α-subunit of RNA polymerase to the promoter. Once at the promoter, RNA polymerase transcribes chloramphenicol acetyltransferase (CAT), which confers resistance to the antibiotic chloramphenicol.
(B) Cells expressing AncSZ begin growing well before cells expressing the kinase-dead mutant (AncSZ DN) in the presence of 50 ng/µl chloramphenicol.
(C) Expression of the kinase is necessary for growth in chloramphenicol. When no arabinose is added, the cells begin growing approximately 2 hours after those with 0.2% arabinose.

Figure 6 from paper

Figure 6 - Saturation mutagenesis of AncSZ.

Heatmap depicting the average enrichment values (E) for each pool (n=3). Along the top of each heatmap is the unmutated sequence of the protein, with every 10th residue colored blue, and along the left y-axis is the substituted residue. Synonymous codons are averaged. Red boxes indicate variants that were more active than the wild-type kinase in the bacterial two-hybrid assay, and blue boxes indicate variants that were less active than the wild-type kinase. Grey boxes represent variants that were absent or had insufficient counts (<25) in the input library.
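
The bookkeeping behind such a heatmap can be sketched in Python. The input-count threshold of 25, the averaging of synonymous codons, and the averaging over three replicates come from the caption. The enrichment formula itself is not given here, so the example assumes a log10 ratio of variant to wild-type frequency after selection, divided by the same ratio in the input library. All function names and counts are hypothetical.

```python
import math
from statistics import mean

MIN_INPUT_COUNT = 25  # threshold stated in the caption


def codon_enrichment(inp, sel, inp_wt, sel_wt):
    """Assumed score: log10 of (selected variant/WT) over (input variant/WT)."""
    if inp < MIN_INPUT_COUNT or sel == 0:
        # Too few input reads (grey box). Zero selected reads are also skipped
        # here; a real pipeline would more likely add a pseudocount.
        return None
    return math.log10((sel / sel_wt) / (inp / inp_wt))


def amino_acid_scores(replicate, codon_to_aa, wt_key="WT"):
    """Average the scores of synonymous codons within one replicate."""
    inp_wt, sel_wt = replicate[wt_key]
    per_aa = {}
    for codon, (inp, sel) in replicate.items():
        if codon == wt_key:
            continue
        score = codon_enrichment(inp, sel, inp_wt, sel_wt)
        if score is not None:
            per_aa.setdefault(codon_to_aa[codon], []).append(score)
    return {aa: mean(values) for aa, values in per_aa.items()}


def average_over_replicates(replicates, codon_to_aa):
    """Mean score per substitution, keeping only those scored in every replicate."""
    per_rep = [amino_acid_scores(r, codon_to_aa) for r in replicates]
    shared = set.intersection(*(set(d) for d in per_rep))
    return {aa: mean(d[aa] for d in per_rep) for aa in shared}


# Hypothetical counts at one position: codon -> (input reads, selected reads).
codon_to_aa = {"GAA": "E", "GAG": "E", "CCG": "P"}
one_replicate = {"WT": (1000, 1000), "GAA": (100, 1000), "GAG": (100, 100), "CCG": (10, 50)}
result = average_over_replicates([one_replicate] * 3, codon_to_aa)
print(result)
```

Under this assumed definition, positive values correspond to red boxes (more active than wild type), negative values to blue boxes, and substitutions dropped by the count filter to grey boxes.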

Figure 7 from paper

Figure 7 - Loss-of-function mutations.

(A) Green residues (left) are invariant across all eukaryotic protein kinases, and are mapped onto the structure of Lck (PDB: 3LCK). These include some active site residues and residues in other essential motifs. The blue residues (right) are those to which almost any substitution results in a loss of activity in the deep-mutagenesis analysis of AncSZ described in this paper. Both Arg 490 and Asp 546 (further discussed in B and C) are among these. Residues are mapped onto a homology model of AncSZ.
(B) Mutations of Arg 490 and Asp 546 lead to loss of function. These residues play important roles in the activity and/or structure of the kinase.
(C) Average enrichment scores for Arg 490 in the bacterial two-hybrid (n=3). For Arg 490, all mutations, except for synonymous R codons, are loss-of-function.
(D) Average enrichment scores for Asp 546. All substitutions at Asp 546, except for synonymous aspartate codons or glutamate, are loss-of-function.

Figure 8 from paper

Figure 8 - Activating mutations occur in regions important for activity but may also be the result of increased expression.

(A) Strong gain-of-function variants map to regions of the kinase domain that are known to be important for the regulation of kinase activity, such as the activation and αC-β4 loops.
(B) Left, residues making up the catalytic spine (C-spine, left) and the regulatory spine (R-spine, right). Substitutions to these residues are loss-of-function in the bacterial two-hybrid assay, and are colored blue. A residue located at the top of the regulatory spine, Met 426, is a gain-of-function if mutated to Glu, Asp, or Pro and neutral for most other substitutions. Right, solvent-exposed Leu 616 is on the C-terminal helix of AncSZ. Next to Leu 616 is Asp 620, suggesting that a salt bridge may form when the leucine residue is mutated to positively-charged arginine.
(C) AncSZ M426E has a higher LAT226 phosphorylation rate than AncSZ or AncSZ* (AncSZ L616R). The red bars correspond to the rates determined following a one-hour treatment with purified Lck kinase domain, in order to pre-phosphorylate the activation loop, while the grey bars are the rates measured without this pre-phosphorylation step. The yield observed during purification for each kinase domain (n=1). The yield for AncSZ* was 75-fold higher than the other kinases.
(D) The average enrichment value for Leu 616 variants, left y-axis, and the likelihood score predicted by ancestral sequence reconstruction, right y-axis. Many of the variants that are gain-of-function in our assay were alternative amino acids at this position.

Supplemental figures from the paper

Supplemental Figure 1 from paper

Supplemental Figure 1 - Pre-phosphorylation of Syk-family kinases with Lck

(A) Western blot of the five Syk-family kinases with and without incubation with the kinase Lck using a primary antibody that recognizes the phosphorylated activation loop of Syk-family kinases. The membranes were stained with Coomassie following imaging.
(B) Western blot of full-length Syk, ZAP-70, and AncSZ with and without incubation with Lck using primary antibodies recognizing the phosphorylated activation loop (bottom blot) and phosphorylated inter-SH2 linker (top blot). The membranes were stained with Coomassie following imaging.

Supplemental Figure 2 from paper

Supplemental Figure 2 - Activation loop auto-phosphorylation by Syk-family kinases.

Western blots depicting the time course of auto-phosphorylation of the kinase domains of human Syk, AncS, AncSZ, AncZ, and human ZAP-70 using a primary antibody that recognizes the phosphorylated tyrosine on the activation loop(s). Auto-phosphorylation is slow for all the kinases, but especially so for AncZ and human ZAP-70.

Supplemental Figure 3 from paper

Supplemental Figure 3 - The substrate specificity of human and ancestral Syk-family kinases.

Heatmaps depicting the enrichment of amino acids at all positions in the substrate peptide for each Syk-family kinase, expanding on the data in Figure 4.

Supplemental Figure 4 from paper

Supplemental Figure 4 - Constructs used in bacterial two-hybrid.

(A) In the “prey” construct, RNA polymerase is followed by a Gly-Ser linker and then the Grb2 SH2 domain. In the “bait” construct, the N-terminal domain of the λ-cI protein is fused to the LAT 226 peptide by a Gly-Ser linker. Specific residue numbers used are denoted, and the expression vector is listed beneath each construct.
(B) The pBpZR vector that was constructed for the two-hybrid assay. The portion originating from the pBAD expression vector is shown in light purple, and the portion originating from the original CAT plasmids is shown in yellow. The pBAD portion contains the araBAD promoter (grey), the corresponding araC gene, a ribosome binding site (orange), and inserted restriction enzyme sites (XbaI and BamHI). The CAT portion contains the CAT gene, the phage promoter (PRM) and operators (OR1-3), and the beta-lactamase gene.

Supplemental Figure 5 from paper

Supplemental Figure 5 - The bacterial two-hybrid assay is reproducible.

The correlation of the enrichment scores for the selected population over the unselected population for three independent replicates for each pool. All reported enrichment scores are the average of three replicates.
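
One common way to quantify this kind of replicate agreement is the Pearson correlation coefficient between the enrichment scores of two replicates. Whether the figure reports Pearson or another statistic is not stated here, so the choice is an assumption, and the scores below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical enrichment scores for the same variants in two replicates.
replicate_1 = [0.10, -1.20, 0.80, -0.30, 0.05]
replicate_2 = [0.15, -1.05, 0.70, -0.40, 0.00]
print(round(pearson_r(replicate_1, replicate_2), 3))
```

With three replicates, the same function would be applied to each of the three pairs.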

Supplemental Figure 6 from paper

Supplemental Figure 6 - Many residues are robust to mutation.

(A) H104 highlighted on the homology model of AncSZ. This residue is on the surface and the side chain is not involved in maintaining the structure or directly linked to catalysis.
(B) The average enrichment scores for substitutions to H104 in the bacterial two-hybrid (n=3). Most mutations have no effect on the enrichment score. However, a stop codon is a loss-of-function.

Supplemental Figure 7 from paper

Supplemental Figure 7 - Proline mutations are loss-of-function at residues involved in secondary structure.

Residues in which a proline substitution resulted in a loss of function in the bacterial two-hybrid assay are colored blue on the homology model of AncSZ. Many of these detrimental mutations occur in residues participating in secondary structure.