Crystal Structures of Human C4.4A Reveal the Unique Association of Ly6/uPAR/α-neurotoxin Domain

The Ly6/uPAR/α-neurotoxin domain (LU-domain) is characterized by the presence of 4-5 disulfide bonds and three flexible loops that extend from a core stacked by several conserved disulfide bonds (thus also named the three-fingered protein domain). This highly structurally stable protein domain is typically a protein binder in the extracellular space. Most LU proteins contain only a single LU-domain, as represented by Ly6 proteins in immunology and α-neurotoxins in snake venom. Many Ly6 proteins are expressed in specific cell lineages and at specific differentiation stages, and are used as markers. In this study, we report the crystal structures of the two LU-domains of human C4.4A alone and in complex with a Fab fragment of a monoclonal anti-C4.4A antibody. Interestingly, both structures showed that C4.4A forms a very compact globule with two LU-domains packed face to face. This is in contrast to the flexible nature of most LU-domain-containing proteins in mammals. The Fab binding site on C4.4A involves both LU-domains, and appears to be the binding site for AGR2, a reported ligand of C4.4A. This work reports the first structure that contains two LU-domains and provides insights into how LU-domains fold into a compact protein and interact with ligands.

Despite the clear functional importance of these multi-LU-domain proteins, their three-dimensional structures remain largely unexplored, with a single exception. The urokinase-type plasminogen activator receptor (uPAR) is a GPI-anchored membrane protein containing three LU-domains (DI, DII and DIII), and several crystal structures have been solved for this founder of the LU-domain protein family [28][29][30][31][32]. The interdomain assembly of all three LU-domains in uPAR via β-sheet interactions creates a large central hydrophobic ligand-binding cavity that mediates the high-affinity binding of its primary ligand, the serine protease urokinase-type plasminogen activator. Biophysical studies have shown that this interdomain assembly in uPAR is highly flexible and that this flexibility has biological relevance [33,34]. Restricting this internal flexibility by introducing an interdomain disulfide bond between DI and DIII traps uPAR in a closed conformation, which increases its affinity for its second ligand, vitronectin [33,35]. From a translational perspective, this domain flexibility also proved essential for the development of a small 9-mer peptide targeting an intermediate conformation in uPAR [28,36], and this assisted its further maturation into a PET probe currently used for non-invasive imaging of uPAR expression in patients with malignant solid tumors [37][38][39]. Moreover, the dimer of uPAR isoform 2 was reported to induce kidney diseases in mice [40].
Prompted by the close relationship between LU-domain flexibility and function of uPAR, we decided to solve the crystal structure of another protein containing multiple LU-domains to gain further insight into the structural versatility of this fold. We chose to focus on C4.4A (encoded by LYPD3), which contains two LU-domains followed by a mucin-type region rich in serine, threonine and proline (STP-rich region) and a C-terminal GPI-anchor [41,42]. No well-defined function has yet been assigned to C4.4A, but circumstantial evidence suggests that it could play a role in cell adhesion, migration and invasion through established interactions with laminins [43], integrins and MMP14 [44,45], and/or Anterior Gradient 2 (AGR2) [46]. Nonetheless, expression of C4.4A is strictly regulated under normal homeostatic conditions, as it represents a robust biomarker for the presence of stratum spinosum in stratified squamous epithelia of the skin and for squamous differentiation of epithelia in other organs such as esophagus, vagina, oral cavity, and rectum [27,42,47]. Along the same lines, squamous metaplasia of bronchial epithelia (not yet a malignant lesion) is strictly correlated with the emergence of C4.4A expression [48]. Consequently, high expression levels of C4.4A predict poor prognosis for patients with pulmonary adenocarcinoma but not for those with squamous cell carcinoma [20,49,50]. Similar findings have been reported in other solid cancers, e.g., in breast [51], bladder [52,53], colon [54,55], and esophagus [56,57]. Based on these findings, there is a strong interest in studying C4.4A in various pathological conditions, and new experimental tools are being developed to accomplish this, such as C4.4A-deficient mouse models [53] and antibody drug conjugates targeting C4.4A [58]. With this study, we seek to gain structural insights into how the LU-domains in C4.4A are organized and how C4.4A recognizes its ligands.

Challenges in structural determination of C4.4A
Recombinant human C4.4A was expressed in Drosophila S2 cells. This recombinant protein contains, at its C-terminus, a purification tag (uPAR DIII) to facilitate the capture and purification of the protein [59]. The STP-rich region of C4.4A is heavily glycosylated, containing 15 putative O-linked glycosylation sites [42], which poses a major difficulty for the crystallization of intact C4.4A. We thus removed the STP-rich region and the purification tag by limited proteolysis with chymotrypsin to obtain the N-terminal region containing the two LU-domains (DI and DII) of C4.4A (residues 1-201), which was then purified to high homogeneity and grown into well-diffracting crystals (2.4 Å) at pH 3.6 [60]. Structure determination from these C4.4A crystals using molecular replacement (MR) proved difficult due to the low sequence conservation amongst published structures of single LU-domains (e.g., the two LU-domains in human C4.4A share only 30% and 28% sequence identity with the DII of uPAR). Single-wavelength anomalous dispersion phasing using biosynthetically selenomethionine-labeled C4.4A (yielding ~70% Se incorporation in Met) was unsuccessful because the SeMet crystals diffracted poorly (>4 Å). Traditional multiple isomorphous replacement (MIR) and phasing with sodium bromide [61] were also tried, but in all cases the crystals either lost their diffraction upon soaking or did not give clear solutions for the heavy atom positions. As an alternative route to phasing, we crystallized C4.4A in complex with an antibody fragment. The complex between C4.4A and the Fab fragment of 11H10 was purified by size exclusion chromatography and yielded well-diffracting crystals of C4.4A:Fab (11H10). The Fab fragment was positioned in the crystal by MR, and the resulting Fo-Fc map revealed electron density for C4.4A. Extensive manual building, together with iterative refinement, finally yielded crystal structures of both the C4.4A:Fab complex and C4.4A alone with good statistics.
The structure of C4.4A was refined to 2.59 Å with R-factor and R-free of 0.2063 and 0.2503, respectively, and 92.8% of residues in the favored Ramachandran region (Table 1). Most residues of the C4.4A structure are well supported by electron density maps, except for residues 95-99 and residues 92-101, which lie in a glycosylated linker region between DI and DII and were consequently not modelled in the structure. The structure of the C4.4A:Fab complex was refined to 2.75 Å with R-factor and R-free of 0.1963 and 0.2555, respectively, and 95.3% of residues in the favored Ramachandran region (Table 1). Residues 89-103 of the C4.4A molecule, between DI and DII, are likewise not modelled.
A long inter-domain linker exists between the two LU-domains in C4.4A. Nevertheless, the two LU-domains assemble via a large hydrophobic interface to form a compact protein structure with dimensions of 60 × 42 × 34 Å (Fig. 1A, B and C). This unique assembly of the LU-domains in C4.4A resembles two right hands pressed tightly against each other at the finger area.
There are two C4.4A molecules in the asymmetric unit of the crystal. Superposition of the two molecules shows that their structures are highly similar to each other, with an RMSD of 0.72 Å for all Cα atoms, further supporting the low flexibility of the structure under these conditions. A notable difference between the two LU-domains in C4.4A is the arrangement and the number of disulfide bonds. The DI has four disulfide bonds (Db1, Cys3-Cys31; Db1a, Cys6-Cys14; Db2, Cys24-Cys52; Db4, Cys78-Cys83), while the DII has five (Db1, Cys110-Cys142; Db1a, Cys113-Cys121; Db2, Cys131-Cys163; Db3, Cys169-Cys185; Db4, Cys186-Cys191) (Fig. 1D). This divergent arrangement of the disulfide bonding is nonetheless not unique to C4.4A, but is found in all proteins with multiple LU-domains. In these proteins, the N-terminal LU-domains invariably lack one of the otherwise consensus disulfide bonds (Db3) [2]. Paradoxically, missense mutations affecting one of the four consensus disulfides in single LU-domain-containing proteins (e.g., GPIHBP1, CD59, κ-bungarotoxin) cause protein misfolding and loss of function [6,12,62,63]. One possible structural advantage of the absence of Db3 in the DI is that the affected βID becomes much less twisted compared to the corresponding βIID in the DII (where Db3 remains intact). A comparison with all structures solved for uPAR reveals a similarly lower twisting of βID compared to βIID [28][29][30][31][32].
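The disulfide assignments listed above can be compared programmatically. The following is a minimal sketch in Python that encodes the bonds exactly as given in the text; the dictionary layout and the helper names (`DISULFIDES`, `missing_bonds`, `cysteine_count`) are illustrative assumptions and not part of the original analysis.

```python
# Disulfide bonds of the two LU-domains of C4.4A, as listed in the text.
# Keys are the bond labels; values are the paired cysteine residue numbers.
DISULFIDES = {
    "DI": {
        "Db1": (3, 31),
        "Db1a": (6, 14),
        "Db2": (24, 52),
        "Db4": (78, 83),
    },
    "DII": {
        "Db1": (110, 142),
        "Db1a": (113, 121),
        "Db2": (131, 163),
        "Db3": (169, 185),
        "Db4": (186, 191),
    },
}


def missing_bonds(reference, query):
    """Return the bond labels present in `reference` but absent from `query`."""
    return sorted(set(DISULFIDES[reference]) - set(DISULFIDES[query]))


def cysteine_count(domain):
    """Each disulfide bond uses two cysteines."""
    return 2 * len(DISULFIDES[domain])


if __name__ == "__main__":
    # Which bond of DII has no counterpart in DI?
    print("Absent from DI:", missing_bonds("DII", "DI"))
    for name in DISULFIDES:
        print(name, "bonded cysteines:", cysteine_count(name))
```

Comparing the label sets in this way reproduces the observation that DI lacks only Db3, while all other labelled bonds have a counterpart in both domains.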
The crystal structure clearly revealed four N-linked glycans (Fig. 1A, D): one located in the linker region between the LU-domains of C4.4A (Asn88) and three in the DII of C4.4A (Asn133, Asn146 and Asn153).

Structural basis for a compact conformation of C4.4A
The two LU-domains of C4.4A associate tightly in the crystal structure, forming a globular protein. This domain organization is predominantly stabilized via interdomain hydrophobic interactions involving relatively large surfaces of the central β-sheet in each LU-domain (Fig. 2A, B). Of note, the central β-sheets of the two LU-domains in C4.4A are both asymmetric in the sense that they have one face which is particularly hydrophobic (hydrophobic contact area of 562 Å² for DI and 638.6 Å² for DII) (Table S1). These hydrophobic faces of the β-sheets assemble to form the interdomain binding interface and they share a high degree of shape complementarity. A number of polar interactions are also found at the rim of the interdomain interface: Ile41-Tyr132, Arg62-Thr175, His67-Gln165, His67-Asp172, Gly68-Tyr139 (Table S2, Fig. 2C). These hydrogen bonds and ionic interactions likely provide a directional force that stabilizes the relative orientation of the two LU-domains. Interestingly, these polar interactions are predominantly located at the interface created by the shape complementarity between the fingertips in the DI (fingertips of F2 and F3) and the disulfide-rich core of the DII (including Lk1 and Lk2). In this region, the DII forms a highly negatively charged pocket (Asp166 and Asp172) that accommodates His67 from the DI by electrostatic interaction (Fig. 2D). This pocket is stabilized by Db3.

Antibody 11H10 recognized both LU-domains in C4.4A
The structure of C4.4A in the complex with the Fab fragment of mAb 11H10 is highly similar to that of C4.4A alone (Fig. 3A), with an RMSD of 0.55 Å for all atoms. This high similarity demonstrates that the compactness and rigidity of the globular assembly of the two LU-domains in C4.4A are not affected by crystal lattice formation or by the presence of the antibody. Note that the complex was crystallized at neutral pH (7.0), compared with the low pH (3.6) used for crystallization of C4.4A alone, which further underlines the stability of the compact structure of C4.4A.
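For readers unfamiliar with the RMSD values quoted here, the quantity is the square root of the mean squared distance between matched atoms after superposition. The snippet below is a minimal sketch in Python, assuming the two coordinate sets have already been superposed and are listed in the same atom order; the function name `rmsd` is an illustrative choice and this is not the software used in the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched, pre-superposed
    lists of (x, y, z) coordinates, in the same units as the input (e.g. Å).

    Assumption: no fitting is done here; the structures must already be
    superposed and the atoms paired one-to-one.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

In practice the optimal superposition is computed first (for example with a least-squares fit over the selected atoms), and the choice of atom selection (all atoms versus Cα only) changes the resulting value.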
As shown by the C4.4A:Fab complex structure, the structural epitope on C4.4A recognized by the Fab fragment is mainly located in three β-strands (βC, βE and βF) in DI and in the linkers between β-strands in DII (Lk1 and Lk2) (Fig. 3C, Table S3). Arg103 of the Fab heavy chain (labeled as H/Arg103 in Fig. 3C) inserts into the groove formed by C4.4A-DI and C4.4A-DII and forms hydrogen bonds with Asp65 and Gln165 of C4.4A (Table S2). In addition, C4.4A-DI residues Leu70 and Phe72 are embedded in a hydrophobic area formed by the Fab heavy chain (Phe32, Trp54, Trp55, Tyr58, Tyr60 and Leu102) and light chain (Trp94 and Pro95). Loop conformations are usually susceptible to ligand binding and/or the environment because of their flexibility. However, although DII-Lk1 contains a long loop, binding of the Fab does not induce a conformational change in this loop. Further structural analysis shows that Tyr132 and Tyr139, located in DII-Lk1, form hydrogen bonds with DI that stabilize the conformation of DII-Lk1 (Fig. 3B). Moreover, DII-Lk1 appears to have a constrained conformation due to the presence of four internal hydrogen bonds (Tyr132-Ala134, Asn133-Asp136, Ala134-His137, His137-Tyr139). All of these hydrogen bonds are mediated by main chain atoms and are thus expected to be conserved across species.
The structure of the complex demonstrates that mAb 11H10 recognizes a conformational epitope on intact C4.4A-DIDII containing both LU-domains. This observation agrees well with biochemical results obtained by Western blot (Fig. 3D, E), which show that the binding of mAb 11H10 to C4.4A requires that both domains are present and that the protein is folded correctly (lanes 4 and 6).

Discussion
The functional site of C4.4A for ligand binding
C4.4A was reported to interact with both α6β4 integrin and MMP14, promoting wound healing and metastasis [45]. In addition, the interaction of C4.4A with Anterior Gradient 2 (AGR2) stimulates pancreatic ductal adenocarcinoma cell aggressiveness and reduces sensitivity to the chemotherapy drug gemcitabine. C4.4A also binds to integrin β1 and laminins 1 and 5 [46]. However, the structural details of how C4.4A interacts with its ligands are unknown.
Based on our crystal structure of the C4.4A:Fab complex, we studied the molecular interaction of C4.4A with AGR2 by carrying out protein-protein docking between C4.4A and AGR2 (PDB ID: 2lnt) [66] using ZDOCK (Version 3.0.2) [67]. The top docking solution clearly stands out from the rest of the solutions, suggesting that it is reliable. Interestingly, AGR2 contacts C4.4A at a site (Fig. 4A) quite close to the Fab fragment binding site (Fig. 3A). Again, the C4.4A DII-Lk1 moiety plays an important role in mediating the interaction by docking into a pocket of AGR2 (Fig. 4B). These consistent results suggest that this area of C4.4A is important for ligand binding.

A novel assembly mode of LU-domains in C4.4A
LU-domains contain three to six highly conserved disulfide bonds with a unique signature motif, CCxxxxCN (where x is any amino acid), which is tightly packed at the palm region [68].
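The signature motif can be located in a one-letter amino acid sequence with a simple pattern search. The snippet below is a minimal sketch in Python, assuming an uppercase one-letter sequence; the function name `find_lu_motif` and the short test sequence are made up for illustration and do not correspond to the C4.4A sequence.

```python
import re

# CCxxxxCN: two cysteines, any four residues, then cysteine and asparagine.
LU_MOTIF = re.compile(r"CC[A-Z]{4}CN")


def find_lu_motif(sequence):
    """Return (1-based start position, matched segment) for each
    non-overlapping occurrence of the CCxxxxCN motif."""
    return [(m.start() + 1, m.group()) for m in LU_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence used only to exercise the pattern.
    toy_sequence = "AKTCCPEGSCNLV"
    for position, segment in find_lu_motif(toy_sequence):
        print(position, segment)
```

A match on this pattern alone does not establish an LU-domain fold; it only flags candidate positions that would still need to be assessed in the context of the full cysteine pattern.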
In many cases, the palm surface of the LU-domain is important for interaction with the ligand, as shown by the structures of a multi-LU-domain protein (uPAR) and of single LU-domain proteins (CD59 and some three-fingered snake venom toxins) [29,[69][70][71]. uPAR contains three LU-domains, which assemble in a circular manner by interdigitating with each other to generate a central cavity (Fig. 5B) that accommodates its ligand. However, in our C4.4A structures, the palm surfaces of the two domains are composed wholly of hydrophobic residues and are buried inside the protein by the unique face-to-face assembly mode of the two LU-domains.
A novel mode of homodimerization of LU-domains was revealed in our C4.4A structure. All known three-fingered snake venom toxins contain only one LU-domain. A few toxins nevertheless exist as non-covalent homodimers in solution, e.g., κ-bungarotoxin and haditoxin [11,72]. In these dimers, two independent protein molecules are arranged in an antiparallel manner (Fig. 5C). The interaction between the protomers consists of the pairing of β-strands and van der Waals interactions provided by some hydrophobic residues in F3. The key residue Phe49, which is found in all four κ-bungarotoxins and provides the hydrophobic core, interacts with Ile20, Thr60 and the disulfide bond Cys46-Cys58 from the other subunit [12,73]. Three-fingered snake venom toxins also form homodimers or heterodimers via intermolecular disulfide bonds [74]. In the α-cobratoxin homodimer, the first N-terminal β-strands of the two protomers are swapped, and two intermolecular disulfide bonds are formed between Cys3 in one protomer and Cys20 in the other (Fig. 5D) [13].
Prediction of structure of Haldisin, a C4.4A analogue, based on C4.4A structure
Haldisin (encoded by LYPD5) is an extracellular protein predominantly expressed in the stratum granulosum of human skin under homeostatic conditions, and was predicted to contain two LU-domains with a disulfide bonding pattern similar to that of C4.4A [25]. However, the sequence identity between Haldisin and C4.4A is low, particularly for the DI (Fig. S1). Despite this low sequence conservation, we were able to generate a homology model of Haldisin based on our structure of C4.4A. The model was subjected to thorough molecular dynamics (MD) simulation. The stability, sampling and convergence of the MD simulation were established by calculation of the backbone RMSD (Fig. S2). Hess analysis and RMSD both confirmed adequate sampling of the Haldisin conformation around the equilibrium position in the last 500 ns of the MD simulation (Table S5). The most representative model of Haldisin, covering 92% of the sampled conformations, was identified by clustering analysis of the last 500 ns of the MD trajectory. The resultant Haldisin model showed high structural similarity to our crystal structure of C4.4A, with RMSDs of 1.58 Å (DI) and 2.16 Å (DII) for all Cα atoms (Fig. 6A). Importantly, the two sides of the inter-domain interface of Haldisin are highly complementary to each other in terms of charge (Fig. 6B, blank circles) and polarity (Fig. 6B, orange circles). Such a high degree of structural similarity between Haldisin and C4.4A suggests parallel functions, which remain to be confirmed experimentally.

Generation of a monoclonal anti-C4.4A antibody and its Fab fragment
A mouse monoclonal anti-C4.4A antibody (11H10) was generated by conventional mouse hybridoma technology after immunizing FVB mice with purified recombinant human C4.4A produced in Drosophila S2 cells with a C-terminal uPAR DIII fusion tag that was removed by enterokinase treatment [56,75]. Purified 11H10 was treated with immobilized ficin (Thermo Scientific, Rockford, IL, USA) in the presence of 25 mM L-cysteine and 1 mM EDTA at 37°C for 1 h to produce Fab and Fc fragments. The reaction mixture was passed through a Protein A column to remove undigested 11H10 and its Fc fragments. The Fab-containing flow-through fraction was further purified by size exclusion chromatography using a Superdex 200 column.
The cDNAs were amplified by conventional PCR using the Platinum Pfx DNA Polymerase (Invitrogen), and products of the proper size (approx. 700 bp) were purified from a 1% agarose gel with the QiaQuick Gel Extraction Kit (Qiagen). The cDNAs were cloned into pBlueScript KS+ using the introduced EcoRI and NotI restriction enzyme sites (underlined in the primer sequences) and the Rapid DNA ligation kit (Roche). Subsequently, DH5α competent cells were transformed, and DNA was isolated from individual clones and analyzed by restriction enzyme digestion before sequencing. Five to six clones for each chain were sequenced and revealed 100% identical sequences.

Recombinant C4.4A-ent-uPAR-DIII fusion protein was produced in Drosophila S2 cells and purified by immunoaffinity chromatography as described [59,75]. The C4.4A protein containing the two LU-domains was obtained by limited proteolysis with chymotrypsin, which preferentially hydrolyzes the peptide bonds after Tyr200 or Phe201 in the linker region between the DII and the mucin-type C-terminal domain [42], and was further purified by size exclusion chromatography using a Superdex75 column [60].

Immunoblotting Analysis
Various C4.4A domain constructs were produced in Drosophila S2 cells and purified by immunoaffinity chromatography as described [59]. After separation by SDS/PAGE, an identical set of samples was immobilized on a PVDF membrane (Millipore, Bedford, MA, U.S.A.). After blocking of excess binding sites, the PVDF membrane was incubated with 1 µg/ml of mAb 11H10 as primary antibody and peroxidase-conjugated swine anti-mouse immunoglobulins (Dako, Glostrup, Denmark) diluted 1:5000 as secondary antibody. Positive reactivity was visualized by enhanced chemiluminescence (ECL plus; Amersham).

Formation of complex between C4.4A and 11H10 Fab
The Fab peak was pooled and mixed with C4.4A-DIDII, and the resultant complexes were purified by gel filtration on a Superdex 200 column (GE Life Sciences) with 20 mM Tris, 150 mM NaCl, pH 7.4 as the running buffer. The eluted fractions containing the target complex were pooled and concentrated to 10 mg/mL using an Amicon Ultra Centrifugal Filter Device (Millipore, USA) with a molecular mass cutoff of 10,000 Da. Aliquots were stored frozen at -80°C.

Crystallization
The C4.4A crystals were obtained at 293 K by the sitting-drop vapor-diffusion method at a protein concentration of 10 mg/mL. The precipitant condition was 22.5% (w/v) polyethylene glycol 4000, 0.1 M citric acid at pH 3.6, as described previously [60]. For the C4.4A:Fab complex, all crystallization trials were done at 295 K using commercial screening kits (from Qiagen, XtalQuest and Hampton Research) with the Phoenix robot (Art Robbins Instruments). Optimized crystals grew in 20% PEG3350, 0.2 M potassium sodium tartrate tetrahydrate, 0.1 M Tris-HCl pH 7. Crystals appeared by the third day and grew larger over the course of 2 weeks. Slender rod-shaped crystals were carefully looped and frozen in ice-free liquid nitrogen after a quick soak in the original mother liquor with 25% (v/v) glycol.

Data collection and processing
Prior to X-ray data collection, the crystals were transferred to the precipitant solution containing 25% (v/v) glycerol and flash-frozen in liquid nitrogen. Diffraction data for both C4.4A and the C4.4A:Fab complex were collected on beamline X29 at the Brookhaven National Synchrotron Light Source (NSLS) and were processed using the automated data-processing pipeline xia2 [76] with options that run XDS [77]. used to perform iterative model building, refinement and density modification, leading to an improved electron density map. Iterative cycles of model building and refinement were performed until the model could not be improved further.
The model from the C4.4A crystal was then used as the search model and was successfully placed in the C4.4A:Fab crystal by MR, with the positioned Fab as a fixed model. The completed model of the complex was further improved by several cycles of refinement and manual adjustment until convergence. All relevant data collection and refinement statistics are summarized in Table 1.

Molecular dynamics simulation of Haldisin
The homology model of Haldisin was built automatically by the SWISS-MODEL web server based on the crystal structure of C4.4A [78]. The model of Haldisin was inserted into a truncated octahedron water box with edge lengths of 71 Å, 71 Å, and 71 Å. The protonation states of residues were assigned according to the corresponding pKa values calculated using the H++ web server [79]. Two Na⁺ ions were added to counterbalance the charge of the protein.
The system contained 8,630 water molecules and 28,786 atoms in total. It underwent MD simulations with the AMBER ff99SB-ILDN force field [80][81][82] using the GROMACS 4.6.5 code [83]. The Åqvist potential [84] and the TIP3P model [85] were used for the ions and for the water molecules, respectively. All bond lengths were constrained by the LINCS algorithm [86]. Periodic boundary conditions were applied. Electrostatic interactions were calculated using the Particle Mesh-Ewald (PME) method [87], and van der Waals and Coulomb interactions were truncated at 10 Å. The system underwent 1,000 steps of steepest-descent energy minimization with 1,000 kJ·mol⁻¹·Å⁻² harmonic position restraints on the protein, followed by 2,500 steps of steepest-descent and 2,500 steps of conjugate-gradient minimization without restraints. The system was then gradually heated from 0 K up to 298 K in 20 steps of 2 ns. After that, 2000 ns-long production MD simulations were carried out in the NPT ensemble. The most representative structure was identified by cluster analysis [83] with a cut-off of 1.5 Å over the equilibrated trajectory, ranging from 1500 ns to 2000 ns. To assess the convergence of the simulated trajectory in the last 500 ns, we considered the projection of each snapshot on the top essential dynamical spaces obtained from a standard covariance analysis. Following Hess's criterion [88], these projections were next compared with those expected for a random reference. The observed negligible overlap (i.e., cosine content close to 0, see Table S5) confirmed a posteriori adequate sampling of the Haldisin conformation around the equilibrium position in the last 500 ns.
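The cosine content used in this convergence test is commonly written as c_i = (2/T) (∫ cos(iπt/T) p_i(t) dt)² / ∫ p_i(t)² dt, where p_i(t) is the projection of the trajectory on principal component i over a window of length T. The snippet below is a minimal sketch of a discretized version in Python, assuming evenly spaced snapshots and a midpoint rule; the function name `cosine_content` is illustrative, and this is not the analysis tool used in the study.

```python
import math


def cosine_content(projection, mode=1):
    """Discretized cosine content of a principal-component projection.

    `projection` holds evenly spaced values p(t_k); `mode` is the index i
    of the cosine compared against. With a midpoint rule the time step
    cancels, leaving (2/n) * (sum cos * p)^2 / sum p^2.
    Values near 1 suggest random-diffusion-like motion; values near 0
    suggest the projection is not dominated by such a cosine.
    """
    n = len(projection)
    norm = sum(p * p for p in projection)
    if n == 0 or norm == 0.0:
        raise ValueError("projection must be non-empty and not identically zero")
    overlap = sum(
        math.cos(mode * math.pi * (k + 0.5) / n) * p
        for k, p in enumerate(projection)
    )
    return (2.0 / n) * overlap * overlap / norm
```

A projection that is itself a half-period cosine gives a value of 1 for the first mode, whereas a rapidly oscillating projection gives a value close to 0, which is the behaviour reported for the last 500 ns here.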