An in-silico study of Polymerase Epsilon catalytic subunit proteins in Arabidopsis thaliana

The Arabidopsis thaliana genome encodes two POLE2 homologs, known as polymerase epsilon catalytic subunit A (POLE2A) and polymerase epsilon catalytic subunit B (POLE2B), which play an important role in DNA repair. In this study, bioinformatics tools were used to investigate the DNA repair mechanisms of A. thaliana in which POLE2A and POLE2B are involved. Interactome analysis of the POLE2A and POLE2B homologs was used to explore their additional roles in DNA repair. Key DNA repair proteins, including MSH2, MSH5, PCNA1, PCNA2, PRL, and CDC45, were identified as interactors of both POLE2A and POLE2B. The three-dimensional structures of the POLE2 proteins were predicted, and docking sites were identified to help decipher the complexity of the NER, GG-NER, MMR, TFIIH, and TC-NER repair mechanisms. The interaction complexes of POLE2A and POLE2B with these six proteins were confirmed and found to have a significant role in DNA repair processes and UV-B tolerance. The interactome analysis of POLE2A and POLE2B performed here further confirms the complexity of DNA repair mechanisms in plants.


Arabidopsis thaliana as a plant model organism
Arabidopsis thaliana has emerged as a model organism for research in plant biology. A consensus was reached about the need to focus on a single organism in order to integrate the classical disciplines of plant science with the expanding fields of genetics and molecular biology. More than ten years after the publication of its genome sequence, A. thaliana remains the standard reference plant for all fields of biology. The major advances and shared resources that led to the extraordinary growth of the A. thaliana research community have been reviewed, together with the importance of continuing to expand and refine detailed knowledge of A. thaliana while appreciating the remarkable diversity of the plant kingdom [1]. Maintaining the stability and integrity of the plant genome under UV radiation is crucial for the proper functioning of the plant and, indirectly, of the ecosystem connected to it. A. thaliana, specifically, has been shown to release volatile compounds (mostly methyl salicylate and methyl jasmonate) when irradiated with UV-C radiation, and these compounds cause genomic instability in neighboring plants, even those of other species such as Nicotiana tabacum. Thus, to analyze and understand the specific features of plant DNA repair mechanisms, starting from a well-researched model higher plant such as A. thaliana is an essential step in plant science research, with far-reaching consequences [2].

DNA mutations
DNA, a deoxyribonucleotide polymer, is the primary genetic material of cells. Any damage that alters the chemical structure of DNA may, if left unrepaired, result in a mutation or block DNA replication [3].
Some mutations are inherited from parents or arise during cell division, whereas others are caused by environmental factors, including chemicals, viruses, and ultraviolet (UV) radiation [4]. UV radiation has harmful effects on all living organisms, from prokaryotes to lower and higher eukaryotes, including plants, animals, and humans. UV radiation is generally divided into three classes: UV-C (100-280 nm), UV-B (280-315 nm), and UV-A (315-400 nm). DNA is highly sensitive to UV-B radiation, which has been shown to be harmful to living organisms because it induces photochemical transformations that result in direct and indirect DNA damage [5], [6].
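As a compact restatement of the three UV classes, the sketch below maps a wavelength to its band. It assumes the band boundaries commonly given in ISO 21348 (UV-C 100-280 nm, UV-B 280-315 nm, UV-A 315-400 nm) and a half-open convention at each shared boundary; the function name `uv_band` is hypothetical.

```python
def uv_band(wavelength_nm: float) -> str:
    """Classify a UV wavelength in nanometres as UV-C, UV-B or UV-A.

    Assumed boundaries (ISO 21348): UV-C [100, 280), UV-B [280, 315),
    UV-A [315, 400]. The half-open convention at 280 and 315 nm is a choice.
    """
    if 100 <= wavelength_nm < 280:
        return "UV-C"
    if 280 <= wavelength_nm < 315:
        return "UV-B"
    if 315 <= wavelength_nm <= 400:
        return "UV-A"
    raise ValueError("wavelength outside the 100-400 nm UV range")


# Example: a 254 nm germicidal lamp emits UV-C; 300 nm lies in the UV-B band.
print(uv_band(254), uv_band(300))
```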

UV radiation and repair of UV-induced damage
In plants, exposure to high doses of UV-B radiation causes morphological and physiological damage, resulting in reduced growth rate and nitrogen productivity and in disruption of processes such as photosynthesis, secondary metabolism, stress responses, and photomorphogenesis [7], [8]. To repair UV-induced damage, plants have developed two main mechanisms: light repair and dark repair. Light repair depends on visible light and on enzymes known as photolyases. Photolyases repair UV-induced damage in plants by binding to damaged DNA and splitting certain types of pyrimidine dimers [7]. Pyrimidine dimers form when two adjacent thymine or cytosine bases on the same DNA strand become covalently linked, and these dimers constitute DNA lesions. Such lesions can be reversed by photoreactivation with light of longer wavelength. Photoreactivation is the direct reversal of the covalent link between the two pyrimidines, restoring the normal bases, by the enzyme photolyase using the energy of blue visible light. This repair mechanism is absent in humans [9].

Repair mechanisms
Nucleotide excision repair (NER) recognizes and repairs bulky DNA lesions caused by chemical compounds, environmental carcinogens, and exposure to UV light. Repair of damaged DNA by NER involves at least 30 polypeptides acting in two sub-pathways, known as transcription-coupled repair (TCR-NER) and global genome repair (GGR-NER) [10]. Base excision repair (BER) is the predominant pathway for processing small base lesions resulting from oxidation and alkylation damage [2], [11]. Another DNA repair pathway is mismatch repair (MMR), and defects in this pathway have been found in skin cancer [12].

Polymerase epsilon
During DNA replication, DNA polymerase epsilon is responsible for synthesizing the leading strand. In yeast, DNA polymerase epsilon has four subunits. In A. thaliana, DNA polymerase epsilon consists of the regulatory subunits DPB2, DPB3, and DPB4 and the catalytic subunits POL2a and POL2b [13]. In A. thaliana, POL2A, together with DPB2, is essential for the early stages of embryogenesis; POL2A is also the main catalytic subunit active in meristems and during embryogenesis. The POL2B catalytic subunit is mainly expressed under stress conditions. Knockout of POL2B produces no morphological phenotype, whereas null mutations in DPB2 or POL2A are lethal. Mutants that have lost POL2B function have larger cells, longer cell cycles, and delayed embryo development [13,14]. The Arabidopsis genome contains only one copy of the TIL gene, whereas two homologs, TIL1 and TIL2, have been found in several of the monocots studied. In A. thaliana, polymerase epsilon is encoded by the TILTED1 (TIL1) locus [15].

Aim of the study
The aim of this study is to decipher the DNA repair mechanism in A. thaliana, focusing on POLE2A and POLE2B proteins, through a detailed in silico analysis of the functional interactome of these proteins.

Retrieving POLE2a and POLE2b sequences
The AtPOLE2A and AtPOLE2B protein sequences were obtained from the National Center for Biotechnology Information (NCBI) and The Arabidopsis Information Resource (TAIR) databases. The accession numbers for the POLE2a and POLE2b subunits are shown in Table 1.
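Assuming the sequences are downloaded as FASTA text (a `>` header line followed by one or more sequence lines), a minimal standard-library parser is sketched below. The function name `parse_fasta` and the toy records in the example are hypothetical; they are not the accessions listed in Table 1.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {record_id: sequence} dictionary.

    The record ID is the first whitespace-delimited token after '>'.
    Sequence lines are concatenated and upper-cased; blank lines are ignored.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Toy, hypothetical records (not real POLE2 sequences)
example = """>seq_A hypothetical record
MKTAYIAK
QRQISFVK
>seq_B
mstnpkpq
"""
print(parse_fasta(example))  # {'seq_A': 'MKTAYIAKQRQISFVK', 'seq_B': 'MSTNPKPQ'}
```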

Phylogenetic tree construction
To infer the evolutionary relationship between the POLE2A and POLE2B proteins, a phylogenetic tree was constructed using the Phylogeny.fr web service for phylogenetic analysis of molecular sequences. The "One Click" mode of Phylogeny.fr was used, and the resulting tree was obtained in Newick format [16].
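In Newick format, nested parentheses encode clades, commas separate sibling branches, and `:` introduces a branch length. As an illustration, assuming unquoted labels and no bracketed comments, the leaf (taxon) labels can be recovered from a Newick string with the sketch below. The function name `newick_leaves` and the example taxa and branch lengths are hypothetical and do not represent the tree in Fig. 2.

```python
import re
from typing import List

# Leaf labels are unquoted names that directly follow "(" or ",";
# internal-node labels follow ")" and are therefore not matched.
_LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_leaves(newick: str) -> List[str]:
    """Return the leaf labels of an unquoted Newick string, left to right."""
    depth = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return _LEAF_PATTERN.findall(newick)


# Hypothetical three-taxon tree with made-up branch lengths
tree = "((taxon_A:0.12,taxon_B:0.15):0.30,taxon_C:0.45);"
print(newick_leaves(tree))  # ['taxon_A', 'taxon_B', 'taxon_C']
```

A full Newick reader (quoted labels, comments, building the tree object) is better handled by a dedicated phylogenetics library; this sketch only lists the taxa.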

Prediction of 3-D structure models and Ramachandran plot confirmation
SWISS-MODEL (ExPASy) is a structural bioinformatics web server for homology modeling of the three-dimensional (3D) structures of proteins. It was used to generate the PDB files online, and PyMOL was used to visualize the 3D structures of AtPOLE2A and AtPOLE2B. The 3D structure of each subunit was generated and validated with Ramachandran plots produced by the RAMPAGE program. Domain analysis was performed using the SMART server [17].
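A Ramachandran plot places each residue according to its backbone dihedral angles phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)). As a sketch of where those angles come from, the standard-library code below computes a signed dihedral angle from four atom positions using the usual projection and atan2 formula. It assumes the backbone coordinates have already been read from the model's PDB file; it does not reproduce RAMPAGE's favored/allowed region boundaries, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)  # from p1 back to p0
    b1 = _sub(p2, p1)  # central bond
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0:
        raise ValueError("p1 and p2 coincide")
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev: Vec, n: Vec, ca: Vec, c: Vec, n_next: Vec) -> Tuple[float, float]:
    """Backbone phi and psi of residue i from five consecutive backbone atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` returns 90 degrees, and a planar cis arrangement returns 0.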

Prediction of the interactome
STRING is a database and web tool for visualizing and analyzing protein-protein interactions. It was used to identify interactions between the POLE2A and POLE2B subunits and other related proteins and to assess the confidence of these interactions [18].

Domain identification
The domains of the six POLE2A and POLE2B proteins were identified using the online Simple Modular Architecture Research Tool (SMART), hosted by the European Molecular Biology Laboratory (EMBL). SMART can detect more than 500 domain families found in chromatin-associated, extracellular, and signaling proteins, and these domains are extensively annotated with respect to functional class, important residues, and tertiary structure [19], [20].

Subcellular localization of POLE2A and POLE2B proteins
To predict the subcellular localization of the proteins, the online Plant Subcellular Localization Integrative Predictor (PSI) was used. PSI uses a group voting strategy and machine learning to combine the results of 11 independent subcellular localization tools: CELLO, mPLoc, Predotar, MitoProt, MultiLoc, TargetP, WoLF PSORT, SubcellPredict, iPSORT, YLoc, and PTS1 [21].
For subnuclear localization prediction, we used the Subnuclear Compartments Prediction System (version 2.0), an SVM-based system developed at the Laboratory of Computational Functional Genomics, Department of Bioengineering, which predicts localization to the PML body, nuclear lamina, nuclear speckles, chromatin, nucleoplasm (nuclear diffuse), and nucleolus [22]. In addition, the same proteins were submitted to the SUBcellular localization database for Arabidopsis proteins (SUBA), which comprises large proteomic and GFP localization datasets for this model plant and also provides access to several bioinformatic prediction tools, giving an integrated view of localization to the cellular compartments of A. thaliana [23].
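PSI's exact combination scheme is described in [21] and is not reproduced here. As a simplified illustration of group voting, assuming every tool contributes one unweighted vote, per-tool predictions can be turned into a support score between 0 and 1 for each compartment. The function `vote_localization` and the tool labels in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def vote_localization(predictions: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank compartments by the fraction of tools that predicted them.

    `predictions` maps a tool name to its predicted compartment (one vote each).
    Returns (compartment, score) pairs with scores in [0, 1], highest first.
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    counts = Counter(predictions.values())
    total = len(predictions)
    ranked = [(compartment, n / total) for compartment, n in counts.items()]
    # Descending score; alphabetical order breaks ties deterministically
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical votes from five placeholder tools
votes = {
    "tool_1": "nucleus",
    "tool_2": "nucleus",
    "tool_3": "cytosol",
    "tool_4": "nucleus",
    "tool_5": "chloroplast",
}
print(vote_localization(votes))
# [('nucleus', 0.6), ('chloroplast', 0.2), ('cytosol', 0.2)]
```

An unweighted vote treats every predictor as equally reliable; an integrative predictor can instead learn per-tool weights, which is the role machine learning plays in PSI.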

Docking site prediction
Docking is the computational modeling of the quaternary structure of complexes formed by two or more interacting biological macromolecules. The ClusPro 2.0 server was used to predict docking sites for the AtPOLE2A and AtPOLE2B proteins. Because ClusPro requires PDB files as input, a structure was first generated for each ligand protein with SWISS-MODEL (ExPASy); each protein pair was then docked individually, and the results were visualized in PyMOL [23].
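ClusPro ranks its docked models largely by clustering: low-energy ligand poses that have many structural neighbors within an RMSD radius form the most populated clusters, and cluster centers are reported as models. The simplified sketch below illustrates only that idea. It assumes all ligand poses share the fixed receptor's coordinate frame (so RMSD is computed without superposition), uses a caller-chosen `radius`, and omits ClusPro's sampling and energy filtering; `rmsd` and `greedy_cluster` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD between two equal-length coordinate lists, without superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def greedy_cluster(poses: List[Coords], radius: float) -> List[List[int]]:
    """Greedy neighbor-count clustering of docked poses.

    Repeatedly selects the unassigned pose with the most unassigned neighbors
    within `radius` as a cluster center, groups it with those neighbors, and
    removes them. Clusters are returned as index lists, center first, in the
    order found (non-increasing size).
    """
    remaining = set(range(len(poses)))
    clusters: List[List[int]] = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            members = [j for j in sorted(remaining)
                       if j != i and rmsd(poses[i], poses[j]) <= radius]
            if best_center is None or len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append([best_center] + best_members)
        remaining -= {best_center, *best_members}
    return clusters


# Toy single-atom "poses" (hypothetical coordinates)
poses = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)],
         [(10.0, 0.0, 0.0)], [(10.5, 0.0, 0.0)]]
print(greedy_cluster(poses, radius=1.5))  # [[0, 1, 2], [3, 4]]
```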

Phylogenetic tree construction
POLE2a and POLE2b protein sequences from several species were obtained and used to construct the phylogenetic tree, shown as a cladogram in Fig. 2.

Predicted 3-D structure models and Ramachandran plot validation
For AtPOLE2a and AtPOLE2b, the predicted 3-D structures were visualized in PyMOL, and the quality of each model was assessed with a Ramachandran plot.

Domain identification
Tables 3 and 4 list the domains identified in POLE2a and POLE2b, respectively.

Subcellular localization of POLE2A and POLE2B proteins
The subcellular localization of AtPOLE2A and AtPOLE2B was predicted with the PSI predictor. For each cellular compartment, PSI reports a score between 0 and 1 that indicates the confidence that the protein is present in that compartment [24]. Table 6 shows the subcellular localization of the POLE2a and POLE2b proteins.

Discussion
The interactome analysis showed that AtPOLE2A and AtPOLE2B interact with a wide range of proteins. Because the STRING interactome contained numerous interactions, only those with the highest confidence were selected for further analysis, and medium- and low-confidence interactions were omitted. The results are summarized in Fig. 4 and Table 2.
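The STRING web interface offers preset minimum combined-score cut-offs: low (0.150), medium (0.400), high (0.700), and highest (0.900). Assuming interactions are exported as (protein, partner, combined score) rows with scores on a 0-1 scale, keeping only the highest-confidence edges can be sketched as follows; `filter_edges` and the partner names are hypothetical placeholders, not STRING output for POLE2A or POLE2B.

```python
from typing import Iterable, List, Tuple

# Preset minimum combined-score cut-offs offered by the STRING web interface
STRING_THRESHOLDS = {"low": 0.150, "medium": 0.400, "high": 0.700, "highest": 0.900}

Edge = Tuple[str, str, float]  # (protein, partner, combined score in [0, 1])


def filter_edges(edges: Iterable[Edge], level: str = "highest") -> List[Edge]:
    """Keep interactions whose combined score meets the chosen confidence tier."""
    cutoff = STRING_THRESHOLDS[level]  # KeyError for an unknown tier name
    return [edge for edge in edges if edge[2] >= cutoff]


# Hypothetical edges with made-up scores
edges = [("query", "partner_1", 0.95),
         ("query", "partner_2", 0.72),
         ("query", "partner_3", 0.41)]
print(filter_edges(edges))          # [('query', 'partner_1', 0.95)]
print(filter_edges(edges, "high"))  # first two edges
```

Some STRING download files store scores as integers from 0 to 1000; such values would need to be divided by 1000 before applying these cut-offs.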
Several DNA damage repair pathways have been reported. In mammals, nucleotide excision repair (NER) comprises both global genome repair (GGR) and transcription-coupled repair (TCR) [25]. POLE2 and Damage-specific DNA Binding Protein 2 (DDB2) are major factors in classical GGR and TCR. The predicted amino acid sequence reveals that, like human POLE2, POLE2 is a member of the DUF-repeat protein family. DUF repeats are not involved in any catalytic activity, but they mediate protein-protein interactions. The Saccharomyces cerevisiae protein is also classified among the NER proteins required for inactivation of NER. Deletion of POLE2 (POLE2A and POLE2B) showed that it is not essential for the survival of yeast cells after UV exposure, its requirement depending on a few other radiation-sensitive proteins, whereas both human POLE (POLE2A and POLE2B) and yeast TILED C are specific for transcription-coupled repair, and the absence of one of these proteins blocks either global genome repair or TCR. Although the mechanisms that regulate repair of UV-induced damage, and how the UV response functions in living organisms, are still not completely understood, studies have shown that GGR is usually active during early development, whereas TCR becomes active later in the development of an organism [26,27].
Domain analysis of the AtPOLE2 protein revealed that it consists of DUF-repeat domains [28]. DUF-repeat proteins consist of six or more repeating units that share a conserved core, usually approximately 40 amino acids long, ending in tryptophan-aspartic acid (Trp-Asp). DUF-repeat proteins form a circularized beta-propeller structure [29]. When tandem copies of DUF repeats fold together, they form a circular solenoid domain called the DUF1744 domain. Many interactions with DNA are mediated through the interblade loops of the DUF-repeat region. These proteins have important roles in many processes, such as signal transduction, regulation of transcription, and cell death, and are even implicated in some human diseases [30]. Tables 4 and 5 list the domains identified in POLE2a and POLE2b, respectively.
Accurate DNA replication is one of the most important events in the life of a cell. To perform this task, the cell uses several DNA polymerase complexes. Previous research investigated the role of DNA polymerase epsilon during gametophyte and seed development using forward and reverse genetic approaches. In Arabidopsis, the catalytic subunit of this complex is encoded by two genes, AtPOL2a and AtPOL2b, whereas the second-largest regulatory subunit, AtDPB2, is present as a single complete copy [31]. Disruption of AtPOL2a or AtDPB2 resulted in a sporophytic embryo-defective phenotype, whereas mutations in AtPOL2b produced no visible effects. Loss of AtDPB2 function resulted in a severe reduction in nuclear divisions in both the embryo and the endosperm. Mutations in AtPOL2a allowed several rounds of mitosis to proceed, often with aberrant planes of division. Moreover, AtDPB2 was not expressed during development of the female gametophyte, which requires three post-meiotic nuclear divisions [13,14].
Ramachandran plot validation of the predicted 3D structure models showed that 93.8% of the AtPOLE2A residues lie in favored regions, 4.4% in allowed regions, and 1.8% in outlier regions, whereas for AtPOLE2B, 93.0% of the residues lie in favored regions, 5.8% in allowed regions, and 1.2% in outlier regions. POLE2A and POLE2B play a key role in DNA repair, and one of the aims of this study was to show how these two homologues interact and perform their biological roles in DNA repair. In this study, we described additional roles of polymerase epsilon catalytic subunits A and B in DNA repair and, through interactome analysis of POLE2A and POLE2B in Arabidopsis thaliana, identified other key proteins that participate in DNA repair. The 3D structures of the POLE2 proteins were predicted, and docking sites were identified to help decipher the complexity of the NER, GG-NER, and TC-NER mechanisms. We found that both POLE2A and POLE2B interact with MSH2, MSH5, PCNA1, PCNA2, PRL, and CDC45, and we confirmed the interaction complex, which has a significant role in DNA repair processes and UV-B tolerance. The STRING-based interactome analysis of POLE2A and POLE2B also revealed the complexity of DNA repair mechanisms in plants. Finally, the crucial role of POLE2A and POLE2B in DNA repair, maintenance of genome integrity, regulation of protein degradation, and thus cellular stability and the cell cycle is undeniable.
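The Ramachandran percentages reported above are simple proportions of residues per region. Assuming each residue has already been assigned a region label by a validation tool such as RAMPAGE, a hypothetical helper that computes those percentages could look like this; the labels in the example are toy values, not data from the AtPOLE2A or AtPOLE2B models.

```python
from collections import Counter
from typing import Dict, Iterable

REGIONS = ("favored", "allowed", "outlier")


def region_percentages(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of residues in each Ramachandran region, rounded to 0.1%."""
    counts = Counter(labels)
    unknown = set(counts) - set(REGIONS)
    if unknown:
        raise ValueError(f"unexpected region labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues supplied")
    return {region: round(100.0 * counts[region] / total, 1) for region in REGIONS}


# Toy example: 8 favored, 1 allowed, 1 outlier residue
print(region_percentages(["favored"] * 8 + ["allowed", "outlier"]))
# {'favored': 80.0, 'allowed': 10.0, 'outlier': 10.0}
```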

Conclusion
In Arabidopsis thaliana, the TIL1 and TIL2 loci encode POLE2A and POLE2B, respectively. POLE2A and POLE2B play an important role in DNA repair. In this study, we described additional roles of POLE2A and POLE2B in A. thaliana through interactome analysis and identified other important proteins that participate in DNA repair. The 3D structures of the POLE2A and POLE2B proteins were predicted, and docking sites were identified to help decipher the complexity of the NER, GG-NER, and TC-NER repair mechanisms. We also confirmed that POLE2A and POLE2B interact with MSH2, MSH5, PCNA1, PCNA2, PRL, and CDC45, and that these complexes have a significant role in DNA repair processes and UV-B tolerance. Through bioinformatics analysis, we identified and structurally predicted six homologues of POLE2A and POLE2B in A. thaliana. Because they are similar but not identical in structure and fall into different groups in the phylogenetic analysis, further inference about the functions and localization of the homologues was required. For that purpose, localization tools were used to predict subcellular locations, and the differences between the homologues persisted. Furthermore, the interactome analysis showed differing functions and associations with many crucial processes responsible for maintaining cellular integrity and stability. From these results, it can be concluded that the cellular functions of the six homologues are not completely the same, both in their actual roles in cellular mechanisms and possibly in the locations where they perform those roles. Finally, the crucial role of POLE2A and POLE2B in DNA repair, maintenance of genome integrity, regulation of protein degradation, and thus cellular stability and the cell cycle was found to be undeniable.
Because plants are heavily exposed to UV radiation, properly understanding and analyzing their DNA repair mechanisms is at the forefront of genomic and proteomic research in plant science, and in silico analysis is an excellent starting point for further developments.