In silico interactome analysis of cross-link repair proteins (RPA32b and RPA2) in Arabidopsis thaliana

This work is licensed under a Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) that allows others to share and adapt the material for any purpose (even commercially), in any medium with an acknowledgement of the work's authorship and initial publication in this journal.


Introduction
Every cell in our body contains deoxyribonucleic acid (DNA). Before a cell divides, it must copy its DNA so that each daughter cell receives the same copy. During this copying process (replication), errors can occur that make the product of the affected part of the DNA non-functional. Errors can also be caused by external agents, such as UV light. However, the cell has various mechanisms by which it repairs these errors [1]. The mechanisms differ according to the structure and complexity of the organism, as well as the type of damage, and are divided into several types: base excision repair (BER), mismatch repair (MMR), double-strand break repair (DSBR) and nucleotide excision repair (NER) [2].

RPA32B and RPA2 proteins
Replication protein A (RPA) is a single-stranded DNA-binding protein. It is important for many DNA processing pathways, such as DNA replication and meiotic recombination, and for some DNA repair mechanisms. RPA is found in every eukaryote [3]-[6]. In Arabidopsis thaliana, the RPA family comprises five phylogenetically distinct RPA1 subunits (RPA1A-E), two RPA2 subunits (RPA2A and B) and two RPA3 subunits (RPA3A and B) [7]. This paper focuses on two proteins that are part of the DNA replication factor A complex (RPA): RPA2 and RPA32B.

Arabidopsis thaliana
Arabidopsis thaliana is a plant from the mustard family. It is a small plant with a small genome and is widely used as a model organism for research. Its genome is organised into 5 chromosomes and contains about 20,000 genes [8].

Multiple sequence alignment
Multiple sequence alignment is used to find similarities or relationships between sequences [9]. It gives insight into the sequence-structure-function relationships of nucleotide or protein sequence families, and can also reveal evolutionary and functional relationships within protein families [10]. In this paper, the Clustal Omega tool was used.

Phylogenetic tree construction
The second step, performed after multiple sequence alignment in Clustal Omega, is phylogenetic tree construction. A phylogenetic tree is a diagrammatic representation of the evolutionary relationships between species or organisms and their ancestors. Through its branching, it represents a prediction about the relationships between sequences and their possible common ancestors [11]. Phylogeny.fr was used to build the phylogenetic tree for the RPA32B and RPA2 proteins [33].
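As an illustration of how an alignment can be turned into the pairwise distances on which simple distance-based trees are built, the following minimal sketch computes the p-distance (the fraction of differing residues) between aligned sequences. This is not the method used by Phylogeny.fr and not the procedure of this study; the sequence names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def distance_matrix(aligned):
    """Return {(name_a, name_b): p-distance} for every pair of aligned sequences."""
    return {
        (a, b): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }


# Hypothetical toy alignment, not real RPA sequences.
toy_alignment = {
    "seq_A": "MKTAYL-G",
    "seq_B": "MKSAYLQG",
    "seq_C": "MRSA-LQG",
}

for pair, dist in distance_matrix(toy_alignment).items():
    print(pair, round(dist, 3))
```

Smaller distances indicate more closely related sequences; tree-building programs use more elaborate evolutionary models than this raw proportion.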

3D structure prediction, visualisation and validation
The 3D structures of the RPA32B and RPA2 proteins are presented using the PyMOL Molecular Graphics System [37].

Domain identification
The SMART tool and the Pfam database were used to identify the domains of the RPA32B and RPA2 proteins. SMART is an online resource for the identification and annotation of protein domains and the analysis of protein domain architectures [12]. Pfam is a database that classifies proteins into domains and families [13].

Protein interactome prediction
An interactome is the complete set of molecular interactions of a certain molecule in a given cell or other biological environment [14]. For protein-protein interaction prediction, the STRING database, which includes both physical and functional interactions, was used [15].

Metabolic pathway mapping
A KEGG pathway map is a manually drawn graphical diagram that shows signalling, metabolic and other molecular reaction and interaction networks. It also shows how genes or their products interact in pathways [16]. In this paper, the KEGG tool was used to determine the metabolic pathways of the RPA32B and RPA2 proteins.

Docking site prediction
For docking site prediction, the SWISS-MODEL and ClusPro tools were used [24]-[30]. Docking site prediction provides a 3D visualization of the protein-protein interactions between RPA32B/RPA2 and the 10 proteins found by STRING.

Multiple sequence alignment
The Clustal Omega tool was used to obtain the multiple sequence alignment results, which are shown in Figures 1 and 2. The symbols beneath the alignment show the level of conservation at a particular position, with the following meaning:
"*" means that all sequences have exactly the same amino acid at that position.
":" represents a higher degree of similarity: the amino acids differ but have strongly similar properties.
"." represents a lower degree of similarity: the amino acids differ and have only weakly similar properties.
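The conservation symbols can be reproduced from an alignment with a short routine. The sketch below is a minimal illustration, not the Clustal Omega implementation. The "strong" and "weak" residue groups are written from memory of the Clustal documentation and should be checked against it before any real use; the three toy sequences are hypothetical.

```python
# Residue groups assumed to follow the Clustal convention (verify before use).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "  # a gap in any sequence means no conservation symbol
    if len(residues) == 1:
        return "*"  # identical residue in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strongly similar group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weakly similar group
    return " "


def consensus_line(aligned_sequences):
    """Build the symbol line for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol(col) for col in zip(*aligned_sequences))


# Hypothetical toy alignment, not real RPA sequences.
toy = ["MSCWK", "MTSD-", "MAAGK"]
print(repr(consensus_line(toy)))
```

Counting the "*" characters in the resulting line and dividing by the alignment length gives a simple measure of how much of the alignment is fully conserved.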

Phylogenetic tree
Phylogeny.fr was used to build the phylogenetic tree for the RPA32B and RPA2 proteins. The results are shown in

Domain identification
Domain identification and analysis were performed using the SMART tool and the Pfam database.

Interactome prediction
Using STRING, the interactome of the RPA32B and RPA2 proteins was predicted. The functional annotations recovered for the predicted partners include: function as a DNA helicase that is essential for a single round of replication initiation and elongation per cell cycle in eukaryotic cells; a crucial role in the control of the de-differentiation and cell proliferation processes required for lateral root formation; and an essential role in embryo development.
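The interactome discussed in this paper is reported to have 11 nodes and 55 edges. A quick arithmetic check shows what these numbers imply: an undirected network of n nodes can have at most n(n-1)/2 edges. The sketch below assumes a simple network with no self-loops and no duplicate edges; the function names are illustrative.

```python
def max_edges(n_nodes):
    """Largest possible number of edges in a simple undirected network."""
    return n_nodes * (n_nodes - 1) // 2


def density(n_nodes, n_edges):
    """Fraction of all possible node pairs that are connected by an edge."""
    possible = max_edges(n_nodes)
    if possible == 0:
        raise ValueError("density is undefined for fewer than two nodes")
    return n_edges / possible


nodes, edges = 11, 55  # values reported for the RPA32B/RPA2 interactome
print(max_edges(nodes))       # 55
print(density(nodes, edges))  # 1.0
```

Under these assumptions the reported network has the maximum possible number of edges, meaning that every protein in it is predicted to be linked to every other one.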

Metabolic pathway mapping
Metabolic pathway mapping with the KEGG tool showed that the RPA32B and RPA2 proteins are involved in DNA replication, nucleotide excision repair and mismatch repair.

Docking site prediction
Figure 7 presents the predicted docking models of all interactome partners, visualized in PyMOL.

Discussion
The RPA32B and RPA2 proteins were analyzed via multiple sequence alignment, phylogenetic tree construction, 3D structure prediction and visualization, subcellular localization, domain identification, interactome prediction and metabolic pathway mapping. Using the TAIR database and its integrated tools, we searched for the homologs needed for multiple sequence alignment and comparison. Homologs derive from a common ancestor and therefore share some similarities [17]. Multiple sequence alignment is used to analyze related proteins, for example homologs from a common ancestor, in order to find relationships between proteins such as similar function or structure [18]. Using the Clustal Omega tool, we compared homologs from the following organisms: Capsella rubella, Camelina sativa, Brassica carinata and Eutrema salsugineum. Fig. 1 shows the multiple sequence alignment of the RPA32B and RPA2 proteins from the above-mentioned organisms, with the conserved regions annotated. This analysis gave the level of conservation at each position, based on whether the sequences carry the same or different amino acids there. All organisms share the same amino acid at some positions, while other positions show a higher or lower degree of similarity.
The phylogenetic tree in Fig. 2 shows the evolutionary relationships between the organisms. It can be observed that RPA32B in Arabidopsis thaliana is closely related to RPA32B in Camelina sativa and in Eutrema salsugineum. Fig. 3 shows the 3D models of RPA2 and RPA32B, including the locations of the alpha helices and beta sheets. Ramachandran plot analysis shows the backbone conformation of each amino acid residue. Every black dot on the Ramachandran plot represents one amino acid. From the location of a dot in the phi and psi coordinates, we can infer which secondary structure the corresponding amino acid adopts. Based on our Ramachandran plot analysis, the amino acids of RPA2 adopt right-handed alpha helix, antiparallel beta sheet and parallel beta sheet secondary structure. A few amino acids fall in the collagen triple helix region and only two in the left-handed alpha helix region. The amino acids of RPA32B mostly adopt antiparallel and parallel beta sheet secondary structure. In addition, we observe a few amino acids in the right-handed alpha helix region.
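To make the reading of a Ramachandran plot concrete, the sketch below assigns a (phi, psi) pair to the nearest idealized backbone conformation. The ideal angles are approximate textbook values, and the 40-degree cutoff is an arbitrary assumption. It is a simplified illustration, not the validation procedure used in this study; real validation tools use empirically derived region boundaries.

```python
import math

# Approximate textbook (phi, psi) values in degrees; treat them as assumptions.
IDEAL_ANGLES = {
    "right-handed alpha helix": (-57.0, -47.0),
    "parallel beta sheet": (-119.0, 113.0),
    "antiparallel beta sheet": (-139.0, 135.0),
    "collagen triple helix": (-51.0, 153.0),
    "left-handed alpha helix": (57.0, 47.0),
}


def angular_difference(a, b):
    """Smallest absolute difference between two angles, respecting 360-degree wrap."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify_residue(phi, psi, cutoff=40.0):
    """Return the nearest idealized conformation, or 'unassigned' if none is within cutoff."""
    best_name = "unassigned"
    best_distance = cutoff
    for name, (ideal_phi, ideal_psi) in IDEAL_ANGLES.items():
        distance = math.hypot(
            angular_difference(phi, ideal_phi),
            angular_difference(psi, ideal_psi),
        )
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name


# Hypothetical dihedral angles, not taken from the RPA2 or RPA32B models.
for phi, psi in [(-60, -45), (-140, 130), (-120, 115), (60, 45), (0, 0)]:
    print((phi, psi), classify_residue(phi, psi))
```

Applied to every residue of a model, such a classification yields the kind of per-conformation counts described above.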
Domain identification and analysis were performed using the SMART tool and the Pfam database. Protein domains are the basic functional units of proteins [19]. Fig. 4 shows that both proteins have 2 domains. The Pfam database provided the lengths of the domains in amino acid residues, as well as their start and end points. Table 1 shows that in RPA32B the RPA_C domain starts at residue 161 and ends at residue 270, and the tRNA anti-codon binding domain starts at residue 70 and ends at residue 144. In RPA2, the RPA_C domain starts at residue 163 and ends at residue 272, and the tRNA anti-codon binding domain starts at residue 71 and ends at residue 148 [13].
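The domain lengths follow directly from the reported start and end points. The short sketch below uses the coordinates given above and assumes that both end points are inclusive, so length = end - start + 1. It also assumes that the second set of coordinates belongs to RPA2; the dictionary layout is illustrative.

```python
# Start and end residues as reported in the text (assumed inclusive).
DOMAINS = {
    "RPA32B": {"tRNA anti-codon": (70, 144), "RPA_C": (161, 270)},
    "RPA2": {"tRNA anti-codon": (71, 148), "RPA_C": (163, 272)},
}


def domain_length(start, end):
    """Number of residues in a domain with inclusive start and end positions."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


for protein, domains in DOMAINS.items():
    for name, (start, end) in domains.items():
        print(f"{protein}\t{name}\t{start}-{end}\t{domain_length(start, end)} residues")
```

Under these assumptions the RPA_C domain spans 110 residues in both proteins, while the tRNA anti-codon binding domain spans 75 residues in RPA32B and 78 residues in RPA2.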
An interaction network is a graphical representation of genes/proteins that consists of nodes and edges. Each node is a gene/protein and each edge represents an interaction between genes/proteins [20]. Using STRING, the interactome of the RPA32B and RPA2 proteins was predicted. Fig. 6 shows that 10 proteins interact with the RPA32B and RPA2 proteins. The resulting interactome contains 11 nodes and 55 edges. The proteins MCM5, MCM4, MCM3, MCM2 and PRL belong to the MCM2/3/5 minichromosome maintenance protein family. MCM3, MCM4 and MCM2 function as DNA helicases and are needed for replication initiation; PRL and MCM5 are also involved in replication initiation. RPA32B and RPA2 show a strong association with proliferating cell nuclear antigen (PCNA1), which is known to be involved in the control of eukaryotic DNA replication by increasing the polymerase's processivity during elongation of the leading strand [44]. These results support a strong association between the PCNA protein family and the MCM family in DNA replication and repair. Further, the interaction of the RPA70D and RPA70B proteins with EMB2813 is expected, since EMB2813 has a role in synthesizing the RNA primers for Okazaki fragments and is known to be connected to DNA primase function, and is therefore directly connected to the DNA replication mechanism [45]. POLA3 has DNA primase activity. For our prediction, the high confidence setting was used, which restricts the network to the best-supported interactions.
For additional confirmation of the function of the RPA32B and RPA2 proteins, metabolic pathway mapping in KEGG was used. Figs. 7, 8 and 9 show the involvement of the RPA32B and RPA2 proteins in DNA replication, nucleotide excision repair and mismatch repair, respectively. The nucleotide excision repair (NER) mechanism removes lesions that distort the helix at more than one position of the DNA at the same time.
Such lesions are usually bulky and can be caused by excessive exposure to sunlight or ultraviolet light, carcinogenic chemicals and similar agents [46]. RPA interacts with the Xeroderma Pigmentosum group A (XPA) protein, a factor involved in NER. A lack of XPA can make cells highly sensitive to UV light. XPA has no enzymatic activity, so to function properly it needs to bind to a protein complex, in this case RPA. RPA32B and RPA2 bind to the central region of XPA; this binding is the main step in NER, after which the process can continue. Further, RPA32B and RPA2 take part in stimulating EXO-catalyzed excision, protecting the ssDNA gap generated during excision, facilitating the termination of MMR excision and repairing/synthesizing DNA [21]. Docking site prediction provides a 3D visualization of the protein-protein interactions between RPA32B/RPA2 and the 10 proteins found by STRING: MCM3, PRL, MCM2, MCM5, MCM4, PCNA1, RPA70D, RPA70B, EMB2813 and POLA3 (Fig. 6). This is the first docking study of RPA32B/RPA2 with their interactome partners and the first 3D modelling of RPA32B/RPA2 in interaction with these proteins, showing potential docking sites and regions.

Conclusion
In this paper, an in silico analysis of two proteins involved in DNA replication and repair mechanisms in Arabidopsis thaliana was performed. Using different bioinformatics tools, the RPA32B and RPA2 proteins were analyzed through the following steps: multiple sequence alignment, phylogenetic tree construction, 3D structure prediction and visualization, subcellular localization, domain identification, interactome prediction and metabolic pathway mapping.
With regard to their structure, the amino acids of RPA2 adopt right-handed alpha helix, antiparallel beta sheet and parallel beta sheet secondary structure. A few amino acids fall in the collagen triple helix region and only two in the left-handed alpha helix region. The amino acids of RPA32B mostly adopt antiparallel and parallel beta sheet secondary structure, with a few amino acids in the right-handed alpha helix region. Both proteins have 2 domains. Domains are structural and functional units of proteins that are responsible for a specific function. The RPA2 and RPA32B proteins take part in removing lesions that distort the helix at more than one position of the DNA at the same time. This is the first study that predicted the interactome partners of the RPA32B and RPA2 proteins, explained the docking options and assessed the predicted 3D models of all proteins. In general, we confirmed that the RPA32B and RPA2 proteins have an important role in many processes and mechanisms in Arabidopsis thaliana, especially in DNA replication, where they keep single-stranded DNA unwound, and in the DNA mismatch and nucleotide excision repair processes.