Locating position for certain size of complexes by PDB/restart file ------------------------------------------------------------------- By PDB file ~~~~~~~~~~~ locate_pos_no_restart(FileNamePdb, NumDict, FileNameInp, BufferRatio=0.01, OpName="output_file") Description: This function allows users to locate specific complexes of a certain size from a PDB file after simulation. The result is output as a new file named `output_file.pdb` containing only the desired complex. Parameters: - **FileNamePdb** (*str*): The path to the PDB file, typically the last frame of the simulation. - **NumDict** (*dict*): A dictionary that holds the requested number of protein types in a complex. - **FileNameInp** (*str*): The path to the `.inp` file, which usually stores the reaction information. - **BufferRatio** (*float*, optional, default=0.01): The buffer ratio used to determine whether two reaction interfaces can be considered bonded. - **OpName** (*str*, optional, default="output_file"): The name of the output file. Returns: - **.pdb file**: A PDB file containing only the selected complexes. Example: .. code-block:: python import ionerdss as ion ion.locate_pos_no_restart(FileNamePdb = "nerdss_output.pdb", NumDict={"dod":9}, FileNameInp="parm.inp", OpName = “output”) >>> Output_file.pdb that includes only proteins in complexes of the selected size ... ATOM 19 COM dod 3 301.720 116.470 306.361 0 0CL ATOM 20 lg1 dod 3 315.636 126.000 315.231 0 0CL ATOM 21 lg2 dod 3 312.386 125.024 293.086 0 0CL .... By restart.dat file ~~~~~~~~~~~~~~~~~~~ locate_pos_restart(FileNamePdb, NumDict, FileNameRestart, OpName="output_file") Description: This function enables users to locate specific complexes of a certain size from a PDB file along with a `restart.dat` file after simulation. The result is output as a new file named `output_file.pdb` containing only the desired complex. Important Note: The advantage of reading the `restart.dat` file is that it directly stores the binding information of each complex in the system, allowing the function to run faster. However, this function is not universal; if the write logic of the `restart.dat` file changes, the function will no longer work. Parameters: - **FileNamePdb** (*str*): The path to the PDB file, typically the last frame of the simulation. - **NumDict** (*dict*): A dictionary that holds the requested number of protein types in a complex. - **FileNameRestart** (*str*): The path to the `restart.dat` file. - **OpName** (*str*, optional, default="output_file"): The name of the output file. Returns: - **.pdb file**: A PDB file containing only the selected complexes. Example: .. code-block:: python import ionerdss as ion ion.locate_pos_restart(FileNamePdb="nerdss_output.pdb", NumDict={"dod": 9}, FileNameRestart="restart.dat", OpName="output") >>> Output_file.pdb that includes only proteins in complexes of the selected size ... ATOM 19 COM dod 3 301.720 116.470 306.361 0 0CL ATOM 20 lg1 dod 3 315.636 126.000 315.231 0 0CL ATOM 21 lg2 dod 3 312.386 125.024 293.086 0 0CL ATOM 22 lg3 dod 3 294.395 112.226 289.287 0 0CL .... Analyzing .xyz files ~~~~~~~~~~~~~~~~~~~~ .xyz files hold the location of every protein at specific times. (Does not necessarily include every timestamp, more to compare a couple of timestamps). CSV – creates spreadsheet of protein locations ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ xyz_to_csv(FileName, LitNum, OpName="output_file") Description: This function converts the output `.xyz` file from a NERDSS simulation into a `.csv` file for a specific or entire time frame. The generated `.csv` file will contain 5 columns: iteration number, species name, and x, y, z coordinates. Parameters: - **FileName** (*str*): The path to the `.xyz` file, typically named `trajectory.xyz`. - **LitNum** (*int*, optional, default=-1): The iteration number to examine. If set to -1, the function will extract data for all iterations. - **OpName** (*str*, optional, default="output_file"): The name of the output file. Returns: - **.csv file**: A CSV file containing the specified trajectory data. Example: .. code-block:: python import ionerdss as ion ion.xyz_to_csv(FileName="trajectory.xyz", LitNum=-1) >>> output_file.csv containing the trajectory data DATAFRAME – creates dataframe of protein locations ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ xyz_to_df(FileName, LitNum=-1, SaveCsv=True): Description: This function converts the output `.xyz` file from a NERDSS simulation into a `pandas.DataFrame` for a specific or entire time frame. The generated DataFrame will contain 5 columns: iteration number, species name, and x, y, z coordinates. Optionally, the DataFrame can be saved as a `.csv` file. Parameters: - **FileName** (*str*): The path to the `.xyz` file, typically named `trajectory.xyz`. - **LitNum** (*int*, optional, default=-1): The iteration number to examine. If set to -1, the function will extract data for all iterations. - **SaveCsv** (*bool*, optional, default=True): Whether to save the DataFrame as a `.csv` file. Returns: - **pandas.DataFrame**: A DataFrame containing the specified trajectory data. Example: .. code-block:: python import ionerdss as ion traj_df = ion.xyz_to_df(FileName="trajectory.xyz", LitNum=-1, SaveCsv=False) >>> traj_df iteration name x y z 0 0 ap 87.420620 -270.109172 -203.661987 1 0 ap 88.081526 -271.052470 -205.297038 2 0 ap 86.759715 -269.165874 -202.026936 3 0 ap -58.647113 277.528515 -353.236112 ... MATRIX - tracks the trajectory of specific protein(s) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ traj_track(FileName, SiteNum, MolIndex, SaveVars=False) Description: This function enables users to track the center of mass (COM) coordinates of one or more molecules over time. The return will be a 2D matrix with the size of the number of iterations times the number of desired molecules. Parameters: - **FileName** (*str*): The path to the `.xyz` file, typically named `trajectory.xyz`. - **SiteNum** (*int*): The total number of COM and interfaces of a single molecule. For example, if a molecule possesses 1 COM and 5 interfaces, the `SiteNum` value should be 6. - **MolIndex** (*list of int*): The indices of the molecules to track. The numbers in the list should be no smaller than 1. - **SaveVars** (*bool*, optional, default=False): Whether to save the outputs in a file. Returns: - **Trajectory** (*list*): A list containing the COM coordinates of the specified molecules at different time stamps. Example: .. code-block:: python import ionerdss as ion trajectory = ion.traj_track(FileName="trajectory.xyz", SiteNum=3, MolIndex=[1, 4, 10]) >>> trajectory [ [[87.42062, -270.109172, -203.661987], [40.873538, 168.96348, -497.993163]], [[74.407358, 51.461467, -242.958456], [187.824563, 325.913499, -497.993163]], [[20.608487, 330.919045, -182.061499], [-27.367719, 330.945162, -497.993163]] ] Analyzing .pdb files ~~~~~~~~~~~~~~~~~~~~ LINE PLOT- Auto correlation function (acf) for complexes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ acf_coord(PDBDirectory, mol_list, sim_num=1, time_step=1, show_fig=True, save_fig=False) Description: Calculates the mean auto-correlation function (ACF) of protein complexes stored in a series of NERDSS-generated PDB files. If PDB files from multiple simulations are to be evaluated, they should be organized in the following directory structure before running the function: .. figure:: ./fig/ionerdss_analyze_acf_folder.png :alt: Directory structure for ACF analysis :align: center :width: 80% Directory structure for ACF analysis. Important Note: Given a series of `.pdb` files generated during NERDSS simulation, the function calculates the auto-correlation function (ACF) of the system. The ACF describes the correlation of a signal with a delayed copy of itself as a function of delay. In this context, it is the correlation of a complex’s position with its initial position as a function of time, calculated as the inner product between the initial position vector of a complex and its current position vector divided by the squared magnitude of its initial position vector: .. math:: \text{acf}(t) = \frac{\mathbf{r}(0) \cdot \mathbf{r}(t)}{\mathbf{r}(0) \cdot \mathbf{r}(0)} Parameters: - **PDBDirectory** (*str*): The name of the directory where all PDB files are stored. - **mol_list** (*list of str*): The names of the molecules to be evaluated. - **sim_num** (*int*, optional, default=1): The number of repeated simulations to be evaluated. - **time_step** (*int*, optional, default=1): The time steps of the NERDSS simulation in microseconds. - **show_fig** (*bool*, optional, default=True): Whether to display the generated plots. - **save_fig** (*bool*, optional, default=False): Whether to save the generated plots. Returns: - **average_time_array** (*array*): An array of iterations when the mean ACF over all repeated simulations is calculated. - **average_acf_array** (*array*): An array of the mean ACF calculated over all repeated simulations. - **std_acf_array** (*array*): An array of the standard deviation of the ACF calculated over all repeated simulations. Example: .. code-block:: python import ionerdss as ion ion.acf_coord(PDBDirectory="testing_PDBs", mol_list=["A"], sim_num=2, time_step=0.1, show_fig=True, save_fig=False) .. figure:: ./fig/ionerdss_analyze_acf_sep.png :alt: Separate ACF plots for each simulation :align: center :width: 80% Separate ACF plots for each simulation. .. figure:: ./fig/ionerdss_analyze_acf_mean.png :alt: Mean ACF plot over all simulations :align: center :width: 80% Mean ACF plot over all simulations.