CXASAP for photochemistry experiments

Background

If you have used MX to collect a series of datasets of your crystal under irradiation, you can use CXASAP to refine and analyse your structures. This saves you from refining each one separately and then extracting the structural information for your analysis.

Setup

For each data series you need a reference structure. This is usually the first dataset you collected in the series. If the crystal undergoes a phase change during the series, you will also need a reference structure for the new phase.

For the reference structure, you will need to:

Refine it to convergence
Label the atoms, especially those you are interested in throughout the series
DO NOT use the solvent mask. This will confuse CXASAP because it will look for a fab file. If your structure is porous, just refine it as best you can without SQUEEZE or a solvent mask. If you need to refine it with a solvent mask for publication, you will need to do it manually after you do the parallel refinement.
Place the ins, res, hkl and cif files in a folder together.
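
Before moving on, you may want to confirm that the reference folder really holds all four file types. The sketch below is only an illustration: the function name missing_reference_files is made up for this example, and it checks file extensions only, not whether the files belong to the same structure.

```python
from pathlib import Path

# File types the reference folder should contain, as listed above
REQUIRED = (".ins", ".res", ".hkl", ".cif")


def missing_reference_files(folder):
    """Return the required extensions that have no file in the folder (hypothetical helper)."""
    present = {p.suffix.lower() for p in Path(folder).iterdir() if p.is_file()}
    return [ext for ext in REQUIRED if ext not in present]
```

An empty list means every file type was found. Call it with the path to your own reference folder.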

Step 1. Download your data

You need all the datasets from the series you are analysing in one folder. Download them from the synchrotron server: instructions here

It is a good idea to keep the files for your reference structure in this folder as well.

Step 2. Copy cxasap files

Because of how the files are laid out in the dataset folders, you need to move the cxasap.ins and cxasap.hkl files from the CX-ASAP_Brute folder within each dataset folder, back into the dataset folder.

For this, download this python script: move_ins_hkl.py

Save it in the folder with all your datasets. Open PowerShell in this folder and run the script:

python move_ins_hkl.py

After this, you should have a cxasap.ins and cxasap.hkl in each dataset folder.
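
To make it clearer what this step does, here is a rough sketch of the same idea. It is not the move_ins_hkl.py script linked above, just an illustration that assumes every dataset folder has a CX-ASAP_Brute subfolder containing cxasap.ins and cxasap.hkl. The function name restore_inputs is made up for this example, and it copies rather than moves so the originals stay where they are.

```python
import shutil
from pathlib import Path

# The two files that need to end up in each dataset folder
FILES = ("cxasap.ins", "cxasap.hkl")


def restore_inputs(series_folder):
    """Copy cxasap.ins/.hkl out of each CX-ASAP_Brute folder (hypothetical helper)."""
    copied = []
    for dataset in sorted(Path(series_folder).iterdir()):
        brute = dataset / "CX-ASAP_Brute"
        if not brute.is_dir():
            continue  # not a dataset folder, or nothing to copy
        for name in FILES:
            source = brute / name
            if source.is_file():
                shutil.copy(source, dataset / name)
                copied.append(dataset / name)
    return copied
```

The returned list shows which files were written, so you can see at a glance whether any dataset folder was skipped.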

Step 3. Setup pipeline-refinement

Open PowerShell in the CX-ASAP-main folder, where you installed it on your C drive.

Activate the virtual environment:

.\cxasap_venv\Scripts\activate

In general, for each CXASAP pipeline or module there are a few different options you can run in powershell.

If you run the --help command, you will find out about the pipeline and the other commands you can run for it.

For example, running cxasap pipeline-refinement --help shows a summary of what the pipeline does and lists its options, which are --dependencies, --files, --configure, --run and --help.

To check what you need, run this:

cxasap pipeline-refinement --files

This will tell you the files you require. You should already have all the files you need, where you need them, based on the previous steps.

To configure the code, run this:

cxasap pipeline-refinement --configure

This will make a configuration file, called conf.yaml, in the cx_asap folder, which is in your CX-ASAP-main folder.

It will also print out in the terminal a description of what needs to go to each field of the configuration file.

Open the conf.yaml file and fill it in.

The main fields you need to fill in are the experiment_location and the reference_path.

Here is an example:

Be aware that whenever you run the --configure command again, for any pipeline, it will overwrite this file. If you think you will want the same settings again, for example when you switch to a different reference file, save a copy in your dataset folder first.
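
If you would rather not copy the file by hand each time, a few lines of Python can do it. This is only a sketch: the function name backup_config and the conf_<label>.yaml naming are made up for this example, and you need to supply your own paths.

```python
import shutil
from pathlib import Path


def backup_config(conf_file, dataset_folder, label):
    """Copy conf.yaml into the dataset folder under a descriptive name (hypothetical helper)."""
    target = Path(dataset_folder) / f"conf_{label}.yaml"
    shutil.copy2(conf_file, target)  # copy2 also keeps the file's modification time
    return target
```

For example, backup_config(r"C:\CX-ASAP-main\cx_asap\conf.yaml", r"C:\my_series", "refinement") would write conf_refinement.yaml into the series folder. Both paths here are placeholders for wherever your installation and data actually live.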

Step 4. Run pipeline-refinement

Now you should be able to run pipeline-refinement:

cxasap pipeline-refinement --run

You should see the output of shelxl running in powershell.

Wait patiently for it to finish.

Step 5. Check the output.

You can look at the error_output.txt or refinement_summary.txt files in the folder with all your datasets. This will tell you if there are structures that did not converge, or if other errors occurred.

If several structures have not converged, try increasing the maximum cycles and the refinements in the conf.yaml file, then run pipeline-refinement again. You can also open the cxasap.res files for the structures that did not converge and check whether they look sensible or have refined to nonsense. If they have refined to nonsense, you will need to refine them manually, sorry. Once you have done this and have all your cifs ready, and provided the structures stay in the same space group as the reference structure, you can skip to step 6 to analyse your cifs.
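
For a long series it can be handy to list which datasets have no cif yet. The sketch below assumes, as the later steps in this guide do, that dataset folders have names ending in "process" and that a finished refinement leaves a cxasap.cif in the dataset folder. The function name datasets_without_cif is made up for this example, and a missing cif is only a hint that something went wrong, so still check the output files above.

```python
from pathlib import Path


def datasets_without_cif(series_folder):
    """Return the names of dataset folders that contain no cxasap.cif (hypothetical helper)."""
    base = Path(series_folder)
    return sorted(
        d.name
        for d in base.iterdir()
        if d.is_dir()
        and d.name.endswith("process")
        and not (d / "cxasap.cif").is_file()
    )
```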

Step 6. Run pipeline-cif

This will add the synchrotron specific fields to the cif file for each dataset.

First configure pipeline-cif by running cxasap pipeline-cif --configure, then edit the conf.yaml file. It should look something like this:

autoprocess.cif is the name of the cif that is written for each synchrotron dataset, with the relevant synchrotron information in it.

Step 7. Collect cifs into a single folder

In order to run the pipeline that will compare the structures based on their cifs, you need them to all be in the same folder, and have unique names. You can do this manually, or you can run this handy piece of code:

import shutil
from pathlib import Path

import click


@click.command()
@click.option("--path", default=".", help="Folder containing all the dataset folders.")
@click.option("--interval", type=int, required=True, help="Time between datasets, in minutes.")
def cif_copy(path, interval):
    base = Path(path)

    # make a directory called cifs if it doesn't already exist
    cifs_path = base / "cifs"
    cifs_path.mkdir(exist_ok=True)

    # find every dataset folder (name ends in "process"); sort the names so the
    # time labels do not depend on the order the operating system lists them
    # (this assumes the folder names sort in collection order)
    dirs_list = sorted(
        d.name for d in base.iterdir()
        if d.is_dir() and d.name.endswith("process")
    )

    # the n-th dataset gets the name "<n * interval>min.cif",
    # e.g. 0min.cif, 5min.cif, 10min.cif for --interval 5
    for n, folder in enumerate(dirs_list):
        source_path = base / folder / "cxasap.cif"
        dest_path = cifs_path / f"{n * interval}min.cif"
        # copy the cif if this dataset has one, otherwise skip this time point
        if source_path.is_file():
            shutil.copy(source_path, dest_path)


if __name__ == "__main__":
    cif_copy()

Copy this code and save it as a .py file, for example cif_move.py. It can be saved anywhere.

If you have refined your structures manually and they are in different places and named different things, it is best to do this step manually.

cif_move.py will copy the cxasap.cif from each dataset folder, put it into a folder called "cifs" and give it a name based on the irradiation time for the experiment. There are two options you can give it.

--path : This is the path to your folder with all your datasets. If cif_move is already in this folder, you don't need to worry about this option. Otherwise, include the full path, and it is a good idea to put it in quotes if there is a space anywhere in the path.

--interval : This is the time interval for your photochem experiment, e.g. if you enter 5, the cifs will be named 0min, 5min, 10min etc. If a dataset has not converged, and so has no cif, that time point is skipped.

For example, for a collection with 5 minute intervals, with cif_move.py saved in the folder with all your datasets, you should run something like this:

python cif_move.py --interval 5

To see if it has been successful, check the "cifs" folder which should be alongside all your other dataset folders.

If for some reason you don't want to include the cifs from some of your datasets, it is a good idea to move the folders to another folder, or you can change the name of the dataset folder so that they no longer end in "process", as the code will try to extract a cif from every folder that ends in "process". But keep in mind it will then ignore this dataset for the time interval naming of the cifs.
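
Because the names depend on the position of each folder in the list, it can help to preview which dataset will receive which time label before you copy anything. The sketch below is an illustration only: the function name preview_cif_names is made up, and it assumes the dataset folders are taken in alphabetical order.

```python
from pathlib import Path


def preview_cif_names(series_folder, interval):
    """List (dataset folder, cif name or None) in the order time labels are assigned."""
    base = Path(series_folder)
    folders = sorted(
        d.name for d in base.iterdir()
        if d.is_dir() and d.name.endswith("process")
    )
    preview = []
    for n, folder in enumerate(folders):
        has_cif = (base / folder / "cxasap.cif").is_file()
        # A folder without a cif still uses up its time point
        preview.append((folder, f"{n * interval}min.cif" if has_cif else None))
    return preview
```

A None entry marks a dataset with no cxasap.cif. A folder that has been renamed or moved away will not appear at all, and every later dataset shifts one time point earlier.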

Step 8. Add the irradiation time to your cifs.

Now you have all your cifs together in one place, you need to add information about the irradiation time for the experiment to each cif.

Again, you can do this manually, or use this handy piece of code:

import os

# collect (time, file name) pairs for every cif named like "5min.cif"
cifs = []
for file_name in os.listdir():
    if file_name.endswith("min.cif"):
        # the irradiation time is the number in front of "min" in the file name
        label = file_name[: -len("min.cif")]
        if label.isdigit():
            cifs.append((int(label), file_name))

# add the irradiation time as a new line at the end of each cif
for time, file_name in sorted(cifs):
    with open(file_name, "a") as f:
        f.write("\n_irradiation_time    {}".format(time))

Copy the code above and save it as a .py file, for example cif_add.py, in your "cifs" folder.

Then you can simply open PowerShell in your "cifs" folder and run the code:

python cif_add.py

If you don't want to keep a copy of cif_add.py in your cifs folder, you can still run it from your cifs folder by providing the path to cif_add.py when you run it in the terminal.

e.g. if it is in your C drive:

python C:\cif_add.py

After you run cif_add.py, you should have a line at the very bottom of each of your cifs that looks something like this:

_irradiation_time 0
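
If you have a lot of cifs, you can check them all at once instead of opening each one. This is a sketch under the assumption that the field is written on its own line, as above. The function name cifs_missing_time is made up for this example.

```python
from pathlib import Path


def cifs_missing_time(cifs_folder, field="_irradiation_time"):
    """Return the cif files that contain no line starting with the field name."""
    missing = []
    for cif in sorted(Path(cifs_folder).glob("*.cif")):
        lines = cif.read_text(errors="replace").splitlines()
        if not any(line.strip().startswith(field) for line in lines):
            missing.append(cif.name)
    return missing
```

An empty list means every cif in the folder carries the field.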

Step 9. Run pipeline-variable-analysis

To extract all the structural information your heart may desire from the cifs you have just made, run pipeline-variable-analysis through cxasap.

Go back to your powershell in the CX-ASAP-main folder, where you have the virtual environment activated.

Configure pipeline-variable-analysis

cxasap pipeline-variable-analysis --configure

Open the new conf.yaml file in the cx_asap folder, and edit it with your experiment details.

You can change ADP_analysis, structural_analysis_angles, structural_analysis_bonds and structural_analysis_torsions from false to true if you want to extract the information on these parameters. Generally you may as well make them all true, unless you have isotropic atoms in your reference structure, in which case leave ADP_analysis as false.

For example the conf.yaml will look something like this:

Make sure you use the correct text for varying_cif_parameter. It needs to match the field you added to your cifs in the previous step, e.g. _irradiation_time.

Now you can run the analysis!

cxasap pipeline-variable-analysis --run

After it has run, you should have png files of graphs of bond lengths, angles etc. The raw data that has been extracted will be in .csv files. These can be opened in Excel if you want to select specific data to make graphs.
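
If you prefer Python to Excel, the csv files can be read with the standard csv module. The sketch below makes no assumption about what CXASAP calls its files or columns: the function name read_columns is made up, and you pass in whichever column headings you find in the first row of your own csv file.

```python
import csv


def read_columns(csv_file, x_column, y_column):
    """Return two lists of floats taken from the named columns of a csv file."""
    xs, ys = [], []
    with open(csv_file, newline="") as f:
        for row in csv.DictReader(f):
            try:
                x, y = float(row[x_column]), float(row[y_column])
            except (KeyError, TypeError, ValueError):
                continue  # skip rows where either value is missing or not a number
            xs.append(x)
            ys.append(y)
    return xs, ys
```

The two lists can then be passed to whatever plotting tool you like.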

Step 10. Enjoy your data analysis and write lots of papers!

And don't forget to reference CXASAP https://doi.org/10.1107/S1600576723000298

Australian Synchrotron User Office