Running SCATTR on your data

This section goes over the command line options you will find most useful when running SCATTR on your dataset, and describes some issues you may encounter.

Note: Please first refer to the simple example in the Installation section, which goes over running SCATTR on a test dataset and the essential required options.

Freesurfer License

SCATTR invokes Freesurfer directly for Freesurfer-related processing (e.g. thalamus segmentation), so a Freesurfer license is required for those steps of the workflow. By default, SCATTR attempts to use the Freesurfer license pointed to by the environment variable FS_LICENSE. Alternatively, the path to the license can be passed with the --fs-license parameter:

--fs-license /path/to/fs_license
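
Before launching a long run, it can help to confirm that a license is actually discoverable. The following is a minimal pre-flight sketch, not part of SCATTR: the function name resolve_fs_license is hypothetical, and the precedence (an explicit path wins over FS_LICENSE) is an assumption made for illustration.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_fs_license(cli_path: Optional[str] = None) -> Path:
    """Return the license file to use, or raise if none can be found.

    Assumption: an explicitly passed path takes precedence over the
    FS_LICENSE environment variable.
    """
    candidate = cli_path or os.environ.get("FS_LICENSE")
    if not candidate:
        raise FileNotFoundError(
            "No Freesurfer license: set FS_LICENSE or pass --fs-license"
        )
    path = Path(candidate).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Freesurfer license not found at {path}")
    return path
```

Calling resolve_fs_license() with no argument checks FS_LICENSE only; passing a path mimics what you would give to --fs-license.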

Including / excluding subjects to process

By default, SCATTR will run on all subjects in the dataset. If you wish to run on only a subset of subjects, you can use the --participant-label flag:

--participant-label 001

which would only run on sub-001. You can add additional subjects by passing a space-separated list to this option:

--participant-label 001 002

which would run for sub-001 and sub-002.

Similarly, subjects can be excluded from processing using the --exclude-participant-label flag.
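
If you launch SCATTR from a script, the subject selection above can be assembled programmatically. This is a small sketch under stated assumptions: build_scattr_command is a hypothetical helper, the positional layout (BIDS directory, output directory, then "participant") follows the usual BIDS App convention and should be checked against the Installation example, and the two label options are treated here as mutually exclusive.

```python
import shlex
from typing import List, Optional, Sequence


def build_scattr_command(
    bids_dir: str,
    output_dir: str,
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble an argument list for a SCATTR run (illustrative only)."""
    # Assumption: including and excluding at the same time is not supported.
    if include and exclude:
        raise ValueError("Pass either include or exclude labels, not both")
    # Assumed BIDS App positional layout: <bids_dir> <output_dir> participant
    cmd = ["scattr", bids_dir, output_dir, "participant"]
    if include:
        # Labels are passed without the "sub-" prefix, space-separated.
        cmd += ["--participant-label", *include]
    if exclude:
        cmd += ["--exclude-participant-label", *exclude]
    return cmd


cmd = build_scattr_command("/path/to/bids", "/path/to/output", include=["001", "002"])
print(shlex.join(cmd))
# scattr /path/to/bids /path/to/output participant --participant-label 001 002
```

The returned list can be handed to subprocess.run directly, which avoids shell quoting problems with unusual paths.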

Alternate Freesurfer / derived-diffusion data locations

By default, SCATTR attempts to locate the Freesurfer and derived diffusion data automatically. You can override these defaults by passing the actual location of each with --freesurfer_dir or --dwi_dir, respectively:

--freesurfer_dir /path/to/fs_dir --dwi_dir /path/to/dwi_dir

Pre-generated average response function

In some cases, an average response function may have already been generated separately for use in downstream processing (e.g. generated from controls and applied to a patient population). To use a pre-generated average response function, pass the location of the directory containing the associated files using --responsemean_dir:

--responsemean_dir /path/to/average_response_dir

Tractography on network storage

Tractography can require millions of reads and writes to the storage system, because the output file is updated on the fly. This can be extremely slow on network storage. To help with this, SCATTR always reads and writes the tractography in a temporary location (e.g. /tmp) before copying the result to the final output location, which significantly reduces the time needed to generate the tractography. On some networked systems, you may be unable to write to /tmp. On systems that use the SLURM workload manager, an alternative is to pass --slurm_tmpdir, which requests that the workflow write to local temporary storage (e.g. /localscratch) instead of network temporary storage.
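
The staging idea can be illustrated in a few lines. This is a generic sketch of the "write locally, copy once" pattern, not SCATTR's actual implementation; write_via_local_tmp, produce, and tmp_root are hypothetical names introduced for the example.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def write_via_local_tmp(
    final_path: Path,
    produce: Callable[[Path], None],
    tmp_root: Optional[str] = None,
) -> Path:
    """Run `produce` against a file in temporary storage, then copy it once.

    tmp_root=None uses the system default temporary directory; pass a
    node-local scratch directory to stage there instead.
    """
    final_path.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=tmp_root) as tmp:
        staged = Path(tmp) / final_path.name
        produce(staged)  # the many small writes hit temporary storage
        shutil.copy2(staged, final_path)  # one bulk copy to the final location
    return final_path
```

The temporary directory is removed when the with block exits, so only the single copied file remains on the final (possibly networked) storage.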

BIDS Parsing limitations

SCATTR uses Snakebids, which makes use of pybids to parse a BIDS-compliant dataset. However, because of the way Snakebids and Snakemake operate, one limitation is that the input files in your BIDS dataset need to be consistent in terms of which optional BIDS entities they contain. We can use the acquisition (acq) entity as an example. SCATTR should have no problem parsing the following dataset:

PATH_TO_BIDS_DIR/
├── dataset_description.json
├── sub-001/
│   └── anat/
│       └── sub-001_acq-mprage_T1w.nii.gz
├── sub-002/
│   └── anat/
│       └── sub-002_acq-spgr_T1w.nii.gz
...

as the path (with wildcards) will be interpreted as sub-{subject}_acq-{acq}_T1w.nii.gz.

However, the following dataset will raise an error:

PATH_TO_BIDS_DIR/
├── dataset_description.json
├── sub-001/
│   └── anat/
│       └── sub-001_acq-mprage_T1w.nii.gz
├── sub-002/
│   └── anat/
│       └── sub-002_T1w.nii.gz
...

because two distinct paths (with wildcards) would be found for T1w images: sub-{subject}_acq-{acq}_T1w.nii.gz and sub-{subject}_T1w.nii.gz.

Similarly, you cannot have some subjects with the ses entity and other subjects without it. Functionality to filter out extra files is expected to be added to Snakebids soon, but for now, if your dataset has these issues, you will need to rename or remove the extraneous files.
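
To spot this kind of inconsistency before running SCATTR, you can scan the dataset for suffixes that map to more than one wildcard pattern. The script below is a rough sketch using only the Python standard library. It does not use Snakebids or pybids and only approximates how they parse filenames; the function names and the WILDCARD_NAMES mapping are hypothetical and exist only to mirror the notation used above.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Set

# One BIDS entity, e.g. "sub-001" or "acq-mprage".
ENTITY = re.compile(r"(?P<key>[a-zA-Z0-9]+)-(?P<value>[a-zA-Z0-9]+)")

# Display names chosen to match the notation in this section (assumption).
WILDCARD_NAMES = {"sub": "subject", "ses": "session"}


def wildcard_pattern(filename: str) -> str:
    """sub-001_acq-mprage_T1w.nii.gz -> sub-{subject}_acq-{acq}_T1w.nii.gz"""
    stem, dot, ext = filename.partition(".")
    parts = []
    for part in stem.split("_"):
        match = ENTITY.fullmatch(part)
        if match:
            key = match.group("key")
            name = WILDCARD_NAMES.get(key, key)
            parts.append(f"{key}-{{{name}}}")
        else:
            parts.append(part)  # the suffix, e.g. "T1w"
    return "_".join(parts) + dot + ext


def find_inconsistent_suffixes(bids_dir: Path) -> Dict[str, List[str]]:
    """Map each suffix (e.g. "T1w") to its patterns when there is more than one."""
    patterns: Dict[str, Set[str]] = defaultdict(set)
    for path in bids_dir.glob("sub-*/**/*.nii.gz"):
        suffix = path.name.split(".")[0].rsplit("_", 1)[-1]
        patterns[suffix].add(wildcard_pattern(path.name))
    return {s: sorted(p) for s, p in patterns.items() if len(p) > 1}
```

An empty result means every suffix has a single pattern. For the failing dataset above, the result would list both T1w patterns, which tells you which files to rename or remove.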

More examples of possible BIDS-compliant datasets can be found in scattr/test/data.