Run AlphaFold2 on Instance

Prerequisites

  • Instance (GPU recommended). The full AlphaFold databases need more than 2 TB of disk; the reduced databases need much less.
  • Conda installed.
  • Read/write access to chosen data and output directories (for example /mnt/model/vdb).
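
Before downloading anything, it can help to confirm that the target volume has enough room. A minimal Python sketch using only the standard library; the helper names and the 2.0 TB threshold are illustrative assumptions, not part of AlphaFold:

```python
import shutil


def free_space_tb(path):
    """Free space on the filesystem that holds `path`, in TB (10**12 bytes)."""
    return shutil.disk_usage(path).free / 1e12


def has_room_for(path, required_tb):
    """True if at least `required_tb` terabytes are free under `path`."""
    return free_space_tb(path) >= required_tb


if __name__ == "__main__":
    target = "."  # assumption: replace with your data directory, e.g. /mnt/model/vdb
    print(f"{free_space_tb(target):.2f} TB free under {target}")
    print("room for full DBs:", has_room_for(target, 2.0))
```

Point target at the mount that will hold the databases, not at the home directory, since the two are often on different volumes.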

1. System packages (Ubuntu)

Update and install basic build tools and utilities:

sudo apt update -y && sudo apt upgrade -y
sudo apt install -y build-essential cmake git hmmer kalign tzdata wget libstdc++6

2. Create and activate Conda environment

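Create a Python environment and activate it. Use Python 3.9 or newer, because the jaxlib 0.4.26 pin in step 3 is not published for Python 3.8 (3.11 is used here as an example):

conda create -n alphafold python=3.11 -y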
conda activate alphafold

3. Install Python dependencies

Install the key packages; follow the AlphaFold README for the full dependency list:

pip install jaxlib==0.4.26
conda install -c conda-forge pdbfixer libstdcxx-ng -y

(Install any additional pip packages listed in the AlphaFold repo README.)
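
A quick way to see which packages are still missing from the active environment is to probe for them without importing them. This is a sketch, and it assumes that the import names jax, jaxlib, and pdbfixer match the packages you installed:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    # Assumption: these import names correspond to the packages installed above.
    missing = missing_modules(["jax", "jaxlib", "pdbfixer"])
    print("missing:", missing or "none")
```

Run it inside the activated alphafold environment; an empty list only shows that the modules are discoverable, not that their versions are compatible.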

4. Clone AlphaFold repo

git clone https://github.com/google-deepmind/alphafold.git

5. Helper file required by AlphaFold

AlphaFold expects the stereo_chemical_props.txt helper file under alphafold/alphafold/common/. Download it after cloning, from the directory that contains the clone; creating alphafold/ first would make git clone fail on a non-empty directory:

mkdir -p alphafold/alphafold/common/
wget -q -P alphafold/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt

6. Enter the repository

cd alphafold

Follow the repository README for any extra setup steps or environment pins.

7. Prepare databases

Place required databases on your instance and note the root data directory (example: /mnt/model/vdb/database). Essential subpaths include pdb70, uniref90, mgnify, bfd (or small_bfd for reduced), uniref30, pdb_mmcif. If disk or time is constrained, use the reduced DB preset or precomputed MSAs.
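
To catch a missing database before a long run starts, the subpaths listed above can be checked programmatically. A minimal sketch: the function name is hypothetical, and the split of bfd and uniref30 (full preset) versus small_bfd (reduced preset) is an assumption to confirm against the AlphaFold README:

```python
from pathlib import Path

# Subdirectories expected for every run (names taken from the list above).
COMMON = ("pdb70", "uniref90", "mgnify", "pdb_mmcif")
# Assumption: which extra databases each preset needs.
PRESET_EXTRA = {
    "full_dbs": ("bfd", "uniref30"),
    "reduced_dbs": ("small_bfd",),
}


def missing_databases(data_dir, db_preset="full_dbs"):
    """Return the expected database subdirectories that do not exist under data_dir."""
    root = Path(data_dir)
    expected = COMMON + PRESET_EXTRA[db_preset]
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Example root from this guide; adjust to your instance.
    print(missing_databases("/mnt/model/vdb/database", "full_dbs"))
```

An empty list means every expected directory is present; it does not verify that the downloads inside them are complete.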

8. Run AlphaFold (full DBs)

Update the paths (FASTA, output, data) and run the full DB preset. The trailing backslashes continue a single command across lines:

python3 run_alphafold.py --use_gpu_relax=True \
--fasta_paths=/mnt/model/vdb/input/T1049.fasta \
--output_dir=/mnt/model/vdb/output \
--data_dir=/mnt/model/vdb/database \
--pdb70_database_path=/mnt/model/vdb/database/pdb70/pdb70 \
--uniref90_database_path=/mnt/model/vdb/database/uniref90/uniref90.fasta \
--mgnify_database_path=/mnt/model/vdb/database/mgnify/mgy_clusters_2022_05.fa \
--bfd_database_path=/mnt/model/vdb/database/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniref30_database_path=/mnt/model/vdb/database/uniref30/UniRef30_2021_03 \
--template_mmcif_dir=/mnt/model/vdb/database/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/mnt/model/vdb/alphafold/pdbs/pdb.json \
--max_template_date=2025-03-04 \
--db_preset=full_dbs \
--model_preset=monomer \
--use_precomputed_msas=False

9. Run AlphaFold (reduced DB option)

If disk or runtime is constrained, use the reduced DB preset with the small BFD. The trailing backslashes continue a single command across lines:

python3 run_alphafold.py --use_gpu_relax=True \
--fasta_paths=/mnt/model/vdb/input/T1049.fasta \
--output_dir=/mnt/model/vdb/output \
--data_dir=/mnt/model/vdb/database \
--pdb70_database_path=/mnt/model/vdb/database/pdb70/pdb70 \
--uniref90_database_path=/mnt/model/vdb/database/uniref90/uniref90.fasta \
--mgnify_database_path=/mnt/model/vdb/database/mgnify/mgy_clusters_2022_05.fa \
--small_bfd_database_path=/mnt/model/vdb/database/small_bfd/bfd-first_non_consensus_sequences.fasta \
--template_mmcif_dir=/mnt/model/vdb/database/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/mnt/model/vdb/alphafold/pdbs/pdb.json \
--max_template_date=2025-03-04 \
--db_preset=reduced_dbs \
--model_preset=monomer \
--use_precomputed_msas=False
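
Because the full and reduced runs share most flags, the argument list can be assembled in one place to avoid copy-paste drift. The sketch below is a hypothetical wrapper, not part of AlphaFold; it uses only the flags and database file names that appear in this guide and assumes the monomer preset:

```python
import shlex


def build_command(fasta, output_dir, data_dir, obsolete_pdbs_path,
                  db_preset="full_dbs", max_template_date="2025-03-04"):
    """Return the run_alphafold.py argument list for a monomer run."""
    if db_preset not in ("full_dbs", "reduced_dbs"):
        raise ValueError(f"unknown db_preset: {db_preset!r}")
    db = data_dir.rstrip("/")
    flags = {
        "use_gpu_relax": "True",
        "fasta_paths": fasta,
        "output_dir": output_dir,
        "data_dir": db,
        "pdb70_database_path": f"{db}/pdb70/pdb70",
        "uniref90_database_path": f"{db}/uniref90/uniref90.fasta",
        "mgnify_database_path": f"{db}/mgnify/mgy_clusters_2022_05.fa",
        "template_mmcif_dir": f"{db}/pdb_mmcif/mmcif_files",
        "obsolete_pdbs_path": obsolete_pdbs_path,
        "max_template_date": max_template_date,
        "db_preset": db_preset,
        "model_preset": "monomer",
        "use_precomputed_msas": "False",
    }
    # Preset-specific databases: large BFD plus UniRef30, or the small BFD.
    if db_preset == "full_dbs":
        flags["bfd_database_path"] = (
            f"{db}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt")
        flags["uniref30_database_path"] = f"{db}/uniref30/UniRef30_2021_03"
    else:
        flags["small_bfd_database_path"] = (
            f"{db}/small_bfd/bfd-first_non_consensus_sequences.fasta")
    return ["python3", "run_alphafold.py"] + [f"--{k}={v}" for k, v in flags.items()]


if __name__ == "__main__":
    cmd = build_command(
        "/mnt/model/vdb/input/T1049.fasta",
        "/mnt/model/vdb/output",
        "/mnt/model/vdb/database",
        "/mnt/model/vdb/alphafold/pdbs/pdb.json",
        db_preset="reduced_dbs",
    )
    print(shlex.join(cmd))  # inspect the command before running it
```

The returned list can be passed to subprocess.run(cmd, check=True) from inside the cloned repository once the printed command looks right.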

10. Parameter highlights

  • --fasta_paths: input FASTA (one or more).
  • --data_dir: root directory containing databases.
  • --output_dir: where predictions and metrics are written.
  • --max_template_date: restrict templates by date.
  • --db_preset: full_dbs or reduced_dbs.
  • --model_preset: monomer (or multimer, if it has been set up).
  • --use_precomputed_msas: set True if you have precomputed MSAs.
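
Since a malformed input only surfaces after the run has started, it can be worth sanity-checking each file passed to --fasta_paths first. A minimal parser sketch; the function name is hypothetical, and the expectation that a monomer input holds exactly one sequence is an assumption to confirm against the AlphaFold README:

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">example\nMKTAYIAK\nQRQISFVK\n"
    records = read_fasta(example)
    print(len(records), "sequence(s); first length:", len(records[0][1]))
```

For a monomer run, checking len(records) == 1 before launching avoids waiting on a job that rejects its input.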

11. Viewing predictions

After completion, PDB files and metrics are in the output directory. Visualize PDBs with NCBI iCn3D:

Open https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html → File → Open Files → select PDB(s).

12. Common issues & troubleshooting

  • CUDA OOM: switch to monomer, reduce DB size, use an instance with more GPU memory, or enable swap if appropriate for your environment.
  • Missing files / path errors: verify FASTA and all DB paths are correct and accessible.
  • Permission errors: ensure the runtime user can read/write the data and output directories (use chown/chmod as needed).
  • Long runtime / disk constraints: use reduced_dbs or precomputed MSAs to speed runs.
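
The path and permission checks above can be run in one pass before launching. A sketch using os.access; the function name is hypothetical, and the result reflects the user running the script, so run it as the same user that will run AlphaFold:

```python
import os


def path_problems(read_paths=(), write_dirs=()):
    """Return human-readable problems for inputs to read and directories to write."""
    problems = []
    for path in read_paths:
        if not os.path.exists(path):
            problems.append(f"missing: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"not readable: {path}")
    for path in write_dirs:
        if not os.path.isdir(path):
            problems.append(f"missing directory: {path}")
        elif not os.access(path, os.W_OK | os.X_OK):
            problems.append(f"not writable: {path}")
    return problems


if __name__ == "__main__":
    # Example paths from this guide; adjust to your instance.
    issues = path_problems(
        read_paths=["/mnt/model/vdb/input/T1049.fasta", "/mnt/model/vdb/database"],
        write_dirs=["/mnt/model/vdb/output"],
    )
    print("\n".join(issues) or "all paths look accessible")
```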

13. Example outputs

AlphaFold writes predictions and auxiliary outputs (ranked models, metrics) into the output directory.

Example header text:

prediction_1 | score=... | model_preset=monomer
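
To pick out the top-ranked model without opening every file, the ranking metadata can be read directly. This sketch assumes that each prediction directory contains a ranking_debug.json file with an "order" list and a per-model score table (such as "plddts" for monomer runs); verify the file name and keys against your own output before relying on it:

```python
import json
from pathlib import Path


def best_model(prediction_dir):
    """Return (model_name, score) for the top-ranked model in prediction_dir."""
    ranking = json.loads((Path(prediction_dir) / "ranking_debug.json").read_text())
    order = ranking["order"]
    # Assumption: the only other dict-valued entry is the per-model score table.
    scores = next(v for k, v in ranking.items()
                  if k != "order" and isinstance(v, dict))
    top = order[0]
    return top, scores[top]


if __name__ == "__main__":
    # Assumption: results for T1049.fasta land in a T1049 subdirectory of the output dir.
    name, score = best_model("/mnt/model/vdb/output/T1049")
    print(f"best model: {name} (score {score:.1f})")
```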