Run AlphaFold2 on Instance
Prerequisites
- An instance (GPU recommended). The full AlphaFold databases require over 2 TB of disk; the reduced databases use much less.
- Conda installed.
- Read/write access to chosen data and output directories (for example /mnt/model/vdb).
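The disk prerequisite can be checked before any download starts. The following is a minimal sketch using only the Python standard library; the function names and the 2.0 TB default are illustrative assumptions, not part of AlphaFold.

```python
import shutil


def free_space_tb(path: str) -> float:
    """Free space on the filesystem that holds `path`, in TB (10**12 bytes)."""
    return shutil.disk_usage(path).free / 1e12


def has_room_for_databases(path: str, required_tb: float = 2.0) -> bool:
    """True if `path` has at least `required_tb` TB free.

    The 2.0 default mirrors the "~2+ TB" figure in this guide and is an
    assumption; check the AlphaFold README for current database sizes.
    """
    return free_space_tb(path) >= required_tb
```

For example, has_room_for_databases("/mnt/model/vdb") reports whether the example data volume from this guide could hold the full databases; pass a smaller required_tb when planning for the reduced databases.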
1. System packages (Ubuntu)
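Update the system and install basic build tools and utilities (AlphaFold also needs the HH-suite binaries hhblits and hhsearch on PATH; install them separately if they are not already present):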
sudo apt update -y && sudo apt upgrade -y
sudo apt install -y build-essential cmake git hmmer kalign tzdata wget libstdc++6
2. Create and activate Conda environment
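Create a Python environment and activate it. The jaxlib 0.4.26 pin in step 3 requires Python 3.9 or newer, so Python 3.8 will not work with it; 3.10 is used here:
conda create -n alphafold python=3.10 -y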
conda activate alphafold
3. Install Python dependencies
Install key packages; follow AlphaFold README for full dependency list:
pip install jaxlib==0.4.26
conda install -c conda-forge pdbfixer libstdcxx-ng -y
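(The pip command above pulls the CPU-only jaxlib wheel from PyPI; for GPU inference, install the CUDA build of jax and jaxlib as described in the JAX installation guide. Install any additional pip packages listed in the AlphaFold repo README and requirements.txt.)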
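After installing the dependencies, it can be worth confirming which backend JAX will actually use, since a CPU-only install runs far slower. The sketch below is an illustration under the assumption that JAX is importable in the active environment; the helper name jax_backend is hypothetical.

```python
def jax_backend():
    """Return JAX's default backend name ("cpu", "gpu" or "tpu"),
    or None if JAX is not installed in this environment."""
    try:
        import jax
    except ImportError:
        return None
    return jax.default_backend()


backend = jax_backend()
if backend is None:
    print("JAX is not installed in this environment")
elif backend == "gpu":
    print("JAX will run on the GPU")
else:
    print(f"JAX default backend is {backend!r}; GPU build may be missing")
```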
4. Clone AlphaFold repo
Clone first, so that the helper file in the next step lands inside the checked-out tree (git clone refuses to clone into an existing non-empty alphafold directory):
git clone https://github.com/google-deepmind/alphafold.git
5. Helper file required by AlphaFold
AlphaFold expects the helper text file stereo_chemical_props.txt under alphafold/alphafold/common/. Download it from the directory that contains the clone:
wget -q -P alphafold/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
6. Enter the repository
cd alphafold
Follow the repository README for any extra setup steps or environment pins.
7. Prepare databases
Place the required databases on your instance and note the root data directory (example: /mnt/model/vdb/database). Both presets need pdb70, uniref90, mgnify, and pdb_mmcif; full_dbs additionally needs bfd and uniref30, while reduced_dbs needs small_bfd instead. The model parameters are expected under a params subdirectory of the same root. If disk or time is constrained, use the reduced DB preset or precomputed MSAs.
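A quick layout check can catch a missing database before a long run starts. This is a minimal sketch that only tests for the subdirectory names listed in this guide; the function and constant names are hypothetical, and it does not verify that the databases themselves are complete.

```python
from pathlib import Path

# Subdirectory names as listed in this guide.
COMMON_DBS = ("pdb70", "uniref90", "mgnify", "pdb_mmcif")
PRESET_DBS = {
    "full_dbs": ("bfd", "uniref30"),
    "reduced_dbs": ("small_bfd",),
}


def missing_databases(data_dir, db_preset="full_dbs"):
    """Return the expected subdirectories that are absent under data_dir."""
    if db_preset not in PRESET_DBS:
        raise ValueError(f"unknown db_preset: {db_preset}")
    root = Path(data_dir)
    required = COMMON_DBS + PRESET_DBS[db_preset]
    return [name for name in required if not (root / name).is_dir()]
```

An empty list from missing_databases("/mnt/model/vdb/database", "reduced_dbs") means every expected subdirectory for the reduced preset exists.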
8. Run AlphaFold (full DBs)
Update the paths (FASTA, output, data) and run the full DB preset as one command (the trailing backslashes continue the line):
python3 run_alphafold.py --use_gpu_relax=True \
--fasta_paths=/mnt/model/vdb/input/T1049.fasta \
--output_dir=/mnt/model/vdb/output \
--data_dir=/mnt/model/vdb/database \
--pdb70_database_path=/mnt/model/vdb/database/pdb70/pdb70 \
--uniref90_database_path=/mnt/model/vdb/database/uniref90/uniref90.fasta \
--mgnify_database_path=/mnt/model/vdb/database/mgnify/mgy_clusters_2022_05.fa \
--bfd_database_path=/mnt/model/vdb/database/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniref30_database_path=/mnt/model/vdb/database/uniref30/UniRef30_2021_03 \
--template_mmcif_dir=/mnt/model/vdb/database/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/mnt/model/vdb/alphafold/pdbs/pdb.json \
--max_template_date=2025-03-04 \
--db_preset=full_dbs \
--model_preset=monomer \
--use_precomputed_msas=False
9. Run AlphaFold (reduced DB option)
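If disk or runtime is constrained, use reduced DBs and the small BFD. The uniref90, mgnify, and pdb70 paths are still required with this preset; only bfd and uniref30 are replaced by small_bfd. Run it as one command (the trailing backslashes continue the line):
python3 run_alphafold.py --use_gpu_relax=True \
--fasta_paths=/mnt/model/vdb/input/T1049.fasta \
--output_dir=/mnt/model/vdb/output \
--data_dir=/mnt/model/vdb/database \
--pdb70_database_path=/mnt/model/vdb/database/pdb70/pdb70 \
--uniref90_database_path=/mnt/model/vdb/database/uniref90/uniref90.fasta \
--mgnify_database_path=/mnt/model/vdb/database/mgnify/mgy_clusters_2022_05.fa \
--small_bfd_database_path=/mnt/model/vdb/database/small_bfd/bfd-first_non_consensus_sequences.fasta \
--template_mmcif_dir=/mnt/model/vdb/database/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path=/mnt/model/vdb/alphafold/pdbs/pdb.json \
--max_template_date=2025-03-04 \
--db_preset=reduced_dbs \
--model_preset=monomer \
--use_precomputed_msas=False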
10. Parameter highlights
- --fasta_paths: input FASTA (one or more).
- --data_dir: root directory containing databases.
- --output_dir: where predictions and metrics are written.
- --max_template_date: restrict templates by date.
- --db_preset: full_dbs or reduced_dbs.
- --model_preset: monomer (or multimer if configured and set up).
- --use_precomputed_msas: set True if you have precomputed MSAs.
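Because the two presets differ only in a few database flags, the command line can be assembled programmatically. The sketch below uses only the flags and file names that appear in this guide; the function names are hypothetical, and it assumes the pdb70, uniref90, and mgnify paths are passed for both presets.

```python
import shlex


def build_alphafold_command(fasta, output_dir, data_dir, obsolete_pdbs_path,
                            db_preset="full_dbs",
                            max_template_date="2025-03-04"):
    """Return the run_alphafold.py invocation as an argv list (monomer preset)."""
    if db_preset not in ("full_dbs", "reduced_dbs"):
        raise ValueError(f"unknown db_preset: {db_preset}")
    db = data_dir.rstrip("/")
    flags = {
        "use_gpu_relax": "True",
        "fasta_paths": fasta,
        "output_dir": output_dir,
        "data_dir": db,
        "pdb70_database_path": f"{db}/pdb70/pdb70",
        "uniref90_database_path": f"{db}/uniref90/uniref90.fasta",
        "mgnify_database_path": f"{db}/mgnify/mgy_clusters_2022_05.fa",
        "template_mmcif_dir": f"{db}/pdb_mmcif/mmcif_files",
        "obsolete_pdbs_path": obsolete_pdbs_path,
        "max_template_date": max_template_date,
        "db_preset": db_preset,
        "model_preset": "monomer",
        "use_precomputed_msas": "False",
    }
    if db_preset == "full_dbs":
        # Full preset: large BFD plus UniRef30, file names as in step 8.
        flags["bfd_database_path"] = (
            f"{db}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt")
        flags["uniref30_database_path"] = f"{db}/uniref30/UniRef30_2021_03"
    else:
        # Reduced preset: small BFD replaces BFD and UniRef30.
        flags["small_bfd_database_path"] = (
            f"{db}/small_bfd/bfd-first_non_consensus_sequences.fasta")
    return ["python3", "run_alphafold.py"] + [
        f"--{name}={value}" for name, value in flags.items()]


def as_shell_line(argv):
    """Quote an argv list so it can be pasted into a shell."""
    return shlex.join(argv)
```

The argv list can be passed to subprocess.run from inside the alphafold directory, or printed with as_shell_line for review before running it by hand.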
11. Viewing predictions
After completion, PDB files and metrics are in a per-target subdirectory of the output directory, named after the FASTA file (for example /mnt/model/vdb/output/T1049). Visualize PDBs with NCBI iCn3D:
Open https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html → File → Open Files → select PDB(s).
12. Common issues & troubleshooting
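- CUDA OOM: use the monomer preset, use an instance with more GPU memory, or let JAX spill into host RAM through unified memory (TF_FORCE_UNIFIED_MEMORY=1 with XLA_PYTHON_CLIENT_MEM_FRACTION above 1, as the official Docker launcher sets). Smaller databases and OS swap ease disk and host-memory pressure but do not by themselves free GPU memory.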
- Missing files / path errors: verify FASTA and all DB paths are correct and accessible.
- Permission errors: ensure the runtime user can read/write the data and output directories (use chown/chmod as needed).
- Long runtime / disk constraints: use reduced_dbs or precomputed MSAs to shorten runs and reduce disk use.
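Several of the issues above (missing files, wrong paths, permissions) can be caught with a preflight check before launching a run. The following is a minimal sketch; the function name is hypothetical. Database paths are matched by prefix because some flags in this guide, such as the pdb70, bfd, and uniref30 paths, name a file prefix rather than a single file.

```python
import glob
import os


def preflight_problems(fasta_path, output_dir, db_paths):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    if not os.path.isfile(fasta_path):
        problems.append(f"FASTA not found: {fasta_path}")
    elif not os.access(fasta_path, os.R_OK):
        problems.append(f"FASTA not readable: {fasta_path}")

    if not os.path.isdir(output_dir):
        problems.append(f"output directory missing: {output_dir}")
    elif not os.access(output_dir, os.W_OK | os.X_OK):
        problems.append(f"output directory not writable: {output_dir}")

    for path in db_paths:
        # Accept either an existing file/directory or any file sharing
        # this prefix (database flags that name a prefix, not a file).
        if not glob.glob(glob.escape(path) + "*"):
            problems.append(f"database path not found: {path}")

    return problems
```

Printing each entry of the returned list before calling run_alphafold.py gives a quick summary of what to fix with chown/chmod or corrected paths.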
13. Example outputs
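AlphaFold writes predictions and auxiliary outputs into the output directory, including ranked models (ranked_0.pdb is the highest-confidence prediction), per-model result files, and metrics such as ranking_debug.json and timings.json.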
Example header text:
prediction_1 | score=... | model_preset=monomer
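The model ranking can also be read programmatically. This sketch assumes the ranking_debug.json layout written by AlphaFold v2.x, with an "order" list and a score dictionary keyed "plddts" for monomer runs or "iptm+ptm" for multimer runs; verify the keys against your own output, since they may differ between versions. The function name is hypothetical.

```python
import json
from pathlib import Path


def ranked_scores(result_dir):
    """Return [(model_name, score), ...] in AlphaFold's ranked order.

    Assumes ranking_debug.json holds an "order" list plus a score dict
    under "plddts" (monomer) or "iptm+ptm" (multimer).
    """
    data = json.loads((Path(result_dir) / "ranking_debug.json").read_text())
    key = "plddts" if "plddts" in data else "iptm+ptm"
    scores = data[key]
    return [(name, scores[name]) for name in data["order"]]
```

Under that assumption, the first tuple corresponds to ranked_0.pdb, the second to ranked_1.pdb, and so on.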
14. Resources and links
- AlphaFold official repo: https://github.com/google-deepmind/alphafold