# Run AlphaFold2 on Instance

## Prerequisites

* Instance (GPU recommended). The full AlphaFold databases require ~2+ TB of disk; the reduced databases use much less.
* Conda installed.
* Read/write access to the chosen data and output directories (for example `/mnt/model/vdb`).

## 1. System packages (Ubuntu)

Update the system and install basic build tools and utilities:

```bash
sudo apt update -y && sudo apt upgrade -y
sudo apt install -y build-essential cmake git hmmer kalign tzdata wget libstdc++6
```

AlphaFold's MSA and template searches also call `hhblits` and `hhsearch` from HH-suite, which this package list does not include. Install HH-suite separately if those binaries are not already on your `PATH`.

## 2. Create and activate Conda environment

Create a Python 3.8 environment and activate it:

```bash
conda create -n alphafold python=3.8 -y
conda activate alphafold
```

## 3. Install Python dependencies

Install the key packages; follow the AlphaFold README for the full dependency list:

```bash
pip install jaxlib==0.4.26
conda install -c conda-forge pdbfixer libstdcxx-ng -y
```

Install any additional pip packages listed in the AlphaFold repo README. Note that jaxlib 0.4.26 wheels are published for Python 3.9 and newer, so if `pip` reports no matching distribution under Python 3.8, recreate the environment with the Python version that the AlphaFold repository pins.

## 4. Clone AlphaFold repo

```bash
git clone https://github.com/google-deepmind/alphafold.git
```

## 5. Helper file required by AlphaFold

AlphaFold expects the helper text file `stereo_chemical_props.txt` under `alphafold/alphafold/common/`. Download it after cloning, from the directory that contains the clone (creating `alphafold/` beforehand would make `git clone` fail on a non-empty directory):

```bash
mkdir -p alphafold/alphafold/common/
wget -q -P alphafold/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
```

## 6. Change into the repository

Run `cd alphafold`, then follow the repository README for any extra setup steps or environment pins.

## 7. Prepare databases

Place the required databases on your instance and note the root data directory (example: `/mnt/model/vdb/database`). Essential subpaths include `pdb70`, `uniref90`, `mgnify`, `bfd` (or `small_bfd` for the reduced preset), `uniref30`, and `pdb_mmcif`. If disk space or time is constrained, use the reduced DB preset or precomputed MSAs.

## 8. Run AlphaFold (full DBs)

Update the paths (FASTA, output, data) and run with the full DB preset. The command is split across lines with backslash continuations:

```bash
python3 run_alphafold.py --use_gpu_relax=True \
  --fasta_paths=/mnt/model/vdb/input/T1049.fasta \
  --output_dir=/mnt/model/vdb/output \
  --data_dir=/mnt/model/vdb/database \
  --pdb70_database_path=/mnt/model/vdb/database/pdb70/pdb70 \
  --uniref90_database_path=/mnt/model/vdb/database/uniref90/uniref90.fasta \
  --mgnify_database_path=/mnt/model/vdb/database/mgnify/mgy_clusters_2022_05.fa \
  --bfd_database_path=/mnt/model/vdb/database/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
  --uniref30_database_path=/mnt/model/vdb/database/uniref30/UniRef30_2021_03 \
  --template_mmcif_dir=/mnt/model/vdb/database/pdb_mmcif/mmcif_files \
  --obsolete_pdbs_path=/mnt/model/vdb/alphafold/pdbs/pdb.json \
  --max_template_date=2025-03-04 \
  --db_preset=full_dbs \
  --model_preset=monomer \
  --use_precomputed_msas=False
```

## 9. Run AlphaFold (reduced DB option)

If disk space or runtime is constrained, use the reduced DB preset, in which the small BFD replaces the full BFD and UniRef30. `run_alphafold.py` still needs the UniRef90, MGnify, and PDB70 paths in this mode, so they are passed here with the same values as in the full run:

```bash
python3 run_alphafold.py --use_gpu_relax=True \
  --fasta_paths=/mnt/model/vdb/input/T1049.fasta \
  --output_dir=/mnt/model/vdb/output \
  --data_dir=/mnt/model/vdb/database \
  --pdb70_database_path=/mnt/model/vdb/database/pdb70/pdb70 \
  --uniref90_database_path=/mnt/model/vdb/database/uniref90/uniref90.fasta \
  --mgnify_database_path=/mnt/model/vdb/database/mgnify/mgy_clusters_2022_05.fa \
  --template_mmcif_dir=/mnt/model/vdb/database/pdb_mmcif/mmcif_files \
  --obsolete_pdbs_path=/mnt/model/vdb/alphafold/pdbs/pdb.json \
  --max_template_date=2025-03-04 \
  --db_preset=reduced_dbs \
  --model_preset=monomer \
  --use_precomputed_msas=False \
  --small_bfd_database_path=/mnt/model/vdb/database/small_bfd/bfd-first_non_consensus_sequences.fasta
```

## 10. Parameter highlights

* `--fasta_paths`: input FASTA file (one or more).
* `--data_dir`: root directory containing the databases.
* `--output_dir`: where predictions and metrics are written.
* `--max_template_date`: restrict templates to those released on or before this date.
* `--db_preset`: `full_dbs` or `reduced_dbs`.
* `--model_preset`: `monomer` (or `multimer` if configured and set up).
* `--use_precomputed_msas`: set to `True` if you have precomputed MSAs.

## 11. Viewing predictions

After completion, the PDB files and metrics are in the output directory. To visualize the PDB files with NCBI iCn3D, open [https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html](https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html), then choose File → Open Files and select the PDB file(s).

## 12. Common issues & troubleshooting

* **CUDA OOM**: switch to monomer, reduce DB size, use an instance with more GPU memory, or enable swap if appropriate for your environment.
* **Missing files / path errors**: verify that the FASTA file and all DB paths are correct and accessible.
* **Permission errors**: ensure the runtime user can read/write the data and output directories (use `chown`/`chmod` as needed).
* **Long runtime / disk constraints**: use `reduced_dbs` or precomputed MSAs to speed up runs.

## 13. Example outputs

AlphaFold writes predictions and auxiliary outputs (ranked models, metrics) into the output directory. Example header text:

```plaintext
prediction_1 | score=... | model_preset=monomer
```

## 14. Resources and links

* AlphaFold official repo: [https://github.com/google-deepmind/alphafold](https://github.com/google-deepmind/alphafold)

---
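
Because missing files and path errors are among the most common failures listed in the troubleshooting section, a quick pre-flight check before a long run can save time. The sketch below is one possible way to do that, not part of AlphaFold: `check_db_dirs` is a hypothetical helper name, and the default root and subdirectory names are simply the examples used in this guide, so adjust both to your own layout.

```bash
#!/usr/bin/env bash
# Pre-flight check (illustrative): confirm database subdirectories exist
# before starting a long AlphaFold run.
# check_db_dirs is a hypothetical helper, not part of AlphaFold.

check_db_dirs() {
  # $1 = database root; remaining arguments = expected subdirectory names.
  local data_dir="$1"
  shift
  local missing=0
  local sub
  for sub in "$@"; do
    if [ -d "${data_dir}/${sub}" ]; then
      echo "ok: ${data_dir}/${sub}"
    else
      echo "MISSING: ${data_dir}/${sub}"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when nothing is missing.
  [ "$missing" -eq 0 ]
}

# Assumed layout: the example root and subpaths used in this guide.
DATA_DIR="${DATA_DIR:-/mnt/model/vdb/database}"

if check_db_dirs "$DATA_DIR" pdb70 uniref90 mgnify bfd uniref30 pdb_mmcif; then
  echo "All expected database directories are present."
else
  echo "Fix the paths marked MISSING before running run_alphafold.py." >&2
fi
```

For the reduced DB preset, pass `small_bfd` in place of `bfd` and `uniref30`. The check only confirms that the directories exist; it does not verify that the downloads inside them are complete.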