Troubleshooting & FAQ

Common Issues

Issue: Unable to Mount PFS

Symptoms: PFS fails to mount on a compute Instance

Possible Causes & Solutions:

  1. Previous resize still pending: Wait for any pending operations on the PFS to complete, then retry the mount
  2. Incorrect mount path: Ensure mount path is valid and doesn't conflict with existing directories
  3. PFS in error state: Check status in dashboard and contact support
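The mount-path check in item 2 can be done on the Instance before attaching. The following is a minimal sketch, assuming a Linux Instance and using only the Python standard library; the function name `check_mount_path` is hypothetical and not part of any PFS tooling.

```python
import os


def check_mount_path(path: str) -> list[str]:
    """Return a list of human-readable problems with a candidate mount path.

    An empty list means no obvious conflict was found. This is a local
    sanity check only; it does not talk to the PFS service.
    """
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if os.path.exists(path):
        if not os.path.isdir(path):
            problems.append("path exists but is not a directory")
        elif os.path.ismount(path):
            problems.append("something is already mounted at this path")
        elif os.listdir(path):
            problems.append("directory is not empty; mounting would hide its contents")
    return problems
```

Calling `check_mount_path("/mnt/pfs")` (an example path, not a required location) returns an empty list when the path is free to use.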

Issue: Resize Operation Failed

Symptoms: Resize doesn't complete or PFS enters error state

Possible Causes & Solutions:

  1. PFS still mounted: Ensure PFS is completely unmounted from all Instances before resizing
  2. Active I/O operations: Wait for all operations to complete
  3. Backend issue: Contact support with error details

Critical Step

Always unmount PFS from all attached Instances before attempting to resize. Failure to do so will cause the resize operation to fail.
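One way to confirm on each Instance that the PFS is no longer mounted is to look for its mount point in the kernel mount table. The sketch below assumes a Linux Instance where `/proc/mounts` is available; the function names and the example path are illustrative, and the dashboard remains the authoritative view of attachments.

```python
def read_mount_table(path: str = "/proc/mounts") -> str:
    """Read the Linux mount table as text (assumes /proc is available)."""
    with open(path) as f:
        return f.read()


def mounted_at(mount_table: str, mount_path: str) -> list[str]:
    """Return the mount-table lines whose mount point equals mount_path.

    Each line of /proc/mounts has the form:
        <device> <mount point> <fs type> <options> <dump> <pass>
    Mount points containing spaces are escaped in that file and are not
    handled by this simple sketch.
    """
    target = mount_path.rstrip("/") or "/"
    hits = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) >= 2 and (fields[1].rstrip("/") or "/") == target:
            hits.append(line)
    return hits
```

On an Instance, `mounted_at(read_mount_table(), "/mnt/pfs")` should return an empty list before you start the resize, where `/mnt/pfs` stands in for whatever mount path you chose.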

Issue: Poor Performance

Symptoms: I/O operations slower than expected

Possible Causes & Solutions:

  1. Network bottleneck: Verify network bandwidth and latency
  2. Suboptimal access patterns: Review and optimize I/O patterns
  3. Resource contention: Check if compute nodes have sufficient resources
  4. Wrong performance tier: You may need a higher performance tier
  5. Too many concurrent operations: Optimize application parallelism
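Before changing tiers or application code, it can help to take a rough baseline measurement from the Instance. The sketch below is an assumption-laden micro-benchmark, not an official PFS tool: it writes and reads back one temporary file sequentially from a single process, so it says nothing about parallel or random-access behavior, and the read figure may be inflated by the operating system's page cache.

```python
import os
import tempfile
import time


def measure_sequential_throughput(directory: str, size_mb: int = 64, block_kb: int = 1024) -> dict:
    """Write then read a temporary file in `directory`; return MB/s figures.

    The file is removed afterwards. Results are a rough single-process
    baseline only.
    """
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the time to push data to storage
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_kb * 1024):
                pass
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)

    megabytes = blocks * block_kb / 1024
    return {
        "write_mb_s": megabytes / write_seconds,
        "read_mb_s": megabytes / read_seconds,
    }
```

Running the same function against the PFS mount path and against local disk gives a quick comparison that can be shared with support if performance still looks wrong.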

Issue: Cannot Delete PFS

Symptoms: Deletion operation fails or is unavailable

Possible Causes & Solutions:

  1. PFS still mounted: Unmount from all attached Instances first
  2. Active services using PFS: Stop all services accessing the PFS
  3. Pending operations: Wait for any resize or maintenance operations to complete

Frequently Asked Questions

General Questions

Q: What makes PFS different from SFS?
A: PFS is optimized for high-performance parallel I/O workloads, offering superior throughput and IOPS compared to SFS. It's designed for HPC applications and intensive multi-node training scenarios.

Q: Can I convert an existing SFS to PFS?
A: No, you cannot directly convert SFS to PFS. You would need to create a new PFS and migrate your data.

Q: How many Instances can access a PFS simultaneously?
A: PFS is designed for high-concurrency parallel access. The exact number depends on your configuration and performance tier.

Mounting & Access

Q: Why must I unmount PFS before resizing?
A: Unmounting ensures data consistency and prevents corruption during the resize operation. Active I/O during resize can lead to data loss.

Q: Can I specify custom mount paths?
A: Yes, you can specify your preferred mount path when attaching PFS to compute Instances, allowing flexible integration with your applications.

Q: What happens to data when I unmount PFS?
A: Data remains intact on the PFS. Unmounting only disconnects the file system from that specific Instance. You can remount it later without data loss.

Q: Can I mount the same PFS to Instances in different regions?
A: No, PFS is region-specific. Instances must be in the same region as the PFS to mount it.

Performance

Q: How do I know if I need PFS instead of SFS?
A: Consider PFS if you:

  • Require maximum I/O throughput
  • Run HPC or parallel computing workloads
  • Need consistent high performance under heavy concurrent access
  • Have budget for premium storage

Q: Why is my PFS not performing as expected?
A: Check:

  • Network bandwidth between nodes and storage
  • I/O access patterns (sequential vs random)
  • Number of concurrent operations
  • Node configuration and resources

Q: What throughput can I expect from PFS?
A: Performance varies based on your specific configuration and workload.

Data Management

Q: How do I back up PFS data?
A: Implement regular backups by:

  • Copying data to Datasets (EOS) for long-term archival
  • Using rsync or parallel copy tools for large datasets
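As an illustration of the rsync option above, the following sketch builds an rsync command from Python so it can be reviewed or scheduled. It assumes rsync is installed on the Instance; the helper name and the source and destination paths are hypothetical. The flags used are standard rsync options: `-a` (archive mode), `--partial` (keep partially transferred files so interrupted copies can resume) and `--dry-run` (report what would be copied without copying).

```python
import subprocess


def build_rsync_command(source: str, destination: str, dry_run: bool = True) -> list[str]:
    """Build an rsync command that copies the contents of source into destination.

    dry_run defaults to True so that a first run only reports what would change.
    """
    cmd = ["rsync", "-a", "--partial"]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source copies its contents rather than the directory itself.
    cmd += [source.rstrip("/") + "/", destination]
    return cmd


def run_backup(source: str, destination: str, dry_run: bool = True) -> int:
    """Run the rsync command and return its exit code (0 means success)."""
    return subprocess.run(build_rsync_command(source, destination, dry_run)).returncode
```

For example, `run_backup("/mnt/pfs/project", "/mnt/backup/project")` performs a dry run first; pass `dry_run=False` once the reported changes look right.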

Q: What happens if I delete a PFS by accident?
A: Deletion is permanent, and deleted data cannot be recovered. Always maintain backups and follow the deletion confirmation process carefully.

Data Safety Recommendations

Backup Strategy

Critical Reminder

PFS deletion is permanent and irreversible. Always maintain backups of mission-critical data.

Recommended Approach:

  1. Automated Backups: Schedule regular backups to external storage
  2. Datasets Integration: Archive important data to Datasets (EOS)
  3. Version Control: Use appropriate tools for code and configuration
  4. Disaster Recovery: Maintain copies in different regions for critical data

Before Resizing

Pre-Resize Checklist
  1. Unmount from all Instances - Verify no Instances have PFS mounted
  2. Back up critical data - Create backups as an additional safety measure
  3. Stop dependent services - Ensure no applications are accessing PFS
  4. Verify requirements - Confirm new size meets your needs

Before Deletion

Pre-Deletion Checklist
  1. Verify backups - Confirm all important data is backed up elsewhere
  2. Check dependencies - Ensure no active workloads depend on this PFS
  3. Unmount completely - Remove PFS from all attached Instances
  4. Double-check PFS ID - Verify you're deleting the correct file system
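For the "Verify backups" item, one possible approach is to compare checksums of the files on the PFS against the backup copy before deleting anything. This is a minimal sketch, assuming both trees are reachable as local paths from the same Instance; the function names are illustrative, and for very large datasets a sampled or parallel check may be more practical.

```python
import hashlib
import os


def tree_digests(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            hasher = hashlib.sha256()
            with open(full_path, "rb") as f:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    hasher.update(chunk)
            digests[os.path.relpath(full_path, root)] = hasher.hexdigest()
    return digests


def backup_mismatches(source_root: str, backup_root: str) -> list[str]:
    """Return source files that are missing from, or differ in, the backup."""
    source = tree_digests(source_root)
    backup = tree_digests(backup_root)
    return sorted(path for path in source if backup.get(path) != source[path])
```

An empty result from `backup_mismatches` means every file under the source path has an identical copy in the backup; any listed path should be re-copied before the PFS is deleted.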

Security Best Practices

  • Access Control: Limit PFS access to only the compute Instances that need it
  • Data Sensitivity: Understand compliance requirements for stored data
  • Regular Audits: Review which Instances have access to each PFS
  • Unmount When Idle: Unmount PFS from Instances that aren't actively using it