Troubleshooting & FAQ
Common Issues
Issue: Unable to Mount PFS
Symptoms: PFS fails to mount on compute instance
Possible Causes & Solutions:
- Resize still in progress: If the PFS was unmounted for a resize, wait for the pending operation to complete before mounting again
- Incorrect mount path: Ensure mount path is valid and doesn't conflict with existing directories
- PFS in error state: Check status in dashboard and contact support
Issue: Resize Operation Failed
Symptoms: Resize doesn't complete or PFS enters error state
Possible Causes & Solutions:
- PFS still mounted: Ensure PFS is completely unmounted from all Instances before resizing
- Active I/O operations: Wait for all operations to complete
- Backend issue: Contact support with error details
Always unmount PFS from all attached Instances before attempting to resize. Failure to do so will cause the resize operation to fail.
Issue: Poor Performance
Symptoms: I/O operations slower than expected
Possible Causes & Solutions:
- Network bottleneck: Verify network bandwidth and latency
- Suboptimal access patterns: Review and optimize I/O patterns
- Resource contention: Check if compute nodes have sufficient resources
- Wrong performance tier: You may need a higher performance tier
- Too many concurrent operations: Optimize application parallelism
Issue: Cannot Delete PFS
Symptoms: Deletion operation fails or is unavailable
Possible Causes & Solutions:
- PFS still mounted: Unmount from all attached Instances first
- Active services using PFS: Stop all services accessing the PFS
- Pending operations: Wait for any resize or maintenance operations to complete
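To find what is still holding the PFS open before unmounting or deleting, one option is to scan the Linux /proc file system for processes with open files under the mount path. The sketch below is an illustration, not a platform tool: the function name is hypothetical, it only inspects open file descriptors (not working directories), and it can only see processes the current user is allowed to inspect.

```python
import os


def pids_using_path(mount_path, proc_root="/proc"):
    """Return sorted PIDs that have at least one open file under mount_path.

    Assumes a Linux-style /proc layout. Processes that cannot be inspected
    (for example, those owned by other users) are silently skipped.
    """
    base = os.path.realpath(mount_path)
    prefix = base.rstrip(os.sep) + os.sep
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc available on this system
    pids = []
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or access denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target == base or target.startswith(prefix):
                pids.append(int(entry))
                break
    return sorted(pids)
```

An empty result from pids_using_path("/mnt/pfs") (a hypothetical mount path) suggests that no visible process has files open on the PFS. Run it as root for a complete view.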
Issue: Permission Denied When Accessing PFS from a Container
Symptoms: Container process receives Permission denied when reading or writing to the PFS mount path
Possible Causes & Solutions:
- UID/GID outside the allowed range: Non-root container users must have a UID and GID between 1000 and 1050. Users created outside this range cannot access the PFS even if file permissions would otherwise allow it. Recreate the user with a UID/GID in the 1000–1050 range.
- File ownership mismatch: The files or directories on the PFS are owned by a different UID/GID than the container user. Use chown to align ownership with the container user's UID and GID.
- Incorrect Linux permissions: Verify that the read/write/execute bits on the PFS directory are set correctly for the owner, group, or others as required.
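The range rule above can be expressed as a small check to run inside the container before debugging file permissions. This is a minimal sketch: the function and constant names are hypothetical, and it assumes UID 0 is accepted regardless of its GID, in line with the root exception described elsewhere in this guide.

```python
import os

ALLOWED_MIN = 1000  # lower bound of the documented UID/GID range
ALLOWED_MAX = 1050  # upper bound of the documented UID/GID range
ROOT_UID = 0


def is_allowed_container_user(uid, gid):
    """Return True if the UID/GID pair satisfies the documented range rule."""
    if uid == ROOT_UID:
        return True  # assumption: root is allowed whatever its GID
    uid_ok = ALLOWED_MIN <= uid <= ALLOWED_MAX
    gid_ok = ALLOWED_MIN <= gid <= ALLOWED_MAX
    return uid_ok and gid_ok


if __name__ == "__main__":
    uid, gid = os.getuid(), os.getgid()
    verdict = "within" if is_allowed_container_user(uid, gid) else "outside"
    print(f"uid={uid} gid={gid} is {verdict} the allowed range")
```

If the check reports that the user is outside the range, changing file permissions will not help. The user has to be recreated with a UID and GID inside the range.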
Issue: Non-Root Container User Cannot Access PFS
Symptoms: A non-root user inside a container gets a permission error, but the root user (UID 0) can access the PFS normally
Possible Causes & Solutions:
- UID not in 1000–1050 range: Only UID 0 and UIDs in the range 1000–1050 are permitted to access the PFS from containers. Check the user's UID with id and recreate it within the allowed range if necessary.
- GID not in 1000–1050 range: The user's primary GID must also fall within 1000–1050. Update or recreate the group accordingly.
- Missing group membership: If using group-based permissions on a PFS directory, ensure the container user is a member of the group that owns the directory.
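For the group-membership case, the following sketch compares the group that owns a PFS directory with the groups of the current process. The helper names are hypothetical, and the check covers only membership. It does not confirm that the group permission bits on the directory actually grant access.

```python
import os


def user_in_owning_group(dir_gid, primary_gid, supplementary_gids):
    """Return True if the directory's owning group is one of the user's groups."""
    return dir_gid == primary_gid or dir_gid in supplementary_gids


def current_user_in_owning_group(path):
    """Check the calling process against the group that owns path."""
    dir_gid = os.stat(path).st_gid
    return user_in_owning_group(dir_gid, os.getgid(), os.getgroups())
```

Calling current_user_in_owning_group on the PFS directory as the affected user shows whether group-based permissions can apply to that user at all.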
Issue: VM User Cannot Access PFS Files or Directories
Symptoms: A user logged into a VM gets Permission denied when accessing files on the mounted PFS
Possible Causes & Solutions:
- Insufficient file permissions: The file or directory permissions do not grant access to that user or group. Run ls -la on the PFS mount to check permissions, then use chmod or chown to correct them.
- Wrong user or group ownership: The PFS files are owned by a different user or group. Align ownership with the intended user using chown.
- PFS not mounted for that user's session: Verify the PFS is still mounted with df -h or mount | grep <mount-path>.
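The mount check can also be scripted. The sketch below parses text in the format of /proc/mounts and reports whether a given path is a mount point. The function names are hypothetical, and mount paths containing spaces are not handled, because /proc/mounts escapes them.

```python
import os


def mount_points(mounts_text):
    """Collect the mount-point column (second field) from /proc/mounts-style text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def is_mounted(path, mounts_text):
    """Return True if path appears as a mount point in mounts_text."""
    return os.path.normpath(path) in mount_points(mounts_text)
```

On a Linux VM, the text can be read with open("/proc/mounts").read(). For a quick check on a live system, the standard library call os.path.ismount(path) is a simpler alternative.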
Frequently Asked Questions
General Questions
Q: What makes PFS different from SFS?
A: PFS is optimized for high-performance parallel I/O workloads, offering superior throughput and IOPS compared to SFS. It's designed for HPC applications and intensive multi-node training scenarios.
Q: Can I convert an existing SFS to PFS?
A: No, you cannot directly convert SFS to PFS. You would need to create a new PFS and migrate your data.
Q: How many Instances can access a PFS simultaneously?
A: PFS is designed for high-concurrency parallel access. The exact number depends on your configuration and performance tier.
Mounting & Access
Q: Why must I unmount PFS before resizing?
A: Unmounting ensures data consistency and prevents corruption during the resize operation. Active I/O during resize can lead to data loss.
Q: Can I specify custom mount paths?
A: Yes, you can specify your preferred mount path when attaching PFS to compute instances, allowing flexible integration with your applications.
Q: What happens to data when I unmount PFS?
A: Data remains intact on the PFS. Unmounting only disconnects the file system from that specific node. You can remount it later without data loss.
Q: Can I mount the same PFS to Instances in different regions?
A: No. PFS is region-specific, so Instances must be in the same region as the PFS to mount it.
Q: Which users can access PFS from inside a container?
A: By default, only the root user (UID 0) has access. Additional users must be created with a UID and GID between 1000 and 1050. Users outside this range will be denied access regardless of file-level permissions.
Q: Why does my container user get Permission denied even though the file permissions look correct?
A: PFS enforces a UID/GID allowlist for container-based access. Even if the Linux permission bits allow access, a container user with a UID or GID outside the 1000–1050 range will be blocked. Recreate the user within the allowed range and update file ownership to match.
Q: Do VM users have the same UID/GID restriction as container users?
A: No. On VMs, standard Linux user and group permissions apply without any UID/GID range restriction. Any VM user with the correct file permissions can access the PFS.
Q: How do I set up shared PFS access for multiple container users?
A: Create a shared group with a GID between 1000 and 1050, assign all container users to that group, and set the PFS directories to be group-owned with the appropriate group write permissions. This allows multiple users to collaborate on the same PFS without giving everyone root access.
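The shared-group setup can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the group is assumed to exist already with the container users as members, and changing a directory's group normally requires owning the directory and belonging to the target group (or running as root).

```python
import os
import stat

ALLOWED_MIN, ALLOWED_MAX = 1000, 1050  # documented GID range


def validate_shared_gid(gid):
    """Raise ValueError if gid is outside the documented range."""
    if not ALLOWED_MIN <= gid <= ALLOWED_MAX:
        raise ValueError(f"GID {gid} is outside {ALLOWED_MIN}-{ALLOWED_MAX}")
    return gid


def group_writable_mode(mode):
    """Return the permission bits of mode with group read/write/execute added."""
    return stat.S_IMODE(mode) | stat.S_IRWXG


def share_directory(path, gid):
    """Make path group-owned by gid and accessible to that group."""
    validate_shared_gid(gid)
    os.chown(path, -1, gid)  # -1 leaves the owning user unchanged
    os.chmod(path, group_writable_mode(os.stat(path).st_mode))
```

For example, share_directory("/mnt/pfs/shared", 1010) would hand a hypothetical directory to a hypothetical group with GID 1010 while leaving the owner and the other permission bits untouched.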
Performance
Q: How do I know if I need PFS instead of SFS?
A: Consider PFS if you:
- Require maximum I/O throughput
- Run HPC or parallel computing workloads
- Need consistent high performance under heavy concurrent access
- Have budget for premium storage
Q: Why is my PFS not performing as expected?
A: Check:
- Network bandwidth between nodes and storage
- I/O access patterns (sequential vs random)
- Number of concurrent operations
- Node configuration and resources
Q: What throughput can I expect from PFS?
A: Performance varies based on your specific configuration and workload.
Data Management
Q: How do I back up PFS data?
A: Implement regular backups by:
- Copying data to Datasets (EOS) for long-term archival
- Using rsync or parallel copy tools for large datasets
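As one way to script the rsync option, the sketch below builds and runs an rsync command in archive mode. The function names and both paths are hypothetical, and the destination is assumed to be a directory reachable from the Instance. Copying to Datasets (EOS) would need that service's own tooling, which is not shown here.

```python
import subprocess


def build_rsync_command(source_dir, backup_dir):
    """Build an rsync argument list that copies the contents of source_dir."""
    # A trailing slash makes rsync copy the directory's contents rather than
    # the directory itself.
    src = source_dir.rstrip("/") + "/"
    # -a (archive) preserves permissions, ownership, and timestamps where possible.
    return ["rsync", "-a", "--", src, backup_dir]


def run_backup(source_dir, backup_dir):
    """Run rsync and return its exit code (0 means success)."""
    return subprocess.run(build_rsync_command(source_dir, backup_dir)).returncode
```

A scheduled job could call run_backup("/mnt/pfs/data", "/backup/pfs-data") and raise an alert on a non-zero exit code.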
Q: What happens if I delete PFS by accident?
A: Deletion is permanent, and deleted data cannot be recovered. Always maintain backups and follow the deletion confirmation process carefully.
Data Safety Recommendations
Backup Strategy
PFS deletion is permanent and irreversible. Always maintain backups of mission-critical data.
Recommended Approach:
- Automated Backups: Schedule regular backups to external storage
- Datasets Integration: Archive important data to Datasets (EOS)
- Version Control: Use appropriate tools for code and configuration
- Disaster Recovery: Maintain copies in different regions for critical data
Before Resizing
- Unmount from all Instances - Verify no Instances have PFS mounted
- Back up critical data - Create backups as an additional safety measure
- Stop dependent services - Ensure no applications are accessing PFS
- Verify requirements - Confirm new size meets your needs
Before Deletion
- Verify backups - Confirm all important data is backed up elsewhere
- Check dependencies - Ensure no active workloads depend on this PFS
- Unmount completely - Remove PFS from all attached Instances
- Double-check PFS ID - Verify you're deleting the correct file system
Security Best Practices
- Access Control: Limit PFS access to only necessary compute instances
- Data Sensitivity: Understand compliance requirements for stored data
- Regular Audits: Review which Instances have access to each PFS
- Unmount When Idle: Unmount PFS from Instances that aren't actively using it