The Problem: A 40GB VPS Crying for Help
It started with a simple alert: our production server was at 85% disk usage, with only 5.9GB free on a 40GB drive. For a busy web hosting server running WHM/cPanel, this wasn't just an inconvenience; it was a ticking time bomb. A server that runs out of disk space can suffer anything from failed backups to crashed databases and complete service outages.
What made this particularly concerning was that the server had been running for over a year without any systematic cleanup. As with digital hoarding, temporary files, old logs, and forgotten backups had accumulated and were slowly eating up the available disk space.
Our Approach: Safe, Systematic, and Documented
We built our approach on three principles:
- Never delete without analysis
- Log everything for accountability
- Automate prevention for the future
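To make the first two principles concrete, here is a minimal sketch of a delete helper that previews by default and logs every path it touches. The function name purge_old_files, the DRY_RUN switch, and the ACTION_LOG path are our own illustrative assumptions; they are not part of the scripts that follow.

```shell
#!/bin/bash
# Hypothetical helper, shown for illustration only.
# DRY_RUN=1 (assumed default) previews; DRY_RUN=0 actually deletes.
DRY_RUN="${DRY_RUN:-1}"
ACTION_LOG="${ACTION_LOG:-/tmp/cleanup_actions.log}"   # assumed log location

purge_old_files() {
    local dir="$1" days="$2"

    # Safety check: refuse anything that is not an existing directory
    if [ ! -d "$dir" ]; then
        echo "skip: $dir is not a directory" >&2
        return 1
    fi

    if [ "$DRY_RUN" = "1" ]; then
        # Analysis only: record what WOULD be removed
        find "$dir" -xdev -type f -mtime +"$days" -print \
            | sed 's/^/WOULD DELETE: /' | tee -a "$ACTION_LOG"
    else
        # -print before -delete so every removed path lands in the log
        find "$dir" -xdev -type f -mtime +"$days" -print -delete \
            | sed 's/^/DELETED: /' | tee -a "$ACTION_LOG"
    fi
}
```

Running `purge_old_files /var/tmp 7` with the default settings only lists candidates; setting `DRY_RUN=0` first performs the deletion and leaves a record in the action log.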
The Complete Cleanup Solution
1. The Master Script: server-cleanup-master.sh
cat > /usr/local/bin/server-cleanup-master.sh << 'MASTER_EOF'
#!/bin/bash
#
# iT-werX Server Cleanup Master Script
# Purpose: Systematic disk space recovery with full logging
# Author: iT-werX Admin Team
# Version: 1.0
# Configuration
LOG_DIR="/var/log/server-cleanup"
LOG_FILE="${LOG_DIR}/cleanup_$(date +%Y%m%d_%H%M%S).log"
EMAIL_ADMIN="admin@it-werx.ca"
SERVER_NAME=$(hostname)
TEMP_DIR="/tmp/cleanup_$(date +%s)"
THRESHOLD_PERCENT=80 # Alert threshold
# Create required directories
mkdir -p "$LOG_DIR"
mkdir -p "$TEMP_DIR"
# Logging functions
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
log_section() {
echo "" | tee -a "$LOG_FILE"
echo "=== $1 ===" | tee -a "$LOG_FILE"
echo "" | tee -a "$LOG_FILE"
}
error_exit() {
log "ERROR: $1"
log "Cleanup process aborted!"
send_report "FAILED"
cleanup_temp
exit 1
}
check_error() {
if [ $? -ne 0 ]; then
error_exit "$1"
fi
}
# Email reporting
send_report() {
local status=$1
local subject="[iT-werX] Server Cleanup ${status} on ${SERVER_NAME} - $(date)"
# Create report
local report_file="${TEMP_DIR}/final_report.txt"
echo "iT-werX Server Cleanup Report" > "$report_file"
echo "==============================" >> "$report_file"
echo "Server: ${SERVER_NAME}" >> "$report_file"
echo "Date: $(date)" >> "$report_file"
echo "Status: ${status}" >> "$report_file"
echo "" >> "$report_file"
cat "$LOG_FILE" >> "$report_file"
# Send email
mail -s "$subject" "$EMAIL_ADMIN" < "$report_file"
log "Report emailed to ${EMAIL_ADMIN}"
}
# Cleanup temporary files
cleanup_temp() {
log "Cleaning up temporary files..."
rm -rf "$TEMP_DIR"
}
# Pre-flight checks
preflight_checks() {
log_section "PRE-FLIGHT CHECKS"
# Check if running as root
if [ "$EUID" -ne 0 ]; then
error_exit "This script must be run as root"
fi
# Check disk space before starting
local usage=$(df / --output=pcent | tail -1 | tr -d ' %')
log "Current disk usage: ${usage}%"
if [ "$usage" -lt "$THRESHOLD_PERCENT" ]; then
log "Disk usage below ${THRESHOLD_PERCENT}%, cleanup not urgently needed"
send_report "NOT_NEEDED"
cleanup_temp
exit 0
fi
# Backup critical files
log "Backing up critical configurations..."
mkdir -p "${TEMP_DIR}/backups"
cp -a /etc/my.cnf /etc/passwd /etc/group "${TEMP_DIR}/backups/" 2>/dev/null
crontab -l > "${TEMP_DIR}/backups/crontab_backup.txt" 2>/dev/null
log "Pre-flight checks passed"
}
# Disk analysis functions
analyze_disk_usage() {
log_section "DISK USAGE ANALYSIS"
# Store initial state
log "Initial disk status:"
df -h | tee -a "$LOG_FILE"
# Analyze top-level directories
log "Analyzing directory sizes..."
local analysis_file="${TEMP_DIR}/disk_analysis.txt"
du -xh / --max-depth=1 2>/dev/null | sort -rh | head -15 > "$analysis_file"
log "Top space-consuming directories:"
cat "$analysis_file" | tee -a "$LOG_FILE"
# Extract problematic directories for later use
grep -E '[0-9]G\s+/' "$analysis_file" | awk '{print $2}' > "${TEMP_DIR}/large_dirs.txt"
}
analyze_home_directories() {
log_section "HOME DIRECTORY ANALYSIS"
local home_analysis="${TEMP_DIR}/home_analysis.txt"
log "Analyzing user home directories..."
du -sh /home/* 2>/dev/null | sort -hr > "$home_analysis"
log "Top user accounts by size:"
head -10 "$home_analysis" | tee -a "$LOG_FILE"
# Store top 5 users for detailed cleanup
head -5 "$home_analysis" | awk '{print $2}' | xargs -n1 basename > "${TEMP_DIR}/top_users.txt"
}
# Cleanup phases
phase1_system_cleanup() {
log_section "PHASE 1: SYSTEM CLEANUP"
log "1. Cleaning package manager cache..."
if command -v yum >/dev/null 2>&1; then
yum clean all
elif command -v apt-get >/dev/null 2>&1; then
apt-get clean
fi
check_error "Package cache cleanup failed"
log "2. Removing old kernel versions..."
if command -v package-cleanup >/dev/null 2>&1; then
package-cleanup --oldkernels --count=1 -y
fi
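log "3. Cleaning system temporary files..."
# Only remove files untouched for 7+ days (a conservative choice; adjust as needed),
# and never this run's own working directory: TEMP_DIR lives under /tmp and
# later phases still read from it
find /tmp /var/tmp -xdev -mindepth 1 -type f -mtime +7 ! -path "${TEMP_DIR}/*" -delete 2>/dev/null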
}
phase2_log_management() {
log_section "PHASE 2: LOG MANAGEMENT"
local large_logs="${TEMP_DIR}/large_logs.txt"
log "1. Identifying large log files (>50MB)..."
find /var/log -type f -name "*.log" -size +50M 2>/dev/null > "$large_logs"
if [ -s "$large_logs" ]; then
log_count=$(wc -l < "$large_logs")
log "Found ${log_count} large log files"
while read -r logfile; do
size=$(du -h "$logfile" | cut -f1)
log "Rotating: ${logfile} (${size})"
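# Rotate instead of delete. Copy-then-truncate keeps the same inode, so a daemon
# that still has the log open does not keep writing to the rotated copy
if [ -f "$logfile" ]; then
cp -p "$logfile" "${logfile}.old" && truncate -s 0 "$logfile"
fi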
done < "$large_logs"
else
log "No abnormally large log files found"
fi
log "2. Compressing rotated log files..."
find /var/log -type f -name "*.old" -exec gzip -f {} \; 2>/dev/null
log "3. Removing very old compressed logs (>90 days)..."
find /var/log -name "*.gz" -mtime +90 -delete 2>/dev/null
}
phase3_user_space_cleanup() {
log_section "PHASE 3: USER SPACE CLEANUP"
if [ ! -f "${TEMP_DIR}/top_users.txt" ]; then
log "No user analysis found, skipping user cleanup"
return 0
fi
local users=$(cat "${TEMP_DIR}/top_users.txt")
for user in $users; do
log "Processing user: ${user}"
local user_log="${TEMP_DIR}/user_${user}_cleanup.log"
# Clean PHP sessions (-print before -delete records each removed path for the count below)
log " Cleaning PHP sessions..."
find "/home/${user}" -type f -name "sess_*" -mtime +1 -print -delete 2>/dev/null >> "$user_log"
# Clean statistics cache
log " Cleaning statistics cache..."
find "/home/${user}/tmp" -path "*/analog/*" -type f -name "cache" -print -delete 2>/dev/null >> "$user_log"
find "/home/${user}/tmp" -path "*/webalizer/*" -name "*.png" -mtime +30 -print -delete 2>/dev/null >> "$user_log"
# Clean old backups
log " Cleaning old backups..."
find "/home/${user}" -type f \( -name "*.tar.gz" -o -name "*.zip" \) -mtime +30 -print -delete 2>/dev/null >> "$user_log"
# Clean error logs
log " Truncating large error logs..."
find "/home/${user}/public_html" -name "error_log" -size +1M -print -exec truncate -s 0 {} \; 2>/dev/null >> "$user_log"
# Report actions
local actions=$(wc -l < "$user_log" 2>/dev/null || echo "0")
log " Performed ${actions} cleanup actions for ${user}"
done
}
phase4_mysql_maintenance() {
log_section "PHASE 4: MYSQL MAINTENANCE"
log "1. Optimizing MySQL tables..."
mysqlcheck -o --all-databases 2>&1 | tee -a "$LOG_FILE"
log "2. Cleaning binary logs (if enabled)..."
mysql -e "PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);" 2>&1 | tee -a "$LOG_FILE"
log "3. Analyzing database sizes..."
local db_analysis="${TEMP_DIR}/mysql_analysis.txt"
mysql -e "SELECT table_schema AS 'Database',
ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
FROM information_schema.tables
GROUP BY table_schema
ORDER BY ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) DESC;" > "$db_analysis"
log "Database sizes:"
cat "$db_analysis" | tee -a "$LOG_FILE"
}
# Main execution
main() {
log_section "iT-werX SERVER CLEANUP STARTED"
log "Process ID: $$"
log "Temporary directory: ${TEMP_DIR}"
log "Log file: ${LOG_FILE}"
# Execute phases
preflight_checks
analyze_disk_usage
analyze_home_directories
phase1_system_cleanup
phase2_log_management
phase3_user_space_cleanup
phase4_mysql_maintenance
# Final report
log_section "CLEANUP COMPLETE"
log "Final disk status:"
df -h | tee -a "$LOG_FILE"
# Record final usage for the report
local final_usage=$(df / --output=pcent | tail -1 | tr -d ' %')
log "Final disk usage: ${final_usage}%"
send_report "SUCCESS"
cleanup_temp
log "Cleanup process completed successfully"
log_section "END OF CLEANUP"
}
# Trap signals for graceful exit
trap 'log "Interrupt received, cleaning up..."; cleanup_temp; exit 1' INT TERM
# Run main function
main "$@"
MASTER_EOF
chmod +x /usr/local/bin/server-cleanup-master.sh
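The master script logs disk usage as a percentage before and after the run, but it does not compute how much space was actually freed. A minimal sketch of one way to measure that is shown here; used_kb and report_recovered are hypothetical helper names, and the approach assumes GNU df (for the --output option).

```shell
#!/bin/bash
# Sketch only: these helpers are not part of server-cleanup-master.sh.

used_kb() {
    # Used space, in 1K blocks, on the filesystem that holds path $1
    df -k --output=used "$1" | tail -1 | tr -d ' '
}

report_recovered() {
    local before_kb="$1" after_kb="$2"
    local diff_kb=$(( before_kb - after_kb ))
    # Whole megabytes are precise enough for a log line
    echo "Space recovered: $(( diff_kb / 1024 )) MB"
}

before=$(used_kb /)
# ... the cleanup phases would run here ...
after=$(used_kb /)
report_recovered "$before" "$after"
```

Capturing the first value at the start of main and the second just before the report is sent would give a concrete figure to include in the email.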
2. Daily Maintenance Script: daily-cleanup.sh
cat > /etc/cron.daily/itwerx-daily-cleanup.sh << 'DAILY_EOF'
#!/bin/bash
#
# iT-werX Daily Maintenance Script
# Runs safe, non-destructive cleanup daily
LOG_FILE="/var/log/server-cleanup/daily_$(date +%Y%m%d).log"
EMAIL_ADMIN="admin@it-werx.ca"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >> "$LOG_FILE"
}
# Start logging
log "=== iT-werX Daily Maintenance Started ==="
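# 1. Clean PHP sessions (older than 1 day)
log "Cleaning PHP sessions..."
# Print each path before deleting it so the removals can be counted
session_count=$(find /home -type f -name "sess_*" -mtime +1 -print -delete 2>/dev/null | wc -l)
log "PHP session cleanup completed (${session_count} files removed)"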
# 2. Clean statistics cache
log "Cleaning statistics cache..."
find /home -path "*/tmp/analog/*" -type f -name "cache" -mtime +1 -delete 2>/dev/null
find /home -path "*/tmp/webalizer/*" -name "*.png" -mtime +30 -delete 2>/dev/null
# 3. Clean user temporary files
log "Cleaning user temp files..."
for user_dir in /home/*; do
if [ -d "${user_dir}/tmp" ]; then
find "${user_dir}/tmp" -type f -mtime +7 -delete 2>/dev/null
fi
done
# 4. Clean package cache
log "Cleaning package cache..."
if command -v yum >/dev/null 2>&1; then
yum clean all >> "$LOG_FILE" 2>&1
elif command -v apt-get >/dev/null 2>&1; then
apt-get clean >> "$LOG_FILE" 2>&1
fi
# 5. Check disk usage
log "Checking disk usage..."
DISK_USAGE=$(df / --output=pcent | tail -1 | tr -d ' %')
log "Current disk usage: ${DISK_USAGE}%"
# 6. Send alert if above threshold
if [ "$DISK_USAGE" -gt 80 ]; then
echo "Disk usage at ${DISK_USAGE}% on $(hostname)" | \
mail -s "[iT-werX Alert] High Disk Usage on $(hostname)" "$EMAIL_ADMIN"
log "High disk usage alert sent"
fi
# 7. Rotate this log file if too large
LOG_SIZE=$(stat -c%s "$LOG_FILE" 2>/dev/null || echo "0")
if [ "$LOG_SIZE" -gt 10485760 ]; then # 10MB
mv "$LOG_FILE" "${LOG_FILE}.old"
log "Log file rotated"
fi
log "=== Daily Maintenance Completed ==="
# Keep only last 30 days of logs
find /var/log/server-cleanup -name "daily_*.log" -mtime +30 -delete 2>/dev/null
find /var/log/server-cleanup -name "*.old" -mtime +90 -delete 2>/dev/null
DAILY_EOF
chmod +x /etc/cron.daily/itwerx-daily-cleanup.sh
3. Emergency Cleanup Script: emergency-cleanup.sh
cat > /usr/local/bin/emergency-cleanup.sh << 'EMERGENCY_EOF'
#!/bin/bash
#
# iT-werX Emergency Cleanup Script
# Use when disk is critically full (>90%)
LOG_FILE="/tmp/emergency_cleanup_$(date +%s).log"
EMAIL_ADMIN="admin@it-werx.ca"
echo "=== iT-werX EMERGENCY CLEANUP ===" | tee "$LOG_FILE"
echo "Started: $(date)" | tee -a "$LOG_FILE"
echo "" | tee -a "$LOG_FILE"
# Check current usage
CURRENT_USAGE=$(df / --output=pcent | tail -1 | tr -d ' %')
echo "Current disk usage: ${CURRENT_USAGE}%" | tee -a "$LOG_FILE"
if [ "$CURRENT_USAGE" -lt 90 ]; then
echo "Disk usage below 90%, emergency cleanup not needed." | tee -a "$LOG_FILE"
echo "Consider running server-cleanup-master.sh instead." | tee -a "$LOG_FILE"
exit 0
fi
echo "" | tee -a "$LOG_FILE"
echo "WARNING: Performing aggressive cleanup!" | tee -a "$LOG_FILE"
echo "" | tee -a "$LOG_FILE"
# 1. Remove all package cache
echo "1. Removing ALL package cache..." | tee -a "$LOG_FILE"
rm -rf /var/cache/yum/* 2>/dev/null
rm -rf /var/cache/apt/archives/* 2>/dev/null
# 2. Truncate large log files to 1MB (truncate keeps the start of each file, i.e. its oldest entries)
echo "2. Truncating large log files..." | tee -a "$LOG_FILE"
find /var/log -type f -name "*.log" -size +10M -exec truncate -s 1M {} \; 2>/dev/null
find /usr/local/apache/logs -type f -name "*.log" -size +10M -exec truncate -s 1M {} \; 2>/dev/null
# 3. Remove ALL statistics cache
echo "3. Removing ALL statistics cache..." | tee -a "$LOG_FILE"
find /home -path "*/tmp/*stats*" -type f -delete 2>/dev/null
find /home -path "*/tmp/analog/*" -type f -delete 2>/dev/null
find /home -path "*/tmp/webalizer/*" -type f -delete 2>/dev/null
# 4. Clean ALL PHP sessions
echo "4. Removing ALL PHP sessions..." | tee -a "$LOG_FILE"
find /home -type f -name "sess_*" -delete 2>/dev/null
# 5. Clean backup directories
echo "5. Cleaning backup directories..." | tee -a "$LOG_FILE"
find /backup -type f -mtime +1 -delete 2>/dev/null
find /home -name "*backup*.tar.gz" -type f -mtime +3 -delete 2>/dev/null
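# 6. Remove frozen messages from the mail queue
echo "6. Removing frozen messages from mail queue..." | tee -a "$LOG_FILE"
# exiqgrep reads the queue itself: -z selects frozen messages, -i prints only their IDs
/usr/sbin/exiqgrep -z -i | xargs -r /usr/sbin/exim -Mrm 2>/dev/null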
# 7. Final status
echo "" | tee -a "$LOG_FILE"
echo "Emergency cleanup completed!" | tee -a "$LOG_FILE"
echo "" | tee -a "$LOG_FILE"
echo "Final disk status:" | tee -a "$LOG_FILE"
df -h | tee -a "$LOG_FILE"
# Send report
mail -s "[iT-werX Emergency] Cleanup completed on $(hostname)" "$EMAIL_ADMIN" < "$LOG_FILE"
echo "Report sent to $EMAIL_ADMIN" | tee -a "$LOG_FILE"
echo "Log file saved to: $LOG_FILE" | tee -a "$LOG_FILE"
EMERGENCY_EOF
chmod +x /usr/local/bin/emergency-cleanup.sh
4. Analysis Tool: disk-analyzer.sh
cat > /usr/local/bin/disk-analyzer.sh << 'ANALYZER_EOF'
#!/bin/bash
#
# iT-werX Disk Space Analyzer
# Provides detailed analysis without making changes
REPORT_FILE="/tmp/disk_analysis_$(date +%Y%m%d_%H%M%S).txt"
EMAIL_ADMIN="admin@it-werx.ca"
echo "iT-werX Disk Space Analysis Report" > "$REPORT_FILE"
echo "==================================" >> "$REPORT_FILE"
echo "Server: $(hostname)" >> "$REPORT_FILE"
echo "Date: $(date)" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
# Overall disk status
echo "=== OVERALL DISK STATUS ===" >> "$REPORT_FILE"
df -h >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
# Top-level directory analysis
echo "=== TOP-LEVEL DIRECTORY ANALYSIS ===" >> "$REPORT_FILE"
du -xh / --max-depth=1 2>/dev/null | sort -rh | head -20 >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
# Large files (>100MB)
echo "=== LARGE FILES (>100MB) ===" >> "$REPORT_FILE"
# Prune virtual filesystems up front; -exec copes with spaces in file names
find / \( -path /proc -o -path /sys -o -path /run \) -prune -o -type f -size +100M -exec ls -lh {} + 2>/dev/null | head -20 >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
# Home directory analysis
echo "=== HOME DIRECTORY ANALYSIS ===" >> "$REPORT_FILE"
echo "Total home usage: $(du -sh /home 2>/dev/null | cut -f1)" >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
echo "Per-user breakdown:" >> "$REPORT_FILE"
du -sh /home/* 2>/dev/null | sort -hr >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
# Log directory analysis
echo "=== LOG DIRECTORY ANALYSIS ===" >> "$REPORT_FILE"
echo "/var/log size: $(du -sh /var/log 2>/dev/null | cut -f1)" >> "$REPORT_FILE"
echo "Large log files:" >> "$REPORT_FILE"
find /var/log -type f -name "*.log" -size +50M -exec ls -lh {} + 2>/dev/null >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
# MySQL database sizes
echo "=== MYSQL DATABASE SIZES ===" >> "$REPORT_FILE"
mysql -e "SELECT table_schema AS 'Database',
ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
FROM information_schema.tables
GROUP BY table_schema
ORDER BY ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) DESC;" 2>/dev/null >> "$REPORT_FILE"
echo "" >> "$REPORT_FILE"
# Recommendations
echo "=== RECOMMENDATIONS ===" >> "$REPORT_FILE"
USAGE=$(df / --output=pcent | tail -1 | tr -d ' %')
if [ "$USAGE" -gt 90 ]; then
echo "CRITICAL: Run emergency-cleanup.sh immediately!" >> "$REPORT_FILE"
elif [ "$USAGE" -gt 80 ]; then
echo "HIGH: Run server-cleanup-master.sh soon" >> "$REPORT_FILE"
elif [ "$USAGE" -gt 70 ]; then
echo "MODERATE: Consider cleanup in next maintenance window" >> "$REPORT_FILE"
else
echo "OK: Disk usage is at healthy level" >> "$REPORT_FILE"
fi
echo "" >> "$REPORT_FILE"
echo "=== END OF REPORT ===" >> "$REPORT_FILE"
# Display and optionally email (prompt only when run from a terminal, not from cron)
cat "$REPORT_FILE"
if [ -t 0 ]; then
read -p "Send this report to $EMAIL_ADMIN? (y/n): " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
mail -s "[iT-werX Analysis] Disk Report for $(hostname)" "$EMAIL_ADMIN" < "$REPORT_FILE"
echo "Report sent to $EMAIL_ADMIN"
fi
fi
echo "Report saved to: $REPORT_FILE"
ANALYZER_EOF
chmod +x /usr/local/bin/disk-analyzer.sh
Implementation and Results
Setting Up the System
# Create log directory
mkdir -p /var/log/server-cleanup
# Make all scripts executable
chmod +x /usr/local/bin/server-cleanup-master.sh
chmod +x /usr/local/bin/emergency-cleanup.sh
chmod +x /usr/local/bin/disk-analyzer.sh
chmod +x /etc/cron.daily/itwerx-daily-cleanup.sh
# Test the analyzer first
echo "Testing disk analyzer..."
/usr/local/bin/disk-analyzer.sh
# Set up weekly full cleanup (Sundays at 2 AM)
cat > /etc/cron.d/itwerx-cleanup << 'CRON_EOF'
# iT-werX Weekly Server Cleanup
0 2 * * 0 root /usr/local/bin/server-cleanup-master.sh >/dev/null 2>&1
# Daily disk check
0 8 * * * root /usr/local/bin/disk-analyzer.sh | tail -20 >/tmp/daily_disk_check.txt
# Monthly deep analysis
0 3 1 * * root /usr/local/bin/disk-analyzer.sh | mail -s "Monthly Disk Analysis" admin@it-werx.ca
CRON_EOF
The Results: A Self-Healing Server
After implementing this system, we achieved:
- Automatic Space Management: Daily cleanup keeps temporary files in check
- Proactive Monitoring: Alerts trigger before problems become critical
- Documented Processes: Every action is logged and reportable
- Scalable Solution: Works for servers of any size
Key Metrics Achieved:
- Regular cleanup: 1-2GB recovered weekly
- Emergency readiness: Can free 3-5GB in minutes if needed
- Zero downtime: All cleanup happens without service interruption
- Full transparency: Every action logged and reported
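Figures like these can be checked against the logs the master script already writes. The sketch below assumes the log format shown earlier ("Current disk usage: N%" and "Final disk usage: N%" lines in cleanup_*.log files); summarize_cleanup_log is a hypothetical helper, and it reports percentage points rather than gigabytes.

```shell
#!/bin/bash
# Sketch only: summarize before/after usage from one master-script log.
LOG_DIR="${LOG_DIR:-/var/log/server-cleanup}"

summarize_cleanup_log() {
    local logfile="$1" before after
    # First "Current" line is the pre-flight reading; last "Final" line closes the run
    before=$(sed -n 's/.*Current disk usage: \([0-9][0-9]*\)%.*/\1/p' "$logfile" | head -1)
    after=$(sed -n 's/.*Final disk usage: \([0-9][0-9]*\)%.*/\1/p' "$logfile" | tail -1)

    if [ -z "$before" ] || [ -z "$after" ]; then
        # Runs that exit early (usage below threshold) never log a final figure
        echo "$(basename "$logfile"): no cleanup recorded"
        return 0
    fi
    echo "$(basename "$logfile"): ${before}% -> ${after}% ($(( before - after )) points)"
}

for f in "$LOG_DIR"/cleanup_*.log; do
    if [ -f "$f" ]; then
        summarize_cleanup_log "$f"
    fi
done
```

Each completed run produces one line, which makes it easy to spot weeks where the cleanup recovered little or nothing.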
Lessons Learned
- Prevention is cheaper than cure: Regular maintenance prevents emergency situations
- Know your data: Analysis before deletion prevents mistakes
- Automate responsibly: Scripts must include safety checks and logging
- Communicate clearly: Email reports keep everyone informed
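One safety check that fits the "automate responsibly" lesson is making sure two cleanup runs never overlap, for example a manual run colliding with the weekly cron job. The sketch below uses flock from util-linux; run_exclusive, the lock path in the usage comment, and the exit code 75 are illustrative assumptions rather than part of our published scripts.

```shell
#!/bin/bash
# Sketch only: run a command under an exclusive, non-blocking lock.

run_exclusive() {
    local lock_file="$1"
    shift
    (
        # fd 9 is opened on the lock file by the redirection below;
        # -n makes flock fail immediately instead of waiting
        flock -n 9 || { echo "Another cleanup is already running" >&2; exit 75; }
        "$@"
    ) 9>"$lock_file"
}

# Example usage (hypothetical lock path):
#   run_exclusive /var/lock/itwerx-cleanup.lock /usr/local/bin/server-cleanup-master.sh
```

The lock is released automatically when the subshell exits, so a crashed run does not leave a stale lock behind.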
Get the Code
All scripts are available in our GitHub repository. They are free to use under the GNU General Public License, with proper attribution.
Need help with your server? Contact iT-werX for professional system administration services that keep your infrastructure running smoothly.