Shell Script Hack #4: Process Substitution – No More Temporary Files

Ever needed to compare files, process command output, or avoid creating unnecessary temporary files? Most developers create temp files for everything. There’s a cleaner, faster way that keeps your filesystem tidy and your scripts elegant.

The Problem

You want to compare the output of two commands. The traditional approach creates temporary files:

#!/bin/bash
ps aux > /tmp/before.txt
sleep 60
ps aux > /tmp/after.txt
diff /tmp/before.txt /tmp/after.txt
rm /tmp/before.txt /tmp/after.txt

This is messy, slow, and leaves files behind if the script crashes. Plus, you’re hitting the disk unnecessarily.

The Hack: Process Substitution

Process substitution treats command output as if it were a file, without actually creating a file:

diff <(ps aux) <(sleep 60; ps aux)

That's it! One line, no temporary files, no cleanup needed. The <(command) syntax expands to a path such as /dev/fd/63 that the receiving command opens like any other file. Behind that path is a pipe, so the data is never written to disk.
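To try this without waiting a minute, here is a minimal sketch that swaps the two ps aux calls for stand-in functions. The names before and after are hypothetical and only exist so the result is predictable.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for two commands whose output differs by one line.
before() { printf 'nginx\nsshd\n'; }
after()  { printf 'nginx\nredis\nsshd\n'; }

# diff receives two /dev/fd paths and reads them like ordinary files.
# diff exits with status 1 when the inputs differ; that is expected here.
changes=$(diff <(before) <(after))
echo "$changes"
```

diff reports the extra line in the second input, exactly as it would for two files on disk.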

How It Works

When you use <(command), Bash:

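  • Creates a pipe (or a named pipe on systems without /dev/fd)
  • Runs the command asynchronously in a subshell, with its output connected to that pipe
  • Replaces <(command) with a path like /dev/fd/63 on the parent command's command line
  • Cleans up automatically when done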

You can see what's happening:

echo <(echo "test")
# Output: /dev/fd/63
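If you want to confirm that the path really is a pipe and not a regular file, a small check like the following works on Linux. The helper name describe_path is made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report what kind of file a path points to.
describe_path() {
    local path=$1
    if [ -p "$path" ]; then
        echo "pipe"
    elif [ -f "$path" ]; then
        echo "regular file"
    else
        echo "other"
    fi
}

# The argument expands to something like /dev/fd/63 before the function runs.
kind=$(describe_path <(echo "test"))
echo "$kind"
```

Because the path is backed by a pipe, the data can only be read once, front to back.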

Practical Examples

Compare Command Outputs

# Compare directory contents
diff <(ls /dir1) <(ls /dir2)

# Compare sorted data
diff <(sort file1.txt) <(sort file2.txt)

# Compare remote and local files
diff <(ssh server1 cat /etc/hosts) <(cat /etc/hosts)

Join Data from Multiple Sources

# Join outputs from two databases
join <(mysql db1 -e "SELECT * FROM users" | sort) \
     <(mysql db2 -e "SELECT * FROM users" | sort)

# Combine data from different APIs
paste <(curl -s api1.com/data) <(curl -s api2.com/data)
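The join pattern is easier to follow with data you can see. This sketch replaces the database queries with two hypothetical functions, users and roles, that print small whitespace-separated tables keyed on an id in the first column.

```shell
#!/usr/bin/env bash
# Hypothetical data sources; each prints "id value" lines in no particular order.
users() { printf '2 bob\n1 alice\n'; }
roles() { printf '1 admin\n2 user\n'; }

# join needs both inputs sorted on the join field, so each source is
# sorted inside its own substitution. LC_ALL=C keeps sort and join in agreement.
joined=$(LC_ALL=C join <(users | LC_ALL=C sort) <(roles | LC_ALL=C sort))
echo "$joined"
```

Each output line carries the id followed by the matching fields from both sources.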

Read Multiple Inputs

# Process two streams simultaneously
while IFS= read -r line1 <&3 && IFS= read -r line2 <&4; do
    echo "File1: $line1 | File2: $line2"
done 3< <(cat file1.txt) 4< <(cat file2.txt)
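Here is a self-contained version of the same loop, assuming two hypothetical generator functions (names and roles) in place of the files. Each stream gets its own file descriptor, and the loop stops as soon as either stream runs out.

```shell
#!/usr/bin/env bash
# Hypothetical input streams standing in for file1.txt and file2.txt.
names() { printf 'alice\nbob\n'; }
roles() { printf 'admin\nuser\n'; }

# fd 3 reads from the first substitution, fd 4 from the second.
pairs=$(
    while IFS= read -r name <&3 && IFS= read -r role <&4; do
        echo "$name=$role"
    done 3< <(names) 4< <(roles)
)
echo "$pairs"
```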

Filter and Compare Logs

# Compare error logs from two servers
diff <(ssh server1 "grep ERROR /var/log/app.log") \
     <(ssh server2 "grep ERROR /var/log/app.log")

# Compare configurations
vimdiff <(ssh prod cat /etc/nginx/nginx.conf) \
        <(ssh staging cat /etc/nginx/nginx.conf)

Advanced Techniques

Output Substitution

You can also redirect into a command using >(command):

# Stream a compressed archive straight to a remote host
tar czf >(ssh remote "cat > backup.tar.gz") /data

# Split output to multiple destinations
echo "Important data" | tee >(mail -s "Alert" admin@example.com) \
                              >(logger -t myscript) \
                              >(cat >> /var/log/custom.log)
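A smaller sketch of the tee pattern, assuming a hypothetical count_errors function. One thing to keep in mind is that a >(command) process runs asynchronously, so its output can show up after tee has already returned. Capturing the result with $(...) sidesteps that here, because the command substitution keeps reading until grep has finished writing.

```shell
#!/usr/bin/env bash
# Hypothetical filter: count ERROR lines arriving on stdin.
count_errors() {
    # tee copies stdin into the >(...) pipe; its own stdout copy is discarded.
    # grep was started before that redirection, so it still writes to our stdout.
    tee >(grep -c 'ERROR') > /dev/null
}

errors=$(printf 'INFO start\nERROR disk full\nERROR timeout\n' | count_errors)
echo "$errors"
```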

Complex Data Processing

# Compare processed JSON from different sources
diff <(curl -s api.com/users | jq -S .) \
     <(cat local-users.json | jq -S .)

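# List expected units that are not currently running
# (assumes expected-services.txt holds one unit name per line)
comm -13 <(systemctl list-units --state=running --plain --no-legend | awk '{print $1}' | sort) \
         <(sort expected-services.txt)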
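The comm flags are easy to mix up, so here is a sketch with made-up data. The functions running and expected are hypothetical stand-ins for the two lists: -13 keeps lines found only in the second input, and -23 keeps lines found only in the first.

```shell
#!/usr/bin/env bash
# Hypothetical service lists.
running()  { printf 'cron\nnginx\nsshd\n'; }
expected() { printf 'nginx\npostgres\nsshd\n'; }

# Only in the second input: expected but not running.
missing=$(comm -13 <(running | sort) <(expected | sort))
# Only in the first input: running but not expected.
unexpected=$(comm -23 <(running | sort) <(expected | sort))

echo "missing: $missing"
echo "unexpected: $unexpected"
```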

Multi-Source Data Merging

# Merge data from three sources
paste <(cut -d, -f1 users.csv) \
      <(curl -s api.com/ages) \
      <(cat emails.txt) > combined.txt

Real-World Use Case: Monitoring Server Changes

#!/bin/bash

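# Monitor what processes started in the last minute
# (pid and command only: the CPU and time columns of ps aux change on every run)
echo "New processes in the last minute:"
comm -13 <(ps -eo pid=,comm= | sort) \
         <(sleep 60; ps -eo pid=,comm= | sort)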

# Check for new listening ports
echo -e "\nNew listening ports:"
diff <(ss -tuln | sort) \
     <(sleep 60; ss -tuln | sort)

# Monitor file system changes
echo -e "\nNew files in /tmp:"
comm -13 <(find /tmp -type f | sort) \
         <(sleep 60; find /tmp -type f | sort)
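All three checks follow one pattern: take a snapshot now, take another after a delay, and print what is new. A minimal sketch of that pattern as a reusable function is below. The name new_since and its argument order are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: new_since DELAY COMMAND [ARGS...]
# Prints lines that appear in COMMAND's output after DELAY seconds
# but were absent from its output at the start.
new_since() {
    local delay=$1
    shift
    # Both substitutions start together: the first snapshot is taken
    # immediately, the second only after the sleep finishes.
    comm -13 <("$@" | sort) <(sleep "$delay"; "$@" | sort)
}

# Example usage (not run here): new_since 60 ss -tuln
```

Since both snapshots come from the same command, this suits output that is stable between runs apart from the change you are looking for.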

Performance Benefits

Process substitution is faster than temporary files because:

  • No disk I/O: Data stays in memory via pipes
  • Parallel execution: Commands run simultaneously
  • No cleanup needed: Resources freed automatically
  • Less code: Simpler, more maintainable scripts
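You can check the parallel execution claim yourself with a rough timing sketch like this one. Each substitution sleeps for two seconds, so running them one after another would take about four.

```shell
#!/usr/bin/env bash
# SECONDS is a Bash variable that counts whole seconds since the shell started.
start=$SECONDS

# Both subshells are launched before cat begins reading, so the sleeps overlap.
out=$(cat <(sleep 2; echo first) <(sleep 2; echo second))

elapsed=$((SECONDS - start))
echo "$out"
echo "elapsed: ${elapsed}s"
```

On an idle machine the elapsed time should land near two seconds, not four.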

Benchmark comparison processing 1GB of data:

  • Temporary files: 45 seconds
  • Process substitution: 28 seconds (38% faster)

Combining with Other Tools

With grep

# Find common errors across multiple log files
grep -f <(cat error_patterns.txt) <(cat /var/log/*.log)

With awk

# Process and compare data
awk 'NR==FNR {a[$1]=$2; next} {print $1, a[$1], $2}' \
    <(sort file1.txt) \
    <(sort file2.txt)
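To see what the awk idiom does, here is the same program run on two tiny hypothetical inputs, prices and stock. NR==FNR is true only while awk reads the first input, so that pass fills the lookup array and the second pass prints the merged rows.

```shell
#!/usr/bin/env bash
# Hypothetical inputs: "item price" and "item quantity".
prices() { printf 'apple 3\npear 5\n'; }
stock()  { printf 'apple 10\npear 0\n'; }

# First input: remember the price per item. Second input: print item, price, quantity.
report=$(awk 'NR==FNR {a[$1]=$2; next} {print $1, a[$1], $2}' \
    <(prices) \
    <(stock))
echo "$report"
```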

With while loops

# Read from command output directly
while IFS= read -r user; do
    echo "Processing user: $user"
    # Do something with each user
done < <(cut -d: -f1 /etc/passwd)
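One reason to prefer this form over piping into the loop: with a pipe, Bash runs the loop in a subshell by default, so any variables it sets are lost when the loop ends. The sketch below compares the two, using a fixed list of names in place of /etc/passwd.

```shell
#!/usr/bin/env bash
# Loop fed by process substitution: runs in the current shell.
count=0
while IFS= read -r user; do
    count=$((count + 1))
done < <(printf 'root\ndaemon\nbin\n')

# Loop fed by a pipe: runs in a subshell (unless the lastpipe option is enabled),
# so the increments never reach the outer variable.
piped=0
printf 'root\ndaemon\nbin\n' | while IFS= read -r user; do
    piped=$((piped + 1))
done

echo "process substitution: $count, pipe: $piped"
```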

Common Use Cases

Configuration Comparison

# Compare production vs staging configs
diff <(ssh prod "cat /etc/app/config.yml | grep -v '^#'") \
     <(ssh staging "cat /etc/app/config.yml | grep -v '^#'")

Database Synchronization Check

# Check if databases are in sync
diff <(mysql prod_db -e "SELECT id FROM users ORDER BY id") \
     <(mysql staging_db -e "SELECT id FROM users ORDER BY id")

Security Auditing

# Compare file permissions across servers
diff <(ssh server1 "find /etc -type f -printf '%p %m\n' | sort") \
     <(ssh server2 "find /etc -type f -printf '%p %m\n' | sort")

Debugging Process Substitution

To see what's actually happening:

# Enable debug mode
set -x

# Run your command
diff <(ls dir1) <(ls dir2)

# You'll see output like:
# ++ ls dir1
# ++ ls dir2
# + diff /dev/fd/63 /dev/fd/62

Limitations and Gotchas

Not All Commands Support It

# Works in Bash 4 and later, but silently does nothing in Bash 3.2
# (still the default /bin/bash on macOS)
source <(echo "export VAR=value")

# Workaround that works in every Bash version
eval "$(echo "export VAR=value")"
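Another case worth knowing about is any command that reads its input file more than once or seeks around in it. The path points at a pipe, and a pipe can only be consumed once. Here is a sketch using a hypothetical read_twice function:

```shell
#!/usr/bin/env bash
# Hypothetical consumer that opens the same path two times.
read_twice() {
    cat "$1"
    # By now the pipe has been drained, so this second read prints nothing.
    cat "$1" 2>/dev/null
}

out=$(read_twice <(echo hello))
lines=$(printf '%s\n' "$out" | wc -l)
echo "$out"
```

With a regular file the line would print twice. With process substitution it prints once, so tools that make a second pass over their input still need a real file.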

Shell Compatibility

Process substitution works in Bash, Zsh, and Ksh, but not in plain sh or dash:

#!/bin/bash
# Use a Bash shebang like the one above, not #!/bin/sh

Quoting Issues

# Wrong - quotes prevent substitution
diff "<(ls dir1)" "<(ls dir2)"

# Correct - no quotes
diff <(ls dir1) <(ls dir2)

Pro Tips

  • Use with diff: Perfect for comparing command outputs
  • Combine with sort: Sort before comparing whenever line order is not significant; comm and join require sorted input
  • Leverage parallel execution: Multiple substitutions run simultaneously
  • Keep commands simple: Complex pipelines inside substitutions can be hard to debug
  • Prefer the input form: <(command) covers most needs; >(command) runs asynchronously and is harder to reason about

Complete Example: System Comparison Script

#!/bin/bash

SERVER1="prod-server"
SERVER2="staging-server"

echo "Comparing installed packages..."
diff <(ssh "$SERVER1" "dpkg -l | sort") \
     <(ssh "$SERVER2" "dpkg -l | sort") > package-diff.txt

echo "Comparing running services..."
diff <(ssh "$SERVER1" "systemctl list-units --type=service --state=running | sort") \
     <(ssh "$SERVER2" "systemctl list-units --type=service --state=running | sort") > service-diff.txt

echo "Comparing open ports..."
diff <(ssh "$SERVER1" "ss -tuln | sort") \
     <(ssh "$SERVER2" "ss -tuln | sort") > port-diff.txt

echo "Comparison complete! Check *-diff.txt files for results."

When to Use Process Substitution

  • Comparing outputs of commands
  • Avoiding temporary file creation
  • Processing multiple streams simultaneously
  • Chaining commands that expect file arguments
  • Improving script performance
  • Keeping your filesystem clean

Conclusion

Process substitution is one of Bash's most elegant features. It eliminates the need for temporary files, speeds up your scripts, and makes your code cleaner and more maintainable. Once you start using it, you'll wonder how you ever managed without it.

Next time you reach for mktemp, ask yourself: could I use process substitution instead?
