This is a note about a puzzling behavior of while read loops in the Bash shell. To understand it, let's consider
the following problem:
Given a text file called example.txt as follows, write a Bash shell script called join_lines.sh that joins each line beginning with BEGIN with the continuation lines that follow it:
BEGIN Line 1 Line 1
Line 1 Line 1
BEGIN Line 2 Line 2
Line 2 Line 2
Line 2 Line 2
Line 2
BEGIN Line 3 Line 3 Line 3
Line 3
Line 3
The output should be 3 lines, as illustrated in the example below:
$ ./join_lines.sh
Joined Line: BEGIN Line 1 Line 1 Line 1 Line 1
Joined Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joined Line: BEGIN Line 3 Line 3 Line 3 Line 3 Line 3
Our first implementation of join_lines.sh is as follows:
#!/bin/bash
joined=""
cat example.txt | \
while read line; do
    # Does this line start a new group?
    echo ${line} | grep -E -q "^BEGIN"
    if [ $? -eq 0 ]; then
        # Print the previous group, if there is one, and start over.
        if [ "${joined}" != "" ]; then
            echo "Joind Line: ${joined}"
            joined=""
        fi
    fi
    joined="${joined} ${line}"
done
# Print the last group.
echo "Joind Line: ${joined}"
Unfortunately, the output is actually the following:
$ ./join_lines.sh
Joind Line: BEGIN Line 1 Line 1 Line 1 Line 1
Joind Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joind Line:
$
Why does the variable joined lose its value after the loop? To find out, let's revise the script to print the process IDs of
the shell. The revised version is as follows:
#!/bin/bash
joined=""
cat example.txt | \
while read line; do
    echo ${line} | grep -E -q "^BEGIN"
    if [ $? -eq 0 ]; then
        if [ "${joined}" != "" ]; then
            echo "In $$ $BASHPID: Joind Line: ${joined}"
            joined=""
        fi
    fi
    joined="${joined} ${line}"
done
echo "In $$ $BASHPID: Joind Line: ${joined}"
If we run this revised script, we get something like the following:
$ ./join_lines.sh
In 7065 7067: Joind Line: BEGIN Line 1 Line 1 Line 1 Line 1
In 7065 7067: Joind Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7065 7065: Joind Line:
$
By carefully examining the output, we can see that $$ and $BASHPID have different values on the first two lines. So what is the
difference between $$ and $BASHPID, and why do they differ?
The Bash manual page states this:
$ man bash
...
BASHPID
Expands to the process ID of the current bash process. This
differs from $$ under certain circumstances, such as subshells
that do not require bash to be re-initialized. Assignments to
BASHPID have no effect. If BASHPID is unset, it loses its special
properties, even if it is subsequently reset.
...
$
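The difference can be checked in isolation. The following is a minimal sketch, assuming Bash (BASHPID is not available in a plain POSIX sh); the variable names parent_pid, sub_dollar and sub_bashpid are made up for this illustration. It uses command substitution, which runs its body in a subshell.

```shell
#!/bin/bash
# Compare $$ and $BASHPID in the main shell and in a subshell.
parent_pid=$$
echo "main shell: \$\$=$$ BASHPID=${BASHPID}"

# Command substitution runs its body in a subshell.
# $$ still expands to the PID of the main shell there,
# while BASHPID expands to the PID of the subshell itself.
sub_dollar=$(echo "$$")
sub_bashpid=$(echo "${BASHPID}")
echo "subshell:   \$\$=${sub_dollar} BASHPID=${sub_bashpid}"
```

On the first output line the two numbers should match; on the second, only $$ should still equal the main shell's PID.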
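The above experiment reveals that the while read loop runs in a subshell: Bash runs each command of a pipeline in its own subshell, and the loop is the last command of the pipeline that starts with cat. As a result, there are two
variables called joined, one in the parent Bash process and one in the child. The loop updates the child's copy, which disappears when the subshell exits; the final echo runs in the parent, where joined is still empty.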
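The same effect can be reproduced without the line-joining logic. This is a minimal sketch, assuming Bash with its default settings (the lastpipe option off); the names count and after_pipe are invented for the illustration.

```shell
#!/bin/bash
# A variable changed inside a piped while loop is changed only in
# the subshell that runs the loop, not in the parent shell.
count=0
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))          # updates the subshell's copy only
done

# Back in the parent shell: its copy of count was never touched.
after_pipe=${count}
echo "After the pipeline: count=${after_pipe}"
```

With default Bash settings the parent's count is still 0 after the pipeline, even though the loop body ran three times.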
A simple fix is to put the while read loop and the last echo command in the same subshell, so that both see the same
copy of joined, e.g., as follows:
#!/bin/bash
joined=""
cat example.txt | \
(
    while read line; do
        echo ${line} | grep -E -q "^BEGIN"
        if [ $? -eq 0 ]; then
            if [ "${joined}" != "" ]; then
                echo "In $$ $BASHPID: Joind Line: ${joined}"
                joined=""
            fi
        fi
        joined="${joined} ${line}"
    done
    echo "In $$ $BASHPID: Joind Line: ${joined}"
)
Running this revised script gives:
$ ./join_lines.sh
In 7119 7121: Joind Line: BEGIN Line 1 Line 1 Line 1 Line 1
In 7119 7121: Joind Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7119 7121: Joind Line: BEGIN Line 3 Line 3 Line 3 Line 3 Line 3
The mystery is solved!
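Another way to avoid the problem, not used in the scripts above, is to drop the pipe altogether and feed the file to the loop through input redirection, so that the loop runs in the parent shell. The following is a sketch under that assumption; it creates a temporary stand-in for example.txt so that it can run on its own, uses a case statement instead of grep, and the names input and last_group are made up for the illustration.

```shell
#!/bin/bash
# Sketch: read the file via redirection instead of a pipe, so the loop
# runs in the current shell and "joined" keeps its value afterwards.
input=$(mktemp)                   # temporary stand-in for example.txt
cat > "${input}" <<'EOF'
BEGIN Line 1 Line 1
Line 1 Line 1
BEGIN Line 2 Line 2
Line 2 Line 2
Line 2 Line 2
Line 2
BEGIN Line 3 Line 3 Line 3
Line 3
Line 3
EOF

joined=""
while read -r line; do
    case "${line}" in
        BEGIN*)
            # A new group starts: print the previous one, if any.
            if [ -n "${joined}" ]; then
                echo "Joined Line: ${joined}"
            fi
            joined="${line}"
            ;;
        *)
            # Continuation line: append it to the current group.
            joined="${joined} ${line}"
            ;;
    esac
done < "${input}"

# Still in the parent shell here, so the last group is not lost.
last_group=${joined}
echo "Joined Line: ${last_group}"
rm -f "${input}"
```

Because no pipeline is involved, there is only one copy of joined, and the final echo prints the third group.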