This is a note about a puzzling behavior of while read var in the Bash shell. To understand it, consider the following problem:
Given a text file called example.txt with the contents shown below, write a Bash shell script called join_lines.sh that joins each line starting with BEGIN with the continuation lines that follow it.
BEGIN Line 1 Line 1
Line 1 Line 1
BEGIN Line 2 Line 2
Line 2 Line 2
Line 2 Line 2
Line 2
BEGIN Line 3 Line 3 Line 3
Line 3
Line 3
The output should be 3 lines, as illustrated in the example below:
$ ./join_lines.sh
Joined Line: BEGIN Line 1 Line 1 Line 1 Line 1
Joined Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joined Line: BEGIN Line 3 Line 3 Line 3 Line 3 Line 3
Our first implementation of join_lines.sh is as follows:
#!/bin/bash
joined=""
cat example.txt | \
while read -r line; do
    # A line starting with BEGIN opens a new group: flush the previous one.
    if echo "${line}" | grep -q "^BEGIN"; then
        if [ "${joined}" != "" ]; then
            echo "Joined Line: ${joined}"
            joined=""
        fi
    fi
    # Append the line, adding a separating space only if joined is non-empty.
    joined="${joined:+${joined} }${line}"
done
# Intended to flush the last group.
echo "Joined Line: ${joined}"
Unfortunately, the actual output is the following:
$ ./join_lines.sh
Joined Line: BEGIN Line 1 Line 1 Line 1 Line 1
Joined Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joined Line:
$
Why does the variable joined lose its value after the loop? That is a mystery, isn't it? To understand this, let's revise the script to print the process IDs of the shell, using both $$ and $BASHPID. The revised version is as follows:
#!/bin/bash
joined=""
cat example.txt | \
while read -r line; do
    # A line starting with BEGIN opens a new group: flush the previous one.
    if echo "${line}" | grep -q "^BEGIN"; then
        if [ "${joined}" != "" ]; then
            # Print both process IDs to see which process runs the loop.
            echo "In $$ $BASHPID: Joined Line: ${joined}"
            joined=""
        fi
    fi
    # Append the line, adding a separating space only if joined is non-empty.
    joined="${joined:+${joined} }${line}"
done
# Print both process IDs again, this time outside the loop.
echo "In $$ $BASHPID: Joined Line: ${joined}"
If we run this revised script, we get something like the following:
$ ./join_lines.sh
In 7065 7067: Joined Line: BEGIN Line 1 Line 1 Line 1 Line 1
In 7065 7067: Joined Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7065 7065: Joined Line:
$
By carefully examining the output, we can see that $$ and $BASHPID have different values on the first two lines, which are printed inside the loop, and the same value on the last line, which is printed after it. So what is the difference between $$ and $BASHPID, and why do they differ here?
The Bash manual page states this:
$ man bash
...
       BASHPID
              Expands to the process ID of the current bash process. This
              differs from $$ under certain circumstances, such as subshells
              that do not require bash to be re-initialized. Assignments to
              BASHPID have no effect. If BASHPID is unset, it loses its
              special properties, even if it is subsequently reset.
...
$
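The difference described in the manual page can be seen in isolation. The following is a minimal sketch, assuming Bash 4 or later (where BASHPID is available); the variable names are chosen only for this illustration, and a command substitution stands in for the subshell.

```shell
#!/bin/bash
# Values seen by the shell that runs this script.
parent_pid=$$
parent_bashpid=$BASHPID

# A command substitution runs in a subshell, so both expansions below
# are evaluated by the child process.
sub_pid=$(echo "$$")            # $$ still names the original shell
sub_bashpid=$(echo "$BASHPID")  # BASHPID names the subshell itself

echo "parent:   \$\$=${parent_pid} BASHPID=${parent_bashpid}"
echo "subshell: \$\$=${sub_pid} BASHPID=${sub_bashpid}"
```

The actual numbers vary from run to run, but the $$ column should be identical on both lines while the BASHPID column should differ.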
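The experiment with the revised script reveals that the while read loop runs in a subshell. Bash executes each command of a pipeline in its own subshell, and here the loop is the right-hand side of the pipeline that starts with cat example.txt. In fact, there are two variables, both called joined: one lives in the parent bash process and the other in the child. The loop updates the child's copy, which disappears when the loop ends, while the last echo prints the parent's copy, which is still empty.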
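The same effect can be reproduced without any input file. This is a minimal sketch, assuming Bash with its default settings (in particular, the lastpipe option is not enabled); the counter and the three sample lines are made up for illustration.

```shell
#!/bin/bash
count=0

# The loop is the right-hand side of a pipeline, so it runs in a subshell
# and increments the subshell's own copy of count.
printf 'a\nb\nc\n' | while read -r letter; do
    count=$((count + 1))
    echo "inside the loop: count=${count}"
done

# Back in the parent shell, whose copy of count was never touched.
echo "after the loop:  count=${count}"
```

Under those assumptions, the loop reports count going from 1 to 3, and the final line reports count=0.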
A simple fix to the script is to put the while read loop and the last echo command in the same subshell, so that both use the same copy of joined, e.g., as follows:
#!/bin/bash
joined=""
cat example.txt | \
(
    while read -r line; do
        # A line starting with BEGIN opens a new group: flush the previous one.
        if echo "${line}" | grep -q "^BEGIN"; then
            if [ "${joined}" != "" ]; then
                echo "In $$ $BASHPID: Joined Line: ${joined}"
                joined=""
            fi
        fi
        # Append the line, adding a separating space only if joined is non-empty.
        joined="${joined:+${joined} }${line}"
    done
    # Same subshell as the loop, so joined still holds the last group.
    echo "In $$ $BASHPID: Joined Line: ${joined}"
)
Let's run this revised script. We get:
$ ./join_lines.sh
In 7119 7121: Joined Line: BEGIN Line 1 Line 1 Line 1 Line 1
In 7119 7121: Joined Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7119 7121: Joined Line: BEGIN Line 3 Line 3 Line 3 Line 3 Line 3
The mystery is solved!
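A different way to sidestep the subshell is to avoid the pipeline altogether and feed the loop through input redirection. The sketch below is one possible variation, not the script from this note: join_lines is a hypothetical function name, the sample data is a shortened stand-in for example.txt, and a [[ ... ]] pattern match replaces the grep call.

```shell
#!/bin/bash
# join_lines FILE: hypothetical helper that prints one joined line per group.
join_lines() {
    local joined="" line
    while read -r line; do
        # A line starting with BEGIN opens a new group: flush the previous one.
        if [[ ${line} == BEGIN* && -n ${joined} ]]; then
            echo "Joined Line: ${joined}"
            joined=""
        fi
        # Add a separating space only if joined is non-empty.
        joined="${joined:+${joined} }${line}"
    done < "$1"
    # No pipeline is involved, so the loop ran in the current shell and
    # joined still holds the last group here.
    echo "Joined Line: ${joined}"
}

# Shortened stand-in for example.txt so that the sketch runs on its own.
sample=$(mktemp)
printf '%s\n' 'BEGIN Line 1' 'Line 1' 'BEGIN Line 2' 'Line 2' 'Line 2' > "${sample}"
join_lines "${sample}"
rm -f "${sample}"
```

With the sample data above, this should print one joined line for each of the two BEGIN groups, including the last one.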