
Friday, January 27, 2023

Mysterious bash while read var behavior understood!

This is a note about a mysterious behavior of while read var in the Bash shell. To understand the problem, let's consider the following task:

Given a text file called example.txt as follows, write a Bash shell script called join_lines.sh that joins each line starting with BEGIN with the lines that follow it, up to the next BEGIN line:


BEGIN Line 1 Line 1
Line 1 Line 1
BEGIN Line 2 Line 2
Line 2 Line 2
Line 2 Line 2
Line 2
BEGIN Line 3 Line 3 Line 3
Line 3
Line 3

The output should be 3 lines, as illustrated in the example below:


$ ./join_lines.sh
Joined Line: BEGIN Line 1 Line 1 Line 1 Line 1
Joined Line: BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joined Line: BEGIN Line 3 Line 3 Line 3 Line 3 Line 3

Our first implementation of join_lines.sh is as follows:


#!/bin/bash

joined=""
cat example.txt | \
    while read line; do
        # A line starting with BEGIN closes the previous group.
        echo "${line}" | grep -E -q "^BEGIN"
        if [ $? -eq 0 ]; then
            if [ "${joined}" != "" ]; then
                echo "Joined Line: ${joined}"
                joined=""
            fi
        fi
        joined="${joined} ${line}"
    done
# Print the last group.
echo "Joined Line: ${joined}"

Unfortunately, the actual output is the following:


$ ./join_lines.sh
Joined Line:  BEGIN Line 1 Line 1 Line 1 Line 1
Joined Line:  BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
Joined Line:
$
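
The same symptom can be reproduced in a few lines. The following is a minimal sketch, not part of join_lines.sh; the variable count and the three-line input are made up for illustration. With Bash's default settings, it reports 0 lines rather than 3:

```shell
#!/bin/bash
# "count" is a hypothetical variable used only for this illustration.
count=0
printf 'a\nb\nc\n' | \
    while read -r line; do
        count=$((count + 1))
    done
# With Bash's default settings, the increments made inside the loop
# are not visible here.
echo "Lines counted: ${count}"
```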

Why does the variable joined lose its value? That is a mystery, isn't it? To understand this, let's revise the script to print out the process IDs of the shell. The revised version is as follows:


#!/bin/bash

joined=""
cat example.txt | \
    while read line; do
        echo "${line}" | grep -E -q "^BEGIN"
        if [ $? -eq 0 ]; then
            if [ "${joined}" != "" ]; then
                # $$ and $BASHPID show which process runs this echo.
                echo "In $$ $BASHPID: Joined Line: ${joined}"
                joined=""
            fi
        fi
        joined="${joined} ${line}"
    done
echo "In $$ $BASHPID: Joined Line: ${joined}"

Running this revised script produces something like the following:


$ ./join_lines.sh
In 7065 7067: Joined Line:  BEGIN Line 1 Line 1 Line 1 Line 1
In 7065 7067: Joined Line:  BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7065 7065: Joined Line:
$

Examining the output carefully, we can see that $$ and $BASHPID have different values on the first two lines but the same value on the last one. So, what is the difference between $$ and $BASHPID, and why do they differ?

The Bash manual page states this:


$ man bash
...
 BASHPID
              Expands  to  the  process  ID of the current bash process.  This
              differs from $$ under certain circumstances, such  as  subshells
              that  do  not require bash to be re-initialized.  Assignments to
              BASHPID have no effect.  If BASHPID is unset, it loses its  spe‐
              cial properties, even if it is subsequently reset.
 ...
$
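
The difference is easy to observe in isolation. The following is a minimal sketch, not part of the original script; the variable names main_pids and sub_pids are made up for illustration. It relies on command substitution, which also runs its command in a subshell, to capture the two values:

```shell
#!/bin/bash
# In the main shell, $$ and $BASHPID expand to the same process ID.
main_pids="$$ $BASHPID"
echo "main shell: ${main_pids}"

# $( ... ) runs its command in a subshell: $$ still reports the main
# shell, while $BASHPID reports the subshell's own process ID.
sub_pids=$(echo "$$ $BASHPID")
echo "subshell:   ${sub_pids}"
```

The actual numbers vary from run to run; what matters is that the two values agree on the first line and differ on the second.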

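The experiment with the revised script reveals that the while read loop runs in a subshell: by default, Bash runs each command of a pipeline in its own subshell, so the loop on the right-hand side of the pipe executes in a child process. In effect, there are two variables called joined, one living in the parent Bash process and the other in the child. Assignments made in the child never reach the parent, so the final echo, which runs in the parent, still sees the empty string. A simple fix is to put the while read loop and the last echo command in the same subshell, e.g., as follows: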


#!/bin/bash

joined=""
cat example.txt | \
    (
        while read line; do
            echo "${line}" | grep -E -q "^BEGIN"
            if [ $? -eq 0 ]; then
                if [ "${joined}" != "" ]; then
                    echo "In $$ $BASHPID: Joined Line: ${joined}"
                    joined=""
                fi
            fi
            joined="${joined} ${line}"
        done
        # This echo now runs in the same subshell as the loop.
        echo "In $$ $BASHPID: Joined Line: ${joined}"
    )

Let's run this revised script. We get:


$ ./join_lines.sh
In 7119 7121: Joined Line:  BEGIN Line 1 Line 1 Line 1 Line 1
In 7119 7121: Joined Line:  BEGIN Line 2 Line 2 Line 2 Line 2 Line 2 Line 2 Line 2
In 7119 7121: Joined Line:  BEGIN Line 3 Line 3 Line 3 Line 3 Line 3
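
Another way to keep a single copy of the variable is to avoid the pipeline altogether and redirect the file into the loop, so that the loop runs in the current shell. The following is a sketch under stated assumptions rather than the script used above: the sample data is a made-up four-line input written to a temporary file so the example is self-contained, and a [[ ... == BEGIN* ]] pattern match stands in for the grep call.

```shell
#!/bin/bash
# Hypothetical sample data, written to a temporary file so that the
# sketch does not depend on example.txt.
input=$(mktemp)
printf '%s\n' "BEGIN Line 1 Line 1" "Line 1 Line 1" \
    "BEGIN Line 2" "Line 2" > "${input}"

joined=""
# Redirecting the file into the loop (instead of piping cat into it)
# keeps the loop in the current shell, so ${joined} is still set
# after "done".
while read -r line; do
    if [[ ${line} == BEGIN* && -n ${joined} ]]; then
        echo "Joined Line: ${joined}"
        joined=""
    fi
    # Add a separating space only when ${joined} is not empty.
    joined="${joined:+${joined} }${line}"
done < "${input}"
echo "Joined Line: ${joined}"
rm -f "${input}"
```

Since no subshell is involved, the final echo prints the last group, and the leading space seen in the earlier output is gone as well.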

The mystery is solved!