Skip to content

bug-868: parser confusions with braces inside ${ cmd; } form of command substitution #868

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

phidebian
Copy link


The documentation documents 2 behaviors that are contradictory, and obviously the implementation implements only one of those, and unfortunately this is the worst one that is implemented.

The documnetation says

Command Substitution.
The standard output from a command list enclosed in parentheses pre‐
ceded by a dollar sign ( $(list) ), or in a brace group preceded by a
dollar sign ( ${ list;} ), or in a pair of grave accents (``) may be
used as part or all of a word; trailing new-lines are removed. In the
second case, the { and } are treated as a reserved words so that { must
be followed by a blank and } must appear at the beginning of the line
or follow a ;.
...
Except for the second form, the command list is run in a
subshell so that no side effects are possible. For the second form,
the final } will be recognized as a reserved word after any token.

So the former part say that ${ list ;} is closed either by ;} or \n} This is not true the current implementation miss this, ;} or \n} is never recognised as a closing for ${ list.

The later part say that ${ list ;} is closed either by a } following any token, meaning the first } alone (as a word) is considered a close.

Bottom line

alias echo1=echo
alias echo2=echo
$ echo1 ${ echo2 word } } }
word } }

This is because the implementation close ${ list with the first } then echo2 receive word as uniq arg, then echo1 receive arg } } This is a 'lazy' } lookup for ${...

If we want to get the ${ list [;\n] } semantic we got to implement a 'greedy' } lookup, i.e all the } that are not a reserved word, i.e } following [;\n] are then regular chars.

Because this is not compatible with the current implementation, and because we want to keep the historical strange implementation, we got to introduce a ksh93 option, when it is active we want the greedy lookup.

The option flag is called comsub_brace_greedy and is off by default, i.e the test suite use the lazy implementation, and the greedy implementation require specific tests.

Ksh93 option definition.

  • cmd/ksh93/include/shell.h : Define SH_COMSUB_BRACE_GREEDY
  • cmd/ksh93/data/options.c : Enter "comsub_brace_greedy" into shtab_options[]

Greedy implementation

  • cmd/ksh93/include/shlex.h: Add one more member in
    struct shlex_pvt_lexstate
    { ...
    char inbracecomsub; /* 1 inside ${ ... [;\n]} bug-691: Phi: */ ... }
    When ${ is scanned and SH_COMSUB_BRACE_GREEDY is on then lp->lex.inbracecomsub=1;

Skipping over } in greedy mode

  • cmd/ksh93/sh/lex.c:sh_lex() ======== while(1) { /* skip over characters in the current state */ state = sh_lexstates[mode]; do { n = STATE(state,c); .... switch(n) case S_REG: case S_BRACE:

For state ST_BEGIN and ST_NORM, when STATE return c=='}' we got n==S_BRACE, we implement a gross hack here, if the '}' is not on the first reserved word, we pretend we got a S_REG and accept the } as a char, not a closing ${...} this is done with
n = STATE(state,c);
/*
* Bug-691: Phi gross hack
* Greedy ${list} will treat all '}' not on a reserved * word (i.e after [;\n]) as regular char. * Non greedy close ${ on first '}' (ATT code) / if( sh_isoption(SH_COMSUB_BRACE_GREEDY) && lp->lex.inbracecomsub && c=='}' && ( state==sh_lexstates[ ST_BEGIN] || state==sh_lexstates[ST_NORM] ) ) { n=S_REG ; / treat '}' as reg char */ }

Setting inbracecomsub=1

  • cmd/ksh93/sh/lex.c:comsub() ======== save = lp->lex; ... /* bug-691: Phi: */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) && *cp=='{' ) lp->lex.inbracecomsub = 1;

Note inbracecomsub is set to 1 here, and reset at return time
lp->lex = save; (See Resetting inbracecomsub)

  • cmd/ksh93/sh/parse.c:sh_dolparen() ============= case LBRACE: lp->lex.inbracecomsub=1; /* bug-691: Phi: */

Resetting inbracecomsub=0

  • cmd/ksh93/sh/lex.c:comsub() ======== A seen above the comsub() return time do reset inbracecomsub=0

  • cmd/ksh93/sh/lex.c:sh_lex() ======== case S_NL: /* bug-691: Phi: / if( sh_isoption(SH_COMSUB_BRACE_GREEDY) ) lp->lex.inbracecomsub=0; case S_OP: / bug-691: Phi: */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) ) lp->lex.inbracecomsub=0; After a new line or ';' we are not in greedy mode any more next } will close ${

New test

src/cmd/ksh93/tests/shtests:
=======
Added greedy_comsub.sh for testing greedy ${...{...}...[;\n]} Test accouting fixed for greedy_comsub.sh
greedy_comsub.sh) grep -c '^T ' $i ;; # bug-691: Phi:

-----------------

The documentation documents 2 behaviors that are contradictory, and
obviously the implementation implements only one of those, and
unfortunately this is the worst one that is implemented.

The documnetation says

   Command Substitution.
       The  standard  output  from a command list enclosed in parentheses pre‐
       ceded by a dollar sign ( $(list) ), or in a brace group preceded  by  a
       dollar  sign  (  ${ list;} ), or in a pair of grave accents (``) may be
       used as part or all of a word; trailing new-lines are removed.  In  the
       second case, the { and } are treated as a reserved words so that { must
       be  followed  by a blank and } must appear at the beginning of the line
       or follow a ;.
       ...
       Except for the second form, the command list is run  in  a
       subshell  so  that  no side effects are possible.  For the second form,
       the final } will be recognized as a reserved word after any token.

So the former part say that ${ list ;} is closed either by ;} or \n}
This is not true the current implementation miss this, ;} or \n} is
never recognised as a closing for ${ list.

The later part say that ${ list ;} is closed either by a } following any
token, meaning the first } alone (as a word) is considered a close.

Bottom line

alias echo1=echo
alias echo2=echo
$ echo1  ${ echo2 word } } }
word } }

This is because the implementation close ${ list with the first } then
echo2 receive word as uniq arg, then echo1 receive arg } }
This is a 'lazy' } lookup for ${...

If we want to get the ${ list [;\n] } semantic we got to implement a
'greedy' } lookup, i.e all the } that are not a reserved word, i.e }
following [;\n] are then regular chars.

Because this is not compatible with the current implementation, and
because we want to keep the historical strange implementation, we got to
introduce a ksh93 option, when it is active we want the greedy lookup.

The option flag is called comsub_brace_greedy and is off by default,
i.e the test suite use the lazy implementation, and the greedy
implementation require specific tests.

Ksh93 option definition.
-----------------------
* cmd/ksh93/include/shell.h : Define SH_COMSUB_BRACE_GREEDY
* cmd/ksh93/data/options.c  : Enter "comsub_brace_greedy" into shtab_options[]

Greedy implementation
---------------------
* cmd/ksh93/include/shlex.h:
Add one more member in
struct _shlex_pvt_lexstate_
{ ...
  char inbracecomsub; /* 1 inside ${ ... [;\n]} bug-691: Phi: */
  ...
}
When ${ is scanned and SH_COMSUB_BRACE_GREEDY is on then
lp->lex.inbracecomsub=1;

Skipping over } in greedy mode
==============================
* cmd/ksh93/sh/lex.c:sh_lex()
                     ========
	while(1)
	{
		/* skip over characters in the current state */
		state = sh_lexstates[mode];
		do {
			n = STATE(state,c);
                        ....
                        switch(n)
                        case S_REG:
                        case S_BRACE:

For state ST_BEGIN and ST_NORM, when STATE return c=='}' we got
n==S_BRACE, we implement a gross hack here, if the '}' is not on the
first reserved word, we pretend we got a S_REG and accept the } as a
char, not a closing ${...} this is done with
                        n = STATE(state,c);
                        /*
                         * Bug-691: Phi gross hack
                         * Greedy ${list} will treat all '}' not on a reserved
                         * word (i.e after [;\n]) as regular char.
                         * Non greedy close ${ on first '}' (ATT code)
                         */
                        if( sh_isoption(SH_COMSUB_BRACE_GREEDY) &&
                            lp->lex.inbracecomsub && c=='}' &&
                            ( state==sh_lexstates[ ST_BEGIN] ||
                              state==sh_lexstates[ST_NORM]
                            )
                          )
                        {  n=S_REG ; /* treat '}' as reg char */
                        }

Setting inbracecomsub=1
=======================
* cmd/ksh93/sh/lex.c:comsub()
                     ========
        save = lp->lex;
        ...
        /* bug-691: Phi: */
        if( sh_isoption(SH_COMSUB_BRACE_GREEDY) && *cp=='{' )
        	lp->lex.inbracecomsub = 1;

Note inbracecomsub is set to 1 here, and reset at return time
	lp->lex = save; (See Resetting inbracecomsub)

* cmd/ksh93/sh/parse.c:sh_dolparen()
                       =============
	    case LBRACE:
		lp->lex.inbracecomsub=1; /* bug-691: Phi: */

Resetting inbracecomsub=0
=========================
* cmd/ksh93/sh/lex.c:comsub()
                     ========
A seen above the comsub() return time do reset inbracecomsub=0

* cmd/ksh93/sh/lex.c:sh_lex()
                     ========
			case S_NL:
                                /* bug-691: Phi: */
				if( sh_isoption(SH_COMSUB_BRACE_GREEDY) )
                                	lp->lex.inbracecomsub=0;
			case S_OP:
                       		/* bug-691: Phi: */
				if( sh_isoption(SH_COMSUB_BRACE_GREEDY) )
                                	lp->lex.inbracecomsub=0;
After a new line or ';' we are not in greedy mode any more next }
will close ${

New test
========
src/cmd/ksh93/tests/shtests:
                    =======
Added greedy_comsub.sh for testing greedy ${...{...}...[;\n]}
Test accouting fixed for greedy_comsub.sh
                 greedy_comsub.sh) grep -c '^T ' $i ;; # bug-691: Phi:
phidebian added 3 commits June 5, 2025 17:07
-----------------

The documentation documents 2 behaviors that are contradictory, and
obviously the implementation implements only one of those, and
unfortunately this is the worst one that is implemented.

The documnetation says

   Command Substitution.
       The  standard  output  from a command list enclosed in parentheses pre‐
       ceded by a dollar sign ( $(list) ), or in a brace group preceded  by  a
       dollar  sign  (  ${ list;} ), or in a pair of grave accents (``) may be
       used as part or all of a word; trailing new-lines are removed.  In  the
       second case, the { and } are treated as a reserved words so that { must
       be  followed  by a blank and } must appear at the beginning of the line
       or follow a ;.
       ...
       Except for the second form, the command list is run  in  a
       subshell  so  that  no side effects are possible.  For the second form,
       the final } will be recognized as a reserved word after any token.

So the former part say that ${ list ;} is closed either by ;} or \n}
This is not true the current implementation miss this, ;} or \n} is
never recognised as a closing for ${ list.

The later part say that ${ list ;} is closed either by a } following any
token, meaning the first } alone (as a word) is considered a close.

Bottom line

alias echo1=echo
alias echo2=echo
$ echo1  ${ echo2 word } } }
word } }

This is because the implementation close ${ list with the first } then
echo2 receive word as uniq arg, then echo1 receive arg } }
This is a 'lazy' } lookup for ${...

If we want to get the ${ list [;\n] } semantic we got to implement a
'greedy' } lookup, i.e all the } that are not a reserved word, i.e }
following [;\n] are then regular chars.

Because this is not compatible with the current implementation, and
because we want to keep the historical strange implementation, we got to
introduce a ksh93 option, when it is active we want the greedy lookup.

The option flag is called comsub_brace_greedy and is off by default,
i.e the test suite use the lazy implementation, and the greedy
implementation require specific tests.

Ksh93 option definition.
-----------------------
* cmd/ksh93/include/shell.h : Define SH_COMSUB_BRACE_GREEDY
* cmd/ksh93/data/options.c  : Enter "comsub_brace_greedy" into shtab_options[]

Greedy implementation
---------------------
* cmd/ksh93/include/shlex.h:
Add one more member in
struct _shlex_pvt_lexstate_
{ ...
  char inbracecomsub; /* 1 inside ${ ... [;\n]} bug-691: Phi: */
  ...
}
When ${ is scanned and SH_COMSUB_BRACE_GREEDY is on then
lp->lex.inbracecomsub=1;

Skipping over } in greedy mode
==============================
* cmd/ksh93/sh/lex.c:sh_lex()
                     ========
	while(1)
	{
		/* skip over characters in the current state */
		state = sh_lexstates[mode];
		do {
			n = STATE(state,c);
                        ....
                        switch(n)
                        case S_REG:
                        case S_BRACE:

For state ST_BEGIN and ST_NORM, when STATE return c=='}' we got
n==S_BRACE, we implement a gross hack here, if the '}' is not on the
first reserved word, we pretend we got a S_REG and accept the } as a
char, not a closing ${...} this is done with
                        n = STATE(state,c);
                        /*
                         * Bug-691: Phi gross hack
                         * Greedy ${list} will treat all '}' not on a reserved
                         * word (i.e after [;\n]) as regular char.
                         * Non greedy close ${ on first '}' (ATT code)
                         */
                        if( sh_isoption(SH_COMSUB_BRACE_GREEDY) &&
                            lp->lex.inbracecomsub && c=='}' &&
                            ( state==sh_lexstates[ ST_BEGIN] ||
                              state==sh_lexstates[ST_NORM]
                            )
                          )
                        {  n=S_REG ; /* treat '}' as reg char */
                        }

Setting inbracecomsub=1
=======================
* cmd/ksh93/sh/lex.c:comsub()
                     ========
        save = lp->lex;
        ...
        /* bug-691: Phi: */
        if( sh_isoption(SH_COMSUB_BRACE_GREEDY) && *cp=='{' )
        	lp->lex.inbracecomsub = 1;

Note inbracecomsub is set to 1 here, and reset at return time
	lp->lex = save; (See Resetting inbracecomsub)

* cmd/ksh93/sh/parse.c:sh_dolparen()
                       =============
	    case LBRACE:
		lp->lex.inbracecomsub=1; /* bug-691: Phi: */

Resetting inbracecomsub=0
=========================
* cmd/ksh93/sh/lex.c:comsub()
                     ========
A seen above the comsub() return time do reset inbracecomsub=0

* cmd/ksh93/sh/lex.c:sh_lex()
                     ========
			case S_NL:
                                /* bug-691: Phi: */
				if( sh_isoption(SH_COMSUB_BRACE_GREEDY) )
                                	lp->lex.inbracecomsub=0;
			case S_OP:
                       		/* bug-691: Phi: */
				if( sh_isoption(SH_COMSUB_BRACE_GREEDY) )
                                	lp->lex.inbracecomsub=0;
After a new line or ';' we are not in greedy mode any more next }
will close ${

New test
========
src/cmd/ksh93/tests/shtests:
                    =======
Added greedy_comsub.sh for testing greedy ${...{...}...[;\n]}
Test accouting fixed for greedy_comsub.sh
                 greedy_comsub.sh) grep -c '^T ' $i ;; # bug-691: Phi:
Make  comsub_brace_greedy available for shcomp as well.
In lex.c, for the shcomp context only add a set -o comsub_brace_greedy catch
and set sh.option accordingly so shcomp can proceed.
@phidebian phidebian changed the title Fixing ${ list; } bug-868: parser confusions with braces inside ${ cmd; } form of command substitution Jun 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant