-
Notifications
You must be signed in to change notification settings - Fork 36
bug-868: parser confusions with braces inside ${ cmd; } form of command substitution #868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
phidebian
wants to merge
4
commits into
ksh93:dev
Choose a base branch
from
phidebian:bug-691
base: dev
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
----------------- The documentation documents 2 behaviors that are contradictory, and obviously the implementation implements only one of those, and unfortunately this is the worst one that is implemented. The documnetation says Command Substitution. The standard output from a command list enclosed in parentheses pre‐ ceded by a dollar sign ( $(list) ), or in a brace group preceded by a dollar sign ( ${ list;} ), or in a pair of grave accents (``) may be used as part or all of a word; trailing new-lines are removed. In the second case, the { and } are treated as a reserved words so that { must be followed by a blank and } must appear at the beginning of the line or follow a ;. ... Except for the second form, the command list is run in a subshell so that no side effects are possible. For the second form, the final } will be recognized as a reserved word after any token. So the former part say that ${ list ;} is closed either by ;} or \n} This is not true the current implementation miss this, ;} or \n} is never recognised as a closing for ${ list. The later part say that ${ list ;} is closed either by a } following any token, meaning the first } alone (as a word) is considered a close. Bottom line alias echo1=echo alias echo2=echo $ echo1 ${ echo2 word } } } word } } This is because the implementation close ${ list with the first } then echo2 receive word as uniq arg, then echo1 receive arg } } This is a 'lazy' } lookup for ${... If we want to get the ${ list [;\n] } semantic we got to implement a 'greedy' } lookup, i.e all the } that are not a reserved word, i.e } following [;\n] are then regular chars. Because this is not compatible with the current implementation, and because we want to keep the historical strange implementation, we got to introduce a ksh93 option, when it is active we want the greedy lookup. The option flag is called comsub_brace_greedy and is off by default, i.e the test suite use the lazy implementation, and the greedy implementation require specific tests. Ksh93 option definition. ----------------------- * cmd/ksh93/include/shell.h : Define SH_COMSUB_BRACE_GREEDY * cmd/ksh93/data/options.c : Enter "comsub_brace_greedy" into shtab_options[] Greedy implementation --------------------- * cmd/ksh93/include/shlex.h: Add one more member in struct _shlex_pvt_lexstate_ { ... char inbracecomsub; /* 1 inside ${ ... [;\n]} bug-691: Phi: */ ... } When ${ is scanned and SH_COMSUB_BRACE_GREEDY is on then lp->lex.inbracecomsub=1; Skipping over } in greedy mode ============================== * cmd/ksh93/sh/lex.c:sh_lex() ======== while(1) { /* skip over characters in the current state */ state = sh_lexstates[mode]; do { n = STATE(state,c); .... switch(n) case S_REG: case S_BRACE: For state ST_BEGIN and ST_NORM, when STATE return c=='}' we got n==S_BRACE, we implement a gross hack here, if the '}' is not on the first reserved word, we pretend we got a S_REG and accept the } as a char, not a closing ${...} this is done with n = STATE(state,c); /* * Bug-691: Phi gross hack * Greedy ${list} will treat all '}' not on a reserved * word (i.e after [;\n]) as regular char. * Non greedy close ${ on first '}' (ATT code) */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) && lp->lex.inbracecomsub && c=='}' && ( state==sh_lexstates[ ST_BEGIN] || state==sh_lexstates[ST_NORM] ) ) { n=S_REG ; /* treat '}' as reg char */ } Setting inbracecomsub=1 ======================= * cmd/ksh93/sh/lex.c:comsub() ======== save = lp->lex; ... /* bug-691: Phi: */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) && *cp=='{' ) lp->lex.inbracecomsub = 1; Note inbracecomsub is set to 1 here, and reset at return time lp->lex = save; (See Resetting inbracecomsub) * cmd/ksh93/sh/parse.c:sh_dolparen() ============= case LBRACE: lp->lex.inbracecomsub=1; /* bug-691: Phi: */ Resetting inbracecomsub=0 ========================= * cmd/ksh93/sh/lex.c:comsub() ======== A seen above the comsub() return time do reset inbracecomsub=0 * cmd/ksh93/sh/lex.c:sh_lex() ======== case S_NL: /* bug-691: Phi: */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) ) lp->lex.inbracecomsub=0; case S_OP: /* bug-691: Phi: */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) ) lp->lex.inbracecomsub=0; After a new line or ';' we are not in greedy mode any more next } will close ${ New test ======== src/cmd/ksh93/tests/shtests: ======= Added greedy_comsub.sh for testing greedy ${...{...}...[;\n]} Test accouting fixed for greedy_comsub.sh greedy_comsub.sh) grep -c '^T ' $i ;; # bug-691: Phi:
----------------- The documentation documents 2 behaviors that are contradictory, and obviously the implementation implements only one of those, and unfortunately this is the worst one that is implemented. The documnetation says Command Substitution. The standard output from a command list enclosed in parentheses pre‐ ceded by a dollar sign ( $(list) ), or in a brace group preceded by a dollar sign ( ${ list;} ), or in a pair of grave accents (``) may be used as part or all of a word; trailing new-lines are removed. In the second case, the { and } are treated as a reserved words so that { must be followed by a blank and } must appear at the beginning of the line or follow a ;. ... Except for the second form, the command list is run in a subshell so that no side effects are possible. For the second form, the final } will be recognized as a reserved word after any token. So the former part say that ${ list ;} is closed either by ;} or \n} This is not true the current implementation miss this, ;} or \n} is never recognised as a closing for ${ list. The later part say that ${ list ;} is closed either by a } following any token, meaning the first } alone (as a word) is considered a close. Bottom line alias echo1=echo alias echo2=echo $ echo1 ${ echo2 word } } } word } } This is because the implementation close ${ list with the first } then echo2 receive word as uniq arg, then echo1 receive arg } } This is a 'lazy' } lookup for ${... If we want to get the ${ list [;\n] } semantic we got to implement a 'greedy' } lookup, i.e all the } that are not a reserved word, i.e } following [;\n] are then regular chars. Because this is not compatible with the current implementation, and because we want to keep the historical strange implementation, we got to introduce a ksh93 option, when it is active we want the greedy lookup. The option flag is called comsub_brace_greedy and is off by default, i.e the test suite use the lazy implementation, and the greedy implementation require specific tests. Ksh93 option definition. ----------------------- * cmd/ksh93/include/shell.h : Define SH_COMSUB_BRACE_GREEDY * cmd/ksh93/data/options.c : Enter "comsub_brace_greedy" into shtab_options[] Greedy implementation --------------------- * cmd/ksh93/include/shlex.h: Add one more member in struct _shlex_pvt_lexstate_ { ... char inbracecomsub; /* 1 inside ${ ... [;\n]} bug-691: Phi: */ ... } When ${ is scanned and SH_COMSUB_BRACE_GREEDY is on then lp->lex.inbracecomsub=1; Skipping over } in greedy mode ============================== * cmd/ksh93/sh/lex.c:sh_lex() ======== while(1) { /* skip over characters in the current state */ state = sh_lexstates[mode]; do { n = STATE(state,c); .... switch(n) case S_REG: case S_BRACE: For state ST_BEGIN and ST_NORM, when STATE return c=='}' we got n==S_BRACE, we implement a gross hack here, if the '}' is not on the first reserved word, we pretend we got a S_REG and accept the } as a char, not a closing ${...} this is done with n = STATE(state,c); /* * Bug-691: Phi gross hack * Greedy ${list} will treat all '}' not on a reserved * word (i.e after [;\n]) as regular char. * Non greedy close ${ on first '}' (ATT code) */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) && lp->lex.inbracecomsub && c=='}' && ( state==sh_lexstates[ ST_BEGIN] || state==sh_lexstates[ST_NORM] ) ) { n=S_REG ; /* treat '}' as reg char */ } Setting inbracecomsub=1 ======================= * cmd/ksh93/sh/lex.c:comsub() ======== save = lp->lex; ... /* bug-691: Phi: */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) && *cp=='{' ) lp->lex.inbracecomsub = 1; Note inbracecomsub is set to 1 here, and reset at return time lp->lex = save; (See Resetting inbracecomsub) * cmd/ksh93/sh/parse.c:sh_dolparen() ============= case LBRACE: lp->lex.inbracecomsub=1; /* bug-691: Phi: */ Resetting inbracecomsub=0 ========================= * cmd/ksh93/sh/lex.c:comsub() ======== A seen above the comsub() return time do reset inbracecomsub=0 * cmd/ksh93/sh/lex.c:sh_lex() ======== case S_NL: /* bug-691: Phi: */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) ) lp->lex.inbracecomsub=0; case S_OP: /* bug-691: Phi: */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) ) lp->lex.inbracecomsub=0; After a new line or ';' we are not in greedy mode any more next } will close ${ New test ======== src/cmd/ksh93/tests/shtests: ======= Added greedy_comsub.sh for testing greedy ${...{...}...[;\n]} Test accouting fixed for greedy_comsub.sh greedy_comsub.sh) grep -c '^T ' $i ;; # bug-691: Phi:
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The documentation documents 2 behaviors that are contradictory, and obviously the implementation implements only one of those, and unfortunately this is the worst one that is implemented.
The documnetation says
Command Substitution.
The standard output from a command list enclosed in parentheses pre‐
ceded by a dollar sign ( $(list) ), or in a brace group preceded by a
dollar sign ( ${ list;} ), or in a pair of grave accents (``) may be
used as part or all of a word; trailing new-lines are removed. In the
second case, the { and } are treated as a reserved words so that { must
be followed by a blank and } must appear at the beginning of the line
or follow a ;.
...
Except for the second form, the command list is run in a
subshell so that no side effects are possible. For the second form,
the final } will be recognized as a reserved word after any token.
So the former part say that ${ list ;} is closed either by ;} or \n} This is not true the current implementation miss this, ;} or \n} is never recognised as a closing for ${ list.
The later part say that ${ list ;} is closed either by a } following any token, meaning the first } alone (as a word) is considered a close.
Bottom line
alias echo1=echo
alias echo2=echo
$ echo1 ${ echo2 word } } }
word } }
This is because the implementation close ${ list with the first } then echo2 receive word as uniq arg, then echo1 receive arg } } This is a 'lazy' } lookup for ${...
If we want to get the ${ list [;\n] } semantic we got to implement a 'greedy' } lookup, i.e all the } that are not a reserved word, i.e } following [;\n] are then regular chars.
Because this is not compatible with the current implementation, and because we want to keep the historical strange implementation, we got to introduce a ksh93 option, when it is active we want the greedy lookup.
The option flag is called comsub_brace_greedy and is off by default, i.e the test suite use the lazy implementation, and the greedy implementation require specific tests.
Ksh93 option definition.
Greedy implementation
struct shlex_pvt_lexstate
{ ...
char inbracecomsub; /* 1 inside ${ ... [;\n]} bug-691: Phi: */ ... }
When ${ is scanned and SH_COMSUB_BRACE_GREEDY is on then lp->lex.inbracecomsub=1;
Skipping over } in greedy mode
For state ST_BEGIN and ST_NORM, when STATE return c=='}' we got n==S_BRACE, we implement a gross hack here, if the '}' is not on the first reserved word, we pretend we got a S_REG and accept the } as a char, not a closing ${...} this is done with
n = STATE(state,c);
/*
* Bug-691: Phi gross hack
* Greedy ${list} will treat all '}' not on a reserved * word (i.e after [;\n]) as regular char. * Non greedy close ${ on first '}' (ATT code) / if( sh_isoption(SH_COMSUB_BRACE_GREEDY) && lp->lex.inbracecomsub && c=='}' && ( state==sh_lexstates[ ST_BEGIN] || state==sh_lexstates[ST_NORM] ) ) { n=S_REG ; / treat '}' as reg char */ }
Setting inbracecomsub=1
Note inbracecomsub is set to 1 here, and reset at return time
lp->lex = save; (See Resetting inbracecomsub)
Resetting inbracecomsub=0
cmd/ksh93/sh/lex.c:comsub() ======== A seen above the comsub() return time do reset inbracecomsub=0
cmd/ksh93/sh/lex.c:sh_lex() ======== case S_NL: /* bug-691: Phi: / if( sh_isoption(SH_COMSUB_BRACE_GREEDY) ) lp->lex.inbracecomsub=0; case S_OP: / bug-691: Phi: */ if( sh_isoption(SH_COMSUB_BRACE_GREEDY) ) lp->lex.inbracecomsub=0; After a new line or ';' we are not in greedy mode any more next } will close ${
New test
src/cmd/ksh93/tests/shtests:
=======
Added greedy_comsub.sh for testing greedy ${...{...}...[;\n]} Test accouting fixed for greedy_comsub.sh
greedy_comsub.sh) grep -c '^T ' $i ;; # bug-691: Phi: