#11
Csaba Gabor wrote:
[...] you will have to account for strings and regular expressions such as:

    var code = "var messy='it was windy/*sunny*'+" and */cold/*"
    ^ ^ ^

The concatenation here is rather pointless.

[...] is not syntactically correct to begin with, which also shows that there is no regular expression here.
#12
I'm looking for a

    function stripEndComments(code) {
      // remove trailing comments and whitespace from
      /* the end of code, which is presumed to be valid
      // javascript */
      ...
    }
#13
On Nov 4, 6:59 pm, abozhilov <fort... (AT) gmail (DOT) com> wrote:
On 4 Nov, 13:51, Csaba Gabor <dans... (AT) gmail (DOT) com> wrote:
I'm looking for a

    function stripEndComments(code) {
      // remove trailing comments and whitespace from
      /* the end of code, which is presumed to be valid
      // javascript */
      ...
    }

You might be able to figure out a way to do this with regular expressions, but I'm thinking that it will be VERY messy because you will have to account for strings and regular expressions such as:

    var code = "var messy='it was windy/*sunny*'+" and */cold/*"
#14
On Nov 4, 9:06 pm, Csaba Gabor <dans... (AT) gmail (DOT) com> wrote:
You might be able to figure out a way to do this with regular expressions, but I'm thinking that it will be VERY messy because you will have to account for strings and regular expressions such as:

    var code = "var messy='it was windy/*sunny*'+" and */cold/*"

Oops, I see I've made a transcription error. It should read:

    var code = "var messy='it was windy/*sunny*'+' and */cold/*'"

But the following may be slightly more interesting:

    var code = "var mess='it\\'s windy//*sunny*'+' & */cold/*' //asdf"
#15
Csaba Gabor wrote:
abozhilov wrote:
Csaba Gabor wrote:

      // remove trailing comments and whitespace from
      /* the end of code, which is presumed to be valid
      // javascript */
      ...
    }

....

How fortunate then that you don't know what you are talking about. It is rather easy to do if you do it properly. For example:

    code = code.replace(
      /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
      function(m, p1, p2, p3, p4) {
        return (p3 || p4) ? "" : m;
      });
#16
Thomas 'PointedEars' Lahn <PointedE... (AT) web (DOT) de> writes:
Csaba Gabor wrote:
abozhilov wrote:
Csaba Gabor wrote:

      // remove trailing comments and whitespace from
      /* the end of code, which is presumed to be valid
      // javascript */
      ...
    }

...

How fortunate then that you don't know what you are talking about. It is rather easy to do if you do it properly. For example:

    code = code.replace(
      /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
      function(m, p1, p2, p3, p4) {
        return (p3 || p4) ? "" : m;
      });

The ('(?:[^']|\\')*') part fails to recognize the end of the following string literal:

    'foo \\'

and will match up to the next "'". Ditto for double-quoted strings. Try

    ('(?:[^'\\]|\\[^])*')

(Here I'm also allowing backslash-newline in string literals, even though it's not in the standard, otherwise replace "[^]" with ".").
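As a rough check of the two string patterns, the sketch below runs both against the escaped-apostrophe test string posted earlier in the thread. The names `originalRe`, `fixedRe`, `strip` and `sample` are made up for this example, and the double-quoted half of `fixedRe` is an assumption that simply mirrors the suggested single-quoted pattern.

```javascript
// Pattern as posted, and the same pattern with the suggested string part.
var originalRe = /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm;
var fixedRe = /('(?:[^'\\]|\\[^])*')|("(?:[^"\\]|\\[^])*")|(\/\/.*)|(\s+$)/gm;

// Same callback as in the thread: drop // comments and trailing whitespace,
// keep anything recognized as a string literal.
function strip(code, re) {
  return code.replace(re, function (m, p1, p2, p3, p4) {
    return (p3 || p4) ? "" : m;
  });
}

// Test string from the thread; its value is:
//   var mess='it\'s windy//*sunny*'+' & */cold/*' //asdf
var sample = "var mess='it\\'s windy//*sunny*'+' & */cold/*' //asdf";

var withOriginal = strip(sample, originalRe);
// var mess='it\'s windy
// [^'] consumes the backslash, so the "string" ends at the escaped apostrophe
// and the // inside the real string is then treated as a comment.

var withFixed = strip(sample, fixedRe);
// var mess='it\'s windy//*sunny*'+' & */cold/*'
// (followed by one space: the blank before //asdf is not at end of line
// when it is scanned, so it is left in place)
```

With the posted pattern, the trouble in this sample comes from an escaped apostrophe rather than from a doubled backslash, because the `[^']` alternative is tried first and happily consumes a lone backslash.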
And it's easy to add standard (not-single-line) comments as well:

    (\/\*(?:[^*]*\*+)*\/)
This only works in the absence of regexp literals. RegExps are harder to recognize, because it's the syntactic starting point that distinguishes the starting slash from a division. E.g.,

    /foo + 42/g

might be a RegExp, if occurring in an expression context, but not if it occurs where an operator is expected:

    bar/foo + 42/g

(I.e., it's not tokenizable without context information). And if you can't recognize regexps, you can mess up the recognition of comments and strings as well.
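The ambiguity can be seen directly by putting the same characters in both contexts. The sketch below also feeds a regexp literal containing an apostrophe to the string/comment pattern from the thread; the variable names and the `src` sample are invented for illustration.

```javascript
var foo = 2, g = 4, bar = 100;

// Expression context: the slashes delimit a RegExp literal with the g flag.
var asRegExp = /foo + 42/g;

// Operator context: the same characters are two divisions, 100/2 + 42/4.
var asNumber = bar/foo + 42/g;

// A pattern that knows about strings and // comments, but not about regexp
// literals (taken from the thread).
var stripRe = /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm;

// Hypothetical input: the regexp literal /'/ contains an apostrophe.
var src = "var re = /'/; // it's a comment";
var out = src.replace(stripRe, function (m, p1, p2, p3, p4) {
  return (p3 || p4) ? "" : m;
});
// out is identical to src: the text '/; // it' is taken for a string literal,
// which hides the start of the comment, so nothing is removed.
```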
*\s*$/,"");
#17
On Nov 5, 7:19 am, Lasse Reichstein Nielsen <lrn.unr... (AT) gmail (DOT) com> wrote:
Thomas 'PointedEars' Lahn <PointedE... (AT) web (DOT) de> writes:
Csaba Gabor wrote:
abozhilov wrote:
Csaba Gabor wrote:

      // remove trailing comments and whitespace from
      /* the end of code, which is presumed to be valid
      // javascript */
      ...
    }

My solution to the 'Remove trailing comments' exercise follows. My reason for posing the exercise was to highlight that, in the best spirit of programming, one may use the browser's syntax checking capabilities to do the heavy lifting, rather than having to parse the entire code string manually.

Reminder: I only want to remove the final comments at the end of the code, and not at the end of each line. In short, I want to be able to get at the last code that actually "does something" (or might be doing something). After getting rid of trailing whitespace and vacuous lines, we consider that there are exactly three situations. The final characters are either:
1) Part of a comment started by //
2) The end of a comment started by /*
3) Not a comment
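The poster's own solution is not shown in this part of the thread. What follows is only a minimal sketch of the general idea for situation 1, assuming that `new Function(source)` is an acceptable stand-in for "the browser's syntax checking" (it compiles the source as a function body without running it). The helper names `parses` and `stripTrailingLineComment` are hypothetical, and situation 2 is not handled.

```javascript
// Hypothetical helper: true if `source` compiles as a function body.
// Assumes no module syntax (import/export) and no policy blocking Function().
function parses(source) {
  try {
    new Function(source); // compiled, never called
    return true;
  } catch (e) {
    return false;
  }
}

// Sketch for situation 1 only: drop a // comment that ends the last line.
// Each "//" on the last line is a candidate; the first one whose preceding
// text still parses is taken to be the real start of the comment.
function stripTrailingLineComment(code) {
  code = code.replace(/\s+$/, "");
  var from = code.lastIndexOf("\n") + 1; // start of the last line
  var at;
  while ((at = code.indexOf("//", from)) !== -1) {
    var head = code.slice(0, at);
    if (parses(head)) {
      return head.replace(/\s+$/, "");
    }
    from = at + 1; // this "//" sits inside a string, regexp or block comment
  }
  return code; // situation 2 or 3: left unchanged by this sketch
}

var a = stripTrailingLineComment("var a = 1; // done  ");
var b = stripTrailingLineComment('var s = "a//b";');
var c = stripTrailingLineComment('var s = "a//b"; // c');
```

Here `a` is `var a = 1;`, `b` is returned unchanged because cutting at the `//` leaves an unterminated string, and `c` is `var s = "a//b";`.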
#18
Very interesting. I've not seen that [^] construct in javascript before. With a PHP regular expression if ] is the first character following the ^ in a character class, it means to exclude the right closing bracket ]. Evidently, PHP's [^]] translates to [^\]] in JS
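A few quick checks of how JavaScript reads these classes, as a sketch without the `u` flag; the variable names are invented for the example.

```javascript
var anyChar = /[^]/;         // negated empty class: matches any character
var notBracket = /[^\]]/;    // any character except "]"
var anyThenBracket = /[^]]/; // JS reads this as [^] followed by a literal "]"

var t1 = anyChar.test("\n");         // true
var t2 = /./.test("\n");             // false: "." skips line terminators
var t3 = notBracket.test("]");       // false
var t4 = anyThenBracket.test("a]");  // true
var t5 = anyThenBracket.test("a");   // false: a second character "]" is required
```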
And it's easy to add standard (not-single-line) comments as well:

    (\/\*(?:[^*]*\*+)*\/)

Or:

    (\/\*.*?(?=\*\/)..)

though I have not extensively tested it
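Since neither pattern was tested extensively, here is a small comparison on two made-up inputs. `lazyAny` is an alternative added purely for comparison and does not come from the thread; all variable names are illustrative.

```javascript
var greedyStars = /\/\*(?:[^*]*\*+)*\//; // first pattern from the thread
var lazyDot = /\/\*.*?(?=\*\/)../;       // second pattern from the thread
var lazyAny = /\/\*[^]*?\*\//;           // comparison only, not from the thread

var twoComments = "/* a */ x /* b */";
var multiLine = "/* a\nb */ y";

var r1 = greedyStars.exec(twoComments)[0];
// "/* a */ x /* b */": [^*]* may consume the "/" that closes the first
// comment, so the match runs on to the last "*/" and swallows the code.

var r2 = lazyDot.exec(twoComments)[0];
// "/* a */"

var r3 = lazyDot.exec(multiLine);
// null: "." does not match the newline inside the comment.

var r4 = lazyAny.exec(twoComments)[0];
// "/* a */"

var r5 = lazyAny.exec(multiLine)[0];
// "/* a\nb */"
```

Like the originals, none of these knows about strings or regexp literals, so they are only meaningful as one alternative inside a larger pattern.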
Reminder: I only want to remove the final comments at the end of the code, and not at the end of each line. In short, I want to be able to get at the last code that actually "does something" (or might be doing something). After getting rid of trailing whitespace and vacuous lines, we consider that there are exactly three situations. The final characters are either:
1) Part of a comment started by //
2) The end of a comment started by /*
3) Not a comment
#19
Thomas 'PointedEars' Lahn <PointedEars (AT) web (DOT) de> writes:
Csaba Gabor wrote:
abozhilov wrote:
Csaba Gabor wrote:

      // remove trailing comments and whitespace from
      /* the end of code, which is presumed to be valid
      // javascript */
      ...
    }

...

How fortunate then that you don't know what you are talking about. It is rather easy to do if you do it properly. For example:

    code = code.replace(
      /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
      function(m, p1, p2, p3, p4) {
        return (p3 || p4) ? "" : m;
      });

The ('(?:[^']|\\')*') part fails to recognize the end of the following string literal:

    'foo \\'

and will match up to the next "'". Ditto for double-quoted strings.
[...] And it's easy to add standard (not-single-line) comments as well:

    (\/\*(?:[^*]*\*+)*\/)

This only works in the absence of regexp literals. RegExps are harder to recognize, because it's the syntactic starting point that distinguishes the starting slash from a division. E.g.,

    /foo + 42/g

might be a RegExp, if occurring in an expression context, but not if it occurs where an operator is expected:

    bar/foo + 42/g

(I.e., it's not tokenizable without context information). And if you can't recognize regexps, you can mess up the recognition of comments and strings as well.
#20
Lasse Reichstein Nielsen wrote:
The ('(?:[^']|\\')*') part fails to recognize the end of the following string literal:

    'foo \\'

and will match up to the next "'". Ditto for double-quoted strings.

Not here (Iceweasel 3.5.4, JavaScript 1.8.1). Have you used "'foo \\'" or "'foo \\\\'" for the test? Because the latter is the representation of 'foo \\' in a string value, while "'foo \\'" as a string value represents the syntactically invalid 'foo \' (which is why it must be matched up to the next apostrophe to be a string literal).
    /* 'foo \\' */
    var code = "'foo \\\\' '";

    /* ["'foo \\'", "'foo \\'"] */
    /('(?:[^']|\\')*')/.exec(code)

If I am overlooking something, can you explain why the recognition of this string literal should fail?
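For reference, a small side-by-side run of the posted pattern and the suggested replacement. The second input, with an escaped apostrophe, is an extra case added for this sketch and was not part of the test above; the variable names are invented.

```javascript
var posted = /('(?:[^']|\\')*')/;
var suggested = /('(?:[^'\\]|\\[^])*')/;

var backslashes = "'foo \\\\' '"; // source text: 'foo \\' '
var escapedQuote = "'it\\'s' '";   // source text: 'it\'s' '

var p1 = posted.exec(backslashes)[0];     // 'foo \\'
var s1 = suggested.exec(backslashes)[0];  // 'foo \\'

var p2 = posted.exec(escapedQuote)[0];    // 'it\'
var s2 = suggested.exec(escapedQuote)[0]; // 'it\'s'
```

On the doubled-backslash input both patterns stop at the same apostrophe, which agrees with the result reported above. They differ on the escaped apostrophe, where `[^']` in the posted pattern consumes the backslash before the `\\'` alternative is ever tried.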
Thank you. I am working on an ECMAScript-compliant source code parser and you have given me quite something to think about.


