HighDots Forums  

encoding of scripts

HTML Writing HTML for the Web (comp.infosystems.www.authoring.html)


#1
Andy Fish
 
encoding of scripts - 06-02-2008, 06:41 AM






Hi,

Using HTML 4.01 (not XHTML), I have recently discovered that this:

<script>var x='</script>';</script>

is not valid HTML: the end script tag inside the quotes causes the parser
to stop recognising the script. Initially my reaction was that this is not
a surprise, because I had failed to HTML-encode the script contents, so my
second attempt was this:

<script>var x='&lt;/script&gt;';</script>

However, this DOES NOT WORK: the variable ends up containing the text
"&lt;/script&gt;"

Can someone point me at the part of the W3C specification that states how
script tags are parsed differently from other tags in HTML?

Interestingly, I have also discovered that this:

<script>if (3<5);</script>

IS valid HTML (and seems even to be valid XHTML) even though it is not
valid XML.

Andy



#2
Erwin Moller
 
Re: encoding of scripts - 06-02-2008, 07:20 AM






Andy Fish wrote:
Quote:
Hi,

using HTML 4.01 (not xhtml), I have recently discovered that this:

<script>var x='</script>';</script>

is not valid HTML - the fact that there is an end script tag in quotes
causes the parser to stop recognising the script. initially my reaction was
that this is not a surprise because I had failed to HTML encode the script
contents, so my second attempt was this:

<script>var x='&lt;/script&gt;';</script>

however this DOES NOT WORK - the variable ends up containing the text
"&lt;/script&gt;"

can someone point me at part of the w3c specification that states how script
tags are parsed differently to other tags in HTML.

interestingly i have also discovered that this:

<script>if (3<5);</script>

IS valid html (and seems even to be valid XHTML) even though it is not valid
XML

Andy


What about:

<script>var x='<\/script>';</script>
?
Mind the added \ before the /
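
A minimal sketch of why the backslash helps, assuming the script language is JavaScript: inside a string literal, \/ is just another spelling of /, so the variable gets the same text while the HTML source no longer contains the end-tag sequence. The variable names are only illustrative.

```javascript
// Inside a JavaScript string literal, "\/" evaluates to a plain "/".
var escaped = '<\/script>';

// Build the same text from character codes (60 is "<", 47 is "/")
// so the comparison does not depend on the literal under test.
var expected = String.fromCharCode(60, 47) + 'script>';

console.log(escaped === expected); // true
console.log(escaped.length);       // 9
```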

Regards,
Erwin Moller


#3
viza
 
Re: encoding of scripts - 06-02-2008, 09:27 AM



On Jun 2, 12:41 pm, "Andy Fish" <ajf... (AT) blueyonder (DOT) co.uk> wrote:
Quote:
can someone point me at part of the w3c specification that states how script
tags are parsed differently to other tags in HTML.
http://www.w3.org/TR/html4/sgml/dtd.html#Script :

<!ENTITY % Script "CDATA" -- script expression -->

http://www.w3.org/TR/html4/sgml/dtd.html#head.content

<!ELEMENT SCRIPT - - %Script; -- script statements -->

Quote:
interestingly i have also discovered that this:

<script>if (3<5);</script>

IS valid html
Apart from the missing required "type" attribute, yes. The content
type of the script element in HTML4 is CDATA, which means everything
up to the first occurrence of </ is read as-is.
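
The rule described above can be sketched roughly as follows. This is only an illustration of "everything up to the first </ is content", not how any particular browser is implemented (real browsers tend to look for </script specifically), and scriptContentSeenByParser is a made-up name.

```javascript
// Hypothetical helper: returns the part of `source` that would be
// treated as script content under the rule above, i.e. everything
// before the first "</", regardless of quotes.
function scriptContentSeenByParser(source) {
  var end = source.indexOf('</');
  return end === -1 ? source : source.slice(0, end);
}

var broken = "var x='</script>';";
var safe = "var x='<\\/script>';"; // the script text contains <\/ here
var comparison = "if (3<5);";

console.log(scriptContentSeenByParser(broken));                    // var x='
console.log(scriptContentSeenByParser(safe) === safe);             // true
console.log(scriptContentSeenByParser(comparison) === comparison); // true
```

The first case shows why the original script breaks: the content is cut off in the middle of the string literal, and the quotes play no part in the decision.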

Quote:
(and seems even to be valid XHTML) even though it is not valid XML
This is not possible since XHTML is XML.

The content type of the script element in XHTML1 is PCDATA, which means
that your original idea of using
var x='&lt;foo&gt;'

means the same as
var x='<foo>'

in a raw JavaScript file. Note that this doesn't actually work "in
the wild", because most users have broken browsers (e.g. IE).
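
As a rough illustration of the PCDATA case, the sketch below mimics what an XML parser does to the five predefined entities before the script engine sees the text. decodeXmlEntities is a hypothetical helper written for this example; it ignores numeric character references and CDATA sections.

```javascript
// Hypothetical helper: replaces the five predefined XML entities in a
// single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
function decodeXmlEntities(text) {
  var entities = { lt: '<', gt: '>', amp: '&', quot: '"', apos: "'" };
  return text.replace(/&(lt|gt|amp|quot|apos);/g, function (match, name) {
    return entities[name];
  });
}

var xhtmlSource = "var x='&lt;foo&gt;';";
console.log(decodeXmlEntities(xhtmlSource)); // var x='<foo>';
```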

The best thing to do is to never ever have anything in your script
elements and only include scripts in separate files.

HTH
viza


#4
Andreas Prilop
 
Re: encoding of scripts - 06-02-2008, 09:34 AM



On Mon, 2 Jun 2008, Andy Fish wrote:

Quote:
Newsgroups: comp.infosystems.www.authoring.html
In how many newsgroups did you multipost?


#5
Jukka K. Korpela
 
Re: encoding of scripts - 06-02-2008, 09:45 AM



Andy Fish wrote:

Quote:
using HTML 4.01 (not xhtml), I have recently discovered that this:

<script>var x='</script>';</script>

is not valid HTML - the fact that there is an end script tag in quotes
causes the parser to stop recognising the script.
The fact that there is an end tag causes that. Quotes do not matter.
They are just data characters in this context.

Quote:
<script>var x='&lt;/script&gt;';</script>

however this DOES NOT WORK - the variable ends up containing the
text "&lt;/script&gt;"
By HTML 4.01 rules, yes. There the content model is CDATA, which means
that entity references are not recognized, and "&" is just a data
character.

Quote:
can someone point me at part of the w3c specification that states how
script tags are parsed differently to other tags in HTML.
They aren't. The _content_ of the <script> _element_ is special. This
can be found in the HTML 4.01 specs simply by looking at the description
of that element; it points to
http://www.w3.org/TR/html401/types.html#type-script
which refers to an appendix that explains ways to overcome the "</"
problem, such as prefixing "/" with "\" in JavaScript. In JavaScript,
you could also write
var x='<'+'/script>';
but that looks a bit more hackish.

Quote:
interestingly i have also discovered that this:

<script>if (3<5);</script>

IS valid html
No it isn't, but that's due to the lack of the type="..." attribute. If
you fix that, then it is valid. That's because the digit "5" isn't a
name start character.

Quote:
(and seems even to be valid XHTML)
It isn't valid in XHTML, since by XHTML rules, "<" must not appear in
any context as such except as the starting character of a tag.

In XHTML, the content model of <script> is #PCDATA, so _there_ you could
use &lt; to stand for "<". But it's not wise to use XHTML as the
delivery format of a web page, because IE does not support XHTML.

Quote:
even though it is not valid XML
It would be impossible for a document to be non-valid XML if it is valid
XHTML. This immediately follows from the _definition_ of validity.

There is a simple way to get rid of such complexities: write your script
into an external file and refer to it via <script type="text/javascript"
src="foo.js"></script>.

--
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/



#6
Andy Fish
 
Re: encoding of scripts - 06-02-2008, 10:58 AM



Thanks for all the replies. I understand it all now.

Unfortunately I can't write all my scripts in separate JS files, because
this is all JavaScript that I'm generating on the fly on the server, but I
have amended my quoting/encoding functions to detect '</' and split it into
two concatenated strings.

:-)
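
A quoting function of the kind described might look something like the sketch below. The thread does not show the actual server-side code or say which language it is in, so toInlineScriptLiteral and its escaping rules are assumptions for illustration only, written in JavaScript.

```javascript
// Hypothetical helper: turns a string value into JavaScript source for
// a single-quoted string expression that contains no "</" sequence.
function toInlineScriptLiteral(value) {
  var escaped = value
    .replace(/\\/g, '\\\\')    // backslashes first, so later escapes survive
    .replace(/'/g, "\\'")      // the quote character used by the literal
    .replace(/\r/g, '\\r')
    .replace(/\n/g, '\\n')
    .replace(/<\//g, "<'+'/"); // split "</" into two concatenated strings
  return "'" + escaped + "'";
}

var source = toInlineScriptLiteral('</script>');
console.log(source);                      // '<'+'/script>'
console.log(source.indexOf('</') === -1); // true
```

The result is a concatenation expression rather than a single literal, so it is worth wrapping it in parentheses when it is emitted next to other operators. This sketch does not handle every character that can be awkward in generated script (for example U+2028 and U+2029).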



