HighDots Forums  

string length and newlines

JavaScript language (comp.lang.javascript)


#11 David Mark
Re: string length and newlines - 01-14-2008, 06:22 AM

On Jan 14, 4:59 am, Bart Van der Donck <b... (AT) nijlen (DOT) com> wrote:
Quote:
David Mark wrote:
On Jan 13, 6:11 pm, Bart Van der Donck <b... (AT) nijlen (DOT) com> wrote:

It is the browser itself who silently converts \n (or \r) into
\r\n, before the data is sent to the server. The script at the
server only reads out what was offered.

But the database should store in a predetermined canonical form,
regardless of what the browser says. Whether that is \n, \n\r or \r
is up to the DBA.

You probably mean '\r\n' instead of '\n\r'.

Yes. CRLF.

Quote:
I would say that it's rather up to the operating system. I haven't seen
a case where the DBA interferes with these OS settings when it comes to
_storing_ data.

From http://en.wikipedia.org/wiki/Newline:
\n: Multics, Unix and Unix-like systems (GNU/Linux, AIX, Xenix, Mac OS
X, etc.), BeOS, Amiga, RISC OS, and others
\r\n: DEC RT-11 and most other early non-Unix, non-IBM OSes, CP/M,
MP/M, DOS, OS/2, Microsoft Windows
\r: Commodore machines, Apple II family and Mac OS up to version 9

http://www.rfc-editor.org/EOLstory.txt says:
| ASCII text (ed.: like percent-encoded form-data) transmitted across
| the network *must* use the two-character sequence: CR LF (ed.: \r\n).

I don't agree with your suggestion to store end-of-line characters as
\n by force; I would always store \r\n, as offered by the browser.

As offered by which browser? As mentioned, some don't send \r\n.

When a browser doesn't send '\r\n', it violates RFC (see quotation
above from http://www.rfc-editor.org/EOLstory.txt). The word *must*
means:

| MUST   This word, or the terms "REQUIRED" or "SHALL", mean that the
| definition is an absolute requirement of the specification.

http://www.faqs.org/rfcs/rfc2119.html

One can safely conclude that a browser which doesn't send '\r\n' is a
bad browser.

I re-read the OP as I thought it had implied that some browsers were
sending \n alone. If they all send \r\n and a text field is used in
the database (which would likely be the norm in this case), then you
are right on all counts.

The issue is only related to client-side validation. If the client
counts \n as one character, then it will disagree with the server-side
validation. Your suggestion to convert two characters to one before
client-side validation doesn't seem to address the issue (though I may
be missing something.) It seems more logical to me to do the opposite
(you know it will be sent as two, so count it as two in the client.)
If the database stores it as one, there is no harm done.
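
The idea of counting each line break as two characters on the client can be sketched as follows. This is only an illustration, not code from the thread: the name `submittedLength` and the limit of 100 in the usage lines are assumptions, and the sketch assumes the browser submits every line break as CRLF.

```javascript
// Hypothetical helper: length of a textarea value as the server is expected
// to see it, assuming every line break is submitted as CRLF.
function submittedLength(value) {
  // CRLF is matched first so an existing pair is not counted twice;
  // lone CR and lone LF are each expanded to the two-character pair.
  return value.replace(/\r\n|\r|\n/g, "\r\n").length;
}

// Usage: compare against a limit (100 is an arbitrary example value).
const text = "line one\nline two";
console.log(text.length);                  // 17
console.log(submittedLength(text));        // 18
console.log(submittedLength(text) <= 100); // true
```

Counting this way may overestimate what ends up in storage if the server later collapses CRLF to one character, which errs on the safe side for a length limit.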


#12 Steve Swift
Re: string length and newlines - 01-15-2008, 02:05 AM

David Mark wrote:
Quote:
I re-read the OP as I thought it had implied that some browsers were
sending \n alone. If they all send \r\n and a text field is used in
the database (which would likely be the norm in this case), then you
are right on all counts.

I have a related question. Many of my webpages use simple flat files as
their "database" with one line added per transaction. This is fine until
the data to be stored comes from a TEXTAREA, because that can contain
embedded CRLF/CR/LF sequences which would screw up the lines in my file.

I've adopted the convention of converting CRLF or CR or LF into x'0102'
on the assumption that no one (certainly no one in their right mind)
will ever enter hex 01 or 02 characters into a text area. I'm curious to
know if anyone sees a problem with this; I've not encountered one in
many years of practice.
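
A minimal sketch of the convention described above, written in JavaScript for illustration (the actual server-side code is not shown in this thread). The names `MARKER`, `encodeRecord` and `decodeRecord` are hypothetical, and restoring breaks as "\n" is an assumed choice.

```javascript
// Hypothetical marker standing in for any line break inside a stored record.
const MARKER = "\x01\x02";

// Replace CRLF, lone CR and lone LF with the marker so the record
// occupies exactly one line of the flat file.
function encodeRecord(text) {
  return text.replace(/\r\n|\r|\n/g, MARKER);
}

// Restore line breaks when reading a record back.
// "\n" as the default output convention is an assumption.
function decodeRecord(line, eol = "\n") {
  return line.split(MARKER).join(eol);
}

const stored = encodeRecord("first\r\nsecond\nthird");
console.log(JSON.stringify(stored));               // "first\u0001\u0002second\u0001\u0002third"
console.log(JSON.stringify(decodeRecord(stored))); // "first\nsecond\nthird"
```

Note that the round trip does not preserve which kind of line break was originally entered; every break comes back in the same form.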

--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk


#13 Bart Van der Donck
Re: string length and newlines - 01-15-2008, 04:56 AM

Steve Swift wrote:

Quote:
I have a related question. Many of my webpages use simple flat files as
their "database" with one line added per transaction. This is fine until
the data to be stored comes from a TEXTAREA, because that can contain
embedded CRLF/CR/LF sequences which would screw up the lines in my file.

Checking on a separate CR or LF is not necessary; CR+LF should be
enough. Newlines in a TEXTAREA which are not transmitted as '\r\n',
are in violation of RFC. This is an old and wide-spread convention; I
would be surprised to see any browser which would behave differently
(I would immediately send a bug report anyway).

Quote:
I've adopted the convention of converting CRLF or CR or LF into x'0102'
on the assumption that no one (certainly no one in their right mind)
will ever enter hex 01 or 02 characters into a text area.

You should be pretty safe. MSIE, FF and Opera don't allow \x01 and
\x02 to be typed inside form elements; CTRL+A and CTRL+B are shortcuts
to browser functions.
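
Keyboard shortcuts only restrict typing; a pasted value or a hand-built request could still carry these bytes. A small defensive sketch, assuming \x01\x02 is the line-end marker as described in the quoted post; the function name `stripMarkerBytes` is hypothetical.

```javascript
// Hypothetical guard: remove \x01 and \x02 from incoming text before line
// breaks are encoded, so user data can never be mistaken for the marker.
function stripMarkerBytes(text) {
  return text.replace(/[\x01\x02]/g, "");
}

console.log(stripMarkerBytes("a\x01\x02b") === "ab"); // true
```

Whether to strip such bytes silently or reject the submission is a design choice this sketch does not settle.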

Quote:
I'm curious to know if anyone sees a problem with this; I've not
encountered one in many years of practice.

I think you have a robust solution. A good deal of the ASCII control
characters were actually meant for this purpose; you see them all the
time on older mainframe systems.

--
Bart


#14 Steve Swift
Re: string length and newlines - 01-16-2008, 01:47 AM

Bart Van der Donck wrote:
Quote:
Checking on a separate CR or LF is not necessary; CR+LF should be
enough. Newlines in a TEXTAREA which are not transmitted as '\r\n',
are in violation of RFC.

Bart, thank you for confirming what I'd noticed in practice.
I do, however, have a few examples where single x'0A' characters have
found their way into my data files, and since this is the linend
sequence on my linux server, it caused problems.

I checked my code 'till I was blue in the face, and never found any way
this could happen unless a browser had submitted an x'0A' as a linend
from a TEXTAREA control. Of course, I have no control over what strange
browsers people might be using, so I took the pragmatic approach of
translating both x'0A' and x'0D' to my x'0102' "line-end" sequence.
There have been no re-occurrences of the problem.
I'm just waiting for the browser that sends x'0A0D' now, but hope to
retire before that occurs. :-)

--
Steve Swift
http://www.swiftys.org.uk/swifty.html
http://www.ringers.org.uk


#15 Bart Van der Donck
Re: string length and newlines - 01-16-2008, 05:34 AM

Steve Swift wrote:

Quote:
Bart Van der Donck wrote:

Checking on a separate CR or LF is not necessary; CR+LF should be
enough. Newlines in a TEXTAREA which are not transmitted as '\r\n',
are in violation of RFC.

Bart, Thank you for confirming what I'd noticed in practice.
I do, however, have a few examples where single x'0A' characters have
found their way into my data files, and since this is the linend
sequence on my linux server, it caused problems.

I checked my code 'till I was blue in the face, and never found any way
this could happen unless a browser had submitted an x'0A' as a linend
from a TEXTAREA control. Of course, I have no control over what strange
browsers people might be using, so I took the pragmatic approach of
translating both x'0A' and x'0D' to my x'0102' "line-end" sequence.
There have been no re-occurrences of the problem.

I'm thinking of 4 possibilities:

- manual crafting of the URL (?data=one%0Atwo)
- an incorrect browser violating RFC
- an error in the regular expression or its execution order; in your
case it's necessary to first do:
'\r\n' -> '\x01\x02'
before
'\r' -> '\x01\x02' and '\n' -> '\x01\x02'
- incorrect URL parsing of the server script like ?data=one%250Atwo
or something with percent-encoding under UTF-8 (headache warning)

I would go for your pragmatic approach as well.
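
The execution-order point in the third possibility can be illustrated with a short sketch. It is written in JavaScript for illustration only (the server script's language is not stated in the thread), and the variable names are hypothetical.

```javascript
const MARKER = "\x01\x02";
const input = "one\r\ntwo";

// Wrong order: lone CR and LF are replaced first, so one CRLF pair
// turns into two markers.
const wrong = input.replace(/\r/g, MARKER).replace(/\n/g, MARKER);

// Order described above: CRLF first, then any remaining lone CR or LF.
const sequential = input
  .replace(/\r\n/g, MARKER)
  .replace(/\r/g, MARKER)
  .replace(/\n/g, MARKER);

// Single pass: the alternation tries "\r\n" before "\r" and "\n"
// at each position, so the ordering is built into the pattern.
const singlePass = input.replace(/\r\n|\r|\n/g, MARKER);

console.log(wrong.split(MARKER).length - 1);      // 2
console.log(sequential.split(MARKER).length - 1); // 1
console.log(singlePass === sequential);           // true
```

The single-pass form leaves no room for the replacements to be applied in the wrong sequence.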

--
Bart


#16 Michael White
Re: string length and newlines - 01-16-2008, 10:39 AM

Bart Van der Donck wrote:

Quote:
Steve Swift wrote:


Bart Van der Donck wrote:


Checking on a separate CR or LF is not necessary; CR+LF should be
enough. Newlines in a TEXTAREA which are not transmitted as '\r\n',
are in violation of RFC.

....

I'm thinking of 4 possibilities:

[5] User copying and pasting.
Mick


#17 Dr J R Stockton
Re: string length and newlines - 01-16-2008, 02:01 PM

In comp.lang.javascript message <478ef61f (AT) news (DOT) greennet.net>, Wed, 16
Jan 2008 06:47:48, Steve Swift <Steve.J.Swift (AT) gmail (DOT) com> posted:
Quote:
I checked my code 'till I was blue in the face, and never found any way
this could happen unless a browser had submitted an x'0A' as a linend
from a TEXTAREA control. Of course, I have no control over what strange
browsers people might be using, so I took the pragmatic approach of
translating both x'0A' and x'0D' to my x'0102' "line-end" sequence.
There have been no re-occurrences of the problem.
I'm just waiting for the browser that sends x'0A0D' now, but hope to
retire before that occurs. :-)

Whenever data is of possibly uncertain origin, it is well to assume the
worst of the characters which come between the lines.

In (past?) Delphi, for example, one could by various editing generate a
source file in which most line separations were CRLF but some were just
LF (or maybe just CR). Unfortunately, the IDE editor believed both, but
the compiler only believed CRLF.

Therefore, in Delphi, with
<statement1> CR LF
// comment LF
<statement2> CR LF
<statement3> CR LF

<statement2> would not be compiled. An LF between statements would not
matter so much, since, in Delphi, newline is a terminator only for that
type of comment, and not for code statements.

One needs an algorithm to convert bad newlines to good ones.
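
One possible shape for such an algorithm, as a minimal JavaScript sketch. The name `normalizeNewlines` and the default target of "\n" are assumptions, and the sketch treats a reversed LF CR pair as two separate line breaks, which may or may not be what a given data source intends.

```javascript
// Hypothetical helper: rewrite every line break as `eol` (default "\n").
// CRLF is matched before lone CR or LF so a pair becomes one break, not two.
function normalizeNewlines(text, eol = "\n") {
  return text.replace(/\r\n|\r|\n/g, eol);
}

// Mixed input similar to the Delphi example above.
const mixed = "<statement1>\r\n// comment\n<statement2>\r\n";
console.log(JSON.stringify(normalizeNewlines(mixed, "\r\n")));
// "<statement1>\r\n// comment\r\n<statement2>\r\n"
```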

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Delphi 3? Turnpike 6.05
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/&c., FAQqy topics & links;
<URL:http://www.bancoems.com/CompLangPascalDelphiMisc-MiniFAQ.htm> clpdmFAQ;
<URL:http://www.borland.com/newsgroups/guide.html> news:borland.* Guidelines


#18 Thomas 'PointedEars' Lahn
Re: string length and newlines - 01-16-2008, 06:54 PM

Dr J R Stockton wrote:
Quote:
One needs an algorithm to convert bad newlines to good ones.

man recode
man iconv


PointedEars

