Which hack is this?
Steven Elling
ellings at kcnet.com
Sat Dec 7 18:26:54 CST 2002
On Saturday 07 December 2002 11:04, Hanasaki JiJi wrote:
> Any thoughts on how to decode the below and determine what it was trying
> to send out?
>
> squid[5383]: urlParse: Illegal character in hostname
> '%77ww.o%6e%6cine%2du%70d%61%74e-c%65%6e%74%65%72%2ec%6fm'
First, what is the character set used by the client/app that requested the
above URL? Your logs might tell you, but if not, assume US-ASCII.
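As a quick check under that US-ASCII assumption, the hostname from the log can be
unescaped with Python's standard library. This is only a sketch: the variable
names are mine, and the choice of "ascii" is the assumption stated above, not
something the squid log confirms.

```python
from urllib.parse import unquote

# Hostname exactly as it appears in the squid log line quoted above.
encoded = "%77ww.o%6e%6cine%2du%70d%61%74e-c%65%6e%74%65%72%2ec%6fm"

# Assumption: the escaped octets are US-ASCII. errors="strict" makes a
# non-ASCII octet raise an exception instead of being silently replaced.
decoded = unquote(encoded, encoding="ascii", errors="strict")
print(decoded)
```

For the logged string this prints www.online-update-center.com. If the client
used some other character set, pass that name as the encoding instead.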
Here is the reference for decoding the URL: RFC 2396 [2. URI Characters and
Escape Sequences - page 7] (ftp://ftp.rfc-editor.org/in-notes/rfc2396.txt)
"2.1 URI and non-ASCII characters
The relationship between URI and characters has been a source of
confusion for characters that are not part of US-ASCII. To describe
the relationship, it is useful to distinguish between a "character"
(as a distinguishable semantic entity) and an "octet" (an 8-bit
byte). There are two mappings, one from URI characters to octets, and
a second from octets to original characters:
URI character sequence->octet sequence->original character sequence
A URI is represented as a sequence of characters, not as a sequence
of octets. That is because URI might be "transported" by means that
are not through a computer network, e.g., printed on paper, read over
the radio, etc.
A URI scheme may define a mapping from URI characters to octets;
whether this is done depends on the scheme. Commonly, within a
delimited component of a URI, a sequence of characters may be used to
represent a sequence of octets. For example, the character "a"
represents the octet 97 (decimal), while the character sequence "%",
"0", "a" represents the octet 10 (decimal)."
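The two mappings in that passage can be written out by hand. The following is a
minimal sketch, not a full RFC 2396 parser: the function names are made up for
illustration, and the second mapping assumes US-ASCII unless another charset is
passed in.

```python
import string


def uri_chars_to_octets(text: str) -> bytes:
    """First mapping: URI character sequence -> octet sequence."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            # An escape is "%" followed by exactly two hex digits.
            pair = text[i + 1:i + 3]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError(f"bad escape sequence at position {i}")
            out.append(int(pair, 16))
            i += 3
        else:
            # An unescaped character stands for its own US-ASCII octet.
            out.extend(text[i].encode("ascii"))
            i += 1
    return bytes(out)


def octets_to_chars(octets: bytes, charset: str = "ascii") -> str:
    """Second mapping: octet sequence -> original character sequence."""
    return octets.decode(charset)


# The RFC's own examples: "a" is octet 97, and "%0a" is octet 10.
print(list(uri_chars_to_octets("a")))
print(list(uri_chars_to_octets("%0a")))
print(octets_to_chars(uri_chars_to_octets("%77ww")))
```

Keeping the two steps separate makes it easy to inspect the raw octets first
when the character set is in doubt.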
...