Which hack is this?

Steven Elling ellings at kcnet.com
Sat Dec 7 18:26:54 CST 2002


On Saturday 07 December 2002 11:04, Hanasaki JiJi wrote:
> Any thoughts on how to decode the below and determine what it was trying
> to send out?
>
> squid[5383]: urlParse: Illegal character in hostname
> '%77ww.o%6e%6cine%2du%70d%61%74e-c%65%6e%74%65%72%2ec%6fm'

First, what character set was used by the client/app that requested the 
above URL?  Your logs might tell you; if not, assume US-ASCII.

Here is the reference for decoding the URL: RFC 2396 [2. URI Characters and 
Escape Sequences - page 7] (ftp://ftp.rfc-editor.org/in-notes/rfc2396.txt)

"2.1 URI and non-ASCII characters

   The relationship between URI and characters has been a source of
   confusion for characters that are not part of US-ASCII. To describe
   the relationship, it is useful to distinguish between a "character"
   (as a distinguishable semantic entity) and an "octet" (an 8-bit
   byte). There are two mappings, one from URI characters to octets, and
   a second from octets to original characters:

   URI character sequence->octet sequence->original character sequence

   A URI is represented as a sequence of characters, not as a sequence
   of octets. That is because URI might be "transported" by means that
   are not through a computer network, e.g., printed on paper, read over
   the radio, etc.

   A URI scheme may define a mapping from URI characters to octets;
   whether this is done depends on the scheme. Commonly, within a
   delimited component of a URI, a sequence of characters may be used to
   represent a sequence of octets. For example, the character "a"
   represents the octet 97 (decimal), while the character sequence "%",
   "0", "a" represents the octet 10 (decimal)."
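
As a rough sketch of the two mappings described above (URI characters to
octets, then octets to original characters), here is a small Python script
applied to the hostname from the Squid log. The function name percent_decode
and the constant LOGGED_HOST are made up for illustration, and the US-ASCII
default is the assumption mentioned earlier, not something the log confirms.

```python
import re
from urllib.parse import unquote

# Hostname exactly as it appeared in the Squid log line.
LOGGED_HOST = "%77ww.o%6e%6cine%2du%70d%61%74e-c%65%6e%74%65%72%2ec%6fm"


def percent_decode(text, charset="us-ascii"):
    """Hypothetical helper: undo %XX escapes, then decode the octets."""
    # Mapping 1: URI characters -> octets.
    # Each "%XX" triplet becomes the single octet with hex value XX;
    # every other character is kept as its own ASCII octet.
    octets = re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode("ascii"),
    )
    # Mapping 2: octets -> original characters.
    # The charset is an assumption (US-ASCII unless the logs say otherwise).
    return octets.decode(charset)


decoded = percent_decode(LOGGED_HOST)
print(decoded)  # www.online-update-center.com

# The standard library does the same job in one call.
print(unquote(LOGGED_HOST, encoding="ascii"))
```

Under the US-ASCII assumption the logged hostname comes out as
www.online-update-center.com, and the RFC's own example "%0a" comes out as
octet 10 (a line feed).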

...



