regex question

Jason Crowe jcrowe at cmuonline.net
Tue Jul 8 14:01:14 CDT 2003


Thanks for the response. I ended up figuring most of the program out 
based on the one good regex statement that I had. The only problem I 
have left is that this:
_____________________________
if($content =~ m/Price:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</) {
    $currently = $1;
    $currently =~ s/,//g;
}
____________________________
won't match this:
____________________________

<td nowrap="yes" valign="top">
<font face="Arial" size="2"><img src="http://pics.ebay.com/aw/pics/bin_15x54.gif" alt="Buy It Now"> 
Price:</font>
</td>

<td valign="top">
<font face="Arial" size="2"><b>US $361.51</b></font>
____________________________

even though this:
____________________________
if($content =~ m/Winning\s*bid:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</) {
    $currently = $1;
    $currently =~ s/,//g;
}
____________________________

does match:
____________________________

<font face="Arial" size="2">
						Winning bid:
						</font>
</td>
<td width="100%">
<font face="Arial" size="2"><b>US $10.50<font face="Verdana" size="1" 
color="#666666"></font></b></font>
</td>
____________________________
I don't see any difference that would cause this to mess up. :( I am reading all I can on regexes,
and they're a little more involved than I had hoped.

Jason
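A minimal Perl check of the two patterns above, assuming the backslashes that the list archive stripped are restored (`s*` is `\s*`, `n` is `\n`, `$(` is `\$(`) and that `\r` has already been removed. The snippets are copied from this message with indentation approximated. In both patterns, `\s*\n\s*` right after the label requires at least one newline before `</font>`. The Winning bid page has one, but the Buy It Now page puts `Price:</font>` on a single line, so the match fails there. Replacing that first `\s*\n\s*` with plain `\s*` (which already matches newlines, or nothing at all) is one possible fix, shown as the third pattern.

```perl
#!/usr/bin/perl
# Sketch: compare the two patterns from the message against their snippets.
# Assumes archive-stripped backslashes are restored and \r is already removed.
use strict;
use warnings;

my $price_html = <<'HTML';
<td nowrap="yes" valign="top">
<font face="Arial" size="2"><img src="http://pics.ebay.com/aw/pics/bin_15x54.gif" alt="Buy It Now">
Price:</font>
</td>

<td valign="top">
<font face="Arial" size="2"><b>US $361.51</b></font>
HTML

my $winning_html = <<'HTML';
<font face="Arial" size="2">
      Winning bid:
      </font>
</td>
<td width="100%">
<font face="Arial" size="2"><b>US $10.50<font face="Verdana" size="1"
color="#666666"></font></b></font>
</td>
HTML

# Original patterns: "\s*\n\s*" after the label demands a newline before </font>.
my ($winning) = $winning_html =~
    m/Winning\s*bid:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</;
my ($price_strict) = $price_html =~
    m/Price:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</;

# Loosened: plain "\s*" can match zero characters, so "Price:</font>" on
# one line is accepted.
my ($price_loose) = $price_html =~
    m/Price:\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</;

# Print the captured amount, or "no match" when the pattern failed.
sub show { defined $_[0] ? $_[0] : 'no match' }
print 'Winning bid, original pattern: ', show($winning), "\n";
print 'Price, original pattern:       ', show($price_strict), "\n";
print 'Price, loosened pattern:       ', show($price_loose), "\n";
```

Run as-is, the first line reports 10.50, the second reports no match, and the third reports 361.51.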

Garrett Goebel wrote:

> If you want help with regexen in the future, you might try kc.pm.org's 
> mailing list ;)
>
> #!/usr/bin/perl
> use LWP::Simple;
> $ARGV[0] = 3533653544; # O'Reilly: Programming Perl
>
> if($ARGV[0]) {
>   my ($text, $cur, $sell, $buyer, $weight, $sku, $desc);
>   my $url = "http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=$ARGV[0]";
>
>   $text = get($url);
>
>   ($cur)    = $text =~ /Current\s+bid:.*?\$([0-9.,]+)/s;
>   ($sell)   = $text =~ /Seller\s+information.*?ShowCoreAskSellerQuestion(?:[^>]*)>([^<]+)/s;
>   ($buyer)  = $text =~ /High\s+bidder:.*?ReturnUserEmail(?:[^>]*)>([^<]+)/s;
>   ($weight) = $text =~ /WEIGHT=(\d+)/s;
>   $weight ||= 1;   # default to 1 when WEIGHT= is missing or 0
>   ($sku)    = $text =~ /SKU=(\d+)/s;
>   ($desc)   = $text =~ /.*-\s(.*?)<\/title>/s;
>
>   defined($_) or $_ = ''  for $cur, $sell, $buyer, $weight, $sku, $desc;
>   $cur =~ s/,//g  if $cur;
>
>   if ($cur) {
>     print("$cur|$weight|$sku|$desc|$sell|$buyer");
>   }
>   }
> }
>
> -- 
> Garrett Goebel
> IS Development Specialist
>
> ScriptPro                   Direct: 913.403.5261
> 5828 Reeds Road               Main: 913.384.1008
> Mission, KS 66202              Fax: 913.384.2180
> www.scriptpro.com          garrett at scriptpro.com
>
>
>
> > -----Original Message-----
> > From: Jason Crowe [mailto:jcrowe at cmuonline.net]
> > Sent: Monday, July 07, 2003 1:51 PM
> > To: Garrett Goebel
> > Cc: Kclug
> > Subject: Re: regex question [x-bayes]
> >
> >
> > Garrett Goebel wrote:
> >
> > > Jason Crowe wrote:
> > > >
> > > > I have this regex:
> > > >
> > > > $content =~ m/Current bid:\n<\/font>\n<\/td>\n.+\n<.+\$([0-9.,]+)</
> > > >
> > > > That should match for this string and place the number into $content:
> > > >
> > > > Current bid:
> > > > </font>
> > > > </td>
> > > > <td width="100%">
> > > > <font face="Arial" size="2"><b>US $9.99<
> > >
> > > my $url = "http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=$ARGV[0]";
> > >
> > > if($ARGV[0]) {
> > >   my $content = get($url);
> > >   my ($price) = $content =~ /Current bid:.*?\$([0-9.,]+)/s;
> > >   $price =~ s/,//g  if $price;
> > > }
> > >
> > > I missed your earlier post... Does this work for you? It sets a
> > > variable $price to the value of whatever you've scraped after the
> > > first $ after "Current bid:".
> > >
> > > The trailing 's' in the regex says to treat the whole thing as a
> > > single line (so '.' also matches newlines). At that point you can
> > > pretty much ignore the end-of-line issue.
> > >
> > Thanks,
> > Someone showed me that the problem was caused by added spaces on the
> > eBay page. Unfortunately there is more than one variable that is
> > causing problems. Here is the script as it is now. The seller, buyer &
> > desc variables are the ones not working currently.
> >
> > Thanks,
> > Jason
> >
> > #!/usr/bin/perl
> >
> > use LWP::Simple;
> >
> > if($ARGV[0]) {
> >   $content = get("http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=$ARGV[0]");
> >   $content =~ s/\r//g;
> > 
> >   if($content =~ m/Current\s*bid:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</) {
> >     $currently = $1;
> >     $currently =~ s/,//g;
> >   }
> >   if($content =~ m/Seller.+\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n\s*.+requested=(.+)&amp;iid/) {
> >     $seller = $1;
> >   }
> >   if($content =~ m/High\s*bidder:\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n\s*.+\s*requested=(.+)\s*&amp;iid/) {
> >     $buyer = $1;
> >   }
> >   if($content =~ m/WEIGHT=(\d+)/) {
> >     $weight = $1;
> >   }
> >   if($content =~ m/SKU=(\d+)/) {
> >     $sku = $1;
> >   }
> >   if($content =~ m/<title>\n(.+)\n(.+)\n(.+)\n(.+)\n(.+)\n(.+)<\/title>/) {
> >     $desc = "$6";
> >   }
> >   if(!$weight || $weight == 0) {
> >     $weight = 1;
> >   }
> >   if($currently) {
> >     print("$currently|$weight|$sku|$desc|$seller|$buyer");
> >   }
> > }
> >
>
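Garrett's suggestion sidesteps the exact page layout: under the `/s` modifier `.` also matches newlines, so a non-greedy `.*?` can skip from the label to the first `$` regardless of tags, blank lines or extra spaces. Below is a minimal sketch of that idea, assuming the backslashes stripped by the archive are restored. The `scrape_price` helper, the label patterns passed to it, and the comma input are hypothetical illustrations, not code from the thread; the first two inputs are the Current bid and Buy It Now snippets quoted in this thread.

```perl
#!/usr/bin/perl
# Sketch of the "/s plus non-greedy .*?" approach from Garrett's reply.
# scrape_price() and its label arguments are hypothetical, not from the thread.
use strict;
use warnings;

# Return the first "$amount" after $label_re with commas stripped, or undef.
sub scrape_price {
    my ($html, $label_re) = @_;
    # /s lets .*? run across line breaks, so the exact tags and
    # whitespace between the label and the price no longer matter.
    my ($price) = $html =~ /$label_re.*?\$([0-9.,]+)/s;
    $price =~ s/,//g if defined $price;
    return $price;
}

my $current = <<'HTML';
Current bid:
</font>
</td>
<td width="100%">
<font face="Arial" size="2"><b>US $9.99<
HTML

my $buy_now = <<'HTML';
<td nowrap="yes" valign="top">
<font face="Arial" size="2"><img src="http://pics.ebay.com/aw/pics/bin_15x54.gif" alt="Buy It Now">
Price:</font>
</td>

<td valign="top">
<font face="Arial" size="2"><b>US $361.51</b></font>
HTML

# Hypothetical input (not from the thread) with an extra space and a comma.
my $with_comma = 'Current  bid:</font></td><td><b>US $1,250.00</b>';

my %found = (
    current    => scrape_price($current,    qr/Current\s+bid:/),
    buy_now    => scrape_price($buy_now,    qr/Price:/),
    with_comma => scrape_price($with_comma, qr/Current\s+bid:/),
);
print "$_ => $found{$_}\n" for sort keys %found;
```

Because `.*?` is lazy, it stops at the first `$` after the label, so a later dollar amount elsewhere on the page is not picked up instead.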



