regex question

Garrett Goebel garrett at scriptpro.com
Mon Jul 7 19:48:52 CDT 2003


If you want help with regexen in the future, you might try kc.pm.org's
mailing list ;)

#!/usr/bin/perl
use LWP::Simple;
$ARGV[0] = 3533653544; # O'Reilly: Programming Perl (hardcoded for testing)

if($ARGV[0]) {
my ($text, $cur, $sell, $buyer, $weight, $sku, $desc);
my $url =
"http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=$ARGV[0]";

$text = get($url);
defined($text) or die "Could not fetch $url\n";  # get() returns undef on failure

($cur)    = $text =~ /Current\s+bid:.*?\$([0-9.,]+)/s;
($sell)   = $text =~
/Seller\s+information.*?ShowCoreAskSellerQuestion(?:[^>]*)>([^<]+)/s;
($buyer)  = $text =~ /High\s+bidder:.*?ReturnUserEmail(?:[^>]*)>([^<]+)/s;
($weight) = $text =~ /WEIGHT=(\d+)/s;
($sku)    = $text =~ /SKU=(\d+)/s;
($desc)   = $text =~ /.*-\s(.*?)<\/title>/s;

defined($_) or $_ = ''  for $cur, $sell, $buyer, $weight, $sku, $desc;
$weight ||= 1;   # default; "|| 1" on the match itself would force scalar context
$cur =~ s/,//g  if $cur;

if ($cur) {
print("$cur|$weight|$sku|$desc|$sell|$buyer");
}
}
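
If you want to poke at the patterns without hitting eBay every time, here is a minimal sketch that runs the "Current bid" regex against a canned string. The sample text is hypothetical, modeled on the fragment quoted further down in this thread (with an extra space and a made-up price thrown in), and the names $with_s and $without_s are just for illustration. It shows what \s+, the non-greedy .*? and the trailing /s each buy you:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample, modeled on the fragment quoted in this thread;
# the real eBay page may well differ.
my $text = join "\n",
    'Current  bid:',
    '</font>',
    '</td>',
    '<td width="100%">',
    '<font face="Arial" size="2"><b>US $1,009.99<',
    '';

# \s+ tolerates extra spaces or newlines between the two words.
# .*? is non-greedy, so it stops at the first "$" after the label.
# /s lets "." match newlines, so the pattern can span several lines.
my ($with_s)    = $text =~ /Current\s+bid:.*?\$([0-9.,]+)/s;
my ($without_s) = $text =~ /Current\s+bid:.*?\$([0-9.,]+)/;

$with_s =~ s/,//g if defined $with_s;   # strip thousands separators

print "with /s:    ", (defined $with_s    ? $with_s    : 'no match'), "\n";
print "without /s: ", (defined $without_s ? $without_s : 'no match'), "\n";
```

Without /s the "." refuses to cross the newline right after "bid:", so the second match fails and $without_s stays undefined.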

--
Garrett Goebel
IS Development Specialist

ScriptPro                   Direct: 913.403.5261
5828 Reeds Road               Main: 913.384.1008
Mission, KS 66202              Fax: 913.384.2180
www.scriptpro.com          garrett at scriptpro.com

> -----Original Message-----
> From: Jason Crowe [mailto:jcrowe at cmuonline.net]
> Sent: Monday, July 07, 2003 1:51 PM
> To: Garrett Goebel
> Cc: Kclug
> Subject: Re: regex question [x-bayes]
>
>
> Garrett Goebel wrote:
>
> > Jason Crowe wrote:
> > >
> > > I have this regex:
> > >
> > > $content =~ m/Current bid:\n<\/font>\n<\/td>\n.+\n<.+\$([0-9.,]+)</
> > >
> > > That should match for this string and place the number into
> > > $content:
> > >
> > > Current bid:
> > > </font>
> > > </td>
> > > <td width="100%">
> > > <font face="Arial" size="2"><b>US $9.99<
> >
> > my $url =
> > "http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=$ARGV[0]";
> >
> > if($ARGV[0]) {
> >   my $content = get($url);
> >   my ($price) = $content =~ /Current bid:.*?\$([0-9.,]+)/s;
> >   $price =~ s/,//g  if $price;
> > }
> >
> > I missed your earlier post... Does this work for you? It sets a
> > variable $price to the value of whatever you've scraped after the
> > first $ after "Current bid:".
> >
> > The trailing 's' in the regex says to treat the whole thing as a
> > single line. At which point you can pretty much ignore the
> > end-of-line issue.
> >
> > --
> > Garrett Goebel
> > IS Development Specialist
> >
> > ScriptPro                  Direct: 913.403.5261
> > 5828 Reeds Road            Main:   913.384.1008
> > Mission, KS 66202          Fax:    913.384.2180
> > www.scriptpro.com          garrett at scriptpro dot com
> >
> Thanks,
> Someone showed me that the problem was caused by added spaces on the
> eBay page. Unfortunately there is more than one variable that is
> causing problems. Here is the script as it is now. The seller, buyer
> & desc variables are the ones not working currently.
>
> Thanks,
> Jason
>
> #!/usr/bin/perl
>
> use LWP::Simple;
>
> if($ARGV[0]) {
>   $content =
> get("http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=$ARGV[0]");
>   $content =~ s/\r//g;
>
>   if($content =~
> m/Current\s*bid:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</)
> {
>     $currently = $1;
>     $currently =~ s/,//g;
>   }
>   if($content =~
> m/Seller.+\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n\s*.+requested=(.+)&amp;iid/)
> {
>     $seller = $1;
>   }
>   if($content =~
> m/High\s*bidder:\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n\s*.+\s*requested=(.+)\s*&amp;iid/)
> {
>     $buyer = $1;
>   }
>   if($content =~ m/WEIGHT=(\d+)/) {
>     $weight = $1;
>   }
>   if($content =~ m/SKU=(\d+)/) {
>     $sku = $1;
>   }
>   if($content =~
> m/<title>\n(.+)\n(.+)\n(.+)\n(.+)\n(.+)\n(.+)<\/title>/) {
>     $desc = "$6";
>   }
>   if(!$weight || $weight == 0) {
>     $weight = 1;
>   }
>   if($currently) {
>     print("$currently|$weight|$sku|$desc|$seller|$buyer");
>   }
> }
>




