regex question
Jason Crowe
jcrowe at cmuonline.net
Tue Jul 8 14:01:14 CDT 2003
Thanks for the response. I ended up figuring most of the program out
based on the one good regex statement that I had. The only problem I
have left is that this:
_____________________________
if($content =~
m/Price:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</) {
$currently = $1;
$currently =~ s/,//g;
}
____________________________
won't match this:
____________________________
<td nowrap="yes" valign="top">
<font face="Arial" size="2"><img src="http://pics.ebay.com/aw/pics/bin_15x54.gif" alt="Buy It Now">
Price:</font>
</td>
<td valign="top">
<font face="Arial" size="2"><b>US $361.51</b></font>
____________________________
even though this:
____________________________
if($content =~ m/Winning\s*bid:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</) {
$currently = $1;
$currently =~ s/,//g;
}
____________________________
does match:
____________________________
<font face="Arial" size="2">
Winning bid:
</font>
</td>
<td width="100%">
<font face="Arial" size="2"><b>US $10.50<font face="Verdana" size="1"
color="#666666"></font></b></font>
</td>
____________________________
I don't see any difference that would cause this to mess up. :( I am reading all I can on regexes
and it's a little more involved than I had hoped.
Jason
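
One possible difference, offered as a sketch rather than a confirmed diagnosis: in the "Price:" sample, "Price:" and "</font>" sit on the same line, while the pattern demands a literal newline between them. The snippet below rebuilds that sample by hand and compares the original pattern with a relaxed variant; the variable names and the relaxed pattern are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Sample rebuilt from the "Price:" HTML above. Note that "Price:" and
# "</font>" share a line here.
my $price_html = join "\n",
    '<td nowrap="yes" valign="top">',
    '<font face="Arial" size="2"><img src="http://pics.ebay.com/aw/pics/bin_15x54.gif" alt="Buy It Now">',
    'Price:</font>',
    '</td>',
    '<td valign="top">',
    '<font face="Arial" size="2"><b>US $361.51</b></font>',
    '';

# Original pattern: the "\n" right after "Price:\s*" is mandatory.
my $strict = qr/Price:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</;

# Hypothetical relaxed pattern: "\s*" alone already covers zero or more
# newlines, so the explicit "\n" between the tags is dropped.
my $relaxed = qr/Price:\s*<\/font>\s*<\/td>\s*.+\n*\s*<.+\$([0-9.,]+)</;

print "original: ", ($price_html =~ $strict  ? "match $1" : "no match"), "\n";
print "relaxed:  ", ($price_html =~ $relaxed ? "match $1" : "no match"), "\n";
```

Because "\s" matches newlines as well as spaces, writing "\s*" between tags accepts both layouts, which is why the relaxed variant does not care whether eBay breaks the line after the label.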
Garrett Goebel wrote:
> If you want help with regexen in the future, you might try kc.pm.org's
> mailing list ;)
>
> #!/usr/bin/perl
> use LWP::Simple;
> $ARGV[0] = 3533653544; # O'Reilly: Programming Perl
>
> if($ARGV[0]) {
> my ($text, $cur, $sell, $buyer, $weight, $sku, $desc);
> my $url =
> "http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=$ARGV[0]";
>
> $text = get($url);
>
> ($cur) = $text =~ /Current\s+bid:.*?\$([0-9.,]+)/s;
> ($sell) = $text =~
> /Seller\s+information.*?ShowCoreAskSellerQuestion(?:[^>]*)>([^<]+)/s;
> ($buyer) = $text =~
> /High\s+bidder:.*?ReturnUserEmail(?:[^>]*)>([^<]+)/s;
> ($weight) = $text =~ /WEIGHT=(\d+)/s;
> ($sku) = $text =~ /SKU=(\d+)/s;
> ($desc) = $text =~ /.*-\s(.*?)<\/title>/s;
>
> defined($_) or $_ = '' for $cur, $sell, $buyer, $weight, $sku, $desc;
> $weight ||= 1; # default weight; "=~ /.../ || 1" would force scalar context
> $cur =~ s/,//g if $cur;
>
> if ($cur) {
> print("$cur|$weight|$sku|$desc|$sell|$buyer");
> }
> }
>
> --
> Garrett Goebel
> IS Development Specialist
>
> ScriptPro Direct: 913.403.5261
> 5828 Reeds Road Main: 913.384.1008
> Mission, KS 66202 Fax: 913.384.2180
> www.scriptpro.com garrett at scriptpro.com
>
>
>
> > -----Original Message-----
> > From: Jason Crowe [mailto:jcrowe at cmuonline.net]
> > Sent: Monday, July 07, 2003 1:51 PM
> > To: Garrett Goebel
> > Cc: Kclug
> > Subject: Re: regex question [x-bayes]
> >
> >
> > Garrett Goebel wrote:
> >
> > > Jason Crowe wrote:
> > > >
> > > > I have this regex:
> > > >
> > > > $content =~ m/Current bid:\n<\/font>\n<\/td>\n.+\n<.+\$([0-9.,]+)</
> > > >
> > > > That should match for this string and place the number into $content:
> > > >
> > > > Current bid:
> > > > </font>
> > > > </td>
> > > > <td width="100%">
> > > > <font face="Arial" size="2"><b>US $9.99<
> > >
> > > my $url =
> > > "http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=$ARGV[0]";
> > >
> > > if($ARGV[0]) {
> > > my $content = get($url);
> > > my ($price) = $content =~ /Current bid:.*?\$([0-9.,]+)/s;
> > > $price =~ s/,//g if $price;
> > > }
> > >
> > > I missed your earlier post... Does this work for you? It sets a
> > > variable $price to the value of whatever you've scraped after the
> > > first $ after "Current bid:".
> > >
> > > The trailing 's' in the regex says to treat the whole thing as a
> > > single line (so '.' also matches newlines). At which point you can
> > > pretty much ignore the end-of-line issue.
> > >
> > > --
> > > Garrett Goebel
> > > IS Development Specialist
> > >
> > > ScriptPro Direct: 913.403.5261
> > > 5828 Reeds Road Main: 913.384.1008
> > > Mission, KS 66202 Fax: 913.384.2180
> > > www.scriptpro.com garrett at scriptpro dot com
> > >
> > Thanks,
> > Someone showed me that the problem was caused by added spaces on the
> > ebay page. Unfortunately there is more than one variable that is
> > causing problems. Here is the script as it is now. The seller, buyer
> > & desc variables are the ones not working currently.
> >
> > Thanks,
> > Jason
> >
> > #!/usr/bin/perl
> >
> > use LWP::Simple;
> >
> > if($ARGV[0]) {
> > $content =
> > get("http://cgi.ebay.com/aw-cgi/eBayISAPI.dll?ViewItem&item=$ARGV[0]");
> > $content =~ s/\r//g;
> >
> > if($content =~
> > m/Current\s*bid:\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n*\s*<.+\$([0-9.,]+)</)
> > {
> > $currently = $1;
> > $currently =~ s/,//g;
> > }
> > if($content =~
> > m/Seller.+\s*\n\s*<\/font>\s*\n\s*<\/td>\s*\n\s*.+requested=(.+)&iid/)
> > {
> > $seller = $1;
> > }
> > if($content =~
> > m/High\s*bidder:\n\s*<\/font>\s*\n\s*<\/td>\s*\n*.+\n\s*.+\s*requested=(.+)\s*&iid/)
> > {
> > $buyer = $1;
> > }
> > if($content =~ m/WEIGHT=(\d+)/) {
> > $weight = $1;
> > }
> > if($content =~ m/SKU=(\d+)/) {
> > $sku = $1;
> > }
> > if($content =~
> > m/<title>\n(.+)\n(.+)\n(.+)\n(.+)\n(.+)\n(.+)<\/title>/) {
> > $desc = "$6";
> > }
> > if(!$weight || $weight == 0) {
> > $weight = 1;
> > }
> > if($currently) {
> > print("$currently|$weight|$sku|$desc|$seller|$buyer");
> > }
> > }
> >
>
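
The effect of the trailing 's' modifier described in the quoted reply can be sketched on the "Winning bid" sample from the top of this message. This is only a minimal illustration, not the script from the thread; the variable names are made up for the example, and the pattern is the quoted "Current bid" one with the label swapped.

```perl
use strict;
use warnings;

# "Winning bid" sample rebuilt by hand, one array element per line.
my $html = join "\n",
    '<font face="Arial" size="2">',
    'Winning bid:',
    '</font>',
    '</td>',
    '<td width="100%">',
    '<font face="Arial" size="2"><b>US $10.50<font face="Verdana" size="1"',
    'color="#666666"></font></b></font>',
    '</td>',
    '';

# Without /s, "." stops at a newline, so ".*?" cannot reach the "$"
# that sits several lines below the label.
my ($without_s) = $html =~ /Winning\s+bid:.*?\$([0-9.,]+)/;

# With /s, "." also matches newlines; ".*?" is non-greedy, so it stops
# at the first "$" after the label.
my ($with_s) = $html =~ /Winning\s+bid:.*?\$([0-9.,]+)/s;

print "without /s: ", (defined $without_s ? $without_s : "no match"), "\n";
print "with /s:    ", (defined $with_s    ? $with_s    : "no match"), "\n";
```

The non-greedy ".*?" matters as much as the modifier: a greedy ".*" with /s would run to the last "$" in the whole page rather than the first one after the label.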