From schapht at gmail.com Fri Sep 8 13:24:27 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Fri, 8 Sep 2006 13:24:27 -0400
Subject: [Mechanize-users] links behavior in 0.6.0
Message-ID: <321AED10-5342-4030-8054-C99EE39A28C5@gmail.com>

Here's what I mentioned on ruby-talk a little while ago. Basically
mechanize 0.5.4 could match a link on a regexp regardless of children in
that link. 0.6.0 can't.

Here's a demonstration, but I can't get require_gem to properly load 0.5.4
for some reason. I keep getting "uninitialized constant WWW (NameError)".
Under 0.5.4 it will return both links, but under 0.6.0 it'll only return
the one without the child.

-- mech_test.rb
require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get()
p page.links.text(/Dude/)
--

-- test.html
Bold Dude Dude
--

From aaron_patterson at speakeasy.net Mon Sep 11 14:06:01 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Mon, 11 Sep 2006 11:06:01 -0700
Subject: [Mechanize-users] mechanize lists
In-Reply-To:
References:
Message-ID: <20060911180601.GB25643@eviladmins.lan>

On Mon, Sep 11, 2006 at 09:04:49AM -0400, Mat Schaffer wrote:
> I noticed that my message is the first post to
> mechanize-users at rubyforge.org. Should I post to devel? That looks empty too,
> but I thought I'd check if you were only watching the devel list and
> not the users list.

Hey Mat, this is definitely a bug in mechanize. If you add this to your
script, it should take care of the problem until I make a bugfix release:

require 'rubygems'
require 'mechanize'

class Hpricot::Elem
  # Collect the text of every child, recursing into nested elements
  def all_text
    text = ''
    children.each do |child|
      if child.respond_to? :content
        text << child.content
      end
      if child.respond_to? :all_text
        text << child.all_text
      end
    end
    text
  end
end

agent = WWW::Mechanize.new
page = agent.get()
p page.links.text(/Dude/)

Sorry about this bug, I should have a bug fix release out on the 20th or
the 21st. I hope this helps.
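Aaron's workaround redefines Hpricot::Elem#all_text so that an element's text includes the text of its nested elements, which is presumably what lets a link whose text sits partly inside a child such as <b> match /Dude/ again. The sketch below illustrates only that recursive concatenation. TextNode and ElementNode are hypothetical stand-ins for Hpricot's node classes, not Hpricot's or mechanize's actual code.

```ruby
# Hypothetical stand-ins for Hpricot nodes (not the real classes):
# text nodes carry content, element nodes carry children.
TextNode = Struct.new(:content)

ElementNode = Struct.new(:name, :children) do
  # Same idea as the Hpricot::Elem#all_text patch: concatenate the text of
  # every child, recursing into nested elements such as <b>.
  def all_text
    parts = children.map do |child|
      child.respond_to?(:all_text) ? child.all_text : child.content.to_s
    end
    parts.join
  end
end

# Roughly <a><b>Bold</b> Dude</a> and <a>Dude</a>
bold_link  = ElementNode.new('a', [ElementNode.new('b', [TextNode.new('Bold')]),
                                   TextNode.new(' Dude')])
plain_link = ElementNode.new('a', [TextNode.new('Dude')])

links = [bold_link, plain_link]
matching = links.select { |link| link.all_text =~ /Dude/ }

puts bold_link.all_text  # prints: Bold Dude
puts matching.size       # prints: 2
```

With the recursive version, both the link with a child element and the plain link are selected, which matches the 0.5.4 behavior Mat describes.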
--Aaron

--
Aaron Patterson
http://tenderlovemaking.com/

From schapht at gmail.com Mon Sep 18 09:46:46 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Mon, 18 Sep 2006 09:46:46 -0400
Subject: [Mechanize-users] BUG: Possible issue with escaped hrefs
Message-ID: <14960D46-C9DA-4CA3-BCCE-E5E39937572D@gmail.com>

I noticed an interesting problem today when scripting against a web app.
The application contained a link in it that used %20 instead of spaces.
After running mechanize through the Charles debugging proxy I found that
mechanize was converting %20 to %2520 (double escaping the %20).

This appears to happen under both 0.5.4 and 0.6.0.

Here's a simple set of files that demonstrate the issue:

--- start.html
This link has spaces
--- end start.html

--- 'link with spaces.html'
This page is after the link.
--- end 'link with spaces.html'

--- test.rb
require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
# Un-comment to debug using Charles
# agent.set_proxy('localhost', '8888')
first = agent.get('http://localhost/~schapht/link_test/start.html')
second = agent.click(first.links.first)
puts second.body
--- end test.rb

Expected:
This page is after the link.

Actual:
/opt/local/lib/ruby/1.8/net/http.rb:1049:in `request': Unhandled response (WWW::Mechanize::ResponseCodeError) [likely due to 404]

Finally, I'm attaching a CSV of Charles' output to this email. Hopefully
it'll work on the list.

-Mat
-------------- next part --------------
A non-text attachment was scrubbed...
Name: link_debug.csv
Type: application/octet-stream
Size: 507 bytes
Desc: not available
Url : http://rubyforge.org/pipermail/mechanize-users/attachments/20060918/2b7a80a7/attachment.obj

From schapht at gmail.com Fri Sep 22 16:32:30 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Fri, 22 Sep 2006 16:32:30 -0400
Subject: [Mechanize-users] BUG: Possible issue with escaped hrefs
In-Reply-To: <14960D46-C9DA-4CA3-BCCE-E5E39937572D@gmail.com>
References: <14960D46-C9DA-4CA3-BCCE-E5E39937572D@gmail.com>
Message-ID:

On Sep 18, 2006, at 9:46 AM, Mat Schaffer wrote:
> I noticed an interesting problem today when scripting against a web
> app. The application contained a link in it that used %20 instead
> of spaces. After running mechanize through the Charles debugging
> proxy I found that mechanize was converting %20 to %2520 (double
> escaping the %20).
>
> This appears to happen under both 0.5.4 and 0.6.0.

To follow up, the same problem appears to happen when submitting forms
that have escaped URLs.

-Mat

From aaron_patterson at speakeasy.net Fri Sep 22 17:40:23 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Fri, 22 Sep 2006 14:40:23 -0700
Subject: [Mechanize-users] BUG: Possible issue with escaped hrefs
In-Reply-To: <14960D46-C9DA-4CA3-BCCE-E5E39937572D@gmail.com>
References: <14960D46-C9DA-4CA3-BCCE-E5E39937572D@gmail.com>
Message-ID: <20060922214023.GA11954@eviladmins.lan>

On Mon, Sep 18, 2006 at 09:46:46AM -0400, Mat Schaffer wrote:
> I noticed an interesting problem today when scripting against a web
> app. The application contained a link in it that used %20 instead of
> spaces. After running mechanize through the Charles debugging proxy
> I found that mechanize was converting %20 to %2520 (double escaping
> the %20).

I've committed a fix for this bug, and it should be released with 0.6.1.
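The %2520 symptom is what happens when a string that is already percent-encoded gets encoded a second time: the % of %20 is itself escaped to %25. The sketch below shows the difference between two hypothetical helpers, naive_escape and escape_unless_escaped. Neither is mechanize's API, and this is not necessarily how the 0.6.1 fix works. The naive version encodes every %, while the second leaves well-formed %XX sequences alone and only encodes a stray %.

```ruby
# Encode one character (possibly multibyte) as %XX byte sequences.
def percent_encode(char)
  char.bytes.map { |b| format('%%%02X', b) }.join
end

# Hypothetical naive escaper: encodes everything outside a small safe set,
# including '%', so an already-escaped "%20" turns into "%2520".
def naive_escape(path)
  path.gsub(/[^A-Za-z0-9\-._~\/]/) { |ch| percent_encode(ch) }
end

# Hypothetical safer escaper: a '%' that starts a valid %XX sequence is kept,
# a stray '%' is encoded, and other unsafe characters are encoded as usual.
def escape_unless_escaped(path)
  path.gsub(/%(?![0-9A-Fa-f]{2})|[^A-Za-z0-9\-._~!$&'()*+,;=:@\/%]/) do |ch|
    percent_encode(ch)
  end
end

href = 'link%20with%20spaces.html'
puts naive_escape(href)                              # prints: link%2520with%2520spaces.html
puts escape_unless_escaped(href)                     # prints: link%20with%20spaces.html
puts escape_unless_escaped('link with spaces.html')  # prints: link%20with%20spaces.html
```

The useful property of the second helper is that it is idempotent for well-formed input: running it on its own output changes nothing, which is exactly what the double-escaping bug lacked.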
--Aaron

--
Aaron Patterson
http://tenderlovemaking.com/

From schapht at gmail.com Tue Sep 26 13:39:14 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Tue, 26 Sep 2006 13:39:14 -0400
Subject: [Mechanize-users] Interesting mechanize difficulty
Message-ID:

I found an interesting page today that I was trying to script against.
The server returns a 404 with content and the page just works normally
in Firefox despite the 404. Mechanize raises an exception on it though.

I'm working on a test case now.

From schapht at gmail.com Tue Sep 26 13:51:54 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Tue, 26 Sep 2006 13:51:54 -0400
Subject: [Mechanize-users] Interesting mechanize difficulty
In-Reply-To:
References:
Message-ID: <699F6AD7-7CE2-4974-8EE0-D9F4271E58E7@gmail.com>

> I found an interesting page today that I was trying to script
> against. The server returns a 404 with content and the page just
> works normally in Firefox despite the 404. Mechanize raises an
> exception on it though.
>
> I'm working on a test case now.

After looking at the code, I think a behavior decision is necessary
before a test case. I can understand the reasoning for raising an
exception, but it would be really valuable if the page were still
available from the exception. I think putting the page into the
ResponseCodeError would be a decent approach. Thoughts Aaron?
Should I make a patch to the svn trunk?

-Mat

From aaron_patterson at speakeasy.net Tue Sep 26 17:09:02 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Tue, 26 Sep 2006 14:09:02 -0700
Subject: [Mechanize-users] Interesting mechanize difficulty
In-Reply-To: <699F6AD7-7CE2-4974-8EE0-D9F4271E58E7@gmail.com>
References: <699F6AD7-7CE2-4974-8EE0-D9F4271E58E7@gmail.com>
Message-ID: <20060926210902.GA13670@eviladmins.lan>

On Tue, Sep 26, 2006 at 01:51:54PM -0400, Mat Schaffer wrote:
> After looking at the code, I think a behavior decision is necessary
> before a test case.
> I can understand the reasoning for raising an
> exception, but it would be really valuable if the page were still
> available from the exception. I think putting the page into the
> ResponseCodeError would be a decent approach. Thoughts Aaron?
> Should I make a patch to the svn trunk?

You are correct. We should have access to the page after a 404 is
returned. I also agree that putting the page on the exception is a fine
idea.

If you'd like to patch it, do it against the release branch:
svn://rubyforge.org/var/svn/mechanize/branches/RB-0.6.0

Otherwise I will fix this error. Thank you!

--
Aaron Patterson
http://tenderlovemaking.com/

From schapht at gmail.com Tue Sep 26 23:43:35 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Tue, 26 Sep 2006 23:43:35 -0400
Subject: [Mechanize-users] Interesting mechanize difficulty
In-Reply-To: <20060926210902.GA13670@eviladmins.lan>
References: <699F6AD7-7CE2-4974-8EE0-D9F4271E58E7@gmail.com> <20060926210902.GA13670@eviladmins.lan>
Message-ID: <28BA51C9-0AA8-4F52-9FC5-56318EC1429C@gmail.com>

On Sep 26, 2006, at 5:09 PM, Aaron Patterson wrote:
> On Tue, Sep 26, 2006 at 01:51:54PM -0400, Mat Schaffer wrote:
>> After looking at the code, I think a behavior decision is necessary
>> before a test case. I can understand the reasoning for raising an
>> exception, but it would be really valuable if the page were still
>> available from the exception. I think putting the page into the
>> ResponseCodeError would be a decent approach. Thoughts Aaron?
>> Should I make a patch to the svn trunk?
>
> You are correct. We should have access to the page after a 404 is
> returned. I also agree that putting the page on the exception is a
> fine idea.
>
> If you'd like to patch it, do it against the release branch:
> svn://rubyforge.org/var/svn/mechanize/branches/RB-0.6.0
>
> Otherwise I will fix this error. Thank you!

I might have some time tomorrow and I'd like to help out.
I just updated to 0.6.1 via gem though, so wouldn't it be better to patch
against that? Or are the odds considered unstable?

From aaron_patterson at speakeasy.net Wed Sep 27 13:23:40 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Wed, 27 Sep 2006 10:23:40 -0700
Subject: [Mechanize-users] Interesting mechanize difficulty
In-Reply-To: <28BA51C9-0AA8-4F52-9FC5-56318EC1429C@gmail.com>
References: <699F6AD7-7CE2-4974-8EE0-D9F4271E58E7@gmail.com> <20060926210902.GA13670@eviladmins.lan> <28BA51C9-0AA8-4F52-9FC5-56318EC1429C@gmail.com>
Message-ID: <20060927172340.GA20687@eviladmins.lan>

On Tue, Sep 26, 2006 at 11:43:35PM -0400, Mat Schaffer wrote:
> I might have some time tomorrow and I'd like to help out. I just
> updated to 0.6.1 via gem though, so wouldn't it be better to patch
> against that? Or are the odds considered unstable?

I don't really do the whole even/odd thing. I consider what I release to
be stable. The code in that release branch is the same as what is in
0.6.1. I just keep patching and tagging the release branch for subsequent
bugfix releases. The trunk I save for major releases.

I should be able to fix this problem before the end of the weekend if you
don't get around to it!
:-)

--Aaron

--
Aaron Patterson
http://tenderlovemaking.com/

From aaron_patterson at speakeasy.net Wed Sep 27 13:50:36 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Wed, 27 Sep 2006 10:50:36 -0700
Subject: [Mechanize-users] Interesting mechanize difficulty
In-Reply-To: <28BA51C9-0AA8-4F52-9FC5-56318EC1429C@gmail.com>
References: <699F6AD7-7CE2-4974-8EE0-D9F4271E58E7@gmail.com> <20060926210902.GA13670@eviladmins.lan> <28BA51C9-0AA8-4F52-9FC5-56318EC1429C@gmail.com>
Message-ID: <20060927175036.GA28232@eviladmins.lan>

On Tue, Sep 26, 2006 at 11:43:35PM -0400, Mat Schaffer wrote:
> On Sep 26, 2006, at 5:09 PM, Aaron Patterson wrote:
> > On Tue, Sep 26, 2006 at 01:51:54PM -0400, Mat Schaffer wrote:
> >> After looking at the code, I think a behavior decision is necessary
> >> before a test case. I can understand the reasoning for raising an
> >> exception, but it would be really valuable if the page were still
> >> available from the exception. I think putting the page into the
> >> ResponseCodeError would be a decent approach. Thoughts Aaron?
> >> Should I make a patch to the svn trunk?
> >
> > You are correct. We should have access to the page after a 404 is
> > returned. I also agree that putting the page on the exception is a
> > fine idea.
> >
> > If you'd like to patch it, do it against the release branch:
> > svn://rubyforge.org/var/svn/mechanize/branches/RB-0.6.0
> >
> > Otherwise I will fix this error. Thank you!
>
> I might have some time tomorrow and I'd like to help out. I just
> updated to 0.6.1 via gem though, so wouldn't it be better to patch
> against that? Or are the odds considered unstable?

I took care of the problem, it was a pretty simple fix. If you need it
right away, just check out from the release branch and run "rake package"
to build the gem. Otherwise this will be out with 0.6.2.
--Aaron

--
Aaron Patterson
http://tenderlovemaking.com/

From schapht at gmail.com Wed Sep 27 15:40:26 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Wed, 27 Sep 2006 15:40:26 -0400
Subject: [Mechanize-users] Interesting mechanize difficulty
In-Reply-To: <20060927175036.GA28232@eviladmins.lan>
References: <699F6AD7-7CE2-4974-8EE0-D9F4271E58E7@gmail.com> <20060926210902.GA13670@eviladmins.lan> <28BA51C9-0AA8-4F52-9FC5-56318EC1429C@gmail.com> <20060927175036.GA28232@eviladmins.lan>
Message-ID: <6547E7D9-633C-4493-9C99-D721B35292F4@gmail.com>

On Sep 27, 2006, at 1:50 PM, Aaron Patterson wrote:
> On Tue, Sep 26, 2006 at 11:43:35PM -0400, Mat Schaffer wrote:
>> On Sep 26, 2006, at 5:09 PM, Aaron Patterson wrote:
>>> On Tue, Sep 26, 2006 at 01:51:54PM -0400, Mat Schaffer wrote:
>>>> After looking at the code, I think a behavior decision is necessary
>>>> before a test case. I can understand the reasoning for raising an
>>>> exception, but it would be really valuable if the page were still
>>>> available from the exception. I think putting the page into the
>>>> ResponseCodeError would be a decent approach. Thoughts Aaron?
>>>> Should I make a patch to the svn trunk?
>>>
>>> You are correct. We should have access to the page after a 404 is
>>> returned. I also agree that putting the page on the exception is a
>>> fine idea.
>>>
>>> If you'd like to patch it, do it against the release branch:
>>> svn://rubyforge.org/var/svn/mechanize/branches/RB-0.6.0
>>>
>>> Otherwise I will fix this error. Thank you!
>>
>> I might have some time tomorrow and I'd like to help out. I just
>> updated to 0.6.1 via gem though, so wouldn't it be better to patch
>> against that? Or are the odds considered unstable?
>
> I took care of the problem, it was a pretty simple fix. If you
> need it right away, just check out from the release branch and run
> "rake package" to build the gem. Otherwise this will be out with 0.6.2.

Hey thanks.
Sorry for being too slow on the draw, but I was bogged down in other
projects. Maybe next time.

-Mat

From aaron_patterson at speakeasy.net Wed Sep 27 17:21:56 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Wed, 27 Sep 2006 14:21:56 -0700
Subject: [Mechanize-users] Interesting mechanize difficulty
In-Reply-To: <6547E7D9-633C-4493-9C99-D721B35292F4@gmail.com>
References: <699F6AD7-7CE2-4974-8EE0-D9F4271E58E7@gmail.com> <20060926210902.GA13670@eviladmins.lan> <28BA51C9-0AA8-4F52-9FC5-56318EC1429C@gmail.com> <20060927175036.GA28232@eviladmins.lan> <6547E7D9-633C-4493-9C99-D721B35292F4@gmail.com>
Message-ID: <20060927212156.GA15367@eviladmins.lan>

On Wed, Sep 27, 2006 at 03:40:26PM -0400, Mat Schaffer wrote:
[snip]
> Hey thanks. Sorry for being too slow on the draw, but I was bogged
> down in other projects. Maybe next time.

No problem! I was just taking a look at it, and I figured I may as well
fix it while I'm staring at the code!

--
Aaron Patterson
http://tenderlovemaking.com/
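The approach Mat proposed and Aaron agreed to is to keep raising on a 404 but attach the returned page to the ResponseCodeError, so a caller can still use the body the way a browser would. Below is a self-contained sketch of that pattern. PageResponseError, FakePage, fake_fetch and fetch_page_even_on_error are hypothetical names, not mechanize's actual classes or the code that shipped in 0.6.2.

```ruby
# Hypothetical error that keeps the page returned with a non-2xx response,
# modeled on the idea discussed in this thread (not mechanize's actual class).
class PageResponseError < StandardError
  attr_reader :response_code, :page

  def initialize(response_code, page)
    @response_code = response_code
    @page = page
    super("Unhandled response: #{response_code}")
  end
end

# Hypothetical page object; a real client would build this from the HTTP response.
FakePage = Struct.new(:code, :body)

# Stand-in for an HTTP fetch: raises on non-2xx codes but hands over the page.
def fake_fetch(code, body)
  page = FakePage.new(code, body)
  raise PageResponseError.new(code, page) unless code.start_with?('2')
  page
end

# Caller that treats a 404 with content like a browser would: use the page anyway.
def fetch_page_even_on_error(code, body)
  fake_fetch(code, body)
rescue PageResponseError => e
  e.page
end

page = fetch_page_even_on_error('404', 'Custom not-found page')
puts page.code  # prints: 404
puts page.body  # prints: Custom not-found page
```

Raising keeps the strict default, so a script does not silently carry on after a 404. Carrying the page on the error lets callers opt in to using the content anyway.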