From schapht at gmail.com Thu Nov 2 15:32:33 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Thu, 2 Nov 2006 15:32:33 -0500
Subject: [Mechanize-users] Adding fields to a form
Message-ID:

Is there a decent way to add a field to a form before posting it?
I haven't tried using HPricot manipulations just yet, since I can't
ever find really solid docs on hpricot....

Form#[]= doesn't work because it first searches only pre-existing
fields.  I'm investigating how to write a patch now.

But I thought maybe someone here might have an idea.

Thanks in advance,
Mat

From schapht at gmail.com Thu Nov 2 15:41:18 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Thu, 2 Nov 2006 15:41:18 -0500
Subject: [Mechanize-users] Adding fields to a form
In-Reply-To:
References:
Message-ID: <0B2D5224-D28F-4347-A852-F88A400E2C55@gmail.com>

On Nov 2, 2006, at 3:32 PM, Mat Schaffer wrote:
> Is there a decent way to add a field to a form before posting it?
> I haven't tried using HPricot manipulations just yet, since I can't
> ever find really solid docs on hpricot....
>
> Form#[]= doesn't work because it first searches only pre-existing
> fields.  I'm investigating how to write a patch now.
>
> But I thought maybe someone here might have an idea.
>
> Thanks in advance,
> Mat

Well I think I answered my own question.  Aaron, you deserve some
props.  I haven't found a quirk in Mechanize yet that I couldn't
patch in one line of code.  Way to be extensible.

--- field_patch.rb ---

class WWW::Mechanize::Form
  # Fetch the first field whose name is equal to field_name, or create a new field
  def field(field_name)
    fields.find { |f| f.name.eql? field_name } || fields << WWW::Mechanize::Field.new(field_name, '')
  end
end

--- end field_patch.rb ---

From aaron_patterson at speakeasy.net Thu Nov 2 18:02:53 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Thu, 2 Nov 2006 15:02:53 -0800
Subject: [Mechanize-users] Adding fields to a form
In-Reply-To: <0B2D5224-D28F-4347-A852-F88A400E2C55@gmail.com>
References: <0B2D5224-D28F-4347-A852-F88A400E2C55@gmail.com>
Message-ID: <20061102230253.GA30682@eviladmins.lan>

On Thu, Nov 02, 2006 at 03:41:18PM -0500, Mat Schaffer wrote:
> On Nov 2, 2006, at 3:32 PM, Mat Schaffer wrote:
> > Is there a decent way to add a field to a form before posting it?
> > I haven't tried using HPricot manipulations just yet, since I can't
> > ever find really solid docs on hpricot....
> >
> > Form#[]= doesn't work because it first searches only pre-existing
> > fields.  I'm investigating how to write a patch now.
> >
> > But I thought maybe someone here might have an idea.
> >
> > Thanks in advance,
> > Mat
>
> Well I think I answered my own question.  Aaron, you deserve some
> props.  I haven't found a quirk in Mechanize yet that I couldn't
> patch in one line of code.  Way to be extensible.

Thanks!  I'll apply this patch and it will be in the next release.

--Aaron

--
Aaron Patterson
http://tenderlovemaking.com/

From aaron_patterson at speakeasy.net Fri Nov 3 13:22:57 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Fri, 3 Nov 2006 10:22:57 -0800
Subject: [Mechanize-users] Adding fields to a form
In-Reply-To: <0B2D5224-D28F-4347-A852-F88A400E2C55@gmail.com>
References: <0B2D5224-D28F-4347-A852-F88A400E2C55@gmail.com>
Message-ID: <20061103182257.GA21433@eviladmins.lan>

Hey Mat,

On Thu, Nov 02, 2006 at 03:41:18PM -0500, Mat Schaffer wrote:
> On Nov 2, 2006, at 3:32 PM, Mat Schaffer wrote:
> > Is there a decent way to add a field to a form before posting it?
> > I haven't tried using HPricot manipulations just yet, since I can't
> > ever find really solid docs on hpricot....
> >
> > Form#[]= doesn't work because it first searches only pre-existing
> > fields.  I'm investigating how to write a patch now.
> >
> > But I thought maybe someone here might have an idea.
> >
> > Thanks in advance,
> > Mat
>
> Well I think I answered my own question.  Aaron, you deserve some
> props.  I haven't found a quirk in Mechanize yet that I couldn't
> patch in one line of code.  Way to be extensible.
>
> --- field_patch.rb ---
>
> class WWW::Mechanize::Form
>   # Fetch the first field whose name is equal to field_name, or create a new field
>   def field(field_name)
>     fields.find { |f| f.name.eql? field_name } || fields << WWW::Mechanize::Field.new(field_name, '')
>   end
> end
>
> --- end field_patch.rb ---

I was thinking about this patch for a little bit, and I think it might
be a little dangerous to add a field without the user explicitly
knowing.

I don't see the Form#field method as a destructive method.  Would you be
okay with having an "add_field!" method, and also having the Form#[]=
method be destructive?  That would make the Form class more Hash like.

Basically these two code snippets would be exactly the same:

  form.add_field!('name', 'Aaron') #The second arg could be nil

-or-

  form['name'] = 'Aaron'

Both of those would create a new field.  Let me know what you think.
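
For illustration only, the behaviour proposed above could be sketched with
a small stand-in class.  SketchForm and SketchField are hypothetical names,
not Mechanize classes; the point is just that []= and add_field! may
create a field, while field and [] never do.

```ruby
# Hypothetical stand-ins for a form and its fields; not the Mechanize API.
SketchField = Struct.new(:name, :value)

class SketchForm
  attr_reader :fields

  def initialize
    @fields = []
  end

  # Non-destructive lookup: returns the first matching field or nil.
  def field(name)
    fields.find { |f| f.name == name }
  end

  # Explicitly destructive: always appends a new field and returns it.
  def add_field!(name, value = nil)
    new_field = SketchField.new(name, value)
    fields << new_field
    new_field
  end

  # Hash-like read: value of the first matching field, or nil.
  def [](name)
    found = field(name)
    found && found.value
  end

  # Hash-like write: updates an existing field or creates a new one.
  def []=(name, value)
    found = field(name)
    if found
      found.value = value
    else
      add_field!(name, value)
    end
  end
end

form = SketchForm.new
form['name'] = 'Aaron'    # no such field yet, so one is created
form['name'] = 'Mat'      # the existing field is updated in place
form.add_field!('extra')  # value defaults to nil
puts form.fields.map(&:name).inspect
```

Keeping the lookup side free of side effects means a mistyped field name
shows up as nil instead of silently growing the form.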

--Aaron

--
Aaron Patterson
http://tenderlovemaking.com/

From jrust at lexus.elabor.com Fri Nov 3 17:31:53 2006
From: jrust at lexus.elabor.com (Rust, Jon)
Date: Fri, 03 Nov 2006 14:31:53 -0800
Subject: [Mechanize-users] Adding fields to a form
In-Reply-To: <20061103182257.GA21433@eviladmins.lan>
References: <0B2D5224-D28F-4347-A852-F88A400E2C55@gmail.com> <20061103182257.GA21433@eviladmins.lan>
Message-ID: <454BC359.4030509@lexus.elabor.com>

Aaron Patterson wrote:
> I was thinking about this patch for a little bit, and I think it might
> be a little dangerous to add a field without the user explicitly
> knowing.

I agree 100%.  Had a partially composed email saying the same thing.
Your recommendations are good.

Jon

From schapht at gmail.com Sat Nov 4 12:08:18 2006
From: schapht at gmail.com (Mat Schaffer)
Date: Sat, 4 Nov 2006 12:08:18 -0500
Subject: [Mechanize-users] Adding fields to a form
In-Reply-To: <20061103182257.GA21433@eviladmins.lan>
References: <0B2D5224-D28F-4347-A852-F88A400E2C55@gmail.com> <20061103182257.GA21433@eviladmins.lan>
Message-ID: <47AAF246-0FEC-424F-A00C-B6C122CFD374@gmail.com>

On Nov 3, 2006, at 1:22 PM, Aaron Patterson wrote:
> I was thinking about this patch for a little bit, and I think it might
> be a little dangerous to add a field without the user explicitly
> knowing.
>
> I don't see the Form#field method as a destructive method.  Would you be
> okay with having an "add_field!" method, and also having the Form#[]=
> method be destructive?  That would make the Form class more Hash like.
>
> Basically these two code snippets would be exactly the same:
>
>   form.add_field!('name', 'Aaron') #The second arg could be nil
>
> -or-
>
>   form['name'] = 'Aaron'
>
> Both of those would create a new field.  Let me know what you think.
>
> --Aaron

Yeah, that's a really good point.  I only needed statements like
form['name'] = 'Aaron' to work, and adding to the Form#field method seemed
to be the simplest way to achieve that.
Since the form is accessed like a Hash, using Form#[]= makes sense to
me.  Probably the most intuitive and it doesn't add another method to
the interface.  add_field! isn't bad either though, if that's your fancy.

Good catch, thanks,
Mat

From axel.friedrich_smail at gmx.de Tue Nov 7 15:39:04 2006
From: axel.friedrich_smail at gmx.de (Friedrich, Axel)
Date: Tue, 07 Nov 2006 21:39:04 +0100
Subject: [Mechanize-users] mechanize: 400 Bad Request
Message-ID:

Hello,

when trying to access a certain HTML-frame, I get:
"in `request': Unhandled response (WWW::Mechanize::ResponseCodeError)"

and the page returns: "400 Bad Request"

* Why?
* How to solve this?

With browser, it works.

In the logs below, I marked 4 lines with "***", where I see possible
differences in the URI. But I don't know, if this is the reason for the
malfunction and how to fix this.

mechanize:
Net::HTTP::Get:
https://www.frankfurter-fondsbank.de/../diverse/navigation.jsp;jsessionid=CdEfG!-34567!-7654?menu=1

firefox-LiveHTTPHeaders:
GET /diverse/navigation.jsp;jsessionid=AbCdE!-1234!-5678?menu=1 HTTP/1.1


Thank you for any help!

Axel

axel ? friedrich ? _smail AT gmx ? de


Details
-------
ruby 1.8.4 (2005-12-24) [i386-mswin32]
Windows 98SE


Code
----
require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
page = agent.get("https://www.frankfurter-fondsbank.de/login/Logon.jsp")

... login stuff...

begin
  framePage = agent.click( page2.frames.text('nav') )     # This works not
  # framePage = agent.click( page2.frames.text('head') )  # This works
rescue WWW::Mechanize::ResponseCodeError => ex
  puts ex.page.body
end


page2 looks like this
---------------------
...


log of mechanize (time stamps removed)
--------------------------------------
# Logfile created on Tue Nov 07 09:03:04 (MEZ) Mitteleurop?ische Zeit 2006 by logger.rb/1.5.2.7 INFO -- : Net::HTTP::Get: https://www.frankfurter-fondsbank.de/login/Logon.jsp DEBUG -- : request-header: accept => */* DEBUG -- : request-header: accept-encoding => gzip,identity DEBUG -- : request-header: user-agent => Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6 DEBUG -- : response-header: connection => close DEBUG -- : response-header: content-type => text/html; charset=ISO-8859-1 DEBUG -- : response-header: x-powered-by => Servlet/2.4 JSP/2.0 DEBUG -- : response-header: date => Tue, 07 Nov 2006 08:03:07 GMT DEBUG -- : response-header: server => Apache DEBUG -- : response-header: content-length => 4253 INFO -- : status: 200 DEBUG -- : query: "login=10001&pin=2002" INFO -- : Net::HTTP::Post: https://www.frankfurter-fondsbank.de/login_neutral;jsessionid=CdEfG!-34567!-7654 DEBUG -- : request-header: accept => */* DEBUG -- : request-header: accept-encoding => gzip,identity DEBUG -- : request-header: user-agent => Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6 DEBUG -- : request-header: content-type => application/x-www-form-urlencoded DEBUG -- : request-header: referer => https://www.frankfurter-fondsbank.de/login/Logon.jsp DEBUG -- : request-header: content-length => 28 DEBUG -- : response-header: cache-control => no-cache, no-store, max-age=0 DEBUG -- : response-header: connection => close DEBUG -- : response-header: expires => Thu, 01 Jan 1970 00:00:00 GMT DEBUG -- : response-header: content-type => text/html DEBUG -- : response-header: x-powered-by => Servlet/2.4 JSP/2.0 DEBUG -- : response-header: date => Tue, 07 Nov 2006 08:03:10 GMT DEBUG -- : response-header: server => Apache DEBUG -- : response-header: content-length => 2722 DEBUG -- : response-header: pragma => no-cache INFO -- : status: 200 *** INFO -- : Net::HTTP::Get: 
https://www.frankfurter-fondsbank.de/../diverse/navigation.jsp;jsessionid=CdEfG!-34567!-7654?menu=1 DEBUG -- : request-header: accept => */* DEBUG -- : request-header: accept-encoding => gzip,identity DEBUG -- : request-header: user-agent => Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6 *** DEBUG -- : request-header: referer => https://www.frankfurter-fondsbank.de/login_neutral;jsessionid=CdEfG!-34567!-7654 DEBUG -- : response-header: connection => close DEBUG -- : response-header: content-type => text/html; charset=iso-8859-1 DEBUG -- : response-header: date => Tue, 07 Nov 2006 08:03:13 GMT DEBUG -- : response-header: server => Apache DEBUG -- : response-header: content-length => 304 INFO -- : status: 400 log of Firefox with LiveHTTPHeaders (partialy only, it is realy long) ?????????????????????????????????????????????????????????????????????? https://www.frankfurter-fondsbank.de/login/Logon.jsp GET /login/Logon.jsp HTTP/1.1 Host: www.frankfurter-fondsbank.de User-Agent: Mozilla/5.0 (Windows; U; Win98; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive HTTP/1.x 200 OK Date: Tue, 07 Nov 2006 19:57:49 GMT Server: Apache Content-Length: 4253 X-Powered-By: Servlet/2.4 JSP/2.0 Connection: close Content-Type: text/html; charset=ISO-8859-1 ---------------------------------------------------------- https://www.frankfurter-fondsbank.de/login_neutral;jsessionid=AbCdE!-1234!-5678 POST /login_neutral;jsessionid=AbCdE!-1234!-5678 HTTP/1.1 Host: www.frankfurter-fondsbank.de User-Agent: Mozilla/5.0 (Windows; U; Win98; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1 Accept: 
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive Referer: https://www.frankfurter-fondsbank.de/login/Logon.jsp Content-Type: application/x-www-form-urlencoded Content-Length: 28 login=10001&pin=2002 HTTP/1.x 200 OK Date: Tue, 07 Nov 2006 19:58:17 GMT Server: Apache Cache-Control: no-cache, no-store, max-age=0 Pragma: no-cache Content-Length: 2722 Expires: Thu, 01 Jan 1970 00:00:00 GMT X-Powered-By: Servlet/2.4 JSP/2.0 Connection: close Content-Type: text/html ---------------------------------------------------------- https://www.frankfurter-fondsbank.de/brokerdesign/default/head.jsp;jsessionid=AbCdE!-1234!-5678 GET /brokerdesign/default/head.jsp;jsessionid=AbCdE!-1234!-5678 HTTP/1.1 Host: www.frankfurter-fondsbank.de User-Agent: Mozilla/5.0 (Windows; U; Win98; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive Referer: https://www.frankfurter-fondsbank.de/login_neutral;jsessionid=AbCdE!-1234!-5678 HTTP/1.x 200 OK Date: Tue, 07 Nov 2006 19:58:19 GMT Server: Apache Content-Length: 5052 X-Powered-By: Servlet/2.4 JSP/2.0 Connection: close Content-Type: text/html; charset=ISO-8859-1 ---------------------------------------------------------- https://www.frankfurter-fondsbank.de/diverse/navigation.jsp;jsessionid=AbCdE!-1234!-5678?menu=1 GET /diverse/navigation.jsp;jsessionid=AbCdE!-1234!-5678?menu=1 HTTP/1.1 Host: www.frankfurter-fondsbank.de User-Agent: Mozilla/5.0 (Windows; U; Win98; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1 Accept: 
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive Referer: https://www.frankfurter-fondsbank.de/login_neutral;jsessionid=AbCdE!-1234!-5678 HTTP/1.x 200 OK Date: Tue, 07 Nov 2006 19:58:19 GMT Server: Apache Content-Length: 6984 X-Powered-By: Servlet/2.4 JSP/2.0 Connection: close Content-Type: text/html ---------------------------------------------------------- ... ... ---------------------------------------------------------- https://www.frankfurter-fondsbank.de/images/buttons/but_sitemap.gif GET /images/buttons/but_sitemap.gif HTTP/1.1 Host: www.frankfurter-fondsbank.de User-Agent: Mozilla/5.0 (Windows; U; Win98; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1 Accept: image/png,*/*;q=0.5 Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive Referer: https://www.frankfurter-fondsbank.de/brokerdesign/default/head.jsp;jsessionid=AbCdE!-1234!-5678 HTTP/1.x 200 OK Date: Tue, 07 Nov 2006 19:59:06 GMT Server: Apache Last-Modified: Tue, 04 May 2004 07:25:42 GMT Etag: "18e0a6c-16c-37a3c980" Accept-Ranges: bytes Content-Length: 364 Connection: close Content-Type: image/gif ---------------------------------------------------------- https://www.frankfurter-fondsbank.de/diverse/navigation.jsp;jsessionid=AbCdE!-1234!-5678?menu=1 *** GET /diverse/navigation.jsp;jsessionid=AbCdE!-1234!-5678?menu=1 HTTP/1.1 Host: www.frankfurter-fondsbank.de User-Agent: Mozilla/5.0 (Windows; U; Win98; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3 Accept-Encoding: gzip,deflate Accept-Charset: 
ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive HTTP/1.x 200 OK Date: Tue, 07 Nov 2006 19:59:24 GMT Server: Apache Content-Length: 6984 X-Powered-By: Servlet/2.4 JSP/2.0 Connection: close Content-Type: text/html ---------------------------------------------------------- https://www.frankfurter-fondsbank.de/DesignServlet;jsessionid=AbCdE!-1234!-5678?element=stylesheet GET /DesignServlet;jsessionid=AbCdE!-1234!-5678?element=stylesheet HTTP/1.1 Host: www.frankfurter-fondsbank.de User-Agent: Mozilla/5.0 (Windows; U; Win98; de; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1 Accept: text/css,*/*;q=0.1 Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive *** Referer: https://www.frankfurter-fondsbank.de/diverse/navigation.jsp;jsessionid=AbCdE!-1234!-5678?menu=1 HTTP/1.x 200 OK Date: Tue, 07 Nov 2006 19:59:27 GMT Server: Apache Content-Length: 2303 X-Powered-By: Servlet/2.4 JSP/2.0 Connection: close Content-Type: text/css ---------------------------------------------------------- From aaron_patterson at speakeasy.net Wed Nov 8 17:59:40 2006 From: aaron_patterson at speakeasy.net (Aaron Patterson) Date: Wed, 8 Nov 2006 14:59:40 -0800 Subject: [Mechanize-users] mechanize: 400 Bad Request In-Reply-To: References: Message-ID: <20061108225940.GA1604@eviladmins.lan> On Tue, Nov 07, 2006 at 09:39:04PM +0100, Friedrich, Axel wrote: > > Hello, > > when trying to access a certain HTML-frame, I get: > "in `request': Unhandled response (WWW::Mechanize::ResponseCodeError)" > > and the page returns: "400 Bad Request" > > * Why? > * How to solve this? > > With browser, it works. > > In the logs below, I marked 4 lines with "***", where I see possible > differences in the URI. But I don't know, if this is the reason for the > malfuncion and how to fix this. 
>
> mechanize:
> Net::HTTP::Get:
> https://www.frankfurter-fondsbank.de/../diverse/navigation.jsp;jsessionid=CdEfG!-34567!-7654?menu=1
>
> firefox-LiveHTTPHeaders:
> GET /diverse/navigation.jsp;jsessionid=AbCdE!-1234!-5678?menu=1 HTTP/1.1

It looks like mechanize doesn't support relative paths in links.  This
is a bug, and I'll look into fixing it.

>
>
> Thank you for any help!
>
> Axel
>
> axel ? friedrich ? _smail AT gmx ? de

--
Aaron Patterson
http://tenderlovemaking.com/

From aaron_patterson at speakeasy.net Mon Nov 13 03:05:34 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Mon, 13 Nov 2006 00:05:34 -0800
Subject: [Mechanize-users] mechanize: 400 Bad Request
In-Reply-To: <20061108225940.GA1604@eviladmins.lan>
References: <20061108225940.GA1604@eviladmins.lan>
Message-ID: <20061113080534.GA17746@eviladmins.lan>

On Wed, Nov 08, 2006 at 02:59:40PM -0800, Aaron Patterson wrote:
> On Tue, Nov 07, 2006 at 09:39:04PM +0100, Friedrich, Axel wrote:
> >
> > Hello,
> >
> > when trying to access a certain HTML-frame, I get:
> > "in `request': Unhandled response (WWW::Mechanize::ResponseCodeError)"
> >
> > and the page returns: "400 Bad Request"
> >
> > * Why?
> > * How to solve this?
> >
> > With browser, it works.
> >
> > In the logs below, I marked 4 lines with "***", where I see possible
> > differences in the URI. But I don't know, if this is the reason for the
> > malfunction and how to fix this.
> >
> > mechanize:
> > Net::HTTP::Get:
> > https://www.frankfurter-fondsbank.de/../diverse/navigation.jsp;jsessionid=CdEfG!-34567!-7654?menu=1
> >
> > firefox-LiveHTTPHeaders:
> > GET /diverse/navigation.jsp;jsessionid=AbCdE!-1234!-5678?menu=1 HTTP/1.1
>
> It looks like mechanize doesn't support relative paths in links.  This
> is a bug, and I'll look into fixing it.

Okay, I'm stumped.  I thought this was a bug in Mechanize, but I've
written a few test cases to try to reproduce the error, and I can't!

Are you sure the following line is being executed?

> > framePage = agent.click( page2.frames.text('nav') )     # This works not

I copied the html you provided into a file and tried to reproduce the
error, but mechanize handles the relative link correctly.  Would it be
possible for you to write a short test case that reproduces the error?

--
Aaron Patterson
http://tenderlovemaking.com/

From eduardo.fernandez at gmail.com Mon Nov 13 16:37:16 2006
From: eduardo.fernandez at gmail.com (Eduardo Fernandez Corrales)
Date: Mon, 13 Nov 2006 22:37:16 +0100
Subject: [Mechanize-users] handling protocol violations
Message-ID: <3bf06c660611131337h51c5dcc2x708a795929d34105@mail.gmail.com>

Hello,

I have found a webserver that spits the cookie version as a float
(1.2) instead of an integer as the protocol mandates.
(www.alimentacion.carrefour.es)

So far I have been changing line 33 in cookie.rb from

  when "version" then cookie.version = Integer(value)
to
  when "version" then cookie.version = Integer(value.round)

Of course every time WWW::Mechanize gets updated, I have to do the change again.

How should we handle this? I am sure there has to be a better way.

Thank you very much, I am really having fun using this fine piece of software.

--
Eduardo Fernández Corrales || 0xEFC.com

From aaron_patterson at speakeasy.net Mon Nov 13 18:11:46 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Mon, 13 Nov 2006 15:11:46 -0800
Subject: [Mechanize-users] handling protocol violations
In-Reply-To: <3bf06c660611131337h51c5dcc2x708a795929d34105@mail.gmail.com>
References: <3bf06c660611131337h51c5dcc2x708a795929d34105@mail.gmail.com>
Message-ID: <20061113231146.GB16829@eviladmins.lan>

Hi Eduardo,

On Mon, Nov 13, 2006 at 10:37:16PM +0100, Eduardo Fernandez Corrales wrote:
> Hello,
>
> I have found a webserver that spits the cookie version as a float
> (1.2) instead of an integer as the protocol mandates.
> (www.alimentacion.carrefour.es)
>
> So far I have been changing line 33 in cookie.rb from
>
>   when "version" then cookie.version = Integer(value)
> to
>   when "version" then cookie.version = Integer(value.round)
>
> Of course every time WWW::Mechanize gets updated, I have to do the change again.
>
> How should we handle this? I am sure there has to be a better way.

Mechanize should handle this malformed header without blowing up.  I'll
set the version to nil when it is invalid and send a warning message to
the logger.

>
> Thank you very much, I am really having fun using this fine piece of software.
>
> --
> Eduardo Fernández Corrales || 0xEFC.com
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users

--
Aaron Patterson
http://tenderlovemaking.com/

From axel.friedrich_smail at gmx.de Sun Nov 19 09:06:17 2006
From: axel.friedrich_smail at gmx.de (Friedrich, Axel)
Date: Sun, 19 Nov 2006 15:06:17 +0100
Subject: [Mechanize-users] mechanize: 400 Bad Request
In-Reply-To: <20061113080534.GA17746@eviladmins.lan>
References: <20061108225940.GA1604@eviladmins.lan> <20061113080534.GA17746@eviladmins.lan>
Message-ID:

Hello,

finally, I found time to create a test case:

With browser, the following page shows up correctly:
http://cafriedrich.netfirms.com/root/dir1/firstpage.htm


With the "standard mechanize code", I get the before mentioned error:
----------------------------------------------------------------------
require 'rubygems'
require 'mechanize'
require 'logger'
require 'fileutils'

# ========== CONFIG ===================================
@logFilePath = "G:/Ruby/FrankfurterFondsBank/downloaded/log.txt"
@downloadDir = 'G:/Ruby/FrankfurterFondsBank/downloaded/'
@firstPage   = "http://cafriedrich.netfirms.com/root/dir1/firstpage.htm"
# ========== END CONFIG ===============================

FileUtils.mkpath( File.dirname(@logFilePath) )
FileUtils.mkpath( @downloadDir )

agent = WWW::Mechanize.new{|a| a.log = Logger.new(@logFilePath) }

page = agent.get(@firstPage)
txt = page.body
File.open(@downloadDir + "firstpage.htm", 'w') {|f| f.write(txt) }

puts "\n\n================= frames:================= "
page.frames.each { |f| puts sprintf("%-15s %-1s \n", f.text, f.href) }
puts "\n================= /frames ================= \n\n"

begin
  mainframe = agent.click( page.frames.text('main') )
  txt = mainframe.body
  File.open(@downloadDir + "mainframe.htm", 'w') {|f| f.write(txt) }

  frameleft1 = agent.click( page.frames.text('left1') )
  txt = frameleft1.body
  File.open(@downloadDir + "left1.htm", 'w') {|f| f.write(txt) }

rescue WWW::Mechanize::ResponseCodeError => ex
  puts "====== ERROR =========="
  puts ex.page.body
  puts "====== /ERROR =========="
end

puts "-- Exiting now."


With the "workaround code" below, it works:
-------------------------------------------
require 'rubygems'
require 'mechanize'
require 'logger'
require 'fileutils'

# ========== CONFIG ===================================
@logFilePath = "G:/Ruby/FrankfurterFondsBank/downloaded/log.txt"
@downloadDir = 'G:/Ruby/FrankfurterFondsBank/downloaded/'
@firstPage   = "http://cafriedrich.netfirms.com/root/dir1/firstpage.htm"
# ========== END CONFIG ===============================

FileUtils.mkpath( File.dirname(@logFilePath) )
FileUtils.mkpath( @downloadDir )

agent = WWW::Mechanize.new{|a| a.log = Logger.new(@logFilePath) }

page = agent.get(@firstPage)
txt = page.body
File.open(@downloadDir + "firstpage.htm", 'w') {|f| f.write(txt) }

puts "\n\n================= frames:================= "
page.frames.each { |f| puts sprintf("%-15s %-1s \n", f.text, f.href) }
puts "\n================= /frames ================= \n\n"

begin
  mainframe = agent.click( page.frames.text('main') )
  txt = mainframe.body
  File.open(@downloadDir + "mainframe.htm", 'w') {|f| f.write(txt) }

  href = 'http://cafriedrich.netfirms.com/root/'           # <- WORKAROUND
  href << page.frames.text('left1').href.sub('../', '')    # <- WORKAROUND
  frameleft1 = agent.get(href)                             # <- WORKAROUND
  txt = frameleft1.body
  File.open(@downloadDir + "left1.htm", 'w') {|f| f.write(txt) }

rescue WWW::Mechanize::ResponseCodeError => ex
  puts "====== ERROR =========="
  puts ex.page.body
  puts "====== /ERROR =========="
end

puts "-- Exiting now."

From aaron_patterson at speakeasy.net Mon Nov 20 14:31:40 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Mon, 20 Nov 2006 11:31:40 -0800
Subject: [Mechanize-users] mechanize: 400 Bad Request
In-Reply-To:
References: <20061108225940.GA1604@eviladmins.lan> <20061113080534.GA17746@eviladmins.lan>
Message-ID: <20061120193140.GA655@eviladmins.lan>

On Sun, Nov 19, 2006 at 03:06:17PM +0100, Friedrich, Axel wrote:
>
> Hello,
>
> finally, I found time to create a test case:

Thank you!  I understand the bug now.  When mechanize sees that the url
is relative, it looks at the last url in its history to determine where
to go next.  In the case of frames, that is not correct, since your
request flow is not linear.

I'll fix this issue for the next release.  Here's my workaround where
the rescue block never gets called:

require 'rubygems'
require 'mechanize'

# ========== CONFIG ===================================
@firstPage = "http://cafriedrich.netfirms.com/root/dir1/firstpage.htm"
# ========== END CONFIG ===============================

agent = WWW::Mechanize.new
page = agent.get(@firstPage)

begin
  mainframe = agent.click( page.frames.text('main') )
  agent.history.pop # <= This makes it work!
  frameleft1 = agent.click( page.frames.text('left1') )
rescue WWW::Mechanize::ResponseCodeError => ex
  puts "Error"
end

puts "-- Exiting now."
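
The history-based resolution described above can be illustrated with
Ruby's standard URI.join.  This is only a sketch, not Mechanize's code:
the frame href '../dir2/left1.htm' and the main-frame URL are made-up
values, since the thread does not show the real ones.  Resolving the same
relative href against two different base pages gives two different
targets, which is why the base needs to be the page that contained the
frame tag.

```ruby
require 'uri'

# The frameset page from the test case in this thread.
frameset_url = 'http://cafriedrich.netfirms.com/root/dir1/firstpage.htm'

# Hypothetical values: the real frame hrefs are not shown in the thread.
main_frame_url = 'http://cafriedrich.netfirms.com/root/main.htm'
left1_href     = '../dir2/left1.htm'

# Intended: resolve against the page that actually contains the frame tag.
from_frameset = URI.join(frameset_url, left1_href).to_s

# Problematic: resolve against the most recently visited page instead.
from_history = URI.join(main_frame_url, left1_href).to_s

puts "relative to frameset page:      #{from_frameset}"
puts "relative to last history entry: #{from_history}"
```

Popping the history entry, as in the workaround, has the same effect as
choosing the frameset page as the base.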

--
Aaron Patterson
http://tenderlovemaking.com/

From axel.friedrich_smail at gmx.de Tue Nov 21 12:29:26 2006
From: axel.friedrich_smail at gmx.de (Friedrich, Axel)
Date: Tue, 21 Nov 2006 18:29:26 +0100
Subject: [Mechanize-users] mechanize: 400 Bad Request
In-Reply-To: <20061120193140.GA655@eviladmins.lan>
References: <20061108225940.GA1604@eviladmins.lan> <20061113080534.GA17746@eviladmins.lan> <20061120193140.GA655@eviladmins.lan>
Message-ID:

> I'll fix this issue for the next release.
> Here's my workaround

Thank you!

- Axel

From stearns at eliot.com Tue Nov 21 23:48:56 2006
From: stearns at eliot.com (Bryan Stearns)
Date: Tue, 21 Nov 2006 20:48:56 -0800
Subject: [Mechanize-users] to_absolute_uri typo in 0.6.3?
Message-ID: <1164170936.2C898F77@ef12.dngr.org>

I just started using Mechanize, and started using Ruby about thirty
seconds before that, but one of the sites I'm scraping does a redirect
on form submission to a badly-formed relative URL:

  index.cfm?action=bing&bang=boom=1|a=|b=|c= (etc.)

Interestingly, Mechanize 0.6.2 handled this OK, but in 0.6.3 this causes
a URI::InvalidURIError exception from URI.parse() in to_absolute_uri in
mechanize.rb.

I noticed that the new 0.6.3 version of to_absolute_uri starts with:
  url = URI.parse(
          URI.unescape(Util.html_unescape(url.to_s.strip)).gsub(/ /, '%20')
        ) unless url.is_a? URI
where the old 0.6.2 version started with:
  url = URI.parse(
          URI.escape(
            URI.unescape(url.to_s.strip)
          )) unless url.is_a? URI

It seemed funny to me that the new version does two things named
'unescape', where the old version 'escaped' something it had just
'unescaped' :-) -- and sure enough, changing the outer 'unescape' back
to 'escape' fixed my problem.

I didn't try any other test cases, so I don't know if I'm undoing the
intention behind that change. Like I said, I'm new to Mechanize, so
sorry if I misinterpreted...

(I used this as an excuse to try out test/unit and WEBrick - if you want
to see the test script I wrote to demonstrate the problem and try my
solution, see .)

Anyway, Mechanize is very cool - seems like I've been writing web-page
scraping code in various forms and languages since 1995, and this is by
far the closest to 'do what I mean' that I've found, so thanks!

...Bryan

From aaron_patterson at speakeasy.net Wed Nov 22 14:08:47 2006
From: aaron_patterson at speakeasy.net (Aaron Patterson)
Date: Wed, 22 Nov 2006 11:08:47 -0800
Subject: [Mechanize-users] to_absolute_uri typo in 0.6.3?
In-Reply-To: <1164170936.2C898F77@ef12.dngr.org>
References: <1164170936.2C898F77@ef12.dngr.org>
Message-ID: <20061122190847.GA25344@eviladmins.lan>

Hi Bryan,

On Tue, Nov 21, 2006 at 08:48:56PM -0800, Bryan Stearns wrote:
> I just started using Mechanize, and started using Ruby about thirty
> seconds before that, but one of the sites I'm scraping does a redirect
> on form submission to a badly-formed relative URL:
>
>   index.cfm?action=bing&bang=boom=1|a=|b=|c= (etc.)
>
> Interestingly, Mechanize 0.6.2 handled this OK, but in 0.6.3 this causes
> a URI::InvalidURIError exception from URI.parse() in to_absolute_uri in
> mechanize.rb.
>
> I noticed that the new 0.6.3 version of to_absolute_uri starts with:
>   url = URI.parse(
>           URI.unescape(Util.html_unescape(url.to_s.strip)).gsub(/ /, '%20')
>         ) unless url.is_a? URI
> where the old 0.6.2 version started with:
>   url = URI.parse(
>           URI.escape(
>             URI.unescape(url.to_s.strip)
>           )) unless url.is_a? URI
>
> It seemed funny to me that the new version does two things named
> 'unescape', where the old version 'escaped' something it had just
> 'unescaped' :-) -- and sure enough, changing the outer 'unescape' back
> to 'escape' fixed my problem.

The problem is that urls sometimes contain URI escaped entities.
Given the following code:

  require 'rubygems'
  require 'mechanize'
  include WWW

  def parse_it(url)
    URI.parse(
      URI.escape(
        Mechanize::Util.html_unescape(url.to_s.strip)
      )
    )
  end

  puts parse_it('/somewhere%20special')

Produces:

  /somewhere%2520special

Which is incorrect.  It should be '/somewhere special', hence the
unescape, escape, sequence in 0.6.2.  In 0.6.2, if a URL had a '#' sign
in it, it would get escaped, which is wrong.  That's why I made the
change in 0.6.3.

The reason for the html_unescape is because it is legal to html escape
hrefs in HTML.  For example:

  <a href="/foo?a=b&amp;b=c">Click Me!</a>

The browser is then expected to make the following request when that
link is clicked:

  /foo?a=b&b=c

Having pipes in the URL does seem to be a problem, and I'll look into
it.  I need to come up with a better solution for this whole url
escaping thing.

>
> I didn't try any other test cases, so I don't know if I'm undoing the
> intention behind that change. Like I said, I'm new to Mechanize, so
> sorry if I misinterpreted...

No problem.  I'll look into fixing this bug.

>
> (I used this as an excuse to try out test/unit and WEBrick - if you want
> to see the test script I wrote to demonstrate the problem and try my
> solution, see .)
>
> Anyway, Mechanize is very cool - seems like I've been writing web-page
> scraping code in various forms and languages since 1995, and this is by
> far the closest to 'do what I mean' that I've found, so thanks!

Very cool.  I'm glad you like the library.  If there are any features
you want, syntax improvements, or things you'd like it to do
differently, please let me know!
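
As a rough illustration of the escaping trade-off discussed above, the
sketch below uses two small hand-written helpers rather than URI.escape
and URI.unescape, which were removed from Ruby's standard library in 3.0.
sketch_escape and sketch_unescape are hypothetical names, and their
"safe" character set is only assumed to approximate the old default.

```ruby
# Hypothetical helper approximating the old URI.escape: every character
# outside the "safe" set is percent-encoded.
def sketch_escape(str)
  unsafe = /[^\-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]/
  str.gsub(unsafe) { |ch| ch.bytes.map { |b| format('%%%02X', b) }.join }
end

# Hypothetical helper approximating the old URI.unescape (ASCII only).
def sketch_unescape(str)
  str.gsub(/%([0-9a-fA-F]{2})/) { Regexp.last_match(1).hex.chr }
end

already_escaped = '/somewhere%20special'

# Escaping alone re-encodes the existing '%': the double-escape problem.
double = sketch_escape(already_escaped)

# Unescape-then-escape round-trips the input, as in 0.6.2 ...
round_trip = sketch_escape(sketch_unescape(already_escaped))

# ... but the same sequence also encodes '#', which breaks fragments.
fragment = sketch_escape(sketch_unescape('/page#top'))

# Pipes are outside the safe set, so they only get encoded if escaping runs.
pipes = sketch_escape('index.cfm?a=1|b=')

puts double, round_trip, fragment, pipes
```

Neither ordering handles every input, which matches the conclusion in the
thread that the escaping step needs a more careful design.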

>
> ...Bryan

--
Aaron Patterson
http://tenderlovemaking.com/

From tanshunquan at gmail.com Fri Nov 24 05:50:44 2006
From: tanshunquan at gmail.com (Shunquan Tan)
Date: Fri, 24 Nov 2006 18:50:44 +0800
Subject: [Mechanize-users] Maybe a bug in Cookie:cookies
Message-ID:

Hi man,

Thank you very much for your good work. I think there may be a bug in
Mechanize 0.6.3.

Cookie:cookies (cookie.rb:L83 )

      @jar[domain].each_key do |name|
=>      if url.path =~ /^#{@jar[domain][name].path}/
          if @jar[domain][name].expires.nil?
            cookies << @jar[domain][name]
          elsif Time.now < @jar[domain][name].expires
            cookies << @jar[domain][name]
          end
        end
      end

If I redirect from /login.cgi to / (finish login, the website returns some
cookies and tells me to redirect to its main page), the url.path will not
contain the string in @jar[domain][name].path, which is /login.cgi. So I
will never get the cookies belonging to the domain and never successfully
log in.

Sincerely,

Harish Tan
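
One way to picture the path check in question is the path-matching rule
later written down in RFC 6265 (section 5.1.4).  The sketch below is not
Mechanize's code; default_cookie_path and path_match? are names made up
here, and it assumes the login response sets its cookie without an
explicit Path attribute.  Under that rule a cookie set by /login.cgi gets
the default path "/", so it is still sent to / after the redirect.

```ruby
# Default cookie path for a response to request_path, used when the
# Set-Cookie header has no Path attribute (modelled on RFC 6265, 5.1.4).
def default_cookie_path(request_path)
  return '/' if request_path.nil? || !request_path.start_with?('/')
  return '/' if request_path.count('/') <= 1
  request_path[0...request_path.rindex('/')]
end

# True when a cookie stored with cookie_path applies to request_path.
def path_match?(request_path, cookie_path)
  return true if request_path == cookie_path
  return false unless request_path.start_with?(cookie_path)
  cookie_path.end_with?('/') || request_path[cookie_path.length] == '/'
end

login_default = default_cookie_path('/login.cgi')

puts "default path for /login.cgi: #{login_default}"
puts "sent to / after the redirect? #{path_match?('/', login_default)}"
puts "stored as /login.cgi, sent to /? #{path_match?('/', '/login.cgi')}"
```

Comparing on path segments instead of a raw regular expression prefix
also avoids treating /a/bc as being inside /a/b.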