From astarr at wiredquote.com Thu Jan 7 16:32:24 2010
From: astarr at wiredquote.com (Aaron Starr)
Date: Thu, 7 Jan 2010 13:32:24 -0800
Subject: [Mechanize-users] Concurrency issue in library?
Message-ID: <669cc1ca1001071332r3f02da4fy4359e1e39af92eb@mail.gmail.com>

I'm using mechanize with separate agents in separate threads. I recently got
the following error, and suspect that it may be a concurrency issue -- i.e.,
both threads were futzing with the same object. Anyone else think that could
be the case?

TypeError: can't modify frozen object
In /usr/local/lib/ruby/1.8/net/https.rb:138:in `verify_mode='
/usr/local/lib/ruby/1.8/net/https.rb:138:in `verify_mode='
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain/ssl_resolver.rb:20:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain.rb:30:in `pass'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain/handler.rb:6:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain/connection_resolver.rb:73:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain.rb:30:in `pass'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain/handler.rb:6:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain/request_resolver.rb:27:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain.rb:30:in `pass'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain/handler.rb:6:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain/parameter_resolver.rb:18:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain.rb:30:in `pass'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain/handler.rb:6:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain/uri_resolver.rb:68:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize/chain.rb:25:in `handle'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:445:in `fetch_page'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:404:in `post_form'
/usr/local/lib/ruby/gems/1.8/gems/mechanize-0.9.2/lib/www/mechanize.rb:335:in `submit'

From aaron.patterson at gmail.com Thu Jan 7 19:04:43 2010
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Thu, 7 Jan 2010 16:04:43 -0800
Subject: [Mechanize-users] Concurrency issue in library?
In-Reply-To: <669cc1ca1001071332r3f02da4fy4359e1e39af92eb@mail.gmail.com>
References: <669cc1ca1001071332r3f02da4fy4359e1e39af92eb@mail.gmail.com>
Message-ID: <6959e1681001071604q71e03836q206b5d0313bf37c0@mail.gmail.com>

On Thu, Jan 7, 2010 at 1:32 PM, Aaron Starr wrote:
>
> I'm using mechanize with separate agents in separate threads. I recently got
> the following error, and suspect that it may be a concurrency issue -- i.e.,
> both threads were futzing with the same object. Anyone else think that could
> be the case?

Separate agents should be completely separate. I think this might be
a bug specific to SSL requests. Would you mind grabbing the gem from
github and trying it out? I think it may fix this problem.

-- 
Aaron Patterson
http://tenderlovemaking.com/

From ross at roscommonhq.com Fri Jan 8 00:17:31 2010
From: ross at roscommonhq.com (Ross Cameron)
Date: Fri, 08 Jan 2010 16:17:31 +1100
Subject: [Mechanize-users] input form fields not in the #

Hi

This may be a dumb question with an obvious answer.

It would seem that an input form field identified with an 'id' qualifier
and not with a 'name' qualifier is not recognised by Mechanize - at least
it isn't in the form field list.

Is there any way of getting at these elements or am I, as I suspect, fresh
out of luck. But you never know ...

Appreciate any help.

Regards
Ross

From jeremywoertink at gmail.com Fri Jan 8 00:52:18 2010
From: jeremywoertink at gmail.com (jeremywoertink at gmail.com)
Date: Thu, 7 Jan 2010 21:52:18 -0800
Subject: [Mechanize-users] input form fields not in the #
References: <4B46BFEB.1030400@roscommonhq.com>
Message-ID: <9A21E01B-A522-4AEA-8B18-D5C3A3FB5157@gmail.com>

Did you try grabbing it by the id with nokogiri?

On Jan 7, 2010, at 9:17 PM, Ross Cameron wrote:

> Hi
>
> This may be a dumb question with an obvious answer.
>
> It would seem that an input form field identified with an 'id'
> qualifier and not with a 'name' qualifier is not recognised by
> Mechanize - at least it isn't in the form field list.
>
> Is there any way of getting at these elements or am I, as I suspect,
> fresh out of luck. But you never know ...
>
> Appreciate any help.
>
> Regards
> Ross
>
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users

From ross at roscommonhq.com Fri Jan 8 05:42:08 2010
From: ross at roscommonhq.com (Ross Cameron)
Date: Fri, 08 Jan 2010 21:42:08 +1100
Subject: [Mechanize-users] input form fields not in the #
References: <4B46BFEB.1030400@roscommonhq.com> <9A21E01B-A522-4AEA-8B18-D5C3A3FB5157@gmail.com>
Message-ID: <4B470C00.9070808@roscommonhq.com>

Yes, it can be grabbed by Nokogiri. But, not sure how it helps because I
want to stuff the field value and submit the form. Not too familiar with
Nokogiri to say it can't do it and having been through the classes docs, I
can't see how it could. After all, it's simply a parser is it not. Or am I
not correct?

Always willing to learn.

Would appreciate any assistance here.

Regards
Ross.

From jeremywoertink at gmail.com Fri Jan 8 12:09:39 2010
From: jeremywoertink at gmail.com (jeremywoertink at gmail.com)
Date: Fri, 8 Jan 2010 09:09:39 -0800
Subject: [Mechanize-users] input form fields not in the #
References: <4B46BFEB.1030400@roscommonhq.com> <9A21E01B-A522-4AEA-8B18-D5C3A3FB5157@gmail.com> <4B470C00.9070808@roscommonhq.com>
Message-ID: <43F9C579-CC8C-4522-A727-9AB78D49AD2F@gmail.com>

You could try

  page.search("#someid").attr("value","somevalue")

Then try submitting. That would be awesome if it worked! Let me know :)

From ross at roscommonhq.com Fri Jan 8 22:23:20 2010
From: ross at roscommonhq.com (Ross Cameron)
Date: Sat, 09 Jan 2010 14:23:20 +1100
Subject: [Mechanize-users] input form fields not in the #
References: <4B46BFEB.1030400@roscommonhq.com> <9A21E01B-A522-4AEA-8B18-D5C3A3FB5157@gmail.com> <4B470C00.9070808@roscommonhq.com>
Message-ID: <4B47F6A8.2010205@roscommonhq.com>

Jeremy

As you say, that would be awesome, but unfortunately it doesn't work as we
might've hoped.

Stuffing the attributes works as expected. Submitting the form does not,
however.

Any thoughts?

Ross
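
One likely reason the edited element never reaches the server: under the HTML form submission rules, only controls that carry a name contribute a name=value pair, so an input with just an id is left out of the request whatever value is set on it. Below is a minimal sketch of that rule. Control and form_data are hypothetical names made up for illustration; they are not Mechanize or Nokogiri classes, and the field names are invented.

```ruby
require "uri"

# Hypothetical stand-in for a parsed <input>; not a Mechanize or Nokogiri class.
Control = Struct.new(:id, :name, :value)

# Build the query string the way a browser would: only controls that
# carry a name contribute a name=value pair.
def form_data(controls)
  pairs = controls.reject { |c| c.name.nil? || c.name.empty? }
                  .map { |c| [c.name, c.value.to_s] }
  URI.encode_www_form(pairs)
end

controls = [
  Control.new("q", "query", "ruby"),
  Control.new("url", nil, "abcdef") # id only, no name
]

puts form_data(controls) # prints "query=ruby"
```

If that is what is happening here, setting the value attribute alone would not be enough; the field would also need a name before it could be submitted.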

From jeremywoertink at gmail.com Sat Jan 9 02:14:21 2010
From: jeremywoertink at gmail.com (jeremywoertink at gmail.com)
Date: Fri, 8 Jan 2010 23:14:21 -0800
Subject: [Mechanize-users] input form fields not in the #
References: <4B46BFEB.1030400@roscommonhq.com> <9A21E01B-A522-4AEA-8B18-D5C3A3FB5157@gmail.com> <4B470C00.9070808@roscommonhq.com> <4B47F6A8.2010205@roscommonhq.com>
Message-ID:

Hmm... I'm not in front of my machine to try, but..
In mechanize can you use << with the fields? If so...
Try

  input = page.search("#someid")
  input.attr("name", "somename")
  page.forms.first.fields << input

maybe that will put it in there for you.

Hope that works!

From ross at roscommonhq.com Sat Jan 9 06:32:59 2010
From: ross at roscommonhq.com (Ross Cameron)
Date: Sat, 09 Jan 2010 22:32:59 +1100
Subject: [Mechanize-users] input form fields not in the #
References: <4B46BFEB.1030400@roscommonhq.com> <9A21E01B-A522-4AEA-8B18-D5C3A3FB5157@gmail.com> <4B470C00.9070808@roscommonhq.com> <4B47F6A8.2010205@roscommonhq.com>
Message-ID: <4B48696B.8070303@roscommonhq.com>

Jeremy

Again, another little feature I've learnt - thanks.

Following your suggestion, the form looks like this:

  page.forms.first.fields << input.shift # input was an Array - see your code below

  #
   {name "form1"}
   {method "GET"}
   {action "form2.php"}
   {fields
    #
    #
    #(Element:0x..fdb848784 {
      name = "input",
      attributes = [
        #(Attr:0x..fdb842532 { name = "type", value = "text" }),
        #(Attr:0x..fdb842528 { name = "id", value = "url" }),
        #(Attr:0x..fdb84251e { name = "size", value = "60" }),
        #(Attr:0x..fdb842514 { name = "maxlength", value = "1024" }),
        #(Attr:0x..fdb84250a { name = "*value*", value = "*abcdef*" })]
      })}
   {radiobuttons}
  ...

which is derived from the following test case html

  URL:

... where the field id = 'url' was stuffed with 'abcde',

which, on submit, returns the run time error:

  ...gems/mechanize-0.9.3/lib/www/mechanize/form.rb:142:in `proc_query':
  undefined method `query_value' for # (NoMethodError)

The field was added to the form as above. I am guessing there might be a way
to cast the Nokogiri or Element class to WWW::Mechanize, in which case it
could be a solution???

Ross
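
The NoMethodError above suggests that the submit path calls query_value on every entry in the fields list, so whatever gets pushed into that list has to respond to it; a raw parsed element does not. Below is a minimal sketch of that duck-typing contract. Field, RawNode, and build_query are hypothetical stand-ins written for illustration; they are not the Mechanize implementation.

```ruby
# Hypothetical stand-in for a form field object; not the Mechanize class.
class Field
  attr_reader :name, :value

  def initialize(name, value)
    @name = name
    @value = value
  end

  # The contract the query builder relies on.
  def query_value
    [[name, value.to_s]]
  end
end

# Hypothetical stand-in for a raw parsed element: it holds attributes
# but has no query_value method.
class RawNode
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

# Collects name/value pairs by asking each entry for its query_value.
def build_query(fields)
  fields.flat_map { |f| f.query_value }
end

fields = [Field.new("q", "ruby")]
fields << RawNode.new({ "id" => "url", "value" => "abcdef" })

begin
  build_query(fields)
rescue NoMethodError => e
  puts "submit failed: #{e.message}"
end

# Swapping the raw node for an object that honours the contract works.
node = fields.pop
fields << Field.new(node.attributes["id"], node.attributes["value"])
p build_query(fields)
```

Under that reading, the way out is to build a real field object from the element's data rather than to cast the element itself.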

From jeremywoertink at gmail.com Sat Jan 9 23:26:25 2010
From: jeremywoertink at gmail.com (Jeremy Woertink)
Date: Sat, 9 Jan 2010 20:26:25 -0800
Subject: [Mechanize-users] input form fields not in the #
References: <4B46BFEB.1030400@roscommonhq.com> <9A21E01B-A522-4AEA-8B18-D5C3A3FB5157@gmail.com> <4B470C00.9070808@roscommonhq.com> <4B47F6A8.2010205@roscommonhq.com> <4B48696B.8070303@roscommonhq.com>
Message-ID: <1ea5c3821001092026k608d39fg7d3ccf9466381a92@mail.gmail.com>

Hey Ross,

Did you ever get it? Sorry, I was at CES all day :) It was awesome. Anyway,
if you're pushing the information into this field, and it has no name, then
just make a new one with the correct name and value :)

  @new_field = WWW::Mechanize::Form::Field.new("url", "http://www.justprofessionals.com") # shameless plug :)
  page.forms.first.fields << @new_field
  @new_page = page.forms.first.submit

See if that works for ya, and let me know :)

~Jeremy

From mr.danielaquino at gmail.com Tue Jan 12 19:53:38 2010
From: mr.danielaquino at gmail.com (Daniel Aquino)
Date: Tue, 12 Jan 2010 19:53:38 -0500
Subject: [Mechanize-users] EOFError content-length != page length
Message-ID: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com>

I know it seems like EOFError should be raised if content-length !=
page length but it would be nice to turn this off just like we have
quirks mode for html soup.
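
A minimal sketch of the kind of switch being asked for here, assuming a hypothetical read_body helper with a strict flag (neither exists in Mechanize or Net::HTTP; the names are made up for illustration). In strict mode a body shorter than the declared Content-Length raises EOFError; in lenient mode whatever arrived is handed back.

```ruby
require "stringio"

# Hypothetical helper, not part of Mechanize or Net::HTTP.
# Reads up to content_length bytes from io. A short read raises
# EOFError when strict is true and returns the partial body otherwise.
def read_body(io, content_length, strict: true)
  body = io.read(content_length) || "" # IO#read(n) returns nil at EOF
  if strict && body.bytesize < content_length
    raise EOFError, "expected #{content_length} bytes, got #{body.bytesize}"
  end
  body
end

short_body = "<html>ok</html>" # 15 bytes
declared   = 20                # what a broken server might claim

begin
  read_body(StringIO.new(short_body), declared)
rescue EOFError => e
  puts "strict: #{e.message}"
end

puts "lenient: #{read_body(StringIO.new(short_body), declared, strict: false)}"
```

Keeping strict as the default would preserve the current behaviour for everyone who does not opt out.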

I have an issue where I have to target a broken web server that I don't
control.

I had to resort to using mechanize in another language that didn't
have the check in place rather than editing ruby-mechanize and then
having users download my patched version.

It would be very nice if there was an option to remove the need for
the length received to be equal to the content length header
message...

From mike at csa.net Tue Jan 12 23:21:02 2010
From: mike at csa.net (Mike Dalessio)
Date: Tue, 12 Jan 2010 23:21:02 -0500
Subject: [Mechanize-users] EOFError content-length != page length
In-Reply-To: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com>
References: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com>
Message-ID: <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com>

On Tue, Jan 12, 2010 at 7:53 PM, Daniel Aquino wrote:

> I know it seems like EOFError should be raised if content-length !=
> page length but it would be nice to turn this off just like we have
> quirks mode for html soup.
>
> I have an issue where I have to target a broken web server that I don't
> control.

From the FAQ:

Q: I keep getting an EOFError:
   protocol.rb:133:in `sysread': end of file reached (EOFError)

A: Some people have experienced an EOFError during normal mechanize usage.
   Most of the time this occurs because the remote website claims to support
   keep alives, but does not implement them correctly. Try turning off
   keep alives on your mechanize object:

     mech.keep_alive = false

Please let us know if this works for you.

> I had to resort to using mechanize in another language that didn't
> have the check in place rather than editing ruby-mechanize and then
> having users download my patched version.
>
> It would be very nice if there was an option to remove the need for
> the length received to be equal to the content length header
> message...

-- 
mike dalessio
mike at csa.net

From mr.danielaquino at gmail.com Tue Jan 12 23:33:59 2010
From: mr.danielaquino at gmail.com (Daniel Aquino)
Date: Tue, 12 Jan 2010 23:33:59 -0500
Subject: [Mechanize-users] EOFError content-length != page length
In-Reply-To: <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com>
References: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com> <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com>
Message-ID: <66f0f93e1001122033v7c8f0a7eg2f16449a0f37d6a5@mail.gmail.com>

It doesn't. I knew about that issue from searching, and it's on the FAQ.

My problem is that the page appears truncated but in fact is
reached

Perhaps attempting to parse the page first and then complaining if
there is an error would be better?

From mr.danielaquino at gmail.com Mon Jan 18 14:26:23 2010
From: mr.danielaquino at gmail.com (Daniel Aquino)
Date: Mon, 18 Jan 2010 14:26:23 -0500
Subject: [Mechanize-users] EOFError content-length != page length
In-Reply-To: <66f0f93e1001122033v7c8f0a7eg2f16449a0f37d6a5@mail.gmail.com>
References: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com> <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com> <66f0f93e1001122033v7c8f0a7eg2f16449a0f37d6a5@mail.gmail.com>
Message-ID: <66f0f93e1001181126q3046c26awe99b442ba8182bd5@mail.gmail.com>

Any plan to make this a feature? I can whip together a patch for it

-- 
Sent from my mobile device

From aaron.patterson at gmail.com Mon Jan 18 14:38:39 2010
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Mon, 18 Jan 2010 11:38:39 -0800
Subject: [Mechanize-users] EOFError content-length != page length
In-Reply-To: <66f0f93e1001181126q3046c26awe99b442ba8182bd5@mail.gmail.com>
References: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com> <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com> <66f0f93e1001122033v7c8f0a7eg2f16449a0f37d6a5@mail.gmail.com> <66f0f93e1001181126q3046c26awe99b442ba8182bd5@mail.gmail.com>
Message-ID: <6959e1681001181138w6d4bd43dwcaafb74a58003960@mail.gmail.com>

On Mon, Jan 18, 2010 at 11:26 AM, Daniel Aquino wrote:
> Any plan to make this a feature? I can whip together a patch for it

Go for it. If you include tests, I will apply it. :-D

-- 
Aaron Patterson
http://tenderlovemaking.com/

From felipemattosinho at terra.com.br Mon Jan 18 16:40:55 2010
From: felipemattosinho at terra.com.br (Felipe Jordão A. P. Mattosinho)
Date: Mon, 18 Jan 2010 22:40:55 +0100
Subject: [Mechanize-users] How to click on a link, in a specific part of the web page! Help
Message-ID: <20100118214057.E690E8000095@bufalo.tpn.terra.com>

Hi everybody,

I am a new Ruby programmer and Mechanize & Nokogiri user. I am using both
gems for my master thesis, and since I can't see much documentation for
mechanize on the internet I have a question.

  URL = 'http://reviews.cnet.com'
  SEARCH_FIELD_NAME = 'tsearch'
  XPATH_TO_RESULT_PAGE = "/html/body/div[2]/div/div[2]/div[3]/div[2]/form/ul"
  XPATH_TO_FIRST_LINK_RESULT_PAGE = "/html/body/div[2]/div/div[2]/div[3]/div[2]/form/ul/li/div[4]/a"

  @@mech = WWW::Mechanize.new # Creates an instance of Mechanize and select CNET Website
  page = @@mech.get(URL)
  search_form = page.form(SEARCH_FIELD_NAME)
  search_form.query = query
  pre_page = @@mech.submit(search_form, search_form.buttons.first)
  pre_page.search(XPATH_TO_FIRST_LINK_RESULT_PAGE) do |result|
    @last_page = WWW::Mechanize::Page::Link.new(result,@@mech,@pre_page).click
  end

My problem is with the variable @last_page. I am not so sure if I am doing
something wrong, but I believe I am. I mean that was the only way I found to
do what I wanted to. On the variable pre_page I search for a specific field
where results are present. I cannot rely just on the name of the link
because links with the name of my search can be everywhere on this page.
That is why I want to specify just a part of the page where a link with the
name of my query should be clicked.
That was the only way that I found to restrict the links that I want to click. The problem is that I tried to make a new link, based on the result (which is correct, that was the link I was searching for), and to click on it to proceed to the next page. However, @last_page is always nil and this is not working. If someone has a good idea of how I can make it work correctly, please send me a reply!

Best Regards

From jeremywoertink at gmail.com Mon Jan 18 19:29:54 2010
From: jeremywoertink at gmail.com (Jeremy Woertink)
Date: Mon, 18 Jan 2010 16:29:54 -0800
Subject: [Mechanize-users] How to click on a link, in a specific part of the web page! Help
In-Reply-To: <1ea5c3821001181615x157aed4cq6b82cc55402fac51@mail.gmail.com>
References: <20100118214057.E690E8000095@bufalo.tpn.terra.com>
    <1ea5c3821001181615x157aed4cq6b82cc55402fac51@mail.gmail.com>
Message-ID: <1ea5c3821001181629q27591731j1e52b6ab5a823b38@mail.gmail.com>

Here,

got bored and decided to try your code out. I cleaned it up a little.

http://pastie.org/784099

Try this and see if you get what you're looking for.

~Jeremy Woertink

From jeremywoertink at gmail.com Mon Jan 18 19:15:33 2010
From: jeremywoertink at gmail.com (Jeremy Woertink)
Date: Mon, 18 Jan 2010 16:15:33 -0800
Subject: [Mechanize-users] How to click on a link, in a specific part of the web page! Help
In-Reply-To: <20100118214057.E690E8000095@bufalo.tpn.terra.com>
References: <20100118214057.E690E8000095@bufalo.tpn.terra.com>
Message-ID: <1ea5c3821001181615x157aed4cq6b82cc55402fac51@mail.gmail.com>

well.. first, your variables are all over the place. You have a mix of Constants, class, instance, and local variables. Not a problem, but since you're new to Ruby, I would recommend sticking to some sort of normal scheme. Now with that being said, looking at

    @last_page = WWW::Mechanize::Page::Link.new(result,@@mech,@pre_page).click

you call @pre_page which is nil because it was never defined. Did you try creating it with just the pre_page variable?

~Jeremy Woertink

From mechanize-mail at zizee.com Mon Jan 18 20:42:08 2010
From: mechanize-mail at zizee.com (Jimmy McGrath)
Date: Tue, 19 Jan 2010 11:42:08 +1000
Subject: [Mechanize-users] How to stop downloading of non text (PDFs, Images etc.)
Message-ID: <4B550DF0.7070304@zizee.com>

Howdy All,

I am creating a tool that allows a user to request a URL to be downloaded and information viewed about the downloaded content. I would like to restrict the user to being able to only request URLs that resolve to html or xml (or other text documents e.g. txt, jsp, asp etc), as the tool has no useful functionality if someone specifies a PNG, PDF or any other binary file. I would like the tool to fail fast instead of trying to download a 20 MB powerpoint which is of no use.

At first I thought I would just validate the URL to ensure that it does not end with certain suffixes, but quickly realised that since a server can redirect a URL with impunity, screening out the URLs at the start won't catch all instances. Trying to maintain a list of all valid (or all invalid) file extensions would also be a painful maintenance overhead.

I thought that there may be a way to set valid mime-types for mechanize, or to perform tests on the final URL after all redirects have finished before the download starts.

Anyway, I am hoping somebody could give me a suggestion on how I could achieve this filtering, or even just steer me in the right direction (even a suggestion of a good google query would be helpful!).

Thanks,
-Jimmy

From felipemattosinho at terra.com.br Mon Jan 18 20:48:44 2010
From: felipemattosinho at terra.com.br (Felipe Jordão A. P. Mattosinho)
Date: Tue, 19 Jan 2010 02:48:44 +0100
Subject: [Mechanize-users] RES: How to click on a link, in a specific part of the web page! Help
In-Reply-To: <000e0cdfc5be29afcc047d78dbc6@google.com>
Message-ID: <20100119014846.24D7DA0000084@caplan.tpn.terra.com>

Hi Matt,

Thanks for your reply. Each solved the problem!

Regards!
  _____

From: mattw922 at gmail.com [mailto:mattw922 at gmail.com]
Sent: Tuesday, January 19, 2010 00:39
To: Felipe Jordão A. P. Mattosinho
Subject: Re: [Mechanize-users] How to click on a link, in a specific part of the web page! Help

Felipe,

Where you have:

    pre_page.search(XPATH_TO_FIRST_LINK_RESULT_PAGE) do |result|
      @last_page = WWW::Mechanize::Page::Link.new(result,@@mech,@pre_page).click

You should have:

    pre_page.search(XPATH_TO_FIRST_LINK_RESULT_PAGE).each do |result|
      @last_page = WWW::Mechanize::Page::Link.new(result,@@mech,pre_page).click

Notice the .each and the removal of an @. Worked for me.

Matt

From mr.danielaquino at gmail.com Tue Jan 19 10:50:11 2010
From: mr.danielaquino at gmail.com (Daniel Aquino)
Date: Tue, 19 Jan 2010 10:50:11 -0500
Subject: [Mechanize-users] EOFError content-length != page length
In-Reply-To: <6959e1681001181138w6d4bd43dwcaafb74a58003960@mail.gmail.com>
References: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com>
    <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com>
    <66f0f93e1001122033v7c8f0a7eg2f16449a0f37d6a5@mail.gmail.com>
    <66f0f93e1001181126q3046c26awe99b442ba8182bd5@mail.gmail.com>
    <6959e1681001181138w6d4bd43dwcaafb74a58003960@mail.gmail.com>
Message-ID: <66f0f93e1001190750w1b28070cid8deeb200583be11@mail.gmail.com>

what does the comment mean here ?

Does this mean there is a way around this by specifying the content length while making the request ?

          # Net::HTTP ignores EOFError if Content-length is given, so we emulate it here.
          unless res_klass <= Net::HTTPRedirection
            raise EOFError if (!params[:request].is_a?(Net::HTTP::Head)) && @response.content_length() && @response.content_length() != total
          end

From mr.danielaquino at gmail.com Tue Jan 19 10:56:25 2010
From: mr.danielaquino at gmail.com (Daniel Aquino)
Date: Tue, 19 Jan 2010 10:56:25 -0500
Subject: [Mechanize-users] EOFError content-length != page length
In-Reply-To: <66f0f93e1001190750w1b28070cid8deeb200583be11@mail.gmail.com>
References: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com>
    <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com>
    <66f0f93e1001122033v7c8f0a7eg2f16449a0f37d6a5@mail.gmail.com>
    <66f0f93e1001181126q3046c26awe99b442ba8182bd5@mail.gmail.com>
    <6959e1681001181138w6d4bd43dwcaafb74a58003960@mail.gmail.com>
    <66f0f93e1001190750w1b28070cid8deeb200583be11@mail.gmail.com>
Message-ID: <66f0f93e1001190756p1bfd4d3fw7f8d93b9bf40ea15@mail.gmail.com>

my problem here is that @response.content_length() = 1142 but total = 1109....

again as i said this is a broken http server that I must deal with so it would be nice to have a way around this type of misconfiguration so the lib gracefully deals with the situation... I clearly see the entire page being loaded including the final so if you ask me mechanize should be good enough to handle this if it expects to deal with the web !

From aaron.patterson at gmail.com Tue Jan 19 12:09:44 2010
From: aaron.patterson at gmail.com (Aaron Patterson)
Date: Tue, 19 Jan 2010 09:09:44 -0800
Subject: [Mechanize-users] EOFError content-length != page length
In-Reply-To: <66f0f93e1001190756p1bfd4d3fw7f8d93b9bf40ea15@mail.gmail.com>
References: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com>
    <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com>
    <66f0f93e1001122033v7c8f0a7eg2f16449a0f37d6a5@mail.gmail.com>
    <66f0f93e1001181126q3046c26awe99b442ba8182bd5@mail.gmail.com>
    <6959e1681001181138w6d4bd43dwcaafb74a58003960@mail.gmail.com>
    <66f0f93e1001190750w1b28070cid8deeb200583be11@mail.gmail.com>
    <66f0f93e1001190756p1bfd4d3fw7f8d93b9bf40ea15@mail.gmail.com>
Message-ID: <6959e1681001190909s2a7c86cfv21e78990ae5d58ce@mail.gmail.com>

On Tue, Jan 19, 2010 at 7:56 AM, Daniel Aquino wrote:
> my problem here is that @response.content_length() = 1142 but total = 1109....
>
> again as i said this is a broken http server that I must deal with so
> it would be nice to have a way around this type of misconfiguration so
> the lib gracefully deals with the situation... I clearly see the
> entire page being loaded including the final so if you
> ask me mechanize should be good enough to handle this if it expects to
> deal with the web !

Cool story!
So do you have a patch or tests?

--
Aaron Patterson
http://tenderlovemaking.com/

From mr.danielaquino at gmail.com Tue Jan 19 12:25:36 2010
From: mr.danielaquino at gmail.com (Daniel Aquino)
Date: Tue, 19 Jan 2010 12:25:36 -0500
Subject: [Mechanize-users] EOFError content-length != page length
In-Reply-To: <6959e1681001190909s2a7c86cfv21e78990ae5d58ce@mail.gmail.com>
References: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com>
    <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com>
    <66f0f93e1001122033v7c8f0a7eg2f16449a0f37d6a5@mail.gmail.com>
    <66f0f93e1001181126q3046c26awe99b442ba8182bd5@mail.gmail.com>
    <6959e1681001181138w6d4bd43dwcaafb74a58003960@mail.gmail.com>
    <66f0f93e1001190750w1b28070cid8deeb200583be11@mail.gmail.com>
    <66f0f93e1001190756p1bfd4d3fw7f8d93b9bf40ea15@mail.gmail.com>
    <6959e1681001190909s2a7c86cfv21e78990ae5d58ce@mail.gmail.com>
Message-ID: <66f0f93e1001190925vc3954c1g7df2e2e1919a4668@mail.gmail.com>

no cause i was wondering about those comments in the code if there was a way to override it already...

I'm not sure how you would want to add this to the interface... perhaps there could be a quirks flag or ignore_content_length flag ?

From felipemattosinho at terra.com.br Tue Jan 19 13:16:20 2010
From: felipemattosinho at terra.com.br (Felipe Jordão A. P. Mattosinho)
Date: Tue, 19 Jan 2010 19:16:20 +0100
Subject: [Mechanize-users] How to click on a link, in a specific part of the web
In-Reply-To:
Message-ID: <20100119181622.42BFA100000F0@montmartre.tpn.terra.com>

Hi Jeremy,

Thanks, your solution is very clean too, and it helped a lot. The reason why I mixed the variables is that they are part of a class which I didn't publish here. So it makes more sense for me to use mechanize (@@mech) as a class variable and not as an instance variable. The @ in the wrong place was a typo!

Best Regards

From barjunk at attglobal.net Tue Jan 19 20:17:43 2010
From: barjunk at attglobal.net (barsalou)
Date: Tue, 19 Jan 2010 16:17:43 -0900
Subject: [Mechanize-users] How do I mechanize a link like this?
Message-ID: <20100119161743.g0qk64r3swkow4ok@lcgalaska.com>

Here is the clip:

onclick="window.open('/view/viewFullBill.doview'

There are some other things obviously....but is this just another agent.get?

Mike B.

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
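
A rough sketch for the onclick question above, under stated assumptions: Mechanize does not run JavaScript, so a link that opens through window.open usually has to be followed by pulling the URL out of the attribute and requesting it directly. The helper name popup_url, the extra window.open arguments, and the www.example.com host are hypothetical; only the /view/viewFullBill.doview path comes from the message.

```ruby
require 'uri'

# Hypothetical helper: return the first argument of a window.open(...) call
# found in an onclick attribute value, or nil when there is no such call.
def popup_url(onclick)
  match = onclick.to_s.match(/window\.open\(\s*['"]([^'"]+)['"]/)
  match && match[1]
end

# Sample attribute value. Only the path is taken from the message above;
# the remaining arguments are made up for illustration.
onclick = "window.open('/view/viewFullBill.doview', 'bill', 'width=800'); return false;"

path = popup_url(onclick)
puts path

# Resolve the path against the page it was found on.
# The host and directory here are placeholders.
full = URI.join('http://www.example.com/account/', path).to_s
puts full
```

If this matches the real markup, the resulting URL could then be passed to agent.get like any other address, so in that sense it would be just another agent.get.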
From jeremywoertink at gmail.com Wed Jan 20 13:24:34 2010 From: jeremywoertink at gmail.com (Jeremy Woertink) Date: Wed, 20 Jan 2010 10:24:34 -0800 Subject: [Mechanize-users] How to click on a link, in a specific part of the web In-Reply-To: <20100119181622.42BFA100000F0@montmartre.tpn.terra.com> References: <20100119181622.42BFA100000F0@montmartre.tpn.terra.com> Message-ID: <1ea5c3821001201024t2914d705w3f8e9ddd6e88fdcf@mail.gmail.com> ah, then that would make sense! I'm glad you got it working :) ~Jeremy 2010/1/19 Felipe Jord?o A. P. Mattosinho > Hi Jeremy, > > Thanks your solution is very clean too, and helped a lot. The reason why I > mixed the variables is because they are part of a class which I didn?t > publish here. Then makes more sense for me to use mechanize (@@mech) as a > class variable and not as a instance variable. The @ in the wrong place, > was > a mistyping stuff!!! > > > Best Regards > > > > > > -----Mensagem original----- > De: mechanize-users-bounces at rubyforge.org > [mailto:mechanize-users-bounces at rubyforge.org] Em nome de > mechanize-users-request at rubyforge.org > Enviada em: ter?a-feira, 19 de janeiro de 2010 01:45 > Para: mechanize-users at rubyforge.org > Assunto: Mechanize-users Digest, Vol 34, Issue 5 > > Send Mechanize-users mailing list submissions to > mechanize-users at rubyforge.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://rubyforge.org/mailman/listinfo/mechanize-users > or, via email, send a message with subject or body 'help' to > mechanize-users-request at rubyforge.org > > You can reach the person managing the list at > mechanize-users-owner at rubyforge.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Mechanize-users digest..." > > > Today's Topics: > > 1. Re: How to click on a link, in a specific part of the web > page! Help (Jeremy Woertink) > 2. Re: How to click on a link, in a specific part of the web > page! 
Help (Jeremy Woertink) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 18 Jan 2010 16:29:54 -0800 > From: Jeremy Woertink > To: Ruby Mechanize Users List > Subject: Re: [Mechanize-users] How to click on a link, in a specific > part of the web page! Help > Message-ID: > <1ea5c3821001181629q27591731j1e52b6ab5a823b38 at mail.gmail.com> > Content-Type: text/plain; charset="windows-1252" > > Here, > > got bored and decided to try your code out. I cleaned it up a little. > > http://pastie.org/784099 > > Try this and see if you get what you're looking for. > > ~Jeremy Woertink > > On Mon, Jan 18, 2010 at 4:15 PM, Jeremy Woertink > wrote: > > > well.. first, your variables are all over the place. You have a mix of > > Constants, class, instance, and local variables. Not a problem, but since > > you're new to Ruby, I would recommend sticking to some sort of normal > > scheme. Now with that being said, looking at > > > > > > *@last_page* = WWW::Mechanize::Page::Link.new(result,@@mech,*@pre_page* > > ).click > > > > you call @pre_page which is nil because it was never defined. Did you try > > creating it with just the pre_page variable? > > > > ~Jeremy Woertink > > > > 2010/1/18 Felipe Jord?o A. P. Mattosinho > > > >> Hi everybody, > >> > >> > >> > >> I am a new Ruby programmer and Mechanize & Nokogiri user. I am using the > >> both gems however for my master thesis, and since I can?t see much > >> documentation for mechanize on the internet I have a question. 
> >> > >> URL = 'http://reviews.cnet.com' > >> > >> SEARCH_FIELD_NAME = 'tsearch' > >> > >> XPATH_TO_RESULT_PAGE > ="/html/body/div[2]/div/div[2]/div[3]/div[2]/form/ul" > >> > >> XPATH_TO_FIRST_LINK_RESULT_PAGE > ="/html/body/div[2]/div/div[2]/div[3]/div[2]/form/ul/li/div[4]/a" > >> > >> > >> > >> @@mech = WWW::Mechanize.new > >> > >> > >> > >> # Creates an instance of Mechanize and select CNET Website > >> > >> > >> > >> page = @@mech.get(URL) > >> > >> > >> > >> > >> > >> search_form = page.form(SEARCH_FIELD_NAME) > >> > >> > >> > >> search_form.query = query > >> > >> > >> > >> pre_page = @@mech.submit(search_form, search_form.buttons.first) > >> > >> * * > >> > >> * pre_page*.search(XPATH_TO_FIRST_LINK_RESULT_PAGE)* do* |result| > >> > >> > >> > >> * @last_page* = WWW::Mechanize::Page::Link.new(result,@@mech,* > >> @pre_page*).click > >> > >> > >> > >> end > >> > >> > >> > >> My problem is with the variable @last_page. I am not so sure if I am > >> doing something but I believe I am. I mean that was the only way I found > to > >> do what I wanted to. > >> > >> On the variable pre_page I search for a specific field where results are > >> present. I cannot rely just on the name of the link because links with > the > >> name of my search can be everywhere on this page. That is why I want to > >> specify just a part of the page where a link with the name of my query > >> should be clicked. That was the only way that I found to restrict the > links > >> that I want to click. The problem is that I tried to make a new link, > based > >> on the result (which is correct, that was the link I was searching for), > and > >> to click on it to proceed to the next page. However @last_page is always > nil > >> and this is not working. If someone has a good idea or how can I make it > >> correct , please send me a reply! 
> > > > Best Regards > > > > _______________________________________________ > > Mechanize-users mailing list > > Mechanize-users at rubyforge.org > > http://rubyforge.org/mailman/listinfo/mechanize-users > > > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > < > http://rubyforge.org/pipermail/mechanize-users/attachments/20100118/b6d355a > 4/attachment.html > > > > ------------------------------ > > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > > End of Mechanize-users Digest, Vol 34, Issue 5 > ********************************************** > > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremywoertink at gmail.com Wed Jan 20 13:27:17 2010 From: jeremywoertink at gmail.com (Jeremy Woertink) Date: Wed, 20 Jan 2010 10:27:17 -0800 Subject: [Mechanize-users] How do I mechanize a link like this? In-Reply-To: <20100119161743.g0qk64r3swkow4ok@lcgalaska.com> References: <20100119161743.g0qk64r3swkow4ok@lcgalaska.com> Message-ID: <1ea5c3821001201027k41fdd16ev99577cf6863a230d@mail.gmail.com> Mechanize doesn't support JavaScript. So any links that have onclick events, or use javascript as their href won't work. Check out Watir gem You can also look into Johnson or Lyndon (if you use MacRuby). These both can handle Javascript. Good luck. ~Jeremy On Tue, Jan 19, 2010 at 5:17 PM, barsalou wrote: > Here is the clip: > > onclick="window.open('/view/viewFullBill.doview' > > There are some other things obviously....but is this just another > agent.get? > > Mike B. > > ---------------------------------------------------------------- > This message was sent using IMP, the Internet Messaging Program. 
> > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From astarr at wiredquote.com Wed Jan 20 14:28:38 2010 From: astarr at wiredquote.com (Aaron Starr) Date: Wed, 20 Jan 2010 11:28:38 -0800 Subject: [Mechanize-users] How do I mechanize a link like this? In-Reply-To: <1ea5c3821001201027k41fdd16ev99577cf6863a230d@mail.gmail.com> References: <20100119161743.g0qk64r3swkow4ok@lcgalaska.com> <1ea5c3821001201027k41fdd16ev99577cf6863a230d@mail.gmail.com> Message-ID: <669cc1ca1001201128x24a51d0fn2507b37324f54ab7@mail.gmail.com> Or, you can do something like: agent.get page.content.match(/window\.open\('([^']+)/)[1] That's completely untested, but you get the gist. Use a regex to pull out the URL, then get it. Aaron On Wed, Jan 20, 2010 at 10:27 AM, Jeremy Woertink wrote: > Mechanize doesn't support JavaScript. So any links that have onclick > events, or use javascript as their href won't work. Check out Watir gem > > You can also look into Johnson or Lyndon (if you use MacRuby). These both > can handle Javascript. > > Good luck. > > ~Jeremy > > > > On Tue, Jan 19, 2010 at 5:17 PM, barsalou wrote: > >> Here is the clip: >> >> onclick="window.open('/view/viewFullBill.doview' >> >> There are some other things obviously....but is this just another >> agent.get? >> >> Mike B. >> >> ---------------------------------------------------------------- >> This message was sent using IMP, the Internet Messaging Program. 
>> >> _______________________________________________ >> Mechanize-users mailing list >> Mechanize-users at rubyforge.org >> http://rubyforge.org/mailman/listinfo/mechanize-users >> > > > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barjunk at attglobal.net Wed Jan 20 16:14:25 2010 From: barjunk at attglobal.net (barsalou) Date: Wed, 20 Jan 2010 12:14:25 -0900 Subject: [Mechanize-users] How do I mechanize a link like this? In-Reply-To: <669cc1ca1001201128x24a51d0fn2507b37324f54ab7@mail.gmail.com> References: <20100119161743.g0qk64r3swkow4ok@lcgalaska.com> <1ea5c3821001201027k41fdd16ev99577cf6863a230d@mail.gmail.com> <669cc1ca1001201128x24a51d0fn2507b37324f54ab7@mail.gmail.com> Message-ID: <20100120121425.xq0b9848pwkswcc4@lcgalaska.com> Quoting Aaron Starr : > Or, you can do something like: > > agent.get page.content.match(/window\.open\('([^']+)/)[1] > > That's completely untested, but you get the gist. Use a regex to pull out > the URL, then get it. > That's a good idea too. I think in this case that link won't change, however. Is there a way to capture, besides ethereal, what a web page is posting so I can mimick the actions? Maybe some sort of firefox plugin? There is this javascript thing that happens then produces the page I want....maybe I'll have to resort to ethereal.... Mike B. ---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program. From astarr at wiredquote.com Wed Jan 20 16:42:42 2010 From: astarr at wiredquote.com (Aaron Starr) Date: Wed, 20 Jan 2010 13:42:42 -0800 Subject: [Mechanize-users] How do I mechanize a link like this? 
In-Reply-To: <20100120121425.xq0b9848pwkswcc4@lcgalaska.com> References: <20100119161743.g0qk64r3swkow4ok@lcgalaska.com> <1ea5c3821001201027k41fdd16ev99577cf6863a230d@mail.gmail.com> <669cc1ca1001201128x24a51d0fn2507b37324f54ab7@mail.gmail.com> <20100120121425.xq0b9848pwkswcc4@lcgalaska.com> Message-ID: <669cc1ca1001201342l4b5632c1t3ebd934324ee5e61@mail.gmail.com> On Wed, Jan 20, 2010 at 1:14 PM, barsalou wrote: > [...] > Is there a way to capture, besides ethereal, what a web page is posting so > I can mimic the actions? > [...] > I like Charles. http://www.charlesproxy.com/ From mechanize-mail at zizee.com Wed Jan 20 20:14:12 2010 From: mechanize-mail at zizee.com (Jimmy McGrath) Date: Thu, 21 Jan 2010 11:14:12 +1000 Subject: [Mechanize-users] Cannot download www.yahoo.com with firefox user_agent string Message-ID: <4B57AA64.7050306@zizee.com> Hi All, This is rather lengthy, but I thought the info would be useful. If anyone can help me it would be much appreciated! I'm having an issue trying to get mechanize to download yahoo.com whilst having mechanize identify itself as firefox 3.1. When I leave the mechanize user_agent string as default, the address "http://www.yahoo.com" ends up being redirected to "http://au.yahoo.com/?p=us". When I change the user_agent string to firefox's (using the built-in alias "Mac FireFox", and also my browser's user agent) I get directed to "http://m.www.yahoo.com/" which, if put into firefox, will resolve to "http://au.yahoo.com/?p=us". Interestingly if I use the user_agent_alias of "Linux Mozilla" Has anyone seen this before? It is the only url I have had problems with so I'm a bit perplexed. I'm guessing yahoo is doing something a little weird and mechanize is probably in the right, but it would be good if there was a workaround someone could suggest. 
It is quite important for my application to be able to use their own user agent strings, so I would prefer not to limit the user to only run as the default user agent (or the preset aliases). I have tried the following changes to default settings: agent.redirection_limit = 50 agent.follow_meta_refresh = true But this has had no effect. BTW: I am running ubuntu 9.04 with "ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]" and the following gems installed: -mechanize (0.9.3) -nokogiri (1.4.1) Here is a copy and paste from the console if you would like to reproduce it: irb(main):002:0> require 'rubygems' => true irb(main):003:0> require 'mechanize' => true irb(main):004:0> agent = WWW::Mechanize.new => #, @proxy_port=nil, @history=[], @open_timeout=nil, @keep_alive=true, @auth_hash={}, @cert=nil, @post_connect_hook=#, @follow_meta_refresh=false, @watch_for_set=nil, @proxy_pass=nil, @redirect_ok=true, @log=nil, @keep_alive_time=300, @digest=nil, @verify_callback=nil, @conditional_requests=true, @pluggable_parser=#WWW::Mechanize::Page, "text/html"=>WWW::Mechanize::Page, "application/vnd.wap.xhtml+xml"=>WWW::Mechanize::Page}>, @user_agent="WWW-Mechanize/0.9.3 (http://rubyforge.org/projects/mechanize/)", @proxy_addr=nil, @pass=nil, @html_parser=Nokogiri::HTML, @connection_cache={}, @password=nil, @ca_file=nil, @proxy_user=nil, @read_timeout=nil, @scheme_handlers={"https"=>#, "file"=>#, "http"=>#, "relative"=>#}, @request_headers={}, @key=nil, @cookie_jar=#, @redirection_limit=20, @user=nil, @history_added=nil> irb(main):005:0> page = agent.get "http://www.yahoo.com" => #} {meta} {title "Yahoo!7"} [SNIP - it seems to download correctly] irb(main):006:0> page.uri.to_s => "http://au.yahoo.com/?p=us" **I'm coming from an Australian IP, yahoo will probably send you elsewhere if you not in Oz** irb(main):007:0> agent.user_agent = "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.15) Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15" => "Mozilla/5.0 (X11; U; Linux 
x86_64; en-US; rv:1.9.0.15) Gecko/2009102815 Ubuntu/9.04 (jaunty) Firefox/3.0.15" irb(main):008:0> page = agent.get "http://www.yahoo.com" => #} {meta} {title nil} {iframes} {frames} {links} {forms}> irb(main):009:0> page.uri.to_s => "http://m.www.yahoo.com/" Thanks, -Jimmy From astarr at wiredquote.com Wed Jan 20 20:31:32 2010 From: astarr at wiredquote.com (Aaron Starr) Date: Wed, 20 Jan 2010 17:31:32 -0800 Subject: [Mechanize-users] Concurrency issue in library? In-Reply-To: <6959e1681001071604q71e03836q206b5d0313bf37c0@mail.gmail.com> References: <669cc1ca1001071332r3f02da4fy4359e1e39af92eb@mail.gmail.com> <6959e1681001071604q71e03836q206b5d0313bf37c0@mail.gmail.com> Message-ID: <669cc1ca1001201731t713eba74u879ff3bf2885aa49@mail.gmail.com> Using someone else's experience and the results of my own futzing, I've ended up with the following monkey patch: http://pastie.org/787553 Basically, in addition to checking for http_obj.frozen? it also checks for http_obj.ssl_context.frozen? It's difficult for me to tell if it's working, because it's only a rare, intermittent problem. So far, so good. If someone is familiar with the code, can you tell me if this is a really bad idea for some reason? In particular, it's not clear to me what should happen if the ssl_context is frozen. Or, why the http_obj might not be frozen when the ssl_context is. But that's apparently what's happening. Aaron On Thu, Jan 7, 2010 at 4:04 PM, Aaron Patterson wrote: > On Thu, Jan 7, 2010 at 1:32 PM, Aaron Starr wrote: > > > > I'm using mechanize with separate agents in separate threads. I recently > got > > the following error, and suspect that it may be a concurrency issue -- > i.e., > > both threads were futzing with the same object. Anyone else think that > could > > be the case? > > Separate agents should be completely separate. I think this might be > a bug specific to SSL requests. Would you mind grabbing the gem from > github and trying it out? I think it may fix this problem. 
> > -- > Aaron Patterson > http://tenderlovemaking.com/ > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertj at promedicalinc.com Thu Jan 21 07:19:10 2010 From: robertj at promedicalinc.com (Robert Jackson) Date: Thu, 21 Jan 2010 07:19:10 -0500 Subject: [Mechanize-users] Concurrency issue in library? In-Reply-To: <669cc1ca1001201731t713eba74u879ff3bf2885aa49@mail.gmail.com> References: <669cc1ca1001071332r3f02da4fy4359e1e39af92eb@mail.gmail.com> <6959e1681001071604q71e03836q206b5d0313bf37c0@mail.gmail.com> <669cc1ca1001201731t713eba74u879ff3bf2885aa49@mail.gmail.com> Message-ID: I have had a similar situation in a routine that we have been working on for the last few weeks. Our conclusion was the same as yours although we implemented it slightly differently. Obviously, we didn't like monkey patching the system so we dug around a bit, and found that the issue has already been solved upstream in the following commits: http://github.com/tenderlove/mechanize/commit/f2fc38873c81f95472111da30b56d99c54946563 http://github.com/tenderlove/mechanize/commit/6ead88f3547fe012992460e455858ccade88d878 We did tests before and after those commits and the problem hasn't occurred since. (We were getting "can't modify frozen object" errors on nearly every run of our script.) I believe that Mechanize 0.9.3 was released around 2009/06/08 (per Gemcutter) . There has been a flurry of commits and merges in the last day or so. Any chance of an updated gem being pushed to gemcutter with all of the fixes, and new goodies? 
Robert Jackson On Jan 20, 2010, at 8:31 PM, Aaron Starr wrote: > > Using someone else's experience and the results of my own futzing, > I've ended up with the following monkey patch: > > http://pastie.org/787553 > > Basically, in addition to checking for http_obj.frozen? it also > checks for http_obj.ssl_context.frozen? > > It's difficult for me to tell if it's working, because it's only a > rare, intermittent problem. So far, so good. If someone is familiar > with the code, can you tell me if this is a really bad idea for some > reason? In particular, it's not clear to me what should happen if > the ssl_context is frozen. Or, why the http_obj might not be frozen > when the ssl_context is. But that's apparently what's happening. > > Aaron > > > On Thu, Jan 7, 2010 at 4:04 PM, Aaron Patterson > wrote: > On Thu, Jan 7, 2010 at 1:32 PM, Aaron Starr > wrote: > > > > I'm using mechanize with separate agents in separate threads. I > recently got > > the following error, and suspect that it may be a concurrency > issue -- i.e., > > both threads were futzing with the same object. Anyone else think > that could > > be the case? > > Separate agents should be completely separate. I think this might be > a bug specific to SSL requests. Would you mind grabbing the gem from > github and trying it out? I think it may fix this problem. > > -- > Aaron Patterson > http://tenderlovemaking.com/ > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users > > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aaron.patterson at gmail.com Thu Jan 21 12:12:40 2010 From: aaron.patterson at gmail.com (Aaron Patterson) Date: Thu, 21 Jan 2010 09:12:40 -0800 Subject: [Mechanize-users] Concurrency issue in library? In-Reply-To: References: <669cc1ca1001071332r3f02da4fy4359e1e39af92eb@mail.gmail.com> <6959e1681001071604q71e03836q206b5d0313bf37c0@mail.gmail.com> <669cc1ca1001201731t713eba74u879ff3bf2885aa49@mail.gmail.com> Message-ID: <6959e1681001210912k4cfa59d0kb635af57d91d1b64@mail.gmail.com> On Thu, Jan 21, 2010 at 4:19 AM, Robert Jackson wrote: > I have had a similar situation in a routine that we have been working on for > the last few weeks. Our conclusion was the same as yours although we > implemented it slightly differently. > Obviously, we didn't like monkey patching the system so we dug around a bit, > and found that the issue has already been solved upstream in the following > commits: > http://github.com/tenderlove/mechanize/commit/f2fc38873c81f95472111da30b56d99c54946563 > http://github.com/tenderlove/mechanize/commit/6ead88f3547fe012992460e455858ccade88d878 > We did tests before and after those commits and the problem hasn't occurred > since. (We were getting "can't modify frozen object" errors on nearly every > run of our script.) > I believe that Mechanize 0.9.3 was released around 2009/06/08 (per > Gemcutter). There has been a flurry of commits and merges in the last day > or so. > Any chance of an updated gem being pushed to gemcutter with all of the > fixes, and new goodies? Ya, I'm gearing up for a 1.0 release soon, so if folks want stuff in the release, send me your patches. :-) I'm not sure on the exact date, but sometime before Feb 18th. 
-- Aaron Patterson http://tenderlovemaking.com/ From mr.danielaquino at gmail.com Thu Jan 21 12:57:39 2010 From: mr.danielaquino at gmail.com (Daniel Aquino) Date: Thu, 21 Jan 2010 12:57:39 -0500 Subject: [Mechanize-users] EOFError content-length != page length In-Reply-To: <66f0f93e1001190925vc3954c1g7df2e2e1919a4668@mail.gmail.com> References: <66f0f93e1001121653g6c39f013sa776df229da5bae7@mail.gmail.com> <618c07251001122021m794889b1x888f2af9f23d29fb@mail.gmail.com> <66f0f93e1001122033v7c8f0a7eg2f16449a0f37d6a5@mail.gmail.com> <66f0f93e1001181126q3046c26awe99b442ba8182bd5@mail.gmail.com> <6959e1681001181138w6d4bd43dwcaafb74a58003960@mail.gmail.com> <66f0f93e1001190750w1b28070cid8deeb200583be11@mail.gmail.com> <66f0f93e1001190756p1bfd4d3fw7f8d93b9bf40ea15@mail.gmail.com> <6959e1681001190909s2a7c86cfv21e78990ae5d58ce@mail.gmail.com> <66f0f93e1001190925vc3954c1g7df2e2e1919a4668@mail.gmail.com> Message-ID: <66f0f93e1001210957j6d6b9567w25a193a48b70ec57@mail.gmail.com> any input ? On Tue, Jan 19, 2010 at 12:25 PM, Daniel Aquino wrote: > no cause i was wondering about those comments in the code if there was > a way to override it already... > > I'm not sure how you would want to add this to the interface... > > perhaps there could be a quirks flag or ignore_content_length flag ? > > On Tue, Jan 19, 2010 at 12:09 PM, Aaron Patterson > wrote: >> On Tue, Jan 19, 2010 at 7:56 AM, Daniel Aquino >> wrote: >>> my problem here is that @response.content_length() = 1142 but total = 1109.... >>> >>> again as i said this is a broken http server that I must deal with so >>> it would be nice to have a way around this type of misconfiguration so >>> the lib gracefully deals with the situation... I clearly see the >>> entire page being loaded including the final so if you >>> ask me mechanize should be good enough to handle this if it expects to >>> deal with the web ! >> >> Cool story! >> >> So do you have a patch or tests? 
>> >> -- >> Aaron Patterson >> http://tenderlovemaking.com/ >> _______________________________________________ >> Mechanize-users mailing list >> Mechanize-users at rubyforge.org >> http://rubyforge.org/mailman/listinfo/mechanize-users >> > From astarr at wiredquote.com Thu Jan 21 16:48:44 2010 From: astarr at wiredquote.com (Aaron Starr) Date: Thu, 21 Jan 2010 13:48:44 -0800 Subject: [Mechanize-users] Concurrency issue in library? In-Reply-To: References: <669cc1ca1001071332r3f02da4fy4359e1e39af92eb@mail.gmail.com> <6959e1681001071604q71e03836q206b5d0313bf37c0@mail.gmail.com> <669cc1ca1001201731t713eba74u879ff3bf2885aa49@mail.gmail.com> Message-ID: <669cc1ca1001211348n1335ce78u99aaa23caf11b86f@mail.gmail.com> Oh, thanks for the heads-up. I looked at the code on github, but I'm not especially clever, and for some reason I didn't see these updated versions of the file. I appreciate the response. Aaron On Thu, Jan 21, 2010 at 4:19 AM, Robert Jackson wrote: > I have had a similar situation in a routine that we have been working on > for the last few weeks. Our conclusion was the same as yours although we > implemented it slightly differently. > > Obviously, we didn't like monkey patching the system so we dug around a > bit, and found that the issue has already been solved upstream in the > following commits: > > > http://github.com/tenderlove/mechanize/commit/f2fc38873c81f95472111da30b56d99c54946563 > > http://github.com/tenderlove/mechanize/commit/6ead88f3547fe012992460e455858ccade88d878 > > We did tests before and after those commits and the problem hasn't occurred > since. (We were getting "can't modify frozen object" errors on nearly every > run of our script.) > > I believe that Mechanize 0.9.3 was released around 2009/06/08 (per > Gemcutter) . There has been a flurry of commits and merges in the last day > or so. > > Any chance of an updated gem being pushed to gemcutter with all of the > fixes, and new goodies? 
> > Robert Jackson > > From nohfreakz at web.de Sat Jan 23 15:34:37 2010 From: nohfreakz at web.de (nohfreakz at web.de) Date: Sat, 23 Jan 2010 21:34:37 +0100 Subject: [Mechanize-users] Having Problems with JS-Button within a form. Message-ID: <1049471022@web.de> Hi there, I get the following issue with a form. Within the form there are two buttons: ... ... ... When I get the page and choose the form, the form only has the first button (with the js call) in its buttons list, so I cannot submit the form. Is there any solution? Greetz, Nils From jeremywoertink at gmail.com Sat Jan 23 17:47:25 2010 From: jeremywoertink at gmail.com (jeremywoertink at gmail.com) Date: Sat, 23 Jan 2010 14:47:25 -0800 Subject: [Mechanize-users] Having Problems with JS-Button within a form. In-Reply-To: <1049471022@web.de> References: <1049471022@web.de> Message-ID: <4055A932-F30B-4B9C-AB75-9AA599E41C59@gmail.com> Create the submit button and submit the form :) On Jan 23, 2010, at 12:34 PM, nohfreakz at web.de wrote: > Hi there, > > I get the following issue with a form. > > Within the form there are two buttons: > > ... > id="BlaBlaBtn" class="hidden" /> > ... > > ... > > When I get the page and choose the form, the form only has the first > button (with the js call) in its buttons list, so I cannot > submit the form. > > Is there any solution? > > Greetz, > Nils 
> > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users From mechanize-mail at zizee.com Sun Jan 24 18:46:44 2010 From: mechanize-mail at zizee.com (Jimmy McGrath) Date: Mon, 25 Jan 2010 09:46:44 +1000 Subject: [Mechanize-users] Does mechanize have the ability to set a maximum download limit? Message-ID: <4B5CDBE4.7090701@zizee.com> Hi there, I would like to set an upper limit on the size of any file that mechanize will download. I have read the documentation and googled but cannot see any support for this type of functionality. Is there any way to do it? Or should I be looking at a timeout solution? Thanks, Jimmy From felipemattosinho at terra.com.br Mon Jan 25 00:29:33 2010 From: felipemattosinho at terra.com.br (Felipe Jordão A. P. Mattosinho) Date: Mon, 25 Jan 2010 06:29:33 +0100 Subject: [Mechanize-users] Does Amazon.com blocks scraping? Message-ID: <20100125052944.0C83E90000090@hecla.tpn.terra.com> Hi there Does anyone know if Amazon.com has any sort of server side script that tries to block scraping activities? I first noticed that if I didn't change the agent alias, it would fetch a page exactly like the normal one, but without the initial search field (maybe a silly way to prevent scraping). Then I changed to some other alias and submitted a search. I got the result page as a response, but right after getting the page, I received a message that Amazon.com closed my connection and redirected me to another place. If anyone has succeeded with Amazon.com, circumventing this protection, please send me some info Regards, Felipe From mechanize-mail at zizee.com Mon Jan 25 07:35:03 2010 From: mechanize-mail at zizee.com (Jimmy McGrath) Date: Mon, 25 Jan 2010 22:35:03 +1000 Subject: [Mechanize-users] Does Amazon.com blocks scraping? 
In-Reply-To: <20100125052944.0C83E90000090@hecla.tpn.terra.com> References: <20100125052944.0C83E90000090@hecla.tpn.terra.com> Message-ID: <4B5D8FF7.80002@zizee.com> Hi Felipe, I was unable to reproduce your "Amazon closing the connection" issue. Could you perhaps post a sequence of commands from irb that can consistently reproduce it? Also, what happens if you change the user agent alias before the first connection with Amazon? If you do not experience the disconnection problem, is there any reason why changing the user agent before first contact is not a satisfactory solution? Cheers, -Jimmy Felipe Jordão A. P. Mattosinho wrote: > > Hi there > > Does anyone know if Amazon.com has any sort of server side script that > tries to block scraping activities? I first noticed that if I didn't > change the agent alias, it would fetch a page exactly like the normal > one, but without the initial search field (maybe a silly way to prevent > scraping). Then I changed to some other alias and submitted a > search. I got the result page as a response, but right after getting the > page, I received a message that Amazon.com closed my connection and > redirected me to another place. > > If anyone has succeeded with Amazon.com, circumventing this protection, > please send me some info > > Regards, > > Felipe > > ------------------------------------------------------------------------ > > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users From mechanize-mail at zizee.com Mon Jan 25 16:16:21 2010 From: mechanize-mail at zizee.com (Jimmy McGrath) Date: Tue, 26 Jan 2010 07:16:21 +1000 Subject: [Mechanize-users] Does Amazon.com blocks scraping? 
In-Reply-To: <20100125052944.0C83E90000090@hecla.tpn.terra.com> References: <20100125052944.0C83E90000090@hecla.tpn.terra.com> Message-ID: <4B5E0A25.8040606@zizee.com> Hi Felipe, Just had another thought, and it's probably something you've already considered so I apologise if I'm pointing out the obvious, but have you checked out Amazon's A2S (formerly ECS) web services? https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html It's been a while since I played with this stuff, but it lets you search for Amazon products and list details, reviews, etc. I used to use the following gem: http://www.pluitsolutions.com/projects/amazon-ecs But I'm not sure if that is still the one to beat (there were a number of them some years ago). Of course, these services may not do what you need, but I thought it worth suggesting just in case. Cheers. Felipe Jordão A. P. Mattosinho wrote: > > Hi there > > Does anyone know if Amazon.com has any sort of server side script that > tries to block scraping activities? I first noticed that if I didn't > change the agent alias, it would fetch a page exactly like the normal > one, but without the initial search field (maybe a silly way to prevent > scraping). Then I changed to some other alias and submitted a > search. I got the result page as a response, but right after getting the > page, I received a message that Amazon.com closed my connection and > redirected me to another place. 
> > If anyone succeeded with Amazon.com, circunventing this protection, > please send me some info > > Regards, > > Felipe > > ------------------------------------------------------------------------ > > _______________________________________________ > Mechanize-users mailing list > Mechanize-users at rubyforge.org > http://rubyforge.org/mailman/listinfo/mechanize-users From peter at rubyrailways.com Tue Jan 26 03:24:27 2010 From: peter at rubyrailways.com (peter at rubyrailways.com) Date: Tue, 26 Jan 2010 00:24:27 -0800 Subject: [Mechanize-users] Nokogiri vs mechanize objects Message-ID: <9dee3ceec852ebe3fcc127361f54eaaa.squirrel@webmail.rubyrailways.com> Hey all, Is it possible to 'cast' a Nokogri objects as a Mechanize one? i.e. I get back a Nokogiri Element after searching with an XPath, and now I'd like to click it (let's suppose it's an ). So something like >> agent = WWW::Mechanize.new >> ... >> agent.get('http://github.com/') >> ... >> link = agent.page.search("//p[child::strong[contains(.,'GitHub')]]/a[1]") => a href"http://help.github.com/post-receive-hooks/"web hooka >> link.click NoMethodError: undefined method `click' for web hook:Nokogiri::XML::NodeSet from (irb):8 There has to be a mechanize object which is the "alter-ego" of the Nokogiri element - how do I switch between the two? Cheers, Peter From nohfreakz at web.de Tue Jan 26 12:00:31 2010 From: nohfreakz at web.de (Nils Haldenwang) Date: Tue, 26 Jan 2010 18:00:31 +0100 Subject: [Mechanize-users] =?iso-8859-15?q?Having_Problems_with_JS-Button_?= =?iso-8859-15?q?within_a_form=2E?= Message-ID: <1054463354@web.de> Dumb question maybe, but how can i create the button myself ? :-) > Create the submit button and submit the form :) > > > > On Jan 23, 2010, at 12:34 PM, nohfreakz at web.de wrote: > > > Hi there, > > > > i get following issue with a form. > > > > Within the form there are two buttons: > > > > ... > > > id="BlaBlaBtn" class="hidden" /> > > ... > > > > ... 
> >
> > When I get the page and choose the form, the form only has the first
> > button (with the JS call) in its buttons list, so I cannot
> > submit the form.
> >
> > Is there any solution?
> >
> > Greetz,
> > Nils

From felipemattosinho at terra.com.br Tue Jan 26 12:27:02 2010
From: felipemattosinho at terra.com.br (Felipe Jordão A. P. Mattosinho)
Date: Tue, 26 Jan 2010 18:27:02 +0100
Subject: [Mechanize-users] Does Amazon.com block scraping?
Message-ID: <20100126172708.C1BBCB0000140@bermore.tpn.terra.com>

Hi there

Does anyone know if Amazon.com has any sort of server-side script that tries to block scraping activities? I first noticed that if I didn't change the agent alias, it would fetch a page exactly like the normal one, but without the initial search field (maybe a silly way to prevent scraping). Then I changed to some other alias and submitted a search. I got the result page as the response, but right after getting the page, I received a message that Amazon.com closed my connection and redirected me to another place.

If anyone has succeeded with Amazon.com, circumventing this protection, please send me some info.

Regards,

Felipe

From mike at csa.net Tue Jan 26 12:31:49 2010
From: mike at csa.net (Mike Dalessio)
Date: Tue, 26 Jan 2010 12:31:49 -0500
Subject: [Mechanize-users] Does Amazon.com block scraping?
In-Reply-To: <20100126172708.C1BBCB0000140@bermore.tpn.terra.com>
References: <20100126172708.C1BBCB0000140@bermore.tpn.terra.com>
Message-ID: <618c07251001260931j113ccc53y4d58b5dad52f9480@mail.gmail.com>

Hello Felipe,

Jimmy McGrath was kind enough to reply, twice, to your original email. (Thank you, Jimmy.)

Please be kind enough to respond to his replies, and please do not resend your request when you've already received a response.

Thank you for your kind cooperation, and for using Mechanize.

2010/1/26 Felipe Jordão A. P. Mattosinho
> Hi there
>
> Does anyone know if Amazon.com has any sort of server-side script that
> tries to block scraping activities? I first noticed that if I didn't change
> the agent alias, it would fetch a page exactly like the normal one, but
> without the initial search field (maybe a silly way to prevent scraping).
> Then I changed to some other alias and submitted a search. I got the
> result page as the response, but right after getting the page, I received a
> message that Amazon.com closed my connection and redirected me to another
> place.
>
> If anyone has succeeded with Amazon.com, circumventing this protection, please
> send me some info.
>
> Regards,
>
> Felipe

--
mike dalessio
mike at csa.net

From jeremywoertink at gmail.com Tue Jan 26 13:29:16 2010
From: jeremywoertink at gmail.com (jeremywoertink at gmail.com)
Date: Tue, 26 Jan 2010 10:29:16 -0800
Subject: [Mechanize-users] Having Problems with JS-Button within a form.
In-Reply-To: <1054463354@web.de>
References: <1054463354@web.de>
Message-ID:

If I were in front of a computer I would totally show ya, but the best I can say is to look up the mechanize docs. You pretty much create a new button object, and then do something like

    page.forms.first.buttons << @my_new_button

I think the class is something like WWW::Mechanize::Form::Button, but don't quote me on that :)

Hope that helps.

On Jan 26, 2010, at 9:00 AM, Nils Haldenwang wrote:

> Dumb question maybe, but how can I create the button myself? :-)
>
>> Create the submit button and submit the form :)
>>
>> On Jan 23, 2010, at 12:34 PM, nohfreakz at web.de wrote:
>>
>>> Hi there,
>>>
>>> I get the following issue with a form.
>>>
>>> Within the form there are two buttons:
>>>
>>> ...
>>> id="BlaBlaBtn" class="hidden" />
>>> ...
>>>
>>> ...
>>>
>>> When I get the page and choose the form, the form only has the first
>>> button (with the JS call) in its buttons list, so I cannot
>>> submit the form.
>>>
>>> Is there any solution?
>>>
>>> Greetz,
>>> Nils

From felipemattosinho at terra.com.br Tue Jan 26 13:44:17 2010
From: felipemattosinho at terra.com.br (Felipe Jordão A. P. Mattosinho)
Date: Tue, 26 Jan 2010 19:44:17 +0100
Subject: [Mechanize-users] Does Amazon.com blocks scraping?
In-Reply-To:
Message-ID: <20100126184420.C5F9E28000187@1p2.tpn.terra.com>

Thanks Jimmy,

First, thanks for answering. I know about the web service; however, I am doing a university project where I need to scrape content, or to be more specific, product reviews. I just used Amazon.com as an example, and I would never scrape something when I have a service that can give me everything I need. The point is just to use a scraping technique! But thanks anyway for your suggestion!
Well, besides that, these are the commands:

    require 'mechanize'
    require 'pp'

    @@mech = WWW::Mechanize.new

    # If I don't set the alias here, I receive the same main page
    # but without the search field.
    @@mech.user_agent_alias = 'Mac Safari'

    page = @@mech.get("http://www.amazon.com")

    search_form = page.form("site-search")
    search_form["field-keywords"] = "Nikon Coolpix P90"

    @page = @@mech.submit(search_form, search_form.buttons.first)

    pp @page

    # After printing the page I can confirm that I received the page I was
    # expecting. However, on the console I get this:
    #   ActionController::RoutingError (No route matches
    #   "/aan/2009-09-09/static/amazon/iframeproxy-1.html" with {:method=>:get}):

    # Now, when I try to get the first match in the result page with this XPath,
    # I receive a null object (which is strange to me, since I have the content
    # stored in the @page variable).
    @match = @page.search("/html/body/div[4]/div/div/div[2]/div[3]/div/div/div[3]/div/a")

    # Here @match is nil, which sounds strange to me.

Now a question concerning the mailing list itself: is there a way to receive both single messages and the digest? Or, if I enable digest mode, do I stop receiving single messages?

Felipe

From mechanize-mail at zizee.com Tue Jan 26 19:56:41 2010
From: mechanize-mail at zizee.com (Jimmy McGrath)
Date: Wed, 27 Jan 2010 10:56:41 +1000
Subject: [Mechanize-users] Does Amazon.com blocks scraping?
In-Reply-To: <20100126184420.C5F9E28000187@1p2.tpn.terra.com>
References: <20100126184420.C5F9E28000187@1p2.tpn.terra.com>
Message-ID: <4B5F8F49.5010501@zizee.com>

Apologies to the list if this is the second time this message is sent; I had an issue with my mail client and I think my original attempt failed.

------------------------------------------------------------------------

No worries Felipe, I thought you would probably have been on the case already about the Amazon web services, but it couldn't hurt to suggest it just in case.

Now, on to your issue. You say you are getting the following on the console:

> #ActionController::RoutingError (No route matches
> "/aan/2009-09-09/static/amazon/iframeproxy-1.html" with {:method=>:get}):

This screams out Rails to me. I doubt it has anything to do with Mechanize at all (anyone with more experience with Mechanize, please feel free to correct me). Are you sure you are not making a request to the Rails framework somewhere?

As to the command:

> @match = @page.search("/html/body/div[4]/div/div/div[2]/div[3]/div/div/div[3]/div/a")

If I perform the following search:

    @page.search("/html/body/div[4]/div")

I get an empty array as a response, so that search argument is borked. I don't think that is the best way to get a link out of the page anyway, as it will be very brittle. I think that you should look into the Nokogiri documentation for a way to search for links with a specific CSS class.
I can't tell you off the top of my head how to do it and I need to get back to work, so I'll have to leave you to work it out yourself.

BTW: in answer to the original question, "Does Amazon block scraping?", I don't think they are attempting to block scraping at all; more likely they don't recognise the Mechanize user agent string and get confused. It would be an interesting exercise to use the user agent switcher plug-in in Firefox, set it to Mechanize's string, and see how Amazon renders within Firefox, but as I said, I have to get back to work.

Ciao.

Felipe Jordão A. P. Mattosinho wrote:
> Thanks Jimmy,
>
> First, thanks for answering. I know about the web service; however, I am
> doing a university project where I need to scrape content, or to be more
> specific, product reviews. I just used Amazon.com as an example, and I would
> never scrape something when I have a service that can give me everything I
> need. The point is just to use a scraping technique! But thanks anyway for
> your suggestion!
>
> Well, besides that, these are the commands:
>
> @@mech = WWW::Mechanize.new
>
> # If I don't set the alias here, I receive the same main page
> # but without the search field.
> @@mech.user_agent_alias = 'Mac Safari'
>
> page = @@mech.get("http://www.amazon.com")
>
> search_form = page.form("site-search")
> search_form["field-keywords"] = "Nikon Coolpix P90"
>
> @page = @@mech.submit(search_form, search_form.buttons.first)
>
> pp @page
>
> # After printing the page I can confirm that I received the page I was
> # expecting. However, on the console I get this:
> #   ActionController::RoutingError (No route matches
> #   "/aan/2009-09-09/static/amazon/iframeproxy-1.html" with {:method=>:get}):
>
> # Now, when I try to get the first match in the result page with this XPath,
> # I receive a null object (which is strange to me, since I have the content
> # stored in the @page variable).
> @match = @page.search("/html/body/div[4]/div/div/div[2]/div[3]/div/div/div[3]/div/a")
>
> # Here @match is nil, which sounds strange to me.
>
> Now a question concerning the mailing list itself: is there a way to receive
> both single messages and the digest? Or, if I enable digest mode, do I stop
> receiving single messages?
>
> Felipe

From peter at rubyrailways.com Thu Jan 28 15:50:36 2010
From: peter at rubyrailways.com (peter at rubyrailways.com)
Date: Thu, 28 Jan 2010 12:50:36 -0800
Subject: [Mechanize-users] Nokogiri vs mechanize objects
In-Reply-To: <9dee3ceec852ebe3fcc127361f54eaaa.squirrel@webmail.rubyrailways.com>
References: <9dee3ceec852ebe3fcc127361f54eaaa.squirrel@webmail.rubyrailways.com>
Message-ID:

I suppose getting no answer after several days means "it's not possible" - so I hacked around a bit, and now it is!

>> require 'mechanize'
=> true
>> a = WWW::Mechanize.new
=> ...
>> a.get('http://www.google.com/ncr')
=> ...
>> a.page.at("//input[@name='q']").fill_textfield('ruby')
=> ...
>> a.page.at("//input[@name='btnG']").submit_form
=> ...
>> a.page.at("//a[@class='l']")
=> <a href="http://www.ruby-lang.org/" class="l"><em>Ruby</em> Programming Language</a>

I am using XPaths for everything, and this way I never have to wonder any more about properly matching a form element on the page, or about a button having funky text which blows up when going through iconv, and other problems. I used this all the time in celerity and missed it in mechanize.

Do you think this patch has a chance of being accepted into mechanize? If yes, I'd like to discuss it with the gem maintainer (whether the solution is OK conceptually, making sure there is good test coverage, etc.).

Cheers,
Peter

> Hey all,
>
> Is it possible to 'cast' a Nokogiri object as a Mechanize one? i.e. I get
> back a Nokogiri element after searching with an XPath, and now I'd like to
> click it (let's suppose it's an <a>). So something like
>
>>> agent = WWW::Mechanize.new
>>> ...
>>> agent.get('http://github.com/')
>>> ...
>>> link = agent.page.search("//p[child::strong[contains(.,'GitHub')]]/a[1]")
> => <a href="http://help.github.com/post-receive-hooks/">web hook</a>
>>> link.click
> NoMethodError: undefined method `click' for
> <a href="http://help.github.com/post-receive-hooks/">web hook</a>:Nokogiri::XML::NodeSet
>         from (irb):8
>
> There has to be a Mechanize object which is the "alter ego" of the
> Nokogiri element - how do I switch between the two?
>
> Cheers,
> Peter

From mike at csa.net Thu Jan 28 15:52:19 2010
From: mike at csa.net (Mike Dalessio)
Date: Thu, 28 Jan 2010 15:52:19 -0500
Subject: [Mechanize-users] Nokogiri vs mechanize objects
In-Reply-To:
References: <9dee3ceec852ebe3fcc127361f54eaaa.squirrel@webmail.rubyrailways.com>
Message-ID: <618c07251001281252t5959faan99d9d4631d324ab2@mail.gmail.com>

On Thu, Jan 28, 2010 at 3:50 PM, wrote:

> I suppose getting no answer after several days means "it's not possible" -
> so I hacked around a bit, and now it is!
>
> >> require 'mechanize'
> => true
> >> a = WWW::Mechanize.new
> => ...
> >> a.get('http://www.google.com/ncr')
> => ...
> >> a.page.at("//input[@name='q']").fill_textfield('ruby')
> => ...
> >> a.page.at("//input[@name='btnG']").submit_form
> => ...
> >> a.page.at("//a[@class='l']")
> => <a href="http://www.ruby-lang.org/" class="l"><em>Ruby</em> Programming Language</a>
>
> I am using XPaths for everything, and this way I never have to wonder any
> more about properly matching a form element on the page, or about a button
> having funky text which blows up when going through iconv, and other
> problems. I used this all the time in celerity and missed it in mechanize.
>
> Do you think this patch has a chance of being accepted into mechanize? If
> yes, I'd like to discuss it with the gem maintainer (whether the solution
> is OK conceptually, making sure there is good test coverage, etc.).

Yes, this is a patch I'd be interested in seeing. Discuss away!

Do you have a github branch we can take a look at?

> Cheers,
> Peter
>
> > Hey all,
> >
> > Is it possible to 'cast' a Nokogiri object as a Mechanize one? i.e. I get
> > back a Nokogiri element after searching with an XPath, and now I'd like to
> > click it (let's suppose it's an <a>). So something like
> >
> >>> agent = WWW::Mechanize.new
> >>> ...
> >>> agent.get('http://github.com/')
> >>> ...
> >>> link = agent.page.search("//p[child::strong[contains(.,'GitHub')]]/a[1]")
> > => <a href="http://help.github.com/post-receive-hooks/">web hook</a>
> >>> link.click
> > NoMethodError: undefined method `click' for
> > <a href="http://help.github.com/post-receive-hooks/">web hook</a>:Nokogiri::XML::NodeSet
> >         from (irb):8
> >
> > There has to be a Mechanize object which is the "alter ego" of the
> > Nokogiri element - how do I switch between the two?
> >
> > Cheers,
> > Peter

--
mike dalessio
mike at csa.net

From peter at rubyrailways.com Thu Jan 28 16:32:14 2010
From: peter at rubyrailways.com (peter at rubyrailways.com)
Date: Thu, 28 Jan 2010 13:32:14 -0800
Subject: [Mechanize-users] Nokogiri vs mechanize objects
In-Reply-To: <618c07251001281252t5959faan99d9d4631d324ab2@mail.gmail.com>
References: <9dee3ceec852ebe3fcc127361f54eaaa.squirrel@webmail.rubyrailways.com> <618c07251001281252t5959faan99d9d4631d324ab2@mail.gmail.com>
Message-ID:

> Yes, this is a patch I'd be interested in seeing. Discuss away!

The idea is simple:

- there is a hash called nokogiri2mechanize, which maps Nokogiri::XML::Nodes to mechanize objects (this is one thing I'd like to ask about: I think it's enough to map Form::Field::XXX and Page::Link?)
- every time a new mechanize object we are interested in is created (i.e.
Form::Field::XXX or Page::Link), an entry is added to nokogiri2mechanize
- Nokogiri::XML::Node is monkey-patched with #click etc. (nokogiri_utils.rb), so if one calls #click on a Nokogiri node, the corresponding Mechanize object is looked up and the message is forwarded to it

As simple as this sounds, the devil is in the details... Since this is the first time I have ever opened the mechanize source code, I am not sure about a few things:

- are we really interested in just Form::Field::XXX and Page::Link and nothing more?
- how/when to populate nokogiri2mechanize? At the moment it's done in page.rb, lines 87 -> 91, by manually calling page#forms() and page#links(), but this doesn't feel intuitive to me
- how do I make sure nokogiri_utils has all the methods we need? (This is related to the first question, i.e. identifying all the classes we want to represent in the mapping.) Since this patch is just a proof of concept, I didn't strive (well, even try) to provide all the methods that will be needed
- testing: how to test all this - write a test case for each method in nokogiri_utils.rb? What else?

> Do you have a github branch we can take a look at?

http://github.com/scrubber/mechanize/commit/8cf963c9d28ee09395dac7306a599cb5335ef007

As I said, it's just a proof of concept (I tested all the methods in nokogiri_utils.rb on several pages and it works), so obviously the code might be rough around the edges.
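The lookup-table idea described above can be sketched without Mechanize or Nokogiri at all. This is only an illustration in plain Ruby: `FakeNode`, `FakeLink`, and `NodeRegistry` are hypothetical stand-ins, not classes from either library, and the linked commit may well be organised differently.

```ruby
# Hypothetical stand-ins: FakeNode plays the role of a parsed DOM node,
# FakeLink the role of the higher-level object that wraps it.
FakeNode = Struct.new(:name, :attributes)

class FakeLink
  attr_reader :node

  def initialize(node)
    @node = node
  end

  # Pretend "click": just report which URL would be fetched.
  def click
    "GET #{node.attributes['href']}"
  end
end

# Maps parsed nodes to the wrapper objects built from them.
class NodeRegistry
  def initialize
    # compare_by_identity keys on the node object itself, not on its contents,
    # so two distinct nodes with identical attributes stay distinct.
    @wrappers = {}.compare_by_identity
  end

  def register(node, wrapper)
    @wrappers[node] = wrapper
  end

  # Forward a message sent "to the node" on to its wrapper, if one exists.
  def forward(node, message, *args)
    wrapper = @wrappers.fetch(node) do
      raise ArgumentError, "no wrapper registered for <#{node.name}>"
    end
    wrapper.public_send(message, *args)
  end
end

registry = NodeRegistry.new
anchor   = FakeNode.new('a', { 'href' => 'http://example.com/docs' })
registry.register(anchor, FakeLink.new(anchor))

puts registry.forward(anchor, :click)
```

Keying by identity is a design choice of this sketch: a page can contain several nodes that look the same, and each should resolve to its own wrapper. An explicit registry object is used here instead of monkey-patching so the example stays self-contained.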
If you are wondering why the form is passed to all the buttons: it's so that I can submit the form using just the button, i.e.:

    a.page.at("//input[@name='btnG']").submit_form  # no need to pass the form, since the button knows which form it belongs to

Cheers,
Peter

From radek.simcik at gmail.com Fri Jan 29 21:48:55 2010
From: radek.simcik at gmail.com (Radek Simcik)
Date: Sat, 30 Jan 2010 13:48:55 +1100
Subject: [Mechanize-users] bug or desired behaviour? couldn’t identify form using string ‘post’ but ‘POST’. html contains ‘post’ though
Message-ID:

hi

the code that didn't work although the page source code says ` `

Please see more info on
http://stackoverflow.com/questions/2165834/bug-or-desired-behaviour-couldnt-identify-form-using-string-post-but-post

Thank you

Radek

`login_form = page.form_with(:method => 'post')`

and code that works

`login_form = page.form_with(:method => 'POST')`

I inspected the form object via `puts page.forms.inspect` and got

[#