From lukich at gmail.com Thu Jun 3 13:20:13 2010
From: lukich at gmail.com (Luka Stolyarov)
Date: Thu, 3 Jun 2010 10:20:13 -0700
Subject: [Mechanize-users] issue submitting a form
Message-ID:

Hi. Recently I started rebuilding my old Mechanize script, which I
used to automatically log in to a certain site and retrieve files from
it. The old version worked great; however, after I did the update it
started complaining. Here's the log of the error:

/Users/lukastolyarov/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form/field.rb:30:in `<=>': undefined method `<=>' for value="/blog/">:Nokogiri::XML::Element (NoMethodError)
        from /Users/lukastolyarov/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:171:in `sort'
        from /Users/lukastolyarov/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:171:in `build_query'
        from /Users/lukastolyarov/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:221:in `request_data'
        from /Users/lukastolyarov/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:452:in `post_form'
        from /Users/lukastolyarov/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:370:in `submit'
        from downloader.rb:16:in `downloader'
        from downloader.rb:38

Basically, what I'm doing is this:

require 'rubygems'
require 'mechanize'

# create agent object
agent = Mechanize.new

# fetch the login page
page = agent.get('http://www.mysite.com')

# authenticate
form = page.forms.first
form.username = 'username'
form.password = 'password'

# submit form
page = agent.submit form

I think what's causing the error is a hidden field that the form has
which looks like this:



Do you guys know how I can circumvent this issue?

Thanks!
Luka

From astarr at wiredquote.com Thu Jun 3 19:13:12 2010
From: astarr at wiredquote.com (Aaron Starr)
Date: Thu, 3 Jun 2010 16:13:12 -0700
Subject: [Mechanize-users] issue submitting a form
In-Reply-To:
References:
Message-ID:

I think you have to update nokogiri, too.

_______________________________________________
Mechanize-users mailing list
Mechanize-users at rubyforge.org
http://rubyforge.org/mailman/listinfo/mechanize-users

From lukich at gmail.com Thu Jun 3 23:19:16 2010
From: lukich at gmail.com (Luka Stolyarov)
Date: Thu, 3 Jun 2010 20:19:16 -0700
Subject: [Mechanize-users] issue submitting a form
In-Reply-To:
References:
Message-ID:

That was painless. Thanks!

From xavi.caballe at gmail.com Thu Jun 10 16:28:50 2010
From: xavi.caballe at gmail.com (Xavi Caballé)
Date: Thu, 10 Jun 2010 22:28:50 +0200
Subject: [Mechanize-users] submit button param is duplicated when a form is submitted a second time
Message-ID:

Hello,

Below I've pasted a simple example to reproduce what I say in the
subject. I wonder...
Is it a bug? Is there any workaround?

Thanks!


require 'rubygems'
require 'mechanize'

agent = Mechanize.new
page = agent.get('http://google.com/')
google_form = page.form('f')
google_form.q = 'ruby mechanize'
page = agent.submit(google_form, google_form.buttons.last)
pp google_form.request_data
page = agent.submit(google_form, google_form.buttons.last)
pp google_form.request_data

From astarr at wiredquote.com Thu Jun 10 20:24:24 2010
From: astarr at wiredquote.com (Aaron Starr)
Date: Thu, 10 Jun 2010 17:24:24 -0700
Subject: [Mechanize-users] submit button param is duplicated when a form is submitted a second time
In-Reply-To:
References:
Message-ID:

I think it's working as expected.
You probably want to do this:

  require 'rubygems'
  require 'mechanize'

  agent = Mechanize.new
  page = agent.get('http://google.com/')
  google_form = page.form('f')
  google_form.q = 'ruby mechanize'
  page = agent.submit(google_form, google_form.buttons.last)
+ google_form = page.form('f')
  pp google_form.request_data
  page = agent.submit(google_form, google_form.buttons.last)
+ google_form = page.form('f')
  pp google_form.request_data

I.e., use the new form from the new page, rather than continuing to use the
old form that you added the button to. Or, if you really want to use the old
form, for some reason, I think you could just do this:

  require 'rubygems'
  require 'mechanize'

  agent = Mechanize.new
  page = agent.get('http://google.com/')
  google_form = page.form('f')
  google_form.q = 'ruby mechanize'
  page = agent.submit(google_form, google_form.buttons.last)
  pp google_form.request_data
- page = agent.submit(google_form, google_form.buttons.last)
+ page = agent.submit(google_form)
  pp google_form.request_data

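The point about the old form remembering the button it was submitted with can be pictured with a small stand-in. This is only a sketch under assumptions: `FakeForm`, `click`, and `clear_clicked_buttons` are hypothetical names made up for illustration, not Mechanize classes or methods, and the button names are placeholders. It assumes the form simply appends each submitted button to an internal list.

```ruby
# Hypothetical stand-in for a form that remembers every button used to submit it.
# None of these names come from Mechanize; they only illustrate the behaviour.
class FakeForm
  attr_reader :fields, :clicked_buttons

  def initialize(fields)
    @fields = fields
    @clicked_buttons = []
  end

  # Assumption: submitting with a button appends it to the remembered list.
  def click(button)
    @clicked_buttons << button
    request_data
  end

  # Forget earlier clicks so the next submit sends only one button.
  def clear_clicked_buttons
    @clicked_buttons.clear
  end

  # Build "name=value" pairs for the fields plus every remembered button.
  def request_data
    (fields.to_a + clicked_buttons).map { |name, value| "#{name}=#{value}" }.join('&')
  end
end

form = FakeForm.new('q' => 'ruby mechanize')
first  = form.click(['btnI', 'lucky'])
second = form.click(['btnI', 'lucky'])   # same old form: the button is sent twice
form.clear_clicked_buttons
third  = form.click(['btnG', 'search'])  # only the newly clicked button is sent
puts first, second, third
```

Fetching a fresh form from the new page, as suggested above, sidesteps the problem because the new object starts with an empty list.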
From xavi.caballe at gmail.com Fri Jun 11 08:30:50 2010
From: xavi.caballe at gmail.com (Xavi Caballé)
Date: Fri, 11 Jun 2010 14:30:50 +0200
Subject: [Mechanize-users] submit button param is duplicated when a form is submitted a second time
In-Reply-To:
References:
Message-ID:

I really want to use the old form, but submit it with another button,
so what you propose doesn't work for what I want (as it submits the
form with the same button I used the first time).

I've implemented a reset_clicked_buttons to work around this, but I
think it should be supported. Otherwise, how do you "mechanize" a
browsing sequence like...

- user visits google.com
- clicks on "Google Search"
- goes back
- now clicks on "I'm Feeling Lucky" (browsers only send this button,
  not the button that was clicked before, but Mechanize would send
  both)

Thanks!

From mechanize at chesak.com Sun Jun 13 03:14:20 2010
From: mechanize at chesak.com (joe chesak)
Date: Sun, 13 Jun 2010 09:14:20 +0200
Subject: [Mechanize-users] populating & submitting a form with a 'node'
Message-ID:

Hello,

I am trying to populate a username and password, then submit the form.
I'll post the form object from mechanize and then show what I am doing.

=> # # #} {radiobuttons} {checkboxes #} {file_uploads} {buttons # #}>

First, I have trouble setting values on "_ctl18:txtUsername" and
"_ctl18:txtPassword", maybe because of the strange characters in there. So,
the way I am doing this form is like this...

form.fields[1] = "Roy"
form.fields[2] = "Rogers"

...after I do that the portion of the form for the login and password looks
like this...

#(Element:0x81d732d8 {
  name = "input",
  attributes = [
    #(Attr:0x81d6c244 { name = "name", value = "_ctl18:txtUsername" }),
    #(Attr:0x81d6c230 { name = "type", value = "text" }),
    #(Attr:0x81d6c21c { name = "id", value = "_ctl18_txtUsername" }),
    #(Attr:0x81d6c208 { name = "class", value = "Textbox_red" })]
  }),
@value="">
#}

...It seems okay to me. I'm not sure about the 'node' part but it seems to
be prepopulated. So I then submit the form...

form.submit

...but submitting this gives me the following error....

>> form.submit
NoMethodError: undefined method `node' for "200628":String
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form/field.rb:29:in `<=>'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:171:in `sort'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:171:in `build_query'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:221:in `request_data'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:452:in `post_form'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:370:in `submit'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:141:in `submit'
from (irb):46
from :0

Am I not assigning the username and password to the correct variables?...
perh

Joe

From mechanize at chesak.com Sun Jun 13 05:05:22 2010
From: mechanize at chesak.com (joe chesak)
Date: Sun, 13 Jun 2010 11:05:22 +0200
Subject: [Mechanize-users] populating & submitting a form with a 'node' (repost. fixed an important typo)
Message-ID:

Hello,

I am trying to populate a username and password, then submit the form.
I'll post the form object from mechanize and then show how I am doing.

=> # # #} {radiobuttons} {checkboxes #} {file_uploads} {buttons # #}>

First, I have trouble setting a values on "_ctl18:txtUsername" and
"_ctl18:txtPassword", maybe because of the strange characters in there. So,
the way I am doing this form is like this...

form.fields[1] = "Roy"
form.fields[2] = "Rogers"

...after I do that the portion of the form for the login and password look
like this...

#(Element:0x81d732d8 {
  name = "input",
  attributes = [
    #(Attr:0x81d6c244 { name = "name", value = "_ctl18:txtUsername" }),
    #(Attr:0x81d6c230 { name = "type", value = "text" }),
    #(Attr:0x81d6c21c { name = "id", value = "_ctl18_txtUsername" }),
    #(Attr:0x81d6c208 { name = "class", value = "Textbox_red" })]
  }),
@value="Roy">
#}

...It seems okay to me. I'm not sure about the 'node' part but it seems to
be prepopulated. So I then submit the form...

form.submit

...but submitting this gives me the following error....

>> form.submit
NoMethodError: undefined method `node' for "200628":String
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form/field.rb:29:in `<=>'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:171:in `sort'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:171:in `build_query'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:221:in `request_data'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:452:in `post_form'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:370:in `submit'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:141:in `submit'
from (irb):46
from :0

Am I not assigning the username and password to the correct variables?...
perh

Joe

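The assignment `form.fields[1] = "Roy"` replaces the whole field object in the array with a bare String, which would explain why a later sort fails when it asks a String for its `node`. A minimal sketch of that effect follows, assuming fields are compared through their node as the backtrace suggests. `Field` here is a hypothetical stand-in Struct, not Mechanize's real field class, and the integer nodes are placeholders.

```ruby
# Hypothetical stand-in for a form field; not Mechanize's real class.
Field = Struct.new(:name, :value, :node) do
  # Compare fields through their node, as the backtrace above suggests.
  def <=>(other)
    node <=> other.node
  end
end

fields = [
  Field.new('__VIEWSTATE', 'abc', 0),
  Field.new('_ctl18:txtUsername', '', 1),
  Field.new('_ctl18:txtPassword', '', 2)
]

# Replacing the array element swaps the whole field object for a bare String,
# so sorting has to compare a Field with a String and raises.
broken = fields.dup
broken[1] = 'Roy'
error = begin
  broken.sort
  nil
rescue NoMethodError, ArgumentError => e
  e
end
puts error.class

# Setting the value on the existing field object keeps its node intact.
fields[1].value = 'Roy'
fields[2].value = 'Rogers'
sorted_names = fields.sort.map(&:name)
puts sorted_names
```

With the real form, the matching change would presumably be `form.fields[1].value = "Roy"`, or looking the field up by its name and setting its value, rather than assigning into `form.fields` directly.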
From mechanize at chesak.com Tue Jun 15 06:17:56 2010
From: mechanize at chesak.com (joe chesak)
Date: Tue, 15 Jun 2010 12:17:56 +0200
Subject: [Mechanize-users] following a javascript link that unhides a field ... quick workaround?
Message-ID:

I am using Mechanize for the first time, and have gone through a login page
and a couple of links. But I am now attempting to click a javascript link.
The link unhides a div that has three fields in it. And Mechanize throws an
error when I hit this javascript link.

Here is the html of the javascript link....

Avgrens på Revisor/Regnskapsfører

...which shows up like this in Mechanize....

#

...and then I click it like this and get the following error...

>> agent.page.link_with(:text => /Avgrens på Revisor.*/).click

ERROR
Mechanize::UnsupportedSchemeError: Mechanize::UnsupportedSchemeError
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:166:in `initialize'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/chain/uri_resolver.rb:39:in `call'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/chain/uri_resolver.rb:39:in `handle'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/chain.rb:24:in `handle'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:509:in `fetch_page'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:259:in `get'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:315:in `click'
from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/page/link.rb:44:in `click'
from (irb):92
from :0

...Oh, and here's what the hidden div looks like...



... So it looks like everything I need should be here. I have to believe
that there must be a manual way for me to 'give' Mechanize what it needs to
be able to build its internal model with these three fields. How should I
proceed?

Joe

URL: From mechanize at chesak.com Tue Jun 15 08:02:11 2010 From: mechanize at chesak.com (joe chesak) Date: Tue, 15 Jun 2010 14:02:11 +0200 Subject: [Mechanize-users] populating & submitting a form with a 'node' (repost. fixed an important typo) In-Reply-To: References: Message-ID: Solved. Please close this thread. There are two variables in the hierarchy with the same name and I was assigning a value to the higher one, which was overwriting valuable information, such as 'node'. Joe On Sun, Jun 13, 2010 at 11:05 AM, joe chesak wrote: > Hello, > > I am trying to populate a username and password, then submit the form. > I'll post the form object from mechanize and then show how I am doing. > > > => # {name "Form1"} > {method "POST"} > {action "default.aspx"} > {fields > # @name="__VIEWSTATE", > @node= > #(Element:0x81d734a4 { > name = "input", > attributes = [ > #(Attr:0x81d70498 { name = "type", value = "hidden" }), > #(Attr:0x81d70484 { name = "name", value = "__VIEWSTATE" }), > #(Attr:0x81d70470 { name = "id", value = "__VIEWSTATE" }), > #(Attr:0x81d7045c { > name = "value", > value = > "/wEPDwUKLTk5MDMzOTQ2OA9kFgICAQ9kFgoCAQ8WAh4Fd2lkdGgFBTc2OXB4FgJmD2QWAmYPZBYCAgEPDxYCHgtOYXZpZ2F0ZVVybAUMZGVmYXVsdC5hc3B4ZGQCAw8WAh8ABQU3NjlweGQCBQ8WAh8ABQU3NjlweBYCZg9kFgICAg9kFgICAw9kFgICBw9kFhICAQ8PFgIeBFRleHQFCjEzLjA2LjIwMTBkZAIFDw8WAh8CBQg5MzfCoDEzOWRkAgkPDxYCHwIFCDM4NsKgNjk1ZGQCDg8PFgIeB1Zpc2libGVoZGQCEA8PFgIfAgUBMGRkAhMPDxYCHwNoZGQCFQ8PFgIfAgUBMGRkAhgPDxYCHwNoZGQCGg8PFgIfAgUBMGRkAgcPFgIfAAUFNzY5cHhkAgkPFgIfAAUFNzY5cHhkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYCBRBfY3RsMTg6aWJ0bkxvZ2luBRNfY3RsMTg6Y2JSZW1lbWJlck1l0mXXF4LaeFCK0sHzC+ps+KkutyE=" > })] > }), > @value= > > 
"/wEPDwUKLTk5MDMzOTQ2OA9kFgICAQ9kFgoCAQ8WAh4Fd2lkdGgFBTc2OXB4FgJmD2QWAmYPZBYCAgEPDxYCHgtOYXZpZ2F0ZVVybAUMZGVmYXVsdC5hc3B4ZGQCAw8WAh8ABQU3NjlweGQCBQ8WAh8ABQU3NjlweBYCZg9kFgICAg9kFgICAw9kFgICBw9kFhICAQ8PFgIeBFRleHQFCjEzLjA2LjIwMTBkZAIFDw8WAh8CBQg5MzfCoDEzOWRkAgkPDxYCHwIFCDM4NsKgNjk1ZGQCDg8PFgIeB1Zpc2libGVoZGQCEA8PFgIfAgUBMGRkAhMPDxYCHwNoZGQCFQ8PFgIfAgUBMGRkAhgPDxYCHwNoZGQCGg8PFgIfAgUBMGRkAgcPFgIfAAUFNzY5cHhkAgkPFgIfAAUFNzY5cHhkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYCBRBfY3RsMTg6aWJ0bkxvZ2luBRNfY3RsMTg6Y2JSZW1lbWJlck1l0mXXF4LaeFCK0sHzC+ps+KkutyE="> > # @name="_ctl18:txtUsername", > @node= > #(Element:0x81d732d8 { > name = "input", > attributes = [ > #(Attr:0x81d6c244 { name = "name", value = "_ctl18:txtUsername" }), > #(Attr:0x81d6c230 { name = "type", value = "text" }), > #(Attr:0x81d6c21c { name = "id", value = "_ctl18_txtUsername" }), > #(Attr:0x81d6c208 { name = "class", value = "Textbox_red" })] > }), > @value=""> > # @name="_ctl18:txtPassword", > @node= > #(Element:0x81d730d0 { > name = "input", > attributes = [ > #(Attr:0x81d682ac { name = "name", value = "_ctl18:txtPassword" }), > #(Attr:0x81d68298 { name = "type", value = "password" }), > #(Attr:0x81d68284 { name = "id", value = "_ctl18_txtPassword" }), > #(Attr:0x81d68270 { name = "class", value = "Textbox_red" })] > }), > @value="">} > {radiobuttons} > {checkboxes > # @checked=false, > @name="_ctl18:cbRememberMe", > @value=nil>} > {file_uploads} > {buttons > # @name="_ctl18:Button1", > @node= > #(Element:0x81d72ec8 { > name = "input", > attributes = [ > #(Attr:0x81d60d2c { name = "type", value = "submit" }), > #(Attr:0x81d60d18 { name = "name", value = "_ctl18:Button1" }), > #(Attr:0x81d60d04 { name = "value", value = "Logg inn" }), > #(Attr:0x81d60cf0 { name = "id", value = "_ctl18_Button1" }), > #(Attr:0x81d60cdc { name = "class", value = "LoginButton" }), > #(Attr:0x81d60cc8 { > name = "onmouseover", > value = "style.color='#a3a3a3'" > }), > #(Attr:0x81d60cb4 { > name = "onmouseout", > value = 
"style.color='White'" > })] > }), > @value="Logg inn"> > # @name="_ctl18:ibtnLogin", > @node= > #(Element:0x81d72c98 { > name = "input", > attributes = [ > #(Attr:0x81d5a8dc { name = "type", value = "image" }), > #(Attr:0x81d5a8c8 { name = "name", value = "_ctl18:ibtnLogin" }), > #(Attr:0x81d5a8b4 { name = "id", value = "_ctl18_ibtnLogin" }), > #(Attr:0x81d5a8a0 { name = "src", value = "images/right_arrow.GIF" > }), > #(Attr:0x81d5a88c { name = "border", value = "0" })] > }), > @value=nil, > @x=nil, > @y=nil>}> > > > First, I have trouble setting a values on "_ctl18:txtUsername" and > "_ctl18:txtPassword", maybe because of the strange characters in there. So, > the way I am doing this form is like this... > > form.fields[1] = "Roy" > form.fields[2] = "Rogers" > > ...after I do that the portion of the form for the login and password look > like this... > > #(Element:0x81d732d8 { > name = "input", > attributes = [ > #(Attr:0x81d6c244 { name = "name", value = "_ctl18:txtUsername" }), > #(Attr:0x81d6c230 { name = "type", value = "text" }), > #(Attr:0x81d6c21c { name = "id", value = "_ctl18_txtUsername" }), > #(Attr:0x81d6c208 { name = "class", value = "Textbox_red" })] > }), > @value="Roy"> > # @name="_ctl18:txtPassword", > @node= > #(Element:0x81d730d0 { > name = "input", > attributes = [ > #(Attr:0x81d682ac { name = "name", value = "_ctl18:txtPassword" }), > #(Attr:0x81d68298 { name = "type", value = "password" }), > #(Attr:0x81d68284 { name = "id", value = "_ctl18_txtPassword" }), > #(Attr:0x81d68270 { name = "class", value = "Textbox_red" })] > }), > @value="Rogers">} > > ...It seems okay to me. I'm not sure about the 'node' part but it seems to > be prepopulated. So I then submit the form... > > > form.submit > > ...but submitting this gives me the following error.... 
>
> >> form.submit
> NoMethodError: undefined method `node' for "200628":String
> from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form/field.rb:29:in `<=>'
> from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:171:in `sort'
> from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:171:in `build_query'
> from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:221:in `request_data'
> from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:452:in `post_form'
> from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:370:in `submit'
> from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/form.rb:141:in `submit'
> from (irb):46
> from :0
>
> Am I not assigning the username and password to the correct variables?...
> perh
>
> Joe

From jeremywoertink at gmail.com Tue Jun 15 10:48:33 2010
From: jeremywoertink at gmail.com (jeremywoertink at gmail.com)
Date: Tue, 15 Jun 2010 07:48:33 -0700
Subject: [Mechanize-users] following a javascript link that unhides a field ... quick workaround?
In-Reply-To:
References:
Message-ID:

Mechanize can't parse JavaScript. Look into using Harmony. Oh, and
good luck!

On Jun 15, 2010, at 3:17 AM, joe chesak wrote:

> I am using Mechanize for the first time, and have gone through a
> login page and a couple of links. But I am now attempting to click
> a javascript link. The link unhides a div that has three fields in
> it. And Mechanize throws an error when I hit this javascript link.
>
> Here is html of the javascript link....
> > > onclick="if ( document.getElementById('_ctl20_hiddenRevisor').value > == 'none') { document.getElementById('_ctl20_DivRevisor').style > ['display'] = ''; document.getElementById > ('_ctl20_hiddenRevisor').value = ''; } else { document.getElementById > ('_ctl20_DivRevisor').style['display'] = 'none'; > document.getElementById('_ctl20_hiddenRevisor').value = 'none'; }" > href="javascript:;">Avgrens p? Revisor/Regnskapsf?rer ="_ctl20:hiddenRevisor" type="hidden" id="_ctl20_hiddenRevisor" valu > e="none" /> > > > ...which shows up like this in Mechanize.... > > > # ipt:;"> > > > ...and then I click it like this and get the following error... > > > >> agent.page.link_with(:text => /Avgrens p? Revisor.*/).click > > ERROR > Mechanize::UnsupportedSchemeError: Mechanize::UnsupportedSchemeError > from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/ > mechanize.rb:166:in `initialize' > from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/ > chain/uri_resolver.rb:39:in `call' > from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/ > chain/uri_resolver.rb:39:in `handle' > from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/ > chain.rb:24:in `handle' > from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/ > mechanize.rb:509:in `fetch_page' > from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/ > mechanize.rb:259:in `get' > from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/ > mechanize.rb:315:in `click' > from /Users/joe/.gem/ruby/1.8/gems/mechanize-1.0.0/lib/mechanize/ > page/link.rb:44:in `click' > from (irb):92 > from :0 > > > ...Oh, and here's what the hidden div looks like... > > > > > > ... So it looks like everything I need should be here. I have to > believe that there must be manual way for me to 'give' Mechanize > what it needs to be able to build it's internal model with these > three fields. How should I proceed? 
>
> Joe

From andrelimasilva at gmail.com Wed Jun 16 08:46:20 2010
From: andrelimasilva at gmail.com (André Lima)
Date: Wed, 16 Jun 2010 09:46:20 -0300
Subject: [Mechanize-users] CAPTCHA Code
Message-ID:

Hello,

Is it possible to detect a CAPTCHA code with Mechanize or any other gem?

Thanks!

André Lima

From apoc at sixserv.org Thu Jun 17 01:29:02 2010
From: apoc at sixserv.org (Matthias -apoc- Hecker)
Date: Thu, 17 Jun 2010 07:29:02 +0200
Subject: [Mechanize-users] CAPTCHA Code
In-Reply-To:
References:
Message-ID: <4C19B29E.6070600@sixserv.org>

Hello list, André Lima!

Well, I do not know of any gems specifically for this and it isn't really a
Mechanize-related topic, but I've dealt with CAPTCHA 'problems' before
so...

The question is how difficult the CAPTCHA you want to solve is. Sometimes
it's enough to just enhance the image somehow and then use an OCR engine to
detect it. Arguably the best OCR engine is FineReader, but for simple
CAPTCHAs gocr should suffice. You can spend a lot of time/resources on
engineering and OCR training for a specific CAPTCHA, and breaking and
generating CAPTCHAs is a constant topic in the field of computer science /
computer vision (there are many papers about these topics).

There are, however, very good CAPTCHAs out there that you simply cannot break
automatically (like reCAPTCHA); they mostly focus on making the
separation process (separating characters from each other) as hard as
possible. The only chance you have to automate the detection process is to
use human resources: there are some services like deCaptcher.com that provide
human CAPTCHA recognition for $2 per 1000 detected CAPTCHAs, which may be a
solution for some cases.

That's just a brief glimpse of this topic, without really knowing what
you're up to.

Matthias

From andrelimasilva at gmail.com Fri Jun 18 11:01:42 2010
From: andrelimasilva at gmail.com (André Lima)
Date: Fri, 18 Jun 2010 12:01:42 -0300
Subject: [Mechanize-users] CAPTCHA Code
In-Reply-To: <4C19B29E.6070600@sixserv.org>
References: <4C19B29E.6070600@sixserv.org>
Message-ID:

Many thanks, I will read more about OCR!

--
Regards,
André Lima

From soyoung.shin at socrata.com Wed Jun 23 13:11:48 2010
From: soyoung.shin at socrata.com (Soyoung Shin)
Date: Wed, 23 Jun 2010 10:11:48 -0700
Subject: [Mechanize-users] gathering all links and transforming them to absolute urls
Message-ID: <85A3D2FA-3160-40AC-B41D-07F1A8B65776@socrata.com>

I'm trying to look at a page and get all the urls for the links as absolute urls. I searched briefly, and couldn't find any answers for the ruby version. Is this kind of operation supported, or do I have to do it by hand?

#what I tried, this doesn't work :( !!!
uris = []
page.links.each {|x| y.push(x.uri()) }
# or
page.links.each {|x| y.push(x.href) }

Thanks!
Soyoung

From soyoung.shin at socrata.com Fri Jun 25 18:53:09 2010
From: soyoung.shin at socrata.com (Soyoung Shin)
Date: Fri, 25 Jun 2010 15:53:09 -0700
Subject: [Mechanize-users] determining whether a link might be a file?
Message-ID: <9A5BC441-7598-4A68-B856-9AD2221407EF@socrata.com>

Hi again.
I figured out how to use Ruby and solve that last problem :)

On another note, I'm trying to build a crawler that will generally avoid hitting (but maybe still get the URL for) non-HTML (downloadable) files like csv, xml, exe, etc. It's simple enough to avoid links that end in .csv or .xml, but when there are intermediate redirects, it can be difficult. For example, this was linked from CNET:

http://dw.com.com/redir?edId=3&siteId=4&oId=3000-18502_4-10976868&ontId=18502_4&spi=a10bbf0aa9a8a3315fe085cb27966826&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11422203&mfgId=74349&merId=74349&pguid=kDCrwgoPjAYAAFAMm8gAAADn&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-18502_4-10976868.html%3Fspi%3Da10bbf0aa9a8a3315fe085cb27966826

which redirects to

http://software-files-l.cnet.com/s/software/11/42/22/03/install_virtualdj_trial_v6.1.dmg?e=1277527830&h=1aca74c88927f3f981bfb5d756764454&lop=link&ptype=1901&ontid=18502&siteId=4&edId=3&spi=a10bbf0aa9a8a3315fe085cb27966826&pid=11422203&psid=10976868&fileName=install_virtualdj_trial_v6.1.dmg

which downloads a dmg for Virtual DJ. Has anyone got a solution to this?

Thanks
Soyoung

From apoc at sixserv.org Fri Jun 25 20:51:08 2010
From: apoc at sixserv.org (Matthias -apoc- Hecker)
Date: Sat, 26 Jun 2010 02:51:08 +0200
Subject: [Mechanize-users] determining whether a link might be a file?
In-Reply-To: <9A5BC441-7598-4A68-B856-9AD2221407EF@socrata.com>
References: <9A5BC441-7598-4A68-B856-9AD2221407EF@socrata.com>
Message-ID: <4C254EFC.3050109@sixserv.org>

Hello list, Soyoung Shin,

Well, you could deactivate automated redirection and fetch the Location header:

agent.redirect_ok = false
page = agent.get 'http://example.com/'
puts page.header['location']

But I guess that's not really more practical; you cannot really use the URL to determine the file's content type (there can be mod_rewrite rules etc.).
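
To make the Location check a bit more concrete, here is a minimal sketch in plain Ruby (standard library only, no Mechanize involved). It is only a heuristic, since the URL need not reflect the content type, and both the extension list and the method name looks_like_download? are made up for illustration:

```ruby
require 'uri'

# Extensions treated as "downloads". This list is an assumption for
# illustration; extend or shrink it to suit the crawler.
DOWNLOAD_EXTENSIONS = %w[.csv .xml .exe .dmg .zip .pdf .jpg .jpeg].freeze

# Returns true when the path part of +location+ ends in one of the
# extensions above. The query string is ignored, and a value that
# cannot be parsed as a URI counts as "not a download".
def looks_like_download?(location)
  path = URI.parse(location).path.to_s
  DOWNLOAD_EXTENSIONS.include?(File.extname(path).downcase)
rescue URI::InvalidURIError
  false
end
```

With redirects turned off as above, the value of page.header['location'] could be passed to this method before deciding whether to follow the link.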
Another possible solution would be to send a HEAD request first to determine the Content-Type/Content-Disposition response header:

page = agent.head 'http://example.com/'
puts page.header['content-type']

That would work. I think in theory it should be possible to send a normal GET request, then retrieve the response header and *then* decide to proceed or stop (if it isn't text/html, for instance). However, I don't think that this is possible with Mechanize; the post_connect_hook is triggered after the file is downloaded, so that ain't an option.

I hope this helps a little.

Matthias

Soyoung Shin wrote:
> Hi again. I figured out how to use ruby and solve that last problem :)
>
> On another note, I'm trying to build a crawler that will generally avoid hitting (but maybe still get the url for) non-html (downloadable) files like csv, xml, exe, etc. It's simple enough to avoid links that end in .csv or .xml, but when there are intermediate redirects, it can be difficult. For example, this was linked from cnet
>
> http://dw.com.com/redir?edId=3&siteId=4&oId=3000-18502_4-10976868&ontId=18502_4&spi=a10bbf0aa9a8a3315fe085cb27966826&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11422203&mfgId=74349&merId=74349&pguid=kDCrwgoPjAYAAFAMm8gAAADn&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-18502_4-10976868.html%3Fspi%3Da10bbf0aa9a8a3315fe085cb27966826
>
> which redirects to
>
> http://software-files-l.cnet.com/s/software/11/42/22/03/install_virtualdj_trial_v6.1.dmg?e=1277527830&h=1aca74c88927f3f981bfb5d756764454&lop=link&ptype=1901&ontid=18502&siteId=4&edId=3&spi=a10bbf0aa9a8a3315fe085cb27966826&pid=11422203&psid=10976868&fileName=install_virtualdj_trial_v6.1.dmg
>
> which downloads a dmg for virtual dj. has anyone got a solution to this?
>
> Thanks
> Soyoung

--
(a) (p)roof (o)f (c)oncept ..
http://apoc.sixserv.org/

From miha.ploha at gmail.com Sat Jun 26 01:44:22 2010
From: miha.ploha at gmail.com (Mihael)
Date: Sat, 26 Jun 2010 07:44:22 +0200
Subject: [Mechanize-users] determining whether a link might be a file?
In-Reply-To: <9A5BC441-7598-4A68-B856-9AD2221407EF@socrata.com>
References: <9A5BC441-7598-4A68-B856-9AD2221407EF@socrata.com>
Message-ID: <42EAF109-846E-472E-ADD3-63B62501D5DD@gmail.com>

Hey, maybe u could use something like this:

head = a.head(img_url)
content_type = head.response["content-type"]
if head.kind_of?(WWW::Mechanize::File) && (content_type =~ /image/)
  image = a.get(img_url)
  filename = img_url.split('/').last
  path = @temp_path/filename
  image.save_as(path)
  asset = Asset.new(:original_url => img_url, :mayo_id => @source.id, :uploaded_data=> ActionController::TestUploadedFile.new(path, content_type))
  asset.save
  File.delete(path) #cleanup
else
  warn "skipped image url: #{img_url} !!! not an image url"
  return nil #this is not an image url
end

On Jun 26, 2010, at 12:53 AM, Soyoung Shin wrote:

> Hi again. I figured out how to use ruby and solve that last problem :)
>
> On another note, I'm trying to build a crawler that will generally avoid hitting (but maybe still get the url for) non-html (downloadable) files like csv, xml, exe, etc. It's simple enough to avoid links that end in .csv or .xml, but when there are intermediate redirects, it can be difficult.
> For example, this was linked from cnet
>
> http://dw.com.com/redir?edId=3&siteId=4&oId=3000-18502_4-10976868&ontId=18502_4&spi=a10bbf0aa9a8a3315fe085cb27966826&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11422203&mfgId=74349&merId=74349&pguid=kDCrwgoPjAYAAFAMm8gAAADn&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-18502_4-10976868.html%3Fspi%3Da10bbf0aa9a8a3315fe085cb27966826
>
> which redirects to
>
> http://software-files-l.cnet.com/s/software/11/42/22/03/install_virtualdj_trial_v6.1.dmg?e=1277527830&h=1aca74c88927f3f981bfb5d756764454&lop=link&ptype=1901&ontid=18502&siteId=4&edId=3&spi=a10bbf0aa9a8a3315fe085cb27966826&pid=11422203&psid=10976868&fileName=install_virtualdj_trial_v6.1.dmg
>
> which downloads a dmg for virtual dj. has anyone got a solution to this?
>
> Thanks
> Soyoung

From ross.fairbanks at gmail.com Sun Jun 27 03:01:11 2010
From: ross.fairbanks at gmail.com (Ross Fairbanks)
Date: Sun, 27 Jun 2010 08:01:11 +0100
Subject: [Mechanize-users] Quoted strings in cookies
Message-ID:

Attached is a unit test that checks whether strings with quotes are being preserved when they're received in cookies. Currently this test is failing because quotes are being stripped as per http://github.com/tenderlove/mechanize/issues#issue/33

Hopefully this is useful. I'm new to Git so let me know if there are any problems with the patch.

Thanks
Ross

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0364-Added-unit-test-for-preserving-quotes-in-strings-whe.patch
Type: application/octet-stream
Size: 2215 bytes
Desc: not available
URL:

From naren.salem at gmail.com Sun Jun 27 19:38:02 2010
From: naren.salem at gmail.com (Naren)
Date: Sun, 27 Jun 2010 23:38:02 +0000 (UTC)
Subject: [Mechanize-users] issue submitting a form
References:
Message-ID:

Saved me hours! Thanks.

From soyoung.shin at socrata.com Mon Jun 28 12:54:38 2010
From: soyoung.shin at socrata.com (Soyoung Shin)
Date: Mon, 28 Jun 2010 09:54:38 -0700
Subject: [Mechanize-users] determining whether a link might be a file?
In-Reply-To: <42EAF109-846E-472E-ADD3-63B62501D5DD@gmail.com>
References: <9A5BC441-7598-4A68-B856-9AD2221407EF@socrata.com> <42EAF109-846E-472E-ADD3-63B62501D5DD@gmail.com>
Message-ID: <3F38E13F-22E3-45C6-AC90-C51BEB66DF6D@socrata.com>

That works, but unfortunately it still downloads the entire file before inspecting the headers. I think at this point, it seems like a better option will be to use a mixture of curl/wget + mechanize.

headers = `curl --head url`
if headers.include? "301 something something"
  # inspect the redirect url for suffixes like .jpeg
end
# continue as normal

~Soyoung

On Jun 25, 2010, at 10:44 PM, Mihael wrote:
:
> Hey, maybe u could use something like this:
>
> head = a.head(img_url)
> content_type = head.response["content-type"]
> if head.kind_of?(WWW::Mechanize::File) && (content_type =~ /image/)
>   image = a.get(img_url)
>   filename = img_url.split('/').last
>   path = @temp_path/filename
>   image.save_as(path)
>   asset = Asset.new(:original_url => img_url, :mayo_id => @source.id, :uploaded_data=> ActionController::TestUploadedFile.new(path, content_type))
>   asset.save
>   File.delete(path) #cleanup
> else
>   warn "skipped image url: #{img_url} !!! not an image url"
>   return nil #this is not an image url
> end
>
> On Jun 26, 2010, at 12:53 AM, Soyoung Shin wrote:
>
>> Hi again.
>> I figured out how to use ruby and solve that last problem :)
>>
>> On another note, I'm trying to build a crawler that will generally avoid hitting (but maybe still get the url for) non-html (downloadable) files like csv, xml, exe, etc. It's simple enough to avoid links that end in .csv or .xml, but when there are intermediate redirects, it can be difficult. For example, this was linked from cnet
>>
>> http://dw.com.com/redir?edId=3&siteId=4&oId=3000-18502_4-10976868&ontId=18502_4&spi=a10bbf0aa9a8a3315fe085cb27966826&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11422203&mfgId=74349&merId=74349&pguid=kDCrwgoPjAYAAFAMm8gAAADn&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-18502_4-10976868.html%3Fspi%3Da10bbf0aa9a8a3315fe085cb27966826
>>
>> which redirects to
>>
>> http://software-files-l.cnet.com/s/software/11/42/22/03/install_virtualdj_trial_v6.1.dmg?e=1277527830&h=1aca74c88927f3f981bfb5d756764454&lop=link&ptype=1901&ontid=18502&siteId=4&edId=3&spi=a10bbf0aa9a8a3315fe085cb27966826&pid=11422203&psid=10976868&fileName=install_virtualdj_trial_v6.1.dmg
>>
>> which downloads a dmg for virtual dj. has anyone got a solution to this?
>>
>> Thanks
>> Soyoung

From astarr at wiredquote.com Mon Jun 28 13:07:12 2010
From: astarr at wiredquote.com (Aaron Starr)
Date: Mon, 28 Jun 2010 10:07:12 -0700
Subject: [Mechanize-users] determining whether a link might be a file?
In-Reply-To: <3F38E13F-22E3-45C6-AC90-C51BEB66DF6D@socrata.com>
References: <9A5BC441-7598-4A68-B856-9AD2221407EF@socrata.com> <42EAF109-846E-472E-ADD3-63B62501D5DD@gmail.com> <3F38E13F-22E3-45C6-AC90-C51BEB66DF6D@socrata.com>
Message-ID:

Seems like, in that case, you should just take Matthias' suggestion:

agent.redirect_ok = false
page = agent.get url
# inspect page.header['location'] for suffixes like .jpeg

On Mon, Jun 28, 2010 at 9:54 AM, Soyoung Shin wrote:
> That works, but unfortunately it still downloads the entire file before
> inspecting the headers. I think at this point, it seems like a better option
> will be to use a mixture of curl/wget + mechanize.
>
> headers = `curl --head url`
> if headers.include? "301 something something"
>   # inspect the redirect url for suffixes like .jpeg
> end
> # continue as normal
>
> ~Soyoung
>
> On Jun 25, 2010, at 10:44 PM, Mihael wrote:
> :
> > Hey, maybe u could use something like this:
> >
> > head = a.head(img_url)
> > content_type = head.response["content-type"]
> > if head.kind_of?(WWW::Mechanize::File) && (content_type =~ /image/)
> >   image = a.get(img_url)
> >   filename = img_url.split('/').last
> >   path = @temp_path/filename
> >   image.save_as(path)
> >   asset = Asset.new(:original_url => img_url, :mayo_id => @source.id, :uploaded_data=> ActionController::TestUploadedFile.new(path, content_type))
> >   asset.save
> >   File.delete(path) #cleanup
> > else
> >   warn "skipped image url: #{img_url} !!! not an image url"
> >   return nil #this is not an image url
> > end
> >
> > On Jun 26, 2010, at 12:53 AM, Soyoung Shin wrote:
> >
> >> Hi again. I figured out how to use ruby and solve that last problem :)
> >>
> >> On another note, I'm trying to build a crawler that will generally avoid hitting (but maybe still get the url for) non-html (downloadable) files like csv, xml, exe, etc. It's simple enough to avoid links that end in .csv or .xml, but when there are intermediate redirects, it can be difficult.
> >> For example, this was linked from cnet
> >>
> >> http://dw.com.com/redir?edId=3&siteId=4&oId=3000-18502_4-10976868&ontId=18502_4&spi=a10bbf0aa9a8a3315fe085cb27966826&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11422203&mfgId=74349&merId=74349&pguid=kDCrwgoPjAYAAFAMm8gAAADn&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-18502_4-10976868.html%3Fspi%3Da10bbf0aa9a8a3315fe085cb27966826
> >>
> >> which redirects to
> >>
> >> http://software-files-l.cnet.com/s/software/11/42/22/03/install_virtualdj_trial_v6.1.dmg?e=1277527830&h=1aca74c88927f3f981bfb5d756764454&lop=link&ptype=1901&ontid=18502&siteId=4&edId=3&spi=a10bbf0aa9a8a3315fe085cb27966826&pid=11422203&psid=10976868&fileName=install_virtualdj_trial_v6.1.dmg
> >>
> >> which downloads a dmg for virtual dj. has anyone got a solution to this?
> >>
> >> Thanks
> >> Soyoung

From soyoung.shin at socrata.com Mon Jun 28 13:17:26 2010
From: soyoung.shin at socrata.com (Soyoung Shin)
Date: Mon, 28 Jun 2010 10:17:26 -0700
Subject: [Mechanize-users] determining whether a link might be a file?
In-Reply-To:
References: <9A5BC441-7598-4A68-B856-9AD2221407EF@socrata.com> <42EAF109-846E-472E-ADD3-63B62501D5DD@gmail.com> <3F38E13F-22E3-45C6-AC90-C51BEB66DF6D@socrata.com>
Message-ID: <272578B1-466B-4E61-993D-38671FC0278B@socrata.com>

ah, duh. thanks!
:3

On Jun 28, 2010, at 10:07 AM, Aaron Starr wrote:

>
> Seems like, in that case, you should just take Matthias' suggestion:
>
> agent.redirect_ok = false
> page = agent.get url
> # inspect page.header['location'] for suffixes like .jpeg
>
>
> On Mon, Jun 28, 2010 at 9:54 AM, Soyoung Shin wrote:
> That works, but unfortunately it still downloads the entire file before inspecting the headers. I think at this point, it seems like a better option will be to use a mixture of curl/wget + mechanize.
>
> headers = `curl --head url`
> if headers.include? "301 something something"
>   # inspect the redirect url for suffixes like .jpeg
> end
> # continue as normal
>
> ~Soyoung
>
> On Jun 25, 2010, at 10:44 PM, Mihael wrote:
> :
> > Hey, maybe u could use something like this:
> >
> > head = a.head(img_url)
> > content_type = head.response["content-type"]
> > if head.kind_of?(WWW::Mechanize::File) && (content_type =~ /image/)
> >   image = a.get(img_url)
> >   filename = img_url.split('/').last
> >   path = @temp_path/filename
> >   image.save_as(path)
> >   asset = Asset.new(:original_url => img_url, :mayo_id => @source.id, :uploaded_data=> ActionController::TestUploadedFile.new(path, content_type))
> >   asset.save
> >   File.delete(path) #cleanup
> > else
> >   warn "skipped image url: #{img_url} !!! not an image url"
> >   return nil #this is not an image url
> > end
> >
> > On Jun 26, 2010, at 12:53 AM, Soyoung Shin wrote:
> >
> >> Hi again. I figured out how to use ruby and solve that last problem :)
> >>
> >> On another note, I'm trying to build a crawler that will generally avoid hitting (but maybe still get the url for) non-html (downloadable) files like csv, xml, exe, etc. It's simple enough to avoid links that end in .csv or .xml, but when there are intermediate redirects, it can be difficult.
> >> For example, this was linked from cnet
> >>
> >> http://dw.com.com/redir?edId=3&siteId=4&oId=3000-18502_4-10976868&ontId=18502_4&spi=a10bbf0aa9a8a3315fe085cb27966826&lop=link&tag=tdw_dltext&ltype=dl_dlnow&pid=11422203&mfgId=74349&merId=74349&pguid=kDCrwgoPjAYAAFAMm8gAAADn&destUrl=http%3A%2F%2Fdownload.cnet.com%2F3001-18502_4-10976868.html%3Fspi%3Da10bbf0aa9a8a3315fe085cb27966826
> >>
> >> which redirects to
> >>
> >> http://software-files-l.cnet.com/s/software/11/42/22/03/install_virtualdj_trial_v6.1.dmg?e=1277527830&h=1aca74c88927f3f981bfb5d756764454&lop=link&ptype=1901&ontid=18502&siteId=4&edId=3&spi=a10bbf0aa9a8a3315fe085cb27966826&pid=11422203&psid=10976868&fileName=install_virtualdj_trial_v6.1.dmg
> >>
> >> which downloads a dmg for virtual dj. has anyone got a solution to this?
> >>
> >> Thanks
> >> Soyoung

From barjunk at attglobal.net Tue Jun 29 23:23:25 2010
From: barjunk at attglobal.net (barsalou)
Date: Tue, 29 Jun 2010 19:23:25 -0800
Subject: [Mechanize-users] post an XML page
Message-ID: <20100629192325.343wpynhcgcs0ccw@lcgalaska.com>

I'm working with a site that requires me to post an XML page (WCTP formatted) to the site, then it will return some code for success or not.
My current thought is:

create a page with XML stuff
post it
manage response

My main problem is that most of my use of Mechanize was to get the page then process info.

I'm not very clear how I'd create a page from an unpopulated or "new" page object.

Is using Mechanize just a bad idea for my case? If so, what is a better alternative?

Thanks for any ideas or guidance.

Mike B.

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

From barjunk at attglobal.net Wed Jun 30 00:21:23 2010
From: barjunk at attglobal.net (barsalou)
Date: Tue, 29 Jun 2010 20:21:23 -0800
Subject: [Mechanize-users] post an XML page
In-Reply-To: <20100629192325.343wpynhcgcs0ccw@lcgalaska.com>
References: <20100629192325.343wpynhcgcs0ccw@lcgalaska.com>
Message-ID: <20100629202123.ews8urg1kwgc4cs0@lcgalaska.com>

Quoting barsalou :

> I'm working with a site that requires me to post an XML page (WCTP
> formatted) to the site, then it will return some code for success or
> not.
>
> my current thought is:
>
> create a page with XML stuff
> post it
> manage response
>
> My main problem is that most of my use of Mechanize was to get the page
> then process info.
>
> I'm not very clear how I'd create a page from an unpopulated or "new"
> page object.
>

Never mind... I figured this out and it was fairly straightforward:

agent = Mechanize.new
string = "my xml stuff here"
page = agent.post(myurl, string)

Please let me know if there is a "nicer" way to do this.

Mike B.
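
For the XML post described above, one possible alternative is sketched below with Ruby's standard Net::HTTP instead of Mechanize, so this swaps in a different library rather than showing a Mechanize feature. The main difference is that the Content-Type is set explicitly. The method names build_xml_post and send_xml_post, the text/xml content type, and any URL or payload passed in are assumptions for illustration, not values from the WCTP service:

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a POST request whose body is a raw XML string.
# Returns the parsed URI together with the prepared request object.
def build_xml_post(url, xml)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request['Content-Type'] = 'text/xml' # assumed; use whatever the server expects
  request.body = xml
  [uri, request]
end

# Send the request and return the Net::HTTPResponse, whose code and body
# can then be inspected for the success/failure reply.
def send_xml_post(url, xml)
  uri, request = build_xml_post(url, xml)
  Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

Calling send_xml_post(myurl, string) would then play the role of agent.post(myurl, string), with response.code and response.body available for handling the reply.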