From maku at makuchaku.info Fri Sep 14 01:31:19 2007
From: maku at makuchaku.info (मयंक जैन (makuchaku))
Date: Fri, 14 Sep 2007 11:01:19 +0530
Subject: [Mechanize-users] Unable to scrap gmail.com - EOFError: End of file reached
Message-ID: <185269960709132231r5455275eif178633ae753d9f7@mail.gmail.com>

Hi all,

I am so excited to use mechanize! It has opened a whole new world of
projects for me :)

I am trying to log in to the Gmail.com server, as described in
http://schf.uc.org/articles/2007/02/14/scraping-gmail-with-mechanize-and-hpricot
but am running into a few issues...

irb(main):010:0> page = agent.submit form
EOFError: end of file reached
        from /usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread'
        from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
        from /usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
        from /usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
        from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
        from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
        from /usr/lib/ruby/1.8/net/http.rb:2017:in `read_status_line'
        from /usr/lib/ruby/1.8/net/http.rb:2006:in `read_new'
        from /usr/lib/ruby/1.8/net/http.rb:1047:in `request'
        from /var/lib/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:514:in `fetch_page'
        from /var/lib/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:428:in `post_form'
        from /var/lib/gems/1.8/gems/mechanize-0.6.10/lib/mechanize.rb:247:in `submit'
        from (irb):10
        from :0
irb(main):011:0>

When I do a submit on the form, I get this huge backtrace.
It would be great if anyone can help me out with this problem - some
pointers on what I am doing wrong would be great.
I would surely find a solution :)

I have the following installed on my Ubuntu box:
Ruby - 1.8.5
hpricot - 0.6
mechanize - 0.6.10
libopenssl-ruby 1.0.0
httpclient 2.1.1

Thanks in advance for the help :)

Regards,
Maku

http://www.makuchaku.info/blog

From maku at makuchaku.info Sat Sep 15 04:12:16 2007
From: maku at makuchaku.info (मयंक जैन (makuchaku))
Date: Sat, 15 Sep 2007 13:42:16 +0530
Subject: [Mechanize-users] Unable to scrap gmail.com - EOFError: End of file reached
In-Reply-To: <185269960709132231r5455275eif178633ae753d9f7@mail.gmail.com>
References: <185269960709132231r5455275eif178633ae753d9f7@mail.gmail.com>
Message-ID: <185269960709150112y41edbbaas42af97e3c1f0a857@mail.gmail.com>

On 9/14/07, मयंक जैन (makuchaku) wrote:
> Hi all,
>
> I am so excited to use mechanize! It has opened a whole new world of
> projects for me :)
>
> I am trying to log in to the Gmail.com server, as described in
> http://schf.uc.org/articles/2007/02/14/scraping-gmail-with-mechanize-and-hpricot
> but am running into a few issues...
>
> irb(main):010:0> page = agent.submit form
> EOFError: end of file reached

Lots of googling revealed that it's a known, filed bug.
http://rubyforge.org/tracker/index.php?func=detail&aid=13319&group_id=1453&atid=5709

The bug report also suggests a solution - which rocks!
Thanks 'orangechicken' :-)

--
Maku
http://www.makuchaku.info/blog

From shane at digitalsanctum.com Thu Sep 20 19:04:15 2007
From: shane at digitalsanctum.com (Shane Witbeck)
Date: Thu, 20 Sep 2007 19:04:15 -0400
Subject: [Mechanize-users] issues submitting a search form
Message-ID: <153af7a10709201604n5e18373n66e6ac14cfce8cdd@mail.gmail.com>

Hello to the list and thanks to Aaron for the cool software.

I've been fooling around with Mechanize and Hpricot for a couple of
days and from the docs I've read, the following code SHOULD work but
doesn't.
I've tried the same code on a couple of different sites and I
get the same exception for each. Any pointers or suggestions are
appreciated.

http://pastie.caboo.se/99216

--
Thank you,

Shane Witbeck
Digital Sanctum, inc.
-----------------------------------------------------
skype: digitalsanctum
blog: http://www.digitalsanctum.com

From aaron at tenderlovemaking.com Thu Sep 20 19:15:54 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Thu, 20 Sep 2007 16:15:54 -0700
Subject: [Mechanize-users] issues submitting a search form
In-Reply-To: <153af7a10709201604n5e18373n66e6ac14cfce8cdd@mail.gmail.com>
References: <153af7a10709201604n5e18373n66e6ac14cfce8cdd@mail.gmail.com>
Message-ID: <20070920231554.GB2096@mac-mini.lan>

Hey Shane,

On Thu, Sep 20, 2007 at 07:04:15PM -0400, Shane Witbeck wrote:
> Hello to the list and thanks to Aaron for the cool software.
>
> I've been fooling around with Mechanize and Hpricot for a couple of
> days and from the docs I've read, the following code SHOULD work but
> doesn't. I've tried the same code on a couple of different sites and I
> get the same exception for each. Any pointers or suggestions are
> appreciated.
>
> http://pastie.caboo.se/99216

The problem is that you're setting the values through hpricot, and not
through mechanize. Mechanize doesn't know to reparse the page after
you've modified it. I actually think that is a cool way to use Hpricot,
and I'm going to support it automagically in the future.

However, for now, you'll need to tell the page to reparse the page.

Here is your script with the lines I changed highlighted:

  http://pastie.caboo.se/99219

Hope that helps!
--
Aaron Patterson
http://tenderlovemaking.com/

From shane at digitalsanctum.com Thu Sep 20 19:17:39 2007
From: shane at digitalsanctum.com (Shane Witbeck)
Date: Thu, 20 Sep 2007 19:17:39 -0400
Subject: [Mechanize-users] issues submitting a search form
In-Reply-To: <20070920231554.GB2096@mac-mini.lan>
References: <153af7a10709201604n5e18373n66e6ac14cfce8cdd@mail.gmail.com>
    <20070920231554.GB2096@mac-mini.lan>
Message-ID: <153af7a10709201617n74ba0793q1e065ffa431b6b07@mail.gmail.com>

Aaron, you tha man. Thanks for the 12 minute response time.

Shane

On 9/20/07, Aaron Patterson wrote:
> Hey Shane,
>
> On Thu, Sep 20, 2007 at 07:04:15PM -0400, Shane Witbeck wrote:
> > Hello to the list and thanks to Aaron for the cool software.
> >
> > I've been fooling around with Mechanize and Hpricot for a couple of
> > days and from the docs I've read, the following code SHOULD work but
> > doesn't. I've tried the same code on a couple of different sites and I
> > get the same exception for each. Any pointers or suggestions are
> > appreciated.
> >
> > http://pastie.caboo.se/99216
>
> The problem is that you're setting the values through hpricot, and not
> through mechanize. Mechanize doesn't know to reparse the page after
> you've modified it. I actually think that is a cool way to use Hpricot,
> and I'm going to support it automagically in the future.
>
> However, for now, you'll need to tell the page to reparse the page.
>
> Here is your script with the lines I changed highlighted:
>
> http://pastie.caboo.se/99219
>
> Hope that helps!
>
> --
> Aaron Patterson
> http://tenderlovemaking.com/
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users
>

--
Thank you,

Shane Witbeck
Digital Sanctum, inc.
-----------------------------------------------------
skype: digitalsanctum
blog: http://www.digitalsanctum.com

From barjunk at attglobal.net Thu Sep 20 19:25:36 2007
From: barjunk at attglobal.net (barsalou)
Date: Thu, 20 Sep 2007 15:25:36 -0800
Subject: [Mechanize-users] issues submitting a search form
In-Reply-To: <153af7a10709201604n5e18373n66e6ac14cfce8cdd@mail.gmail.com>
References: <153af7a10709201604n5e18373n66e6ac14cfce8cdd@mail.gmail.com>
Message-ID: <20070920152536.r7z0v22yuco0ck04@lcgalaska.com>

Could you provide the exception you are getting as well?

Mike B.

Quoting Shane Witbeck:

> Hello to the list and thanks to Aaron for the cool software.
>
> I've been fooling around with Mechanize and Hpricot for a couple of
> days and from the docs I've read, the following code SHOULD work but
> doesn't. I've tried the same code on a couple of different sites and I
> get the same exception for each. Any pointers or suggestions are
> appreciated.
>
> http://pastie.caboo.se/99216
>
> --
> Thank you,
>
> Shane Witbeck
> Digital Sanctum, inc.
> -----------------------------------------------------
> skype: digitalsanctum
> blog: http://www.digitalsanctum.com
> _______________________________________________
> Mechanize-users mailing list
> Mechanize-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/mechanize-users
>

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

From Bil.Kleb at NASA.gov Thu Sep 20 21:12:38 2007
From: Bil.Kleb at NASA.gov (Bil Kleb)
Date: Thu, 20 Sep 2007 21:12:38 -0400
Subject: [Mechanize-users] issues submitting a search form
In-Reply-To: <153af7a10709201617n74ba0793q1e065ffa431b6b07@mail.gmail.com>
References: <153af7a10709201604n5e18373n66e6ac14cfce8cdd@mail.gmail.com>
    <20070920231554.GB2096@mac-mini.lan>
    <153af7a10709201617n74ba0793q1e065ffa431b6b07@mail.gmail.com>
Message-ID: <46F31A86.3060405@NASA.gov>

Shane Witbeck wrote:
> Aaron, you tha man.
> Thanks for the 12 minute response time.

Yeah, open source is so lame; the response time is typically less
than it takes me to find the customer support contact for a
commercial vendor...

Regards,
--
Bil Kleb
http://fun3d.larc.nasa.gov

From wflanagan at gmail.com Sun Sep 23 10:52:51 2007
From: wflanagan at gmail.com (William Flanagan)
Date: Sun, 23 Sep 2007 10:52:51 -0400
Subject: [Mechanize-users] Selecting Links with their parent class attribute?
Message-ID: <321613ee0709230752s306fa0c2g223c45cf64d432c8@mail.gmail.com>

Hi all,

Trying to figure out a quick way to do something. I have a yahoo groups
list, and it has listings that look like this:

freecycledc

Basically, right now I'm trying to get a group listing. I would think I
could select links based on the "ygrp-nowrap" class variable, but it's not
working. Any advice on how to select the hrefs (and strings) in this
type of a configuration?

Thanks,

Will
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070923/6766a94a/attachment.html

From aaron at tenderlovemaking.com Sun Sep 23 16:22:38 2007
From: aaron at tenderlovemaking.com (Aaron Patterson)
Date: Sun, 23 Sep 2007 13:22:38 -0700
Subject: [Mechanize-users] Selecting Links with their parent class attribute?
In-Reply-To: <321613ee0709230752s306fa0c2g223c45cf64d432c8@mail.gmail.com>
References: <321613ee0709230752s306fa0c2g223c45cf64d432c8@mail.gmail.com>
Message-ID: <20070923202238.GA3653@mac-mini.lan>

Hey Will,

On Sun, Sep 23, 2007 at 10:52:51AM -0400, William Flanagan wrote:
> Hi all,
>
> Trying to figure out a quick way to do something. I have a yahoo groups
> list, and it has listings that look like this:
>
>
>
> freecycledc
>
> src="http://f9g.yahoofs.com/groups/g_11337116/.HomePage/__tn_/4d04.jpg?grA.E.GBtlOnTmsC"
> width="96" height="48" alt="">
>
>
>
>
> Basically, right now I'm trying to get a group listing.
> I would think I
> could select links based on the "ygrp-nowrap" class variable, but it's not
> working. Any advice on how to select the hrefs (and strings) in this
> type of a configuration?

Mechanize lets you get ahold of the hpricot object and lets you click on
links returned from hpricot. So I would suggest using hpricot searching
for this task. I think the following example may be close to what
you're looking for:

  page.search("//li[@class='smalltype ygrp-nowrap']/a").each do |link|
    puts link.inner_text
    f = mech.click(link)
  end

Hope that helps!

--
Aaron Patterson
http://tenderlovemaking.com/

From shane at digitalsanctum.com Mon Sep 24 08:21:59 2007
From: shane at digitalsanctum.com (Shane Witbeck)
Date: Mon, 24 Sep 2007 08:21:59 -0400
Subject: [Mechanize-users] selecting a form
Message-ID: <153af7a10709240521o5249c933l50b0a77968f07552@mail.gmail.com>

What's the best way to select a form on a page with Mechanize if the
form doesn't have the "name" attribute? I'm already familiar with the
page.form('myformname') syntax but this doesn't work with forms that
have no names.

--
Thank you,

Shane Witbeck
Digital Sanctum, inc.
-----------------------------------------------------
skype: digitalsanctum
blog: http://www.digitalsanctum.com

From maku at makuchaku.info Mon Sep 24 09:53:49 2007
From: maku at makuchaku.info (मयंक जैन (makuchaku))
Date: Mon, 24 Sep 2007 19:23:49 +0530
Subject: [Mechanize-users] selecting a form
In-Reply-To: <153af7a10709240521o5249c933l50b0a77968f07552@mail.gmail.com>
References: <153af7a10709240521o5249c933l50b0a77968f07552@mail.gmail.com>
Message-ID: <185269960709240653q32a531x1d4aed664c9cf089@mail.gmail.com>

On 9/24/07, Shane Witbeck wrote:
>
> What's the best way to select a form on a page with Mechanize if the
> form doesn't have the "name" attribute? I'm already familiar with the
> page.form('myformname') syntax but this doesn't
> work with forms that
> have no names.

You can try selecting the form based on its location - like
page.forms.first maybe...
Try http://www.makuchaku.info/blog/mechanizing-orkut

I hope it helps :)

--
Maku
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070924/1f29eea2/attachment-0001.html

From bodo at wannawork.de Sun Sep 30 05:33:58 2007
From: bodo at wannawork.de (Bodo Tasche)
Date: Sun, 30 Sep 2007 11:33:58 +0200
Subject: [Mechanize-users] Problems with Forms
Message-ID: <46FF6D86.9030905@wannawork.de>

Hi,

I have been using mechanize for a while now. Works great. But at the
moment I have a small problem using it. The problematic file is attached.

In that PHP file you will see a form, but mechanize doesn't recognize it.

Could somebody check this?

Thanks,
Bodo

--
http://www.tvbrowser.org
http://www.wannawork.de
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://rubyforge.org/pipermail/mechanize-users/attachments/20070930/698e3dbd/attachment.html
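
For the "selecting a form" thread above, here is a minimal sketch of picking a form by position, or by an attribute other than its name. It assumes only that page.forms behaves like a plain Ruby Array of form objects exposing name and action. FormStub, FORMS, form_by_position and form_by_action are hypothetical stand-ins, not part of Mechanize, so the snippet runs without the gem or a network connection.

```ruby
# Hypothetical stand-in for a Mechanize form; only name and action are modelled.
FormStub = Struct.new(:name, :action)

# Hypothetical stand-in for the array returned by page.forms.
# The first two forms have no "name" attribute.
FORMS = [
  FormStub.new(nil, "/login"),
  FormStub.new(nil, "/search"),
  FormStub.new("newsletter", "/subscribe")
].freeze

# Select a form by its position on the page (0 is the first form).
def form_by_position(forms, index)
  forms[index]
end

# Select the first form whose action matches the given pattern.
# Returns nil when no form matches.
def form_by_action(forms, pattern)
  forms.find { |form| form.action =~ pattern }
end

puts form_by_position(FORMS, 0).action
puts form_by_action(FORMS, /search/).action
```

With a real page the same Array methods would be applied to page.forms directly, for example page.forms.first as suggested in the thread.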