From lists at ruby-forum.com Mon Jun 1 11:58:33 2009
From: lists at ruby-forum.com (Jarrett Colby)
Date: Mon, 1 Jun 2009 17:58:33 +0200
Subject: BBCode Fork of RedCloth
In-Reply-To: <45BE7F12-C5F8-4FFC-A11B-44159C11D3E8@jasongarber.com>
References: <45BE7F12-C5F8-4FFC-A11B-44159C11D3E8@jasongarber.com>
Message-ID: <34fd0db5d86ee36d8644cb8b197095c1@ruby-forum.com>

Here comes a bit of shameless self-promotion, but hopefully it will be
helpful. I just released the beta version of a standalone Ruby BB code
parser as a gem. It's called RbbCode, and my favorite thing about it is
that it gracefully handles invalid input.

It's available on github at:

http://github.com/jarrett/rbbcode

To install:

gem sources -a http://gems.github.com
sudo gem install jarrett-rbbcode

It's lacking in documentation as of now (June 1, 2009). The RDoc is
pretty much non-existent. But you'll find a basic usage example in the
README.

Note that this is very new, so it likely has some bugs. If you find any,
please do submit an issue on github, and I'll fix the bug and
re-release. There's a halfway-decent spec suite in place, but it could
use a lot more examples.

Jason Garber wrote:
> That's great, Ryan! Thanks for providing the Ruby community with a
> way to parse BBCode. I ran into that deficiency a few years ago when
> trying to gradually replace parts of an old PHP system with Ruby. I
> think I just left the BBCode part in PHP. :-)
>
> Would you consider splitting it off into a new piece of software, so
> it's not "Ryan's fork of RedCloth" but rather a library people will
> find when looking for a Ruby BBCode parser? You may not have time to
> get to it for awhile, but I think it would be really helpful.
>
> The quick and dirty way is to just pull it into a repo of a new name.
> I think your architecture, though, won't serve you well in the long
> run. You should find a way to connect with RedCloth straight from the
> source and layer your BBCode functionality on top of it.
> Here are a few ways it could be done:
> 1.) Serial processing: Your package would be BBCode only. The user
> would run the document through your package before running it through
> RedCloth
> 2.) Plugin: not sure if this is possible, but I wonder if there'd be a
> way to have Ragel include one scanner in another so it all gets
> processed in one shot. The user would install the RedCloth gem and
> any plugin gems and those plugins would modify RedCloth behavior. Not
> sure that would work; I can see lots of problems with this one.
> 3.) RedCloth flavors: Yours would be RedCloth-BBCode and it would just
> include RedCloth in vendor/ or something.
>
> I want to make RedCloth the sort of thing that people can extend
> without too much trouble, but also so it's easy for the end user to
> keep up with RedCloth advances without having to wait for the
> extension to merge in the changes from upstream, which will be very
> slow if it's a painful merge process.
>
> If you were to move your fork to a new project and start a clean fork
> of RedCloth to make general changes you need for your library to be
> compatible, I'd be glad to merge those in.
>
> Good luck figuring it all out and thanks again,
> Jason

--
Posted via http://www.ruby-forum.com/.

From judofyr at gmail.com Fri Jun 5 13:24:01 2009
From: judofyr at gmail.com (Magnus Holm)
Date: Fri, 5 Jun 2009 19:24:01 +0200
Subject: bug
Message-ID: <391a49da0906051024n28b978fdqa665cb4a53a3569e@mail.gmail.com>

Hey, and thanks for a great RedCloth!

I have however stumbled upon a very weird bug. Have a look at this
Textile file: https://gist.github.com/c75681ce99f8c2515ee6.

I have two -tags inside, but it still escapes single-quote to ‘ in the
first -block, but not the last... Any reason why this would happen?

//Magnus Holm
-------------- next part --------------
An HTML attachment was scrubbed...

From gaspard at teti.ch Sun Jun 7 15:17:41 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Sun, 7 Jun 2009 21:17:41 +0200
Subject: Close to a 4.2 release; experimenting with Ragel alternatives
In-Reply-To:
References:
Message-ID: <7a9f744a0906071217n2dae5a14wbe5f60210d6a712e@mail.gmail.com>

Hi Jason !

Hmmm, this is good and bad news:

Good: ruby hooks means I could use a single pass to parse textile
customizations in zena instead of running two parsers: nice.

Bad: I have just switched to ragel for QueryBuilder to parse pseudo
sql and I fear your shortcomings (if that's an english phrase).

Could you describe more precisely what you are missing with ragel ?
I'm parsing about anything I want with this thing but maybe I'm too
dumb to see the walls I'm running into...

Gaspard

On Sun, Jun 7, 2009 at 12:59 PM, Jason Garber wrote:
> I just went through the ticket list and dropped a bunch from the 4.2
> milestone that are just too difficult with Ragel. Many of them I've poked
> at and they've left me saying, "how the heck am I supposed to do that!?"
> Multi-byte content will probably never work because Ragel docs say it won't
> with conditionals (actions that return true or false to determine if a state
> should be accepted), which I see no way around. Not recognizing vertical
> pipes escaped with notextile tags in tables, exiting the HTML machine on the
> first closing block tag it sees, leaving pre blocks prematurely... all these
> bugs would require a lot of time and code to fix. And they're just the tip
> of the iceberg. If I walk through the code and look at it through the lens
> of nondeterminism, I can see lots more problems that people just haven't run
> into yet.
> I'd like to release RedCloth 4.2 once I fix the low-hanging fruit. Then, I
> plan to poke around for alternatives to Ragel. It's been great, but
> RedCloth has gotten really difficult to maintain because:
> 1.) It has to compile
> 2.) It compiles to three languages, has a couple binary gem distributions,
> and needs to work with Ruby 1.8 and 1.9, which is always a challenge
> 3.) Many reported bugs involve nondeterminism and require things DFAs like
> Ragel have a hard time doing
> 4.) Not that many people can fix bugs themselves because they don't know
> Ragel or they don't understand the code.
> 5.) It's hard to tell people they can't mix in extensions. Right now
> RedCloth is a black box and you have to pre- or post-parse for extra
> patterns, like wiki links. I want people to be able to use it how they
> want. If that means mixing in their own cruddy patterns, awesome.
> A PEG might be the way to go. Looking at Treetop, which is nice, decently
> maintained, has some history, and is used by Cucumber. Doesn't let me
> manipulate the parser's acceptance of expressions in code, though. It's a
> known problem, which is why you don't see any yaml parsers in treetop yet
> (they have a proposal on Global Parsing State and Semantic Backtrack
> Triggering). Also, without backreferences or the equivalent in code, it
> would be hard to match things like HTML tags.
> Also looking at James Edward Gray II's Ghost Wheel. I like the grammar
> syntax better and he says it "provides hooks for Ruby code that can be used
> to make parsing decisions or transform parsed results," but it's less widely
> used and well-documented and I haven't tried it out, so I don't know its
> limitations.
> If anyone else has suggestions of things I should explore, do let me know!
> I want to keep RedCloth fast, but it also needs to be maintainable.
> Jason
> _______________________________________________
> Redcloth-upwards mailing list
> Redcloth-upwards at rubyforge.org
> http://rubyforge.org/mailman/listinfo/redcloth-upwards
>

From lists at ruby-forum.com Sun Jun 7 17:52:48 2009
From: lists at ruby-forum.com (Hao Dong)
Date: Sun, 7 Jun 2009 23:52:48 +0200
Subject: URL cloaking?
Message-ID:

Hi, there,

is there any gem/plugin out there that does URL cloaking? Ideally if
RedCloth has built-in function like that, it would be great.

Thanks!
--
Posted via http://www.ruby-forum.com/.

From jg at jasongarber.com Mon Jun 8 00:28:35 2009
From: jg at jasongarber.com (Jason Garber)
Date: Mon, 8 Jun 2009 00:28:35 -0400
Subject: Close to a 4.2 release; experimenting with Ragel alternatives
In-Reply-To: <7a9f744a0906071217n2dae5a14wbe5f60210d6a712e@mail.gmail.com>
References: <7a9f744a0906071217n2dae5a14wbe5f60210d6a712e@mail.gmail.com>
Message-ID:

It's probably me who's too dumb for Ragel. :). Take a look at the bugs
tagged difficult on the tracker. Also I'll forward you what I sent to
why describing the problems.

Sent from my iPod

On Jun 7, 2009, at 3:17 PM, Gaspard Bucher wrote:

> Hi Jason !
>
> Hmmm, this is good and bad news:
>
> Good: ruby hooks means I could use a single pass to parse textile
> customizations in zena instead of running two parsers: nice.
>
> Bad: I have just switched to ragel for QueryBuilder to parse pseudo
> sql and I fear your shortcomings (if that's an english phrase).
>
> Could you describe more precisely what you are missing with ragel ?
> I'm parsing about anything I want with this thing but maybe I'm too
> dumb to see the walls I'm running into...
>
> Gaspard
>
> On Sun, Jun 7, 2009 at 12:59 PM, Jason Garber wrote:
>> I just went through the ticket list and dropped a bunch from the 4.2
>> milestone that are just too difficult with Ragel. Many of them I've
>> poked at and they've left me saying, "how the heck am I supposed to
>> do that!?"
>> Multi-byte content will probably never work because Ragel docs say
>> it won't with conditionals (actions that return true or false to
>> determine if a state should be accepted), which I see no way around.
>> Not recognizing vertical pipes escaped with notextile tags in tables,
>> exiting the HTML machine on the first closing block tag it sees,
>> leaving pre blocks prematurely... all these bugs would require a lot
>> of time and code to fix. And they're just the tip of the iceberg. If
>> I walk through the code and look at it through the lens of
>> nondeterminism, I can see lots more problems that people just haven't
>> run into yet.
>> I'd like to release RedCloth 4.2 once I fix the low-hanging fruit.
>> Then, I plan to poke around for alternatives to Ragel. It's been
>> great, but RedCloth has gotten really difficult to maintain because:
>> 1.) It has to compile
>> 2.) It compiles to three languages, has a couple binary gem
>> distributions, and needs to work with Ruby 1.8 and 1.9, which is
>> always a challenge
>> 3.) Many reported bugs involve nondeterminism and require things
>> DFAs like Ragel have a hard time doing
>> 4.) Not that many people can fix bugs themselves because they don't
>> know Ragel or they don't understand the code.
>> 5.) It's hard to tell people they can't mix in extensions. Right now
>> RedCloth is a black box and you have to pre- or post-parse for extra
>> patterns, like wiki links. I want people to be able to use it how
>> they want. If that means mixing in their own cruddy patterns, awesome.
>> A PEG might be the way to go. Looking at Treetop, which is nice,
>> decently maintained, has some history, and is used by Cucumber.
>> Doesn't let me manipulate the parser's acceptance of expressions in
>> code, though. It's a known problem, which is why you don't see any
>> yaml parsers in treetop yet (they have a proposal on Global Parsing
>> State and Semantic Backtrack Triggering). Also, without
>> backreferences or the equivalent in code, it would be hard to match
>> things like HTML tags.
>> Also looking at James Edward Gray II's Ghost Wheel.
>> I like the grammar syntax better and he says it "provides hooks for
>> Ruby code that can be used to make parsing decisions or transform
>> parsed results," but it's less widely used and well-documented and I
>> haven't tried it out, so I don't know its limitations.
>> If anyone else has suggestions of things I should explore, do let me
>> know! I want to keep RedCloth fast, but it also needs to be
>> maintainable.
>> Jason

From gaspard at teti.ch Mon Jun 8 07:06:55 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Mon, 8 Jun 2009 13:06:55 +0200
Subject: Need your advice on RedCloth
In-Reply-To: <88BDD050-1BE3-4795-A0A9-1FB84A32C1A7@jasongarber.com>
References: <5120B73A-8EF8-4DF9-B035-D02AD9A8406D@jasongarber.com> <88BDD050-1BE3-4795-A0A9-1FB84A32C1A7@jasongarber.com>
Message-ID: <7a9f744a0906080406q362d762cp337a2792dbe226ce@mail.gmail.com>

Hi Jason !

I've looked at the ragel code for textile and you are right: it has
become quite hard to understand. I have gone through the list of
difficult defects and through the current textile reference and I have
the feeling that the current parser is quite complicated for the task
at hand.

Textile does not look like such a complicated grammar (at least not
what is listed in the reference page), but maybe I'm wrong and there
are many places where determinism is not easily attained. I really feel
that the parts that are difficult for the parser are also difficult for
the reader when editing text.
And most of these hard-to-parse and hard-to-read features in textile
(except for tables) are not related to describing content but to
styling: something like setting an "id" in an article seems really bad
to me: what if you display two articles on one page and they both
define "hot" id ? Same goes with "em" padding: that's not content,
that's styling.

I feel very concerned about all these issues related to textile because
I am building a CMS in which my clients put *everything*: letters,
comments, documents, quality certification stuff, control lists, etc.
So I really need a textile parser that can survive in the long run
(10yrs).

To achieve this goal, we need to:

a. have a parser that is easy to enhance with new needs without
   breaking old text
b. have a grammar that is easy to parse

For point "a", I think we can live with S-expression generation and
customization during s-expression tree processing. For example an image
with caption would be parsed as:

!file.jpg (foo bar baz)! ==> [:image, "file.jpg (foo bar baz)"]

So the processor will run ruby regex to "finish the work". This means
the parser in "C" is kept simple and if someone wants to add more
features to the "image" tag, she just has to change the ruby regex.

For point "b": we need to *not* support shortcut syntax for styling
features such as the "id" thing or "em" padding (at least not at the
"C" parser level). If someone really wants an em padding, she should
use html (it's not nice to use and this is an indication that this is
bad practice) :

# one
# two
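
A minimal sketch of the Ruby-side step described under point "a",
assuming the C parser hands back nested arrays shaped like the
[:image, "file.jpg (foo bar baz)"] example. The names IMAGE_RE,
escape_html, render_image and process are hypothetical and not part of
RedCloth; the point is only that the image syntax lives in one Ruby
regex that can be changed without touching the C parser.

```ruby
# Hypothetical second pass over S-expressions produced by a simple parser.
# Nothing here is RedCloth API; all names are made up for illustration.

# Splits the raw image payload into a source and an optional "(caption)".
IMAGE_RE = /\A(?<src>\S+)(?:\s+\((?<caption>[^)]*)\))?\z/

def escape_html(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

def render_image(raw)
  m = IMAGE_RE.match(raw)
  return escape_html(raw) unless m # unrecognised payload: emit it as text

  attrs = [%(src="#{escape_html(m[:src])}")]
  if m[:caption]
    caption = escape_html(m[:caption])
    attrs << %(title="#{caption}") << %(alt="#{caption}")
  end
  "<img #{attrs.join(' ')} />"
end

# Walks the tree: strings are escaped, arrays are dispatched on their tag.
def process(sexp)
  return escape_html(sexp) if sexp.is_a?(String)

  tag, *children = sexp
  case tag
  when :image then render_image(children.first)
  when :p     then "<p>#{children.map { |c| process(c) }.join}</p>"
  else children.map { |c| process(c) }.join
  end
end

puts process([:p, 'Look: ', [:image, 'file.jpg (foo bar baz)']])
```

Adding, say, an alignment flag to images would then mean editing
IMAGE_RE and render_image only.
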

Since I *really* need such a tool, I could help refactoring redcloth
into a two step parser (half in "C", half in ruby).

What do you think ?

Gaspard

On Mon, Jun 8, 2009 at 6:40 AM, Jason Garber wrote:
> Gaspard, here's a copy of my complaining to _why, which contains a few
> examples of what I'm up against. This was awhile back, so I've moved on a
> bit, but what you were asking about still applies.
> There's probably a way to do everything I need to in Ragel, but I can't
> figure it out. I spent a few hours figuring out a different capture
> mechanism and thought I was being quite clever, but in the end it didn't
> work out. I was relieved because it felt like I was reinventing something a
> tool should provide anyway.
> Jason
>
> Begin forwarded message:
>
> From: Jason Garber
> Date: June 1, 2009 8:50:23 AM EDT
> To: why the lucky stiff
> Subject: Need your advice on RedCloth
> Hi, why. I need your advice on RedCloth and Ragel. The current mark and
> capture mechanism has just gotten too ugly for me to handle. Since I took
> over the project, we've had to add several more variables/macros/actions to
> mark fallback captures and I had to add a separate machine to parse
> attributes. It's been livable, but as I fix more bugs it keeps heading in
> that direction and I don't like it.
>
> I've reduced the problem down to the deterministic nature of the machine.
> For example, merging in PyTextile-style table attributes creates a conflict
> when recognizing these two possibilities:
> (#myid)# This is a list item with an id
> (# This is a list item with padding-left:1em
>
> It has to get to the third character to know whether the first character was
> a left indent or the start of the id, but by then it's too late: the indent
> has already been stored. You wind up getting the same output from (#myid)#
> one as you would ((#myid)# one.
> I solved it easily enough with a conditional action that looks to see if
> p+2 is a space, but I feel like I shouldn't have to.
>
> I wish that from the start state there were two '(' transitions, one marking
> the indent, one the id. The branches would have different things both
> leading to a final state. At the final state, an action would discard the
> captured bits that were not on the path to the final state, leaving only the
> things that "stuck." Basically, the same thing backtracking regex engines
> do when matching /(\()?(\(#([a-z]+)\))?# (.+)/
>
> Plus, there's the matter of having to think about nondeterminism at every
> step of the way, like when writing cite = "??" mtext "??". I never thought
> about it that book titles might end with or contain a question mark. It
> took me nearly an hour to get that working, but a regex would have just done
> it without me wasting brain cycles.
>
> I've also had to write duplicate patterns that don't have embedded actions
> so that I can look ahead (for extended blocks and such). So I wind up with
> duplicate patterns A and A_noactions, C and C_noactions...
>
> I'm new to all this stuff, but it seems Ragel produces a DFA, not an NFA, so
> what I describe above isn't possible. Is there a way to accomplish it with
> Ragel? It's tempting to just switch to Oniguruma. I'll bet it wouldn't be
> too much slower if we interfaced with it directly in C and did all the
> string manipulations in C. Might have distribution problems.
>
> What do you think?
>
> Jason

From jg at jasongarber.com Wed Jun 10 16:10:24 2009
From: jg at jasongarber.com (Jason Garber)
Date: Wed, 10 Jun 2009 16:10:24 -0400
Subject: Problems 4.2 installing on Windows?
Message-ID: <49C277E3-B2C8-4FD1-827C-F210F4DA91F9@jasongarber.com>

I keep an old Windows machine around for making sure that RedCloth
Win32 binary gems install. When I try gem update RedCloth, it just
installs 4.1.9. Did I do something wrong in the gemspec? Would someone
else with a windows machine try it and let me know how it goes?

Jason

From hgs at dmu.ac.uk Wed Jun 10 16:19:40 2009
From: hgs at dmu.ac.uk (Hugh Sasse)
Date: Wed, 10 Jun 2009 21:19:40 +0100 (BST)
Subject: Problems 4.2 installing on Windows?
In-Reply-To: <49C277E3-B2C8-4FD1-827C-F210F4DA91F9@jasongarber.com>
References: <49C277E3-B2C8-4FD1-827C-F210F4DA91F9@jasongarber.com>
Message-ID:

On Wed, 10 Jun 2009, Jason Garber wrote:

> I keep an old Windows machine around for making sure that RedCloth Win32
> binary gems install. When I try gem update RedCloth, it just installs 4.1.9.
> Did I do something wrong in the gemspec? Would someone else with a windows
> machine try it and let me know how it goes?

Has the gem propagated to the mirrors?

>
> Jason

From jg at jasongarber.com Thu Jun 11 10:49:07 2009
From: jg at jasongarber.com (Jason Garber)
Date: Thu, 11 Jun 2009 10:49:07 -0400
Subject: Benchmarking the pure-ruby parser
Message-ID:

Benchmarking the pure-ruby version of the parser, which is a new option
in 4.2...

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 4.2.0 compiled in Ruby...
Finished in 30.703538 seconds

30 seconds is a long time compared to...

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 3.0.4 compiled in ruby-regex...
Finished in 0.346966 seconds

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 4.0.0 compiled in C...
Finished in 0.055147 seconds

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 4.1.1 compiled in C...
Finished in 0.05756 seconds

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 4.1.9 compiled in C...
Finished in 0.140867 seconds

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 4.2.0 compiled in C...
Finished in 0.13385 seconds

Uh, yeah. I knew it was slow, but until now I had no idea it was _that_
slow compared to 3.0.4. The tests in 3.0.4 took a long time to run, so
I just figured the two all-ruby versions would be about the same. Now
that I think about it, it does make sense that even lots of Regexes
would be faster than doing all the looping, character comparison, and
concatenation in Ruby.

Also note that RedCloth has been getting slower with each version,
except this last one where I tried hard to reduce complexity in some
areas. Fixing bugs usually means making the machine more specific,
which means more complexity. The parser binary has more than doubled in
size since 4.0.0.

Let's see about Ruby 1.8 vs 1.9...

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 3.0.4 compiled in ruby-regex...
Finished in 0.320067 seconds

~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 3.0.4 compiled in ruby-regex...
Finished in 0.566467 seconds

So 3.0.4 is _slower_ in Ruby 1.9. Interesting. Is 4.2.0?

~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 4.2.0 compiled in C...
Finished in 0.148924 seconds

~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O spec/spec.opts
Benchmarking version 4.2.0 compiled in C...
Finished in 0.107696 seconds

Thankfully, no.
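
For a rough feel of the "regexes versus looping and concatenation in
Ruby" point, here is a small sketch using Ruby's standard Benchmark
module. It is not the RedCloth benchmark spec: the two escape functions
and the sample input are made up for illustration, and the timings will
vary with the Ruby version and machine.

```ruby
require 'benchmark'

# Illustrative only: two ways to escape the same characters.
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Character-by-character loop with string concatenation, done in Ruby.
def escape_by_loop(text)
  out = ''.dup
  text.each_char do |ch|
    out << (ESCAPES[ch] || ch)
  end
  out
end

# One regex pass; scanning and substitution happen inside the regex engine.
def escape_by_regex(text)
  text.gsub(/[&<>]/, ESCAPES)
end

# Made-up sample input, repeated to get a measurable run time.
sample = "h1. Title & <b>bold</b> text\n\n" * 2_000

loop_time  = Benchmark.realtime { escape_by_loop(sample) }
regex_time = Benchmark.realtime { escape_by_regex(sample) }

puts format('loop:  %.4f s', loop_time)
puts format('regex: %.4f s', regex_time)
```

Both functions return the same string, so any difference in the two
printed times comes only from where the per-character work is done.
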

My conclusion is, I need to do something different with RedCloth in the
long term if I want it available as a pure-Ruby library and if I don't
want the size and parse time to keep ballooning.

Jason

From gaspard at teti.ch Thu Jun 11 12:44:04 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Thu, 11 Jun 2009 18:44:04 +0200
Subject: Benchmarking the pure-ruby parser
In-Reply-To:
References:
Message-ID: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com>

Thanks for the very interesting comparisons Jason. Pure ruby could
definitely be an option.

As I said, I really need a good, ruby extendable textile parser for
zena in the long run so I'd be glad to help. I have implemented
several pure ruby (regex based) parsers (parser for
http://zenadmin.org/zafu is one) and other ragel based (json/scripts
for http://rubyk.org for example).

Regarding 3.0.4: I see most regexp are not left anchored (starting
with /\A.../). I think we could get a nice speedup (and a more robust
parser) if we used some kind of state machine instead of raw split and
gsub. The zafu parser for example uses contextual regexps and eats
input data as it matches. This means that input data is only parsed a
minimal amount of times.

Gaspard

On Thu, Jun 11, 2009 at 4:49 PM, Jason Garber wrote:
> Benchmarking the pure-ruby version of the parser, which is a new option in
> 4.2...
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 4.2.0 compiled in Ruby...
> Finished in 30.703538 seconds
>
> 30 seconds is a long time compared to...
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 3.0.4 compiled in ruby-regex...
> Finished in 0.346966 seconds
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 4.0.0 compiled in C...
> Finished in 0.055147 seconds
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 4.1.1 compiled in C...
> Finished in 0.05756 seconds
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 4.1.9 compiled in C...
> Finished in 0.140867 seconds
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 4.2.0 compiled in C...
> Finished in 0.13385 seconds
>
> Uh, yeah. I knew it was slow, but until now I had no idea it was _that_
> slow compared to 3.0.4. The tests in 3.0.4 took a long time to run, so I
> just figured the two all-ruby versions would be about the same. Now that I
> think about it, it does make sense that even lots of Regexes would be faster
> than doing all the looping, character comparison, and concatenation in Ruby.
>
> Also note that RedCloth has been getting slower with each version, except
> this last one where I tried hard to reduce complexity in some areas. Fixing
> bugs usually means making the machine more specific, which means more
> complexity. The parser binary has more than doubled in size since 4.0.0.
>
> Let's see about Ruby 1.8 vs 1.9...
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 3.0.4 compiled in ruby-regex...
> Finished in 0.320067 seconds
>
> ~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 3.0.4 compiled in ruby-regex...
> Finished in 0.566467 seconds
>
> So 3.0.4 is _slower_ in Ruby 1.9. Interesting. Is 4.2.0?
>
> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 4.2.0 compiled in C...
> Finished in 0.148924 seconds
>
> ~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O spec/spec.opts
> Benchmarking version 4.2.0 compiled in C...
> Finished in 0.107696 seconds
>
> Thankfully, no.
>
> My conclusion is, I need to do something different with RedCloth in the long
> term if I want it available as a pure-Ruby library and if I don't want the
> size and parse time to keep ballooning.
>
> Jason

From jg at jasongarber.com Thu Jun 11 13:57:26 2009
From: jg at jasongarber.com (Jason Garber)
Date: Thu, 11 Jun 2009 13:57:26 -0400
Subject: Benchmarking the pure-ruby parser
In-Reply-To:
References: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com>
Message-ID:

On Jun 11, 2009, at 1:24 PM, Hugh Sasse wrote:
> Is the ruby code to do this in the RedCloth-4.2.0 distro? I can't see
> any parsers in there (grep -lir parser .) except the C code.

You have to get the source code yourself and build it with rake
pureruby compile. I didn't include it in any package/distro because I
want to discourage its use (it is 100x slower, after all!).

From jg at jasongarber.com Thu Jun 11 14:00:02 2009
From: jg at jasongarber.com (Jason Garber)
Date: Thu, 11 Jun 2009 14:00:02 -0400
Subject: Benchmarking the pure-ruby parser
In-Reply-To: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com>
References: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com>
Message-ID: <5A603C74-B363-41F9-94ED-2A55FD82A9EE@jasongarber.com>

Gaspard, thanks for offering to help! I'll surely take you up on it;
seems like you have some very helpful experience. Want to do a little
more research on it myself first, though. Then maybe you and I can get
together and work out some prototypes.

Jason

On Jun 11, 2009, at 12:44 PM, Gaspard Bucher wrote:

> Thanks for the very interesting comparisons Jason. Pure ruby could
> definitely be an option.
>
> As I said, I really need a good, ruby extendable textile parser for
> zena in the long run so I'd be glad to help.
> I have implemented several pure ruby (regex based) parsers (parser
> for http://zenadmin.org/zafu is one) and other ragel based
> (json/scripts for http://rubyk.org for example).
>
> Regarding 3.0.4: I see most regexp are not left anchored (starting
> with /\A.../). I think we could get a nice speedup (and a more robust
> parser) if we used some kind of state machine instead of raw split and
> gsub. The zafu parser for example uses contextual regexps and eats
> input data as it matches. This means that input data is only parsed a
> minimal amount of times.
>
> Gaspard
>
> On Thu, Jun 11, 2009 at 4:49 PM, Jason Garber wrote:
>> Benchmarking the pure-ruby version of the parser, which is a new
>> option in 4.2...
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 4.2.0 compiled in Ruby...
>> Finished in 30.703538 seconds
>>
>> 30 seconds is a long time compared to...
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 3.0.4 compiled in ruby-regex...
>> Finished in 0.346966 seconds
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 4.0.0 compiled in C...
>> Finished in 0.055147 seconds
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 4.1.1 compiled in C...
>> Finished in 0.05756 seconds
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 4.1.9 compiled in C...
>> Finished in 0.140867 seconds
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 4.2.0 compiled in C...
>> Finished in 0.13385 seconds
>>
>> Uh, yeah. I knew it was slow, but until now I had no idea it was
>> _that_ slow compared to 3.0.4.
>> The tests in 3.0.4 took a long time to run, so I just figured the
>> two all-ruby versions would be about the same. Now that I think
>> about it, it does make sense that even lots of Regexes would be
>> faster than doing all the looping, character comparison, and
>> concatenation in Ruby.
>>
>> Also note that RedCloth has been getting slower with each version,
>> except this last one where I tried hard to reduce complexity in some
>> areas. Fixing bugs usually means making the machine more specific,
>> which means more complexity. The parser binary has more than doubled
>> in size since 4.0.0.
>>
>> Let's see about Ruby 1.8 vs 1.9...
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 3.0.4 compiled in ruby-regex...
>> Finished in 0.320067 seconds
>>
>> ~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 3.0.4 compiled in ruby-regex...
>> Finished in 0.566467 seconds
>>
>> So 3.0.4 is _slower_ in Ruby 1.9. Interesting. Is 4.2.0?
>>
>> ~/Documents/redcloth(master) $ spec spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 4.2.0 compiled in C...
>> Finished in 0.148924 seconds
>>
>> ~/Documents/redcloth(master) $ spec19 spec/benchmark_spec.rb -O spec/spec.opts
>> Benchmarking version 4.2.0 compiled in C...
>> Finished in 0.107696 seconds
>>
>> Thankfully, no.
>>
>> My conclusion is, I need to do something different with RedCloth in
>> the long term if I want it available as a pure-Ruby library and if I
>> don't want the size and parse time to keep ballooning.
>>
>> Jason

From gaspard at teti.ch Thu Jun 11 14:25:23 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Thu, 11 Jun 2009 20:25:23 +0200
Subject: Benchmarking the pure-ruby parser
In-Reply-To: <5A603C74-B363-41F9-94ED-2A55FD82A9EE@jasongarber.com>
References: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com> <5A603C74-B363-41F9-94ED-2A55FD82A9EE@jasongarber.com>
Message-ID: <7a9f744a0906111125t35afaf85x6e28f912c7ff04ab@mail.gmail.com>

For benchmarking, I think it could make sense to have some
representative data of a typical application. This might mean large
amounts of pure text without any textile tags (except paragraphs), or
it might be a mixture of very short texts with a few very long ones.

I could give you the data from http://zenadmin.org, but the links and
images are not very typical:

"":45 (internal link to node 45)

> Want to do a little more research on it myself first, though.

Of course ! Please continue to give us feedback. This kind of
experimenting can provide very useful information on the different
parsing solutions.

G.

From hgs at dmu.ac.uk Thu Jun 11 14:26:38 2009
From: hgs at dmu.ac.uk (Hugh Sasse)
Date: Thu, 11 Jun 2009 19:26:38 +0100 (BST)
Subject: Benchmarking the pure-ruby parser
In-Reply-To:
References: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com>
Message-ID:

On Thu, 11 Jun 2009, Jason Garber wrote:

> On Jun 11, 2009, at 1:24 PM, Hugh Sasse wrote:
> > Is the ruby code to do this in the RedCloth-4.2.0 distro? I can't see
> > any parsers in there (grep -lir parser .) except the C code.
> > You have to get the source code yourself and build it with rake pureruby > compile. I didn't include it in any package/distro because I want to > discourage its use (it is 100x slower, after all!). Ok, just wondered if I could see any speedups in a few mins. I'll leave this for now. I will have a bit more time in a few weeks I think. Hugh > > _______________________________________________ > Redcloth-upwards mailing list > Redcloth-upwards at rubyforge.org > http://rubyforge.org/mailman/listinfo/redcloth-upwards From jg at jasongarber.com Thu Jun 11 14:56:00 2009 From: jg at jasongarber.com (Jason Garber) Date: Thu, 11 Jun 2009 14:56:00 -0400 Subject: Problems 4.2 installing on Windows? In-Reply-To: References: <49C277E3-B2C8-4FD1-827C-F210F4DA91F9@jasongarber.com> Message-ID: <9082E171-0BF1-4148-A0A3-58060E0B50E3@jasongarber.com> I think so. Tried again a day later and still got the same thing. C:\>gem install RedCloth -v 4.2 Successfully installed RedCloth-4.1.9-x86-mswin32-60 1 gem installed Installing ri documentation for RedCloth-4.1.9-x86-mswin32-60... Installing RDoc documentation for RedCloth-4.1.9-x86-mswin32-60... C:\>irb irb(main):001:0> require 'rubygems' => false irb(main):002:0> require 'redcloth' => true irb(main):003:0> RedCloth::VERSION => 4.1.9 On Jun 10, 2009, at 4:19 PM, Hugh Sasse wrote: > On Wed, 10 Jun 2009, Jason Garber wrote: > >> I keep an old Windows machine around for making sure that RedCloth >> Win32 >> binary gems install. When I try gem update RedCloth, it just >> installs 4.1.9. >> Did I do something wrong in the gemspec? Would someone else with a >> windows >> machine try it and let me know how it goes? > > Has the gem propagated to the mirrors? 
>>
>> Jason

From hgs at dmu.ac.uk Thu Jun 11 15:30:08 2009
From: hgs at dmu.ac.uk (Hugh Sasse)
Date: Thu, 11 Jun 2009 20:30:08 +0100 (BST)
Subject: Problems 4.2 installing on Windows?
In-Reply-To: <9082E171-0BF1-4148-A0A3-58060E0B50E3@jasongarber.com>
References: <49C277E3-B2C8-4FD1-827C-F210F4DA91F9@jasongarber.com>
 <9082E171-0BF1-4148-A0A3-58060E0B50E3@jasongarber.com>
Message-ID:

On Thu, 11 Jun 2009, Jason Garber wrote:

> I think so. Tried again a day later and still got the same thing.
>
> C:\>gem install RedCloth -v 4.2
> Successfully installed RedCloth-4.1.9-x86-mswin32-60
> 1 gem installed
> Installing ri documentation for RedCloth-4.1.9-x86-mswin32-60...
> Installing RDoc documentation for RedCloth-4.1.9-x86-mswin32-60...
>
> C:\>irb
> irb(main):001:0> require 'rubygems'
> => false
> irb(main):002:0> require 'redcloth'
> => true
> irb(main):003:0> RedCloth::VERSION
> => 4.1.9
>

Hmmmm. There's been too much happening on the Rubygems list of late
for me to keep up with changes, but that explicit version request and
those results look like a bug in rubygems to my uninformed eye. I suppose
updating your rubygems doesn't help?

I'm seeing this:

hgs at Q2P14HGS /usr/lib/ruby/gems/1.8/gems/RedCloth-4.2.0
20:21:28$ gem query --remote -d -a -n RedCloth

*** REMOTE GEMS ***

RedCloth (4.2.0, 4.1.9, 4.1.1, 4.1.0, 4.0.4, 4.0.3, 4.0.2, 4.0.1, 4.0.0,
3.0.4, 3.0.3, 3.0.2, 3.0.1, 3.0.0, 2.0.11, 2.0.10, 2.0.9, 2.0.8, 2.0.7,
2.0.6, 2.0.5, 2.0.4, 2.0.3, 2.0.2)
    Platforms:
        2.0.2: ruby
        2.0.3: ruby
        2.0.4: ruby
        2.0.5: ruby
        2.0.6: ruby
        2.0.7: ruby
        2.0.8: ruby
        2.0.9: ruby
        2.0.10: ruby
        2.0.11: ruby
        3.0.0: ruby
        3.0.1: ruby
        3.0.2: ruby
        3.0.3: ruby
        3.0.4: ruby
        4.0.0: ruby, x86-mswin32-60
        4.0.1: ruby, x86-mswin32-60
        4.0.2: ruby, x86-mswin32-60
        4.0.3: ruby, x86-mswin32-60
        4.0.4: ruby, x86-mswin32-60
        4.1.0: ruby, universal-java, x86-mswin32-60
        4.1.1: ruby, universal-java, x86-mswin32-60
        4.1.9: ruby, universal-java, x86-mswin32-60
        4.2.0: ruby, universal-java, x86-mswin32-60
    Author: Jason Garber
    Rubyforge: http://rubyforge.org/projects/redcloth
    Homepage: http://redcloth.org

    RedCloth-4.2.0 - Textile parser for Ruby. http://redcloth.org/

redclothcoderay (0.3.0, 0.2.0, 0.1.2, 0.1.1, 0.1.0)
    Author: August Lilleaas
    Rubyforge: http://rubyforge.org/projects/redclothcoderay
    Homepage: http://redclothcoderay.rubyforge.org

    Integrates CodeRay with RedCloth by adding a <source> tag.

hgs at Q2P14HGS /usr/lib/ruby/gems/1.8/gems/RedCloth-4.2.0
20:22:57$

What does (I'm not *sure* this is right, but from gem install --help):

C:\>gem install RedCloth -v 4.2 --platform=x86-mswin32-60

do for you?

        Hugh

> On Jun 10, 2009, at 4:19 PM, Hugh Sasse wrote:
>
> > On Wed, 10 Jun 2009, Jason Garber wrote:
> >
> > > I keep an old Windows machine around for making sure that RedCloth Win32
> > > binary gems install. When I try gem update RedCloth, it just installs
> > > 4.1.9.
> > > Did I do something wrong in the gemspec? Would someone else with a
> > > windows machine try it and let me know how it goes?
> >
> > Has the gem propagated to the mirrors?
> > >
> > > Jason

From justincollins at ucla.edu Thu Jun 11 18:07:08 2009
From: justincollins at ucla.edu (Justin Collins)
Date: Thu, 11 Jun 2009 15:07:08 -0700
Subject: Benchmarking the pure-ruby parser
In-Reply-To:
References: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com>
Message-ID: <4A31800C.90304@ucla.edu>

Hugh Sasse wrote:
> Is the ruby code to do this in the RedCloth-4.2.0 distro? I can't see
> any parsers in there (grep -lir parser .) except the C code.
>
> I'm wondering, much of the work of a parser will be looking for
> fixed tokens, won't it? I know for numbers and arbitrary strings
> this isn't true... I ask because one of the few weaknesses in Ruby
> is that there is an over-emphasis on Regexps for processing strings
> in the language, and the only way I know to test whether some string
> (str) begins with a substring (sought), other than regexps, is
> str.index(sought).zero? . Have I missed something obvious? It seems
> to me that constructing something as complex as a regexp for such an
> operation would be expensive. Even str.scan which will accept a string
> rather than a regexp seems to treat such a string as a regexp, from a
> quick look at the c code.
>
> Hugh
>

Using index() to check the beginning of the string will be very slow,
since it will search the entire string for the substring, although you
are only interested in the beginning of the string.
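
A small sketch of the point about index(), using only core String methods. The helper name prefix_by_index? is made up for this illustration. Besides scanning the whole string, index() returns nil when the substring is absent, so calling .zero? on its result raises; comparing with 0 avoids that.

```ruby
# Hypothetical helper, named only for this illustration.
# index() returns the position of the first occurrence anywhere in the
# string, or nil when there is none, so compare the result with 0.
def prefix_by_index?(str, sought)
  str.index(sought) == 0
end

p "hello world".index("hello")   # match at the very start
p "say hello".index("hello")     # match later in the string
p "abc".index("hello")           # no match at all: nil

# The .zero? form fails on the nil case instead of answering false.
zero_check_raised =
  begin
    "abc".index("hello").zero?
    false
  rescue NoMethodError
    true
  end
p zero_check_raised

p prefix_by_index?("hello world", "hello")
p prefix_by_index?("say hello", "hello")
p prefix_by_index?("abc", "hello")
```

Ruby 1.8.7 and later also provide String#start_with?, which answers the prefix question directly without a scan of the whole string.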
I would suggest str[0,sought.length] == sought, which is considerably
faster, especially for short substrings. For longer ones, it seems (from
some quick benchmarking) that each approach is about equal.

-Justin

    require 'benchmark'

    str    = "adiaodibasd" * 1000
    sought = "hello"

    Benchmark.bmbm do |t|
      # index scans the whole string when the substring is absent
      t.report "Index" do
        100000.times do
          str.index(sought) == 0
        end
      end

      # \A anchors at the start of the string (^ would match at any line start)
      t.report "Regexp" do
        reg = /\A#{Regexp.escape(sought)}/
        100000.times do
          str =~ reg
        end
      end

      # compares only the first sought.length characters
      t.report "Slice" do
        100000.times do
          str[0, sought.length] == sought
        end
      end
    end

From gaspard at teti.ch Tue Jun 16 16:29:44 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Tue, 16 Jun 2009 22:29:44 +0200
Subject: Starting the next version of RedCloth
In-Reply-To: <6825BE58-5CFC-4286-9796-36D6798C72F4@jasongarber.com>
References: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com>
 <5A603C74-B363-41F9-94ED-2A55FD82A9EE@jasongarber.com>
 <7a9f744a0906111125t35afaf85x6e28f912c7ff04ab@mail.gmail.com>
 <6825BE58-5CFC-4286-9796-36D6798C72F4@jasongarber.com>
Message-ID: <7a9f744a0906161329w7846e2cbt2019c5cbfea96154@mail.gmail.com>

I followed some of the discussions on the treetop mailing list and it
sure looks great. I'll have a look at the subset you are parsing and
if that's not too much, I'll try an Oniguruma-based parser.

From what I understand, Treetop builds an abstract syntax tree with
rendering methods attached. This means that the parser knows about
everything there is to know about parsing *and* rendering (Interpreter
pattern).

I'll try to build something that could be totally agnostic of what we
actually do with the AST (or S-expression), be it pdf, html, latex
or other SM gears (Visitor pattern).

Gaspard

On Tue, Jun 16, 2009 at 10:02 PM, Jason Garber wrote:
> Gaspard (and others who are interested),
> I've been researching some tools in this domain to figure out what they're
> good at and what they're not.
> I like the advantages of parsing expression
> grammars (PEGs) over regular expressions so I started a test rewrite with
> Treetop to find out what the gotchas are (there are always gotchas). I've
> hit surprisingly few and mostly they were indicative that I was thinking
> about the problem wrongly. Overall it's going great, it's fast (so far) and
> very elegant. I can get down and spec individual parts of the grammar or
> even individual rules!
> I'm interested to see what you would come up with using contextual regexps
> like you mentioned for the zafu parser. I'd welcome you to build a little
> prototype like I'm building with Treetop
> (http://github.com/jgarber/redcloth-treetop). If yours implements the same
> subset of Textile, then we can do some benchmarking, compare the grammars
> and so forth. Don't feel obligated; just know that I'm open-minded about it.
> I think I'd really like to stay away from C/Java unless someone demonstrates
> that it's the only way to parse with any speed. My experience so far with
> Treetop indicates a Ruby parser can be fast enough (and if speed were really
> more important to us than some other factors, we wouldn't have picked Ruby
> in the first place!).
> Either way, I also agree that it needs to be durable over the long-term.
> I'm making websites in Textile right and left and I can't be updating them
> every time the parser changes. I think lots of unit tests are the only sure
> way to prevent regression (like happened in 4.2.0) and I've been very
> pleased with how I can spec the behavior of individual rules and grammars in
> my treetop prototype.
> So, if you want to start a prototype regexp parser, that's cool, or if you
> like what you see on my prototype project, you're welcome to join that as
> well.
> Regards,
> Jason
> On Jun 11, 2009, at 2:25 PM, Gaspard Bucher wrote:
>
> For benchmarking, I think it could make sense to have some
> representative data of a typical application.
> This might mean large
> amounts of pure text without any textile tags (except paragraphs), or
> it might be a mixture of very short texts with a few very long ones. I
> could give you the data from http://zenadmin.org, but the links and
> images are not very typical:
>
> "":45 (internal link to node 45)
>
> Want to do a little more research on it myself first, though.
>
> Of course ! Please continue to give us feedback. This kind of
> experimenting can provide very useful information on the different
> parsing solutions.
>
> G.

From jg at jasongarber.com Tue Jun 16 16:35:24 2009
From: jg at jasongarber.com (Jason Garber)
Date: Tue, 16 Jun 2009 16:35:24 -0400
Subject: Starting the next version of RedCloth
In-Reply-To: <7a9f744a0906161329w7846e2cbt2019c5cbfea96154@mail.gmail.com>
References: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com>
 <5A603C74-B363-41F9-94ED-2A55FD82A9EE@jasongarber.com>
 <7a9f744a0906111125t35afaf85x6e28f912c7ff04ab@mail.gmail.com>
 <6825BE58-5CFC-4286-9796-36D6798C72F4@jasongarber.com>
 <7a9f744a0906161329w7846e2cbt2019c5cbfea96154@mail.gmail.com>
Message-ID: <7B05506C-4B63-4034-BB76-4404B93AB1BA@jasongarber.com>

Yes, you can use Treetop for the interpreter pattern out of the box,
but I'm following the visitor pattern like Cucumber did. Just
started working on the visitor, actually. And my posts to the treetop
list are a little old (I've moved past the problems I described there),
so don't read them as the current state of things.

Hope I'm using all the terminology right. I'm new to all this (no
CompSci background).
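
To illustrate the interpreter-versus-visitor distinction discussed here, below is a minimal sketch in which rendering lives outside the tree. The tree shape (nested arrays of the form [tag, child, ...]) and the class names HtmlRenderer and PlainTextRenderer are assumptions made for this example. They are not Treetop's node classes or RedCloth's API, and the walker dispatches on the tag rather than using the classic accept/visit double dispatch.

```ruby
# Hypothetical tree shape: [tag, child, child, ...] where each child is
# either a String or another such array. Assumed for illustration only.
TREE = [:main, "hello ", [:em, "em and ", [:strong, "strong"]]]

# Renders the tree to HTML. The tree itself knows nothing about HTML.
class HtmlRenderer
  TAGS    = { :em => "em", :strong => "strong" }
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }

  def visit(node)
    return escape(node) if node.is_a?(String)
    tag, *children = node
    inner = children.map { |child| visit(child) }.join
    name = TAGS[tag]
    # Tags without an HTML equivalent (such as :main) render their children only.
    name ? "<#{name}>#{inner}</#{name}>" : inner
  end

  private

  def escape(text)
    text.gsub(/[&<>]/) { |char| ESCAPES[char] }
  end
end

# A second renderer over the same tree, showing that a new output format
# needs no change to the parser or to the tree.
class PlainTextRenderer
  def visit(node)
    return node if node.is_a?(String)
    node.drop(1).map { |child| visit(child) }.join
  end
end

puts HtmlRenderer.new.visit(TREE)       # hello <em>em and <strong>strong</strong></em>
puts PlainTextRenderer.new.visit(TREE)  # hello em and strong
```

With the interpreter style, each node class would carry its own to_html method instead, so adding a PDF or LaTeX backend would mean touching every node class.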

From gaspard at teti.ch Wed Jun 17 03:59:43 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Wed, 17 Jun 2009 09:59:43 +0200
Subject: Starting the next version of RedCloth
In-Reply-To:
References: <7a9f744a0906110944p49f814bei1966a3b5ae467e74@mail.gmail.com>
 <5A603C74-B363-41F9-94ED-2A55FD82A9EE@jasongarber.com>
 <7a9f744a0906111125t35afaf85x6e28f912c7ff04ab@mail.gmail.com>
 <6825BE58-5CFC-4286-9796-36D6798C72F4@jasongarber.com>
 <7a9f744a0906161329w7846e2cbt2019c5cbfea96154@mail.gmail.com>
 <7B05506C-4B63-4034-BB76-4404B93AB1BA@jasongarber.com>
Message-ID: <7a9f744a0906170059j1fb04b08n6fb9a747c30851a6@mail.gmail.com>

Hi Hugh !

I went through a description of the top down parser using Python
(http://effbot.org/zone/simple-top-down-parsing.htm) and it's funny
how we keep reinventing the wheel: I just implemented something
similar for querybuilder, using a ruby array as stack and popping the
stack depending on operator precedence (http://tinyurl.com/mgj6et).

Using operator precedence could be an interesting way to resolve
inherent textile ambiguities but it does not remove the curse of
tokenizing. I'll keep this operator precedence idea in mind when
writing a regexp based prototype.

G.

On Wed, Jun 17, 2009 at 12:45 AM, Hugh Sasse wrote:
> It sounds like you are not looking for alternative techniques any
> more.
> That's good, but I've found Top Down Operator Precedence
> parsers, as explained in Chapter 9 of "Beautiful Code" [Oram and
> Wilson, O'Reilly] particularly easy to understand, and relatively OK
> to debug as well. The chapter is an implementation and explanation
> of a paper by Vaughan Pratt in 1973, but it's an ACM paper, so access
> to it may be a problem.
>
>        HTH
>        Hugh

From gaspard at teti.ch Wed Jun 17 09:50:38 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Wed, 17 Jun 2009 15:50:38 +0200
Subject: strange tests
Message-ID: <7a9f744a0906170650p4fa801f6r64fdc0ab4d558a6f@mail.gmail.com>

Hi Jason !

I am slowly building a parser and found the following specs:

"should not include trailing double-asterisk in a word if the next
char is a space":
  src: "yow** "
  htm: "yow"

"should not include trailing double-asterisk in a word if the next
char is EOF":
  src: "yow**"
  htm: "yow"

"should include trailing asterisk in a word if the next char is a
double-asterisk":
  src: "yow***"
  htm: "yow*"

"should include trailing double-asterisk in a word if the next char is
a double-asterisk":
  src: "yow****"
  htm: "yow**"

I just don't understand why we should eat the trailing "**". This
sounds like over-engineering... Please tell me the reason for this
(conditional) stripping.

Gaspard

From gaspard at teti.ch Wed Jun 17 16:51:52 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Wed, 17 Jun 2009 22:51:52 +0200
Subject: textile...
Message-ID: <7a9f744a0906171351h493fd934y8d1f363b4bfbc278@mail.gmail.com>

Hi list !

From the tests provided by Jason in his treetop experiment I started
to get the feeling that it can be very hard to write a token based
parser for textile since sometimes you need to know all the characters
of the current line to know if "**" actually starts a bold sentence.

For example:

This **is not a bold sentence, even if you would think it is, it's not
**. Believe me.

I had a look at the original implementation of textile in PHP and they
actually run tons of very complicated regexps. This is an extract to
parse inline elements (bold, em, ...): http://gist.github.com/131486.

All this to say that I think I will abort my "move forward" parser
solution and will try another route: the "split" parser:

1. split text into paragraphs/tables
2. split paragraphs into inline elements (loop until no more split)
3. split inline elements into links, etc
4. continue splitting and replacing

This is the fastest way I can imagine to parse elegantly something
like textile.

I'll let you know when I have a prototype...
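
The four steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the prototype referred to: the rule table INLINE_RULES, the method names split_inline and split_document, and the nested-array output shape are all hypothetical. It covers only single-marker *strong* and _em_ spans (steps 1 and 2), and treats a marker as opening or closing a span only when it touches a non-space character.

```ruby
# Hypothetical rule table: [tag, regexp]. Real Textile has many more cases
# (double markers, links, tables, ...), which this sketch ignores.
INLINE_RULES = [
  [:strong, /\*([^\s*](?:[^*]*?[^\s*])?)\*/],
  [:em,     /_([^\s_](?:[^_]*?[^\s_])?)_/]
]

# Step 2: split a piece of text around the earliest inline span, then
# recurse into the span's content and into the remaining text.
def split_inline(text)
  best = nil
  INLINE_RULES.each do |tag, regexp|
    match = regexp.match(text)
    next unless match
    best = [tag, match] if best.nil? || match.begin(0) < best[1].begin(0)
  end
  return (text.empty? ? [] : [text]) unless best

  tag, match = best
  parts = []
  parts << match.pre_match unless match.pre_match.empty?
  parts << [tag, *split_inline(match[1])]
  parts.concat(split_inline(match.post_match))
  parts
end

# Step 1: split on blank lines into paragraphs, then split each paragraph.
def split_document(text)
  paragraphs = text.split(/\n{2,}/)
  [:main, *paragraphs.map { |para| [:p, *split_inline(para.strip)] }]
end

p split_document("hello _em and *strong*_\n\nsecond *para*")
p split_inline("This *is not strong *. Believe me.")
```

Because every rule is tried against the whole remaining text, a marker with no valid partner simply stays in the output as plain text, which is the line-level lookahead that a token-by-token parser finds hard.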

Gaspard

PS: forget about my other message on word processing, I actually did
not understand that these specs were only related to some internal
word parser.

From gaspard at teti.ch Thu Jun 18 08:01:37 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Thu, 18 Jun 2009 14:01:37 +0200
Subject: divide and conquer
Message-ID: <7a9f744a0906180501r78597bafn403ec833a4d22fa1@mail.gmail.com>

Hi list (and Jason) !

I have a prototype parser that uses a Regexp based "Divide and
Conquer" pattern to parse textile:

http://github.com/gaspard/redcloth-regexp/tree/master

This parser currently only parses simple 'list', 'strong', 'em' and
'bold' but it is very easy to extend and adapt.

To give you an idea of how this thing works:

1. take a string
2. try to match first regular expression from context (if you are in
   :main and :main => [:p, :bold], the first regexp is defined by :p => ..)
3. if the pattern matches, insert a placeholder and scan matched text
   in the new context (:p).
4. when you cannot match (no more regexps in context list), unfold by
   expanding text to an S-expression tree

Example: "hello _em and *strong*_"

match regular expression associated with :em
  => "hello @@=9347=@@"
scan matched content in :em context
  => "em and *strong*"
matches :strong
  => "em and @@=9350=@@"
no match in "strong"
expand in :strong context
  ==> [:strong, "strong"]
expand in :em context
  ==> [:em, "em and ", [:strong, "strong"]]
expand in :main
  ==> [:main, "hello ", [:em, "em and ", [:strong, "strong"]]]

Let me know what you think.

Gaspard

From jg at jasongarber.com Fri Jun 19 08:59:10 2009
From: jg at jasongarber.com (Jason Garber)
Date: Fri, 19 Jun 2009 08:59:10 -0400
Subject: Away
Message-ID:

Hey everyone. Just a quick note to say I'm reading your messages with
interest but I can't respond right now. Out of town and busy...
Brother's wedding tomorrow and an aunt's two days after. Four family
reunions in between. Look forward to getting back and catching up.

Best,
Jason

Sent from my iPhone.

From murphy at rubychan.de Sat Jun 20 11:28:30 2009 From: murphy at rubychan.de (Kornelius Kalnbach) Date: Sat, 20 Jun 2009 17:28:30 +0200 Subject: Problem with @[lang]...@ in RedCloth 4.2.0 In-Reply-To: <90498079-6FFB-4688-BBD6-99C4659E24C3@jasongarber.com> References: <90498079-6FFB-4688-BBD6-99C4659E24C3@jasongarber.com> Message-ID: <4A3D001E.3080006@rubychan.de> hello! the latest update of RedCloth broke a feature that I was using: $ echo '@[ruby]puts "Hello, World!"@' | redcloth _4.1.9_

puts "Hello, World!"

vs. $ echo '@[ruby]puts "Hello, World!"@' | redcloth _4.2.0_

[ruby]puts "Hello, World!"

the coderay/for_redcloth extension used the old syntax to enable syntax
highlighting for @ and bc. code blocks.

feature or bug?

[murphy]

From jonathan at parkerhill.com Sun Jun 21 02:08:40 2009
From: jonathan at parkerhill.com (Jonathan Linowes)
Date: Sun, 21 Jun 2009 02:08:40 -0400
Subject: RedCloth 4.2.0 released
In-Reply-To: <7DBE11CC-0233-4219-A276-810D79F9DCAC@parkerhill.com>
References: <90498079-6FFB-4688-BBD6-99C4659E24C3@jasongarber.com>
 <7DBE11CC-0233-4219-A276-810D79F9DCAC@parkerhill.com>
Message-ID:

also, I've started getting this warning, can't say for certain it's
from my upgrade from 4.1.9 to 4.2.1 but i'm pretty sure it is. Any
ideas why?

uninitialized constant Gem::Specification::PLATFORM_CROSS_TARGETS

From manuel at ethicalsoftware.it Thu Jun 25 10:18:01 2009
From: manuel at ethicalsoftware.it (stuefer manuel)
Date: Thu, 25 Jun 2009 16:18:01 +0200
Subject: access to helper method
Message-ID: <1245939481.26789.2.camel@butterfly>

Hi, has any of you ever tried to put some helper methods in the
database along with other text, like this:

        h1. title
        <%= helper(id) %>
        description

and get an output using RedCloth?! What I need is for the helper
function to be evaluated, not just displayed as html

mS

From gaspard at teti.ch Thu Jun 25 10:32:46 2009
From: gaspard at teti.ch (Gaspard Bucher)
Date: Thu, 25 Jun 2009 16:32:46 +0200
Subject: access to helper method
In-Reply-To: <1245939481.26789.2.camel@butterfly>
References: <1245939481.26789.2.camel@butterfly>
Message-ID: <7a9f744a0906250732t183da6efi4128eec4afa0eb4d@mail.gmail.com>

I am investigating something similar for zena using RubyLess (direct
parsing of ruby from db is *dangerous* to say the least):
http://tinyurl.com/mhw5ag.

While investigating new ways to parse, extracting things such as
"<%= ... %>" or "{{... }}" is definitely an option. For the moment, I
just double parse the text and only extract "[label] ... [/label]" in
a three-step process:

1. remove [label]...[/label] from text and replace with a placeholder
   "===38475==="
2. parse with redcloth
3. replace placeholders by actual content

Gaspard

On Thu, Jun 25, 2009 at 4:18 PM, stuefer manuel wrote:
> Hi, has any of you ever tried to put some helper methods in the
> database along with other text, like this:
>
>        h1. title
>        <%= helper(id) %>
>        description
>
> and get an output using RedCloth?! What I need is for the helper
> function to be evaluated, not just displayed as html
>
> mS
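
The placeholder steps described above might look something like the sketch below. The method name render_with_placeholders and the numbering scheme for placeholders are invented for this example, and the formatter is passed in as a block so the sketch runs without the RedCloth gem. It also assumes the formatter leaves the "===N===" placeholders untouched, which would need checking against Textile's own use of "==".

```ruby
# Matches [label]...[/label] and captures the protected content.
LABEL_RE = %r{\[label\](.*?)\[/label\]}m

# Hypothetical helper: protects labelled spans, runs the formatter given
# as a block, then restores the protected content.
def render_with_placeholders(text)
  stash = []

  # Step 1: swap each labelled span for a numbered placeholder.
  protected_text = text.gsub(LABEL_RE) do
    stash << Regexp.last_match(1)
    "===#{stash.size - 1}==="
  end

  # Step 2: run the formatter on the protected text.
  html = yield(protected_text)

  # Step 3: put the stashed content back in place of the placeholders.
  html.gsub(/===(\d+)===/) { stash[Regexp.last_match(1).to_i] }
end

# A stand-in formatter (upcase) makes the protection visible.
result = render_with_placeholders("a [label]x_y[/label] b [label]z[/label]") do |t|
  t.upcase
end
puts result
```

With the gem installed, the block would presumably be `{ |t| RedCloth.new(t).to_html }`. Evaluating stored "<%= ... %>" content is a separate and riskier step that this sketch deliberately leaves out.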