From mental at rydia.net Fri Nov 25 15:30:14 2005
From: mental at rydia.net (MenTaLguY)
Date: Fri, 25 Nov 2005 15:30:14 -0500
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu>
References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132807699.6830.266.camel@localhost.localdomain> <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu>
Message-ID: <1132950615.6830.363.camel@localhost.localdomain>

On Fri, 2005-11-25 at 10:53 -0800, Terence Parr wrote:
> Cool. Was wondering about that. Let's do in Java (only stable v3
> target at the moment) and then dump ASTs to disk and then compare.

Yeah, that would definitely be a lot easier.

Let's do it.

I guess we need:

1. an on-disk format to dump the ASTs to

2. a program that dumps using the existing YACC grammar

3. a java program that dumps using the ANTLR grammar

4. a program in (language-of-choice) which compares dumped ASTs

I can do #2, you will probably want to do #3, and we can probably defer
#4 until we've got things reasonably close and get tired of visual
inspection. If we're careful about whitespace, plain diff(1) might do.

Unfortunately I've found that MetaRuby's ParseTree isn't going to be
comprehensive enough. It just does individual methods, not arbitrary
fragments or whole scripts.

However, looking at Ruby's source, it apparently has a "ripper" build
configuration which appears to dump parse trees somehow. I'll
investigate that, as it may dictate the ultimate format we use for #1.

-mental
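Item #4 can start out as little more than diff(1) with the whitespace problem taken care of. A minimal Ruby sketch, assuming each dump is plain text with one tree or subtree per line; `normalize_dump` and `first_difference` are hypothetical names, not part of any existing tool:

```ruby
# Hypothetical comparator for item #4: compares two AST dumps after
# normalizing whitespace, so indentation, trailing blanks and empty
# lines do not count as differences.

# Collapses each line's whitespace runs to single spaces and drops
# lines that end up empty.
def normalize_dump(text)
  text.lines.map { |line| line.split.join(" ") }.reject(&:empty?)
end

# Returns nil when the dumps match; otherwise returns
# [index, expected_line, actual_line] for the first mismatch, where the
# index counts non-blank lines and a missing line is reported as nil.
def first_difference(expected_text, actual_text)
  expected = normalize_dump(expected_text)
  actual = normalize_dump(actual_text)
  [expected.size, actual.size].max.times do |i|
    return [i, expected[i], actual[i]] unless expected[i] == actual[i]
  end
  nil
end

# Usage: ruby compare_dumps.rb yacc_dump.txt antlr_dump.txt
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  diff = first_difference(File.read(ARGV[0]), File.read(ARGV[1]))
  if diff
    index, expected, actual = diff
    puts "first difference at non-blank line #{index + 1}:"
    puts "  #{ARGV[0]}: #{expected.inspect}"
    puts "  #{ARGV[1]}: #{actual.inspect}"
    exit 1
  end
  puts "dumps match"
end
```

Exiting non-zero on the first mismatch keeps it usable from a shell loop over a directory of test scripts.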
From parrt at cs.usfca.edu Fri Nov 25 16:10:39 2005
From: parrt at cs.usfca.edu (Terence Parr)
Date: Fri, 25 Nov 2005 13:10:39 -0800
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <1132950615.6830.363.camel@localhost.localdomain>
References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132807699.6830.266.camel@localhost.localdomain> <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain>
Message-ID: <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu>

On Nov 25, 2005, at 12:30 PM, MenTaLguY wrote:

> On Fri, 2005-11-25 at 10:53 -0800, Terence Parr wrote:
>> Cool. Was wondering about that. Let's do in Java (only stable v3
>> target at the moment) and then dump ASTs to disk and then compare.
>
> Yeah, that would definitely be a lot easier.
>
> Let's do it.
>
> I guess we need:
>
> 1. an on-disk format to dump the ASTs to
>
> 2. a program that dumps using the existing YACC grammar
>
> 3. a java program that dumps using the ANTLR grammar
>
> 4. a program in (language-of-choice) which compares dumped ASTs

Yup.

> I can do #2, you will probably want to do #3, and we can probably
> defer
> #4 until we've got things reasonably close and get tired of visual
> inspection. If we're careful about whitespace plain diff(1) might do.

Yep. I usually dump in lisp form

( root child1 child2 ... )

XML format works too.

> Unfortunately I've found that MetaRuby's ParseTree isn't going to be
> comprehensive enough. It just does individual methods, not arbitrary
> fragments or whole scripts.

Well, that's cool. We'll check whole programs at a time.
> However, looking at Ruby's source, it apparently has a "ripper" build
> configuration which appears to dump parse trees somehow. I'll
> investigate that, as it may dictate the ultimate format we use for #1.

Ah. Sure. Dumping in any format from an AST is easy.

Ter

From mental at rydia.net Fri Nov 25 17:56:19 2005
From: mental at rydia.net (MenTaLguY)
Date: Fri, 25 Nov 2005 17:56:19 -0500
Subject: [Rubygrammar-grammarians] [ANN] The Ruby Grammar Project
Message-ID: <1132959380.6830.461.camel@localhost.localdomain>

=== HELLO?

Hi there. With all this talk lately about "Ruby grammar this" and "Ruby
grammar that", I'd like to take this moment to announce the newly-formed
The Ruby Grammar Project.

=== WHAT?

The The Ruby Grammar Project aims to:

1. Develop an ANTLR grammar as an alternative to the YACC grammar which
   is ruining everybody's parties these days.

2. Draw up a formal specification of Ruby's grammar and keep it
   up-to-date.

Especially for people who are making Ruby implementations, this should
be two shades of awesome.

=== BUT?

It doesn't aim to:

1. Innovate the Ruby grammar.

2. Provide a forum for debating features of the Ruby grammar.

We're descriptivists. Don't like ->{} ? Don't look at us. If matz says
we get stabby blocks, we get stabby blocks.

=== WHO?

YOU. We are the Grammarians. Join us! You want to be on the winning
side, right?

Our shadow operatives have already successfully captured Doctor Terence
Parr, famed parser researcher and primary author of ANTLR. Soon, the
world shall tremble before our orbiting space station of DOOM!

Wait, wrong script.

If Ruby's grammar interests you, I'd encourage you to come along and
pitch in. Look, we even have our own mailing list:

http://rubyforge.org/mailman/listinfo/rubygrammar-grammarians

Right now one of the big things on the table is yanking 'ripper'
(http://rubyforge.org/projects/ripper/) and fashioning from it a test
harness for the ANTLR grammar.
Specifically, the important part is dumping the parse tree from ripper
into a file. YAML maybe. Pretty easy, I think? But we could use an
extra pair of hands to do it.

=== WHY?

Everybody's been talking about Ruby's grammar lately. Seems like there's
a lot of interest, but all the different isolated groups of people just
aren't talking to one another much. Trains and buses passing in the
night.

I want to set up a place for this Ruby grammar stuff. So, for example,
rather than wistfully remarking about how nice it would be to have a
better alternative to the rather inscrutable YACC grammar, we can
actually do something.

I firmly believe that every idea has an appointed time, whose time will
come. And things. But what if that time doesn't come?

That almost happened once. The invention of the aeroplane was delayed
and delayed until the cosmic committee finally noticed we were long
overdue. But the pressure was already built up and rather than the
aeroplane neatly emerging from the mind of some cranky but affable old
genius named Norberto Parks (who liked to walk around town on Tuesdays
and also the smell of gingerbread), it exploded all over the world:
Mozhaiski, Ader. Whitehead, Gilmore, Pearce. Jatho, Wright (Wrights?),
Vuia, Ellehammer and Santos-Dumont.

But what if it hadn't? Soon, the pressure would have built up to the
point where all the world's supply of genii would have been consumed by
a sudden and spontaneous explosion of aeroplanery. Chaos! Fully-formed
biplanes tumbling from the hedgerows!

It is clear from the rising buzz about reformulations of Ruby's grammar
that we are once again nearing such a crisis point. Unless someone
springs into action soon, I fear Rubyists all over the world will begin
collapsing at their keyboards, half-tokenised grammars spilling from
their ears!

This is for our own safety, people.

Love,
-mental
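The ripper-to-file step can be sketched against today's Ruby, where ripper has shipped in the standard library since 1.9 and `Ripper.sexp` returns the parse tree as nested arrays (or nil when the source does not parse). This is a minimal sketch under that assumption, not the 2005 ripper build the announcement refers to; `dump_parse_tree` and the command-line usage are hypothetical:

```ruby
require "ripper"
require "yaml"

# Hypothetical dumper: parses Ruby source with the stdlib Ripper and
# serializes the nested-array parse tree as YAML text.
def dump_parse_tree(source)
  tree = Ripper.sexp(source) # nil when the source has a syntax error
  raise ArgumentError, "source does not parse" if tree.nil?
  YAML.dump(tree)
end

# Usage: ruby dump_tree.rb some_script.rb > some_script.yml
if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
  print dump_parse_tree(File.read(ARGV[0]))
end
```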
From mental at rydia.net Fri Nov 25 18:11:17 2005
From: mental at rydia.net (MenTaLguY)
Date: Fri, 25 Nov 2005 18:11:17 -0500
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu>
References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132807699.6830.266.camel@localhost.localdomain> <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu>
Message-ID: <1132960279.6830.468.camel@localhost.localdomain>

On Fri, 2005-11-25 at 13:10 -0800, Terence Parr wrote:
> Yep. I usually dump in lisp form
>
> ( root child1 child2 ... )

Ah, nuts. Yes. I should have suggested s-expressions in the
announcement. Though I'll probably end up doing the ripper thing
personally anyway. I'm hoping I can spend more time familiarizing
myself with ANTLR though.

-mental
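Trees held as nested Ruby arrays (the shape ParseTree and the stdlib Ripper both produce) can be rendered in that lisp form with a short recursive function. A minimal sketch; `to_sexp` is a hypothetical helper, and it writes the compact `(root child1 child2)` variant without padding spaces inside the parentheses:

```ruby
# Hypothetical helper: renders a tree held as nested Ruby arrays in
# lisp form, e.g. [:a, [:b, :c]] becomes "(a (b c))".
def to_sexp(node)
  case node
  when Array  then "(" + node.map { |child| to_sexp(child) }.join(" ") + ")"
  when Symbol then node.to_s
  when String then node.inspect # keep quotes so "a b" stays one atom
  when nil    then "nil"
  else             node.to_s    # integers and other atoms
  end
end

if __FILE__ == $PROGRAM_NAME
  # A ParseTree-style tree for `foo bar baz`.
  tree = [:fcall, :foo, [:array, [:fcall, :bar, [:array, [:vcall, :baz]]]]]
  puts to_sexp(tree) # prints (fcall foo (array (fcall bar (array (vcall baz)))))
end
```

One caveat: Ripper's trees also carry `[line, column]` position arrays, which would come out as `(1 0)`, so a harness comparing them against another parser's output would probably need to strip those first.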
From parrt at cs.usfca.edu Fri Nov 25 18:19:59 2005
From: parrt at cs.usfca.edu (Terence Parr)
Date: Fri, 25 Nov 2005 15:19:59 -0800
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <1132960279.6830.468.camel@localhost.localdomain>
References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132807699.6830.266.camel@localhost.localdomain> <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain>
Message-ID: <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu>

On Nov 25, 2005, at 3:11 PM, MenTaLguY wrote:

> On Fri, 2005-11-25 at 13:10 -0800, Terence Parr wrote:
>> Yep. I usually dump in lisp form
>>
>> ( root child1 child2 ... )
>
> Ah, nuts. Yes. I should have suggested s-expressions in the
> announcement. Though I'll probably end up doing the ripper thing
> personally anyway. I'm hoping I can spend more time familiarizing
> myself with ANTLR though.

Kewl. If I knew ruby syntax, we'd be done with the grammar now. :(
I just finished my first script to build and test my v3 ANTLR
examples. Works great! Amazing...Java collections seem so bare in
comparison now. Back to closure blocks...oh smalltalk how I missed
you! ;))

If I had a canonical list of constructs including tricky examples, it
would really help.
Ter

From mental at rydia.net Fri Nov 25 19:44:57 2005
From: mental at rydia.net (MenTaLguY)
Date: Fri, 25 Nov 2005 19:44:57 -0500
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu>
References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132807699.6830.266.camel@localhost.localdomain> <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu>
Message-ID: <1132965899.6830.482.camel@localhost.localdomain>

On Fri, 2005-11-25 at 15:19 -0800, Terence Parr wrote:
> Kewl. If I knew ruby syntax, we'd be done with the grammar now. :
> ( I just finished my first script to build and test my v3 ANTLR
> examples. Works great! Amazing...Java collections seem so bare in
> comparison now. Back to closure blocks...oh smalltalk how I missed
> you! ;))

Ah, you're just scratching the surface. I'll warn you -- from here on
it can get addictive. :)

> If I had a canonical list of constructs including tricky examples, it
> would really help.

Hmm, I'm hoping some of the grammarian-newcomers can help out with that,
but to start off here are some bits with method calls:

First:

  foo bar baz

(where foo and bar are method names) parses the same as:

  foo( bar( baz ) )

Note that omitting the parentheses for more than one function in the
"stack" is deprecated, however.

Next:

  def foo
    zort = 3     # assign to variable
    blah = zort  # assign to variable from variable
    ...
  end

versus:

  def foo
    blah = zort  # assign to variable from method result
    zort = 3     # assign to variable
    ...
  end

I am not sure that 'blah = zort' is parsed differently in these cases,
however; that subtree may look the same in the AST and simply get
interpreted differently.
Next:

  foo 1, 2, 3              # ok
  foo(1, 2, 3)             # ok
  bar(1, 2, 3) { ... }     # ok
  bar 1, 2, 3 { ... }      # parse error
  bar 1, 2, baz { ... }    # parses as bar(1, 2, baz { ... })

-mental

From mental at rydia.net Fri Nov 25 20:13:13 2005
From: mental at rydia.net (MenTaLguY)
Date: Fri, 25 Nov 2005 20:13:13 -0500
Subject: [Rubygrammar-grammarians] Welcome newcomers!
Message-ID: <1132967593.6830.488.camel@localhost.localdomain>

I see a number of folks have joined since the announcement. I've got an
assignment for y'all, if you're interested: find all the weird places in
Ruby's grammar.

Corner cases. The places where a space or a carriage return between two
tokens or something else minor makes the difference between working and
not working.

Game? Just post back to the list with any you think of.

-mental
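One way to collect corner cases like the method-call examples above is to let Ruby's own parser judge each snippet. A minimal sketch, assuming a modern Ruby whose stdlib Ripper returns nil from `Ripper.sexp` on a syntax error; its verdicts may differ from the 1.8 parser discussed here, which is why mismatches are flagged rather than treated as failures. `parses?` and `CASES` are hypothetical names, and `{ }` stands in for `{ ... }`:

```ruby
require "ripper"

# Hypothetical corner-case checker: true if Ruby's own parser (via the
# stdlib Ripper) accepts the snippet; Ripper.sexp returns nil on error.
def parses?(snippet)
  !Ripper.sexp(snippet).nil?
end

# The method-call cases from the list, mapped to the expected verdict.
CASES = {
  "foo 1, 2, 3"       => true,
  "foo(1, 2, 3)"      => true,
  "bar(1, 2, 3) { }"  => true,
  "bar 1, 2, 3 { }"   => false,
  "bar 1, 2, baz { }" => true,
}

if __FILE__ == $PROGRAM_NAME
  CASES.each do |src, expected|
    actual = parses?(src)
    status = actual == expected ? "as expected" : "SURPRISE"
    puts format("%-22s parses=%-5s %s", src.inspect, actual, status)
  end
end
```

New corner cases posted to the list can be dropped into `CASES` as they come in.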
From mfp at acm.org Fri Nov 25 21:36:30 2005
From: mfp at acm.org (Mauricio Fernandez)
Date: Sat, 26 Nov 2005 03:36:30 +0100
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <1132965899.6830.482.camel@localhost.localdomain>
References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain>
Message-ID: <20051126023630.GB13981@tux-chan>

On Fri, Nov 25, 2005 at 07:44:57PM -0500, MenTaLguY wrote:
[reordered for clarity of exposition]
> I am not sure that 'blah = zort' is parsed differently in these cases,
> however; that subtree may look the same in the AST and simply get
> interpreted differently.

IIRC they result in

> def foo
  zort = 3 # assign to variable
=> NODE_LASGN(ID:zort, INT2FIX(3), ...)
>   blah = zort # assign to variable from variable
=> NODE_LASGN(ID:blah, NODE_LVAR(zort), ...)
> end
> versus:
>
> def foo
>   blah = zort # assign to variable from method result
=> NODE_LASGN(ID:blah, NODE_VCALL(0,ID:zort,0), ...)
>   zort = 3 # assign to variable
=> NODE_LASGN(ID:zort, INT2FIX(3), ...)
>   ...
> end
--
Mauricio Fernandez

From mental at rydia.net Fri Nov 25 23:17:08 2005
From: mental at rydia.net (MenTaLguY)
Date: Fri, 25 Nov 2005 23:17:08 -0500
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <20051126023630.GB13981@tux-chan>
References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan>
Message-ID: <1132978629.6830.503.camel@localhost.localdomain>

On Sat, 2005-11-26 at 03:36 +0100, Mauricio Fernandez wrote:
> On Fri, Nov 25, 2005 at 07:44:57PM -0500, MenTaLguY wrote:
> [reordered for clarity of exposition]
> > I am not sure that 'blah = zort' is parsed differently in these cases,
> > however; that subtree may look the same in the AST and simply get
> > interpreted differently.
>
> IIRC they result in
>
> > def foo
>   zort = 3 # assign to variable
> => NODE_LASGN(ID:zort, INT2FIX(3), ...)
> >   blah = zort # assign to variable from variable
> => NODE_LASGN(ID:blah, NODE_LVAR(zort), ...)
> > end

Ah, excellent. Thanks. So once the variable is introduced into the
scope, the AST generated for its bare name is in fact different.

What's the NODE_LASGN() notation from, by the way?

-mental
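The NODE_LVAR versus NODE_VCALL output above comes from a purely textual rule: within a scope, once the parser has seen an assignment to a name, later bare uses of that name are variable references, even if the assignment can never execute. A minimal sketch that checks this, assuming a modern Ruby whose stdlib Ripper labels an unknown bare identifier `:vcall` and a known local `:var_ref`; `mentions?`, `bare_method_call?` and the three sample constants are hypothetical:

```ruby
require "ripper"

# Hypothetical helper: true if +label+ appears anywhere in a
# nested-array parse tree such as the one Ripper.sexp returns.
def mentions?(tree, label)
  return tree == label unless tree.is_a?(Array)
  tree.any? { |child| mentions?(child, label) }
end

# True if some bare identifier in +source+ was parsed as a method
# call (Ripper labels those :vcall) rather than as a local variable.
def bare_method_call?(source)
  mentions?(Ripper.sexp(source), :vcall)
end

ASSIGNED_FIRST = "def foo; zort = 3; blah = zort; end"
USED_FIRST     = "def foo; blah = zort; zort = 3; end"
# The assignment never runs, but it comes first in the text.
DEAD_BRANCH    = "def foo; if false; zort = 3; end; blah = zort; end"

if __FILE__ == $PROGRAM_NAME
  [ASSIGNED_FIRST, USED_FIRST, DEAD_BRANCH].each do |src|
    kind = bare_method_call?(src) ? "method call" : "local variable"
    puts "#{src}\n  -> zort parsed as a #{kind}"
  end
end
```

The `DEAD_BRANCH` case is the interesting one: because the decision is made while parsing, in source order, the parser needs a symbol table per scope but no flow analysis.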
From chneukirchen at gmail.com Sat Nov 26 06:27:30 2005
From: chneukirchen at gmail.com (Christian Neukirchen)
Date: Sat, 26 Nov 2005 12:27:30 +0100
Subject: [Rubygrammar-grammarians] [ANN] The Ruby Grammar Project
In-Reply-To: <1132959380.6830.461.camel@localhost.localdomain> (MenTaLguY's message of "Sat, 26 Nov 2005 07:56:01 +0900")
References: <1132959380.6830.461.camel@localhost.localdomain>
Message-ID:

MenTaLguY writes:

> The The Ruby Grammar Project aims to:
>
> 1. Develop an ANTLR grammar as an alternative to the YACC grammar which
>    is ruining everybody's parties these days.

In general, a good idea. However, can there be made a pure-Ruby
parser from an ANTLR grammar?

> -mental

--
Christian Neukirchen http://chneukirchen.org

From puellula at gmail.com Sat Nov 26 12:24:04 2005
From: puellula at gmail.com (Sara)
Date: Sat, 26 Nov 2005 18:24:04 +0100
Subject: [Rubygrammar-grammarians] [ANN] The Ruby Grammar Project
References: <1132959380.6830.461.camel@localhost.localdomain>
Message-ID: <018c01c5f2ae$3520cae0$6401a8c0@trudy>

----- Original Message -----
From: "Christian Neukirchen"
To:
Cc:
Sent: Saturday, November 26, 2005 12:27 PM
Subject: Re: [Rubygrammar-grammarians] [ANN] The Ruby Grammar Project

> MenTaLguY writes:
>
>> The The Ruby Grammar Project aims to:
>>
>> 1. Develop an ANTLR grammar as an alternative to the YACC grammar which
>> is ruining everybody's parties these days.
>
> In general, a good idea. However, can there be made a pure-Ruby
> parser from an ANTLR grammar?

Hi :)
I'm implementing this for a small part of the Ruby grammar.
bye,
Sara

>
>> -mental
> --
> Christian Neukirchen http://chneukirchen.org

From mental at rydia.net Sat Nov 26 12:42:57 2005
From: mental at rydia.net (MenTaLguY)
Date: Sat, 26 Nov 2005 12:42:57 -0500
Subject: [Rubygrammar-grammarians] [ANN] The Ruby Grammar Project
In-Reply-To:
References: <1132959380.6830.461.camel@localhost.localdomain>
Message-ID: <1133026978.6830.526.camel@localhost.localdomain>

On Sat, 2005-11-26 at 20:27 +0900, Christian Neukirchen wrote:
> MenTaLguY writes:
>
> > The The Ruby Grammar Project aims to:
> >
> > 1. Develop an ANTLR grammar as an alternative to the YACC grammar which
> > is ruining everybody's parties these days.
>
> In general, a good idea. However, can there be made a pure-Ruby
> parser from an ANTLR grammar?

Do you mean:

1. Is it possible to write a complete grammar for Ruby in ANTLR, given
   how squirrely its (Ruby's) syntax is?

2. Does ANTLR have a backend that can generate parsers in Ruby?

The answers are:

1. Yes. Ter (who is primarily writing the grammar) seems pretty
   confident that if YACC can do it, ANTLR can do it. Given he's a guy
   at the forefront of parser research, I think he can make it happen.

2. Not yet, but eventually. The grammar will be for the new ANTLR v3,
   for which most backends are still being written. I'm working on the
   Ruby backend, though that's under the aegis of the ANTLR project
   rather than the The Ruby Grammar Project.

-mental
From parrt at cs.usfca.edu Sat Nov 26 14:49:30 2005
From: parrt at cs.usfca.edu (Terence Parr)
Date: Sat, 26 Nov 2005 11:49:30 -0800
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <1132965899.6830.482.camel@localhost.localdomain>
References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132807699.6830.266.camel@localhost.localdomain> <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain>
Message-ID: <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu>

On Nov 25, 2005, at 4:44 PM, MenTaLguY wrote:

Howdy....thanks for the examples...

> Hmm, I'm hoping some of the grammarian-newcomers can help out with
> that,
> but to start off here are some bits with method calls:
>
> First:
>
> foo bar baz

Which object is foo a method of? Is that the "main" thing I keep
reading about? foo returns an object that bar is a method of?

> (where foo and bar are method names) parses the same as:
>
> foo( bar( baz ) )

Ah. So it calls bar(baz) first and then uses the result as an arg to
foo? That is very different from how I'd read it w/o the parens.
Holy smokes.

> Note that omitting the parenthesis for more than one function in the
> "stack" is deprecated, however.

Thank gawd.

> Next:
>
> def foo
>   zort = 3 # assign to variable
>   blah = zort # assign to variable from variable
>   ...
> end
>
> versus:
>
> def foo
>   blah = zort # assign to variable from method result
>   zort = 3 # assign to variable
>   ...
> end
>
> I am not sure that 'blah = zort' is parsed differently in these cases,
> however; that subtree may look the same in the AST and simply get
> interpreted differently.

Unless you have lazy evaluation, shouldn't the second version result
in blah being nil?

Regardless, the AST should just be syntax so you should see two
assignment trees. One of the operands just happens to be an INT.
I'd generate

(ASSIGN ID ID)

and

(ASSIGN ID INT)

or some such.

> Next:
>
> foo 1, 2, 3 # ok

ack.

> foo(1, 2, 3) # ok
> bar(1, 2, 3) { ... } # ok
> bar 1, 2, 3 { ... } # parse error
> bar 1, 2, baz { ... } # parses as bar(1, 2, baz { ... })

Yes, that all seems ok. It's just

methodCall : ID arglist ;
arglist : "(" exprList ")" | exprList ;

no sweat. Uh, 'cept for that wacky newline issue where you can have
\n in the middle of an expression. I'll figure that part out later...

Ter

From parrt at cs.usfca.edu Sat Nov 26 14:54:03 2005
From: parrt at cs.usfca.edu (Terence Parr)
Date: Sat, 26 Nov 2005 11:54:03 -0800
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <20051126023630.GB13981@tux-chan>
References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan>
Message-ID: <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu>

On Nov 25, 2005, at 6:36 PM, Mauricio Fernandez wrote:

> IIRC they result in
>
>> def foo
>   zort = 3 # assign to variable
> => NODE_LASGN(ID:zort, INT2FIX(3), ...)
>>   blah = zort # assign to variable from variable
> => NODE_LASGN(ID:blah, NODE_LVAR(zort), ...)
>> end

That makes perfect sense.
>> def foo
>>   blah = zort # assign to variable from method result
> => NODE_LASGN(ID:blah, NODE_VCALL(0,ID:zort,0), ...)
>>   zort = 3 # assign to variable
> => NODE_LASGN(ID:zort, INT2FIX(3), ...)
>>   ...
>> end

Ah. So the AST construction treats referenced values as method calls?
Interesting. So the AST construction routine has to have flow analysis
to see if something is defined on the path of that execution? That
doesn't sound like a very easy approach. Imagine:

if ...
  zort=3
end
blah = zort

In this case, which tree do you generate? The flow analysis is the only
thing that can answer that...something usually WAY further down the
compilation pipe.

Well, I'm going to build a grammar. If people want to use it for doing
Ruby tools like refactoring engines, cool. If people need it to replace
the front-end of the Ruby interpreter, it seems like there is much more
involved than just a grammar with AST construction. 'course you'd start
with a grammar and add actions...

Ter

From parrt at cs.usfca.edu Sat Nov 26 14:55:02 2005
From: parrt at cs.usfca.edu (Terence Parr)
Date: Sat, 26 Nov 2005 11:55:02 -0800
Subject: [Rubygrammar-grammarians] [ANN] The Ruby Grammar Project
In-Reply-To:
References: <1132959380.6830.461.camel@localhost.localdomain>
Message-ID: <36B9AFCA-8580-477A-9709-74E7BBB0699B@cs.usfca.edu>

On Nov 26, 2005, at 3:27 AM, Christian Neukirchen wrote:

> MenTaLguY writes:
>
>> The The Ruby Grammar Project aims to:
>>
>> 1. Develop an ANTLR grammar as an alternative to the YACC grammar
>> which
>> is ruining everybody's parties these days.
>
> In general, a good idea. However, can there be made a pure-Ruby
> parser from an ANTLR grammar?

Sure. Once a backend for ANTLR v3 is available that dumps Ruby.
:)

Ter

From ryand-ruby at zenspider.com Sat Nov 26 15:46:23 2005
From: ryand-ruby at zenspider.com (Ryan Davis)
Date: Sat, 26 Nov 2005 12:46:23 -0800
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu>
References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132807699.6830.266.camel@localhost.localdomain> <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu>
Message-ID:

On Nov 26, 2005, at 11:49 AM, Terence Parr wrote:

>> First:
>>
>> foo bar baz
>
> Which object is foo a method of? Is that the "main" thing I keep
> reading about? foo returns an object that bar is a method of?
>
>> (where foo and bar are method names) parses the same as:
>>
>> foo( bar( baz ) )
>
> Ah. So it calls bar(baz) first and then uses result as an arg to
> foo? That is very different from how'd I read it w/o the parens.
> Holy smokes.

Hey Terr, long time no email... It has been many, many years.

I think I can help here. 1) yes, ruby has ties to smalltalk, but
nowhere near the grammar side of things. You'll just have to let go
of that. It is much more of a loose java grammar than it is a
smalltalk grammar.
2) I have tools to help this exploration process along quite easily:

509 % echo "foo bar baz" | parse_tree_show -f
(eval):1: warning: parenthesize argument(s) for future version
[[:class,
  :Example,
  :Object,
  [:defn,
   :example,
   [:scope,
    [:block,
     [:args],
     [:fcall, :foo, [:array, [:fcall, :bar, [:array, [:vcall, :baz]]]]]]]]]]

This is in my package ParseTree, which you can easily install via
'sudo gem install -y ParseTree' assuming you have rubygems installed.
parse_tree_show displays ruby's internal AST (at the class/method
level) in a more digestible form. It doesn't do any parsing on its
own. The '-f' part of it stands for fast-mode or fragment-mode and
lets you pipe in snippets of ruby w/o having a whole file/class/method
to play with.

---

I tried doing an LR2LL flip on ruby about 2 years ago and got
stymied. It wasn't nearly as easy as doing smalltalk and I can't
keep the whole grammar in my head. I've started working on a tool
for grammar exploration/experimentation (tentatively called
yaccpuke or GIT - grammar inspection toolkit). It hopes to do only
a few things, but do them well:

1) provide a simple DSL for describing grammars by hand ( from
   smalltalk: bod [ uod | be ] )
2) ability to read in y.output (stripped grammar from yacc when run
   w/ -v)
3) provide an interactive session (via irb) with api to explore and
   manipulate grammar rules ( grammar.cycles? and grammar[:bod].replace(:be) )

I've got 1 & 2 down and am extending 3 right now. I'm hoping that
when this tool is done enough it'll provide me the extra brainpower
to do the LR2LL flip on ruby's yacc-based grammar.

I haven't been tracking what you've been up to Terr. I stopped
using Antlr in the early v2 days (mostly because I stopped coding
in java and C++ as much as possible) so I don't know if you've done
more work in this area.

---

I should also point out the work being done by Don Roberts and John
Brant (of refactoring browser fame).
They are working on a suite of tools that can be taught a new language
and then spit out a new refactoring browser for that language. They
said that in about a month (about now actually) they'd be done with C#
and able to work on back-porting the system into squeak. They have a
lot of experience in this area and might want to bite off a chunk as
well. At the very least, the sooner we can get the grammar into a more
digestible form, the sooner we can have a feature-complete refactoring
browser!

From mtraverso at gmail.com Sat Nov 26 17:04:37 2005
From: mtraverso at gmail.com (Martin Traverso)
Date: Sat, 26 Nov 2005 14:04:37 -0800
Subject: [Rubygrammar-grammarians] Ruby Grammar Project
In-Reply-To: <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu>
References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu>
Message-ID: <82b9e79a0511261404p1baf178as99afafe18c599f2f@mail.gmail.com>

Here's another tricky one: parsing double-quoted strings, especially
when they use the %Q syntax.

First, strings can contain nested ruby expressions (full programs,
actually):

"1 + 2 = #{1 + 2}" -> 1 + 2 = 3

Double-quoted strings can also be constructed with % or %Q and a
delimiter character, according to these rules (from the Pickaxe book):

"Following the type character is a delimiter, which can be any
nonalphabetic or nonmultibyte character. If the delimiter is one of the
characters (, [, {, or <, the literal consists of the characters up to
the matching closing delimiter, taking account of nested delimiter
pairs.
For all other delimiters, the literal comprises the characters up to the next occurrence of the delimiter character." Here are a few of examples of valid strings: %Q/1 + 2 = #{ 1 + 2 }/ # -> 1 + 2 = 3 %/1 + 2 = #{ 1 + 2 }/ %{1 + 2 = #{ 1 + 2 }} %(1 + #{ %(2 = #{ 1 + 2 })}) %((1 + 2) * 3 = 9) # -> (1 + 2) * 3 = 9 Martin On 11/26/05, Terence Parr wrote: > > > On Nov 25, 2005, at 4:44 PM, MenTaLguY wrote: > > Howdy....thanks for the examples... > > > Hmm, I'm hoping some of the grammarian-newcomers can help out with > > that, > > but to start off here are some bits with method calls: > > > > First: > > > > foo bar baz > > Which object is foo a method of? Is that the "main" thing I keep > reading about? foo returns an object that bar is a method of? > > > (where foo and bar are method names) parses the same as: > > > > foo( bar( baz ) ) > > Ah. So it calls bar(baz) first and then uses result as an arg to > foo? That is very different from how'd I read it w/o the parens. > Holy smokes. > > > Note that omitting the parenthesis for more than one function in the > > "stack" is deprecated, however. > > Thank gawd. > > > Next: > > > > def foo > > zort = 3 # assign to variable > > blah = zort # assign to variable from variable > > ... > > end > > > > versus: > > > > def foo > > blah = zort # assign to variable from method result > > zort = 3 # assign to variable > > ... > > end > > > > I am not sure that 'blah = zort' is parsed differently in these cases, > > however; that subtree may look the same in the AST and simply get > > interpreted differently. > > Unless you have lazy evaluation, shouldn't the second version result > in blah being nil? > > Regardless, the AST should just be syntax so you should see two > assignment trees. One of the operands juts happens to be an INT. > I'd generate > > (ASSIGN ID ID) > > and > > (ASSIGN ID INT) > > or some such. > > > Next: > > > > foo 1, 2, 3 # ok > > ack. > > > foo(1, 2, 3) # ok > > bar(1, 2, 3) { ... } # ok > > bar 1, 2, 3 { ... 
} # parse error > > bar 1, 2, baz { ... } # parses as bar(1, 2, baz { ... }) > > Yes, that all seems ok. It's just > > methodCall : ID arglist ; > arglist : "(" exprList ")" | exprList ; > > no sweat. Uh, 'cept for that wacky newline issue where you can have > \n in the middle of an expression. I'll figure that part out later... > > Ter From parrt at cs.usfca.edu Sat Nov 26 18:27:02 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Sat, 26 Nov 2005 15:27:02 -0800 Subject: [Rubygrammar-grammarians] Ruby Grammar Project In-Reply-To: References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132807699.6830.266.camel@localhost.localdomain> <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu> Message-ID: On Nov 26, 2005, at 12:46 PM, Ryan Davis wrote: > Hey Terr, long time no email... It has been many many years. Howdy do! Long time... > I think I can help here. 1) yes, ruby has ties to smalltalk, but > nowhere near the grammar side of things. You'll just have to let go > of that. It is much more of a loose java grammar than it is a > smalltalk grammar.
Ok, i'm letting go :) > 2) I have tools to help this exploration process along quite easily: > > 509 % echo "foo bar baz" | parse_tree_show -f > (eval):1: warning: parenthesize argument(s) for future version > [[:class, > :Example, > :Object, > [:defn, > :example, > [:scope, > [:block, > [:args], > [:fcall, :foo, [:array, [:fcall, :bar, [:array, > [:vcall, :baz]]]]]]]]]] Looks cool/useful. Why the :array node? > This is in my package ParseTree, which you can easily install via > 'sudo gem install -y ParseTree' assuming you have rubygems installed. Not yet. > I tried doing an LR2LL flip on ruby about 2 years ago and got > stymied. It wasn't nearly as easy as doing smalltalk and I can't > keep the whole grammar in my head. I've started working on a tool > for grammar exploration/experimentation (tentatively called > yaccpuke or GIT - grammar inspection toolkit). It hopes to do only > a few things, but do them well: > > 1) provide a simple DSL for describing grammars by hand ( from > smalltalk: bod [ uod | be ] ) > 2) ability to read in y.output (stripped grammar from yacc when run > w/ -v) > 3) provide an interactive session (via irb) with api to explore and > manipulate grammar rules ( grammar.cycles? and grammar[:bod].replace > (:be)) Interesting. :) > I've got 1 & 2 down and am extending 3 right now. I'm hoping that > when this tool is done enough it'll provide me the extra brainpower > to do the LR2LL flip on ruby's yacc based grammar. > > I haven't been tracking what you've been up to Terr. I stopped > using Antlr in the early v2 days (mostly because I stopped coding > in java and C++ as much as possible) so I don't know if you've done > more work in this area. Well, ANTLR v3 is Soooo much more powerful and easy to use. Plus we have ANTLRWorks http://www.antlr.org/works now too, a grammar dev environment. > I should also point out the work being done by Don Roberts and John > Brant (of refactoring browser fame). 
They are working on a suite of > tools that can be taught a new language and then spit out a new > refactoring browser for that language. They said that in about a > month (about now actually) they'd be done with C# and able to work > on back-porting the system into squeak. They have a lot of > experience in this area and might want to bite off a chunk as well. > At the very least, the sooner we can get the grammar into a more > digestible form, the sooner we can have a feature-complete > refactoring browser! Yep, getting a nice easy-to-use grammar together should be useful. My main interest at the moment is making a real grammar for testing ANTLR v3. :) Ter From parrt at cs.usfca.edu Sat Nov 26 18:39:52 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Sat, 26 Nov 2005 15:39:52 -0800 Subject: [Rubygrammar-grammarians] embedded expressions and dynamic quotes In-Reply-To: <82b9e79a0511261404p1baf178as99afafe18c599f2f@mail.gmail.com> References: <322C399B-5087-46A4-906A-6F65A9D5E972@cs.usfca.edu> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu> <82b9e79a0511261404p1baf178as99afafe18c599f2f@mail.gmail.com> Message-ID: On Nov 26, 2005, at 2:04 PM, Martin Traverso wrote: > Here's another tricky one: parsing double quoted strings, > especially when they use the %Q syntax. > > First, strings can contain nested ruby expressions (full programs, > actually): > > "1 + 2 = #{1 + 2}" -> 1 + 2 = 3 I've seen this in groovy. This looks easier as the #{ clearly identifies when to start a new nested scope. This nested expression stuff is not that big of a deal in this case. 
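Because #{ marks exactly where embedded code starts, a scanner only has to count braces to find where that code ends. A minimal Ruby sketch of the idea, assuming the body of the double-quoted string has already been isolated; split_interpolation is a hypothetical helper (not part of Ruby, ANTLR, or the YACC grammar), and it deliberately does not treat braces inside string literals within the embedded code as special.

```ruby
# Hypothetical helper: split a double-quoted string body into literal text
# and #{...} code segments. Braces are counted so that a nested {...} inside
# the code does not end the segment early.
def split_interpolation(body)
  parts = []
  text = +""
  i = 0
  while i < body.length
    if body[i] == "\\"
      text << body[i, 2]            # keep escapes (including \#{) as literal text
      i += 2
    elsif body[i, 2] == '#{'
      parts << [:str, text] unless text.empty?
      text = +""
      depth = 1
      j = i + 2
      while j < body.length && depth > 0
        depth += 1 if body[j] == "{"
        depth -= 1 if body[j] == "}"
        j += 1
      end
      raise ArgumentError, 'unterminated #{' if depth > 0
      parts << [:code, body[(i + 2)...(j - 1)]]   # text between #{ and its }
      i = j
    else
      text << body[i]
      i += 1
    end
  end
  parts << [:str, text] unless text.empty?
  parts
end
```

For example, split_interpolation('1 + 2 = #{1 + 2}') returns [[:str, "1 + 2 = "], [:code, "1 + 2"]]. Each :code piece is what a nested parser's expression rule would be asked to match.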
Seems to me I made an example of this for the groovy guys...hmm...wonder what happened to that. When you are parsing a string, the #{ triggers an action that creates a ruby parser instance and calls the expr rule to match that on the input stream. Heh, I found it! Ok, here is some nested crap in comments: int i = 0; /** @author foo {{z=3; q=4;}} {yy=33;}*/ method foo() { int j = i; i = 4; } /** @author bar */ method zero() { return 0; } I used a simple java-like language as an example with both embedded javadoc @author stuff and embedded expressions within the comments triggered by simple {...}. > Double-quoted strings can also be constructed with % or %Q and a > delimiter character, according to these rules (from the Pickaxe book). > > "Following the type character is a delimiter, which can be any > nonalphabetic or nonmultibyte > character. If the delimiter is one of the characters (, [, {, or <, > the literal > consists of the characters up to the matching closing delimiter, > taking account of nested > delimiter pairs. For all other delimiters, the literal comprises > the characters up to the > next occurrence of the delimiter character." Holy crap! That is VERY tough for a static lexer to deal with. I wonder if a semantic predicate will help us out here...hmm... > Here are a few examples of valid strings: > > %Q/1 + 2 = #{ 1 + 2 }/ # -> 1 + 2 = 3 Wow. Might have to have the input scanner solve this. Yes, that would be easiest. Convert the start / stop to some bizarre char sequence unknown to ruby users. Then we effectively normalize these into something static the lexer can scarf. :) Ok, nothing impossible so far.
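Normalizing %-literals in the input scanner first requires finding where each one ends. A minimal Ruby sketch of the Pickaxe rule quoted above, assuming the raw source text is available and the literal starts at a known index; scan_percent_literal, OPEN_TO_CLOSE and TYPE_CHARS are hypothetical names. It handles backslash escapes and nested bracket pairs, but does not treat #{...} specially, so a delimiter character inside embedded code (for example %/a #{ "/" }/) would end the literal early.

```ruby
# Opening bracket delimiters and their closers; any other delimiter closes itself.
OPEN_TO_CLOSE = { "(" => ")", "[" => "]", "{" => "}", "<" => ">" }.freeze
# Optional type characters that may follow the % sign.
TYPE_CHARS = "QqWwrsx"

# Hypothetical helper: returns [body, index_after_literal] for the
# %-literal that begins at src[start].
def scan_percent_literal(src, start)
  raise ArgumentError, "expected %" unless src[start] == "%"
  i = start + 1
  i += 1 if src[i] && TYPE_CHARS.include?(src[i])   # skip optional type character
  opener = src[i]
  raise ArgumentError, "missing delimiter" if opener.nil?
  closer = OPEN_TO_CLOSE.fetch(opener, opener)
  nests = OPEN_TO_CLOSE.key?(opener)   # only bracket pairs nest
  depth = 1
  i += 1
  body_start = i
  while i < src.length
    c = src[i]
    if c == "\\"
      i += 2                           # skip the escaped character
      next
    elsif nests && c == opener
      depth += 1
    elsif c == closer
      depth -= 1
      return [src[body_start...i], i + 1] if depth.zero?
    end
    i += 1
  end
  raise ArgumentError, "unterminated literal"
end
```

For instance, scan_percent_literal('%((1 + 2) * 3 = 9) tail', 0) returns ["(1 + 2) * 3 = 9", 18]. Once the extent is known, the delimiters could be rewritten to a fixed form that a static lexer rule matches.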
Ter From mental at rydia.net Sat Nov 26 22:27:33 2005 From: mental at rydia.net (MenTaLguY) Date: Sat, 26 Nov 2005 22:27:33 -0500 Subject: [Rubygrammar-grammarians] Ruby Grammar Project In-Reply-To: <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> Message-ID: <1133062053.6830.541.camel@localhost.localdomain> On Sat, 2005-11-26 at 11:54 -0800, Terence Parr wrote: > Ah. So the AST construction treats referenced values as method > calls? Interesting. So the AST construction routine has to have > flow analysis to see if something is defined on the path of that > execution? That doesn't sound like a very easy approach. Imagine > > if ... > zort=3 > end > blah = zort > > In this case, which tree do you generate? The flow analysis is the > only thing that can answer that...something usually WAY further down > the compilation pipe. Actually flow analysis isn't required. For example: if false zort = 3 end blah = zort Will result in zort being treated as a variable (with a value of nil, since it never gets anything assigned to it). For a bare identifier to be treated as a variable (rather than a method call), it's sufficient for an assignment to it to appear anywhere prior to that point within a method body. -mental -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051126/4b6aab2c/attachment.bin From mental at rydia.net Sat Nov 26 22:39:45 2005 From: mental at rydia.net (MenTaLguY) Date: Sat, 26 Nov 2005 22:39:45 -0500 Subject: [Rubygrammar-grammarians] Ruby Grammar Project In-Reply-To: <1133062053.6830.541.camel@localhost.localdomain> References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> <1133062053.6830.541.camel@localhost.localdomain> Message-ID: <1133062785.6830.546.camel@localhost.localdomain> On Sat, 2005-11-26 at 22:27 -0500, MenTaLguY wrote: > For a bare identifier to be treated as a variable (rather than a method > call), it's sufficient for an assignment to it to appear anywhere prior > to that point within a method body. Sorry, _the_ method body. Assignments in other method bodies or outside the method body don't matter. Needless to say (if I remember the interview correctly), the way implicit variable declaration happens in Ruby is near the top of matz' list of things-he-wishes-he-would-have-done-differently. -mental -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051126/3349067f/attachment.bin From mental at rydia.net Sat Nov 26 23:45:37 2005 From: mental at rydia.net (MenTaLguY) Date: Sat, 26 Nov 2005 23:45:37 -0500 Subject: [grammarians] new subject prefix Message-ID: <1133066738.6830.550.camel@localhost.localdomain> Rubygrammar-grammarians: now with a new, shorter subject prefix. Also comes in lime and raspberry-gorilla! -mental -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051126/6a0a590f/attachment.bin From mental at rydia.net Sat Nov 26 23:57:47 2005 From: mental at rydia.net (MenTaLguY) Date: Sat, 26 Nov 2005 23:57:47 -0500 Subject: [grammarians] s-expression dumper: works Message-ID: <1133067468.6830.561.camel@localhost.localdomain> Ter, Given Ryan's recent post, I've revisited ParseTree and worked out that it can, in fact, be usefully used on entire files. Attached is a very crude script that dumps s-expressions using a more or less Lisp syntax. Just a few steps up from "unholy abomination" and "evil kludge", my script falls neatly under "ugly hack". But it does work for pretty much everything save perhaps __DATA__. First argument is the file to parse and to dump, second argument is the file to dump to. If the second argument is omitted, it dumps to stdout. I've also got this in our Rubyforge CVS, under the 'grammar-test' module. (I'm no fan of CVS, but it's what we've got available...) -mental -------------- next part -------------- A non-text attachment was scrubbed... 
Name: dump-ruby.rb Type: application/x-ruby Size: 1064 bytes Desc: Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051126/9ab4ede0/dump-ruby.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051126/9ab4ede0/attachment.bin From mfp at acm.org Sat Nov 26 16:21:33 2005 From: mfp at acm.org (Mauricio Fernández) Date: Sat, 26 Nov 2005 22:21:33 +0100 Subject: [grammarians] [Rubygrammar-grammarians] Ruby Grammar Project In-Reply-To: <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> References: <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> Message-ID: <20051126212133.GA4643@tux-chan> On Sat, Nov 26, 2005 at 11:54:03AM -0800, Terence Parr wrote: > Ah. So the AST construction treats referenced values as method > calls? Interesting. So the AST construction routine has to have > flow analysis to see if something is defined on the path of that > execution? That doesn't sound like a very easy approach. Imagine No such analysis is needed, see below. > if ... > zort=3 > end > blah = zort > > In this case, which tree do you generate? The flow analysis is the > only thing that can answer that...something usually WAY further down > the compilation pipe.
Ruby resolves it statically at parse time: if false zort = 3 end blah = zort # => NODE_LASGN(ID:blah, NODE_LVAR(zort)) Whenever the parser sees foo = ...., it treats foo from that point on as a local. def foo; "method" end a = foo # => "method" if false; foo = 1 end b = foo # => nil In the second case, the foo lvar is uninitialized. > Well, I'm going to build a grammar. If people want to use it for > doing Ruby tools like refactoring engines, cool. If people need it > to replace the front-end of the Ruby interpreter, seems like there is > much more involved than just a grammar with AST construction. > 'course you'd start with a grammar and add actions... Great news :) [I was going to write this in a separate msg -> ] As I see it, we need substantially different ASTs for the two major use cases * code tools (analysis, refactoring...) * interpretation The current AST, which is walked by a recursive eval function afterwards, is very interpretation-oriented: several node types have been introduced incrementally in order to change the language, fix bugs, etc. without altering (too much) eval.c. Also, the canonical nodes change all the time, so nothing can depend on them... I've often thought when reading parse.y that a refactoring tool would use a different AST. Does anybody share that feeling? So I was going to ask... what's the official goal of this project? Replacing the current YACC parser + handmade lexer (seems so based on the initial msgs about comparing the produced ASTs with the canonical ones) or creating the basis for refactoring etc. tools? Or will there be two sets of actions to produce two different ASTs? 
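The parse-time rule described in this thread (a bare identifier is a local once an assignment to it has appeared textually earlier in the same method body, whether or not that assignment ever runs) needs only a set of names per body, not flow analysis. A minimal Ruby sketch, assuming a pre-tokenized statement list of the hypothetical forms [:def, name], [:assign, name] and [:ref, name]; the result tags borrow the lasgn/lvar/vcall node names seen in the dumps, but classify itself is illustrative and ignores blocks, class bodies, and everything else a real parser tracks.

```ruby
require "set"

# Hypothetical, simplified model: each statement is [:def, name],
# [:assign, name] or [:ref, name], listed in source order.
def classify(statements)
  locals = Set.new
  statements.map do |kind, name|
    case kind
    when :def
      locals = Set.new                 # a new method body starts with no locals
      [:defn, name]
    when :assign
      locals << name                   # a textual assignment makes `name` a local
      [:lasgn, name]
    when :ref
      # Known local => variable read; otherwise a receiverless method call.
      locals.include?(name) ? [:lvar, name] : [:vcall, name]
    else
      raise ArgumentError, "unknown statement kind #{kind.inspect}"
    end
  end
end
```

For the "blah = zort before zort = 3" case, classify([[:def, :foo], [:ref, :zort], [:assign, :zort], [:ref, :zort]]) returns [[:defn, :foo], [:vcall, :zort], [:lasgn, :zort], [:lvar, :zort]]. Resetting the set at each :def reflects the correction that only assignments in _the_ same method body count.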
-- Mauricio Fernandez From mfp at acm.org Sun Nov 27 04:23:44 2005 From: mfp at acm.org (Mauricio Fernández) Date: Sun, 27 Nov 2005 10:23:44 +0100 Subject: [grammarians] [Rubygrammar-grammarians] embedded expressions and dynamic quotes In-Reply-To: References: <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu> <82b9e79a0511261404p1baf178as99afafe18c599f2f@mail.gmail.com> Message-ID: <20051127092344.GC4643@tux-chan> On Sat, Nov 26, 2005 at 03:39:52PM -0800, Terence Parr wrote: > On Nov 26, 2005, at 2:04 PM, Martin Traverso wrote: > > Double-quoted strings can also be constructed with % or %Q and an > > delimiter character, according to these rules (from the Pickaxe book). [...] > > Holy crap! That is VERY tough for a static lexer to deal with. I > wonder if a semantic predicate will help us out here...hmm... What about this then? puts "foo #{% #{< References: <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> <20051126212133.GA4643@tux-chan> Message-ID: <1133095763.6830.568.camel@localhost.localdomain> On Sat, 2005-11-26 at 22:21 +0100, Mauricio Fernández wrote: > So I was going to ask... what's the official goal of this project?
> Replacing the current YACC parser + handmade lexer (seems so based on > the initial msgs about comparing the produced ASTs with the canonical > ones) or creating the basis for refactoring etc. tools? Or will there > be two sets of actions to produce two different ASTs? Well, for the short-term I just care about getting a grammar that matches any valid Ruby program. The nice thing about ANTLR is that we can probably get from there to a Ruby-style AST via a tree transformer rather than writing a completely new grammar. -mental -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051127/b394d474/attachment.bin From parrt at cs.usfca.edu Sun Nov 27 13:56:52 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Sun, 27 Nov 2005 10:56:52 -0800 Subject: [grammarians] [Rubygrammar-grammarians] Ruby Grammar Project In-Reply-To: <1133062053.6830.541.camel@localhost.localdomain> References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> <1133062053.6830.541.camel@localhost.localdomain> Message-ID: <0F857ABB-58E3-4F45-9B5E-F0F0B81513C4@cs.usfca.edu> On Nov 26, 2005, at 7:27 PM, MenTaLguY wrote: > For a bare identifier to be treated as a variable (rather than a > method > call), it's sufficient for an assignment to it to appear anywhere > prior > to that point within a method body. Good to know. 
So a scope is a lexical scope of the method? For the random code I type outside of a method, what "method" is it a part of? Where is there an informal description of scoping rules and general syntax? I'm beginning to think I should build a natural grammar from scratch rather than modify the yacc beast...that said, I do have it converted...it's just ugly. Jean Bovet's nice ANTLRWorks tool did the left-recursion removal with a single menu item. :) Ter From parrt at cs.usfca.edu Sun Nov 27 13:59:53 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Sun, 27 Nov 2005 10:59:53 -0800 Subject: [grammarians] s-expression dumper: works In-Reply-To: <1133067468.6830.561.camel@localhost.localdomain> References: <1133067468.6830.561.camel@localhost.localdomain> Message-ID: On Nov 26, 2005, at 8:57 PM, MenTaLguY wrote: > Ter, > > Given Ryan's recent post, I've revisited ParseTree and worked out that > it can, in fact, be usefully used on entire files. Nice! > Attached is a very crude script that dumps s-expressions using a > more or > less Lisp syntax. Just a few steps up from "unholy abomination" and > "evil kludge", my script falls neatly under "ugly hack". But it does > work for pretty much everything save perhaps __DATA__. Hmm...having a spot of trouble; can't seem to get past my comments: ~/antlr/code/examples-v3/java/ruby $ dump-ruby.rb ../extest /usr/local/bin/dump-ruby.rb: line 6: require: command not found /usr/local/bin/dump-ruby.rb: line 7: require_gem: command not found /usr/local/bin/dump-ruby.rb: line 9: syntax error near unexpected token `(' /usr/local/bin/dump-ruby.rb: line 9: `def parse( text )' ~/antlr/code/examples-v3/java/ruby $ head ../extest #!/usr/bin/ruby # # Build, compile, and run each example in this dir. # # Each dir must have at least one .g file and an input/output pair so # the script can check for valid output. [this is my first ruby script] # # TODO: make stderr go away # ... 
Ter From parrt at cs.usfca.edu Sun Nov 27 14:00:34 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Sun, 27 Nov 2005 11:00:34 -0800 Subject: [grammarians] [Rubygrammar-grammarians] embedded expressions and dynamic quotes In-Reply-To: <20051127092344.GC4643@tux-chan> References: <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu> <82b9e79a0511261404p1baf178as99afafe18c599f2f@mail.gmail.com> <20051127092344.GC4643@tux-chan> Message-ID: <5E781893-1039-4CD2-8248-F64790FC9557@cs.usfca.edu> On Nov 27, 2005, at 1:23 AM, Mauricio Fernández wrote: >> Holy crap! That is VERY tough for a static lexer to deal with. I >> wonder if a semantic predicate will help us out here...hmm... > > What about this then? > > puts "foo #{% #{< 1 2 3 > 4 5 6 > E1 > blergh > E2 > > Output: > > foo 1 2 3 > 4 5 6 bar blergh baz Yikes! Can you parse that for me? Ter From mental at rydia.net Sun Nov 27 14:28:32 2005 From: mental at rydia.net (MenTaLguY) Date: Sun, 27 Nov 2005 14:28:32 -0500 Subject: [grammarians] s-expression dumper: works In-Reply-To: References: <1133067468.6830.561.camel@localhost.localdomain> Message-ID: <1133119713.24508.2.camel@localhost.localdomain> On Sun, 2005-11-27 at 10:59 -0800, Terence Parr wrote: > Hmm...having a spot of trouble; can't seem to get past my comments: > > ~/antlr/code/examples-v3/java/ruby $ dump-ruby.rb ../extest > /usr/local/bin/dump-ruby.rb: line 6: require: command not found Ah, looks like /bin/sh is trying to interpret dump-ruby.rb, because I failed to add a hashbang line.
Either add an appropriate hashbang line to dump-ruby.rb, or run dump-ruby.rb explicitly with the interpreter: ruby dump-ruby.rb ../extest -mental -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051127/ebe1df02/attachment.bin From mental at rydia.net Sun Nov 27 14:40:07 2005 From: mental at rydia.net (MenTaLguY) Date: Sun, 27 Nov 2005 14:40:07 -0500 Subject: [grammarians] s-expression dumper: works In-Reply-To: References: <1133067468.6830.561.camel@localhost.localdomain> Message-ID: <1133120407.24508.7.camel@localhost.localdomain> Here is a more reliable version of the script, with an appropriate hashbang line also. -mental -------------- next part -------------- A non-text attachment was scrubbed... Name: dump-ruby.rb Type: application/x-ruby Size: 1109 bytes Desc: not available Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051127/31ab89ab/dump-ruby.bin -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051127/31ab89ab/attachment.bin From mental at rydia.net Sun Nov 27 15:54:29 2005 From: mental at rydia.net (MenTaLguY) Date: Sun, 27 Nov 2005 15:54:29 -0500 Subject: [grammarians] variable scoping In-Reply-To: <0F857ABB-58E3-4F45-9B5E-F0F0B81513C4@cs.usfca.edu> References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> <1133062053.6830.541.camel@localhost.localdomain> <0F857ABB-58E3-4F45-9B5E-F0F0B81513C4@cs.usfca.edu> Message-ID: <1133124870.24508.51.camel@localhost.localdomain> On Sun, 2005-11-27 at 10:56 -0800, Terence Parr wrote: > Good to know. So a scope is a lexical scope of the method? For the > random code I type outside of a method, what "method" is it a part of? Well, there is an enclosing file-level scope for local variables in that case which gets used when you're outside a class or method definition. > Where is there an informal description of scoping rules and general > syntax? I've got a bad feeling we're writing it. The best resources I can think of offhand would be the Pickaxe book (http://www.rubycentral.com/book/index.html), and Why's (poignant) Guide to Ruby (http://www.poignantguide.net/ruby), though in the latter case it's spread pretty thin and very informal indeed. [ However, the poignant guide is recommended reading simply for its own sake. 
]

Files, class/module definitions, and method definitions all have their own independent scopes for local variables. A method definition cannot see local variables in its enclosing class/module definition, and a class/module definition cannot see local variables in anything enclosing it.

For blocks (that is, anonymous functions), variables introduced within the block (this includes its parameters) are not visible outside it, but blocks still capture variables from the enclosing lexical scope. A block parameter named the same as an already visible variable will alias that variable rather than shadowing it.

No other constructs (including begin/if/case/rescue/etc...) affect local variable scoping.

The scoping rules for self, @instance_variables_like_this, @@class_variables_like_this, and CONSTANTS_LIKE_THIS are different but (I believe) are not relevant to parsing.

Basically though, as far as local variable scoping rules go, they are mostly reflected in the AST.

(scope ...) indicates a "blank slate" scope for local variables; variables from the parent node (if applicable) are not visible within it.

(block ...) indicates a scope in which new local variables can be declared by first assignment; existing variables are inherited from the parent node (enabling lexical closure).

The following ruby fragment:

  foo = bar

  class Eek
    baz = zum

    def meep( tot )
      zot = um
      foo.each { |plok| la = 32 }
    end
  end

parses as:

  (lasgn foo
    (vcall bar))
  (class Eek
    (scope
      (block
        (lasgn baz
          (vcall zum))
        (defn meep
          (scope
            (block
              (args tot)
              (lasgn zot
                (vcall um))
              (iter
                (call
                  (vcall foo) each)
                (dasgn_curr plok)
                (block
                  (dasgn_curr la)
                  (dasgn_curr la
                    (lit #<32>))))))))))

Obviously there is also some evaluator junk in there which has little to do with syntax, though.

n.b. the current dump-ruby.rb strips the outermost [file-level] (scope (block ... )) in its output, which now that I think about it is probably wrong. But I don't know what the actual parser does.
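A parser that wants to reproduce these rules (for example, to choose between an lvar and a vcall node) could keep a chain of frames mirroring the (scope ...) and (block ...) nodes. A minimal Ruby sketch under that assumption; LocalFrame and its methods are hypothetical, not part of ParseTree or the Ruby interpreter, and declare follows the block-parameter aliasing rule described above.

```ruby
# Hypothetical symbol table mirroring the (scope ...) and (block ...) nodes:
# a :scope frame is a blank slate, while a :block frame also sees its parent.
class LocalFrame
  def initialize(kind, parent = nil)
    @kind = kind      # :scope or :block
    @parent = parent
    @names = []
  end

  # Opens a nested frame of the given kind.
  def child(kind)
    LocalFrame.new(kind, self)
  end

  # First assignment declares a new local here, unless the name is already
  # visible, in which case the existing variable is reused (aliased).
  def declare(name)
    @names << name unless visible?(name)
    self
  end

  # Lookups stop at a :scope frame; :block frames defer to their parent.
  def visible?(name)
    return true if @names.include?(name)
    @kind == :block && !@parent.nil? && @parent.visible?(name)
  end
end
```

Replaying the Eek fragment this way, the frame for meep's body cannot see foo (consistent with foo.each appearing as (call (vcall foo) each) in the dump), while the frame for the { |plok| ... } block can see zot.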
-mental -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://rubyforge.org/pipermail/rubygrammar-grammarians/attachments/20051127/d5c37b7b/attachment.bin From parrt at cs.usfca.edu Sun Nov 27 16:02:50 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Sun, 27 Nov 2005 13:02:50 -0800 Subject: [grammarians] variable scoping In-Reply-To: <1133124870.24508.51.camel@localhost.localdomain> References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> <1133062053.6830.541.camel@localhost.localdomain> <0F857ABB-58E3-4F45-9B5E-F0F0B81513C4@cs.usfca.edu> <1133124870.24508.51.camel@localhost.localdomain> Message-ID: On Nov 27, 2005, at 12:54 PM, MenTaLguY wrote: > On Sun, 2005-11-27 at 10:56 -0800, Terence Parr wrote: >> Good to know. So a scope is a lexical scope of the method? For the >> random code I type outside of a method, what "method" is it a part >> of? > > Well, there is an enclosing file-level scope for local variables in > that > case which gets used when you're outside a class or method definition. > >> Where is there an informal description of scoping rules and general >> syntax? > > I've got a bad feeling we're writing it. ;) Well, i found this: http://www.math.sci.hokudai.ac.jp/~gotoken/ruby/man/syntax.html#lexical which seems ok. Just bought the thomas 2nd edition (used); be here in 8 days. 
> Files, class/module definitions, and method definitions all have their > own independent scopes for local variables. A method definition > cannot > see local variables in its enclosing class/module definition, and a > class/module definition cannot see local variables in any enclosing > things. cool. > For blocks (that is, anonymous functions), variables introduced within > the block (this includes its parameters) are not visible outside > it, but > blocks still capture variables from the enclosing lexical scope. A This is something python can't do, right? I.e., no general blocks? > block parameter named the same as an already visible variable will > alias > that variable rather than shadowing it. really? wow. > No other constructs (including begin/if/case/rescue/etc...) affect > local > variable scoping. > > The scoping rules for self, @instance_variables_like_this, > @@class_variables_like_this, and CONSTANTS_LIKE_THIS, are different > but > (I believe) are not relevent to parsing. They should resolve by looking at the innermost enclosing class and then upwards in mixin/inheritance, right? > Basically though, as far as local variable scoping rules go, they are > mostly reflected in the AST. > > (scope ...) indicates a "blank slate" scope for local variables; > variables from the parent node (if applicable) are not visible > within it > > (block ...) indicates a scope in which new local variables can be > declared by first-assignment; existing variables are inherited from > the > parent node (enabling lexical closure) Wouldn't that be a dynamic scope rather than a lexical? 
> The following ruby fragment: > > foo = bar > > class Eek > baz = zum > > def meep( tot ) > zot = um > foo.each { |plok| la = 32 } > end > end > > parses as: > > (lasgn foo > (vcall bar)) > (class Eek > (scope > (block > (lasgn baz > (vcall zum)) > (defn meep > (scope > (block > (args tot) > (lasgn zot > (vcall um)) > (iter > (call > (vcall foo) each) > (dasgn_curr plok) > (block > (dasgn_curr la) why the repeated assign? > (dasgn_curr la > (lit #<32>)))))))))) > > Obviously there is also some evaluator junk in there which has > little to > do with syntax, though. That tree makes sense essentially... > n.b. the current dump-ruby.rb strips the outermost [file-level] (scope > (block ... )) in its output, which now that I think about it is > probably > wrong. But I don't know what the actual parser does. no worries. Ter From mental at rydia.net Sun Nov 27 17:08:05 2005 From: mental at rydia.net (MenTaLguY) Date: Sun, 27 Nov 2005 17:08:05 -0500 Subject: [grammarians] variable scoping In-Reply-To: References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> <1133062053.6830.541.camel@localhost.localdomain> <0F857ABB-58E3-4F45-9B5E-F0F0B81513C4@cs.usfca.edu> <1133124870.24508.51.camel@localhost.localdomain> Message-ID: <1133129285.24508.93.camel@localhost.localdomain> On Sun, 2005-11-27 at 13:02 -0800, Terence Parr wrote: > On Nov 27, 2005, at 12:54 PM, MenTaLguY wrote: > > ;) Well, i found this: > > http://www.math.sci.hokudai.ac.jp/~gotoken/ruby/man/syntax.html#lexical > > which seems ok. 
Just bought the thomas 2nd edition (used); be here > in 8 days. Ahh, good. I vaguely remember that from my early days as a Rubyist, but it's gotten buried in all the more up-to-date documentation (which sadly concerns itself with only the APIs). > > For blocks (that is, anonymous functions), variables introduced within > > the block (this includes its parameters) are not visible outside > > it, but blocks still capture variables from the enclosing lexical scope. > > This is something python can't do, right? I.e., no general blocks? Not sure what you mean... python can (sort of) do lambdas with lexical closure, but if I remember right it's awkward and explicit. They do (again, as I recall) get their own variable scope automatically. > > block parameter named the same as an already visible variable will > > alias that variable rather than shadowing it. > > really? wow. I believe it's a feature that will be going away in Ruby 2. While

  blah = nil
  result = something.zopmof { |blah| ... }
  # do something with blah

is cute, it's not particularly readable; this is probably clearer:

  blah = nil
  result = something.zopmof { |value| blah = value ; ... }
  # do something with blah

The former is also not all that common; the main time I've seen it done is with callcc. > > (block ...) indicates a scope in which new local variables can be > > declared by first-assignment; existing variables are inherited from > > the parent node (enabling lexical closure) > > Wouldn't that be a dynamic scope rather than a lexical? Er, no... visibility/scoping of local variables is lexical, not dynamic. Maybe I'm explaining this badly. :/ > They should resolve by looking at the innermost enclosing class and > then upwards in mixin/inheritance, right? Sort of. self is the receiver of the current method, and determines how the following are resolved. @instance variables are specific to a particular object. @@class variables just go up the mixin/inheritance hierarchy. 
CONSTANTS do the enclosing+mixin/inheritance dance more or less like you say; I forget the specifics. > > (block > > (dasgn_curr la) > > why the repeated assign? > > > (dasgn_curr la > > (lit #<32>)))))))))) Hmm, good eye. I don't know... -mental From parrt at cs.usfca.edu Sun Nov 27 17:15:45 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Sun, 27 Nov 2005 14:15:45 -0800 Subject: [grammarians] starting over from scratch In-Reply-To: <1133129285.24508.93.camel@localhost.localdomain> References: <1132813180.6830.309.camel@localhost.localdomain> <1132944073.6830.320.camel@localhost.localdomain> <5D08ADEA-69DA-45C4-9C17-6F2C6BBAB601@cs.usfca.edu> <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> <1133062053.6830.541.camel@localhost.localdomain> <0F857ABB-58E3-4F45-9B5E-F0F0B81513C4@cs.usfca.edu> <1133124870.24508.51.camel@localhost.localdomain> <1133129285.24508.93.camel@localhost.localdomain> Message-ID: <7E2DF9C6-AAAD-4A3B-9F11-C3037B34EE10@cs.usfca.edu> Ok, so the parse.y grammar has lots of duplicated grammar fragments so yacc can figure out how to do different actions depending on context. Further there is lots of weird stuff going on to resolve ambiguities and be as tight as possible. Some of it is beyond my ken given my ruby nuby-ness. The lexer seems mostly or all handwritten; haven't looked at it much. I'm building a grammar from scratch starting with assignments. 
:) Damn thing even seems to be working... Ter From mfp at acm.org Sun Nov 27 18:07:52 2005 From: mfp at acm.org (Mauricio Fernández) Date: Mon, 28 Nov 2005 00:07:52 +0100 Subject: [grammarians] [Rubygrammar-grammarians] embedded expressions and dynamic quotes In-Reply-To: <5E781893-1039-4CD2-8248-F64790FC9557@cs.usfca.edu> References: <1132950615.6830.363.camel@localhost.localdomain> <3554807E-435B-422B-95C6-C826CB5BB322@cs.usfca.edu> <1132960279.6830.468.camel@localhost.localdomain> <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <9F786D84-C9DA-4641-AEE1-E0D9B235DA3A@cs.usfca.edu> <82b9e79a0511261404p1baf178as99afafe18c599f2f@mail.gmail.com> <20051127092344.GC4643@tux-chan> <5E781893-1039-4CD2-8248-F64790FC9557@cs.usfca.edu> Message-ID: <20051127230752.GD4643@tux-chan> On Sun, Nov 27, 2005 at 11:00:34AM -0800, Terence Parr wrote: > > On Nov 27, 2005, at 1:23 AM, Mauricio Fernández wrote: > >> Holy crap! That is VERY tough for a static lexer to deal with. I > >> wonder if a semantic predicate will help us out here...hmm... > > > > What about this then? > > > > puts "foo #{% #{< > 1 2 3 > > 4 5 6 > > E1 > > blergh > > E2 > > > > Output: > > > > foo 1 2 3 > > 4 5 6 bar blergh baz > > Yikes! Can you parse that for me? The AST would be a NODE_EVSTR, NODE_HEREDOC, etc. mess. It's too late to expand it manually and anyway ParseTree knows better. 
Here are some hints instead: puts "foo #{% #{< References: <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> <1133062053.6830.541.camel@localhost.localdomain> <0F857ABB-58E3-4F45-9B5E-F0F0B81513C4@cs.usfca.edu> <1133124870.24508.51.camel@localhost.localdomain> <1133129285.24508.93.camel@localhost.localdomain> <7E2DF9C6-AAAD-4A3B-9F11-C3037B34EE10@cs.usfca.edu> Message-ID: <20051127233254.GE4643@tux-chan> On Sun, Nov 27, 2005 at 02:15:45PM -0800, Terence Parr wrote: > Ok, so the parse.y grammar has lots of duplicated grammar fragments > so yacc can figure out how to do different actions depending on > context. Further there is lots of weird stuff going on to resolve > ambiguities and be as tight as possible. Some of it is beyond my ken > giving my ruby nuby-ness. The lexer seems mostly or all handwritten; > haven't looked at it much. > > I'm building a grammar from scratch starting with assignments. :) > > Damn thing even seems to be working... Have you seen the following? 
EBNF corresponding to Ruby 1.4 http://www.outerbody.com/ruby/ruby-man-1.4/yacc.html The same with some changes towards Ruby 1.6 http://www.ruby-lang.org/ja/man/?cmd=view;name=%B5%BF%BB%F7BNF%A4%CB%A4%E8%A4%EBRuby%A4%CE%CA%B8%CB%A1 "Ruby's grammar" http://www.ruby-lang.org/ja/man/?cmd=view;name=Ruby%A4%CE%CA%B8%CB%A1 -- Mauricio Fernandez From mental at rydia.net Sun Nov 27 21:41:19 2005 From: mental at rydia.net (mental@rydia.net) Date: Sun, 27 Nov 2005 21:41:19 -0500 Subject: [grammarians] starting over from scratch In-Reply-To: <20051127233254.GE4643@tux-chan> References: <1C5523BC-9DB5-4747-B90B-B25F60D784C3@cs.usfca.edu> <1132965899.6830.482.camel@localhost.localdomain> <20051126023630.GB13981@tux-chan> <24B01502-1943-416B-87D9-7E6979012A62@cs.usfca.edu> <1133062053.6830.541.camel@localhost.localdomain> <0F857ABB-58E3-4F45-9B5E-F0F0B81513C4@cs.usfca.edu> <1133124870.24508.51.camel@localhost.localdomain> <1133129285.24508.93.camel@localhost.localdomain> <7E2DF9C6-AAAD-4A3B-9F11-C3037B34EE10@cs.usfca.edu> <20051127233254.GE4643@tux-chan> Message-ID: <1133145679.438a6e4fbc32c@www.rydia.net> Quoting Mauricio Fernández : > Have you seen the following? > > EBNF corresponding to Ruby 1.4 > http://www.outerbody.com/ruby/ruby-man-1.4/yacc.html > > The same with some changes towards Ruby 1.6 > http://www.ruby-lang.org/ja/man/?cmd=view;name=%B5%BF%BB%F7BNF%A4%CB%A4%E8%A4%EBRuby%A4%CE%CA%B8%CB%A1 Unfortunately, now that I've had time to read it, it looks plainly wrong in some places (cf. the definition of HERE_DOC, for example, or the lack of attention to operator precedence). So we've still got our work cut out for us. (Guess I'm not shocked though. Stuff like HEREDOCs isn't really expressible in EBNF anyway.) That doesn't mean it's useless, though. It's actually better than anything else I've personally seen so far. Good work, fellow Grammarian. Please continue digging; you're turning up some great stuff I never knew we had available. 
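To make the HEREDOC point concrete, here is a small sketch in modern Ruby. The method name `heredoc_demo` is illustrative only. A heredoc body begins on the line after the one containing the `<<` marker. When one line opens several heredocs, their bodies follow in order. The lexer therefore has to finish tokenizing the current line while remembering which terminators are still pending, and that cross-line state is what a context-free EBNF rule cannot easily express.

```ruby
def heredoc_demo
  # Two heredocs opened on one line: both bodies follow, in order,
  # after the rest of that line has been tokenized.
  first, second = <<ONE, <<TWO
alpha
ONE
beta
TWO

  # The marker can sit mid-expression: `.upcase, "tail"].join(" ")` is
  # lexed before the body "hello\n" is read from the following line.
  shout = [<<EOS.upcase, "tail"].join(" ")
hello
EOS

  [first, second, shout]
end

p heredoc_demo # => ["alpha\n", "beta\n", "HELLO\n tail"]
```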
-mental From mental at rydia.net Sun Nov 27 23:41:50 2005 From: mental at rydia.net (MenTaLguY) Date: Sun, 27 Nov 2005 23:41:50 -0500 Subject: [grammarians] work beckons Message-ID: <1133152910.24508.106.camel@localhost.localdomain> Alas, now that the holiday is ended I must heft my pickaxe and return to regular employment at the happy sunshine mines. I shall continue to hover, but probably you will find me just a bit less vocal until this coming weekend. And that is why. -mental From parrt at cs.usfca.edu Mon Nov 28 12:29:30 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Mon, 28 Nov 2005 09:29:30 -0800 Subject: [grammarians] work beckons In-Reply-To: <1133152910.24508.106.camel@localhost.localdomain> References: <1133152910.24508.106.camel@localhost.localdomain> Message-ID: On Nov 27, 2005, at 8:41 PM, MenTaLguY wrote: > Alas, now that the holiday is ended I must heft my pickaxe and > return to > regular employment at the happy sunshine mines. > > I shall continue to hover, but probably you will find me just a bit > less > vocal until this coming weekend. And that is why. No worries...thanks for getting this all set up. Building the grammar is having the two desirable effects: 1. learning ruby sufficient for building simple scripts, learning its philosophy 2. finding bugs / wackiness in ANTLR Found 2 bugs yesterday alone in the pre-release ANTLR! :) Ordered Thomas 2nd edition book yesterday. Ter From parrt at cs.usfca.edu Tue Nov 29 15:59:53 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Tue, 29 Nov 2005 12:59:53 -0800 Subject: [grammarians] ambiguity with method call? Message-ID: Howdy, What does foo [1] parse as? 
foo could be a variable and then foo is a simple array var: foo = [a,b] foo[1] If foo is a method, it could return an array and then you could access the 1st element: def foo return [a,b] end foo[1] It could also be a method with args: def foo x ... end foo [1] # same as foo([1]) This 5x ways to do anything syntax is truly a burden for the implementor (and as a user one of the reasons I hated perl). Regardless, is this a true *syntactic* ambiguity as I suspect? I.e., you need to simply pick an interpretation by dictum in the manual? Thanks, Ter From sean.ohalpin at gmail.com Tue Nov 29 16:23:08 2005 From: sean.ohalpin at gmail.com (Sean O'Halpin) Date: Tue, 29 Nov 2005 21:23:08 +0000 Subject: [grammarians] ambiguity with method call? In-Reply-To: References: Message-ID: <3736dd30511291323p6e64b9cbib6070c1ef3f294d2@mail.gmail.com> On 11/29/05, Terence Parr wrote: > Howdy, > > What does > > foo [1] > > parse as? foo could be a variable and then foo is a simple array var: > > foo = [a,b] > foo[1] > > If foo is a method, it could return an array and then you could > access the 1st element: > > def foo > return [a,b] > end > foo[1] > > It could also be a method with args: > > def foo x > ... > end > foo [1] # same as foo([1]) > > This 5x ways to do anything syntax is truly a burden for the > implementor (and as a user one of the reasons I hated perl). > Regardless, is this a true *syntactic* ambiguity as I suspect? I.e., > you need to simply pick an interpretation by dictum in the manual? > > Thanks, > Ter It gets worse ;) There is also the case where [] can be a synonym for Proc#call, e.g. foo = proc {|x| x*x } foo [2] #=> 4 Regards, Sean From parrt at cs.usfca.edu Tue Nov 29 16:40:13 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Tue, 29 Nov 2005 13:40:13 -0800 Subject: [grammarians] ambiguity with method call? 
In-Reply-To: <3736dd30511291323p6e64b9cbib6070c1ef3f294d2@mail.gmail.com> References: <3736dd30511291323p6e64b9cbib6070c1ef3f294d2@mail.gmail.com> Message-ID: <937693A0-C1B0-4586-811C-235C9FC844EF@cs.usfca.edu> On Nov 29, 2005, at 1:23 PM, Sean O'Halpin wrote: > It gets worse ;) > > There is also the case where [] can be a synonym for Proc#call, e.g. > > foo = proc {|x| x*x } > > foo [2] #=> 4 Yikes. If the AST needs to differentiate, then we've got trouble. ;) I'm getting a sneaking suspicion that I'll not be able to build a grammar until I've spent a LONG time with ruby. Hmm...not sure I want to reward this sort of language design... Ter From mental at rydia.net Tue Nov 29 18:28:21 2005 From: mental at rydia.net (mental@rydia.net) Date: Tue, 29 Nov 2005 18:28:21 -0500 Subject: [grammarians] ambiguity with method call? In-Reply-To: <3736dd30511291323p6e64b9cbib6070c1ef3f294d2@mail.gmail.com> References: <3736dd30511291323p6e64b9cbib6070c1ef3f294d2@mail.gmail.com> Message-ID: <1133306901.438ce4155387f@www.rydia.net> Quoting Sean O'Halpin : > It gets worse ;) > > There is also the case where [] can be a synonym for Proc#call, > e.g. > > foo = proc {|x| x*x } > > foo [2] #=> 4 The difference between an array and a proc doesn't matter, at least not when parsing. [] in such a context is simply a method and dispatched as such. The discriminator is whether foo is determined to be a local variable or a method. If foo is a method, foo [2] parses as: (fcall foo (array (array (lit #<2>)))) If foo is a local variable, foo [2] parses as: (call (lvar foo) [] (array (lit #<2>))) So, yet another place where this local variable versus method name thing really matters. -mental From mental at rydia.net Tue Nov 29 18:42:06 2005 From: mental at rydia.net (mental@rydia.net) Date: Tue, 29 Nov 2005 18:42:06 -0500 Subject: [grammarians] ambiguity with method call? 
In-Reply-To: <937693A0-C1B0-4586-811C-235C9FC844EF@cs.usfca.edu> References: <3736dd30511291323p6e64b9cbib6070c1ef3f294d2@mail.gmail.com> <937693A0-C1B0-4586-811C-235C9FC844EF@cs.usfca.edu> Message-ID: <1133307726.438ce74ea591c@www.rydia.net> Quoting Terence Parr : > Yikes. If the AST needs to differentiate, then we've got > trouble. ;) It doesn't. Assuming 'anything' is a local variable, anything [whatever] simply sends the message [] to the object referenced by 'anything', with the value of 'whatever' as a parameter. In response to the [] message, Proc objects call themselves with the given parameters. In response to the [] message, Array objects return the indexed value(s). But that's a runtime consideration you don't have to care about. > I'm getting a sneaking suspicion that I'll not be able to build a > grammar until I've spent a LONG time with ruby. Hmm...not sure I > want to reward this sort of language design... I'm afraid folks are making it sound much worse than it really is. See my other post. If you've already got the lvar-versus-method thing worked out, foo [2] shouldn't be a problem. -mental From rubygrammar at d10.karoo.co.uk Tue Nov 29 19:20:07 2005 From: rubygrammar at d10.karoo.co.uk (daz) Date: Wed, 30 Nov 2005 00:20:07 -0000 Subject: [grammarians] ambiguity with method call? References: <3736dd30511291323p6e64b9cbib6070c1ef3f294d2@mail.gmail.com> <1133306901.438ce4155387f@www.rydia.net> Message-ID: <001d01c5f543$d2625200$42b46453@vanna> From: mental > > The discriminator is whether foo is determined to be a local > variable or a method. > ... 
and matters (for a method call) daz ============================= meth no spc def foo [a,b] end foo[1] # sees foo.[](1) ----------------------------------------- NODE_BLOCK: NODE_NEWLINE: [rb8233.TMP:1] NODE_DEFN: method 9713 (foo) NODE_SCOPE: NODE_BLOCK: NODE_ARGS: count = 0 additional default values: NODE_NEWLINE: [rb8233.TMP:2] NODE_ARRAY: size = 2 NODE_VCALL: self.a NODE_VCALL: self.b NODE_NEWLINE: [rb8233.TMP:5] NODE_CALL: to method: 331 ([]) <-- CALL Receiver: NODE_VCALL: self.foo <-- VCALL Parameters: NODE_ARRAY: size = 1 NODE_LIT: Fixnum: 1 ================================ meth spc def foo [a,b] end foo [1] # sees foo([1]) ----------------------------------------- NODE_BLOCK: NODE_NEWLINE: [rb8233.TMP:1] NODE_DEFN: method 9713 (foo) NODE_SCOPE: NODE_BLOCK: NODE_ARGS: count = 0 additional default values: NODE_NEWLINE: [rb8233.TMP:2] NODE_ARRAY: size = 2 NODE_VCALL: self.a NODE_VCALL: self.b NODE_NEWLINE: [rb8233.TMP:5] NODE_FCALL: to function: 9713 (foo) <-- FCALL Parameters: NODE_ARRAY: size = 1 <-- [] NODE_ARRAY: size = 1 <-- [[]] NODE_LIT: Fixnum: 1 <-- [[1]] ============================== var no spc foo = [a,b] foo[1] ----------------------------------------- NODE_BLOCK: NODE_NEWLINE: [rb8233.TMP:1] NODE_LASGN: NODE_ARRAY: size = 2 NODE_VCALL: self.a NODE_VCALL: self.b Assign to LV 2 (foo) NODE_NEWLINE: [rb8233.TMP:2] NODE_CALL: to method: 331 ([]) <-- Receiver: NODE_LVAR: LV 2 (foo) Parameters: NODE_ARRAY: size = 1 NODE_LIT: Fixnum: 1 ================================= var spc foo = [a,b] foo [1] ----------------------------------------- NODE_BLOCK: NODE_NEWLINE: [rb8233.TMP:1] NODE_LASGN: NODE_ARRAY: size = 2 NODE_VCALL: self.a NODE_VCALL: self.b Assign to LV 2 (foo) NODE_NEWLINE: [rb8233.TMP:2] NODE_CALL: to method: 331 ([]) <-- Receiver: NODE_LVAR: LV 2 (foo) Parameters: NODE_ARRAY: size = 1 NODE_LIT: Fixnum: 1 ruby 1.8.0 (2003-08-30) [i586-bccwin32] http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/rough/ext/nodedump/ From bwatt at austin.rr.com Tue Nov 29 
19:30:03 2005 From: bwatt at austin.rr.com (Brian Watt) Date: Tue, 29 Nov 2005 18:30:03 -0600 Subject: [grammarians] ambiguity with method call? In-Reply-To: References: Message-ID: <438CF28B.2010407@austin.rr.com> Minor alternative syntax: def foo [a,b] # removed return end foo[1] and def foo x [a,b] # replaced ... end foo [1] # same as foo([1]), note: ruby 1.8.1-15 issues "warning: parenthesize argument(s) for future version" Sincerely, Brian Watt Terence Parr wrote: >Howdy, > >What does > >foo [1] > >parse as? foo could be a variable and then foo is a simple array var: > >foo = [a,b] >foo[1] > >If foo is a method, it could return an array and then you could >access the 1st element: > >def foo > return [a,b] >end >foo[1] > >It could also be a method with args: > >def foo x >... >end >foo [1] # same as foo([1]) > >This 5x ways to do anything syntax is truly a burden for the >implementor (and as a user one of the reasons I hated perl). >Regardless, is this a true *syntactic* ambiguity as I suspect? I.e., >you need to simply pick an interpretation by dictum in the manual? > >Thanks, >Ter From mfp at acm.org Tue Nov 29 18:01:54 2005 From: mfp at acm.org (Mauricio Fernández) Date: Wed, 30 Nov 2005 00:01:54 +0100 Subject: [grammarians] ambiguity with method call? In-Reply-To: <3736dd30511291323p6e64b9cbib6070c1ef3f294d2@mail.gmail.com> References: <3736dd30511291323p6e64b9cbib6070c1ef3f294d2@mail.gmail.com> Message-ID: <20051129230154.GA6038@tux-chan> On Tue, Nov 29, 2005 at 09:23:08PM +0000, Sean O'Halpin wrote: > On 11/29/05, Terence Parr wrote: > > Howdy, > > What does foo [1] parse as? 
> > foo could be a variable and then foo is a simple array var: > > foo = [a,b] > > foo[1] > > If foo is a method, it could return an array and then you could > > access the 1st element: > > def foo > > return [a,b] > > end > > foo[1] > > It could also be a method with args: > > def foo x > > ... > > end > > foo [1] # same as foo([1]) [...] > There is also the case where [] can be a synonym for Proc#call, e.g. > > foo = proc {|x| x*x } > > foo [2] #=> 4 That's just another instance of a call to #[]. Before assigning to foo: * foo is considered a method call until the parser sees an assignment to the foo variable (as said before, this does not depend on whether that assignment is actually executed or not). * somemethod[0] means sending the #[] message to the value returned by somemethod, like (x = somemethod; x[0]) * somemethod [0] is a call to the somemethod method (argument [0]) After assigning to foo: * both foo[0] and foo [0] send the #[] message to the object referred to by foo (argument 0) These are all reflected in the AST built in parse.y: def foo(x = :default); [x,1] end foo[0] # => :default # [:call, [:vcall, :foo], :[], [:array, [:lit, 0]]] foo [0] # => [[0], 1] # [:fcall, :foo, [:array, [:array, [:lit, 0]]]] foo=[1] foo[0] # => 1 # [:call, [:lvar, :foo], :[], [:array, [:lit, 0]]] foo [0] # => 1 # [:call, [:lvar, :foo], :[], [:array, [:lit, 0]]] foo = lambda{|x| [x]} foo[0] # => [0] # [:call, [:lvar, :foo], :[], [:array, [:lit, 0]]] foo [0] # => [0] # [:call, [:lvar, :foo], :[], [:array, [:lit, 0]]] -- Mauricio Fernandez From mental at rydia.net Tue Nov 29 19:58:03 2005 From: mental at rydia.net (mental@rydia.net) Date: Tue, 29 Nov 2005 19:58:03 -0500 Subject: [grammarians] ambiguity with method call? In-Reply-To: <438CF28B.2010407@austin.rr.com> References: <438CF28B.2010407@austin.rr.com> Message-ID: <1133312283.438cf91b83b4b@www.rydia.net> Wait. All of you. 
Thank you very much, but all those additional details don't alter the way 'foo[1]' and 'foo [1]' are parsed. Doesn't matter how many arguments a foo method takes. Doesn't matter whether a foo lvar references an Array, a Proc, or Cheese. Please have pity on Ter and stick to minimal complete examples like these (each parsed separately):

 1. foo[1]
 2. foo [1]
 3. foo = nil
    foo[1]
 4. foo = nil
    foo [1]

Then focus on the relevant differences:

             space        no space
 method:     foo([1])     foo.[](1)
 lvar:       foo.[](1)    foo.[](1)

So, there are really only two possibilities he has to consider:

 method, with space:                          foo([1])
 any other combination of method/lvar/space:  foo.[](1)

[ side note: it is good to point out, as Brian did, when a particular construct (#2 in this case) is deprecated. Sure, we'll still have to parse it for the time being, but we can at least stick a comment in the grammar. ]
-mental From bwatt at austin.rr.com Tue Nov 29 20:20:44 2005 From: bwatt at austin.rr.com (Brian Watt) Date: Tue, 29 Nov 2005 19:20:44 -0600 Subject: [grammarians] Language test cases Message-ID: <438CFE6C.3070705@austin.rr.com> All, Does anyone know if there is a set of pre-existing language test cases that exercise the lexer and parser? Sincerely, Brian Watt From mental at rydia.net Tue Nov 29 20:31:01 2005 From: mental at rydia.net (mental@rydia.net) Date: Tue, 29 Nov 2005 20:31:01 -0500 Subject: [grammarians] Language test cases In-Reply-To: <438CFE6C.3070705@austin.rr.com> References: <438CFE6C.3070705@austin.rr.com> Message-ID: <1133314261.438d00d5b2cd1@www.rydia.net> Quoting Brian Watt : > Does anyone know if there is a set of pre-existing language test > cases that exercise the lexer and parser? Rubicon, maybe? Not sure how up-to-date it is now, but it's worth a look. 
-mental From rubygrammar at d10.karoo.co.uk Tue Nov 29 20:34:45 2005 From: rubygrammar at d10.karoo.co.uk (daz) Date: Wed, 30 Nov 2005 01:34:45 -0000 Subject: [grammarians] Language test cases References: <438CFE6C.3070705@austin.rr.com> Message-ID: <006101c5f54e$3f43f680$42b46453@vanna> From: Brian Watt > > All, > > Does anyone know if there is a set of pre-existing language test cases > that exercise the lexer and parser? > /sample/test.rb in the distro, or hereabouts: http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/ruby/sample/test.rb daz From ryand-ruby at zenspider.com Wed Nov 30 02:47:51 2005 From: ryand-ruby at zenspider.com (Ryan Davis) Date: Tue, 29 Nov 2005 23:47:51 -0800 Subject: [grammarians] ambiguity with method call? In-Reply-To: References: Message-ID: <34D4231C-FC3E-4CE0-BC9D-FB2E89A7A361@zenspider.com> On Nov 29, 2005, at 12:59 PM, Terence Parr wrote: > Howdy, > > What does > > foo [1] sudo gem install ParseTree (!!) % echo "foo [1]" | parse_tree_show -f ... [:fcall, :foo, [:array, [:array, [:lit, 1]]]] without any other context to go on, it is assumed it is an fcall, in other contexts it could be an lvar, dvar, or a regular call. You should also do a gem unpack ParseTree and look through the unit tests. That might answer quite a few of your questions as well. From sean.ohalpin at gmail.com Wed Nov 30 03:28:02 2005 From: sean.ohalpin at gmail.com (Sean O'Halpin) Date: Wed, 30 Nov 2005 08:28:02 +0000 Subject: [grammarians] ambiguity with method call? In-Reply-To: <1133312283.438cf91b83b4b@www.rydia.net> References: <438CF28B.2010407@austin.rr.com> <1133312283.438cf91b83b4b@www.rydia.net> Message-ID: <3736dd30511300028g15b578e0re94ccbbe4f3f4931@mail.gmail.com> On 11/30/05, Ryan Davis wrote: > > On Nov 29, 2005, at 1:23 PM, Sean O'Halpin wrote: > > > It gets worse ;) > > no, it doesn't. > > > There is also the case where [] can be a synonym for Proc#call, e.g. > > at this point, it is not a synonym, it is just a message. 
> > > foo = proc {|x| x*x } > > foo is just an lvar here > > > foo [2] #=> 4 > > [2] is just a call: > > [:call, [:lvar, :foo], :[], [:array, [:lit, 2]]] > I stand corrected - apologies for the FUD. Sean From ryand-ruby at zenspider.com Wed Nov 30 04:14:31 2005 From: ryand-ruby at zenspider.com (Ryan Davis) Date: Wed, 30 Nov 2005 01:14:31 -0800 Subject: [grammarians] ambiguity with method call? In-Reply-To: <3736dd30511300028g15b578e0re94ccbbe4f3f4931@mail.gmail.com> References: <438CF28B.2010407@austin.rr.com> <1133312283.438cf91b83b4b@www.rydia.net> <3736dd30511300028g15b578e0re94ccbbe4f3f4931@mail.gmail.com> Message-ID: On Nov 30, 2005, at 12:28 AM, Sean O'Halpin wrote: > I stand corrected - apologies for the FUD. No worries... this is an icky area of the language... From parrt at cs.usfca.edu Wed Nov 30 13:10:28 2005 From: parrt at cs.usfca.edu (Terence Parr) Date: Wed, 30 Nov 2005 10:10:28 -0800 Subject: [grammarians] ambiguity with method call? In-Reply-To: <34D4231C-FC3E-4CE0-BC9D-FB2E89A7A361@zenspider.com> References: <34D4231C-FC3E-4CE0-BC9D-FB2E89A7A361@zenspider.com> Message-ID: <627D473A-AAB6-4B5B-8A18-4AC908C48B62@cs.usfca.edu> On Nov 29, 2005, at 11:47 PM, Ryan Davis wrote: > > On Nov 29, 2005, at 12:59 PM, Terence Parr wrote: > >> Howdy, >> >> What does >> >> foo [1] > > sudo gem install ParseTree (!!) > > % echo "foo [1]" | parse_tree_show -f > ... > [:fcall, :foo, [:array, [:array, [:lit, 1]]]] > > without any other context to go on, it is assumed it is an fcall, > in other contexts it could be an lvar, dvar, or a regular call. > > You should also do a gem unpack ParseTree and look through the unit > tests. That might answer quite a few of your questions as well. Yep. Thanks, Ryan. Here's the thing. The impl does not specify the language definition; i've seen mistakes in C compilers. 
My point is that the language rules as described everywhere I look could easily be seen to interpret foo[1] in two ways: (foo)[1] and foo([1]) even when we know foo is a function call. how could it not? There is no syntax to distinguish. Remember that structure imparts meaning. Just because his implementation picks one doesn't mean there isn't a real ambiguity here. For example, C++ clearly identifies the ambiguity between decl and expr statements and says in the book that if some input can be either, choose decl (it's still a bad thing, but it's known and specified). Languages do have ambiguities such as the if-then-else bugaboo, but it needs to be clearly outlined rather than just biting somebody. More importantly, computer languages should be defined as deterministically/unambiguously as possible. It's already bad enough in ruby that I have to look backwards to see how it was used first lexically (particularly when there are two interpretations possible given foo=1 and def foo). Adding in this unspecified preference to resolve the ambiguity is pretty rough on a guy. It's a bug in hiding waiting to bite you 2 years hence when that code execs and gives you a runtime type exception. All that said, the point of this exercise is probably to get a real spec driven from a real grammatical description. Am I the first person to run across this ambiguity before? Ter From mental at rydia.net Wed Nov 30 14:42:49 2005 From: mental at rydia.net (mental@rydia.net) Date: Wed, 30 Nov 2005 14:42:49 -0500 Subject: [grammarians] ambiguity with method call? In-Reply-To: <627D473A-AAB6-4B5B-8A18-4AC908C48B62@cs.usfca.edu> References: <34D4231C-FC3E-4CE0-BC9D-FB2E89A7A361@zenspider.com> <627D473A-AAB6-4B5B-8A18-4AC908C48B62@cs.usfca.edu> Message-ID: <1133379769.438e00b93aafb@www.rydia.net> Quoting Terence Parr : > Here's the thing. The impl does not specify the language > definition; Well... 
currently for Ruby, matz's implementation effectively _is_ the language definition which the other implementations try to track, except where he identifies a particular thing as a bug. Today, there is no other language definition, except as a platonic ideal. > All that said, the point of this exercise is probably to get a > real spec driven from a real grammatical description. That and replacing YACC, yes. matz's implementation will continue to define the leading edge of Ruby, but at least there will be a stable spec and all of this stuff will have been dragged screaming into the sunlight. > Am I the first person to run across this ambiguity before? At this point my impression is that pretty much anyone who's seriously worked with the parser is aware of it (cf. the ParseTree unit tests I think daz referenced). I wouldn't say it's widely known, though. People who aren't writing Ruby parsers tend not to notice the ambiguity because the rule for resolving it lines up pretty well with what most intend. -mental
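The four-case table for `foo[1]` and `foo [1]` can be checked mechanically. This is a minimal sketch that assumes a Ruby 1.9 or later interpreter, where the standard library bundles `Ripper` (added after this thread) and exposes the parser's own reading of a snippet. The helper name `shape` is made up for illustration. When `foo` is unknown, Ripper reports `foo [1]` as a `:command` call whose single argument is the array `[1]`. Every other combination of method, lvar, and spacing comes back as `:aref`, i.e. `foo.[](1)`. The last two lines show the runtime side: once parsed as `[]`, the receiver alone decides what the message means.

```ruby
require "ripper"

# Node type of the last top-level statement, as Ripper parses it.
def shape(src)
  Ripper.sexp(src)[1].last.first
end

p shape("foo[1]")             # => :aref    (foo unknown, no space: foo.[](1))
p shape("foo [1]")            # => :command (foo unknown, space: foo([1]))
p shape("foo = nil\nfoo[1]")  # => :aref    (foo is a local)
p shape("foo = nil\nfoo [1]") # => :aref    (foo is a local; space irrelevant)

# At run time `[]` is an ordinary message; Proc and Array answer it differently.
p proc { |x| x * x }[2] # => 4   (Proc#[] calls the proc)
p [10, 20, 30][1]       # => 20  (Array#[] indexes, zero-based)
```

Only the lvar-versus-method decision and the space before `[` affect the parse. The type of the receiver never does.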