Subject: Re: Pattern matching in SNOBOL4 (long, digression)
From: Ilya Zakharevich (il@math.ohio-state.edu)
Date: Apr 16, 1998 9:53:55 am
List: org.perl.perl5-porters

Moore, Paul writes:

If you know Emacs syntax tables, you know that this is mostly useless, since you need to know quoting/commenting/escaping rules too. But Perl can do BAL *now*.

I agree with your point on quoting etc (in principle). But for simplified cases (80-20 stuff where 80% of the cases are good enough) BAL can be helpful.

Can you explain how to do BAL inside an arbitrary RE? I can't think where to start, even.

See ./t/op/pat.t and find the longest and ugliest multiline pattern. OK, OK:

    sub matchit {
      m/
         (
           \(
           (?{ $c = 1 })            # Initialize
           (?:
             (?(?{ $c == 0 })       # PREVIOUS iteration was OK, stop the loop
               (?!
               )                    # Fail: will unwind one iteration back
             )
             (?:
               [^()]+               # Match a big chunk
               (?=
                 [()]
               )                    # Do not try to match subchunks
             |
               \(
               (?{ ++$c })
             |
               \)
               (?{ --$c })
             )
           )+                       # This may not match with different subblocks
         )
         (?(?{ $c != 0 })
           (?!
           )                        # Fail
         )                          # Otherwise the chunk 1 may succeed with $c>0
       /xg;
    }

As above, agreed, and great. Can you show how, please? And is it possible to package the code up as a named, reusable chunk?

Put it into $matched_parens (you may need to change () to (?:) to make it relocatable).
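A minimal sketch of that suggestion, not code from the original thread: the pattern above is stored in $matched_parens with its outer capturing group removed, then interpolated into a larger pattern. The sample string and the surrounding \w+ are illustrative assumptions, and $c is declared as a package variable so the code blocks can see it.

```perl
use strict;
use warnings;
use re 'eval';    # precaution: permits code blocks in patterns assembled at run time

our $c;           # nesting depth, shared by the code blocks below

# Same logic as matchit(), but with no capturing group of its own,
# so it does not disturb the paren numbering of the enclosing pattern.
my $matched_parens = qr/
    \( (?{ $c = 1 })                     # opening paren, depth 1
    (?:
        (?(?{ $c == 0 }) (?!) )          # balanced already: stop the loop
        (?:
            [^()]+ (?= [()] )            # a chunk without parens
          | \( (?{ ++$c })               # one level deeper
          | \) (?{ --$c })               # one level up
        )
    )+
    (?(?{ $c != 0 }) (?!) )              # fail unless fully balanced
/x;

# Hypothetical use: find "name(...)" calls with balanced arguments.
my $text  = 'print(f(1, (2)), g()); skip(';
my @calls = $text =~ /(\w+$matched_parens)/g;

print "$_\n" for @calls;
```

The trailing "skip(" is never closed, so it is not expected among the results; only the balanced call should be reported.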

I wanted to add (?. ) code for generic relocatable stuff (whatever is inside (?. ) would use a separate numbering of parens and of \17), but at that moment the endless discussion on hierarchical matches looked like it might produce something more general and simpler. Unfortunately, I no longer remember the results of that discussion.

*(EXPR) for match-time expression evaluation

??? What does it do?

Named, reusable chunks of code, is my picture of the important point.

You can put any RE into $expr now. Of course, this may need some "relocatable support" if the RE uses backreferences such as \17.
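To illustrate why a numbered backreference is not relocatable, here is a small sketch (the variable names and sample strings are made up for the example). It also shows the relative backreference \g{-1}, which is available in Perl 5.10 and later, long after this thread, and is not the (?. ) construct discussed here.

```perl
use strict;
use warnings;

# Absolute backreference: once the fragment is interpolated, \1 is
# resolved against the paren numbering of the WHOLE pattern.
my $abs = qr/(\w)\1/;

# Relative backreference: \g{-1} refers to the group opened just before it.
my $rel = qr/(\w)\g{-1}/;

my $alone_ok    = 'aa'  =~ /\A$abs\z/;       # on its own, \1 is (\w)
my $shifted_abs = 'xaa' =~ /\A(x)$abs\z/;    # here \1 is (x), not (\w)
my $shifted_rel = 'xaa' =~ /\A(x)$rel\z/;    # \g{-1} is still (\w)

printf "alone: %s, shifted absolute: %s, shifted relative: %s\n",
    map { $_ ? 'match' : 'no match' } $alone_ok, $shifted_abs, $shifted_rel;
```

The fragment with \1 changes meaning as soon as another group precedes it, which is the relocation problem mentioned above.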

POS(N) jump to position N

??? What does it do?

POS(N) is a zero-width assertion that we are currently at position N in the string. You could maybe do that with (?{...}) except for the fact that that cannot fail. Could (?{...}) be extended so that the code could return a succeed/fail indicator?

See documentation for (?(cond)blah|foo) and (?!), and the example above.
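As a rough sketch of how those two constructs combine into a POS(N)-like assertion (the names $n and $at_n and the sample string are assumptions for illustration): the code condition checks the current position, and the empty negative lookahead (?!) supplies the failure.

```perl
use strict;
use warnings;
use re 'eval';    # precaution: permits code blocks in patterns assembled at run time

our $n = 3;       # hypothetical target offset

# Zero-width: if the current position is not $n, run (?!), which always
# fails; otherwise match the empty string and carry on.
my $at_n = qr/(?(?{ pos() != $n })(?!))/;

my $s      = 'abcdef';
my ($char) = $s =~ /$at_n(\w)/;    # first word character at offset $n
my $start  = $-[0];                # offset where the match began

print "matched '$char' at offset $start\n";
```

Inside a (?{ }) block, $_ is the string being matched, so pos() with no argument reports the current match position.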

Ilya