Regular expression

A regular expression is a convenient tool for extracting text. Have you ever been frustrated trying to learn them? For more than 10 years I could not use RegExp properly; I tried many books and gave up partway through each time. But here is the thing: with http://rubular.com you can be using them in just 10 minutes. The trick, after all, is TDD: write the test string first, then adjust the pattern until it extracts what you want.


Rubular screen

Look at the Rubular screen. Enter the regular expression in the field at the top and the test string in the box at the lower left. The extraction result then appears on the right.

First, enter the following as the test string:

#+qiita_id: daddygongon

Now let's extract daddygongon. Parentheses capture whatever they enclose, so start with

(.+)

This captures the text. Here '.' matches any single character and '+' means one or more of the preceding element, so at this point the whole line is captured. The next step is to strip away the unwanted characters in front. With a little trial and error,

:\s*(.+)

Did you manage to extract just daddygongon?

To use this in Ruby, write one of the following:

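line = "#+qiita_id: daddygongon"

# =~ sets $1 to the first capture when the match succeeds
line =~ /:\s*(.+)/
p $1

# scan returns an array of capture arrays; it is empty (not nil) on no match
res = line.scan(/:\s*(.+)/)
p res[0] unless res.empty?

# match returns a MatchData object, or nil when nothing matches
m = line.match(/:\s*(.+)/)
p m[1] if m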

And so on. Each form has its own quirks, but when it works, it works. And when it doesn't work...
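The quirks mostly show up in what each form returns, especially when nothing matches. The following is a minimal sketch that compares them side by side; the constant PATTERN, the method name capture_three_ways, and the second test string are made up for this illustration.

```ruby
PATTERN = /:\s*(.+)/

# Returns what each of the three idioms yields for the same line,
# so the return values can be compared side by side.
def capture_three_ways(line)
  index   = (line =~ PATTERN)    # Integer offset of the match, or nil
  by_op   = index ? $1 : nil     # $1 is only meaningful after a successful match
  scanned = line.scan(PATTERN)   # array of capture arrays, [] when nothing matches
  m       = line.match(PATTERN)  # MatchData object, or nil
  { index: index, by_op: by_op, by_scan: scanned.dig(0, 0), by_match: m && m[1] }
end

p capture_three_ways("#+qiita_id: daddygongon")
p capture_three_ways("no colon here")
```

For the first line all three forms give "daddygongon"; for the second, every value in the hash is nil. Note that scan itself returns an empty array rather than nil, which is why a plain nil check on its result is not enough.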

It's easy to think that you have to memorize a lot of regular expressions, but they are built from only the elements summarized in the Regex quick reference at the bottom of the Rubular screen. Broadly classified, these are range specifiers (upper left column), position specifiers (lower left column), character specifiers (middle column), and quantity specifiers (right column). In addition, parentheses () mark the part to be extracted. All you have to do is proceed by trial and error while looking at that cheat sheet. How about that for TDD?
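The same trial and error can be written down as a tiny table of tests. This is only a sketch: the patterns, test strings, and the helper first_capture are invented here, with one row per kind of element mentioned above.

```ruby
# Each row: [description, pattern, test string, expected first capture]
CASES = [
  ['range specifier [a-z]',    /([a-z]+)/,   "123abc456",         "abc"],
  ['position specifier ^',     /^(\w+)/,     "Start CharMetrics", "Start"],
  ['character specifier \d',   /WX\s+(\d+)/, "C 33 ; WX 278 ;",   "278"],
  ['quantity specifier {3}',   /(\d{3})/,    "12 3456",           "345"],
]

# Returns the first captured group, or nil when the pattern does not match.
def first_capture(pattern, string)
  m = string.match(pattern)
  m && m[1]
end

CASES.each do |desc, pattern, string, expected|
  actual = first_capture(pattern, string)
  status = actual == expected ? "ok" : "NG"
  puts "#{status}: #{desc} => #{actual.inspect}"
end
```

When a row reports NG, adjust the pattern and run it again, just as you would on the Rubular screen.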

Extract data from files

There are dedicated libraries for reading commonly used file formats such as yaml and json. They are easy to use, and a little googling gets you going right away. The catch is that the data has to be in that format already. So here I would like to introduce a more general method.
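For comparison, here is roughly what the dedicated-library route looks like with Ruby's standard json and yaml libraries. The two sample strings are made up for illustration.

```ruby
require "json"
require "yaml"

# Hypothetical sample documents; both hold the same key and value.
JSON_TEXT = '{"qiita_id": "daddygongon"}'
YAML_TEXT = "qiita_id: daddygongon\n"

# Both parsers return a plain Hash here, so no regular expression is needed.
from_json = JSON.parse(JSON_TEXT)
from_yaml = YAML.safe_load(YAML_TEXT)

p from_json["qiita_id"]
p from_yaml["qiita_id"]
```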

It is convenient when a fairly involved syntax is repeated many times [^ RubyBestPractice]. A common case is a block delimited by keywords, as shown below.

StartCharMetrics 315
C 32 ; WX 278 ; N space ; B 0 0 0 0 ;
C 33 ; WX 278 ; N exclam ; B 90 0 187 718 ;
C 34 ; WX 355 ; N quotedbl ; B 70 463 285 718 ;
C 35 ; WX 556 ; N numbersign ; B 28 0 529 688 ;
C 36 ; WX 556 ; N dollar ; B 32 -115 520 775 ;
....
EndCharMetrics

Some blocks may have no End keyword.

In such cases, a convenient approach is to extract the data while keeping track of whether the current line is inside or outside a block, as follows.

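section = []   # stack of the blocks we are currently inside
File.foreach(file_name) do |line|
  case line
  when /^Start(\w+)/
    section.push $1   # entering a block, e.g. "CharMetrics"
    next
  when /^End(\w+)/
    section.pop       # leaving the innermost block
    next
  end
  # lines that reach this point are data lines, handled according to section
end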

The current state (inside "FontMetrics", or some other block) is kept in section, and the lines are processed one after another.

The part that reads the data looks like this:

case section
when ["FontMetrics", "CharMetrics"]
  next unless line =~ /^CH?\s/

  name                  = line[/\bN\s+(\.?\w+)\s*;/, 1]
  @glyph_widths[name]   = line[/\bWX\s+(\d+)\s*;/, 1].to_i
  @bounding_boxes[name] = line[/\bB\s+([^;]+);/, 1].to_s.rstrip
when ["FontMetrics", "KernData", "KernPairs"]
  next unless line =~ /^KPX\s+(\.?\w+)\s+(\.?\w+)\s+(-?\d+)/
  @kern_pairs[[$1, $2]] = $3.to_i
when ["FontMetrics", "KernData", "TrackKern"], ["FontMetrics", "Composites"]
  next
else
  parse_generic_afm_attribute(line)
end

In this way, what gets read is split into cases according to the current section.

This uses the automaton, or finite state machine, that comes up in compiler classes. In effect, you are writing a simple parser.
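Putting the two pieces together, a cut-down version can be run on a string instead of a file. This is a sketch under some assumptions: the method name parse_widths is made up, only the glyph widths are read, and the StartFontMetrics, FontName, and EndFontMetrics lines in the sample are added here so that the section stack matches ["FontMetrics", "CharMetrics"].

```ruby
# Sample text; the C ... lines are taken from the excerpt above,
# the surrounding FontMetrics lines are assumed for this illustration.
SAMPLE = <<~AFM
  StartFontMetrics 4.1
  FontName Helvetica
  StartCharMetrics 3
  C 32 ; WX 278 ; N space ; B 0 0 0 0 ;
  C 33 ; WX 278 ; N exclam ; B 90 0 187 718 ;
  C 34 ; WX 355 ; N quotedbl ; B 70 463 285 718 ;
  EndCharMetrics
  EndFontMetrics
AFM

# Returns a Hash of glyph name => width, read only from CharMetrics lines.
def parse_widths(text)
  section = []   # the state: stack of blocks we are currently inside
  widths  = {}
  text.each_line do |line|
    # State transitions: Start... pushes a block, End... pops it.
    case line
    when /^Start(\w+)/
      section.push $1
      next
    when /^End(\w+)/
      section.pop
      next
    end

    # Data lines are interpreted according to the current state.
    if section == ["FontMetrics", "CharMetrics"]
      name = line[/\bN\s+(\.?\w+)\s*;/, 1]
      widths[name] = line[/\bWX\s+(\d+)\s*;/, 1].to_i if name
    end
  end
  widths
end

p parse_widths(SAMPLE)
```

The FontName line is skipped because at that point section is only ["FontMetrics"], which is exactly the inside-or-outside judgment described above.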

References

[^ RubyBestPractice]: Gregory T Brown, Ruby Best Practices, O'Reilly. Japanese edition: Ruby Best Practices-Professional Code and Techniques, Gregory Brown (Author), Masayoshi Takahashi (Translation), Takashi Sasai (Translation), O'Reilly Japan (2010/3/26).

source ~/git_hub/ruby_docs/chart_style_ruby/c05_rubular.org

Chart type ruby-appendix-V (rubular)