Thursday, April 2, 2015

Ruby on Medicine: Hunting For The Gene Sequence

Previous articles in this series focused on handling very large text files. At some point, you may want to search those files for a specific pattern. Searching a large text file by hand is impractical, so today's article turns to one of the standard tools of the developer's trade.


Regular expressions


Regular expressions (regex) are built for this task. A regular expression is a specially encoded text string that describes a pattern to match, and optionally manipulate, in a body of text. The idea was formalized in the 1950s and came into everyday use through Unix text tools in the 1970s. Regular expressions are widely regarded as a key to powerful text processing.


To be more precise, a regular expression is a string that contains a combination of normal characters and special metacharacters. The normal characters are present to match themselves. On the other hand, the metacharacters represent ideas such as quantity and location of characters.
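
As a minimal sketch of that distinction, the Ruby snippet below uses a made-up DNA fragment (the `sequence` value and variable names are assumptions for illustration, not data from this series). Literal characters such as `CG` match themselves, while `{3,}` expresses quantity and `\A` expresses location.

```ruby
# Hypothetical DNA fragment, used only for illustration.
sequence = "ATGCGTTTTAGC"

# Normal characters match themselves: index of the first "CG".
literal_hit = sequence =~ /CG/

# Quantity metacharacter: a run of three or more T characters.
run_of_t = sequence[/T{3,}/]

# Location metacharacter: \A anchors the match to the start of the string.
starts_with_atg = sequence.match?(/\AATG/)

puts literal_hit      # 3
puts run_of_t         # TTTT
puts starts_with_atg  # true
```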


Regex is a small language in its own right, with its own syntax and rules. It can be used from programming languages such as Ruby to accomplish different tasks, such as:



  • Finding text that matches the pattern within a larger text (i.e. our very large text file)

  • Replacing the text matching the pattern with other text

  • Searching for the text ant, for example, but not when that text sits at the end of a word (e.g. want)
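
The three tasks above might look like this in Ruby. This is only a sketch: the sample `text` and `words` are invented, and `\B` (a non-word-boundary) is one of several ways to say "not at the end of a word".

```ruby
# Invented sample data for illustration.
text  = "I want an antelope"
words = %w[ant want antelope pantry]

# 1. Find every occurrence of the pattern in a larger text.
found = text.scan(/ant/)

# 2. Replace the matching text with other text.
replaced = text.gsub(/ant/, "ANT")

# 3. Match "ant" only when it is NOT at the end of a word.
#    \B requires that no word boundary follows the final "t".
not_at_end = words.grep(/ant\B/)

p found       # ["ant", "ant"]
p replaced    # "I wANT an ANTelope"
p not_at_end  # ["antelope", "pantry"]
```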


These are just a few of the example tasks that are possible. Such tasks can range in complexity from a simple text editor's search command to a powerful text processing language.


The bottom line is that you, as a Ruby programmer, will be armed with a very versatile tool that can be used to perform all sorts of text processing tasks.


The example today will focus on the main types of tasks regex performs: Search (locate text) and Replace (edit located text).
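
To give a rough feel for both task types on sequence data, here is a hedged sketch. The helper name `locate_motif`, the motif `GGATCC`, and the sample lines are all assumptions chosen for illustration; they are not taken from the full article. The helper reads line by line, so it could equally be handed an open `File` for a large input.

```ruby
require "stringio"

# Hypothetical helper: returns [line_number, offset] pairs for every
# match of +pattern+ found while reading +io+ one line at a time.
def locate_motif(io, pattern)
  hits = []
  io.each_line.with_index(1) do |line, line_number|
    # Inside the scan block, Regexp.last_match describes the current match.
    line.scan(pattern) { hits << [line_number, Regexp.last_match.begin(0)] }
  end
  hits
end

# Invented two-line sample standing in for a large sequence file.
sample = StringIO.new("ATGGATCC\nTTGGATCCGGATCC\n")

# Search: locate the motif.
hits = locate_motif(sample, /GGATCC/)

# Replace: edit the located text, here by lowercasing each match.
masked = "ATGGATCC".gsub(/GGATCC/) { |match| match.downcase }

p hits    # [[1, 2], [2, 2], [2, 8]]
p masked  # "ATggatcc"
```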






by A. Hasan via SitePoint
