Regular Expressions – a Powerful Mechanism for Search & Replace Operations and Input Validation
Posted on 2015-07-13 in the RP Photonics Software News (available as e-mail newsletter!)
Permanent link: https://www.rp-photonics.com/software_news_2015_07_13.html
Author: Dr. Rüdiger Paschotta, RP Photonics Consulting GmbH
Abstract: Regular expressions are very powerful tools for searching in text files, for replacing text and for checking whether inputs are in accordance with certain standards. Although it is not easy to learn the full functionality of this mechanism, it can be very worthwhile to learn at least the basics. Software from RP Photonics also supports regular expressions, which greatly facilitates tasks such as the processing of input data files.
More or less every computer user knows that one can often select some items with a search mask containing wildcard characters like the asterisk (*) or the question mark (?). For example, in Windows Explorer there is a search field where one could enter *.fpw in order to find only those files having the filename extension .fpw.
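As a small illustration of such a wildcard mask, the following Python sketch filters a list of file names with the pattern *.fpw. The file names are made up for this example, and Python's standard fnmatch module only stands in for the search field of a file manager.

```python
import fnmatch

# Hypothetical file names, invented for this illustration.
names = ["amplifier.fpw", "notes.txt", "laser.fpw", "data.csv"]

# Keep only the names matching the wildcard mask "*.fpw".
fpw_files = fnmatch.filter(names, "*.fpw")
print(fpw_files)
```

Such masks only know a few placeholders (here * for any sequence of characters), which is why they cannot express more specific conditions.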
Only a few people, however, know that some software products support a vastly more powerful mechanism for defining a search operation: regular expressions. For example, I am using the nice text editor EditPad Pro, which supports regular expressions for search & replace operations. There, you can not only search and replace literal text, but also perform more complex operations. A simple example: in an HTML file, you may want to add a class attribute to all links which do not already have one. A link like
<a href="https://www.rp-photonics.com/">RP Photonics</a>
may be replaced with:
<a href="https://www.rp-photonics.com/" class="redlink">RP Photonics</a>
As the link texts and link targets can vary, you need a flexible way of telling the software how to identify a link and its parts, and how to reassemble the parts in order to get the actual replace text. With a regular expression, you could easily restrict the operation to those links where the target belongs to the domain rp-photonics.com.
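A minimal sketch of such a replacement, written in Python with its standard re module rather than in EditPad Pro: the function name add_class and the exact pattern are assumptions made for this illustration. The pattern is only meant for simple, regularly formatted links and is not a full HTML parser.

```python
import re

# Opening <a ...> tag whose href points to rp-photonics.com and which
# does not yet contain a class attribute (negative lookahead).
LINK_RE = re.compile(
    r'<a\b(?![^>]*\bclass\s*=)'
    r'([^>]*\bhref="https?://(?:www\.)?rp-photonics\.com(?:/[^"]*)?"[^>]*)>',
    re.IGNORECASE,
)

def add_class(html: str, css_class: str = "redlink") -> str:
    """Append class="..." to matching links; leave all other text unchanged."""
    # \1 re-inserts the captured attributes of the original tag.
    return LINK_RE.sub(rf'<a\1 class="{css_class}">', html)

print(add_class('<a href="https://www.rp-photonics.com/">RP Photonics</a>'))
```

The parentheses capture the existing attributes so that the replacement text can reassemble them, and the domain name in the pattern restricts the operation to links pointing to rp-photonics.com.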
Another example is finding e-mail addresses in a document, or checking whether an address entered by a user is a valid e-mail address. A simple regular expression (in short, a regex) for that purpose:
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b
(A far more detailed version, not shown here, can fully implement the detailed official rules according to RFC 2822.)
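The simple regex above can be tried out, for example, in Python. The following sketch uses it both for finding addresses in a text and for validating a single input; the function names and the sample addresses are assumptions made for this illustration.

```python
import re

# The simple e-mail regex from the text (not the full RFC 2822 rules).
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b")

def find_emails(text: str) -> list[str]:
    """Return all substrings of the text which look like e-mail addresses."""
    return EMAIL_RE.findall(text)

def is_valid_email(candidate: str) -> bool:
    """Check whether the whole input string matches the pattern."""
    return EMAIL_RE.fullmatch(candidate) is not None

print(find_emails("Contact info@example.com or sales@example.org."))
print(is_valid_email("user@host"))
```

Note that this simple pattern rejects top-level domains with more than six letters, so it is a pragmatic check rather than a complete one.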
How to Learn That?
The challenge is that it takes a while to learn the use of regular expressions, and beginners often struggle a lot until they manage to compose such expressions correctly. Nevertheless, it can be very worthwhile to invest a couple of hours in learning them, since you can then solve a lot of problems which would normally require writing specialized programs (or alternatively a huge amount of manual work). And once you have learned the basics, you will quickly stumble across additional nice applications.
The principles of regular expressions are explained on various websites. A good example is http://www.regular-expressions.info/tutorialcnt.html. If you want to get started seriously, it could be very useful to get the utility RegExBuddy – a nice assistant, helping you to compose and test regular expressions.
Where Can You Use Regular Expressions?
Of course, you can use regular expressions only in software which supports them. In addition, you need to know about that support; for example, I assume that only a few users of Microsoft Word are aware that the advanced find & replace dialog box has the option “Use wildcards”, which switches on support for something like regular expressions. Unfortunately, in that case it is only a subset, lacking essential features.
Some advanced text editors like the above-mentioned EditPad Pro offer full support of regular expressions; in that editor, it is enabled with the “Regex” option below the search and replace fields.
The same supplier offers the truly amazing search & replace tool PowerGREP. With that, one can perform complex search-and-replace operations on multiple files. I have rarely been so impressed by a software tool.
Since May 2013, the script language of our software products RP Fiber Power, RP Resonator, RP ProPulse and RP Coating also supports regular expressions:
- The function matches_re() can check whether a string (which may, for example, contain everything from a whole text file) contains certain items, and it can also store found matches in an array.
- The function replace_re() is used for search-and-replace operations. For example, one could find all links in a downloaded HTML file and insert or remove an attribute such as nofollow, similar to the class attribute example shown above.
- The function split_re() can split a character string into substrings, where the separator is given as a regular expression. For example, the separator could be defined to be a comma, followed by any number of blank characters.
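The three kinds of operations can be illustrated with Python's standard re module. This is only an analogy: the exact calling conventions of matches_re(), replace_re() and split_re() in the RP Photonics script language are not shown here, and all sample strings are made up.

```python
import re

# Searching: collect all numbers found in a string in a list.
numbers = re.findall(r"\d+(?:\.\d+)?", "core 8.2 um, NA 0.14")

# Replacing: remove a nofollow attribute from a link.
link = '<a href="page.html" rel="nofollow">text</a>'
cleaned = re.sub(r'\s+rel="nofollow"', "", link)

# Splitting: the separator is a comma, followed by any number of blanks.
fields = re.split(r", *", "FiberA,   8.2,0.14,  1550")

print(numbers)
print(cleaned)
print(fields)
```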
These functions allow one to perform rather advanced operations with only a small amount of script code. As an example, imagine that you have a text file which defines waveguide parameters of multiple fibers. Each line would contain the fiber's name, its core diameter, the numerical aperture and possibly other data. A script for RP Fiber Power could use a regular expression to extract all these data from the line corresponding to a particular fiber name, identified by a string variable in the script. You would not have to deal with awkward programming, e.g. searching for delimiters, cutting out substrings, etc. Similarly, all sorts of other input data could be conveniently read from text files, for example CSV files produced by laboratory instruments such as an optical spectrometer.
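The fiber data example could look roughly as follows. The sketch is written in Python rather than in the script language of RP Fiber Power, and the file layout (comma-separated values), the fiber names, the parameter values and the function name fiber_params are all assumptions made for this illustration.

```python
import re

# Made-up sample data: name, core diameter in micrometers, numerical aperture,
# possibly followed by other data.
SAMPLE = """\
# name, core diameter (um), NA
FiberA, 8.2, 0.14
FiberB, 25, 0.065, double-clad
"""

def fiber_params(text: str, name: str):
    """Return (core diameter, NA) for the given fiber name, or None if not found."""
    pattern = re.compile(
        # Line starting with the literal fiber name, then two numeric fields.
        rf"^{re.escape(name)} *, *([0-9.]+) *, *([0-9.]+)",
        re.MULTILINE,
    )
    m = pattern.search(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

print(fiber_params(SAMPLE, "FiberB"))
```

The fiber name comes from a string variable and is escaped before it is inserted into the pattern, so that special characters in a name are taken literally. A single expression then replaces the manual search for delimiters and the cutting out of substrings.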
Obviously, regular expressions can enormously simplify life when one has to solve certain problems which occur quite often in practice. Therefore, they further enhance the great flexibility of simulation software which supports a powerful script language. Software having only a graphical user interface may make it easy to get started, but can never provide the amount of flexibility which is often required in real life.
This article is a posting of the RP Photonics Software News, authored by Dr. Rüdiger Paschotta. You may link to this page, because its location is permanent.
Note that you can also receive the articles in the form of a newsletter or with an RSS feed.