I'll show you the code first and explain it afterwards:
```python
import re
s = 'the answer is 42\n43 is not\nthe answer'  # illustrative sample text, one candidate line per row
for match in re.finditer('^((?!42).)*$', s, flags=re.M):
    print(match.group())  # print each line that does not contain '42'
```
For this sample string, the loop prints only '43 is not' and 'the answer'. The code matches only the lines that do not contain the string '42'.
The general idea is to match a line that doesn't contain the string '42', print it to the shell, and move on to the next line. The re.finditer(pattern, string, flags) function does this easily by returning an iterator over all match objects. The regex pattern '^((?!42).)*$' matches the whole line, from the first position '^' to the last position '$'. In between, you match an arbitrary number of characters, and the asterisk quantifier handles that for you. Which characters do you match? Only those at positions where the negative lookahead (?!42) succeeds, that is, positions where the string '42' does not begin.
The lookahead itself doesn't consume a character. So you need to consume each character manually by adding the dot metacharacter '.', which matches every character except the newline character '\n'. Finally, you need to set the re.MULTILINE flag, re.M for short. It lets the start ^ and end $ metacharacters match at the start and end of each line, not only at the start and end of the whole string. Together, these pieces form a regular expression that matches every line not containing the specific word '42'.
If you had trouble understanding the concept of lookahead (and why it doesn't consume anything), have a look at this explanation from the matching group tutorial on this blog:
Positive Lookahead (?=…)
Lookahead is a very powerful concept, and any advanced coder should know it. A friend recently told me that he had written a complicated regex that ignores the order in which two words occur in a given text. It's a challenging problem, and without lookahead the resulting code would be complicated and hard to understand. With lookahead, however, this problem becomes simple to write and read.
But first things first: how does the lookahead assertion work? In normal regular expression processing, the regex is matched from left to right. The regex engine "consumes" partially matching substrings, and no other part of the regex can match a consumed substring.
Figure: A simple example of lookahead.
The regular expression engine matches ("consumes") the string partially. Then it checks whether the remaining pattern could be matched without actually matching it. Think of the lookahead assertion as a non-consuming pattern match.
The regex engine goes from left to right, searching for the pattern. At each point, it has one "current" position and checks whether this position is the first position of the remaining match. In other words, the regex engine tries to "consume" the next character as a (partial) match of the pattern. The lookahead expression, in contrast, doesn't consume anything. It just "looks ahead" from the current position and checks whether what follows would match the lookahead pattern. If it doesn't, the regex engine cannot move on. Instead, it "backtracks", which is just a fancy way of saying that it goes back to a previous decision and tries to match something else.
Positive Lookahead Example: How to Match Two Words in Arbitrary Order?
What if you want to search a given text for pattern A AND pattern B, but in no particular order? If both patterns appear anywhere in the string, the whole string should be returned as a match. This is a bit more complicated, because any regular expression pattern is ordered from left to right.
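Here is a minimal sketch of how positive lookaheads can handle the "two words in arbitrary order" problem. It uses the common idiom of one lookahead per word, (?=.*\bword\b), anchored at the start of the string. Treat it as one possible approach, not necessarily the solution from the original tutorial. The sample words 'alpha' and 'beta', the constant BOTH_WORDS, and the helper contains_both are hypothetical names chosen for illustration. Neither lookahead consumes any characters, so both are evaluated from the same starting position and the order of the two words doesn't matter.

```python
import re

# Hypothetical stand-ins for "pattern A" and "pattern B".
# Each (?=.*\bword\b) is a positive lookahead: from the start of the string
# it checks that the word occurs somewhere ahead, without consuming anything.
# Both lookaheads therefore start at position 0, so word order doesn't matter.
# The trailing .* then consumes the line so the match covers the whole string.
BOTH_WORDS = re.compile(r'^(?=.*\balpha\b)(?=.*\bbeta\b).*$')


def contains_both(text):
    """Return the whole text if it contains both words (in any order), else None."""
    match = BOTH_WORDS.match(text)
    return match.group() if match else None


lines = [
    "alpha comes before beta",
    "beta comes before alpha",
    "only alpha here",
    "alphabet soup with beta",
]

matches = [line for line in lines if contains_both(line) is not None]
print(matches)  # ['alpha comes before beta', 'beta comes before alpha']
```

The \b word boundaries stop 'alpha' from matching inside a longer word such as 'alphabet'. Drop them if substring matches are fine. Without re.DOTALL, .* does not cross newlines, so this sketch checks one line at a time. For text that spans several lines, you would add that flag.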