Regular Expressions Cheat Sheet (V2)

Overview

The Regular Expressions cheat sheet is a one-page reference sheet. It is a guide to patterns in regular expressions, and is not specific to any single language.

This is the second version of the Regular Expressions cheat sheet. The previous version can be found at http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet-version-1/.

If you like the cheat sheets, and want to say thanks, please consider buying me something from my Amazon Wishlist. Thank you very much to those who have already hunted it down and sent me something - I'm very grateful!

Downloads

The Regular Expressions Cheat Sheet is released under a Creative Commons License (Attribution, Non-Commercial, Share Alike).

Please note: If you wish to link to a cheat sheet from elsewhere, please link to this page so others find all available versions, the license and the description.

What's New?

There are a few small changes from the first version of the Regular Expressions Cheat Sheet (which you can still download if you prefer). The most obvious change may be that it now looks different. Hopefully it's now clearer and a little easier to find the information you're looking for.

About This Guide

I have included a little more detail in this document where I felt it would be helpful to those less familiar with regular expressions, to demonstrate some of the items on the sheet. Please feel free to let me know if any additions would be helpful.

Please also note that not everything on this sheet will work with every language that has regular expression support. Different languages use regular expressions in different ways, and in some, support is incomplete.

Anchors

Anchors in regular expressions refer to the start and end of things, such as a string or a word. The characters and symbols in this section of the sheet represent those anchors. For example, a pattern that matches a string starting with digits might be the following, where "^" represents the start of the string.

    ^[0-9]+

Without the "^" symbol, the pattern would match any string with a digit in it.
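As a rough sketch of the difference the anchor makes, here is the same pattern with and without "^". Python's re module is assumed here (the sheet itself is not tied to any language), and the function names are made up for illustration.

```python
import re


def starts_with_digits(text: str) -> bool:
    """True when the string begins with one or more digits."""
    return re.search(r"^[0-9]+", text) is not None


def contains_digits(text: str) -> bool:
    """Same pattern without the anchor: digits anywhere will do."""
    return re.search(r"[0-9]+", text) is not None


print(starts_with_digits("2008 was a good year"))  # True
print(starts_with_digits("Version 2"))             # False
print(contains_digits("Version 2"))                # True
```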

Character Classes

Character classes in regular expressions match a selection of characters at once. For example, "\d" will match any digit from 0 to 9 inclusive. "\w" will match letters, digits and the underscore, and "\W" will match everything else. A pattern to identify a single letter, number or whitespace character could be:

    [\w\s]
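To see the shorthand classes in action, here is a small sketch using Python's re module (an assumption, as the sheet is language-neutral). The sample text is invented. Note that putting "\w" and "\s" inside square brackets gives a class that matches one character that is either a word character or whitespace.

```python
import re

# Invented sample text, for illustration only.
text = "Room 42, floor_3!"

digits = re.findall(r"\d", text)       # each single digit
words = re.findall(r"\w+", text)       # runs of letters, digits and "_"
non_words = re.findall(r"\W", text)    # everything "\w" does not match


def only_words_and_spaces(value: str) -> bool:
    """True when every character is a word character or whitespace."""
    return re.fullmatch(r"[\w\s]+", value) is not None


print(digits)
print(words)
print(non_words)
print(only_words_and_spaces("abc 123"))
print(only_words_and_spaces("abc-123"))
```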

POSIX

POSIX character classes are quite similar to the idea behind character classes, allowing you to use a shortcut to represent a particular group of characters. They are written inside a bracket expression, so "[[:digit:]]" matches a single digit. Not every regular expression engine supports them.

Assertions

Almost everyone has some trouble with assertions at first. They are tricky to get to grips with, but once you are familiar with them, you will use them alarmingly often. They provide a way to say "I want to find every word in this document with a q in it, as long as that q isn't followed by 'werty'".

    [^\s]*q(?!werty)[^\s]*

The above code starts by matching non-whitespace characters ([^\s]*), then a q (err ... q). Then the parser reaches the lookahead assertion. This makes the q conditional. The q will only be matched if the assertion is true. In this case, the assertion is a negative assertion. It will be true if what it checks for is not found.

So, it checks the next few characters against the pattern it has (werty). If they are found, the assertion is false, and so it will "ignore" the q - it will not match. If it doesn't find "werty", the assertion is true, and the q is matched. It then carries on checking for non-whitespace characters.
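The walk-through above can be tried out with a short sketch. Python's re module is assumed (negative lookahead is written the same way there), and the sample sentence and function name are invented for illustration.

```python
import re

# Words containing a "q" that is not immediately followed by "werty".
Q_NOT_WERTY = re.compile(r"[^\s]*q(?!werty)[^\s]*")


def words_with_allowed_q(text: str) -> list[str]:
    """Return every run of non-whitespace that satisfies the pattern."""
    return Q_NOT_WERTY.findall(text)


print(words_with_allowed_q("quick qwerty queen aqwerty"))
# ['quick', 'queen']
```

One thing worth noticing: a word such as "qwertyq" would still match, because its second q is not followed by "werty". The assertion is checked for each q separately.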

Sample Patterns

Finally, there is a selection of sample patterns. These patterns are intended to allow you to look at how regular expressions might be used in day-to-day work, and the various ways you can use regular expressions. Please note, however, that they will not necessarily work in every language, as each has its own idiosyncrasies and varying support for regular expressions.

Quantifiers

Quantifiers allow you to specify a part of a pattern that must be matched a certain number of times. For example, if you wanted to find out if a document contained between 10 and 20 (inclusive) of the letter "a" in a row, you could use this pattern:

    a{10,20}
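Here is a minimal sketch of that quantifier, assuming Python's re module. The helper name is hypothetical. Note that a longer run of the letter still contains a match, because the pattern simply takes the first 20.

```python
import re
from typing import Optional

RUN_OF_A = re.compile(r"a{10,20}")


def find_run(text: str) -> Optional[str]:
    """Return the first run of 10 to 20 "a" characters, or None."""
    match = RUN_OF_A.search(text)
    return match.group() if match else None


print(find_run("b" + "a" * 12 + "b"))  # the 12 a's
print(find_run("a" * 9))               # None: too few
print(len(find_run("a" * 25)))         # 20: the quantifier stops at its maximum
```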

Quantifiers are "greedy" by default. So the quantifier "+", which means "one or more", will match as many items as possible. This can be a problem on occasion, so you can tell a quantifier to not be greedy (to be "lazy"), using a modifier. Consider the following code:

    ".*"

This will match text contained in quotation marks. However, you may have a string like this:

    <a href="helloworld.htm" title="Hello World">Hello World</a>

The pattern above will match the following from the above string:

    "helloworld.htm" title="Hello World"

It has been too greedy, matching as much text as it could.

    ".*?"

The above pattern will also match any characters contained in quotation marks. The non-greedy version (note the "?" modifier) will match as little as possible of the string, so will match each item in quotation marks separately:

    "helloworld.htm"
    "Hello World"
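The greedy and lazy results above can be reproduced with a few lines. This is a sketch that assumes Python's re module; the variable names are only for illustration.

```python
import re

html = '<a href="helloworld.htm" title="Hello World">Hello World</a>'

# Greedy: ".*" runs to the last quotation mark it can find.
greedy = re.findall(r'".*"', html)

# Lazy: ".*?" stops at the first quotation mark that completes a match.
lazy = re.findall(r'".*?"', html)

print(greedy)  # ['"helloworld.htm" title="Hello World"']
print(lazy)    # ['"helloworld.htm"', '"Hello World"']
```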

Special Characters

Regular expressions use symbols to represent certain things. However, that presents a problem if you want to detect a character in a string where that character is a symbol. A period (".") for example, in a regular expression, represents "any character except the new line character". If you want to find a period in a string, you can't just use "." as a pattern - it will match just about everything. So, you need to tell the parser to treat the period as a literal period rather than a special character. This you do with an escape character.

An escape character (a backslash, "\", in most languages) precedes the special character and tells the parser to treat that character literally rather than as a symbol. There are certain characters that will need to be escaped in the majority of patterns and languages, and you can find these characters listed at the bottom right of the cheat sheet.

The pattern to match a period is:

    \.
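To show the difference the escape makes, here is a short sketch assuming Python's re module. The re.escape helper is part of that module and escapes special characters for you; the sample strings are invented.

```python
import re

# Unescaped, "." matches every character in the string.
any_char = re.findall(r".", "a.b")

# Escaped, it only matches the literal period.
literal_dot = re.findall(r"\.", "a.b")

# re.escape builds a pattern that matches the given text literally.
filename = re.escape("helloworld.htm")
exact = re.fullmatch(filename, "helloworld.htm") is not None
sloppy = re.fullmatch(filename, "helloworldxhtm") is not None

print(any_char)     # ['a', '.', 'b']
print(literal_dot)  # ['.']
print(exact, sloppy)
```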

Other special characters in regular expressions represent unusual elements in text. New lines and tabs, for example, can be typed using a keyboard, but are likely to trip up programming languages. The special characters use the escape character as well, to tell the regular expression parser that the following character is to be treated as a special character rather than a normal letter or number.

String Replacement

String replacement is covered in more detail in the "Groups and Ranges" section below. One small point to note, however, is the existence of "passive" (non-capturing) groups, written "(?:...)". These groups are not captured, so they are ignored for the purposes of replacement. This is very useful when you want to match something that requires an "or" section, but don't want it in the replacement.
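A small sketch of a passive group in a replacement follows. It assumes Python's re module, where groups are referenced in the replacement as "\1" rather than "$1". The pattern, names and sample text are invented for illustration.

```python
import re

# The title needs an "or", but should not take up a group number.
TITLE_AND_NAME = re.compile(r"(?:Mr|Ms)\. (\w+) (\w+)")


def surname_first(text: str) -> str:
    """Drop the title and swap the two captured names."""
    return TITLE_AND_NAME.sub(r"\2, \1", text)


print(TITLE_AND_NAME.groups)              # 2: the passive group is not counted
print(surname_first("Ms. Ada Lovelace"))  # Lovelace, Ada
```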

Groups and Ranges

Groups and ranges are very useful. Ranges are perhaps the easiest place to begin. They allow you to specify a selection of characters to match. For example, if you wanted to see if a string contained hexadecimal characters (zero to nine and a to f), you would use this range:

    [A-Fa-f0-9]

If you wanted to find characters that are not hexadecimal, you would use a negative range, which in this case will match any character that isn't zero to nine or a to f.

    [^A-Fa-f0-9]
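The two ranges can be put to work like this. It is only a sketch, assuming Python's re module, with hypothetical helper names.

```python
import re
from typing import Optional


def is_hex(text: str) -> bool:
    """True when every character falls inside the hexadecimal range."""
    return re.fullmatch(r"[A-Fa-f0-9]+", text) is not None


def first_non_hex(text: str) -> Optional[str]:
    """Return the first character outside the range, or None."""
    match = re.search(r"[^A-Fa-f0-9]", text)
    return match.group() if match else None


print(is_hex("ff00Aa"))         # True
print(is_hex("ff00zz"))         # False
print(first_non_hex("ff00zz"))  # z
```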

Groups are essential to regular expressions, and are most often used when you want to use "or" in a pattern, or you want to reference part of a pattern later in the same pattern, or when using regular expression string replacement.

To use "or" is very simple - the following will match "ab" or "bc":

    (ab|bc)

If you want to reference a previous group in a regular expression, you would use "\n", where "n" is the number of the group. You might need a pattern to match "aaa" or "bbb", followed by numbers, followed by the same 3 letters, and this would be done with groups, like so:

    (aaa|bbb)[0-9]+\1

The above matches "aaa or bbb", and groups the match with the brackets. This is followed by a pattern for one or more numbers ("[0-9]+"), then finally "\1". The "\1" backreferences the first group, and looks for the same thing. It will match the matched text from the string, not the pattern, so "aaa123bbb" will not match the above pattern, as the "\1" will be looking for "aaa" to follow the numbers.
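A quick way to convince yourself of this is to run the pattern against both kinds of string. The sketch below assumes Python's re module; the function name is made up.

```python
import re

SAME_LETTERS = re.compile(r"(aaa|bbb)[0-9]+\1")


def letters_repeat(text: str) -> bool:
    """True when the text is aaa or bbb, then digits, then the same letters."""
    return SAME_LETTERS.fullmatch(text) is not None


print(letters_repeat("aaa123aaa"))  # True
print(letters_repeat("bbb7bbb"))    # True
print(letters_repeat("aaa123bbb"))  # False: "\1" is looking for "aaa"
```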

String replacement is one of the most useful tools of regular expressions. You can use "$n" to reference groups matched with the pattern when replacing text. Let's say you want to make every instance of the word "wish" bold in a block of text. You would use a regular expression replacement function for this, which might look a little like this:

    replace(pattern, replacement, subject)

The pattern is first, and would be something like the following (you would need a few extra characters for this specific function).

    ([^A-Za-z0-9])(wish)([^A-Za-z0-9])

This will find any instance of the word wish where it is preceded and followed by any non-alphanumeric character.

Your replacement can then be:

    $1<b>$2</b>$3

This replacement will replace the whole pattern matched above. We start with the first character matched above ($1) (the first non-alphanumeric one), otherwise we'll be deleting characters from the block of text. The same applies at the end ($3) of the match. In the middle, we add the HTML tags for bold text (though you should use CSS or <strong>, of course), with the second group matched in the pattern ($2).
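Here is roughly how that replacement might look in practice. The sketch assumes Python's re module, where the generic "replace" function is re.sub and groups in the replacement are written "\1" rather than "$1". The sample sentence and function name are invented.

```python
import re

WISH = re.compile(r"([^A-Za-z0-9])(wish)([^A-Za-z0-9])")


def embolden_wish(text: str) -> str:
    """Wrap each standalone "wish" in <b> tags, keeping its neighbours."""
    return WISH.sub(r"\1<b>\2</b>\3", text)


print(embolden_wish("I wish you would wish harder, wishful thinker."))
# I <b>wish</b> you would <b>wish</b> harder, wishful thinker.
```

Because the pattern needs a character on each side, a "wish" at the very start or end of the text is left alone, as is the second of two occurrences that share a single separating character. A word boundary ("\b") is one way around that in languages that support it.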

Pattern Modifiers

Pattern modifiers are used in several languages, most notably Perl. These allow you to change how the parser works. For example, the "i" modifier will tell the parser to ignore case.

In Perl, regular expressions are wrapped in delimiters: the same character at the beginning and end. This can be almost any character (often "/"), and is used like so:

    /pattern/

Modifiers would be added at the end of this, like so:

    /pattern/i
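Not every language uses the delimiter-and-suffix form. As a sketch of the same idea elsewhere, Python's re module (assumed here) takes modifiers as flags, or as an inline "(?i)" at the start of the pattern. The sample text is invented.

```python
import re

text = "A PATTERN here"

# No modifier: the search is case-sensitive and finds nothing.
case_sensitive = re.search(r"pattern", text)

# The equivalent of the "i" modifier, passed as a flag.
with_flag = re.search(r"pattern", text, re.IGNORECASE)

# The same modifier written inside the pattern itself.
inline = re.search(r"(?i)pattern", text)

print(case_sensitive)     # None
print(with_flag.group())  # PATTERN
print(inline.group())     # PATTERN
```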

Metacharacters

Finally, the last section of the cheat sheet lists the meta-characters. These are the characters that have special meaning in regular expressions, so if you want to use them literally, they must be escaped.

So, if you wanted to match text consisting of an opening bracket, you would need to use the following pattern:

    \(
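The sketch below shows what happens with and without the escape, assuming Python's re module. The helper names and sample text are hypothetical. An unescaped "(" opens a group that is never closed, so the pattern is rejected outright.

```python
import re


def is_valid_pattern(pattern: str) -> bool:
    """True when the regular expression compiles without error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def bracketed(text: str) -> list[str]:
    """Return whatever sits between each literal "(" and ")"."""
    return re.findall(r"\(([^)]*)\)", text)


print(is_valid_pattern("("))     # False: an unclosed group
print(is_valid_pattern(r"\("))   # True: a literal bracket
print(bracketed("f(x) + g(y)"))  # ['x', 'y']
```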

17 comments

Excellent. Thank you. This will help me a lot. Bookmarked and I will link to this in my blog. I always keep on forgetting some things of regex, as I only need them occasionally.
thanks you for this update, the first one wasnt good enough and i think this is much better :)
This is the only organized, very well at that, regex explanation I have ever seen. I feel like I can actually start using them more often now because now I know what I am looking at! Thanks!
 United States #4: July 25, 2008
I am a novice programmer, and love the potential of regular expressions yet hate the nuances of implementation. I can't wait until I work on my next project so I can make use of this excellent cheat sheet. Thanks for the work which I am sure has gone in to each of the great cheat sheets on this site!
Thomas Knowles
United Kingdom #5: August 4, 2008
Thank you, you really have assisted me in my work.
 United States #6: August 5, 2008
Let me second what Billy posted above. While I'm not a novice programmer, I am a newbie when it comes to regular expressions and, like Billy, I am easily confused by the nuances of implementation. Now, I will keep your regular expression cheat sheet next to my laptop to use as a quick reference guide!
Excellent, thanks!
excellent sheet. thanks
Great! I just can say that about your blog. It's even more great! Thank you so much. I'm downloading your sheets and very like them!
Thanks a lot, mate! :)
Nathan Mahon
United States #11: September 16, 2008
Some of the sample patterns could be simplified...
images: (\S+\.(gif|jpg|png)$)
1-50: (^([1-9]|[1-4][0-9]|50)$)
Hex: (^#?[A-Fa-f0-9]{3}([A-Fa-f0-9]{3})?$) without the ^ and $ it'd match anything with 3 consecutive valid characters anywhere, like #adhdaa...
Email can have numbers and hyphens in the domain, and underscores are invalid, but to allow be specific to match all of the legitimate email addresses is beyond a simple example. see http://www.regular-expressions.info/email.html ... :)
Thank you VERY much! It's very usefull sheet!! Perfect work!
Josh
United Kingdom #13: October 12, 2008
Suggestion: perhaps a one-click link to the image so that we can skip the "you are downloading a file, well done" page? Or perhaps send the MIME (assuming that's the problem here) header so that the download could be opened in the browser instead of a "what do you want to do with it" dialogue.

It seems the new site has taken a step back in this regard :/

Thanks for the useful cheat sheet though!
Thanks for such a nice sheet.
\w{3}
Earth #15: 3 days ago
I have a comment about the regular expression to match 1-50 digits. Wouldn't the following expression be better?

^[1-9][0-9]{0,49}$/

Or even better, with \d

^\d\d{0,49}$

The {x,y} notation is very useful
\w{3}
Earth #16: 3 days ago
Sorry, I forgot something in the last expression, it should have been

^[1-9]\d{0,49}$
hanbiaoo
China #17: 14 hours ago
Perfect work! Thank you VERY much! It's very usefull sheet!!
