Regex examples

A regular expression, regex or regexp is a sequence of characters that defines a search pattern. Usually this pattern is then used by string searching algorithms for "find" or "find and replace" operations on strings, or for input validation. - Wikipedia

- Translator - regextranslator.com
- Test expressions - regextester.com
- Cheat sheet - rexegg.com
- PCRE manual - pcre.org
- Library - regexlib.com

# Issues

Issue: I don't seem to be able to pipe regular expressions any more - stackoverflow
This may be related to a LiveCode bug where matchChunk returns empty instead of true or false and does nothing.

# Examples

The following code snippets were extracted from handlers in various libraries. They need testing and organising for quality:

# Regex chunks

put "([^<]*)" into notOpeningBracket
put "([^>]*)" into notClosingBracket
put "([^" & quote & "'" & "]*)" into notQuote
put "(?Uim)" into ungreedyMultiReg
put "['" & quote & "]" into anyQuote
put "\s+" into someSpace
put "<\s*/\s*" & tagName & "\s*>" into closingTagReg

# Tags

get "(<\s*" & tagName & "\s[^>]*>)"

Some more...

get "(" & ">" & "|" & "\s+.*>" & ")"

-- openingReg
get "(?im)(<\s*" & tagName & "\s+[^>]*\s*>)"
get "(?Uim)(<\s*" & tagName & someEnding & ")"

-- closingReg
get "(?Uim)(<\s*/" & tagName & "\s*>)"

-- everything between two XML tags
<primaryAddress>[\s\S]*?<\/primaryAddress>

- http://regexlib.com/REDetails.aspx?regexp_id=2762
- http://regexlib.com/REDetails.aspx?regexp_id=2301
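To check the "everything between two XML tags" pattern outside LiveCode, here is a minimal sketch in Python. Python's `re` module is only a stand-in for LiveCode's PCRE engine, and the sample XML and the name `between_tags` are made up for illustration.

```python
import re

# Pattern from the notes above. The lazy [\s\S]*? spans newlines
# without needing a "dot matches newline" flag.
BETWEEN = re.compile(r"<primaryAddress>[\s\S]*?<\/primaryAddress>")


def between_tags(xml_text):
    """Return every <primaryAddress>...</primaryAddress> block, tags included."""
    return BETWEEN.findall(xml_text)


# Hypothetical sample data: one multi-line element and one single-line element.
sample = (
    "<person>\n"
    "<primaryAddress>\n  1 Main St\n</primaryAddress>\n"
    "<primaryAddress>2 Side St</primaryAddress>\n"
    "</person>"
)

for block in between_tags(sample):
    print(repr(block))
```

Because the quantifier is lazy, each element is returned separately instead of one match running from the first opening tag to the last closing tag.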

# Quoted

get "(" & quote & "[^" & quote & "]*" & quote & ")"
get matchchunk (sText, it, sChar, eChar)

-- regular expression for any quoted text
put kwote ("([^" & quote & "]|\n)*") into sReg

-- double quoted strings on multiple lines
"([^"]|\n)*"
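The quoted-text pattern can be tried out with a small Python sketch. It assumes that matchChunk reports the first capture group as 1-based, inclusive character offsets (which is how `char rStart to rEnd` is used elsewhere in these notes). The helper `match_chunk` is hypothetical and only imitates that behaviour.

```python
import re

# Same pattern as the first "get" above: a double quote, anything that is
# not a double quote, then a closing double quote, all in one capture group.
QUOTED = re.compile(r'("[^"]*")')


def match_chunk(text, regex):
    """Return (start, end) of group 1 as 1-based inclusive offsets, or None."""
    m = regex.search(text)
    if m is None:
        return None
    # Python offsets are 0-based and end-exclusive, so only the start shifts.
    return m.start(1) + 1, m.end(1)


text = 'say "hello world" now'
offsets = match_chunk(text, QUOTED)
print(offsets)
if offsets:
    s_char, e_char = offsets
    print(text[s_char - 1:e_char])
```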

# HTML

-- html_ConstructColouredText
get "<font color=" & quote & "([^\>]+)" & quote & ">"

-- html_RevToColourSpan
get "(?miU)(<font color=').*(</font>)"

-- html_RevToBoldSpan
get "(?miU)(<b>).*(</b>)"

# HTML Tags and Comments

Matches any HTML tag with any parameters and HTML Comments - regexlib.com

<!*[^<>]*>

# Find HTML Link Tags

This pattern matches link tags in HTML and returns the contents of the href attribute and the text of the link - regexlib.com

-- regExp
^<a[^>]*(http://[^"]*)[^>]*>([ 0-9a-zA-Z]+)</a>$

-- Matches <a href="http://www.google.com">Google</a>
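A quick way to see what the link pattern captures is the Python sketch below. Python's `re` is a stand-in for PCRE, and the function name `parse_link` is made up for illustration.

```python
import re

# Pattern copied from the notes: group 1 is the href URL, group 2 the link text.
LINK = re.compile(r'^<a[^>]*(http://[^"]*)[^>]*>([ 0-9a-zA-Z]+)</a>$')


def parse_link(tag):
    """Return (url, text) for a single <a> tag, or None if it does not match."""
    m = LINK.match(tag)
    return m.groups() if m else None


print(parse_link('<a href="http://www.google.com">Google</a>'))
```

Note that the pattern is strict: the tag must be the whole string, the URL must start with `http://`, and the link text may contain only letters, digits and spaces.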

# html_ExtractAllLinks

get "href[\s]*=[\s]*'([^\n']*)'"
replace "'" with quote in it

# html_ExtractLinkNumArray

Requires each link to be on a new line. The regular expressions themselves do not require that, and if you use XML handlers you get one big single line, so the handler could be made more robust by deleting the matched text at the end of each repeat.

<li><a href="Tower" title="Victoria Tower">Victoria Tower</a></li>
<li><a href="Aden" title="Big Ben Aden">Big Ben Aden</a></li>

-- U is for non-greedy
get "(?miU)(<span style='color:).*(</span>)"
get "(?Uim)(<\s*" & tagName & someEnding & ")"
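To illustrate that the href expression itself does not need each link on its own line, here is a minimal Python sketch. It uses the html_ExtractAllLinks pattern with the single quotes already swapped for double quotes, and runs it over the two sample list items joined onto one line. The name `extract_links` is hypothetical.

```python
import re

# html_ExtractAllLinks pattern after replacing ' with " in the expression.
HREF = re.compile(r'href[\s]*=[\s]*"([^\n"]*)"')


def extract_links(html):
    """Return the href value of every link found, in document order."""
    return HREF.findall(html)


# Both sample links on a single line, as an XML handler might return them.
html = (
    '<li><a href="Tower" title="Victoria Tower">Victoria Tower</a></li> '
    '<li><a href="Aden" title="Big Ben Aden">Big Ben Aden</a></li>'
)
print(extract_links(html))
```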

# mediawiki_SetTemplateOffsets

put "[^\}]+" into untilCurlyBracket
put "(\{\{" & tName & untilCurlyBracket & "\}\})" into someReg
put "(?m)" before someReg -- multiline search
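The way the template expression is assembled can be sketched in Python as follows. This is an illustration only: `template_regex` is a made-up name, the sample wikitext is invented, and `re.escape` is an extra precaution that the LiveCode snippet does not take.

```python
import re


def template_regex(t_name):
    """Build a regex matching {{t_name ...}} up to the first closing brace."""
    until_curly = r"[^\}]+"  # one or more characters that are not "}"
    return re.compile(r"(?m)(\{\{" + re.escape(t_name) + until_curly + r"\}\})")


text = "Intro {{Infobox city\n| name = Berlin\n}} rest"
m = template_regex("Infobox").search(text)
if m:
    print(repr(m.group(1)))
```

Since `[^\}]+` needs at least one character after the template name, a bare `{{Infobox}}` is not matched, and a template containing a nested `{{...}}` is cut short at the first `}`.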

# html_ParamDeconstuct

Replace lineFeed with empty in inputTag -- seems necessary - regexlib.com

- Pattern: \s(type|name|value)=(?:(\w+)|(?:"(.*?)")|(?:\'(.*)\'))
- Author: Carey Bishop
- Matching Text: <input type="text" value='somevalue' name=fred>
- Non-Matching Text: Any attributes that aren't "type", "name", or "value"

Returns the three most important attributes from an HTML <input> tag: 'type', 'name' and 'value'. Supports attribute values that are double- or single-quoted or unquoted.

Returns four references, the first being the name of the attribute, and the other three being the value, of which only one will be populated based on the way the value was quoted.

get "\s(type|name|value)=(?:(\w+)|(?:" & quote & "(.*?)" & quote & ")|(?:\'(.*)\'))"
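The "four references, only one value populated" behaviour is easier to see in a worked example. The Python sketch below is an illustration under the assumption that Python's `re` behaves like PCRE for this pattern. The function `input_params` is hypothetical and simply keeps whichever of the three value groups is non-empty.

```python
import re

# Pattern from regexlib.com as quoted above: group 1 is the attribute name,
# groups 2-4 hold the unquoted, double-quoted and single-quoted value.
PARAM = re.compile(r"""\s(type|name|value)=(?:(\w+)|(?:"(.*?)")|(?:\'(.*)\'))""")


def input_params(input_tag):
    """Return a dict of the type/name/value attributes of an <input> tag."""
    input_tag = input_tag.replace("\n", "")  # mirrors the lineFeed removal above
    result = {}
    for name, bare, double_q, single_q in PARAM.findall(input_tag):
        result[name] = bare or double_q or single_q
    return result


print(input_params("""<input type="text" value='somevalue' name=fred>"""))
```

One thing to watch: the single-quoted branch uses a greedy `(.*)`, so a tag with two single-quoted attributes would have both swallowed into one value.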

# Script processing

- script_SetPrivateHandlerOffsets
- script_SetHandlerOffsets
- script_MatchEnd

get "(?mi)^\s*(" & oWrd & " +" & hName & ")\W"
get "(?mi)^(private +" & oWrd & " +" & hName & ")\W"
get "(?mi)^\s*(" & "end" & " +" & hName & ")\W"
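As a rough illustration of how the start and end expressions could locate a handler in a script, here is a Python sketch. The helper names, the sample script, and the use of `re.escape` are all assumptions made for the example. The two patterns themselves follow the first and third `get` lines above.

```python
import re


def handler_regex(o_wrd, h_name):
    """Regex for the opening line, e.g. 'command doIt' (case-insensitive)."""
    return re.compile(
        r"(?mi)^\s*(" + re.escape(o_wrd) + " +" + re.escape(h_name) + r")\W"
    )


def end_regex(h_name):
    """Regex for the closing line, e.g. 'end doIt'."""
    return re.compile(r"(?mi)^\s*(end +" + re.escape(h_name) + r")\W")


# Hypothetical script: the call to doIt inside mouseUp must not be matched.
script = "on mouseUp\n  doIt\nend mouseUp\n\ncommand doIt\n  beep\nend doIt\n"

start = handler_regex("command", "doIt").search(script)
end = end_regex("doIt").search(script)
if start and end:
    # Slice from the start of the opening line to the end of the closing line.
    print(script[start.start(1):end.end(1)])
```

The trailing `\W` means there must be at least one character (for example a line break) after the handler name, so a script that ends directly after `end doIt` would not match.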

# Wiki

private command _SetImageTagOffsets
   put "(\[\[Image:.+\]\])" into someReg
   put matchchunk (...) into someBoolean
   return someBoolean
end _SetImageTagOffsets

private command _SetFileTagOffsets
   put "(\[\[File:.+\]\])" into someReg
   put matchchunk (...) into someBoolean
   return someBoolean
end _SetFileTagOffsets

# fedwikipedia_CleanInternalLinks

-- strips out image files as well (careful)
-- some starting text Machtergreifung|came to power something Real Link|well a second one at end
-- Berlin was devastated by Bombing of Berlin in World War II|bombing raids, fires and street battles during

get "\[\[([^\]]+)\|([^\]]+)\]\]"
put "(?m)" before it
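The internal-link expression captures the link target in group 1 and the label in group 2. The Python sketch below assumes the cleaning step replaces each piped link with its label. That replacement is a guess, since the notes show only the pattern, and both `clean_internal_links` and the sample wikitext are made up for illustration.

```python
import re

# Same expression as above, with the (?m) prefix: [[target|label]].
LINK = re.compile(r"(?m)\[\[([^\]]+)\|([^\]]+)\]\]")


def clean_internal_links(wiki_text):
    """Replace every [[target|label]] with just the label (assumed behaviour)."""
    return LINK.sub(r"\2", wiki_text)


sample = (
    "Berlin was devastated by "
    "[[Bombing of Berlin in World War II|bombing raids]], fires"
)
print(clean_internal_links(sample))
```

Links without a pipe, such as `[[Plain]]`, are left untouched, while something like `[[Image:x.png|thumb]]` is reduced to its last part, which fits the warning that image files get stripped as well.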

-- fedwikipedia_CleanSeeAlso
put "[^\}]+" into untilCurly
get "\{\{See also\|(" & untilCurly & ")\}\}"
put "(?m)" before it

# Other

-- put "(?Umi)\W*(graph\W\[.*\];)" into someReg -- does not work across multiple lines
-- put "(?Umi)\W(graph\W*\[.*)" into someReg
put "(?Uim)\W*(graph\W*\[" into someReg
put CR & ".*" after someReg
put "\];)" after someReg

# General handlers

private command _StripReg @wikiText, sReg
   -- repeatedly delete the first match of sReg until none is left
   local rStart, rEnd, sTest
   repeat
      get matchchunk (wikiText, sReg, rStart, rEnd)
      if rStart is not a number then
         exit repeat -- regExp bug???
      end if
      if it is true then
         put char rStart to rEnd of wikiText into sTest
         delete char rStart to rEnd of wikiText
      else
         exit repeat
      end if
   end repeat
end _StripReg
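The same strip-until-no-match loop can be sketched in Python to test an expression before using it in LiveCode. This is an illustration, not a port: `strip_reg` returns the cleaned text instead of modifying a referenced variable, and the empty-match guard is an added assumption to avoid looping forever.

```python
import re


def strip_reg(wiki_text, s_reg):
    """Delete group 1 of every match of s_reg, one match at a time."""
    rx = re.compile(s_reg)
    while True:
        m = rx.search(wiki_text)
        # Stop when nothing matches, or when group 1 is empty or unset.
        if m is None or m.start(1) == m.end(1):
            break
        wiki_text = wiki_text[:m.start(1)] + wiki_text[m.end(1):]
    return wiki_text


print(repr(strip_reg("a {{x}} b {{y}} c", r"(\{\{[^\}]+\}\})")))
```

Like the LiveCode handler, the expression must contain a capture group, because only the span of group 1 is removed.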

# See also

- Regular expression