Sunday, November 16, 2014

Regular expressions in Java

http://www.vogella.com/tutorials/JavaRegularExpressions/article.html


Generic Java function to match RE

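import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns true if s contains three consecutive digits anywhere.
public static boolean testRE(String s) {
    Pattern pattern = Pattern.compile("\\d{3}");
    Matcher matcher = pattern.matcher(s);
    return matcher.find();
}
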
[tT]rue - matches true or True

[tT]rue|[yY]es - matches true,True,yes,Yes

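.*true.* - matches any string that contains "true"

[a-zA-Z]{3} - matches a word of exactly 3 letters

^[^\\d] - matches if the string does not start with a digit (with String.matches(), use ^[^\\d].* so the rest of the string is consumed too)

([\\w&&[^b]])* - matches if the string consists only of word characters other than b

"[^0-9]*[12]?[0-9]{1,2}[^0-9]*" - matches a string containing a single number less than 300: an optional leading 1 or 2 followed by one or two digits covers 0 to 299, and [^0-9]* on each side allows only non-digits around the number

.*(jim|joe).* - matches any string that contains jim or joe (case-sensitive, so Joe does not match)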
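
A minimal sketch of how a few of the patterns listed above behave with String.matches(), which only succeeds when the whole input matches. The class and method names are made up for this illustration, and the contains-check is written as .*true.* here.

```java
final class SimplePatterns {
    // String.matches() succeeds only if the whole input matches the pattern.
    static boolean isTrueOrYes(String s) {
        return s.matches("[tT]rue|[yY]es");
    }

    // True if "true" (lowercase) appears anywhere in the input.
    static boolean containsTrue(String s) {
        return s.matches(".*true.*");
    }

    // True if the input is exactly three ASCII letters.
    static boolean isThreeLetters(String s) {
        return s.matches("[a-zA-Z]{3}");
    }

    // True if the input holds one number from 0 to 299, surrounded only by non-digits.
    static boolean hasNumberBelow300(String s) {
        return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
    }

    public static void main(String[] args) {
        System.out.println(isTrueOrYes("Yes"));
        System.out.println(containsTrue("it is true!"));
        System.out.println(isThreeLetters("abcd"));
        System.out.println(hasNumberBelow300("room 300"));
    }
}
```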

A phone number in this example consists of either 7 digits in a row, or 3 digits, a whitespace character or a dash, and then 4 digits.

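\\d\\d\\d([,\\s])?\\d\\d\\d\\d - matches 1234567 or 123 4567. As written, the class [,\\s] accepts a comma or whitespace, not a dash; use [-\\s] to accept a dash. The 10-digit input 1233323322 does not match with String.matches().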
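
The phone number description can be sketched as follows. This is a hedged variant rather than the pattern exactly as written above: it assumes the separator class [-\\s] so that a dash is accepted, and the class and method names are invented for the example.

```java
final class PhoneNumberCheck {
    // Assumed variant: [-\\s] accepts a dash or a whitespace character
    // as the optional separator between the 3-digit and 4-digit parts.
    private static final String PHONE = "\\d{3}[-\\s]?\\d{4}";

    static boolean isPhoneNumber(String s) {
        // matches() requires the whole input to be a phone number.
        return s.matches(PHONE);
    }

    public static void main(String[] args) {
        String[] samples = {"1234567", "123 4567", "123-4567", "1233323322"};
        for (String s : samples) {
            System.out.println(s + " -> " + isPhoneNumber(s));
        }
    }
}
```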

[0-9]{3} - check if a text contains a number with 3 digits ([0-9]{2} would match only 2 digits). The shorthand form is:
\\d{3}

\b(\w+)\s+\1\b  - find duplicated words
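
A small sketch of the duplicated-word pattern in use, assuming it is written as a Java string literal with doubled backslashes. The back reference \1 repeats whatever group 1 captured. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DuplicateWords {
    private static final Pattern DUPLICATE = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Collects each word that is immediately repeated.
    static List<String> findDuplicates(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = DUPLICATE.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    // Replaces each repeated pair with a single copy of the word.
    static String removeDuplicates(String text) {
        return DUPLICATE.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "this is is a test test";
        System.out.println(findDuplicates(text));
        System.out.println(removeDuplicates(text));
    }
}
```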

The following regular expression allows you to find the word "title" when it starts on a new line, potentially with leading spaces.

(\n\s*)title 


Password complexity:


((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})


(   # Start of group
  (?=.*\d)  #   must contain at least one digit from 0-9
  (?=.*[a-z])  #   must contain at least one lowercase character
  (?=.*[A-Z])  #   must contain at least one uppercase character
  (?=.*[@#$%])  #   must contain at least one special symbol from the list "@#$%"
              .  #     match any character, subject to the conditions above
                {6,20} #        length at least 6 characters and at most 20
)   # End of group
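
As a rough illustration, the password pattern can be applied with String.matches(). Each lookahead (?=...) checks a condition from the start of the input without consuming characters, and .{6,20} then enforces the length. The class and method names are assumptions made for the example.

```java
final class PasswordCheck {
    // The password pattern from above, written as a Java string literal.
    private static final String PASSWORD =
        "((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})";

    static boolean isValid(String password) {
        return password.matches(PASSWORD);
    }

    public static void main(String[] args) {
        System.out.println(isValid("Passw0rd@"));   // digit, lower, upper, symbol, 9 chars
        System.out.println(isValid("password1@"));  // no uppercase character
    }
}
```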



Email address

^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$

^   #start of the line
  [_A-Za-z0-9-]+ #  must start with one or more (+) characters from the bracket [ ]
  (   #  start of group #1
    \\.[_A-Za-z0-9-]+ #     followed by a dot "." and one or more (+) characters from the bracket [ ]
  )*   #  end of group #1, this group is optional (*)
    @   #     must contain an "@" symbol
     [A-Za-z0-9]+       #        followed by one or more (+) characters from the bracket [ ]
      (   #    start of group #2 - first level TLD checking
       \\.[A-Za-z0-9]+  #      followed by a dot "." and one or more (+) characters from the bracket [ ]
      )*  #    end of group #2, this group is optional (*)
      (   #    start of group #3 - second level TLD checking
       \\.[A-Za-z]{2,}  #      followed by a dot "." and at least 2 characters from the bracket [ ]
      )   #    end of group #3
$   #end of the line
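
A hedged sketch of the pattern above compiled once and reused. It is a simple syntactic check only; for instance, this pattern does not accept a hyphen in the part after the "@". The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class EmailCheck {
    private static final Pattern EMAIL = Pattern.compile(
        "^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

    static boolean isEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmail("john.doe@example.com"));
        System.out.println(isEmail("user@host"));
    }
}
```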


Image file extension

([^\s]+(\.(?i)(jpg|png|gif|bmp))$)

(   #Start of the group #1
 [^\s]+   #  must contain one or more characters (anything except white space)
       (  #    start of the group #2
         \.  # followed by a dot "."
         (?i)  # make the following check case-insensitive
             (  #   start of the group #3
              jpg #     contains characters "jpg"
              |  #     ..or
              png #     contains characters "png"
              |  #     ..or
              gif #     contains characters "gif"
              |  #     ..or
              bmp #     contains characters "bmp"
             )  #   end of the group #3
       )  #     end of the group #2 
  $   #  end of the string
)   #end of the group #1
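
A minimal sketch of the image file check, assuming the pattern is written with its closing parenthesis and doubled backslashes for a Java string literal. The embedded flag (?i) makes the extension comparison case-insensitive. The class and method names are invented for illustration.

```java
final class ImageFileCheck {
    // (?i) switches on case-insensitive matching for the extension list.
    private static final String IMAGE = "([^\\s]+(\\.(?i)(jpg|png|gif|bmp))$)";

    static boolean isImageFile(String name) {
        // matches() requires the whole name to match, so names with spaces fail.
        return name.matches(IMAGE);
    }

    public static void main(String[] args) {
        System.out.println(isImageFile("photo.JPG"));
        System.out.println(isImageFile("photo.jpeg"));
        System.out.println(isImageFile("my photo.png"));
    }
}
```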



IP address

^([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])$


^  #start of the line
 (  #  start of group #1
   [01]?\\d\\d? #    one or two digits; if three digits appear, the first must be 0 or 1
  #    e.g. ([0-9], [0-9][0-9], [0-1][0-9][0-9])
    |  #    ...or
   2[0-4]\\d #    starts with 2, followed by 0-4, and ends with any digit (2[0-4][0-9])
    |           #    ...or
   25[0-5]      #    starts with 2, followed by 5, and ends with 0-5 (25[0-5])
 )  #  end of group #1
  \\.           #  followed by a dot "."
....            # repeated 3 more times (the last group has no trailing dot)
$  #end of the line
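
One way to sketch the IP address check is to build the full pattern from a single octet sub-pattern, so the repetition stays readable. The names OCTET, IPV4 and isIpv4 are assumptions introduced for this example.

```java
import java.util.regex.Pattern;

final class Ipv4Check {
    // One octet: 0-199 (with optional leading zeros), 200-249, or 250-255.
    private static final String OCTET = "([01]?\\d\\d?|2[0-4]\\d|25[0-5])";

    // Four octets separated by three literal dots.
    private static final Pattern IPV4 = Pattern.compile(
        "^" + OCTET + "\\." + OCTET + "\\." + OCTET + "\\." + OCTET + "$");

    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.1.1"));
        System.out.println(isIpv4("256.1.1.1"));
    }
}
```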


12-hour time format
(1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm)


(    #start of group #1
 1[012]    #  starts with 10, 11, 12
 |    #  or
 [1-9]    #  starts with 1,2,...9
)    #end of group #1
 :    #    followed by a colon (:)
  [0-5][0-9]   #   followed by 0..5 and 0..9, which means 00 to 59
            (\\s)?  #        followed by a white space (optional)
                  (?i)  #          next check is case-insensitive
                      (am|pm) #            followed by am or pm
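
A short sketch of the 12-hour pattern applied with String.matches(). The class and method names are made up for the example.

```java
final class Time12Check {
    private static final String TIME_12 = "(1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm)";

    static boolean isTime12(String s) {
        return s.matches(TIME_12);
    }

    public static void main(String[] args) {
        String[] samples = {"1:00am", "12:59 PM", "13:00pm", "0:30am"};
        for (String s : samples) {
            System.out.println(s + " -> " + isTime12(s));
        }
    }
}
```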


24-hour time format

([01]?[0-9]|2[0-3]):[0-5][0-9]

(    #start of group #1
 [01]?[0-9]   #  matches 0-9, 00-09, or 10-19
 |    #  or
 2[0-3]    #  matches 20-23
)    #end of group #1
 :    #  followed by a colon (:)
  [0-5][0-9]   #    followed by 0..5 and 0..9, which means 00 to 59
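
The 24-hour pattern can be tried out the same way. This is a minimal sketch with illustrative names.

```java
final class Time24Check {
    private static final String TIME_24 = "([01]?[0-9]|2[0-3]):[0-5][0-9]";

    static boolean isTime24(String s) {
        return s.matches(TIME_24);
    }

    public static void main(String[] args) {
        String[] samples = {"23:59", "00:00", "9:05", "24:00"};
        for (String s : samples) {
            System.out.println(s + " -> " + isTime24(s));
        }
    }
}
```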

Date format dd/mm/yyyy

(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/((19|20)\\d\\d)

(    #start of group #1
 0?[1-9]  #  01-09 or 1-9
 |                   #  ..or
 [12][0-9]  #  10-19 or 20-29
 |   #  ..or
 3[01]   #  30, 31
)    #end of group #1
  /   #  followed by a "/"
   (   #    start of group #2
    0?[1-9]  # 01-09 or 1-9
    |   # ..or
    1[012]  # 10,11,12
    )   #    end of group #2
     /   # followed by a "/"
      (   #   start of group #3
       (19|20)\\d\\d #     19[0-9][0-9] or 20[0-9][0-9]
       )  #   end of group #3
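
Because the date pattern uses capturing groups, a Matcher can also pull out the day, month and year. The sketch below assumes a hypothetical parse helper. Note that the pattern checks only the shape of the date, so an input such as 31/02/2014 still matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateCheck {
    private static final Pattern DATE = Pattern.compile(
        "(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/((19|20)\\d\\d)");

    // Returns {day, month, year}, or null if the input is not in dd/mm/yyyy form.
    static int[] parse(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),  // group #1: day
            Integer.parseInt(m.group(2)),  // group #2: month
            Integer.parseInt(m.group(3))   // group #3: four-digit year
        };
    }

    public static void main(String[] args) {
        int[] parts = parse("31/12/2014");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
        System.out.println(parse("32/01/2014") == null);
    }
}
```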





HTML regular expression

<("[^"]*"|'[^']*'|[^'">])*>

<    #start with opening tag "<"
 (  #   start of group #1
   "[^"]*" # a string enclosed in two double quotes - "string"
   |  # ..or
   '[^']*' # a string enclosed in two single quotes - 'string'
   |  # ..or
   [^'">] # any character except a single quote, a double quote, or ">"
 )  #   end of group #1
 *  # 0 or more
>  #end with closing tag ">"

For advanced regular expressions the java.util.regex.Pattern and java.util.regex.Matcher classes are used. You first create a Pattern object which defines the regular expression. This Pattern object allows you to create a Matcher object for a given string. This Matcher object then allows you to do regex operations on a String.
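
The Pattern and Matcher workflow described above can be sketched as follows: compile a Pattern, create a Matcher for the input, then call find() repeatedly to walk through the matches. The findAll helper and the class name are assumptions for illustration. The HTML tag pattern from earlier is used as one sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexSearch {
    // Returns every non-overlapping match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // defines the regular expression
        Matcher matcher = pattern.matcher(input);   // binds it to one input string
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {                    // advances to the next match
            matches.add(matcher.group());           // text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        String tag = "<(\"[^\"]*\"|'[^']*'|[^'\">])*>";
        System.out.println(findAll(tag, "<a href=\"x>y\">link</a>"));
        System.out.println(findAll("\\d{3}", "abc123def4567"));
    }
}
```

Compiling the Pattern once and reusing it is usually preferable when the same expression is applied to many inputs.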
