Pattern Matching Using Python, in CSC250

Due Friday February 12, 2010, class time

You are very much encouraged to work with one or two partners.

Refer to the Pattern Matching Lab held on February 8.
Answer the following questions. Store all of your answers in a single file called relab.py. Include proper documentation at the top of the file, using Python's comment character, #.
Something like:
# Names: Judy Franklin, Jean-Luc Ponty, and Stanley Clarke
# Class: csc250
# Contents: functions and text answers for relab
# Date: February 1, 2010

import re
Don't forget to put in the import re statement to import the re functions. Use python function definitions to test your regular expressions. This is easier than retyping and editing on the python interpreter command line. You will submit this file electronically, by Friday February 12, class time, by typing
submit relab relab.py
from your 111b-?? account on beowulf. Make sure you have placed your file, relab.py, in your class account by then.
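As a rough illustration of testing a regular expression from a function rather than at the interpreter prompt, relab.py might be organized along these lines. The helper names show_match and test_digits are made up for this sketch and are not required by the assignment.

```python
# Names: (your names here)
# Class: csc250
# Contents: functions and text answers for relab

import re


def show_match(pattern, text, flags=0):
    """Search text for pattern; return the matched substring, or None."""
    match = re.search(pattern, text, flags)
    if match is None:
        return None
    return match.group(0)


def test_digits():
    # \d{1,3} indicates between 1 and 3 digits
    return show_match(r'\d{1,3}', 'room 250')


if __name__ == '__main__':
    print(test_digits())
```

Editing the pattern inside a function and re-running the file avoids retyping the whole expression each time.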
  1. Question 1
    When we left the lab last Monday, we had used a backreference to match a pair of HTML tags (see this web page, http://www.regular-expressions.info/named.html). Write a more complex expression that uses two backreferences to match two pairs of HTML tags, one nested inside the other. Get this to produce a match on the string
    >>> s = r'<html><title> The spring 2010 foundations class</title></html>'
    as well as
    >>> s = r'<body><h3> The spring 2010 foundations class</H3></BODY>'
    
    Don't forget to turn off case sensitivity.
    Recall that for a single set of tags we used
    >>> match = re.search(r'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>', s, re.IGNORECASE)
    
    and typed both
    >>> print(match.group(0))
    and
    >>> print(match.group(1))
    
    to see the results. Do this in a function definition in python, in your file called relab.py.
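For reference, the single-tag expression recalled above could be wrapped in a function roughly as follows. This is only a sketch of the one-backreference case from the lab, not the two-backreference answer, and the function name match_single_tag is an invented one.

```python
import re


def match_single_tag(text):
    """Match one pair of HTML tags using a single backreference (\\1)."""
    match = re.search(r'<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>', text, re.IGNORECASE)
    if match is None:
        return None
    # group(0) is the whole match, group(1) is the captured tag name
    return match.group(0), match.group(1)


if __name__ == '__main__':
    print(match_single_tag('<h3> The spring 2010 foundations class</H3>'))
```

Because re.IGNORECASE is set, the backreference \1 also accepts the closing tag in a different case from the opening tag.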

  2. Question 2
    Type all of your answers to this part into the same file, relab.py. Start each line of text with python's comment symbol, #.
    For example:
    # \b is a word boundary
    # \d{1,3} indicates between 1 and 3 digits
    # etc.
    
    In the IP address example on the same examples web site, http://www.regular-expressions.info/examples.html, explain exactly how the three regular expressions work:
    1. \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

    2. \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b

    3. \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
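Before writing the explanations, it may help to try the expressions on a few strings. The sketch below compares the first and third expressions; the names SIMPLE, OCTET, STRICT, and is_match are chosen just for this example, and STRICT is assembled from pieces so that the repeated part is visible.

```python
import re

# First expression: any four groups of 1 to 3 digits separated by dots
SIMPLE = r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'

# The repeated alternation from the third expression, factored out
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Third expression, rebuilt from OCTET
STRICT = r'\b(?:' + OCTET + r'\.){3}' + OCTET + r'\b'


def is_match(pattern, text):
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


if __name__ == '__main__':
    for candidate in ['192.168.1.1', '999.999.999.999']:
        print(candidate, is_match(SIMPLE, candidate), is_match(STRICT, candidate))
```

Comparing the two results for each string is one way to check your written explanation of what each expression accepts and rejects.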
  3. Question 3
    Read http://www.regular-expressions.info/completelines.html, information on using regular expressions to find lines of text.
    We've already started looking at this, with our brief description of negative and positive lookahead in the lab (PatternLab.html). Write a regular expression that matches a complete line of text that contains all of the words
    "melody", "similarity", and "computer", in any order. Use the regular expression and examples within a function definition in your file relab.py.
    Describe how your regular expression works, in detail. Again
    # use python's comments to answer the text
    #   part of this homework.
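To illustrate the general shape of the lookahead technique described on the completelines page, here is a small sketch that finds whole lines containing two words in either order. The words "red" and "blue" and the names BOTH_WORDS and lines_with_both are placeholders for this example only; the question asks for three different words.

```python
import re

# Each (?=...) is a positive lookahead: it checks that the word occurs
# somewhere later on the line without consuming any characters.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
BOTH_WORDS = re.compile(r'^(?=.*\bred\b)(?=.*\bblue\b).*$', re.MULTILINE)


def lines_with_both(text):
    """Return every complete line of text that contains both words."""
    return BOTH_WORDS.findall(text)


if __name__ == '__main__':
    sample = 'blue then red\nonly red here\nred and blue'
    print(lines_with_both(sample))
```

Since the lookaheads all start from the beginning of the line, the order in which the words appear does not matter.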
    

    Don't forget to submit by class time Friday Feb 12:
    submit relab relab.py