After installing numpy, you can define a matrix in one of the following ways:
from numpy import matrix
A = matrix( ( (3,4),(2,16)))
# or
A = matrix("3 4; 2 16")
# or even
A = matrix( (3,4,2,16) )
A.resize( (2,2) )
You can produce the transpose of A, written AT with:
print("transpose of A = \n%s" % A.transpose() )
producing output:
transpose of A = [[ 3 2] [ 4 16]]
or if you already have a numpy.array, you can create it from that. We’ll see later why you need to use numpy.matrix and not numpy.array:
from numpy import matrix, array
A2 = array( ( (3,4),(2,16)))
A = matrix(A2)
print("A[1,0]=%s" % A[1,0])
This produces
A[1,0]=2
Note that, when referring to individual elements of the matrix, numpy is 0-based, so A[1,0] is the 2nd row, 1st column.
Adding two matrices and multiplying a matrix by scalar is straightforward. Taking the inverse of a matrix is a little less obvious. The example uses matrix A as defined above:
from numpy import linalg
inverseOfA = linalg.inv(A)
print("inverseOfA =\n%s" % inverseOfA)
print("A * inverseOfA =\n%s" % (A * inverseOfA) )
shows the inverse, written A-1, and shows that multiplying A and A-1 produces the identity matrix:
inverseOfA = [[ 0.4 -0.1 ] [-0.05 0.075]] A * inverseOfA = [[ 1. 0.] [ 0. 1.]]
Finally, matrix multiplication is the reason to use numpy.matrix, and numpy.array. Here’s an example:
arrayA = array( ( (3,4),(2,16))) matrixA = matrix(arrayA) print "Multiplication as array = \n%s" % (arrayA * arrayA) print "Multiplication as matrix = \n%s" % (matrixA * matrixA)
outputs:
Multiplication as array = [[ 9 16] [ 4 256]] Multiplication as matrix = [[ 17 76] [ 38 264]]
In the array multiplication, Resultij = Aij * Aij, which is not what is needed for matrix multiplication!
]]>aTuple = (1, 2, 3, 4, 5) print (aTuple[1:3]) print (aTuple[0:7]) print(aTuple[3:1:-1])
which produces:
(2, 3) (1, 2, 3, 4, 5) (4, 3)
It doesn’t matter that, in the second instance, the end index is beyond the last index of the string. Also, the end index is excluded.
If you want to go to the start of the tuple, then you need to set the end index to -1:
print(aTuple[3:-1:-1])
But this doesn’t produce the expected result.
( )
This is because negative indexes are from the end of the tuple. So -1 is the last element in the tuple.
print(aTuple[2:-1])
produces:
(3, 4)
Again, this excludes the final index. To include either the start or the end of the tuple as the final element of the result, the end index should be left empty:
print(aTuple[2:]) print(aTuple[2::-1])
giving:
(3, 4, 5) (3, 2, 1)
Finally, in most cases where you can use a tuple, you can use a list instead:
aList = list(aTuple) print (aList[1:3]) print (aList[0:7]) print(aList[3:1:-1]) print(aList[3:-1:-1]) print(aList[2:-1]) print(aList[2:]) print(aList[2::-1])
producing similar results as before, but lists instead of tuples:
[2, 3] [1, 2, 3, 4, 5] [4, 3] [] [3, 4] [3, 4, 5] [3, 2, 1]
But this doesn’t work for the % operator:
print("aTuple[0]=%s, aTuple[1]=%s, aTuple[2]=%s, aTuple[3]=%s, aTuple[4]=%s" % aTuple)
print("aList[0]=%s, aList[1]=%s, aList[2]=%s, aList[3]=%s, aList[4]=%s" % aList)
aTuple[0]=1, aTuple[1]=2, aTuple[2]=3, aTuple[3]=4, aTuple[4]=5
Traceback (most recent call last):
File "c:/prr/cgibin/data/prr/codebright/tuple.py", line 29, in <module>
print("aList[0]=%s, aList[1]=%s, aList[2]=%s, aList[3]=%s, aList[4]=%s" % aList)
TypeError: not enough arguments for format string
]]>First main_module.py:
g_main_value = None
def get_main_value():
return g_main_value
def main_test():
print("sub_value = %s" % get_sub_value())
from sub_module import get_sub_value, sub_test, sub_init
def main_init():
global g_main_value
g_main_value = 23
if __name__ == "__main__":
main_init()
sub_init()
main_test()
sub_test()
print("main_value (in main_module) = %s" % get_main_value() )
Then sub_module.py:
g_sub_value = None
def get_sub_value():
return g_sub_value
def sub_test():
print("main_value (in sub_module) = %s" % get_main_value())
def sub_init():
global g_sub_value
g_sub_value = 31
from main_module import get_main_value
The output is:
sub_value = 31 main_value (in sub_module) = None main_value (in main_module) = 23
What’s happening? A global with 2 different values? Add the following code to the bottom of main_module:
def show_globals(module_name):
mod = __import__(module_name)
print(module_name)
for k in dir(mod):
if k.startswith("g_"):
print(" %s: %s" % (k, getattr(mod, k)))
print("")
if __name__ == "__main__":
print("")
show_globals("__main__")
show_globals("main_module")
show_globals("sub_module")
The output is:
__main__ g_main_value: 23 main_module g_main_value: None sub_module g_sub_value: 31
This gives us a clue as to what’s happening: the __main__ module may be loaded twice if used by another module. If you wanted to set a global in the __main__ module that is usable by other modules, then one way to do it is:
Add the following to main_module.py
mod = __import__("main_module")
setattr(mod, "g_main_value2", 34)
if __name__ == "__main__":
print("")
show_globals("__main__")
show_globals("main_module")
show_globals("sub_module")
And the output will be
__main__ g_main_value: 23 g_main_value2: None main_module g_main_value: None g_main_value2: 34 sub_module g_sub_value: 31]]>
<?php
$res = explode(",", "");
print_r($res);
?>
produces:
Array
(
[0] =>
)
I was expecting (needing?) an empty array. After thinking about it a bit more, it is clear that returning a single empty string is a logical return value. However, different languages have different returns for this special case. Python (2.6.4 and 3.1.2) are both like PHP (5.3.5), but Perl (5.10.1) behaves differently:
For PHP:
<?php
function test_explode($txt)
{
print("Exploding '$txt' to give [" );
$sep = "";
$res = explode(",", $txt);
foreach ($res as $elt)
{
print "$sep'$elt'";
$sep = ", ";
}
print "]\n";
}
test_explode("a,b");
test_explode(",");
test_explode("");
?>
produces:
Exploding 'a,b' to give ['a', 'b'] Exploding ',' to give ['', ''] Exploding '' to give ['']
For Python:
def test_split(txt):
print("Splitting '%s' to give %s" % (txt, txt.split(",")))
test_split("a,b")
test_split(",")
test_split("")
produces:
Splitting 'a,b' to give ['a', 'b'] Splitting ',' to give ['', ''] Splitting '' to give ['']
For Perl:
sub test_split
{
my $txt = shift;
my @res = split(",", $txt);
print("Splitting '$txt' to give [");
my $sep = "";
foreach my $elt (@res)
{
print "$sep'$elt'";
$sep = ", ";
}
print "]\n";
}
test_split("a,b");
test_split(",");
test_split("");
produces:
Splitting 'a,b' to give ['a', 'b'] Splitting ',' to give [] Splitting '' to give []]]>
My test data consists of 950,000 lines in 5 gzipped Apache log files. I compared the times using the Python gzip module, and the zcat method suggested in http://bugs.python.org/issue7471.
A quick summary of results: yes, the issue is still relevant. The average times are:
For Python 2.6.4
For Python 3.1.2
Below are the details. First, the source code to time the tests:
import os
import sys
if sys.version.startswith("3"):
import io
io_method = io.BytesIO
else:
import cStringIO
io_method = cStringIO.StringIO
import gzip
import subprocess
import time
dirname = "test"
fl = os.listdir(dirname)
fl.sort()
ttime = [0, 0]
runs = 5
for i in range(2 * runs):
st = time.time()
for fn in fl:
if not fn.endswith(".gz"):
continue
cc = 0
lc = 0
sz = 0
fullfn = os.path.join(dirname, fn)
sz += os.stat(fullfn)[6]
if i % 2 == 0:
fh = gzip.GzipFile(fullfn, "r")
else:
p = subprocess.Popen(["zcat", fullfn], stdout = subprocess.PIPE)
fh = io_method(p.communicate()[0])
assert p.returncode == 0
for line in fh:
lc += 1
cc += len(line)
et = time.time()
dt = et - st
ttime[i % 2] += dt
print("time-taken = %0.2f seconds, 1000 characters per second = %0.0f, file size per second = %0.0f, character count=%s, line count=%s file size = %s" % (dt, 0.001 * cc / dt, 0.001 * sz / dt, cc, lc, sz))
print("\nAverages")
print(" gzip.open - %0.1f seconds" % (ttime[0] / runs))
print(" zcat and pipe - %0.1f seconds" % (ttime[1] / runs))
Timings for Python 2.6.4:
time-taken = 9.71 seconds, 1000 characters per second = 5237, file size per second = 480, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.28 seconds, 1000 characters per second = 22300, file size per second = 2044, character count=50859394, line count=194151 file size = 4661207 time-taken = 9.48 seconds, 1000 characters per second = 5363, file size per second = 492, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.24 seconds, 1000 characters per second = 22750, file size per second = 2085, character count=50859394, line count=194151 file size = 4661207 time-taken = 9.44 seconds, 1000 characters per second = 5389, file size per second = 494, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.30 seconds, 1000 characters per second = 22156, file size per second = 2031, character count=50859394, line count=194151 file size = 4661207 time-taken = 9.42 seconds, 1000 characters per second = 5397, file size per second = 495, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.26 seconds, 1000 characters per second = 22516, file size per second = 2064, character count=50859394, line count=194151 file size = 4661207 time-taken = 9.61 seconds, 1000 characters per second = 5293, file size per second = 485, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.31 seconds, 1000 characters per second = 22033, file size per second = 2019, character count=50859394, line count=194151 file size = 4661207 Averages gzip.open - 9.5 seconds zcat and pipe - 2.3 seconds
Timings for Python 3.1.2:
time-taken = 11.65 seconds, 1000 characters per second = 4364, file size per second = 400, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.87 seconds, 1000 characters per second = 17691, file size per second = 1621, character count=50859394, line count=194151 file size = 4661207 time-taken = 11.66 seconds, 1000 characters per second = 4362, file size per second = 400, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.56 seconds, 1000 characters per second = 19834, file size per second = 1818, character count=50859394, line count=194151 file size = 4661207 time-taken = 11.54 seconds, 1000 characters per second = 4408, file size per second = 404, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.64 seconds, 1000 characters per second = 19295, file size per second = 1768, character count=50859394, line count=194151 file size = 4661207 time-taken = 11.58 seconds, 1000 characters per second = 4393, file size per second = 403, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.59 seconds, 1000 characters per second = 19659, file size per second = 1802, character count=50859394, line count=194151 file size = 4661207 time-taken = 11.68 seconds, 1000 characters per second = 4354, file size per second = 399, character count=50859394, line count=194151 file size = 4661207 time-taken = 2.97 seconds, 1000 characters per second = 17123, file size per second = 1569, character count=50859394, line count=194151 file size = 4661207 Averages gzip.open - 11.6 seconds zcat and pipe - 2.7 seconds]]>
1. enumerate
A built-in function, which allows you to replace code like this:
day_list = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
for day_index in range(len(day_list)):
day_name = day_list[day_index]
print(day_index, day_name)
with the more succinct:
day_list = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
for day_index, day_name in enumerate(day_list):
print(day_index, day_name)
2. gzip.GzipFile
I used this whilst processing Apache log files last week. It offers compression to less than 10% of the original uncompressed size, but seems to take the same length of time to read/write, implying a good trade-off between processor use and hard-disk activity. Having changed the open code from
fh = open(<logfile>, "r")
to:
import gzip ... fh = gzip.GzipFile(<logfile>, "r")
the rest of the code runs unchanged.
Note: if you are running on Linux, then the issue of speed, documented at http://bugs.python.org/issue7471, is relevant, and the zcat approach is faster.
3. collections.defaultdict
This provides a dictionary, which initialises a key on first read using a method.
For instance:
import collections ... zerodict = collections.defaultdict(int) print(zerodict["key"], zerodict[4])
produces:
0 0]]>
Try and predict the outcome (should work with Python 2.5 – Python 3.1)
import re abcd = "abcd" ad = "ad" cre = re.compile(r"^(a)(bc)?(d)$") mos1 = cre.match(abcd) print(cre.sub(r"\1, \2, \3", abcd)) mos2 = cre.match(ad) print(cre.sub(r"\1, \2, \3", ad))
This produces:
a, bc, d
Traceback (most recent call last):
File "<filename>", line <linenumber>, in <module>
print(cre.sub(r"\1, \2, \3", ad))
File "C:\programs\Python25\lib\re.py", line 266, in filter
return sre_parse.expand_template(template, match)
File "C:\programs\Python25\lib\sre_parse.py", line 793, in expand_template
raise error, "unmatched group"
sre_constants.error: unmatched group
(Line numbers vary with different versions of Python – this is Python 2.5)
Adding the following code illustrates the problem:
print(mos1.group(1), mos1.group(2), mos1.group(3) ) print(mos2.group(1), mos2.group(2), mos2.group(3) )
For Python 2.5, this outputs
('a', 'bc', 'd')
('a', None, 'd')
So now it is clear why it is failing, what is the solution. First fix was:
cre2 = re.compile(r"^(a)((bc)?)(d)$") print(cre2.sub(r"\1, \2, \4", ad))
This fix works, but it has side-effects – the last group number has changed. So a better solution might be:
cre3 = re.compile(r"^(a)((?:bc)?)(d)$") print(cre3.sub(r"\1, \2, \3", ad))
Using the non-capturing group notation, (?:…) the problem is fixed without side-effects.
I’m not entirely happy that the problem occurs in the first place, but it does.
]]>1. PHP is more verbose. In particular, PHP requires braces around code blocks, semi-colons, $ on variable names. Python requires only the colon at the end of certain lines.
2. Both have good libraries. However, the ability of Python to easily interface to shared libraries written in any language, and not specifically written for Python, is a big advantage, in my opinion.
3. PHP has more unexpected gotchas. I hadn’t noticed this when writing in Python (the absence of a problem is less noticeable than the presence of the same problem). One example I came across today:
is_file() tests for the existence of a file, as you would expect by the name. But it is cached. So if you are waiting for a file to disappear, and you repeatedly call is_file to test this, then you will wait for a long time. You need to call clearstatcache() between each call to get an honest answer.
Of course, the documentation states this. But it also states that file_exists (an equivalent function) is also cached. But it appears not to be (PHP v5.3.3):
<?php
print "Waiting for test.txt to appear\n";
while (!is_file("test.txt"))
{
}
print "Waiting for test.txt to disappear\n";
while (True)
{
sleep(.2);
if (!is_file("test.txt"))
{
print "is_file detected disappearance\n";
exit(0);
}
elseif (!file_exists("test.txt"))
{
print "file_exists detected disappearance\n";
exit(0);
}
}
?>
Running this code (and creating, then deleting file test.txt) invariably results in the detection by file_exists, even though I have biased it to let is_file test it first.
So ideally you need accurate documentation. But I think even better is to write in a way that requires minimal documentation.
]]>For Ubuntu (Hardy Heron, 8.04), I found xautomation, which worked fine. For many (but not all) of the non alpha-numeric keystrokes, you need to use the name of the key – there is a helpful list here.
I tried xautomation with Fedora 13, but it was not happy. So I had a look around for an alternative. One possibility (I found them hard to find) was xdotool. Of course, the syntax for sending keystrokes is slightly different! It is slightly more consistent in that non-alphanumerics all need to be sent using the name of the key, so the helpful list (above) is still helpful!
I’ll try to post some examples at a later date.
]]>This deals with DLLs, and so is Windows-centric. I need to find out how / if ctypes works on Linux.
Start with a trivial DLL. It contains a single function, nAdd, which takes an array of integers, and a length of the array, and returns the sum of the array elements:
#ifdef __cplusplus
extern "C"
{
#endif
int nAdd(int * pnData, int nDataLen)
{
int nTotal = 0;
for (int i = 0; i < nDataLen; ++i)
{
nTotal = nTotal + pnData[i];
}
return nTotal;
}
#ifdef __cplusplus
}
#endif
and build it using MSYS / MinGW:
gcc -c strange.cpp gcc -fPIC -shared -o strange.dll strange.o
Now lets call the function in the DLL, using ctypes:
import ctypes
import sys
# define an array of 3 integers
array3type = ctypes.c_int * 3
array3 = array3type()
array3[0] = 4
array3[1] = 5
array3[2] = 21
# define an array of 2 integers
array2type = ctypes.c_int * 2
array2 = array2type()
array2[0] = 65
array2[1] = 34
def call3then2_noTypes():
print("call3then2_noTypes")
dll = ctypes.cdll.LoadLibrary("strange.dll")
dll.nAdd.restype = ctypes.c_int
#dll.nAdd.argtypes = (ctypes.POINTER(ctypes.c_int), ctypes.c_int)
try:
print(" %d" % dll.nAdd(array3, 3))
print(" %d" % dll.nAdd(array2, 2))
except Exception:
e = sys.exc_info()[1]
print(" %s" % e)
print("")
def call2then3_noTypes():
print("call2then3_noTypes")
dll = ctypes.cdll.LoadLibrary("strange.dll")
#dll.nAdd.restype = ctypes.c_int
#dll.nAdd.argtypes = (ctypes.POINTER(ctypes.c_int), ctypes.c_int)
try:
print(" %d" % dll.nAdd(array2, 2))
print(" %d" % dll.nAdd(array3, 3))
except Exception:
e = sys.exc_info()[1]
print(" %s" % e)
print("")
call3then2_noTypes()
call2then3_noTypes()
For Python 2.5+ and Python 3.x, this correctly prints the following:
call3then2_noTypes 30 99 call2then3_noTypes 99 30
So the question is, what will it do in IronPython 2.6?
The answer is unexpected (at least, to me):
call3then2_noTypes 30 expected c_long_Array_3, got c_long_Array_2 call2then3_noTypes 99 expected c_long_Array_2, got c_long_Array_3
It seems to be trying to learn the signature of the function from the first call, but unfortunately, in this case, getting it wrong. Note that the first call to the function was successful in each case – it is the second call that is failing.
Two of the following three methods work for Python 2.5+, Python 3.x and IronPython 2.6:
import ctypes
import sys
# define an array of 3 integers
array3type = ctypes.c_int * 3
array3 = array3type()
array3[0] = 4
array3[1] = 5
array3[2] = 21
# define an array of 2 integers
array2type = ctypes.c_int * 2
array2 = array2type()
array2[0] = 65
array2[1] = 34
def call():
print("call")
dll = ctypes.cdll.LoadLibrary("strange.dll")
dll.nAdd.restype = ctypes.c_int
dll.nAdd.argtypes = (ctypes.POINTER(ctypes.c_int), ctypes.c_int)
try:
print(" %d" % dll.nAdd(array3, 3))
print(" %d" % dll.nAdd(array2, 2))
except Exception:
e = sys.exc_info()[1]
print(" %s" % e)
print("")
def call_voidPointer():
print("call_voidPointer")
dll = ctypes.cdll.LoadLibrary("strange.dll")
dll.nAdd.restype = ctypes.c_int
dll.nAdd.argtypes = (ctypes.c_void_p, ctypes.c_int)
print(" %d" % dll.nAdd(array3, 3))
print(" %d" % dll.nAdd(array2, 2))
print("")
def call_voidPointer_addressOf():
print("call_voidPointer_addressOf")
dll = ctypes.cdll.LoadLibrary("strange.dll")
dll.nAdd.restype = ctypes.c_int
dll.nAdd.argtypes = (ctypes.c_void_p, ctypes.c_int)
print(" %d" % dll.nAdd(ctypes.addressof(array3), 3))
print(" %d" % dll.nAdd(ctypes.addressof(array2), 2))
print("")
call()
call_voidPointer()
call_voidPointer_addressOf()
Extra hint: the output for IronPython 2.6 is as follows:
call3then2 expected c_long, got c_long_Array_3 call_voidPointer 30 99 call_voidPointer_addressOf 30 99]]>