Comments for Codebright's Blog https://codebright.wordpress.com Random thoughts on code Wed, 16 Apr 2014 09:52:46 +0000 hourly 1 http://wordpress.com/
Comment on Reading gzip files in Python – fast! by Leszek Pryszcz https://codebright.wordpress.com/2011/03/25/139/#comment-49 Wed, 16 Apr 2014 09:52:46 +0000 http://codebright.wordpress.com/?p=139#comment-49 Note that when you pipe zcat into Python (zcat file | python program.py, or via subprocess), two independent processes run, so two processor cores are used if available, whereas a Python program that calls gzip.open() is bound to a single core.
Of course, using two independent processes cannot explain a 3-4 fold time difference 😉
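
A minimal sketch of the two setups being compared, assuming a simple line-counting task. The helper names `count_lines` and `count_lines_gzip` are hypothetical and not taken from the original post. In the piped form, zcat decompresses in its own process and Python only reads the already decompressed bytes from standard input; in the gzip.open() form, decompression and counting share one process.

```python
import gzip


def count_lines(stream):
    """Count lines in any binary stream of already decompressed data.

    When the script is run as `zcat file.gz | python program.py`,
    pass sys.stdin.buffer here; zcat does the decompression.
    """
    total = 0
    for _ in stream:
        total += 1
    return total


def count_lines_gzip(path):
    """Same count, but decompression happens inside this Python process."""
    with gzip.open(path, "rb") as handle:
        return count_lines(handle)
```

Timing `count_lines(sys.stdin.buffer)` behind a zcat pipe against `count_lines_gzip(path)` on the same file would be one way to check how much of the difference comes from the second core.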

Comment on Reading gzip files in Python – fast! by devboell https://codebright.wordpress.com/2011/03/25/139/#comment-30 Tue, 16 Apr 2013 07:21:45 +0000 http://codebright.wordpress.com/?p=139#comment-30 Very interesting. How would this work for a remote .gz file? Would you call curl from Python?
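
One possible approach to the remote case, sketched with only the standard library rather than curl. This is an illustration, not something the thread proposes: the function names are hypothetical, and it assumes the server sends the .gz file as-is over HTTP. Decompression here happens inside Python, so it does not get the two-process benefit of piping through zcat.

```python
import gzip
import urllib.request


def iter_gzip_lines(fileobj):
    """Yield decompressed lines from any binary file-like object."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as unzipped:
        for line in unzipped:
            yield line


def iter_remote_gzip_lines(url):
    """Stream a remote .gz file line by line without saving it first.

    `url` is whatever address hosts the file (an assumption of this sketch).
    """
    with urllib.request.urlopen(url) as response:
        yield from iter_gzip_lines(response)
```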

Comment on Reading gzip files in Python – fast! by How to test a directory of files for gzip and uncompress gzipped files in Python using zcat? | BlogoSfera https://codebright.wordpress.com/2011/03/25/139/#comment-29 Mon, 11 Mar 2013 15:02:31 +0000 http://codebright.wordpress.com/?p=139#comment-29 […] which I’ve “stitched” together using these SO answers (here) and blogposts (here) […]

Comment on Reading gzip files in Python – fast! by Tommy Carstensen https://codebright.wordpress.com/2011/03/25/139/#comment-28 Fri, 04 Jan 2013 10:47:41 +0000 http://codebright.wordpress.com/?p=139#comment-28 In reply to Tommy Carstensen.

However, I just read the following:
http://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate
“The data read is buffered in memory, so do not use this method if the data size is large or unlimited.”

Instead I opted for this solution by Spencer Rathbun:
http://superuser.com/questions/381394/unix-split-a-huge-gz-file-by-line
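
For illustration, one common way to avoid the buffering that communicate() does is to iterate over the process's stdout pipe directly. This is a sketch under assumptions and not necessarily the linked solution: `stream_lines` is a hypothetical name, and the `command` parameter exists only so a different decompressor can be swapped in.

```python
import subprocess


def stream_lines(path, command=("zcat",)):
    """Yield decompressed lines one at a time from an external decompressor."""
    with subprocess.Popen([*command, path], stdout=subprocess.PIPE) as proc:
        # Iterating over the pipe reads incrementally; unlike communicate(),
        # it never holds the whole decompressed output in memory.
        for line in proc.stdout:
            yield line
    # Leaving the with-block closes the pipe and waits for the child.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

A caller can then loop with `for line in stream_lines("big.gz"):` and keep only running totals, so memory use stays flat regardless of file size.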

Comment on Reading gzip files in Python – fast! by Tommy Carstensen https://codebright.wordpress.com/2011/03/25/139/#comment-27 Fri, 04 Jan 2013 10:43:33 +0000 http://codebright.wordpress.com/?p=139#comment-27 “me”, thanks for pointing me to subprocess.communicate.

Comment on Python regular expression surprise by Nagesh https://codebright.wordpress.com/2011/02/19/python-regular-expression-surprise/#comment-26 Thu, 20 Sep 2012 11:54:29 +0000 http://codebright.wordpress.com/?p=117#comment-26 Thanks, was looking out for this! 🙂

Comment on SendKeys in Linux by Henry Charles https://codebright.wordpress.com/2010/06/27/sendkeys-in-linux/#comment-23 Fri, 09 Mar 2012 13:54:57 +0000 http://codebright.wordpress.com/?p=108#comment-23 Found this, works great: https://sourceforge.net/projects/x11guitest/

Comment on Reading gzip files in Python – fast! by me https://codebright.wordpress.com/2011/03/25/139/#comment-22 Mon, 30 Jan 2012 17:42:23 +0000 http://codebright.wordpress.com/?p=139#comment-22 I’d:

– avoid reading the entire file into memory using subprocess.communicate() and cStringIO
– initialize the cc, lc and sz outside of the internal loop

Comment on Reading gzip files in Python – fast! by insider https://codebright.wordpress.com/2011/03/25/139/#comment-21 Wed, 18 Jan 2012 10:32:20 +0000 http://codebright.wordpress.com/?p=139#comment-21 Cool, I’m doing some research on parsing syslog logs. Do you know of any syslog parsing module like Perl’s Parse::Syslog?

Comment on Linear Algebra Review and numpy by codebright https://codebright.wordpress.com/2011/10/07/linear-algebra-review-and-numpy/#comment-8 Sun, 09 Oct 2011 20:45:28 +0000 http://codebright.wordpress.com/?p=203#comment-8 Many thanks for all the hints and tips. I appreciate the input.
Paul
