Python GIL / Threads / Processes

Recently, I wrote a tool to verify data in a .xlsx spreadsheet. The tool checks each column in the sheet against a specific regex pattern defined for that column. It was working great, and I had checked a few of the typical smaller sheets. A few days later, I ran across a larger .xlsx file. I kicked off the script, checked back a few minutes later for the expected result, and saw it was still processing the file. When I returned a few minutes after that and it was still running, I realized I had a performance problem on my hands.

The time it took the single-threaded version to completely process this large file was:

real 54m59.385s
user 54m54.503s
sys 0m4.123s

I immediately started looking at the options available to me. I started with the threading module. This seemed like the obvious solution.

I started all the threads and began monitoring the load on the machine. I was unimpressed with the promised “threading”: it appeared to make no difference to machine load or core utilization. When the time command reported back, I was surprised to see that performance was worse than the single-threaded implementation.

real 74m39.904s
user 71m49.452s
sys 15m45.056s

A lot more time was spent in kernel space, as the increase in sys time shows, but there was no performance gain.

I kept researching for answers and started to study the Global Interpreter Lock, or GIL. I had heard of the GIL before, mostly in articles complaining about Python. I started to wonder whether I had wasted my time writing this tool in Python and should have chosen my newfound friend Go. Go does not suffer from a GIL and is designed for concurrency, but nobody other than me who uses or maintains the tool is familiar with Go, which is why I wrote it in Python. I knew there had to be a solution out there somewhere, since Python is a very popular language. Some suggestions included using an alternative interpreter such as Jython or IronPython, but that seemed just as extreme as a rewrite in another language.

I finally found the alternative to threading: the multiprocessing package. Its documentation describes it as “a package that supports spawning processes using an API similar to the threading module.” After my experience with the poor performance of the threading module, I was very skeptical. Interestingly, the arguments passed to multiprocessing.Process use almost exactly the same syntax as threading.Thread, so this required minimal code changes on my part. I started the benchmark again, still skeptical of what would happen. I could tell there was going to be a difference even before opening top, because the fans on my laptop kicked on almost immediately. I could see that all cores were being used, and the load was where I expected.
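To show how small the switch is, here is a minimal sketch (not the original tool) of a column check that can run on either threads or processes. It assumes the rows have already been read from the sheet as lists of strings; COLUMN_PATTERNS, check_rows and validate are hypothetical names. Only the worker class and the queue class change between the two runs.

```python
import multiprocessing
import queue
import re
import threading

# Hypothetical per-column patterns; the real tool's regexes are not shown in this post.
COLUMN_PATTERNS = [r"^\d+$", r"^[A-Za-z ]+$", r"^\d{4}-\d{2}-\d{2}$"]


def check_rows(rows, first_row, results):
    """Check every cell against its column's regex; put failing (row, column) pairs on results."""
    compiled = [re.compile(p) for p in COLUMN_PATTERNS]
    failures = []
    for offset, row in enumerate(rows):
        for column, (cell, pattern) in enumerate(zip(row, compiled)):
            if not pattern.match(str(cell)):
                failures.append((first_row + offset, column))
    results.put(failures)


def validate(rows, worker_class, queue_class, workers=4):
    """Split rows into one chunk per worker and run check_rows in each worker."""
    results = queue_class()
    chunk = max(1, -(-len(rows) // workers))  # ceiling division
    started = []
    for start in range(0, len(rows), chunk):
        worker = worker_class(target=check_rows,
                              args=(rows[start:start + chunk], start, results))
        worker.start()
        started.append(worker)
    failures = []
    for _ in started:  # collect every result before join(), as the multiprocessing docs advise
        failures.extend(results.get())
    for worker in started:
        worker.join()
    return sorted(failures)


if __name__ == "__main__":
    sample = [["1", "Alice", "2016-01-01"], ["x", "Bob", "2016-01-02"], ["3", "Carol", "bad"]]
    # Threads: same API, but the GIL lets only one thread run Python bytecode at a time.
    print(validate(sample, threading.Thread, queue.Queue))
    # Processes: each worker has its own interpreter and its own GIL, so all cores can be used.
    print(validate(sample, multiprocessing.Process, multiprocessing.Queue))
```

On Windows and macOS, multiprocessing starts workers by re-importing the main module, so the `if __name__ == "__main__":` guard is required. For a three-row sample, process start-up costs more than the checking itself; the benefit only shows on a sheet large enough to keep every core busy.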

Here is the result:

real 17m32.867s
user 137m13.869s
sys 0m18.850s

As you can see, it is a night-and-day difference: the real (wall-clock) time dropped from about 55 minutes to about 17.5 minutes, while user time went up because it is summed across all the cores doing work. With just a couple of changes to the import and the function call I had written for threading, I am now seeing the results I expected. I wonder what the use case for the threading module is, since the multiprocessing package appears to be significantly better. I was able to keep using Python for this project, and I also worry less about performance problems I might encounter in future projects.


Building a Python Application w/ GUI

The lesson I learned from this was how much time goes into creating an effective GUI. I think many of us take our day-to-day GUIs for granted. The project was to create a more effective application to use with ElementTree, the Python standard library module for creating and parsing XML. The XML in question was the configuration file of a JAR-based application.

First you have to start with:

#Tkinter calls for GUI
from Tkinter import *
import ttk

As the comment says, this imports Tkinter and its themed widgets (ttk). In Python 2.7 the module name is capitalized, Tkinter, while in Python 3.5 it is lowercase, tkinter, and ttk is imported from tkinter.ttk.

Everything is built on a root window and its main loop (started with root.mainloop()), which keeps the application running until the user quits or the program moves on.

#Root window
root = Tk()

Inside of this is where your GUI will go. There are many different ways to ask for user input: check boxes, lists, text fields, and so on. I chose to go with 1. a text field, 2. a list box, and 3. a button for the user to run the program.

Most of the original program was a script that ran in a standard Terminal. Before the GUI, the application asked the standard set of questions required for my variables:

loop = True
while loop:  ## While loop which will keep going until loop = False
    # Python 2: input() evaluates what is typed, so "1" arrives as the integer 1
    choice = input("Options 1-4 set the configuration of the report, option 5 runs the report: ")

    if choice == 1:
        print "Meaningful Use Stage 1 selected: "
        periodYear = raw_input('What year is this report for?      Ex. 2012-2014')
    elif choice == 2:
        print "Meaningful Use Modified Scheduled Stage 1 selected: "
        meaningfulUseStage = "MODIFIED_SCHEDULED_STAGE_1"
        periodYear = "2015"
    elif choice == 3:
        print "Meaningful Use Stage 2 selected: "
        meaningfulUseStage = "STAGE_2"
        periodYear = "2014"
    elif choice == 4:
        print "Meaningful Use Modified Scheduled Stage 2 selected: "
        meaningfulUseStage = "MODIFIED_SCHEDULED_STAGE_2"
        periodYear = raw_input('What year is this report for?      Ex. 2015-2016')
    elif choice == 5:
        print "Running report: "
        ## You can add your code or functions here
        loop = False  # Setting loop to False ends the while loop
    else:
        raw_input("Wrong option selection. Press Enter to try again..")

This was much simpler, since it did not require as many entries to accomplish the same goal. However, for the sake of the end users, a GUI was a requirement.

This is the Tkinter code for a similar part of the program. It does not, however, include the function and class definitions.

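#Label for meaningfulUseStage list
meaningfulUseStageLabel = Label(root, text="Please select a Meaningful Use Stage: ")
#meaningfulUseStage list selection
meaningfulUseStageList = Listbox(root, selectmode=SINGLE, height=4, width=30)
for item1 in ["Meaningful Use Stage 1", "Meaningful Use Modified Scheduled Stage 1",
              "Meaningful Use Stage 2", "Meaningful Use Modified Scheduled Stage 2"]:
    meaningfulUseStageList.insert(END, item1)

#Label for periodYear
periodYearLabel = Label(root, text="Please select the reporting year: ")
#periodYear list selection
periodYearList = Listbox(root, selectmode=SINGLE, height=5, width=30)
for item2 in ["2012", "2013", "2014", "2015", "2016"]:
    periodYearList.insert(END, item2)

#Read the selections later (e.g. when the Run button is clicked); right after
#insert() nothing is selected yet, so curselection() returns an empty tuple.
#meaningfulUseStage = meaningfulUseStageList.curselection()
#periodYear = periodYearList.curselection()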

It is not so much that this approach is inefficient; it was just more difficult, since up to this point I was unfamiliar with Tkinter.

Some of the most common functions were:

Label() – Used to put a label, does not accept user input.

Listbox() – Structured user input that can accept multiple selections.

Entry() – Text field that accepts user input.

Checkbutton() – Check box that takes binary input (on or off); the checked and unchecked states can be tied to different tasks.

The tricky part was that these widgets do not all behave the same way. For example, calling get() on an Entry() to retrieve a str from the user's input was much less complex than getting the input from a Listbox() or a Checkbutton(). The arguments were different, and additional syntax was needed. The benefit, however, of the more structured approach with Listbox() and Checkbutton() is that the user cannot introduce values you haven’t already accounted for.
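To illustrate the difference, here is a small sketch (Python 3, so the module is tkinter rather than Tkinter) that builds one of each widget and then reads the values back. build_form, read_form, the year list and the "Include details" check box are hypothetical; the calls themselves (Entry.get, Listbox.curselection with Listbox.get, and an IntVar behind a Checkbutton) are standard tkinter.

```python
import tkinter as tk  # Python 3 name; the Python 2.7 code above uses "Tkinter"

YEARS = ["2012", "2013", "2014", "2015", "2016"]


def build_form(root):
    """Create one Entry, one Listbox and one Checkbutton on the given root window."""
    entry = tk.Entry(root, width=30)
    entry.pack()

    listbox = tk.Listbox(root, selectmode=tk.SINGLE, height=5, width=30,
                         exportselection=False)
    for year in YEARS:
        listbox.insert(tk.END, year)
    listbox.pack()

    # A Checkbutton does not hold its own value; it writes 1 or 0 into a variable.
    include_var = tk.IntVar(master=root, value=0)
    tk.Checkbutton(root, text="Include details", variable=include_var).pack()
    return entry, listbox, include_var


def read_form(entry, listbox, include_var):
    """Collect the user's input from the three widget types."""
    text = entry.get()                   # Entry: a plain str
    selected = listbox.curselection()    # Listbox: tuple of selected indexes, e.g. (1,)
    year = listbox.get(selected[0]) if selected else None  # map the index to its text
    include = bool(include_var.get())    # Checkbutton: read its IntVar (1 or 0)
    return text, year, include
```

In a real window you would call read_form from the Run button's command callback and start the GUI with root.mainloop(). The exportselection=False option matters here: without it, a Listbox loses its selection as soon as the user selects text in another widget, such as the Entry.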

*More to follow



Using parallel to multi thread scripts

Traditional awk, sed, and grep commands do not use multiple CPU cores by default. GNU parallel splits the work into jobs and runs them as separate processes across the cores, and on a large data set this can improve the run time by up to 50%.

For additional information on this package:

Debian/Ubuntu

sudo apt-get install parallel

CentOS 6

sudo yum install parallel

Some use cases:
You want to find a string in text:
cat ./logs.csv | parallel --pipe awk '/string/' > ./stringoutput.csv

This uses parallel with no additional options. --pipe tells parallel to split standard input into blocks (1M by default) and feed each block to its own awk job; without it, parallel would treat each input line as an argument to awk.

Let's say that you want to control the size of those blocks. Add --block followed by a block size (e.g. 100M, 10M); --block only takes effect together with --pipe:

cat ./logs.csv | parallel --pipe --block 100M awk '/string/' > ./stringoutput.csv

Since parallel tries to keep every CPU core busy, you can limit it with --jobs (or -j).

By default, --jobs is the same as the number of CPU cores. For example:

cat ./logs.csv | parallel --jobs 1 --pipe --block 100M awk '/string/' > ./stringoutput.csv

This limits parallel to a single job at a time, so only one CPU core is used.


parallel is a very powerful addition to scripts that need more CPU resources, or tighter control over them.
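The same block-splitting idea can be sketched in Python with multiprocessing.Pool. This is an analogue, not how GNU parallel works internally: read_blocks, filter_block and parallel_grep are hypothetical names, the pattern is the literal "string" from the examples above, and, as with --pipe, each block is cut at a line boundary so no line is split between two jobs.

```python
import re
import sys
from multiprocessing import Pool

BLOCK_SIZE = 1024 * 1024  # 1M per block, the same as parallel's default --block size


def read_blocks(path, block_size=BLOCK_SIZE):
    """Yield chunks of roughly block_size bytes that always end on a line boundary."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            block += f.readline()  # finish the partial last line
            yield block


def filter_block(block, pattern=rb"string"):
    """Return the lines in the block that match pattern, like awk '/string/'."""
    regex = re.compile(pattern)
    return b"".join(line for line in block.splitlines(keepends=True) if regex.search(line))


def parallel_grep(path, jobs=None):
    """Filter every block in a pool of worker processes (None = one per CPU)."""
    with Pool(processes=jobs) as pool:
        # imap yields results in input order, similar to parallel --keep-order
        return b"".join(pool.imap(filter_block, read_blocks(path)))


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.buffer.write(parallel_grep(sys.argv[1]))
```

You might run it as python3 pgrep.py ./logs.csv > ./stringoutput.csv (the script name is hypothetical). Note the ordering difference: without --keep-order (-k), parallel prints each job's output as soon as that job finishes, so matching lines can come out in a different order than in the input.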




Installation of Fedora 23 on HP EliteBook 840 G1

Before I start the brief tutorial, I want to list the configuration, since results may vary based on the specs of the machine.

  • 14″ 1920 x 1080 display
  • Intel Core i5-4300U, 1.9 GHz base clock, 2 cores / 4 threads
  • 8 GB DDR3 PC3-12800 ADATA RAM
  • 180 GB Intel 530 2.5″ SSD

I decided to use the latest version of Fedora based on the good experience that I have had with my desktop. The latest version of Fedora runs a very clean, stable and beautiful version of GNOME.

First, I downloaded the .iso and burned it to a USB drive. Since the laptop may ship with either Windows 7 or 8.1, the BIOS may be set to either UEFI or Legacy Boot. To check this, power off the machine, then press Esc at the HP logo screen during boot.


Once you have pressed Esc, you will see a list of options; select F10.


This will take you to the BIOS setup screen. Go to the Advanced tab at the top right.


Make sure that USB boot is selected.


Scroll down further to the Boot Mode and select Legacy Boot. Once you have set these settings make sure to save and exit. You should now be able to boot from your USB bootable drive.

Most of the devices worked out of the box; the only ones that did not were:

  • Sound
  • Fingerprint Reader
  • F8 mic-mute hotkey (there were so few problems that I listed even this one)

Before I got into resolving the issues with these devices, I wanted to install all the updates:

sudo dnf update

After all the updates finished, I installed Google Chrome; Google has made this much easier than in past years.

It should open in Software Updater; then select Install and enter the root password.

There are a few guides titled “Things to Install after Installing Fedora 23”.

What you will want to install varies with your usage; however, here is a link:

The sound issue was caused by an incompatibility with PulseAudio. I was easily able to resolve it with a simple Terminal command (on releases that still use yum, substitute yum for dnf):

sudo dnf remove alsa-plugins-pulseaudio

Once I rebooted, the issue was immediately resolved. In the few hours since I got the machine, I have tried without success to find out what drivers might exist for the fingerprint reader. I will update this blog if I find a fix. Please let me know if you think there are any other important additions.


How to access EC2 instances on OSX using Terminal

It took me a few minutes to find this information so I am documenting here how to accomplish this:

  1. Download the .pem file.
  2. Place it in a folder and note where you stored it.
  3. Open Terminal.
  4. Use cd to go to the folder where your .pem file is stored. EX. cd /Users/User/Desktop
  5. Amazon requires that the private key's permissions be set to 400 (read-only for the owner).
  6. Set the permissions with chmod. EX. chmod 400 test.pem
  7. To connect over SSH, type: ssh -i test.pem followed by the user name and address of your instance.
  8. When asked whether you want to continue connecting, type: yes
  9. That’s it!

Note: You can simplify frequent connections by placing this command in a .sh script.

  • Create a text file.
  • Paste your original command: ssh -i /Users/User/Desktop/test.pem (plus your instance's user name and address)
  • Save the text file with the extension .sh
  • Now you can go to the directory with the .sh file and type bash followed by the script's file name.
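If you prefer Python to a .sh file, the same steps can be sketched as a small script that applies the chmod 400 with os.chmod and then hands off to ssh. This is a minimal sketch under assumptions: the helper names are hypothetical, the script takes the key path and a destination (your instance's user@address) as its two arguments, and the ssh client is installed, as it is on OS X.

```python
import os
import stat
import subprocess
import sys


def ensure_key_permissions(pem_path):
    """Same effect as `chmod 400 test.pem`: read-only for the owner, no access for others."""
    os.chmod(pem_path, stat.S_IRUSR)
    return stat.S_IMODE(os.stat(pem_path).st_mode)


def ssh_command(pem_path, destination):
    """Build the ssh command line; destination is your instance's user@address."""
    return ["ssh", "-i", os.path.expanduser(pem_path), destination]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 connect.py ~/Desktop/test.pem user@address
    key, destination = sys.argv[1], sys.argv[2]
    ensure_key_permissions(key)
    sys.exit(subprocess.call(ssh_command(key, destination)))
```

Passing the command as a list to subprocess.call avoids shell quoting problems if the key's path contains spaces.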

Structuring a script to return value based on options. (BASH)

Let’s say you want to write a script to return a value or statement based on user input. How would you accomplish this?

Write a script that takes a color and returns a result based on that color.

My first attempt let the user pick from a fixed set of options with bash's select command. For instance:

OPTIONS="Blue Green Red"
select opt in $OPTIONS; do
    if [ "$opt" = "Blue" ]; then
        echo The sky is blue.
    elif [ "$opt" = "Green" ]; then
        echo The grass is green.
    elif [ "$opt" = "Red" ]; then
        echo Roses are red.
    else
        echo Not a valid option. Please choose one of the options above.
    fi
done



When run, this will show:
1) Blue
2) Green
3) Red 
If you input 1 and press Enter it will output:
      The sky is blue.
If you input 2 and press Enter it will output:
      The grass is green.
If you input 3 and press Enter it will output:
      Roses are red.
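The if/elif/else statement compares the selected option against each color to determine the output. Any number outside 1-3 leaves $opt empty, so it falls through to the else branch.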
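For comparison, here is the same color menu in Python, the language of the earlier posts. It is a sketch rather than a line-by-line translation of select: the names are hypothetical, and a dictionary lookup replaces the if/elif chain, so adding a color means adding one entry.

```python
OPTIONS = ["Blue", "Green", "Red"]
MESSAGES = {
    "Blue": "The sky is blue.",
    "Green": "The grass is green.",
    "Red": "Roses are red.",
}
INVALID = "Not a valid option. Please choose one of the options above."


def respond(choice):
    """Return the message for a 1-based menu number, as bash's select numbers its options."""
    if choice.isdecimal() and 1 <= int(choice) <= len(OPTIONS):
        return MESSAGES[OPTIONS[int(choice) - 1]]
    return INVALID


def menu():
    """Print the numbered options, then answer each choice until end of input."""
    for number, name in enumerate(OPTIONS, start=1):
        print("{}) {}".format(number, name))
    while True:
        try:
            choice = input("#? ")
        except EOFError:  # Ctrl-D ends the loop, as it does for select
            break
        print(respond(choice.strip()))
```

Calling menu() gives the same numbered prompt as the bash version; Ctrl-D ends it, just as it ends a select loop.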

How to Telnet. *Old School

So let's say that you want to get a 200 (OK) response from a website such as: How would you go about accomplishing that? What about

The structure of the telnet request is:

telnet hostname portnumber (press Enter)

GET (space) / (space) HTTP/1.1


  • telnet is the command, followed by a single space.
  • Then the host name of the site you are making the telnet request to (just the name, not a full URL with http://).
  • Followed by the port number; for a website this is usually port 80.
  • Once you press Enter and the connection opens, type the request method.
  • In this instance GET (note the capitals), then the path, then the protocol version, HTTP/1.0 or HTTP/1.1. Then press Enter.
  • For HTTP/1.1, type Host: followed by the site's host name and press Enter, then press Enter again on an empty line to send the request.

NOTE: If you want to save the response to a text file, you can use: telnet hostname 80 > telnetoutput.txt. This places the output of the request in telnetoutput.txt in the current directory.
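The same exchange can be reproduced from Python with a plain TCP socket, which is essentially what telnet gives you. This is a minimal sketch: the function names are hypothetical, it speaks plain HTTP on port 80 only (no HTTPS), and it adds a Connection: close header, which is not part of the steps above, so that the server closes the connection once it has sent the whole response.

```python
import socket
import sys


def build_request(host, path="/"):
    """The text you would type into telnet, with the CRLF line endings HTTP expects."""
    return ("GET {} HTTP/1.1\r\n"
            "Host: {}\r\n"
            "Connection: close\r\n"
            "\r\n").format(path, host).encode("ascii")


def status_code(response):
    """Pull the status code out of the first line, e.g. b'HTTP/1.1 200 OK' -> 200."""
    status_line = response.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])


def fetch(host, port=80, path="/"):
    """Open a TCP connection (what telnet does), send the request, read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(status_code(fetch(sys.argv[1])))
```

Writing the raw response to a file instead of printing the status code gives the same result as redirecting telnet's output to telnetoutput.txt.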


