Introduction to Python for Economics¶

Aims & Outcomes:

Why Python?
Open Source vs. Proprietary
Provide an overview of Programming vs. Scripting
Discuss Scientific Computing

Why Python?¶

Easy to learn, and well designed language
Massive scientific ecosystem
Open Source
Used extensively in datascience and machine learning communities.

It is a good fit for many Scientific Computing tasks as python has strong tools for working with data, vectorization, jit compilation, parallelization, visualization, etc.

It is a versatile language that is used extensively across many domains.

Proprietary vs Open Source¶

Proprietary¶

Excel
MATLAB
STATA

These are typically good tools for specific tasks but also have some drawbacks.

Pro:

Simpler to use with Integrated graphical interfaces
Company that provides user support
Stable

Con:

Cost Money
(Change of Tools Problem) Typically constrained to a narrow task set when compared to general high-level programming languages so can be difficult to achieve some task types
Access to higher performance can be expensive

Open Source¶

Python
Julia
R

Are considered the leading programming languages to support scientific computing.

Pro:

Free
Versatile & Flexible
Ultimately customisable and auditable being open source
Quick moving communities writing packages and extensions

Con:

Initially less user friendly with limited point and click style of interaction (higher initial fixed costs)
Sometimes more verbose syntax
Quick moving communities writing packages and extensions :-).

Choice of Programming Languages¶

There is a large variety of programming languages available to choose from and this is largely down to a trade-off between productivity for writing code, execution speed, and/or design for a domain specific purpose.

Low Level Languages¶

Provide fine grained control at the hardware level

Languages such as:

Assembly
LLVM (Assembler)

Example: 1 + 1 in assembly

pushq   %rbp
movq    %rsp, %rbp
movl    $1, -12(%rbp)
movl    $1, -8(%rbp)
movl    -12(%rbp), %edx
movl    -8(%rbp), %eax
addl    %edx, %eax
movl    %eax, -4(%rbp)
movl    -4(%rbp), %eax
popq    %rbp

      
    

Intermediate Level Languages¶

Intermediate level languages used to be considered high level languages. Many languages in this range are often fit for purpose type languages.

Languages such as:

C/C++
Fortran
Java

Design for Purpose:

Linus Torvalds thinks C is the best language choice – for writing the linux kernel. This is because the C programming language is a nice balance between access to low-level and productivity. It is productive enough to write large and complex systems reasonably quickly but has enough access to low-level features to build interfaces for hardware and systems.

It is a good fit for writing operating system kernels such as linux.

Example: 1 + 1 in C

#include<stdio.h>
int main()
{
int sum = 0;
sum = 1 + 1;
printf("Sum = %d\n", sum);
return 0;
}

      
    

High Level Languages¶

Provide a high degree of productivity through abstraction and automation etc. and typically include features such as:

Automatic memory management
Advanced Input/Output (IO)
Advanced data structures
(Often) interpreted vs. compiled

Languages such as:

Python
Julia
Ruby
Rust

Example: 1 + 1 in python

1 + 1

Scripting vs. Programming Languages

Most interaction with Stata is in a scripting context. The do file is a convenient way to write a set of instructions that can be repeated for a given workflow. But it lacks features of more general programming languages such as the use of objects to store data and methods.

This is often a design choice!

Stata would like a high productivity environment to run complex statistical models and the syntax is less general then python. For example, many stata commands work over rows of data because of the domain.

Mata (Stata’s Programming Language)

There is mata which can be used to write stata programs and is very similar to C with a focus on matrix operations.

The current feature list

Scientific Computing¶

Scientific Computing ultimately needs to be:

Productive - easy to read, write, debug, explore
Fast computations
Flexible across domains

In most scientific computing applications we also don’t want to have to worry much about interfacing or managing the hardware on a day to day basis – however we would like the ability to maximise usage of that hardware when required so we don’t have to change languages.

Productivity vs. Execution Speed¶

Productivity and Execution typically come with a trade-off:

One of the strengths of python is its adaptability to many different contexts while retaining a very high level of productivity. This is largely due to language design (i.e. everything is an object), and advances in access in compute power.

The `python` ecosystem¶

The python ecosystem has strong tools in working with data, vectorization, JIT compilation, parallelization, visualisation, etc.

The next session on the python ecosystem will introduce many useful packages such as:

Scipy, NumPy, matplotlib, Pandas (Scientific computing infrastructure)
Numba (JIT compilation, multi-threading)
NetworkX (Domain tools)

This will hopefully help to:

reduce search cost
demonstrate the versatility of the ecosystem

History¶

Python was first released in 1991.

It has taken 30 years to build the ecosystem and become as popular as it has become.

Current Popularity¶

There is a nice wired article that talks about the continued popularity of python.

The redmonk rankings put python in the #2 spot!

The key takeaway from the article:

O’Grady cites Python’s versatility as one reason for its ongoing popularity. Companies like Google, Dropbox, and Instagram all rely heavily on Python, as do countless smaller ventures. It also has a home in academia as the preferred data-crunching language of many scientists and mathematicians.

Platform Independence: Windows, macOS, Linux¶

Modern languages such as python and julia are interpreted languages. If your platform has an interpreter the code is often cross-platform.

I often work and collaborate with others across all three platforms and your own productivity largely comes down to preference.

Note on Windows

The user interface on Windows (in my view) is the most difficult but that is largely due to less accessibility to terminal based workflows. This is changing quickly and can now be largely solved using Windows subsystem for linux which runs a virtualised linux kernel that enables access to many high performance tools and productive terminal based workflows.

Resources¶

Some additional reading for those interested:

RSIT Workshop (Uni Tübingen) 2021

Introduction to Python for Economics¶

Why Python?¶

Proprietary vs Open Source¶

Proprietary¶

Open Source¶

Choice of Programming Languages¶

Low Level Languages¶

Intermediate Level Languages¶

High Level Languages¶

Scientific Computing¶

Productivity vs. Execution Speed¶

The python ecosystem¶

History¶

Current Popularity¶

Platform Independence: Windows, macOS, Linux¶

Resources¶

The `python` ecosystem¶