Integrated Workflows

There are three primary ways to run python within stata:

  1. Running python Interactively with a First Example

  2. Running python in a do file

  3. Running python scripts in stata

We will then look at how to transfer data between python and stata in both directions through the stata function interface.

Running python Interactively with a First Example

You can run python interactively within stata, much as you would run the python REPL in a terminal.

This is activated by typing python in the command window.


You are now interfacing directly with the python interpreter, as indicated in the Results window.

You can now write python code such as:

print("Hello World!")

Once you hit enter, stata sends the code snippet to the python interpreter for processing and shows the result.


To stop interfacing with the python interpreter, type end in the command window.

This will return you to the standard stata interface.


If you have a one-line python command you can use

python: print("Hello World!")

which will pass the code to python, display the results directly below in the Results window, and return you to the stata command environment.


Running python in a do file

Another option for running python code is through the do file.

Let’s open the do file editor and add:

di "Stata Here"
python: print("Python Here")

and when you click on the Do button you get the result:


where the results from python are displayed similarly to stata output.

However, most of the time you will want to add in a block of code such as:

for i in range(0,2):
    print("Python Here")

This can be done by delimiting the python code within the do file using either

python
<python code>
end

or

python:
<python code>
end

The difference between these two delimiters is in how stata handles any errors in python.

The python delimiter will continue to execute the rest of the python code if an error is encountered, while the python: delimiter will immediately return control to stata once the error is encountered.

di "Stata Here"
python
for i in rang(2):
   print("Python Here")
print("Python Done")
end
di "Back in Stata Land!"

As you can see stata has continued to execute code past the point at which there is an error.

However if you use python: the execution will halt at the point of the error.

di "Stata Here"
python:
for i in rang(2):
   print("Python Here")
print("Python Done")
end
di "Back in Stata Land!"


I tend to use python: as I prefer to get to the error quickly to fix the problem without any distracting output below it. Also in a long running program you will want to fix the issue prior to the rest of the program executing.
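
The difference can be mimicked in plain python. The sketch below is only an analogy and not how stata is implemented: run_block is a hypothetical helper that feeds statements to the interpreter one at a time and either carries on after an error (like python) or stops at the first one (like python:).

```python
import contextlib
import io


def run_block(statements, stop_on_error):
    """Execute statements in order and return what was printed or raised."""
    namespace = {}
    log = []
    for source in statements:
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(source, namespace)
        except Exception as err:
            # Record the error type; optionally abandon the rest of the block.
            log.append(type(err).__name__)
            if stop_on_error:
                break
        else:
            log.extend(buffer.getvalue().splitlines())
    return log


block = [
    "for i in rang(2):\n    print('Python Here')",  # typo: rang instead of range
    "print('Python Done')",
]

print(run_block(block, stop_on_error=False))  # analogous to the python delimiter
print(run_block(block, stop_on_error=True))   # analogous to the python: delimiter
```

With stop_on_error=False the second statement still runs after the NameError, which is the "distracting output" described above.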

We can now use the error message to fix the issue and run the fixed do file:

di "Stata Here"
python:
for i in range(2):
   print("Python Here")
print("Python Done")
end
di "Back in Stata Land!"

The Do File Editor and White Space


Whitespace is used by python to declare scopes and is an integral part of the language definition.

The do file editor doesn’t provide full text editor support for writing python code.

For example if you type:

for i in range(10):
|<cursor placed here>

the editor will not automatically indent your code.

However, once you have set the cursor to the correct indentation level, it will retain that indentation level for subsequent lines.

for i in range(10):
    |<cursor placed here>

So you need to be careful with whitespace.

Also, whatever you type between the delimiters is passed directly to python, so you can’t indent these code-blocks. For example, given:

di "Stata Here"
python:
    print("Python Here")
end

python will raise an IndentationError (unexpected indent):
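
The error is python's own and not specific to stata. A minimal sketch that reproduces it outside stata, assuming only the built-in compile() function (the "<do-file>" label is just an illustrative file name):

```python
# An indented statement at the top level, as it would be passed on by stata.
indented_block = '    print("Python Here")'

error_name = None
try:
    compile(indented_block, "<do-file>", "exec")
except IndentationError as err:
    error_name = type(err).__name__
    print(f"{error_name}: {err.msg}")

# The same statement without the leading spaces compiles cleanly.
compile('print("Python Here")', "<do-file>", "exec")
```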


Running python scripts in stata

A third option is to run a python script.

If you save the following code in a file:

print("Python Here")
for i in range(2):
    print(f"{i} times hello")
print("I'm outta here")

you can then run this script in stata using:

python script <filename>.py

with the output:



This can be a very useful way to run python code as it lets you write python code in any text editor you like, such as vscode.

Interacting between Stata and Python


In many cases it can be simpler to keep python and stata workflows independent of each other and use files to transfer data between them.

This is covered in File based Workflows.

So far the python and stata runtime environments have been kept independent of each other (i.e. they haven’t shared any data) so that we could focus on how to run python code within stata.

For many applications we want some level of interaction between stata and python, copying objects back and forth between the two runtime environments.

Stata makes various components of its internals available to python via the stata function interface (sfi) to enable such interaction with:

  1. Dataset which connects python with the current in memory stata dataset

  2. Macros which connects python with stata macros

In addition it also provides access to many other stata components.

Copying Data from Stata to Python

Stata Blog Post

This section is heavily inspired by this excellent stata blog post

sysuse auto
list foreign

Listing the foreign data in stata shows


We can then transfer the raw data to python using the .get method of the Data class from the stata function interface (sfi) package.

from sfi import Data
dataraw = Data.get('foreign')

and it looks like


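Notice that the data looks different: python receives the underlying numeric codes rather than the labels shown in stata.


stata has a concept of value labels.

If you use the data explorer you will see that the foreign variable consists of the values 0 and 1, which are associated with the labels Domestic and Foreign (respectively).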


To learn more about the get method, the best place to look is the documentation on sfi.Data, where you can click on the get method.


You can’t use ipython features such as Data.get? in this context because stata is interfacing directly with the python interpreter and not the ipython interpreter (such as when you’re using jupyter).

That page looks like:


You can see that an option is to fetch the value label using valuelabel=True

from sfi import Data
dataraw = Data.get('foreign', valuelabel=True)

and the raw data is now returned as strings taking the value of the labels that have been applied to the data
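
Conceptually, valuelabel=True replaces each numeric code with its label. A rough sketch in plain python, assuming the auto dataset's labels for foreign (0 is Domestic, 1 is Foreign) and a made-up handful of raw codes; origin_labels and raw_codes are illustrative names, not part of sfi:

```python
# Value labels map stored numeric codes to display text.
origin_labels = {0: "Domestic", 1: "Foreign"}

# Made-up sample of raw codes, in the shape of a single fetched variable.
raw_codes = [0, 0, 1, 0, 1]

# Look up the label for every code.
labelled = [origin_labels[code] for code in raw_codes]
print(labelled)
```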


Obtaining more variables at once

You can obtain more variables using the get method. Based on the documentation, you can use the following argument to specify which variables to fetch:

var (int, str, or list-like, optional) – Variables to access.
It can be specified as a single variable index or name, or an
iterable of variable indices or names. If var is not specified,
all the variables are specified.

In addition you can also specify which observations (obs) you would like:

obs (int or list-like, optional) – Observations to access.
It can be specified as a single observation index or an iterable
of observation indices. If obs is not specified, all the
observations are specified.

So let’s use this information and run

from sfi import Data
dataraw = Data.get('foreign mpg rep78', range(45,56))

This code saves a list of lists into the python object dataraw.


The data is returned as a list of rows (observations), with the values in each row in the order that the variables were requested, which in this case is foreign, mpg, rep78. The first element is:

[[0, 18, 2], ...

Because sfi observation indices start at 0, the range(45,56) request will fetch observations 46 to 56 as shown in the data browser
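
The off-by-one mapping is easy to get wrong, so here is a small sketch of it in plain python. The helper stata_range is hypothetical (it is not part of sfi) and simply converts stata observation numbers into the zero-based range that Data.get expects:

```python
# sfi observation indices are zero-based; stata observation numbers start at 1.
sfi_obs = range(45, 56)                  # what is passed to Data.get
stata_obs = [i + 1 for i in sfi_obs]     # what the data browser shows

print(len(sfi_obs), stata_obs[0], stata_obs[-1])


def stata_range(first, last):
    """Hypothetical helper: sfi index range for stata observations first..last."""
    return range(first - 1, last)


print(stata_range(46, 56) == sfi_obs)
```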


As per the documentation you can also specify a list-like object instead of a space-separated string, such as ['foreign', 'mpg', 'rep78']:

from sfi import Data
dataraw = Data.get(['foreign', 'mpg', 'rep78'], range(45,56))

which will return the same data
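
Since the result is a plain list of rows, standard python tools are enough to reshape it. The sketch below uses made-up rows in the same shape (only the first row comes from the text above) and shows two common rearrangements; varnames, columns and records are illustrative names:

```python
varnames = ["foreign", "mpg", "rep78"]

# Illustrative rows: one inner list per observation, values in request order.
# Only the first row is taken from the text; the others are made up.
rows = [[0, 18, 2], [0, 19, 3], [1, 23, 4]]

# Transpose the rows into columns and label each column with its variable name.
columns = {name: list(values) for name, values in zip(varnames, zip(*rows))}

# Alternatively, label the values within each row.
records = [dict(zip(varnames, row)) for row in rows]

print(columns["mpg"])
print(records[0])
```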


Exercise 2

What happens now if you specify valuelabel=True for the above python code?

pd.DataFrame and pd.Series:

The discussion so far has focused on fetching raw data out of stata and copying it to the python environment. But in many applications we are likely to want higher productivity objects such as pandas DataFrame and Series.

Let’s try

from sfi import Data
import pandas as pd
dataraw = Data.get('foreign mpg rep78', range(45,56))
df = pd.DataFrame(dataraw)

You will notice that the raw data has now been placed in a pd.DataFrame, but the column names and index haven’t come across:


You may want to parameterize your requests so that the same variable list can be used both in the sfi.Data.get call and when converting the raw data into a pd.DataFrame.

You can save the variable selection as a python variable:

vars = ['foreign', 'mpg', 'rep78']

then you can use these variables for both stata and python

from sfi import Data
import pandas as pd
vars = ['foreign', 'mpg', 'rep78']
dataraw = Data.get(vars, range(45,56), valuelabel=True)
df = pd.DataFrame(dataraw, index=range(46,57), columns=vars)

which provides a much more consistent pd.DataFrame and lines up closely with the stata context.


You can compare this with stata by typing the following in the command window:

list foreign mpg rep78 in 46/56

Exercise 3

How can you explain the value for the variable rep78 for observation 51?


There is also a method, sfi.Data.getAsDict(), that returns a dictionary keyed by variable name, so you can use:

from sfi import Data
import pandas as pd
vars = ['foreign', 'mpg', 'rep78']
dataraw = Data.getAsDict(vars, range(45,56), valuelabel=True)
df = pd.DataFrame(dataraw)

Missing Values:

Missing values in stata are internally represented by the largest value for each type.

Within stata you typically work with missing values using . such as:

list rep78 if rep78 != .

and much of this detail is taken care of for you.

As missing values are represented by the maximum value, python will interpret this data as an actual value.
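
To see why this matters, here is a small sketch in plain python. It assumes the system missing value arrives in python as 2**1023 (roughly 8.99e307), which is worth confirming in your own session; the rep78 values are made up for illustration.

```python
import math

# Assumption: stata's numeric system missing value (.) arrives as 2**1023.
STATA_MISSING = 2.0 ** 1023

# Made-up rep78 values with one missing observation.
rep78 = [2.0, 3.0, STATA_MISSING, 4.0]

# Treated as a real number, the sentinel dominates any summary statistic.
naive_mean = sum(rep78) / len(rep78)

# Treat anything at or above the sentinel as missing and swap in NaN.
cleaned = [math.nan if value >= STATA_MISSING else value for value in rep78]
valid = [value for value in cleaned if not math.isnan(value)]
clean_mean = sum(valid) / len(valid)

print(naive_mean > 1e300, clean_mean)
```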

To have sfi handle this for you, specify missingval=np.nan:

from sfi import Data
import numpy as np
import pandas as pd
vars = ['foreign', 'mpg', 'rep78']
dataraw = Data.get(vars, range(45,56), valuelabel=True, missingval=np.nan)
df = pd.DataFrame(dataraw, index=range(46,57), columns=vars)

which returns the following


Copying Data from Python to Stata

Stata Blog Post

This section is heavily inspired by this excellent stata blog post

You will often want to do some data work in python and then transfer the result to stata for statistical analysis.

The sfi.Data interface also contains methods for saving data from python into the current stata dataset (or into a frame, which is new in Stata 16).

Let us fetch some data from Yahoo Finance using the yfinance package in python

import yfinance as yf
dowjones = yf.Ticker("^DJI")
data = dowjones.history(start="2010-01-01", end="2020-12-31")[['Close', 'Volume']]

the yfinance package has returned the dowjones history tables containing data between 2010-01-01 and 2020-12-31


Now we need to migrate that data from python into stata. First set the number of observations in the stata dataset:

from sfi import Data
Data.setObsTotal(len(data))

the stata data editor now contains space for len(data) observations to be transferred.


You can then set up 3 variables in stata to hold the date, close, and volume information.

Data.addVarStr("date", 10)  # Str10
Data.addVarDouble("close")  # Double
Data.addVarInt("volume")    # Int

the stata data editor now contains 3 variables



You should start this work with an empty stata dataset. The sfi.Data class can return some cryptic errors. For example, creating the date string variable with the code above raises the following error if a variable of that name already exists in the dataset.

>>> Data.addVarStr("date", 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Applications/Stata/ado/base/py/", line 487, in addVarStr
    return _stp._st_addvarstr(name, length)
SystemError: failed to add a variable of type str to the current Stata dataset

Clearing can be done in stata using the clear command.


The next step is to migrate the actual data.

You might try saving the data directly from the pandas dataframe into the stata dataset using the Data.store method.


This method interface is expecting

static store(var, obs, val, selectvar=None)

where

  1. var, obs, and val are positional arguments, and

  2. selectvar=None is a keyword argument with a default value of None

This means that var, obs, and val are required inputs, which deviates from sfi.Data.get(), where every argument is optional.

Data.store("date", None, data.index)

however you will run into trouble with the following error:


Stata is similar to numpy in that it is very specific about how it saves data in memory in accordance with specified types.

In the code above we tried to send through a list of datetime objects from pandas and the stata function interface doesn’t know how to represent this data in the stata dataset.


As you can see the index from the pandas dataframe data consists of Timestamp objects:


Therefore some translation is required in this case to convert dates into a format that stata can copy into its dataset and then use stata tools to convert to stata dates.

We know stata has a date function that we can use:

gen stringdates = ""
set obs 1
replace stringdates = "2010-01-04" in 1
gen date = date(stringdates, "YMD")
format %tdCCYY-NN-DD date

So now we can look to convert the pandas.Timestamp objects to be represented as simpler string based data that contain the information needed for stata to convert those dates.

Pandas has a useful method, .astype(), for this kind of type conversion.

data.index = data.index.astype(str)

this has used the in-built type converter to represent the index as strings formatted as YYYY-MM-DD
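
The same conversion can be sketched with the standard library alone, which also makes the target format explicit. This assumes plain datetime objects standing in for the pandas index (the two dates are made up) and relies on stata daily dates being counted in days since 01jan1960:

```python
from datetime import date, datetime

# Illustrative timestamps standing in for the pandas index.
stamps = [datetime(2010, 1, 4), datetime(2010, 1, 5)]

# An explicit format string avoids depending on the default str() output,
# which can include a time component.
as_strings = [stamp.strftime("%Y-%m-%d") for stamp in stamps]

# Numeric value of each date on stata's daily scale, useful as a cross-check
# of what date(..., "YMD") produces after the transfer.
STATA_EPOCH = date(1960, 1, 1)
as_stata_days = [(stamp.date() - STATA_EPOCH).days for stamp in stamps]

print(as_strings)
print(as_stata_days)
```

On a pandas DatetimeIndex, data.index.strftime("%Y-%m-%d") plays the role of the explicit format string used here.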

Now let’s try to save this information into the stata dataset:

Data.store("date", None, data.index)

You can now open the data viewer and see that the dates (as strings) have been copied over to stata:


Let’s bring in the numerical data, which is a much simpler process

Data.store("close", None, data.Close)
Data.store("volume", None, data.Volume)

We now have the data we need in the stata dataset as seen in the data editor


Now that the data is copied across we can switch back to stata to run any analysis or construct a plot

We will first want to convert those dates in stata as a post transfer step

gen sdate = date(date, "YMD")
format %tdCCYY-NN-DD sdate

and we can check the conversion in the stata data editor


and then we can construct the plot as demonstrated in the original blog post

replace volume = volume / 1000000
twoway (line close sdate, lcolor(green) lwidth(medium))           ///
       (bar volume sdate, fcolor(blue) lcolor(blue) yaxis(2)),    ///
       title("Dow Jones Industrial Average (2010 - 2020)")        ///
       xtitle("") ytitle("") ytitle("", axis(2))                  ///
       xlabel(, labsize(small) angle(horizontal))                 ///
       ylabel(5000(5000)30000,                                    ///
              labsize(small) labcolor(green)                      ///
              angle(horizontal) format(%9.0fc))                   ///
       ylabel(0(5)30,                                             ///
              labsize(small) labcolor(blue)                       ///
              angle(horizontal) axis(2))                          ///
       legend(order(1 "Closing Price" 2 "Volume (millions)")      ///
              cols(1) position(10) ring(0))

which produces the following stata chart


You may be interested in comparing this to a chart built with matplotlib and pandas in the python environment.

You can download this notebook, or open this notebook in the cloud

which produces the following matplotlib figure:


Persistence between python code-blocks in stata

Once the python interpreter is initialised it is used throughout the stata session.

This means that once variables are created in python they will be available in future python code-blocks.

import pandas as pd
df = pd.DataFrame(range(4), index=['a','b','c','d'])

then you can run some other things in stata and then return to python and fetch the df object


such as in this short demonstration
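
As an analogy only (this is not how stata implements it), the behaviour resembles running separate pieces of code against one shared namespace. In the sketch below the session dictionary is a hypothetical stand-in for the python session that stata keeps alive:

```python
# One shared namespace stands in for the long-lived python session.
session = {}

first_block = "values = list(range(4))"
second_block = "total = sum(values)"   # relies on `values` from the first block

exec(first_block, session)
# ... stata commands would run here ...
exec(second_block, session)

print(session["total"])
```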


The stata function interface sfi

The python api documentation contains the details about the sfi package from stata. It provides the following classes:

Characteristic: access to stata characteristics

Data: access to the current stata dataset

Datetime: access to stata datetimes

Frame: access to stata frames

Macro: access to stata macros

Mata: an interface with global mata matrices

Matrix: access to stata matrices

Missing: access to stata missing values

Platform: access to platform information

Scalar: access to stata scalars

SFIToolkit: a set of core tools for interacting with stata

StrLConnector: access to the stata strL datatype in Data and/or Frame

ValueLabel: access to stata value labels