Questions & Answers
I have updated the exercise notebook with solutions
from the tutorial.
You can launch the notebook
Q: How to create a numpy array with np.nan
import numpy as np
a = np.zeros(4)
a[:] = np.nan
The slice assignment a[:] = np.nan broadcasts the scalar np.nan to every element of the array a.
There does not seem to be a dedicated constructor for NaN-filled arrays (nothing equivalent to np.zeros or np.ones).
Another option is the np.full function, available in numpy >= 1.8:
import numpy as np
a = np.full(4, np.nan)
There is an excellent discussion of this on Stack Overflow, including performance comparisons!
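As a quick check, the two approaches above can be run side by side. This is a minimal sketch; the third variant (np.empty followed by fill) is not from the tutorial and is included only as an assumed alternative that avoids writing zeros first.

```python
import numpy as np

# Option 1: allocate an array, then broadcast np.nan into every element.
a = np.zeros(4)
a[:] = np.nan

# Option 2: build the array in a single call (numpy >= 1.8).
b = np.full(4, np.nan)

# Option 3 (assumed alternative): allocate uninitialised memory, then fill it.
c = np.empty(4)
c.fill(np.nan)

# np.nan never compares equal to itself, so test with np.isnan rather than ==.
all_nan = all(np.isnan(x).all() for x in (a, b, c))
print(all_nan)
```

All three arrays have a float dtype, which matters because integer arrays cannot store np.nan.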
Q: How do you combine pd.DataFrame
There are a number of ways of doing this. After brushing up on the pandas docs, here are some methods that will be useful.
In the exercise notebook we saved data for usa and others which can be combined:
combined = usa.append(others)
combined.T.plot()
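This appends additional rows. Given that the columns are the same in each dataframe, this is a simple option. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on recent versions pd.concat is the method to use.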
We had a discussion focused around combining dataframes and had a look at:
pd.concat (works to append rows by default)
DataFrame.join (works to append columns by default, aligning on the index)
pd.concat can also be used along the other axis by specifying axis=1.
so we could have used
pd.concat([usa, others])
or, with the transposed usa.T and others.T data (a format more convenient for plotting, with countries as columns):
usa.T.join(others.T)
or by changing the axis
pd.concat([usa.T, others.T], axis=1)
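The options above can be compared on a small example. This is a sketch only: the usa and others frames below are hypothetical stand-ins for the exercise data (countries as rows, years as columns), and the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the exercise data: one row per country, one column per year.
usa = pd.DataFrame({2000: [10.0], 2010: [12.0]}, index=["USA"])
others = pd.DataFrame({2000: [5.0, 7.0], 2010: [6.0, 9.0]}, index=["AUS", "CAN"])

# Stack the rows of both frames (the columns are the same in each).
by_rows = pd.concat([usa, others])

# Transposed data: years as rows, countries as columns.
# DataFrame.join adds columns, aligning on the index (the years).
by_join = usa.T.join(others.T)

# pd.concat does the same when told to work along the column axis.
by_cols = pd.concat([usa.T, others.T], axis=1)

print(by_rows.shape)  # rows are countries
print(by_cols.shape)  # columns are countries
```

Because both transposed frames share the same index here, the join and the axis=1 concat give the same table; with mismatched indexes, join keeps the left frame's index by default while concat keeps the union.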
Q: Will we cover more of statsmodels
Yes, we can do more statsmodels in Session 7.
There is also a package called linearmodels that we can take a look at as well that includes support for Panels etc.
I am still researching high-dimensional fixed effects
models in Python.
There is linearmodels.iv.absorbing.AbsorbingLS in the IV family of models, where \(z\) may be high-dimensional.
Q: Parsing data and constructing Age Categories
I have put together a notebook
discussing some possible options in pure python using a
dictionary to store the results and a comparison with pandas
using categoricals.
You can run this notebook here
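The two approaches compared in that notebook can be sketched roughly as follows. This is not the notebook's code: the sample ages, the category labels and the cut-offs at 18 and 65 are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample ages; the real data will differ.
ages = [3, 17, 18, 42, 65, 80]


def age_category(age):
    """Map an age to an assumed category label."""
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"


# Pure Python: accumulate counts per category in a dictionary.
counts = {}
for age in ages:
    label = age_category(age)
    counts[label] = counts.get(label, 0) + 1

# pandas: pd.cut returns an ordered categorical.
# right=False makes the bins [0, 18), [18, 65), [65, 120), matching the function above.
categories = pd.cut(
    pd.Series(ages),
    bins=[0, 18, 65, 120],
    right=False,
    labels=["child", "adult", "senior"],
)
pandas_counts = categories.value_counts().to_dict()

print(counts)
print(pandas_counts)
```

The dictionary version is easy to follow and needs no dependencies, while the categorical keeps the category order and plugs directly into groupby and plotting.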
I have received some sample data from a participant, so we can use
that in this week's tutorial to review working with real-world data.