{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Web scraping examples\n", "\n", "https://www.statistik-bw.de/BildungKultur/SchulenAllgem/13013020.tab?R=KR116\n", "\n", "https://www.statistik-bw.de/BildungKultur/SchulenAllgem/13013020.tab?R=KR415\n", "\n", "https://www.statistik-bw.de/BildungKultur/SchulenAllgem/13013020.tab?R=KR416\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**\"Easy method\"**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "tables = pd.read_html(\"https://www.statistik-bw.de/BildungKultur/SchulenAllgem/13013020.tab?R=KR116\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(tables)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th colspan=\"6\">Schulabgänge aus öffentlichen und privaten Schulen seit dem Schuljahr 1983/84 nach Abschlussarten Schularten Landkreis Esslingen</th></tr>\n",
"    <tr><th></th><th>Jahr</th><th colspan=\"5\">Abschlussart1)</th></tr>\n",
"    <tr><th></th><th>Jahr</th><th>ohne</th><th>mit</th><th>Mittlerer Abschluss2)</th><th>Fachhochschulreife</th><th>Hochschulreife</th></tr>\n",
"    <tr><th></th><th>Jahr</th><th>Hauptschulabschluss</th><th>Hauptschulabschluss</th><th>Mittlerer Abschluss2)</th><th>Fachhochschulreife</th><th>Hochschulreife</th></tr>\n",
"    <tr><th></th><th>Jahr</th><th>Anzahl</th><th>Anzahl</th><th>Anzahl</th><th>Anzahl</th><th>Anzahl</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1983/84</td><td>478</td><td>2.452</td><td>2.951</td><td>7</td><td>1.382</td></tr>\n",
"    <tr><th>1</th><td>1984/85</td><td>407</td><td>2.260</td><td>2.815</td><td>14</td><td>1.483</td></tr>\n",
"    <tr><th>2</th><td>1985/86</td><td>397</td><td>2.056</td><td>2.741</td><td>9</td><td>1.382</td></tr>\n",
"    <tr><th>3</th><td>1986/87</td><td>289</td><td>2.068</td><td>2.233</td><td>27</td><td>1.361</td></tr>\n",
"    <tr><th>4</th><td>1987/88</td><td>377</td><td>1.784</td><td>2.217</td><td>–</td><td>1.414</td></tr>\n",
"    <tr><th>5</th><td>1988/89</td><td>333</td><td>1.723</td><td>1.949</td><td>10</td><td>1.280</td></tr>\n",
"    <tr><th>6</th><td>1989/90</td><td>433</td><td>1.605</td><td>1.692</td><td>15</td><td>1.193</td></tr>\n",
"    <tr><th>7</th><td>1990/91</td><td>369</td><td>1.687</td><td>1.716</td><td>12</td><td>1.104</td></tr>\n",
"    <tr><th>8</th><td>1991/92</td><td>355</td><td>1.586</td><td>1.649</td><td>10</td><td>1.068</td></tr>\n",
"    <tr><th>9</th><td>1992/93</td><td>472</td><td>1.570</td><td>1.674</td><td>10</td><td>1.023</td></tr>\n",
"    <tr><th>10</th><td>1993/94</td><td>407</td><td>1.564</td><td>1.641</td><td>19</td><td>1.056</td></tr>\n",
"    <tr><th>11</th><td>1994/95</td><td>424</td><td>1.573</td><td>1.676</td><td>12</td><td>1.038</td></tr>\n",
"    <tr><th>12</th><td>1995/96</td><td>468</td><td>1.561</td><td>1.751</td><td>16</td><td>1.061</td></tr>\n",
"    <tr><th>13</th><td>1996/97</td><td>351</td><td>1.598</td><td>1.920</td><td>15</td><td>1.023</td></tr>\n",
"    <tr><th>14</th><td>1997/98</td><td>357</td><td>1.630</td><td>1.982</td><td>12</td><td>993</td></tr>\n",
"    <tr><th>15</th><td>1998/99</td><td>391</td><td>1.581</td><td>1.954</td><td>25</td><td>1.128</td></tr>\n",
"    <tr><th>16</th><td>1999/00</td><td>471</td><td>1.488</td><td>1.947</td><td>33</td><td>1.157</td></tr>\n",
"    <tr><th>17</th><td>2000/01</td><td>428</td><td>1.484</td><td>1.890</td><td>25</td><td>1.099</td></tr>\n",
"    <tr><th>18</th><td>2001/02</td><td>405</td><td>1.636</td><td>1.975</td><td>32</td><td>1.119</td></tr>\n",
"    <tr><th>19</th><td>2002/03</td><td>382</td><td>1.672</td><td>1.905</td><td>28</td><td>1.207</td></tr>\n",
"    <tr><th>20</th><td>2003/04</td><td>391</td><td>1.816</td><td>2.177</td><td>28</td><td>1.082</td></tr>\n",
"    <tr><th>21</th><td>2004/05</td><td>346</td><td>1.713</td><td>2.085</td><td>30</td><td>1.154</td></tr>\n",
"    <tr><th>22</th><td>2005/06</td><td>323</td><td>1.734</td><td>2.215</td><td>27</td><td>1.316</td></tr>\n",
"    <tr><th>23</th><td>2006/07</td><td>280</td><td>1.651</td><td>2.215</td><td>17</td><td>1.364</td></tr>\n",
"    <tr><th>24</th><td>2007/08</td><td>280</td><td>1.640</td><td>2.242</td><td>29</td><td>1.421</td></tr>\n",
"    <tr><th>25</th><td>2008/09</td><td>269</td><td>1.509</td><td>2.244</td><td>39</td><td>1.527</td></tr>\n",
"    <tr><th>26</th><td>2009/10</td><td>240</td><td>1.363</td><td>2.225</td><td>28</td><td>1.630</td></tr>\n",
"    <tr><th>27</th><td>2010/11</td><td>234</td><td>1.378</td><td>2.196</td><td>33</td><td>1.613</td></tr>\n",
"    <tr><th>28</th><td>2011/12</td><td>214</td><td>1.174</td><td>2.225</td><td>42</td><td>2.794</td></tr>\n",
"    <tr><th>29</th><td>2012/13</td><td>218</td><td>1.140</td><td>2.463</td><td>36</td><td>1.694</td></tr>\n",
"    <tr><th>30</th><td>2013/14</td><td>225</td><td>1.169</td><td>2.558</td><td>17</td><td>1.595</td></tr>\n",
"    <tr><th>31</th><td>2014/15</td><td>189</td><td>1.134</td><td>2.518</td><td>41</td><td>1.696</td></tr>\n",
"    <tr><th>32</th><td>2015/16</td><td>228</td><td>987</td><td>2.374</td><td>48</td><td>1.666</td></tr>\n",
"    <tr><th>33</th><td>2016/17</td><td>272</td><td>854</td><td>2.391</td><td>47</td><td>1.644</td></tr>\n",
"    <tr><th>34</th><td>2017/18</td><td>270</td><td>813</td><td>2.241</td><td>39</td><td>1.584</td></tr>\n",
"    <tr><th>35</th><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Schulabgänge aus öffentlichen und privaten Schulen seit dem Schuljahr 1983/84 nach Abschlussarten Schularten Landkreis Esslingen \\\n", " Jahr \n", " Jahr \n", " Jahr \n", " Jahr \n", "0 1983/84 \n", "1 1984/85 \n", "2 1985/86 \n", "3 1986/87 \n", "4 1987/88 \n", "5 1988/89 \n", "6 1989/90 \n", "7 1990/91 \n", "8 1991/92 \n", "9 1992/93 \n", "10 1993/94 \n", "11 1994/95 \n", "12 1995/96 \n", "13 1996/97 \n", "14 1997/98 \n", "15 1998/99 \n", "16 1999/00 \n", "17 2000/01 \n", "18 2001/02 \n", "19 2002/03 \n", "20 2003/04 \n", "21 2004/05 \n", "22 2005/06 \n", "23 2006/07 \n", "24 2007/08 \n", "25 2008/09 \n", "26 2009/10 \n", "27 2010/11 \n", "28 2011/12 \n", "29 2012/13 \n", "30 2013/14 \n", "31 2014/15 \n", "32 2015/16 \n", "33 2016/17 \n", "34 2017/18 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... \n", "\n", " \\\n", " Abschlussart1) \n", " ohne \n", " Hauptschulabschluss \n", " Anzahl \n", "0 478 \n", "1 407 \n", "2 397 \n", "3 289 \n", "4 377 \n", "5 333 \n", "6 433 \n", "7 369 \n", "8 355 \n", "9 472 \n", "10 407 \n", "11 424 \n", "12 468 \n", "13 351 \n", "14 357 \n", "15 391 \n", "16 471 \n", "17 428 \n", "18 405 \n", "19 382 \n", "20 391 \n", "21 346 \n", "22 323 \n", "23 280 \n", "24 280 \n", "25 269 \n", "26 240 \n", "27 234 \n", "28 214 \n", "29 218 \n", "30 225 \n", "31 189 \n", "32 228 \n", "33 272 \n", "34 270 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... 
\n", "\n", " \\\n", " \n", " mit \n", " Hauptschulabschluss \n", " Anzahl \n", "0 2.452 \n", "1 2.260 \n", "2 2.056 \n", "3 2.068 \n", "4 1.784 \n", "5 1.723 \n", "6 1.605 \n", "7 1.687 \n", "8 1.586 \n", "9 1.570 \n", "10 1.564 \n", "11 1.573 \n", "12 1.561 \n", "13 1.598 \n", "14 1.630 \n", "15 1.581 \n", "16 1.488 \n", "17 1.484 \n", "18 1.636 \n", "19 1.672 \n", "20 1.816 \n", "21 1.713 \n", "22 1.734 \n", "23 1.651 \n", "24 1.640 \n", "25 1.509 \n", "26 1.363 \n", "27 1.378 \n", "28 1.174 \n", "29 1.140 \n", "30 1.169 \n", "31 1.134 \n", "32 987 \n", "33 854 \n", "34 813 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... \n", "\n", " \\\n", " \n", " Mittlerer Abschluss2) \n", " Mittlerer Abschluss2) \n", " Anzahl \n", "0 2.951 \n", "1 2.815 \n", "2 2.741 \n", "3 2.233 \n", "4 2.217 \n", "5 1.949 \n", "6 1.692 \n", "7 1.716 \n", "8 1.649 \n", "9 1.674 \n", "10 1.641 \n", "11 1.676 \n", "12 1.751 \n", "13 1.920 \n", "14 1.982 \n", "15 1.954 \n", "16 1.947 \n", "17 1.890 \n", "18 1.975 \n", "19 1.905 \n", "20 2.177 \n", "21 2.085 \n", "22 2.215 \n", "23 2.215 \n", "24 2.242 \n", "25 2.244 \n", "26 2.225 \n", "27 2.196 \n", "28 2.225 \n", "29 2.463 \n", "30 2.558 \n", "31 2.518 \n", "32 2.374 \n", "33 2.391 \n", "34 2.241 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... \n", "\n", " \\\n", " \n", " Fachhochschulreife \n", " Fachhochschulreife \n", " Anzahl \n", "0 7 \n", "1 14 \n", "2 9 \n", "3 27 \n", "4 – \n", "5 10 \n", "6 15 \n", "7 12 \n", "8 10 \n", "9 10 \n", "10 19 \n", "11 12 \n", "12 16 \n", "13 15 \n", "14 12 \n", "15 25 \n", "16 33 \n", "17 25 \n", "18 32 \n", "19 28 \n", "20 28 \n", "21 30 \n", "22 27 \n", "23 17 \n", "24 29 \n", "25 39 \n", "26 28 \n", "27 33 \n", "28 42 \n", "29 36 \n", "30 17 \n", "31 41 \n", "32 48 \n", "33 47 \n", "34 39 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... 
\n", "\n", " \n", " \n", " Hochschulreife \n", " Hochschulreife \n", " Anzahl \n", "0 1.382 \n", "1 1.483 \n", "2 1.382 \n", "3 1.361 \n", "4 1.414 \n", "5 1.280 \n", "6 1.193 \n", "7 1.104 \n", "8 1.068 \n", "9 1.023 \n", "10 1.056 \n", "11 1.038 \n", "12 1.061 \n", "13 1.023 \n", "14 993 \n", "15 1.128 \n", "16 1.157 \n", "17 1.099 \n", "18 1.119 \n", "19 1.207 \n", "20 1.082 \n", "21 1.154 \n", "22 1.316 \n", "23 1.364 \n", "24 1.421 \n", "25 1.527 \n", "26 1.630 \n", "27 1.613 \n", "28 2.794 \n", "29 1.694 \n", "30 1.595 \n", "31 1.696 \n", "32 1.666 \n", "33 1.644 \n", "34 1.584 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tables[0]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://www.statistik-bw.de/BildungKultur/SchulenAllgem/13013020.tab?R=KR116\n", "https://www.statistik-bw.de/BildungKultur/SchulenAllgem/13013020.tab?R=KR415\n", "https://www.statistik-bw.de/BildungKultur/SchulenAllgem/13013020.tab?R=KR416\n" ] } ], "source": [ "base_url = \"https://www.statistik-bw.de/BildungKultur/SchulenAllgem/13013020.tab\"\n", "\n", "districts = [116, 415, 416]\n", "output = {}\n", "for dnum in districts:\n", " url = base_url + f\"?R=KR{dnum}\"\n", " print(url)\n", "\n", " output[dnum] = pd.read_html(url)[0]\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th colspan=\"6\">Schulabgänge aus öffentlichen und privaten Schulen seit dem Schuljahr 1983/84 nach Abschlussarten Schularten Landkreis Reutlingen</th></tr>\n",
"    <tr><th></th><th>Jahr</th><th colspan=\"5\">Abschlussart1)</th></tr>\n",
"    <tr><th></th><th>Jahr</th><th>ohne</th><th>mit</th><th>Mittlerer Abschluss2)</th><th>Fachhochschulreife</th><th>Hochschulreife</th></tr>\n",
"    <tr><th></th><th>Jahr</th><th>Hauptschulabschluss</th><th>Hauptschulabschluss</th><th>Mittlerer Abschluss2)</th><th>Fachhochschulreife</th><th>Hochschulreife</th></tr>\n",
"    <tr><th></th><th>Jahr</th><th>Anzahl</th><th>Anzahl</th><th>Anzahl</th><th>Anzahl</th><th>Anzahl</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1983/84</td><td>196</td><td>1.394</td><td>1.378</td><td>14</td><td>806</td></tr>\n",
"    <tr><th>1</th><td>1984/85</td><td>189</td><td>1.292</td><td>1.379</td><td>4</td><td>887</td></tr>\n",
"    <tr><th>2</th><td>1985/86</td><td>186</td><td>1.265</td><td>1.220</td><td>12</td><td>814</td></tr>\n",
"    <tr><th>3</th><td>1986/87</td><td>150</td><td>1.251</td><td>1.158</td><td>3</td><td>782</td></tr>\n",
"    <tr><th>4</th><td>1987/88</td><td>153</td><td>1.102</td><td>1.110</td><td>–</td><td>790</td></tr>\n",
"    <tr><th>5</th><td>1988/89</td><td>180</td><td>1.028</td><td>985</td><td>5</td><td>723</td></tr>\n",
"    <tr><th>6</th><td>1989/90</td><td>226</td><td>1.007</td><td>934</td><td>3</td><td>673</td></tr>\n",
"    <tr><th>7</th><td>1990/91</td><td>186</td><td>946</td><td>844</td><td>14</td><td>661</td></tr>\n",
"    <tr><th>8</th><td>1991/92</td><td>195</td><td>940</td><td>899</td><td>3</td><td>598</td></tr>\n",
"    <tr><th>9</th><td>1992/93</td><td>210</td><td>968</td><td>789</td><td>3</td><td>586</td></tr>\n",
"    <tr><th>10</th><td>1993/94</td><td>216</td><td>955</td><td>814</td><td>4</td><td>608</td></tr>\n",
"    <tr><th>11</th><td>1994/95</td><td>247</td><td>967</td><td>843</td><td>9</td><td>566</td></tr>\n",
"    <tr><th>12</th><td>1995/96</td><td>256</td><td>974</td><td>903</td><td>9</td><td>577</td></tr>\n",
"    <tr><th>13</th><td>1996/97</td><td>209</td><td>976</td><td>1.030</td><td>10</td><td>602</td></tr>\n",
"    <tr><th>14</th><td>1997/98</td><td>231</td><td>1.080</td><td>1.034</td><td>7</td><td>571</td></tr>\n",
"    <tr><th>15</th><td>1998/99</td><td>239</td><td>999</td><td>1.021</td><td>21</td><td>633</td></tr>\n",
"    <tr><th>16</th><td>1999/00</td><td>266</td><td>995</td><td>1.076</td><td>8</td><td>692</td></tr>\n",
"    <tr><th>17</th><td>2000/01</td><td>199</td><td>961</td><td>972</td><td>8</td><td>698</td></tr>\n",
"    <tr><th>18</th><td>2001/02</td><td>223</td><td>1.121</td><td>1.089</td><td>16</td><td>701</td></tr>\n",
"    <tr><th>19</th><td>2002/03</td><td>221</td><td>1.099</td><td>1.138</td><td>10</td><td>729</td></tr>\n",
"    <tr><th>20</th><td>2003/04</td><td>199</td><td>1.146</td><td>1.096</td><td>24</td><td>706</td></tr>\n",
"    <tr><th>21</th><td>2004/05</td><td>179</td><td>1.069</td><td>1.205</td><td>34</td><td>730</td></tr>\n",
"    <tr><th>22</th><td>2005/06</td><td>200</td><td>1.160</td><td>1.219</td><td>35</td><td>772</td></tr>\n",
"    <tr><th>23</th><td>2006/07</td><td>166</td><td>1.068</td><td>1.150</td><td>19</td><td>856</td></tr>\n",
"    <tr><th>24</th><td>2007/08</td><td>173</td><td>1.080</td><td>1.161</td><td>9</td><td>857</td></tr>\n",
"    <tr><th>25</th><td>2008/09</td><td>161</td><td>981</td><td>1.202</td><td>22</td><td>876</td></tr>\n",
"    <tr><th>26</th><td>2009/10</td><td>142</td><td>925</td><td>1.198</td><td>13</td><td>910</td></tr>\n",
"    <tr><th>27</th><td>2010/11</td><td>110</td><td>859</td><td>1.082</td><td>20</td><td>921</td></tr>\n",
"    <tr><th>28</th><td>2011/12</td><td>136</td><td>686</td><td>1.233</td><td>13</td><td>1.662</td></tr>\n",
"    <tr><th>29</th><td>2012/13</td><td>160</td><td>706</td><td>1.456</td><td>34</td><td>869</td></tr>\n",
"    <tr><th>30</th><td>2013/14</td><td>145</td><td>632</td><td>1.363</td><td>19</td><td>832</td></tr>\n",
"    <tr><th>31</th><td>2014/15</td><td>129</td><td>699</td><td>1.411</td><td>20</td><td>898</td></tr>\n",
"    <tr><th>32</th><td>2015/16</td><td>119</td><td>649</td><td>1.412</td><td>11</td><td>919</td></tr>\n",
"    <tr><th>33</th><td>2016/17</td><td>155</td><td>533</td><td>1.309</td><td>15</td><td>899</td></tr>\n",
"    <tr><th>34</th><td>2017/18</td><td>182</td><td>476</td><td>1.305</td><td>22</td><td>886</td></tr>\n",
"    <tr><th>35</th><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td><td>1) Allgemeinbildende Schulen; ohne Schulen des...</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Schulabgänge aus öffentlichen und privaten Schulen seit dem Schuljahr 1983/84 nach Abschlussarten Schularten Landkreis Reutlingen \\\n", " Jahr \n", " Jahr \n", " Jahr \n", " Jahr \n", "0 1983/84 \n", "1 1984/85 \n", "2 1985/86 \n", "3 1986/87 \n", "4 1987/88 \n", "5 1988/89 \n", "6 1989/90 \n", "7 1990/91 \n", "8 1991/92 \n", "9 1992/93 \n", "10 1993/94 \n", "11 1994/95 \n", "12 1995/96 \n", "13 1996/97 \n", "14 1997/98 \n", "15 1998/99 \n", "16 1999/00 \n", "17 2000/01 \n", "18 2001/02 \n", "19 2002/03 \n", "20 2003/04 \n", "21 2004/05 \n", "22 2005/06 \n", "23 2006/07 \n", "24 2007/08 \n", "25 2008/09 \n", "26 2009/10 \n", "27 2010/11 \n", "28 2011/12 \n", "29 2012/13 \n", "30 2013/14 \n", "31 2014/15 \n", "32 2015/16 \n", "33 2016/17 \n", "34 2017/18 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... \n", "\n", " \\\n", " Abschlussart1) \n", " ohne \n", " Hauptschulabschluss \n", " Anzahl \n", "0 196 \n", "1 189 \n", "2 186 \n", "3 150 \n", "4 153 \n", "5 180 \n", "6 226 \n", "7 186 \n", "8 195 \n", "9 210 \n", "10 216 \n", "11 247 \n", "12 256 \n", "13 209 \n", "14 231 \n", "15 239 \n", "16 266 \n", "17 199 \n", "18 223 \n", "19 221 \n", "20 199 \n", "21 179 \n", "22 200 \n", "23 166 \n", "24 173 \n", "25 161 \n", "26 142 \n", "27 110 \n", "28 136 \n", "29 160 \n", "30 145 \n", "31 129 \n", "32 119 \n", "33 155 \n", "34 182 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... 
\n", "\n", " \\\n", " \n", " mit \n", " Hauptschulabschluss \n", " Anzahl \n", "0 1.394 \n", "1 1.292 \n", "2 1.265 \n", "3 1.251 \n", "4 1.102 \n", "5 1.028 \n", "6 1.007 \n", "7 946 \n", "8 940 \n", "9 968 \n", "10 955 \n", "11 967 \n", "12 974 \n", "13 976 \n", "14 1.080 \n", "15 999 \n", "16 995 \n", "17 961 \n", "18 1.121 \n", "19 1.099 \n", "20 1.146 \n", "21 1.069 \n", "22 1.160 \n", "23 1.068 \n", "24 1.080 \n", "25 981 \n", "26 925 \n", "27 859 \n", "28 686 \n", "29 706 \n", "30 632 \n", "31 699 \n", "32 649 \n", "33 533 \n", "34 476 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... \n", "\n", " \\\n", " \n", " Mittlerer Abschluss2) \n", " Mittlerer Abschluss2) \n", " Anzahl \n", "0 1.378 \n", "1 1.379 \n", "2 1.220 \n", "3 1.158 \n", "4 1.110 \n", "5 985 \n", "6 934 \n", "7 844 \n", "8 899 \n", "9 789 \n", "10 814 \n", "11 843 \n", "12 903 \n", "13 1.030 \n", "14 1.034 \n", "15 1.021 \n", "16 1.076 \n", "17 972 \n", "18 1.089 \n", "19 1.138 \n", "20 1.096 \n", "21 1.205 \n", "22 1.219 \n", "23 1.150 \n", "24 1.161 \n", "25 1.202 \n", "26 1.198 \n", "27 1.082 \n", "28 1.233 \n", "29 1.456 \n", "30 1.363 \n", "31 1.411 \n", "32 1.412 \n", "33 1.309 \n", "34 1.305 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... \n", "\n", " \\\n", " \n", " Fachhochschulreife \n", " Fachhochschulreife \n", " Anzahl \n", "0 14 \n", "1 4 \n", "2 12 \n", "3 3 \n", "4 – \n", "5 5 \n", "6 3 \n", "7 14 \n", "8 3 \n", "9 3 \n", "10 4 \n", "11 9 \n", "12 9 \n", "13 10 \n", "14 7 \n", "15 21 \n", "16 8 \n", "17 8 \n", "18 16 \n", "19 10 \n", "20 24 \n", "21 34 \n", "22 35 \n", "23 19 \n", "24 9 \n", "25 22 \n", "26 13 \n", "27 20 \n", "28 13 \n", "29 34 \n", "30 19 \n", "31 20 \n", "32 11 \n", "33 15 \n", "34 22 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... 
\n", "\n", " \n", " \n", " Hochschulreife \n", " Hochschulreife \n", " Anzahl \n", "0 806 \n", "1 887 \n", "2 814 \n", "3 782 \n", "4 790 \n", "5 723 \n", "6 673 \n", "7 661 \n", "8 598 \n", "9 586 \n", "10 608 \n", "11 566 \n", "12 577 \n", "13 602 \n", "14 571 \n", "15 633 \n", "16 692 \n", "17 698 \n", "18 701 \n", "19 729 \n", "20 706 \n", "21 730 \n", "22 772 \n", "23 856 \n", "24 857 \n", "25 876 \n", "26 910 \n", "27 921 \n", "28 1.662 \n", "29 869 \n", "30 832 \n", "31 898 \n", "32 919 \n", "33 899 \n", "34 886 \n", "35 1) Allgemeinbildende Schulen; ohne Schulen des... " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "output[415]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Slightly more involved**" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "import io\n", "import requests" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "download_link = \"https://www.statistik-bw.de/BildungKultur/SchulenAllgem/13013020.tab?R=KR116&form=csv\"" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "res = requests.get(download_link)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Read it into Python_" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "pd.read_csv?" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ ":1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.\n", " pd.read_csv(\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1983/84</td><td>478</td><td>2.452</td><td>2.951</td><td>7</td><td>1.382</td></tr>\n",
"    <tr><th>1</th><td>1984/85</td><td>407</td><td>2.260</td><td>2.815</td><td>14</td><td>1.483</td></tr>\n",
"    <tr><th>2</th><td>1985/86</td><td>397</td><td>2.056</td><td>2.741</td><td>9</td><td>1.382</td></tr>\n",
"    <tr><th>3</th><td>1986/87</td><td>289</td><td>2.068</td><td>2.233</td><td>27</td><td>1.361</td></tr>\n",
"    <tr><th>4</th><td>1987/88</td><td>377</td><td>1.784</td><td>2.217</td><td>–</td><td>1.414</td></tr>\n",
"    <tr><th>5</th><td>1988/89</td><td>333</td><td>1.723</td><td>1.949</td><td>10</td><td>1.280</td></tr>\n",
"    <tr><th>6</th><td>1989/90</td><td>433</td><td>1.605</td><td>1.692</td><td>15</td><td>1.193</td></tr>\n",
"    <tr><th>7</th><td>1990/91</td><td>369</td><td>1.687</td><td>1.716</td><td>12</td><td>1.104</td></tr>\n",
"    <tr><th>8</th><td>1991/92</td><td>355</td><td>1.586</td><td>1.649</td><td>10</td><td>1.068</td></tr>\n",
"    <tr><th>9</th><td>1992/93</td><td>472</td><td>1.570</td><td>1.674</td><td>10</td><td>1.023</td></tr>\n",
"    <tr><th>10</th><td>1993/94</td><td>407</td><td>1.564</td><td>1.641</td><td>19</td><td>1.056</td></tr>\n",
"    <tr><th>11</th><td>1994/95</td><td>424</td><td>1.573</td><td>1.676</td><td>12</td><td>1.038</td></tr>\n",
"    <tr><th>12</th><td>1995/96</td><td>468</td><td>1.561</td><td>1.751</td><td>16</td><td>1.061</td></tr>\n",
"    <tr><th>13</th><td>1996/97</td><td>351</td><td>1.598</td><td>1.920</td><td>15</td><td>1.023</td></tr>\n",
"    <tr><th>14</th><td>1997/98</td><td>357</td><td>1.630</td><td>1.982</td><td>12</td><td>993.000</td></tr>\n",
"    <tr><th>15</th><td>1998/99</td><td>391</td><td>1.581</td><td>1.954</td><td>25</td><td>1.128</td></tr>\n",
"    <tr><th>16</th><td>1999/00</td><td>471</td><td>1.488</td><td>1.947</td><td>33</td><td>1.157</td></tr>\n",
"    <tr><th>17</th><td>2000/01</td><td>428</td><td>1.484</td><td>1.890</td><td>25</td><td>1.099</td></tr>\n",
"    <tr><th>18</th><td>2001/02</td><td>405</td><td>1.636</td><td>1.975</td><td>32</td><td>1.119</td></tr>\n",
"    <tr><th>19</th><td>2002/03</td><td>382</td><td>1.672</td><td>1.905</td><td>28</td><td>1.207</td></tr>\n",
"    <tr><th>20</th><td>2003/04</td><td>391</td><td>1.816</td><td>2.177</td><td>28</td><td>1.082</td></tr>\n",
"    <tr><th>21</th><td>2004/05</td><td>346</td><td>1.713</td><td>2.085</td><td>30</td><td>1.154</td></tr>\n",
"    <tr><th>22</th><td>2005/06</td><td>323</td><td>1.734</td><td>2.215</td><td>27</td><td>1.316</td></tr>\n",
"    <tr><th>23</th><td>2006/07</td><td>280</td><td>1.651</td><td>2.215</td><td>17</td><td>1.364</td></tr>\n",
"    <tr><th>24</th><td>2007/08</td><td>280</td><td>1.640</td><td>2.242</td><td>29</td><td>1.421</td></tr>\n",
"    <tr><th>25</th><td>2008/09</td><td>269</td><td>1.509</td><td>2.244</td><td>39</td><td>1.527</td></tr>\n",
"    <tr><th>26</th><td>2009/10</td><td>240</td><td>1.363</td><td>2.225</td><td>28</td><td>1.630</td></tr>\n",
"    <tr><th>27</th><td>2010/11</td><td>234</td><td>1.378</td><td>2.196</td><td>33</td><td>1.613</td></tr>\n",
"    <tr><th>28</th><td>2011/12</td><td>214</td><td>1.174</td><td>2.225</td><td>42</td><td>2.794</td></tr>\n",
"    <tr><th>29</th><td>2012/13</td><td>218</td><td>1.140</td><td>2.463</td><td>36</td><td>1.694</td></tr>\n",
"    <tr><th>30</th><td>2013/14</td><td>225</td><td>1.169</td><td>2.558</td><td>17</td><td>1.595</td></tr>\n",
"    <tr><th>31</th><td>2014/15</td><td>189</td><td>1.134</td><td>2.518</td><td>41</td><td>1.696</td></tr>\n",
"    <tr><th>32</th><td>2015/16</td><td>228</td><td>987.000</td><td>2.374</td><td>48</td><td>1.666</td></tr>\n",
"    <tr><th>33</th><td>2016/17</td><td>272</td><td>854.000</td><td>2.391</td><td>47</td><td>1.644</td></tr>\n",
"    <tr><th>34</th><td>2017/18</td><td>270</td><td>813.000</td><td>2.241</td><td>39</td><td>1.584</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " 0 1 2 3 4 5\n", "0 1983/84 478 2.452 2.951 7 1.382\n", "1 1984/85 407 2.260 2.815 14 1.483\n", "2 1985/86 397 2.056 2.741 9 1.382\n", "3 1986/87 289 2.068 2.233 27 1.361\n", "4 1987/88 377 1.784 2.217 – 1.414\n", "5 1988/89 333 1.723 1.949 10 1.280\n", "6 1989/90 433 1.605 1.692 15 1.193\n", "7 1990/91 369 1.687 1.716 12 1.104\n", "8 1991/92 355 1.586 1.649 10 1.068\n", "9 1992/93 472 1.570 1.674 10 1.023\n", "10 1993/94 407 1.564 1.641 19 1.056\n", "11 1994/95 424 1.573 1.676 12 1.038\n", "12 1995/96 468 1.561 1.751 16 1.061\n", "13 1996/97 351 1.598 1.920 15 1.023\n", "14 1997/98 357 1.630 1.982 12 993.000\n", "15 1998/99 391 1.581 1.954 25 1.128\n", "16 1999/00 471 1.488 1.947 33 1.157\n", "17 2000/01 428 1.484 1.890 25 1.099\n", "18 2001/02 405 1.636 1.975 32 1.119\n", "19 2002/03 382 1.672 1.905 28 1.207\n", "20 2003/04 391 1.816 2.177 28 1.082\n", "21 2004/05 346 1.713 2.085 30 1.154\n", "22 2005/06 323 1.734 2.215 27 1.316\n", "23 2006/07 280 1.651 2.215 17 1.364\n", "24 2007/08 280 1.640 2.242 29 1.421\n", "25 2008/09 269 1.509 2.244 39 1.527\n", "26 2009/10 240 1.363 2.225 28 1.630\n", "27 2010/11 234 1.378 2.196 33 1.613\n", "28 2011/12 214 1.174 2.225 42 2.794\n", "29 2012/13 218 1.140 2.463 36 1.694\n", "30 2013/14 225 1.169 2.558 17 1.595\n", "31 2014/15 189 1.134 2.518 41 1.696\n", "32 2015/16 228 987.000 2.374 48 1.666\n", "33 2016/17 272 854.000 2.391 47 1.644\n", "34 2017/18 270 813.000 2.241 39 1.584" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_csv(\n", " io.StringIO(res.text), delimiter=\";\",\n", " header=None, skiprows=6, skipfooter=3\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Save file to computer_" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "with open(\"test.csv\", \"w\") as f:\n", " f.write(res.text)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Read pdf**" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "res = requests.get(\n", " \"http://www.africau.edu/images/default/sample.pdf\"\n", ")" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "with open(\"test.pdf\", \"wb\") as f:\n", " f.write(res.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://pypi.org/project/PyMuPDF/\n", "\n", "Installing camelot: https://camelot-py.readthedocs.io/en/master/user/install.html#pip" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Uncomment to install package\n", "# !pip install PyMuPDF" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "import fitz\n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "doc = fitz.open(\"test.pdf\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "' A Simple PDF File \\n This is a small demonstration .pdf file - \\n just for use in the Virtual Mechanics tutorials. More text. And more \\n text. And more text. And more text. And more text. \\n And more text. And more text. And more text. And more text. And more \\n text. And more text. Boring, zzzzz. And more text. And more text. And \\n more text. And more text. And more text. And more text. And more text. \\n And more text. And more text. \\n And more text. And more text. And more text. And more text. And more \\n text. And more text. And more text. Even more. 
Continued on page 2 ...\\n'" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "doc.get_page_text(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clicking through pages and filling in boxes:\n", "\n", "https://pypi.org/project/pyppeteer/" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }