Stockholm Trio university libraries

Online

Sep 22-30 2020

10:00 - 15:00

Instructors: Thomas Lind, Joakim Philipson, Glenn Haya, Lina Andrén

Helpers: Rosa Lönneborg

General Information

The Carpentries aims to help researchers get their work done in less time and with less pain by teaching them basic research computing and data skills. This hands-on workshop will cover the basics of a data workflow, starting with SQL databases and going on to introduce data cleaning with OpenRefine and basic programming and plotting using Python. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Click here to sign up for the workshop

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: This training will take place online. The instructors will provide you with the information you will need to connect to this meeting.

When: Sep 22-30 2020. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are dedicated to providing a positive and accessible learning environment for all. Please notify the instructors in advance of the workshop if you require any accommodations or if there is anything we can do to make this workshop more accessible to you.

Contact: Please email linaandr@kth.se for more information.

Roles: To learn more about the roles at the workshop (who will be doing what), refer to our Workshop FAQ.


Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.


Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Schedule

September 22

Before Pre-workshop survey
10:00 Databases and SQL
11:00 Morning break
11:15 Databases and SQL (continued)
12:00 Lunch break
13:00 Databases and SQL (continued)
14:00 Afternoon break
14:15 Databases and SQL (continued)
15:00 END

September 24

10:00 Data cleaning and OpenRefine
11:00 Morning break
11:15 Data cleaning and OpenRefine (continued)
12:00 Lunch break
13:00 Data cleaning and OpenRefine (continued)
14:00 Afternoon break
14:15 Data cleaning and OpenRefine (continued)
15:00 END

Syllabus

Plotting and Programming in Python

  • Running and quitting
  • Variables and assignment
  • Data types and type conversion
  • Built-in functions and help
  • Loops and Conditionals
  • Libraries
  • Pandas DataFrames
  • Plotting
  • Lists
  • For loops
  • Conditionals
  • Looping over data sets
  • Writing functions
  • Variable Scope
  • Programming style
  • Reference...

Databases and SQL

  • Selecting data
  • Sorting and removing duplicates
  • Filtering
  • Calculating new values
  • Missing data
  • Aggregation
  • Combining data
  • Data hygiene
  • Creating and modifying data
  • Reference...

OpenRefine

  • Working with OpenRefine
  • Filtering and sorting with OpenRefine
  • Examining numbers in OpenRefine
  • Using scripts
  • Exporting and saving data from OpenRefine
  • Other resources in OpenRefine
  • Reference...

We will use the following documents for taking collaborative notes during the workshop:

Notes for SQL, September 22

Notes for OpenRefine, September 24

Notes for Python, September 29-30


Setup

To participate in a Software Carpentry workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but you may find it useful too.

Install the videoconferencing client

If you haven't used Zoom before, go to the official website to download and install the Zoom client for your computer.

Set up your workspace

As in other Carpentries workshops, you will learn by "coding along" with the Instructors. To do this, you will need to have both the window for the tool you are learning about (a terminal, RStudio, your web browser, etc.) and the window for the Zoom video conference client open, arranged so that you can see both at once.

This blog post includes detailed information on how to set up your screen to follow along during the workshop.

Python

Python is a popular language for research computing, and great for general-purpose programming as well. Installing all of its research packages individually can be a bit difficult, so we recommend Anaconda, an all-in-one installer.

Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.6 is fine).

We will teach Python using the Jupyter Notebook, a programming environment that runs in a web browser (Jupyter Notebook will be installed by Anaconda). For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported (some older browsers, including Internet Explorer version 9 and below, are not).

Windows

  1. Open https://www.anaconda.com/products/individual#download-section with your web browser.
  2. Download the Anaconda for Windows installer with Python 3. (If you are not sure which version to choose, you probably want the 64-bit Graphical Installer Anaconda3-...-Windows-x86_64.exe)
  3. Install Python 3 by running the Anaconda Installer, using all of the defaults for installation except make sure to check Add Anaconda to my PATH environment variable.

Video Tutorial

macOS

  1. Open https://www.anaconda.com/products/individual#download-section with your web browser.
  2. Download the Anaconda Installer with Python 3 for macOS (you can either use the Graphical or the Command Line Installer).
  3. Install Python 3 by running the Anaconda Installer using all of the defaults for installation.

Video Tutorial

Linux

  1. Open https://www.anaconda.com/products/individual#download-section with your web browser.
  2. Download the Anaconda Installer with Python 3 for Linux.
    (The installation requires using the shell. If you aren't comfortable doing the installation yourself stop here and request help at the workshop.)
  3. Open a terminal window and navigate to the directory where the executable is downloaded (e.g., `cd ~/Downloads`).
  4. Type
    bash Anaconda3-
    and then press Tab to autocomplete the full file name. The name of the file you just downloaded should appear.
  5. Press Enter (or Return depending on your keyboard). You will follow the text-only prompts. To move through the text, press Spacebar. Type yes and press Enter (or Return) to approve the license. Press Enter (or Return) to approve the default location for the files. Type yes and press Enter (or Return) to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
  6. Close the terminal window.
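
Once the installation has finished, one way to confirm that Python 3 is the interpreter you get by default is to run a short check like the sketch below, either in a newly opened terminal (with `python`) or in a Jupyter Notebook cell. The helper name `is_python3` is only illustrative and is not part of the workshop material.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple describes Python 3.x."""
    return version_info[0] == 3


# Show the version of the interpreter that is running this code.
print("Python version:", sys.version.split()[0])
print("Python 3.x installed:", is_python3())
```

If the last line reports False, the Python on your PATH is probably not the one installed by Anaconda, and it is worth asking an instructor for help before the workshop.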

SQLite

SQL is a specialized programming language used with databases. We use a database manager called SQLite in our lessons.

  • Run "Git Bash" from the Start menu
  • Copy the following command: `curl -fsSL https://linajandren.github.io/2020-09-22-stockholmtrio-online/getsql.sh | bash`
  • Paste it into the window that Git Bash opened. If you're unsure, ask an instructor for help.
  • You should see something like 3.27.2 2019-02-25 16:06:06 ...

If you want to do this manually, download sqlite3, make a bin directory in the user's home directory, unzip sqlite3, move it into the bin directory, and then add the bin directory to the path.

SQLite comes pre-installed on macOS.

SQLite comes pre-installed on Linux.

If you installed Anaconda, it also includes a copy of SQLite without readline support. Instructors will provide a workaround if needed.
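
As a rough sanity check on any operating system, the sketch below uses Python's built-in `sqlite3` module to report which SQLite library Python is linked against, looks for a `sqlite3` command on your PATH, and runs a tiny query against an in-memory database. Note the assumption here: the library used by Python and the `sqlite3` command-line program used in the lesson are separate installations, so this confirms that SQLite works from Python, not that the command-line tool is set up correctly. The function and table names are made up for this example.

```python
import shutil
import sqlite3


def sqlite_smoke_test():
    """Create an in-memory database, insert one row, and read it back."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE check_install (value INTEGER)")
        connection.execute("INSERT INTO check_install VALUES (42)")
        row = connection.execute("SELECT value FROM check_install").fetchone()
    finally:
        connection.close()
    return row[0]


# Version of the SQLite library that Python itself uses.
print("SQLite library version used by Python:", sqlite3.sqlite_version)
# Location of the sqlite3 command-line program, or None if it is not on the PATH.
print("sqlite3 command found at:", shutil.which("sqlite3"))
print("Smoke test value:", sqlite_smoke_test())
```

If the second line prints None, the `sqlite3` command is not on your PATH and the installation steps above need another look.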

OpenRefine

For this lesson you will need OpenRefine and a web browser. Note: this is a Java program that runs on your machine (not in the cloud). It runs inside a web browser, but no web connection is needed.

Windows

  1. Check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser. It will not run correctly in Internet Explorer.
  2. Download software from http://openrefine.org/.
  3. Create a new directory called OpenRefine.
  4. Unzip the downloaded file into the OpenRefine directory by right-clicking and selecting "Extract ...".
  5. Go to your newly created OpenRefine directory.
  6. Launch OpenRefine by clicking openrefine.exe (this will launch a command prompt window, but you can ignore that - just wait for OpenRefine to open in the browser).
  7. If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.

macOS

  1. Check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser. It may not run correctly in Safari.
  2. Download software from http://openrefine.org/.
  3. Create a new directory called OpenRefine.
  4. Unzip the downloaded file into the OpenRefine directory by double-clicking it.
  5. Go to your newly created OpenRefine directory.
  6. Install OpenRefine by dragging the icon into the Applications folder.
  7. Use Ctrl-click/Open ... to launch it.
  8. If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.

Linux

  1. Check that you have either the Firefox or the Chrome browser installed and set as your default browser. OpenRefine runs in your default browser.
  2. Download software from http://openrefine.org/.
  3. Make a directory called OpenRefine.
  4. Unzip the downloaded file into the OpenRefine directory.
  5. Go to your newly created OpenRefine directory.
  6. Launch OpenRefine by entering ./refine into the terminal within the OpenRefine directory.
  7. If you are using a different browser, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to use the program.
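
If the browser window does not open and you are not sure whether OpenRefine actually started, a small check like the following sketch can tell you whether anything is answering at the local address mentioned above. It assumes the default address http://127.0.0.1:3333/ from the steps above, and it only shows that some web server responds there, not that the server is OpenRefine. The function name `openrefine_is_running` is hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def openrefine_is_running(url="http://127.0.0.1:3333/", timeout=3):
    """Return True if a server answers the URL with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed reply: treat as not running.
        return False


print("OpenRefine reachable:", openrefine_is_running())
```

A result of False while the OpenRefine command prompt or terminal window is still open usually just means the program has not finished starting, so wait a few seconds and try again.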