Astroinformatics school program - "Rise of the machines"

Rise of the machines: An introduction to machine learning

The school will take place over three days (Mon-Wed). It will be followed by the 2019 ANITA workshop.

Topics covered:

  • Data preparation and exploratory data analysis with pandas
  • Classification
    • Cross validation
    • Learning curves
    • Model tuning
    • Reporting
  • Regression
  • Clustering and dimensionality reduction
  • Introduction to deep learning
    • Building a network from scratch (artificial neural network example)
    • Training and evaluation of the model
  • Convolutional Neural Networks
  • Transfer learning
  • Hands-on examples and exercises throughout

Note: We will not cover rigorous mathematical derivations and proofs; there simply isn't enough time in a workshop format. We will provide links and material that you can, and should, read.

Speakers are listed below, followed by the schedule. Some sessions require preparation or specific software, details of which are listed here.

Lecture notes

Lecture notes (Jupyter notebooks) can be found here. You can clone the git repo using:

    git clone https://github.com/ADACS-Australia/ANITA-school-2019

Speakers

The workshop material was prepared by:

from the Curtin Institute for Computation, and

from the International Centre for Radio Astronomy Research.

Daniel and Rebecca will be teaching the hands-on material.
We are currently confirming additional presenters for the domain-specific examples on the last day.

Location

The School will be held at Swinburne University of Technology, in the following rooms (map):
  • Monday February 4th: BA403
  • Tuesday February 5th: AGSE202
  • Wednesday February 6th: AGSE202

Social Night

On Monday the 4th, we will be heading to a local park for a BBQ. If you would like to attend, please complete the strawpoll.

Schedule

Monday 4th February

8:30 - 9:00 —Registration and coffee —
9:00 - 10:30 Session I Introduction to ML
10:30 - 11:00 —Coffee Break —
11:00 - 12:30 Session II Data preparation and ML workflow overview
12:30 - 13:30 —Lunch —
13:30 - 15:00 Session III Classification & Regression
15:00 - 15:30 —Coffee Break —
15:30 - 17:00 Session IV Classification & Regression continued
17:00 —Social night —

Tuesday 5th February

9:00 - 10:30 Session I Clustering and Dimensionality Reduction
10:30 - 11:00 —Coffee Break —
11:00 - 12:30 Session II Clustering and Dimensionality Reduction continued
12:30 - 13:30 —Lunch —
13:30 - 15:00 Session III Intro to Deep Learning, artificial neural networks (ANN) and convolutional neural networks (CNN)
15:00 - 15:30 —Coffee Break —
15:30 - 17:00 Session IV CNN continued

Wednesday 6th February

9:00 - 10:30 Session I Transfer learning
10:30 - 11:00 —Coffee Break —
11:00 - 12:30 Session II Transfer learning
12:30 - 13:30 —Lunch —
13:30 - 15:00 Session III Domain-specific examples/tutorials and hack session
15:00 - 15:30 —Coffee Break —
15:30 - 17:00 Session IV Domain-specific examples/tutorials and hack session

Preparation

Software

A working knowledge of Python and Jupyter notebooks is essential for this workshop, i.e. knowledge of basic data structures and operations, and of how to write scripts. The Python notebooks used throughout the workshop have been developed using Python 3.6.3.

Required packages:

  • NumPy: a fast numerical array structure and helper functions
  • pandas: a DataFrame structure to store data in memory and work with it easily and efficiently
  • matplotlib: a basic plotting library; most other plotting libraries are built on top of it
  • seaborn: an advanced statistical plotting library
  • scikit-learn: a machine learning package, more info here
  • Keras: a high-level API for implementing neural networks, more info here
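
To check before the school that the packages above are available in your environment, something like the following sketch may help. It is not part of the workshop material: the helper name `find_missing` is made up for illustration, and the import names are assumed to be the usual ones (note that scikit-learn is imported as `sklearn`). It only checks that each package can be located, not which version is installed.

```python
import importlib.util
import sys

# Usual import names for the required packages (an assumption of this
# sketch); scikit-learn is imported as "sklearn".
REQUIRED = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "Keras": "keras",
}


def find_missing(required=REQUIRED):
    """Return the display names of packages that cannot be located.

    Hypothetical helper: it looks each module up without importing it.
    """
    return [name for name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    # Show the interpreter version, then list anything that is missing.
    print("Python", sys.version.split()[0])
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

If anything is reported as missing, install it with your usual package manager before the first session.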

Data

The datasets used in this workshop are either part of the machine learning packages or were compiled from the Galaxy Zoo DR1, the Sloan Digital Sky Survey (SDSS, using the DR9 SQL search), and the Galaxy And Mass Assembly (GAMA) survey. The N-body simulation was produced by Jonathan Diaz.

The notebooks are descriptive and comprehensive enough to be attempted at your own pace - a solution notebook is also provided. The lecture notes explain the intuition behind how different machine learning algorithms work.