Statistical Computing and Visualisation Tutorials

Tutorial 1: Mar 26, 9:00-12:30

Title: Introduction to Machine Learning with tidymodels

Presenter: Max Kuhn

This is for data scientists who want to improve their modelling skills and gain a better understanding of machine learning models.

About the presenter: Max Kuhn is a software engineer at Posit. He has been instrumental in improving the modelling capabilities of R. Max has a Ph.D. in Biostatistics and is the author of numerous R packages for machine learning techniques. He has a background in the pharmaceutical industry. His book Applied Predictive Modeling, co-authored with Kjell Johnson, won the 2015 Ziegel Award from the American Statistical Association for the best book reviewed in Technometrics. He is also the co-author of the book Tidy Modeling with R, available at https://www.tmwr.org/.

Background: Participants should have a good working knowledge of R, and be familiar with basic data wrangling and visualisation as described in R for Data Science by Wickham and Grolemund (2016).

The tutorial will follow selected materials from https://www.tidymodels.org/ and https://www.tmwr.org/. These materials will be made available a few days before the tutorial.

Tutorial 2: Mar 26, 9:00-12:30

Title: Visualising spatial uncertainty

Presenter: Petra Kuhnert

This is for research scientists who wish to visualise uncertainty on maps and want a better understanding of the visualisations available in the Vizumap R package and how to generate them. The objectives of the tutorial are:

- To educate attendees on the importance of providing uncertainties with their spatial predictions.
- To provide an overview of the Vizumap R package, its key visualisations and its main functions.
- To take attendees through an example that they can follow online to produce Vizumap maps.
- To run breakout sessions (3) in which each group is given a problem involving visualising uncertainty on maps. Each group will use Vizumap to produce each of the visualisations, answer questions about the resulting maps for their problem, put together a brief presentation and report back to the wider group.

The tutorial is structured as follows:

9:00-9:05 Welcome and Introductions

9:05-10:00 Session 1: Introduction to the Vizumap R package and methods for visualising uncertainties on maps (including Q&A)

10:00-10:15 BREAK

10:15-11:00 Session 2: Hands-on tutorial that uses Vizumap to visualise uncertainties on maps using the Great Barrier Reef (GBR) pollutant load modelling outputs

11:00-11:15 BREAK

11:15-12:00 Session 3: Breakout session for groups to work on one of three problems

12:00-12:25 Presentations from each group, followed by discussion of the problem, the choice of map and features, and experiences with Vizumap.

12:25-12:30 Close and final comments.

About the presenter: Petra Kuhnert is the Associate Science Director – University Engagement at CSIRO’s Data61. She has a PhD in Statistics from QUT, and her recent work focuses on developing methods at the intersection of statistics and machine learning to produce more informative predictions for risk-based decision-making. In 2023, Petra received two Women in AI (WAI) awards for her modelling work on the Great Barrier Reef: 1st runner-up for the APAC Women in AI Innovator of the Year Award, and the APAC Women in AI Environment and Biodiversity Award 2023. Petra co-designed the award-winning software "Vizumap" with Lydia Lucchesi. Vizumap is an innovative visualisation package written in the R programming language. More recently, her focus has turned to developing emulation methods that use machine learning to speed up slow-running models of physical systems, and to using space technologies (e.g. remote sensing) to improve predictions of terrestrial environmental and agricultural processes.

Background: Participants should have a good working knowledge of R, and some background in modelling, particularly models of spatial and/or spatio-temporal processes and the data underpinning those.

More details can be found at https://statsocaus.github.io/tutorial_vizumap/. Materials will be provided a few days prior to the tutorial.

Tutorial 3: Mar 26, 1:00-4:30

Title: Visualising high-dimensional data

Presenter: Di Cook

This is for scientists and data science practitioners who regularly work with high-dimensional data and models and want to learn how to visualise them better. You will learn how to recognise structure in high-dimensional data, including clusters, outliers and non-linear relationships, and how to use this with methods such as supervised classification, cluster analysis and non-linear dimension reduction. The course will be structured as follows:

1:00-1:20 Introduction: What is high-dimensional data, why visualise and overview of methods

1:20-1:45 Basics of linear projections, and recognising high-dimensional structure

1:45-2:30 Effectively reducing the dimension of your data, in conjunction with non-linear dimension reduction

2:30-3:00 BREAK and PRACTICAL EXERCISES

3:00-3:45 Understanding clusters in data using visualisation

3:45-4:30 Building better classification models with visual input
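The linear projections in the second session can be illustrated with a small sketch. A tour shows a sequence of projections of n x p data X into d dimensions. Each frame is defined by a p x d matrix A with orthonormal columns, and the displayed coordinates are Y = XA. The example below is written in Python with NumPy, although the tutorial itself uses R. The function names `random_basis` and `project` are hypothetical and do not come from any tour package. The sketch generates one random 2-D frame; a tour would interpolate smoothly between many such frames.

```python
import numpy as np


def random_basis(p, d=2, rng=None):
    """Return a p x d matrix with orthonormal columns (a random projection frame)."""
    rng = np.random.default_rng(rng)
    # The Q factor of the reduced QR decomposition of a Gaussian matrix
    # has orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    return q


def project(X, A):
    """Project n x p data onto the subspace spanned by the columns of A (p x d)."""
    # Standardise each variable first so that no single variable dominates the view.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return Xs @ A


# Example: 200 observations on 6 variables, with two clusters separated in variable 0.
rng = np.random.default_rng(2023)
X = rng.standard_normal((200, 6))
X[:100, 0] += 4

A = random_basis(6, 2, rng=1)
Y = project(X, A)  # 200 x 2 coordinates, ready to draw as a scatterplot
print(Y.shape)     # (200, 2)
print(np.allclose(A.T @ A, np.eye(2)))  # True: the frame is orthonormal
```

Whether the clusters are visible depends on how closely A aligns with variable 0. Searching over frames for that alignment is the job of projection pursuit and guided tours.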

About the presenter: Dianne Cook is Professor of Business Analytics at Monash University in Melbourne, Australia. She is a world leader in data visualisation, especially the visualisation of high-dimensional data using tours with low-dimensional projections, and projection pursuit. She is currently focusing on bridging the gap between exploratory graphics and statistical inference. Di is a Fellow of the American Statistical Association, past editor of the Journal of Computational and Graphical Statistics, current editor of the R Journal, elected Ordinary Member of the R Foundation, and elected member of the International Statistical Institute.

Background: Participants should have a good working knowledge of R, and some background in multivariate statistical methods and/or data mining techniques.

More details can be found at https://statsocaus.github.io/tutorial_highd_vis/. Materials will be provided a few days prior to the tutorial.

Tutorial 1: Dec 1, 9:00-12:30

Title: Deploying your model code into Production with Microsoft Azure

This is for data science practitioners who write Python or R code to build predictive models and are interested in industry best practice for converting experimental code into reliable ML services for their users. The aims are:

- To learn about the processes and tools used to deploy local Python or R modelling code as high-quality, robust, scalable production code.
- To obtain a practical overview of Microsoft Azure Machine Learning Studio.

Presenter: Dean Marchiori is Director and Principal Data Scientist at Wave Data Labs, where he consults on statistical modelling, applied mathematics and advanced analytics. Dean holds a BSc in Mathematics with University Medal from Charles Sturt University, a Master of Applied Finance degree, and a Master of Applied Statistics from Macquarie University, where he was awarded the Julian Leslie Prize in Statistics. He has been named one of the top 10 analytics leaders in Australia by the Institute of Analytics Professionals of Australia (IAPA). He is also an Accredited Statistician with the Statistical Society of Australia, where he is co-chair of the committee for Statistical Computing and Visualisation.

Details:

Session 1: Intro to Azure Machine Learning Studio
Session 2: Model Training and Inference
Session 3: Deployment and Monitoring

Background: Active participation will require users to sign up for an account with Microsoft Azure. This is a paid service, but Microsoft offers new users a USD $200 credit to use within 30 days (see https://azure.microsoft.com/en-au/free).

A link to the materials will be made available to registered participants before the tutorial.

Tutorial 2: Dec 1, 9:00-12:30

Title: Interactive web applications with Shiny for R

This is for data science practitioners who are interested in writing and deploying web apps with shiny.

Presenter: Mitchell O'Hara-Wild is an award-winning teaching associate and research assistant at Monash University with particular expertise in R package development, data analysis and statistical computing. He holds a Bachelor of Commerce (Honours) with a major in econometrics, and a Bachelor of Science majoring in computational science and mathematical statistics. Mitchell is the developer of several widely used R packages and operates a data consultancy that runs workshops and provides data analysis and tools.

Details:

Session 1: Basics of shiny
- Overview of shiny, summarising its capabilities and use cases.
- Creating a user interface (ui) in R with simple inputs and outputs.
- Writing reactive server code to make a plot using the inputs provided.
- Combining these concepts to make your first shiny app.
Session 2: Reactive programming
- Understanding how shiny server code works with reactive programming.
- Using the building blocks of reactivity: observe() changes, reactive() objects, input$* values, render*() outputs and isolate() for reading values without creating a reactive dependency.
- The how and why of validating user inputs to safely run code.
Session 3: Publishing apps
- A brief introduction to the many shiny extension packages available.
- Some quick improvements to the style and appearance of our app using extension packages.
- Options for deploying your app online, including Docker containers and the http://shinyapps.io/ service.
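The reactive programming in Session 2 rests on a few ideas:

- reactive() expressions are lazy and cached.
- They record which values they read.
- They are invalidated when any of those values change.
- isolate() reads a value without creating a dependency.

The toy model below illustrates these ideas in Python. It is a simplified sketch, not shiny's own code, and the names `Value`, `Expr` and `isolate` are hypothetical stand-ins for input$* values, reactive() and shiny's isolate().

```python
# Stack of reactive expressions currently being computed (the "reactive context").
_context = []


class Value:
    """A reactive value, loosely analogous to an element of input$* in shiny."""

    def __init__(self, value):
        self._value = value
        self._dependents = set()

    def get(self):
        # Reading inside a reactive expression registers that expression as a dependent.
        if _context:
            self._dependents.add(_context[-1])
        return self._value

    def set(self, value):
        if value != self._value:
            self._value = value
            # Invalidate everything that read the old value; they re-register on recompute.
            dependents, self._dependents = self._dependents, set()
            for dep in dependents:
                dep.invalidate()


class Expr:
    """A lazy, cached reactive expression, loosely analogous to reactive() in shiny."""

    def __init__(self, fn):
        self._fn = fn
        self._valid = False
        self._cache = None
        self._dependents = set()
        self.runs = 0  # number of times fn has actually been executed

    def invalidate(self):
        if self._valid:
            self._valid = False
            dependents, self._dependents = self._dependents, set()
            for dep in dependents:
                dep.invalidate()

    def get(self):
        if _context:
            self._dependents.add(_context[-1])
        if not self._valid:
            _context.append(self)
            try:
                self._cache = self._fn()
            finally:
                _context.pop()
            self._valid = True
            self.runs += 1
        return self._cache


def isolate(fn):
    """Call fn without recording any dependencies (the idea behind shiny's isolate())."""
    saved = _context[:]
    _context.clear()
    try:
        return fn()
    finally:
        _context.extend(saved)


# Usage: `doubled` recomputes only when `n` changes; `label` ignores changes to `title`.
n = Value(10)
title = Value("Sample size")
doubled = Expr(lambda: n.get() * 2)
label = Expr(lambda: f"{isolate(title.get)}: {n.get()}")

print(doubled.get(), doubled.get(), doubled.runs)  # 20 20 1
print(label.get())  # Sample size: 10
title.set("n")
print(label.get())  # Sample size: 10 (title was read inside isolate())
n.set(15)
print(doubled.get(), label.get())  # 30 n: 15
```

Caching is what lets an expensive reactive() computation be shared by several outputs without being rerun for each one.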

Background: Participants are expected to be familiar with writing R code, but no prior experience in website development or shiny is required to get the most out of this workshop. A http://shinyapps.io/ account is needed to deploy your app in session 3 (but is not required if you just want to watch this part).

A link to the materials will be made available to registered participants before the tutorial.

Tutorial 3: Dec 1, 1:00-4:30

Title: Creating data plots for effective decision-making using statistical inference with R

This is for statisticians and data science practitioners who are interested in improving their data visualisation skills.

Presenter: Dianne Cook is Professor of Business Analytics at Monash University in Melbourne, Australia. She is a world leader in data visualisation, especially the visualisation of high-dimensional data using tours with low-dimensional projections, and projection pursuit. She is currently focusing on bridging the gap between exploratory graphics and statistical inference. Di is a Fellow of the American Statistical Association, past editor of the Journal of Computational and Graphical Statistics, current editor of the R Journal, elected Ordinary Member of the R Foundation, and elected member of the International Statistical Institute.

Details:

Session 1: Review of making effective plots using ggplot2's grammar of graphics, including
- Organising your data to enable mapping variables to graphical elements
- Common plot descriptions as scripts
- Do's and don'ts following cognitive perception principles

Session 2: Making decisions and inferential statements based on data plots, including
- What is your plot testing? Determining the hypothesis based on the type of plot
- Creating null samples to build lineups for comparison and testing
- Conducting a lineup test using your friends to determine whether what you see is real or spurious, and to determine the best design for your plot
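The lineup protocol in Session 2 can be sketched as follows:

- The plot of the real data is hidden at a random position among m - 1 null plots.
- Here the null plots are generated by permuting one variable. This breaks any association with the other variables and is one common way to simulate a null of no association.
- If k of K independent observers pick out the real plot, a simple p-value is P(X >= k), where X ~ Binomial(K, 1/m).

The Python below is illustrative only. In R, the nullabor package provides lineup tools, and the function names here (`make_lineup`, `visual_p_value`) are hypothetical.

```python
import random
from math import comb


def make_lineup(data, var, m=20, seed=None):
    """Hide the real data among m - 1 null datasets made by permuting column `var`.

    `data` is a dict mapping column names to equal-length lists.
    Returns the list of m panels and the (secret) position of the real data.
    """
    rng = random.Random(seed)
    pos = rng.randrange(m)
    panels = []
    for i in range(m):
        if i == pos:
            panels.append(data)
        else:
            null = {name: list(values) for name, values in data.items()}
            rng.shuffle(null[var])  # breaks any association between `var` and the rest
            panels.append(null)
    return panels, pos


def visual_p_value(k, K, m=20):
    """P(X >= k) for X ~ Binomial(K, 1/m): the chance that k or more of K independent
    observers pick the real plot if it is indistinguishable from the nulls."""
    p = 1 / m
    return sum(comb(K, i) * p**i * (1 - p) ** (K - i) for i in range(k, K + 1))


# Example: y depends on x, so the real panel should stand out when the panels are plotted.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(50)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
panels, pos = make_lineup({"x": x, "y": y}, var="y", m=20, seed=42)

print(len(panels))                 # 20
print(visual_p_value(k=1, K=1))    # 0.05 for a single observer
print(visual_p_value(k=3, K=10))   # small if 3 of 10 observers agree
```

Each panel would then be drawn with the same plot code, for example as a scatterplot of y against x. Observers are asked which panel looks different.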

Background: Participants should have a good working knowledge of R and the tidyverse, and some experience with ggplot2. Familiarity with the material in R for Data Science (R4DS) is helpful.

A link to the materials will be made available to registered participants before the tutorial.

Join us

The Statistical Society of Australia (SSA) is the home for professionals working in statistics. A place where you belong, connect with others, advance your career and feel inspired.

Our core purpose is to connect you with great people and great opportunities, so you can be successful in your current role and with your career aspirations.

Statistical Society of Australia (SSA)

PO Box 213

Belconnen ACT 2616 Australia

02 6251 3647

www.statsoc.org.au

ABN 82 853 491 081

Please direct enquiries to the SSA Team via email at contact@statsoc.org.au

@StatSocAus
