Tutorials IASCL 2017

Four tutorials will be offered at IASCL 2017 on Monday, July 17th.

Tutorial fees

  • Half day tutorial: 30 euros
  • Full day tutorial: 50 euros


Brian MacWhinney (Carnegie Mellon University)
Alex Cristia (Laboratoire de Sciences Cognitives et Psycholinguistique, UMR CNRS, ENS-DEC, EHESS)
Melanie Soderstrom (University of Manitoba)
Marisa Casillas (Max Planck Institute for Psycholinguistics)

8:30-12:00 | ROOM BR19
Registration fee: 30 euros
Max number of participants: 30
Prerequisites: none

Do you have LENA or other large-scale, naturalistic recordings you want to share with other researchers while protecting the privacy of the participants? Do you have a great research question about children's real-world language experiences and want to gain access to existing extensive child-centered recordings? Do you want to connect with other researchers studying children's real-world language experiences? Then you will find this workshop on HomeBank methods useful.

HomeBank is a public, permanent, extensible, online database of daylong audio recorded in naturalistic environments which serves two primary purposes:

  1. HomeBank provides a Web-based repository for raw audio and associated files. Associated files include metadata such as participant demographics and clinical diagnostics, automated annotations, and human-generated transcriptions and annotations. Many recordings use the child-perspective LENA recorders (LENA Research Foundation, Boulder, Colorado, United States), but other extensive recordings and metadata can be accommodated.
  2. HomeBank provides processing and analysis tools for HomeBank data and similar data sets. HomeBank facilitates researchers' access to large-scale data and tools, linking the acoustic, auditory, and linguistic characteristics of children's environments with a variety of variables including socioeconomic status, family characteristics, language trajectories, and disorders.

In this tutorial we will:
  • Describe the structure of HomeBank and provide an overview of current contents,
  • Explain how to donate files, and how to become a registered member to access confidential files,
  • Provide advice on ethical acquisition, vetting and sharing of large-scale recordings,
  • Demonstrate HomeBank functionality, including conversion from ITS to CHAT format and automatic analysis through scripts available on GitHub,
  • Describe the installation and use of SpeechKitchen methods for automatic speech recognition processing,
  • Hold a roundtable to discuss specific projects and needs, and
  • Provide a forum for networking with other researchers engaged in studies with real-world child language recordings.

Required equipment
Each participant needs to bring their own laptop (Wi-Fi access will be provided)


Tom Fritzsche (University of Potsdam)

8:30-12:00 and 1:30-5:00 | ROOM BR32
Registration fee: 50 euros
Max number of participants: 25
Prerequisites: none. Basic knowledge of experimental research would be helpful, and some familiarity with R is useful for the data analysis part.

The program is as follows:
Part 1. Introduction

  • Basics
  • Paradigms
  • Preparation
  • Technical issues

Part 2. Data & analysis
  • Eye-tracking measures
  • Data processing
  • Statistical analysis
  • Hands-on example (using R)

The course is aimed specifically at language acquisition research with infants and (preliterate) children. Therefore, reading paradigms are not covered. If you have any specific questions or wishes, you can let Tom Fritzsche know in advance.

Required software and equipment
For the hands-on example in the second part, participants are advised to bring their own laptop with the software R installed.


Gard Jenset (University of Oxford)

8:30-12:00 and 1:30-5:00 | ROOM BR33
Registration fee: 50 euros
Max number of participants: 20
Prerequisites: none

This hands-on workshop will guide participants through the basics of using the statistical environment R for corpus data analysis. Topics include familiarization with the R interface, data types in R, loading and saving data, creating summary statistics and frequency lists, and some examples of simple text processing. The workshop will also cover R's capabilities for generating high-quality plots, with examples of exploratory and confirmatory analysis, as well as how to set up a workflow, save code and data, and format your data. No previous knowledge of programming or statistics is assumed.

Part 1. Getting started with R
The first session will focus on the benefits of using R for corpus linguistic research and hands-on familiarization with R, including:

  • The R command interface
  • Basic R syntax
  • R data types
  • R data structures
  • Functions
  • Loading and saving data
  • How to format data
  • Typical operations on data
  • The power of R packages
  • Reproducible research
  • Working in R Studio

Part 2. Analyzing data with R
The second session will give an overview of quantitative analysis of corpus data in R. We will see how common statistical tests, such as the chi-square test, are not always useful or informative in corpus linguistics, and we will explore some alternatives. Topics to be discussed include:
  • Summary statistics of corpus data in R
  • Exploratory statistics in R
  • How to generate plots and graphs
  • Some examples of statistical tests and models, with focus on exploratory techniques
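
As a rough illustration of the kind of analysis listed above (not the workshop's actual material), the sketch below builds a frequency list from a toy text and computes a Pearson chi-square statistic for a 2x2 contingency table by hand. It is written in Python purely to keep the example self-contained; the workshop itself uses R. The sample text, the counts, and the function names are all invented for illustration.

```python
from collections import Counter


def frequency_list(text):
    """Return (word, count) pairs, most frequent first, ties sorted alphabetically."""
    tokens = text.lower().split()  # naive whitespace tokenization
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical toy data, not taken from any real corpus.
toy_text = "the dog saw the cat and the cat saw the bird"
freqs = frequency_list(toy_text)

# Hypothetical counts of a word (vs. all other words) in two corpora.
stat = chi_square_2x2(10, 20, 30, 40)
```

With very large corpora, even tiny differences in proportion yield a large chi-square statistic, which is one reason such tests can be uninformative for corpus data.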

Required software and equipment
The workshop is a hands-on technical introduction which requires you to bring a laptop with the following software installed.

  • You will need to install a recent version of R, freely available for Windows, Mac, and Linux operating systems.
  • You will also need either:
      • R Studio (an integrated R environment), freely available for Windows, Mac, and Linux operating systems (choose the open source desktop version), or
      • A text editor (not a word processor like MS Word). Most computers come with a basic text editor (such as Notepad for Windows), which will be sufficient. Other editors, such as Tinn-R or Notepad++, include R syntax highlighting. However, the recommended editor is R Studio.
  • You need to have R installed before the session, so it is recommended that you install it and test that the installation is working well in advance.


Séverine Maggio (Université Blaise Pascal, Clermont-Ferrand)

1:30-5:00 | ROOM BR35
Registration fee: 30 euros
Max number of participants: 25
Prerequisites: Advanced knowledge of statistical analysis (ANOVA, multiple regression, etc.) and familiarity with linguistic data. Participants should be comfortable working in the R environment.

This tutorial is dedicated to the use of mixed models in the R statistical environment. After a brief theoretical presentation of the concepts, a practical session based on psycholinguistic data will follow. Participants will have the opportunity to work on their own data. The following topics will be discussed:

  • Evaluation of the components of the variance using the empty/null model
  • Information criteria (-2 Log Likelihood, AIC, BIC,…): model-fit statistics
  • Assessment of the Intraclass Correlation Coefficient (ICC)
  • Random effects
  • Mixed effects
  • Chi-square test to compare models
  • Random intercepts and random slopes
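
For orientation, several of the quantities listed above reduce to short formulas once a model has been fitted. The sketch below shows the standard definitions of the ICC, AIC, BIC, and the likelihood-ratio statistic used to compare nested models. It is written in Python only to keep it self-contained (the tutorial itself uses R), and every number and function name in it is a hypothetical placeholder rather than output from a real model.

```python
import math


def intraclass_correlation(var_between, var_within):
    """ICC from the null model: share of total variance due to the grouping factor."""
    return var_between / (var_between + var_within)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def likelihood_ratio_statistic(ll_reduced, ll_full):
    """Difference in -2 log likelihood between nested models.

    Compared against a chi-square distribution whose degrees of freedom
    equal the difference in the number of parameters.
    """
    return 2 * (ll_full - ll_reduced)


# Hypothetical values, as if read off fitted-model summaries.
icc = intraclass_correlation(var_between=2.0, var_within=6.0)
null_aic = aic(log_likelihood=-100.0, n_params=3)
null_bic = bic(log_likelihood=-100.0, n_params=3, n_obs=100)
lr = likelihood_ratio_statistic(ll_reduced=-100.0, ll_full=-95.0)
```

Lower AIC and BIC values indicate a better trade-off between fit and complexity, and an ICC near zero suggests that the grouping factor accounts for little of the variance.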

Required software and equipment
Each participant needs to bring their own laptop with the software R installed.