From Words to Networks – Information and Relation Extraction from Text Data and Network Analysis

Institution: see Organisers & Acknowledgements

Program of study: International Research Workshop

Lecturer: Jana Diesner, PhD, Assistant Professor at the iSchool/Graduate School of Library and Information Science (GSLIS), University of Illinois at Urbana-Champaign (UIUC)

Date:

30.09.2013, 14:00 – 17:30
01.10.2013, 14:00 – 17:30
02.10.2013, 14:00 – 17:30

Room: n.s.

Max. number of participants: 20

Semester periods per week: n.s.

Credit Points: 5 CP for participating in the whole IRWS

Language of instruction: English

Contents:

1. What is covered in the workshop? What will you learn?

This interdisciplinary workshop introduces you to selected fundamental theories, concepts, methods and applications for bringing together text analysis and network analysis. You will learn how to conduct data analysis at the nexus of these areas in an informed, systematic and efficient fashion, and how to:

  • Construct semantic networks and socio-technical networks from unstructured, natural language text data.
  • Visualize and analyze network data.
  • Interpret network analysis results.

Throughout the workshop, we will discuss practical applications from the academic, administrative and business domains. At the end of the workshop, you will be able to design and conduct research projects for scholarly and commercial use in these fields.

Semantic networks are structured representations of information and knowledge. Socio-technical networks represent interactions between social agents, infrastructures and information. The functioning and dynamics of these networks involve the continuous production, processing and flow of information. This information is often available as text data, and can serve as a single or complementary source of information about networks. Examples of data sources include news wire data, scientific information such as publications and patents, communication data such as conversation transcripts and emails, self-presentations such as mission statements and annual reports, and social media data such as tweets and wikis. Text data have been used to construct or enhance network data in order to answer questions such as:

  • Who is talking to whom, and about what?
  • What are the mental models of individuals or groups about certain topics?
  • How do memes and innovations emerge and spread in society and online?
  • Who are the key entities in a network?
  • What benefits and risks result from an observed network structure for an organization and its wider context?

The main component of this workshop is to teach you practical, hands-on skills in working with text analysis and network analysis tools. You will apply basic natural language processing techniques on the lexical, syntactic and semantic level. You will learn to:

  • Pre-process texts with techniques such as reference resolution, stemming and part-of-speech tagging.
  • Identify salient concepts and themes from single documents and entire text collections.
  • Create and apply codebooks, which are also known as dictionaries or thesauri.
  • Locate and classify entities that can serve as nodes for networks. We will move beyond the classic set of entity classes (people, organizations, locations) to also consider other classes that are relevant for studying social processes and culture, e.g. tasks, resources and knowledge.
  • Extract relations, i.e. link entities into edges based on various criteria.
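
As a rough illustration of the last three steps, the following minimal Python sketch applies a codebook to locate entities and links entities that co-occur in the same sentence. It is not the workshop's tooling: the codebook entries, the sample text, the function names and the "same sentence" linking criterion are all assumptions made for this example, and the sentence splitter is deliberately naive.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical codebook: maps a surface term to (concept, entity class).
CODEBOOK = {
    "alice": ("Alice", "agent"),
    "bob": ("Bob", "agent"),
    "acme": ("Acme", "organization"),
    "budget": ("budget", "resource"),
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_entities(sentence):
    """Return the set of codebook concepts mentioned in one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {CODEBOOK[t][0] for t in tokens if t in CODEBOOK}


def extract_edges(text):
    """Link each pair of concepts that co-occur in a sentence.

    The edge weight counts the number of sentences in which the pair occurs.
    """
    edges = Counter()
    for sentence in split_sentences(text):
        for pair in combinations(sorted(find_entities(sentence)), 2):
            edges[pair] += 1
    return edges


# Hypothetical sample text.
sample = "Alice asked Bob about the budget. Bob works for Acme. Alice emailed Bob."
edges = extract_edges(sample)
for (source, target), weight in sorted(edges.items()):
    print(source, target, weight)
```

Sentence-level co-occurrence is only one possible linking criterion; window-based proximity or syntactic relations are common alternatives.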

You will also perform basic network analysis techniques, including:

  • Manipulate and visualize network data.
  • Compute basic network metrics on the graph and node level.
  • Identify meaningful groups and clusters of nodes.
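
The metrics named above can be sketched in a few lines of plain Python. This is an illustrative example under stated assumptions, not the software used in the workshop: the edge list is hypothetical, the network is treated as undirected and unweighted, and connected components stand in for the simplest notion of a "group" of nodes.

```python
from collections import defaultdict

# Hypothetical undirected edge list, e.g. the output of a relation extraction step.
EDGES = [
    ("Alice", "Bob"),
    ("Alice", "budget"),
    ("Bob", "budget"),
    ("Acme", "Bob"),
    ("Carol", "Dave"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def density(adj):
    """Graph-level metric: share of possible undirected edges that exist."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def degree_centrality(adj):
    """Node-level metric: degree divided by the maximum possible degree."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}


def connected_components(adj):
    """Group nodes that can reach each other (depth-first search)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


adj = build_adjacency(EDGES)
print("density:", round(density(adj), 3))
print("degree centrality:", degree_centrality(adj))
print("components:", connected_components(adj))
```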

Going from texts to networks involves some principles and strategies originating from computer science that are not only applicable to the task at hand, but to a wide range of problems. These principles and strategies are referred to as “Computational Thinking” – a basic skill like reading, writing and arithmetic that is crucial for solving problems and understanding human behavior across fields (Wing 2006). In this workshop, you are introduced to Computational Thinking and practice applying this way of thinking.

3. Who should attend?

This is an interdisciplinary and interactive workshop designed to benefit from the participation of attendees from different backgrounds. The material, exercises and mode of delivery are suitable for researchers and practitioners alike. No specific prior knowledge or computational skills are required. The delivery is geared towards forming an understanding of fundamental concepts and gaining hands-on experience with text analysis and network analysis methods and tools.

4. What to bring to the workshop?

Software: Prior to the workshop, we will send an email to confirmed participants with links to the software tools that we will use for the workshop. You are invited to bring a laptop to the workshop. If you cannot bring a laptop you will still fully benefit from the workshop as we screen-project all live walk-through exercises. At the workshop, we will provide you with a tutorial document and further learning resources.

Data: You can work with the sample data that we provide and/or bring your own data. If you bring your own text data, we recommend a sample of no more than 20 text documents of less than two pages each, and network data with no more than 200 nodes. The tools we use scale up to larger data sets, but large-scale data might not be practical for training purposes.

5. Readings

Prior to the workshop, we ask participants to read the following overviews of the concepts and methods addressed in the workshop (copies of both papers will be emailed to confirmed participants prior to the workshop):

  • Diesner, J., Carley, K. M. (2011): Semantic Networks. In G. Barnett (Ed.), Encyclopedia of Social Networking, (pp. 595-598). Sage Publications.
  • Diesner, J., Carley, K. M. (2011): Words and Networks. In G. Barnett (Ed.), Encyclopedia of Social Networking, (pp. 958-961). Sage Publications.

All further readings are optional. The instructor is available for pointing participants to further readings in their areas of interest.

6. Information about the instructor

Jana Diesner is an Assistant Professor at the iSchool (a.k.a. Graduate School of Library and Information Science) at the University of Illinois at Urbana-Champaign. Jana conducts research at the nexus of network science, natural language processing and machine learning. With her work, she aims to advance the understanding and computational analysis of the interplay and co-evolution of information and socio-technical networks. She develops and analyzes methods and technologies for extracting information about networks from text data and for considering the substance of information in network analysis. In her empirical work, she studies networks from the business, science and geopolitical domains. She is particularly interested in covert information and covert networks. Jana obtained her PhD from Carnegie Mellon University, School of Computer Science. She has taught the “Words to Networks” workshop 24 times before at various institutions, and also teaches courses on Social Computing, Network Analysis and Digital Humanities. For more information about Jana see http://people.lis.illinois.edu/~jdiesner/.

7. Questions?

Contact Jana with any questions about the workshop.

You have to register for the 7th International Research Workshop to participate in this course.

Case Study Research

Institution: see Organisers & Acknowledgements

Program of study: International Research Workshop

Lecturer: Miriam Wilhelm (University of Groningen)

Date:

30.09.2013, 14:00 – 17:30
01.10.2013, 14:00 – 17:30

Room: n.s.

Max. number of participants: 20

Semester periods per week: n.s.

Credit Points: 5 CP for participating in the whole IRWS

Language of instruction: English/German (depending on participants)

Contents:

In this course participants will learn to design, conduct and publish case studies.

After participating in this course, students will have gained enhanced knowledge of the process of conducting a case study. Students do not need prior experience with case study research, but they should be working on a research question that is in principle suitable for a case study design.

Day 1: Learning about case studies

  • Case study design
  • Case study process
  • Quality criteria for case study research

Day 2: Doing case studies

  • Paper discussion
  • Publishing with case studies

    Literature

    Eisenhardt, K. M. (1989): Building theories from case study research. Academy of Management Review, 14(4): 532-550.

    You have to register for the 7th International Research Workshop to participate in this course.

    Introduction to Survival Analysis

    Institution: see Organisers & Acknowledgements

    Program of study: International Research Workshop

    Lecturer: Andrea Schäfer (University of Bremen)

    Date:

    30.09.2013, 14:00 – 17:30
    01.10.2013, 14:00 – 17:30
    02.10.2013, 14:00 – 17:30

    Room: n.s.

    Max. number of participants: 20

    Semester periods per week: n.s.

    Credit Points: 5 CP for participating in the whole IRWS

    Language of instruction: English/German (depending on participants)

    Contents:

    The goal of this course is to introduce survival (or time-to-event) analysis and to describe selected methods for modeling and evaluating survival data. General statistical concepts and methods discussed in this course include survival and hazard functions, the Kaplan-Meier estimator and graph, the Cox proportional hazards model and parametric models. We will explore the different types of censoring and truncation and discover the properties of the survival and hazard functions. You will learn the derivation and use of Kaplan-Meier (KM) non-parametric estimates, how to plot the KM curve and how to test for differences between groups. Further, we will explore the motivation, strengths and limits of Cox’s semi-parametric proportional hazards model and learn how to fit the model. Finally, we will recap the basics of parametric models. For our computer sessions we will be using a sample of the SOEP (Socio-Economic Panel) data set. The course requires participants to use Stata to analyze survival data.
    In this course, you will learn about:

    • The goal, problem and strengths of survival analysis
    • Differences between survival analysis methods
    • Censoring and truncation (concepts and types)
    • The distribution of failure times (functions, rates and ratio, data layout, descriptive statistics)
    • Basics of non-parametric analysis (computing the Kaplan-Meier estimator, comparing and graphing curves)
    • Basics of semi-parametric analysis (model definition and features, understanding and estimating Cox’s PH model)
    • Basics of parametric analysis (forms of distributions)
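
To make the non-parametric part concrete, here is a minimal Python sketch of the Kaplan-Meier product-limit estimate. The course itself uses Stata, so this is only an illustration under assumptions: the function name and the toy data are hypothetical (they are not SOEP data), and only right-censoring is handled. At each distinct event time t with d events among n subjects still at risk, the running survival estimate is multiplied by (1 - d/n).

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] for each distinct time at which an event occurs.

    durations: observed time for each subject
    events:    1 if the event was observed at that time, 0 if right-censored
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Events and total exits (events plus censorings) at time t.
        d = sum(1 for time, e in zip(durations, events) if time == t and e == 1)
        leaving = sum(1 for time in durations if time == t)
        if d > 0:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Subjects censored at t count as at risk at t, then drop out.
        at_risk -= leaving
    return curve


# Hypothetical toy data: seven subjects, three of them censored.
durations = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

The estimate only drops at times where an event is observed; censored observations reduce the number at risk without changing the curve at that point.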

    Literature

    Cleves, Mario; William Gould, Roberto G. Gutierrez, and Yulia V. Marchenko (2010): An Introduction to Survival Analysis Using Stata, (3rd ed.), Stata Press.

    Kleinbaum, David G. and Klein, Mitchel (2005): Survival analysis: a self-learning text (2nd ed), Springer.

    You have to register for the 7th International Research Workshop to participate in this course.

    Questionnaire Design

    Institution: see Organisers & Acknowledgements

    Program of study: International Research Workshop

    Lecturer: Timo Lenzner (GESIS – Leibniz Institute for the Social Sciences)

    Date:

    30.09.2013, 14:00 – 17:30
    01.10.2013, 14:00 – 17:30
    02.10.2013, 14:00 – 17:30

    Room: n.s.

    Max. number of participants: 20

    Semester periods per week: n.s.

    Credit Points: 5 CP for participating in the whole IRWS

    Language of instruction: English

    Contents:

    The objective of this course is to give participants a thorough grounding in questionnaire design and to introduce them to principles that can be applied to write survey questions. It covers the general principles of questionnaire design, question wording and construction of answer formats, special issues faced in writing factual, attitudinal and sensitive questions, and an introduction to various methods of questionnaire pretesting. Sessions combine lectures with practical exercises and discussion.

    You have to register for the 7th International Research Workshop to participate in this course.

    Introduction to IAB Data

    Institution: see Organisers & Acknowledgements

    Program of study: International Research Workshop

    Lecturer: Stefan Seth (IAB Nürnberg)

    Date:

    04.10.2013, 09:00 – 12:30

    Room: n.s.

    Max. number of participants: 20

    Semester periods per week: n.s.

    Credit Points: 5 CP for participating in the whole IRWS

    Language of instruction: English

    Contents:

    The Institute for Employment Research (IAB) in Nuremberg holds a wealth of micro data on the German labor market and offers access to it through its Research Data Center (FDZ). The course’s goal is to spark the participants’ interest in FDZ data and to guide their first steps in analyzing them. The focus will be on two large administrative data sets, namely the Sample of Integrated Employment Biographies (SIAB) and the Establishment History Panel (BHP). In hands-on sessions we will explore, clean and prepare the data, calculate durations, and implement simple imputation procedures. The course will also cover in some detail the IAB Establishment Panel, the FDZ’s most important survey data set, and the Linked Employer-Employee Dataset of the IAB (LIAB); other FDZ data will be presented only briefly.
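
Calculating durations from spell data can be sketched as follows. This Python example is an assumption-laden illustration, not the course's procedure: the record layout (person id, start date, end date) and all values are hypothetical and do not reproduce the actual SIAB or BHP variable names, and both spell dates are assumed to be inclusive.

```python
from collections import defaultdict
from datetime import date

# Hypothetical spell records: (person id, spell start, spell end), dates inclusive.
SPELLS = [
    (1, date(2010, 1, 1), date(2010, 12, 31)),
    (1, date(2011, 3, 1), date(2011, 3, 31)),
    (2, date(2010, 6, 15), date(2010, 6, 30)),
]


def spell_days(begin, end):
    """Length of one spell in days, counting both the first and the last day."""
    return (end - begin).days + 1


def total_days_by_person(spells):
    """Sum the spell lengths for each person."""
    totals = defaultdict(int)
    for person_id, begin, end in spells:
        totals[person_id] += spell_days(begin, end)
    return dict(totals)


totals = total_days_by_person(SPELLS)
print(totals)
```

Overlapping spells for the same person would be counted twice by this simple sum, so real data would need a check for overlaps first.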

    FDZ website
    Overview of FDZ data

    You have to register for the 7th International Research Workshop to participate in this course.

    Introduction to the SOEP

    Institution: see Organisers & Acknowledgements

    Program of study: International Research Workshop

    Lecturer: Elke Holst & Lea Kröger (SOEP at DIW)

    Date:

    01.10.2013, 09:00 – 12:30
    02.10.2013, 09:00 – 12:30

    Room: n.s.

    Max. number of participants: 25

    Credit Points: 5 CP for participating in the whole IRWS

    Language of instruction: English

    Contents:

    The Socio-Economic Panel Study (SOEP) is a longitudinal study of private households in Germany. The panel was started in 1984 and provides information on all household members. In 2011, the sample comprised more than 12,000 households with more than 21,000 persons. Topics include household composition, occupational biographies, employment, earnings, health, well-being, integration, values, lifestyles, and personality. The course gives an overview of the data structure and the research designs facilitated by longitudinal household studies that go beyond conventional surveys (household analysis, intergenerational analysis, life course research, etc.). In hands-on sessions using Stata, the course provides an applied introduction to data retrieval and the construction of longitudinal data files, and illustrates some exemplary analyses.

    SOEP@DIW Berlin website:

    http://www.diw.de/soep (German) or http://www.diw.de/en/soep (English)

    Reading the SOEP Desktop Companion is a prerequisite for participation.

    You have to register for the 7th International Research Workshop to participate in this course.

    Introduction to MAXQDA for Case Studies

    Institution: see Organisers & Acknowledgements

    Program of study: International Research Workshop

    Lecturer: Heiko Grunenberg (Leuphana University Lüneburg)

    Date:

    02.10.2013, 14:00 – 17:30

    Room: n.s.

    Max. number of participants: 20

    Semester periods per week: n.s.

    Credit Points: 5 CP for participating in the whole IRWS

    Language of instruction: English

    Contents:

    This workshop is directly linked to the course “Case Study Research”. We will examine how the ideas and approaches of “Case Study Research” can be implemented with qualitative research software such as MAXQDA.

    In-depth knowledge of MAXQDA is not necessary, but please have a look at http://www.maxqda.com to understand the basic steps of computer-assisted qualitative research.

    References

    Lewins, Ann/ Silver, Christina (2007): Using Software in Qualitative Research: A Step-By-Step Guide. SAGE: London.

    Gerring, John (2006): Case Study Research: Principles and Practices. Cambridge University Press: Cambridge.

    You have to register for the 7th International Research Workshop to participate in this course.

    Qualitative Inquiry and Content Analysis with MAXQDA

    Institution: see Organisers & Acknowledgements

    Program of study: International Research Workshop

    Lecturer: Heiko Grunenberg (Leuphana University Lüneburg)

    Date:

    02.10.2013, 09:00 – 12:30
    04.10.2013, 09:00 – 12:30

    Room: n.s.

    Max. number of participants: 20

    Semester periods per week: n.s.

    Credit Points: 5 CP for participating in the whole IRWS

    Language of instruction: German (the course Introduction to MAXQDA for Case Studies will be held in English and covers most topics of this course)

    Contents:

    MAXQDA is software for analyzing textual data qualitatively (and also quantitatively). The course provides a basic introduction to the logic of the program and its broad possibilities. The goal is to enable you to use this tool according to your own method of analysis. For this reason, everyone can practice the working steps on their own computer. We will start at the very beginning and learn about the basic features of the program, such as the preparation and import of texts, basic analysis strategies and the creation of codes, memos and variables. After this, we will focus on analysis strategies, simple and complex text retrievals and other procedures. At the end, we will take some excursions into teamwork functions and quantitative content analysis based on counts and numbers.

    References

    Corbin, Juliet/ Strauss, Anselm L. (2008): Basics of qualitative research: techniques and procedures for developing grounded theory. 3rd Edition. Los Angeles, Calif.: Sage Publ.

    Kuckartz, Udo (2010): Einführung in die computergestützte Analyse qualitativer Daten. VS-Verlag: Wiesbaden.

    Lewins, Ann/ Silver, Christina (2007): Using Software in Qualitative Research: A Step-By-Step Guide. SAGE: London.

    You have to register for the 7th International Research Workshop to participate in this course.

    Qualitative Methods: From Research Question to Study Design

    Institution: see Organisers & Acknowledgements

    Program of study: International Research Workshop

    Lecturer: Dr. Anna Brake (Augsburg University)

    Date:

    30.09.2013, 09:00 – 12:30
    01.10.2013, 09:00 – 12:30

    Room: n.s.

    Max. number of participants: 20

    Semester periods per week: n.s.

    Credit Points: 5 CP for participating in the whole IRWS

    Language of instruction: English

    Contents:

    Methodological rigor is of vital importance for the success of a qualitative research project. The research question, the methodological approach to data collection and the strategies of (verbal) data analysis have to be well matched in order to ensure a compelling overall research process. The workshop aims to provide the opportunity to discuss these issues critically in the light of the participants’ own dissertation projects. It addresses Ph.D. students who seek further clarification of the methodological rationale of their qualitative study regarding sampling procedure, interview techniques, approaches to data analysis and other aspects. Thus, we will not debate general issues of methodological importance, but focus on the methodologically demanding topics the participants are facing within their own qualitative studies.

    Participants are kindly asked to submit a research abstract no later than two weeks before the beginning of the workshop to Anna Brake.

    You have to register for the 7th International Research Workshop to participate in this course.

    Data Analysis with R

    Institution: see Organisers & Acknowledgements

    Program of study: International Research Workshop

    Lecturer: Marco Lehmann (University of Hamburg)

    Date:

    30.09.2013, 09:00 – 12:30
    01.10.2013, 09:00 – 12:30
    02.10.2013, 09:00 – 12:30
    03.10.2013, 09:00 – 12:30

    Room: n.s.

    Max. number of participants: 20

    Semester periods per week: n.s.

    Credit Points: 5 CP for participating in the whole IRWS

    Language of instruction: English/German (depending on participants)

    Contents:

    The course introduces the programming language R used for statistical analyses. The beginning of each lecture comes with a demonstration of programming and statistical functions that will be elaborated in the course of study. The students will then practice with many statistical examples. In addition to statistical functions the course will introduce the definition of R as a programming language and its syntax rules. Students will further learn to use R’s scripting capabilities.

    Literature

    Wollschläger, Daniel (2012). Grundlagen der Datenauswertung mit R (2nd ed.). Berlin: Springer.

    You have to register for the 7th International Research Workshop to participate in this course.