Corpus Pattern Hunt
vocabulary · grammar · accuracy · main · pairs · medium prep · 30-45 min
Students use a corpus tool (COCA, SKELL, Youglish, or a quoted-phrase Google search as a rough stand-in) to find how a target word is actually used. They extract its three most frequent collocates and its recurring grammatical patterns, then write a sentence using each. The corpus replaces the dictionary as the authority.
This is a data-driven learning (DDL) technique, an approach that originated with Tim Johns (1991). A meta-analysis (Boulton & Cobb, 2017) shows medium-to-large effect sizes for DDL on vocabulary and grammar acquisition.
Procedure
- Assign each pair a target item — often a word learners keep misusing: effect vs affect, make vs do, suggest + ? structure, -ing vs to infinitive after specific verbs.
- Search the corpus: pairs open SKELL or COCA and run a query on the target item.
- Extract patterns: from the concordance lines, pairs identify:
  - The 3 most frequent word partners (collocates).
  - 2 recurring grammatical patterns (suggest + -ing, suggest that + clause).
  - 1 pattern that surprised them.
- Formulate a rule or observation in one sentence.
- Produce: each pair writes 3 original sentences using the patterns they found.
- Share: each pair teaches the class one thing they discovered.
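For teachers who want to check a pair's findings, the "extract patterns" step can be approximated in a few lines of Python. This is only a minimal sketch: the concordance lines, the stoplist, and the function name `collocates` are invented for illustration, and real corpus tools rank collocates with statistical measures rather than raw counts.

```python
import re
from collections import Counter

# Hypothetical concordance lines, typed in by hand for illustration;
# in class these would be copied from SKELL or COCA output.
LINES = [
    "The committee made a difficult decision about funding.",
    "She made a decision to leave the company.",
    "They have to make a final decision by Friday.",
    "He reached a decision after a long discussion.",
    "We made the final decision together.",
]

# Small illustrative stoplist so function words do not crowd out collocates.
STOPWORDS = {"a", "the", "to", "by", "about", "after",
             "have", "he", "she", "we", "they"}


def collocates(lines, target, window=3, top=3):
    """Count words within `window` tokens of `target`, ignoring stopwords."""
    counts = Counter()
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), i + window + 1
            neighbours = tokens[lo:i] + tokens[i + 1:hi]
            counts.update(w for w in neighbours if w not in STOPWORDS)
    return counts.most_common(top)


result = collocates(LINES, "decision")
print(result)  # the two strongest partners here are 'made' (3) and 'final' (2)
```

Because the counts are raw frequencies over word forms, "made" and "make" are tallied separately; with learners, that gap is itself a useful talking point about lemmas.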
Why It Works
- Authority shifts from coursebook to data: students see that language is what speakers actually do, not what a grammar book claims.
- Noticing is built in: finding the pattern requires looking at it carefully.
- Authentic input at scale: 20 concordance lines show a word in more real contexts than a typical textbook entry does.
- Empowers autonomous learning: students who learn to use a corpus can self-correct outside class.
Good Target Items for DDL
| Type | Example |
|---|---|
| Confusable pairs | affect / effect, advice / advise, whose / who's |
| Pattern-heavy verbs | suggest, recommend, mind (+ -ing vs that-clause) |
| Collocation-sensitive nouns | decision, impact, research (+ which verbs?) |
| Learner-typical errors | discuss about (no preposition), information (uncountable) |
| Hedging | tend to, may, it seems that |
Recommended Tools
- SKELL — learner-friendly, one-click pattern display, best first corpus for students.
- COCA: a large (about one billion words), genre-balanced corpus of American English, free with registration; more powerful than SKELL but requires more training.
- Youglish — video examples from YouTube; great for pronunciation.
- Google Books Ngram Viewer — for frequency trends across decades.
Variations
- Translation-as-hypothesis: L1 translation suggests an L2 pattern; corpus check reveals whether the hypothesis is right.
- Error repair: give a sentence with a learner error; students use the corpus to find the conventional pattern and repair.
- Student-built mini-corpus: pairs collect 20 examples of a structure from articles they read; analyse together.
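For the student-built mini-corpus, the collected examples can be lined up in keyword-in-context (KWIC) format so that patterns to the right of the target become visible. The sketch below is one simple way to do this; the sample text and the function name `kwic` are hypothetical, and matching on a stem plus any ending is a crude stand-in for proper lemmatisation.

```python
import re

# Hypothetical examples a pair might have collected from their reading.
SAMPLE = (
    "I suggest taking the early train. The doctor suggested that he rest. "
    "Can you suggest a good restaurant? She suggested going by bike."
)


def kwic(text, stem, width=30):
    """Return KWIC lines for words starting with `stem`, sorted by right context."""
    hits = []
    for m in re.finditer(rf"\b{re.escape(stem)}\w*", text, flags=re.IGNORECASE):
        left = " ".join(text[max(0, m.start() - width):m.start()].split())
        right = " ".join(text[m.end():m.end() + width].split())
        hits.append((left, m.group(0), right))
    # Sorting on what follows the keyword groups similar patterns together.
    hits.sort(key=lambda h: h[2].lower())
    # Right-align the left context so the keyword starts in the same column.
    return [f"{left:>{width}}  {kw}  {right}" for left, kw, right in hits]


rows = kwic(SAMPLE, "suggest")
for row in rows:
    print(row)
```

Sorting by the right-hand context is a design choice borrowed from concordancers: it places "suggest taking" and "suggested that" in separate clusters, which is what pairs need to see when they analyse the lines together.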
Tips
- Start small: 20 lines per search, not 500. Learners drown in unfiltered corpus output.
- Provide a worksheet scaffold the first 2–3 times, for example: "Fill in the three most common words before X. Fill in the three most common words after X."
- Pair DDL with an explicit teaching moment — it surfaces patterns; you name them.
- Excellent for exam classes: IELTS writing candidates who learn to query a corpus for collocations can check and correct their own word choices while drafting.
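When preparing the worksheet scaffold mentioned in the tips, a teacher may want a quick answer key for the "before X / after X" prompts. The following is a rough sketch under stated assumptions: the lines are invented rather than real corpus output, and `neighbours` is a hypothetical helper that looks only at the single word on each side of the target.

```python
import re
from collections import Counter


def neighbours(lines, target, top=3):
    """Return the top words directly before and directly after `target`."""
    before, after = Counter(), Counter()
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                if i > 0:
                    before[tokens[i - 1]] += 1
                if i + 1 < len(tokens):
                    after[tokens[i + 1]] += 1
    return before.most_common(top), after.most_common(top)


# Hypothetical lines for illustration only, not real corpus output.
LINES = [
    "This will depend on the weather.",
    "Prices depend on demand.",
    "It may depend on your budget.",
    "Results depend heavily on the sample.",
    "Children depend on their parents.",
]

before, after = neighbours(LINES, "depend")
print("before:", before)
print("after: ", after)
```

In this toy data "on" follows "depend" in four of the five lines, which is exactly the kind of regularity the worksheet is meant to make learners notice for themselves.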
Source
- Johns, T. (1991). Classroom concordancing.
- Boulton, A., & Cobb, T. (2017). Corpus use in language learning: A meta-analysis. Language Learning, 67(2).
- Tools: SKELL at sketchengine.eu; COCA at english-corpora.org.