Sampling naturalistic data: how much is enough? 

 An issue inherent in studying language acquisition is the sheer difficulty of acquiring enough accurate naturalistic data.  In particular, since many questions hinge on what language input kids hear, and on what language mistakes and capabilities kids show, it's important to have an accurate way of measuring both of these things.  Unfortunately, short of following a child around all day with a tape recorder (which people have done!), it's hard to get enough data for an accurate record of low-frequency items and productions; it's also hard to know how much would be enough.  Typically, researchers will record a child for a few hours at a time for a few weeks and then hope that this represents a good "sample" of the child's linguistic knowledge.

 A paper by Caroline Rowland at the University of Liverpool, presented at the BUCLD conference in early November, attempts to assess the reliability of this sort of naturalistic data by comparing it to diary data.  Diary data is obtained by having the caregiver write down every single utterance produced by the child over a period of time; as you can imagine, this is difficult to persuade someone to do!  There are clear drawbacks to diary data, of course, not least of which is that it becomes less and less accurate as the child speaks more and more.  But because it has a much better likelihood of capturing low-frequency utterances, it provides a good baseline against which to compare naturalistic, tape-recorded data.

 What Rowland and her coauthor found is perfectly in line with what is known about statistical sampling.  As the subsets of tape-recorded conversation got smaller, estimates of low-frequency terms became increasingly unreliable, and single segments of less than three hours were nearly completely useless (as they said in the talk, they were "rubbish."  Oh how I love British English!).  It is also more accurate to use, say, four one-hour chunks from different conversations rather than one four-hour segment: the former avoids the "burstiness effects" that arise because a particular conversation or setting predisposes speakers toward certain topics.
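To see why scattered chunks beat one long block, here is a toy simulation (not from Rowland's paper; the utterance rates and the "topic burst" model are invented purely for illustration). It assumes a rare construction that only surfaces when a conversation happens to land on the right topic, and then surfaces a lot. One contiguous four-hour recording is a single topic draw; four one-hour recordings from different settings are four independent draws, so the frequency estimate averages out.

```python
import random

random.seed(1)

UTTS_PER_HOUR = 200  # hypothetical utterance rate (invented for illustration)
BASE_RATE = 0.01     # long-run frequency of the rare construction
TOPIC_PROB = 0.1     # chance a given conversation invites that construction
# When the topic does come up, the construction occurs at BASE_RATE / TOPIC_PROB,
# so the overall average frequency is still BASE_RATE.

def record_session(hours):
    """Simulate one recording session (one conversation/setting).

    Burstiness: the construction appears only if this conversation's topic
    happens to invite it, and then it appears often.
    Returns (hits, total_utterances)."""
    n = hours * UTTS_PER_HOUR
    rate = BASE_RATE / TOPIC_PROB if random.random() < TOPIC_PROB else 0.0
    hits = sum(random.random() < rate for _ in range(n))
    return hits, n

def estimate(session_hours):
    """Frequency estimate pooled over sessions (given as a list of hour counts)."""
    hits = utts = 0
    for h in session_hours:
        s_hits, s_n = record_session(h)
        hits += s_hits
        utts += s_n
    return hits / utts

def sd_of_estimate(plan, trials=2000):
    """Standard deviation of the estimator across many simulated studies."""
    ests = [estimate(plan) for _ in range(trials)]
    mean = sum(ests) / trials
    return (sum((e - mean) ** 2 for e in ests) / trials) ** 0.5

one_block = sd_of_estimate([4])           # one contiguous four-hour recording
scattered = sd_of_estimate([1, 1, 1, 1])  # four one-hour recordings, different settings

print(f"SD of estimate, one 4h block:   {one_block:.4f}")
print(f"SD of estimate, four 1h blocks: {scattered:.4f}")
```

Both sampling plans give the right frequency on average, but the single-block estimate is roughly twice as variable: it is all-or-nothing on whether that one conversation hit the topic, which is exactly the burstiness effect described above.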

 Though this result isn't a surprise from a statistical sampling point of view, it is nice for the field to have some estimates of how little is "too little" (though of course that threshold depends somewhat on what you are looking for).  And the paper highlights important methodological issues for those of us who can't trail after small children with our notebooks 24 hours a day.