Just the Word (http://193.133.140.102/JustTheWord/) is gaining popularity with practitioners as well as researchers. WORDLE is a wonderful graphical tool for illustrating corpus frequency statistics. Few people, however, are aware of WORDLE's ADVANCED feature, or of how to use it to 'mash up' input from a site like Just the Word.
Here is an example WORDLE based on high-frequency collocates of RESEARCH, drawn from Just the Word's pattern analysis of the BNC (British National Corpus). I replaced the root RESEARCH with a bullet to make the graphic less cluttered.
And here is how I did this:
WORDLE has an 'advanced' button (top right) that takes you to http://www.wordle.net/advanced. From there, you can specify not only the size of each word but also its colour.
For example, I generated the collocates of RESEARCH in Just the Word. I then did a little Excel 'magic': I sorted the collocates by pattern and kept only those with a frequency between 100 and 1000, so that the wordle would not be dominated by one or two very high-frequency items. Next, I chose a different colour for each PATTERN. Because RESEARCH was the common root, I replaced it with a bullet so that the repeated word would not dominate the graphic. Finally, I pasted the data into the ADVANCED page. See http://www.wordle.net/show/wrdl/2168943/Research_collocates
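The filter-and-sort step can also be done in a few lines of Python instead of Excel. This is a minimal sketch, assuming the 100 to 1000 range is inclusive at both ends and that "sorted by pattern" means alphabetical order of the pattern label, with higher frequencies first within each pattern; the function name filter_and_sort is hypothetical. The sample rows are copied from the JTW table in this post.

```python
# (collocation, frequency, pattern) rows copied from the JTW table in this post.
rows = [
    ("research in", 840, "*research* PREP"),
    ("do research", 358, "V obj *research*"),
    ("research show", 380, "*research* subj V"),
    ("market research", 425, "N *research*"),
    ("much research", 102, "ADJ *research*"),
]


def filter_and_sort(rows, low=100, high=1000):
    """Keep rows with low <= frequency <= high (assumed inclusive), ordered by
    pattern label (alphabetical) and by descending frequency within a pattern."""
    kept = [row for row in rows if low <= row[1] <= high]
    return sorted(kept, key=lambda row: (row[2], -row[1]))


for collocation, frequency, pattern in filter_and_sort(rows):
    print(f"{pattern:20} {collocation:18} {frequency}")
```

Changing low and high is a quick way to see how the range affects the balance of the final wordle.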
Here is the original filtered data from Just the Word (JTW). I copied the JTW output into Excel and then used a few formulae to repeat the PATTERN and cluster values on every row.
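Repeating the PATTERN and cluster values is essentially a "fill down". A minimal sketch, assuming (hypothetically) that the raw JTW output gives a pattern and cluster label only on the first row of each group and leaves the cells beneath it blank; fill_down is an illustrative helper, not part of JTW or Excel.

```python
def fill_down(rows, columns):
    """Copy the last non-blank value of each named column into blank cells below it."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col in columns:
            if new_row.get(col):
                last_seen[col] = new_row[col]
            else:
                new_row[col] = last_seen.get(col, "")
        filled.append(new_row)
    return filled


# Hypothetical raw layout: labels appear only where a group starts.
raw = [
    {"collocation": "carry out research", "frequency": 155, "cluster": "cluster 1", "pattern": "V obj *research*"},
    {"collocation": "conduct research", "frequency": 132, "cluster": "", "pattern": ""},
    {"collocation": "undertake research", "frequency": 122, "cluster": "cluster 2", "pattern": ""},
]

for row in fill_down(raw, ["cluster", "pattern"]):
    print(row["collocation"], "|", row["cluster"], "|", row["pattern"])
```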
collocation | FREQUENCY | cluster | PATTERN |
carry out research | 155 | cluster 1 | V obj *research* |
conduct research | 132 | cluster 1 | V obj *research* |
undertake research | 122 | cluster 2 | V obj *research* |
do research | 358 | cluster 3 | V obj *research* |
research show | 380 | cluster 1 | *research* subj V |
research suggest | 131 | cluster 1 | *research* subj V |
research have | 745 | cluster 4 | *research* subj V |
recent research | 171 | cluster 1 | ADJ *research* |
further research | 190 | cluster 9 | ADJ *research* |
more research | 115 | cluster 9 | ADJ *research* |
medical research | 242 | cluster 9 | ADJ *research* |
much research | 102 | cluster 9 | ADJ *research* |
own research | 153 | cluster 9 | ADJ *research* |
scientific research | 240 | cluster 9 | ADJ *research* |
social research | 182 | cluster 9 | ADJ *research* |
such research | 111 | cluster 9 | ADJ *research* |
market research | 425 | cluster 1 | N *research* |
Cancer research | 114 | cluster 2 | N *research* |
research into | 708 | cluster 2 | *research* PREP |
research on | 644 | cluster 2 | *research* PREP |
research in | 840 | cluster 2 | *research* PREP |
research by | 164 | cluster 2 | *research* PREP |
research at | 151 | cluster 2 | *research* PREP |
research department | 103 | cluster 1 | *research* N |
research group | 205 | cluster 1 | *research* N |
research institute | 214 | cluster 1 | *research* N |
research team | 151 | cluster 1 | *research* N |
research unit | 178 | cluster 1 | *research* N |
research study | 135 | cluster 2 | *research* N |
research work | 132 | cluster 2 | *research* N |
research method | 141 | cluster 3 | *research* N |
research programme | 316 | cluster 3 | *research* N |
research project | 482 | cluster 3 | *research* N |
research grant | 185 | cluster 5 | *research* N |
research council | 446 | cluster 7 | *research* N |
research center | 344 | cluster 7 | *research* N |
research finding | 128 | cluster 7 | *research* N |
research laboratory | 189 | cluster 7 | *research* N |
research student | 137 | cluster 7 | *research* N |
result of research | 117 | cluster 4 | N PREP *research* |
center for research | 109 | cluster 5 | N PREP *research* |
research and development | 359 | cluster 1 | *research* and N |
our research | 148 | cluster 1 | article *research* |
some research | 140 | cluster 1 | article *research* |
this research | 262 | cluster 1 | article *research* |
their research | 171 | cluster 1 | article *research* |
my research | 111 | cluster 1 | article *research* |
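If you would rather skip Excel altogether, pipe-separated rows like these can be read straight into Python. A minimal sketch, assuming each data line has exactly four cells (collocation, frequency, cluster, pattern) followed by a trailing '|'; parse_table is a hypothetical helper name. Lines whose second cell is not a number, such as the header, are skipped.

```python
from collections import Counter

# A few data rows copied from the table in this post (the header line is not needed).
TABLE = """\
carry out research | 155 | cluster 1 | V obj *research* |
conduct research | 132 | cluster 1 | V obj *research* |
research show | 380 | cluster 1 | *research* subj V |
market research | 425 | cluster 1 | N *research* |
"""


def parse_table(text):
    """Turn 'collocation | frequency | cluster | pattern |' lines into tuples."""
    records = []
    for line in text.strip().splitlines():
        # Split on '|' and drop the empty cell left by the trailing pipe.
        cells = [c.strip() for c in line.split("|") if c.strip()]
        if len(cells) != 4 or not cells[1].isdigit():
            continue  # header or malformed line
        collocation, freq, cluster, pattern = cells
        records.append((collocation, int(freq), cluster, pattern))
    return records


records = parse_table(TABLE)
# Number of collocates listed under each pattern.
per_pattern = Counter(pattern for _, _, _, pattern in records)
print(per_pattern.most_common())
```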
Here is the data coded for WORDLE, which I pasted into the ADVANCED page. Each line has the form text:number:colour, where the number is the FREQUENCY and the hex value is the HTML colour code. Note that I've replaced the word RESEARCH with a bullet.
carry out•:155:4411AA
conduct•:132:4411AA
undertake•:122:4411AA
do•:358:4411AA
•show:380:00FF48
•suggest:131:00FF48
•have:745:00FF48
recent•:171:6280AA
further•:190:6280AA
more•:115:6280AA
medical•:242:6280AA
much•:102:6280AA
own•:153:6280AA
scientific•:240:6280AA
social•:182:6280AA
such•:111:6280AA
market•:425:62FF48
Cancer•:114:62FF48
•into:708:6280FF
•on:644:6280FF
•in:840:6280FF
•by:164:6280FF
•at:151:6280FF
•department:103:0080FF
•group:205:0080FF
•institute:214:0080FF
•team:151:0080FF
•unit:178:0080FF
•study:135:0080FF
•work:132:0080FF
•method:141:0080FF
•programme:316:0080FF
•project:482:0080FF
•grant:185:0080FF
•council:446:0080FF
•center:344:0080FF
•finding:128:0080FF
•laboratory:189:0080FF
•student:137:0080FF
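The coded lines can also be generated rather than typed by hand. This is a minimal sketch: the pattern-to-colour map is copied from the coded data in this post, while the regular expression that swaps the root for a bullet and the helper names to_wordle_line and wordle_lines are illustrative assumptions. Rows whose pattern has no colour in the coded data (for example article *research*) are simply skipped.

```python
import re

# Colour per pattern, copied from the coded WORDLE data in this post.
PATTERN_COLOURS = {
    "V obj *research*": "4411AA",
    "*research* subj V": "00FF48",
    "ADJ *research*": "6280AA",
    "N *research*": "62FF48",
    "*research* PREP": "6280FF",
    "*research* N": "0080FF",
}
BULLET = "\u2022"  # the bullet character used in place of the root


def to_wordle_line(collocation, frequency, pattern, root="research"):
    """Format one collocate as 'text:weight:colour', replacing the root with a bullet."""
    # Replace the root together with any spaces around it by a single bullet.
    text = re.sub(r"\s*\b" + re.escape(root) + r"\b\s*", BULLET, collocation)
    return f"{text}:{frequency}:{PATTERN_COLOURS[pattern]}"


def wordle_lines(rows):
    """Build WORDLE input lines, skipping patterns that have no colour assigned."""
    return [
        to_wordle_line(collocation, frequency, pattern)
        for collocation, frequency, pattern in rows
        if pattern in PATTERN_COLOURS
    ]


rows = [
    ("carry out research", 155, "V obj *research*"),
    ("research show", 380, "*research* subj V"),
    ("our research", 148, "article *research*"),
]
for line in wordle_lines(rows):
    print(line)
# carry out•:155:4411AA
# •show:380:00FF48
```

Each printed line can be pasted into the ADVANCED page as-is.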
Neat, eh?