Just the Word (http://193.133.140.102/JustTheWord/) is gaining popularity with practitioners as well as researchers. WORDLE is a wonderful graphic interface to illustrate corpus frequency statistics. Few people are aware of the ADVANCED feature on WORDLE, or of how to use it to ‘mash up’ input from a site like Just the Word.
Here is an example WORDLE based on high frequency collocates of RESEARCH using the pattern analysis of the BNC from Just the Word. I replaced the root RESEARCH with a bullet to make it less cluttered.
And here is how I did this:
WORDLE has an ‘advanced’ button (top right) that takes you to http://www.wordle.net/advanced. From here, you can specify not only the ‘size’ of each word, but also its colour.
For example, from Just the Word I generated the collocates of ‘RESEARCH’. I then did a little Excel ‘magic’: I sorted all the collocates by pattern and kept only those with a frequency between 100 and 1000 (to produce a reasonable wordle not dominated by one or two really high-frequency items). I then selected a different colour for each PATTERN. Because RESEARCH was the common root, I replaced it with a ‘bullet’ so that the graphic is not dominated by the repeated word. I then put the data into the ADVANCED feature. See http://www.wordle.net/show/wrdl/2168943/Research_collocates
Here is the original filtered data from JTW. (I copied the JTW output, put it into EXCEL and then executed a few formulae to repeat the PATTERN and cluster data.)
research | FREQUENCY | cluster | PATTERN
carry out research | 155 | cluster 1 | V obj *research*
conduct research | 132 | cluster 1 | V obj *research*
undertake research | 122 | cluster 2 | V obj *research*
do research | 358 | cluster 3 | V obj *research*
research show | 380 | cluster 1 | *research* subj V
research suggest | 131 | cluster 1 | *research* subj V
research have | 745 | cluster 4 | *research* subj V
recent research | 171 | cluster 1 | ADJ *research*
further research | 190 | cluster 9 | ADJ *research*
more research | 115 | cluster 9 | ADJ *research*
medical research | 242 | cluster 9 | ADJ *research*
much research | 102 | cluster 9 | ADJ *research*
own research | 153 | cluster 9 | ADJ *research*
scientific research | 240 | cluster 9 | ADJ *research*
social research | 182 | cluster 9 | ADJ *research*
such research | 111 | cluster 9 | ADJ *research*
market research | 425 | cluster 1 | N *research*
Cancer research | 114 | cluster 2 | N *research*
research into | 708 | cluster 2 | *research* PREP
research on | 644 | cluster 2 | *research* PREP
research in | 840 | cluster 2 | *research* PREP
research by | 164 | cluster 2 | *research* PREP
research at | 151 | cluster 2 | *research* PREP
research department | 103 | cluster 1 | *research* N
research group | 205 | cluster 1 | *research* N
research institute | 214 | cluster 1 | *research* N
research team | 151 | cluster 1 | *research* N
research unit | 178 | cluster 1 | *research* N
research study | 135 | cluster 2 | *research* N
research work | 132 | cluster 2 | *research* N
research method | 141 | cluster 3 | *research* N
research programme | 316 | cluster 3 | *research* N
research project | 482 | cluster 3 | *research* N
research grant | 185 | cluster 5 | *research* N
research council | 446 | cluster 7 | *research* N
research center | 344 | cluster 7 | *research* N
research finding | 128 | cluster 7 | *research* N
research laboratory | 189 | cluster 7 | *research* N
research student | 137 | cluster 7 | *research* N
result of research | 117 | cluster 4 | N PREP *research*
center for research | 109 | cluster 5 | N PREP *research*
research and development | 359 | cluster 1 | *research* and N
our research | 148 | cluster 1 | article *research*
some research | 140 | cluster 1 | article *research*
this research | 262 | cluster 1 | article *research*
their research | 171 | cluster 1 | article *research*
my research | 111 | cluster 1 | article *research*
Here is the data coded for WORDLE, which I pasted into the ADVANCED feature of WORDLE. Each line has three parts separated by colons: the text, then the FREQUENCY, then a HEX value, which is the HTML colour code. Note that I’ve replaced the word RESEARCH with a bullet.
carry out•:155:4411AA
conduct•:132:4411AA
undertake•:122:4411AA
do•:358:4411AA
•show:380:00FF48
•suggest:131:00FF48
•have:745:00FF48
recent•:171:6280AA
further•:190:6280AA
more•:115:6280AA
medical•:242:6280AA
much•:102:6280AA
own•:153:6280AA
scientific•:240:6280AA
social•:182:6280AA
such•:111:6280AA
market•:425:62FF48
Cancer•:114:62FF48
•into:708:6280FF
•on:644:6280FF
•in:840:6280FF
•by:164:6280FF
•at:151:6280FF
•department:103:0080FF
•group:205:0080FF
•institute:214:0080FF
•team:151:0080FF
•unit:178:0080FF
•study:135:0080FF
•work:132:0080FF
•method:141:0080FF
•programme:316:0080FF
•project:482:0080FF
•grant:185:0080FF
•council:446:0080FF
•center:344:0080FF
•finding:128:0080FF
•laboratory:189:0080FF
•student:137:0080FF
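For anyone who would rather script the Excel steps, here is a rough Python sketch of the same idea. It is not what I actually used (that was Excel), and the names in it (to_wordle_line, PATTERN_COLOURS, BULLET) are made up for illustration. It assumes the 100 to 1000 frequency range is inclusive, takes the colours from the coded data above, and simply skips any pattern that has no colour assigned.

```python
BULLET = "\u2022"  # the bullet character that stands in for the root word

# Colour per pattern, copied from the coded data above.
# Patterns not listed here are skipped (an assumption of this sketch).
PATTERN_COLOURS = {
    "V obj *research*": "4411AA",
    "*research* subj V": "00FF48",
    "ADJ *research*": "6280AA",
    "N *research*": "62FF48",
    "*research* PREP": "6280FF",
    "*research* N": "0080FF",
}


def to_wordle_line(phrase, frequency, pattern, root="research", low=100, high=1000):
    """Return 'text:frequency:colour' for one collocate, or None if it is filtered out."""
    # Keep only collocates inside the frequency range (assumed inclusive).
    if not low <= frequency <= high:
        return None
    colour = PATTERN_COLOURS.get(pattern)
    if colour is None:
        return None
    # Swap the root word for a bullet, then close up the space next to it.
    words = [BULLET if w.lower() == root else w for w in phrase.split()]
    label = " ".join(words).replace(" " + BULLET, BULLET).replace(BULLET + " ", BULLET)
    return f"{label}:{frequency}:{colour}"


# A few rows from the filtered JTW data: (collocation, FREQUENCY, PATTERN).
rows = [
    ("carry out research", 155, "V obj *research*"),
    ("research show", 380, "*research* subj V"),
    ("market research", 425, "N *research*"),
]

for row in rows:
    line = to_wordle_line(*row)
    if line is not None:
        print(line)
```

The printed lines can then be pasted into the ADVANCED feature in the same way as the hand-coded data.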
Neat, eh?