Sunday, March 25, 2012

Count for each word in a field

Is it possible to create a SELECT query that will give me the count of all the words in a single field?

SELECT field1, count(field1) FROM table1 GROUP BY field1

won't do it because field1 has multiple words and I want them broken out and counted individually.

Data example:

Field1
-------
dog
dog ate my homework
cat
dog and cat
tail
dog tail
...

I want to get a count of every occurance of each word (e.g. "dog") whether in the field by itself or with other words.

Hope this makes sense.

AlNo, you would require a function that takes a text string as input, and counts the number of occurences of another string in it so that you could write:

select field1, word_count(field1,'dog')
from table1;

Depending on your DBMS's capabilities, such a function may already exist or you may be able to write one for yourself. In Oracle for example, you could certainly write one yourself and maybe one already exists in the Oracle Text tool (I don't know).|||Tony,

Thanks for the response. I forgot to mention the DB engine I'm using: MySQL.

I'm not sure I explained it well. Per your function example you're passing the word "dog" to the function. I need it to count all of the words in each field. Using my original example, the results would look like this (with an ORDER BY added):

field1 qty
-- --
dog 4
cat 3
tail 2
and 1
ate 1
my 1

It probably still requires a function to accomplish this.

Al|||Oh, I see - that's rather different. What you need is first to split out all the words into one per row like this:

word
--
dog
dog
dog
dog
cat
cat
cat
tail
tail
and
ate
my

Then of course it is easy to group and count the words. But how to split them up? If you could be sure there were no more than N words in any sentence then you could use a brute-force approach with a user-defined function like this:

select get_word(field1,1) as word from table1
union all
select get_word(field1,2) as word from table1
union all
...
union all
select get_word(field1,N) as word from table1

However, that probably isn't what you need. I don't know MySQL at all, but in Oracle you could achieve this (without the "N" words limit) by writing a function that returns a collection (like an array), and then selecting from the results of the function - a fairly complex operation.

It may be that this can't be done in MySQL using just a select statement - you may have to write a program that populates a temporary table with all the words, and then select from that.|||I'm thinking I'll create a new table that will house the individual words and after writing the "phrase" to field1, I'll parse the words and write them to the new table for future counting.

Thanks for your feedback!
Al

No comments:

Post a Comment