Digital Workshop

Welcome to the Digital Workshop Message Boards
 Post subject: Word Frequency in text object
PostPosted: September 19th, 2014, 1:31 pm 
Joined: November 11th, 2004, 1:18 pm
Posts: 1213
Location: New York
Opus: Opus Pro 9.75
OS: Windows 10 Pro 64 bit
System: Core i7, 16G RAM, Nvidia 640GT (desktop), plus Windows 10 and Android tablets
In an Opus Professional publication (to be published as a Windows EXE), is it possible at run time for the user to paste text into a text box, then have the frequency of each word counted and the words listed with their counts in another text box? I'm not sure whether this can be done or what the script would look like.

_________________
Stephen


 Post subject: Re: Word Frequency in text object
PostPosted: September 19th, 2014, 2:52 pm 
Joined: March 21st, 2007, 10:44 am
Posts: 3188
Location: UK
Opus: Evolution
Basic proof of concept:
Code:
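//Needs a Text Input Box with the variable myWords

myWordsArray = new Array();

//Builds words one character at a time, then lists each word with its count
function collectWords(){
   var tmpLetter = "";
   var tmpWord  = "";

   for (var a=0;a<myWords.length;a++){
      tmpLetter = myWords.charAt(a);

      if (tmpLetter != " ") {
         //Drop basic punctuation so that "word," and "word" count as the same word
         if (tmpLetter != "," && tmpLetter != ";" && tmpLetter != "." && tmpLetter != "!") tmpWord = tmpWord + tmpLetter;
      } else {
         //A space ends the current word
         checkWord(tmpWord);
         tmpWord = "";
      }
   }

   //The last word has no trailing space, so count it here
   checkWord(tmpWord);

   for (var i=0;i<myWordsArray.length;i++){
      Debug.trace (myWordsArray[i].word + " : " + myWordsArray[i].count + "\n")
   }
}

//Increments the count if the word is already known, otherwise stores it
function checkWord(w){
   //Consecutive spaces produce empty words; ignore them
   if (w == "") return;

   var tmpFound = false;

   for (var b=0;b<myWordsArray.length;b++){
      //Case-insensitive comparison
      if (String.toupper(myWordsArray[b].word) == String.toupper(w)) {
         myWordsArray[b].count++;
         tmpFound = true
      }
   }

   //After the loop b equals myWordsArray.length, so this appends a new entry
   if (!tmpFound)myWordsArray[b] = new WORDOBJECT(w);
}

//Constructor for one entry in myWordsArray
function WORDOBJECT(w){
   this.word = w;
   this.count = 1;
}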



_________________
When you have explored all avenues of possibilities, what ever remains, how ever improbable, must be the answer.

Interactive Solutions for Business & Education
Learn Anywhere. Learn Anytime.

www.interaktiv.co.uk
+44 (0) 1395 548057


 Post subject: Re: Word Frequency in text object
PostPosted: September 19th, 2014, 4:33 pm 
Hi Mack,

Wow!!! Very impressive!!! Thank you.
I have been trying to work out whether there is a way to sort the resulting word list array from most frequent to least frequent (descending), and also to let the user reverse this to least-to-most (ascending).

I think this involves sorting: myWordsArray[i].count ?

But I'm not making any progress. Since the for loop builds each entry with the word joined to its count, I'm not sure how to uncouple them to read just the count, sort numerically ascending or descending, and then re-couple each count to its word for the output list.

Ugh! (Maybe I'm trying to do this the wrong way? Maybe a better, easier way?)

_________________
Stephen


 Post subject: Re: Word Frequency in text object
PostPosted: September 20th, 2014, 1:46 pm 
You don't need to uncouple anything. The array sort function takes a parameter that allows you to customise the sort based on one of the Object's properties.

Just replace the entire script:

Code:
//Needs a Text Input Box with the variable myWords

myWordsArray = new Array();

function collectWords(){
var tmpLetter = "";
var tmpWord  = "";

for (var a=0;a<myWords.length;a++){
   tmpLetter = myWords.charAt(a);

   if (tmpLetter != " ") {
      if (tmpLetter != "," && tmpLetter != ";" && tmpLetter != "." && tmpLetter != "!") tmpWord = tmpWord + tmpLetter;
      } else {
         checkWord(tmpWord);
         tmpWord = "";
         }
      }

   checkWord(tmpWord);

myWordsArray.sort(sortByCount)

for (var i=0;i<myWordsArray.length;i++){
   Debug.trace (myWordsArray[i].word + " : " + myWordsArray[i].count + "\n")
   }

}


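function checkWord(w){

//Consecutive spaces produce empty words; ignore them
if (w == "") return;

var tmpFound = false;

for (var b=0;b<myWordsArray.length;b++){

   if (String.toupper(myWordsArray[b].word) == String.toupper(w)) {
      myWordsArray[b].count++;
      tmpFound = true
      }
}

//After the loop b equals myWordsArray.length, so this appends a new entry
if (!tmpFound)myWordsArray[b] = new WORDOBJECT(w);

}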


function WORDOBJECT(w){
this.word = w;
this.count = 1;
}


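//Comparator for the array sort: highest count first (descending)
function sortByCount(a, b)
{
var y = a.count
var x = b.count
//A negative result puts a before b; swap the -1 and 1 to sort ascending instead
return ((x < y) ? -1 : ((x > y) ? 1 : 0));
}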
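
For the ascending/descending choice asked about earlier in the thread, one possible approach is a function that returns a comparator for the requested direction. This is a sketch in plain JavaScript rather than tested OpusScript; makeCountSorter and its descending flag are made-up names, and the alphabetical tie-break is an addition that the script above does not have.

```javascript
// Hypothetical helper: returns a comparator for objects shaped like WORDOBJECT
// ({word, count}). descending = true puts the most frequent words first.
function makeCountSorter(descending) {
  var direction = descending ? -1 : 1;
  return function (a, b) {
    if (a.count !== b.count) {
      return direction * (a.count - b.count);
    }
    // Equal counts: fall back to alphabetical order so the list is predictable.
    if (a.word < b.word) return -1;
    if (a.word > b.word) return 1;
    return 0;
  };
}

var sample = [
  { word: "dolor", count: 1 },
  { word: "lorem", count: 4 },
  { word: "ipsum", count: 2 },
  { word: "amet", count: 1 }
];

// slice() copies the array so the original order is left untouched.
var mostFirst = sample.slice().sort(makeCountSorter(true));
var leastFirst = sample.slice().sort(makeCountSorter(false));
```

A button or check box could then pass true or false to choose the order before the list is rebuilt.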


I'm just glad it works. Using just OpusScript seems so old fashioned since starting with Opus Pro HTML5. With ECMAScript 6 out soon, the difference between the two versions is going to be huge.

Mack

 Post subject: Re: Word Frequency in text object
PostPosted: September 20th, 2014, 3:27 pm 
Thanks, Mack.

The next task is to see whether the user can click a specific word in the word frequency list to highlight its occurrence(s) in the pasted-in text, so they can easily see the word in context and edit it. One difficulty is that when a word appears more than once (a frequency count above 1), all occurrences should be highlighted in the original text, not just the first. This may require putting the frequency list into a list box, so that clicking a word triggers a highlight-in-original-text action.

Yes, this would be a Windows desktop EXE app. Not sure if there would be any easy way to implement a word frequency counter for pasted-in text using HTML5. Would be nice, however.

Again, thanks.

_________________
Stephen


 Post subject: Re: Word Frequency in text object
PostPosted: September 20th, 2014, 9:54 pm 
Made some progress: can save the word frequency list array to a list box using constant expressions (but need to write a constant expression for every word as an array member, without knowing how many words there will be).

In the list box, the user would choose a list entry (word) and (if I added an action) could click on the word to highlight it in the original text (would need to highlight all instances of that word, not just the first). So far can't get this to work.

Tried several variations, but still can't get it to work as described above.

(Also noticed that the added scroll bar is behaving oddly: it begins by occupying the entire height of the list box, even when the list box is populated beyond its height (not illustrated in the attached sample). If the up/down scroll bar arrow tab is clicked, it becomes the smaller, correct size.)

The test IMP is attached (adapted from Mack's original).

Maybe table with array members in each table cell?

Any help appreciated.



_________________
Stephen


 Post subject: Re: Word Frequency in text object
PostPosted: September 22nd, 2014, 10:31 am 
I'd say you were along the right lines by reading my array with objects into your array with a string - the issue of getting array data into a list box has been covered before so there should be posts that will help you.

As for what you do next, you can either use Opus' findtext function to locate the word, or you could change my search loop to record the position of each word whilst it's being indexed by adding some properties to the WORDOBJECT constructor function.

Either method should return the start and end values needed for the setselection function which would then use the setselectionstyle function.
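
The second option, recording positions while the words are being indexed, might look roughly like the sketch below. It is plain JavaScript, not tested OpusScript; indexWords, the positions property, the lookup object and the separator list are all assumptions added for illustration.

```javascript
// Hypothetical variant of the word-building loop that also remembers where
// each occurrence starts, so no second search is needed when highlighting.
function indexWords(text) {
  var separators = " \n\r\t,;.!?\"()";
  var entries = [];   // [{word, count, positions}], similar to myWordsArray
  var lookup = {};    // "$" + lower-cased word -> its entry in entries
  var current = "";
  var start = 0;

  // Loop one position past the end so the final word is flushed too.
  for (var i = 0; i <= text.length; i++) {
    var ch = i < text.length ? text.charAt(i) : " ";
    if (separators.indexOf(ch) === -1) {
      if (current === "") start = i;   // first letter of a new word
      current += ch;
    } else if (current !== "") {
      // The "$" prefix avoids clashes with built-in property names.
      var key = "$" + current.toLowerCase();
      if (!lookup[key]) {
        lookup[key] = { word: current.toLowerCase(), count: 0, positions: [] };
        entries.push(lookup[key]);
      }
      lookup[key].count++;
      lookup[key].positions.push(start);
      current = "";
    }
  }
  return entries;
}

var indexed = indexWords("Lorem ipsum, lorem dolor. LOREM");
```

Each stored position p, together with the word's length, gives the start and end values (p and p + word.length) for the selection call.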

Years ago, there was a sample publication on our website that showed how to do various things in Opus and I vaguely recall that it had some text selection. I have no idea what it's called but maybe somebody still has it as it might help you.

Mack

 Post subject: Re: Word Frequency in text object
PostPosted: September 22nd, 2014, 12:09 pm 
Thanks, Mack for these suggestions.

I found an old post that Sandyn had written to populate a listbox with an array.

Adapted this, and it works:

Code:
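for (var i=0;i<myWordsArray.length;i++){
   //Build one "word : count" line and append it to the list box T10
   wordList[i] = (myWordsArray[i].word + " : " + myWordsArray[i].count + "\n")
   T10.ReplaceSelection(wordList[i])
   }
} //closing brace of the enclosing function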


Next: searching the original text box for all instances of a specific listbox word highlighted/chosen by the user. Can use the following to identify the text search word (minus the ":" and count)

Code:
var myIndex = wL0.lastIndexOf(":")
var mySub = wL0.substring(0,myIndex-1) //wL0 is the text string saved in the listbox T10 for a user selection
Debug.trace(mySub)


Once the specific search word is identified, I'm not sure how to find it and highlight it (change to bold, change color, etc.) in the original text. It looks like the find text function counts instances in a text object. How can I locate and highlight them?

_________________
Stephen


 Post subject: Re: Word Frequency in text object
PostPosted: September 23rd, 2014, 12:02 am 
Making progress:

Further modified script so that now it can find and highlight first instance/appearance of word (chosen in listbox) in the original text object.

Code:
var myIndex = wL0.lastIndexOf(":")
var mySub = wL0.substring(0,myIndex-1)
var stringL = mySub.length
var searchT = wordsTI.FindText(mySub,0)
wordsTI.SetSelection(searchT, searchT+stringL)
var abc = wordsTI.GetSelectionText()
Debug.trace(abc)


However, haven't yet figured out how to highlight all instances/appearances of the word in the original text object? For example, if the word "lorem" appears 4 times in the original text object, the above script will find and highlight "lorem" 's first appearance but not the remaining 3 appearances. How to get all appearances highlighted?

Iterate through a "while" loop, but how to get all instances' start(indexof), end(lastindexof) to use for setting selection?

So far can't figure out this loop script.

_________________
Stephen


 Post subject: Re: Word Frequency in text object
PostPosted: September 23rd, 2014, 7:56 am 
Code:
FindText( String, StartPosition, CaseSensitive )


FindText takes several parameters. You can loop through all occurrences using the second parameter. Once FindText has returned the first position, you can then offset the StartPosition parameter and search again to locate the next occurrence.

The number of iterations is found in myWordsArray[x].count.
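
The loop described here can be sketched in plain JavaScript, with indexOf standing in for FindText. This is an illustration of the idea only, not tested OpusScript: findAllOccurrences is a made-up name, and how the Opus search call signals "not found" would need checking against the OpusScript documentation.

```javascript
// Collect the start position of every occurrence of word in text.
// indexOf(needle, from) plays the role of FindText(String, StartPosition).
function findAllOccurrences(text, word) {
  var positions = [];
  if (word === "") return positions;
  var haystack = text.toLowerCase();   // case-insensitive search
  var needle = word.toLowerCase();
  var from = 0;
  while (true) {
    var pos = haystack.indexOf(needle, from);
    if (pos === -1) break;             // nothing further: stop
    positions.push(pos);
    from = pos + needle.length;        // continue searching after this hit
  }
  return positions;
}

var hits = findAllOccurrences("Lorem ipsum lorem dolor lorem", "lorem");
```

This version stops when the search fails instead of counting up to myWordsArray[x].count, so it still terminates if the stored count and the text ever disagree. Each hit can be selected and styled in turn inside the loop, rather than trying to hold several selections at once.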

Mack

 Post subject: Re: Word Frequency in text object
PostPosted: September 23rd, 2014, 12:53 pm 
Hi Mack,

Thank you for this lead. I'm not sure how to construct this loop from:

Code:
for (var b=0;b<myWordsArray.length;b++){
}


Would I modify and use:

Code:
var searchT = wordsTI.FindText(mySub,0)


somehow add searchT (which is a start position) to each next iteration?

Not sure then how to use the returns in SetSelection to actually highlight every instance of the word in the original text box.

So far, in a test to see whether more than one instance of a word can be selected and highlighted in the original at the same time, I can't get 2 or more instances highlighted simultaneously.

The following code will highlight the third instance (if there are 3), but not instance 1 and 2.

Code:
var myIndex = wL0.lastIndexOf(":")
var mySub = wL0.substring(0,myIndex-1)
var stringL = mySub.length
var searchT = wordsTI.FindText(mySub,0)
wordsTI.SetSelection(searchT, searchT+stringL)
var abc = wordsTI.GetSelectionText()
for (var i=0;i<myWordsArray[i].count;i++){

var searchT2 = wordsTI.FindText(mySub,searchT+stringL+1)
wordsTI.SetSelection(searchT2, searchT2+stringL)
var abc = wordsTI.GetSelectionText()
}

for (var i=0;i<myWordsArray[i].count;i++){

var searchT3 = wordsTI.FindText(mySub,searchT2+stringL+1)
wordsTI.SetSelection(searchT3, searchT3+stringL)
var mmm = wordsTI.GetSelectionText()
}


Debug.trace(searchT3)



Unfortunately tried and tried but can't get my brain around how to create this loop?

Can 2 or more instances of the same word be selected and highlighted in the original text box simultaneously?

Stuck. :oops:

_________________
Stephen


 Post subject: Re: Word Frequency in text object
PostPosted: September 23rd, 2014, 9:41 pm 
Updated the example to include word highlighting. Also switched to a simpler text as you'll need to improve the word building engine to handle various idiosyncrasies within more advanced text.

Code:

//Needs a Text Input Box with the variable myWords

myWordsArray = new Array();

function collectWords(){

var tmpLetter = "";
var tmpWord  = "";

for (var a=0;a<myWords.length;a++){
   tmpLetter = myWords.charAt(a);
//The database of words will only be as good as the word building engine below!
   if (tmpLetter != " ") {
      if (tmpLetter != "," && tmpLetter != ";" && tmpLetter != "." && tmpLetter != "!" && tmpLetter != '"' && tmpLetter != "(" && tmpLetter != ")") tmpWord = tmpWord + tmpLetter;
      } else {
         checkWord(tmpWord);
         tmpWord = "";
         }
      }

   checkWord(tmpWord);

myWordsArray.sort(sortByCount)
wordsLB.SetSelection(-1);

var tmpArray = new Array()
var tmpString = null;

for (var i=0;i<myWordsArray.length;i++){
   tmpArray[i] = myWordsArray[i].word + " (" + myWordsArray[i].count + ")";
   }
   
tmpString = tmpArray.join("\n");
wordsLB.ReplaceSelection(tmpString);   
   
}


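function checkWord(w){

//Consecutive spaces produce empty words; ignore them
if (w == "") return;

var tmpFound = false;

for (var b=0;b<myWordsArray.length;b++){

   if (String.toupper(myWordsArray[b].word) == String.toupper(w)) {
      myWordsArray[b].count++;
      tmpFound = true
      }
}

//After the loop b equals myWordsArray.length, so this appends a new entry
if (!tmpFound)myWordsArray[b] = new WORDOBJECT(w);

}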


function WORDOBJECT(w){
this.word = w.tolower();
this.count = 1;
}


function sortByCount(a, b)
{
var y = a.count
var x = b.count
return ((x < y) ? -1 : ((x > y) ? 1 : 0));
}

function selectWord(){
var tmpPos = 0
var tmpLength = myWordsArray[myIndex].word.length
var newStyle = new Object();
var newStyleII = new Object();
newStyle.bold = true;
newStyleII.bold = false;
newStyle.colour = "FF0000";
newStyleII.colour = "000000";

//clear all formatting
wordsTI.SetSelection(0,-1)
wordsTI.SetSelectionStyle(newStyleII);
wordsTI.SetSelection(-1)

mySelectedWord = myWordsArray[myIndex].word

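for (var i=0;i<myWordsArray[myIndex].count;i++){
   tmpPos = wordsTI.FindText(mySelectedWord,tmpPos,false);

   //Stop if nothing more is found (assumes FindText returns a negative value in that case);
   //otherwise the i-- below could keep the loop running forever
   if (tmpPos < 0) break;

   //Only style whole words: preceded by a space or the start of the text,
   //and followed by a space, full stop, comma or the end of the text
   if ((myWords.charAt(tmpPos-1) == " " || tmpPos == 0)&& (tmpPos + tmpLength >= myWords.length || myWords.charAt(tmpPos + tmpLength) == " " || myWords.charAt(tmpPos + tmpLength) == "." || myWords.charAt(tmpPos + tmpLength) == ",") ){
      wordsTI.SetSelection(tmpPos, tmpPos + tmpLength )
      wordsTI.SetSelectionStyle(newStyle);
   } else {
      i-- //counter false positives
   }

   tmpPos++;
   }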

wordsTI.SetSelection(-1);
}
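
The note about the word-building engine also applies to the whole-word test in selectWord, which accepts only a few specific characters around the word. A more general boundary check might look like the sketch below. It is plain JavaScript with indexOf standing in for FindText, not tested OpusScript; WORD_CHARS, isWordChar, isWholeWordAt and findWholeWords are invented names, and the character list assumes unaccented English text.

```javascript
// Characters treated as part of a word; anything else is a boundary.
var WORD_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789'";

function isWordChar(ch) {
  // charAt returns "" outside the string, which counts as a boundary.
  return ch !== "" && WORD_CHARS.indexOf(ch.toLowerCase()) !== -1;
}

// True when the match at [pos, pos + length) is not part of a longer word.
function isWholeWordAt(text, pos, length) {
  return !isWordChar(text.charAt(pos - 1)) &&
         !isWordChar(text.charAt(pos + length));
}

// Returns [start, end] pairs for every whole-word occurrence of word.
function findWholeWords(text, word) {
  var ranges = [];
  if (word === "") return ranges;
  var haystack = text.toLowerCase();
  var needle = word.toLowerCase();
  var pos = haystack.indexOf(needle);
  while (pos !== -1) {
    if (isWholeWordAt(text, pos, needle.length)) {
      ranges.push([pos, pos + needle.length]);
    }
    pos = haystack.indexOf(needle, pos + 1);
  }
  return ranges;
}

var ranges = findWholeWords("The thesis: the end (the)! the", "the");
```

Here the "the" inside "thesis" is skipped, while the occurrences at the start of the text, inside brackets and at the very end are all kept.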


Mack


 Post subject: Re: Word Frequency in text object
PostPosted: September 23rd, 2014, 11:40 pm 
Thank you for this. Impressive scripting!

Wonderful example of what can be achieved with Opus.

_________________
Stephen


 Post subject: Re: Word Frequency in text object
PostPosted: September 24th, 2014, 3:55 pm 
I think it would be helpful to remove (or filter out) from the listbox what are known as "common words." (Words like and, or, but, he, she, the, etc.)

Trying to figure this out. Would need to figure out a function that is part of the collectWords function? Only want to filter out as a whole word, and not filter out for example, the "the" from a word like "thesis," leaving only "sis."

Checked forum posts re: removing an array element using OpusScript-- no easy task-- viewtopic.php?f=4&t=4016&p=18070&hilit=remove+array+element#p18070
The following test script, checking for one of the "common words," removes "and" from the array that will populate the listbox, but also leaves a blank space (click on that space and "and" is selected and highlighted in the original text box.) So, it appears to leave that array element "undefined" but still there. Not a solution.

Code:
for (var i=0;i<myWordsArray.length;i++){
   if(myWordsArray[i].word != "and"){
   tmpArray[i] = myWordsArray[i].word + " (" + myWordsArray[i].count + ")";
   }
   }
tmpString = tmpArray.join("\n");
T10.ReplaceSelection(tmpString);   
   
}


Code:
if (myWords != a||able||about||across||after||all||almost||also||am||among||an||and||any||are||as||at||be||because||been||but||by||can||cannot||could||dear||did||do||does||either||else||ever||every||for||from||get||got||had||has||have||
he||her||hers||him||his||how||however||i||if||in||into||is||it||its||just||least||let||like||likely||may||me||might||most||must||my||neither||no||nor||not||of||off||often||on||only||or||other||our||own
||rather||said||say||says||she||should||since||so||some||than||that||the||their||them||then||there||these||they||this||tis||to||too||twas||us||wants||was||we||were||what||when||where||which||while||who||whom||why||will||with||would||yet||you||your)
//some of these common words are also reserved words(are highlighted in the script editor): for example, 'this' or 'for'--not sure how to include these as common words to be removed?


These "common words" would remain in the original text box, but not appear in the listbox (neither the word nor its count).
Not sure the best way to remove these common words (including some reserved words) from the listbox? Or if this is even possible?
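
On the blank entry: skipping the assignment to tmpArray[i] leaves that slot undefined, and joining the array still produces an empty line for it. One way around this, sketched in plain JavaScript rather than tested OpusScript, is to append to the output array instead of writing at index i. buildListEntries and sourceIndex are made-up names; sourceIndex is there because the list box positions no longer match the positions in myWordsArray once entries are skipped.

```javascript
// Build list box lines without gaps, skipping one unwanted word.
function buildListEntries(words, skipWord) {
  var labels = [];       // text lines for the list box, no holes
  var sourceIndex = [];  // labels[n] was built from words[sourceIndex[n]]
  for (var i = 0; i < words.length; i++) {
    if (words[i].word === skipWord) continue;
    labels.push(words[i].word + " (" + words[i].count + ")");
    sourceIndex.push(i);
  }
  return { labels: labels, sourceIndex: sourceIndex };
}

var built = buildListEntries(
  [
    { word: "lorem", count: 4 },
    { word: "and", count: 3 },
    { word: "ipsum", count: 2 }
  ],
  "and"
);
var listText = built.labels.join("\n");
```

When the user clicks line n in the list box, built.sourceIndex[n] gives the matching index in the original array for the highlighting step.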

_________________
Stephen


 Post subject: Re: Word Frequency in text object
PostPosted: September 26th, 2014, 9:43 am 
You have a for loop in the checkWord function. You could add another, similar loop there that reads a separate array of your common words and sets the tmpFound flag to true when it finds a match, so that the new WORDOBJECT(w) is never added for a common word.
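
A rough sketch of that idea in plain JavaScript (not tested OpusScript) is shown below. The names commonWords, isCommonWord and counted are assumptions, and the word list is shortened. Storing the common words as quoted strings also sidesteps the reserved-word worry: "this" and "for" inside quotes are just data and are never parsed as code.

```javascript
// Common words held as string data (shortened list for the example).
var commonWords = ["a", "and", "for", "the", "this", "to"];

// Whole-word comparison, so "the" never matches part of "thesis".
function isCommonWord(w) {
  var lower = w.toLowerCase();
  for (var c = 0; c < commonWords.length; c++) {
    if (commonWords[c] === lower) return true;
  }
  return false;
}

var counted = [];  // stands in for myWordsArray

function checkWord(w) {
  // Empty and common words are treated as already handled and never stored.
  if (w === "" || isCommonWord(w)) return;
  var lower = w.toLowerCase();
  for (var b = 0; b < counted.length; b++) {
    if (counted[b].word === lower) {
      counted[b].count++;
      return;
    }
  }
  counted.push({ word: lower, count: 1 });
}

var words = "This is the thesis for the thesis".split(" ");
for (var i = 0; i < words.length; i++) checkWord(words[i]);
```

Because the common words are never added to the array, nothing has to be removed later and the list box gets no blank entries.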

Mack
