Friday, 25 September 2009

Timing and threads

Jackies carrots
Yesterday was market day and I forgot to take my camera with me. Sorry. But I will include an image from our market that, like Biddy Baxter, I took earlier. This one is from Jackie's famous fruit and veg stall. She's a real quick thinker, so no dithering!

Previous Episode

I said I'd get on to processing inputs from different sources. This is two problems really: one is, there might well be delays in transmission. In this case, it would be better to make regular reports (every 8 secs?) so you don't think your computer has died. You need a timer for this. The other is that your various sources have to be able to add words and counts to the same list without trading on each other and making a mess of the count. This suggests threading to me. Luckily, both timers and threads come from the same library in C#, as they've already forseen you might need both for some tasks. Whatever language you use, you will most likely need a special library for thread, because they go deeper than I want to into the workings of the machine in question. I am NOT a back-end developer (oooh missus!).

Thing is, I had to learn this as I went along, as I haven't even thought about simultaneous processes since Uni (and I'm too old to use that abbreviation, besides it was a Poly then). I turned to the one of my C# books I have only recently started reading again: Pro C# with .NET 3.0 by Andrew Troelsen. I usually refer to the more webby Pro ASP.NET 3.5 in C# 2008 by Matthew McDonald and Mario Szpuszta. Both are Apress books.

So, I now have a slightly different char reader called SlowedDownCharReader. This one uses a private random number to produce slight delays in transmission, which is perhaps a bit more like real life:

      Thread.Sleep(this._random.Next(200));

happens at the beginning of the fetchNextChar method. The random number returned is multiplied by 200 to give some hundredths of a second. This soon mounts up when you only get one char at a time! Note that the word Thread is already coming up. Random delays and timers are all things that affect the current thread of execution. Using these classes requires a using of System.Threading at the head of all relevant files.

Back in the main function, I've written a static setupTimer function. This is where it gets tricky. Since the operation we want to happen every 8 seconds is the WordList method makeReport, we need to create a C# delegate for the Timer to call. This has to be a void function passed a System.Object parameter:

    public delegate void makeReportOp(object state);

Here is the whole setupTimer function:

    static void setupTimer(WordList list, int secsToStart,int intervalSecs)
    {
        int millisecsToStart = secsToStart * 1000;
        int millisecsInterval = intervalSecs * 1000;
        WordList.makeReportOp reporter = new WordList.makeReportOp(list.makeReportCalledBack);
        TimerCallback timeCB = new TimerCallback(reporter);
        Timer everySoOften = new Timer(timeCB, null, millisecsToStart, millisecsInterval);
    }

First of all, the time before the Timer is called and the interval time are passed as seconds and then multiplied by 1000. No one thinks in milliseconds! Now the delegate is created and passed the WordList method to call. I have created a new report method, which sorts the list and writes it to console just as before but writes a short message before and after, so you can see what is going on. The method is locked, so that only one thread can access it (we are envisaging only one wordlist at a time). This message contains the thread number as well, because it is different from the main thread of the program. A TimerCallback delegate is created and passed the reporter delegate. Once that is done, all that is needed is to create a Timer object and pass it the callback, so it knows what to do. It is the Timer which works in its own thread of execution.

This all leads to a strange effect. Every 8 seconds (the interval I chose) a report is made to the console. These get progressively longer as more words get added. At some point, the fetchNextChar will generate its exception, print out a message to the console and stop the main thread. Nothing stops the timer except switching off the console window, so it will carry on printing the same list. I'm told killing threads is a difficult business. But this test case was imagining a continuous process of reporting on different streams coming in, so we won't worry about that.

Next time we'll set up 3-4 threads all putting their words into the same list.






Next Episode

Wednesday, 23 September 2009

Chars, words and sorted lists


Pictures today are autumn things in the garden. First is some pink Sedum, looking much like pink cauliflower (you can get pink varieties). The second is bright orange Pyracanthus berries before the blackbirds scoff them all.

I've recently been working on a little task which is to run on the console. Not a website at all, much simpler. I found myself starting up my Visual Studio and setting up a project for the first time. Even, rather later, gathering my nice new classes into one place and making them into a library. Hot stuff! I should certainly create libraries more often.

The task was to create an instance of a character reader and assign it a chunk of text as a string. This can then be accessed, one char at a time. In a friendly way it lets you know you have got to the end by throwing an EndOfStreamException. The output wanted was a  list of words in the string sorted by number of occurrences and then alphabetically. After that it got more complicated but that will do for the next time.

I got my text from the first 3 paras of a classic short story "The Star" by Arthur C Clarke. Thank you Wikipedia.

I decided to create a class wrapper for a list of words called WordList. Also a WordListBuilder to process each character as it came and wrap things up at the end. This character reader class is called RealTimeCharReader, there will be another type later.
So:

    ICharReader myCharReader = new RealTimeCharReader();
    assignReaderContent(myCharReader);
    WordListBuilder myWordListBuilder = new WordListBuilder(myCharReader, 
         new WordList());

This means I want an instance of ICharReader, which implements IDispose, so has to have a Dispose() method, to make sure it cleans up. It also has to have a method fetchNextChar() which delivers a character, moves the pointer along and throws an exception when it is finished.

assignReaderContent is a static function allowing me to pass in whatever string I feel like in the Main() function.

WordListBuilder is going to have to take each char and either add it to a new word, or finish the word, eat any nonletter characters and start a new word. Each word is then added to the WordList. A Word is another new class containing a string and a counter. If a word turns up more than once, you can just increment the counter. Both the char reader and the wordlist are passed to the WordListBuilder to do the real work. Here is the call:
myWordListBuilder.buildWordList();
myWordListBuilder.BuilderWordList.makeReport();
buildWordList will process the list, makeReport will sort it and print the results to the console.

buildWordList consists of a try/catch structure containing a forever while loop which takes each returned char and decides what to do with it. The catch intercepts the EndOfStreamException and exits the while loop. Any other runtime exception in the reader gets caught by a more general catch. Either way the finally clause means Dispose is always invoked.

    public void buildWordList()
    {
        char nextChar;
        string nextWord = string.Empty;
                
        try{
            while (true){            
                nextChar = BuilderReader.fetchNextChar();
                if(char.IsLetter(nextChar)){
                    nextWord += nextChar;
                }
                else{
                   // treat all puntuation, digits and whitespace the same
                    if(nextWord.Length > 0){
                        completeWord(nextWord);
                        nextWord = string.Empty;                            
                    }
                }
            }
         }
             
         catch(EndOfStreamException ex){
             // add last word - even if it is partial - discuss[EOS]ion point?
            completeWord(nextWord);
            Console.WriteLine(ex.Message);
         }
         catch(Exception ex){
             Console.WriteLine(ex.Message);
         }
         finally{
             BuilderReader.Dispose();
         }
    }

    private void completeWord(string s){
        if(s.Length > 0){
            Word w = new Word(s);
            BuilderWordList.addNewWord(w);
         }
     }

The local variable nextWord is just a string, set to empty string. Since you have to add all words, including the last one - even if it has only partially arrived before transmission ceased, I added a short method completeWord, to be called whenever a non-letter is read or an EndOfStreamException is hit. This calls the WordList method addNewWord if there is anything in the string.


    public void addNewWord(Word w)
    {  
        // check word exists
        bool foundMatch = false;
        foreach (Word listWord in MyList){
             // if yes, then just increment count
             if (w.Content == listWord.Content){
                  foundMatch = true;
                  listWord.incrementCount();
             }
        }
        if (!foundMatch){
             MyList.Add(w);
        }
    }

Again, a simple method (as I prefer it!). It simply iterated through the list of Word objects in the member variable myList. If it finds a match it sets the boolean foundMatch to true and adds one to the count (or I could have used Count++). If it doesn't, then a new word is added to the list.

So now we have a list which contains a max of one entry for each word but it is not in order. I needed to create a class derived from IComparer and pass it to the sort function for the list of words. An IComparer clas needs to implement the function Compare. In this case we have to compare counts and then alphabetical order. It can be done in one pass, which I think makes life a lot easier!

    public class WordComparer : IComparer<word>
    {
        public int Compare(Word x, Word y)
        {
            if (x.Count > y.Count){
                return -1;
            }
            else if (x.Count < y.Count){
                return 1;
            }
            else{
            // they're the same word count, so sort alphabetically
                return string.Compare(x.Content, y.Content);
            }
        }
    }
The only unexpected part of this was that I assigned 1 as the result if x.Count was the greater. This gave me a reverse sorted list! So I swapped the values round. Notice for the alphabetical order I just use string.Compare, no further specification needed.

Nearly there! The sort function in WordList is very short:
private void sortList()
        {
            WordComparer wc = new WordComparer();
            MyList.Sort(wc);
        }
Passing an instance of IComparer enables the list to sort itself.

The last function simply iterates through the list and writes each word to the console, together with its count.

What's this all good for? Well, you might be monitoring someone's Tweets for rude words or buzzwords to display a Tag_cloud. The next time we'll consider monitoring several streams (Tweeters?) at once with possible delays in transmission.

By the way, I even managed to put all my classes in a library and make the small project refer to it. I'm quite proud.

I don't have any idea how to place a zip file here. I doubt if it would be that interesting at this stage but I ought to find out.

Next Episode