Friday 29 April 2011

AWK - a simple tool for parsing files

Intro

Every once in a while, a programmer faces the challenge of collecting or parsing data from a structured file, and needs some tool or language to do it. You could do this in any language, but let's see how it's done with awk.

So let's start explaining awk:


FACTS
  1. Programming language
  2. Simple and Fast
  3. An inspiration for Perl
  4. C-like syntax
  5. Ideal for processing data files (e.g. CSV)

AWK file structure

BEGIN { code1 }
{ code2 }
END { code3 }

So:
  • code1 is run before the actual parsing starts. It can be useful to initialize some variables, for example.
  • code2 is run every time we parse a new line. Treat this as "what to do for each line".
  • code3 is run at the end of the parsing. You can use this to process the data collected in code2 and print it, for example.
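
To see the three blocks in action, here is a minimal sketch run straight from the shell. The three input lines (a, b, c) and the variable name count are made up for illustration; they are not part of any file used in this post.

```shell
# Feed three lines to awk and watch when each block runs.
result=$(printf 'a\nb\nc\n' | awk '
BEGIN { count = 0; print "start" }    # code1: runs once, before any input
      { count++ }                     # code2: runs once for every input line
END   { print "lines seen:", count }  # code3: runs once, after the last line
')
echo "$result"
```

It should print "start" followed by "lines seen: 3".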

Built-In variables

  • $0 -> The current line
  • $N -> The Nth field of the current line
  • NR -> Number of the current line (lines read so far)
  • NF -> Number of fields in the current line
  • FILENAME -> Name of the file being parsed
  • ...
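
A quick way to get a feel for these variables is to print them for every line. This is only a sketch: the two input lines are invented, and $NF (the last field) is not in the list above; it is simply $N with N taken from NF.

```shell
# For each line print: line number, field count, first field, last field.
result=$(printf 'rei 666 20\ndeus 876\n' | awk '{ print NR, NF, $1, $NF }')
echo "$result"
```

The first line has three fields and the second has two, so NF and $NF change from line to line.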

Example 1

Consider this simple file "data.txt":

12
50
12
35
12
12
...

These values could be anything: grades, how long a process took to complete a transaction, the size of the data each process saves to disk, etc.

Imagine you wanted the sum and average of these numbers. You could simply write this in the console:

~$ awk '{ s += $1 } END { print "sum: ", s, " average: ", s/NR }' data.txt

You don't even have to write a source code file to do this: simply write your code directly between apostrophes. Of course, if what you want to do is a little more complex, this becomes troublesome. In that case you can write the code in a file, as we see in the next example.
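
If you want to try the one-liner without creating data.txt by hand, the sketch below builds a temporary file first. It assumes the file holds only the six values listed above (the original file goes on), and it swaps print for printf to control the number of decimals and adds a guard against an empty file; both are my additions.

```shell
# Build a throwaway data file with the six sample values.
data_file=$(mktemp)
printf '%s\n' 12 50 12 35 12 12 > "$data_file"

# Sum the first field of every line; only divide if at least one line was read.
result=$(awk '{ s += $1 } END { if (NR > 0) printf "sum: %d average: %.2f\n", s, s / NR }' "$data_file")
echo "$result"

rm -f "$data_file"
```

With these six values it should print "sum: 133 average: 22.17".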


Example 2

Consider this simple file grades.txt:

name number grade1 grade2
rei 666 20 20
deus 876 17 15
norad 555 5 9
mbp 000 0 0

We have a simple file with each student's name, number and grades.

Let's calculate the average of each grade and save it in a new file:



BEGIN {}
{
    if (NR > 1) { # skip the 1st line (the header); NR starts at 1
        grades[0] += $3;
        grades[1] += $4;
    }
}
END {
    # a space between strings concatenates them
    name = FILENAME "_parsed.txt";
    # the comma makes print put a space between its arguments
    # NR-1 is the number of students (every line except the header)
    print "Average grade 1:", grades[0]/(NR-1) >> name;
    print "Average grade 2:", grades[1]/(NR-1) >> name;
}

So this is the file parse.awk. To run it, simply type in the console:

~$ awk -f parse.awk grades.txt


Conclusion

There it is, two basic examples to get you started with AWK.
You can do many more things like using loops, etc.

The best way to learn more is to have a real problem to solve, so the next time you have to parse a file and do fairly basic stuff with it, give AWK a try.
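
As a small taste of the loops mentioned above, here is a sketch that computes each student's average with a for loop over the grade columns. It assumes the grades.txt layout from Example 2 (grades start at field 3) and feeds the same rows through a pipe so it can be pasted as-is.

```shell
# Per-student average: loop over every grade column of each data line.
result=$(printf 'name number grade1 grade2\nrei 666 20 20\ndeus 876 17 15\nnorad 555 5 9\nmbp 000 0 0\n' | awk '
NR > 1 {                                   # skip the header line
    total = 0
    for (i = 3; i <= NF; i++) total += $i  # fields 3..NF hold the grades
    print $1, total / (NF - 2)             # NF - 2 is the number of grades
}')
echo "$result"
```

Because the loop runs up to NF, the same code keeps working if a grade3 column is added later.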


Friday 4 March 2011

Yet Another Event-driven Post

If you follow us, you have certainly caught our previous post (from Miguel) about Libevent. I must say, Libevent seems really cool, but it is still C. Only a few of us like C, and it almost forces us to use threads (which means more resource consumption and more complexity) for tasks that prove to be simple in higher-level languages. So how can we do all these things more easily?

So how about leaving Libevent to the gurus who develop database and filesystem access driver libraries, and focusing on a powerful single-threaded, non-blocking, event-loop programming style?

Welcome to the wonderful world of Node.js!

At a glance, Node is a server-side JavaScript programming environment (framework style) that handles server requests and responses, be it HTTP or raw TCP, with a seamless event-driven approach: those JavaScriptish callbacks sit on top of a non-blocking I/O event loop. Being JS-based, Node also lets you reuse the language you already write for the browser (although the browser's DOM itself is not part of Node), plus a bunch of other cool stuff that allows you to develop kick-ass web applications using JavaScript, sometimes all over the stack (hello MongoDB!!).

To show you how painless Node can be, here is the implementation of the infamous chat server self-learning example:
// this is how you load the "net" module (that encompasses the TCP utilities);
// we assign it to a variable and use it throughout the code
const net = require('net');

// connected sockets pool
const pool = [];

// create a TCP server instance (using the "net" module)
const server = net.createServer(function(socket) {
    // add client socket to pool
    pool.push(socket);
    // listen on client's socket for incoming data
    socket.on('data', function(content) {
        // prefix the message with the sender's address
        const message = socket.remoteAddress + ' > ' + content;
        // send message to all clients
        for (let i = 0; i < pool.length; i++) {
            pool[i].write(message);
        }
    });
    // remove inactive sockets from pool
    socket.on('end', function() {
        const i = pool.indexOf(socket);
        // indexOf returns -1 when the socket is not found;
        // splice(-1, 1) would then remove the wrong (last) socket
        if (i !== -1) {
            pool.splice(i, 1);
        }
    });
});

// run server on port 8000 (or other)
server.listen(8000);

There are roughly 20 lines of code, and it is actually readable!! Are you serious?!?! Goodbye Erlang and other funky stuff (just kidding here).

I am also just entering this wonderful world, so there is not much more I can say to you about it. So why don't you check the links below for more info?


Feel free to leave more interesting resources in your comments (please do). Either way, we will be tracking Node's evolution closely here, because it's really promising stuff.

Kudos to Ryan Dahl, the man that made it all possible.

Wednesday 2 March 2011

Message Sequence Chart

Sometimes there is a need to create interaction diagrams, similar to UML's sequence diagrams. If you don't want to use UML or to draw the diagrams in some kind of drawing app, you can use a Message Sequence Chart.

Message Sequence Chart (MSC) is an interaction diagram from the SDL family. Its main area of application is describing communication behaviour in real-time systems.

I started using MSC to create some interaction diagrams for my MSc dissertation because it's really simple and straightforward.

This tutorial will guide you through the steps to create an MSC diagram.

So, first you need an MSC generator. I've used mscgen to parse and render my .msc files. You can download it from: http://www.mcternan.me.uk/mscgen/

Now let's create an .msc file. For an example let's start with a simple one. Use your favorite editor and start writing:

msc {
  a,b,c;

  a->b [label="ab()"] ;
  b->c [label="bc(TRUE)"];
  c=>c [label="process(1)"];
  c=>c [label="process(2)"];
  ...;
  c=>c [label="process(n)"];
  c=>c [label="process(END)"];
  a<<=c [label="callback()"];
  --- [label="If more to run", ID="*"];
  a->a [label="next()"];
  a->c [label="ac()"];
  b<-c [label="cb(TRUE)"];
  b->b [label="stalled(...)"];
  a<-b [label="ab() = FALSE"];
}


Now let's render it.

$ ./mscgen -T png -i foo.msc -o bar.png

And it's done! You will obtain this diagram:


You can find more documentation and all the MSC functions and options on the mscgen web page: http://www.mcternan.me.uk/mscgen/

Libevent

Libevent is an asynchronous event notification library. It provides a mechanism to execute a callback function when a specific event occurs on a file descriptor or after a timeout has been reached. It also supports callbacks due to signals or regular timeouts.

The great advantage of Libevent is that it replaces the event loop commonly found in event-driven network servers and applications.

An application just needs to call event_dispatch() and can then add or remove events dynamically without having to change the event loop.

Important information about event notification mechanisms can be found on Dan Kegel's web page: The C10K problem

More information can be found on the official Libevent web page: http://monkey.org/~provos/libevent/

I had to use Libevent in some parts of my MSc work and open-source project. So here goes a simple starter example written in C to handle a SIGINT signal:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <event.h>

/* callback run by the event loop each time SIGINT is delivered */
void signalhandler(int fd, short event, void *arg) {

    struct event *ev = arg; /* the event that fired (not used here) */
    (void)fd; (void)event; (void)ev;
    printf("SIGINT triggered!\n");
}

int main (int argc, char *argv[]) {

    struct event ev;

    /* the event API needs to be initialized before it can be used */
    event_init();

    /* set the event SIGINT with event type EV_SIGNAL for the hook signalhandler;
       EV_PERSIST keeps it registered after it fires, so Ctrl+C no longer
       ends the program (stop it with another signal, e.g. via kill) */
    event_set(&ev, SIGINT, EV_SIGNAL | EV_PERSIST, signalhandler, &ev);

    /* add the event */
    event_add(&ev, NULL);

    /* process the event */
    event_dispatch();

    return 0;
}


Later I will post more examples and further explanation.

SQL vs NOSQL - friends or foes

Here goes a light introduction to the recent discussion revolving around NOSQL and relational databases.

Since this is a hot topic, you can learn the basics, see some examples and learn about the pros and cons of each of these two distinct worlds.

It is a presentation that Pedro Gomes and I gave at Braga Geek Nights in February (http://www.coactivate.org/projects/braga-geek-nights)

First Post

This blog started today because a friend asked "what is /dev/null ?".

So, why start a blog after that question? First of all, there is so much "garbage" information on the Web nowadays. Information that should be redirected to the null device :-P
Secondly, I've started talking with Pedro Gomes about the idea of creating a blog with useful geek information: useful hacks, informational stuff, programming stuff, etc.

Besides sharing useful information with the world, it is a good repository!

Oh, btw, /dev/null is a virtual file on Unix-like operating systems. This special file discards all the information written to it, while still reporting that the write succeeded.
It can be used to redirect the unwanted output of a program to it. It's called the black hole in computer jargon.
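
Here is a minimal sketch of that behaviour from the shell. The variable names write_status and bytes_read are just illustrative, and the second half (reading from /dev/null gives end-of-file right away) goes slightly beyond what is described above.

```shell
# Standard output goes to the null device; nothing reaches the terminal.
echo "this text is discarded" > /dev/null
write_status=$?   # 0 means the write was reported as successful

# Reading from /dev/null hits end-of-file immediately, so zero bytes are read.
bytes_read=$(wc -c < /dev/null | tr -d ' ')

echo "write status: $write_status, bytes read: $bytes_read"
```

The same redirection works for error output too, for example by adding 2> /dev/null after a noisy command.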

Psst, if you log in to a machine and you don't want your commands to be logged, just redirect .bash_history to the black hole ;-)