BK-Trees
0A BK-Tree is a really cool data structure for building a “dictionary” of similar words. It can be used to guess that you meant “cat” when you wrote “cta”. It works by building a tree with words from a dictionary by using the first word as a root node and then attaching subsequent words with a branch of length d(root_word, new_word) where d is a function for finding the “distance” between two words. This is usually the Levenshtein distance, i.e. the minimum number of edits needed to transform one string into the other.
If the branch is “taken”, i.e. there is already another word connected along a branch of length d(root_word, new_word), the insert operation is done on this word instead. That is the whole algorithm for constructing the BK-tree.
So if we have the following list of words: ["cat", "cut", "hat", "man", "hit"], we will start by creating a “cat” node with no children. To add “cut” we calculate the Levensteinh distance to be one and insert it under the “cat” node.

2) The Levensteinh distance between “cat” and “man” is two, so “man” is connected with a branch of length two

3) d(“hat”,“cat”) = 1, so the insertion operation is done on the “cut” node, and “hat” is connected to “cut” with a branch of length two (d(“cut”,“hat”)=2)

4) d(“cat”,“man”) = 2, so the insertion operation is done on the “man” node, and “hit” is connected to “man” with a branch of length three (d(“man”,“hit”)=3)
The lookup algorithm is also quite simple: We find the distance from the query word to the root node and then find all the child nodes in the range [(d-n),(d+n)] where d is the distance and n is the maximum distance we will allow. We then recursively query the nodes that matched. The result is a list of words satisfying d(query_word, word) ≤ n.
The following code is a Haskell implementation of BK-Trees.
It even works! Here are a few examples where I used a dictionary of ~5800 words:
λ> query 1 "thie" bk_tree
["tie","the","this","thief"]
λ> query 1 "catt" bk_tree
["cat","cart"]
Making Vim go to the Correct Line/Column When Given a File Name From GCC
3Edit: It turns out – not surprisingly – that there is already a Vim script for this: http://www.vim.org/scripts/script.php?script_id=2184. Thanks to chmduquesne for pointint that out.
So, you have a compile error:
foo/bar/test.h:45:29: fatal error: include/framewrok.h: No such file or directory
At this point you probably want to open foo/bar/test.h to have a look at line 45 to correct the error. If the file path is long, you double click on the file path to copy it and then type “vim ” and paste the file path, so you get the following:
vim foo/bar/test.h:45:29:
The problem is that vim will try to open the file test.h:45:29: which obviously does not exist, so you backspace until you’re left with the file name and then go to line 45 in vim (or add a +45 to the command).
Google Summer of Code 2011: A summary
0Google summer of a code is an annual programme where Google pays students to work on open source projects in the summer months. The students have one or more mentors from the project who help them get started and function as a contact point for the developer community.
In 2008 I did a Google Summer of Code project for the Kate text editor where I wrote a Vi Input Mode – making it possible to use Kate in a modal, Vi[m]-like manner. The project was successful and I continued to maintain this code and added more features over the years. This summer I took the step of signing up as a mentor myself and put out a project proposal for further improving the Vi Mode in Kate and adding a good unit test framework for the code. To my surprise – and delight! – many students were interested in the project and we ended up getting quite a few applications from students wanting to work on “my” project. In the end the Russian computer science student Святослав Кузьмич (Svyatoslav Kuzmich) from Moscow Institute of Physics and Technology was chosen.

The Kate Editor
Over the summer he wrote a test framework for the Vi Mode and wrote test for the functionality already present. Having an easy way of adding tests for the Vi mode had been on my wish list for a long time and it was really nice to see it implemented. It did not take long for the new test system to show its value: Svyatoslav found some corner case bugs that had probably been in the code for a long time and fixed them. A short overview of the new test framework can be found at the Kate blog: Kate Vi Mode Test Suite.
Svyatoslav also made many more visible improvements to the Vi Mode such as jump lists (making it possible to jump back/forward to where you were in the text), making it possible to control sub windows with the keyboard, many new command line mode commands and much more. An overview can be found in his blog post at kate-editor.org.
For me personally it was an interesting experience to act as a mentor for another programmer. My function also changed as the project progressed. In the beginning I tried to give Svyatoslav an overview of the current code and answer some questions as they came up. The programming work itself was rarely a theme – Svyatoslav was already a really good coder and generally just needed to be pointed in the right direction or – even more commonly – just get a confirming nod that his suggested solution sounded good. Since I had a few years to think about many of the features he wanted to implement I also had a few “that does sound like a good idea, but…” often learned the hard way.
In the end the project turned out very well and the users will get many new features – some of which have been wished for for years.
As a wrap-up of Google Summer of Code 2011 I am going to San Francisco next week to take part in the GSoC 2011 mentor summit where mentors from the organizations taking part in the Summer of Code meet to exchange experiences and discuss future directions for the programme. I am really looking forward to that!
Simple In-line Code Highlighting with Vim
0I use the excellent gitit wiki for keeping notes. gitit uses markdown as its default markup format with some extensions, such as the possibility to have inline code snippets highlighted. These snippets start with “~~~{ .someLanguage }” and end with “~~~”, e.g:
~~~{ .haskell }
quicksort [] = []
quicksort (x:xs) = quicksort (filter (<x) xs) ++ [x] ++ quicksort (filter (>=x) xs)
~~~
Since I use the Firefox add-on Pentadactyl (a fork of Vimperator), I can use Vim to edit my wiki articles. I really wished that I could get the correct highlighting for the in-line code snippets in Markdown and I found Vim tip #857. I used this to create the following filetype plugin for markdown files that will find code snippets for the languages I use and make vim colourize those snippets with the correct syntax highlighting.
Vim tip: Highlight Function Names in C Code
0When looking at function signatures such as
EXPORT _foo_bar_handle_restricted _foo_common_alloc_restricted( … )
it’s often hard to quickly see what the function name is among all the spaces and underscores. To help with this, many editors support highlighting function names. Vim can also do this, e.g. by using a tags file, but for large code bases this can be quite slow. The following syntax definitions use a simple heuristic to highlight function names in C code by making their names boldface. In my experience it works quite well.
To use this, put it in ~/.vim/after/syntax/c/highlight_functions.vim. (All files in ~/.vim/after/syntax/[filetype]/ will be sourced when vim opens a file of type [filetype].)
C Trigraph Trap
0What will the above C code print?
Let’s take a look:
$ gcc –std=c99 trigraph.c -o test
$ ./test
x = 1
Wait – x was just set to be zero! How can it still be one‽
We get a hint if we don’t specify the language standard:
$ gcc trigraph.c -o test
trigraph.c:7:29: warning: trigraph ??/ ignored, use -trigraphs to enable
$ ./test
x = 0
So it looks like trigraphs have something to do with this weird behaviour. A trigraph is a way to write sequences of characters that represent other characters – usually special characters that were not always on all keyboards. If we take a look at the Wikipedia article under “C”, we see the following translation rule:
??/ → \
… which means that the comment line ends with ‘\’ and thus continues to the next line, commenting out x = 0.
Source: comment by SteveB in this discussion thread.
Indonesian pronouns: Saya, Anda, Dia, Kita, Kami, Kalian and Mereka
0While reading about Indonesian pronouns I made the following illustration of common pronouns from the perspective of a person with some friends addressing another person with a group of friends.
Usage notes: In colloquial Indonesian “kami” and “kita” are used interchangeably and “kami” is considered more formal than “kita”.
Photo blogging before “smart phones”
0Rummaging through an old Unix account I found the following Perl script, simply named “test.pl”. What could it be? It hasn’t been touched for over five years:
-rw-r--r-- 1 hamberg fidi_s 870 2005-09-17 04:50 test.pl
Reading the code it becomes apparent that it processes text and extracts base 64 encoded data…
… which made me remember that I used this as an email filter to extract images from MMS messages sent to myself from my mobile phone. Whenever I sent a formatted email to myself it would extract the image and text and make a blog post. I remember that was quite cool back then.
These posts even survived the transition to using WordPress for my blog: http://hamberg.no/erlend/category/mms-bilder/.
Hvorfor lesebrett?
2Min gode venninne Kanina spør på bloggen sin om bokormer er utrydningstruede og – nesten forundret – hvorfor jeg foretrekker å lesebøker på lesebrett. Og det er et godt spørsmål; jeg lurte selv på hvordan jeg ville oppleve å lese på et lesebrett fremfor en papirbok da jeg kjøpte en Kindle for halvannet år siden. For å ta det viktigste først – ja, jeg foretrekker lesebrettet fremfor å lese bok, av mange grunner. De første jeg kommer på er
- Lesebrettet mitt veier under 300 g. og inneholder (jeg sjekket akkurat) 61 bøker. Jeg har det derfor alltid i veska og kan lese i småpauser uavhengig om jeg leser en kort novelle eller en murstein for tiden.
- Jeg leser veldig mye på engelsk og den innebygde ordboken gjør at jeg kan markere ordet «chthonic» og få opp en definisjon nederst på siden og dermed lære hva det betyr med én gang.
- Jeg kan lese på sengen uten å måtte tilpasse liggestilling til boken.
- Bøker er ofte billige. De fleste nye bøkene jeg kjøper koster rundt $11 som i dag tilsvarer rundt 60 kroner. Gamle bøker finnes ofte i gratisversjoner fra f.eks. Gutenberg-prosjektet.
- Siden lesebrettet mitt har gratis mobilnett kan jeg kjøpe bøker samme hvor jeg er. Det var dermed ikke krise i så mange sekunder da jeg gikk tom for bøker på en båt til Lombok i Indonesia i sommer.
- En sparer miljøet. I følge Framtiden i våre hender er miljøgevinsten positiv allerede etter 22 bøker.

photo credit: Premshree Pillai
Jeg leser for tiden boken Gödel, Escher, Bach av Douglas Hofstadter. Denne kjempen av en bok er ikke tilgjengelig i e-bokutgave så det blir til at jeg bare leser den hjemme og jeg savner ofte ordbokoppslag.
Når dét er sagt må jeg også si at lesebrett ikke egner seg til alle bøker. Jeg leser for det meste romaner eller populærvitenskap. Disse bøkene egner seg veldig bra til lesing på brett, men jeg ville ikke ha lest pensumbøker på lesebrettet mitt – til dét er skjermen for liten og en mister oversikt. Figurer og matematiske symboler fungerer også dårligere på dagens lesebrett. Det er også betenkeligheter med kontrollen Amazon har over bøker du kjøper om du velger å handle direkte fra dem.
Alt i alt er jeg dog (kanskje ikke så overraskende) stor tilhenger av lesebrett – særlig med tanke på at de brettene som finnes i dag fortsatt er veldig ny teknologi. Her vil det nok skje en rask utvikling de kommende årene. Har jeg sluttet å kjøpe papirbøker? Nei. I hvert fall ikke ennå. Til dét er ikke teknologien helt der ennå, og jeg er også veldig glad i å se i bokhyllene mine. Det vil nok fortsatt friste å kjøpe veldig gode bøker selv om jeg har lest dem på brettet mitt, bare for å ha de i hyllen min.
Nei, nå tar jeg med vesken min med lesebrettet mitt og går på skolen. Papirboken min må nok – mest på grunn av dens størrelse – bli hjemme i dag, så ev. lesing i lunsjen blir på lesebrett.
P.S: Jeg kom på at jeg har helt glemt å nevne batteritid. Det er en grunn til dét: en tenker aldri på det. Kindle-en min har en batteritid som måles i uker og den får stort sett nok strøm bare ved at jeg kobler den til pc-en og overfører filer nå og da.
Visual block mode for Kate’s Vi Mode
16One of the most oft-missed features of Kate’s Vi input mode is Vim’s visual block mode. Visual block mode is entered by pressing ctrl+v and allow a rectangular block of text to be selected and manipulated. Also, text can be prepended or appended to the block, which is useful for, e.g. commenting out a range of lines.
Well, good news, everyone! There is now experimental support for visual block mode in Kate’s Vi input mode. Most text manipulation commands should already support visual block mode, and prepending/appending text (shift+i and shift+a, respectively) works, and you can select to end-of-line with $, as in Vim. I also made it possible to re-select the last visual selection with gv, and the marks ‘<’ and ‘>’ are set to the start and end position of the last visual selection. (Further down the road it will hopefully be possible to use marks in ex commands, too.)
There are probably still some rough edges, but visual block mode should already be usable. If you want to help test it, build Kate from git and try it out.


