Java-based Incremental Dialog Framework

Note: Jindigo has not been developed further for some time, because Gabriel Skantze is now working on another dialogue system framework called IrisTK. IrisTK is designed to be much easier to use and has much better support for multi-modal face-to-face interaction.


An incremental dialog system processes its input and output word-by-word instead of utterance-by-utterance. This allows, among other things, more natural and rapid turn-taking.

Jindigo is a framework for developing and experimenting with incremental spoken dialog systems, developed at the Department of Speech, Music and Hearing, KTH. Some features:

  • Java-based, platform independent
  • Open source
  • Modular and extensible. Comes with a set of built-in modules (all pure Java) for building a complete spoken dialog system:
    • Speech recognizer (CMU Sphinx 4)
    • Parser & Semantic interpreter
    • Context modeler
    • Speech synthesizer (MaryTTS)
    • Inspector for monitoring and understanding how information flows between modules.
  • Incremental!
    • All modules and data types are built on a general model of incremental processing.
    • Revision is fully supported in all modules, e.g. the speech recognizer might output a hypothesis and then later revoke it.
    • Of course, non-incremental systems can be implemented as well.
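
The idea of revision can be illustrated with a minimal sketch. It is not Jindigo's actual API: the class and method names (WordBuffer, update, snapshot) are hypothetical and chosen only for this example. The buffer holds the recognizer's current word hypotheses. When a new hypothesis arrives, only the words after the common prefix are revoked and replaced, so a downstream module would only need to redo work for the part that changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical buffer of word hypotheses; not part of Jindigo. */
class WordBuffer {
    private final List<String> words = new ArrayList<>();

    /**
     * Replaces the current hypothesis with a new one, keeping the common
     * prefix untouched. Returns the number of words that were revoked.
     */
    int update(List<String> newHypothesis) {
        // Find how many leading words the old and new hypotheses share.
        int common = 0;
        while (common < words.size()
                && common < newHypothesis.size()
                && words.get(common).equals(newHypothesis.get(common))) {
            common++;
        }
        // Revoke everything after the shared prefix, right to left.
        int revoked = words.size() - common;
        while (words.size() > common) {
            words.remove(words.size() - 1);
        }
        // Add the new words that follow the shared prefix.
        words.addAll(newHypothesis.subList(common, newHypothesis.size()));
        return revoked;
    }

    /** Returns a copy of the current hypothesis. */
    List<String> snapshot() {
        return new ArrayList<>(words);
    }
}

class IncrementalDemo {
    public static void main(String[] args) {
        WordBuffer buffer = new WordBuffer();
        // First hypothesis from the (imagined) recognizer.
        buffer.update(Arrays.asList("move", "the", "night"));
        // Later hypothesis: "night" is revoked, "knight" and "to" are added.
        int revoked = buffer.update(Arrays.asList("move", "the", "knight", "to"));
        System.out.println(revoked + " revoked, now: " + buffer.snapshot());
    }
}
```

A non-incremental system corresponds to calling update once per utterance with the final hypothesis, in which case nothing is ever revoked.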

Video Example

This is an example video of a Jindigo application running:


You can test the Chess example application (shown above) yourself with Java Web Start:

  • Best: If you have an English Windows OS (Vista or 7, with the latest .NET version), you can try using the native speech recognizer. [RUN Windows version]
  • Otherwise: You can run the system with the CMU Sphinx recognizer. This will be slower to start and the performance may vary a lot depending on your microphone and accent. [RUN Sphinx version]


Jindigo is still at a very early stage of development, but you can download a package with the latest binaries and source code from Sourceforge.

Note: Although Jindigo should in principle be platform independent, it has so far been tested mostly on Windows.

The distribution comes with the speech-controlled chess board shown above. To run it, you need a running MaryTTS version 4.1 server (by default on the same machine, but that can easily be configured).
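
Before starting the application, it can help to confirm that something is listening where the MaryTTS server is expected. The following is a minimal sketch, not part of Jindigo: the class name ServerCheck is hypothetical, and the fallback port 59125 is an assumption based on the usual MaryTTS server default, so check your own server configuration. It only tests that a TCP connection can be opened; it does not verify that the listener is actually MaryTTS.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper; not part of Jindigo or MaryTTS. */
class ServerCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port can be given as arguments; the defaults are assumptions.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 59125;
        boolean up = isListening(host, port, 2000);
        System.out.println(host + ":" + port + (up ? " is accepting connections" : " is not reachable"));
    }
}
```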


Documentation has yet to be written. In the meantime, you can check out the (as yet poorly annotated) Javadoc.


Source code can be checked out from the SVN repository at Sourceforge.


The Jindigo project is maintained by Gabriel Skantze. Email: gabriel@speech.kth.se


Schlangen, D., & Skantze, G. (2011). A General, Abstract Model of Incremental Dialogue Processing. Dialogue & Discourse, 2(1), 83-111. [pdf]

Schlangen, D., Baumann, T., Buschmeier, H., Buss, O., Kopp, S., Skantze, G., & Yaghoubzadeh, R. (2010). Middleware for Incremental Processing in Conversational Agents. In Proceedings of SIGdial. Tokyo, Japan. [pdf]

Skantze, G. (2010). Jindigo: a Java-based Framework for Incremental Dialogue Systems. Technical Report, KTH, Stockholm, Sweden. [pdf]

Skantze, G., & Hjalmarsson, A. (2010). Towards Incremental Speech Generation in Dialogue Systems. In Proceedings of SIGdial (pp. 1-8). Tokyo, Japan. (*) [pdf]
(*) Best Paper Award at SIGdial 2010
Abstract: We present a first step towards a model of speech generation for incremental dialogue systems. The model allows a dialogue system to incrementally interpret spoken input, while simultaneously planning, realising and self-monitoring the system response. The model has been implemented in a general dialogue system framework. Using this framework, we have implemented a specific application and tested it in a Wizard-of-Oz setting, comparing it with a non-incremental version of the same system. The results show that the incremental version, while producing longer utterances, has a shorter response time and is perceived as more efficient by the users.

Schlangen, D., & Skantze, G. (2009). A general, abstract model of incremental dialogue processing. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-09). Athens, Greece. [pdf]
Abstract: We present a general model and conceptual framework for specifying architectures for incremental processing in dialogue systems, in particular with respect to the topology of the network of modules that make up the system, the way information flows through this network, how information increments are 'packaged', and how these increments are processed by the modules. This model enables the precise specification of incremental systems and hence facilitates detailed comparisons between systems, as well as giving guidance on designing new systems.