1 Introduction

The k9 programming language is designed primarily for the analysis of data. It may surprise new users in two ways: (1) fast data analysis and (2) concise syntax. With some familiarity both will seem normal, and going back to slow, verbose programming will be surprisingly difficult.

1.1 Going fast

Imagine you have a small on-disk database of 100 million rows containing a time series with two float values at each time, split across three tables covering different measurements. Here’s how fast k9 can read the data from disk and compute a statistic, the average difference in each table, which uses each and every row.

 bash-3.2$ 2020.xx.xx 17GB 4core (c) shakti
 \t q:2:`q;{select s:avg a-b from x}'q[!q]
495

That’s 495 ms to read the data in from disk and compute over all 100 million rows. Reading the data is the biggest part. If the data is already in memory then it’s even faster.

 \t {select s:avg a-b from x}'q[!q]
230

230 ms, not bad for 100 million calculations.

The code to generate the on-disk database is presented below. Speed of course will depend on your hardware so times will vary.

nf:d+*|d:(|-d),d:683 954 997 1000;
T:^`t ?[;_8.64e7]@
B:100++\1e-2*-3+nf bin/:?[;*|nf]@
S:?[;1e-2*2,2,8#1]@
L:{select t,b,a:b+s from +`t`b`s!(T;B;S)@'x}
q:`eurusd`usdjpy`usdchf!L'60e6 20e6 20e6
`q 2:q

1.2 Going concise

The k9 language is more closely related to mathematical notation than most programming languages. It requires the developer to learn to “speak” k9, but once that happens most find they can express themselves more quickly and more accurately in k9 than in other languages. At this point an example might help.

In mathematics, “3+2” is read as “3 plus 2” because you learn at an early age that “+” is the “plus” sign. For trivial operations like arithmetic, most programming languages use symbols too. For anything less math-like, most programming languages switch to words, while k9 stays with symbols, which turn out to add clarity. For example, to determine the distinct values of a list most programming languages might use a syntax like distinct(), while k9 uses ?. This requires the developer to learn how to say a number of symbols, but once that happens the result is much shorter code that is quicker to write, harder to get wrong, and easier to maintain. Part of the reason it’s clearer is that when you write out distinct you might write disinct instead.

In math which do you find easier to answer?

Math with text
Three plus two times open parenthesis six plus fourteen close parenthesis

Math with symbols
3+2*(6+14)

In code which do you find easier to understand?

Code with text
x = (0,12,3,4,1,17,-5,0,3,11);y=5;
distinct_x = distinct(x);
gt_distinct_x = [i for i in distinct_x if i > y];

Code with symbols
x:(0,12,3,4,1,17,-5,0,3,11);y:5;
z@&y<z:?x
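The k9 line is read right to left: ?x takes the distinct values of x, z: names that result, y<z compares y with every item, & returns the positions where the comparison is true, and z@ indexes z at those positions. The Python below is only a sketch of those steps for readers who want to check the result. It is not k9, and it assumes that ? keeps values in first-seen order.

```python
x = [0, 12, 3, 4, 1, 17, -5, 0, 3, 11]
y = 5

# ?x : distinct values (assumed to keep first-seen order)
z = list(dict.fromkeys(x))

# y<z : compare y with every item of z
mask = [y < v for v in z]

# & : positions where the comparison is true
idx = [i for i, keep in enumerate(mask) if keep]

# z@ : index z at those positions
result = [z[i] for i in idx]

print(result)  # [12, 17, 11]
```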

If you’re new to k9 then you likely appreciate that symbols are shorter but think they look like line noise. That’s true, but so did arithmetic until you learned the basics.

When you first learned arithmetic you likely didn’t have a choice. Now you have a choice about learning k9. If you give it a try, then I expect you’ll get it quickly and move on to the power phase fast enough that you’ll be happy you gave it a chance.

1.3 Get k9.

https://shakti.sh

You will find the Linux version in the linux directory and the MacOS version under macos. Once you download the MacOS version you’ll have to change its file permissions to allow it to execute.

 chmod u+x k

On the Mac, if you then attempt to run this file you likely won’t succeed because of MacOS security. You’ll need to go to “System Preferences...” and then “Security and Privacy” and choose to allow this binary to run. (The binary only appears there after you have tried and failed to run it.)

1.4 Help/Info Card

Typing \ in the terminal gives you a concise overview of the language. This document aims to provide details to beginning users where the help screen is a bit too terse. Some commands are not yet complete and are thus marked with an asterisk, e.g. *update A by B from T where C.

Universal database, webserver and language -- full stack -- fast

Database
 select A by B from T where C; update A from T
 count first last sum min max *[avg var dev med ..]
 x,y      / insert, upsert, union, equi+asof leftjoin
 x+y      / equi+asof outerjoin (e.g. combine markets through time)
 x#y      / take/intersect
 x_y      / drop/difference

Language
Verb                      Adverb               Type              System
:    x         y          f/  over  c/  +join  bool 011b         \l a.k
+    flip      plus       f\  scan  c\  +split int  0N 0 2 3     \t:n x
-    negate    minus      f'  each  v'  +has   flt  0n 0 2 3.    \u:n x
*    first     times      f': eachp v': +bin   char " ab"        \v
%              divide     f/: eachr ([n;]f)/:  name ``ab         \w
&    where     min/and    f\: eachl ([n;]f)\:  uuid              \cd x
|    reverse   max/or
<    asc       less              .z.[md]       date 2001.01.01
>    dsc       more              .z.[hrstuv]   time 12:34:56.123456789
=    group     equal      I/O
~    not       match      0:  r/w line         Class             \f
!    enum      key        1:  r/w byte         List (2;3.4;`a)   \ft x
,    enlist    cat       *2:  r/w data         Dict `i`f!(2;3.4) \fl x
^    sort   [f]cut       *3:  k-ipc set        Func {[a;b]a+b}   \fc x
#    count  [f]take      *4:  https get        Expr :a+b
_    floor  [f]drop
$    string    cast+                           Table
?    unique+   find+      $[b;x;y] if else     t:[[]i:2 3;f:3 4.;s:`a`b]
@    type   [f]at         @[x;i;f[;y]] amend   utable [[b:..]a:..]
.    value  [f]dot        .[x;i;f[;y]] dmend   xtable `..![[]a:..]
sqrt sqr exp log sin cos div mod bar .. freq rank msum .. in bin within ..
\\   exit      / comment  *[if do while select[a;b;t;c]]

Interface
`csv?`csv t    / read/write csv
`json?`json t  / read/write json
-python:   import k;k.k('+',2,3)
-nodejs: require('k').k('+',2,3)
*ffi: "./a.so"5:`f!"i"  /I f(I i){return 2+i;}       //cblas ..
*c/k: "./b.so"5:`f!1    /K f(K x){return ki(2+xi);}  //feeds ..

*enterprise[unlimited data/users/machines/..] +overload

1.5 rlwrap

Although you only need the k binary to run k9, most users will also install rlwrap, if it is not already installed, in order to get command history in a terminal window. rlwrap is “Readline wrapper: adds readline support to tools that lack it” and allows one to arrow up to go through the command history.

To start k9 you should run either k or rlwrap k. Here I will show both options, but you should run whichever you prefer. In this document lines with input will be shown with a leading space and output will be shown without. In the examples below the user starts a terminal window in the directory with the k binary file. Then the user enters rlwrap ./k RET. k9 starts, displays the date of the build, (c), and shakti, and then listens for user input. In this example I have entered the command to exit k9, \\. Then I start k9 again without rlwrap and again exit the session.

 rlwrap ./k
Sep 13 2020 16GB (c) shakti
 \\

 ./k
Sep 13 2020 16GB (c) shakti
 \\

1.6 Simple example

Here I will start up k9, perform some trivial calculations, and then close the session. After this example it will be assumed that the user has a k9 session running and is working in REPL mode. Comments (/) will be added to the end of lines as needed.

 rlwrap ./k
Sep 13 2020 16GB (c) shakti
 n:10000                     / n data points
 s:`a`b`c                    / data for symbols a, b, and c
 q:+s!(-1+n?3;-1+n?3;-1+n?3) / table of returns (-1,0,1) for each symbol
 q                           / print out the table
a  b  c 
-- -- --
 0  1  1
-1 -1  0
-1  1  1
 0  1 -1
-1 -1 -1
..

At this point you might want to check which symbol has the highest return, most variance, or any other analysis on the data.

 #'=+(+q)[]                  / count each unique a/b/c combination
a  b  c |   
-- -- --|---
 0  1  1|407
-1 -1 -1|379
-1  0  0|367
 0 -1 -1|391
 1  1  1|349
..
 +-1#+\q                     / calculate the return of each symbol
a|-68
b|117
c|73
 {(+/m*m:x-avg x)%#x}'+q  / calculate the variance of each symbol
a|0.6601538
b|0.6629631
c|0.6708467

Now let’s exit the session.

 \\
bash-3.2$ 
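The variance line in the session above, {(+/m*m:x-avg x)%#x}, subtracts the mean from each value, squares the differences, sums them, and divides by the count. That is the population variance. As a cross-check, here is a minimal Python sketch of the same formula. The function name variance is my own choice for illustration and is not part of k9.

```python
def variance(xs):
    """Population variance: mean of squared deviations from the mean."""
    mean = sum(xs) / len(xs)                  # avg x
    deviations = [v - mean for v in xs]       # m:x-avg x
    return sum(d * d for d in deviations) / len(xs)  # (+/m*m)%#x

print(variance([1, 1, 1]))  # 0.0
```

The session data is random, so the numbers printed there will not be reproduced by this sketch.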

1.7 Document formatting for code examples

This document uses a number of examples to familiarize the reader with k9. The convention is that input has a leading space and output does not. This follows the terminal, where the REPL indents input with a space but prints output without one.

 3+2 / this is input
5    / this is output

1.8 k9 nuances

One will need to understand some basic rules of k9 in order to progress. These will likely seem strange at first but the faster you learn a few nuances the faster you’ll move forward.

1.8.1 The language changes often (for now).

There may be examples in this document which work on the version indicated but do not work with the version currently available to download. If so, then feel free to drop the author a note. Items which currently error but are likely to come back ’soon’ will be left in the document.

1.8.2 Colon (:) is used to set a variable to a value

a:3 is used to set the variable, a, to the value, 3. a=3 is an equality test to determine if a is equal to 3.

1.8.3 Percent (%) is used to divide numbers

Yes, 2 divided by 5 is written as 2%5 and not 2/5. As the Help/Info card shows, / is already used for comments and for the over adverb.

1.8.4 Evaluation is done right to left

2+5*3 is 17 and 2*5+3 is 16. In 2+5*3 the rightmost portion, 5*3, is evaluated first, and once that is computed evaluation proceeds with 2+15. 2*5+3 goes to 2*8, which becomes 16.

1.8.5 There is no arithmetic order

In general, + does not happen before or after *. Evaluation is done right to left unless parentheses are used. (2+5)*3 is 21 because the 2+5 in parentheses is done before being multiplied by 3.
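To make the two rules above concrete, here is a small Python sketch of an evaluator that works right to left with no operator precedence. It is only an illustration of the rule, not how k9 is implemented. The name eval_rtl is hypothetical, and the sketch handles only non-negative numbers, parentheses, and the two-argument forms of + - * and %.

```python
import re

# Two-argument operators only; % is division, as in k9.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "%": lambda a, b: a / b,
}

def eval_rtl(expr):
    """Evaluate expr right to left with no operator precedence."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*%()]", expr)
    pos = 0

    def operand():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()
            pos += 1  # skip the closing ")"
            return value
        return float(tok) if "." in tok else int(tok)

    def expression():
        nonlocal pos
        left = operand()
        if pos < len(tokens) and tokens[pos] in OPS:
            op = tokens[pos]
            pos += 1
            # Everything to the right is computed before op is applied.
            right = expression()
            return OPS[op](left, right)
        return left

    return expression()

print(eval_rtl("2+5*3"))    # 17
print(eval_rtl("2*5+3"))    # 16
print(eval_rtl("(2+5)*3"))  # 21
```

One consequence worth noticing is that 10-2-3 evaluates as 10-(2-3), which is 11, rather than the 5 you would get in most languages.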

1.8.6 Operators are overloaded depending on the number of arguments.

 *(3;6;9)    / single argument: * is first
3
 2*(3;6;9)   / two arguments: * is multiplication
6 12 18
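If overloading on the number of arguments is unfamiliar, the Python sketch below mimics the two uses of * shown above with a single function that checks how many arguments it received. The name star is hypothetical, and this is an analogy only. It covers just the two cases in the example.

```python
def star(*args):
    """One argument: first item. Two arguments: multiply."""
    if len(args) == 1:
        (x,) = args
        return x[0]                # like *(3;6;9)
    x, y = args
    return [x * v for v in y]      # like 2*(3;6;9)

print(star([3, 6, 9]))     # 3
print(star(2, [3, 6, 9]))  # [6, 12, 18]
```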

1.8.7 Lists and functions are very similar.

k9 syntax encourages you to treat lists and functions in a similar fashion. They should both be thought of as a mapping from a value to another value, or from a domain to a range.

If this book weren’t a simple guide then lists (l) and functions (f) would be replaced by maps (m), given the interchangeability. One way to determine whether a map is a list or a function is via the type function. Lists and functions do not have the same type.

 l:3 4 7 12
 f:{3+x*x}
 l@2
7
 f@2
7
 @l
`I
 @f
`.
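The same idea can be sketched in Python with one helper that applies a map to an argument whether the map is a list or a function. The helper name at is my own, chosen to echo the @ verb. This is an analogy and not k9 itself.

```python
def at(m, i):
    """Apply map m to i: call it if it is a function, index it if it is a list."""
    return m(i) if callable(m) else m[i]

l = [3, 4, 7, 12]
f = lambda x: 3 + x * x

print(at(l, 2))  # 7
print(at(f, 2))  # 7
```

The caller of at does not need to know which kind of map it was given, which is the interchangeability the k9 example shows with l@2 and f@2.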

1.8.8 k9 is expressed in terms of grammar.

k9 uses an analogy with grammar to describe language syntax. The k9 grammar consists of nouns (data), verbs (functions) and adverbs (function modifiers).

In k9, as the Help/Info card shows, data are nouns, functions/lists are verbs, and modifiers are adverbs.

