

19 Examples

This chapter presents an example from finance, as this is one of the primary application domains for k. For those not familiar with this field, here is a short introduction.

19.1 A Tiny Introduction to Financial Market Data

Finance has a large amount of data associated with it. In this section, financial data is limited to price and transaction information: the prices at which participants are willing to buy and sell (called quotes) and the completed transactions (called trades). This data includes the date, financial instrument symbol, time, and exchange / venue. Additionally, quotes have a bid and an ask price (the prices at which a dealer is willing to buy and to sell, respectively), and trades have the price and size of the reported transaction.

Although prices are usually shown as decimal fractions, e.g., eurusd might be at 1.1904, trades are dealt in whole multiples of a minimum price increment (such as a cent, or 0.0001 in this chapter's eurusd data), so prices can be represented as integers: 1.1904 becomes 11904.
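A minimal Python sketch of this integer representation, assuming a tick size of 0.0001 for eurusd as in the examples below. The `to_ticks` and `from_ticks` helpers are hypothetical and not part of k9. The sketch uses `decimal.Decimal` because binary floating point cannot represent a price such as 1.1904 exactly.

```python
from decimal import Decimal

TICK = Decimal("0.0001")  # assumed minimum price increment for eurusd


def to_ticks(price: str) -> int:
    """Convert a decimal price string to a whole number of ticks."""
    ticks = Decimal(price) / TICK
    if ticks != ticks.to_integral_value():
        raise ValueError(f"{price} is not a whole number of ticks")
    return int(ticks)


def from_ticks(ticks: int) -> Decimal:
    """Convert a whole number of ticks back to a decimal price."""
    return ticks * TICK


print(to_ticks("1.1904"))   # 11904
print(from_ticks(11905))    # 1.1905
```

Rejecting prices that are not a whole number of ticks, rather than rounding them, makes bad input visible instead of silently shifting a price.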

Let’s start with a simple example of only times (t), bid (b), and ask (a).

 n:10
 T:^10:00+`t n?36e5  / random times in the hour after 10:00, sorted
 B:100++\-1+n?3      / generate bids near 100, equivalent to: 100 + (+\((-1)+n?3))
 A:B+1+n?2           / generate asks 1 or 2 higher
 q:+`t`b`a!(T;B;A);q / build table q and then display
t            b   a  
------------ --- ---
10:01:48.464 100 102
10:23:12.033 100 102
10:30:00.432 101 102
10:34:00.383 101 103
10:34:36.839 101 102
10:42:59.230 100 102
10:46:50.478 100 102
10:52:42.189  99 100
10:55:52.208  99 101
10:59:06.262  98  99
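For readers more at home in Python, here is a rough re-expression of the generator above. It is a sketch, not a reproduction of k9's random number generator, so its numbers will differ from the table; `make_quotes` and `fmt` are hypothetical names.

```python
import random


def make_quotes(n, seed=0):
    """n quotes: sorted times in the hour after 10:00, a bid random walk
    near 100, and an ask 1 or 2 above the bid."""
    rng = random.Random(seed)
    start = 10 * 3_600_000                           # 10:00 in ms since midnight
    times = sorted(start + rng.randrange(3_600_000) for _ in range(n))  # ^10:00+`t n?36e5
    bids, level = [], 100
    for _ in range(n):
        level += rng.randrange(3) - 1                # -1+n?3 gives steps -1, 0, 1; +\ sums them
        bids.append(level)
    asks = [b + 1 + rng.randrange(2) for b in bids]  # B+1+n?2
    return list(zip(times, bids, asks))


def fmt(t):
    """Format milliseconds since midnight as HH:MM:SS.mmm."""
    h, rem = divmod(t, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02}.{ms:03}"


for t, b, a in make_quotes(10):
    print(fmt(t), b, a)
```

As in the k9 version, the bid is a running sum of small steps, so consecutive bids differ by at most 1, and the ask is derived from the bid rather than drawn independently.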

Here you see that at 10:42:59.230 the prices update to 100 and 102: you could sell at 100 (the bid) and buy at 102 (the ask). You might think that 100 seems a bit high, so you sell there. Later, at 10:59:06.262, you might think the prices look low, so you buy at 99 (the ask). Here’s the trade table for those two transactions.

 t:+`t`p!(10:43:00.230 10:59:07.262;100 99);t
t            p  
------------ ---
10:43:00.230 100
10:59:07.262  99

You’ll note that the times don’t line up, because it apparently took you a second to decide to trade. Because of this delay, joining trade (t) and quote (q) data usually means looking back to the most recent quote at or before each trade time.
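This look-back is commonly called an as-of join. A minimal Python sketch using the quote and trade tables above (the `asof` helper is hypothetical): for each trade, binary-search the sorted quote times for the last quote at or before the trade time. The zero-padded HH:MM:SS.mmm strings sort correctly as text, which keeps the sketch short.

```python
from bisect import bisect_right

# Quote table q from above: (time, bid, ask), sorted by time.
quotes = [
    ("10:01:48.464", 100, 102),
    ("10:23:12.033", 100, 102),
    ("10:30:00.432", 101, 102),
    ("10:34:00.383", 101, 103),
    ("10:34:36.839", 101, 102),
    ("10:42:59.230", 100, 102),
    ("10:46:50.478", 100, 102),
    ("10:52:42.189", 99, 100),
    ("10:55:52.208", 99, 101),
    ("10:59:06.262", 98, 99),
]
# Trade table t from above: (time, price).
trades = [("10:43:00.230", 100), ("10:59:07.262", 99)]

quote_times = [time for time, _, _ in quotes]


def asof(trade_time):
    """Last quote at or before trade_time, or None if there is none."""
    i = bisect_right(quote_times, trade_time)
    return quotes[i - 1] if i else None


for time, price in trades:
    quote = asof(time)
    side = "sell at bid" if price == quote[1] else "buy at ask" if price == quote[2] else "?"
    print(time, price, quote, side)
```

Both trades pick up the quote one second earlier: the sale at 100 was at that quote's bid, and the purchase at 99 at its ask.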

Now that you’ve learned enough finance to understand the data, let’s scale up to larger problems to see the power of k9.

19.2 Data Manipulation

Let’s use k9 to generate a set of random quotes for a particular day and symbol.

 qs:`date`sym`time`exch`bid`ask            / quote table schema
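 nf:d+*|d:(|-d),d:683 954 997 1000;        / discretized normal breakpoints: 0 3 46 317 1683 1954 1997 2000
 D:#[;2021.03.17]                          / date
 S:#[;`eurusd]                             / symbol
 T:^?[;_8.64e7]@                           / sorted random times, in milliseconds since midnight
 E:?[;"ce"]                                / exchange, c or e
 B:11904++\-3+nf bin/:?[;*|nf]@            / bid: random walk from 11904, steps of -3..3
 P:?[;2,2,8#1]@                            / bid/ask spread: 2 with probability 0.2, else 1
 Q:{+qs!((D;S;T;E)@'x),(*;+/)@\:(B;P)@'x}  / quote table generator for x rows
 q:Q@_1e8;10#q                             / 100 million rows; show the first 10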
date       sym    time exch bid   ask  
---------- ------ ---- ---- ----- -----
2021.03.17 eurusd    0 c    11904 11905
2021.03.17 eurusd    0 e    11904 11906
2021.03.17 eurusd    2 e    11902 11903
2021.03.17 eurusd    3 c    11902 11903
2021.03.17 eurusd    9 c    11904 11906
2021.03.17 eurusd    9 c    11904 11905
2021.03.17 eurusd   10 c    11904 11905
2021.03.17 eurusd   12 c    11904 11905
2021.03.17 eurusd   12 c    11904 11905
2021.03.17 eurusd   12 e    11904 11906
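The `nf` and `B` lines are the least obvious part of the generator. The values 683 954 997 1000 are roughly the 68-95-99.7 rule for a normal distribution, in thousandths; mirroring them and shifting by 1000 gives breakpoints from 0 to 2000. A uniform draw below 2000, binned against those breakpoints and shifted by -3, yields a price step between -3 and 3 that is 0 with probability 0.683. The Python sketch below mirrors this, assuming k9's `bin` returns the index of the last breakpoint not greater than each value; `step_for`, `step_counts` and `make_eurusd_quotes` are hypothetical names.

```python
import random
from bisect import bisect_right
from collections import Counter
from itertools import accumulate

d = [683, 954, 997, 1000]                  # d:683 954 997 1000
d = [-v for v in reversed(d)] + d          # (|-d),d
nf = [v + d[-1] for v in d]                # d+*|d  -> 0 3 46 317 1683 1954 1997 2000


def step_for(x):
    """Price step for a uniform draw x in 0..1999, i.e. -3+nf bin x."""
    return bisect_right(nf, x) - 1 - 3


def step_counts():
    """How many of the 2000 equally likely draws map to each step."""
    return Counter(step_for(x) for x in range(nf[-1]))


def make_eurusd_quotes(n, seed=0):
    """Bids as a random walk from 11904; asks 1 (p=0.8) or 2 (p=0.2) above."""
    rng = random.Random(seed)
    steps = [step_for(rng.randrange(nf[-1])) for _ in range(n)]  # ?[;*|nf], then bin
    bids = [11904 + s for s in accumulate(steps)]                # 11904++\
    spreads = [rng.choice([2, 2] + [1] * 8) for _ in range(n)]   # ?[;2,2,8#1]
    asks = [b + s for b, s in zip(bids, spreads)]                # (*;+/): bid and bid+spread
    return bids, asks


print(nf)
print(sorted(step_counts().items()))
```

The expected spread under this scheme is 0.8*1 + 0.2*2 = 1.2 ticks, consistent with the average spread of about 1.2 reported by the statistics that follow.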

At this point let’s run some basic statistics to see how quickly one can work with 100 million rows of data. On a relatively recent consumer laptop, the spread calculation (the longest of the bunch) completes in 350 ms.

 select max bid,min ask from q
bid|18449
ask|5972 

 select mid:avg 0.5*bid+ask from q
[mid:14198.32]

 select spread:avg ask-bid from q
[spread:1.200035]

 select first bid, first ask from q
bid|11904
ask|11905

 select last bid, last ask from q
bid|14906
ask|14907
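To check what each query computes, here is a plain-Python analog run over only the ten rows displayed earlier, so its numbers differ from the 100-million-row results and say nothing about speed. Note how `0.5*bid+ask` is read: k9 evaluates right to left, so it means 0.5*(bid+ask), the mid price, not (0.5*bid)+ask.

```python
# The ten rows of q displayed above, as (bid, ask) pairs.
rows = [
    (11904, 11905), (11904, 11906), (11902, 11903), (11902, 11903), (11904, 11906),
    (11904, 11905), (11904, 11905), (11904, 11905), (11904, 11905), (11904, 11906),
]
bids = [b for b, _ in rows]
asks = [a for _, a in rows]
n = len(rows)

max_bid = max(bids)                              # select max bid
min_ask = min(asks)                              # select min ask
mid = sum(0.5 * (b + a) for b, a in rows) / n    # avg 0.5*bid+ask, i.e. 0.5*(bid+ask)
spread = sum(a - b for b, a in rows) / n         # avg ask-bid
first, last = rows[0], rows[-1]                  # first bid, first ask / last bid, last ask

print(max_bid, min_ask, mid, spread, first, last)
```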

19.3 Understanding Code Examples

The shakti mailing list has a number of code examples that can be used to learn best practices. To make sense of other people’s code, one needs to be able to read k9 expressions efficiently. Here is an example of how to go about this process.

ss:{*{
      o:o@&(-1+(#y)+*x@1)<o:1_x@1;
      $[0<#x@1;((x@0),*x@1;o);x]}[;y]/:(();&(x@(!#x)+\:!#y)~\:y)
      }

This function finds where a substring occurs in a string and returns the starting indices.

 000000000011111111112222222222333333
 012345678901234567890123456789012345
"Find the +++ needle in + the ++ text"

Here one would expect to find “++” at 9 and 29.

 ss["Find the +++ needle in + the ++ text";"++"]
9 29
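Note that "++" also starts at index 10, inside "+++", yet ss does not report it: judging from this output, ss returns only non-overlapping matches, scanning left to right. The Python function below is a hypothetical reference for that behaviour, handy for checking results while reading the k9 code; it is a sketch of what ss computes, not a translation of how it computes it.

```python
def ss(text, pattern):
    """Start indices of non-overlapping occurrences of pattern, left to right."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    hits, i, n = [], 0, len(pattern)
    while i + n <= len(text):
        if text[i:i + n] == pattern:
            hits.append(i)
            i += n          # skip past this match so matches never overlap
        else:
            i += 1
    return hits


print(ss("Find the +++ needle in + the ++ text", "++"))  # [9, 29]
```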

To work out how this function works, let’s lay it out over several lines and annotate its parts.

ss:{
    *{
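      o:o@&(-1+(#y)+*x@1)<o:1_x@1; / o: candidates that start after the first one ends
      $[0<#x@1;((x@0),*x@1;o);x]   / $[c;t;f]: if candidates remain, keep the first, else return x
      }
  [;y]/:(();&(x@(!#x)+\:!#y)~\:y)    / iterate the inner function (y fixed) from (results;candidates)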
  }
 

Given that k9 evaluates right to left, let’s start with the rightmost code fragment.

 (();&(x@(!#x)+\:!#y)~\:y)          / initial state: (empty list;candidate positions)

And now let’s focus on the value in the list.

 &(x@(!#x)+\:!#y)~\:y

To check our understanding easily, we can wrap this expression in a function and call it with the arguments shown above. To step through it, we start with the innermost parentheses and build the code up until it is complete.

 {!#x}["Find the +++ needle in + the ++ text";"++"]
{!#x}["Find the +++ needle in + the ++ text";"++"]
^
:rank

This fails with a rank error. A k9 function that uses implicit arguments takes as many arguments as the highest of x, y and z it mentions, so {!#x} takes exactly one argument and cannot be called with two. To get around this, we mention y without using it.

 {y;#x}["Find the +++ needle in + the ++ text";"++"]
36
 {y;!#x}["Find the +++ needle in + the ++ text";"++"]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ..

As might be guessed, #x counts the number of characters in the first argument, and !#x generates the list of integers from 0 to n-1, where n is that count.
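For comparison, the Python equivalents: `len` plays the role of `#` (count), and `range` plays the role of `!` applied to an integer.

```python
text = "Find the +++ needle in + the ++ text"

n = len(text)                 # #x : number of characters
positions = list(range(n))    # !#x : 0 1 2 ... n-1

print(n)                      # 36
print(positions[:5], "...", positions[-1])
```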

 {(!#x)+\:!#y}["Find the +++ needle in + the ++ text";"++"]  
 0  1
 1  2
 2  3
 3  4
 4  5
 5  6
 6  7
 7  8
 8  9
 9 10
10 11
11 12
12 13
13 14
14 15
15 16
16 17
17 18
18 19
19 20
20 21
..

Here +\: (plus each-left) takes each integer from the previous result and adds to it the whole list !#y, which is as long as the second argument, giving one row per position in the first argument. To verify this, you could write something similar and check that the output is what you predicted.

 {(!x)+\:!y}[6;4]
0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
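In Python, this each-left sum is a nested comprehension: one row per element of the left list, each row being that element plus every element of the right list. The sketch below (`outer_sum` is a hypothetical name) reproduces the 6-by-4 check above.

```python
def outer_sum(xs, ys):
    # Analog of xs+\:ys: for each x on the left, add x to every y on the right.
    return [[x + y for y in ys] for x in xs]


for row in outer_sum(range(6), range(4)):   # {(!x)+\:!y}[6;4]
    print(*row)
```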

Now the code uses the index matrix above to index into the first argument, pulling out every substring with the same length as the search string.

 {x@(!#x)+\:!#y}["Find the +++ needle in + the ++ text";"++"]
Fi
in
nd
d 
 t
th
he
e 
 +
++
++
+ 
 n
ne
ee
ed
dl
le
e 
 i
in
..
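In Python the same step can be sketched by slicing the string once per start position. The padding for the final rows is an assumption: the listing above is truncated before those rows, so the sketch assumes k9 yields a blank for an index past the end of the string. `windows` is a hypothetical name.

```python
def windows(text, width, fill=" "):
    # Analog of x@(!#x)+\:!#y: one substring of length `width` per start position.
    # Assumption: out-of-range positions read as a blank, so the text is padded.
    padded = text + fill * (width - 1)
    return [padded[i:i + width] for i in range(len(text))]


for w in windows("Find the +++ needle in + the ++ text", 2)[:12]:
    print(repr(w))
```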

At this point one can compare each substring in this list with the search string (~\: is match each-left) to find the matches.

 {(x@(!#x)+\:!#y)~\:y}["Find the +++ needle in + the ++ text";"++"]
000000000110000000000000000001000000b
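A Python sketch of the same comparison: one boolean per start position, printed in k9's style of one digit per item followed by `b`. Python slices near the end of the string simply come out shorter, so they can never equal the pattern and no padding is needed here.

```python
text = "Find the +++ needle in + the ++ text"
pattern = "++"

# (x@(!#x)+\:!#y)~\:y : does the window starting at i match the pattern?
mask = [text[i:i + len(pattern)] == pattern for i in range(len(text))]

# Display like a k9 boolean list: digits, then a trailing "b".
print("".join("1" if m else "0" for m in mask) + "b")
```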

And then one can use the where function, &, to find the indices of the matches.

 {&(x@(!#x)+\:!#y)~\:y}["Find the +++ needle in + the ++ text";"++"]
9 10 29
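On a boolean list, where simply collects the indices of the true items. A Python sketch of that step applied to the comparison above (`where` is a hypothetical helper, not a Python built-in):

```python
def where(flags):
    """Analog of k9's & applied to a boolean list: indices of the true items."""
    return [i for i, f in enumerate(flags) if f]


text = "Find the +++ needle in + the ++ text"
pattern = "++"
matches = where(text[i:i + len(pattern)] == pattern for i in range(len(text)))
print(matches)  # [9, 10, 29]
```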

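Note that this gives 9 10 29, whereas ss returned 9 29: the candidate at 10 overlaps the match at 9. The rest of the ss function, which removes such overlapping candidates, is left as an exercise for the reader.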

