

18 Examples

This chapter presents an example from finance, as this is one of the primary application domains for k. For those not familiar with this field, here is a short introduction.

18.1 A Tiny Introduction to Financial Market Data

Financial market data are generally stored as prices (often called quotes) and trades. At a minimum, a quote includes the time (t), the price buyers are offering (b for bid), and the price sellers are asking (a for ask). A trade includes at a minimum the time (t) and the trade price (p). In normal markets there are many more quotes than trades. (The data normally also include the name of the security (s) and the exchange (x) from which the data come.)

Let’s use k9 to generate a set of random prices.

 n:10
 T:^10:00+`t n rand 36e5 / sort randomly generated times
 B:100++\-1+n rand 3     / generate bids near 100, equivalent to: 100 + (+\((-1)+n rand 3))
 A:B+1+n rand 2          / generates asks 1 or 2 higher
 q:+`t`b`a!(T;B;A);q     / build table q and then display
t            b   a  
------------ --- ---
10:01:48.464 100 102
10:23:12.033 100 102
10:30:00.432 101 102
10:34:00.383 101 103
10:34:36.839 101 102
10:42:59.230 100 102
10:46:50.478 100 102
10:52:42.189  99 100
10:55:52.208  99 101
10:59:06.262  98  99

Here you see that at 10:42:59.230 the prices update to 100 and 102: one could sell at 100 (the bid) or buy at 102 (the ask). You might think that 100 seems a bit high and sell there. Later, at 10:59:06.262, you might think the prices look low and buy at 99. Here’s the trade table for those two transactions.

 t:+`t`p!(10:43:00.230 10:59:07.262;100 99);t
t            p  
------------ ---
10:43:00.230 100
10:59:07.262  99

You’ll note that the times didn’t line up, because it apparently took you a second to decide to trade. Because of this delay, you’ll often have to look back at the previous prices to join trade (t) and quote (q) data.
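The look-back join described above can also be sketched outside k9. The following Python fragment is only an illustration, not the k9 join itself: for each trade it finds the most recent quote at or before the trade time using a binary search. The names `quotes`, `trades`, and `asof` are made up for this sketch, and the times are kept as strings, which sort correctly here because they all share one fixed-width format.

```python
from bisect import bisect_right

# Quotes (t, b, a) and trades (t, p) copied from the two tables above.
quotes = [
    ("10:01:48.464", 100, 102),
    ("10:23:12.033", 100, 102),
    ("10:30:00.432", 101, 102),
    ("10:34:00.383", 101, 103),
    ("10:34:36.839", 101, 102),
    ("10:42:59.230", 100, 102),
    ("10:46:50.478", 100, 102),
    ("10:52:42.189", 99, 100),
    ("10:55:52.208", 99, 101),
    ("10:59:06.262", 98, 99),
]
trades = [("10:43:00.230", 100), ("10:59:07.262", 99)]

quote_times = [t for t, _, _ in quotes]  # already sorted by time


def asof(trade_time):
    """Return the last quote whose time is <= trade_time, or None."""
    i = bisect_right(quote_times, trade_time) - 1
    return quotes[i] if i >= 0 else None


# Attach the prevailing bid and ask to each trade.
joined = []
for t, p in trades:
    q = asof(t)
    joined.append((t, p, q[1], q[2]) if q else (t, p, None, None))

for row in joined:
    print(row)
```

Each trade picks up the quote that was posted one second earlier, which is the quote the trader was looking at when deciding to trade.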

Now that you’ve learned enough finance to understand the data, let’s scale up to larger problems to see the power of k9.

18.2 Data Manipulation

Next, generate a large table of random data and compute basic statistics on it. The data here include time (t), security (s), and price delta (d). The table takes about 4 GB, and building it takes about 3.3 seconds on a relatively new consumer laptop.

 n:_100e6                         / set n to 100 million 
 t:{09:00:00.000+x rand 10:00:00.000}  /  x random times between 9:00 and 19:00 (7 pm)
 s:{x rand `a`b`c`d`e}                 / x random symbols from `a`b`c`d`e
 m:0,(|m),365378984,m:271810244 42800467 2636454 62769 572 2;  / start m with 6 numbers, then regenerate m with 0, |m, 365378984, m
 d:{(-6+!13)@(+\m)bin x rand _1e9}  / (i) Take the running sum of m. 
                / (ii) find the binary search index values of the x random numbers drawn up to 1e9 using bin
                / (iii) use those indexes on the array of integers from -6 to +6 inclusive
 \t q:+`t`s`d!(t[n];s[n];d[n])    / build table and measure time in ms
3391
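The weighted draw behind d can be cross-checked in another language. The Python sketch below mirrors steps (i) to (iii) under one assumption that should be verified against the k9 reference: that bin returns the index of the last running-sum entry less than or equal to the value searched for. The names `delta_for` and `draw` are made up for this sketch, and the sample size is kept small.

```python
import random
from bisect import bisect_right
from itertools import accumulate

half = [271810244, 42800467, 2636454, 62769, 572, 2]
m = [0] + half[::-1] + [365378984] + half  # the same 14 weights as m in the k9 code
cum = list(accumulate(m))                  # (i) running sum, like +\m
deltas = list(range(-6, 7))                # the integers -6 .. 6, like -6+!13


def delta_for(r):
    """Map r in [0, 1e9) to a delta.

    (ii) find the index of the last running sum <= r (assumed bin behavior),
    (iii) use that index to pick from -6 .. 6.
    """
    return deltas[bisect_right(cum, r) - 1]


def draw(n, seed=0):
    """Draw n deltas from uniform integers in [0, 1e9)."""
    rng = random.Random(seed)
    return [delta_for(rng.randrange(cum[-1])) for _ in range(n)]


sample = draw(100_000)
print(cum[-1])                   # total weight
print(min(sample), max(sample))  # observed range of deltas
```

The weights sum to exactly 1e9, so each weight is the number of random integers that map to its delta. For example, the weight 572 predicts roughly 57 occurrences of -5 in 100 million draws, which is in line with the count of 55 reported for -5 in this section.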

At this point one might want to check the start and stop times, see whether the symbol distribution is actually random, and look at the distribution of the price deltas.

 select ti:min t, tf:max t from q / min and max time values
ti|09:00:00.000
tf|18:59:59.999

 select c: count s by s from q          / count each symbol
s|c       
-|--------
a|20003490
b|19997344
c|19998874
d|20000640
e|19999652

 select c: count d by d from q          / check normal dist (2s to run)
d |c       
--|--------
-6|1       
-5|55      
-4|6226    
-3|263801  
-2|4280721 
-1|27179734
 0|36531595
 1|27188092
 2|4279872 
 3|263610  
 4|6245    
 5|48

 select gain:sum d by s from q    / profit (or loss) over each symbol
s|gain 
-|-----
a|  872
b| 2765
c| 2668
d| 2171
e|-2354

 select loss:min +\d by s from q  / worst loss over the period
s|loss 
-|-----
a|-1803
b| -846
c|-2732
d|-2101
e|-2903
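The last two queries differ only in what is done with each symbol's deltas: the gain is their total, while the worst loss is the minimum of their running sum. A small Python sketch with made-up rows (not the data above) shows the two side by side; `rows`, `gain`, and `loss` are names chosen for this illustration.

```python
from collections import defaultdict
from itertools import accumulate

# Made-up (symbol, delta) rows in time order, standing in for columns s and d.
rows = [("a", 1), ("b", 1), ("a", -3), ("b", -4), ("a", 4), ("b", 2)]

# Group the deltas by symbol, preserving time order within each group.
by_symbol = defaultdict(list)
for s, d in rows:
    by_symbol[s].append(d)

# gain: total of the deltas, like sum d by s
gain = {s: sum(ds) for s, ds in by_symbol.items()}

# loss: lowest point of the running sum, like min +\d by s
loss = {s: min(accumulate(ds)) for s, ds in by_symbol.items()}

print(gain)
print(loss)
```

Symbol a ends with a gain even though its running sum dips below zero along the way, which is why the two statistics are worth computing separately.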

18.3 Understanding Code Examples

On the shakti mailing list there are a number of code examples that can be used to learn best practices. To make sense of other people’s code, one needs to be able to read k9 expressions efficiently. Here is an example of how one goes about this process.

ss:{*{
      o:o@&(-1+(#y)+*x@1)<o:1_x@1;
      $[0<#x@1;((x@0),*x@1;o);x]}[;y]/:(();&(x@(!#x)+\:!#y)~\:y)
      }

This function finds a substring in a string.

 000000000011111111112222222222333333
 012345678901234567890123456789012345
"Find the +++ needle in + the ++ text"

Here one would expect to find “++” at 9 and 29.

 ss["Find the +++ needle in + the ++ text";"++"]
9 29

To determine how this function works, let’s lay it out with a comment on each part.

ss:{
    *{
      o:o@&(-1+(#y)+*x@1)<o:1_x@1; / set o 
      $[0<#x@1;((x@0),*x@1;o);x]   / if x then y else z
      }
  [;y]/:(();&(x@(!#x)+\:!#y)~\:y)    / use value for inner function
  }
 

Given that k9 evaluates right to left, let’s start with the rightmost code fragment.

 (();&(x@(!#x)+\:!#y)~\:y)          / a list (null;value)

And now let’s focus on the value in the list.

 &(x@(!#x)+\:!#y)~\:y

To check our understanding, we can wrap this in a function and call it with the arguments shown above. To step through, we start with the innermost parentheses and build up the code until it is complete.

 {!#x}["Find the +++ needle in + the ++ text";"++"]
{!#x}["Find the +++ needle in + the ++ text";"++"]
^
:rank

This fails with a rank error: the function body references only x, so it takes one argument, yet it is called with two. To get around this, we mention y in the body without using its value.

 {y;#x}["Find the +++ needle in + the ++ text";"++"]
36
 {y;!#x}["Find the +++ needle in + the ++ text";"++"]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ..

As might have been guessed, #x counts the number of characters in the first argument (n), and !#x then generates the list of integers from 0 to n-1.

 {(!#x)+\:!#y}["Find the +++ needle in + the ++ text";"++"]  
 0  1
 1  2
 2  3
 3  4
 4  5
 5  6
 6  7
 7  8
 8  9
 9 10
10 11
11 12
12 13
13 14
14 15
15 16
16 17
17 18
18 19
19 20
20 21
..

Here the code takes each integer from the previous calculation and adds to it an integer list as long as the second argument. To verify this, you could write something similar and check that the output is what you predicted.

 {(!x)+\:!y}[6;4]
0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
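For readers who want to confirm the pattern in another language, the same table can be built in Python with a nested comprehension. This is only a cross-check of the arithmetic, and `index_matrix` is a name made up for this sketch.

```python
def index_matrix(n, k):
    """Row i holds i, i+1, ..., i+k-1, the result of adding 0..k-1 to each of 0..n-1."""
    return [[i + j for j in range(k)] for i in range(n)]


for row in index_matrix(6, 4):
    print(*row)
```

Row i is the list of positions covered by a window of width k that starts at position i.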

Now the code uses this index matrix to index into the first argument, pulling out every substring whose length matches that of the search string.

 {x@(!#x)+\:!#y}["Find the +++ needle in + the ++ text";"++"]
Fi
in
nd
d 
 t
th
he
e 
 +
++
++
+ 
 n
ne
ee
ed
dl
le
e 
 i
in
..

At this point one can compare the search string against each substring in this list to find the matches.

 {(x@(!#x)+\:!#y)~\:y}["Find the +++ needle in + the ++ text";"++"]
000000000110000000000000000001000000b

And then one can use the where function, &, to determine the index of the matches.

 {&(x@(!#x)+\:!#y)~\:y}["Find the +++ needle in + the ++ text";"++"]
9 10 29
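The steps walked through so far can be mirrored in Python as a cross-check. The function `window_matches` follows the fragment above: take every window of the pattern's length, compare it with the pattern, and keep the indices that match. The function `drop_overlaps` is only a guess at what the inner function of ss does with this list, based on the fact that ss returns 9 29 rather than 9 10 29; both names are made up for this sketch.

```python
def window_matches(text, pat):
    """Start index of every window of len(pat) characters that equals pat."""
    k = len(pat)
    flags = [text[i:i + k] == pat for i in range(len(text))]  # like ...~\:y
    return [i for i, hit in enumerate(flags) if hit]           # like &


def drop_overlaps(starts, k):
    """Keep a start only if it lies past the end of the previously kept match.

    Assumption: this is one reading of the inner function of ss,
    not a statement of how the original is implemented.
    """
    kept = []
    for i in starts:
        if not kept or i > kept[-1] + k - 1:
            kept.append(i)
    return kept


text = "Find the +++ needle in + the ++ text"
starts = window_matches(text, "++")
print(starts)                    # candidate matches, overlaps included
print(drop_overlaps(starts, 2))  # non-overlapping matches
```

The candidate at 10 overlaps the match that starts at 9, which would explain why it is absent from the final result of ss.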
