Monday, May 7, 2012

Odd Behavior of wj1

sym   v
jesse 0
jesse 1
a     2
a     3
b     4

sym   v xx
a     0 l 
b     1 d 
jesse 2 k 
a     3 n 
b     4 b 
jesse 5 m 
a     6 h 
b     7 d 
jesse 8 a 

-1 0 1 2 3
2  3 4 5 6

sym   v xx        
jesse 0 `symbol$()
jesse 1 ,`k       
a     2 `d`k`n    
a     3 `d`k`n    
b     4 `k`n`b 

I expected the following result:

sym   v xx        
jesse 0 `symbol$()
jesse 1 ,`k       
a     2 ,`n    
a     3 ,`n    
b     4 ,`b 

Perhaps there is something I have yet to understand about wj and wj1.

Update: The documentation makes it clear that the right-hand-side table must be sorted `sym`v.

Thursday, April 5, 2012

Tuesday, February 21, 2012

Protecting assignment of built-ins in qlang

I'm attempting to programmatically protect against attempted assignment to a built-in identifier, and I ran across some strange behavior:

q)get `sum      / fails. This is OK, but I'd rather it return the function
q)sum:0         / fails. This is good.
q)set[`sum;0]   / succeeds. This should fail, in my opinion
q)sum           / This 'sum' is still a function
q)get `sum      / This 'sum' is now the int 0

I haven't yet found a way to check to see if a given symbol is somehow reserved by q (whether it be a user-defined identifier, or a built-in) without resorting to an unsafe eval-like implementation.

Wednesday, December 14, 2011

Learning q/kdb+ (qlang)

Over the past few days, I've started to get to know q. Not this Q, but perhaps just as arrogant and mischievous. This post will serve as my own personal reference, but I also hope it can become a resource for others in making sense of this very esoteric language.

What is q? 

Wikipedia is pretty good at answering that.

Searching for q

When attempting to learn q, the first thing you'll notice is that it's next to impossible to search Google for anything related to it. Your only hope is to construct search terms using "kdb" or "kdb+" rather than trying anything with "q", because the "q" in the search will be matched by thousands of "Q&A"-type posts, usually not related to the language q. The Go language gets around this problem by using "golang" as an extended version of its name. If I had any sort of power in the q world, I'd certainly introduce the term "qlang" into the vocabulary. This is why I included qlang in the title of this post.

Getting q

Kx makes getting a q test (i.e. trial) environment up and running easy. Go to their Download page for the trial software, download the package, and put it somewhere convenient. I've set it up on Mac OS X, so for me, it's located in ~/q. Notice that the q executable is one level deeper. For my Mac, it's located at ~/q/m32.

Running q

Before you read anything, if you're anything like me, you'll just want to jump right in. You can simply execute ~/q/m32/q and be thrust into the q environment, but I wouldn't advise it. The bare q shell is missing functionality such as viewing previous and next commands (up and down arrow, respectively). Create a bash script with the contents:

cd ~/q
rlwrap m32/q "$@"

rlwrap will add lots of functionality to the shell that you're going to want later. This is mentioned in Kx's documentation, but I feel it's important enough to be repeated.


Kx provides a heap of information regarding q and kdb+. Finding the correct order to read the documentation is a little difficult. I recommend visiting the Tutorials wiki and reading through the provided articles, in order.

The Reference page gives a good summary of all the keywords and symbols in q.

The Cookbooks offer some higher-level suggestions for developing in q/kdb+.

Why is q so weird about ____ ?

The answer is most likely efficiency. For example, the language is designed such that the entire interpreter fits into the L2 cache.

Tips & Tricks

In this section, I'll be keeping track of the little problems I've come across learning qlang and my solutions.

Syntax highlighting in vim.

Comparing strings for equality.
I've had the best luck converting the strings to symbols and comparing. For example:
  select from mytable where `stimpson=`$lastname
This casts the lastname column to a symbol and compares with the hard-coded symbol `stimpson. The value of the comparison is either 0b or 1b.

It seems you could also do
  all "stimpson"=lastname
since this also yields either 0b or 1b, but this only works for strings of the same length.

Edit: The above method adds an extra step that can end up costing a lot of time. Try using
  lastname like "stimpson"
as well.

Converting a list of number strings into a list of integer lists.
Suppose I have the list l:("123";"456") and I want to convert to (1 2 3;4 5 6). I could do ("I"$/:l[0];"I"$/:l[1]), but for something that scales, we just need to write a simple anonymous function:
  {"I"$/:x}each l

Beware of floating point!
Enter the following into your q prompt.
q)n:10*c-floor c:(floor 8990998%100)%10
q)floor n
On my installation of q, n reports 9f and floor n reports 8. However, n=9f reports 0b. Notice that n-9f reports -3.637979e-12. Weird, wild stuff. Just be careful when dealing with floating point numbers and expecting integer results. The expression for n above is definitely not the "right way" to do such a thing.

Since we know, from a mathematical perspective, that n is an integer, we can work around the weirdness above with floor n+0.9f. We can be almost certain that adding 0.9f to n won't take us into the next integer's space.

There is a good explanation for how q deals with floating point on the kx wiki.
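The same effect is not specific to q. As a rough illustration, here is the same arithmetic in Python (used here only because it shares q's IEEE double-precision floats; the variable names mirror the q expression above). It also shows two common alternatives to adding 0.9: rounding to the nearest integer, and comparing with a tolerance.

```python
import math

# Same arithmetic as the q expression n:10*c-floor c:(floor 8990998%100)%10
c = math.floor(8990998 / 100) / 10   # 8990.9, which has no exact binary representation
n = 10 * (c - math.floor(c))         # mathematically 9, but slightly less in floating point

print(n == 9.0)                            # False
print(math.floor(n))                       # 8
print(round(n))                            # 9: rounding to nearest recovers the intended integer
print(math.isclose(n, 9.0, rel_tol=1e-9))  # True: compare with a tolerance instead of ==
```

Rounding to nearest only makes sense when, as here, we already know the value is supposed to be an integer.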

Running database queries.
This is from the KDB For Mortals tutorial, but it's important enough to be repeated:

"The first sub-phrase of a where clause in a query against a partitioned table must constrain either the virtual column or a column having an index. If you fail to do this, the entire database will be scanned. Long before the query completes, your colleagues will show up at your workstation wielding pitchforks."

Basically, any query against the historical database (hdb) needs to be constrained first by the virtual column or an indexed column. Otherwise, your q process will be eating up resources for hours.

Moreover, "Reading only partial column slices of a table having many partitions is a big memory and performance win."

Where clause logic.
In a where clause for a select statement, use "," for "and" logic.

I still haven't found an optimal way to do an "or", but the following works:
  f:select from mytable where mycol=`a
  t:select from mytable where myothercol=`a
  f,t
The last line joins the tables from the two queries together. Note that a row matching both conditions will appear twice in the result.

Aggregating data on partitioned tables.
It is better to run aggregate functions (such as sum, avg, max, etc) within the select template itself to take full advantage of table partitioning. For example, do
  select sum cost from t where date within 2011.12.01 2011.12.05

Don't do
  sum select cost from t where date within 2011.12.01 2011.12.05

In the former, kdb+ recognizes that it only needs the sum, and it performs map-reduce for the table partitions required. This means that memory use is optimized. In the latter, kdb+ loads all the cost data from the partitions into memory, and brute-forces the sum. No fancy map-reduce.
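To make the difference concrete, here is a small Python sketch of the two strategies. It is only an illustration of the idea, not how kdb+ is implemented: the partitions dictionary, the function names, and the cost values are all made up for the example.

```python
# Hypothetical stand-in for a date-partitioned table: one list of costs per partition.
partitions = {
    "2011.12.01": [10.0, 20.0],
    "2011.12.02": [5.0],
    "2011.12.03": [7.5, 2.5, 1.0],
}

def sum_map_reduce(parts):
    """Sum each partition on its own, then combine the small partial results."""
    partials = [sum(costs) for costs in parts.values()]  # map: one number per partition
    return sum(partials)                                 # reduce: combine the partials

def sum_load_all(parts):
    """Concatenate every partition into one big list first, then sum it."""
    everything = [c for costs in parts.values() for c in costs]  # whole column in memory
    return sum(everything)

total_mr = sum_map_reduce(partitions)
total_all = sum_load_all(partitions)
print(total_mr, total_all)
```

Both functions return the same total. The difference is that the first only ever holds one partition's data plus a handful of partial sums, while the second builds the entire column in memory before summing.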

Day of week.
Compute the day of the week by date mod 7. Saturday is 0.
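This works because a q date is stored as the number of days since 2000.01.01, which was a Saturday. A quick Python sketch of the same rule (the names Q_EPOCH, q_day_number and day_of_week are mine, chosen for the example):

```python
from datetime import date

Q_EPOCH = date(2000, 1, 1)  # q dates count days from 2000.01.01, which was a Saturday
DAY_NAMES = ["Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

def q_day_number(d):
    """Days since 2000.01.01, i.e. the integer underlying a q date."""
    return (d - Q_EPOCH).days

def day_of_week(d):
    """Name of the weekday, using the 'date mod 7' rule with Saturday as 0."""
    return DAY_NAMES[q_day_number(d) % 7]

print(day_of_week(date(2011, 12, 14)))
```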

Rounding datetime down to the hour.
The dt.hh is simply an integer, so in order to add it to the date, we need to convert the integer to hour by using the value for number of msec in an hour.

Update: Seems to me that the function above should work, but if you try, you'll run into a problem doing things like this:
  {x.hh}dt / fails with error 'x.hh
What's going on here is simply a "quirk" with q. Dot (.) notation for accessing temporal constituents does not work on function arguments. Thus, the fixed function is as follows.

Creating a step function.
You can easily create a step function by adding an `s attribute to a dictionary. For example:
  d:(10*til 10)!til 10
  step:`s#d
  d 2*til 20
  step 2*til 20
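The difference between a plain dictionary lookup and a step-function lookup can be sketched in Python. This is only a model of the behaviour, with a hand-written step function built on bisect; it assumes, as in the q example, that the keys are sorted in ascending order.

```python
from bisect import bisect_right

keys = [10 * i for i in range(10)]   # 0 10 20 ... 90, like 10*til 10
values = list(range(10))             # 0 1 2 ... 9,  like til 10

plain = dict(zip(keys, values))

def step(x):
    """Value attached to the largest key that is <= x (None below the first key)."""
    i = bisect_right(keys, x) - 1
    return values[i] if i >= 0 else None

xs = [2 * i for i in range(20)]             # 0 2 4 ... 38, like 2*til 20
plain_lookup = [plain.get(x) for x in xs]   # exact-match only: None unless x is a key
step_lookup = [step(x) for x in xs]         # every x falls back to the key at or below it
print(plain_lookup)
print(step_lookup)
```

The plain lookup only finds values for 0, 10, 20 and 30 (q would show nulls for the rest), while the step lookup returns 0 for everything from 0 up to 8, 1 for 10 up to 18, and so on.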

Local variables.
q functions have a maximum of 23 local variables. I'll assume there's a reason for this.

Monday, October 17, 2011

Singletons singled out

In the past, I've written about design patterns, and in one post in particular used a Singleton to demonstrate a concept.

Ever since seeing my first Singleton, I've been wary of their benefits to a codebase. Reading the Gang of Four book, I was disappointed and confused to see that the Singleton was not condemned. I suppose the Four were keeping a mostly objective view on the limits of designing via design patterns. However, such a pattern is considered by most to be an anti-pattern, so I question whether it should be in the book at all.

Regardless, it's not difficult to dream up a scenario in which the developer thinks there should be exactly one instance, and then the requirements change, and so must the code. Design patterns are meant to soften the blow of changing requirements, but the Singleton in particular is rigid in this regard.

I admit I have used Singletons in the past, but I invite you to join me in my pledge to investigate all other opportunities before falling back on the convenient, dangerous Singleton.

Thursday, February 17, 2011

DirectShow part 2

I got SDK 6 installed, found Qedit.h and made my own Qedit.h in my project. Test build.


I don't have the DirectX SDK. Instead of adding a DirectX dependency to my app, I'll just ifdef-out all the DirectX-related things in the header. Test build.


DES is up and running. Now to find out what else I need to do to set up my timeline. Ok I have an IAMTimeline. According to the documentation I need an IAMTimelineSrc, but when I try to create an instance with COM, it says the class isn't registered. The documentation says

To create a source object, call IAMTimeline::CreateEmptyNode with the value TIMELINE_MAJOR_TYPE_SOURCE

That creates an IAMTimelineObj. The documentation says I can query that object for the IAMTimelineSrc. None of the methods jump out at me, and reinterpret_cast failed spectacularly.

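Ah, there's an actual method called QueryInterface. The question is, does it work like a cast or create a new object? ... Cast it is: it hands back another interface pointer to the same object. It does bump the object's reference count, though, so the pointer still needs a Release() when I'm done with it.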

Found a great example here:

I simply modified it so that my source points to the first JPG in my directory, built, ran, and...


The video starts to play. It's very slow to play back, presumably because the JPGs are about 800 KB each. Also, the video craps out early. I don't know why that is.
Oh! It ends early because I tell it that it's only 5 seconds long. Yep, that was it. I just needed to lengthen the time.

In order to write to a file, it's telling me to make an instance of ICaptureGraphBuilder2. Why the 2? Who knows.

I followed some more example code to get the file-writing fleshed out. When I run, though, I get an access is denied error. This may be because I no longer have admin rights. So, I ran the exe as admin. It took a very long time to make a terrible quality video. This is not a good sign. Not to mention it's over 100 MB for 46 seconds of video. And the colors are all wrong.

I have a lot of work to do...

DirectShow part 1

We wanted to use FFmpeg, but frankly we're confused by and afraid of the legal ramifications, so we've fallen back onto DirectShow.

The build process of FFmpeg on Windows is painful. It can be built on Windows, but it is obviously not the preferred platform of the developers, and the documentation could be stronger. Once I learned I needed to include a particular configure option, the build went off without a hitch, though, and from that point on, using FFmpeg was a breeze. The command line interface is clear and the encoding is fast.

Getting up and running with DirectShow, however, is proving to be much more difficult. The libraries are included with the Windows SDK packages. On my machine, I have Windows SDK 7 and 7.1. Our project is built with msvc2005.

My first step was to follow along with the sample application provided by MSDN for playing back a video file-- not what I actually need to accomplish, but it would be a good exercise.

Fire up Visual Studio. New project. Include Dshow.h. Test build.

Compile failure.

Duh. I need to add the Windows SDK include directory to my project. No problem. $(INCLUDE) added. Bam. While I'm here, I'll add $(LIB) to the additional libraries directory. Test build.


Ok, let's start creating some DirectShow objects. Hm, haven't used COM before, so this is a good time to learn. Usage of COM is just as painful as any other Windows API--not surprising. Oh well, I can handle it. So I've created a few DirectShow objects, I'll do another test build.

Link failure.

Yep, expected that. I need to add Strmiids.lib, which is the DirectShow library. Strange name-- oh well. Test the build!

Link failure.

Debug information corrupt? That's odd. So I looked around the internet. DirectShow Windows SDK 7 isn't compatible with msvc2005. After some digging, there's a hotfix that will make it work through some Microsoft hack known as KB949009. Installed the hotfix. Test build.


Excellent. I finish out the rest of the sample application and attempt to use it to play a sample mp4 video from my machine.


Can't open the mp4. Try again with an AVI (not sure what the codec is).


Ok, well maybe that AVI used some obscure codec, so then I tried with an AVI that was created using Video for Windows.


Well, it works for some formats. How can I tell what formats are supported? I'll have to come back to this. Great, I've completed their sample, but what I really need to do is compile a collection of images into a video. After some more poking and prodding of the internet, I came across a helpful StackOverflow post that pointed in the direction of DirectShow Editing Services (DES). For the DES interfaces, I need to include Qedit.h.

Compile failure.

No Qedit.h? Ok, search my system. Nothing. Search the internet. Qedit.h is not included in the Windows SDK 7 package because it isn't compatible with something about Direct3D, so instead of fixing it, they killed it. The Microsoft accepted solution is to download a previous version of the SDK (6) and duplicate the Qedit.h header in your own project. What a mess. So now I'm downloading a 1GB installer for the SDK 6 just to get at a tiny header file.