Showing posts from July, 2020

Hidden Technical Debt in Machine Learning Systems

The software engineering industry is ahead of other industries in its tooling and in its understanding of why technical debt must be dealt with. For example, software developers use version control, Markdown for fast documentation, text-to-diagram tools and continuous deployment; these, among many other tools, are things you don't, and sometimes cannot, see in other industries. One place where the software industry lags far behind is the ability to build software in a well-organized way; instead, it relies on processes such as Scrum that try to make sense of coding on the go. As Cal Newport said, just imagine a car company where someone runs in with a part and puts it in the car, then another email arrives and they decide to change the color to blue, people are moving around, decisions about how many cars to put out this week are made over email, then something blows up in one part of the manufacturing area, so they open the graphs and see that they had indeed overutilized their robots…

Statistical Significance in Hypothesis Testing

You started taking vitamin D a couple of weeks ago and you notice that it takes you less time to fall asleep at night. Is it a result of the vitamin D, or is something else making it easier for you to fall asleep? So you decide to do an experiment. You take 50 coworker volunteers and split them into two groups: one group of 25 coworkers gets the same vitamin D you took, while the other group takes a placebo. You notice that the people who took the real vitamin did take less time to fall asleep. Was vitamin D the cause? It could be, but it could also be something else: maybe they share a project that is going well, so they fall asleep more easily, while the others are having a hard time, and the difference has nothing to do with whether they took vitamin D. This is where hypothesis testing and significance come into play. Hypothesis testing is almost what we did here with the experiment, but we also want to know…
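One simple way to put a number on "could this just be chance?" is a permutation test. It is swapped in here as an illustration; the post does not say which test it uses. This is a minimal sketch, assuming the outcome is minutes to fall asleep and using simulated, made-up data rather than real measurements; `permutation_test` is a hypothetical helper name. The idea: if vitamin D had no effect, shuffling who got vitamin D and who got the placebo should often produce a difference as large as the one observed, and the p-value estimates how often that happens.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the "vitamin D" / "placebo" labels do not
        # matter, so shuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add 1 to both counts so the observed labeling itself is included.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical, simulated minutes-to-fall-asleep for two groups of 25.
sim = random.Random(42)
vitamin_d = [sim.gauss(20, 5) for _ in range(25)]
placebo = [sim.gauss(24, 5) for _ in range(25)]

p_value = permutation_test(vitamin_d, placebo)
print(f"p-value: {p_value:.4f}")
```

A small p-value (conventionally below 0.05) says the observed difference would be unusual if the labels did not matter. Randomly assigning people to the two groups is what guards against confounders like the shared project, since such factors should end up spread across both groups by chance.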

Boeing B-Trees

In July 1970, Rudolf Bayer and Edward McCreight of Boeing Scientific Research Laboratories published the original B-tree paper as a Mathematical and Information Sciences Report. They never said what the B in B-trees stands for; it could well be Boeing, though Balanced is also a good reminder of what they are. When studying B-trees, before getting to the actual algorithm, wouldn't it be nice to understand the exact motivation of the people who published the paper, the people who originally thought of this idea? They explain it in their paper. First, remember that we are talking about 1970: this was before most of us were born, and computers were slow, really slow. In the paper they say that they are working on the organization and maintenance of an index for a dynamic random-access file. Let's dissect that. We know what an index is: a structure that lets us locate items in our data quickly without scanning all of it, right? So they wanted to understand and suggest better…
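To make the idea of an index concrete, here is a minimal sketch of searching a B-tree, assuming a simple in-memory node layout; `BTreeNode` and `search` are hypothetical names, not taken from the paper, and insertion and node splitting are omitted. Each node holds several sorted keys, so one node visit, which stands in for one slow disk page read, narrows the search down to a single child instead of scanning all the data.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class BTreeNode:
    keys: list                                    # sorted keys stored in this node
    children: list = field(default_factory=list)  # empty for a leaf


def search(node, key):
    """Return (found, nodes_visited) for key in the subtree rooted at node."""
    visited = 0
    while True:
        visited += 1                     # think of this as one page read
        i = bisect_left(node.keys, key)  # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        if not node.children:            # reached a leaf without a match
            return False, visited
        node = node.children[i]          # descend into the child between keys


# A tiny hand-built tree: keys < 10 go left, 10..20 middle, > 20 right.
root = BTreeNode(
    keys=[10, 20],
    children=[BTreeNode([1, 5]), BTreeNode([12, 15]), BTreeNode([25, 30])],
)

print(search(root, 15))  # (True, 2)
print(search(root, 7))   # (False, 2)
```

With many keys per node, the number of nodes visited grows only logarithmically with the number of keys, which is why a wide, shallow tree suits storage where every page read is expensive.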