SoundCloud, I love you, but you’re terrible

I finally started using SoundCloud for a new jazz/electro project called Fynix. I had casually used it in the past under my own name to share WIP tracks, or just odd stuff that didn't fit on Bandcamp, but I never used it seriously until recently. Now I am using it every day and trying to connect with other artists. I am remixing one track a week, listening to everything on The Upload, and liking and commenting as much as I can.

SoundCloud is the best social network for musicians right now, but it still has a terrible identity crisis. Most of its features seem to be aimed at listeners, or at nobody in particular.

So in this post, I’m going to vent about SoundCloud. It’s a good platform, but with a few changes it could be great.

1. I am an artist. Stop treating me like a listener.

Is it really that difficult for you to recognize that I am a musician, and not a listener? I’ve uploaded 15 tracks. It seems like a pretty simple conditional check to me. So why is my home feed cluttered up with reposts? Why can’t I easily find the new tracks by my friends?

This is the core underlying problem with SoundCloud. It has two distinct types of users, and yet it treats all users the same.

2. Your “Who to Follow” recommendations suck. They REALLY suck.

I’ve basically stopped checking “Who to Follow” even though I want to connect with as many musicians as possible. The recommendations seem arbitrary and just plain stupid.

The main problem is that, as a musician, I want to follow other musicians. I want to follow people who will interact with me, and who will promote my work as much as I promote theirs. Yet, the “Who to Follow” list is full of seemingly random people.

Is this person from the same city as me? No. Do they follow lots of people / will they follow back? No. Are they working in a genre similar to mine? No. Do they like and comment on lots of tracks? No.

So why the heck would I want to follow them?

3. Where are my friends' latest tracks?

This last one is just infuriating. When I log in, I want to see the latest tracks posted by my friends. But when I go to my home screen, it is pure luck if I can find anything posted by someone I actually talk to on SoundCloud. It's all reposts. Even if I unfollow all the huge repost accounts, I am stuck looking at reposts by my friends rather than their new tracks.

Okay, so let’s click the dropdown and go to the list of users I am “following”. Are they sorted by recent activity? No. They are sorted by the order in which I followed them. To find out if they have new tracks, I must click on them individually and check their profiles. Because that is really practical.

Okay, so maybe there's a playlist of my friends' tracks on the Discover page? Nope. It's a random collection of garbage.

As far as I can tell, there is no way for me to listen to my friends’ recent tracks. This discourages real interactions.

Ultimately, the problem is data and intelligence. SoundCloud has neither.

You could blame design for these problems. The website shows a lack of direction, as if committees are pulling the product in lots of different directions. SoundCloud seems to want to focus on listeners and compete in the same space as Spotify.

But even if that’s the case, it should be trivial to see that I don’t use the website like a regular listener. I use it like a musician. I want to connect and interact with other musicians.

And this is such a trivial data/analytics problem that I can only conclude they aren't led by data at all. Maybe I only see it this way because I lead our data team, but it seems apparent to me that data is either not used, or used poorly, in all of these features.

For instance, shouldn’t the “Who to Follow” list be based on who I have followed in the past? I’ve followed lots of people who make jazz/electro music, yet no jazz/electro artists are in my “Who to Follow” list. I follow people who like and comment on my tracks, yet I am told to follow people who follow 12 people and have never posted a comment.
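For what it's worth, even a crude scoring pass over candidate accounts, using exactly the signals I keep listing, would beat what I'm seeing. Here is a rough sketch in Python; the field names, weights, and data are all invented for illustration, not anything SoundCloud actually exposes.

# Hypothetical "Who to Follow" score built from the signals above:
# shared genres, engagement, and likelihood of interacting back.
def follow_score(me, candidate):
    score = 0.0
    # Same city as me?
    if candidate.get("city") and candidate["city"] == me.get("city"):
        score += 1.0
    # Works in genres similar to mine?
    shared_genres = set(me.get("genres", [])) & set(candidate.get("genres", []))
    score += 2.0 * len(shared_genres)
    # Follows lots of people, so likely to follow back?
    if candidate.get("follows_count", 0) > 100:
        score += 1.0
    # Likes and comments on lots of tracks?
    score += 0.5 * candidate.get("comments_per_week", 0)
    # A musician, not just a listener?
    if candidate.get("track_count", 0) > 0:
        score += 2.0
    return score

def who_to_follow(me, candidates, limit=10):
    # Rank candidates by score, highest first.
    return sorted(candidates, key=lambda c: follow_score(me, c), reverse=True)[:limit]

That is a handful of lines, and it already respects every criterion I care about.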

The most disappointing thing is that none of this is hard.

4. Oh yeah, and your browser detection sucks.

When I am browsing your site on my tablet, I do not want to use the app. I do not want your very limited mobile site. I just want the regular site (and yes, I know I can get it with a few extra clicks, but it should be the default).

Tips for Managing Joins in Looker

Looker is a fantastic product. It really makes data and visualizations much more manageable. The main goal of Looker is to allow people who aren't data analysts to do some basic data analysis, and to some extent it achieves this, but there are limits to how far that can go. Ultimately, Looker is a big graphical user interface for writing SQL and generating charts. Under the hood, it's programmable by data engineers, but it's constrained by the fact that non-technical users are the ones using it.

The major design challenge in Looker is joins. A data engineer writes the joins into what Looker calls "explores". Explores are rules for how data can be explored, but ultimately they are just containers for joins. When someone creates a new chart, they start by selecting an explore, and thus selecting the joins that will be used in the chart.

They pick the explore from a dropdown under the word "Explore". This is the main design bottleneck. Such a UI encourages you to keep the number of explores small enough to fit in the vertical resolution of the screen, which means limiting the number of explores, and hence limiting the ways tables are joined. In practice, it encourages reusing pre-existing joins for new charts.

This creates two problems.

  1. A non-technical user will not understand the implications of choosing an explore. They may not see that the explore they chose limits how the data can be analyzed. In fact, a non-savvy user may pick the wrong explore entirely and create a chart that is simply wrong.
  2. The joins may evolve over time. A programmer might change a join for a new chart, and this may make old charts incorrect.

The problem is that SQL joins are fundamentally interpretations of the data. Unless a join occurs on id fields AND represents a one-to-one relationship, it interprets the data in some way.
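Here is a toy illustration of what I mean, in plain Python rather than SQL, with made-up tables. Joining products to their events is one-to-many, so the join fans out the product rows, and an aggregate computed over the joined result quietly changes meaning.

# Hypothetical tables: a one-to-many join "interprets" the data by
# fanning out rows, which changes what aggregates mean.
products = [
    {"id": 1, "name": "Widget", "price": 10},
    {"id": 2, "name": "Gadget", "price": 25},
]

product_events = [
    {"product_id": 1, "event": "view"},
    {"product_id": 1, "event": "purchase"},
    {"product_id": 1, "event": "view"},
    {"product_id": 2, "event": "view"},
]

# Equivalent of: SELECT ... FROM products JOIN product_events ON ...
joined = [
    {**p, **e}
    for p in products
    for e in product_events
    if e["product_id"] == p["id"]
]

# Summing price over the base table gives the catalog total...
print(sum(p["price"] for p in products))    # 35

# ...but the same sum over the joined rows is inflated by the fan-out.
print(sum(row["price"] for row in joined))  # 55

A chart built on the joined result is answering a different question than a chart built on the base table, whether or not the person picking the explore realizes it.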

So how can you limit the negative impact of re-using joins?

1. Encourage simple charts

Encourage your teammates to make charts as simple as possible. If possible, a chart should show a single quantity as it changes over a single dimension. This should eliminate or minimize the use of joins in the chart, thus making it far more future-proof.

2. Give explores long, verbose names

Make explore names as descriptive as possible. Try to communicate the choice that a user is making when they choose an explore. For instance, you might name one explore “Products Today” and another one “Product Events Over Time”. These names might indicate that the first explore looks at the products table, but the second explore shows events relating to products joined with a time dimension.

One of the mistakes I made when first starting out with Looker was giving explores single-word names. I now see that short names create maintenance nightmares. Before assessing the problems with a given chart, I need to know which explore its maker chose, and because the names were chosen so poorly, the choice was often incorrect.

I hope these ideas help you find a path to a maintainable data project. To be honest, I have a lot of digging-out to do!

When Code Duplication is not Code Duplication

Duplicating code is a bad thing. Any engineer worth their salt knows that the more you repeat yourself, the more difficult your code will be to maintain. We've enshrined this idea in the well-known DRY principle: Don't Repeat Yourself. So code duplication should be avoided at all costs. Right?

At work I recently came across an interesting case of code duplication that merits more thought, and that shows how some subtlety is needed when applying every coding guideline, even the bedrock ones.

Consider the following CSS, which is a simplified version of a common scenario.

.title {
  color: #111111;
}
.text-color-gray-1 {
  color: #111111;
}

This looks like code duplication, right? If both classes apply the same color, then they do the same thing. If they do the same thing, then they should BE the same thing, right?

But CSS, and markup in general, presents an interesting case. Are these rules really doing the same thing? Are they both responsible for making the text gray? No.

The function of these two rules is different, even though the effect is the same. The first rule styles titles on the website, while the second rule styles any arbitrary div. The first rule is a generalized style, while the second rule is a special case override. The two rules do fundamentally different things.

Imagine we "optimized" these two classes by removing the .title class and using .text-color-gray-1 everywhere. Then the designer changes the title color to dark blue. To change the title color, the developer now has to find every occurrence of .text-color-gray-1 that styles a title and replace it. So, by merging two things with different purposes, the developer has actually created more work.

It’s important to recognize in this case that code duplication is not always code duplication. Just because these two CSS classes are applying the same color doesn’t mean that they are doing the same thing. In this case, the CSS classes are more like variables than methods. They hold the same value, but that is just a coincidence.

What looks like code duplication is not actually code duplication.

But… what is the correct thing?

There is no right answer here. It’s a complex problem. You could solve it in lots of different ways, and there are probably three or four different approaches that are equally valid, in the sense that they result in the same amount of maintenance.

The important thing is not to insist that there is one right way to solve this problem, but to recognize that blithely applying the DRY principle here may not be the path to less maintenance.

How Can Comments be Controversial?

Comments add information to your code. They don’t impact the execution of that code. So how can they be bad? I believe that more commenting is better, and that comments are a vital tool in maintaining an agile codebase, but I’ve met many experienced developers who argue with me about comments.

The Pragmatic Programmer is an excellent book. It outlines best practices that have solidified over the last thirty years or so as software development has grown as a field. The authors share these practices as a series of tips justified by real-world anecdotes. In a section on comments, the authors say:

“The DRY principle tells us to keep the low-level knowledge in the code, where it belongs, and reserve the comments for other, high-level explanations. Otherwise, we’re duplicating knowledge, and every change means changing both the code and the comments. The comments will inevitably become out of date, and untrustworthy comments are worse than no comments at all”

They make the point that obsolete comments are worse than no comments at all. This is undoubtedly true, but it has been bastardized by many programmers into an excuse for not commenting. I've heard this excuse, along with two others, used to justify a lack of comments in code.

  1. The code is self-commenting.
  2. Out of date comments are worse than no comments.
  3. You shouldn’t rely on comments.

In this post I want to address each of these points. I think that thorough commenting speeds up how quickly code can be read and understood, and that obsolete comments are a failure of maintenance rather than of comments in general.

1. The code is self-commenting

This is the excuse I hear most often from developers who don't comment their code. The thinking is that clear variable and method names replace the need for comments. There are situations where this is true: for very short methods with a single purpose, I agree that no additional comments may be necessary.

But most methods are not very short, and most methods achieve several things that relate to one overall purpose. So, in practice, very few methods fit the bill.

This is not a valid excuse, because good comments only make the code easier to read. While the code may be legible on its own, a comment that explains the purpose of a method, or the reason for a string operation, can only make it easier to read. The author of Clean Code agrees, because of how much time we spend reading code:

“the ratio of time spent reading vs. writing is well over 10:1. We are constantly reading old code as part of the effort to write new code. Because this ratio is so high, we want the reading of code to be easy, even if it makes the writing harder.”

Because we spend more time reading code than writing it, a responsible programmer knows that an extra comment in a method, which takes very little time to write, may well save another programmer many hours of learning.
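To make this concrete, here is a small, invented example of the kind of comment I mean. The code is legible on its own, but the comments record the "why" that the names alone can't carry.

import re

def normalize_username(raw_name):
    """Return a canonical form of a user-entered name."""
    # Usernames arrive from several legacy imports with inconsistent
    # casing and stray whitespace, so normalize before comparing.
    name = raw_name.strip().lower()
    # Collapse internal runs of whitespace to a single underscore so the
    # result is safe to use as a cache key and in URLs.
    name = re.sub(r"\s+", "_", name)
    return name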

2. Out of date comments are worse than no comments

To refute this argument, let's take cars as an analogy. If you buy a car and then drive it for twenty thousand miles without an oil change, it will break down. The fault wouldn't be with cars in general; it would be with the lack of maintenance.

Similarly, obsolete comments are not a failure of comments in general, but a failure of the programmer who updated the code without updating the comments. The fact that other programmers have bad habits does not justify continuing them.

Obsolete comments are a broken window. If you see them, it is your job to fix them.

So while obsolete comments ARE worse than no comments, that argument is an argument against bad coding practices, not comments in general.

3. You shouldn’t rely on comments

The argument that a programmer shouldn’t rely on comments is a subtle dig at another programmer’s skill. They’re saying “a good programmer can understand this code with no comments.”

There is a grain of a valid point there. If you consider yourself a senior programmer, then you should be accustomed to wading through large volumes of bad code and parsing out the real meaning and functionality.

But even for senior programmers, comments make it easier to read and understand the code. It’s much easier to interpret a one line summary of a block of code than to parse that block line by line. It’s much easier to store that one line in your head than it is to store the whole block in your head.

The real reason why this is the worst of excuses is that not all programmers working on your code are senior programmers. Some are green graduates straight out of college. Yes, a more experienced programmer could understand the code without comments, but it will cost that graduate hours to figure out what they could have gotten in minutes with good comments.

My Commenting Principle

Comment your code so that a green programmer can understand it.

Or…

Comment your code so that your boss can understand it.

What does it take to make your code readable by a green graduate? Imagine them going through it line by line. If there are no comments, they need to understand every function call and every library routine. They need to understand a bit about your architecture and data model.

Now imagine that every single line is commented.

In that scenario, a green programmer can read your file by reading one comment after the next. If they need more information, they can still drill into the code and get the details, but they can understand the overview just through comments.

The closer you can get to that, the better.
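As a rough, invented illustration, here is what a small routine looks like when every step carries a comment. A green graduate can read it comment by comment and only drop into the code when they need details.

def weekly_report(orders, start, end):
    # Keep only the orders that fall inside the reporting window.
    in_window = [o for o in orders if start <= o["created_at"] < end]

    # Group the order totals by product so we can rank best sellers.
    totals = {}
    for order in in_window:
        totals[order["product"]] = totals.get(order["product"], 0) + order["total"]

    # Sort products by revenue, highest first, for the summary table.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    # Return plain tuples so the caller can render them however it likes.
    return ranked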

Don’t listen to the arguments against thorough commenting habits. Just point the detractors here.

Why are Side Effects Bad?

A side effect is any observable change to the state of an object. Some people might qualify this by saying that side effects are implicit or unintended changes to state.

Mutator methods, or setters, which are designed to change the state of an object, should have side effects. Accessor methods, or getters, should not have side effects. Other methods should generally try to avoid changing state beyond what is necessary for the intended task.
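In code, the distinction looks something like this minimal, invented example.

class Account:
    def __init__(self, balance):
        self._balance = balance
        self._history = []

    def balance(self):
        # Accessor: reports state without changing anything.
        return self._balance

    def set_balance(self, amount):
        # Mutator: changing state is its whole purpose, so this
        # change is expected and advertised by the name.
        self._balance = amount

    def report(self):
        # A method with an unintended side effect: callers asking for a
        # report don't expect it to quietly grow the history list.
        self._history.append(self._balance)
        return f"Balance: {self._balance}"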

Why are these guidelines best practices, though? What makes side effects so bad? Students have asked me this question many times, and I've worked with many experienced programmers who don't seem to understand why minimizing side effects is a good goal.

For example, I recently commented out a block of my peer's code because it had detrimental side effects. The code was intended to add color highlighting to log entries to make his debugging easier. The problem was that the highlighting XML was leaking into our save files. He was applying the highlight to a key, then using that key both in the log and in the save file. Worse still, he wrote the code so that it only ran on some platforms. When I was debugging a cross-platform feature, I got very unpredictable behavior and ultimately traced it back to this block.
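The shape of the bug, reconstructed here with invented names, was roughly this:

import sys

def highlight(key):
    # Wrap the key in color markup so it stands out in the log viewer.
    return f"<color=yellow>{key}</color>"

def record_event(key, value, log, save_file):
    # The highlighting only ran on some platforms, which is why the
    # behavior looked so unpredictable from the outside.
    if sys.platform in ("win32", "darwin"):
        key = highlight(key)

    log.write(f"{key}: {value}\n")       # intended effect: a colored log entry
    save_file.write(f"{key}={value}\n")  # unintended side effect: markup leaks into the save file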

This is an example where code was intended for one purpose, but it also did something else. That something else is the side effect, and you can see how it caused problems for other developers. Since his change was undocumented and uncommented, I spent hours tracking it down. As a team, we lost significant productivity due to a side effect.

Side effects are bad because they make a code base less agile. Side effects cause bugs that are difficult to find, and lead to code that is more difficult to maintain.

Before I continue, let me be clear that all code style guidelines should be broken sometimes. For each situation, a guideline or design pattern may be better or worse, and I recognize that we are always working in shades of gray.

Generally, a block of code should be written for one purpose. If it is a method, then it should do one thing. If another thing needs to be done with an object, then that should be encapsulated in another method.

Here’s a hypothetical example that I’ve seen played out hundreds of times in my career.

A class needs to do task X. A programmer writes a method to do task X, but he accidentally includes logic that also does task Y. Later, he finds that he needs to do task Y on its own, so he writes a separate method for task Y. Now the logic for task Y lives in two places, and that's where the problem is compounded.

Later still, the definition of task Y changes. So another programmer has to rewrite task Y. He goes to the class, changes the standalone method for task Y, does a few quick tests, and proceeds on his merry way, unaware that the method for task X still contains the old version of task Y.

Then mysterious bugs start occurring. QA can't track them down to one thing, because they only show up sporadically, whenever the stale copy of task Y runs. Finally, it takes many man-hours to remove task Y from the method for task X.

In this example, the side effect led to code duplication, which led to trouble when updating the code, which led to bugs that cost many hours to track down. The fewer side effects you introduce, the easier your code will be to maintain.
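A stripped-down version of that story in code, with invented names, might look like this:

class Order:
    def __init__(self, items):
        self.items = items          # list of (name, price) tuples
        self.total = 0.0
        self.status = "new"

    def calculate_total(self):
        # Task X: compute the order total.
        self.total = sum(price for _, price in self.items)
        # Hidden side effect: a copy of task Y's logic is baked in here.
        self.status = "ready"
        return self.total

    def mark_ready(self):
        # Task Y, written later as its own method and later redefined.
        self.status = "ready_for_review"

When task Y's definition changes, only mark_ready() gets updated; the stale copy inside calculate_total() keeps running and quietly reverts the status, which is exactly the sort of sporadic bug that eats QA's time.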

These two examples show how side effects can derail development and why they are so inimical. We all write code with side effects occasionally, but it's our job to figure out how to do so in a way that doesn't make the code more difficult to maintain.