My thoughts on software licenses

Whenever you release a piece of code, a library, or any other software package, you should include a license so that other people may legally use your software. You may not think it’s a big deal, but if you don’t include a license, a lot of people won’t be able to use your code (or won’t feel comfortable doing so). This is an even larger issue when somebody needs to use it for a work project, since companies have to worry more about litigation than individuals do.

In the past, I’ve received requests to include a license with various packages. For example, GitHub.com asked me to specify a license for my sublime-nginx plugin so that they could include it with one of their packages.

As a rule of thumb, I always license my code under the MIT license, although I’m happy to use a similarly permissive license (for example, I was asked to relicense my Varnish Dashboard under the FreeBSD license, which I did). I do this because I write and release my open source projects mostly to help other programmers. I want them to be able to use my code however they see fit: including it in their own project, redistributing it, including it in a commercial project, etc. The MIT license, BSD license, and Apache license are examples of very permissive licenses, while the GPL is a well-known example of a non-permissive (copyleft) license.

Non Permissive Software Licenses

Here’s a breakdown of the rights you give people by releasing your software under the GPL (GNU General Public License):

  • People are free to copy the software wherever they want and as many times as they want
  • People are free to distribute the software however they want
  • People are free to charge a fee to distribute the software (but the redistributed software must still include the original license)
  • People are allowed to make whatever modifications to the software that they want

My problem with the GPL is with the last bullet point. People are allowed to make whatever modifications they want, but any project that includes GPL’d code must also be released under the GPL when it is distributed. This means that developers cannot ship GPL-licensed code as part of proprietary software. The GNU Lesser General Public License (LGPL) is a variation of the GPL that allows proprietary software to use an LGPL-licensed library, but you cannot relicense someone else’s GPL code as LGPL to magically work around this.

There are a lot of people who believe all software should be 100% open source, and try to avoid any software that isn’t. Richard Stallman, founder of GNU, is perhaps the most well known individual who has this mentality. He is so adamantly against closed source software, he recently decided against making changes to the GCC to allow better code intelligence in Vim because those changes could allow closed source software to access the intermediate code generated by the GCC without being compiled into the GCC (and therefore not required to be made available under the GPL). He handicapped the GCC and Vim to avoid benefiting non open source software. This is a choice programmers make, and I’ll reserve judgement on it. Luckily if people are unhappy with the licensing on a piece of software, they can create their own and license it however they wish (such as LLVM).

Permissive Software Licenses

In contrast to the GPL, the MIT license has very few stipulations about how you may use the software:

  • People can use, copy, and modify the software however they want. Nobody can prevent them from using it on any project, from copying it as many times as desired, or from changing it however they want to
  • People can give the software away for free or sell it. There are no restrictions on distribution.
  • People must redistribute the license with the software

Basically you’re saying people are free to do whatever they want with your software, as long as they distribute it with the license.

Source Code != Open Source

An important thing to note is that a company or individual may publish source code to a program or library without making it open source. The main distinction is that people are free to use open source software however they want, which isn’t the case for all proprietary/commercial software. For example, Epic Games has published the full source code for the Unreal Engine. You are free to download the Unreal Engine source code, and modify it, but you cannot freely re-distribute it. You must abide by the license Epic Games has created, part of which includes a royalty agreement.

Closing Thoughts

Software licensing, permissive vs non-permissive, closed source vs open source, etc. are all debates that many people are passionate about, and people normally lean very hard towards one side or the other. Personally, I prefer permissively licensed open source software. I want my OS to be open source, and I like open source programs for the most part (it’s nice to see how things work, and to be able to contribute if I want to). However, I’m not against closed source software. There are people who were against Steam moving to Linux because the Steam client and many Steam games are closed source, but I think that’s ridiculous. I don’t think it’s fair to expect everybody to release their work for free.

Releasing the source code for your software doesn’t necessarily mean your work is free, but it’s very likely that you’ll see reduced profits from it.

I do commend commercial developers who release their source code, under any license. For example, Keen Software House recently released the source code for Space Engineers. You still have to buy the game, and you can’t re-use the code, but you can contribute to the code (which is awesome!), it makes modding easier, and it’s helpful to see how such a complex game is created.

Understanding S3 permissions

Amazon S3 is a great (and cheap) content storage and delivery service, used by millions of websites and applications around the world. However, their permissions system can be opaque to new users, and difficult to understand, due to the variety of ways you can set permissions, and inconsistent terminology in different UIs. This post aims to clear up the confusion.

First of all, it’s important to understand some AWS/S3 terminology:

  • Buckets are like a top level folder in S3. You typically create a bucket for each individual requirement you may have (e.g. an uploads bucket, a backup bucket, maybe a cdn/asset bucket)
  • Objects are files inside of a bucket. Each object can have its own permissions
  • IAM stands for identity and access management, and is the system used to create sub-accounts with limited permissions for your main AWS account (which you should almost never be using, kind of like the root user).

Also worth noting: while it may appear that S3 has folders, it really doesn’t. It just has file names that look like folders (e.g. path/to/some/file.jpg isn’t an object named file.jpg in a folder, it’s an object named path/to/some/file.jpg that’s in a specific bucket).
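To make the “no real folders” point concrete, here is a minimal sketch in plain Python. It makes no AWS calls, and the list_keys helper and the example keys are made up for illustration. It only mimics the idea of listing flat key names with a prefix and a delimiter, which is what makes them look like folders.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Mimic a folder-style listing over flat key names.

    Returns (objects, common_prefixes): keys directly "inside" the prefix,
    and the folder-looking prefixes one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter looks like a sub-folder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical bucket contents: just three flat object names.
bucket = ["path/to/some/file.jpg", "path/to/other.jpg", "readme.txt"]

print(list_keys(bucket))
print(list_keys(bucket, prefix="path/to/"))
```

The “folders” only exist as a by-product of how the names are split; nothing in the bucket itself is a directory.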

Getting Started

When dealing with S3, you have two distinct permission systems. Access Control Policies (ACPs) are a simplified permission system primarily used by the web UI, which basically just wraps the other permission system in a layer of abstraction. The other system is IAM access policies (broken down into user and bucket policies depending on what you apply them to), which are JSON objects that define very fine-grained permissions.

The other thing to keep in mind is that permissions can apply either to a bucket or an object. Bucket permissions are different than object permissions, and are tracked differently. If you want someone to be able to view the list of files in a bucket, and actually view/download those files, you must grant them permission on the bucket itself, as well as each object.

ACPs/ACLs

ACPs (access control policies) or ACLs (access control lists) are a very simplistic permission system offered by S3. When using the web UI, the “Permissions” tab of an Object’s properties represents the ACP. Objects and Buckets can each have an ACL, and offer similar permissions.

Bucket ACLs affect bucket operations, but not operations on the contents of the bucket. For example, you may have the read permission on a bucket, so you can get a list of objects in the bucket, but if you don’t have the read permission on an individual object, you won’t be able to view it.

ACLs have 4 available permissions:

  • read (The web UI displays this as “List” for buckets and “Read/Download” for objects)
    • When applied to a bucket: authorized users can list the file names, their sizes, and last modified dates from a bucket
    • When applied to an object: authorized users can download the file
  • write (The web UI displays this as “Upload/Delete” for buckets, and does not display the option for objects)
    • When applied to a bucket: authorized users can upload new files to the bucket. They can also delete objects (even if they don’t have permissions on THAT object).
    • When applied to an object: authorized users can replace the file or delete it
  • read-acp (the web UI displays this as “View Permissions”)
    • When applied to a bucket: authorized users can view the ACL of the bucket
    • When applied to an object: authorized users can view the ACL of the object
  • write-acp (the web UI displays this as “Edit Permissions”)
    • When applied to a bucket: authorized users can modify the ACL of the bucket
    • When applied to an object: authorized users can modify the ACL of the object

Aside from the permissions, you must also specify the “Grantee”. An ACL can hold up to 100 grants, each giving some combination of the above 4 permissions to a specific Grantee. The grantee can be the email address of an AWS account to grant permissions to (the grant covers all IAM users in that account), or one of the special predefined groups that AWS provides.

You cannot apply an ACL to an individual IAM account.

The special groups available as grantees are:

  • aws – Represents your own AWS account (including all IAM users)
  • AllUsers (displayed as “Everyone” in the web UI) – Allow anyone to access the resource. Allows authenticated or anonymous requests. Essentially the “public” group
  • AuthenticatedUsers (displayed as “Authenticated Users” in the web UI) – Allow any AWS account to access the resource. Requests must be signed.
  • LogDelivery (displayed as “Log Delivery” in the web UI) – Allows AWS to write server access logs to the bucket

ACL Permissions and their corresponding S3 operations

ACL permissions are just a combination of more fine grained AWS policy permissions. If you want to know exactly which operations you are permitting, refer to the following:

  • read
    • Buckets: s3:ListBucket, s3:ListBucketVersions, and s3:ListBucketMultipartUploads
    • Objects: s3:GetObject, s3:GetObjectVersion, and s3:GetObjectTorrent
  • write:
    • Buckets: s3:PutObject and s3:DeleteObject
    • Objects: Not applicable
  • read-acp:
    • Buckets: s3:GetBucketAcl
    • Objects: s3:GetObjectAcl and s3:GetObjectVersionAcl
  • write-acp:
    • Buckets: s3:PutBucketAcl
    • Objects: s3:PutObjectAcl and s3:PutObjectVersionAcl
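If you want to check what a set of ACL grants really permits, the list above can be turned into a simple lookup table. This is only a sketch: the ACL_OPERATIONS dictionary is copied from the list above, and the operations_for helper is a name I made up for illustration, not part of any AWS SDK.

```python
# ACL permission -> underlying S3 policy actions, copied from the list above.
ACL_OPERATIONS = {
    "read": {
        "bucket": ["s3:ListBucket", "s3:ListBucketVersions",
                   "s3:ListBucketMultipartUploads"],
        "object": ["s3:GetObject", "s3:GetObjectVersion", "s3:GetObjectTorrent"],
    },
    "write": {
        "bucket": ["s3:PutObject", "s3:DeleteObject"],
        "object": [],  # not applicable to objects
    },
    "read-acp": {
        "bucket": ["s3:GetBucketAcl"],
        "object": ["s3:GetObjectAcl", "s3:GetObjectVersionAcl"],
    },
    "write-acp": {
        "bucket": ["s3:PutBucketAcl"],
        "object": ["s3:PutObjectAcl", "s3:PutObjectVersionAcl"],
    },
}


def operations_for(permissions, target):
    """Return the sorted actions that a list of ACL permissions allows.

    target is either "bucket" or "object".
    """
    actions = set()
    for permission in permissions:
        actions.update(ACL_OPERATIONS[permission][target])
    return sorted(actions)


print(operations_for(["read", "read-acp"], "object"))
print(operations_for(["write"], "bucket"))
```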

Bucket Policies

Bucket policies are AWS Access Policies that apply to a specific S3 bucket. They are a great way to apply more fine-grained access controls to an entire bucket, or to apply the same permissions to a large number of objects without having to change each object manually. A bucket policy can apply to the bucket itself and to all objects in the bucket, although you can easily specify particular resources or patterns (e.g. you could specify a resource of “movies/*” to apply the permissions to all objects in the movies folder).

You can add a policy to your S3 bucket using the web UI. The action is under the Permissions tab of the bucket properties.

For information on creating access policies, keep reading.

IAM User Policies

User policies are AWS Access Policies that apply to a specific S3 IAM user account. They are a great way to apply very limited permissions to an IAM role.

Here are a few common use cases:

  • A role used for database backups should only be able to create objects, and only in a specific bucket, not view/delete them (s3:PutObject)
  • A role used for a dashboard or some other sort of display should have read only permissions on the bucket/objects

Since ACL permissions can’t be applied to a specific IAM account, IAM user policies are the answer. It can be difficult to decide if you should use an IAM or bucket policy in some cases. If you want to give a specific user permissions across various buckets, an IAM policy is probably best. Also, if you have a large number of users each needing different sets of permissions, IAM policies may be more suitable than a bucket policy, as bucket policies are limited to 20 KB.

You can use inline user policies, group policies, or managed user policies.

Unlike bucket policies, you do not specify the principal for a user policy, as it always applies to whichever user is performing the operation.

Access Policies

An access policy is made up of one or more statements. Each statement specifies principals, which define who the statement applies to, resources, which define what the statement applies to, and the operations that are allowed or denied.

Here is an example bucket policy (you’ll note that it’s a JSON object):

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "ExampleStatement1",
         "Effect": "Allow",
         "Principal": {
            "AWS": "arn:aws:iam::Account-ID:user/brandon"
         },
         "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket",
            "s3:GetObject"
         ],
         "Resource": [
            "arn:aws:s3:::examplebucket",
            "arn:aws:s3:::examplebucket/*"
         ]
      }
   ]
}

As a note, the version must be one of the defined policy language versions. At the time of this post, 2012-10-17 is the latest version.

The example statement given uses the following elements:

  • Sid (statement id) – An optional identifier for a policy statement
  • Effect – A required element that specifies whether the statement will result in an allow or a deny. Can be one of “Allow” or “Deny”.
  • Principal – A required element that specifies the user (IAM user, federated user, or assumed-role user), AWS account, AWS service, or other entity that the statement applies to. For user policies, the principal is omitted as the policy always applies to the current user performing an operation
  • Action – Describes which specific actions will be allowed or denied (based on the specified Effect). Each AWS service has its own set of actions that describe tasks you can perform with that service. You can use the wildcard character (*) to grant broad permissions (e.g. s3:* will allow a user to perform any valid S3 operation on a resource). Please note that there is also a “NotAction” element, used in place of Action, that matches every action except the ones listed.
  • Resource – Describes which objects the statement applies to, using ARN (Amazon Resource Name) format. There is also a “NotResource” element, used in place of Resource, that matches every resource except the ones listed.

For a full list of available elements, their values, and what they do, refer to the AWS documentation on Access Policy Language Elements.

Here is an example of an IAM user policy that allows the user to upload files to a specific folder in a specific S3 bucket, but explicitly denies all other operations (regardless of other policies that may grant permissions on it). Perfect for a backup user:

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "BackupUserDenyEverything",
         "Effect": "Deny",
         "NotAction": "s3:PutObject",
         "Resource": "*"
      },
      {
         "Sid": "BackupUserAllowUploads",
         "Effect": "Allow",
         "Action": [
            "s3:PutObject"
         ],
         "Resource": [
            "arn:aws:s3:::mybackups/mysql_backups/*"
         ]
      }
   ]
}

You’ll note that we use a combination of two statements to get the desired effect. The first statement prevents the user from performing any AWS operation except for s3:PutObject. However, it doesn’t actually grant the user permission to use the s3:PutObject operation so we must create a second statement to do that.

Evaluating Permissions

The last piece of the puzzle is how to determine if a request will be allowed or denied, which can be confusing if you don’t know the rules. However, once you know the rules, it’s very straightforward.

[Diagram: access policy language evaluation flow]

As the evaluation flow diagram shows, it’s actually quite simple. If any of your policies explicitly deny an operation (e.g. our example user policy), the end result is a Deny. If any of your policies allows an operation, and there are no explicit denies, then the operation is allowed. If none of your policies explicitly allows or denies an operation, the operation is denied.
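The three rules can be sketched in a few lines of Python. This is a simplified model, not how AWS implements it: it ignores principals and conditions, it uses shell-style wildcard matching from the standard library as a stand-in for policy wildcards, and the function names are my own. The statements mirror the backup user policy from earlier.

```python
from fnmatch import fnmatchcase


def matches(patterns, value):
    """True if value matches any pattern; a single string counts as one pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def statement_applies(statement, action, resource):
    """Check whether a statement covers this action on this resource."""
    if "NotAction" in statement:
        action_ok = not matches(statement["NotAction"], action)
    else:
        action_ok = matches(statement.get("Action", []), action)
    return action_ok and matches(statement.get("Resource", []), resource)


def evaluate(statements, action, resource):
    """Explicit deny wins, then any allow, otherwise deny by default."""
    decision = "Deny"  # rule 3: nothing matched, so deny
    for statement in statements:
        if not statement_applies(statement, action, resource):
            continue
        if statement["Effect"] == "Deny":
            return "Deny"  # rule 1: an explicit deny always wins
        decision = "Allow"  # rule 2: allowed unless a deny shows up later
    return decision


backup_policy = [
    {"Effect": "Deny", "NotAction": "s3:PutObject", "Resource": "*"},
    {"Effect": "Allow", "Action": ["s3:PutObject"],
     "Resource": ["arn:aws:s3:::mybackups/mysql_backups/*"]},
]

dump = "arn:aws:s3:::mybackups/mysql_backups/dump.sql"
print(evaluate(backup_policy, "s3:PutObject", dump))
print(evaluate(backup_policy, "s3:GetObject", dump))
print(evaluate(backup_policy, "s3:PutObject", "arn:aws:s3:::mybackups/other/x"))
```

The upload into mysql_backups is allowed, the download hits the explicit deny, and the upload to a different prefix is denied by default because no statement allows it.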

Conclusion

Hopefully after reading this post you have a basic understanding of S3 permissions and how to use them securely. I strongly recommend reading the AWS documentation for more information.

Understanding Git Rebase

I use git’s rebase command daily. It’s an invaluable tool for maintaining a clean and sane Git history. However, most people find it difficult to understand, or use it incorrectly, as it’s not the clearest command to use.

The first thing to understand is that rebasing typically refers to two different (but similar) operations:

  • “Rebasing a branch” is the most common use of rebase, and refers to pulling changes from an upstream branch (like master or develop) into a feature branch (rebasing your branch is cleaner than merging the upstream branch into your branch)
  • “Interactive rebasing” can refer to cleaning up your commit history, squashing commits, and editing commit messages. You typically do an interactive rebase before submitting a pull request/patch/branch for review

Rebasing your branch

Say you have a main branch called master. Of course, you don’t develop directly against master, as that would be bad. Instead, you create feature branches (e.g. feature/foo-widget), develop on those, and when they’re done, you merge them back into master (or submit a pull request).

Now let’s say you created your feature branch a week ago, and now there are changes on master that you need on your branch to continue. You could simply merge it:

git merge master

Which is what I see a lot of people do. However, this dirties up your git history and is not the correct solution. Instead, you should rebase:

git checkout master
git pull
git checkout feature/foo-widget
git rebase master
git push -f origin feature/foo-widget

The very basic explanation is that you rewind all of your commits on feature/foo-widget, fast-forward feature/foo-widget to the latest commit on master, then reapply each of your commits on top of the latest commit from master.

It’s important to note that you should never rebase a shared branch (like master or develop) as it rewrites history and requires a force push, which can disrupt other developers.

Here is a more detailed walkthrough:

  1. Branch develop has commits 1-20
  2. You fork off of develop to create feature/foobar, from commit #20
  3. You add commits 21-24 to feature/foobar
  4. Some other dev adds commits 21-22 to develop
  5. You want to get the most recent changes from develop
  6. You run ‘git rebase develop’ on your branch and this happens:
    1. Git looks at the last shared commit between feature/foobar and develop which is commit #20
    2. Git rewinds feature/foobar to commit #20, storing your commits off the branch
    3. Git fast forwards feature/foobar to commit #22 FROM the develop branch
    4. feature/foobar is now the same as develop
    5. Git re-applies each commit you had before (21-24) on top of #22, becoming 23-26
  7. If you run git status, you’ll likely see something like this:

    Your branch and ‘origin/feature/foobar’ have diverged, and have 6 and 4 different commits each, respectively.

    This is because commits are identified by their SHA1 hash, and your original 4 commits were rolled back and re-applied, which gives them a different SHA1 hash. Git now thinks you’re missing the original 4 commits, and sees you have 6 new commits, the 2 new ones from develop and the 4 new commit hashes from re-applying your original 4 commits.

  8. Now however, if you try and push it’ll fail because the upstream version of feature/foobar has commits #21-24 that you made, which aren’t the same as your local branch’s commits #21-24, so you have to force push like this: git push -f origin feature/foobar. It is critical to always specify the branch. -f will overwrite the remote branch allowing you to push your corrected branch
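Here is a toy model of the walkthrough above that shows why the counts come out as 6 and 4. It is a sketch under a big simplification: a commit’s ID here is just the SHA-1 of its parent’s ID plus its message, whereas real Git also hashes the tree, author, and timestamps. The helper names are made up for illustration.

```python
import hashlib


def commit_id(parent_id, message):
    # Simplification: real Git also hashes the tree, author, and timestamps.
    return hashlib.sha1(f"{parent_id}\n{message}".encode()).hexdigest()


def apply_commits(base_id, messages):
    """Stack one commit per message on top of base_id and return the new IDs."""
    ids, parent = [], base_id
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids


fork_point = commit_id("", "commit 20")  # last commit shared with develop
feature_msgs = ["feature 1", "feature 2", "feature 3", "feature 4"]

# What you pushed to origin/feature/foobar before rebasing.
remote_feature = apply_commits(fork_point, feature_msgs)
# The two commits another developer added to develop.
develop = apply_commits(fork_point, ["develop 1", "develop 2"])
# After the rebase: same messages, re-applied on top of develop's tip.
rebased = apply_commits(develop[-1], feature_msgs)

local = set(develop + rebased)
remote = set(remote_feature)
print(len(local - remote), "commits only on your local branch")
print(len(remote - local), "commits only on the remote branch")
```

Because each ID depends on the parent’s ID, changing the parent of the first feature commit changes every ID after it, even though the messages and changes are the same.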

Git Pull With Rebase

Another useful trick I use is git pull --rebase instead of a regular git pull.

Let’s say you have some local changes on feature/foobar that you haven’t pushed yet, and your co-worker just pushed his local changes to feature/foobar. If you do a regular git pull, git will do a merge, and you’ll end up with ugly git history and a commit message like this:

Merged ‘feature/foobar’ into ‘feature/foobar’

You want to avoid this clutter, so if you run git pull --rebase, it’ll do the following:

  • Rewind your local branch to the last commit that is shared with the remote
  • Pull down the latest changes from the remote branch and apply them using a fast-forward
  • Re-apply your local changes

Then you can do a regular git push. Since you are only modifying history of commits you haven’t yet pushed, you do not need a force push.

Interactive Rebasing

Interactive rebasing is the use of git rebase with the -i flag. This tool lets you edit commit history which can be very useful when you have a feature branch that you want to clean up before submitting it. It’s very common to squash commits (combine multiple commits), edit commit messages, amend commits (roll back to a specific commit, make a change, then re-apply later commits), all of which you can do with interactive rebasing.

To start an interactive rebase session, you must specify the range of commits you wish to deal with. The most common way of doing this is using HEAD~n where n is some number of commits:

git rebase -i HEAD~5

The above command will start an interactive rebase session with the last 5 commits. After running that command, you’ll likely see vim (or some other editor) pop up with something like the following:

pick 1f7036f More useful terminal title
pick d61a3b3 Useful git aliases
pick 8439627 Add git pushu
pick cc8ba32 Tweak default email
pick daadd1b Add install script

# Rebase df47733..daadd1b onto df47733
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#

At the top you see the 5 commits you are dealing with, from oldest to newest. In the comments, you see various actions. For example, if you modify the text to put this:

pick 1f7036f More useful terminal title
pick d61a3b3 Useful git aliases
s 8439627 Add git pushu
pick cc8ba32 Tweak default email
pick daadd1b Add install script

Git will combine d61a3b3 and 8439627 into a single commit (this is called squashing a commit). Like rebasing your branch, an interactive rebase alters git history. Each commit you modify will get a new commit hash, so as far as git is concerned, it is a completely different commit. This means that if you have previously pushed the commits you altered, you’ll have to do a force push on your branch. This also means you should never interactively rebase a shared branch like master or develop.
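To illustrate how the todo list is read, here is a small Python sketch that processes the edited list from top to bottom. It only models pick and squash, it is not how Git is implemented, and the apply_todo helper is a made-up name. In a real session Git would also open your editor so you can tidy up the combined commit message.

```python
def apply_todo(todo_lines):
    """Process pick/squash lines in order and return the resulting commits.

    Each result records which original commits it came from and its message.
    """
    result = []
    for line in todo_lines:
        command, sha, message = line.split(" ", 2)
        if command in ("p", "pick"):
            result.append({"from": [sha], "message": message})
        elif command in ("s", "squash"):
            if not result:
                raise ValueError("squash needs a previous commit to meld into")
            # Meld into the commit directly above it in the list.
            result[-1]["from"].append(sha)
            result[-1]["message"] += "\n\n" + message
        else:
            raise ValueError(f"command not modelled in this sketch: {command}")
    return result


todo = [
    "pick 1f7036f More useful terminal title",
    "pick d61a3b3 Useful git aliases",
    "s 8439627 Add git pushu",
    "pick cc8ba32 Tweak default email",
    "pick daadd1b Add install script",
]

for commit in apply_todo(todo):
    print(commit["from"], commit["message"].replace("\n\n", " + "))
```

Five lines go in and four commits come out, with the second one built from d61a3b3 and 8439627.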

Enhanced Varnish Dashboard

I run Varnish on a number of servers, and I don’t always have a full metrics stack (e.g. Graphite/Statsd/Collectd) set up. Also, sometimes I just want a real time dashboard to watch traffic (or my clients do).

I’ve been using Varnish Agent 2 + the ITLinuxCL dashboard, which is lacking to say the least. The dashboard isn’t maintained, is very minimal, and somewhat broken. So I set out to build my own dashboard.

I wanted to make good use of the Varnish Agent API, so I added capabilities to purge URLs, view/upload VCLs, view params, logs, etc. I also added support for multiple Varnish backends, so you could host the dashboard somewhere else and just point it at multiple Varnish instances. This is currently held up until I get a patch merged into vagent to add CORS headers (currently pending review).

There are screenshots on GitHub, so check it out: https://github.com/brandonwamboldt/varnish-dashboard

Before asking for help, make sure you understand your tools

I’m writing this blog post in response to Why Rockstar Developers don’t Ask for Help and So you want to be a Developer Rockstar?. Despite the use of the ridiculous term “Rockstar Developer” and quite a lot of humble bragging, I think the author is on to something, but just barely missed the mark.

I wouldn’t say that good developers shouldn’t ever ask for help; that’s craziness. I’d change the wording a bit:

Try not to ask for help if you don’t understand the tools/software that you’re having problems with

What do I mean? If you’re trying to figure out how to rebase your branch in Git, or why PHPUnit isn’t running, or how to fix a particular C error, you’re probably uneducated about something relevant to the problem (e.g. you don’t understand rebasing or Git’s internal data representation). If you learn more about Git, you’ll have the understanding to fix the problem, and probably many more related problems (or variations of the same problem). On the other hand, if you just ask your co-worker for the command to rebase a branch, you don’t understand what is happening or why it works, so the problem remains. Put in the time to learn your tools, libraries, frameworks, etc.

If people took the time to learn instead of trying to barely skate by with as little effort as possible, there would be far fewer interruptions in the workplace, far more productive and well educated developers, and significantly less noise on StackOverflow.

Of course this isn’t always the case. Sometimes you reach the point where you’ve spent too much time on the problem and need help getting to the next step. Or maybe you’re so lost you don’t even know where to start (in which case you should ask for pointers, not a complete answer). And of course, you may not have the time to learn something well enough to continue due to deadlines, but if this is the case, you should definitely follow up afterwards when you aren’t racing the clock.

When should you ask for help? A common type of problem is when something should be working. You understand the system and what’s going on, but it’s not working the way you’d expect. You aren’t blindly running commands with no understanding of how or why they work. Maybe you’ve overlooked something, or it’s just some small thing your co-worker will point out right away. I think these types of problems are fine to get help with.

Likewise, questions about domain-specific knowledge, where you need an understanding of the business, are typically fine to ask a co-worker. How do you interface with this heavy duty crane transmission, what does this scientific shorthand mean, or what are the rules around who can access what content?

I also want to point out that this doesn’t apply to getting feedback, such as asking a co-worker for feedback on your design, approach, or code.