Writing a Keygen for Stata

Two courses in my major require Stata this semester: Regression Analysis and Selected Topics on Statistical Methods. Stata is proprietary statistical software, and it is not free.

Stata's student pricing

You can see that they're really asking for a lot, and it's not even perpetual. Before you mention that students can apply for a free license, I did apply for it, but I got this in reply:

...well, the fact that they can't legitimately provide me a license means that I can legitimately pirate it, right?

Stata about

Looking for samples

I searched for cracked versions of Stata on the Internet; most of the results were for Windows only and required patching the executable. Official downloads require a registered account and limit the number of downloads. I eventually found a stock version of Stata 17 for macOS, as well as several keys. The first key was invalidated after an online update, but the second one survived. I also discovered that the key labeled for Stata 18 works with Stata 17.

Once I had a working Stata 17, I figured out how to upgrade it to 18 by downloading offline update bundles from https://www.stata.com/support/updates/ and tweaking some files in the program directory so that the upgrade process recognizes the installation as compatible. However, some documents will be missing, so if you are a perfectionist, you will want to find a full installer for Stata 18 later (which can be grabbed from http://public.econ.duke.edu/stata/, FYI).

Reverse engineering

The tool I'm using is Binary Ninja. I'm surprised they didn't strip symbols from . This is not the case for the Windows version, nor for the GUI and CLI executables on macOS, so I guess they simply don't know how to do it safely for Mach-O dynamic libraries. Even without symbols, we could still easily locate the code that works with license keys by searching for strings like "perpetual". The only reference to that string is in a function named , which I think is short for "get license string", in which the serial number buffer is printed. I then looked for references to the buffer and found that both the write and the check of the serial number happen in a function named . That seems like a good place to start.

This function first reads a file in the installation directory and splits its content by "!" into 6 parts: serial number, "code", "authorization", user name, user organization, and a checksum. The checksum only validates the integrity of the file itself, in case someone edits this plain text file to change the user name or organization. Only the first 3 parts matter.
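The six-way split described above can be sketched in a few lines of Python. This is only an illustration of the layout: the names `LicenseFields` and `split_license_text` are made up for this post, the input is passed in as a string rather than read from any particular path, and stripping a trailing newline is an assumption on my part.

```python
from typing import NamedTuple


class LicenseFields(NamedTuple):
    """The six "!"-separated parts, in the order listed above."""
    serial_number: str
    code: str
    authorization: str
    user_name: str
    organization: str
    checksum: str


def split_license_text(text: str) -> LicenseFields:
    # Assumption: a trailing newline is not part of the last field.
    parts = text.rstrip("\r\n").split("!")
    if len(parts) != 6:
        raise ValueError(f"expected 6 '!'-separated parts, got {len(parts)}")
    return LicenseFields(*parts)


# Placeholder values only, to show the field order.
fields = split_license_text("SERIAL!CODE!AUTH!Jane Doe!Example University!CHECK\n")
print(fields.user_name, "/", fields.organization)
```

A named tuple keeps the field order visible, which is handy because the rest of this post only cares about the first three entries.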

Decoding

Then, the "code" and "authorization" are passed to , in which the following procedure happens:

is highly SIMD optimized even on ARM, so it is hard to read. I had to learn some NEON to understand it.

Interpreting

If the checksum checks out, the former part will be a "decoded" string consisting of 7 or 8 parts separated by "$", depending on whether it is for the MP edition or not. It should look like this:

9963000047$180$24$5$9999$a$$8
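As a quick sanity check on the format, here is a minimal Python sketch that splits such a string on "$" and verifies the part count. The helper name `split_decoded` is hypothetical, and it deliberately assigns no meaning to the individual fields.

```python
def split_decoded(decoded: str) -> list[str]:
    # str.split with an explicit separator keeps empty fields,
    # so the "$$" in the sample yields an empty part.
    parts = decoded.split("$")
    if len(parts) not in (7, 8):
        raise ValueError(f"expected 7 or 8 '$'-separated parts, got {len(parts)}")
    return parts


parts = split_decoded("9963000047$180$24$5$9999$a$$8")
print(len(parts))  # 8
print(parts)
```

Note that the sample above splits into 8 parts, one of which is empty.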

The meaning of each part is:

Additionally, some serial numbers are hard-coded to be invalid:

There are still uninvestigated code paths that might check the serial number itself, but they seem to be reachable only when a variable indicating debug mode is set, so they can be ignored.

Online updates

Now we know how to make our own "code" and "authorization", and we can happily perform offline updates, since we can swap serial numbers at any time. What about online updates? If we use serial numbers from the Internet, we get a "bad serial number" error. Using MitM packet capture tools, we can see that during the update process a request is sent to , where the is our serial number, to check whether our license is valid for updates. However, this API works in a way you might not expect: it is a blacklist! Both legitimate and non-existent serial numbers get a response with an empty body, while invalidated serial numbers get a response with an error in the body. So we can just generate a random serial number and check it by sending a request to this API. If the body comes back empty, we can use it for online updates. If you are using a network tool that supports request rewriting, like Surge, you can even write a rule that intercepts the request and returns an empty body.

Surge rewrite rule

The rest of the update process doesn't even check the serial number, so we can call it a day. Below is a video as a proof of concept.

Easter egg

If you are in a hurry...

Written on 2024/01/11