It is very useful to convert logs to JSON, because JSON is immediately consumable by almost all general-purpose data visualization tools — everything from jQuery UI to matplotlib “speaks” JSON. So with Git logs converted to JSON, it becomes possible to perform all sorts of ad hoc historical analysis of source code repositories.
However, to my knowledge there is no simple, stand-alone tool to do the conversion! In practice, converting Git logs to JSON requires either relying on some large third-party library that has already implemented git-log-to-JSON functionality, or writing regular expressions that turn out to be a bit of a pain in the ass.
Since there’s no commonly available simple tool, I wonder how many people wind up putting off their Git log analysis because of the time overhead of converting the output to JSON. That would really be too bad, because code churn metrics are among the best predictors of bugs, as discussed in this video from GTAC, which details a recent study of the Eclipse editor code base:
Here’s a gist I wrote, which hopefully takes the mystery out of converting Git logs to JSON. It has since been referenced from a popular question on StackOverflow. I’m glad someone else found this useful!
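The core of such a conversion is roughly the following sketch (it is not the gist itself; the JSON field names are my own choice, and a subject line containing quotes would need real escaping — %f, the “sanitized” subject, sidesteps that):

```shell
# %H = full hash, %an = author name, %ad = author date,
# %f = sanitized subject line (spaces become dashes, safe inside JSON).
git log --pretty=format:'{ "sha": "%H", "author": "%an", "date": "%ad", "subject": "%f" },' |
  sed -e '1s/^/[ /' -e '$s/,$/ ]/'   # wrap the objects in a JSON array
```

Run inside any Git repo, this emits one JSON object per commit, wrapped in an array.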
I set up Plato for continuous static analysis recently and it was pretty simple. I’ve provided the code here: tl;dr! just take me to the code!
The other day I published the table of contents for a forthcoming book on technical QA. Since then I have several times heard the criticism that many QA analysts could not get started with this process as written.
I agree that setting up a simple Web site with PHP is beyond the technical skills of many people who would otherwise like to learn how to perform technical QA.
But I’d say to everyone who objects —are you seriously telling me that you think someone can “learn to be a technical tester” without also getting to know how the Web stack works?
learning to automate browsers well is hard enough without having to learn to program at the same time. learn to program first.— adam goucher (@adamgoucher) May 16, 2013
The mailing lists for open source test tools are full of frustrated people who are legitimately experts in black-box testing, deduction, bug finding and problem-solving. But they are trying to apply their limited or nonexistent software engineering expertise to extending tools like Selenium.
But Selenium is one of the most complex web development tools out there. Selenium is a program-inside-a-server-inside-a-Web-browser, and that’s before you’ve written a line of your own test code, let alone picked a test harness and gotten your changes to run in CI.
It’s true that what I’m describing is a learning process that at first glance may appear totally beyond the capability of most QA analysts. But it’s also the case that… this is the only road I know —that anyone knows— to actually being a contributing member of a software test engineering team.
The challenge then is to work out how a Web site’s “non-technical staff” (eg, QA Analysts) can bridge the software engineering knowledge gap.
The other day I published the table of contents for a forthcoming guide to self-teaching the “hard skills” associated with technical QA. Since then I have several times heard the criticism that many QA analysts could not get started with this process as written.
I agree that setting up a simple Web site with PHP is beyond the technical skills of many people who would otherwise like to learn how to perform technical QA.
Let’s face it: automated GUI-driven browser testing is one of the most complex and badly-understood domains available to the software engineer.
Software engineers don’t understand or implement automated testing very well yet. The discipline is young. The first known examples of what we now call “developer testing” only go back to the late 1960s. In much of the literature of computer science, testing is not mentioned at all.
In order to operate successfully as a QA engineer, one must first become a software engineer. And not just any software engineer, but one who focuses on testing, which I’ve just argued is a hard area to focus on.
dear novice automators; stop trying to learn java!— adam goucher (@adamgoucher) May 16, 2013
I don’t think that anything I’ve said above devalues or detracts from the real concern that managers and senior engineers feel about asking people to take on work that will be very difficult to complete.
But that’s why the title of my table of contents contains that magical phrase teach yourself. I’m not writing a technical manual to be used in the enterprise. In order to follow the program I’m outlining, someone would need to dedicate a lot of time, on an ongoing basis, for a year or more. It would require giving up weekend and evening time to study. It might require volunteering for risky projects at work in order to get to work with new technology in order to learn about it. Anyone who tried it would have to be slightly obsessed, to be honest.
But hey, that approach worked for me, for the most part, so far ;-)
So bear with me. I do hear your concerns. But also, this is hard stuff. It’s not for everybody. But I’m writing about the topic because I think we as engineers exclude too many talented people who would be excellent QA engineers —excellent engineers— given the proper opportunity to self-teach. I know there’s a middle ground here. I just need (a lot of) your help to find it.
Almost no one would accept a text editor or IDE that lacked colorful syntax highlighting for source code. Color-coded fonts make program text dramatically more readable and thus greatly reduce overall cognitive load for engineers.
I spend a lot of time in the shell and I view a lot of files via less. Over time I became increasingly frustrated that less — my tool-of-choice for reading code — lacked the ability to colorize text. I’d consider that a dealbreaker in my editor, so why accept the constraint in one context but not the other? Fortunately, on OSX and Linux it’s easy to use source-highlight to enable pretty-colored syntax highlighting for less!
Most of the world’s popular programming languages are supported, as long as the source file has a canonical file extension. Even files lacking an extension get proper syntax highlighting, as long as the file starts with a shebang line. That last part is pretty cool: it’s different-and-better than most IDEs.
I’ve put together a gist (embedded below) which contains everything you need to know in order to get syntax highlighting working with less.
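The setup boils down to two environment variables. Treat this as a sketch: the exact name and location of source-highlight’s helper script varies by package manager, so check where yours installed it.

```shell
# Assumes GNU source-highlight is installed and provides
# src-hilite-lesspipe.sh somewhere on your PATH.
export LESSOPEN="| src-hilite-lesspipe.sh %s"
export LESS=' -R '   # tell less to pass ANSI color codes through
```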
You can use less -N to enable line numbers when viewing source code with less. Combined with typing ‘/’ to search by regular expression, less is a code reader whose capabilities are on par with the best code editors.
This is the first draft of the table of contents of a book that I have been writing.
It’s worth noting that this entire program can be worked through without spending a penny on proprietary software (with the optional exception of Charles). It also does not require a powerful computer — you could do all this on a five year old MacBook.
Use PHP to put up your own Web site. Spend a couple of weeks making it a reasonably decent looking site. It doesn’t matter what the site is or whether it has any transactional functionality. Just some pages that display photos and content is fine.
Use lynx --dump to retrieve the contents of your Web site. Just hardcode all the page URLs. Redirect all the content to flat files, then use grep to look for patterns in your content. Start by looking for mistakes you commonly make. Save your greps in a file.
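A sketch of that step; the URLs and typo patterns below are placeholders for your own pages and your own common mistakes:

```shell
# Hypothetical pages; substitute your real URLs.
lynx --dump http://example.com/        > home.txt
lynx --dump http://example.com/photos  > photos.txt
# Typos I commonly make; keep growing this list and save it in a file.
grep -nE 'teh|recieve|seperate' *.txt
```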
Do your greps do what you want now? You might want to stop and read the Mastering Regular Expressions book. You will never once in your long career regret taking the time to read this book.
Use wget to retrieve source files from your site. Save the files in a directory. Then write shell scripts to apply linters (eg, csslint) to the files. What did you learn?
Write a grep wrapper for the static analysis performed in the previous step. Filter out boring lines and/or highlight interesting lines in red. Save it all and check it in to source control. Seriously, if you are still not in source control at this point, STOP NOW and figure it out. Otherwise, congratulations: you are just about to lose several hours of valuable work. Prepare to go back to step 3 and start over! Sorry to be harsh, but you WILL lose work. CHECK YOUR SITE INTO SOURCE CONTROL TOO.
Use curl to retrieve source instead of wget. What is different? Which one do you like better? Check it into source control.
Now go back to your wget script and use your text editor to batch-replace the end of every wget line with &. Now all your wgets run at once. What happens? Which strategy do you like better? Create a branch called “concurrent,” check in your new code and push it to github.
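Those Git steps look roughly like this (the script name is hypothetical; substitute your own):

```shell
git checkout -b concurrent       # create and switch to the new branch
git add fetch-pages.sh           # your modified wget script
git commit -m 'run all wgets concurrently'
git push origin concurrent       # publish the branch to github
```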
Now use GNU Parallel to do the same thing. Spend some time squelching wget log noise if necessary.
Now start your script, open a separate window and run top. What does it all mean? What did you learn once you understood it? Now repeat with iotop and iftop. Check your bash history and see if there are any lines in there worth remembering (spoiler alert: yes there are). Drop those lines in a new lib/snippets.sh and check it in to source control.
Is your wget script overpowering your CPU? You can go back and tweak GNU Parallel to run fewer concurrent jobs. Check your config change into source control.
Implement the same script with curl. What is different? What is the same?
Extract the GNU Parallel runner into its own file, so that there is no duplicated code between the curl and wget versions. Learn how to source shared code from another shell script.
Now merge your “concurrent” branch back into master. You can always go back to just before the merge commit if you want to play with the old single threaded version. Tag the revision if you want, and/or just put in a comment you can remember to search for later.
Use curl to examine HTTP headers. These won’t make sense unless you go and read Wikipedia and StackOverflow and ServerFault and get a good understanding for yourself of what happens during the life of a Web request. You will never once regret having spent the time to learn this information.
Now use wget/curl to examine the headers for your Web site. Does it return the headers you would expect? What about for pages that don’t exist or have moved? Fix your site if the headers aren’t working properly.
Save your favorite header checks in SCM.
Wrap your header checks in greps that filter out boring lines.
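Such a wrapped header check might look like this sketch; the URL and the set of “interesting” headers are placeholders for your own:

```shell
# Fetch only the headers (-I), silently (-s), then keep the lines
# worth looking at: status, redirects, caching and content type.
curl -sI http://example.com/ | grep -Ei '^(HTTP|Location|Cache-Control|Content-Type)'
```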
Go to your Web server and start a GNU screen session. You will have to spend some time reading up on what screen is and how to control it. Once you have this information you can pretty much always set up a CI server of your own on a Linux box, regardless of the other parameters of the environment.
Start your for-loop-wrapper inside the screen session. Congratulations! You are now running a production monitoring daemon.
You could use cron instead of a for loop, and then you would get email notifications for free (assuming your ops person has configured your host with a mail account). Go and learn how cron works; it’s simple, and you will use this knowledge about five times an hour once you start dealing with software systems at Web scale. Cron runs basically everything on the Web.
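A hypothetical crontab entry for the monitoring script (the path is made up; by default cron mails any output the command produces to the crontab’s owner):

```shell
# m  h  dom mon dow   command
*/5  *  *   *   *     /home/you/bin/check-site.sh
```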
Now use Chrome Web Inspector Net tab to look at HTTP traffic on your site. It is assumed you learned the basics of Chrome inspector during the initial 2 weeks you spent building your site.
Use Charles to view the same traffic (optional if you don’t want to pay for Charles).
Use tcpdump to view network traffic. You just have to learn to filter tcpdump down to your Web server traffic; then you can stop. tcpdump is huge, and you should not try to learn all of it now. It’s OK if the commands that work for you seem somewhat obtuse and magical. You can come back and fill in those gaps in your knowledge later. For now you need to be thinking about TCP packets.
Go learn how TCP conversations work during the life of a Web request. You will use this information every day for the rest of your career as a Web developer.
Now go back and look at tcpdump again. Use wireshark to visualize and filter.
Save your tcpdump scripts in SCM.
Use cut and histo to graph log messages in real time. Save it in SCM, leave it running inside screen. Congrats, you built a log monitoring service.
Take a day to fix your worst log boo-boos. Watch the graphs of “bad” messages go down. Take screenshots, celebrate! Now you have built a “refactoring dashboard” like the one Ross Snyder talked about at Surge 2011.
Now you are probably ready to get your head around the Selenium stack, one of the most complex application stacks in the Web industry.
First install wd (pronounced “would”) and open a wd shell. Follow the steps in the tutorial and get them to actually work. This will probably take a couple of days, and that’s OK. Go slowly and bookmark all the helpful Selenium articles you find. You will come back to these again and again.
Now go back and implement all of the wget, curl, lynx projects above with Selenium. This will take a month or so, probably. You don’t have to use wd, use whatever driver/language makes sense to you.
As you go, install the Selenium headless environment on your server and get your new scripts running headlessly inside your for loop (or your cron job, if you went that way). You can leave your old scripts running too; in fact this is recommended, so you can cross-check output between different versions of the same script.
Congratulations you now have an incredibly powerful Web testing and monitoring tool at your disposal.
Take a week to pay down your technical debt. This is the only time you will stop all production for a week like this, but it’s worth it. After this initial debt is paid, you can just fix technical debt as part of your normal work day, occasionally making a project out of a big task.
Now select a login form whose action you would like to automate. Get it working. This will take a while. Use your local Selenium meetup, #selenium and StackOverflow. Bookmark everything and blog what you learn. This is a huge opportunity for you to demonstrate that you have achieved beginner status in one of the most complex and least-understood technologies around. Don’t skimp on the time required to make your first good impression as a Selenium hacker! And welcome to the community!
Pick another login form or similar single-page form. Automate it. Put it in your CI cron.
Read the Selenium API documentation carefully. Think about what kinds of capabilities exist that you could use to automate a slightly more complex flow (think: 3 or 4 page form such as credit card and shipping flow).
Automate that slightly more complex flow. You will have to read the Selenium/WebDriver source to fully understand how the APIs work. That is normal — you cannot master Selenium without reading the source, because the project evolves so fast that the documentation lags behind as a matter of course. Use your list of Selenium resources to back you up as you read the code. Ask whatever questions you need to, and try to do it on StackOverflow. If you are wondering about it and asking, then a lot of other people are probably out there getting stuck on it but not asking.
Get it working.
Put it in your CI cron.
That’s it. If you have done all of the above you are a qualified journeyman Selenium developer. Welcome to the world of tomorrow! I mean, the world of automated QA!
It happens. Sometimes the code is all on One Very Long Line™
Sure, it’s easy to insert line breaks with sed. But how sure are you that your regex inserts the right line breaks?
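For instance, here is the kind of quick (and easy to get wrong) sed one-liner I mean, breaking a JSON-ish long line after every “},”:

```shell
# GNU sed interprets \n in the replacement; BSD sed does not.
printf '{"a":1},{"b":2},{"c":3}' | sed 's/},/},\n/g'
```

On GNU sed this prints the three objects on three separate lines.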
Even though it’s a bit gross, at times it would be really convenient to be able to see what differences there are between two somewhat similar very long lines of text. Fortunately that is easy with wdiff (short for “word diff”), which can diff files that don’t have line breaks or whitespace! I’m glad to have wdiff in my toolbox!
You can use colordiff in conjunction with wdiff. colordiff is a command line tool that colorizes diff output. It understands and will colorize the “word diff” output from wdiff, and it understands all the output formats from regular old diff as well. colordiff is awesome :)
You should be able to install both wdiff and colordiff using your favorite package manager. You may also wish to install colordiff from source if it is not immediately available to your package manager.
The code blocks below can be pasted into the terminal. It will all run as written, at least as of OSX Mountain Lion.
sudo port selfupdate
sudo port upgrade outdated
sudo port clean all
sudo npm -g update
This post describes the monthly maintenance steps I perform to keep my primary development MacBook up to date! I wrote it up mainly for my own use, but if you read this far, maybe it was helpful to you too — I hope so!
UPDATED: git diff works anywhere on the filesystem! Can’t believe I never just tried this. Hat tip: @johngoulah.
But note that colordiff is still useful for coloring patches (see below).
@noahsussman btw ‘git diff’ also works on non git repos, just give two files as the args :)— John Goulah (@johngoulah) September 20, 2013
Git does very nice syntax highlighting for diffs when browsing the revision history. Colorful fonts make diffs more readable and thus reduce overall cognitive load for engineers.
It would be great if such syntax highlighting were available throughout the filesystem, not just within Git repos. And on OSX and Linux it’s easy to enable pretty-colored syntax highlighting for all your diffs and patches!
colordiff parses and colors diff output — it doesn’t matter whether the output is on STDOUT or in a file. In the latter case, simply cat or echo the contents of your patch file, pipe it to colordiff and enjoy the resulting easier-to-read diffs!
Here are two videos and two slide decks from "Continuous Improvement," my talk on what I learned setting up QA process and test infrastructure for large software organizations like Etsy and Barnes & Noble.
For about a year now I have been giving this talk for mixed audiences of testing, dev and ops professionals. The topics under discussion seem to resonate across all three of those groups, which is good to see. But in the future I hope to also address the roles of Product Design and Customer Support (it’s hard to fit everything into 40 slides and 25 minutes!).
In the past few months, it seems like a lot of people are suddenly talking about “devops QA.” Maybe it’s just me :) but perhaps the devops cultural movement has attained enough maturity that its adherents are now facing much more specific (and subtle) challenges when it comes to moving fast and breaking things.
Whatever the cause, it’s gratifying to hear more people talking about these issues, because I find the whole topic fascinating and look forward to continuing to travel around and talk about systems thinking in the context of continuous integration and deployment.