Well, the last post I did on split testing things went over the heads of a lot of people, so I thought I’d take some time to go back and revisit the topic of split testing, covering what it is, the various versions, and how you should be using it.
If you’d rather just get the takeaway points and the files for this post, scroll down to the bottom.
A split test is where you have two or more versions of the same thing, and you then test them against each other and see which one produces the best outcome. There are three basic kinds of split test, known as AB, ABA and multivariate. In an AB test, you have just two versions, one called A, which is your original, and one called B, which is the refined version. In an ABA test, you actually test your original twice, hence the two A’s, which shows you how much the results vary when nothing has changed. This means you can better understand the level of variance you’d expect to see on your B version.
In a multivariate test, you pick a number of parameters, which could be things like PPC advert headline, copy and URL, or possible layouts for a form, or anything else with multiple parameters. You then test a number of levels for each of these things, so you might have three variations of each headline, copy and URL in the PPC example, or in the case of the form, you might have three variations of a form layout.
Whatever kind of split test you end up performing, the important thing is to make sure you know your interrogative statement (what do you want to find out), your data set reliability (are the people testing an indicative sample of the traffic you’re going to get in the future) and your result accuracy (how much variance you’d expect to see in your result if you kept on adding data).
Whether you’re conducting an AB or multivariate test, the methodology is pretty similar. As such, you should be able to take the following framework and apply it to pretty much any test you’ll ever run.
The first thing to do is define the question. What is it you want to know? What are you trying to find? It might be the optimum layout of a page, the best PPC ad for a campaign, or how to make the ultimate scrambled egg (I’m not joking either - you can apply this stuff to anything).
Essentially, whatever the question is, you’ll be looking to answer a “which” question. For instance,
The process from this point is very simple. Once you’ve identified your “which”, you need to create your variations. Now, if you’re doing a simple AB test, that means simply taking your control, modifying it so you’ve got a refined element, and then throwing traffic at it. The process is pretty much the same if you’re doing an ABA test too. If you’re doing a multivariate test however, it’s slightly more complex.
The problem comes in how you choose the combinations of the parameters and levels you’re going to test. This happens because, when you’re doing multivariate tests, you can end up very quickly with more options than it’s feasible to study. The number of possible variations is the number of levels raised to the power of the number of parameters. For instance, if you had three parameters, each with five levels (not an uncommon set), you’d have 125 potential variations to test. Even worse, if you had seven parameters, each with five levels (the biggest I’ve ever done), you could construct 78,125 variations. To get around this, we employ Taguchi orthogonal arrays.
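To make the size of the problem concrete, here is a minimal Python sketch that builds every possible combination for a made-up PPC test. The parameter names and the placeholder level names are assumptions for illustration only.

```python
from itertools import product


def all_variations(parameters):
    """Return every possible combination, one dict per variation.

    `parameters` maps a parameter name to its list of levels.
    """
    names = list(parameters)
    return [dict(zip(names, combo)) for combo in product(*parameters.values())]


# Hypothetical PPC test: three parameters with five levels each.
ppc = {
    "headline": [f"headline {n}" for n in range(1, 6)],
    "copy": [f"copy {n}" for n in range(1, 6)],
    "url": [f"url {n}" for n in range(1, 6)],
}

variations = all_variations(ppc)
print(len(variations))  # 125, which is 5 ** 3
print(5 ** 7)           # 78125 for seven parameters with five levels each
```

Every extra parameter multiplies the total by the number of levels, which is why testing the full set stops being practical so quickly.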
Bear with me, because honestly what we’re about to do isn’t as scary as it looks… The way this works, is we pick the array with the right number of levels, and then make the number of columns equal to the number of parameters. So if you’ve got five parameters, you need the first 5 columns of the array. For example, if we start with the following array:
1111111
1112222
1221122
1222211
2121212
2122121
2211221
2212112
We could test up to seven parameters, each with two levels (because it has seven columns, and every number is a 1 or a 2). If we now pick the first four columns…
1111
1112
1221
1222
2121
2122
2211
2212
We would be able to test a representative sample of all the possible variations of four parameters. So instead of running 16 tests, we only run eight. Similarly, if we want to test three parameters, each with four levels, we would start with the array below:
11111
12222
13333
14444
21234
22143
23412
24321
31342
32431
33124
34213
41423
42314
43241
44132
And then pick the first three columns…
111
122
133
144
212
221
234
243
313
324
331
342
414
423
432
441
And then test these 16 combinations, instead of the 64 we could potentially construct. I won’t go into the math behind how you construct these arrays, as it’s frankly mind-bogglingly dull. But suffice to say, you can’t just pick random variations. So please stick to the arrays you’ll find in the zip file at the end of this.
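If you’d rather see why random picks won’t do, the sketch below takes the first four columns of the eight-row array and checks the property that makes it representative: for any two columns, every pairing of levels turns up equally often. The function names are my own, and the "naive" plan is just an invented example of eight hand-picked variations.

```python
from collections import Counter
from itertools import combinations, product

# The eight-row, seven-column, two-level array from above.
L8 = [
    "1111111", "1112222", "1221122", "1222211",
    "2121212", "2122121", "2211221", "2212112",
]


def first_columns(array, n_parameters):
    """Keep only as many columns as there are parameters to test."""
    return [row[:n_parameters] for row in array]


def is_balanced(rows, levels):
    """True if every pair of columns contains each level pairing equally often."""
    expected = len(rows) / len(levels) ** 2
    for a, b in combinations(range(len(rows[0])), 2):
        counts = Counter((row[a], row[b]) for row in rows)
        if any(counts[pair] != expected for pair in product(levels, repeat=2)):
            return False
    return True


plan = first_columns(L8, 4)

# Eight variations picked by simply counting upwards: parameter 1 never changes.
naive = ["1111", "1112", "1121", "1122", "1211", "1212", "1221", "1222"]

print(is_balanced(plan, "12"))   # True
print(is_balanced(naive, "12"))  # False
```

The hand-picked set is the same size, but it can tell you nothing about level 2 of the first parameter, which is exactly the trap the arrays keep you out of.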
As a general rule, a sample is statistically valid when it will result in variation of no more than 5% when the sample size is increased. This is where ABA tests really come into their own, as you’ve got a running tally in the form of your second A test, that shows you how accurate your data is, so when the two samples get to being consistently within 5% of each other, you know you’re done. If however you’re running a standard AB or multivariate test, simply graph your results, and when the line trends out to less than a 5% wobble when you compare 20% of the results against another 20% of them, you’re done.
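As a rough sketch of that stopping rule for an ABA test, the Python below checks whether the two A samples have stayed within 5% of each other over the last few checkpoints. Treating "consistently" as three checkpoints in a row, and measuring the gap relative to the larger rate, are both assumptions of mine. It’s a rule of thumb in code form, not a formal significance test.

```python
def within_tolerance(rate_one, rate_two, tolerance=0.05):
    """True if the two rates differ by no more than `tolerance`,
    measured relative to the larger of the two."""
    larger = max(rate_one, rate_two)
    if larger == 0:
        return True
    return abs(rate_one - rate_two) / larger <= tolerance


def aba_is_settled(checkpoints, streak=3, tolerance=0.05):
    """`checkpoints` is a list of (first A rate, second A rate) pairs,
    oldest first. Settled means the last `streak` pairs all agree."""
    recent = checkpoints[-streak:]
    return len(recent) == streak and all(
        within_tolerance(a1, a2, tolerance) for a1, a2 in recent
    )


# Made-up cumulative conversion rates for the two A versions.
history = [
    (0.031, 0.045),
    (0.036, 0.041),
    (0.039, 0.040),
    (0.040, 0.041),
    (0.040, 0.0405),
]

print(aba_is_settled(history[:3]))  # False, the early readings still disagree
print(aba_is_settled(history))      # True, the last three are within 5%
```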
Validation also tends to be fairly simple. You want to check for any extraneous or instrumentation-based effects on your data. Extraneous effects include things like news events that might skew your data to include the wrong kind of people, online and offline mentions that send odd traffic, or anything else that might get people outside of your intended sample into the mix. Instrumentation effects include any problems in the sandbox area that can alter results, such as a problem with analytics implementation, or changing analytics services halfway through the test.
When you’ve finished the test and collected the data, the only thing left to do is to work out which version performed best. Now, in the case of AB and ABA tests, that’s pretty simple; you just take whichever one worked best, and use that.
However, the multivariate tests make things a bit more complicated. Here’s what you do…
When you’ve got your data, take the lines from your array, and number them sequentially. So if we use the first array we had earlier:
1111
1112
1221
1222
2121
2122
2211
2212
We’d call the first row 1, the second 2 and so on. This gives us 8 numbers. Next to each one, write down the conversion rate. This will give you something like this:
1 3.9%
2 4.7%
3 2.1%
4 3.3%
5 5.5%
6 4.8%
7 2.5%
8 6.2%
Now we’re going to create a table. Write down your parameters along the top, and then the test numbers down the side. So in our example, we’d have a table with four columns and eight rows. For each parameter, work out the average result for each level, by adding up the results of every test where that level appears, and dividing by the number of times it appears. For instance:
Add all the results of Parameter 1, Level 1, and divide by 4. Parameter 1, Level 1 appears in tests 1, 2, 3 and 4. The total of these is 14%. Divide this by 4 and we get 3.5%.
Now add all the results of Parameter 2, Level 1, and divide by 4. Parameter 2, Level 1 appears in tests 1, 2, 5 and 6. The total of these is 18.9%. Divide this by 4 and we get 4.73%.
Keep doing this until you’ve gone through all the results, and you’ll be left with the best performing level for each parameter. Stick them together, and that’s your perfect advert.
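If you’d rather not do that by hand, the averaging can be sketched in a few lines of Python. This uses the eight rows and the example conversion rates from above; the function names are my own, and it assumes every level is tested the same number of times (which the arrays guarantee).

```python
from collections import defaultdict

# The first four columns of the eight-row array, and the example results.
plan = ["1111", "1112", "1221", "1222", "2121", "2122", "2211", "2212"]
rates = [3.9, 4.7, 2.1, 3.3, 5.5, 4.8, 2.5, 6.2]


def level_averages(plan, results):
    """Return {parameter number: {level: average result}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row, result in zip(plan, results):
        for parameter, level in enumerate(row, start=1):
            grouped[parameter][level].append(result)
    return {
        parameter: {level: sum(vals) / len(vals) for level, vals in levels.items()}
        for parameter, levels in grouped.items()
    }


def best_levels(averages):
    """Join the top-scoring level of each parameter into one combination."""
    return "".join(
        max(levels, key=levels.get) for _, levels in sorted(averages.items())
    )


averages = level_averages(plan, rates)
for parameter, levels in sorted(averages.items()):
    print(parameter, {level: round(avg, 3) for level, avg in levels.items()})

print(best_levels(averages))  # 2112
```

With these numbers, Parameter 1, Level 1 comes out at 3.5 and Parameter 2, Level 1 at roughly 4.73, matching the worked figures above. Note that the winning combination, 2112, isn’t one of the eight rows that were actually tested. It’s a prediction built from the averages, so it’s worth running it as a follow-up test before you trust it.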
It doesn’t matter what method you use to test. It only matters that you do
A result is only as good as the data that went into creating it
Multivariate tests may be sexy, but they take much more time. Don’t assume they’re always the way forward
Don’t rush in to anything. Make sure you do the legwork first, and get everything set up properly. A ruined test wastes time and money
Download the Taguchi Orthogonal Arrays
Download the article as a pdf
Copywriting isn’t a science, it’s an art. That said, there are certain rules you can follow that will help you write better. Here’s 31 tips and pointers to get you started…
Before we begin, there are a few things that must be kept in mind.
Unlike mortals like myself, when you cook a meal for 30 you can do so with no preparation. However, when teaching someone how to do something new, it helps to have some idea of what you are going to cover. You do not need an extensive mission plan but a rough idea of the topics you’ll cover, the order you will cover them and maybe the type of alcoholic beverage you will use to scrub all memory of me from your mind.
If you are not sure of the topics that you need to cover, google tutorials for your chosen language(s) and look at the chapter titles. You may also find that one of the early topics on a site holds several sub topics that you may want to teach separately, much like I used to eat the blue smarties before everybody else.
No doubt you’ve got 5 different qualifications, one of which is for being totally cool. At some point while taking your degree in theoretical maths you might have thought “this is a total waste of time, I’ll never use this”. Fascinating though the structure of a for loop is, its use will probably not be readily apparent.
When explaining something, try to make it so that she can see the effects of what she does both clearly and quickly. Using a for loop to count numbers is visible but it’s about as relevant as the baking foil in my sock drawer. Likewise, using a for loop to perform a bubble sort will take a long time to implement and not focus on the for loop.
Yes you can type 31.4 words per minute with your nose and 200 normally, and you can lift man-sized weights with just your eyebrows. Your girlfriend can’t. You tell her to type in a simple little bit of stuff and it takes her AGES! But how will she get faster? Well, how did you get faster? You typed a lot and thus it stands to reason that given time she will also type faster.
But there’s more to it than that. Giving her 100% control of the computer allows her to know that it is not you achieving results but her. I know that when there’s some small thing and you just want it done quickly it’s so easy to “borrow” the keyboard for just a second. Do not. It will send a message that she cannot do something, much better to let her know that she can do anything you can, even if she needs a little instruction at the moment.
You may be able to judge the exact size of spanner required for any given nut. You may be able to change the oil in your car blindfolded with one hand tied behind your back. Neither of those makes you smarter than someone else, even if they think the best way to remove a nut is with a spoon. What you need to keep in mind is that you probably don’t have a clue what the different types of pedicure are, or possibly even what a pedicure is.
So when you explain to her what an object is and she seems confused, remember that she thinks differently to you. Try to explain it in a different way. This will be easier to do with practice, and knowing the person well will help you in this.
You know what an “object” is, you know why an error on line 24 may mean you forgot a semi-colon on line 23. You may not however know the difference between two pairs of apparently identical shoes. You know more than your girlfriend about whatever you are teaching, but not about everything.
Try not to skip over things that you take for granted; it may be obvious to you what curly braces, semi-colons and doctypes are, but not to her. By all means don’t go too far in the other direction and patronise her. When you skip over such a thing (as you near undoubtedly will), apologise and explain it.
XML, HTML, Ajax, CSS, Server-side, WoA and SrmzA, you know them all (except the last one which I made up). You know what they mean and probably what they stand for. You might even have made a little poster of them to put next to your poster of the periodic table (my PT poster has cool pictures). When you tell your girlfriend that you know all about Ajax she’ll ask why you never clean up after yourself if you know all about it.
Teaching someone something means you transfer information, and you cannot do that by using things that she does not know about. And don’t assume she’ll remember them if you give her a list at the start; she’ll have much more to take on board, and let’s face it, jargon makes you feel cool, but does it really accomplish as much as understanding for loops and image tags?
While you clearly learnt everything you know by figuring it out for yourself while harvesting crops for starving children, everybody else had to learn bit by bit. There are more methods of teaching than there are tooth fragments of my defeated enemies on the necklace around my neck. The two popular ones seem to be punishing mistakes and praising success. Punishing mistakes is good fun but you are expected to use cliches such as “you have failed me for the last time” and build doomsday devices.
I much prefer praising success, but don’t overdo it or you’ll seem as pathetic as the lackeys of those that punish failure. I’m assuming that you want her to be confident in the knowledge you are trying to impart, and the best way to instill confidence is to praise. Make sure to praise sincerely: if she can’t understand something, then telling her she is really clever will make you look slightly more clueless than me.
I hope that these tips are useful to you and that you are successful in your endeavors. Please do leave comments on what you think I am wrong about; depending on how far past its use-by date my lunch was, you may well be right.
Having great quality score can lower your costs per click (and by proxy, increase ROI), lower your bounce rate and increase conversions. How so? Because making changes to boost your quality score will generally mean making your site better. However, it’s not a particularly easy thing to do.
What is Quality Score?
Google defines quality score as:
“…the basis for measuring the quality and relevance of your ads and determining your minimum CPC bid for Google and the search network. To encourage relevant and successful ads within AdWords, our system defines a Quality Score to set your keyword status, minimum CPC bid, and ad rank for the ad auction”.
…and it gives you a good one based on:
Which is all very well and good, but that doesn’t really tell you what you need to have a good one. So to save you thousands of hours testing and experimenting to find what works best, here’s our guide to getting a quality score that kicks ass.
Improving Your Quality Score: Building a Better Campaign
I’ll be doing a full post on how to build an awesome campaign structure when setting up from scratch in the near future, but for now we’re only interested in the bits that affect quality score. With that in mind, here’s how you do it:
. I’ve seen countless examples where people will create a group called “products” or something similar, and then lump in every keyword known to man. Instead, keep things tight. Have a group for blue widgets, and the 15-20 keywords that relate to them, one for red widgets with the keywords for that product, one for white widgets and so on. Get laser-targeted with what you’re bidding on.
. Again, people are far too quick to rely on DKI (Dynamic Keyword Insertion) rather than actually doing things properly. DKI is fine, but make sure you’re using it where appropriate, or you can end up with some serious gaffes. Instead, actually write some proper ad copy, making sure that whatever the main keyword focus for that group is appears in the title, description and URL.
. Or to put it another way, be negative. Use your analytics to see which terms your ads are showing for that aren’t producing the goods. Negative keywords will help you filter those out. Common ones to put in are “free”, “sample”, “try”, “test” and other such terms.
. Rather than phrase-matching all the time, mix it up a bit. Try using exact and broad match to see what that does for you. Exact (provided you’ve got a relevant keyword) will generally give the best quality score, but at the expense of getting traffic from variations. Test to see what gives the best ROI.
Improving Your Quality Score: Crafting Better Landing Pages
Again, I’ll be doing a full post on what you want to be doing to create killer landing pages soon, but this will just be a short piece on sorting out the quality score elements.
. Firstly, set up a folder for all your landing pages to sit in. Then, in your robots.txt file, add the following lines:
User-agent: *
Disallow: /ppc-landing-pages/
User-agent: AdsBot-Google
Allow: /ppc-landing-pages/
That tells every crawler that obeys robots.txt, other than the AdWords quality score bot, to stay out of those pages. That way, you won’t lose quality score, but you also won’t risk having pages that look like over-optimised spam that could get you penalised. Now make sure every landing page has a name that relates to the keywords in the ad group that target it, and also make sure the title tag is targeted to that group. Don’t go overboard, but make sure it’s there. Finally, set the meta description as whatever the best performing piece of ad copy is.
. Use the Site-Related Keywords Tool to make sure the copy on your page is in keeping with what Google thinks it should be. That way, you won’t be in for any nasty surprises later on, and have to re-do all your copy.
. Now that page load time is a factor, you’re going to want to stay away from dynamically generated pages. Instead, have your CMS cache every landing page that a person generates, and output it as a real HTML file somewhere. That way, you can create lots of pages dynamically in a short amount of time, but have proper HTML there when the bot comes along, with no SSIs or dynamic scripts running to slow things down. If you want to go completely bonkers with this, it’s also worth learning how browsers actually load pages, so you can lay out your pages in a more spider-friendly fashion.
. Use the orthogonal array spreadsheet tool to conduct large scale multi-variate split tests on your landing pages. Refine them over time, and you’ll see that make a difference too.
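Going back to the robots.txt rules in the first step, you can sanity-check them before you upload anything. The sketch below feeds those four lines to Python’s standard library robots.txt parser and asks it what each crawler is allowed to fetch. The page name is made up, and this only shows how a standards-following parser reads the rules; how any particular real-world crawler behaves is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as given in the landing page step above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ppc-landing-pages/
User-agent: AdsBot-Google
Allow: /ppc-landing-pages/
"""


def build_parser(robots_text):
    """Parse robots.txt content held in a string rather than fetched from a URL."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
page = "/ppc-landing-pages/blue-widgets.html"  # hypothetical landing page

for agent in ("AdsBot-Google", "Googlebot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, page))

# Pages outside the landing page folder are unaffected.
print(parser.can_fetch("Googlebot", "/index.html"))
```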
Abracadabra
It’s incredible what a difference putting all this into play can make. I’ve seen CPCs drop to 10% what they were pre-optimisation. I’ve seen conversion rates increase 500% as a result of putting this stuff into practice. This is real, serious PPC optimisation, not just something Google put in to piss you off. So take the bull by the horns and sort out your campaigns today.
If you think we’ve missed anything, or you want something explained further, let us know in the comments below.
If you found this article useful, please take a moment to vote it up on Sphinn, Reddit or StumbleUpon
Much has been written on this subject… OK, I’m lying. Almost no-one has written anything on this. Which makes it all the more bizarre, when you consider just how powerful these arrays are as a tool for multivariate tests. If you just want the xls files so you can get on and play, you can get them here. If you also want to know how they work, what they’re doing and how to make the most of them, read on.
PAM-VAR Testing
The Pareto principle states that for any system, 80% of the outcomes generated will come from 20% of the population of variables. Or to put it another way, 80% of what you get comes from 20% of what you do.
This allows us to logically conclude that a representative sample of any specific population will allow us to estimate the results for the rest of it. In medicine, this is performing a biopsy, in mechanical engineering it’s called Taguchi testing, and in web based evaluation, I’ve termed it PAM-VAR testing (Pareto Analysis of Multi-Variate Array Results).
The key to this method is determining what constitutes a representative sample. For example, if we wanted a sample of numbers from 1 to 100, we wouldn’t pick 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. Instead, we’d go for values spread across the whole range, so we’d pick a set more like 3, 15, 24, 37, 45, 56, 67, 73, 86 and 95.
But how do we do that, when we’ve got a PPC campaign to test, with 5 variations of headline, body copy and URL? The simple answer is that we borrow from our friends in engineering.
Taguchi Arrays
Turns out some clever bugger called Genichi Taguchi had come across this problem some time before me, and invented something called the Taguchi orthogonal array. Taguchi’s brilliant line of thinking ran something like this:
“If we can make the results of a set of tests mimic the most extreme variations that we’d expect to get, we can estimate the top and bottom percentage of results by sampling a small number of the total possible potential variations.” Basically, Taguchi figured that if taking a small lump of someone’s liver could tell you about the rest of it, you could take a small sample of potential figures for variables in an equation, and use them to estimate the outcomes of all the others.
What Arrays Do We Use?
Personally, I like symmetry, so the arrays I use work on matched numbers of variables. That means that if I’m testing three headlines, I’ll also test three variants of body copy and three URLs. It’s perfectly possible to test a set of two, three and three, or three, four and five, or any other set you can come up with. However, the arrays are slightly harder to construct. Nevertheless, if you want to generate spreadsheets or applications to calculate these, it’s certainly doable, and you could reverse engineer it from the xls files available here, if combined with the arrays from FreeQuality.org.
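For the curious, here is one way a matched array can be generated in code. To be clear, this is a textbook construction based on modular arithmetic, not necessarily how the xls files do it, and the row order may differ from published Taguchi tables. It only works when the number of levels is a prime (2, 3, 5, 7 and so on), so it won’t give you the four-level array, and it doesn’t cover mixed sets at all.

```python
from collections import Counter
from itertools import combinations, product


def prime_level_array(p):
    """Build an orthogonal array with p * p rows and p + 1 columns,
    where p is a prime number of levels. Levels are numbered 1 to p."""
    rows = []
    for i, j in product(range(p), repeat=2):
        # Columns: i, j, then i + k*j (mod p) for k = 1 .. p - 1.
        row = [i, j] + [(i + k * j) % p for k in range(1, p)]
        rows.append(tuple(level + 1 for level in row))
    return rows


def is_orthogonal(rows, p):
    """True if every pair of columns contains each level pairing equally often."""
    expected = len(rows) // (p * p)
    for a, b in combinations(range(len(rows[0])), 2):
        counts = Counter((row[a], row[b]) for row in rows)
        if len(counts) != p * p or set(counts.values()) != {expected}:
            return False
    return True


five_level = prime_level_array(5)
print(len(five_level), len(five_level[0]))  # 25 6
print(is_orthogonal(five_level, 5))         # True

# Headline, copy and URL only need the first three columns.
ppc_plan = [row[:3] for row in five_level]
print(len(ppc_plan))  # 25
```

For a three-parameter PPC test you’d keep the first three columns, giving 25 ads to run.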
Application in PPC Multivariate Tests
So, imagine we’ve got a keyword group in a PPC campaign we’ve set up. We’ve got five different versions of the title, copy and URL, and we want to test to find the optimum version for that set of keywords. If we wanted to test every single combination, that would mean running 125 different adverts. Now imagine if you wanted to do this across 5 sets of keyword groups. Or even worse, 3 groups of 5 sets of keywords. All of a sudden, you’re having to create and test 625 ads in the first instance, and 1,875 in the second. That’s simply not practical.
Fortunately, by applying the tools we’ve provided, you can trim this to a fifth of the normal figures. Obviously this is hugely beneficial, as it makes these kinds of large-scale tests practical, as you’re running 125 and 375 permutations, instead of the figures shown above. Whilst these are obviously still large numbers, they’re far smaller and more manageable.
Application in Other Online Multivariate Tests
Now, imagine if you were to run this same test across a web design, or a long copy sales letter, where you’ve got four, five or more variables. All of a sudden, you can be looking at huge values. For instance, a multivariate web design test with five variables, and four options for each, would take just 16 tests using our system. That’s against a normal size of 1,024. Or if you wanted to test 5 variables with 5 options each, that would give 25 samples for testing, instead of the normal 3,125.
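Those comparisons are easy to reproduce. The sketch below looks up how many tests an array needs and sets that against the full set of combinations. The table of array sizes is an assumption on my part: the two-level and four-level entries match the arrays shown earlier, and the five-level entry assumes the standard 25-row table, which has six columns.

```python
# Number of levels -> (tests in the array, most parameters it can hold).
ARRAYS = {
    2: (8, 7),
    4: (16, 5),
    5: (25, 6),
}


def compare(levels, parameters):
    """Return (tests using the array, tests for every combination)."""
    runs, max_parameters = ARRAYS[levels]
    if parameters > max_parameters:
        raise ValueError(
            f"the {runs}-row array only holds {max_parameters} parameters"
        )
    return runs, levels ** parameters


for levels, parameters in [(4, 5), (5, 5), (5, 3)]:
    runs, full = compare(levels, parameters)
    print(f"{parameters} variables x {levels} options: {runs} tests instead of {full}")
```

The last line is the PPC case from earlier: 25 ads instead of 125, which is where the "a fifth" figure comes from.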
Again, you could work out how to construct the test array from the xls files available here, combined with the Taguchi arrays from FreeQuality. If however you’re completely lazy, we’ll be coming back to this next week, to show you how to do that. And we’ll probably build an online version of this in the future to make it even easier.
Talk To Us
If you’ve found this useful, please leave feedback in the comments below.