For the past few months, I have been thinking of becoming a ‘cable-cutter’. Watching more and more movies and series on Netflix and other streaming websites means I don’t see the need for a cable TV subscription anymore. I also find myself subscribing to more and more YouTube channels. My subscription list is ever growing, in a good way, since these are videos I actually enjoy watching instead of what cable TV wants to feed me. Another great thing about YouTube is that you can skip to any part of the video you want.
The production value of some YouTube videos is ridiculously high. A lot of time and effort is spent on these videos, as you can tell. Of course, these kinds of videos get a lot of views, which will give the creator a decent amount of Google AdSense revenue over time. The downside is that this revenue is not recurring, nor can it be estimated. Not a good thing if you want to keep your creative spirits high and make a living from YouTube.
A trend which has been going on for a while now is that content creators are signing up for Patreon.com. Patreon is a crowdfunding platform where creators can be supported for their creative expressions of any form. This results in extra and even exclusive content for patrons, although, fortunately, the majority of content creators keep publishing on YouTube as they usually do. The downside (although I think they have every right to do it) is that they constantly ask viewers to support them on Patreon. I would not use the term “e-begging” as some might call it. But why not support creative minds and content in a good substantial way? That’s what Patreon is all about.
Show Me the Money
After I’d supported one of my favourite channels with a massive $1 / month, I wandered around Patreon some more. Curious as I am, I wanted to know which creator earns the most. Who can actually make a decent living from this? Of course it’s always all about the numbers.
Unfortunately Patreon does not have a top list of most supported creators. So, after viewing some featured Patreons (I thought it was quite hard to get out of your own filter bubble), I was not satisfied. I wanted to see big numbers. I knew exactly what I was supposed to do: Scrape Patreon!
“This is huge”
A quick “site:patreon.com” showed me over half a million results, and excluding the updates and blog posts (site:patreon.com -inurl:posts) the results even add up to over 1 million (duplicate content anyone?). Time to bring out the big guns.
Start Collecting Data
Without giving away too many clues, I asked the DeepCrawl team if they were up for an experiment. Fortunately they were! After Darren granted me the credits (thanks again), I was able to set up the major plan to extract the earnings on the pages of the creators on Patreon.
On the profile page of each creator I wanted to get the subtitle (to get an idea of what is being created), the number of patrons and of course the number after the dollar sign. Later, with all the data extracted and collected, it was easy to sort by the amount of money made.
Here’s what a Patreon profile page looks like:
Fortunately the HTML code is clear and consistent, so I could easily determine the RegEx from it. I say “I”, but to be sure my RegEx didn’t become too greedy and do any damage, the support team at DeepCrawl helped me out. All the data could be extracted neatly with the following RegEx rules.
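The exact rules are not reproduced in this text, so here is only a rough sketch in Python of what non-greedy extraction patterns for the three fields could look like. The sample markup, the class names and the `extract_profile` helper are all assumptions made up for illustration; they are not Patreon's real HTML or the rules used in the crawl.

```python
import re

# Hypothetical markup: tag and class names are assumptions for illustration,
# not Patreon's actual HTML.
SAMPLE_HTML = """
<h2 class="subtitle">is creating science videos</h2>
<span class="patron-count">1,234</span> patrons
<span class="earnings">$5,678</span> per month
"""

# Non-greedy (.*?) and narrow character classes keep each match inside a
# single element instead of swallowing the rest of the page.
SUBTITLE_RE = re.compile(r'<h2 class="subtitle">(.*?)</h2>')
PATRONS_RE = re.compile(r'<span class="patron-count">(\d[\d,]*)</span>')
EARNINGS_RE = re.compile(r'<span class="earnings">\$(\d[\d,]*(?:\.\d+)?)</span>')


def extract_profile(html):
    """Return the subtitle, patron count and earnings; None for missing fields."""
    def first(pattern):
        match = pattern.search(html)
        return match.group(1) if match else None

    patrons = first(PATRONS_RE)
    earnings = first(EARNINGS_RE)
    return {
        "subtitle": first(SUBTITLE_RE),
        # Strip thousands separators before converting to numbers.
        "patrons": int(patrons.replace(",", "")) if patrons else None,
        "earnings": float(earnings.replace(",", "")) if earnings else None,
    }


profile = extract_profile(SAMPLE_HTML)
```

Capturing only digits, commas and an optional decimal part after the dollar sign is one way to stop a pattern from getting too greedy: a page without the expected element simply yields `None` rather than a wrong value.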
First off, I wanted to make sure that only the profile pages were crawled and not the updates, the patron (supporter) pages, or any pagination. After I started the crawl, I paused it every now and then to check the progress. If any unwanted pages showed up, I simply added them to the negative restriction.
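To give an idea of how such a negative restriction works, here is a small Python sketch that filters URLs against a list of exclusion patterns. The patterns, the example URLs and the `should_crawl` helper are assumptions for illustration; the actual restrictions used in the crawl are not listed here.

```python
import re

# Hypothetical exclusion patterns; extend the list whenever an unwanted
# page type shows up during the crawl.
EXCLUDE_PATTERNS = [
    r"/posts(?:/|$)",    # updates and blog posts
    r"/patrons(?:/|$)",  # patron (supporter) pages, assumed path
    r"[?&]page=\d+",     # pagination, assumed query parameter
]
EXCLUDE_RE = re.compile("|".join(EXCLUDE_PATTERNS))


def should_crawl(url):
    """Return True if the URL matches none of the exclusion patterns."""
    return EXCLUDE_RE.search(url) is None


# Made-up example URLs, only to show the filter in action.
candidates = [
    "https://www.patreon.com/somecreator",
    "https://www.patreon.com/posts/some-update-123",
    "https://www.patreon.com/somecreator/patrons",
    "https://www.patreon.com/somecreator?page=2",
]
allowed = [url for url in candidates if should_crawl(url)]
```

Anchoring the path patterns with `(?:/|$)` avoids accidentally excluding a creator whose name merely starts with "posts" or "patrons".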
Hopefully this saved me some valuable crawl credits and sped up the process. All in all it took 18 hours to crawl Patreon completely (even with the restrictions and pause time). An unbelievable achievement if you ask me.
With Big Data Comes Big Responsibility
After the massive crawl, I had my data. The data extraction did its work: I now had over 10,000 rows of creators, including their earnings. Neat. So, now what? Well, sort it from high to low of course! The result is amazing. Who would have thought you can make over $32K per month with your creative output?
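For anyone who wants to do the same with their own export, the sorting step could look roughly like this in Python. The rows below are placeholders, not real crawl results, and the field names and the `rank_by_earnings` helper are assumptions.

```python
# Placeholder rows standing in for the extracted data; not real results.
rows = [
    {"creator": "creator-a", "patrons": 120, "earnings": 450.0},
    {"creator": "creator-b", "patrons": 900, "earnings": 3200.0},
    {"creator": "creator-c", "patrons": 15, "earnings": None},  # extraction failed
    {"creator": "creator-d", "patrons": 40, "earnings": 75.5},
]


def rank_by_earnings(rows):
    """Sort rows from highest to lowest earnings, skipping rows without a value."""
    with_earnings = [row for row in rows if row["earnings"] is not None]
    return sorted(with_earnings, key=lambda row: row["earnings"], reverse=True)


ranked = rank_by_earnings(rows)
top_creator = ranked[0]["creator"]
```

Rows where the extraction found no earnings are dropped before sorting, since `None` cannot be compared with numbers.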
Ready to crawl your own site?
Get a live demo or contact us to learn more about custom extraction and what you can extract from your site.
N.B. The crawl took place on September 18th.