Get the innerText of an element in Scrapy

I was trying to pull out a big description block for an item in a recent scraping project. Of course, this contains all kinds of weird and wonderful HTML formatting as it is probably built in a WYSIWYG editor. I found that Scrapy doesn’t have a good way to...
BigQuery and firebase analytics cookbook

Firebase Analytics is an incredibly powerful tool for understanding how users use your mobile or web app. Over the past year I have been using it to draw insights and to help prioritise and improve various functions in our team. Firebase Analytics is mostly good for collecting...
How to download a list of URLs using bash

Shell (or Bash, as it is sometimes called) is the command line prompt of Unix-based systems such as Ubuntu or macOS and, as we will see today, it can be a huge friend of yours. I have had the pleasure of working with some kings of bash and I was truly amazed by what they...
Data merge with SVG in the browser

Taking a source data set, say a CSV with rows of data, and generating a list of pictures or emails by inserting these rows of data into a template is commonly called mail merge or data merge. Every so often I find myself wanting to do this. The first time was years ago,...
Gitlab CI Flutter with test reports in 15 minutes

In this article, I want to show a practical example of how to set up Gitlab continuous integration for Flutter with test reports. Tests need to be run all the time to be valuable. When they fail just after you introduce an error, you will know right away that what...
Generating Flutter package badges

I tend to automate the most basic tasks because it’s more fun than copy-pasting. If I need to do something more than ~5x, I’m likely to write a script for it. This time I was writing a blog post comparing some Flutter packages and had to generate badges for...
Getting a Firebase JWT for testing

Why this? I use Firebase for many of my projects, and a big reason is that it takes away the complexity of handling an authentication system at no cost. For one thing, it integrates with Google login immediately. That being said, writing a backend using this works...
Scraping HTML tables with Scrapy

The Python Scrapy library is an excellent helper for building simple but powerful scrapers. It’s common to want to scrape HTML tables when we scrape text from pages and, as I’m going to show, it really doesn’t need to be difficult. The rough...
Save time debugging Scrapy with shell

Scrapy is great, debugging Scrapy less so. Are you adding print statements and then rerunning your scraper time and time again to get that one selector right? Do you have Chrome open in the background, using jQuery to test those selectors live on the website you are...
XML as HTML using XSLT with Javascript in 30 minutes

That’s a mouthful of a title indeed, but I found today that the plain JavaScript SDKs bundled with modern browsers are capable of a lot. To be specific, they can take a file, parse it as XML, then put it through an XSLT processor and spit out HTML. Why, David, do you want...