A classic part of the NoSQL sales pitch is that SQL JOINs are too expensive and don’t scale, and a classic response is to point to big websites running smoothly on SQL databases. The reality, as always, is a bit more complicated than that. Types of scale When engineers talk about scale, they’re almost always referring to some sort of usage scale, but even this is not always clear. Usage scale can take the form of:
Code Ownership is the practice of assigning explicit owners to areas of codebases. Before Google I worked at small companies where it’s easy to know who should review each code change, but that doesn’t scale far. Even in a team of 10 it wasn’t always obvious who knew an area of code the best, and it was certainly less clear for new starters. Various tools have been developed to help with this.
At Thread we went through several iterations of Search, evolving the technology as we evolved the business and our understanding of what our customers wanted. Later stages went beyond my naive understanding of search at the time, and may prove useful inspiration to others. Before we dive in, some clarification of terms. For us, search meant free text entry that generated product results, whereas filtering referred to distinct options that could be chosen by the user, such as filtering to next-day delivery or a particular brand.
In an attempt to self-host a low-cost fediverse node, I started with GoToSocial, but later decided to switch to Mastodon for better compatibility. This transition presented some challenges and got me thinking about whether existing web frameworks are well designed for linked data services. Activity Pub, the underlying protocol for the fediverse, necessitates storing URIs to resources on other nodes in the network, and as such, even after running GoToSocial for 24 hours, there were already many links to the node.
I’ve just started using Raycast, an application launcher for macOS. Like every other launcher before it, it does a lot more than just launch applications, and most of that functionality comes from extensions. Also like several other launchers before it, I decided to have a go at writing an extension and see what the process is like. Back in the mid-2000s I was an avid user of Quicksilver. It was the first launcher I used, and was quite extensible, but extensions were native code (typically Objective-C) written against the Apple developer APIs, in Xcode, bundled up and injected into Quicksilver in a sort of plugin model.
This is not a tutorial on how to write your own task queue, but rather an attempt to convince you that you should write your own. What’s a “task queue” in this context? For the purposes of this post, a task queue is a system for performing work out of band from a user interaction, often at some later time. Typically this is a core component of many web apps, and is used for performing long running tasks or things that can fail and may need to be retried like sending emails.
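To make that definition concrete, here is a minimal in-memory sketch of the idea: work is queued during the request and run later, with failed tasks retried a limited number of times. The names `TaskQueue`, `enqueue`, `run_pending` and `flaky_send` are hypothetical and chosen for illustration; this is not the design the post goes on to argue for, and a real queue would persist tasks rather than hold them in memory.

```python
import collections


class TaskQueue:
    """A toy task queue: tasks run out of band and are retried on failure."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self._pending = collections.deque()
        self.failed = []  # tasks that exhausted their attempts

    def enqueue(self, func, *args):
        # Record the work to do later; nothing runs at enqueue time.
        self._pending.append((func, args, 1))

    def run_pending(self):
        # A worker process would normally call this in a loop.
        while self._pending:
            func, args, attempt = self._pending.popleft()
            try:
                func(*args)
            except Exception as exc:
                if attempt < self.max_attempts:
                    # Put the task back on the queue for another attempt.
                    self._pending.append((func, args, attempt + 1))
                else:
                    self.failed.append((func, args, exc))


# Usage: an email send that fails twice before succeeding.
sent = []
calls = {"count": 0}


def flaky_send(address):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("mail server unavailable")
    sent.append(address)


queue = TaskQueue(max_attempts=3)
queue.enqueue(flaky_send, "user@example.com")
queue.run_pending()
```

The retry count travels with the task itself, which keeps the worker loop stateless.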
Stadia is Google’s cloud gaming service. Users who sign up can play games they purchase on nearly any device, as the game runs “in the cloud”. This is a new concept that has only just become possible in the last few years, with advancements in internet connections, video encoding, and web browsers. I frequent Reddit’s r/Stadia subreddit, and have noticed some misunderstandings about what Stadia is and isn’t. This post is an attempt to clear up misunderstandings.
At Thread I’m involved in hiring engineers for frontend, backend and iOS roles. One of the things I have become more aware of as I have gained experience in hiring and interviewing is how my biases affect the outcomes of interviews. This is something I’m always trying to improve – to understand what biases I have, to mitigate their effects – and in the process I have found a mental model that has helped me.
A mental framework for library design For those with plenty of experience managing complexity in large complex codebases, this post will likely be nothing new. However many open-source libraries, frameworks, and tools make mistakes in how they handle cross-cutting concerns and end up being difficult to use as a result. I’m no stranger to this, and have several times found myself unsatisfied with the design of a library that I’ve created only to realise that it’s due to mishandling of cross-cutting concerns.
There’s a common theme in software engineering communities of software that’s too complex. Slack and other Electron apps are frequent targets – why do we need yet another “web browser” using 2GB of RAM when IRC worked perfectly well? While I can empathise with the performance issues, the question often betrays a misunderstanding of the problem being solved or the target audience of the software. Slack is not designed primarily for software engineers who grew up on the internet in the 90s, it’s designed for non-engineers.
Information Exposure Vulnerability with Django and Memcached On Wednesday April 29th, Thread started experiencing a partial outage of our main backend service. We traced the issue down to the existence of malformed Memcached keys and corrected the issue on thread.com. Along the way we suspected that this could be exploited on some Django sites using Memcached to cause private data exposure – either internal service data or data about other users.
Last year I bought a copy of Scythe from publisher Stonemaier Games, based in large part on the art. I was very happy with the art and enjoy playing the game, but what I found even more satisfying was the design of the rulebook, the iconography, and the use of physical tokens to re-inforce processes used throughout the game. This week I bought Wingspan from the same publisher, again based in large part on the artwork, and once again I’m finding the other aspects even more satisfying.
I’m an armchair space enthusiast – I like to watch new launches but I know very little about rockets. Recently there’s been a lot of renewed interest in landing on the moon which is very exciting, and also a lot of press coverage of NASA’s Commercial Crew programme returning manned spaceflight capability to the United States. Between these two advances, there have been many pointing out the decades where we as a society, and the US in particular, were going backwards.
I’ve been reading this extensive breakdown by Bethany McLean and Peter Elkind of Enron’s collapse after a colleague’s recommendation (based on my enjoyment reading Bad Blood). I found it fascinating how much of the classic image I have of corporate greed stems from the relatively recent collapse of Enron in 2001. Since I just missed the Enron collapse, being about ten years old at the time, I had assumed that these ideas had existed for much longer, but during the bull market of the 90s the image had yet to fully form.
Not long after a recent one to one with my manager, discussing how we could improve our incident response process in engineering at Thread, I returned to my desk to find a copy of The Checklist Manifesto that he had kindly got for me. This is less of a book review and more of some highlights that I wanted to pull out from the book. Going into it, I had already read about the effectiveness of checklists in preventing human error, particularly in commercial aviation and medicine, but the book still had some great points to make.
Last month at their annual Worldwide Developers Conference (WWDC), one of Apple’s most interesting announcements was Sign in with Apple. Built to compete with Facebook and Google’s single-sign-on (or social sign-on, SSO) offerings, Apple’s SSO will eschew control over the data and analytics that its competitors seek in favour of a privacy preserving design intended to advance Apple’s pro-privacy stance and ultimately to sell more devices by bringing more value to the Apple ecosystem.
GraphQL’s type system allows us to make many invalid states impossible to represent, which improves the usability and reliability of our APIs. Two features of the type system that contribute significantly to this are Interfaces and Unions, however they can be used to address similar design considerations so it’s not always obvious which is the right option. In this post we’ll look at several examples from the Thread API, and explore whether using an interface or a union is the right option.
Four of us from the Thread engineering team went to PyCon UK again in September for the third year running, and I was lucky enough to have my talk selected. At Thread we use Django for the backend of the main site which has grown to over 350 “apps”, and various members of the team have used the framework since not long after the initial public release. I’ve learnt many tips, tricks, and best practices for keeping engineers productive on a codebase of this size from my colleagues over the years, and I shared the highlights with the Python community in September.
At Thread one of our core beliefs is that technology allows for great change. This is important to our product, but it’s also important to how we work internally. Because of this way of working, we try to represent everything in data—products, measurements, styles, suppliers, locations in our warehouse, support ticket resolutions, and many more things that you’d never even think about. All of these data models come with a cost of needing a way for those in the company who use them to maintain the data.
Following on from my previous post about Haskell web frameworks, I’ve dived into making a non-trivial web application, with type-safe database access.
A quick overview of 5 of the popular Haskell web frameworks, and what they can do to improve the state of web development.
Qualys have become well known in the recent crop of SSL and TLS vulnerabilities as a first-responder with automated testing and validation, but scoring top marks on their SSL Labs test can be difficult. I explored what was required to score full marks.
RESTful APIs are a popular thing, but is anyone really doing it properly? This post highlights some common flaws in RESTful APIs, and explains why it’s important that we improve them beyond the current standard.
MongoDB, the company behind MongoDB, published a new whitepaper this month, about ‘quantifying business advantage’. As I’ve recently completed a research project at university where I critically analysed the design decisions taken in MongoDB, I thought it would be interesting to see how the company sells it. I’ll write about my research sometime, but for now, I’m going to pull out a few quotes from the whitepaper. You can download the paper here; that’s a direct link so you don’t have to sign up to their newsletter to get a copy.
It’s the middle of Build, the annual Microsoft development conference, and just a few months since Satya Nadella took the position of CEO at Microsoft. In recent years Microsoft has been making some very weird decisions, including the design of Windows 8, issues in the Xbox One developer licencing and then backtracking on that, pushing Internet Explorer with some very strange marketing campaigns and more. You can imagine my surprise then, when in the past few weeks, Microsoft have publicly taken some very different directions.
Apple recently released detailed descriptions of how many of their iOS security components work. This is a great step towards better security and transparency about security on iOS, and I’m really glad they have published the information. Included in the document were details about how iMessage is implemented from a security point of view, and it looks like a good system built on strong public-key crypto. Theoretically Apple don’t have the ability to read the messages you send, and that’s a good thing.
Last Wednesday, Stripe started their 3rd Capture the Flag competition. As a provider of online payment services, security has been critical to them, so over the last few years they have run two CTFs based around hacking and securing systems. This year they chose a different subject: distributed systems. The CTF happened over the course of the last week, and consisted of 5 levels of supposedly increasing difficulty, with many participants hanging out on the IRC channels and creating a fun community that was full of innovative ideas.
Today I found DyNAcrypt on IndieGoGo, and was disappointed to see yet another example of terrible cryptography practice in a project looking for crowdfunding. I don’t know whether the creators of DyNAcrypt are trying to scam people, or just ignorant, but either way, I’m going to go through some concerns I had while reading the project description. Reading the project description, and watching the video strongly suggests that the project creators are ‘trolling’, making very obvious technical, and grammatical, mistakes.
I learnt to code when I was 15, by watching hours and hours of video tutorials about writing C# applications in Visual Studio, and copying code out of programming books. Wait, no. I learnt to code when I was taught programming in Visual Basic for a year at college (high school). Actually, I learnt to code in my first part time programming job, where I had to make a web application using Python and Java.
I’ve just read The Government wants to teach all children how to code, Here’s why it’s a stupid idea by Willard Foxton on his Telegraph technology blog. I found the article incredibly short sighted and full of bad stereotypes that miss the point of teaching children to code. I’m not going to pick apart the author’s points; from the looks of the comments section on the post, I think that’s easy enough for anyone to do.
If I mentioned that I like C, C++ or Python to other students on my course, or colleagues, there would be no reaction. There are things you can criticise about each one, but they are all very safe bets. When I tell people that I enjoy writing Objective-C however, they are confused and often quite hostile towards the language. I am by no means an Objective-C expert, but I’ve been thinking through the reasons why I like it, so this is a random collection of reasons why I enjoy using the language.
Apple’s World Wide Developer Conference was yesterday, and I wanted to write down my opinions on what was announced and released. iOS 7 As much as I dislike the homescreen icons (and I really dislike them), the rest of the OS has some very interesting design choices in it. The lockscreen is very pretty, the translucent UI components are great, and I quite like the typography as well. At first I didn’t think it looked particularly great, but with a bit of use it has grown on me, and I think it will mature into a set of solid UI design patterns.
I’ve been writing Mac OS and iOS apps for a while now and while I haven’t got a massive amount of professional experience with it, I feel I understand the core concepts quite well. However, despite having written a fair amount of Java in the past I’ve never attempted Android development. After a very interesting, and promising I/O, I thought I’d give it a go. The first experience is not a good one.
A few hours ago Google held the keynote presentation of their yearly I/O developer conference. My housemates and I, all being computer science students, put it on the TV and discussed the announcements as they happened. This post is a summary of my thoughts on what happened. As a Developer There is only really one thing I can take away from the announcements, and that is that Google ‘gets it’. I suspect this comes from a deep-rooted respect for developers within the company, after all, the founders were both developers.
This was the title of a talk I attended this evening, given by Professor Alun Vaughan of the University of Southampton, and Professor Averil Macdonald of the University of Reading. As you can imagine the title is quite over-dramatic and the speakers did concede that it was to ‘spark discussion’, but they presented many facts that I, as a layman with an interest in electric cars, had not encountered before which have made me re-evaluate my position.
Yesterday Panic, a well known Mac and iOS development company, launched a new app for iPad. Status Board is based on their famous office status board. Since I worked at GoSquared last summer, know the API and use the service, I thought it would be nice to get the timeline of current visitors and top pages from my site into a panel on my status board. I’ve set up a public API for this so that you can use it with no setup at all.
I’m writing this on the train home from Rewired State’s latest event: National Hack the Government Day 2013 (event summary page). It was another great event with the same friendly atmosphere that goes along with so many (especially Rewired State’s) developer events. My friend Elliot and I won in one of the categories, and so this post is mostly about what we did, how we did it, and why we think it’s important.
The Alfred 2.0 beta was released earlier tonight and as an avid user, I wanted to start writing workflows immediately. Workflows are able to take input from the user in many different ways: actions on files, keywords, shortcuts, etc, and then return data as notifications, search results and actions such as opening a browser. After seeing David Ferguson’s brilliant Google Auto-Complete workflow I realised the power of workflows and set about making one that helped me use one of my favourite websites, Reddit.
The specification for this assignment was to create a basic 3D scene with multiple objects, camera control, and various graphical effects under the title Mars in Fiction. The scene had to be written in C++ and use OpenGL, and ‘modern’ OpenGL techniques that have been the standard since version 3, such as the use of vertex and fragment shaders, and vertex arrays. Out of Date Documentation This was by far the biggest issue I had while developing the scene.
I have recently been developing a few static sites using Hammer, a great little Mac app that handles compiling resources and putting together parts of web pages to create static websites. Hammer is also able to publish drafts to hammr.co, which is great for getting some feedback and showing different versions of a design, but not suited for hosting a site in production. Although it would be easy to put the static site on any one of the nearly infinite shared hosting or VPS services out there, I already have a few sites on Heroku, I find the deployment process easy, and I know that most of the time I can get away with staying on the free plan!
A while ago, having just completed a module at university where we looked at the technology behind card payment systems, I wrote about the problems that Square and PayPal Here faced in moving abroad. I concluded that iZettle, a startup from Sweden, was well poised to take the European market, but maybe that’s not an issue for Square? In my previous article I passed over Square Wallet – previously ‘Pay with Square’ – without much of a mention, but this summer I met up with former Square employee Louis Mantia and he pointed out that it isn’t just a neat feature of Square, it’s the whole point.
There is a lot going for Linux in business already: it’s free, runs well on old hardware, and has a good range of office software, but I don’t think business is where Linux will take off, if anything I think it will take longer than home use. Unfortunately, I think one of the main reasons for Linux having such a good chance is Windows 8. It’s a train speeding out of control and hurtling towards the busy station at the end of the line that is release day.
There are basically 2 options for internet in the UK: BT (phone line based) connections through BT themselves and resellers, with a maximum speed of around 24Mb, or Virgin Media (fibre based) connections which max out at about 120Mb currently, although the technology supports >400Mb. For a house of 8 Computer Science students, we don’t really have an option but to go with the latter. It is the standard now in the UK for all ISPs to provide a modem and router, or a combined box that does both, free with a connection.
In the last decade, PayPal has slowly become a mainstream service, thanks mostly to its tight integration with and later purchase by eBay. This has given rise to competitors such as Google Wallet and Amazon Payments which, while they each have a slightly different purpose or target audience, have contributed to revolutionising online payments. When I buy things online now I rarely enter my billing and shipping details, and instead choose to use one of these other, simpler and easier, methods.
The security of GitHub’s website and systems has been the focus of a fair amount of news in the industry over recent months. This is an account of my experience finding a vulnerability and getting it fixed, along with my opinions on the recent ‘mass assignment’ exploit that was publicly demonstrated on GitHub. This was the first security issue I noticed in the wild, a problem with how GitHub was handling authentication for one of their API endpoints that provided an RSS feed of account activity.
Hosted at White Bear Yard in London, Friday 13th to Sunday 15th April, I and a group of friends hacked together a system for aggregating social data for events. We used the Twitter, Facebook and Foursquare streaming APIs and built a prototype of a scalable system using Node.js, Redis, RabbitMQ and Pusher, hosted on Heroku. The event was great fun, although again I had far too little sleep. We undertook quite an ambitious project and ended up with a pretty good proof-of-concept which we presented on the Sunday afternoon to the judges from the sponsoring API providers and one of the staff at White Bear Yard.
4.7% of users have the password password
8.5% have the passwords password or 123456
9.8% have the passwords password, 123456 or 12345678
14% have a password from the top 10 passwords
40% have a password from the top 100 passwords
79% have a password from the top 500 passwords
91% have a password from the top 1000 passwords

If you go to a friend’s Facebook account right now, you quite possibly have a 1 in 20 chance of guessing it first time if you know what to guess.
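The figures above are cumulative, so the share of each of the top three passwords, and the rough odds of a first-time guess, fall out with a little arithmetic. This is only a sketch of that working; the helper names `individual_shares` and `one_in` are mine, and the only inputs are the percentages quoted above.

```python
# Cumulative shares quoted above, in percent of users.
cumulative = {
    "password": 4.7,
    "123456": 8.5,
    "12345678": 9.8,
}


def individual_shares(cumulative_shares):
    """Turn cumulative percentages into the share held by each password."""
    shares = {}
    previous = 0.0
    for password, total in cumulative_shares.items():
        shares[password] = round(total - previous, 1)
        previous = total
    return shares


def one_in(percent):
    """Express a percentage as 'one in N' odds."""
    return 100 / percent


shares = individual_shares(cumulative)
odds_for_top_password = one_in(cumulative["password"])
```

Guessing password alone gives odds of a little worse than 1 in 21, which is where the rough "1 in 20" figure comes from.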
The task given to me was to create a webserver that was exploitable with buffer overflow. This was my first attempt at networking code in C so it may be quite a bad implementation, I was also quite rushed with this coursework due to approaching exams. The server binds to port 8000 and delivers files from the directory it is run from. It will only handle 1 request at a time and it only supports GET requests, but it features basic protection against directory traversal attacks.
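The directory traversal protection is the easiest part to illustrate. The coursework server itself is written in C, so what follows is not its code, just a small Python sketch of the general check: resolve the requested path and refuse anything that ends up outside the directory being served. The function name `resolve_request_path` and the `/srv/www` root are assumptions for the example.

```python
from pathlib import Path


def resolve_request_path(root, request_path):
    """Map a request path to a file under root, or None if it escapes root."""
    root = Path(root).resolve()
    # Strip the leading slash so the request is treated as relative to root,
    # then resolve to collapse any ".." segments.
    candidate = (root / request_path.lstrip("/")).resolve()
    if candidate != root and root not in candidate.parents:
        return None  # e.g. "/../etc/passwd" resolves outside the root
    return candidate


allowed = resolve_request_path("/srv/www", "/index.html")
blocked = resolve_request_path("/srv/www", "/../etc/passwd")
```

Comparing resolved paths, rather than searching the request string for `..`, also covers encodings of the same trick that a simple substring check would miss.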
I have always had an interest in computer security, probably inspired by films like Hackers and scenes like ‘this is a Unix system’, and along with this interest I have been fascinated by cryptography. This led to me reading The Code Book by Simon Singh which I think is the perfect introduction to the subject of cryptography. It doesn’t require any technical knowledge, and doesn’t dive in to complex maths, but rather presents the history of cryptography starting with Caesar ciphers and moving all the way through to modern public-key cryptosystems.
Most credit card fraud occurs because, somehow, the fraudster is able to see the card owner’s PIN. The most common way this happens is hidden cameras at ATMs recording PINs, or dodgy Chip & PIN readers fitted with monitoring devices (this is significantly less common). The way to deal with this crime is to make the PIN change every time an attempt is made. There are already systems that utilise a small device (usually a key-fob) that generates a password on request that will only last for 30 seconds.
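I don’t know exactly which scheme those key-fobs use, but one widely used way to generate a code that only lasts 30 seconds is a time-based one-time password in the style of RFC 6238: the device and the server share a secret, and both derive the code from the current 30 second window. The sketch below assumes that scheme; the function name `one_time_code` and the example secret are illustrative only.

```python
import hashlib
import hmac
import struct


def one_time_code(secret, unix_time, step=30, digits=6):
    """Derive a short-lived numeric code from a shared secret and the time."""
    # Every timestamp inside the same `step`-second window gives one counter.
    counter = int(unix_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte slice.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


shared_secret = b"12345678901234567890"  # example secret, not a real key
code_early = one_time_code(shared_secret, 30)
code_late = one_time_code(shared_secret, 59)
code_next_window = one_time_code(shared_secret, 60)
```

Because the code depends on the time window, a PIN captured by a hidden camera stops being useful once that window has passed.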