Incorporating a Movie Database - Building Flicksee Part 1

I recently created my first mobile app, Flicksee, which was built to work cross-platform (iOS and Android) using React Native. This is the first in a series of articles that will discuss the process of building this app. If you’d like to learn more about Flicksee, you can do so here: https://www.flicksee.org

In short, it helps you discover new movies to watch using an extremely intuitive infinite scroll UI. Here’s what it looks like (in the app store and on my phone):

Flicksee in the iOS App Store (left). Flicksee on an iPhone (right).

This article explains how I acquired and improved the data for Flicksee. It may save some time for others who are attempting something similar with their own apps.

Choosing a Movie Database

The first thing I had to decide when building this app was how to acquire the necessary data, which at the very least meant a list of most movies and some data associated with each (images, actors, genres, runtimes).

At first glance, IMDb would be the first choice for this. That is, until you actually try to get a sense of what they offer, their terms of use, and their pricing. You have to submit a form just to be considered for access. This may work for a larger business, but for a simple app like mine it was not a great option, so I didn’t look into it further.

The next best option was TMDB, as you can dive right into using their API. However, once you look deeper, challenges become apparent. The API is free to use for non-commercial purposes, but things get very murky when you decide to build a paid app: https://developer.themoviedb.org/docs/faq

That says, “If you are interested in obtaining a license to use our API and/or our data/images for commercial purposes, please contact sales@themoviedb.org.” They don’t define what they consider a commercial purpose, they don’t outline pricing, and they really offer no guidance as to what to expect if you want to use their data commercially.

They also have conflicting messaging around this, as you can see on this page: https://www.themoviedb.org/talk/5cdcd9b59251416e32cfc9ac

Travis Bell, presumably a representative of TMDB, says, “Hi @iworldpark, we've never charged a penny to anyone in 10 years so there is no pricing model to share.”

Travis goes on to explain, “You are more than welcome to cache the data locally, that's fine. Just be sure to attribute TMDb as the source of whatever data and images you decide to use.”

That is in contrast with their current terms of use: https://www.themoviedb.org/api-terms-of-use

Those terms state, "You must not: Cache, for longer than 6 months, any information obtained through or from TMDB or the TMDB APIs."

That wouldn’t work for my use case since I want to store the data as part of the app itself (that way, if any of these APIs vanish or change at some point, I don’t have to worry about my app failing).

It’s hard for me to know if I should rely on the official FAQ and terms of use, or messaging expressed casually several years ago. I did email them to ask for clarification, and they eventually got back to me weeks later. In the end, I decided this was a bit too risky for my taste, and so I found a third option.

I ended up going with OMDB. They offer data for tens of thousands of movies, including the movie poster images, and it is both volunteer-driven and entirely free without any substantial restrictions on the use of the data.

OMDB isn’t without its flaws, which I’ll dig into later, but for my purposes, it offers exactly what I need.

Cleaning Up the Movie Data

The first thing I did once I had mostly settled on using OMDB was to download their data and get a sense for it. There were several issues:

  • Images were very low resolution and overly compressed.

  • Images were often missing (only about half of the movies had images).

  • Data is usually incomplete and sometimes incorrect. Some examples include actors not being listed, and movie plots being vandalized to replace character names with silly names.

  • There were encoding issues with the XML data.

  • There was a substantial amount of data in languages other than English (I’m aiming for an English-speaking audience).

  • There were a fair number of movies that I would not expect anybody to be interested in, such as some random person’s home movies.

  • There were some very obscure genres assigned to movies, with over 200 unique genres in all.

  • Many of the movies were adult titles, which my audience would not appreciate being exposed to.

All that said, they had a good amount of quality data, so I decided to spend some time qualifying the good data and cleaning up the bad data.

The first thing I did was apply some filtering to get rid of the most suspect data. For example, I removed any movie under 50 minutes long and any movie whose title contained characters not commonly used by English speakers. I also excluded any movies associated with the more adult genres. This trimmed a substantial amount of data (something like half).
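
Here is a minimal sketch of what that kind of filter could look like in Python. The record shape, the field names, the ASCII-only title check, and the contents of ADULT_GENRES are all assumptions for illustration; the real OMDB export and my actual rules differ in the details.

```python
from dataclasses import dataclass, field

# Hypothetical record shape; the real OMDB export uses its own field names.
@dataclass
class Movie:
    title: str
    runtime_minutes: int
    genres: list[str] = field(default_factory=list)

MIN_RUNTIME_MINUTES = 50           # threshold mentioned in the article
ADULT_GENRES = {"Adult", "Erotic"}  # placeholder names, not the real list

def title_looks_english(title: str) -> bool:
    # Crude heuristic (an assumption): accept ASCII-only titles.
    # This also rejects accented Latin titles, which may be too strict.
    return title.isascii()

def keep_movie(movie: Movie) -> bool:
    """Return True if the movie passes all three filters."""
    if movie.runtime_minutes < MIN_RUNTIME_MINUTES:
        return False
    if not title_looks_english(movie.title):
        return False
    if ADULT_GENRES.intersection(movie.genres):
        return False
    return True

movies = [
    Movie("Example Feature", 95, ["Comedy"]),
    Movie("Example Short", 20, ["Comedy"]),
    Movie("Пример", 95, ["Drama"]),
    Movie("Example Adult Title", 95, ["Adult"]),
]
kept = [m.title for m in movies if keep_movie(m)]
print(kept)
```

Keeping each rule as its own check makes it easy to loosen one of them later (for example, allowing accented characters) without touching the others.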

Speaking of adult movies, even after excluding the most obvious ones, there were a few movie posters with more adult themes, like nudity and gore. I wanted to be fairly certain I wasn’t including these in their uncensored form in my app, so I manually reviewed tens of thousands of images and adjusted a handful of them. Very boring, but necessary.

I also decoded the doubly encoded XML data. This might cause some quirks in a few edge cases, but it fixed most of the encoding errors.
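
As a rough illustration, here is one way to undo double encoding in Python. It assumes "doubly encoded" means that entities were escaped twice (for example, "&amp;amp;" where "&amp;" was intended); if the problem is a character-set mix-up instead, a different fix is needed. The function name is made up for this sketch.

```python
import html

def decode_double_encoded(text: str, max_passes: int = 2) -> str:
    """Unescape entities up to max_passes times, stopping early if nothing changes."""
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text

print(decode_double_encoded("Tom &amp;amp; Jerry"))  # Tom & Jerry
print(decode_double_encoded("Am&amp;#233;lie"))      # Amélie
```

Note that html.unescape understands HTML named entities as well as the five XML ones, and that text which genuinely contains a literal "&amp;" will be over-decoded. That is the kind of edge-case quirk this approach can introduce.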

For the genres, I used AI to bucket them into the 10 most common genres. I could have done this myself, but it wasn’t super important to get this exactly right. I may modify the app later on to incorporate more of the obscure genres, but for now, this was a workable solution.
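
Once a mapping like that exists, applying it is simple. The sketch below is only an illustration: the bucket names and the mapping entries are invented examples, not the actual output of the AI bucketing, and only a few of the buckets are shown.

```python
# Hypothetical subset of the common genre buckets.
BUCKETS = {"Comedy", "Drama", "Horror", "Thriller"}

# Hypothetical mapping from obscure source genres to a bucket.
GENRE_TO_BUCKET = {
    "Slasher": "Horror",
    "Neo-noir": "Thriller",
    "Romantic comedy": "Comedy",
}

def bucket_genres(raw_genres: list[str]) -> list[str]:
    """Map raw genres to buckets, dropping unmapped genres and duplicates."""
    result: list[str] = []
    for genre in raw_genres:
        bucket = genre if genre in BUCKETS else GENRE_TO_BUCKET.get(genre)
        if bucket is not None and bucket not in result:
            result.append(bucket)
    return result

print(bucket_genres(["Slasher", "Horror", "Unmapped genre"]))  # ['Horror']
```

Genres with no mapping are silently dropped here, which is one reason some movies can end up with no genre at all.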

Finally, I wasn’t sure I could rely on the list of actors to be in a meaningful order (e.g., leading roles appearing first). I worked around this by sorting the actors in each movie according to their calculated popularity. For now, this simply counts the number of movies an actor appears in (the premise being that more popular actors would tend to be in more movies). While imperfect, it does a decent job. I may revise this later, such as by factoring in movie duration and movie budget as indicators of actor popularity, or by creating a list of the top 1,000 most popular actors according to my personal intuition.
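
The counting approach can be sketched in a few lines of Python. The data shape (a dictionary of movie title to cast list) and the function names are assumptions for this example.

```python
from collections import Counter

def actor_movie_counts(casts: dict[str, list[str]]) -> Counter:
    """Count how many movies each actor appears in."""
    counts: Counter = Counter()
    for actors in casts.values():
        counts.update(set(actors))  # count an actor once per movie
    return counts

def sort_cast(actors: list[str], counts: Counter) -> list[str]:
    """Order a cast by descending movie count; ties keep their original order."""
    return sorted(actors, key=lambda actor: -counts[actor])

casts = {
    "Movie A": ["Actor X", "Actor Y"],
    "Movie B": ["Actor Y", "Actor Z"],
    "Movie C": ["Actor Y", "Actor Z"],
}
counts = actor_movie_counts(casts)
print(sort_cast(["Actor X", "Actor Z", "Actor Y"], counts))
# ['Actor Y', 'Actor Z', 'Actor X']
```

Because Python's sort is stable, actors with the same count stay in whatever order the source data listed them, so any meaningful ordering that does exist is preserved among ties.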

The images were a lot of work, so I’ll explain those in more detail in the next section.

Improving Movie Poster Images

The biggest issue with the images is that they are often missing from movies. Rather than lose out on the opportunity to share over 10,000 movies with users who may find them interesting, I opted to find solutions to incorporate these movies, even though the app is built to lean on movie imagery.

One approach I took was to use AI to generate placeholder images according to genre. That way, if a movie didn’t have an image, a representative image could be shown in its place. I also covered the image with the movie name and the most popular actor to give it some extra value, giving users the ability to ascertain the basic movie details just by looking at the placeholder image. Here is what that looks like:

Placeholder images are shown for each of these movies. One is for “Drama” and the other is for movies missing genres entirely. The movie title and an actor are shown above the placeholder image.

While that was an OK solution, it does look a bit jarring when scrolling through movies and seeing these two distinct styles (i.e., normal poster image and placeholder image with text on top). To address that while still retaining the full set of movies, I created a toggle to enable or disable movies that don’t have poster images, and I disabled this by default (most people will appreciate the improved aesthetic of the original posters, and will still find value in 17,000 movies rather than 30,000).

Another issue I had with the images was that they were very low resolution and overly compressed. From what I understand, this is somewhat by design as some copyright laws dictate that the images must be of lower quality to be used in certain scenarios. Still, these seemed lower quality than even that standard would dictate, so I did my best to improve the situation with AI.

I reviewed several AI tools for upscaling images and found one that was significantly better than the rest. I don’t want to name and shame the tools that didn’t stack up, but if you are curious about the options, the various app stores have a few, and this website lists a few options: https://openmodeldb.info/

The one that really stuck out was Topaz Gigapixel: https://www.topazlabs.com/gigapixel. The other tools tended to suffer from two major issues. One was that they would distort the text in odd ways when upscaling (not just blurring, but swirling and other distortions). They would also do very strange things to human faces, especially eyes. Gigapixel, especially in their recent 7.1 release, addresses both of these problems very well. The new “Recovery” model uses AI to fill in the missing details, and they have a feature specifically to restore the details of human faces. It is a paid product, but given it all runs locally and doesn’t require a subscription, it was a no-brainer to invest in a one-time fee.

Here’s one example of an AI upscaled image:

The movie poster for “Frank”, before and after AI upscaling with Gigapixel.

The original is unpleasant to look at, given everything is pixelated/blurry/compressed. Some things you’ll be able to see from this example:

  • The text “Fassbender is Amazing” was upscaled perfectly (aside from the quote marks).

  • The woman’s face looks very decent considering the source.

  • The compression artifacts are gone, with everything looking much crisper.

There are definitely some quirks, as can be expected, but the goal here was to make the images more pleasant to look at, not to make every detail perfect. If you’re scrolling through hundreds of movies, you’re not going to spend much time looking at all the details of the posters.

The only real challenge with this one is the amount of time it takes to upscale an image, which is about 35 seconds per image in my case. Given the number of images I need to upscale, it will take a solid week to do so.
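
To put a number on that, here is the back-of-the-envelope arithmetic in Python. The image count of 17,000 is an assumption based on the rough number of movies that currently have posters; the real count will shift as new images are added.

```python
SECONDS_PER_IMAGE = 35   # measured on my machine
IMAGE_COUNT = 17_000     # assumption: roughly the number of movies with posters
SECONDS_PER_DAY = 24 * 60 * 60

def upscale_days(image_count: int, seconds_per_image: float) -> float:
    """Total processing time in days if images are upscaled one after another."""
    return image_count * seconds_per_image / SECONDS_PER_DAY

estimate = upscale_days(IMAGE_COUNT, SECONDS_PER_IMAGE)
print(f"{estimate:.1f} days")  # 6.9 days
```

That assumes the machine runs around the clock with no failures or restarts, so in practice it is a lower bound.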

Before I do that, I’m going to wait to incorporate some new images. I’ve created a tool that can extract movie poster images from Wikipedia for inclusion into OMDB. That should add nearly 10,000 new movie images. Once that’s done, I can start a second pass at upscaling all the images with the new Gigapixel 7.1 “Recovery” model.

Legal Considerations

One thing I think is important to touch on for anybody reading this article is the legal side. I’m not a lawyer, so you shouldn’t take my advice in this article as a substitute for speaking with one.

The images I’m using from OMDB are essentially all available either under a free license, such as Creative Commons, or under a fair use rationale. That said, I’m not 100% sure my use case falls under the umbrella of these terms since I’m currently charging for my app, albeit a nominal fee that is unlikely to recoup what I spent building the app itself. I’m not too worried about the legal ramifications of this, as I’m happy to remove any images (e.g., if a movie studio doesn’t want them displayed in the app) or, in the worst case, to make the app free.

That said, it’s hard to imagine why anybody would want their images removed from this app, as it doesn’t actually take away from the original creations (i.e., the movies themselves). This isn’t a streaming app, so people still need to find some means of watching the movies. If anything, this app is a bit like a free advertisement for these movies.

Hopefully I’m right about that, but it’s something to consider if you’re thinking of building a business based on the approaches I’ve outlined in this article.

Potential Next Steps for Flicksee

While I’ve largely accomplished what I wanted to with this app, here are some next steps I may or may not take, but which are certainly possibilities.

One thing I’d like to do is address a few quirks with the app, such as how it handles switching between portrait and landscape (that is a tale I’ll save for an upcoming article), or the small buttons that are a bit annoying to try to click.

Another thing I’d like to handle is improving image quality on Android devices, as the Google Play app store limits the bundle size to 200MB. There are a couple of ways around this, so given time I’m confident I can implement a solution.

I’d also like to add a few more settings, such as a release year filter, and perhaps the ability to see related movies based on a given actor.

Finally, the images could also use some improvement, and I’ve already outlined a few ideas that are in progress. I could also take this a step further and supplement the data I already have with data from TMDB, using the best available data from both OMDB and TMDB. This would give me the option to completely sever ties with TMDB if they force the issue at some later time (they seem friendly enough, but it’s hard to tell from the official docs on their site). I could even have the TMDB data delete itself after 6 months to remain in compliance with the TMDB terms of use regarding caching for no longer than that amount of time (and I would just need to release a new version of the app every 6 months to keep the data fresh).
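
The self-deleting idea boils down to storing a fetch timestamp with each record and discarding anything too old. Here is a sketch of that logic in Python (the app itself is React Native, so a real version would be ported). The record shape, the function names, and the use of 180 days as a conservative stand-in for "6 months" are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 180 days as a conservative reading of "6 months".
MAX_CACHE_AGE = timedelta(days=180)

def is_expired(fetched_at: datetime, now: datetime) -> bool:
    """True once a record has been cached for MAX_CACHE_AGE or longer."""
    return now - fetched_at >= MAX_CACHE_AGE

def drop_expired(records: list[dict], now: datetime) -> list[dict]:
    """Keep only records that are still within the caching window."""
    return [r for r in records if not is_expired(r["fetched_at"], now)]

now = datetime(2024, 7, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "fetched_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "fetched_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
]
fresh = drop_expired(records, now)
print([r["id"] for r in fresh])  # [1]
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the check easy to test.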

There are many places I could take this app, but I probably won’t do much of it, considering this is more of a pet project than an actual business venture. That said, if anybody is interested in this app or similar ideas, I’m open to the idea of making it into some sort of product. Until then, I hope a few people find it useful and gain some joy out of using it.