Liam's Blog

Fixing Seafile's Preview Implementation


You know Dropbox. You know Google Drive. You know OneDrive. Some of you may know Nextcloud, but it’s time to give some love to the lesser-known (but way better) solution: Seafile.

There are a ton of reasons why I don’t want to use Google Drive, Dropbox and OneDrive, but that’s a story for another day.

Unlike its nearest competitor Nextcloud (which is written in PHP), Seafile is written in Python (and in future versions, Go). This lets it do a much better job at block-level synchronization, meaning it only uploads the parts of a file that changed instead of re-uploading the entire file.
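To make "block-level" concrete, here is a minimal sketch of the idea: split a file into blocks, hash each one, and only re-send the blocks whose hashes changed. This is an illustration only and not Seafile's actual algorithm, which may well differ; the fixed block size and the function names are made up for the example.

```python
import hashlib

# Tiny block size so the example is easy to follow (hypothetical value;
# real sync tools use far larger blocks).
BLOCK_SIZE = 4


def block_hashes(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return indices of blocks in `new` that differ from, or are missing in, `old`."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]
```

With three 4-byte blocks and a single byte changed in the middle one, only that block would need to be uploaded again.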

Also, Nextcloud is refusing to acknowledge that its UI and architecture need a rework, and they keep adding stuff you don’t need in a file syncing platform, so Seafile is definitely the better choice.

But what’s holding us back?

After this post, nothing anymore. But for context, Seafile has a problem with video thumbnails.

Their documentation specifically says that video thumbnails are supported, but it hasn’t been updated for quite a while. At some point, I believe there was a changelog entry in Seafile v7.x saying the feature was deprecated, so tough luck.

Their original implementation depended on ffmpeg as well as the pillow and moviepy modules. I have no idea if this ever worked; when I dug deeper, those dependencies turned out to be only part of a bigger problem.

The original Python implementation is as follows:

def create_video_thumbnails(repo, file_id, path, size, thumbnail_file, file_size):
  t1 = timeit.default_timer()
  token = seafile_api.get_fileserver_access_token(repo.id,
    file_id, 'view', '', use_onetime=False)

  if not token:
    return (False, 500)

  inner_path = gen_inner_file_get_url(token, os.path.basename(path))
  tmp_path = str(os.path.join(tempfile.gettempdir(), '%s.png' % file_id[:8]))

  try:
    subprocess.check_output(['ffmpeg', '-ss', str(THUMBNAIL_VIDEO_FRAME_TIME), '-vframes', '1', tmp_path, '-i', inner_path])
  except Exception as e:
    logger.error(e)
    return (False, 500)
  
  t2 = timeit.default_timer()
  logger.debug('Create thumbnail of [%s](size: %s) takes: %s' % (path, file_size, (t2 - t1)))

  try:
    ret = _create_thumbnail_common(tmp_path, thumbnail_file, size)
    os.unlink(tmp_path)
    return ret
  except Exception as e:
    logger.error(e)
    os.unlink(tmp_path)
    return (False, 500)

I’m not much of an experienced Python developer, so I may be wrong on a lot of things below, but it works so I don’t care.

The points of concern in the code above:

  • It calls subprocess.check_output(), which returns bytes instead of a string as of Python 3 (at least that’s what StackOverflow tells me, and I get b'' on Python 3, so this is a strong maybe)
    • I’m guessing this was a Python 2 script that was carried over but never updated when they shifted to Python 3
  • gen_inner_file_get_url points at the Seafile file server hosted at 127.0.0.1:8082, but ffmpeg can’t process this correctly for some files because the server streams the file instead of initiating a download, so the video data is broken before ffmpeg gets a chance to process it
    • Debugging this gave me Invalid NAL unit size (3331841 > 13015), which leads me to believe that my conclusion is correct, but again, I’m not enough of a Python developer to confirm it 100%

Due to those two points, screenshots can’t be created. You can get one part to work by calling subprocess.run() instead and adding the check=True argument, but ffmpeg would still fail because it can’t read the file correctly.
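A small sketch of the bytes-versus-string behavior and of subprocess.run() with check=True. It uses the current Python interpreter as a stand-in command so it runs anywhere; in the Seafile code the command would be ffmpeg.

```python
import subprocess
import sys

# Stand-in command (an assumption for this example): the running Python
# interpreter printing "ok". Replace with the real ffmpeg argv in practice.
cmd = [sys.executable, "-c", "print('ok')"]

# In Python 3, check_output() returns bytes by default...
raw = subprocess.check_output(cmd)

# ...and a decoded str when text=True is passed.
text = subprocess.check_output(cmd, text=True)

# run() with check=True raises CalledProcessError on a non-zero exit code,
# so a failed command cannot be silently ignored.
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Either way, the return type only matters if the output is actually used; the exit code is what tells you whether the command succeeded.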

What I tried

One of the solutions I tried was to wget the file from the internal server so it’s buffered to the disk, and change the command so ffmpeg calls the buffered file instead, then delete it afterwards.

This works, but:

  • It needs to download the entire file, which works fine for files that are around 10 MB or so, but not for gigabyte-sized files
  • ffmpeg still fails to parse some videos for some reason, and the ffmpeg Trac isn’t much help
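The buffering approach can be sketched roughly as follows. This is not the exact code I ran: urllib stands in for wget, and the function names are hypothetical. The one detail worth noting is that ffmpeg expects the input (-i) before the output path.

```python
import os
import subprocess
import tempfile
import urllib.request


def build_ffmpeg_cmd(video_path, png_path, seconds):
    """Build an ffmpeg argv that writes one frame at `seconds` to png_path."""
    return ["ffmpeg", "-y", "-ss", str(seconds), "-i", video_path,
            "-frames:v", "1", png_path]


def thumbnail_via_buffer(url, png_path, seconds=3):
    """Download the whole file to a temp path, run ffmpeg on it, then delete it."""
    fd, buffer_path = tempfile.mkstemp(suffix=".video")
    os.close(fd)
    try:
        # Stand-in for wget: fetch the entire file to disk first.
        urllib.request.urlretrieve(url, buffer_path)
        subprocess.run(build_ffmpeg_cmd(buffer_path, png_path, seconds), check=True)
    finally:
        # Remove the buffered copy whether or not ffmpeg succeeded.
        os.unlink(buffer_path)
```

The cost is exactly the first drawback listed above: the whole file has to land on disk before ffmpeg ever sees it.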

The next idea I had was to use curl -r 0-10485760 instead of wget and only download the first 10 MB of the file (bytes 0 through 10 * 1024 * 1024), or less, since I only need the first 3 seconds anyway to generate a thumbnail (you can customize this in Seafile’s config files, but I chose the first 3 seconds because some videos may be short).

This works about half the time: it does download the first 10 MB of the file, but ffmpeg treats the truncated file as broken yet again (even though I can play it in Windows for some reason, albeit not to the end).
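For reference, the same partial download can be sketched in Python with an HTTP Range header. The function names are hypothetical, and this assumes the server honors range requests. Byte ranges are inclusive, so the first 10 MiB is bytes 0 through 10485759; curl -r 0-10485760 fetches one byte more than that.

```python
import shutil
import urllib.request

TEN_MIB = 10 * 1024 * 1024  # 10485760 bytes


def range_header(num_bytes):
    """Return a Range header value covering the first num_bytes bytes (inclusive range)."""
    return "bytes=0-%d" % (num_bytes - 1)


def fetch_prefix(url, dest_path, num_bytes=TEN_MIB):
    """Download only the first num_bytes of url into dest_path."""
    req = urllib.request.Request(url, headers={"Range": range_header(num_bytes)})
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

A truncated file like this is not guaranteed to be decodable, which matches the hit-and-miss results described above.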

What worked

Turns out there’s another magical third-party program that’s also open source that can do this for us: mpv.

mpv is, by nature, a video player. But you can use it from the CLI to do all sorts of stuff, one of which is generating thumbnails from videos.

I didn’t even need to buffer the file to the disk; it just works. I installed mpv via apt-get install -y mpv, then updated Seafile’s implementation (seahub/thumbnail/utils.py) to the following:

# added the datetime import for delta
import datetime

def create_video_thumbnails(repo, file_id, path, size, thumbnail_file, file_size):
  t1 = timeit.default_timer()
  token = seafile_api.get_fileserver_access_token(repo.id,
          file_id, 'view', '', use_onetime=False)

  if not token:
    return (False, 500)

  inner_path = gen_inner_file_get_url(token, os.path.basename(path))
  tmp_path = str(os.path.join(tempfile.gettempdir(), '%s.png' % file_id[:8]))

  timestamp = str(datetime.timedelta(seconds=THUMBNAIL_VIDEO_FRAME_TIME))

  try:
    # --o tells mpv where to write the frame; the "0" prefix pads H:MM:SS to HH:MM:SS
    out = '--o="' + tmp_path + '"'
    cmd = subprocess.Popen('mpv --no-audio --start=0' + timestamp + ' --frames=1 "' + inner_path + '" ' + out, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True)
    cmd.communicate()
    if cmd.returncode != 0:
      logger.error("conversion failed")
      try:
        os.unlink(tmp_path)
      except Exception as e:
        logger.debug("no tmp files to delete")
      # stop here instead of trying to resize a frame that was never written
      return (False, 500)
  except Exception as e:
    logger.error(e)
    return (False, 500)

  t2 = timeit.default_timer()
  logger.debug('Create thumbnail of [%s](size: %s) takes: %s' % (path, file_size, (t2 - t1)))

  try:
    return _create_thumbnail_common(tmp_path, thumbnail_file, size)
  except Exception as e:
    logger.error(e)
    return (False, 500)
  finally:
    # always remove the temporary frame, whether or not resizing succeeded
    try:
      os.unlink(tmp_path)
    except OSError:
      logger.debug("no tmp to delete")

The code I wrote above uses subprocess.Popen() instead of subprocess.run() because for some reason the same command does not work correctly with subprocess.run(). Don’t ask me why, I’m a PHP/C# guy.
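For anyone cleaning this up, here is a sketch of how the timestamp is formatted and how the same mpv flags could be passed as an argument list instead of a shell string. The helper names are hypothetical and I have not tested the list form inside Seafile. Note that with a list there is no shell, so the literal double quotes around the paths must be dropped.

```python
import datetime


def mpv_start_timestamp(seconds):
    """Format seconds the way the code above does: '0' + 'H:MM:SS' gives 'HH:MM:SS'."""
    return "0" + str(datetime.timedelta(seconds=seconds))


def build_mpv_argv(inner_path, tmp_path, seconds):
    """Same flags as the shell string above, as an argument list (no shell quoting)."""
    return [
        "mpv",
        "--no-audio",
        "--start=" + mpv_start_timestamp(seconds),
        "--frames=1",
        inner_path,
        "--o=" + tmp_path,
    ]
```

A list like this can be handed to subprocess.Popen() or subprocess.run() without shell=True, which also avoids problems with file names that contain quotes or spaces.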

With this, video thumbnails are now working correctly. This solves a bug that has existed for years (I would know, because every time I wanted to try Seafile out I dropped it because this didn’t work properly).

I would submit a pull request for this, although my code can probably use a bit more cleanup first.

But it works, and that’s what matters. I can finally tell what video a file is without having to play it first.