Karl Dubost: Saving Webcompat images as a microservice
Update: You may want to fast forward to the latest part… of this blog post. (Head explodes).
Thinking out loud on separating our images into a separate service. The initial goal was to push the images to the cloud, but I think we could probably have a first step. We could keep the images on our server, but instead of the current save, we could send them to another service, let's say upload.webcompat.com, with an HTTP PUT. And this service would save them locally.
That way it would allow us two things:
All of this is mainly thinking for now.
config/environment.py defines:

    UPLOADS_DEFAULT_DEST = os.environ.get('PROD_UPLOADS_DEFAULT_DEST')
    UPLOADS_DEFAULT_URL = os.environ.get('PROD_UPLOADS_DEFAULT_URL')
The maximum limit for images is defined in __init__.py:

    # set limit of 5.5MB for file uploads
    # in practice, this is ~4MB (5.5 / 1.37)
    # after the data URI is saved to disk
    app.config['MAX_CONTENT_LENGTH'] = 5.5 * 1024 * 1024

Currently in views.py, there is a route for localhost upload:

    if app.config['LOCALHOST']:
        @app.route('/uploads/<path:filename>')
        def download_file(filename):
            """Route just for local environments to send uploaded images.

            In production, nginx handles this without needing to touch the
            Python app.
            """
            return send_from_directory(
                app.config['UPLOADS_DEFAULT_DEST'], filename)

The localhost part would probably not change much. This is just for reading the image URLs.
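Coming back to the upload limit in __init__.py for a second: the "~4MB" in the comment is just the 5.5 MB limit divided by 1.37. Here is a small sketch of that arithmetic. I am assuming the 1.37 is the usual rough estimate for base64 with line breaks and headers; plain base64 grows the data by 4/3. The function names are made up for illustration.

```python
import base64

MAX_CONTENT_LENGTH = 5.5 * 1024 * 1024  # bytes, as set in __init__.py
OVERHEAD = 1.37  # divisor taken from the comment in the source


def decoded_budget(limit_bytes, overhead=OVERHEAD):
    """Approximate size of the binary image that fits under the limit."""
    return limit_bytes / overhead


def base64_ratio(payload_size):
    """Measure the actual growth of plain base64 for a payload of this size."""
    raw = bytes(payload_size)
    return len(base64.b64encode(raw)) / len(raw)


budget_mb = decoded_budget(MAX_CONTENT_LENGTH) / (1024 * 1024)
print(f"image budget: {budget_mb:.2f} MB")                        # 4.01 MB
print(f"plain base64 growth: {base64_ratio(3_000_000):.3f}")      # 1.333
```

So the 1.37 divisor is a bit more conservative than what plain base64 would require.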
Then the API for uploads is defined in api/uploads.py. This is where the production route is defined:

    @uploads.route('/', methods=['POST'])
    def upload():
        '''Endpoint to upload an image.

        If the image asset passes validation, it's saved as:
            UPLOADS_DEFAULT_DEST + /year/month/random-uuid.ext

        Returns a JSON string that contains the filename and url.
        '''
        … # cut some stuff.
        try:
            upload = Upload(imagedata)
            upload.save()
            data = {
                'filename': upload.get_filename(upload.image_path),
                'url': upload.get_url(upload.image_path),
                'thumb_url': upload.get_url(upload.thumb_path)
            }
            return (json.dumps(data), 201, {'content-type': JSON_MIME})
        except (TypeError, IOError):
            abort(415)
        except RequestEntityTooLarge:
            abort(413)
upload.save() is basically the call we should replace with an HTTP PUT to a microservice.
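To make the idea a bit more concrete, here is a rough sketch of what the webcompat.com side could send instead of saving to disk. It only uses the standard library. Everything named here (UPLOAD_SERVICE, build_put_request, send_image, the https scheme, the path layout on the service) is an assumption for the sake of the example, not existing webcompat code.

```python
import urllib.request

# Hypothetical service location, following the upload.webcompat.com idea.
UPLOAD_SERVICE = 'https://upload.webcompat.com'


def build_put_request(image_bytes, image_path, content_type='image/png'):
    """Prepare an HTTP PUT carrying the raw image bytes.

    image_path is the year/month/random-uuid.ext part, already computed
    and validated on the webcompat.com side.
    """
    url = f"{UPLOAD_SERVICE}/{image_path.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=image_bytes,
        method='PUT',
        headers={'Content-Type': content_type},
    )


def send_image(image_bytes, image_path, timeout=10):
    """Send the PUT and return the HTTP status code (real network call)."""
    request = build_put_request(image_bytes, image_path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting the request building from the sending keeps the first part testable without any network.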
In these musings, I wonder if we could mimic the way Amazon S3 operates at a very high level. No need to replicate everything. We just need to save some bytes into a folder structure.
Boto 3 has documentation for uploading files:

    import logging

    import boto3
    from botocore.exceptions import ClientError


    def upload_file(file_name, bucket, object_name=None):
        """Upload a file to an S3 bucket

        :param file_name: File to upload
        :param bucket: Bucket to upload to
        :param object_name: S3 object name. If not specified then file_name is used
        :return: True if file was uploaded, else False
        """

        # If S3 object_name was not specified, use file_name
        if object_name is None:
            object_name = file_name

        # Upload the file
        s3_client = boto3.client('s3')
        try:
            response = s3_client.upload_file(file_name, bucket, object_name)
        except ClientError as e:
            logging.error(e)
            return False
        return True
We could keep the image validation on the webcompat.com side. Once the naming and checking are done, we can save the image to a service the same way AWS does.
So our privileged service could accept images and save them locally, in the same folder structure, as a separate Flask application. And later on, we could adjust it to use S3.
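The storage part of such a service could be very small. Below is a sketch of the "save some bytes into a folder structure" step, written as plain functions so that a Flask PUT route (or anything else) could call them. The names make_key and put_object are invented for this example, and the unpadded month in the key is an assumption about the current year/month/random-uuid.ext layout.

```python
import datetime
import pathlib
import uuid


def make_key(extension, today=None):
    """Build a key shaped like year/month/random-uuid.ext."""
    today = today or datetime.date.today()
    return f"{today.year}/{today.month}/{uuid.uuid4()}.{extension}"


def put_object(root, key, body):
    """Save bytes under root/key, a bit like an S3 bucket and object name.

    Keys that would end up outside of root (for example '../x.png')
    are refused.
    """
    root_path = pathlib.Path(root).resolve()
    target = (root_path / key).resolve()
    if root_path not in target.parents:
        raise ValueError(f"key escapes the storage root: {key!r}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return target
```

Moving to S3 later would then mostly mean swapping the body of put_object for a boto3 call, with root becoming the bucket.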
I just found out that each time you put an image in an issue or a comment, GitHub makes a private copy of this image. Not sure if it's borderline with regard to property.
If you enter:

    ![I'm root](http://www.la-grange.net/2019/01/01/2535-misere)
Then it creates this markup.
    <p><a target="_blank" rel="noopener noreferrer" href="https://camo.githubusercontent.com/a285646de4a7c3b3cdd3e82d599e46607df8d3cc/687474703a2f2f7777772e6c612d6772616e67652e6e65742f323031392f30312f30312f323533352d6d6973657265"><img src="https://camo.githubusercontent.com/a285646de4a7c3b3cdd3e82d599e46607df8d3cc/687474703a2f2f7777772e6c612d6772616e67652e6e65742f323031392f30312f30312f323533352d6d6973657265" alt="I'm root" data-canonical-src="http://www.la-grange.net/2019/01/01/2535-misere" style="max-width:100%;"></a></p>
And we can notice that the img src is pointing to… GitHub?
I checked in my server logs to be sure. And I found…
140.82.115.251 - - [20/Nov/2019:06:44:54 +0000] "GET /2019/01/01/2535-misere HTTP/1.1" 200 62673 "-" "github-camo (876de43e)"
That will seriously challenge the OKR for this quarter.
Update: 2019-11-21 So I tried to decipher what was really happening. It seems GitHub acts as a proxy using camo, but still has a caching system keeping a real copy of the images, instead of just a proxy. And this can become a problem in the context of webcompat.com.
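The camo URL in the markup above is not opaque, by the way. Its last path segment is the original image address encoded as hexadecimal, and the segment before it looks like a digest that camo uses to sign the URL (that last part is my reading of how camo works, not something I verified here). A quick way to unpack it, with decode_camo_url being a name made up for this example:

```python
from urllib.parse import urlparse

CAMO_URL = (
    "https://camo.githubusercontent.com/"
    "a285646de4a7c3b3cdd3e82d599e46607df8d3cc/"
    "687474703a2f2f7777772e6c612d6772616e67652e6e65742f"
    "323031392f30312f30312f323533352d6d6973657265"
)


def decode_camo_url(camo_url):
    """Return (digest, original_url) from a camo-style URL."""
    digest, encoded = urlparse(camo_url).path.strip("/").split("/", 1)
    return digest, bytes.fromhex(encoded).decode("utf-8")


digest, original = decode_camo_url(CAMO_URL)
print(original)  # http://www.la-grange.net/2019/01/01/2535-misere
```

The decoded value matches the data-canonical-src attribute, and it is the same path that shows up in the server log with the github-camo user agent.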
Early on, we had added s3.amazonaws.com to our connect-src since we had uses that were making requests to https://s3.amazonaws.com/github-cloud. However, this effectively opened up our connect-src to any Amazon S3 bucket. We refactored our URL generation and switched all call sites and our connect-src to use https://github-cloud.s3.amazonaws.com to reference our bucket.
GitHub is hosting the images on Amazon S3.
Otsukare!
http://www.otsukare.info/2019/11/20/saving-images-microservices