Building the next-gen Chevereto (old topic)

Rodolfo · Jul 13, 2020

Parker821 said:
Saw some of the things you're planning for 4.0 and I'm super excited! Definitely going to upgrade as soon as it's released.

Thank you!

I haven't been able to commit any updates on V4 because at this very moment I'm designing how the application will actually run. Internally, the new system is as simple as input > plugins > output; the catch is that in this new thing everything is distributed and tuned for performance. The boost is just insane, going from ~700 req/s up to 12K... On the very same machine, with the same code.

To be honest, I wasn't expecting such a huge performance gain. Combined with the unique flexibility of this thing and its cleanliness, I'm sure that V4 will be everything I ever wanted for Chevereto. Add to that the fact that I want to create a lot of other software, and I just can't wait to start shipping new stuff for you guys.

Rodolfo · Jul 20, 2020

2020-07-20

Hey there,

The reference documentation is complete and the whole public reference is now available!

It took many systems and a lot of time to get there; in fact, I had to build from scratch the system that parses the code and presents it in a human-readable format, with support for searching and all the neat stuff that you can see. 👍🏾 It was very important to have this reference available, as it sets the framework standard and it will be used every day by developers wanting to work with Chevereto.

It took longer than expected, but the result is just next level and I'm very happy with it. You will have a hard time trying to find another reference of this level anywhere else because it is all in-house development.

As for the docs, the system I built for this is just awesome stuff to work with. I have to keep documenting the remaining components and fix the old stuff I wrote previously, but it shouldn't be that complicated as it is just describing the how-to; I don't have to write a parser or anything. As the reference is now complete, documenting is even easier from now on and it just flows naturally.

Code coverage has been pushed to 100%, and I've fixed many odd things that slipped by. At some point I realized that I was actually removing features, and that's how I can tell that the hardest part of the job is already done. At this point, the framework is usable (and kind of safe) for production, but I will only know how good it is once I start using it for real.

I will start crafting the V4 public API within this week; from there, I will keep pushing updates to docs and examples powered by the work on V4. The design approach that I'm taking for V4 is centered on the API user: as the public API is what will define the application, it sounds more than reasonable to center the design around it.

Most of the application logic already exists (V3, remember?) but the code not only has to be adapted for the new V4 runtime logic, it must be deeply reviewed and refactored. At some point this process will become very mechanical and I will be doing it at an insane speed, but it is just too much code, it needs time.

I will keep you guys updated, thanks for the interest in the development process.

dinhphucv · Jul 23, 2020

@Rodolfo do you have any estimate for v4 to be released?

Rodolfo · Jul 23, 2020

dinhphucv said:
@Rodolfo do you have any estimate for v4 to be released?

Seems you missed that part.

Chevereto 13th anniversary

chevereto.com

ETA

Based on my horrible way of measuring time, I could say that "it will take a while" to have it ready for being used in real projects. I guess that the time needed just to complete the replacement of V3 with V4 could be up to 24 months. Of course, this is accounting for everything V3 currently does.

The base edition, similar to the current Chevereto-Free feature set, should be available within 12 months, being very optimistic and remembering that it is just the basic feature set.

No surprises here: it is a 13-year-old project, which is quite big for one person.

Please note that I'm assuming zero collaboration in my time estimates. It will be ready way faster if more people can help me out!

Rodolfo · Jul 23, 2020

Hey guys, I just introduced the framework.

https://rodolfo.is/2020/07/23/hello-chevere/

Rodolfo · Jul 31, 2020

2020-07-31

Another week in development and I have great news: I have already designed the base application model 🎉 and it will be based on a message queue. This implementation will allow each process to be separated into a collection of steps.

The current process in V3 to upload an image into your account is a long procedure that starts by checking the login cookie and the image integrity/validity, then checks for dupes, generates medium and thumb sized versions, calculates stats, updates the database and finally replies back. All these processes are hard-wired directly in the application code, making it impossible to fine-tune the control flow. As you may imagine, the process in V3 is sequential, it happens in just one channel, and it is very annoying to bring alterations or any improvement really to these instructions. Did you try? Share your experience so I can make a point here.

V4 is different as it introduces workflows to define the application procedures. A workflow is a collection of orchestrated steps, which are atomic functions that perform a single task. In V4, the process to upload an image to your account is a workflow, and stuff like the duplicate check is one of its steps.

A workflow works similarly to a pipeline, and you can pass values from previous steps. It also supports adding steps at a named position, with the intention of allowing third-party plugins to add more processes. For example, you may want to check the image hash before doing anything, and a plugin could provide such functionality.
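To make the idea concrete, here is a minimal sketch in Python (not the actual V4 or Chevere code, which is PHP). The class `Workflow`, its methods `add`, `add_before` and `run`, and the step names are all hypothetical, chosen only to illustrate named steps, values passed between steps, and a plugin inserting a step at a named position.

```python
import hashlib
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]


class Workflow:
    """Hypothetical workflow: an ordered list of named, single-task steps."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Step]] = []

    def add(self, name: str, step: Step) -> "Workflow":
        self.steps.append((name, step))
        return self

    def add_before(self, anchor: str, name: str, step: Step) -> "Workflow":
        # Insert a step at a named position (raises ValueError if the anchor is unknown).
        names = [existing for existing, _ in self.steps]
        self.steps.insert(names.index(anchor), (name, step))
        return self

    def run(self, context: Context) -> Context:
        # Each step receives everything produced so far and returns new values.
        context = dict(context)
        for _, step in self.steps:
            context.update(step(context))
        return context


def validate(ctx: Context) -> Context:
    return {"valid": len(ctx["image"]) > 0}


def check_duplicates(ctx: Context) -> Context:
    return {"duplicate": ctx["image"] in ctx["known_images"]}


def hash_image(ctx: Context) -> Context:
    # Step that a third-party plugin could provide.
    return {"hash": hashlib.sha256(ctx["image"]).hexdigest()}


upload = Workflow().add("validate", validate).add("check-duplicates", check_duplicates)
upload.add_before("validate", "vendor-hash", hash_image)  # plugin hooks in first
result = upload.run({"image": b"fake-image-bytes", "known_images": []})
```

In this sketch the core workflow never needs to know about the plugin; the plugin only needs to know the name of the step it wants to run before.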

It also allows delivering a more reliable application because if something fails, the workflow can be resumed at any step. Perhaps B2 breaks (like it never happens), so rather than just throwing an error and failing, the workflow run will be available for inspection and for resuming the process once the incident has been addressed.
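The resume behaviour could look roughly like the following Python sketch. Everything here is an assumption made for illustration: `WorkflowRun`, `resume`, and the fake `store_remote` step are hypothetical names, and the run state is kept in memory, while a real system would presumably persist it (for example through the message queue).

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]


class WorkflowRun:
    """Hypothetical run record: remembers which step comes next."""

    def __init__(self, steps: List[Tuple[str, Step]], context: Context) -> None:
        self.steps = steps
        self.context = dict(context)
        self.next_index = 0
        self.error: Optional[Exception] = None

    def resume(self) -> bool:
        # Continue from the first step that has not completed yet.
        self.error = None
        while self.next_index < len(self.steps):
            _, step = self.steps[self.next_index]
            try:
                self.context.update(step(self.context))
            except Exception as exc:
                self.error = exc  # keep the failure around for inspection
                return False
            self.next_index += 1
        return True


storage_online = {"up": False}  # simulates an external storage outage


def store_remote(ctx: Context) -> Context:
    if not storage_online["up"]:
        raise ConnectionError("external storage unavailable")
    return {"stored": True}


run = WorkflowRun(
    [
        ("validate", lambda ctx: {"valid": True}),
        ("store", store_remote),
        ("reply", lambda ctx: {"replied": True}),
    ],
    {"image": b"fake-image-bytes"},
)
first_attempt = run.resume()               # stops at the failing step
failed_step = run.steps[run.next_index][0]
storage_online["up"] = True                # incident addressed
second_attempt = run.resume()              # picks up where it stopped
```

Note that the already completed "validate" step is not executed again on the second attempt; only the failed step and the ones after it run.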

This application model is perfect for extending the software as it will really allow us to have very deeply customized applications. It is also safer to run, to monitor, etc.

Hope you like the heads up!

Rodolfo · Aug 14, 2020

2020-08-14

This week I've worked very little on the thing. I'm not in the mood, so rather than doing a bad job I'm simply trying to organize my head before messing with it. I guess that after such a long time I just got tired.

Anyway, what I'm planning now is a layer to programmatically add conditions, limits, options, etc. The idea is to pack these values under the concept of permissions, and that plugins can add more keys, like the limit for daily uploads or how many people a user can follow per day.

For example, let's say that you want to monetize your service by implementing a user-configurable watermark. The user can provide their own branding, so you save all the hassle of having to watermark these images. A plugin could provide the condition "@vendorName-user-watermark", which returns true/false depending on the permissions for the user. Another value, declared like "@vendorName-thumb-filename", will contain the path of the logo that the user uploaded and wants to use as a watermark. This value is stored after the user provides the watermark logo from account settings.

The way it works is by defining a plugin that adds a step to the workflow. In this case, the plugin should determine if the user can customize the watermark and, if so, provide the watermark logo file for the workflow run. The associated action declares the values it needs, like "@vendorName-user-watermark", so it panics if a value is missing when the action is called.
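A rough Python sketch of such an action follows. Only the two keys "@vendorName-user-watermark" and "@vendorName-thumb-filename" come from the description above; the class `WatermarkAction`, the `requires` attribute, `MissingValueError` and the file path are hypothetical and only illustrate an action that declares its required values and refuses to run without them.

```python
from typing import Any, Dict, Optional, Tuple


class MissingValueError(RuntimeError):
    """Raised when an action is called without a value it declared."""


class WatermarkAction:
    # Values the action declares it needs (keys taken from the example above).
    requires: Tuple[str, ...] = (
        "@vendorName-user-watermark",
        "@vendorName-thumb-filename",
    )

    def run(self, values: Dict[str, Any]) -> Dict[str, Optional[str]]:
        missing = [key for key in self.requires if key not in values]
        if missing:
            # "Panic" early instead of failing somewhere deep in the workflow.
            raise MissingValueError("missing values: " + ", ".join(missing))
        if not values["@vendorName-user-watermark"]:
            return {"watermark": None}  # user is not allowed a custom watermark
        return {"watermark": values["@vendorName-thumb-filename"]}


action = WatermarkAction()
allowed = action.run(
    {
        "@vendorName-user-watermark": True,
        "@vendorName-thumb-filename": "/uploads/logos/user-1.png",  # made-up path
    }
)
denied = action.run(
    {
        "@vendorName-user-watermark": False,
        "@vendorName-thumb-filename": "/uploads/logos/user-1.png",
    }
)
```

Calling `action.run({"@vendorName-user-watermark": True})` in this sketch raises `MissingValueError`, because the declared filename value was not provided.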

This level of abstraction allows preparing the application for every possible need in the future, which means that we can go very wild in the scope of what the software does.

I hope to work on this for the next week.

mkerala · Aug 26, 2020

@Rodolfo Thanks for the regular updates on V4 development. I'm really excited to see the progress, especially the performance improvements.

I know it's too early, but are there any changes to how the database will work in V4? I think this is one area I have struggled with a lot in Chevereto. As the database grows it tends to slow down the site, especially during search operations. Currently, I load the entire database into RAM with the InnoDB buffer pool and that's the only way I can keep it running smoothly.

Rodolfo · Aug 26, 2020

mkerala said:
How the database will work for V4.

It will be a MySQL database, just as in V3, but with changes aiming for modularity. For example, in V3 the user table includes several columns like real name, email, bio, URL, etc. Each of those will now be in its own table, so there will be a user_url table, user_bio, etc. Not only is this the right way to use a relational database, it also allows me to safely add plugin-based data to the database.

Considering that V3 has fat tables, the above should increase read performance as the queries will only use and return the data needed. Writing shouldn't be affected.
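As an illustration only, the split could be sketched like this. The table names user_url and user_bio come from the post; the column names and sample data are assumptions, and SQLite (via Python's standard sqlite3 module) stands in for MySQL just so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    -- One small table per optional attribute instead of one fat user table.
    CREATE TABLE user_url (
        user_id INTEGER PRIMARY KEY REFERENCES user(id),
        url     TEXT NOT NULL
    );
    CREATE TABLE user_bio (
        user_id INTEGER PRIMARY KEY REFERENCES user(id),
        bio     TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO user (id, username) VALUES (1, 'rodolfo')")
conn.execute("INSERT INTO user_url (user_id, url) VALUES (1, 'https://rodolfo.is')")

# A query that only needs the username touches only the narrow user table.
username_row = conn.execute(
    "SELECT username FROM user WHERE id = ?", (1,)
).fetchone()

# The full profile is assembled with joins only when it is actually needed.
profile_row = conn.execute(
    """
    SELECT u.username, l.url, b.bio
    FROM user AS u
    LEFT JOIN user_url AS l ON l.user_id = u.id
    LEFT JOIN user_bio AS b ON b.user_id = u.id
    WHERE u.id = ?
    """,
    (1,),
).fetchone()
```

A plugin could add its own table keyed by user_id in the same way, without altering the core user table.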

There are other changes, like the implementation of an application cache for stuff like system settings, options and permissions. A lot of data will be cached at runtime, unlike V3, which doesn't cache at all, so the database should be less stressed.
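A very small read-through cache shows the effect. This is a generic Python sketch, not the V4 design: `SettingsCache`, `load_from_database` and the "site_name" setting are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


class SettingsCache:
    """Hypothetical read-through cache: hit the database only on a miss."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._values: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self._values:
            self._values[key] = self._loader(key)
        return self._values[key]

    def invalidate(self, key: str) -> None:
        # Call this when the setting changes so the next read is fresh.
        self._values.pop(key, None)


database_reads: List[str] = []  # records every simulated database query


def load_from_database(key: str) -> str:
    database_reads.append(key)
    return {"site_name": "My Chevereto"}[key]


cache = SettingsCache(load_from_database)
first = cache.get("site_name")   # miss: reads from the database
second = cache.get("site_name")  # hit: served from memory
```

After the two reads above, `database_reads` holds a single entry, which is the whole point: repeated reads of the same setting stop reaching the database.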

mkerala · Aug 26, 2020

Rodolfo said:
It will be a MySQL database, just as in V3, but with changes aiming for modularity. For example, in V3 the user table includes several columns like real name, email, bio, URL, etc. Each of those will now be in its own table, so there will be a user_url table, user_bio, etc. Not only is this the right way to use a relational database, it also allows me to safely add plugin-based data to the database.

Considering that V3 has fat tables, the above should increase read performance as the queries will only use and return the data needed. Writing shouldn't be affected.

There are other changes, like the implementation of an application cache for stuff like system settings, options and permissions. A lot of data will be cached at runtime, unlike V3, which doesn't cache at all, so the database should be less stressed.

Cool. Looking forward to this.

Yeah, the database is very heavy in V3, especially when you have millions of images. My database is around 8GB, and a search operation spanning multiple tables locks them for a few seconds and the site freezes.

Rodolfo · Sep 14, 2020

2020-09-14

Hey there,

As some may have noticed, for the previous month I've been focused on improving V3. Plenty of stuff is now improved, like the documentation; we also got moderation now, and I've announced 3.17, which adds support for more login providers and addresses background job processing, which should greatly improve performance and reliability for the time being. I know that ideally I should be working 100% on V4, but there are some critical needs in V3 that I'm happy to address as long as it makes the software better and safer.

Work on V4 has been in the background, specifically on bug tracking, and just last week I fixed ~60 issues detected in the base framework that I'm building. However, I still haven't come up with a permissions architecture for V4 because I'm still taking more use cases into consideration, and while I keep extending the application scope I'm also adding more and more use cases, so it's like I build a model, check cases until I find a bump, and then I have to re-design. It is very hard work indeed; I can't wait to get back to coding, which flows way faster. The process is slow, but I want to be sure about the kind of systems that this will allow us to create.

As money making is what drives V4, I've already spotted the need for a multi-layered permissions system so we can go really wild on how we will be able to monetize. This goes from the basic "pay to use" up to users selling content subscriptions to each other. This is harder than I expected, but it is a really nice challenge that I hope to accomplish sooner or later.

rdn · Sep 14, 2020

From what I read, v4 will require docker setup?

Rodolfo · Sep 14, 2020

rdn said:
From what I read, v4 will require docker setup?

Not sure yet. I prefer to use the term "container" because Docker is just one of many technologies for it. I believe that it will always be possible to root install it, but it will be way easier to run it from containers.

rdn · Sep 15, 2020

Oops, not all people want or love working with containers :|
I still prefer a bare-bones server; fewer services are much easier to manage.

CentOS / Ubuntu
Apache / Nginx
PHP
Mysql / MariaDB

That will do it perfectly and simply.

Adding containers to the server stack just adds more complexity, overhead and more maintenance.
Not sure of the benefit of those services.

Rodolfo · Sep 15, 2020

rdn said:
Oops, not all people want or love working with containers :|

I don't understand why you care, the root install will always be a thing. I don't understand why it troubles you that I provision a container that you won't touch.

rdn · Sep 15, 2020

Oops, sorry.
I just wanted to ask if it will be required.

rdn said:
v4 will require docker setup?

Good news, root install will remain. Peace 😘.

Rodolfo · Sep 20, 2020

2020-09-20

Hey there,

This time I want to update you on the stack improvements for V4. I hope to be able to explain the series of changes and challenges that I'm facing now and where the project is heading.

Application

As I've mentioned in some posts already, the application code is about to be completely re-made under a drop-in pluggable architecture. This means that even core components can be swapped for ultimate customization. There's more to it; here is the list:

- Custom made PHP 8 framework (Chevere)
- Headless CMS
- Automatic API discovery (Interface description language)
- Uses Swoole (up to 40x faster application)
- Attribute-based access control (ABAC)

From that list, this is the first time that I mention that access will be handled using attributes, which is the most modern way to deal with permissions. The concept is that actions are granted by attributes, which can be modified on the fly, and subjects can be affected by environmental conditions. For example, this could allow access to a given album's contents only within business hours (just to frame a silly example here).
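The business-hours example can be sketched in Python as follows. This is a generic ABAC illustration, not the V4 design: `Request`, `POLICIES`, `is_allowed`, the "album:view" action name and the 09:00 to 18:00 window are all assumptions.

```python
from dataclasses import dataclass
from datetime import time
from typing import Any, Callable, Dict, List


@dataclass
class Request:
    subject: Dict[str, Any]      # attributes of the user asking
    resource: Dict[str, Any]     # attributes of the album being accessed
    action: str
    environment: Dict[str, Any]  # conditions such as the current time


Rule = Callable[[Request], bool]


def is_follower(req: Request) -> bool:
    return req.subject["id"] in req.resource["followers"]


def within_business_hours(req: Request) -> bool:
    return time(9, 0) <= req.environment["time"] < time(18, 0)


# Every rule listed for an action must pass; the list can be changed on the fly.
POLICIES: Dict[str, List[Rule]] = {
    "album:view": [is_follower, within_business_hours],
}


def is_allowed(req: Request) -> bool:
    rules = POLICIES.get(req.action)
    if rules is None:
        return False  # deny by default when no policy exists
    return all(rule(req) for rule in rules)


album = {"id": 10, "followers": {1, 2}}
at_noon = Request({"id": 1}, album, "album:view", {"time": time(12, 0)})
at_night = Request({"id": 1}, album, "album:view", {"time": time(23, 0)})
stranger = Request({"id": 9}, album, "album:view", {"time": time(12, 0)})
```

Here the same user gets a different answer depending on the environment: `is_allowed(at_noon)` is True, while `is_allowed(at_night)` and `is_allowed(stranger)` are False.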

The goal with attributes is to allow plenty of flexibility for limiting what users can do, but also to limit how users grant access to other users. For example, a user could define a followers tier based on certain rules or conditions that grant (or deny) access to a given piece of content, but also the ability to like or comment, on a content-based logic. This flexibility is great for money making as we will be able to fine-tune what users will be able to do, and on top of that, users could subscribe to exactly what they need and no extras. For example, an influencer could use Chevereto to provide paid access to content and in that context, other users won't require an upload quota at all, so we can let users build their own subscription packages, picking exactly what they need.

The above will allow users to build their own experience, and we will make way more profit because we will be offering unprecedented flexibility.

Database

Much of my recent work is actually on the database re-design. For this process I've decided to keep using a relational database, but there will be a lot of neat improvements:

- MySQL 8 / MariaDB 10 / PostgreSQL (only one of these)
- Normalized schema
- Replication (multi-server database)

I still haven't picked my poison regarding which RDBMS I will end up using; it would be nice to get your feedback on this matter based on your real-life usage. How do these perform for you? In any case, I think it is safe to assume that V4 will use more than one database, so chances are that it will use an RDBMS + extras to provide faster performance when fetching user relationships.

The schema in V4 will change to a normalized one, in which the existing tables will be redistributed into many new small tables. Many new data types will be used, like JSON for storing image Exif data.

Wrapping-up

For V3 I didn't take that much time to pick the elements that I mentioned in this post, and it caused a lot of technical debt. This time I'm being extremely careful about the stuff I pick, and I'm not limiting myself to those blog posts around the web telling you what to use. I'm investigating each technology on my own, reading tons of books and papers.

It is also extremely important that you share your thoughts on this topic: you have real-life usage, so you have insight that could be very important for the system that I'm building here. Don't hesitate to provide your feedback.

Hope you like the heads up,
Rodolfo.

konj · Sep 20, 2020

Sir, you have done great work so far, but in the end, when it comes to performance, it always depends on how good the hardware and the server setup are, especially if you can afford more servers for hosting the database in a cluster (Galera 4 + InnoDB, for example). I would like to see some benchmarks between MySQL 8 / MariaDB 10 (both tuned and optimized) with a bigger Chevereto database.

This is an analysis of the #percona benchmark articles comparing MySQL & MariaDB performance on SSD disks with NVMe vs SATA controllers. I have collected a few links with changes, information and so on:

Resources:
How MySQL and MariaDB Perform on NVMe Storage
How MySQL 8.0.21 and MariaDB 10.5.4 Perform in IO-Bound Scenarios on SATA SSD and NVMe Storage

It seems that MySQL 8 is performing better at the moment, because of this:
Evaluating Performance Improvements in MariaDB 10.5.5
(But I think that this is going to be fixed ASAP.)
Checkpointing in MySQL and MariaDB

MariaDB S3 Engine: Implementation and Benchmarking

What is MariaDB-ColumnStore?
Official MariaDB-ColumnStore Google Group (post linked)

ClickHouse (MySQL) and ColumnStore in the Star Schema Benchmark

MySQL 8 vs MariaDB 10.4

konj · Sep 20, 2020

In my opinion, MariaDB is better in the long term. Here are a few more links:

Benchmark: MariaDB vs MySQL on Commodity Cloud Hardware: Amazon AWS EC2 and RDS, large to 12xlarge instances
What's New in MariaDB Enterprise Server 10.5

mkerala · Sep 21, 2020

I have been looking to go multi-server for a long time. The external storage for images made this possible to some extent, but still, the main server can't be put behind a load balancer. This is because user images are always stored locally, and adding an additional front-end server means I have to do some replication to sync these files.

I think this could be implemented if everything on the main servers remains static, with all user files in external storage, or if Chevereto natively supported multi-server with built-in replication. Multi-server would be really useful to balance the load for heavy sites and scale up during a DDoS attack.
