Routing Storage Servers for Upload

siddharth · Jun 12, 2021

Hi Team,

I believe the front web server is used for uploading to all the available web servers. So it is slowing down the uploads when people are uploading 1000 images. Since I own multiple file hosting script, all the script used to make use of the storage servers to upload.

But their storage server setup works a bit different as it requires PHP files, so we will be uploading the same set of files to the storage server as the main website.

So people come to the main website and then they choose the image and image information is passed to storage server via the front end web servers and they take care of the rest, this will result in significant upload speed increase. And this is the optimization being applied to the Chevereto script and it is not a feature.

I owned a file host which had more than 1PB of files and this idea is to make Chevereto better.

Thanks

Rodolfo · Jun 12, 2021

I don't understand.

Also, Chevereto won't store 1000 uploads to one server, it always distribute the images among all your enabled storages.

siddharth · Jun 12, 2021

Rodolfo said:
I don't understand.

Also, Chevereto won't store 1000 uploads to one server, it always distribute the images among all your enabled storages.

It is not about storing, it is about uploading. When I upload 1000 images on the front end, do the front end web servers handles the upload or storage servers are handling the upload.

I will give an example, consider front end is using the port of 100mbits, all the storage servers are 10gbits and we have 10 number of storage servers.

Now if someone uploads 100GB of images around 100000 image, if you use a front end webserver to route the images by temporary upload to them and then moving to the storage server will take huge time. Whereas if you just directly initiate the connection to all available storage servers, then the upload will be done in seconds if the client is also having 10gbits, as all his images directly hit all the available storage server and not routed to the web server first and then to storage servers.

This is how the file-hosting script works. And so they can make use of many options like geo-based upload and etc.

If you need more explanation let me know.

Rodolfo · Jun 12, 2021

If you need more application load then use it with php-fpm and spawn N servers for it. You could also use a load balancer.

siddharth · Jun 12, 2021

Rodolfo said:
Good luck then, what could I do know about Chevereto. 😉

I apologize, maybe my understandings are wrong. I thought of providing an improvement if we don't have that in chevereto.

I will try to explain with more example, consider we are using a front end web server of 10mbits now, which means it can upload now only at the speed of 1MB/s or less.

But now I have more than 10 storage servers of 1gbits speed, which means it can receive upload at a speed of 100MB/s on each server.

Now if I upload 1000MB of images.

Scenario 1: it will take 1000 seconds as we are using front end web server and then storage servers.
Scenario 2: it will take 10 seconds as we are using a storage server even now.

Please let me know which model chevereto is using.

siddharth · Jun 12, 2021

Ok, I have verified myself, Chevereto is using Scenario 1 as I can see downstream and upstream are showing status in web server when images are being uploaded.

So for above case, it will take 1000 seconds to upload.

Rodolfo · Jun 12, 2021

Is better (and cheaper) to load balance and go with the cheaper storage providers.

Anytime you distribute application load on remote machines you are wasting money because you assume that all these machines will be working in efficiency, which is not. Also, anything needed with those (sync, update, etc) is more painful to manage just because the thing isn't physically there.

siddharth · Jun 12, 2021

I believe this is not a bad idea because file hosting scripts needs to be more superior than the image hosting script, because they handle peta bytes of traffic daily or even more.

As I own both xfs and yetishare, both of them use the same formula which I told above in order to load balance else it will fail as the web servers will overload with uploaders using 10gbits connection from the RDP.

Now it applies the same over here in our chevereto. All these days, image host is used by all the uploaders who share content on forums, now they are using RDP to upload in this case they will be having a 10gbits connection and they will consume all the front end server bandwidth. And all the storage server bandwidth are being wasted over here.

Even though having 100s of storage servers, we are here limited by 1 front end server upload bandwidth which is huge limitation.

Advantages:

1) You won't be limited by 5 upload threads as like now
2) You won't be limited by the front end web server bandwidth
3) Client will see a huge improvement in their upload time
4) Conversion rate will be more higher due to less upload time
5) Front end don't require to be more powerful as storage servers comes with smaller CPU and also they can be used

Rodolfo · Jun 12, 2021

Network and storage are just resources. You can deliver those in many different configurations, is just that you want to keep using what you know, but that drives to Chevereto the load of offering such layer. Show me what I have to work with, is there an API or standard? Do I need to make everything from scratch or there's a base to work with?

Can I assume that if you need to store that massive amount of images you are willing to sponsor the development of this RFC?

siddharth · Jun 12, 2021

Rodolfo said:
Network and storage are just resources. You can deliver those in many different configurations, is just that you want to keep using what you know, but that drives to Chevereto the load of offering such layer. Show me what I have to work with, is there an API or standard? Do I need to make everything from scratch or there's a base to work with?

Can I assume that if you need to store that massive amount of images you are willing to sponsor the development of this RFC?

Agree that there are many different ways to achieve what I said, I can use LB and having many small web servers can solve my issue but generally, storage servers come with lots of incoming and outgoing traffic and web servers are not. So we don't have to waste the web servers incoming traffic.

Since you are going to work for V4, I am just sharing pointers to make the script much better. I really don't want to push my ideas. I just shared this since it feels better to have on this script.

siddharth · Jun 13, 2021

Update: I tried adding more web server and in front added the LB, still we will be hitting the same bandwidth limitation, as upload speed depends on LB port speed. So now again all traffic goes to LB and then to Web Server. We are just loading balancing the CPU and other resources, not the Bandwidth, so unless we have a load balancer with 10gbits speed, it will be of no use if we want to gain huge upload bandwidth.

I do have other ideas, I will try.

siddharth · Jun 13, 2021

Update: Most people here are not aware of what we are discussing or debating. Unless they owned or handled high traffic sites (TB of traffic daily) they never come across this issue, it is the future problem for all of them who are working hard to get their host to top position.

You can load balance the CPU, Memory and all others using the traditional method. But bandwidth you cannot by using traditional methods. It is good if Chevereto do this

Solution: Use an API/call on the main site that checks which server is closest or has the least load, then send it back as a URL to POST to as a secondary call.

The upload is the longest part, so make all the other parts fast (application is handled by LB, then issues an upload token) and the storage server simply validates the signed upload token and accepts a file upload directly)

We are not giving any stress to Storage Servers as the application will be handled at the Web Servers and we are just going to accept the uploads only at the storage server level. And we can easily load balance the CPU and other resources via load balancer if the Web Server is overloaded.

mkerala · Jun 13, 2021

In my experience, my front end webserver has never came under too much load and I got a pretty big site. Bottle neck has always been storage server and DB. Few years back I also thought it was front end server that was under load, but this is not true most times. My front end server has just 4GB Memory and can handle 10K-30K uploads every day.

In case of storage server, deploy multiple servers to distribute the load. This is much cheaper than using a LB. For bandwidth use nginx rules to limit bandwidth to 5mbps after serving 5MB per connection. If your site is big the choke point is going to be disk I/O on the storage server. If it hits the limit upload fails.

siddharth · Jun 13, 2021

mkerala said:
In my experience, my front end webserver has never came under too much load and I got a pretty big site. Bottle neck has always been storage server and DB. Few years back I also thought it was front end server that was under load, but this is not true most times. My front end server has just 4GB Memory and can handle 10K-30K uploads every day.

In case of storage server, deploy multiple servers to distribute the load. This is much cheaper than using a LB. For bandwidth use nginx rules to limit bandwidth to 5mbps after serving 5MB per connection. If your site is big the choke point is going to be disk I/O on the storage server. If it hits the limit upload fails.

Thanks for responding. That is applicable if the front end webservers are having a 1gbps port. If the front servers are going to have only 100mbits, then it is going to be a bottleneck as the maximum upload speed will be 10MB/s. And all the uploads are happening from the web to storage servers, people have to wait if there are 100s of people are uploading at the same time.

And all the IO issue can be solved using NVMe and RAID as I used to build clusters for many web portal and also I have gathered experience by owning one of the biggest file host which ranked under 2K in Alexa

mkerala · Jun 13, 2021

siddharthramakrishnan said:
Thanks for responding. That is applicable if the front end webservers are having a 1gbps port. If the front servers are going to have only 100mbits, then it is going to be a bottleneck as the maximum upload speed will be 10MB/s. And all the uploads are happening from the web to storage servers, people have to wait if there are 100s of people are uploading at the same time.

And all the IO issue can be solved using NVMe and RAID as I used to build clusters for many web portal and also I have gathered experience by owning one of the biggest file host which ranked under 2K in Alexa

I think you are comparing file hosting to image hosting. I allow image of size up to 100MB. Still my webserver usage is around 20mbps. Most images are less than 2MB and won't take that much bandwidth to choke 100mbps connection. Also, you can connect to your storage servers via a private network.

NVMEs are very very expensive to use for a storage server. RAID can help to some extend, but when you have millions of files it can also hit I/O limits.

siddharth · Jun 18, 2021

mkerala said:
I allow image of size up to 100MB. Still my webserver usage is around 20mbps.

When 1000s people are started uploading their 1000s of photos, web server port usage will go high and the users will experience slow upload speed as all the uploads are following this

client -> webserver -> multiple storage servers

But we can easily avoid this if we put storage servers in front and so the client doesn't see any performance degradation as ports of the upload servers are being used now.

Even if you follow the below steps

client -> LB -> multiple web servers -> multiple storage servers

Again all the traffic needs to go through LB and you need LB of huge port size, again storage servers are not useful as their incoming ports are idle and used when web servers are upload files to them.

mkerala · Jun 18, 2021

siddharthramakrishnan said:
When 1000s people are started uploading their 1000s of photos, web server port usage will go high and the users will experience slow upload speed as all the uploads are following this

I haven't reached that stage yet. I got only 100s of users uploading 1000s of image and its pretty stable.

In my experience the issue you are going to face will be on storage server ability to handle those 1000s of upload vs millions of hotlinking requests. Webserver just need to deal with just a one-time upload only. Storage server has to deal with the constant request for millions of images stored on it. At least this is what I am facing now.

Coming back to the original topic, I think you want to deploy multiple front webservers to handle the increased image upload load. I think this should be already possible. The webserver only has static files, so you should be able to deploy multiple front end webservers to distribute the load via a LB.

Also, for storage servers, if you add and enable multiple storage servers, Chevereto will distribute the uploaded images equally to all storage servers enabled.

siddharth · Jun 18, 2021

mkerala said:
Coming back to the original topic, I think you want to deploy multiple front webservers to handle the increased image upload load. I think this should be already possible. The webserver only has static files, so you should be able to deploy multiple front end webservers to distribute the load via a LB.

This, I already pointed here

siddharthramakrishnan said:
Even if you follow the below steps

client -> LB -> multiple web servers -> multiple storage servers

Again all the traffic needs to go through LB and you need LB of huge port size, again storage servers are not useful as their incoming ports are idle and used when web servers are upload files to them.

Bottleneck over here will be LB again, give it a try you will know. You need huge port LB to tackle the issue.

So unless the script points the upload to the storage servers for direct upload in round-robin or least connection, it will be an issue if the site receives too much traffic.

If the site is too big, obviously I will implement this solution myself as I profit from it.

mkerala said:
In my experience the issue you are going to face will be on storage server ability to handle those 1000s of upload vs millions of hotlinking requests. Webserver just need to deal with just a one-time upload only. Storage server has to deal with the constant request for millions of images stored on it. At least this is what I am facing now.

I can easily handle the millions of hotlink request as I know how to do. And I have gone through your DNS records, I can see there are lots of improvement in it.

siddharth · Jun 18, 2021

@Rodolfo If you think that I am trying to get this feature for free, really not. I am just sharing my ideas to make the script better. I love providing ideas.

If this is something I want to get myself done, I will customize myself with a freelancer when required as I did with the tons of script I owned from codecanyon.

I can give tons of examples of how this fasten the upload and reduce the hardware cost and make the site more stable.

Rodolfo · Jun 18, 2021

The problem with this RFC is that it seeks for a way to provide support for the cheapest possible storage server solutions, that's why you mention the network connectivity limitation, because you want to use home grade machines with 100mb wiring.

For that kind of context, your solution is possible the only way to make it happen. The question is... Do we need it? And the answer is no, we don't need it. There are now more modern services and solutions for this. The only reason why someone will need this is if you already have that infra.

By the way, at RFC we don't ask for how to implement something, we only want to know what's needed.

Routing Storage Servers for Upload

💖 Chevereto Fan

👑 Chevereto Godlike

💖 Chevereto Fan

👑 Chevereto Godlike

💖 Chevereto Fan

💖 Chevereto Fan

👑 Chevereto Godlike

💖 Chevereto Fan

👑 Chevereto Godlike

💖 Chevereto Fan

💖 Chevereto Fan

💖 Chevereto Fan

👽 Chevereto Freak

💖 Chevereto Fan

👽 Chevereto Freak

💖 Chevereto Fan

👽 Chevereto Freak

💖 Chevereto Fan

💖 Chevereto Fan

👑 Chevereto Godlike