I think what I'm missing most from the documentation is something the team must have drawn up before they ever started developing this project: a network map of how they expect this software to be installed and/or run, and what rough lines of communication they expect between each server and each piece of software.
I realize that the team wants to be as neutral as possible so that each news outlet can use the setup that works best for them, but some standards do exist, and there must have been an idea of what would or wouldn't be supported before Superdesk ever left the whiteboard.
A VERY simple network map of how things are expected to be run (perhaps 2-3 network maps, if several options exist) would be extremely helpful to users who are considering Superdesk. A simple drawing could answer most high-level questions, questions I currently can't find a simple answer to anywhere.
Questions such as:
What is the expected network layout for Superdesk? I think it's clear that one (and only one) Superdesk Server is needed to be the boss of everything for an organization. The server uses MongoDB, so it's probably safe to assume that the database should run on the same server rather than on a separate machine. Right off the bat, that rules out Amazon Elastic Beanstalk and RDS (RDS doesn't support MongoDB). No problem. Internally managed and hosted servers are still options, as is an Amazon EC2 instance (in theory; I can't get it to work). Amazon Lightsail is also still a potential host for this server (although, again, I haven't found a way to get Superdesk to install even on a clean Ubuntu installation on a Lightsail virtual machine).

Then, assuming you have Superdesk Server up and running somewhere that is both secure and stable, each user of Superdesk (editors, reporters, admins, etc.) can have their own machine with Superdesk Client running on it. (Or is Superdesk Client expected to run on only one machine, the same server that's running Superdesk Server? I honestly can't tell. See what I mean about needing a simple network map?)

If I'm right about multiple Superdesk Clients, then the network decisions now include whether to allow Superdesk Clients to reach Superdesk Server from outside the company, or whether to limit access to those inside the corporate network (whether physically inside or anywhere in the world via VPN). Those are normal network security decisions, not a big deal. At this point, a simple list of every port on Superdesk Server that needs to be accessible from outside localhost would be valuable, as security is the #1 concern, even before "is it useful?".
Now we get to the big one: the web server. There appears to be an attempt to be web-server-neutral, letting people use their own web software while Superdesk just manages the content and delivery internally. It would be nice if there were a list of recommended web server software (such as Newscoop, which sadly is no longer supported), but I understand that many companies may prefer to build this in-house. Superdesk advertises that it's API-driven, so in theory one could store no data on the web server and have it request everything from Superdesk Server every time it displays a web page. That's fine in theory, but it isn't actually viable: the way Superdesk Server appears to have been designed, there is one and only one Superdesk Server, and it won't handle the load if a flood of API requests comes in at once. Load balancers are a must on today's Internet, as is the ability to have more than one load-balanced server handling requests. There's no way this development team completely ignored reality and created a news CMS that can't survive an article link making it to the front page of Reddit, so they must have planned for load balancers and multiple web server instances. I just can't figure out where.
I assume that means each news organization must build a web server that can both display AND store data (e.g. articles and media), and that Superdesk Server will push all data to it. Most of us don't want to pay for all the tech people required to manage that many servers and that much software. So the popular approach today for a reliable web server (one that can handle the occasional big hit when a story goes viral, and won't crash just because Reddit links to it) is to run the web server on Amazon EC2 or Elastic Beanstalk so that it scales automatically as needed, with load balancers in multiple physical locations, to store the data in an Amazon RDS (database) instance, and to store any non-database media (e.g. videos and photos not kept in the database) in Amazon S3. You can't store ANY data on the same machine as the web server software, because with multiple instances the data would fall out of sync and all your processes would quickly break down.
Is that what you guys have envisioned? I'm a bit worried that it's not, since Superdesk Web Publisher appears to have been built for exactly the type of network map I described earlier: one that won't support a lot of traffic, where the web server doesn't store the data and just queries everything from a single instance of Superdesk Server. If users need to create their own web server software that stores any or all data pushed from Superdesk Server, don't they also need a list of all fields that might exist, so their database is prepared for every eventuality? Is that field list somewhere easy to find, along with a list of all externally accessible ports that Superdesk Server requires? Am I completely misunderstanding how you expect this to be installed? (Again, a simple network map or two would answer these basic high-level questions for me and all other potential users.)

One can't be expected to install this software without a high-level, very basic overview of what the network and software interactions should look like. Any help you guys can provide to me (and to anyone else trying to wrap their head around this) would be greatly appreciated. Remember: high-level and simple, no need to go into depth. (But if, somehow, the drawing on the whiteboard doesn't include a place for load balancers, then I'm pretty sure you've made a fundamental error, and your software can never be used by any 'real' organization, which would be really sad.) I'm sure you guys thought of this stuff before you spent the months required to write all this code; I just can't see the 'big picture' you're envisioning, and I can't find a simple overview of it anywhere.
Thanks!
*edit - formatting
*edit2 - If I was completely off base about all this, and you really did design Superdesk Server as a standalone server built to handle the load of multiple web hosts making non-stop API requests: what sort of load did you test for, and on what type of network? How many requests per second can it handle before things start to bog down?
Thanks!