White paper
SOFiE allows you to securely share files (grouped into a package) with customers and suppliers. Using internal and external detection engines, it is able to block dangerous or unauthorized content.
Application architecture
Several components work together to run the application. Some are internal components (written by us) and some are external components (written by third parties).
The internal components are:
a web server that allows users to upload and download files and administrators to manage the application and its settings
a “worker” process for running detection engines, sending e-mails, and other background tasks
a “scheduler” process that starts regular tasks (checking time limits, expiration of packages, etc.)
The internal components are written in Java.
The external components are:
open-source HTTP server nginx acting as a reverse proxy for the web server
external detection engines such as ClamAV, Eset, Kaspersky, FortiSandbox and others (not included in the standard delivery of the application)
PostgreSQL database for data storage
Apache Kafka server serving as a message broker (message queue)
The application is optimized for Red Hat Enterprise Linux (RHEL), Rocky Linux, and other compatible clones, in versions 7, 8, and 9. The default installation is available only for these systems.
Web server
In the default installation, the web server listens on the loopback interface on TCP port 9090 and is reachable from the network through the nginx server. It does not provide HTTPS itself; securing the connection is the responsibility of the nginx server.
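To illustrate what binding to the loopback interface means in practice, the following minimal Java sketch starts an HTTP server that accepts connections only from the local machine. It uses the JDK's built-in com.sun.net.httpserver package purely for illustration. The class name, the /health endpoint, and the choice of HTTP library are assumptions and do not describe SOFiE's actual implementation; only port 9090 comes from this document.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class LoopbackServerSketch {

    // Starts an HTTP server bound only to the loopback interface.
    // Passing port 0 lets the operating system pick a free port.
    static HttpServer start(int port) throws IOException {
        InetSocketAddress address =
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port);
        HttpServer server = HttpServer.create(address, 0);
        // "/health" is a hypothetical endpoint used only for this illustration.
        server.createContext("/health", exchange -> {
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(9090); // default port named in this document
        System.out.println("Listening on " + server.getAddress());
    }
}
```

Because the socket is bound to the loopback address, a reverse proxy running on the same host can forward requests to it, while remote hosts cannot connect to the port directly.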
The front-end web application is written in TypeScript using the React library. It communicates with the back end over HTTPS. Users are authenticated with a JSON Web Token (JWT) that is created at login and stored in an HTTP cookie.
The security of the application is regularly checked according to the recommendations of the OWASP Top Ten project (https://owasp.org/www-project-top-ten/).
“Worker” process
This is a stand-alone process that does not listen on any port. It runs long-lasting background tasks, such as running detection engines, sending e-mails, and regular maintenance. The process obtains its work from the Apache Kafka server, where tasks are queued for it by the web server and the “scheduler” process.
“Scheduler” process
This is a stand-alone process that does not listen on any port. It triggers the regular tasks necessary for the correct functioning of the application (checking time limits, expiration of packages, etc.). The tasks themselves run within the “worker” process; the “scheduler” passes them to it through the Apache Kafka server.
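The task flow between the three internal components (the web server and the “scheduler” enqueue tasks, the “worker” picks them up) can be sketched as follows. To keep the example self-contained, an in-memory BlockingQueue stands in for the Apache Kafka server. The Task record, its fields, the task names, and all method names are hypothetical and do not represent SOFiE's actual message format or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class TaskFlowSketch {

    // Hypothetical task message; the real message format is not described here.
    record Task(String type, String payload) {}

    // In-memory stand-in for the Apache Kafka topic that carries tasks.
    private final BlockingQueue<Task> broker = new LinkedBlockingQueue<>();

    // Producer side: called by the web server or by the scheduler.
    void submit(Task task) {
        broker.add(task);
    }

    // Consumer side: the worker handles tasks in arrival order until the
    // queue stays empty for the given timeout, then returns what it handled.
    List<String> runWorker(long idleTimeoutMillis) throws InterruptedException {
        List<String> handled = new ArrayList<>();
        Task task;
        while ((task = broker.poll(idleTimeoutMillis, TimeUnit.MILLISECONDS)) != null) {
            handled.add(task.type() + ":" + task.payload());
        }
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        TaskFlowSketch flow = new TaskFlowSketch();
        flow.submit(new Task("scan", "package-1"));          // as the web server would
        flow.submit(new Task("expire-packages", "nightly")); // as the scheduler would
        System.out.println(flow.runWorker(100));
    }
}
```

The point of the design is that neither producer needs a network connection to the worker: both only write to the broker, which is why the worker and the scheduler do not have to listen on any port.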
Nginx server
Nginx acts as a reverse proxy for the web server. It performs TLS (HTTPS) offloading; in the default installation, certificates are obtained from the Let’s Encrypt certificate authority.
Deployment example
Firewall requirements
The web server needs the following access from the Internet:
incoming TCP port 443 for the web interface of users and administrators
incoming TCP port 80 if the Let’s Encrypt certificate authority is used, as in the default installation
The “worker” process needs:
outgoing TCP port to the SMTP server (according to the current application configuration)
outgoing UDP/TCP ports for the needs of external detection engines (e.g. for updating virus databases, etc.)
outgoing TCP port 443 to the server https://license.sonpo.io/ for checking the validity of the license
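When verifying firewall rules for the outgoing connections listed above, a small reachability check can help. The following Java sketch simply attempts a TCP connection with a timeout. The class and method names are hypothetical, and the SMTP host and port are placeholders; only the license server address is taken from this document. A successful TCP connection does not prove that the service behind it works, only that the firewall lets the connection through.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class OutboundCheckSketch {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, filtered by a firewall, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // License server host and port as stated in this document.
        System.out.println("license.sonpo.io:443 reachable: "
                + canConnect("license.sonpo.io", 443, 3000));
        // Placeholder SMTP server; substitute the values from your configuration.
        System.out.println("smtp.example.com:587 reachable: "
                + canConnect("smtp.example.com", 587, 3000));
    }
}
```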
Data storage and security
Data is stored in the directory configured in the application settings (the default directory is /var/sofie/data). Remote storage can be used with the appropriate server configuration (a mounted iSCSI volume, an NFS volume, etc.).
Data transfer between the web browser and the server is protected by HTTPS.
The application does not support end-to-end encryption. In its strict form, in which the server would never see the data, it never can: the server needs to see the content of a package in order to check it with the detection engines. Encryption at rest, however, is supported. Details about encryption are available here: Data encryption in the SOFiE application.
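As a generic illustration of what encryption at rest means, the following Java sketch encrypts data before it is written to storage and decrypts it when it is read back. The server holds the key and can therefore still inspect the plaintext, which is the difference from end-to-end encryption. The use of AES-256 in GCM mode, the stored layout (IV followed by ciphertext), and all names are assumptions made for this example; they do not describe the scheme SOFiE actually uses, which is covered in the document referenced above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class AtRestEncryptionSketch {
    private static final int IV_BYTES = 12;  // commonly recommended IV size for GCM
    private static final int TAG_BITS = 128; // authentication tag length

    // Generates a fresh 256-bit AES key (key management is out of scope here).
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    // Encrypts the data and returns the random IV followed by ciphertext and tag.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(iv.length + ciphertext.length)
                .put(iv).put(ciphertext).array();
    }

    // Reads the IV from the front of the stored bytes and decrypts the rest.
    static byte[] decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey key = newKey();
        byte[] stored = encrypt(key, "scanned file content".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, stored), StandardCharsets.UTF_8));
    }
}
```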
Scalability and high availability
When performance requirements are very high, the web server, the “worker” process, the nginx server and the Apache Kafka server can be scaled horizontally (by increasing the number of servers). Other components can only be scaled vertically (by increasing server performance).
Vertical scaling is easier to maintain than horizontal scaling and is therefore recommended.
High availability must be designed individually, according to the conditions of the specific deployment of the application.