Launch Apache Superset using AWS


This blog post will explain the process of deploying the open-source BI platform Apache Superset on AWS infrastructure. We'll leverage EC2 instances to run Superset and RDS to store its metadata. But before diving into the deployment steps, let's first understand what Superset is and its capabilities.


Apache Superset is an open-source data visualization and business intelligence tool that allows users to explore and visualize their data through interactive dashboards and charts. It supports a wide variety of data sources and provides a user-friendly interface for creating complex visualizations without needing extensive programming knowledge. Superset is highly customizable and scalable, making it suitable for organizations of all sizes.




To run Superset, we need a database to store its metadata and a compute instance to run the application. We will use Amazon RDS for the database and Amazon EC2 for compute.

First, we will set up the database using Amazon RDS. Follow the steps below.

  • Go to https://console.aws.amazon.com/ 
  • Select RDS from the Services list
  • Create a database instance
  • For this example, use the following settings:
    • Choose a database creation method – Standard create
    • Engine – PostgreSQL
    • Template – Free tier
    • Credentials management – Self managed (you create the password yourself; it becomes the master password)
    • Under Connectivity, Public access – Yes
  • Create the database, keeping all other settings at their defaults.
Below are screenshots from the different steps of creating the database in RDS.

Open AWS Console 


Select RDS from the Services list. You can also find it with the search option.




Create a DB Instance 



Select Configuration Parameters 



Select DB Credential Management - Self managed. 


Enable the Public access option under Connectivity


Click the "Create database" option. You have now launched the database.



Make a note of the database host (the endpoint shown for the instance), database port, username, and password. These details are required for the Superset configuration.
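Before moving on, it can help to confirm that the database endpoint is reachable. Below is a minimal sketch using only the Python standard library. It only checks that a TCP connection can be opened; it does not verify the username or password. The host name in the example call is a made-up placeholder, and 5432 is assumed because it is the usual PostgreSQL port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (hypothetical endpoint; replace it with the one from your RDS console):
# can_connect("mydb.abc123xyz.us-east-1.rds.amazonaws.com", 5432)
```

If the call returns False, the usual suspects are the Public access setting and the inbound rules of the security group attached to the database.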



Next, we will set up the EC2 instance. Follow the steps below and check the images for reference.

  • Go to https://console.aws.amazon.com/
  • Search for the EC2 service in the services list
  • Specify the configuration parameters below while launching the EC2 instance
    • Name - Superset (you can choose any name you want)
    • Application and OS Images – Amazon Linux, Amazon Linux 2023 AMI
    • Instance Type – t2.micro
    • Under Network Settings, select the options "Allow HTTPS traffic from the internet" and "Allow HTTP traffic from the internet" (check the images below for reference)
  • Launch the EC2 instance
  • Set up an inbound rule for the EC2 instance that allows traffic on port 8088, because Apache Superset runs on port 8088.

Select the EC2 service from the list of services in the AWS Console


Click on the Launch Instance option



Add the details in required fields and launch the EC2 instance 



We have to change the inbound rules in the security group. Click on the Instance ID, select the Security tab, and then select the security group.







Save the changes.


With the infrastructure setup now in place, our next step involves installing Docker on EC2, pulling the Superset Docker image, configuring the essential parameters, and finally, launching Superset.


There are several ways to connect to an EC2 instance. A convenient option for this task is EC2 Instance Connect, which opens an SSH session to the instance from the browser. Once connected, follow the steps outlined below.


Connect to the EC2 instance using EC2 Instance Connect - Click the instance ID on the EC2 dashboard, then select the Connect option.



Once you are connected, you will see a screen like the one below.




Installing Docker
  • sudo yum install docker - This command downloads and installs Docker.
  • sudo service docker start - This command starts the Docker service, so that you can use Docker commands to manage containers on the instance.
  • sudo systemctl enable docker - This command enables the Docker service to start automatically at boot time.


With Docker installed, we can now pull the Apache Superset image and start the Docker container.


Pull and run the Superset Docker image
  • sudo docker run -d -p 8088:8088 --name superset apache/superset:3.0.0 – Docker creates a container from the Apache Superset image, version 3.0.0, runs it in detached mode, maps port 8088 on the host to port 8088 inside the container, and names the container "superset". The container runs Apache Superset and makes it accessible via port 8088 on the host system.


We have now downloaded the Apache Superset Docker image and started the container.

Creating superset_config.py file

The superset_config.py file is the configuration file for customizing the settings and behavior of your Superset instance. It can hold database configuration, security configuration, feature flags, cache configuration, logging configuration, custom extensions, and more.
For an initial basic setup, we must configure the three settings below in superset_config.py.

  • SECRET_KEY - the secret key Superset uses for cryptographic signing, encryption, and session management, for example when generating secure cookies and tokens. Do not share the SECRET_KEY with anyone.
You can generate a secret key by running the command "openssl rand -base64 42" in the command shell.



  • SQLALCHEMY_DATABASE_URI – the connection string for the metadata database, where Superset stores its internal data such as charts, dashboards, and user information.
Example - postgresql://username:password@hostname:port/database_name
Build the connection string from the details of the database we created earlier using AWS RDS.
  • TALISMAN_ENABLED - enables or disables Talisman, a security middleware for Flask applications. Talisman helps secure web applications by setting various security-related HTTP headers.
Since we are launching Superset in test/dev mode, I will disable it by setting TALISMAN_ENABLED = False
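Put together, the three settings above could look like the sketch below. Every value is a placeholder: the user name, password, host, and database name are made up for illustration, and "postgres" is only assumed to be the database name. Percent-encoding the password is my addition, not a Superset requirement; it matters only when the password contains characters such as "@", ":" or "/", which would otherwise be read as URL delimiters.

```python
# superset_config.py (sketch): every value below is a placeholder.
from urllib.parse import quote

# Paste the output of `openssl rand -base64 42` here and keep it private.
SECRET_KEY = "REPLACE_WITH_GENERATED_KEY"

# Hypothetical RDS details; use the values noted when the database was created.
# Lowercase helper names are used so they are not mistaken for Superset settings.
_db_user = "postgres"
_db_password = "my:p@ss/word"
_db_host = "mydb.abc123xyz.us-east-1.rds.amazonaws.com"
_db_port = 5432
_db_name = "postgres"

# quote(..., safe="") percent-encodes every reserved character in the password.
SQLALCHEMY_DATABASE_URI = (
    f"postgresql://{_db_user}:{quote(_db_password, safe='')}"
    f"@{_db_host}:{_db_port}/{_db_name}"
)

# Test/dev only: turn off the Talisman security headers.
TALISMAN_ENABLED = False
```

With the placeholder password above, the password part of the resulting connection string becomes my%3Ap%40ss%2Fword.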

The superset_config.py file should contain the three settings described above. I'm creating the file directly from the command shell using the vi editor. Follow the steps below. You can check the images for reference.

  • vi superset_config.py
  • Press "i" to switch the vi editor to insert mode.
  • Paste the three settings
  • Save the file by pressing the ESC key, typing :wq! and then pressing the Enter key.
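If you would rather not type the file by hand, a small Python script can generate the key and write the file in one go. This is only a sketch, assuming python3 is available on the instance. The helper names generate_secret_key and write_superset_config are my own, not part of Superset, and the connection string in the example call is a placeholder.

```python
import base64
import secrets
from pathlib import Path


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return random bytes as base64 text, similar to `openssl rand -base64 42`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def write_superset_config(path: str, database_uri: str) -> None:
    """Write the three basic settings to `path` as Python assignments."""
    lines = [
        # !r writes each value as a quoted Python string literal.
        f"SECRET_KEY = {generate_secret_key()!r}",
        f"SQLALCHEMY_DATABASE_URI = {database_uri!r}",
        "TALISMAN_ENABLED = False",
    ]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Example call (placeholder connection string):
# write_superset_config(
#     "superset_config.py",
#     "postgresql://username:password@hostname:5432/database_name",
# )
```

Each run writes a new SECRET_KEY, so run it once and keep the resulting file rather than regenerating it later.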




We have now created the superset_config.py file. Next, we have to copy the file into the Superset container and restart the container. Follow the steps below.

To copy the file into the Superset container, enter the command below.
sudo docker cp superset_config.py superset:/app/superset_config.py





Restart the Superset container - sudo docker restart superset

You can check the container's health status with the sudo docker ps command. Wait until the status shows healthy.

 

Create a Superset user with admin permissions. Use the command below.

sudo docker exec -it superset superset fab create-admin --username admin --firstname Superset --lastname Admin --email ullivinaybabu@gmail.com --password *your_own_password*

Replace the username, first name, last name, email address, and password in the command above with your own values.






Next, we have to upgrade the schema of Superset's metadata database using the command below.

sudo docker exec -it superset superset db upgrade


In the last step, we have to initialize the Apache Superset application inside the Docker container using the command below.

sudo docker exec -it superset superset init

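We are done with all the steps. Let's open Superset in a web browser. The public IP address of the EC2 instance is shown on its details page in the EC2 console. Enter http://ipaddress:8088 in the browser, replacing ipaddress with that address, to open the Superset instance running on the infrastructure you set up. Log in with the admin username and password you created above.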
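If the page does not load, a quick script can tell you whether Superset is answering at all. The sketch below requests the /health path, which is, as far as I know, the same endpoint the Docker image's health check calls; treat that path as an assumption. The IP address in the example call is a placeholder.

```python
import urllib.request


def superset_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this also covers
        # refused connections, timeouts, DNS failures, and 4xx/5xx responses.
        return False


# Example call (placeholder address; use your instance's public IP):
# superset_is_up("http://203.0.113.10:8088")
```

A False result while sudo docker ps shows the container as healthy usually points to the security group, since port 8088 must be allowed in the inbound rules.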

 









You're now equipped to begin dashboard development with Superset. Additionally, we can integrate these dashboards into web applications. I'll cover both topics separately in upcoming blog posts.


Note: The above setup is for test/dev environments. To launch Apache Superset in a production setup, we have to configure various additional parameters. I will cover "Apache Superset for Production setup" in an upcoming blog post.


If you have any questions, please reach out to me - ullivinaybabu@gmail.com.






