Week 7
Data security: validation, sanitization

Quiz 6: Data models 15 mins

There will be a quiz today. It will be worth 2% of your final grade.

Agenda

Guest Speaker (50 mins)
Break (10 mins)
AMA (10 mins)
Quiz (10 mins)
Better Debugging (20 mins)
Input Sanitization (20 mins)
Break (10 mins)
Data Validation (20 mins)
Lab Time (30 mins)

Better Debugging

Before we dig into recommended practices for guarding against data corruption and scripting attacks through input data validation and sanitization, let's add another tool to our development toolbox.

The debug NPM module is a better console.log: you can turn its output on and off with an environment variable instead of commenting out the debugging statements in your code. Express and many of its companion modules use it internally, and we can use it in our modules as well.

Install debug as a project dependency.

npm install debug

Make it available in app.js and set the namespace to the application name.

TIP

The primary debug namespace should match the name property in package.json.

'use strict'

const debug = require('debug')('week7')
const express = require('express')

require('./startup/database')() // call the exported setup function right away

const app = express()
app.use(express.json())

const port = process.env.PORT || 3030
app.listen(port, () => debug(`Express is listening on port ${port} ...`))

To keep the main entry module for our project (app.js) as clean as possible, it is a good practice to move one-time setup activities, like connecting to the database, into separate modules in a /startup or a /bootstrap folder. Let's put our Mongoose connection setup code from week 6 in the /startup/database.js module.

Then, at the top of that file, require the debug module, setting the namespace to week7:db, and change any console.log() statements to debug() statements. The completed code should look like this.

const debug = require('debug')('week7:db')
const mongoose = require('mongoose')

module.exports = () => {
  mongoose
    .connect(`mongodb://localhost:27017/mad9124`, {
      useNewUrlParser: true,
      useUnifiedTopology: true
    })
    .then(() => {
      debug(`Connected to MongoDB ...`)
    })
    .catch(err => {
      debug(`Error connecting to MongoDB ...`, err)
      process.exit(1)
    })
}

Environment Variables

Remember

The debug module suppresses output by default.

If we run the app now with node app.js it will not output anything to the console. We need to activate it by setting the DEBUG environment variable to our application namespace before running the application.

DEBUG=week7 node app.js

Hmmm ... still not quite right. We only saw the debug message from app.js, not from the database connection module. Debug allows us to be very selective about which module's messages we want to see. If we run it again with ...

DEBUG=week7:db node app.js

... now we only see the database connection message.

To see all related namespaces we can use the * wildcard character.

DEBUG=week7* node app.js
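
To make the matching behaviour above concrete, here is a minimal sketch of how a DEBUG value could be compared against a namespace. It is a simplified model, not the debug module's actual implementation: the function name isNamespaceEnabled is hypothetical, and it only handles comma-separated patterns and the * wildcard.

```javascript
// Hypothetical helper: a simplified model of DEBUG namespace matching.
// It is NOT the debug module's real code and ignores other features.
const isNamespaceEnabled = (namespace, debugValue = '') => {
  return debugValue
    .split(',')
    .map(pattern => pattern.trim())
    .filter(pattern => pattern.length > 0)
    .some(pattern => {
      // Escape regex metacharacters, then turn each * into "match anything".
      const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
      const regex = new RegExp('^' + escaped.replace(/\*/g, '.*') + '$')
      return regex.test(namespace)
    })
}

console.log(isNamespaceEnabled('week7', 'week7')) // true
console.log(isNamespaceEnabled('week7:db', 'week7')) // false
console.log(isNamespaceEnabled('week7:db', 'week7*')) // true
console.log(isNamespaceEnabled('sanitize:body', 'week7*,sanitize*')) // true
```

This shows why DEBUG=week7 alone did not print the database message: week7:db is a different namespace until the wildcard is added.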

NPM Scripts

Our application startup command is starting to get long. What if we want to add some other environment variables like NODE_ENV or PORT?

DEBUG=week7* NODE_ENV=dev PORT=3000 node app.js

NPM has a solution to make this easier. In the package.json file there is a scripts option. We can set different startup instructions for different environments. By default, it looks like this.

"scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },

Let's add a start script for production and a dev script to use with our local development.

"scripts": {
    "start": "NODE_ENV=production PORT=433 node app.js",
    "dev": "DEBUG=week7* NODE_ENV=dev PORT=3000 nodemon app.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },

Now we can start our development server by running ...

npm run dev

Or we can start up in production mode with npm start.
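
Inside the application, the values set by these scripts arrive as strings on process.env. The sketch below shows one way to read them with fallbacks. The readConfig helper name and the 'dev' default are assumptions for illustration; the 3030 fallback mirrors the default port used in app.js.

```javascript
// Hypothetical helper: read settings from an environment object,
// falling back to defaults when a variable is not set.
const readConfig = (env = process.env) => {
  return {
    nodeEnv: env.NODE_ENV || 'dev', // assumed default, adjust as needed
    // environment variables are always strings, so convert PORT to a number
    port: Number.parseInt(env.PORT, 10) || 3030
  }
}

// Simulate what "npm run dev" would provide.
console.log(readConfig({ NODE_ENV: 'dev', PORT: '3000' }))
// Simulate running with no variables set at all.
console.log(readConfig({}))
```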

TIP

We have more configuration work to do before we are ready for production mode. We will cover that in a few weeks.

Sanitization

The very nature of our application is to take user input and act on it in some way, either to store information or to search and filter information in the database. The data stored is ultimately returned to the client application.

This means that our web service application is inherently vulnerable to a variety of malicious attacks. We cannot trust any data coming from the client application. It might not be our code. It may have been hacked. A malicious actor may be using it to directly input data formatted to cause our system to malfunction.

The Open Web Application Security Project (OWASP) is an excellent resource for staying up to date on application security risks and best practices. Among their many resources, the regularly updated OWASP Top 10 Critical Web Application Security Risks should be on your reading list.

Directly relevant to our task is the OWASP Node Goat Tutorial that provides examples of the Top 10 in a Node.js application environment.

Let's look at guarding against two of the most damaging attack types: Cross-site scripting (XSS) and Database Injection. To do that, we need to filter and cleanse all incoming data.

XSS Protection

To help protect against Cross-site scripting (XSS) attacks we need to strip out any HTML or script tags from the input values so that they would not be interpreted as scripts or alter the browser rendering of any returned data. We will use the xss NPM package.

We will build a small middleware function to implement this behaviour. Create a new file in the middleware folder called sanitizeBody.js. Require the debug and xss modules at the top. Then add an empty middleware function signature. (We will fill in the logic in the next step.)

const debug = require('debug')('sanitize:body')
const xss = require('xss')

module.exports = (req, res, next) => {
  // sanitization logic goes here
  next()
}

Remember

It is a good practice not to take destructive action on the original request body. So, make a copy to work with and set the resulting sanitized version as a new property of the request object that downstream route handlers can access.

Start by stripping out any id, or _id properties. We never want those.

const {id, _id, ...attributes} = req.body
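
As a quick illustration of what that destructuring line does, the sketch below runs it against a made-up object standing in for req.body. The sample values are hypothetical.

```javascript
// Sample request body (made-up values for illustration).
const body = { id: 1, _id: 'abc123', make: 'Tesla', model: 'S' }

// Pull id and _id out; every other property is collected into attributes.
const { id, _id, ...attributes } = body

console.log(attributes) // { make: 'Tesla', model: 'S' }
console.log(body) // the original object still has id and _id
```

Note that the rest syntax makes a shallow copy: top-level properties are copied, but nested objects are still shared with the original.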

The xss module applies its filter rules to strings. To apply it to the various properties of our user-supplied data, we will need to loop over the properties of the attributes object (our copy of req.body) with a for...in loop.

The xss function takes an optional configuration object to customize how it works.

for (let key in attributes) {
  attributes[key] = xss(attributes[key], {
    whiteList: [], // empty, means filter out all tags
    stripIgnoreTag: true, // filter out all HTML not in the whitelist
    stripIgnoreTagBody: ['script']
    // the script tag is a special case, we need
    // to filter out its content
  })
}

Lastly set the modified attributes as the value of a new req.sanitizedBody property and then call next(). The whole thing should look similar to this.

const debug = require('debug')('sanitize:body')
const xss = require('xss')

module.exports = (req, res, next) => {
  debug({body: req.body})
  const {id, _id, ...attributes} = req.body
  debug({attributes})
  for (let key in attributes) {
    attributes[key] = xss(attributes[key], {
      whiteList: [],
      stripIgnoreTag: true,
      stripIgnoreTagBody: ['script']
    })
  }
  debug({sanitizedBody: attributes})
  req.sanitizedBody = attributes
  next()
}

TIP

Notice that I added some debug statements just to help verify that everything is working. To see these print out, modify the dev script's DEBUG environment variable in the package.json file to DEBUG=week7*,sanitize*

OK, we have our body sanitizer middleware function. How do we use it?

Remember, Express route method declarations can take more than one callback function. This lets us call one or more middleware functions directly for any given route.

app.post('/test', middleware, routeHandler)
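
To see how the hand-off between those callbacks works, here is a small stand-in that runs a list of handlers in order. It is not Express code: runHandlers is a hypothetical function that only models how calling next() starts the following callback.

```javascript
// Hypothetical stand-in for how a route runs its callbacks in order.
// This is not Express code; it only models the next() hand-off.
const runHandlers = (handlers, req, res) => {
  const callHandler = index => {
    if (index >= handlers.length) return
    // each handler receives next(), which starts the following handler
    handlers[index](req, res, () => callHandler(index + 1))
  }
  callHandler(0)
}

const middleware = (req, res, next) => {
  req.sanitizedBody = { make: 'Tesla' } // pretend we cleaned the body
  next()
}

const routeHandler = (req, res) => {
  res.sent = { data: req.sanitizedBody } // stands in for res.send()
}

const req = { body: { make: '<b>Tesla</b>' } }
const res = {}
runHandlers([middleware, routeHandler], req, res)
console.log(res.sent) // { data: { make: 'Tesla' } }
```

If the middleware never calls next(), the route handler never runs, which is why our sanitizer ends with a next() call.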

So, in our /routes/cars.js module, we could do this ...

const sanitizeBody = require('../middleware/sanitizeBody')
const Car = require('../models/Car')
// ... other setup

router.post('/', sanitizeBody, async (req, res) => {
  try {
    const newCar = new Car(req.sanitizedBody)
    await newCar.save()
    res.send({data: newCar})
  } catch (err) {
    errorHandlerFunction(err)
  }
})

We can now be reasonably confident that only plain text strings will be passed to Mongoose for validation. Potentially malicious HTML and JavaScript will be removed from input strings.

We can test it with Postman.

screenshot of Postman test

Yay! It worked!

But wait. What if the payload attributes are not simple strings? We need to refactor the sanitizeBody middleware so that its tag-stripping logic lives in a helper function that can call itself recursively for more complex data structures.

Recursion

From The Modern JavaScript Tutorial

When a function solves a task, in the process it can call many other functions. A partial case of this is when a function calls itself. That’s called recursion.
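
Here is a small warm-up example of that idea before we apply it to the middleware. The countStrings function and the sample object are hypothetical; the function calls itself for every nested object or array it finds.

```javascript
// Count every string value in a nested structure by having the
// function call itself for each nested object or array.
const countStrings = value => {
  if (typeof value === 'string') return 1
  if (value === null || typeof value !== 'object') return 0
  // Object.values works for both plain objects and arrays
  return Object.values(value).reduce(
    (total, child) => total + countStrings(child),
    0
  )
}

const sample = {
  make: 'Tesla',
  owner: { name: 'Ada', tags: ['a', 'b'] },
  year: 2020
}
console.log(countStrings(sample)) // 4
```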

We will start by creating a new function called stripTags. It should take a single argument - let's call it payload. Now cut the for...in loop from the main function and paste it into the new one. As a best practice, we should not mutate the original payload object that is passed in, so let's make a copy of that with the line let attributes = {...payload}. Don't forget to return the sanitized attributes at the end of this new function.

Then call the new stripTags function from within the primary function. The refactored middleware should now look like this.

const debug = require('debug')('sanitize:body')
const xss = require('xss')

const stripTags = payload => {
  let attributes = {...payload}
  for (let key in attributes) {
    attributes[key] = xss(attributes[key], {
      whiteList: [],
      stripIgnoreTag: true,
      stripIgnoreTagBody: ['script']
    })
  }
  return attributes
}

module.exports = (req, res, next) => {
  debug({body: req.body})
  const {id, _id, ...attributes} = req.body
  debug({attributes})
  const sanitizedBody = stripTags(attributes)
  debug({sanitizedBody: sanitizedBody})
  req.sanitizedBody = sanitizedBody
  next()
}

This version should be functionally equivalent to what we had before. Test it with Postman to be sure that it is working.

OK, now we can augment the stripTags function to check for objects and then call itself to loop over that nested object and sanitize its properties. Wrap the logic inside the for...in loop in an if/else block.

for (let key in attributes) {
  if (attributes[key] instanceof Object) {
    attributes[key] = stripTags(attributes[key])
  } else {
    attributes[key] = xss(attributes[key], {
      whiteList: [],
      stripIgnoreTag: true,
      stripIgnoreTagBody: ['script']
    })
  }
}

Test that with Postman ...

screenshot of Postman test

Great! Now let's make sure that we handle Arrays properly. Since arrays inherit from Object in JavaScript, we need to check for that case first. Then instead of just recursively calling stripTags on the array, we need to use the .map() method to loop over the array.

Each element could be another complex object or a simple string. We need to check and handle both cases. If it is a string we can call xss() otherwise call stripTags() again.
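
The order of those checks matters. The short sketch below shows why; the classify helper is a hypothetical name used only for this demonstration.

```javascript
const list = ['a', 'b']
const record = { name: 'a' }

console.log(list instanceof Object) // true, arrays are objects too
console.log(list instanceof Array) // true
console.log(record instanceof Array) // false

// Hypothetical helper: test for an array BEFORE testing for an object.
const classify = value => {
  if (value instanceof Array) return 'array'
  if (value instanceof Object) return 'object'
  return 'primitive'
}

console.log(classify(list)) // array
console.log(classify(record)) // object
console.log(classify('text')) // primitive
```

If the two checks were swapped, every array would be reported as a plain object and the array branch would never run.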

Rather than duplicate the code for the xss() function call, let's extract that to a separate function in our module.

const sanitize = sourceString => {
  return xss(sourceString, {
    whiteList: [],
    stripIgnoreTag: true,
    stripIgnoreTagBody: ['script']
  })
}

Then the conditional block in the stripTags() function becomes ...

if (attributes[key] instanceof Array) {
  attributes[key] = attributes[key].map(element => {
    return typeof element === 'string' ? sanitize(element) : stripTags(element)
  })
} else if (attributes[key] instanceof Object) {
  attributes[key] = stripTags(attributes[key])
} else {
  attributes[key] = sanitize(attributes[key])
}

TIP

The return statement of the map() method above is using JavaScript's ternary operator, rather than a more verbose if/else block.

Writing out that single ternary expression the long way would look something like this ...

let cleanedElement
if (typeof element === 'string') {
  cleanedElement = sanitize(element)
} else {
  cleanedElement = stripTags(element)
}
return cleanedElement

OK. Test that with Postman ...

screenshot of Postman test

Phew!

The final version of the sanitizeBody.js middleware module should look similar to this (with no debug statements).

const xss = require('xss')

const sanitize = sourceString => {
  return xss(sourceString, {
    whiteList: [],
    stripIgnoreTag: true,
    stripIgnoreTagBody: ['script']
  })
}

const stripTags = payload => {
  let attributes = { ...payload } // don't mutate the source data
  for (let key in attributes) {
    if (attributes[key] instanceof Array) {
      attributes[key] = attributes[key].map(element => {
        return typeof element === 'string'
          ? sanitize(element) // if true
          : stripTags(element) // if false
      })
    } else if (attributes[key] instanceof Object) {
      attributes[key] = stripTags(attributes[key])
    } else {
      attributes[key] = sanitize(attributes[key])
    }
  }
  return attributes
}

module.exports = (req, res, next) => {
  const { id, _id, ...attributes } = req.body
  const sanitizedBody = stripTags(attributes)
  req.sanitizedBody = sanitizedBody
  next()
}

OK. Now we can be reasonably sure that all HTML tags and scripts will be removed from all req.body properties, no matter how deeply they are buried.
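
One caveat worth checking with Postman: stripTags() spreads whatever value it is given, and spreading a number, boolean, or null produces an empty object. A payload such as {"ratings": [5, 4]} would therefore come out as [{}, {}]. A possible adjustment, sketched here under assumptions, is to route every value through one recursive function that only rewrites strings. The name sanitizeValue is hypothetical, and stripHtml is a crude placeholder standing in for the xss-based sanitize() function so that the sketch runs without installing anything.

```javascript
// Placeholder for sanitize(): a crude regex tag stripper for demo only.
// In the real module this would be the xss-based sanitize() function.
const stripHtml = text => text.replace(/<[^>]*>/g, '')

// Hypothetical alternative to stripTags(): handles any value type.
const sanitizeValue = value => {
  if (typeof value === 'string') return stripHtml(value)
  if (Array.isArray(value)) return value.map(element => sanitizeValue(element))
  if (value instanceof Object) {
    const copy = {} // build a new object, don't mutate the source data
    for (const key in value) {
      copy[key] = sanitizeValue(value[key])
    }
    return copy
  }
  // numbers, booleans, null and undefined pass through unchanged
  return value
}

const input = {
  make: '<b>Tesla</b>',
  ratings: [5, 4],
  sold: false,
  tags: [['<i>ev</i>']]
}
console.log(sanitizeValue(input))
console.log({ ...5 }) // {} which is why numbers need their own branch
```

With this shape, arrays nested inside arrays are also handled, because every element goes back through the same function.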

Database Injection

Using Mongoose, which enforces schema validation and attempts type coercion so that only data in the correct format is stored, goes a long way toward protecting MongoDB from various attacks. However, there are still some cases that are not covered.

Read Hacking Node.js and MongoDB to learn more about this kind of vulnerability.

To help protect against database injection attacks, we will use the Express Mongo Sanitize NPM package (express-mongo-sanitize).

From the docs ...

What?

This module searches for any keys in objects that begin with a $ sign or contain a ., from req.body, req.query or req.params. It can then either:

completely remove these keys and associated data from the object, or

replace the prohibited characters with another allowed character.

The behaviour is governed by the passed option, replaceWith. Set this option to have the sanitizer replace the prohibited characters with the character passed in.

Why?

Object keys starting with a $ or containing a . are reserved for use by MongoDB as operators. Without this sanitization, malicious users could send an object containing a $ operator, or including a ., which could change the context of a database operation. Most notorious is the $where operator, which can execute arbitrary JavaScript on the database.

The best way to prevent this is to sanitize the received data, and remove any offending keys, or replace the characters with a 'safe' one.
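
To picture the "remove" behaviour described in the docs, here is a simplified sketch. It is not the express-mongo-sanitize source code: removeOperatorKeys is a hypothetical function, and the login payload is a made-up example of an operator being sent where a string was expected.

```javascript
// Hypothetical sketch: recursively drop keys that start with "$" or
// contain ".". This is NOT the express-mongo-sanitize source code.
const removeOperatorKeys = value => {
  if (Array.isArray(value)) return value.map(item => removeOperatorKeys(item))
  if (value === null || typeof value !== 'object') return value
  const cleaned = {}
  for (const key of Object.keys(value)) {
    if (key.startsWith('$') || key.includes('.')) continue // prohibited key
    cleaned[key] = removeOperatorKeys(value[key])
  }
  return cleaned
}

// A login payload where the password is an operator object, not a string.
const payload = { email: 'user@example.com', password: { $gt: '' } }
console.log(removeOperatorKeys(payload))
// { email: 'user@example.com', password: {} }
```

Without this step, a query built directly from the payload would ask MongoDB for any password greater than an empty string instead of comparing against one specific value.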

Let's install it

npm install express-mongo-sanitize

Let's implement it

We could implement this middleware on a route-by-route basis, e.g.

router.post('/:id', sanitizeMongo(), (req, res) => {})

Or at the router level ...

app.use('/api/cars', sanitizeMongo(), carsRouter)

Or at the application level ...

app.use(sanitizeMongo())

Let's choose the application level -- set it and forget it!

Don't forget

You need to require the module before you can use it.

const sanitizeMongo = require('express-mongo-sanitize')

For next week

Before next week's class, please read these additional online resources.

Assignment Reminder

Assignment 2 - Mongo CRUD - is due before 1:00 pm on Friday, March 6th.

Quiz

There will be a short quiz next class. The questions could come from any of the material referenced above.

Next week is Break Week

This is a great opportunity to review the course notes and the linked hybrid study materials from what we have covered so far.

As a self-assessment tool to help you identify areas that might need more review, I have posted an optional ungraded review quiz covering the first half of the term.
