The Most Cursed JavaScript

There was an old trick, used by Angular and a few other libraries, of reading a function's source code (which you can get using .toString()) to get its argument names. Promisify-node used this technique to find arguments called callback or cb. Angular uses it for dependency injection, though there are better ways of doing that.

When JavaScript got default values and rest parameters, this technique became much less effective. The StackOverflow answers based on regular expressions seem to have lapsed into “works in most cases.” I don't know how to make it work in all cases, but I have a method that works in a slightly unusual case: functions without a rest parameter, where every parameter has a default value.
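As a concrete picture of why default values hurt, here is a deliberately naive version of the old trick. The helper name naive_arg_names is made up for this sketch and is not taken from any library. It assumes the parameter list runs from the first ( to the first ) in the function's source, which holds for simple callback-style signatures but not once a default value contains its own parentheses:

```javascript
// Hypothetical helper illustrating the classic "read the source" trick.
function naive_arg_names(func) {
  // Function.prototype.toString returns the function's source text.
  const source = func.toString();
  // Assume the parameter list runs from the first "(" to the first ")".
  const params = source.slice(source.indexOf('(') + 1, source.indexOf(')'));
  return params
    .split(',')
    .map((param) => param.split('=')[0].trim()) // drop any default value
    .filter((name) => name.length > 0);
}

// Works for a plain callback-style signature: ["path", "callback"]
const simple = naive_arg_names(function read(path, callback) {});

// Breaks once a default value contains parentheses and commas: ["a", "2"]
const broken = naive_arg_names(function f(a = Math.max(1, 2), b) {});
```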

In general, you should avoid parsing arguments, but with that being said, doing so can add a delightful dash of magic to your library.

A Motivating Example

We will be writing a module called ANADV which stands for Argument Names And Default Values. Before I step you through how it works and why it is cursed, I want to show you why I wanted to build it. You could also skip this whole thing and read all 115 lines here.

My interest in argument parsing came from wanting to define custom elements like this:

define(function element_name(prop_1 = "Default 1", prop_2 = 5, checked = false) {
  return html`
    <h1>Heading</h1>
    <p>${reactive_text(prop_1)}</p>
    <p>${reactive_text(computed(() => Math.pow(prop_2(), 2)))}</p>
  `;
});

If we then use that element with an attribute:

<element-name prop-2="10"></element-name>

it renders the heading, then "Default 1", then 100 (10 squared).

If you have a browser with top-level await support (just Chrome at the moment) then you can try the example out here.

We use the function's name for the custom element name, but with hyphens instead of underscores. The arguments correspond to attributes / properties. In this case, prop-2 was set to "10", which was converted into new Number("10") before being assigned to the prop_2 property. We can do this because we know that Number was the constructor for prop_2's default value, which was 5. In case you didn't know, you can get the constructor of most values using the constructor property: (5).constructor === Number.
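A minimal sketch of those two conversions follows. The helper names to_element_name and coerce_attribute are hypothetical, and the sketch only handles Number and String defaults: booleans are left out because Boolean("false") is true, so a checked attribute would need its own rule. It also calls the constructor without new, which returns a primitive rather than a wrapper object like new Number("10").

```javascript
// Hypothetical helper: element_name -> element-name.
function to_element_name(func) {
  return func.name.replace(/_/g, '-');
}

// Hypothetical helper: convert an attribute string using the constructor
// of the property's default value, e.g. (5).constructor === Number.
function coerce_attribute(default_value, attribute_value) {
  const ctor = default_value.constructor;
  if (ctor === Number || ctor === String) {
    // Calling the constructor without `new` returns a primitive.
    return ctor(attribute_value);
  }
  throw new TypeError(`No conversion for ${ctor.name} defaults`);
}

function element_name(prop_1 = "Default 1", prop_2 = 5, checked = false) {}

const tag = to_element_name(element_name); // "element-name"
const prop_2 = coerce_attribute(5, "10");  // 10, a number
```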

How Does it Work?

We're going to use eval… a lot. What we'd like to do is parse out the argument list and turn it into something we can eval that will give us the argument names and default values. We are going to treat the arguments like a bulk assignment, which is why we don't support rest parameters. Since a rest parameter is always the last argument, you might be able to cut it off, but for my use case I chose to simply not support them.
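The analogy holds because default values are evaluated left to right and each one can read the parameters before it, just like a chain of assignments. A quick check (both function names are made up for illustration):

```javascript
// Defaults are evaluated left to right; each one can see the previous ones.
function with_defaults(a = 1, b = a + 1, c = a + b) {
  return [a, b, c];
}

// The same list written as one comma-separated assignment expression.
function as_assignments() {
  let a, b, c;
  a = 1, b = a + 1, c = a + b;
  return [a, b, c];
}

const from_defaults = with_defaults();     // [1, 2, 3]
const from_assignments = as_assignments(); // [1, 2, 3]
```

The analogy only describes a call with no arguments, since a default is used only when its argument is undefined; that is exactly the case we care about when reading default values.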

function example(a = "Hello World", b = 5, c, d = { x: 2.2, y: 5.76 }) {
  /* ... */
}

// Is like:

let a, b, c, d;
a = "Hello World", b = 5, c, d = { x: 2.2, y: 5.76 };

// Another thing I tried that we won't use:

const defaults = [a = "Hello World", b = 5, c, d = { x: 2.2, y: 5.76 }];
// Gives us:
["Hello World", 5, undefined, { x: 2.2, y: 5.76 }]

While the array version would give us the default values, we would still need the argument names. Luckily, there is a way to assign to variables that you have not declared!

Slurping up Identifiers via Proxy + with

We need something that simulates the let a, b, c, d; line so that the arguments line is valid. We can do that by modifying the scope chain using a Proxy. There's only one way of modifying the scope chain in this way, and that is by using the with keyword.

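const arg_names = [], defaults = {};

with (new Proxy({}, {
  // Called for every identifier lookup: claim them all.
  has(target, key) {
    arg_names.push(key);
    return true;
  },
  get(target, key) {
    // An empty unscopables object means no identifier is excluded.
    if (key == Symbol.unscopables) return {};
    // The identifier was read, so it isn't an argument name; fetch its real value.
    arg_names.pop();
    return eval(key);
  },
  // Assignments land here: record them as default values.
  set(target, key, value) {
    defaults[key] = value;
    return true;
  }
})) {
  a = "Hello World", b = 5, d = new Date();
}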

console.log(arg_names, defaults);

This logs ["a", "b", "d"] and {a: "Hello World", b: 5, d: <Your Current Time>}, which is exactly what we wanted. What's going on?

For every identifier inside the with block, the engine calls our Proxy's has method. Our Proxy claims, “all your identifiers are belong to us”, and we add the identifier to our arg_names list. Because has said yes, the engine then asks the Proxy for its Symbol.unscopables, which our get method reports as an empty object, so the identifier isn't excluded. If the identifier is subsequently read, we remove it from the arg_names list and eval the identifier to get its real value. Lastly, we store any assignments / sets into our defaults object.
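To see what the unscopables check guards against, here is a small sketch with a plain object instead of a Proxy. The name read_x is made up, and the Function constructor is used only because its body runs in sloppy mode, where with is allowed. Listing x in Symbol.unscopables makes the lookup skip the with object and fall through to the outer scope:

```javascript
// The Function constructor's body is sloppy mode, so `with` is allowed in it.
const read_x = new Function('scope', 'with (scope) { return typeof x; }');

// x is found on the with object: "number"
const visible = read_x({ x: 1 });

// x is listed in Symbol.unscopables, so the with object is skipped for it:
// "undefined", assuming there is no global x
const hidden = read_x({ x: 1, [Symbol.unscopables]: { x: true } });
```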

You'll notice that the c parameter disappeared from our examples. That's because this method doesn't work with arguments that don't have a default value. Our Proxy relies on arguments being only assigned to, but a = "Hello World", b = 5, c, d = new Date() would read / get c instead of setting it.

Curse number one: We are using eval for the exact reason it is considered dangerous. We use it to access the environment around where eval is called. More on this later.

Curse number two: We use the partly deprecated with keyword.

There are three problems with this code. First, it can be tricked by an argument list that has an assignment within its default value. For example, function(a = 5, b = i++) {} would produce an arg_names of ["a", "i", "b"]. We're not going to fix this: partly because I don't think it's very common, and mostly because I don't know how.
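The following runnable sketch reproduces both limitations. It is a simplified variant, not ANADV's code: collect_defaults is a hypothetical name, argument names come only from the set trap (so there is no push/pop bookkeeping), reads fall back to globalThis instead of eval, and with runs inside the Function constructor, whose body is sloppy mode unless it contains its own "use strict" directive, so the sketch can be called from a module.

```javascript
// Hypothetical simplified collector: names come from assignments (set trap).
function collect_defaults(args_source) {
  const defaults = {};
  const proxy = new Proxy({}, {
    // Claim every identifier so that all lookups inside `with` hit the Proxy.
    has() {
      return true;
    },
    get(target, key) {
      // A non-object Symbol.unscopables blocks nothing.
      if (key === Symbol.unscopables) return undefined;
      // Reads fall back to the global scope (no access to local variables).
      return globalThis[key];
    },
    set(target, key, value) {
      defaults[key] = value; // every assignment becomes a "default"
      return true;
    },
  });
  // Function bodies are sloppy unless they opt in, so `with` is allowed here.
  const run = new Function('proxy', `with (proxy) { ${args_source}\n}`);
  run(proxy);
  return defaults;
}

const ok = collect_defaults('a = "Hello World", b = 5, d = new Date()');
// Object.keys(ok) is ["a", "b", "d"]

const no_default = collect_defaults('a = 1, c, d = 2');
// ["a", "d"]: c is only read, never assigned

const side_effect = collect_defaults('a = 5, b = i++');
// ["a", "i", "b"]: the assignment inside i++ leaks in
```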

Sloppy Mode Script

Second, I want ANADV to be an ES6 module, and the with keyword isn't allowed in strict mode (which modules are executed in). That means we'll need to load a sloppy mode script and make our module wait until it has loaded (not quite synchronous, but close enough). importScripts only exists in workers, so we'll need to do something else:

// Within our module:
// Create the callback first so it exists before the script can run.
const script_loaded = new Promise(resolve => window._script_loaded = resolve);

const script = document.createElement('script');
script.src = URL.createObjectURL(new Blob([`
  function collect_names(code) {
    const proxy = /* ... */;

    with(proxy) {
      eval(code);
    }
  }
  
  window._script_loaded();
`], { type: 'text/javascript' }));
document.body.appendChild(script);

// Top-level await that pauses our module's evaluation until the script has finished loading.
await script_loaded;

Curse number three: We use a sloppy mode script from our strict-mode module — just so we can use the with keyword.

env_access_src

The third problem is that when we use eval(key) inside our Proxy's get method, we get the value as it is available within our Proxy's context. What we actually want is the value as defined in the context that the code snippet came from. Take this code for example:

function collect_names(code) {
  const arg_names = [], defaults = {};
  with (new Proxy({}, {
    has(target, key) {
      if (key !== 'code') {
        arg_names.push(key);
        return true;
      } else {
        return false;
      }
    },
    get(target, key) {
      if (key == Symbol.unscopables) return {};
      arg_names.pop();
      return eval(key);
    },
    set(target, key, value) {
      defaults[key] = value;
      return true;
    }
  })) {
    eval(code);
  }
  return [arg_names, defaults];
}

(function() {
  const some_value = "Hello World";
  class Date {
    /* Not like the normal Date */
  }
  console.log(collect_names("a = some_value, b = 5, d = new Date()"));
})()

This results in a ReferenceError because some_value isn't available from the Proxy's get method. Even if it didn't error, the value for d would be the globally available Date class instead of the local version. How do we fix this? If you guessed adding more eval then you are right.

And since we're getting pretty close to how ANADV actually works, let's look at its signature:

const different_eval = eval;
export function anadv(
  func,
  env_access = different_eval(env_access_src)
) { /* ... */ }

anadv takes a function whose parameters we'll parse and an environment access function which will let us probe the scope where that function came from. The access function is optional — in the motivating example I didn't supply one — but calling anadv without one only works if all references in the argument defaults are in the global scope or if there are no references (as was the case in the custom element). To create an environment access function, we eval the env_access_src. For the default env_access, we call eval indirectly, which executes env_access_src in the global scope instead of the ANADV module's scope.
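The difference between direct and indirect eval is easy to observe. In this sketch, probe_scopes is a made-up name and different_eval mirrors the alias above: calling eval by its own name sees the enclosing function's variables, while calling it through an alias evaluates the code in the global scope.

```javascript
// Calling eval through any other name makes it an indirect eval.
const different_eval = eval;

function probe_scopes() {
  const local_value = 42;
  // Direct eval: evaluated in this function's scope.
  const direct = eval('typeof local_value');
  // Indirect eval: evaluated in the global scope, where local_value doesn't exist.
  const indirect = different_eval('typeof local_value');
  return [direct, indirect];
}

const scopes = probe_scopes(); // ["number", "undefined"]
```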

env_access_src looks like this:

export const env_access_src = `(function(){ return function env_access() {
  return eval(arguments[0]);
}; })()`;

We use arguments here because we don't want to introduce a named variable that could mask the scope we're trying to grant access to.
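To see what a named parameter would mask, this sketch compares env_access_src with a hypothetical variant, masked_src, that takes a named code parameter. Both are evaluated with a direct eval inside make_accessors (also a made-up name), so both closures capture that function's scope:

```javascript
const env_access_src = `(function(){ return function env_access() {
  return eval(arguments[0]);
}; })()`;

// Hypothetical variant with a named parameter, for comparison only.
const masked_src = `(function(){ return function env_access(code) {
  return eval(code);
}; })()`;

function make_accessors() {
  const some_value = "Hello World";
  const code = "outer code variable";
  // Direct eval here, so both closures capture this function's scope.
  return [eval(env_access_src), eval(masked_src)];
}

const [access, masked_access] = make_accessors();
const greeting = access('some_value');   // "Hello World"
const outer = access('code');            // "outer code variable"
const shadowed = masked_access('code');  // "code": the parameter masks the outer variable
```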

Calling anadv with an env_access looks like this:

import { anadv, env_access_src } from 'anadv';

const some_value = "Hello World";

class Date { /* Not like the normal Date */ }

const access = eval(env_access_src);

anadv(
  function (a = some_value, b = 5, d = new Date()) { },
  access
);

The Final Stretch: Finding the Arguments

The last piece to talk about is extracting the argument list's source code from the function's source code. This is the piece that has changed the least from other methods of getting argument names. We trim off the function keyword and name, and then use a stack to find the closing parenthesis of the arguments list (which should be followed by an opening { for the function's body).

It's not very interesting and probably has bugs:

// Wrapped in a function (hypothetical name) so the snippet runs on its own.
function get_args_source(func) {
  // Get the function's source code (removing comments)
  let source = func.toString().replace(/((\/\/.*$)|(\/\*[\s\S]*?\*\/))/mg, '');
  source = source.slice(source.indexOf('(') + 1);

  // Separate the arguments list
  let rem = source;
  const stack = [')'];
  while (stack.length > 0) {
    let re;
    // Handle strings and stuff:
    if (stack[0] == '"' || stack[0] == "'") {
      // An unescaped closing quote
      re = new RegExp(`(?:^|[^\\\\])(?:\\\\\\\\)*${stack[0]}`, 'g');
    } else if (stack[0] == '`') {
      // An unescaped closing backtick, or the "${" that starts an interpolation
      re = /(?:^|[^\\])(?:\\\\)*`|(?:^|[^\\])(?:\\\\)*\$\{/g;
    } else {
      re = /[`'"{([\])}]/g;
    }
    const result = re.exec(rem);
    if (result === null || re.lastIndex === 0) {
      throw new Error("Failed to parse args list.");
    }
    const token = rem[re.lastIndex - 1];
    rem = rem.slice(re.lastIndex);

    if (stack[0] == token) {
      stack.shift();
    } else {
      const opposite = { "'": "'", '"': '"', '`': '`', '(': ')', '{': '}', '[': ']' }[token];
      // A closing bracket that doesn't match the innermost opener
      if (opposite === undefined) {
        throw new Error("Failed to parse args list.");
      }
      stack.unshift(opposite);
    }
  }
  // After the closing parenthesis, there should be the opening '{'
  if (!rem.match(/^\s*{/)) {
    throw new Error('Failed to parse args list.  Was this an arrow function?');
  }
  const args_source = source.slice(0, source.length - rem.length - 1);
  return args_source;
}
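The final check, that the closing parenthesis is followed by {, is what rejects arrow functions, whose parameter list is followed by => instead. A small sketch of the difference; after_params is a hypothetical helper that simply takes the text after the first ), which is only safe here because neither default contains parentheses:

```javascript
// Function.prototype.toString returns the exact source text of each function.
const regular = function named(a = 1) { return a; };
const arrow = (a = 1) => a;

// Hypothetical helper: the text following the first ")".
const after_params = (func) => {
  const source = func.toString();
  return source.slice(source.indexOf(')') + 1);
};

// true: a "{" opens the body
const regular_has_body = /^\s*{/.test(after_params(regular));
// false: "=>" follows the parameters instead
const arrow_has_body = /^\s*{/.test(after_params(arrow));
```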

With the args_source in hand we can pass it to our sloppy mode name collector and we're done.

Full code.


Very Narrow Usefulness

Obviously there are many problems with doing what we've just done. Parsing arguments is fragile, as all the previous libraries that did it have found out. But attempting it will teach you deep things about JavaScript, like how scoping works, what Symbol.unscopables does, or how Proxies behave when they're in the scope chain. You might not rely on these things often, but understanding how things work makes using them easier.

Security

To touch on the problems briefly: every use of eval is worrisome, but if our argument separation is correct, we only end up executing our own code, which limits it to a self-XSS level issue.

Compatibility

As we already looked at, we don't support rest parameters, or parameters without a default value. My version doesn't support arrow functions because I wanted to use the function's name too, but adding that would be possible. Lastly, without top-level await, the interface for our module wouldn't be as nice. We'd have to return a promise that resolves when the module is ready to be used after the script loads.

It's a curious aspect of black magic that it simultaneously relies on the very new and the very old.

All magic is delightful until it's being debugged.


And that's it.