There was an old trick used by Angular and a few other libraries of using a function's source code (which you can get using .toString()) to get its argument names. Promisify-node used this technique to find arguments called callback or cb. Angular uses it for dependency injection, though there are other, better ways of doing that.
When JavaScript got default values and rest parameters this technique became much less effective. It seems the StackOverflow answers based on regular expressions have lapsed into, “works in most cases.” I don't know how to make it work in all cases, but I have a method that works in a slightly unusual case: functions without a rest parameter and where every parameter has a default value.
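To make the failure concrete, here is a minimal sketch of the older style of trick. naive_arg_names is a hypothetical helper written for this illustration, not code from Angular or Promisify-node. It simply takes the text between the first pair of parentheses and splits on commas, which is roughly the level of sophistication that default values defeat.

```javascript
// Hypothetical helper: the "old trick" in its most naive form.
function naive_arg_names(func) {
  const source = func.toString();
  // Take everything between the first "(" and the first ")".
  const list = source.slice(source.indexOf('(') + 1, source.indexOf(')'));
  return list.split(',').map(part => part.trim()).filter(part => part.length > 0);
}

// Works for plain parameter lists.
const simple = naive_arg_names(function (err, callback) {});

// Falls apart once a default value contains parentheses or commas.
const broken = naive_arg_names(function (a = Math.max(1, 2), b = 5) {});

console.log(simple);
console.log(broken);
```

The second call cuts the list off at the closing parenthesis of Math.max and then splits its arguments apart, so neither the names nor the count survive.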
In general, you should avoid parsing arguments, but with that being said, doing so can add a delightful dash of magic to your library.
A Motivating Example
We will be writing a module called ANADV which stands for Argument Names And Default Values. Before I step you through how it works and why it is cursed, I want to show you why I wanted to build it. You could also skip this whole thing and read all 115 lines here.
My interest in argument parsing came from wanting to define custom elements like this:
define(function element_name(prop_1 = "Default 1", prop_2 = 5, checked = false) {
  return html`
    <h1>Heading</h1>
    <p>${reactive_text(prop_1)}</p>
    <p>${reactive_text(computed(() => Math.pow(prop_2(), 2)))}</p>
  `;
});
If we then use that element with an attribute:
<element-name prop-2="10"></element-name>
it will look like this:
If you have a browser with top-level await support (just Chrome at the moment) then you can try the example out here.
We use the function's name for the custom element name but with hyphens instead of underscores. The arguments correspond to attributes / properties. In this case, prop-2 was set to "10" which was converted into new Number("10") before being assigned to the prop_2 property. We can do this because we know that Number was the constructor for prop_2's default value which was 5. In case you didn't know, you can get the constructor of most values using the constructor property: (5).constructor === Number.
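Those two conversions can be sketched in a few lines. The helper names element_tag_name and coerce_attribute are made up for this illustration and are not part of ANADV or the define function above.

```javascript
// Hypothetical helper: function name -> custom element tag name.
function element_tag_name(func) {
  return func.name.replace(/_/g, '-');
}

// Hypothetical helper: convert attribute text using the constructor
// of the parameter's default value.
function coerce_attribute(default_value, attribute_text) {
  const Constructor = default_value.constructor;
  return new Constructor(attribute_text);
}

function element_name(prop_1 = "Default 1", prop_2 = 5) {}

const tag = element_tag_name(element_name);
const prop_2 = coerce_attribute(5, "10"); // a Number wrapper object

console.log(tag, prop_2.valueOf(), prop_2 instanceof Number);
```

One caveat with this sketch: new Boolean("false") wraps true, because any non-empty string is truthy, so a boolean parameter like checked would need its own handling.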
How Does it Work?
We're going to use eval… a lot. What we'd like to do is to parse out the argument list and turn it into something we can eval that will give us the argument names and default values. We are going to treat the arguments like bulk assignment, which is why we don't support rest parameters. Since they're always the last argument, you might be able to cut it off, but for my use case I chose to simply not support them.
function(a = "Hello World", b = 5, c, d = { x: 2.2, y: 5.76 }) {
  /* ... */
}
// Is like:
let a, b, c, d;
a = "Hello World", b = 5, c, d = { x: 2.2, y: 5.76 };
// Another thing I tried that we won't use:
const defaults = [a = "Hello World", b = 5, c, d = { x: 2.2, y: 5.76 }];
// Gives us:
["Hello World", 5, undefined, { x: 2.2, y: 5.76 }]
While the array version would give us the default values, we would still need the argument names. Luckily, there is a way to assign to variables that you have not declared!
Slurping up Identifiers via Proxy + with
We need something that simulates the declaration line (the one that declares a, b, c, and d) so that the arguments line is valid. We can do that by modifying the scope chain using a Proxy. There's only one way of modifying the scope chain in this way and that is by using the with keyword.
const arg_names = [], defaults = {};
with (new Proxy({}, {
  has(target, key) {
    arg_names.push(key);
    return true;
  },
  get(target, key) {
    if (key == Symbol.unscopables) return {};
    arg_names.pop();
    return eval(key);
  },
  set(target, key, value) {
    defaults[key] = value;
    return true;
  }
})) {
  a = "Hello World", b = 5, d = new Date();
}
console.log(arg_names, defaults);
This logs ["a", "b", "d"] and {a: "Hello World", b: 5, d: <Your Current Time>} which is exactly what we wanted. What's going on?
First, the code inside the with block runs and for every identifier, our Proxy's has method is called. Our Proxy claims, “all your identifiers are belong to us”, and we add the identifier to our arg_names list. Because has said yes, the engine then asks our Proxy for its Symbol.unscopables, which our Proxy says is empty. If the identifier is subsequently read, then we remove it from the arg_names list and eval the identifier to get its real value. Lastly, we store any assignments / sets into our defaults object.
You'll notice that the c parameter disappeared from our examples. That's because this method doesn't work with arguments that don't have a default value. Our Proxy relies on arguments being only assigned to, but a = "Hello World", b = 5, c, d = new Date() would read / get c instead of setting it.
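If you want to watch the traps fire, here is a small tracing sketch. It is not part of ANADV: the names run_in_scope, scope, and trace are made up for this illustration, and it builds its sloppy-mode function with the Function constructor (an assumption of this sketch, chosen so the snippet also runs from strict code). How many times has is called per identifier can differ between engines, so the sketch only logs the calls rather than counting them.

```javascript
// Assumption: functions built with the Function constructor are sloppy-mode
// unless their body opts into strict mode, so `with` is allowed in here.
const run_in_scope = new Function('scope', 'with (scope) { a = 1; c; }');

const trace = [];
const scope = new Proxy({}, {
  has(target, key) {
    trace.push(`has ${String(key)}`);
    return true; // claim every identifier
  },
  get(target, key) {
    trace.push(`get ${String(key)}`);
    // An empty unscopables object hides nothing from the with block.
    return key === Symbol.unscopables ? {} : undefined;
  },
  set(target, key, value) {
    trace.push(`set ${String(key)}`);
    return true;
  }
});

run_in_scope(scope);
console.log(trace);
```

The assignment to a ends in a set call, while the bare c ends in a get call, which is exactly why a parameter without a default value never reaches our defaults object.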
Curse number one: We are using eval for the exact reason it is considered dangerous. We use it to access the environment around where eval is called. More on this later.
Curse number two: We use the partly deprecated with keyword.
There are three problems with this code. First, this would be tricked by an argument list that has an assignment within its default value. For example function(a = 5, b = i++) {} would produce an arg_names of ["a", "i", "b"]. We're not going to fix this: partly because I don't think it's very common, and mostly because I don't know how.
Sloppy Mode Script
Second, I want ANADV to be an ES6 module and the with keyword isn't allowed in strict mode (which modules are executed in). That means we'll need to load a sloppy mode script ~synchronously~. We don't have importScripts, so we'll need to do something else:
// Within our module:
const script = document.createElement('script');
script.src = URL.createObjectURL(new Blob([`
  function collect_names(code) {
    const proxy = /* ... */;
    with(proxy) {
      eval(code);
    }
  }
  window._script_loaded();
`], { type: 'application/javascript' }));
document.body.appendChild(script);
// Top-Level await that pauses our module's evaluation until the script has finished loading.
await new Promise(resolve => window._script_loaded = resolve);
Curse number three: We use a sloppy mode script from our strict-mode module, just so we can use the with keyword.
env_access_src
The third problem is that when we use eval(key) inside our Proxy's get method, we get the value as available within our Proxy's context. What we actually want is to get the value as defined in the context that the code snippet came from. Take this code for example:
function collect_names(code) {
  const arg_names = [], defaults = {};
  with (new Proxy({}, {
    has(target, key) {
      if (key !== 'code') {
        arg_names.push(key);
        return true;
      } else {
        return false;
      }
    },
    get(target, key) {
      if (key == Symbol.unscopables) return {};
      arg_names.pop();
      return eval(key);
    },
    set(target, key, value) {
      defaults[key] = value;
      return true;
    }
  })) {
    eval(code);
  }
  return [arg_names, defaults];
}
(function() {
  const some_value = "Hello World";
  class Date {
    /* Not like the normal Date */
  }
  console.log(collect_names("a = some_value, b = 5, d = new Date()"));
})()
This results in a ReferenceError because some_value isn't available from the Proxy's get method. Even if it didn't error, the value for d would be the globally available Date class instead of the local version. How do we fix this? If you guessed adding more eval then you are right.
And since we're getting pretty close to how ANADV actually works, let's look at its signature:
const different_eval = eval;
export function anadv(
  func,
  env_access = different_eval(env_access_src)
) { /* ... */ }
anadv takes a function whose parameters we'll parse and an environment access function which will let us probe the scope where that function came from. The access function is optional (in the motivating example I didn't supply one), but calling anadv without one only works if all references in the argument defaults are in the global scope or if there are no references (as was the case in the custom element). To create an environment access function, we eval the env_access_src. For the default env_access, we call eval indirectly, which executes env_access_src in the global scope instead of the ANADV module's scope.
env_access_src looks like this:
export const env_access_src = `(function(){ return function env_access() {
  return eval(arguments[0]);
}; })()`;
We use arguments here because we don't want to introduce a named variable that could mask the scope we're trying to grant access to.
Calling anadv with an env_access looks like this:
import { anadv, env_access_src } from 'anadv';
const some_value = "Hello World";
class Date { /* Not like the normal Date */ }
const access = eval(env_access_src);
anadv(
  function (a = some_value, b = 5, d = new Date()) { },
  access
);
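The difference between the two kinds of access function is easy to check on its own, without anadv. The sketch below redefines env_access_src locally with the same text as above; demo, indirect_eval, local_access, and global_access are names invented for this illustration. It evaluates the source once with a direct eval inside a function and once with an indirect eval, then asks each resulting function for a local variable.

```javascript
// Same source text as env_access_src, repeated so the sketch is self-contained.
const env_access_src = `(function(){ return function env_access() {
  return eval(arguments[0]);
}; })()`;

// Calling eval through another name makes it an indirect eval (global scope).
const indirect_eval = eval;

function demo() {
  const some_value = "Hello World";
  const local_access = eval(env_access_src);           // direct: closes over demo's scope
  const global_access = indirect_eval(env_access_src); // indirect: sees only globals

  let from_global;
  try {
    from_global = global_access("some_value");
  } catch (error) {
    from_global = error.name;
  }
  return [local_access("some_value"), from_global, global_access("typeof Date")];
}

const results = demo();
console.log(results);
```

The direct version can read some_value, while the indirect version throws a ReferenceError for it and can only see globals such as Date.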
The Final Stretch: Finding the Arguments
The last piece to talk about is extracting the arguments source code from the function's source code. This is the piece that has changed the least from other methods of getting argument names. We trim off the function keyword and name, and then use a stack to find the closing parenthesis of the arguments list (which should be followed by an opening { for the function's body).
It's not very interesting and probably has bugs:
// Get the function's source code (removing comments)
let source = func.toString().replace(/((\/\/.*$)|(\/\*[\s\S]*?\*\/))/mg, '');
source = source.slice(source.indexOf('(') + 1);
// Separate the arguments list
let rem = source;
// stack[0] is always the closing token we are currently waiting for
const stack = [')'];
while (stack.length > 0) {
  let re;
  // Handle strings and stuff:
  if (stack[0] == '"' || stack[0] == "'") {
    // Inside a quoted string: find the next unescaped closing quote
    re = new RegExp(`(?:^|[^\\\\])(?:\\\\\\\\)*${stack[0]}`, 'g');
  } else if (stack[0] == '`') {
    // Inside a template literal: find the closing backtick or a ${ substitution
    re = /(?:^|[^\\])(?:\\\\)*`|(?:^|[^\\])(?:\\\\)*$\{/g;
  } else {
    re = /[`'"{([\])}]/g;
  }
  const result = re.exec(rem);
  if (result === null || re.lastIndex === 0) {
    throw new Error("Failed to parse args list.");
  }
  const token = rem[re.lastIndex - 1];
  rem = rem.slice(re.lastIndex);
  if (stack[0] == token) {
    stack.shift();
  } else {
    const opposite = { "'": "'", '"': '"', '`': '`', '(': ')', '{': '}', '[': ']' }[token];
    stack.unshift(opposite);
  }
}
// After the closing parenthesis, there should be the opening '{'
if (!rem.match(/^\s*{/)) {
  throw new Error('Failed to parse args list. Was this an arrow function?');
}
const args_source = source.slice(0, source.length - rem.length - 1);
With the args_source in hand we can pass it to our sloppy mode name collector and we're done.
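For a feel of the stack idea without the regular expressions, here is a deliberately simplified character-by-character sketch. It is not the ANADV code: args_source_of is a hypothetical name, and the sketch assumes the parameter list contains no template literals, comments, or regular expression literals.

```javascript
// Hypothetical, simplified scanner: only (), [], {} and plain quotes are handled.
function args_source_of(func) {
  const source = func.toString();
  const start = source.indexOf('(') + 1;
  const closers = { '(': ')', '[': ']', '{': '}' };
  const stack = [')']; // closing tokens we are still waiting for
  let quote = null;    // the quote character we are inside of, if any

  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (quote !== null) {
      if (ch === '\\') i++;                 // skip the escaped character
      else if (ch === quote) quote = null;  // string closed
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (closers[ch] !== undefined) {
      stack.push(closers[ch]);
    } else if (ch === stack[stack.length - 1]) {
      stack.pop();
      if (stack.length === 0) return source.slice(start, i);
    }
  }
  throw new Error('Failed to parse args list.');
}

const example = function (a = Math.max(1, 2), b = ")", c = { x: [1, 2] }) {};
const extracted = args_source_of(example);
console.log(extracted);
```

The nested parentheses and the ")" inside the string no longer end the list early, which is the whole point of tracking a stack instead of searching for the first closing parenthesis.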
Very Narrow Usefulness
Obviously there are many problems with doing what we've just done. Parsing arguments is an unstable thing, as all the previous libraries that did it have found out. But attempting it will teach you deep things about JavaScript like how scoping works, or about Symbol.unscopables, or how Proxies work when they're in the scope chain. You might not rely on these things often, but understanding how things work makes using them easier.
Security
To touch on the problems briefly: every use of eval is worrisome. However, if our argument separation is correct, we only end up executing our own code, which limits it to a self-XSS level issue.
Compatibility
As we already looked at, we don't support rest parameters, or parameters without a default value. My version doesn't support arrow functions because I wanted to use the function's name too, but adding that would be possible. Lastly, without top-level await, the interface for our module wouldn't be as nice. We'd have to return a promise that resolves when the module is ready to be used after the script loads.
It's a curious aspect of black magic that it simultaneously relies on the very new and the very old.
All magic is delightful until it's being debugged.
And that's it.