Jekyll2023-03-31T17:12:33+00:00https://benlansdell.github.io/expositions/feed.xmlExpositionsNotebooks in math and physicsBen LansdellSimulating the analemma2023-03-31T00:00:00+00:002023-03-31T00:00:00+00:00https://benlansdell.github.io/expositions/posts/analemma<h1 id="what-is-the-equation-of-time">What is the equation of time?</h1>
<p>The earth’s orbit is not perfectly circular but elliptical, and the earth’s axis of rotation is tilted relative to the plane in which it orbits the sun. These two factors are referred to as the eccentricity of the orbit and the obliquity of the axis. Together they mean that the time measured by a sundial and the time measured by our watches differ: the length of a solar day varies throughout the year, while our watches continue ticking at a uniform rate.</p>
<p>This difference is referred to as the ‘equation of time’, and it’s easily observable, if you’re patient and dedicated enough: if you take a photo of the sky at the same time every day for a year, the sun will walk around in the sky a little, owing to these variations and to the season. The shape it traces is known as the analemma.</p>
<p>This Three.js app explores the relation between eccentricity, obliquity, the equation of time and the analemma. A [forthcoming] blog post will do a much better job of explaining all these concepts than the above short introduction.</p>
<p>(See the live version <a href="https://benlansdell.github.io/analemma/">here</a>)</p>Ben LansdellWhat is the equation of time?Asking Alexa what planes are overhead2022-01-12T00:00:00+00:002022-01-12T00:00:00+00:00https://benlansdell.github.io/expositions/posts/raspberryfly<h1 id="the-goal">The goal</h1>
<p>Having just moved to Memphis, I’m only 3 miles from the busiest (cargo) airport in the world (<a href="https://en.wikipedia.org/wiki/List_of_busiest_airports_by_cargo_traffic">yes really</a>). Basically, this means a lot of Fedex planes are flying directly overhead. I was curious to know what aircraft they were specifically, which led me down this flight tracking rabbit hole. Now here I am with an Alexa skill that allows me to ask my Echo what planes are flying nearby, using data from my own ADS-B receiver.</p>
<p>Here I’ll describe the process to set the whole thing up. Note that this is a solution for when you specifically want to use your own flight tracking data. If you don’t care about that, you can simply install an existing Alexa skill that queries the OpenSky Network or ADSBExchange for nearby flight data. By using your own data you guarantee coverage in your area – ADSBExchange may not have many data feeders in your region.</p>
<h1 id="outline">Outline</h1>
<p>The basic steps are as follows:</p>
<ol>
<li>Obtain the hardware you’ll need: an ADS-B receiver, filter, antenna, etc.</li>
<li>Set up a computer to run dump1090. Commonly this is a Raspberry Pi running some flight tracking software; PiAware makes this very easy.</li>
<li>On the Raspberry Pi, set up a Flask-Ask server to query dump1090’s aircraft.json when your Echo asks</li>
<li>Expose this server to the internet through ngrok (pagekite would also work)</li>
<li>Set up an Alexa skill – in developer mode only – that uses the Flask-Ask server as its endpoint</li>
</ol>
<p>More detail on each step is below. Note that this isn’t intended as a full tutorial on setting up Raspberry Pis, Alexa Skills, or Flask-Ask; it’s just how you can reproduce this skill, and it does assume some familiarity with the above, particularly if you have to tweak the recipe for your own situation.</p>
<h1 id="hardware">Hardware</h1>
<p>My setup is the following:</p>
<ul>
<li>A USB <a href="https://amzn.to/3A2fKJo">FlightStick</a></li>
<li>A 1090MHz <a href="https://amzn.to/3fqr0pH">Filter</a></li>
<li>A <a href="https://amzn.to/3FAlGup">Small indoor antenna</a></li>
<li><a href="https://amzn.to/3frS7Ay">Raspberry pi zero</a></li>
</ul>
<p>But any variant on the above should work well. The filter is optional, but in urban areas it can help reduce noise from other radio sources. The better your antenna setup, the longer your tracking range, of course; mounting an antenna outdoors should perform significantly better than my indoor setup.</p>
<h1 id="basic-flight-tracking">Basic flight tracking</h1>
<ul>
<li>Set up the Raspberry Pi Zero with the <a href="https://flightaware.com/adsb/piaware/build">piaware image</a>. FlightAware makes this very easy to set up; follow the instructions linked here. Basically, you download the PiAware image and place it on your Pi’s SD card. You can set your WiFi settings in the file <code class="language-plaintext highlighter-rouge">piaware-config.txt</code>.</li>
<li>It’s a much nicer user experience to enable ssh access to your Pi. That can be achieved simply by placing an empty file named <code class="language-plaintext highlighter-rouge">ssh</code> in the root directory of the /boot partition of the SD card.</li>
<li>Place the antenna near a window/outside, wait for the pi to boot and check its IP in your router’s admin panel.</li>
<li>Point a browser to that IP. PiAware will ask you to associate your dump1090 stream with your FlightAware account.</li>
<li>Now you’re ready to track flights!</li>
<li>Head to <a href="https://flightaware.com/adsb/stats">https://flightaware.com/adsb/stats</a> to see your stats, and [your local IP]/skyaware to see current flights nearby.</li>
</ul>
<h1 id="flask-ask-setup">Flask-ask setup</h1>
<p>Great, now you’re tracking flights. You can head to <code class="language-plaintext highlighter-rouge">http://[Pi's IP]/skyaware/data/aircraft.json</code> to see a list of planes currently being tracked. Basically, we’re going to run a Flask-ask server to parse this json file whenever a request comes from your Echo/Alexa skill. The steps for that are, on the pi:</p>
<p>Step 0 is to set up the dev environment on the Pi: install git, vim, and whatever else you need.</p>
<ol>
<li>Clone the repo: <code class="language-plaintext highlighter-rouge">git clone https://github.com/benlansdell/raspberry-fly.git</code></li>
<li>Install some packages:
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo apt install python3-lxml nodejs npm mongodb
</code></pre></div> </div>
<p>It’s better to install <code class="language-plaintext highlighter-rouge">lxml</code> through <code class="language-plaintext highlighter-rouge">apt</code>, since compiling it through <code class="language-plaintext highlighter-rouge">pip</code> could crash the Pi.</p>
</li>
<li>Install required python packages
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip install -r requirements.txt
</code></pre></div> </div>
</li>
<li>Once all the requirements are installed, you can setup a Mongo database with:
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python load_db.py
</code></pre></div> </div>
<p>This will load a database with info about planes that we can query based on the registration number. The Pi is pretty limited in both RAM and CPU, of course, so you may want to also build an index for the collection we’ll be querying, to make the queries faster. In the mongo CLI (<code class="language-plaintext highlighter-rouge">mongo</code>):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use AircraftData
db.Registration.createIndex({'icao':1})
</code></pre></div> </div>
</li>
<li>Now we’re ready to run the server. You could set this to run on startup. I just make a screen session and do it manually
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python main.py
</code></pre></div> </div>
</li>
</ol>
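<p>The steps above can be sketched in code. Independent of the Flask-Ask plumbing, the core of the skill is parsing <code class="language-plaintext highlighter-rouge">aircraft.json</code> and picking the nearest plane. A minimal sketch follows – the field names follow dump1090’s output, but the receiver coordinates and function names here are hypothetical illustrations, not the repo’s actual code:</p>

```python
from math import radians, sin, cos, asin, sqrt

RECEIVER = (35.12, -89.97)  # hypothetical receiver latitude/longitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))


def closest_aircraft(data, receiver=RECEIVER):
    """Return the tracked aircraft nearest the receiver, or None.

    `data` is the parsed aircraft.json; entries without a position fix
    (no 'lat'/'lon' keys) are skipped.
    """
    planes = [a for a in data.get("aircraft", []) if "lat" in a and "lon" in a]
    if not planes:
        return None
    return min(planes, key=lambda a: haversine_km(*receiver, a["lat"], a["lon"]))


# Example with a hand-made snippet of dump1090-style data:
sample = {"aircraft": [
    {"hex": "a12345", "flight": "FDX1309", "lat": 35.10, "lon": -89.98, "alt_baro": 4000},
    {"hex": "a67890", "flight": "FDX88", "lat": 36.00, "lon": -90.50, "alt_baro": 35000},
    {"hex": "abcdef"},  # no position fix yet
]}
print(closest_aircraft(sample)["flight"])  # FDX1309
```

<p>A Flask-Ask intent handler would then just fetch the live JSON over HTTP, call something like this, and format the result into a spoken response.</p>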
<h1 id="expose-the-server-to-the-internet">Expose the server to the internet</h1>
<p>The Flask-ask server is now running on port 5000. We want to expose this port to the internet so that Alexa can access it. It is frustrating to have two local pieces of hardware communicate via Amazon’s external servers; direct Echo-to-device communication may be possible in some instances, but getting the Echo to talk directly to the Pi would require some sort of hack (e.g. getting the Echo to think it was communicating with a smart bulb). So this is the setup we’ll use, until some other proper solution becomes available.</p>
<p>So:</p>
<ol>
<li>Install ngrok via apt
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null &&
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | sudo tee /etc/apt/sources.list.d/ngrok.list &&
sudo apt update && sudo apt install ngrok
</code></pre></div> </div>
</li>
<li>Make an account at <a href="https://ngrok.com/">ngrok</a> to allow for longer sessions.</li>
<li>Start the tunnel
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ngrok http 5000
</code></pre></div> </div>
</li>
</ol>
<p>Now we have a URL we can point Alexa to in order to access our flight data. A more robust solution may be to use pagekite.</p>
<h1 id="setup-the-alexa-skill">Setup the Alexa skill</h1>
<p>All that’s left is to setup the Alexa skill to query our server. Head over to the Alexa skills developer console (<a href="https://developer.amazon.com/alexa/console/ask">https://developer.amazon.com/alexa/console/ask</a>). You’ll have to register as an Amazon developer if you haven’t done so already. Then:</p>
<ol>
<li>Create a new skill. The name will be the invocation word, e.g. <code class="language-plaintext highlighter-rouge">Alexa ask [skill name] to ...</code>, so choose something that sounds natural; I just made mine Raspberry Pi. Select the ‘custom’ template and choose to provision your own resources. Choose ‘start from scratch’ if it gives you a choice of template to use.</li>
<li>There are two things to set up: the endpoint and the intents. For the endpoint, go to ‘Endpoint’. Choose HTTPS, and input the https address output by ngrok into the default region field. Choose the type ‘My development endpoint is a sub-domain of a domain that has a wildcard certificate from a certificate authority’.</li>
<li>For the intent: Go to ‘Interaction Model’ -> ‘JSON Editor’. Paste or upload the schema.json contents into the editor. This file contains the set of phrases used to request flight information from Flask-ask.</li>
</ol>
<p>Save the model, then build it. Now it’s ready to test. You can test it in the console, which is helpful for debugging: it shows what was sent to and received from the endpoint. ngrok and the Flask-Ask server should also output something for each request, so you can see where something has gone wrong pretty easily.</p>
<p>Finally, if your local Echo is registered with the same Amazon account you should be able to see the skill you’ve just built in the list of skills, under the dev section. If it shows up in this list, you should be set to ask your Echo what planes are nearby.</p>
<p>Something like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Alexa ask raspberry pi what plane is closest.
</code></pre></div></div>
<p>And you should get a response!</p>
<h1 id="summary-and-follow-up-items">Summary and follow-up items</h1>
<p>So there we have it. It’s actually not that hard to get an Alexa skill to interact with your home devices. As I mentioned, a couple of obvious improvements are to get the server and tunnel running when the Pi boots, so the system restarts itself after losing power, and to use pagekite for a more permanent forwarding URL. With ngrok, you get a different URL each time you start the process, and so you have to update your Alexa skill settings each time accordingly, which is a bit annoying.</p>
<p>There’s a lot you could add to the model once the basics are working. You could make it possible to ask for more information about the flights overhead, like where they are heading, came from, etc.</p>Ben LansdellThe goalThe arrow of time and entropy2021-09-17T00:00:00+00:002021-09-17T00:00:00+00:00https://benlansdell.github.io/expositions/posts/time-reversibility<p>The arrow of time refers to the fact that some physical processes have a temporal directionality.* We are familiar with many such processes: milk does not spontaneously unmix from the coffee we pour it into, eggs do not spontaneously reassemble themselves once broken, and shuffling a pack of cards is <em>very</em> unlikely to return it to a sorted configuration. These are processes for which we can tell, if watching a video of the process, if it is being played forwards or backwards. What underlies this directionality? In all of the examples given, there is a sense in which entropy increases. So it’s reasonable to think that the arrow of time may have something to do with entropy.</p>
<p>(* There are other arrows of time, which are also interesting, so to be precise this article is just about the thermodynamic arrow of time.)</p>
<p>Further, the arrow of time cannot come from the fundamental laws of physics, as these are time reversible. By time reversible, we mean that if we make the substitution \(t' = -t\), then the form of the dynamical equations comes out exactly the same. This is true for both classical and quantum systems; here we consider classical systems for simplicity. In a classical mechanics system, reversing time does nothing more than reverse the velocities of the objects under consideration; the dynamics remain the same. As such, if we were to examine a physical system where we are tracking the position of every relevant object, and where energy is conserved – consider billiard balls on a table with elastic collisions that do not lose kinetic energy when they bounce, or roll – we could not tell if it was being played forwards or backwards.</p>
<p>So, if it’s not to do with laws of fundamental physics, is the arrow of time <em>just</em> the fact that systems move to a state of increased entropy? Could increasing entropy be used to <em>define</em> which direction is future and which is past? Well, it’s a little more complicated than that. The goal of this article is to answer this latter question.</p>
<p>First, in a global state of thermodynamic equilibrium – a state of maximum entropy – we don’t expect there to be an arrow of time. In such a state there would be no eggs to break, and no unmixed milk to pour into coffee. Instead, the scenario in which we need to consider this question is the one we find ourselves in now (in a cosmological sense of now): a non-equilibrium state that is not at maximum entropy, a state of <em>relatively</em> low entropy. So, to rephrase the question we want to answer: in a relatively low entropy state, does entropy consistently increase in one direction in time – the direction we would call the future?</p>
<p>The answer is that, by itself, finding ourselves in a state of relatively low entropy is in fact not enough to pick out a special direction in time that is ‘forwards’ (the direction in which entropy increases). The problem is this: if all we know is that we are in a macrostate of relatively low entropy, then the chances are much greater that we are in a microstate for which that low-entropy macrostate is a ‘local minimum’, one from which playing out the dynamics <em>in either direction</em> would increase entropy, than that we are in a microstate that evolved from a state of <em>lower</em> entropy. In such a local minimum, playing out the dynamics in either direction tends to increase entropy, and we don’t have a useful thermodynamic arrow of time. Said a slightly different way: if playing the dynamics out in one direction led to higher entropy and the other to lower entropy, then we could call one direction the future and the other the past. But in general a low entropy state is not enough to conclude the existence of a <em>lower</em> entropy state, which is what a definition of the arrow of time based solely on entropy would require.</p>
<p>As a result of this consideration, an additional hypothesis about the state we find ourselves in is needed. A commonly invoked hypothesis is that there exists a very low entropy state in the distant past. Essentially this is a boundary condition, an assumption about the entropy at the beginning of the universe, dubbed the <em>past hypothesis</em> by philosopher/physicist David Albert. Assuming the past hypothesis, it <em>is</em> the case that there exists a <em>lower</em> entropy state than the one we find ourselves in now, and therefore a way to relate the arrow of time to increasing entropy.</p>
<h2 id="an-example">An example</h2>
<p>We’ll demonstrate the idea with some bouncing balls. Below is a simple simulation of particles bouncing around in an arena (their size is irrelevant here). For simplicity, they do not interact with one another; this is the same assumption made for an ideal gas. We’ll assume their collisions with the walls are fully elastic – they bounce directly off the wall with the same speed and opposite direction. How does the entropy of the system evolve with time here?</p>
<p>In this case we’ll use a simple model for entropy. Our macrostate will be defined by the very coarse-grained positions of the particles: how many particles are in the left half of the box? We’ll define our entropy to be the logarithm of the number of microstates possible for a given macrostate. This is written as
\(\begin{equation}
S = k \ln (\Omega(X)),
\end{equation}\)
where \(\Omega\) is known as the multiplicity of the macrostate, given a state \(X\), and \(k\) is Boltzmann’s constant, on the order of \(10^{-23}\) joules per kelvin, though here we’ll just treat \(k\) as equal to 1.</p>
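<p>The multiplicity and the resulting entropy of this coarse-grained macrostate can be computed directly, since \(\Omega\) is just a binomial coefficient (the number of ways of choosing which particular balls sit in the left half). A minimal sketch, with \(k = 1\) as above (the function name is illustrative, not the simulation’s actual code):</p>

```python
from math import comb, log


def entropy(n_left, n_total):
    """S = ln Ω, where Ω = C(N, n_left) counts the microstates
    (which particular balls are in the left half) of this macrostate."""
    return log(comb(n_total, n_left))


N = 100
# All balls on one side is a single microstate: zero entropy.
print(entropy(0, N))  # 0.0
# Entropy is maximal for the balanced, equilibrium macrostate.
print(max(range(N + 1), key=lambda n: entropy(n, N)))  # 50
```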
<iframe width="100%" height="605" frameborder="0" src="https://observablehq.com/embed/@benlansdell/entropy-and-the-arrow-of-time?cells=eh%2Ccanvas%2Cviewof+reset_widget_local"></iframe>
<p>In this simulation, we can start in a state of zero entropy, low entropy, or in equilibrium (high entropy). Whether starting in zero entropy or in low entropy, playing the dynamics forward leads inexorably towards equilibrium – roughly half the balls on the left and half on the right. You can run the dynamics forward for some time, until a balanced state is reached, and then reverse the dynamics to reveal a highly improbable set of trajectories – all the balls converging on the left hand side. If you had started the system in a state of equilibrium, this converging state would be exceedingly improbable – occurring with a probability on the order of \(2^{-100}\), meaning you could run this simulation for as long as you wanted and never observe it happen. And indeed it does <em>look</em> highly improbable; it has the appearance of running backwards, even though nothing in the dynamical equations has changed – all we’ve done is reverse the velocities.</p>
<p>The key point for the above discussion is that, when starting the system in a state of low entropy, you can see that playing the dynamics either forwards or backwards results in an increase in entropy. There is no direction in which entropy drops below what it started at. Thus there is really no time asymmetry here. The only way to get time asymmetry is to put it in at the very beginning.</p>Ben LansdellThe arrow of time refers to the fact that some physical processes have a temporal directionality.* We are familiar with many such processes: milk does not spontaneously unmix from the coffee we pour it into, eggs do not spontaneously reassemble themselves once broken, and shuffling a pack of cards is very unlikely to return it to a sorted configuration. These are processes for which we can tell, if watching a video of the process, if it is being played forwards or backwards. What underlies this directionality? In all of the examples given, there is a sense in which entropy increases. So it’s reasonable to think that the arrow of time may have something to do with entropy.A special relativity demo2021-07-06T00:00:00+00:002021-07-06T00:00:00+00:00https://benlansdell.github.io/expositions/posts/sr-simulator<p>Here we imagine we are in control of a powerful spaceship, navigating flat spacetime (i.e. away from any massive objects which could curve spacetime). We can provide thrust in only one spatial dimension, either forwards or backwards.</p>
<p>The force we can apply, at a time measured on the ship \(\tau\), accelerates the ship accordingly. What does our navigation look like from an outside observer? Let’s call this outside reference frame <strong>R</strong>.</p>
<p>Special relativity dictates that when the relative velocity of our ship approaches the speed of light, the kinematics of our flight, as observed by an outside observer, deviate from those given by Newtonian mechanics. This post lets us explore how the kinematics play out in special relativity.</p>
<p>Einstein’s theories of relativity generally require spelling out the setup more explicitly than in a Galilean setting – we have to explicitly say how a concept is to be operationalized for it to be fair game. To that end, let’s elaborate a bit more before diving into the simulation.</p>
<p>This will assume some knowledge of special relativity. Some introductory material can be found in my earlier post (<a href="https://benlansdell.github.io/expositions/posts/minkowsky.html">here</a>). Otherwise, you can consult the Feynman Lectures on Physics, for example.</p>
<h3 id="the-setup">The setup</h3>
<p>First, throughout we’ll assume units in which the speed of light is 1, so there will be no \(c\)s anywhere throughout this post. They could always be put back into any expression in whatever way makes the units work out.</p>
<p>Second, from our <a href="https://benlansdell.github.io/expositions/posts/minkowsky.html">earlier post</a>, we recall that moving clocks run slowly. We’ll denote time measured by the outside observer as \(t\); in frame <strong>R</strong>, the ship’s clock \(\tau\) runs slow compared to \(t\) at high relative velocities.</p>
<p>Now, how should we think about acceleration in moving frames? Dealing with measurements made in an accelerating frame is most satisfactorily studied with the general theory of relativity. But even absent this more general theory we can make sense of this question. The idea is to define all kinematic quantities, including acceleration, as 4-dimensional objects that are parameterized by the spaceship’s clock, \(\tau\). By doing so these quantities possess certain invariances under a Lorentz transformation – their properties are independent of our way of observing, or parameterizing, them. (These derivations will be provided in a follow-up post.) This is useful here, and indispensable in more complicated cases.</p>
<p>The so-called 4-acceleration is the rate of change of the 4-velocity as a function of proper time, \(\tau\):</p>
<p>\begin{aligned}
A = dU/d\tau = [a^0, a^1, a^2, a^3]
\end{aligned}</p>
<p>In our reduced dimension case, we only need consider 1 spatial component. We can show that \(A\) is of the form: \(a^0 = a\gamma v\), \(a^1 = a\gamma\) and \(a^2 = a^3 = 0\), for some parameter \(a\). In fact, in Minkowsky space, with the metric signature (-1,1,1,1), we have \(A\cdot A = a^2\). Further, in an inertial frame that is at some moment \(t\) moving along with the ship at exactly its velocity, \(v(t)\), we see that \(a^0 = 0\) and \(a^1 = a\). This means that in the inertial frame that is momentarily moving along with the ship, there are no relativistic effects and the ship has acceleration \(a\) – thus \(a\) can be thought of as the acceleration <em>as experienced by those on the ship, and what we have control over by adjusting the thrusters</em>.</p>
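<p>As a quick consistency check of the claim that \(A \cdot A = a^2\): using the stated metric signature (-1,1,1,1) and \(\gamma^2 = 1/(1 - v^2)\),</p>

\[\begin{aligned}
A \cdot A = -(a^0)^2 + (a^1)^2 = -a^2\gamma^2 v^2 + a^2\gamma^2 = a^2\gamma^2(1 - v^2) = a^2.
\end{aligned}\]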
<h3 id="the-kinematic-equations">The kinematic equations</h3>
<p>From an outside observer, it is quite straightforward to derive the following relations:</p>
\[\begin{aligned}
dx/d\tau = \sinh\left(\int_0^{\tau(t)} a(s)\,ds\right)\\
dt/d\tau = \cosh\left(\int_0^{\tau(t)} a(s)\,ds\right)
\end{aligned}\]
<p>And thus we have</p>
<p>\begin{aligned}
v(t) = dx/dt = \tanh\left(\int_0^{\tau(t)} a(s)\,ds\right)
\end{aligned}</p>
<p>The quantity \(\phi = \int_0^\tau a(s)\,ds\) is known as the rapidity. Note that it is simply the integrated ‘local’ acceleration – the velocity the occupants of the ship would be moving at if relativistic effects were absent.</p>
<p>This equation for velocity as a function of rapidity can be integrated over time to give the ship’s position in frame <strong>R</strong>.</p>
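<p>These relations are straightforward to integrate numerically. A minimal sketch for constant proper acceleration, in units where \(c = 1\) (the function name is illustrative, not taken from the simulation’s source):</p>

```python
from math import sinh, cosh, tanh


def worldline(a, tau_max, n=100_000):
    """Midpoint-rule integration of dx/dτ = sinh(aτ), dt/dτ = cosh(aτ)
    for constant proper acceleration a, in units where c = 1."""
    dtau = tau_max / n
    x = t = 0.0
    for i in range(n):
        tau = (i + 0.5) * dtau  # midpoint of each step
        x += sinh(a * tau) * dtau
        t += cosh(a * tau) * dtau
    return x, t


a, tau = 1.0, 2.0
x, t = worldline(a, tau)
# For constant a the closed forms are x = (cosh(aτ) − 1)/a, t = sinh(aτ)/a:
assert abs(x - (cosh(a * tau) - 1) / a) < 1e-6
assert abs(t - sinh(a * tau) / a) < 1e-6
# The coordinate velocity v = tanh(aτ) never reaches the speed of light:
assert tanh(a * tau) < 1
```

<p>The hyperbolic closed forms are why constant proper acceleration traces out a hyperbola, rather than a parabola, in the \((x, t)\) plane.</p>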
<h3 id="the-simulation">The simulation</h3>
<p>The simulation below computes rapidity as a function of force applied to the ship, from which it can compute \(v\) and thus \(x\). It also tracks the ship’s mass, proper time, momentum and Lorentz factor, all assuming the ship has unit mass. You can change the force with the slider below.</p>
<p>First we plot things for an outside observer, frame <strong>R</strong>.</p>
<div id="observablehq-viewof-options-8839b668"></div>
<div id="observablehq-viewof-reset_widget-8839b668"></div>
<div id="observablehq-rest_frame-8839b668"></div>
<div id="observablehq-speedControl-8839b668"></div>
<div id="observablehq-Force-8839b668"></div>
<div id="observablehq-stats-8839b668"></div>
<script type="module">
import {Runtime, Inspector} from "https://cdn.jsdelivr.net/npm/@observablehq/runtime@4/dist/runtime.js";
import define from "https://api.observablehq.com/@benlansdell/a-special-relativity-simulator.js?v=3";
new Runtime().module(define, name => {
if (name === "viewof options") return new Inspector(document.querySelector("#observablehq-viewof-options-8839b668"));
if (name === "viewof reset_widget") return new Inspector(document.querySelector("#observablehq-viewof-reset_widget-8839b668"));
if (name === "rest_frame") return new Inspector(document.querySelector("#observablehq-rest_frame-8839b668"));
if (name === "speedControl") return new Inspector(document.querySelector("#observablehq-speedControl-8839b668"));
if (name === "Force") return new Inspector(document.querySelector("#observablehq-Force-8839b668"));
if (name === "stats") return new Inspector(document.querySelector("#observablehq-stats-8839b668"));
return ["plot_rest_frame","state","a","t","tau","p","x","rapidity","p_g","x_g","v_g","plot_moving_frame","v","moving_frame","m_x_func","m_t_func","gamma","mass","energy"].includes(name);
});
</script>
<p>Denote by <strong>R’</strong>(t) the inertial frame that, at time \(t\), is moving with speed \(v(t)\) relative to <strong>R</strong>, with the origin shifted to be the location of the ship. We can then plot the worldline from <strong>R’</strong>(t).</p>
<div id="observablehq-viewof-reset_widget_local-4e2e7ebe"></div>
<div id="observablehq-speedControl_l-4e2e7ebe"></div>
<div id="observablehq-Force_l-4e2e7ebe"></div>
<div id="observablehq-moving_frame-4e2e7ebe"></div>
<script type="module">
import {Runtime, Inspector} from "https://cdn.jsdelivr.net/npm/@observablehq/runtime@4/dist/runtime.js";
import define from "https://api.observablehq.com/@benlansdell/a-special-relativity-simulator.js?v=3";
new Runtime().module(define, name => {
if (name === "viewof reset_widget_local") return new Inspector(document.querySelector("#observablehq-viewof-reset_widget_local-4e2e7ebe"));
if (name === "speedControl_l") return new Inspector(document.querySelector("#observablehq-speedControl_l-4e2e7ebe"));
if (name === "Force_l") return new Inspector(document.querySelector("#observablehq-Force_l-4e2e7ebe"));
if (name === "moving_frame") return new Inspector(document.querySelector("#observablehq-moving_frame-4e2e7ebe"));
return ["state_l","a_l","plot_moving_frame"].includes(name);
});
</script>
<p>Some things to note:</p>
<ul>
<li>
<p>In frame <strong>R’</strong>(t), the acceleration vector points entirely in the \(x\) axis and the velocity vector coincides with the time axis \(t'\) – this makes sense since in this frame it has, at this instance, zero velocity. It’s clear in this frame that the acceleration vector is always perpendicular to the velocity vector – acceleration is the curvature of the worldline.</p>
</li>
<li>
<p>The velocity and acceleration vectors in the rest frame <strong>R</strong> are simply those same vectors in <strong>R’</strong>, Lorentz transformed accordingly – their Lorentz invariance means that they have the same magnitude and stay orthogonal to one another – acceleration is still the curvature of the worldline.</p>
</li>
<li>
<p>The worldline drawn in <strong>R’</strong>(t) is, in my opinion at least, not the most intuitive thing to interpret, as the reference frame changes velocity whenever the ship does. You can see, however, that once no force is applied, the ship moves with constant velocity and the worldline drags straight behind the ship. More generally, in this frame the trajectory appears, basically, as the turning point of a parabola opening up or down depending on the sign of the acceleration.</p>
</li>
<li>
<p>If you turn on the Show Galilean Motion option, it shows the trajectory a ship will follow in <strong>R</strong> under normal, Newtonian mechanics. A key difference you can notice is that it can move faster than the speed of light, as evidenced by being able to move outside of the origin’s lightcone.</p>
</li>
<li>
<p>A final note is an admission of some subtlety that comes with the way this simulation was described and set up. Our simulation clock counts ticks of rest time \(t\), and yet we’re imagining that we’re on board the ship, changing its thrusters. The relative rate at which proper time, \(\tau\), ticks over can be, depending on \(\gamma\), significantly slower. The temporal resolution at which we’re able to issue commands to the ship and respond to changes in its motion thus <em>increases</em> as time dilation increases. This isn’t so realistic. Would it be better to instead run the simulation with a fixed proper time stepsize? The issue there is that I think it’s easier to get a sense for the motion from a single inertial frame <strong>R</strong>, which dictates using an external clock. A fixed step size in \(\tau\) could become arbitrarily large in \(t\), so we would lose numerical precision in our external view of things.</p>
</li>
</ul>
<p>Hopefully this simulation gives a bit more intuition about how kinematics work in special relativity – basically: constant acceleration is a hyperbola instead of a parabola, with the extra momentum going into the object’s mass and not its velocity. For our next post, we will incorporate gravity into the picture with Einstein’s general theory of relativity.</p>Ben LansdellHere we imagine we are in control of a powerful spaceship, navigating flat spacetime (i.e. away from any massive objects which could curve spacetime). We can provide thrust in only one spatial dimension, either forwards or backwards.Minkowsky space2021-06-10T00:00:00+00:002021-06-10T00:00:00+00:00https://benlansdell.github.io/expositions/posts/minkowsky<p>Here is an exploration of Minkowsky space – the space-time prescribed by the postulates of special relativity – in 1 space dimension (and 1 time dimension).</p>
<p>A theory of relativity tells us how measurements of space (e.g. location, length) and time (e.g. duration) taken by two observers are related to one another. Objects are always in motion relative to one another, and such measurements are of course fundamental to any theory of mechanics, thus such theories are foundational to physics.</p>
<p>The fundamental object in such a theory is an <em>event</em>, which is simply a point in space and time. How do two observers measure the same event?</p>
<h3 id="galilean-space-time">Galilean space-time</h3>
<p>Before describing the space-time prescribed by special relativity, we can describe what is perhaps the naive way of relating events measured by two observers – that of pre-Einsteinian, Galilean space-time.</p>
<p>Shown below are the coordinate frames for two observers, one moving relative to the other. The stationary observer has the standard Cartesian grid (red), while the moving observer has a grid that is a shear mapping of the Cartesian grid (blue; though they are overlapped in the default image to make purple, change the relative velocity to see the two axes), the amount of shear determined by the relative velocity of the frames.</p>
<p>The idea is to think of events, being points in space and time, as being located on this chart somewhere. By overlapping the two observers’ coordinate frames we can say how each observer would measure a given event. Note that the overlapping coordinates here are set up so that both have the same origin.</p>
<p>The shear transformation shown here is in fact the Galilean transformation. Suppose \(x\) is the spatial location of an event in the stationary frame, and \(x'\) is the spatial location of the same event in the moving frame; similarly for the times \(t\) and \(t'\). Then, given a relative frame velocity \(v\), they are related by
\(\begin{aligned}
x' &= x - vt,\\
t' &= t.
\end{aligned}\)
You can see the effect of changing the velocity on the axes below.</p>
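<p>The transformation is simple enough to sanity-check numerically. Below is a minimal sketch (plain Python, separate from the interactive figures above; the sample event and frame velocity are arbitrary choices):</p>

```python
def galilean(x, t, v):
    """Map an event (x, t) into a frame moving at velocity v:
    x' = x - v*t, t' = t."""
    return x - v * t, t

# An event at (x, t) = (1, 1): a particle leaving the origin at t = 0
# reaches it with speed 1 in the stationary frame.
xp, tp = galilean(1.0, 1.0, 0.5)
speed_moving = xp / tp
# speed_moving = 0.5: in Galilean relativity velocities simply subtract,
# u' = u - v, and time is untouched.
```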
<div id="observablehq-viewof-v_g-d6e7403f"></div>
<div id="observablehq-viewof-lightcone_g-d6e7403f"></div>
<div id="observablehq-pg-d6e7403f"></div>
<div id="observablehq-stats-d6e7403f"></div>
<script type="module">
import {Runtime, Inspector} from "https://cdn.jsdelivr.net/npm/@observablehq/runtime@4/dist/runtime.js";
import define from "https://api.observablehq.com/@benlansdell/minkowsky-space.js?v=3";
new Runtime().module(define, name => {
if (name === "viewof v_g") return new Inspector(document.querySelector("#observablehq-viewof-v_g-d6e7403f"));
if (name === "viewof lightcone_g") return new Inspector(document.querySelector("#observablehq-viewof-lightcone_g-d6e7403f"));
if (name === "pg") return new Inspector(document.querySelector("#observablehq-pg-d6e7403f"));
if (name === "stats") return new Inspector(document.querySelector("#observablehq-stats-d6e7403f"));
return ["g_x_func","g_x_inv","plot_gallileo","event_xp_g","event_x_g"].includes(name);
});
</script>
<ul>
<li>
<p>Single clicking on the plot above places an event on the axes, considered fixed in the stationary frame, with corresponding coordinates in each frame shown below. Note that the moving observer measures this event differently if you change the relative velocity of the two frames with the slider above. It also draws a linear trajectory from the origin to this event – we can imagine some particle travelling along this trajectory. We know its start and end points, and so can compute its velocity in both coordinate frames, as shown.</p>
</li>
<li>
<p>Double clicking on the plot above also places an event on the axes. This event is considered to be fixed in the moving frame, however, and so now if you change the relative velocity of the frames the event will (in our ‘non-moving’ frame) be shifted along with it.</p>
</li>
</ul>
<h3 id="from-galileo-to-minkowsky">From Galileo to Minkowski</h3>
<p>Galilean space-time has a curious feature, in particular when it comes to measuring the speed of light. Einstein noted that, in the scenario above, in which both observers are in so-called inertial reference frames, moving relative to one another, there is no privileged frame in which an observer can rightfully claim to ‘really be the one at rest’, and that it is the others that are ‘really moving’. This means that both observers, in their own frames of reference, cannot do any experiments that can tell them it is they who are stationary, and not the other observer. <strong>The laws of physics are the same for all inertial observers</strong>. This is Einstein’s first postulate. His second postulate follows naturally from this line of thinking: <strong>the speed of light is the same for all inertial observers</strong>, regardless of the velocity of the object that emitted the light. Some reflection suggests that the second postulate is incompatible with the Galilean picture above.</p>
<p>This is easy to see in the above Galilean axes. Turn on the light cone on the axes. This draws the trajectory a particle traveling from the origin at the speed of light, relative to the stationary observer, would take in the diagram (we have normalized units here so that the speed of light \(c\) is 1, and hence this line has slope \(\pm 1\)). Select an event that sits on this light cone somewhere. By sitting on the light cone it has speed \(c\) according to the stationary frame. But observe that this is not the case as measured by the moving observer. Indeed, as the relative velocity of the two frames is changed, the velocity of the packet of light, as measured by the other observer, changes.</p>
<h3 id="minkowsky-space-time">Minkowski space-time</h3>
<p>Minkowski space-time is what we get when we impose Einstein’s second postulate on a theory of relativity. The result is not a <em>Euclidean</em> space with a separate time dimension added on but, in a certain sense, a <em>hyperbolic</em> space-time in which space and time become interrelated. It has strange consequences for our notions of space, time, energy and mass.</p>
<div id="observablehq-viewof-v_m-8e80fb62"></div>
<div id="observablehq-viewof-lightcone_m-8e80fb62"></div>
<div id="observablehq-pm-8e80fb62"></div>
<script type="module">
import {Runtime, Inspector} from "https://cdn.jsdelivr.net/npm/@observablehq/runtime@4/dist/runtime.js";
import define from "https://api.observablehq.com/@benlansdell/minkowsky-space.js?v=3";
new Runtime().module(define, name => {
if (name === "viewof v_m") return new Inspector(document.querySelector("#observablehq-viewof-v_m-8e80fb62"));
if (name === "viewof lightcone_m") return new Inspector(document.querySelector("#observablehq-viewof-lightcone_m-8e80fb62"));
if (name === "pm") return new Inspector(document.querySelector("#observablehq-pm-8e80fb62"));
return ["plot_minkowsky","gamma","m_x_func","m_t_func","m_x_inv","m_t_inv","event_xp_m","event_x_m","event_t_m"].includes(name);
});
</script>
<p>Above is a representation of Minkowski space, that is, the space-time in which inertial frames of reference are related by the following transformation, known as the Lorentz transformation:
\(\begin{aligned}
x' &= \gamma(x - vt),\\
t' &= \gamma(t - vx/c^2),
\end{aligned}\)
where \(\gamma = 1/\sqrt{1-v^2/c^2}\).</p>
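<p>As a quick numerical check of these formulas (a plain-Python sketch; the velocity \(v = 0.6\) is an arbitrary choice):</p>

```python
import math

C = 1.0  # units in which the speed of light c = 1, as in the diagrams

def lorentz(x, t, v):
    """Lorentz-transform an event (x, t) into a frame moving at velocity v."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return gamma * (x - v * t), gamma * (t - v * x / C**2)

# An event on the light cone (x = t) stays on the light cone:
xp, tp = lorentz(1.0, 1.0, 0.6)
# xp / tp = 1: both frames measure the same speed of light.

# Time dilation: where does the moving clock's tick (x', t') = (0, 1) sit
# in the rest frame? The inverse transform is the transform with -v.
x0, t0 = lorentz(0.0, 1.0, -0.6)
# t0 = gamma = 1.25: more than one unit of rest-frame time has elapsed.
```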
<p>If you play around with specifying events on these coordinates and with different relative frame velocities, you’ll notice some interesting things.</p>
<ul>
<li>
<p>First, as designed, the two frames share the same light cone. You’ll notice an event placed on the light cone stays on the light cone, regardless of the relative velocity of the two reference frames. In other words, the two observers always agree on the speed of light.</p>
</li>
<li>
<p>Second, events that are above the light-cone stay above the light-cone, regardless of relative frame velocities. Similarly for events that are below the light cone – they stay below it. There is a strong division of Minkowski space-time into regions which can be reached by slower-than-light travel from the origin, and those that cannot. Events that sit in this latter region will <em>never</em> be affected by an event at the origin, nor will any of these events affect anything happening at the origin.</p>
</li>
<li>
<p>Third, the two frames of reference may no longer agree on the time that an event took place. Two events that may appear simultaneous (e.g. lying on the same ‘horizontal’ line) to one observer may not appear as simultaneous to the other. Special relativity abandons the notion of absolute time – the notion that time proceeds equally for all throughout the whole universe, and that there is a well-defined way in which two spatially separated events can be judged to have occurred simultaneously that all observers would agree on.</p>
</li>
<li>
<p>Fourth, in particular, a moving clock will appear, from the stationary observer, to run <em>slowly</em>. Set the relative frame velocity to zero, and place an event in the moving frame (double click on the axes) at (x’ = 0, t’ = 1). I.e. an event at the moving frame’s spatial origin, <em>one unit of time after time 0</em>. When there is no relative velocity, of course the two frames measure the time of this event as the same. As the relative velocity is increased, however, ‘one unit of time later’ for the moving observer becomes <em>more</em> than one unit of time later for the stationary observer. More time will appear to have passed for the stationary observer than for the moving one. Weird.</p>
</li>
</ul>
<p>This is just to highlight some of the most significant and interesting features of this model of space-time. There is much still to explore.</p>Ben LansdellHere is an exploration of Minkowski space – the space-time prescribed by the postulates of special relativity – in 1 space dimension (and 1 time dimension).A very brief introduction to convex optimization – Part 22021-03-10T00:00:00+00:002021-03-10T00:00:00+00:00https://benlansdell.github.io/expositions/posts/convexity-part-2<p>In this follow-up post, we explore how some theory of convex functions and optimization relates to a common and powerful method in optimization – Lagrange multipliers</p>
<h3 id="1-conjugate-functions">1. Conjugate functions</h3>
<p>To begin, we’ll introduce the concept of a conjugate function. Let \(f:\mathbb{R}^n\to\mathbb{R}\); then we can define \(f^*:\mathbb{R}^n\to\mathbb{R}\) as</p>
\[f^*(\lambda) = \sup_{x\in\text{dom} f} (\lambda^T x - f(x)).\]
<p>As you’ll recall from our last post, as this is the pointwise supremum over a set of convex (linear) functions, it is itself convex. This is true regardless of whether \(f\) is convex. This is known as the conjugate of \(f\).</p>
<p>How can this be convex even if \(f\) is not? Here’s one example:</p>
<p>Consider the non-convex function \(f(x) = x^2(x-1)(x+1)\)</p>
<p>Its conjugate is obtained through computing the following maximum:</p>
<iframe width="100%" height="719" frameborder="0" src="https://observablehq.com/embed/@benlansdell/convex-optimization-tutorial-part-2?cells=viewof+lambda%2Cx2_conjugate%2Cx2_conjugate_func"></iframe>
<p>Which is clearly convex.</p>
<h3 id="some-more-examples">Some more examples</h3>
<ol>
<li>Affine functions. If \(f(x) = ax+b\) then \(f^*:\{a\}\to\mathbb{R}\) and \(f^*(a) = -b\)</li>
<li>Exponential. If \(f(x) = \exp(x)\) then \(f^*:\mathbb{R}_+\to\mathbb{R}\) and \(f^*(x) = x\log x - x\)</li>
<li>Negative entropy. If \(f(x) = x\log x\) then \(f^*(x) = \exp(x-1)\).</li>
<li>Strictly convex quadratic function. If \(f(x) = \frac{1}{2}x^TQx\) with \(Q\) positive definite, then \(f^*(x) = \frac{1}{2}x^TQ^{-1}x\)</li>
</ol>
<p>If \(f(x)\) is convex then, given some additional technical conditions, \(f(x) = f^{**}(x)\), justifying the use of the term conjugate.</p>
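<p>These facts are easy to check numerically. Here is a sketch (Python with NumPy; the brute-force grid search and the particular grids are my own illustrative choices) that approximates a conjugate by maximizing over a grid, recovers \(f^*(\lambda) = \lambda^2/2\) for \(f(x) = x^2/2\), and confirms that the conjugate of the non-convex quartic above is convex:</p>

```python
import numpy as np

def conjugate(f, xs):
    """Approximate f*(lam) = sup_x (lam*x - f(x)) by brute force on a grid."""
    def f_star(lam):
        return np.max(lam * xs - f(xs))
    return f_star

xs = np.linspace(-5.0, 5.0, 20001)

# Strictly convex quadratic: f(x) = x^2/2 has conjugate f*(lam) = lam^2/2.
f = lambda x: 0.5 * x**2
f_star = conjugate(f, xs)
# f_star(2.0) is approximately 2.0**2 / 2 = 2.0.

# The non-convex quartic from above still has a convex conjugate:
f_nc = lambda x: x**2 * (x - 1.0) * (x + 1.0)
f_nc_star = conjugate(f_nc, xs)
lams = np.linspace(-3.0, 3.0, 61)
vals = np.array([f_nc_star(lam) for lam in lams])
# Midpoint convexity on the uniform lambda grid:
midpoint_ok = np.all(vals[1:-1] <= 0.5 * (vals[:-2] + vals[2:]) + 1e-9)
```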
<h3 id="2-lagrangian-duality">2. Lagrangian duality</h3>
<p>The conjugate relates to an important concept in convex optimization, known as Lagrangian duality.</p>
<p>This can be thought of as a generalization of a common method in optimization, that of Lagrange multipliers, which I’ll review first.</p>
<h3 id="21-lagrange-multipliers">2.1 Lagrange multipliers</h3>
<p>Let’s consider the optimization problem</p>
\[\min f(x)\\
h_i(x) = 0\]
<p>Unlike above, here we don’t (yet) assume \(f(x)\) is convex.</p>
<p>Lagrange multipliers are a method for solving this constrained minimization problem by converting it into an unconstrained problem. This can be generalized (see below), but the basic method I’ll present in this section only deals with equality constraints.</p>
<p>The idea is to <em>augment</em> the objective function with a weighted sum of the constraint functions, forming what is known as the Lagrangian:</p>
\[\mathcal{L}(x, \nu) = f(x) + \sum_{i=1}^p \nu_i h_i(x)\]
<p>The variables \(\nu_i\) are known as the Lagrange multipliers.</p>
<p>The basic idea is that by looking for stationary points of the <em>unconstrained</em> Lagrangian</p>
\[\nabla \mathcal{L}(x,\nu) = 0\]
<p>we can obtain solutions to the original problem. Why does this work? First note that solving \(\frac{\partial \mathcal{L}}{\partial x} = 0\) and \(\frac{\partial \mathcal{L}}{\partial \nu}=0\) gives:</p>
\[\begin{align*}
\frac{\partial \mathcal{L}}{\partial x} = 0 &\Rightarrow \nabla_x f(x) + \sum_{i=1}^p \nu_i \nabla_x h_i(x) = 0\\
\frac{\partial \mathcal{L}}{\partial \nu} = 0 &\Rightarrow h_i(x) = 0, \forall i
\end{align*}\]
<p>Thus finding stationary points of \(\mathcal{L}\) will correspond to points satisfying the constraints.</p>
<p>But why should it minimize the function given the constraint? The graphical intuition is the following:</p>
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/LagrangeMultipliers2D.svg" alt="title" /></p>
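<p>For a concrete instance of the method, consider minimizing \(f(x, y) = x^2 + y^2\) subject to \(x + y = 1\) (an example of my own choosing, not the one in the figure). The stationarity conditions are linear here, so the stationary point can be found with a single linear solve:</p>

```python
import numpy as np

# Minimize f(x, y) = x^2 + y^2 subject to h(x, y) = x + y - 1 = 0.
# Stationarity of L(x, y, nu) = f + nu*h gives a linear system:
#   dL/dx:  2x + nu = 0
#   dL/dy:  2y + nu = 0
#   dL/dnu: x + y   = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
x, y, nu = np.linalg.solve(A, b)
# x = y = 0.5, nu = -1.0: the point on the line x + y = 1 closest to
# the origin, as the geometric picture suggests.
```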
<h3 id="22-the-dual-problem">2.2 The dual problem</h3>
<h4 id="the-dual-function">The dual function</h4>
<p>Let’s begin the process of generalizing this approach to deal with inequality constraints, too. That is, let’s turn to the problem</p>
\[\min f(x)\\
g_i(x) \le 0,\quad i = 1, \dots, m \\
h_j(x) = 0,\quad j = 1, \dots, p\]
<p>Define the Lagrangian of this problem as:</p>
\[\mathcal{L}(x, \lambda, \nu) = f(x) + \sum_{i = 1}^m\lambda_i g_i(x) + \sum_{j=1}^p\nu_j h_j(x)\]
<p>Again, \(\lambda\) and \(\nu\) are Lagrange multipliers, or <em>dual variables</em>.</p>
<p>From this we can define the Lagrange dual function:</p>
\[g(\lambda, \nu) = \inf_{x\in\mathcal{X}}\mathcal{L}(x, \lambda, \nu).\]
<p>Now, we use the same property as above: the pointwise <em>infimum</em> (read minimum) of a family of affine functions of \((\lambda,\nu)\) is concave. This is true even when the optimization problem above is not convex.</p>
<p>For a common problem class there is a relation between the dual function and the conjugate that can facilitate computation of \(g(\lambda, \nu)\). For problems of the form:</p>
\[\min f(x)\\
Ax \preceq b,\\
Cx = d\]
<p>then</p>
\[\begin{align}
g(\lambda, \nu) &= \inf_x \left(f(x) + \lambda^T (Ax-b) + \nu^T(Cx-d) \right)\\
&= -b^T\lambda - d^T\nu + \inf_x \left(f(x) + (A^T\lambda + C^T\nu)^T x \right)\\
&= -b^T\lambda - d^T\nu -f^*(-A^T\lambda - C^T\nu)
\end{align}\]
<p>An important property is that the dual function is a lower bound for the solution to the original problem. Call \(p^*\) the minimum obtained at the optimal solution to the problem: \(p^* = f(x^*)\).</p>
<p>The original problem is known as the primal problem. Then, we have:</p>
\[g(\lambda, \nu) \le p^*\]
<p>for any \(\lambda \succeq 0\) and for any \(\nu\).</p>
<p>This is easy to show: the optimal \(x^*\) satisfies the constraints \(g_i(x^*)\le 0\) and \(h_i(x^*) = 0\), thus</p>
\[\sum_{i=1}^m\lambda_ig_i(x^*) \le 0\]
<p>and</p>
\[\sum_{i=1}^p\nu_ih_i(x^*) = 0\]
<p>thus</p>
\[L(x^*, \lambda, \nu) = f(x^*) + \sum_{i=1}^m\lambda_ig_i(x^*) + \sum_{i=1}^p\nu_ih_i(x^*) \le f(x^*).\]
<p>This means:</p>
\[g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) \le L(x^*, \lambda, \nu) \le f(x^*) = p^*\]
<h4 id="the-dual-problem">The dual problem</h4>
<p>Ok, so what do we do with this lower bound? A natural thing to do is to ask, how high can we make this lower bound? This is the dual problem:</p>
\[\max_{\lambda \succeq 0, \nu} g(\lambda, \nu)\]
<p>The solution is denoted \((\lambda^*, \nu^*)\), the <em>dual optimal</em> solution. Since this is a maximization of a concave function, it is a convex problem, even if the primal problem is not.</p>
<p>Call \(d^* = g(\lambda^*, \nu^*)\). Then, from above, we have</p>
\[d^*\le p^*\]
<p>The difference \(p^* - d^*\) is known as the optimal duality gap.</p>
<p>This general inequality is known as <em>weak duality</em>. Even weak duality can be useful: in some cases the dual problem may be efficiently solvable (being a convex problem) while the original one is much more challenging, so the dual provides a useful lower bound on the primal solution.</p>
<h4 id="strong-duality">Strong duality</h4>
<p>When the gap is zero:</p>
\[d^* = p^*\]
<p>then <em>strong duality</em> holds. Now the dual problem can say a lot more about our original problem, which we’ll cover momentarily. For convex problems, strong duality holds under mild conditions; one common sufficient condition (the existence of a strictly feasible point) is known as Slater’s condition. But it can hold under other conditions also.</p>
<h4 id="some-examples">Some examples</h4>
<p>This has been quite a bit of theory. So what are some examples of these concepts?</p>
<p>Well, let’s consider again the simple quadratic function \(f(x) = x^2/2\), with the inequality constraint \(x - a \le 0\). The Lagrangian is</p>
\[L(x, \lambda) = x^2/2 + \lambda(x-a)\]
<p>thus the dual function is</p>
\[g(\lambda) = \inf_x L(x, \lambda) = -\lambda^2/2 - \lambda a\]
<p>We can see that by maximizing \(g(\lambda)\) over \(\lambda \ge 0\) we get:</p>
\[\lambda^* = \max(-a, 0), \quad d^* = \begin{cases}
a^2/2, \quad a < 0;\\
0, \quad a \ge 0
\end{cases}\]
<p>which matches the optimal solution \(p^*\).</p>
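<p>We can verify this numerically. The sketch below (plain Python, using the closed-form expressions derived above) checks that the duality gap is zero for several values of \(a\):</p>

```python
def dual(lam, a):
    """g(lambda) = -lambda^2/2 - lambda*a, for f(x) = x^2/2, x <= a."""
    return -0.5 * lam**2 - lam * a

def primal_opt(a):
    """p* = min of x^2/2 over x <= a."""
    x_star = min(a, 0.0)
    return 0.5 * x_star**2

def dual_opt(a):
    """d* = max of g(lambda) over lambda >= 0, at lambda* = max(-a, 0)."""
    return dual(max(-a, 0.0), a)

gaps = [primal_opt(a) - dual_opt(a) for a in (-2.0, -0.5, 0.0, 1.0)]
# Every gap is zero: strong duality holds for this convex problem.
```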
<p>Strong duality holds here, as we can see in this plot. The top axes show the original optimization problem, with the red line indicating the primal optimal solution. The bottom axes show the dual optimization problem, with the dashed line showing the dual optimal solution. We can see the red and dashed lines overlap, regardless of the value of \(a\).</p>
<iframe width="100%" height="718" frameborder="0" src="https://observablehq.com/embed/@benlansdell/convex-optimization-tutorial-part-2?cells=viewof+a%2Cx2_primal_problem%2Cx2_dual_problem"></iframe>
<p>What if we try with a non-convex function? Now let \(f(x) = -x^3+x\), again with \(x\le a\).</p>
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/primal_dual_gap.svg" alt="duality_gap" /></p>
<p>This time we do have a duality gap, as evidenced by the gap between the red and dashed curves.</p>
<h4 id="min-max-interpretation">Min-max interpretation</h4>
<p>As an interesting aside, the primal problem can be expressed in a way that gives a nice symmetry to the two problems.</p>
<p>Consider the case where we only have inequality constraints. Then we have:</p>
\[\sup_{\lambda \succeq 0} L(x,\lambda) = \sup_{\lambda \succeq 0}\left(f(x) + \sum_{i=1}^m \lambda_i g_i(x)\right) = f(x)\]
<p>This is because, provided the constraints are satisfied, \(g_i(x) \le 0\) and the best choice is \(\lambda = 0\); if any constraint is violated, taking the corresponding \(\lambda_i \to \infty\) drives the supremum to \(+\infty\), so infeasible points are excluded from the infimum. Thus we can write:</p>
\[p^* = \inf_x f(x) = \inf_x \sup_{\lambda \succeq 0} L(x,\lambda)\]
<p>and, by our earlier definitions, we have:</p>
\[d^* = \sup_{\lambda\succeq 0}\inf_x L(x,\lambda).\]
<p>This means weak duality implies:</p>
\[\sup_{\lambda\succeq 0}\inf_x L(x,\lambda) \le \inf_x \sup_{\lambda\succeq 0} L(x,\lambda).\]
<p>This is in fact a general result that any function \(f(x,\lambda)\) satisfies. This is known as the max-min inequality.</p>
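<p>Since the max-min inequality holds for <em>any</em> function, it can be checked numerically on a grid for an arbitrary choice of \(L\) (the function below is an arbitrary example of my own, not a Lagrangian):</p>

```python
import numpy as np

# The max-min inequality holds for any function L(x, lam):
#   sup_lam inf_x L  <=  inf_x sup_lam L.
L = lambda x, lam: np.sin(x) * np.cos(lam)

xs = np.linspace(-3.0, 3.0, 301)
lams = np.linspace(0.0, 3.0, 301)
grid = L(xs[:, None], lams[None, :])   # grid[i, j] = L(xs[i], lams[j])

maxmin = grid.min(axis=0).max()   # sup over lam of (inf over x)
minmax = grid.max(axis=1).min()   # inf over x of (sup over lam)
# maxmin <= minmax, as the inequality guarantees.
```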
<p>Strong duality implies:</p>
\[\sup_{\lambda\succeq 0}\inf_x L(x,\lambda) = \inf_x \sup_{\lambda\succeq 0} L(x,\lambda)\]
<p>(This is also known as the saddle point property, because the optimal point \((x^*, \lambda^*)\) is in fact a saddle point of \(L\). This result is known as the minimax theorem, proved by von Neumann in the context of his work on game theory).</p>
<p>We can see the saddle point property at play if we plot the Lagrangian for the quadratic problem:</p>
\[\min x^2, \quad x \le a\]
<p>The Lagrangian is:</p>
\[L(x, \lambda) = x^2 + \lambda(x-a)\]
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/saddle_point.png" alt="saddle_point" /></p>
<h3 id="3-generalizing-lagrange-multipliers-the-kkt-conditions">3. Generalizing Lagrange multipliers: the KKT conditions</h3>
<h4 id="complementary-slackness">Complementary slackness</h4>
<p>When strong duality holds we have an important property of the optimal solution known as complementary slackness.</p>
<p>We have</p>
\[\begin{align}
f(x^*) &= g(\lambda^*, \nu^*)\\
&= \inf_x\left(f(x) + \sum_{i=1}^m\lambda^*_i g_i(x) + \sum_{i=1}^p\nu^*_i h_i(x)\right)\\
&\le f(x^*) + \sum_{i=1}^m\lambda^*_i g_i(x^*) + \sum_{i=1}^p\nu^*_i h_i(x^*)\\
&\le f(x^*)
\end{align}\]
<p>This implies that</p>
\[\sum_{i=1}^m \lambda_i^* g_i(x^*) = 0\]
<p>and, since each term in the sum is nonpositive, then in fact each term must be zero:</p>
\[\lambda_i^* g_i(x^*) = 0, \quad i = 1, \dots, m.\]
<p>Usefully, this property gives additional equality constraints a solution must satisfy to be optimal. In particular, it means that either \(\lambda_i^* = 0\) or \(g_i(x^*) = 0\). In other words, when \(\lambda_i^* > 0\) then the inequality constraint \(g_i\) must be tight. If \(\lambda_i^* = 0\) then it can be slack.</p>
<p>We actually saw complementary slackness at play in our simple quadratic example above. If we plot \(g(x^*) = x^* - a\) and \(\lambda^*\) as functions of the inequality constraint parameter \(a\) (recall \(x \le a\)):</p>
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/complementary_slackness.svg" alt="cs" /></p>
<p>We see that for \(a\le 0\) the constraint is tight, and \(x^* = a\). For \(a>0\), the optimal solution is \(x^* = 0\), and \(x^* - a\) becomes slack.</p>
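<p>A small numerical sketch of this (plain Python, using the closed-form solution of the quadratic example) confirms that \(\lambda^* g(x^*) = 0\) across a range of values of \(a\):</p>

```python
def solution(a):
    """Optimal (x*, lambda*) for min x^2/2 subject to g(x) = x - a <= 0."""
    if a < 0:
        return a, -a    # constraint tight: g(x*) = 0, lambda* = -a > 0
    return 0.0, 0.0     # constraint slack: g(x*) = -a < 0, lambda* = 0

products = []
for a in (-1.5, -0.1, 0.3, 2.0):
    x_star, lam_star = solution(a)
    products.append(lam_star * (x_star - a))
# Each product lambda* * g(x*) is exactly zero: complementary slackness.
```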
<h4 id="kkt-for-non-convex-problems">KKT for non-convex problems</h4>
<p>Here we assume the functions \(f(x), g_i(x), h_i(x)\) are all differentiable. We can argue that \(x^*\) minimizes \(L(x, \lambda^*, \nu^*)\), and therefore the gradient must vanish at \(x^*\):</p>
\[\nabla f(x^*) + \sum_{i=1}^m\lambda_i^*\nabla g_i(x^*) + \sum_{i=1}^p\nu_i^*\nabla h_i(x^*) = 0\]
<p>Now we’re in a position to see how all this theory relates to our optimization problem, when strong duality obtains.</p>
<p>Let’s collect conditions an optimal solution \((x^*, \lambda^*, \nu^*)\) must satisfy, for problems with strong duality. We have:</p>
\[\begin{align}
g_i(x^*) &\le 0, \quad i=1, \dots, m\\
h_i(x^*) &= 0, \quad i=1, \dots, p\\
\lambda^*_i &\ge 0, \quad i=1, \dots, m\\
\lambda_i^* g_i(x^*) &= 0, \quad i=1, \dots, m\\
\nabla f(x^*) + \sum_{i=1}^m\lambda_i^*\nabla g_i(x^*) + \sum_{i=1}^p\nu_i^*\nabla h_i(x^*) &= 0
\end{align}\]
<p>These are known as the Karush-Kuhn-Tucker (KKT) conditions. They are <em>necessary</em> conditions for an optimal solution.</p>
<p>Note that if we have no inequality constraints, the above conditions simplify to the method of Lagrange multipliers that we discussed above:</p>
\[\begin{align}
h_i(x^*) &= 0,\quad i =1, \dots, p\\
\nabla f(x^*) + \sum_{i=1}^p\nu_i^*\nabla h_i(x^*) &= 0
\end{align}\]
<h4 id="kkt-for-convex-problems">KKT for convex problems</h4>
<p>When the primal problem is convex, the KKT conditions are also <em>sufficient</em> for an optimal solution.</p>
<p>To summarize all of the above: for any differentiable optimization problem for which strong duality obtains, the KKT conditions provide necessary conditions for an optimal solution. Algorithms focus on finding all points which satisfy such conditions, and from those finding the globally optimal solution.</p>
<p>When the problem is convex, KKT is also sufficient, and <em>any</em> solution that satisfies the conditions is optimal.</p>
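<p>As a sanity check, here is a sketch (my own illustrative code) that tests all four KKT conditions for the earlier example \(\min x^2/2\) subject to \(g(x) = x - a \le 0\), where \(\nabla f = x\) and \(\nabla g = 1\):</p>

```python
def check_kkt(x, lam, a, tol=1e-9):
    """KKT conditions for min x^2/2 subject to g(x) = x - a <= 0,
    where grad f = x and grad g = 1."""
    g = x - a
    return (g <= tol                  # primal feasibility
            and lam >= -tol           # dual feasibility
            and abs(lam * g) <= tol   # complementary slackness
            and abs(x + lam) <= tol)  # stationarity: x + lambda = 0

a = -1.0
ok = check_kkt(-1.0, 1.0, a)    # the optimal pair (x*, lambda*) = (-1, 1)
bad = check_kkt(0.0, 0.0, a)    # the unconstrained minimum is infeasible
```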
<p>Outside of convexity, there are a range of other <em>constraint qualifications</em> that imply that a particular problem has strong duality, and therefore that KKT is relevant.</p>
<h2 id="take-home-messages">Take-home messages</h2>
<p>After going through this tutorial, you should now know:</p>
<ul>
<li>What convexity is and why it is important for optimization</li>
<li>How some properties of convex functions are used to prove convergence of gradient descent</li>
<li>What primal and dual optimization problems are</li>
<li>How the KKT conditions generalize Lagrange multipliers for inequality constraint problems</li>
</ul>Ben LansdellIn this follow-up post, we explore how some theory of convex functions and optimization relates to a common and powerful method in optimization – Lagrange multipliersA very brief introduction to convex optimization – Part 12021-02-28T00:00:00+00:002021-02-28T00:00:00+00:00https://benlansdell.github.io/expositions/posts/convexity-part-1<p>Here I cover a basic introduction to concepts and theory of convex optimization. The goal is to give an impression of why this is an important area of optimization, what its applications are, and some intuition for how it works. This is of course not meant to overview all areas of convex optimization (it’s a huge topic), but more to give a flavor of the area by describing some results and theory, particularly as they relate to other areas that may be familiar to some (e.g. the method of Lagrange multipliers). By presenting this in a notebook the aim is to focus on providing some geometric intuition whenever possible through plotting simple examples whose parameters you can play with. Images not generated in this notebook are taken from one of the standard references: <a href="https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf">Convex Optimization</a>, by Boyd and Vandenberghe.</p>
<p>This will be a (for the moment) two part post. In this part I will cover:</p>
<ol>
<li>Why care about convexity?</li>
<li>Basics of convex functions and sets</li>
<li>A convergence proof of gradient descent</li>
</ol>
<p>It will assume some basic familiarity with the idea of optimization, linear algebra and some machine learning basics.</p>
<h3 id="overview">Overview</h3>
<p>Most machine learning problems end up as some form of optimization problem, thus a basic understanding of optimization is very useful, or sometimes necessary, to solve a given problem.</p>
<p>For instance, in simple linear regression, given some data \((y, X)\) and a model \(y \sim X\beta + \epsilon\), we aim to find the weights \(\beta\) that minimize:</p>
\[\beta^* = \text{argmin}_\beta \|y - X\beta\|^2_2.\]
<p>In general, we consider the following basic problem:</p>
\[x^* = \text{argmin}_{x\in\mathcal{X}} f(x)\]
<p>subject to constraints:</p>
\[g_i(x) \le 0\\
h_j(x) = 0.\]
<p>Convex optimization deals with problems in which \(f(x)\) and \(g_i(x)\) are convex functions, and \(h_j(x)\) are affine (of the form \(a_j^Tx = b_j\)).</p>
<h3 id="1-why-care-about-convex-optimization">1. Why care about convex optimization?</h3>
<p>There are a range of reasons:</p>
<ol>
<li>When your problem is convex, a locally optimal solution is globally optimal, so gradient-based methods can be used with confidence</li>
<li>Shows up in common optimization problems
<ul>
<li>Linear least squares</li>
<li>Logistic regression</li>
<li>Weighted least squares</li>
<li>Any of these with L1 or L2 regularization</li>
</ul>
</li>
<li>There is a lot of associated theory. Convexity is quite a strict requirement, this provides a lot of structure, which mean we can apply strong theory and geometric intuitions which can provide a good understanding of a problem</li>
<li>Can be relevant even for non-convex problems:
<ul>
<li>Can turn into a convex problem (primal -> dual problem, see below)</li>
<li>Can approximate with a convex function to initialize a local optimization method</li>
<li>Common heuristics: convex relaxation for finding sparse solutions, e.g. \(L_0\) to \(L_1\) relaxation</li>
<li>Bounds for global optimization</li>
</ul>
</li>
<li>Concepts that naturally arise in convex optimization are important elsewhere, like the theory of Lagrangians</li>
</ol>
<p>Somewhat like linear algebra, because you can do a lot with convex optimization, it is quite foundational to optimization.</p>
<p>Ok, but what is a convex function?</p>
<h3 id="2-basics-of-convex-functions-and-sets">2. Basics of convex functions and sets</h3>
<p>A convex <em>set</em> is a set \(C\) in which the line segment connecting any two points in the set is also in the set. That is, if \(x_1,x_2\in C\) and \(0\le \theta \le 1\) then</p>
\[\theta x_1 + (1-\theta)x_2 \in C.\]
<p>Some examples (the middle one is <em>not</em> convex):
<img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/convexset.png" alt="title" /></p>
<p>A <em>function</em> \(f:\mathbb{R}^n\to\mathbb{R}\) is convex if \(\text{dom} f\) is convex and for \(0\le \theta \le 1, x_1, x_2 \in \text{dom} f\):</p>
\[f(\theta x_1 + (1-\theta)x_2) \le \theta f(x_1) + (1-\theta) f(x_2)\]
<p>This means that a line segment connecting any two points on the graph of \(f\) lies above the graph of \(f\):
<img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/convexfunc.png" alt="" /></p>
<h3 id="an-alternative-definition">An alternative definition</h3>
<p>If \(f\) is differentiable, then \(f\) is convex if and only if \(\text{dom} f\) is convex and</p>
\[f(y) \ge f(x) + \nabla f(x)^T(y-x)\]
<p>holds for all \(x,y\in\text{dom} f\).</p>
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/convexfunc2.png" alt="" /></p>
<p>This means local information of a convex function can tell us about global information of the function – this is a key property of convex functions.</p>
<p>For instance, if \(\nabla f(x) = 0\) then for all \(y\in\text{dom} f\) it is the case that \(f(y) \ge f(x)\). In other words, \(x\) is a global minimizer of \(f\).</p>
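<p>The first-order condition is easy to test numerically. The sketch below (Python with NumPy; the function \(f(x) = x^2\) is an arbitrary convex example of my own) checks it at random pairs of points:</p>

```python
import numpy as np

# First-order characterization of the convex f(x) = x^2:
#   f(y) >= f(x) + f'(x) * (y - x) for all x, y.
f = lambda x: x**2
df = lambda x: 2.0 * x

rng = np.random.default_rng(0)
x, y = rng.uniform(-10.0, 10.0, size=(2, 1000))
gap = f(y) - (f(x) + df(x) * (y - x))   # equals (y - x)^2, never negative
min_gap = gap.min()
```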
<p>A few more definitions:</p>
<p><strong>Strict convexity</strong></p>
<p>A function \(f\) is <em>strictly convex</em> if the inequality holds strictly whenever \(x\ne y\) and \(0 < \theta < 1\). E.g. a linear function is convex but not strictly convex.</p>
<p><strong>Strong convexity</strong></p>
<p><em>Strong convexity</em> of a (twice-differentiable) function means there is some positive \(m\) such that:</p>
\[\nabla^2 f(x) \succeq mI\]
<p>which can be shown to be equivalent to</p>
\[f(y) \ge f(x) + \nabla f(x)^T(y-x) + \frac{m}{2}\|x - y\|^2_2\]
<p>for all \(x,y\in \mathcal{X}\).</p>
<p>This means the function can be lower bounded by a quadratic function with some fixed second derivative \(mI\) at all points \(x\in\mathcal{X}\).</p>
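<p>Numerically, this distinction is easy to see (a sketch with NumPy; the choices of \(m\) and test points are my own): for \(f(x) = x^2\) the bound with \(m = 2\) holds everywhere (with equality, since \(f'' = 2\)), while for \(f(x) = \exp(x)\) the same bound fails where the function flattens out:</p>

```python
import numpy as np

# f(x) = x^2 has f''(x) = 2 everywhere, so it is strongly convex with m = 2,
# and the quadratic lower bound holds (here, with equality):
rng = np.random.default_rng(1)
x, y = rng.uniform(-5.0, 5.0, size=(2, 1000))
m = 2.0
slack_quad = y**2 - (x**2 + 2.0 * x * (y - x) + 0.5 * m * (y - x) ** 2)

# exp(x) is convex but not strongly convex: the same bound already fails
# where the function flattens out, e.g. at x0 = -10, y0 = -20.
x0, y0 = -10.0, -20.0
slack_exp = np.exp(y0) - (np.exp(x0) + np.exp(x0) * (y0 - x0)
                          + 0.5 * m * (y0 - x0) ** 2)
# slack_quad is ~0 everywhere; slack_exp is large and negative.
```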
<h3 id="some-examples-of-convex-functions">Some examples of convex functions</h3>
<ol>
<li>The indicator function of a convex set, \(S\).</li>
</ol>
<p>If \(I_S(x)\) is defined as</p>
\[I_S(x) = \begin{cases}
0, \quad x\in S;\\
+\infty, \quad\text{else}
\end{cases}\]
<p>Then \(I_S(x)\) is convex</p>
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/indicator_function.png" alt="title" /></p>
<ol>
<li>Norms. Any norm is a convex function. Here is the \(L_1\) norm:</li>
</ol>
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/l1_plot.png" alt="title" /></p>
<ol>
<li>Quadratic functions: \(f(x) = x^T P x + 2q^T x + r\) for \(P\) positive definite. The linear least squares in the introduction is a quadratic function of this form. For example:</li>
</ol>
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/quad_plot.png" alt="title" /></p>
<ol>
<li>Common functions: \(1/x\) for \(x>0\), \(e^x\) for \(x\in\mathbb{R}\), \(x^2\) for \(x\in\mathbb{R}\). For instance:</li>
</ol>
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/squared.png" alt="title" /></p>
<p><img src="https://raw.githubusercontent.com/benlansdell/expositions/gh-pages/assets/img/inverse.png" alt="title" /></p>
<h3 id="examples-of-strong-convexity">Examples of strong convexity</h3>
<p>\(f(x) = x^2\), somewhat trivially, <em>is</em> a strongly convex function.</p>
<p>\(f(x) = \exp(x)\) is <em>not</em> a strongly convex function. The general idea is that functions which become arbitrarily flat/linear in some direction are not strongly convex. This is shown below: for the quadratic function we can find an \(m\) such that a quadratic approximation at any point \(x_0\) with second derivative \(m\) always lies below \(f(x)\). This is not true for the exponential function.</p>
<iframe width="100%" height="849" frameborder="0" src="https://observablehq.com/embed/@benlansdell/convex-optimization-tutorials?cells=viewof+x0%2Cviewof+m%2Cx2_strong_convex%2Cexp_strong_convex"></iframe>
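<p>The same check can be done in a few lines of Python (a sketch, not from the original post): we compute the gap \(f(y) - [f(x) + f'(x)(y-x) + \tfrac{m}{2}(y-x)^2]\), which is nonnegative everywhere exactly when \(f\) is \(m\)-strongly convex:</p>

```python
import math

def strong_convexity_gap(f, df, m, x, y):
    """f(y) - [f(x) + f'(x)(y-x) + (m/2)(y-x)^2]; >= 0 everywhere iff m-strongly convex."""
    return f(y) - (f(x) + df(x) * (y - x) + 0.5 * m * (y - x) ** 2)

# f(x) = x^2 is 2-strongly convex: the gap is identically zero for m = 2.
sq_ok = all(
    strong_convexity_gap(lambda t: t * t, lambda t: 2 * t, 2.0, x, y) >= -1e-12
    for x in [-3.0, -1.0, 0.0, 2.0] for y in [-2.0, 0.0, 1.0, 4.0]
)

# exp(x) is not m-strongly convex for any fixed m > 0:
# far to the left the function is too flat, and the gap goes negative.
violated = strong_convexity_gap(math.exp, math.exp, 0.1, -10.0, -5.0) < 0

print(sq_ok, violated)
```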
<h3 id="operations-that-preserve-convexity">Operations that preserve convexity</h3>
<p>Convexity is a strong property, so it’s useful to know which operations preserve it.</p>
<ol>
<li>Non-negative weighted sums.</li>
</ol>
<p>If \(f_i\) are each convex, then the nonnegative weighted sum:</p>
\[f(x) = \sum_i w_i f_i(x)\]
<p>is convex, for \(w_i \ge 0\).</p>
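<p>A quick numerical illustration (a minimal sketch with illustrative weights): \(f(x) = 2|x| + 3x^2\) is a nonnegative weighted sum of the convex functions \(|x|\) and \(x^2\), and indeed satisfies the midpoint convexity inequality:</p>

```python
# Midpoint-convexity check of f(x) = 2*|x| + 3*x**2, a nonnegative weighted
# sum of the convex functions |x| and x**2.
def f(x):
    return 2 * abs(x) + 3 * x ** 2

pairs = [(-2.0, 1.0), (0.5, 3.0), (-1.5, -0.25)]
sum_ok = all(
    f(0.5 * (x + y)) <= 0.5 * f(x) + 0.5 * f(y) + 1e-12
    for x, y in pairs
)
print(sum_ok)
```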
<ol>
<li>Composition with nondecreasing functions.</li>
</ol>
<p>Let \(f(x) = h(g(x))\), where \(h:\mathbb{R}\to\mathbb{R}\) is a scalar function. Then, for example, if \(h\) is convex and nondecreasing, and \(g\) is convex, then \(f\) is convex.</p>
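<p>For instance (a minimal sketch, not from the original post): with \(h(u) = e^u\) (convex and nondecreasing) and \(g(x) = x^2\) (convex), the composition \(e^{x^2}\) is convex. Dropping “nondecreasing” breaks the rule: \(h(u) = e^{-u}\) is convex but decreasing, and \(e^{-x^2}\) is not convex:</p>

```python
import math

grid = [i / 4 for i in range(-8, 9)]

# h(u) = exp(u) is convex and nondecreasing; g(x) = x**2 is convex,
# so f(x) = exp(x**2) is convex by the composition rule.
def f(x):
    return math.exp(x ** 2)

comp_ok = all(
    f(0.5 * (x + y)) <= 0.5 * f(x) + 0.5 * f(y) + 1e-12
    for x in grid for y in grid
)

# h(u) = exp(-u) is convex but decreasing, and g(x) = exp(-x**2)
# fails midpoint convexity (e.g. at x = -1, y = 1).
def g(x):
    return math.exp(-x ** 2)

not_convex = any(
    g(0.5 * (x + y)) > 0.5 * g(x) + 0.5 * g(y) + 1e-12
    for x in grid for y in grid
)

print(comp_ok, not_convex)
```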
<ol>
<li>Pointwise maxima.</li>
</ol>
<p>If \(f_1\) and \(f_2\) are convex, then the pointwise maximum:</p>
\[f(x) = \max\{f_1(x), f_2(x)\}\]
<p>is also convex. The proof is simple:</p>
\[\begin{align}
f(\theta x + (1-\theta)y) &= \max\{f_1(\theta x + (1-\theta)y), f_2(\theta x + (1-\theta)y)\}\\
&\le \max\{\theta f_1(x) + (1-\theta)f_1(y), \theta f_2(x) + (1-\theta)f_2(y)\}\quad\text{(conv. of $f_1,f_2$)}\\
&\le \theta\max\{ f_1(x), f_2(x) \} + (1-\theta)\max\{f_1(y),f_2(y)\}\quad\text{(replace $f_1$ w. max $f_1,f_2$)}\\
&= \theta f(x) + (1-\theta) f(y)
\end{align}\]
<p>This result extends to pointwise maximum over \(n\) functions:</p>
\[f(x) = \max\{f_i(x)\}_{i=1}^n\]
<p>and also the pointwise supremum over an infinite set of convex functions. Let \(\{f_i(x)\}_{i\in I}\) be a collection of convex functions, then</p>
\[g(x) = \sup_{i\in I}f_i(x)\]
<p>is convex.</p>
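<p>As a numerical illustration (a minimal sketch, not from the original post), the pointwise maximum of the two convex quadratics \((x-1)^2\) and \((x+1)^2\) again satisfies the convexity inequality:</p>

```python
# Pointwise maximum of two convex quadratics stays convex.
def f1(x):
    return (x - 1) ** 2

def f2(x):
    return (x + 1) ** 2

def f(x):
    return max(f1(x), f2(x))

grid = [i / 5 for i in range(-15, 16)]
max_ok = all(
    f(0.5 * (x + y)) <= 0.5 * f(x) + 0.5 * f(y) + 1e-12
    for x in grid for y in grid
)
print(max_ok)
```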
<h3 id="3-convergence-of-gradient-descent">3. Convergence of gradient descent</h3>
<p>We demonstrate how the ideas presented here can be used to study optimization algorithms. We’ll prove the convergence of gradient descent for strongly convex functions. The argument is as follows.</p>
<p>For a strongly convex function that satisfies:</p>
\[\alpha I \preceq \nabla^2 f(x) \preceq \beta I\]
<p>for all \(x\in\mathcal{X}\) and \(0<\alpha \le \beta\), an equivalent condition is</p>
\[f(y) \ge f(x) + \nabla f(x)(y-x) + \frac{\alpha}{2}\|y-x\|^2,\quad \forall x,y\in\mathcal{X}\]
<p>which is known as \(\alpha\)-strong convexity. The related condition</p>
\[f(y) \le f(x) + \nabla f(x)(y-x) + \frac{\beta}{2}\|y-x\|^2,\quad \forall x,y\in\mathcal{X}\]
<p>is known as \(\beta\)-smoothness.</p>
<p>In other words, for all points \(x\in\mathcal{X}\), the function \(f(y)\) can be bounded below and above by quadratic functions intersecting at \(f(x)\).</p>
<p>This can be used to show the following inequalities:</p>
\[\begin{align}
\frac{\alpha}{2}\|x^* - x\|^2&\le f(x) - f(x^*) \le \frac{1}{2\alpha}\|\nabla f(x)\|^2\quad \text{$\alpha$-strongly convex}\\
\frac{\beta}{2}\|x^* - x\|^2&\ge f(x) - f(x^*) \ge \frac{1}{2\beta}\|\nabla f(x)\|^2\quad\text{$\beta$-smoothness}\\
\end{align}\]
<p>Call the quantity \(h(x) = f(x)-f(x^*)\) the <em>primal gap</em>, the thing we are trying to reduce in our optimization.</p>
<p>These inequalities are useful because they let us bound the primal gap by the gradient and the amount we’re moving in \(x\) (a property of the algorithm, which is known).</p>
<p>Now consider the gradient descent update:</p>
\[x_{t+1} = x_t - \frac{1}{\beta}\nabla f(x_t)\]
<p>Then the above inequalities can be used to show how the primal gap converges:</p>
\[\begin{align}
h_{t+1} - h_t &= f(x_{t+1}) - f(x_t)\\
&\le \nabla f(x_t)(x_{t+1}-x_t) + \frac{\beta}{2}\|x_{t+1}-x_t\|^2\quad (\text{$\beta$-smoothness})\\
&= -\frac{1}{\beta}\|\nabla f(x_t)\|^2 + \frac{1}{2\beta}\|\nabla f(x_t)\|^2\quad (\text{definition of algorithm})\\
&= -\frac{1}{2\beta}\|\nabla f(x_t)\|^2\\
&\le -\frac{\alpha}{\beta}h_t\quad (\text{since $h_t \le \tfrac{1}{2\alpha}\|\nabla f(x_t)\|^2$})\\
\end{align}\]
<p>Thus</p>
\[h_{t+1} \le h_t\left(1 - \frac{\alpha}{\beta}\right),\]
<p>or</p>
\[h_{t} \le h_0\left(1 - \frac{\alpha}{\beta}\right)^t.\]
<p>Since \(0 < \alpha \le \beta\), the algorithm converges. Further, the ratio \(\alpha/\beta\) determines the convergence rate – convergence is fastest when \(\alpha\) is close to \(\beta\). This corresponds to the Hessian being close to spherical (well-conditioned).</p>
<p>As an example, consider the quadratic function:</p>
\[f(x,y) = \frac{m_x}{2}x^2 + \frac{m_y}{2}y^2.\]
<p>This is strongly convex, with \(\alpha = \min(m_x, m_y)\) and \(\beta = \max(m_x, m_y)\). In the widget below we observe the convergence behavior (the primal gap as a function of gradient descent iteration). The green trace shows the progression of the optimization for 20 iterations, starting at (1,1). You can play with the function’s parameters to see how they affect the rate of convergence. Apart from the first iteration, the gap is linear on a log scale, as the above analysis suggests; the slope depends on the ratio between \(\alpha\) and \(\beta\).</p>
<iframe width="100%" height="637" frameborder="0" src="https://observablehq.com/embed/@benlansdell/convex-optimization-tutorials?cells=viewof+alpha%2Cviewof+beta%2Cgd_widget"></iframe>
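<p>For readers who prefer code to widgets, here is a minimal Python sketch of the same experiment (the particular values of \(m_x, m_y\) are illustrative): gradient descent with step size \(1/\beta\) on the quadratic above, checking that the primal gap contracts by at least a factor of \(1-\alpha/\beta\) per step (here \(x^* = (0,0)\) and \(f(x^*) = 0\)):</p>

```python
# Gradient descent on f(x, y) = (m_x/2) x^2 + (m_y/2) y^2 with step 1/beta,
# starting from (1, 1).
m_x, m_y = 1.0, 4.0
alpha, beta = min(m_x, m_y), max(m_x, m_y)

def f(p):
    x, y = p
    return 0.5 * m_x * x ** 2 + 0.5 * m_y * y ** 2

def grad(p):
    x, y = p
    return (m_x * x, m_y * y)

p = (1.0, 1.0)
gaps = [f(p)]          # primal gap h_t = f(x_t) - f(x*) with f(x*) = 0
for _ in range(20):
    gx, gy = grad(p)
    p = (p[0] - gx / beta, p[1] - gy / beta)
    gaps.append(f(p))

# Every step should obey the bound h_{t+1} <= (1 - alpha/beta) * h_t
rate = 1 - alpha / beta
bound_ok = all(gaps[t + 1] <= rate * gaps[t] + 1e-15 for t in range(20))
print(bound_ok, gaps[-1])
```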
<h3 id="conclusion">Conclusion</h3>
<p>We’ve seen here the basic components of convex optimization, including what convexity is, some properties of convex functions, and how these relate to optimization with gradient descent. In the next post we will study how the theory of convex functions applies to Lagrange optimization and related concepts.</p>Ben LansdellHere I cover a basic introduction to concepts and theory of convex optimization. The goal is to give an impression of why this is an important area of optimization, what its applications are, and some intuition for how it works. This is of course not meant to overview all areas of convex optimization, it’s a huge topic, but more to give a flavor of the area by describing some results and theory, particularly as they relate to other areas that may be familiar to some (e.g. the method of Lagrange multipliers). By presenting this in a notebook the aim is to focus on providing some geometric intuition whenever possible through plotting simple examples whose parameters you can play with. Images not generated in this notebook are taken from one of the standard references: Convex Optimization, by Boyd and Vandenberghe.