JanetRossini.github.io

Lua, LSL, Blender, Python in Second Life


Project maintained by JanetRossini

Faster Linearize?

Aug 8, 2025 • [luamoverstesting]


In my never-ending quest for something to think about, today we’ll try to make linearizing a Bezier a bit faster, with a bit less memory impact. We’ll measure what we can.

The most important criterion for the new scheme is that it gets the same (correct) answer as the old scheme. We’ll write that test first.

_:test('new scheme gets same result', function()
    local old_split = splitter:split(3)
    local new_split = splitter:new_split(3)
    _:expect(new_split).is(old_split)
end)

I don’t really even know what comes back from split, but whatever it is, we want the same thing from new_split, at least for now.

I ;make the test pass trivially:

function BezierSplitter:new_split(numberOfSplits)
    return self:split(numberOfSplits)
end

Test passes and I bet it’s not much slower, either. (hahaha) Let’s do some work. My plan, tentatively, goes something like this:

  1. Create all the results into the same array, appending each level of split right onto the end.
  2. Next cycle just reads the current splits and splits them again, onto the end.
  3. When done, copy the final result to the output and return it.

I’m also planning to pre-allocate the array, and reuse it, to avoid some allocation.

As always, this may all turn out not to be worth it. We’ll see how things go. Can’t learn without trying.

After more effort than it might have been, I have the test passing with this:

function BezierSplitter:new_split(numberOfSplits)
    local array = table.clone(self.array)
    local start = 1
    local stop
    local new_start
    for s = 1, numberOfSplits do
        new_start = #array + 1
        stop = #array - start
        for i  = 1, stop, 4 do
            local a = array[start+i - 1]
            local b = array[start+i]
            local c = array[start+i + 1]
            local d = array[start+i + 2]
            local e = (a+b)/2
            local f = (b+c)/2
            local g = (c+d)/2
            local h = (e+f)/2
            local j = (f+g)/2
            local k = (h+j)/2
            for _, p in {a,e,h,k, k,j,g,d} do
                table.insert(array, p)
            end
        end
        start = new_start
    end
    local answer = {}
    table.move(array, start, #array, 1, answer)
    return answer
end

I don’t love this, and it took a bit of hammering to make it work but it is passing the test, which means it is getting the same answer as the old way, which is kind of the base line for an improved algorithm.

I’ll refactor some of the index calculations a bit:

Eeek, that didn’t work. Let me commit this much to git: then I can bet back to working after changes don’t work.

OK, this is a bit simpler and still works:

function BezierSplitter:new_split(numberOfSplits)
    local array = table.clone(self.array)
    local start = 1
    local stop
    local next_start
    for s = 1, numberOfSplits do
        next_start = #array + 1
        stop = #array
        for i  = start, stop, 4 do
            local a = array[i]
            local b = array[i + 1]
            local c = array[i + 2]
            local d = array[i + 3]
            local e = (a+b)/2
            local f = (b+c)/2
            local g = (c+d)/2
            local h = (e+f)/2
            local j = (f+g)/2
            local k = (h+j)/2
            for _, p in {a,e,h,k, k,j,g,d} do
                table.insert(array, p)
            end
        end
        start = next_start
    end
    local answer = {}
    table.move(array, start, #array, 1, answer)
    return answer
end

I guess I’m interested now in whether this is any faster. I don’t think it’s more clear, but maybe that can be improved, or maybe it’s OK if we get any speed. I have high doubts about that.

_:test("new vs old timing", function()
    local n = 10000
    local t_1 = os.clock()
    for i = 1, n do
        splitter:split(3)
    end
    local t_2 = os.clock()
    for i = 1, n do
        splitter:new_split(3)
    end
    local t_3 = os.clock()
    local t_old = t_2 - t_1
    local t_new = t_3 - t_2
    print(t_old, t_new, t_new/t_old)
end)

Result:

0.21233299997402355 0.18347204168094322 0.8640769061021549

So 14 percent improvement. Not terrible but not incredibly impressive either. If we use this thing at all, which I suspect we won’t, will want it fast, because it is in the very frequent timer event. We are trading off the cost of linearizing with the cost of computing several bezier accesses, because using the PolylinePath is much faster than computing a Bezier, but setting it up, with this code, takes some initial time.

Let’s commit again and see if we can find any time.

In actual use, we take the 32 points that the splitter creates and remove the fence posts to make a polyline. We could save that time if we didn’t create the fence posts to begin with. I am hesitant to do that because it will break my existing correctness test. What else might we do? Ah yes, this:

function BezierSplitter:new_split(numberOfSplits)
    local a,b,c,d,e,f,g,h,j,k
    local array = table.clone(self.array)
    local start = 1
    local stop
    local next_start
    for s = 1, numberOfSplits do
        stop = #array
        next_start = stop + 1
        for i  = start, stop, 4 do
            a = array[i]
            b = array[i + 1]
            c = array[i + 2]
            d = array[i + 3]
            e = (a+b)/2
            f = (b+c)/2
            g = (c+d)/2
            h = (e+f)/2
            j = (f+g)/2
            k = (h+j)/2
            for _, p in {a,e,h,k, k,j,g,d} do
                table.insert(array, p)
            end
        end
        start = next_start
    end
    local answer = {}
    -- print('#', #array, 'start', start)
    table.move(array, start, #array, 1, answer)
    return answer
end

Test again for speed. Not much difference, which surprises me a bit. I’ll make the same change to the old method. It remains unclear which we’ll wind up using.

Looking at that function raises a question in my mind:

function BezierSplitter:split_by_fours(array)
    local a,b,c,d,e,f,g,h,j,k
    local result = {}
    for i = 1, #array, 4 do
        a = array[i]
        b = array[i+1]
        c = array[i+2]
        d = array[i+3]
        e = (a+b)/2
        f = (b+c)/2
        g = (c+d)/2
        h = (e+f)/2
        j = (f+g)/2
        k = (h+j)/2
        for _, p in {a,e,h,k, k,j,g,d} do
            table.insert(result, p)
        end
    end
    return result
end

Couldn’t we use table.move here, to some advantage? I put that in both, like this:

function BezierSplitter:split_by_fours(array)
    local a,b,c,d,e,f,g,h,j,k
    local result = {}
    for i = 1, #array, 4 do
        a = array[i]
        b = array[i+1]
        c = array[i+2]
        d = array[i+3]
        e = (a+b)/2
        f = (b+c)/2
        g = (c+d)/2
        h = (e+f)/2
        j = (f+g)/2
        k = (h+j)/2
        table.move({a,e,h,k, k,j,g,d}, 1, 8, #result+1, result)
    end
    return result
end

Similarly in the new split. It makes very little difference to the time, if any. I wonder if the # in there is troublesome. Computing it seems to make no difference. We’ll leave the moves in there, they might be marginally faster and are certainly more compact.

With these changes in place, the new scheme is around 88, 89, 90 percent of the old one. I have not tried pre-allocating the result array, and having seen the code, I am a bit less inclined to do so, because we are relying on the length of the array from time to time, and that would not work with a pre-allocated array.

We can do about 60,000 of these operations per second on my M1 Mac and we need at most ten per second in actual use.

Comparing the code:


function BezierSplitter:split(numberOfSplits)
    local array = self.array
    for i = 1, numberOfSplits do
        array = self:split_by_fours(array)
    end
    return array
end

function BezierSplitter:split_by_fours(array)
    local a,b,c,d,e,f,g,h,j,k
    local result = {}
    for i = 1, #array, 4 do
        a = array[i]
        b = array[i + 1]
        c = array[i + 2]
        d = array[i + 3]
        e = (a+b)/2
        f = (b+c)/2
        g = (c+d)/2
        h = (e+f)/2
        j = (f+g)/2
        k = (h+j)/2
        table.move({a,e,h,k, k,j,g,d}, 1, 8, #result+1, result)
    end
    return result
end

function BezierSplitter:new_split(numberOfSplits)
    local a,b,c,d,e,f,g,h,j,k
    local array = table.clone(self.array)
    local start = 1
    local stop
    local next_start
    for s = 1, numberOfSplits do
        stop = #array
        next_start = stop + 1
        for i  = start, stop, 4 do
            a = array[i]
            b = array[i + 1]
            c = array[i + 2]
            d = array[i + 3]
            e = (a+b)/2
            f = (b+c)/2
            g = (c+d)/2
            h = (e+f)/2
            j = (f+g)/2
            k = (h+j)/2
            table.move({a,e,h,k, k,j,g,d}, 1, 8, #array+1, array)
        end
        start = next_start
    end
    local answer = {}
    table.move(array, start, #array, 1, answer)
    return answer
end

Once again, having done the work and run the test, I think the savings is not worth the complexity. I’ll commit a final version and then remove this stuff. It’ll be in git if we ever want it.

I appear to be oh for two on the optimizations in this part of the season. That’s fine: I’m here to try ideas, learning what SLua is all about, and filling up with ideas about what might be useful when it finally arrives.

OK, I do admit that I’d have been happier had the new scheme been twice as fast, but that was quite unlikely. The time is probably mostly taken up in creating vectors: we create something \around 64 new ones in the course of either process. And we know that Lua and Luau have gone to a lot of trouble to make tables fast, since they are the main, well, only data structure.

Still, solid answer. In my view, the gain was again not worth the added complexity in the code. Maybe next time.

Safe paths!