In my never-ending quest for something to think about, today we’ll try to make linearizing a Bezier a bit faster, with a bit less memory impact. We’ll measure what we can.
The most important criterion for the new scheme is that it gets the same (correct) answer as the old scheme. We’ll write that test first.
_:test('new scheme gets same result', function()
local old_split = splitter:split(3)
local new_split = splitter:new_split(3)
_:expect(new_split).is(old_split)
end)
I don’t really even know what comes back from split, but whatever it is, we want the same thing from new_split, at least for now.
I ;make the test pass trivially:
function BezierSplitter:new_split(numberOfSplits)
return self:split(numberOfSplits)
end
Test passes and I bet it’s not much slower, either. (hahaha) Let’s do some work. My plan, tentatively, goes something like this:
I’m also planning to pre-allocate the array, and reuse it, to avoid some allocation.
As always, this may all turn out not to be worth it. We’ll see how things go. Can’t learn without trying.
After more effort than it might have been, I have the test passing with this:
function BezierSplitter:new_split(numberOfSplits)
local array = table.clone(self.array)
local start = 1
local stop
local new_start
for s = 1, numberOfSplits do
new_start = #array + 1
stop = #array - start
for i = 1, stop, 4 do
local a = array[start+i - 1]
local b = array[start+i]
local c = array[start+i + 1]
local d = array[start+i + 2]
local e = (a+b)/2
local f = (b+c)/2
local g = (c+d)/2
local h = (e+f)/2
local j = (f+g)/2
local k = (h+j)/2
for _, p in {a,e,h,k, k,j,g,d} do
table.insert(array, p)
end
end
start = new_start
end
local answer = {}
table.move(array, start, #array, 1, answer)
return answer
end
I don’t love this, and it took a bit of hammering to make it work but it is passing the test, which means it is getting the same answer as the old way, which is kind of the base line for an improved algorithm.
I’ll refactor some of the index calculations a bit:
Eeek, that didn’t work. Let me commit this much to git: then I can bet back to working after changes don’t work.
OK, this is a bit simpler and still works:
function BezierSplitter:new_split(numberOfSplits)
local array = table.clone(self.array)
local start = 1
local stop
local next_start
for s = 1, numberOfSplits do
next_start = #array + 1
stop = #array
for i = start, stop, 4 do
local a = array[i]
local b = array[i + 1]
local c = array[i + 2]
local d = array[i + 3]
local e = (a+b)/2
local f = (b+c)/2
local g = (c+d)/2
local h = (e+f)/2
local j = (f+g)/2
local k = (h+j)/2
for _, p in {a,e,h,k, k,j,g,d} do
table.insert(array, p)
end
end
start = next_start
end
local answer = {}
table.move(array, start, #array, 1, answer)
return answer
end
I guess I’m interested now in whether this is any faster. I don’t think it’s more clear, but maybe that can be improved, or maybe it’s OK if we get any speed. I have high doubts about that.
_:test("new vs old timing", function()
local n = 10000
local t_1 = os.clock()
for i = 1, n do
splitter:split(3)
end
local t_2 = os.clock()
for i = 1, n do
splitter:new_split(3)
end
local t_3 = os.clock()
local t_old = t_2 - t_1
local t_new = t_3 - t_2
print(t_old, t_new, t_new/t_old)
end)
Result:
0.21233299997402355 0.18347204168094322 0.8640769061021549
So 14 percent improvement. Not terrible but not incredibly impressive either. If we use this thing at all, which I suspect we won’t, will want it fast, because it is in the very frequent timer event. We are trading off the cost of linearizing with the cost of computing several bezier accesses, because using the PolylinePath is much faster than computing a Bezier, but setting it up, with this code, takes some initial time.
Let’s commit again and see if we can find any time.
In actual use, we take the 32 points that the splitter creates and remove the fence posts to make a polyline. We could save that time if we didn’t create the fence posts to begin with. I am hesitant to do that because it will break my existing correctness test. What else might we do? Ah yes, this:
function BezierSplitter:new_split(numberOfSplits)
local a,b,c,d,e,f,g,h,j,k
local array = table.clone(self.array)
local start = 1
local stop
local next_start
for s = 1, numberOfSplits do
stop = #array
next_start = stop + 1
for i = start, stop, 4 do
a = array[i]
b = array[i + 1]
c = array[i + 2]
d = array[i + 3]
e = (a+b)/2
f = (b+c)/2
g = (c+d)/2
h = (e+f)/2
j = (f+g)/2
k = (h+j)/2
for _, p in {a,e,h,k, k,j,g,d} do
table.insert(array, p)
end
end
start = next_start
end
local answer = {}
-- print('#', #array, 'start', start)
table.move(array, start, #array, 1, answer)
return answer
end
Test again for speed. Not much difference, which surprises me a bit. I’ll make the same change to the old method. It remains unclear which we’ll wind up using.
Looking at that function raises a question in my mind:
function BezierSplitter:split_by_fours(array)
local a,b,c,d,e,f,g,h,j,k
local result = {}
for i = 1, #array, 4 do
a = array[i]
b = array[i+1]
c = array[i+2]
d = array[i+3]
e = (a+b)/2
f = (b+c)/2
g = (c+d)/2
h = (e+f)/2
j = (f+g)/2
k = (h+j)/2
for _, p in {a,e,h,k, k,j,g,d} do
table.insert(result, p)
end
end
return result
end
Couldn’t we use table.move here, to some advantage? I put that in both, like this:
function BezierSplitter:split_by_fours(array)
local a,b,c,d,e,f,g,h,j,k
local result = {}
for i = 1, #array, 4 do
a = array[i]
b = array[i+1]
c = array[i+2]
d = array[i+3]
e = (a+b)/2
f = (b+c)/2
g = (c+d)/2
h = (e+f)/2
j = (f+g)/2
k = (h+j)/2
table.move({a,e,h,k, k,j,g,d}, 1, 8, #result+1, result)
end
return result
end
Similarly in the new split. It makes very little difference to the time, if any. I wonder if the # in there is troublesome. Computing it seems to make no difference. We’ll leave the moves in there, they might be marginally faster and are certainly more compact.
With these changes in place, the new scheme is around 88, 89, 90 percent of the old one. I have not tried pre-allocating the result array, and having seen the code, I am a bit less inclined to do so, because we are relying on the length of the array from time to time, and that would not work with a pre-allocated array.
We can do about 60,000 of these operations per second on my M1 Mac and we need at most ten per second in actual use.
Comparing the code:
function BezierSplitter:split(numberOfSplits)
local array = self.array
for i = 1, numberOfSplits do
array = self:split_by_fours(array)
end
return array
end
function BezierSplitter:split_by_fours(array)
local a,b,c,d,e,f,g,h,j,k
local result = {}
for i = 1, #array, 4 do
a = array[i]
b = array[i + 1]
c = array[i + 2]
d = array[i + 3]
e = (a+b)/2
f = (b+c)/2
g = (c+d)/2
h = (e+f)/2
j = (f+g)/2
k = (h+j)/2
table.move({a,e,h,k, k,j,g,d}, 1, 8, #result+1, result)
end
return result
end
function BezierSplitter:new_split(numberOfSplits)
local a,b,c,d,e,f,g,h,j,k
local array = table.clone(self.array)
local start = 1
local stop
local next_start
for s = 1, numberOfSplits do
stop = #array
next_start = stop + 1
for i = start, stop, 4 do
a = array[i]
b = array[i + 1]
c = array[i + 2]
d = array[i + 3]
e = (a+b)/2
f = (b+c)/2
g = (c+d)/2
h = (e+f)/2
j = (f+g)/2
k = (h+j)/2
table.move({a,e,h,k, k,j,g,d}, 1, 8, #array+1, array)
end
start = next_start
end
local answer = {}
table.move(array, start, #array, 1, answer)
return answer
end
Once again, having done the work and run the test, I think the savings is not worth the complexity. I’ll commit a final version and then remove this stuff. It’ll be in git if we ever want it.
I appear to be oh for two on the optimizations in this part of the season. That’s fine: I’m here to try ideas, learning what SLua is all about, and filling up with ideas about what might be useful when it finally arrives.
OK, I do admit that I’d have been happier had the new scheme been twice as fast, but that was quite unlikely. The time is probably mostly taken up in creating vectors: we create something \around 64 new ones in the course of either process. And we know that Lua and Luau have gone to a lot of trouble to make tables fast, since they are the main, well, only data structure.
Still, solid answer. In my view, the gain was again not worth the added complexity in the code. Maybe next time.
Safe paths!