Dan: Fucking skype mining bitcoins
Cas: bitcoins!?
Dan: How else do you make a chat app use 100% CPU on 5 out of 8 cores?
Seriously
Discord uses less CPU power while focused than skype does in the background
Anyway...
I'll add GPU profiling and finish my optimizations when I get home
Should be home 14-ish
Cas: cool. Don't be scared of the massive refactor
there's more to come - haven't renamed CasNode stuff yet.
Dan: I'm sure I'll have opinions on it =P
But not really worrying
A refactor was way overdue anyway
Cas: it's half the size now
much stuff just moved into the root voxoid package
Dan: Into?!
Cas: stuff like HDRManager was in a whole package on its own - unnecessary really
Dan: Ah
Cas: so any packages with like less than 3 classes in I just shoved in the main package
made a few classes package private
Dan: Err
Where have all the test classes gone?
I have nothing runnable naymore
anymore*
src-test is empty?
Cas: eh
oh you need to *properly* update...
do Update to Revision...
select HEAD
then Fully recursive depth
and Change working copy to selected depth
then everything will appear
Dan: so uh
english?
Cas: heh you'll see when you do Update to Revision
Dan: Update to Version?
Cas: yeh
Dan: Scene graph stuff is still gone
Cas: CasScene is now in SPGL2
Dan: Seems to run now thanks!
Cas: anything you sorely miss can still be dug out of SVN history if necessary
I won't do any more refactoring now so consider it stable
Dan: Why did you change the priority calculation?
Cas: I... didn't?
Dan: It needs a sqrt and is broken for the edges?
Yeah it was a float before now it's an int
double distance = Math.sqrt(distanceSquared(cx, cy, cz, c.x + chunkSize * 0.5f, c.y + chunkSize * 0.5f, c.z + chunkSize * 0.5f));
// Clamp to 0..63 for counting sort range
c.priority = Math.min(63, (int) distance / 15);
Cas: oh it was to try and do that more optimised chunky sorting
which was in the end a total waste of time
leave it be for now I might just back it all out
Dan: It's causing some issues around the edge of the screen.
Cas: aye
Dan: I'm upping the max value above 63 at least
Cas: you need to then also change the sort
it's statically sized
Dan: It's a radix sort?
Cas: it really should just be chucked out it was no help at all
counting sort is sort of a radix sort
with one radix
unfortunately it a) doesn't sort in place and b) has a fixed maximum range
Dan: The maximum range is causing issues around the edge of the screen right now.
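A minimal sketch of the counting sort being discussed, assuming 64 buckets and the `(int) distance / 15` bucket mapping from the quoted snippet. `ChunkCountingSort`, `Chunk` and `bucketFor` are hypothetical names for illustration, not the Voxoid classes. It shows both drawbacks Cas lists: the output goes into a second array, and every distance past the last bucket is clamped into the same bucket, so far-away chunks are no longer ordered relative to each other.

```java
final class ChunkCountingSort {
    /** Hypothetical stand-in for a terrain chunk; only the priority matters here. */
    static final class Chunk {
        final String name;
        final int priority;
        Chunk(String name, int priority) { this.name = name; this.priority = priority; }
    }

    /** Maps a distance to a bucket, clamping to the last bucket as in the quoted code. */
    static int bucketFor(double distance, int maxBucket) {
        return Math.min(maxBucket, (int) distance / 15);
    }

    /** Stable counting sort by priority; needs a second array and a known bucket count. */
    static Chunk[] sort(Chunk[] in, int bucketCount) {
        int[] start = new int[bucketCount + 1];
        for (Chunk c : in) start[c.priority + 1]++;                      // histogram
        for (int i = 0; i < bucketCount; i++) start[i + 1] += start[i]; // prefix sums
        Chunk[] out = new Chunk[in.length];                              // not in place
        for (Chunk c : in) out[start[c.priority]++] = c;                 // scatter, input order kept per bucket
        return out;
    }
}
```

With these assumptions, distances of 1000 and 2000 both land in bucket 63, which is one plausible reading of the artifacts Dan sees at the edge of the screen. Raising the maximum means raising `bucketCount` too, as Cas points out.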
Cas: yeah I know, it's been on my radar for a bit. I'll fix it, you worry about optimising unshadowed point lights and 2-stage ssao
ill make a card to fix it
when you start on something drag it to In Progress
Dan: Yeah this is a problem for future me/cas lol
Mkay
Cas: _sits and watches Trello_
Dan: ...
hai hai
Cas: heh
Dan: profiling results:
Frame 6297 : 4.099ms (100.0%)
    Camera 1 : 4.058ms (99.0%)
        Shadow map rendering : 1.056ms (25.7%)
            Shadow map 1 : 0.673ms (16.4%)
            Shadow map 2 : 0.381ms (9.3%)
        Clear main buffer : 0.025ms (0.6%)
        Terrain rendering : 1.409ms (34.3%)
        SSAO rendering : 1.036ms (25.2%)
        Skybox : 0.138ms (3.3%)
        Bloom and tone mapping : 0.39ms (9.5%)
(1440p no MSAA)
VoxoidSceneFactory is in the test package for some reason
Cas: there was some thinking to do about where exactly that belonged... I left it there for the time being
Dan: OK
Cas: got a meeting here...
Dan: OK I should be fine from now on
Thanks for the help
Cas: such as it was...
Dan: e-err?
It was really helpful since I had no idea what was up with the SVN stuff and such
Cas: meh that's what I do all day at work anyway
Dan: So I got some nice optimizations in.
Cas: what sort?
Dan: Shader optimizations.
My test scene has 500 large lights placed all within view of the camera.
Cas: ah yes I saw that. Wondered what the hell was going on
Dan: Got 135 FPS before.
After optimizations 152 FPS.
so not a huge game changing increase but a very useful one at least.
It went from 3 texture samples per light to 2.25 texture samples per light.
It now reads 4 light indices at a time.
It pretty much works as manual unrolling of the loop in a way.
(Like unrolling it so it processes 4 elements at a time)
It's essentially a free 20% cut in the GPU time needed to draw the terrain which is pretty nice.
Should be even more significant on lower-end hardware of course.
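The 3 to 2.25 figure is consistent with one index fetch plus two data fetches per light, where the index fetch now returns four indices at once. That breakdown is an assumption (the chat does not say what the three samples were), and `LightFetchCount` is a hypothetical name. A small sketch of the arithmetic:

```java
final class LightFetchCount {
    /** Assumed number of data texels fetched per light (not stated in the chat). */
    static final int DATA_FETCHES_PER_LIGHT = 2;

    /** Total texture fetches when each index fetch returns indicesPerFetch light indices. */
    static int totalFetches(int lightCount, int indicesPerFetch) {
        int indexFetches = (lightCount + indicesPerFetch - 1) / indicesPerFetch; // ceiling division
        return indexFetches + lightCount * DATA_FETCHES_PER_LIGHT;
    }

    /** Average fetches per light for a cluster with lightCount lights. */
    static double fetchesPerLight(int lightCount, int indicesPerFetch) {
        return (double) totalFetches(lightCount, indicesPerFetch) / lightCount;
    }
}
```

Under this model a cluster with 8 lights costs 3.0 fetches per light with one index per fetch and 2.25 with four, while a cluster with a single light costs 3.0 either way.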
Cas: that's the spirit
i will try it on the laptop later which is pretty much the lowest end hardware we might run it on
Dan: The increase only applies when you got a lot of point lights though
but as soon as you got more than 1 point light affecting a given cluster you get a decent boost.
Cas: I suppose in the midst of a firefight there might be a bit of that going on
Dan: Yeah it doesn't slow it down if there's only 1 or 0 lights affecting it.
Just the increase becomes more the more complex the scene is.
Cas: well move the card to Done then and on with the next thing!
Dan: It's not entirely done yet I got some CPU side changes to do as well...
Also the clustering is a bit on the slow side...
Easy to become CPU bottlenecked here... =/
Cas: k
Dan: I did some simple optimizations to it that helped a bit
and you can always just increase the cluster size for massive CPU performance boosts (at the cost of GPU performance of course).
Cas: is it multithreaded?
Dan: It is not but has potential for it.
Cas: I'll make a card for that
Dan: Multithreading it?
Cas: aye
Dan: My plan right now is to skip multithreading but not to dig myself a hole where I can't add it later
and then we'll just multithread the entire thing with my threading system later.
Cas: yup - have made a card for the Melting Pot, nothing we will need to act on unless we run out of CPU power
Dan: It should be very little effort to multithread this rather efficiently.
We may not get perfect scaling though... I'd say something like 3x scaling on a quad core sounds reasonable here.
The limiting factor will be the frustum culling I think.
Cas: talking of which have you optimised the culling yet
Dan: It's hard to efficiently thread a single frustum culling job so we'd be limited to threading each different frustum culling job separately
AKA the camera and the shadow maps.
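A rough sketch of what "one culling job per frustum" could look like with a plain `ExecutorService`. This is not Dan's threading system; `ParallelCulling`, `Frustum` and the 1D interval test are hypothetical simplifications so the example stays short. Each job is single-threaded internally, so the parallelism is capped by the number of frustums (camera plus shadow maps).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelCulling {
    /** Hypothetical frustum, reduced to a 1D interval to keep the sketch short. */
    static final class Frustum {
        final String name; final float min, max;
        Frustum(String name, float min, float max) { this.name = name; this.min = min; this.max = max; }
        boolean contains(float x) { return x >= min && x <= max; }
    }

    /** One single-threaded culling job: the visible subset for one frustum. */
    static List<Float> cull(Frustum f, float[] objects) {
        List<Float> visible = new ArrayList<>();
        for (float x : objects) if (f.contains(x)) visible.add(x);
        return visible;
    }

    /** Runs one job per frustum (camera, shadow map 1, shadow map 2, ...) in parallel. */
    static List<List<Float>> cullAll(List<Frustum> frustums, float[] objects, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<Float>>> jobs = new ArrayList<>();
            for (Frustum f : frustums) jobs.add(() -> cull(f, objects));
            List<List<Float>> results = new ArrayList<>();
            for (Future<List<Float>> done : pool.invokeAll(jobs)) results.add(done.get());
            return results; // same order as the input frustums
        } finally {
            pool.shutdown();
        }
    }
}
```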
Still haven't lol
You sound like my mom about that damn optimization lol
Cas: ill make a card for it
Dan: Alright pushed my light performance improvements.
Cas: what's next?
Dan: Updating trello! lol
woo
that was fun lol
Cas: feels like progress eh
Dan: something something quest log
Cas: +10xp
how about SSAO next then
the hard bit
Dan: pfff only 10 xp? what kind of quest giver are you?!
Cas: miserly
Dan: lol
Yeah I want to sort out some of the spaghetti-ish code before I move on to other stuff.
Then I can start on the SSAO improvements.
Cas: what exactly is the spag. code?
it all looks messy to me
it could do with making stuff private or package private here and there
and lots of Javadoc and commenting
because this code will likely live for a very long time...
Dan: o-ouch
Well my definition of spaghetti is lots of dependencies.
The classes are slightly badly structured right now but basically the problem is the lighting itself.
Lighting info is needed by the rendering shaders and comes from the other end of the engine pretty much
I wanna see if I can sort that out a bit so that there's less spooky-action-at-a-distance there
Once I'm done with that I could go ahead and start documenting more.
Cas: documentation is maybe a separate task in itself
ill make a card for it
Dan: Good point
I'm gonna run off and unpack my new router while I think about how to sort this out
Cas: new router eh?
dan goes dark for 12 days
Dan: Well it's up
As soon as my internet went down computer started freezing up for a second at a time
Checked task manager.
Skype going 100% CPU load on one core and the freezes were when it went 100% CPU load on ALL cores
I'm fucking telling you
Cas: Skype is shit
Dan: They are mining bitcoins.
It's the only way this makes sense.
You can't write a chat program that uses this much CPU power multithreaded
I'm disabling Trello email notifications due to the spam.
Cas: lol yes
Dan: If you wanna notify me of something on Trello just send me a message here.
Cas: turn them off
Dan: I'll check it periodically too.
Cas: no need I just open trello every time I open Chrome
aye
Dan: Hmm...
Did some benchmarking experiments with the terrain shader...
Basically the shader, even though it's this complicated mess of directional light calculations, shadow sampling, paletted texture sampling and a crapload of math, basically doesn't cost anything.
The difference between outputting the lit pixels and outputting something that pretty much amounts to a constant color is 0.05ms or so.
(with a baseline of around 1.2ms)
Cas: curious
Dan: Removing one of the vertex attributes does nothing either.
Cas: hm what's taking all the time then?
Dan: In this case probably the MSAA fixed functionality hardware.
Just lots of small triangles that each cover a couple of MSAA samples or so I guess.
So in other words triangle count I guess.
It's neither vertex nor fragment limited it seems
It's just the number of triangles.
Cas: good thing we're getting that down then
Dan: They cover so few pixels each that the limiting factor becomes how fast the hardware rasterizer can generate work for the shader cores I think.
Disabling the vertex cache optimization does nothing
and turning the fragment shader into a no-op does almost nothing either.
It's the hardware inbetween the two that's limiting us here.
Cas: strange
Dan: Not really.
It just isn't limited by the shader performance.
So more vertex shader invocations (due to worse reuse from bad ordering) or more expensive fragment shaders doesn't affect it.
They weren't working at 100% anyway.
Funnily enough I can't test this without MSAA.
I get CPU limited lol
Standard view no MSAA only directional light = 450 FPS 70% GPU load for me.
Wait what there are 13 000 CasColliders taking up CPU time due to being updated when CasScene is updated.
Cas: oh that's me experimenting with picking
I will I think stop them from actually ticking/updating
Dan: Removing those bumps me up to 90-95% GPU load.
Well if they're in the scene graph you can't differentiate between them.
Cas: I can - I can skip over things that say i don't tick/update
Dan: Also they're children of the terrain so they can't be threaded either well not easily at least.
Skipping over them does nothing.
That's what you already do.
Cas: so how can it be slow then?
Dan: They don't actually have anything in their update() function.
Cas: 13000 no-ops should be ... instant
almost
Dan: But you loop over all of them and do a switch() on their state and make sure that they're all up to date and eliminate dead children by moving them around etc.
It's partly amplified by the fact that I have over 500 FPS so it's done a lot
but right now it's comparable to the cost of frustum culling.
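To illustrate why 13 000 empty `update()` calls are not free, here is a hedged sketch of the kind of per-frame child pass Dan describes: a state switch per node plus compaction of dead children. It is not the CasScene code; `SceneUpdateSketch`, `Node` and `State` are hypothetical names. At 500 FPS, 13 000 nodes means 6.5 million loop iterations per second even when every `update()` body is empty.

```java
final class SceneUpdateSketch {
    enum State { ALIVE, DEAD }

    /** Hypothetical node; update() is a no-op, as with the colliders discussed above. */
    static class Node {
        State state = State.ALIVE;
        void update() { }
    }

    /** Per-frame pass over the first count children; returns the new live count. */
    static int updateChildren(Node[] children, int count) {
        int live = 0;
        for (int i = 0; i < count; i++) {
            Node n = children[i];
            switch (n.state) {            // bookkeeping runs even when update() does nothing
                case ALIVE:
                    n.update();
                    children[live++] = n; // compact live nodes towards the front
                    break;
                case DEAD:
                    break;                // dropped by not copying it forward
            }
        }
        for (int i = live; i < count; i++) children[i] = null; // clear the tail
        return live;
    }
}
```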
Cas: well I guess them's the breaks
Dan: (That relative cost won't go down by the FPS being lower too.)
Cas: we will need them
Dan: Hmm?
Cas: coz player will need to have voxel accurate picking for terrain
Dan: Which doesn't require any colliders does it?
Cas: as they'll be painting buildings and stuff on to the map
well it does sort of
Dan: No just the voxel data.
Cas: aye
you can do it with specialised code in CasTerrain
this was a quick hack to test it
(wasnt very good)
Dan: Yeah I get that
All I'm saying is that you should be careful with what you place in the scenegraph
Cas: feel free to remove them
Dan: as everything in there will have a cost that really adds up.
Cas: yup
hopefully there will only be mobs particle systems and ... err.. thats it i guess
oh and floating labels and crap like that
Dan: OK so using threaded optimization and maximizing the window I seem to be barely GPU limited without MSAA.
Cas: well thats good news for you then 😃 remains to be seen what the laptop manages
ill check it later
er
you've fixed MSAA with the SSAO then?
Dan: No haven't gotten around to it.
There was a ventilation inspection today and crap
err you got some time for being my rubberduck?
Cas: not just at the mo maybe in a few hrs
Dan: Alright.
Cas: evenin
Voxoid won't run...
SkyBoxShader2 doesn't compile
did you forget to commit SPGL2 changes?
Dan: Oh I did
sec
There pushed sorry for the wait
Cas: will we have stuff to show off on Friday?
Dan: Depends on what you want done for then
Did you make a post about the SSAO and stuff?
Cas: last week yes
Dan: Link?
Cas: SSAO and coloured lights
https://www.patreon.com/posts/ssao-in-voxoid-15523200
Dan: Hey.
Did some work on Battledroid before school.
Cas: mornin
oh-ho
wot was that?
Dan: Accidentally disabled texture sharpening on the terrain so that's back and not looking crappy lol
and also made SSAO technically work with MSAA
Cas: technically?
Dan: as in: the MSAA buffer is correctly read when MSAA is enabled so you get SOMETHING on the screen
but the final application of SSAO to the screen is done per pixel so it reintroduces aliasing.
It's not a major problem for something so blurry as SSAO but it can produce extremely jarring artifacts
Cas: how jarring?
Dan: and it'll be VERY jarring for particle effects so we need the proper MSAA upscale later.
Cas: ah
Dan: Right now it almost passes
but it won't with particles.
A smoke cloud behind a mountain will reintroduce 100% aliasing at the edges.
No worries I got Plans (tm).
Cas: what sort of plans? Are they the sorts of plans that make my brain boil and eyes cross?
Dan: Nah just the stuff that we talked about before.
Cas: uh
Dan: I think I can make it perform well maybe even make the bloom/tone mapping faster than before
Cas: what writing normals out to a buffer?
Dan: Oh that too...
Basically I know what I should try out next and I think it'll work.
Will show results when ready.
Cas: cool 😃 looking forward to it
Dan: OH!
Also fixed the radius bug
Turns out I was outputting linearDepth/1000 from the linearization shader =___=
Cas: doh
Dan: so radius needed to be divided by 1000 to compensate
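A small sketch of the unit mismatch, assuming a standard OpenGL-style depth linearization with near and far planes. The actual linearization shader is not shown in the chat, so the formula, the plane values and the name `DepthScaleSketch` are all assumptions. The point is only that a radius compared against `linearDepth / 1000` has to be scaled by the same factor.

```java
final class DepthScaleSketch {
    /** Standard OpenGL-style linearization of a [0,1] depth-buffer value. */
    static double linearize(double depth, double near, double far) {
        double zNdc = depth * 2.0 - 1.0;
        return 2.0 * near * far / (far + near - zNdc * (far - near));
    }

    /** The accidental scaling described above: linear depth divided by 1000. */
    static double scaledLinearDepth(double depth, double near, double far) {
        return linearize(depth, near, far) / 1000.0;
    }

    /** A world-space radius expressed in the same units as the scaled depth. */
    static double radiusInStoredUnits(double worldRadius) {
        return worldRadius / 1000.0;
    }
}
```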
Cas: will we be able to easily & cheaply get that 2-level SSAO effect I was after?
Dan: Relatively easily yes but I'm worried about a couple of things.
The SSAO is already a bit noisy when you zoom in
Cas: maybe but you've enabled a zoom level that's greater than we'd normally use
Dan: True but it DEFINITELY is noisy at the higher radius of the second level.
Worry not I got plans.
I should be able to fix it up.
Cas: nice 😃 maybe do a bit tonight?
Dan: Yeah we'll see...
I might try to get some more stuff done now before school
Cas: kk. I've got meetings for the next hour tho i've got to dash off
Dan: Gogogo
I probably won't be able to finish anything fancy now or maybe even today.
Yeah got some work done but nothing showable yet.
It's gonna be good I think.
Cas: cool
Dan: No MSAA will have great performance and zero memory overhead.
MSAA will need a bit of extra memory but should be fast.
Cas: I reckon we can get away with 2x or 4x for most
Dan: Yeah even just a small amount of MSAA helps a huge amount.
The difference between no MSAA and 2x MSAA is huge.
Cas: 2x looked pretty good to me 4x wasn't really any better than 8x
*worse
Dan: You can see some difference for almost horizontal/vertical edges under motion
Cas: very few of those about tho
Dan: but it mostly depends on the pixels/inch of the screen you have.
Cas: aye
Dan: That being said
Fuck you I'm gonna put on 32xMSAA lol
Cas: hehe
Dan: Which is technically 2x2 supersampling with 8xMSAA though
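The sample count behind "32x" is simple arithmetic: 2 x 2 supersampling times 8 MSAA samples. A sketch that also estimates colour buffer size, where 2560x1440 for "1440p" and 8 bytes per sample (an RGBA16F target) are assumptions, and `MsaaSampleCount` is a hypothetical name:

```java
final class MsaaSampleCount {
    /** Colour samples per output pixel for axis x axis supersampling on top of MSAA. */
    static int samplesPerPixel(int supersampleAxis, int msaa) {
        return supersampleAxis * supersampleAxis * msaa;
    }

    /** Bytes for one colour buffer; bytesPerSample is an assumption (8 = RGBA16F). */
    static long colorBufferBytes(int width, int height, int samplesPerPixel, int bytesPerSample) {
        return (long) width * height * samplesPerPixel * bytesPerSample;
    }
}
```

Under those assumptions a single 32-sample colour buffer is 943,718,400 bytes (900 MiB), against 225 MiB for plain 8x MSAA.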
But yeah I think it'll work well.
Gotta head to school soon
Cas: okeydoke.
Catch you later
Dan: Yeah
Yeah all plans of progress were just crushed by the people in one of my projects dumping all the programming work on me.
Cas: gah
Dan: I did get some more stuff done before I went to school so the SSAO upscaling shader is pretty much done I think.
Still lots more work but some progress at least.
CAS
Do you not support multiple render outputs from shaders???
Cas: Uh
I dunno
Dan: @FragData(name = fragColor, index = 0)
I can't specify multiple of those
Cas: Ah not yet maybe
Easy fixes
Dan: If I could just give it a list of variable names that are assigned to consequitive indices that'd be the best
also I am unable to spell consecutive apparently
I won't ever need to only assign index 0, 1 and 3 for example.
Only 0, 1, ..., N
Cas: so basically... `@FragData({fragColor, thing, normal})`
Dan: that'd be really nice yes
Cas: er
actually....
it already does support multiple `@FragDatas`
Dan: Copy pastaing them gave me compiler errors though...?
Maybe I messed up?
Cas: nah I think i need to tweak `@FragData` definition
`@Repeatable`
Dan: so what do I do?
Cas: wait a mo 😃
cted spgl
you can now just use multiple FragDatas
index still needed but meh
its inheritable so it might conceivably be useful
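For reference, a minimal sketch of how a repeatable, inheritable annotation is declared in plain Java. The real SPGL2 `@FragData` definition is not shown in the chat, so the element types, the container name `FragDatas` and everything inside `FragDataSketch` are assumptions. The container annotation is what `@Repeatable` requires; user code never writes it and just stacks `@FragData`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class FragDataSketch {
    /** Stand-in for SPGL2's annotation; the real definition is not shown in the chat. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @Inherited
    @Repeatable(FragDatas.class)
    @interface FragData {
        String name();
        int index();
    }

    /** Container annotation that @Repeatable requires; the compiler fills it in implicitly. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @Inherited
    @interface FragDatas {
        FragData[] value();
    }

    @FragData(name = "fragColor", index = 0)
    @FragData(name = "normal", index = 1)
    static class ExampleShader { }

    /** Picks up both outputs through @Inherited. */
    static class DerivedShader extends ExampleShader { }

    /** Output name per index, read via reflection; assumes indices are 0..N with no gaps. */
    static String[] outputNames(Class<?> shaderClass) {
        FragData[] outputs = shaderClass.getAnnotationsByType(FragData.class);
        String[] names = new String[outputs.length];
        for (FragData fd : outputs) names[fd.index()] = fd.name();
        return names;
    }
}
```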
Dan: >come up with genius way of compressing 4 HDR colors into one texture
>find super fast function for extracting the float exponent for fast compression
>code it all up it's super efficient
-------------
0(15) : error C7532: global function frexp requires #version 400 or later
0(15) : error C0000: ... or #extension GL_ARB_gpu_shader5 : enable
(0) : error C2003: incompatible options for link
RIP
Cas: boom 😦
Dan: I'll fix it
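For readers unfamiliar with `frexp`: it splits a float into a mantissa in [0.5, 1) and a power-of-two exponent. The chat does not say how Dan worked around the version requirement, so the following is only a Java illustration of what the function computes, not the shader fix. `FrexpSketch` and `Parts` are hypothetical names, and subnormals, NaN and infinities are deliberately not handled.

```java
final class FrexpSketch {
    /** Result pair: value == mantissa * 2^exponent, with 0.5 <= |mantissa| < 1. */
    static final class Parts {
        final float mantissa; final int exponent;
        Parts(float mantissa, int exponent) { this.mantissa = mantissa; this.exponent = exponent; }
    }

    /** frexp-style split for finite, normal floats; zero maps to (0, 0). */
    static Parts frexp(float value) {
        if (value == 0.0f) return new Parts(0.0f, 0);
        int exponent = Math.getExponent(value) + 1;    // unbiased exponent, shifted to frexp's convention
        float mantissa = Math.scalb(value, -exponent); // exact scaling by a power of two
        return new Parts(mantissa, exponent);
    }
}
```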
The per sample SSAO upsampling shader is a friggin mess
but it should be fast as hell
We are going to need a certain GLSL extension for packing bytes
or technically I can work around it by implementing stuff manually
but using those often hit fast paths in the compiler and stuff
and the extension is supported by all OGL3 cards AFAIK.
At worst I can make an #ifdef that checks for support and uses the fast path if available.
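The chat does not name the extension, so as a general illustration of "implementing stuff manually", here is a Java sketch of packing four [0,1] values into one 32-bit word with the first component in the lowest byte, which is the layout GLSL's `packUnorm4x8` uses. `PackBytesSketch` is a hypothetical name, and whether this is the packing Dan needs is an assumption.

```java
final class PackBytesSketch {
    /** Packs four values in [0,1] into one int, x in the lowest byte, w in the highest. */
    static int pack(float x, float y, float z, float w) {
        return toByte(x) | (toByte(y) << 8) | (toByte(z) << 16) | (toByte(w) << 24);
    }

    /** Inverse of pack for one component (0 = x ... 3 = w). */
    static float unpack(int packed, int component) {
        return ((packed >>> (8 * component)) & 0xFF) / 255.0f;
    }

    /** Clamps to [0,1] and quantizes to 0..255. */
    private static int toByte(float v) {
        float clamped = Math.max(0.0f, Math.min(1.0f, v));
        return Math.round(clamped * 255.0f);
    }
}
```

A round trip loses at most half a quantization step (1/510) per component for in-range inputs.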
I can use FragDatas?
Cas: no you just use `@FragData` over and over I think
test it and tell me if it works
Dan: murr murr OK
dear god
@FragData(name = fragColor0, index = 0)
@FragData(name = fragColor1, index = 1)
@FragData(name = fragColor2, index = 2)
@FragData(name = fragColor3, index = 3)
@FragData(name = fragColor4, index = 4)
@FragData(name = fragColor5, index = 5)
@FragData(name = fragColor6, index = 6)
@FragData(name = fragColor7, index = 7)
Cas: uh yeah
Dan: Isn't that beautiful? lol
Cas: heh
at least its quite rare 😃
Dan: True
So uh looks like 32xMSAA is out of the question I'm afriad lol
afraid*
Cas: tragic 😄
Dan: 8xMSAA seems to work pretty well.
Upsampling of SSAO is taking something like 1.2ms at 1440p which is good.
Cas: perfect
Dan:
Frame 1265 : 7.505ms (100.0%)
    Camera 1 : 7.467ms (99.4%)
        Shadow map rendering : 0.363ms (4.8%)
            Shadow map 1 : 0.362ms (4.8%)
        Clear main buffer : 0.07ms (0.9%)
        Terrain rendering : 3.796ms (50.5%)
        Skybox : 0.001ms (0.0%)
        Post processing : 3.233ms (43.0%)
            SSAO rendering : 1.177ms (15.6%)
                Linearize depth : 0.228ms (3.0%)
                Generate depth mipmaps : 0.096ms (1.2%)
                Compute SSAO : 0.85ms (11.3%)
            Merge : 1.203ms (16.0%)
            Bloom : 0.505ms (6.7%)
            Tone mapping : 0.343ms (4.5%)
(the Merge part is the new pass)
I'm hoping that bloom and tone mapping will become cheaper once they properly use the packed output of the merge pass.
Cas: getting stats out of the laptop might be valuable at some point too bearing in mind it's waaaaay slower than your 1080
represents our lowest end hardware specs I suppose
Dan: That GTX 960 or something?
Cas: 760M
Dan: Right
Yeah pretty low-end but I think we can make it run on even worse stuff.
You there?
I'm having issues with multiple render targets.
Not sure what it is yet
Nvm seems like it's working.
So uh I guess I'm a genius?
I just threw together 3 different shaders
and it actually worked in the end on the first try xD
Cas: hiya
arent you clever 😃
cted?
Dan: Not yet bloom isn't fixed yet.
Need another 30 min or so
Cas: oh ok
Dan: Also this is rather amazing.
Sec
look
Before: Tone mapping : 3.354ms (15.4%)
After: Tone mapping : 1.659ms (8.2%)
Tone mapping is almost twice as fast.
Cas: very nice 😃
every bit helps
and that's quite a big bit
Dan: Hmm doesn't seem like it helps very much for lower sample counts...
But whatever.
I
I'll optimize the shaders more later
Lemme just fix the bloom...
Cas: back in a bit
Dan: I pushed SPGL2 and Voxoid.
MSAA and SSAO now work together.
I won't have time to run any performance tests but it looks like it's a bit slower than I hoped.
Doesn't look like tone mapping/bloom got much of a gain. It even looks like they may be slower...
There are also some SSAO artifacts from the normal reconstruction.
Next step is adding normals output from the main pass so we get accurate normals.
Latest timings with a few optimizations applied. 1440p 8xMSAA:
Frame 5205 : 4.585ms (100.0%)
    Camera 1 : 4.547ms (99.1%)
        Shadow map rendering : 0.366ms (7.9%)
            Shadow map 1 : 0.364ms (7.9%)
        Clear main buffer : 0.048ms (1.0%)
        Terrain rendering : 1.021ms (22.2%)
        Skybox : 0.0ms (0.0%)
        Post processing : 3.109ms (67.8%)
            SSAO rendering : 1.108ms (24.1%)
                Linearize depth : 0.228ms (4.9%)
                Generate depth mipmaps : 0.098ms (2.1%)
                Compute SSAO : 0.481ms (10.4%)
                Blur : 0.3ms (6.5%)
            Merge : 0.872ms (19.0%)
            Bloom : 0.673ms (14.6%)
            Tone mapping : 0.451ms (9.8%)
I gotta go to bed... x___x
Cas: ah
cted anything?
Dan: yes everything
Cas: cool
Dan: [10:59 PM] theagentd MOKYU: I pushed SPGL2 and Voxoid.
Cas: ive got to do Mr Elf and the Chocolate Faerie for the kids tonight
Dan: lol
Cas: what mischief will he get up to
Dan: yeeeaaaah i'm just gonna head to bed without a bedtime horror story lol
Cas: heh
good job and nn cya tomoz or so
Dan: Thanks
Night
Cas: performance and looks great so far (especially when I added the mouse light back in)
really must do some proper rocks
the spotty ones brian did look a bit like a children's fantasy world 😄
Dan: Sorry no time to work today.
School work from waking up to going to bed...
Cas: too knackered to do much here meself too
StabbedBadger
2018-08-01 15:18:06 +0000 UTC