Puppygames

Transcripts 28th November 2017 - 5th December 2017

Dan: Fucking skype mining bitcoins

Cas: bitcoins!?

Dan: How else do you make a chat app use 100% CPU on 5 out of 8 cores?

Seriously

Discord uses less CPU power while focused than skype does in the background

Anyway...

I'll add GPU profiling and finish my optimizations when I get home

Should be home 14-ish

Cas: cool. Don't be scared of the massive refactor

there's more to come - haven't renamed CasNode stuff yet.

Dan: I'm sure I'll have opinions on it =P

But not really worrying

A refactor was way overdue anyway

Cas: it's half the size now

much stuff just moved into the root voxoid package

Dan: Into?!

Cas: stuff like HDRManager was in a whole package on its own - unnecessary really

Dan: Ah

Cas: so any packages with like less than 3 classes in I just shoved in the main package

made a few classes package private

Dan: Err

Where have all the test classes gone?

I have nothing runnable naymore

anymore*

src-test is empty?

Cas: eh

oh you need to *properly* update...

do Update to Revision...

select HEAD

then Fully recursive depth

and Change working copy to selected depth

then everything will appear

Dan: so uh

english?

Cas: heh you'll see when you do Update to Revision

Dan: Update to Version?

Cas: yeh

Dan: Scene graph stuff is still gone

Cas: CasScene is now in SPGL2

Dan: Seems to run now thanks!

Cas: anything you sorely miss can still be dug out of SVN history if necessary

I wont do any more refactoring now so consider it stable

Dan: Why did you change the priority calculation?

Cas: I... didnt?

Dan: It needs a sqrt and is broken for the edges?

Yeah it was a float before, now it's an int

double distance = Math.sqrt(distanceSquared(cx, cy, cz, c.x + chunkSize * 0.5f, c.y + chunkSize * 0.5f, c.z + chunkSize * 0.5f));
// Clamp to 0..63 for counting sort range
c.priority = Math.min(63, (int) distance / 15);

Cas: oh it was to try and do that more optimised chunky sorting

which was in the end a total waste of time

leave it be for now I might just back it all out

Dan: It's causing some issues around the edge of the screen.

Cas: aye

Dan: I'm upping the max value above 63 at least

Cas: you need to then also change the sort

it's statically sized

Dan: It's a radix sort?

Cas: it really should just be chucked out it was no help at all

counting sort, sort of a radix sort

with one radix

unfortunately it a) doesnt sort in place b) has a fixed maximum range
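For readers following along, here is a minimal counting sort sketch that shows both limitations Cas mentions. The `Chunk` class and `ChunkCountingSort` name are hypothetical stand-ins, not the actual Voxoid code; only the 0..63 range comes from the snippet Dan pasted.

```java
/** Hypothetical stand-in for a terrain chunk; only the priority matters here. */
final class Chunk {
    final String name;
    final int priority; // expected to be in 0..MAX_PRIORITY

    Chunk(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }
}

final class ChunkCountingSort {
    // Fixed maximum range: the bucket array below is sized from this constant.
    static final int MAX_PRIORITY = 63;

    /** Returns a new array sorted by ascending priority; the input is left untouched. */
    static Chunk[] sort(Chunk[] in) {
        int[] counts = new int[MAX_PRIORITY + 2];
        for (Chunk c : in) {
            counts[c.priority + 1]++; // out of bounds if priority > MAX_PRIORITY
        }
        // Prefix sum: counts[p] becomes the first output slot for priority p.
        for (int p = 0; p <= MAX_PRIORITY; p++) {
            counts[p + 1] += counts[p];
        }
        Chunk[] out = new Chunk[in.length]; // not in place: needs a second array
        for (Chunk c : in) {
            out[counts[c.priority]++] = c;
        }
        return out;
    }
}
```

Raising the maximum priority means resizing the bucket array too, which is why the sort has to change along with the clamp. With the clamp in place, every chunk past the range lands in the last bucket, so those chunks are no longer ordered relative to each other.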

Dan: The maximum range is causing issues around the edge of the screen right now.

Cas: yeah I know it's been on my radar for a bit. I'll fix it, you worry about optimising unshadowed point lights and 2-stage ssao

ill make a card to fix it

when you start on something drag it to In Progress

Dan: Yeah this is a problem for future me/cas lol

Mkay

Cas: _sits and watches Trello_

Dan: ...

hai hai

Cas: heh

Dan: profiling results:
Frame 6297 : 4.099ms (100.0%)
    Camera 1 : 4.058ms (99.0%)
        Shadow map rendering : 1.056ms (25.7%)
            Shadow map 1 : 0.673ms (16.4%)
            Shadow map 2 : 0.381ms (9.3%)
        Clear main buffer : 0.025ms (0.6%)
        Terrain rendering : 1.409ms (34.3%)
        SSAO rendering : 1.036ms (25.2%)
        Skybox : 0.138ms (3.3%)
        Bloom and tone mapping : 0.39ms (9.5%)

(1440p, no MSAA)

VoxoidSceneFactory is in the test package for some reason

Cas: there was some thinking to do about where exactly that belonged... I left it there for the time being

Dan: OK

Cas: got a meeting here...

Dan: OK I should be fine from now on

Thanks for the help

Cas: such as it was...

Dan: e-err?

It was really helpful since I had no idea what was up with the SVN stuff and such

Cas: meh that's what I do all day at work anyway

Dan: So I got some nice optimizations in.

Cas: what sort?

Dan: Shader optimizations.

My test scene has 500 large lights placed all within view of the camera.

Cas: ah yes I saw that. Wondered what the hell was going on

Dan: Got 135 FPS before.

After optimizations 152 FPS.

so not a huge game changing increase but a very useful one at least.

It went from 3 texture samples per light to 2.25 texture samples per light.

It now reads 4 light indices at a time.

It pretty much works as manual unrolling of the loop in a way.

(Like unrolling it so it processes 4 elements at a time)

It's essentially a free 20% cut in the GPU time needed to draw the terrain which is pretty nice.

Should be even more significant on lower-end hardware of course.
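One way to read the 3 to 2.25 figure, which is an assumption and not something Dan spells out: each light costs two data samples plus one index sample, and fetching four indices per sample spreads that index cost over four lights. A small sketch of the arithmetic, with hypothetical names:

```java
final class LightSampleCost {
    /**
     * Texture samples per light, ASSUMING each light needs a fixed number of
     * data fetches plus one index fetch shared by indicesPerFetch lights.
     */
    static double samplesPerLight(int dataFetchesPerLight, int indicesPerFetch) {
        return dataFetchesPerLight + 1.0 / indicesPerFetch;
    }

    /** Converts frames per second to milliseconds per frame. */
    static double frameMillis(double fps) {
        return 1000.0 / fps;
    }
}
```

Under that assumption, `samplesPerLight(2, 1)` is 3 and `samplesPerLight(2, 4)` is 2.25. In frame time, 135 FPS is about 7.41ms and 152 FPS is about 6.58ms, so the change saves roughly 0.83ms per frame in this test scene.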

Cas: that's the spirit

i will try it on the laptop later which is pretty much the lowest end hardware we might run it on

Dan: The increase only applies when you got a lot of point lights though

but as soon as you got more than 1 point light affecting a given cluster you get a decent boost.

Cas: I suppose in the midst of a firefight there might be a bit of that going on

Dan: Yeah it doesn't slow it down if there's only 1 or 0 lights affecting it.

Just the increase becomes more the more complex the scene is.

Cas: well move the card to Done then and on with the next thing!

Dan: It's not entirely done yet I got some CPU side changes to do as well...

Also the clustering is a bit on the slow side...

Easy to become CPU bottlenecked here... =/

Cas: k

Dan: I did some simple optimizations to it that helped a bit

and you can always just increase the cluster size for massive CPU performance boosts (at the cost of GPU performance of course).

Cas: is it multithreaded?

Dan: It is not but has potential for it.

Cas: I'll make a card for that

Dan: Multithreading it?

Cas: aye

Dan: My plan right now is to skip multithreading but not to dig myself a hole where I can't add it later

and then we'll just multithread the entire thing with my threading system later.

Cas: yup - have made a card for the Melting Pot nothing we will need to act on unless we run out of CPU power

Dan: It should be very little effort to multithread this rather efficiently.

We may not get perfect scaling though... I'd say something like 3x scaling on a quad core sounds reasonable here.

The limiting factor will be the frustum culling I think.

Cas: talking of which have you optimised the culling yet

Dan: It's hard to efficiently thread a single frustum culling job so we'd be limited to threading each different frustum culling job separately

AKA the camera and the shadow maps.
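A rough sketch of what "one task per frustum" could look like with a plain `ExecutorService`. This is not Dan's threading system; `CullJob` and `ParallelCulling` are hypothetical names, and the culling itself is left abstract.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical culling job: one per frustum (camera, shadow map 1, shadow map 2, ...). */
interface CullJob {
    List<Integer> cull(); // returns the indices of the visible chunks
}

final class ParallelCulling {
    /** Runs every job on the pool and returns the results in job order. */
    static List<List<Integer>> cullAll(List<CullJob> jobs, ExecutorService pool) {
        List<Future<List<Integer>>> futures = new ArrayList<>();
        for (CullJob job : jobs) {
            Callable<List<Integer>> task = job::cull;
            futures.add(pool.submit(task));
        }
        List<List<Integer>> results = new ArrayList<>();
        for (Future<List<Integer>> f : futures) {
            try {
                results.add(f.get()); // blocks until that frustum is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return results;
    }
}
```

With one camera and two shadow maps there are only three independent jobs, so this approach cannot keep more than three cores busy, which fits Dan's point that scaling would be limited.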

Still haven't lol

You sound like my mom about that damn optimization lol

Cas: ill make a card for it

Dan: Alright pushed my light performance improvements.

Cas: what's next?

Dan: Updating trello! lol

woo

that was fun lol

Cas: feels like progress eh

Dan: something something quest log

Cas: +10xp

how about SSAO next then

the hard bit

Dan: pfff only 10 xp? what kind of quest giver are you?!

Cas: miserly

Dan: lol

Yeah I want to sort out some of the spaghetti-ish code before I move on to other stuff.

Then I can start on the SSAO improvements.

Cas: what exactly is the spag. code?

it all looks messy to me

it could do with making stuff private or package private here and there

and lots of Javadoc and commenting

because this code will likely live for a very long time...

Dan: o-ouch

Well my definition of spaghetti is lots of dependencies.

The classes are slightly badly structured right now but basically the problem is the lighting itself.

Lighting info is needed by the rendering shaders and comes from the other end of the engine pretty much

I wanna see if I can sort that out a bit so that there's less spooky-action-at-a-distance there

Once I'm done with that I could go ahead and start documenting more.

Cas: documentation is maybe a separate task in itself

ill make a card for it

Dan: Good point

I'm gonna run off and unpack my new router while I think about how to sort this out

Cas: new router eh?

dan goes dark for 12 days

Dan: Well it's up

As soon as my internet went down computer started freezing up for a second at a time

Checked task manager.

Skype going 100% CPU load on one core and the freezes were when it went 100% CPU load on ALL cores

I'm fucking telling you

Cas: Skype is shit

Dan: They are mining bitcoins.

It's the only way this makes sense.

You can't write a chat program that uses this much CPU power multithreaded

I'm disabling Trello email notifications due to the spam.

Cas: lol yes

Dan: If you wanna notify me of something on Trello just send me a message here.

Cas: turn them off

Dan: I'll check it periodically too.

Cas: no need I just open trello every time I open Chrome

aye

Dan: Hmm...

Did some benchmarking experiments with the terrain shader...

Basically the shader, even though it's this complicated mess of directional light calculations, shadow sampling, paletted texture sampling and a crapload of math, basically doesn't cost anything.

The difference between outputting the lit pixels and outputting something that pretty much amounts to a constant color is 0.05ms or so.

(with a baseline of around 1.2ms)

Cas: curious

Dan: Removing one of the vertex attributes does nothing either.

Cas: hm what's taking all the time then?

Dan: In this case probably the MSAA fixed functionality hardware.

Just lots of small triangles that each cover a couple of MSAA samples or so I guess.

So in other words triangle count I guess.

It's neither vertex nor fragment limited it seems

It's just the number of triangles.

Cas: good thing we're getting that down then

Dan: They cover so few pixels each that the limiting factor becomes how fast the hardware rasterizer can generate work for the shader cores I think.

Disabling the vertex cache optimization does nothing

and turning the fragment shader into a no-op does almost nothing either.

It's the hardware inbetween the two that's limiting us here.

Cas: strange

Dan: Not really.

It just isn't limited by the shader performance.

So more vertex shader invocations (due to worse reuse from bad ordering) or more expensive fragment shaders doesn't affect it.

They weren't working at 100% anyway.

Funnily enough I can't test this without MSAA.

I get CPU limited lol

Standard view, no MSAA, only directional light = 450 FPS, 70% GPU load for me.

Wait what there are 13 000 CasColliders taking up CPU time due to being updated when CasScene is updated.

Cas: oh that's me experimenting with picking

I will I think stop them from actually ticking/updating

Dan: Removing those bumps me up to 90-95% GPU load.

Well if they're in the scene graph you can't differentiate between them.

Cas: I can - I can skip over things that say i don't tick/update

Dan: Also they're children of the terrain so they can't be threaded either well not easily at least.

Skipping over them does nothing.

That's what you already do.

Cas: so how can it be slow then?

Dan: They don't actually have anything in their update() function.

Cas: 13000 no-ops should be ... instant

almost

Dan: But you loop over all of them and do a switch() on their state and make sure that they're all up to date and eliminate dead children by moving them around etc.

It's partly amplified by the fact that I have over 500 FPS so it's done a lot

but right now it's comparable to the cost of frustum culling.

Cas: well I guess them's the breaks

Dan: (That relative cost won't go down when the FPS is lower, either.)

Cas: we will need them

Dan: Hmm?

Cas: coz player will need to have voxel accurate picking for terrain

Dan: Which doesn't require any colliders does it?

Cas: as they'll be painting buildings and stuff on to the map

well it does sort of

Dan: No just the voxel data.

Cas: aye

you can do it with specialised code in CasTerrain

this was a quick hack to test it

(wasnt very good)

Dan: Yeah I get that

All I'm saying is that you should be careful with what you place in the scenegraph

Cas: feel free to remove them

Dan: as everything in there will have a cost that really adds up.

Cas: yup

hopefully there will only be mobs, particle systems and ... err.. thats it i guess

oh and floating labels and crap like that

Dan: OK so using threaded optimization and maximizing the window I seem to be barely GPU limited without MSAA.

Cas: well thats good news for you then 😃 remains to be seen what the laptop manages

ill check it later

er

you've fixed MSAA with the SSAO then?

Dan: No haven't gotten around to it.

There was a ventilation inspection today and crap

err you got some time for being my rubberduck?

Cas: not just at the mo maybe in a few hrs

Dan: Alright.

Cas: evenin

Voxoid won't run...

SkyBoxShader2 doesn't compile

did you forget to commit SPGL2 changes?

Dan: Oh I did

sec

There pushed sorry for the wait

Cas: will we have stuff to show off on Friday?

Dan: Depends on what you want done for then

Did you make a post about the SSAO and stuff?

Cas: last week, yes

Dan: Link?

Cas: SSAO and coloured lights

https://www.patreon.com/posts/ssao-in-voxoid-15523200

Dan: Hey.

Did some work on Battledroid before school.

Cas: mornin

oh-ho

wot was that?

Dan: Accidentally disabled texture sharpening on the terrain so that's back and not looking crappy lol

and also made SSAO technically work with MSAA

Cas: technically?

Dan: as in: the MSAA buffer is correctly read when MSAA is enabled so you get SOMETHING on the screen

but the final application of SSAO to the screen is done per pixel so it reintroduces aliasing.

It's not a major problem for something so blurry as SSAO but it can produce extremely jarring artifacts

Cas: how jarring?

Dan: and it'll be VERY jarring for particle effects so we need the proper MSAA upscale later.

Cas: ah

Dan: Right now it almost passes

but it won't with particles.

A smoke cloud behind a mountain will reintroduce 100% aliasing at the edges.

No worries I got Plans (tm).

Cas: what sort of plans? Are they the sorts of plans that make my brain boil and eyes cross?

Dan: Nah just the stuff that we talked about before.

Cas: uh

Dan: I think I can make it perform well maybe even make the bloom/tone mapping faster than before

Cas: what writing normals out to a buffer?

Dan: Oh that too...

Basically I know what I should try out next and I think it'll work.

Will show results when ready.

Cas: cool 😃 looking forward to it

Dan: OH!

Also fixed the radius bug

Turns out I was outputting linearDepth/1000 from the linearization shader =___=

Cas: doh

Dan: so radius needed to be divided by 1000 to compensate
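To illustrate the kind of mismatch Dan describes, here is the standard depth linearization for an OpenGL perspective projection, plus the radius rescale. It is a sketch under assumptions: the actual Voxoid shader is not shown in the chat, and `DepthLinearizer` and the near/far values used with it are hypothetical. Only the divide by 1000 comes from the conversation.

```java
final class DepthLinearizer {
    /**
     * Converts a [0,1] depth-buffer value to view-space distance, assuming a
     * standard OpenGL perspective projection with the given near/far planes.
     */
    static float linearize(float depth, float near, float far) {
        float zNdc = depth * 2.0f - 1.0f; // [0,1] -> [-1,1]
        return 2.0f * near * far / (far + near - zNdc * (far - near));
    }

    /**
     * If the shader stores linearDepth / depthScale, a world-space radius has
     * to be divided by the same scale before it is compared against it.
     */
    static float radiusInStoredUnits(float worldRadius, float depthScale) {
        return worldRadius / depthScale;
    }
}
```

For example, with depth written out as `linearDepth / 1000`, a radius of 2 world units becomes 0.002 in the stored units. Leaving the radius unscaled would make it 1000 times too large relative to the depth values.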

Cas: will we be able to easily & cheaply get that 2-level SSAO effect I was after?

Dan: Relatively easily yes but I'm worried about a couple of things.

The SSAO is already a bit noisy when you zoom in

Cas: maybe but you've enabled a zoom level that's greater than we'd normally use

Dan: True but it DEFINITELY is noisy at the higher radius of the second level.

Worry not I got plans.

I should be able to fix it up.

Cas: nice 😃 maybe do a bit tonight?

Dan: Yeah we'll see...

I might try to get some more stuff done now before school

Cas: kk. I've got meetings for the next hour tho i've got to dash off

Dan: Gogogo

I probably won't be able to finish anything fancy now or maybe even today.

Yeah got some work done but nothing showable yet.

It's gonna be good I think.

Cas: cool

Dan: No MSAA will have great performance and zero memory overhead.

MSAA will need a bit of extra memory but should be fast.

Cas: I reckon we can get away with 2x or 4x for most

Dan: Yeah even just a small amount of MSAA helps a huge amount.

The difference between no MSAA and 2x MSAA is huge.

Cas: 2x looked pretty good to me 4x wasn't really any better than 8x

*worse

Dan: You can see some difference for almost horizontal/vertical edges under motion

Cas: very few of those about tho

Dan: but it mostly depends on the pixels/inch of the screen you have.

Cas: aye

Dan: That being said

Fuck you I'm gonna put on 32xMSAA lol

Cas: hehe

Dan: Which is technically 2x2 supersampling with 8xMSAA though

But yeah I think it'll work well.

Gotta head to school soon

Cas: okeydoke.

Catch you later

Dan: Yeah

Yeah all plans of progress were just crushed by the people in one of my projects dumping all the programming work on me.

Cas: gah

Dan: I did get some more stuff done before I went to school so the SSAO upscaling shader is pretty much done I think.

Still lots more work but some progress at least.

CAS

Do you not support multiple render outputs from shaders???

Cas: Uh

I dunno

Dan: @FragData(name = fragColor, index = 0)

I can't specify multiple of those

Cas: Ah not yet maybe

Easy fixes

Dan: If I could just give it a list of variable names that are assigned to consequitive indices that'd be the best

also I am unable to spell consecutive apparently

I won't ever need to only assign index 0, 1 and 3 for example.

Only 0, 1, ..., N

Cas: so basically... `@FragData({fragColor, thing, normal})`

Dan: that'd be really nice yes

Cas: er

actually....

it already does support multiple `@FragDatas`

Dan: Copy pastaing them gave me compiler errors though...?

Maybe I messed up?

Cas: nah I think i need to tweak `@FragData` definition

`@Repeatable`

Dan: so what do I do?

Cas: wait a mo 😃

committed spgl

you can now just use multiple FragDatas

index still needed but meh

it's inheritable so it might conceivably be useful
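For anyone unfamiliar with `@Repeatable`, this is roughly how a repeatable, inheritable annotation is declared and read back in Java 8+. It is a sketch and not the SPGL2 source: the `FragDatas` container, the `String` type for `name`, and the `ExampleShader` and `FragDataReader` classes are all assumptions made for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Inherited;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@Inherited
@Repeatable(FragDatas.class)
@interface FragData {
    String name(); // assumed to be a String; the chat does not show the type
    int index();
}

/** Container annotation required by @Repeatable; the compiler fills it in. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@Inherited
@interface FragDatas {
    FragData[] value();
}

/** Hypothetical shader class with two render outputs. */
@FragData(name = "fragColor", index = 0)
@FragData(name = "normal", index = 1)
class ExampleShader {
}

final class FragDataReader {
    /** Returns output names ordered by index; assumes indices are exactly 0..N-1. */
    static String[] outputNames(Class<?> shaderClass) {
        FragData[] outputs = shaderClass.getAnnotationsByType(FragData.class);
        String[] names = new String[outputs.length];
        for (FragData fd : outputs) {
            names[fd.index()] = fd.name();
        }
        return names;
    }
}
```

`getAnnotationsByType` looks through the container automatically, so the same reader works whether a class carries one `@FragData` or several.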

Dan: >come up with genius way of compressing 4 HDR colors into one texture

>find super fast function for extracting the float exponent for fast compression

>code it all up it's super efficient

-------------
0(15) : error C7532: global function frexp requires #version 400 or later
0(15) : error C0000: ... or #extension GL_ARB_gpu_shader5 : enable
(0) : error C2003: incompatible options for link

RIP
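The chat does not say how Dan got around the missing `frexp`. One common workaround is to pull the exponent out of the float's bit pattern by hand. The sketch below shows the idea in Java with `Float.floatToRawIntBits`; as far as I know the matching GLSL built-ins are `floatBitsToInt` and `intBitsToFloat`. `ManualFrexp` is a hypothetical name, and denormals, infinities and NaN are deliberately ignored.

```java
final class ManualFrexp {
    /** Exponent e such that x = m * 2^e with 0.5 <= |m| < 1 (normal floats only). */
    static int exponent(float x) {
        if (x == 0.0f) {
            return 0;
        }
        int bits = Float.floatToRawIntBits(x);
        // The biased exponent field is bits 23..30; a bias of 126 gives m in [0.5, 1).
        return ((bits >>> 23) & 0xFF) - 126;
    }

    /** Mantissa m, sign preserved, such that x = m * 2^exponent(x). */
    static float mantissa(float x) {
        if (x == 0.0f) {
            return 0.0f;
        }
        int bits = Float.floatToRawIntBits(x);
        // Keep the sign and 23 mantissa bits, force the exponent field to 126 (2^-1).
        return Float.intBitsToFloat((bits & 0x807FFFFF) | (126 << 23));
    }
}
```

For example, 8.0 splits into a mantissa of 0.5 and an exponent of 4, since 0.5 * 2^4 = 8.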

Cas: boom 😦

Dan: I'll fix it

The per sample SSAO upsampling shader is a friggin mess

but it should be fast as hell

We are going to need a certain GLSL extension for packing bytes

or technically I can work around it by implementing stuff manually

but using those often hit fast paths in the compiler and stuff

and the extension is supported by all OGL3 cards AFAIK.

At worst I can make an #ifdef that checks for support and uses the fast path if available.
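Dan does not name the extension. Assuming he means functions along the lines of GLSL's `packUnorm4x8` and `unpackUnorm4x8`, the manual fallback he mentions would look something like this Java sketch. `PackBytes` is a hypothetical name, and the byte order (first component in the lowest byte) follows the GLSL definition.

```java
final class PackBytes {
    /** Packs four values in [0,1] into one int, 8 bits each, x in the lowest byte. */
    static int packUnorm4x8(float x, float y, float z, float w) {
        return toByte(x) | (toByte(y) << 8) | (toByte(z) << 16) | (toByte(w) << 24);
    }

    /** Reverses packUnorm4x8, returning {x, y, z, w} in [0,1]. */
    static float[] unpackUnorm4x8(int packed) {
        return new float[] {
            (packed & 0xFF) / 255.0f,
            ((packed >>> 8) & 0xFF) / 255.0f,
            ((packed >>> 16) & 0xFF) / 255.0f,
            ((packed >>> 24) & 0xFF) / 255.0f
        };
    }

    /** Clamps to [0,1] and quantizes to 0..255. */
    private static int toByte(float v) {
        float clamped = Math.max(0.0f, Math.min(1.0f, v));
        return Math.round(clamped * 255.0f);
    }
}
```

The manual version is only a handful of shifts and masks, so the `#ifdef` Dan describes would just choose between this and the built-in at compile time.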

I can use FragDatas?

Cas: no you just use `@FragData` over and over I think

test it and tell me if it works

Dan: murr murr OK

dear god
@FragData(name = fragColor0, index = 0)
@FragData(name = fragColor1, index = 1)
@FragData(name = fragColor2, index = 2)
@FragData(name = fragColor3, index = 3)
@FragData(name = fragColor4, index = 4)
@FragData(name = fragColor5, index = 5)
@FragData(name = fragColor6, index = 6)
@FragData(name = fragColor7, index = 7)

Cas: uh yeah

Dan: Isn't that beautiful? lol

Cas: heh

at least its quite rare 😃

Dan: True

So uh looks like 32xMSAA is out of the question I'm afriad lol

afraid*

Cas: tragic 😄

Dan: 8xMSAA seems to work pretty well.

Upsampling of SSAO is taking something like 1.2ms at 1440p which is good.

Cas: perfect

Dan: Frame 1265 : 7.505ms (100.0%)
    Camera 1 : 7.467ms (99.4%)
        Shadow map rendering : 0.363ms (4.8%)
            Shadow map 1 : 0.362ms (4.8%)
        Clear main buffer : 0.07ms (0.9%)
        Terrain rendering : 3.796ms (50.5%)
        Skybox : 0.001ms (0.0%)
        Post processing : 3.233ms (43.0%)
            SSAO rendering : 1.177ms (15.6%)
                Linearize depth : 0.228ms (3.0%)
                Generate depth mipmaps : 0.096ms (1.2%)
                Compute SSAO : 0.85ms (11.3%)
            Merge : 1.203ms (16.0%)
            Bloom : 0.505ms (6.7%)
            Tone mapping : 0.343ms (4.5%)

(the Merge part is the new pass)

I'm hoping that bloom and tone mapping will become cheaper once they properly use the packed output of the merge pass.

Cas: getting stats out of the laptop might be valuable at some point too bearing in mind it's waaaaay slower than your 1080

represents our lowest end hardware specs I suppose

Dan: That GTX 960 or something?

Cas: 760M

Dan: Right

Yeah pretty low-end but I think we can make it run on even worse stuff.

You there?

I'm having issues with multiple render targets.

Not sure what it is yet

Nvm seems like it's working.

So uh I guess I'm a genius?

I just threw together 3 different shaders

and it actually worked in the end on the first try xD

Cas: hiya

arent you clever 😃

committed?

Dan: Not yet bloom isn't fixed yet.

Need another 30 min or so

Cas: oh ok

Dan: Also this is rather amazing.

Sec

look
Before:
    Tone mapping : 3.354ms (15.4%)
After:
    Tone mapping : 1.659ms (8.2%)

Tone mapping is almost twice as fast.

Cas: very nice 😃

every bit helps

and that's quite a big bit

Dan: Hmm doesn't seem like it helps very much for lower sample counts...

But whatever.

I

I'll optimize the shaders more later

Lemme just fix the bloom...

Cas: back in a bit

Dan: I pushed SPGL2 and Voxoid.

MSAA and SSAO now work together.

I won't have time to run any performance tests but it looks like it's a bit slower than I hoped.

Doesn't look like tone mapping/bloom got much of a gain. It even looks like they may be slower...

There are also some SSAO artifacts from the normal reconstruction.

Next step is adding normals output from the main pass so we get accurate normals.

Latest timings with a few optimizations applied. 1440p, 8xMSAA:
Frame 5205 : 4.585ms (100.0%)
    Camera 1 : 4.547ms (99.1%)
        Shadow map rendering : 0.366ms (7.9%)
            Shadow map 1 : 0.364ms (7.9%)
        Clear main buffer : 0.048ms (1.0%)
        Terrain rendering : 1.021ms (22.2%)
        Skybox : 0.0ms (0.0%)
        Post processing : 3.109ms (67.8%)
            SSAO rendering : 1.108ms (24.1%)
                Linearize depth : 0.228ms (4.9%)
                Generate depth mipmaps : 0.098ms (2.1%)
                Compute SSAO : 0.481ms (10.4%)
                Blur : 0.3ms (6.5%)
            Merge : 0.872ms (19.0%)
            Bloom : 0.673ms (14.6%)
            Tone mapping : 0.451ms (9.8%)

I gotta go to bed... x___x

Cas: ah

committed anything?

Dan: yes everything

Cas: cool

Dan: [10:59 PM] theagentd MOKYU: I pushed SPGL2 and Voxoid.

Cas: ive got to do Mr Elf and the Chocolate Faerie for the kids tonight

Dan: lol

Cas: what mischief will he get up to

Dan: yeeeaaaah i'm just gonna head to bed without a bedtime horror story lol

Cas: heh

good job and nn cya tomoz or so

Dan: Thanks

Night

Cas: performance and looks great so far (especially when I added the mouse light back in)

really must do some proper rocks

the spotty ones brian did look a bit like a children's fantasy world 😄

Dan: Sorry no time to work today.

School work from waking up to going to bed...

Cas: too knackered to do much here meself too

Comments

Love the opening line of this one lol

StabbedBadger