Performance change from X in Fedora Core 4 to Fedora Core 5

Fri Jul 14 07:46:08 PDT 2006

On Fri, 14 Jul 2006 12:01:40 +0100 Felix Bellaby <felix at bellaby.plus.com>
babbled:

> On Fri, 2006-07-14 at 08:28 +0900, Carsten Haitzler wrote:
> > ok - if your content is TEXT it will be hard to read - but lets say its a
> > slow website - and it is loading. you can watch it load in a corner of your
> > screen while you go do something more useful. invariably an app will give
> > some hint as to its state without needing to see all the details. even text
> > filled xterms - if you have a compile going and its scrolling along. once
> > it STOPS scrolling you know the compile is done (if the compile is on a
> > remote box over ssh you wont be using your local cpu load meter to tell
> > either). anyway - my point is - people ask for and have uses for such
> > technology and xcomposite makes it all (finally) possible.
> 
> I do not think that compositing is ever going to deliver user
> expectations if they set them so high. Supporting all existing
> applications does mean giving them access to the framebuffer
> environments that they were written to use. This is expensive. Giving
> them access to those framebuffers even when the applications are
> minimised is even more expensive. Minifying those framebuffers as
> textures is very expensive, especially without mipmaps. Put the whole
> lot together and you are asking much more of the GPU than it can
> possibly deliver.

not really - modern cards with 256mb or 512mb of video ram can easily deliver
this - IF we can actually use all of that ram - while twiddling their thumbs.
even cards a few years old can.

> If users want to see minified versions of their apps at all times then
> those apps are going to have to be written from the bottom up to make
> full use of the capabilities of the GPU. There is no other choice.

not going to happen though :(

> > you will want to accelerate drawing as the apps wont stop
> > rendering - they will draw like they do when fully visible. if you use
> > software fallbacks or pixmap thrash with them - then your xserver will just
> > consume most of your cpu with these other windows.... not a good thing. my
> > point is that you still want to reduce unneeded pixmap usage where and when
> > you can.
> 
> > > Compositing using pixel perfect framebuffers for each application just
> > > to shrink them to nothing would indeed be extremely expensive, but
> > > speeding up the drawing into those framebuffers would be a rather feeble
> > > gesture towards efficiency. Pixel operations can not be performed
> > > remotely efficiently at entirely the wrong scale. You would be much
> > > better off switching to a scalable drawing API like cairo for _this_
> > > kind of work. 
> > 
> > not going to happen - you plan on making every app rescale its own output.
> > also just because cairo can draw vectors - doesn't mean it trivially makes
> > such a thing work out of the box everywhere just by its use. problem is
> > you can't just switch all the apps - they come as they come with their
> > various toolkits or even DIY drawing. a scalable drawing kit doesnt
> > suddenly make blit operations work properly when scaled down or for that
> > matter most operations. it isn't so simple. the only sane solution is to
> > let the app draw to its full sized window as a pixmap and post-scale to the
> > icon version.
> 
> I had no intention of implying that a software based drawing kit would
> enable users to see minified versions of _existing_ applications. That
> is impossible by any means. Not going to happen.
> 
> However, it _might_ be possible to minify a new generation of
> applications that were designed with minification in mind. These new
> apps would need to perform their drawing operations through an API that
> accommodated scaling and used the full capabilities of the GPU to achieve
> it. OpenGL does scale the app programmer's drawing in this way. It uses
> radically different drawing operations from 2D graphics to achieve it,
> specifying floating point vertices and mipmapped textures rather than
> integer coordinates and 2D images. I mentioned cairo as its glitz
> backend is the closest bridge from 2D to 3D that is currently available.
> 
> Are we going to see a mass migration of 2D apps over to 3D ? You clearly
> believe that it is not going to happen and you might well be right.
> However, it is the only way that user expectations can possibly be met
> if they want the kind of minification that you are envisioning.

i disagree. i think it is perfectly feasible given the way i discussed. let me
take my desktop in front of me. 1600x1200 - 4 virtual desktops. let me assume i
FILL every desktop with windows so they all double-overlap (on average) -
thats 60mb of video ram. my card has 128. *IF* it allocated pixmaps and
textures sensibly - i am not even using half my card's ram. it can render to
those pixmaps without even batting an eyelid too - and this card is over a year
old and was not top of the line either when i bought it. the problem is when
you watch half your video ram vanish on the lower end of things... thats when
the pain kicks in. there is also still the possibility to put pixmaps in agp
ram mapped into the card - that will extend such limits by using system ram.
that is still not seamlessly/nicely handled currently.
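
as a rough check of the 60mb figure, here is a back-of-envelope sketch (not a
measurement). it assumes 32-bit pixels, which is not stated above, and reads
"double-overlap" as every desktop being covered twice over by window pixmaps.
all names in it are made up for illustration.

```python
# rough video ram budget for the desktop described above.
# ASSUMPTION: 4 bytes per pixel (32bpp) - the depth is not given in the text.
WIDTH, HEIGHT = 1600, 1200
BYTES_PER_PIXEL = 4        # assumed
VIRTUAL_DESKTOPS = 4
OVERLAP = 2                # each desktop covered twice over, on average
CARD_BYTES = 128 * 1024 * 1024


def pixmap_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """bytes needed to back one window pixmap of the given size."""
    return width * height * bytes_per_pixel


per_desktop = pixmap_bytes(WIDTH, HEIGHT)
total = per_desktop * VIRTUAL_DESKTOPS * OVERLAP

print(f"one full-screen pixmap: {per_desktop / 1e6:.2f} MB")
print(f"all window pixmaps:     {total / 1e6:.2f} MB")
print(f"fits in half the card:  {total < CARD_BYTES / 2}")
```

under those assumptions the total lands a little above 61 million bytes, which
is where "60mb" and "not even half my card's ram" come from.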

> >> ...
> > > Your approach reduces the total memory that needs to be allocated at any
> > > specific time by allowing the toolkit to choose when to buffer and which
> > > areas to buffer. However, nvidia seem to be confessing that this memory
> > > will currently be alloced and dealloced in RAM and the drawing
> > > operations will make little use of the GPU. 
> > 
> > that is a matter of a bad driver implementation - but i seriously do not
> > believe this. i can allocate a pixmap and xcopyarea from it to other pixmaps at
> > blinding speeds - it's in video ram and it's using the gfx chipset to do the
> > blits. it's not done by cpu.
> 
> xcopyarea certainly outperforms memcpy by a huge margin and the effect
> on xrender speeds of turning off the acceleration is really striking. I
> suspect that nvidia usually manage to get the pixmaps into VRAM right
> from the start and use non-pipelined 2D hardware to do the blits.
> 
> >  the problem is the opengl <-> 2d world mix. in the
> > past opengl and 2d have been done so entirely separately and differently
> > that it is when you cross the boundaries of these 2 worlds and want to mix
> > and match them - you run into the slowest operations and biggest
> > inefficiencies. that is a matter of fixing drivers to no longer see both as
> > distinctly different worlds.
> 
> The problem is indeed the mix of 3D and 2D. They are different worlds
> and getting from one to the other is very difficult. However, this
> difference _can not_ be fixed in the drivers and could not even be fixed

how can it NOT be fixed in drivers? please explain? isn't xgl an example of
just that? building a driver on top of the 3d engine only (using opengl as the
intermediate 3d api?)???

> in the hardware. OpenGL provides a means of getting from 3D to 2D, but
> it has to start from a 3D based drawing API and use 3D specific hardware
> to achieve the move to 2D framebuffers at acceptable speeds. Starting
> from the 2D drawing API with which we are familiar provides a very
> efficient direct route to a 2D framebuffer. However, no one has devised
> an efficient route from this 2D drawing API to 3D.

i don't think so. pbuffers in opengl for starters are just "framebuffers" for
opengl to render to and use as sources. there is nothing technically impossible
about making your gfx card do all this fancy stuff with 2d using the 3d chip
subsystem, with no need to transfer - or about using pbuffers shared as pixmaps
for starters (to talk in opengl terms). it is just a matter of drivers (i lump
libGL into this) thinking of 2d again as a first class citizen and merging the 2
seamlessly and efficiently.

> I entirely agree that the current 2D drawing API is only compatible with
> direct use of a 2D framebuffer, but transforming a continuously changing
> 2D framebuffer into a scalable 3D texture is a horrendous process. You
> can either recalculate the mipmaps continuously or abandon them

or use anisotropic filtering which gives you multiple samples per output pixel
when minifying - expensive(ish) but much faster than software - and i am
managing fully accurate minified supersampling at usable speeds in software...
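
to show what "multiple samples per output pixel" buys you, here is a minimal
sketch of supersampled minification using a plain box filter. it is not the
actual software scaler referred to above, and it is not how anisotropic
filtering works in hardware - it is just the simplest case of averaging every
source pixel that lands under an output pixel. the function name `minify` and
the greyscale list-of-rows image format are assumptions made for the example.

```python
def minify(pixels, factor):
    """shrink a greyscale image (list of rows of ints) by an integer factor,
    averaging every source pixel in each factor x factor block."""
    height, width = len(pixels), len(pixels[0])
    if height % factor or width % factor:
        raise ValueError("image size must be a multiple of factor")
    out = []
    for oy in range(0, height, factor):
        row = []
        for ox in range(0, width, factor):
            block = [pixels[y][x]
                     for y in range(oy, oy + factor)
                     for x in range(ox, ox + factor)]
            row.append(sum(block) // len(block))   # box-filter average
        out.append(row)
    return out


# a 4x4 checkerboard of black (0) and white (255) pixels
src = [[0, 255, 0, 255],
       [255, 0, 255, 0],
       [0, 255, 0, 255],
       [255, 0, 255, 0]]

supersampled = minify(src, 2)                    # every source pixel counts
point_sampled = [row[::2] for row in src[::2]]   # one sample per output pixel

print(supersampled)
print(point_sampled)
```

the point-sampled version only ever looks at the black pixels, so the
checkerboard collapses to solid black, while the averaged version comes out
mid-grey. that difference is why a single unfiltered sample per pixel looks so
bad when shrinking a window to an icon.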

> completely. Either way scaling will take ages. Even without any scaling
> it takes 5 times longer on my hardware to map a static unmipmapped
> texture onto the screen than it takes to perform the functionally
> identical bit blit. Computer games take ages to set up their textures
> before they begin drawing and the GPUs still end up spending most of
> their time handling the textures even after they have been mipmapped.
> 
> OpenGL divides up the drawing process into vertex transformations,
> rasterisation and per-fragment operations. The vertex transforms allow
> scaling and lighting to be done using efficient vector graphic logic.
> The rasterisation works out where every visible pixel will come from on
> each polygon and works out its value by transforming the relevant
> static, mipmapped texture, ignoring undrawn pixels completely. Finally,
> the per-fragment operations do the blending, depth testing, etc. on the
> final pixels. This is obviously much more efficient as a means of
> scaling than starting by drawing a big 2D image, and you have to start
> from an API like OpenGL to get there.

in my experience of using gl as a 2d api and treating it with painter's
algorithm... it works just fine speed-wise and can manage all of this easily -
without blinking, on any vaguely modern hardware. most of what we need is
compositing and transforms - both of which are relatively cheap and fast even
brute-force on a modern gpu.
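
for anyone not familiar with the term, a minimal cpu-side sketch of painter's
algorithm compositing follows. it only illustrates the ordering and blending
logic - a gpu would do the same thing with textured quads, and nothing here is
taken from a real compositor. the window dictionaries, the single grey channel
and the function name `composite` are all assumptions made for the example.

```python
def composite(width, height, windows, background=0.0):
    """painter's algorithm: draw windows back to front, blending each one
    over whatever is already in the framebuffer."""
    fb = [[background] * width for _ in range(height)]
    for win in sorted(windows, key=lambda w: w["z"]):      # lowest z first
        y0, y1 = max(win["y"], 0), min(win["y"] + win["h"], height)
        x0, x1 = max(win["x"], 0), min(win["x"] + win["w"], width)
        alpha = win["alpha"]
        for y in range(y0, y1):
            for x in range(x0, x1):
                # "over" operator on one grey channel
                fb[y][x] = win["value"] * alpha + fb[y][x] * (1.0 - alpha)
    return fb


windows = [
    # opaque white window at the back
    {"x": 0, "y": 0, "w": 3, "h": 2, "z": 0, "value": 1.0, "alpha": 1.0},
    # half-transparent black window in front, overlapping one column
    {"x": 2, "y": 0, "w": 2, "h": 2, "z": 1, "value": 0.0, "alpha": 0.5},
]

fb = composite(4, 2, windows)
print(fb[0])
```

no depth buffer or per-pixel sorting is needed: the draw order alone decides
what ends up on top, which is why this maps so directly onto drawing a stack
of windows.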

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com
裸好多
Tokyo, Japan (東京 日本)