
Technology Stocks : New Technology


From: FJB 7/7/2021 5:59:45 PM
Data Structure Visualizations

cs.usfca.edu



From: FJB 1/3/2022 7:26:11 PM

NASM Assembly Language Tutorials

asmtutor.com






From: FJB 1/15/2022 4:15:04 PM
gist.github.com

How to setup a practically free CDN

I've been using Backblaze for a while now as my online backup service. I have used a few others in the past. None were particularly satisfactory until Backblaze came along.

It was - still is - keenly priced at a flat $5 (£4) per month for unlimited backup (I've currently got just under half a terabyte backed-up). It has a fast, reliable client. The company itself is transparent about their operations and generous with their knowledge sharing. To me, this says they understand their customers well. I've never had reliability problems and everything about the outfit exudes a sense of simple, quick, solid quality. The service has even saved the day on a couple of occasions where I've lost files.

Safe to say, I'm a happy customer. If you're not already using Backblaze, I highly recommend you do.

Taking on the big boys with B2

So when Backblaze announced they were getting into the cloud storage business, taking on the likes of Amazon S3, Microsoft Azure, and Google Cloud, I paid attention. Even if the cost were the same, or a little bit more, I'd be interested because I like the company. I like their product, and I like their style.

What I wasn't expecting was for them to be cheaper. Much cheaper - per GB, S3 costs more than four times as much as B2. Don't believe me? Take a look. Remarkable.

What's more, they offer a generous free tier of 10 GB free storage and 1 GB free download per day.

If it were any other company, I might think they're a bunch of clowns trying it on. But I know from my own experience and following their journey, they're genuine innovators and good people.

Using B2

B2 is pretty simple. You can use their web UI, which is decent. Or you can use Cyberduck, which is what I use; it's free and of high quality. There is also a command-line tool and a number of other integrated tools, and of course a web API.
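If you'd rather script against B2, here's a minimal sketch using the b2sdk Python package. The post itself only says that tools and an API exist; the library choice, bucket name, and credentials below are my own placeholders, so treat this as illustrative rather than canonical.

# Minimal sketch, assuming the b2sdk package (pip install b2sdk); all names are placeholders.
from b2sdk.v2 import InMemoryAccountInfo, B2Api

info = InMemoryAccountInfo()                     # keep credentials in memory only
api = B2Api(info)
api.authorize_account("production", "YOUR_KEY_ID", "YOUR_APPLICATION_KEY")

bucket = api.get_bucket_by_name("my-public-bucket")
bucket.upload_local_file(local_file="site/logo.png", file_name="assets/logo.png")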

Setting up a vanity URL

You can set up a "vanity" URL for your public B2 files. Do it for free using CloudFlare. There's a PDF [1.3 MB] documenting how.

Using CloudFlare CDN to cache B2 hosted files

You can also configure CloudFlare to aggressively cache assets served by your B2 service. It is not immediately obvious how to do this, and it took a bit of poking around to set up correctly.

By default, B2 serves with cache-invalidating headers: cache-control:max-age=0, no-cache, no-store, which causes CloudFlare to skip caching of assets. You can see this happening by looking for the cf-cache-status:MISS header.
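If you want to check those headers without opening DevTools, here's a quick sketch in Python; the URL is a placeholder for one of your own B2-hosted files behind CloudFlare.

# Minimal sketch using the 'requests' library; swap in one of your own file URLs.
import requests

resp = requests.get("https://files.example.com/some-image.png")
print(resp.headers.get("cf-cache-status"))   # MISS with B2's default headers; HIT once caching is set up
print(resp.headers.get("cache-control"))     # B2's default: max-age=0, no-cache, no-store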

To work around this problem, you can use CloudFlare's PageRules specifying an "Edge cache expire TTL". I won't explain what that means here as it is covered in-depth on the CloudFlare blog.

So, to cache your B2 assets, you need to create a PageRule that includes all files on your B2 domain. For example:

files.silversuit.net/*

You then need to add your cache settings. I have Cache Level set to Cache Everything; Browser Cache TTL set to a year; Edge Cache TTL set to 7 days. I'm caching aggressively here, but you can tweak these settings to suit. Here's a screenshot:
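The same rule can also be created through CloudFlare's v4 API rather than the dashboard. The sketch below is my own illustration, not from the original post; check the endpoint and action names against CloudFlare's current API docs before relying on it.

# Hedged sketch: create the Page Rule via CloudFlare's v4 API (verify field names against current docs).
import requests

zone_id = "YOUR_ZONE_ID"
headers = {"Authorization": "Bearer YOUR_API_TOKEN", "Content-Type": "application/json"}
rule = {
    "targets": [{"target": "url",
                 "constraint": {"operator": "matches", "value": "files.silversuit.net/*"}}],
    "actions": [{"id": "cache_level", "value": "cache_everything"},
                {"id": "browser_cache_ttl", "value": 31536000},   # one year, in seconds
                {"id": "edge_cache_ttl", "value": 604800}],       # 7 days, in seconds
    "status": "active",
}
resp = requests.post("https://api.cloudflare.com/client/v4/zones/%s/pagerules" % zone_id,
                     json=rule, headers=headers)
print(resp.json())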



[Screenshot showing PageRules settings]

To check that it's working correctly, use DevTools to look for the cf-cache-status:HIT header:



[Screenshot showing a CloudFlare cache hit]

Wrapping up

So, with that, you're making use of already very inexpensive B2 storage coupled with CloudFlare's free CDN to serve your assets almost entirely for free. And it's not like these are rinky-dink services that are going to fall over regularly; these are both high-quality, reputable companies.

What a time to be alive, eh?



From: FJB 1/19/2022 6:47:06 PM
Security Engineering Lecture 1: Who is the Opponent?




From: FJB 3/21/2022 8:44:32 PM
Lanai, the mystery CPU architecture in LLVM.


Disclaimer: I have had access to some confidential information about some of the matters discussed on this page. However, everything written here is derived from publicly available sources, and references to these sources are also provided.


https://q3k.org/lanai.html

Some of my recent long-term projects revolve around a little-known CPU architecture called 'Lanai'. Unsurprisingly, very few people have heard of it, and even good Googling skills don't turn up much about it. This page is a short summary of what I know, and should serve as a reference for future questions.

Myricom & the origins of Lanai

Myricom is a hardware company founded in 1994. One of their early products was a networking interface card family and protocol, Myrinet. I don't know much about it, other than it did some funky stuff with wormhole routing.

As part of their network interface card design, they introduced data plane programmability with the help of a small RISC core they named LANai. It originally ran at 33MHz, the speed of the PCI bus on which the cards were operating. These cores were quite well documented on the Myricom website, seemingly with the end-user programmability being a selling point of their devices.

It's worth noting that multiple versions of LANai/Lanai have been released. The last publicly documented version on the old Myricom website is Lanai3/4. Apart from the documentation, sources for a gcc/binutils fork exist to this day on Myricom's Github.

At some point, however, Myricom stopped publicly documenting the programmability of their network cards, but documentation/SDK was still available on request. Some papers and research websites actually contain tutorials on how to get running with the newest versions of the SDK at the time, and even document the differences between the last documented Lanai3/4 version and newer releases of the architecture/core.

This closing down of the Lanai core documentation by Myricom didn't mean they stopped using it in their subsequent cards. The core made its way into their Ethernet offerings (after Myrinet basically died), like their 10GbE network cards. You can easily find these 10G cards on eBay, and they even have the word 'Lanai' written on their main ASIC package. Even more interestingly, Lanai binaries are shipped with Linux firmware packages, and can be chucked straight into a Lanai disassembler (eg. the Myricom binutils fork's objdump).

Technical summary of Lanai3/4
  • 32 registers, most of them general purpose, with special treatment for R0 (all zeroes), R1 (all ones), R2 (the program counter), R3 (status register), and some registers allocated for mode/context switching.
  • 4-stage RISC-style pipeline: Calculate Address, Fetch, Compute, Memory
  • Delay slot based pipeline hazard resolution
  • No multiplication, no division. It's meant to route packets, not crunch numbers.
  • The world's best instruction mnemonic: PUNT, to switch between user and system contexts.
Here's a sample of Lanai assembly:

000000f8 <main>:
      f8: 92 93 ff fc   st      %fp, [--%sp]
      fc: 02 90 00 08   add     %sp, 0x8, %fp
     100: 22 10 00 08   sub     %sp, 0x8, %sp
     104: 51 80 00 00   or      %r0, 0x0, %r3
     108: 04 81 40 01   mov     0x40010000, %r9
     10c: 54 a4 08 0c   or      %r9, 0x80c, %r9
     110: 06 01 11 11   mov     0x11110000, %r12
     114: 56 30 11 11   or      %r12, 0x1111, %r12
     118: 96 26 ff f4   st      %r12, -12[%r9]
     11c: 96 26 ff f8   st      %r12, -8[%r9]
     120: 86 26 13 f8   ld      5112[%r9], %r12

00000124 <.LBB3_1>:
     124: 46 8d 00 00   and     %r3, 0xffff, %r13
     128: 96 a4 00 00   st      %r13, 0[%r9]
     12c: 01 8c 00 01   add     %r3, 0x1, %r3
     130: e0 00 01 24   bt      0x124 <.LBB3_1>
     134: 96 24 00 00   st      %r12, 0[%r9]
The `add`/`sub`/`or` instructions have their destination on the right-hand side. `st` and `ld` are memory store and load instructions respectively. Note the lack of a 32-bit immediate load (instead a `mov` and an `or` instruction are used in tandem). That `mov` instruction isn't real, either - it's a pseudo instruction for an `add 0, 0x40010000, %r9`. Also note the branch delay slot at address 134 (this instruction gets executed even if the branch at 130 is taken).
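To make that `mov`/`or` pairing concrete, here is a tiny Python sketch (purely illustrative) of how the 32-bit constants used in the listing split into a high half for the `mov` and a low 16-bit immediate for the `or`, matching the instruction pairs at 108/10c and 110/114:

# Illustrative only: splitting a 32-bit constant into the mov (high half) and or (low 16 bits) immediates.
def split_imm32(value):
    high = value & 0xFFFF0000   # loaded first by the mov (itself a pseudo-op)
    low = value & 0x0000FFFF    # OR-ed in by the following or instruction
    return high, low

print([hex(v) for v in split_imm32(0x4001080C)])  # ['0x40010000', '0x80c']
print([hex(v) for v in split_imm32(0x11111111)])  # ['0x11110000', '0x1111']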

The ISA is quite boring, and in my opinion that's a good thing. It makes core implementations easy and fast, and it generally feels like one of the RISC-iest cores I've dealt with. The only truly interesting thing about it is its dual-context execution system, but that unfortunately becomes irrelevant at some point, as we'll see later.

Google & the Lanai team

In the early 2010s, things weren't going great at Myricom. Due to financial and leadership difficulties, some of their products got canceled, and in 2013, core Myricom engineers were bought out by Google, and they transferred the Lanai intellectual property rights with them. The company still limps on, seemingly targeting the network security and fintech markets, and even continuing to market their networking gear as programmable, but Lanai is nowhere to be seen in their new designs.

So what has Google done with the Lanai engineers and technology? The only thing we know is that in 2016 Google implemented and upstreamed a Lanai target in LLVM, and that it was to be used internally at Google. What is it used for? Only Google knows, and Google isn't saying.

The LLVM backend targets Lanai11. This is quite a few numbers higher than the last publicly documented Lanai3/4, and there are quite a few differences between them:

  1. No more dual-context operation, no more PUNT instruction. The compiler/programmer can now make use of nearly all registers from r4 to r31.
  2. No more dual-ALU (R-R-R) instructions. This was obviously slow, and was probably a combinatorial bottleneck in newer microarchitectural implementations.
  3. Slightly different delay slot semantics, pointing at a new microarchitecture (likely having stepped away from a classic RISC pipeline into something more modern).
  4. New additional instruction format and set of accompanying instructions: SPLS (special part-word load/store), SLI (special load immediate), and Special Instruction (containing amongst others popcount, of course).
Lanai Necromancy

As you can tell by this page, this architecture intrigued me. The fact that it's an LLVM target shipped with nearly every LLVM distribution while no-one has access to hardware which runs the emitted code is just so spicy. Apart from writing this page, I have a few other Lanai-related projects, and I'd like to introduce them here:

  1. I'm porting Rust to Lanai11. I have a working prototype, which required submitting some patches to upstream LLVM to deal with IR emitted by rustc. This has been upstreamed. My rustc patches are pending on...
  2. I'm implementing LLD support for Lanai. Google (in the LLVM mailing list posts) mentions they use a binutils ld, forked off from the Myricom binutils fork. I've instead opted to implement an LLD backend for Lanai, which currently only supports the simplest relocations. I haven't yet submitted a public LLVM change request for this, but this is on my shortlist of things to do. I have to first talk to the LLVM/Google folks on the maintenance plan for this.
  3. I've implemented a simple Lanai11 core in Bluespec, as part of my qfc monorepo. 3-stage pipeline (merged addr/fetch stages), in-order. It's my first bit of serious Bluespec code, so it's not very good. I plan on implementing a better core at some point.
  4. I've implemented a small Lanai-based microcontroller, qf105, which is due to be manufactured in 130nm as part of the OpenMPW5 shuttle. Which is, notably, sponsored by Google :).
If you're interested in following or joining these efforts, hop on to ##q3k on libera.chat.

In addition to my effort piecing together information about Lanai and making use of it for my own needs, the TrueBit project also used it as a base for their smart contract system (in which they implemented a Lanai interpreter in Solidity).

Documentation

Useful resources, in no particular order:




From: FJB 3/25/2022 5:18:32 PM
Writing a Simple Operating System — from Scratch

cs.bham.ac.uk



From: retrodynamic 4/17/2022 9:58:48 PM
State of the Art Novel InFlow Tech: ·1-Gearturbine Reaction Turbine Rotary Turbo, ·2-Imploturbocompressor Impulse Turbine 1 Compression Step.



·1-Gearturbine: Reaction Turbine, ·Rotary-Turbo, Similar System of the Aeolipilie ·Heron Steam Device from 10-70 AD, ·With Retrodynamic = DextroGiro/RPM VS LevoGiro/InFlow, + ·Ying Yang Circular Power Type, ·Non Waste Parasitic Power Looses Type, ·8-X,Y Thermodynamic Cycle Way Steps, Patent: #197187 / IMPI - MX.



·2-Imploturbocompressor: Impulse Turbine, ·Implo-Ducted, One Moving Part System Excellence Design, · InFlow Goes from Macro-Flow to Micro-Flow by Implosion/And Inverse, ·One Compression Step, ·Circular Dynamic Motion. Implosion Way Type, ·Same Nature of a Hurricane Satellite View.

stateoftheartnovelinflowtech.blogspot.com

https://padlet.com/gearturbine/un2slbar3s94

https://www.behance.net/gearturbina61a




From: FJB 4/20/2022 7:03:09 AM


hpcwire.com

Nvidia R&D Chief on How AI is Improving Chip Design
By John Russell






Getting a glimpse into Nvidia’s R&D has become a regular feature of the spring GTC conference with Bill Dally, chief scientist and senior vice president of research, providing an overview of Nvidia’s R&D organization and a few details on current priorities. This year, Dally focused mostly on AI tools that Nvidia is both developing and using in-house to improve its own products – a neat reverse sales pitch if you will. Nvidia has, for example, begun using AI to effectively improve and speed GPU design.

[Photo: Bill Dally of Nvidia in his home ‘workshop’]

“We’re a group of about 300 people that tries to look ahead of where we are with products at Nvidia,” described Dally in his talk this year. “We’re sort of the high beams trying to illuminate things in the far distance. We’re loosely organized into two halves. The supply half delivers technology that supplies GPUs. It makes GPUs themselves better, ranging from circuits, to VLSI design methodologies, architecture networks, programming systems, and storage systems that go into GPUs and GPU systems.”

“The demand side of Nvidia research tries to drive demand for Nvidia products by developing software systems and techniques that need GPUs to run well. We have three different graphics research groups, because we’re constantly pushing the state of the art in computer graphics. We have five different AI groups, because using GPUs to run AI is currently a huge thing and getting bigger. We also have groups doing robotics and autonomous vehicles. And we have a number of geographically oriented labs like our Toronto and Tel Aviv AI labs,” he said.

Occasionally, Nvidia launches a Moonshot effort pulling from several groups – one of these, for example, produced Nvidia’s real-time ray tracing technology.

As always, there was overlap with Dally’s prior-year talk – but there was also new information. The size of the group has certainly grown from around 175 in 2019. Not surprisingly, efforts supporting autonomous driving systems and robotics have intensified. Roughly a year ago, Nvidia recruited Marco Pavone from Stanford University to lead its new autonomous vehicle research group, said Dally. He didn’t say much about CPU design efforts, which are no doubt also intensifying.



Presented here are small portions of Dally’s comments (lightly edited) on Nvidia’s growing use of AI in designing chips, along with a few supporting slides.

1 Mapping Voltage Drop

“It’s natural as an expert in AI that we would want to take that AI and use it to design better chips. We do this in a couple of different ways. The first and most obvious way is we can take existing computer-aided design tools that we have [and incorporate AI]. For example, we have one that takes a map of where power is used in our GPUs, and predicts how far the voltage grid drops – what’s called IR drop for current times resistance drop. Running this on a conventional CAD tool takes three hours,” noted Dally.

“Because it’s an iterative process, that becomes very problematic for us. What we’d like to do instead is train an AI model to take the same data; we do this over a bunch of designs, and then we can basically feed in the power map. The [resulting] inference time is just three seconds. Of course, it’s 18 minutes if you include the time for feature extraction. And we can get very quick results. A similar thing in this case, rather than using a convolutional neural network, we use a graph neural network, and we do this to estimate how often different nodes in the circuit switch, and this actually drives the power input to the previous example. And again, we’re able to get very accurate power estimations much more quickly than with conventional tools and in a tiny fraction of the time,” said Dally.
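As a purely illustrative sketch of the kind of model Dally describes for IR-drop prediction (a network that maps a 2-D power map to a predicted voltage-drop map), here is a toy example; the framework (PyTorch), layer sizes, and data are my own assumptions, not Nvidia's tool.

# Toy sketch only: a tiny convolutional net mapping a power map to a predicted IR-drop map.
import torch
import torch.nn as nn

class IRDropNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),   # per-tile predicted voltage drop
        )

    def forward(self, power_map):
        return self.net(power_map)

model = IRDropNet()
power_map = torch.rand(1, 1, 64, 64)   # stand-in 64x64 power map of one die tile
ir_drop = model(power_map)             # inference is one quick forward pass
print(ir_drop.shape)                   # torch.Size([1, 1, 64, 64])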





2 Predicting Parasitics

“One that I particularly like – having spent a fair amount of time a number of years ago as a circuit designer – is predicting parasitics with graph neural networks. It used to be that circuit design was a very iterative process where you would draw a schematic, much like this picture on the left here with the two transistors. But you wouldn’t know how it would perform until after a layout designer took that schematic and did the layout, extracted the parasitics, and only then could you run the circuit simulations and find out you’re not meeting some specifications,” noted Dally.

“You’d go back and modify your schematic [and go through] the layout designer again, a very long and iterative and inhuman labor-intensive process. Now what we can do is train neural networks to predict what the parasitics are going to be without having to do layout. So, the circuit designer can iterate very quickly without having that manual step of the layout in the loop. And the plot here shows we get very accurate predictions of these parasitics compared to the ground truth.”
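For flavour, here is a toy sketch of what "predicting parasitics with a graph neural network" can look like: per-node features for devices and nets, one round of neighbour aggregation, and a per-node regression head. Everything here (framework, dimensions, fake data) is my own illustration, not Nvidia's model; the congestion predictor described in the next section applies the same idea to a netlist graph.

# Toy sketch only: one-layer message passing over a schematic graph, then per-node regression.
import torch
import torch.nn as nn

class TinyGNN(nn.Module):
    def __init__(self, in_dim=8, hidden=32):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.readout = nn.Linear(hidden, 1)    # e.g. predicted parasitic capacitance per node

    def forward(self, x, adj):
        h = torch.relu(self.embed(x))          # embed each node (device or net)
        h = adj @ h                            # aggregate messages from neighbours
        return self.readout(h)

num_nodes = 6                                  # stand-in for devices/nets in a schematic
x = torch.rand(num_nodes, 8)                   # fake per-node features (W/L, layer, fan-out, ...)
adj = torch.eye(num_nodes)                     # fake adjacency; normally built from the netlist
print(TinyGNN()(x, adj).shape)                 # torch.Size([6, 1])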



3 Place and Routing Challenges

“We can also predict routing congestion; this is critical in the layout of our chips. The normal process is we would have to take a net list, run through the place and route process, which can be quite time consuming often taking days. And only then we would get the actual congestion, finding out that our initial placement is not adequate. We need to refactor it and place the macros differently to avoid these red areas (slide below), which is where there’s too many wires trying to go through a given area, sort of a traffic jam for bits. What we can do instead now is without having to run the place and route, we can take these net lists and using a graph neural network basically predict where the congestion is going to be and get fairly accurate. It’s not perfect, but it shows the areas where there are concerns, we can then act on that and do these iterations very quickly without the need to do a full place and route,” he said.



4 Automating Standard Cell Migration

“Now those [approaches] are all sort of using AI to critique a design that’s been done by humans. What’s even more exciting is using AI to actually do the design. I’ll give you two examples of that. The first is a system we have called NVCell, which uses a combination of simulated annealing and reinforcement learning to basically design our standard cell library. So each time we get a new technology, say we’re moving from a seven nanometer technology to a five nanometer technology, we have a library of cells. A cell is something like an AND gate, an OR gate, a full adder. We’ve got actually many thousands of these cells that have to be redesigned in the new technology with a very complex set of design rules,” said Dally.

“We basically do this using reinforcement learning to place the transistors. But then more importantly, after they’re placed, there are usually a bunch of design rule errors, and it goes through almost like a video game. In fact, this is what reinforcement learning is good at. One of the great examples is using reinforcement learning for Atari video games. So this is like an Atari video game, but it’s a video game for fixing design rule errors in a standard cell. By going through and fixing these design rule errors with reinforcement learning, we’re able to basically complete the design of our standard cells. What you see (slide) is that 92 percent of the cell library was able to be done by this tool with no design rule or electrical rule errors. And 12 percent of them are smaller than the human-designed cells, and in general, over the cell complexity, [this tool] does as well or better than the human-designed cells,” he said.
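As a toy illustration of the simulated-annealing half of that combination (the reinforcement-learning part is omitted), here is a sketch that shuffles a row of devices and accepts swaps that reduce a made-up design-rule-error count. The cost function and everything else here are invented for illustration and are not NVCell.

# Toy sketch only: simulated annealing over a 1-D placement with a fake design-rule cost.
import math
import random

def drc_errors(order):
    # Fake "design rule": devices with consecutive ids must not sit next to each other.
    return sum(1 for a, b in zip(order, order[1:]) if abs(a - b) == 1)

def anneal(devices, steps=20000, t0=2.0):
    order = devices[:]
    for step in range(steps):
        t = max(t0 * (1 - step / steps), 1e-3)           # cool the temperature over time
        i, j = random.sample(range(len(order)), 2)
        cand = order[:]
        cand[i], cand[j] = cand[j], cand[i]              # propose swapping two devices
        delta = drc_errors(cand) - drc_errors(order)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            order = cand                                 # accept improvements (and some regressions)
    return order

placed = anneal(list(range(12)))
print(placed, "remaining rule errors:", drc_errors(placed))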

“This does two things for us. One is it’s a huge labor savings. A group on the order of 10 people will take the better part of a year to port a new technology library. Now we can do it with a couple of GPUs running for a few days. Then the humans can work on those 8 percent of the cells that didn’t get done automatically. And in many cases, we wind up with a better design as well. So it’s labor savings and better than human design.”





There was a good deal more to Dally’s talk, all of it a kind of high-speed dash through a variety of Nvidia’s R&D efforts. If you’re interested, here is HPCwire’s coverage of two previous Dally R&D talks – 2019, 2021 – for a rear-view mirror into work that may begin appearing in products. As a rule, Nvidia’s R&D is very product-focused rather than basic science. You’ll note his description of the R&D mission and organization hasn’t changed much but the topics are different.




From: FJB 6/7/2022 9:29:44 PM
Miracle Drug Shows 100% Remission For All Cancer Patients In Drug Trial



From: FJB 6/13/2022 11:54:24 PM
so awesome

Diving into GCC internals
