From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: LRU in caches with bit decay
Newsgroups: comp.arch
Date: 02 Mar 1997 08:03:28 +0000

difference is that the clock that i did in the late '60s effectively decays bits at a rate proportional to the demand for pages (i.e. dynamic adaptive) as opposed to a fixed clock
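as a rough sketch in C (illustrative structures, not the original code): reference bits are cleared ("decayed") only as the hand sweeps looking for a replacement, so nothing decays when there is no paging demand and the decay rate rises with demand:

    #include <stddef.h>

    struct frame { int referenced; int page; };   /* referenced is set on use */

    static struct frame frames[256];
    static size_t hand;

    int select_victim(void)
    {
        for (;;) {
            struct frame *f = &frames[hand];
            hand = (hand + 1) % (sizeof frames / sizeof frames[0]);
            if (f->referenced)
                f->referenced = 0;   /* "decay" the bit; second chance */
            else
                return f->page;      /* not referenced since last sweep: evict */
        }
    }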
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Year 2000: Why?
Newsgroups: alt.folklore.computers
Date: 15 Mar 1997 10:25:39 +0000

the 2nd order problem is not the procedural code ... but all the files and databases that have the year stored as a 2digit field. even if all the programs were changed ... there are possibly 100-1000 tapes chock full of 2digit year fields ... for every program with a 2digit definition field.
then it becomes a huge transformation problem ... is it possible to write a conversion program to read each tape ... recognize every 2digit year field, modify it appropriately to a 4digit field ... and write a new tape containing all 4digit fields (and appropriately update all the automated and manual tape library procedures). this presumably has to run in zero time (instead of 4-5 months) since everything is shut down while the conversion is running ... and the new programs won't be able to operate until after the files are converted.
one way is to have a computer with infinite compute power and tape drives with infinite transfer rates. the other approach is to have a time-machine ... so that all the new tapes generated at the end of the several month process can be shipped back in time to the point where the process started (simulating infinite computer power). of course, time machines also imply that the problem could be corrected at the origin (instead of just fixing it afterwards).
some amount of medicine (and other fields) has been dealing with 2digit year birthdays for some time. there are a non-trivial number of people >100 years old ... with admin records having birthdays <97 (and the default is that age is modulo 100).
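a minimal sketch in C of the usual windowing kludge (the pivot value here is hypothetical), showing why a fixed window misfiles anyone over 100:

    #include <stdio.h>

    /* expand a 2digit year using a fixed 100-year window with a
     * hypothetical pivot at (19)10; any such window necessarily
     * misfiles anyone more than 100 years old */
    static int expand_year(int yy)        /* yy in 00..99 */
    {
        return (yy >= 10) ? 1900 + yy : 2000 + yy;
    }

    int main(void)
    {
        /* a birthday in 1896 comes out as 1996 -- age modulo 100 */
        printf("%d\n", expand_year(96));
        return 0;
    }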
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IBM 1130 (was Re: IBM 7090--used for business or science?)
Newsgroups: alt.folklore.computers
Date: 07 May 1997 18:38:13 +0000

the 1130 was also used in something called the 2250mod4. the 2250 was a large vector graphics screen ... the 2250mod1 was directly channel attached to a 360 ... the 2250mod4 was controlled by an 1130 ... and any 360 involvement required talking to the 1130 which then drove the 2250.
one of the people at csc (4th floor, 545 tech sq) had ported spacewar to the 1130 running w/2250 sometime prior to 1970 (the game was played by remapping the keyboard into left & right halves for player one and two control keys).
same person was also responsible for "networking" software between the 1130 and 360. this was also essentially the basis for what became the internal network (larger than the whole arpa/internet up until 1984 or possibly 1985).
as a total aside ... in 1968, i "borrowed" some 2250mod1 support code from llnl cms fortran subroutine library and integrated it into the cms editor with some other enhancements to create an early full-screen editor.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IBM 1130 (was Re: IBM 7090--used for business or science?)
Newsgroups: alt.folklore.computers
Date: 07 May 1997 21:55:22 +0000
Organization: A poorly-installed InterNetNews site

oops, finger slip ... llnl -> lincoln labs cms fortran subroutine lib.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Mythical beasts (was IBM... mainframe)
Newsgroups: alt.folklore.computers
Date: 09 May 1997 21:26:57 +0000

the philadelphia science center did apl/360 ... it had its own monitor, 32k (64k) workspaces, swapping, etc.
camb science center (4th floor, 545 tech sq), ported it to cms, redid garbage collection for virtual environment, added i/o and other primitives (cms/apl). this was also deployed internally for numerous admin and support applications across the corporation (majority of HONE applications for world-wide field, sales, & marketing used this starting circa 70 or 71, also used extensively by hdqtrs business planners).
palo alto science center wrote 145 m'code for apl (getting 10*? speedup), remapped much of the cms/apl stuff to shared variables, etc .. and produced apl/cms.
in '77 HONE US consolidated all the centers at 1501 california (across the parking lot from PASC). First implemented & deployed SMP support ... and then implemented cluster support for eight SMPs in a single system image (largest SSI cluster in the world at the time, including high availability and fail-over), providing all the US field & marketing people online support.
Smaller variations of this were cloned in places like Havant, La Defense, Tokyo, & a number of other places. For the early cloning, I hand carried and did the install & configuration myself ... I would sometimes make the rounds and check on things.
One of the reasons I developed the symbolic dump analyser was to help cut down on the number of things I had to look at from different sites (i.e. numerous sites with custom modified operating systems that had to be supported).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: 360/44 (was Re: IBM 1130 (was Re: IBM 7090--used for business or
Newsgroups: alt.folklore.computers
Date: 12 May 1997 16:46:18 +0000

i didn't see much of 801 until '76 ... but from what I understand ... it started out as trying to do a single chip computer ... and then along the way became a computer with all instructions executing in a single machine cycle. the seemingly biggest 370 influence was the hard&fast attempt to avoid memory consistency because of the significant 370 SMP scale-up problems trying to support strong consistency (one of the places we deviated; at a conference in '76 where they presented 801 with no supervisor mode, no memory consistency, no protection, etc ... we presented a 16-way 370 SMP ... it did require some software tweaking tho).
Effects have carried forward today thru the RIOS generation. When we were doing HA/CMP ... I focused on getting fiber-channel scale-up (i.e. precursor to SP1/SP2) as the only practical mechanism for dealing with scale-up for the RIOS line of chips.
another RIOS, 801 hold-over is the segment architecture. the original design point had a closed proprietary operating system with protection checking occurring at compile & bind time. execution didn't have any protection domains ... and all nominal supervisor functions were to be executed inline. this resulted in a design-point where virtual memory hardware savings could be achieved by treating virtual memory objects like another form of addressing ... with inline code swapping virtual memory object pointers as easily as address pointers in general purpose registers were changed. however, a high rate of segment register swapping takes on a completely different dimension if supervisor calls & protection domains become involved (it becomes impractical to support several hundred different, concurrent virtual memory objects in the same address space via segment register swapping).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IBM Hursley?
Newsgroups: alt.folklore.computers
Date: 12 May 1997 22:38:57 +0000

hursley may currently be responsible for cics ... but its origins were at some customer site in the states in the 60s. ibm picked it up for a product ... and the site I was at in '69 was "beta" test for the product. I remember shooting a number of bugs at the time in the bdam support.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Did 1401 have time?
Newsgroups: comp.society.folklore
Date: 31 May 1997 10:59:56 +0000

the 360/67 had a higher resolution timer in order to do time-sharing ... decrementing once every 13.? microseconds. when i was an undergraduate ... four of us were involved in building a controller (supposedly we get credit for originating the ibm oem control unit business). one of the early/dramatic bugs we had was holding the bus too long ... if the interval timer hardware was unable to update memory (from the previous tic) before the next timer-tic occurred ... the machine would red-light and die.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Ancient DASD
Newsgroups: alt.folklore.computers
Date: 31 May 1997 11:14:14 +0000

the original cics had been written at a customer site ... and ibm had picked it up to make it a product. while it seemed to have been well tested for that particular environment ... that was about it. lots of bugs having to do with using it in any other environment ... bdam open for example only worked for a specific set of bdam parameters ... had to rewrite it in order to make it work in a generalized environment (this was when ibm still supplied code for some number of things).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: HELP! Chronology of word-processing
Newsgroups: alt.folklore.computers
Date: 03 Jun 1997 22:05:24 +0000

csc did the conversational editor and script for cms in 66-67. csc did gml (precursor to sgml/html) in 69-70, integrated into script ... also see
http://www.sil.org/sgml/sgmlhist0.html
I did a full-screen interface for the conversational editor on the 2250m1 in 68 at wsu
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: HELP! Chronology of word-processing
Newsgroups: alt.folklore.computers
Date: 04 Jun 1997 19:03:18 +0000

... pointed out to me that some people might think that csc could stand for something other than cambridge science center (4th floor, 545 tech. sq, cambridge, mass) ... virtual machines, gml, cp/67, script, vm/370, performance monitors, dynamic adaptive feedback, fair share, compare&swap, smp technology, cms, etc.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: OSes commerical, history
Newsgroups: comp.sys.super,comp.arch
Date: 09 Jun 1997 07:26:27 +0000

the development models have tended to be requirements driven. an area of more and more concern is security and integrity. in the past little attention in this market segment was given to security and integrity ... lack of requirements ... and/or the perception that there was little of value to secure and/or protect. with the increasing commercial aspect of the internet ... there are increasing requirements to meet security and integrity standards ... which accounted for much of the characteristics of the (older?) proprietary system development methodologies. In some sense (for at least some products) they should become more similar to attempts to assure the integrity of the hardware development process (and elimination of bugs).
frequently these (proprietary software) methodologies had six month QA cycles between the end of development and ship of product. At one time in the early 70s, I added to this cycle by developing an automated performance benchmarking and capacity measurement system ... it would generate a wide variety of loads and system configurations ... measure the results and then calculate a new set. It took three months elapsed time to run (at least for basic assessment, extending the qa process to 9 months). It included generation of various stress-test loads that were 10* outside of normal operational envelopes and required fixes for any resulting failures (system crashes and/or bugs in resource management algorithms).
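the shape of the loop was roughly the following (a sketch with stand-in functions and made-up parameters, not the actual system):

    #include <stdio.h>

    struct load   { int users; double io_rate; };  /* one generated configuration */
    struct result { double thruput; int crashed; };

    /* stand-in for configuring the system and running one benchmark pass */
    static struct result run_benchmark(struct load l)
    {
        struct result r = { l.users / (1.0 + 0.1 * l.io_rate), 0 };
        return r;
    }

    int main(void)
    {
        struct load l = { 4, 1.0 };
        for (int pass = 0; pass < 50; pass++) {
            struct result r = run_benchmark(l);
            if (r.crashed) {
                /* stress failures had to be fixed, not ignored */
                printf("failure at %d users\n", l.users);
                break;
            }
            /* measure the results, then calculate the next configuration,
             * deliberately pushing well outside normal operational envelopes */
            l.users   = l.users * 3 / 2 + 1;
            l.io_rate = l.io_rate * 1.25;
            printf("pass %d: users=%d thruput=%.2f\n", pass, l.users, r.thruput);
        }
        return 0;
    }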
An interesting challenge is to compress all of this into product cycles that are under 6-9 months ... and still be able to meet security and integrity requirements (for at least some subset of the products).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: OSes commerical, history
Newsgroups: comp.sys.super,comp.arch
Date: 09 Jun 1997 13:54:02 +0000

dynamic adaptive resource scheduling between cpu, virtual/real memory, i/o, etc .... that was one of the "bugs" I fixed in '69 ... that cp (and multics) had inherited from ctss (& other) schedulers floating around the eastern seaboard.
a later version was released in an official commercial product ... the only problem was that tuning knobs were all the rage (especially among the performance witch doctors) ... and marketing told me I had to add tuning knobs. so I added tuning knobs, published the source code ... and documented the algorithms ... funny thing, 20+ years later nobody has made the connection between the formulas, dynamic feedback, and degrees of freedom (i.e. the dynamic adaptive feedback had more degrees of freedom than the knobs).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: OSes commerical, history
Newsgroups: comp.sys.super,comp.arch
Date: 09 Jun 1997 13:59:28 +0000

... separate aside ... both the absolute performance targets and the relative fair share resource consumption targets had to predictably and exactly change resource consumption across a wide range of loads (from a couple concurrent processes to hundreds of concurrent processes) ... one of the things that was validated thru the three month automated process.
i always like to say it was done in zero instructions ... i.e. pathlengths were shorter after i added all the function than before I started.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Galaxies
Newsgroups: comp.arch,comp.os.vms,comp.sys.dec
Date: 12 Jun 1997 18:13:14 +0000

there were numerous clusters in the 60s and 70s ... winter 77/78, we fielded the largest single system image cluster (in the world at the time; it was at 1501 cal, palo alto). changes to do process migration ... while not very extensive ... never made it to product. i did have a friend who did process migration at a service bureau ... allowing both intra-complex as well as inter-complex process migration (i.e. between san fran & waltham). It was for little things like providing 7x24 world-wide (non-disruptive) access ... even when the machine was to be taken down 3rd shift sunday for maint ... and it was 1st shift in other parts of the world. Intra-complex migration isn't too hard because of shared disk subsystems ... inter-complex migration did have some restrictions regarding file availability(/replication).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: lynn@garlic.com (Anne & Lynn Wheeler)
Subject: Re: OSes commerical, history
Newsgroups: comp.sys.super,comp.arch
Date: 17 Jun 1997 16:18:44 GMT

the problem involved a pre-existing mainframe operating system with delta release-to-release enhancements, originally coded the first time in the 60s. at one point i had rewritten all the serialization for the system to eliminate all zombies, orphaned processes, and synchronization failures. Somewhere along the way the serialization function for the system got dropped. As part of the testing ... well beyond normal operating envelopes, i had to re-implement the function (eliminating all zombies, orphaned processes, and synchronization failures). the other major issue was supporting both fairshare as well as absolute percent ... which was actually straight-forward ... the problem was "unfair" share ... each "nice'ing" notch had to have predictable and repeatable results across a wide range of configurations and workloads.
somewhat later i had an environment where MVS would crash about every 15 minutes .... effectively because of a very hostile I/O environment (disk/controller engineering design/development/test). I spent several months completely rewriting IOS so that it was absolutely bullet-proof ... no possible I/O operational characteristic would result in system loop, crash, &/or failure. went from stand-alone testcell operation to having upwards of 10-12 testcells operating concurrently. (some of the guys at sun research made the comment that they couldn't imagine what would have to be done to IOS where a normal MVS system would crash every 15 minutes).
in a more controlled environment ... QA was also easier ... but when trying to integrate incremental changes into a massive amount of code that had been around for 10-20 years and worked on by thousands of people ... there was a little more of a challenge. By comparison, I once took an operating system snapshot ... and did something like 30K lines of modifications ... which (along with some other work) was released to some select customers. One customer (AT&T longlines) was still running it ten years later (I hadn't been aware of it until the salesman on the account called to say that the most recent processor hardware product was incompatible with the ten-year old operating system ... and something had to be done).
Life-cycle becomes interesting when you get calls on code you wrote 10, 20 or in one or two cases 25 years ago. The challenge these days is to get CMM-4/5 integrated into large orgs with several thousand development programmers doing business critical systems. Most people only have direct experience with consumer software which frequently can have (very) short product cycles & lifetimes ... and few mission-critical requirements. by comparison ... let's say for some financial software handling trillions/day ... very few people would even tolerate a .001% error rate; say the wrong account (yours?) was erroneously debited by .00001*$1000000000000 (and the gov came after you to make up any shortfall).
one of my pet-peeve failures was register allocation & use (use before set, typically an atypical branch/merge scenario), especially for addressing (assembler operating system programming) .... in '73, I wrote a (PLI) program that would do detailed flow analysis, register allocation/coloring and complexity analysis for (manually written) assembler routines. I wanted to at least be able to identify control flows where a register might be used before set.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger lynn@garlic.com for public key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Why Mainframes?
Newsgroups: alt.folklore.computers
Date: 11 Jul 1997 10:07:01 +0000

selector & multiplexor channels were pretty much the same except for protocol convention ... both required a synchronous bus hand-shake to transfer a single byte; the multiplexor convention was that controllers would frequently suspend operation (during data transfer) and release the channel prior to completion (allowing a sort of time-sharing of the channel for multiple concurrent transfers).
because of the synchronous bus hand-shake per byte ... the selector channel was pretty much limited to 1.5mbyte/sec thruput given a daisy-chain cable length of 200 feet.
block multiplexor defined a new channel operation which supported a suspension command (rather than the effectively time-driven suspend protocol on the multiplexor). more importantly, block multiplexor defined a new protocol which allowed eight bytes to be transferred in one synchronous bus hand-shake ... which extended the thruput to 3mbyte/sec and the distance limitation to 400 feet. there are some similarities between the scsi bus protocol and the channel bus protocol ... except a big difference in the aggregate cable distance.
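working backwards from those figures (back-of-envelope; the per-handshake latencies are derived here, not quoted):

    #include <stdio.h>

    int main(void)
    {
        /* selector: one byte per synchronous handshake at 1.5 mbyte/sec */
        double t_selector = 1.0 / 1.5e6;   /* ~0.67 usec per handshake */
        /* block mux: eight bytes per handshake at 3 mbyte/sec, 400 feet */
        double t_blockmux = 8.0 / 3.0e6;   /* ~2.67 usec per handshake */

        printf("selector handshake:  %.2f usec\n", t_selector * 1e6);
        printf("block-mux handshake: %.2f usec\n", t_blockmux * 1e6);
        /* moving 8 bytes per (slower, longer-cable) handshake is what
         * buys both the 2x thruput and the 2x cable distance */
        return 0;
    }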
the suspension command was a big boon to aggregate disk transfer thruput. prior to the suspension command, disk record locator operation was performed by a "search" command which compared the id of each disk record with data in the memory of the computer. there was no local copy of this data so the channel was locked up continuously requesting the identifier information during the search operation. In connection with the channel suspension command, disk technology introduced a servo platter that had positioning information and a new command that would suspend & release the channel until a specific rotational position had arrived ... the disk would then attempt a channel reconnect, followed by the resumption of the search command. If the application software had a correct table of record identifiers and corresponding rotational positions ... channel utilization efficiency could be remarkably increased.
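expressed as a channel program, the result looks roughly like this (the opcodes are the classical CKD ones, but the encoding/layout here is purely illustrative):

    /* illustrative layout only -- a real CCW is a packed doubleword */
    struct ccw { unsigned char op; unsigned addr; unsigned char flags; unsigned short count; };

    #define CC 0x40   /* command-chaining flag */

    enum {            /* classical CKD channel-command opcodes */
        SEEK = 0x07, SET_SECTOR = 0x23, SEARCH_ID_EQ = 0x31,
        TIC = 0x08, READ_DATA = 0x06
    };

    /* SET SECTOR disconnects from the channel until the platter nears
     * the recorded angular position; the controller then reconnects and
     * the SEARCH normally matches on the next record, instead of the
     * channel being held for whole revolutions of searching */
    struct ccw rps_read[] = {
        { SEEK,         0 /* &seek_arg (BBCCHH) */, CC, 6 },
        { SET_SECTOR,   0 /* &sector number     */, CC, 1 },
        { SEARCH_ID_EQ, 0 /* &record id (CCHHR) */, CC, 5 },
        { TIC,          0 /* back to the search */, 0,  0 },
        { READ_DATA,    0 /* &data buffer       */, 0,  4096 },
    };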
this whole scenario was done in place of the scsi-type of record lookup (in mainframe terms, fixed-block architecture or fba) ... which would have totally subsumed much of the application and system software complexity attempting to build these tables of record ids and rotational positions.
i once tried to get a modified file structure and a super FBA hardware device (something like the current generation of scsi disks ... it looked like i could get a factor of 3* improvement in file thruput for a fairly interesting set of operations and workloads) ... but STL quoted a figure of $26m & a long delivery schedule to redo the search-id paradigm in the VTOC and PDS structure ... which nixed the idea.
ECKD then added another layer of baroqueness in an attempt to address the opportunity. A large percentage of disk data accesses were to subfiles/members via a two file lookup scheme ... a multitrack search-id of the vtoc for the file/library pointer. Then subfiles/members were located by another multitrack search-id of the PDS directory for the specific member pointer. Because there was little or no caching ... these operations were required for each reference. A PDS directory for a large library could easily span three cylinders (upwards of 60 tracks). Because the multitrack search "busied-out" the disk, the (shared) disk controller, and the (shared) channel ... aggregate member loading could slow down to one/second (and aggregate disk I/O operations drop to 3-4/second ... a multitrack search would examine every record on a "cylinder" of 19 tracks ... if the requested record wasn't found, it would be restarted on the next cylinder; with disks spinning at 3600rpm or 60rps ... a single disk I/O operation running for 19 revolutions ... results in 3-4 ops/sec). ECKD was an attempt to eliminate the dedicated busy time of the controller and channel during the 19 revolution disk operation.
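the arithmetic in the previous paragraph, spelled out:

    #include <stdio.h>

    int main(void)
    {
        double revs_per_sec   = 3600.0 / 60.0;  /* 3600 rpm = 60 rev/sec */
        int    tracks_per_cyl = 19;

        /* a full-cylinder multi-track search runs 19 revolutions with the
         * drive, (shared) controller and (shared) channel busy throughout */
        double search_sec = tracks_per_cyl / revs_per_sec;

        printf("full-cylinder search: %.2f sec => %.1f ops/sec max\n",
               search_sec, 1.0 / search_sec);   /* ~0.32 sec => ~3.2/sec */
        return 0;
    }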
Again, this was a hardware bandaid when significantly better aggregate thruput would have come from eliminating the multi-track search paradigm for vtocs and pds/libraries. I've contended that the aggregate effort in the bandaids far surpasses the effort to just change the paradigm (with the side effect of significantly better thruput).
as an aside, the board that four of us did when I was an undergraduate ... was used to build our own replacement for one of the ibm controllers. this (supposedly) originated the ibm oem controller business. i've gotten comments/feedback that possibly the same wire-wrap board (design) was still being shipped with descendants 15 years later.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Why Mainframes?
Newsgroups: alt.folklore.computers,alt.sys.pdp10
Date: 11 Jul 1997 10:20:01 +0000

channels multiplex on the memory bus ... with memory busses frequently operating in the hundreds of megabytes per second ... and typically only a few channels operating even at 30% ... it is possible to support quite a few channels. in fact, a lot of the increase in the number of channels is because of high channel and controller bus protocol processing overhead/latency that tends to severely restrict sustained thruput (thus leading to spreading operations out over a large number of essentially dedicated channel resources ... minimizing contention). the dedication of channel resources ... tends to further reduce average channel utilization ... but increases aggregate thruput because of the reduction in contention & queueing delays.
channel utilization is typically measured in terms of data transferred divided by hardware transfer capacity. this ignores processing overhead delays and latencies in the channel/controller handshaking protocol.
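a small illustration of the difference (the per-operation overhead figure is an assumption, picked only to show the shape of the effect):

    #include <stdio.h>

    int main(void)
    {
        double capacity = 3.0e6;    /* raw transfer capacity, bytes/sec     */
        double moved    = 0.6e6;    /* data actually transferred, bytes/sec */
        double ops      = 300.0;    /* I/O operations per second            */
        double overhead = 0.0015;   /* assumed handshake busy time per op   */

        printf("nominal utilization: %.0f%%\n", 100.0 * moved / capacity);
        printf("actual channel busy: %.0f%%\n",
               100.0 * (moved / capacity + ops * overhead));
        return 0;
    }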
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Why Mainframes?
Newsgroups: alt.folklore.computers,alt.sys.pdp10
Date: 11 Jul 1997 10:35:42 +0000

... slight historical note
... 360/67 "simplex" had single ported memory shared between processor and i/o complex.
... 360/67 "duplex" had tri-ported memory with dedicated ports for two processors and the "i/o controller".
the duplex "hardware" ran slower than the simplex ... but workloads with 100% cpu utilization and high i/o activity had better thruput on a "half-duplex" 67 (i.e. running only a single processor) than the same workload on a simplex (in such workloads, memory bus contention became significant on the simplex).
later machines reverted to single-ported memory ... most commercial workloads didn't tend to have 100% cpu utilization concurrent with very high I/O utilization (it tended to be one or the other ... but very seldom both).
these configurations and workloads tended to have E/B ratios that used megabytes/sec and MIPs ... todays E/B ratios tend to be similar ... but substituting megabits/sec in place of megabytes/sec (i.e. relative system I/O thruput has declined by at least an order of magnitude ... or from different viewpoint relative system CPU capacity has increased by at least an order of magnitude ... there is at least 10* and sometimes 100* as much CPU done per I/O operation today).
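putting rough numbers on that claim (illustrative magnitudes only):

    #include <stdio.h>

    int main(void)
    {
        /* then: balance quoted in megabytes/sec of I/O per MIPS */
        double then_ratio = 1.0e6;          /* bytes/sec per million instr/sec */
        /* now: a similar-looking number, but in megabits/sec per MIPS */
        double now_ratio  = 1.0e6 / 8.0;    /* bytes/sec per million instr/sec */

        printf("decline in I/O per instruction: ~%.0fx\n",
               then_ratio / now_ratio);     /* ~8x, i.e. order of magnitude */
        return 0;
    }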
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Why Mainframes?
Newsgroups: alt.folklore.computers,alt.sys.pdp10
Date: 11 Jul 1997 21:59:02 +0000

somewhere i have a "360/62" SRL (i.e. the 60 & 62 before the memory system changed and they were renamed 65 & 67) describing a 4-way and showing a picture. I'm pretty sure no 4-ways were built and only one 3-way. I believe the 3-way was for lockheed in sunnyvale. it had some interesting additions ... like the channel controller switches were all software settable. Charlie S. worked on the project before joining cambridge (compare&swap or CAS comes from charlie's initials ... padegs & smith made the change to CS & CDS ... as well as requiring the programming examples for uniprocessor programming before accepting it for the architecture and the POP).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Why Mainframes?
Newsgroups: alt.folklore.computers,alt.sys.pdp10
Date: 24 Jul 1997 20:22:47 +0000

the 370/158 was a horizontal (somewhat vliw) microcode machine. for the 303x channel director, the 370 m'code was replaced with channel processing code. A 3031 was a 370/158 repackaged with a channel director; a 3032 was a 370/168 repackaged to use the channel director; the 3033 was the 168 logic remap'ed from 4 circuit/chip technology to approx 40 circuit/chip technology. The simple remap would have gained approx. 20% performance improvement over the 168-3; later in the engineering cycle some of the logic was redesigned to take advantage of additional circuits/chip (going off chip less frequently), obtaining the additional 3033 performance.
the (158) channel director supported six channels. a 3033 could have up to 16 channels so some configurations had three channel directors.
one of the diagnostic tricks of the IOS rewrite (to create an absolutely non-fail system for the disk & controller engineering lab) ... was that it was possible to reboot controllers and channel directors under "software control" from the mainframe. most controllers would reboot if each subchannel address on the controller was hit with HDV/CLRIO in a tight loop. The channel director would reboot if every channel address was hit with CLRCH in a tight loop. For certain types of failure modes, I would resort to the reboot technique for recovery.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IBM 1401's claim to fame
Newsgroups: alt.folklore.computers
Date: 30 Jul 1997 18:01:52 +0000

my first programming job as a sophomore was to design/implement a stand-alone 360 program that replaced the 1401 mpio. all the gear had been moved to a 360/30 (i.e. 1401 gear was interoperable on the 360 line via 360 control units) and when necessary the 30 was run in 1401 hardware emulation mode for mpio execution. my job was to design/implement a simple multitasker, device drivers, and be able to do reader->tape concurrently with tape->printer/punch with multiple buffered asynchronous i/o (none of which mpio did).
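the heart of the overlap was multiple buffering; a minimal sketch in C (with no-op stand-ins for the real channel-program start/wait):

    #define NBUF 2
    static char buf[NBUF][80];   /* card-image buffers */

    /* no-op stand-ins for starting/awaiting the real channel programs */
    static void start_read (char *b) { (void)b; }
    static void start_write(char *b) { (void)b; }
    static void wait_done  (char *b) { (void)b; }

    /* reader->tape copy: the next read is started before the current
     * write, so both devices run concurrently rather than in a strict
     * synchronous read-then-write alternation */
    void copy_stream(void)
    {
        int cur = 0;
        start_read(buf[cur]);
        for (;;) {
            int nxt = (cur + 1) % NBUF;
            wait_done(buf[cur]);     /* read into current buffer complete */
            start_read(buf[nxt]);    /* overlap the next read             */
            start_write(buf[cur]);
            wait_done(buf[cur]);     /* write complete; buffer reusable   */
            cur = nxt;
        }
    }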
after i got it running ... i added an option to use the standard os/pcp dcb i/o system ... the problem for debugging/turn-around was that assembling the non-dcb version of the 2000 card program on pcp was about 20 minutes elapsed time (pcp6, 64k 360/30, 2311 drives). doing the dcb version added six minutes per dcb macro (you could tell from the lights when the assembler hit a dcb macro); for five DCBs, another 30 minutes.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Pre S/360 IBM Operating Systems?
Newsgroups: alt.folklore.computers
Date: 30 Aug 1997 08:11:17 +0000

attached is a posting here from a couple years ago regarding a presentation I made at the fall '68 share meeting. The careful ordering of the MFT sysgen increased the (stand-alone) thruput of MFT (for our workload) by approx. a factor of 2-3 times.
CP/40 (and the later cp/67 follow-on) was a project by several people that had worked on CTSS (swapping system for the 7094 at MIT). CP was somewhat parallel with (also some former CTSS) people working on Multics in the same building.
Later in '69 (after redoing lots of the cp/67 pathlengths, creating fastpath, ... in many cases a reduction in pathlength of 100* or more) ... I redid the paging system, creating my own "working set" definition as well as dynamic adaptive resource scheduling for CP/67 (in recent years ... I've worked on some Unix kernels that had scheduling logic that looked remarkably similar to the CP/67 code I replaced in '69 ... possibly indicating a common heritage traced back to ctss).
Also in '69, i replaced the HASP-iii 2780 support (then running on MVT-18) ... and implemented 2741, 1052, & tty line support along with syntax from the cms editor ... for an early online crje.
For the indicated job stream ... the complete fortran job stream had been running under ibsys on the 709 in less time than it took for the os job scheduler to execute for a single job on the 360-65/67. It wasn't until a combination of HASP & Watfor that the (stand-alone) job stream elapsed times on the 65/67 were comparable to what they had been on the 709. HASP provided fast disk-to-disk input/output (709/ibsys was tape to tape) ... compared to the base system which ran (synchronously) card reader to printer input/output. Watfor effectively took over the job scheduling responsibility from the base OS ... and was more similar to the IBSYS monitor than the os job scheduler (and ran in comparable elapsed times).
Newsgroups: alt.folklore.computers
Subject: CP/67 & OS MFT14
Date: Sun, 3 Apr 1994 17:51:11 GMT

In response to various inquiries, attached is a report that I presented at the fall '68 SHARE meeting (Atlantic City?). CSC had installed CP/67 at our university in January '68. We were then part of the CP/67 "announcement" that went on at the spring '68 SHARE meeting (in Houston).
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
OS Performance Studies With CP/67

OS MFT 14, OS nucleus with 100 entry trace table, 105 record in-core job queue, default IBM in-core modules, nucleus total size 82k, job scheduler 100k.

HASP 118k Hasp with 1/3 2314 track buffering

Job Stream: 25 FORTG compiles

Bare machine times
   Time to run: 322 sec. (12.9 sec/job)
   Time to run just JCL for above: 292 sec. (11.7 sec/job)

Orig. CP/67 times
   Time to run: 856 sec. (34.2 sec/job)
   Time to run just JCL for above: 787 sec. (31.5 sec/job)

Ratio CP/67 to bare machine
   2.65  Run FORTG compiles
   2.7   to run just JCL
   2.2   Total time less JCL time

1 user, OS on with all of core available less CP/67 program.

Note: No jobs run with the original CP/67 had ratio times higher than the job scheduler. For example, the same 25 jobs were run under WATFOR, where they were compiled and executed. Bare machine time was 20 secs., CP/67 time was 44 sec. or a ratio of 2.2. Subtracting 11.7 sec. for bare machine time and 31.5 for CP/67 time, a ratio for WATFOR less job scheduler time was 1.5.

I hand built the OS MFT system with careful ordering of cards in the stage-two sysgen to optimize placement of data sets, and members in SYS1.LINKLIB and SYS1.SVCLIB.

MODIFIED CP/67

OS run with one other user. The other user was not active, was just available to control amount of core used by OS. The following table gives core available to OS, execution time and execution time ratio for the 25 FORTG compiles.

CORE (pages)    OS with Hasp        OS w/o HASP
104             1.35 (435 sec)
 94             1.37 (445 sec)
 74             1.38 (450 sec)      1.49 (480 sec)
 64             1.89 (610 sec)      1.49 (480 sec)
 54             2.32 (750 sec)      1.81 (585 sec)
 44             4.53 (1450 sec)     1.96 (630 sec)

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MISC. footnotes:
I had started doing hand-built "in-queue" SYSGENs starting with MFT11. I would manually break all the stage2 SYSGEN steps into individual components, provide "JOB" cards for each step and then effectively run the "stand-alone" stage2 SYSGEN in the standard, production job-queue.
I would also carefully reorder the steps/jobs in stage2 (as well as reordering MOVE/COPY statements for PDS member order/placement) so as to appropriately place data on disk for optimal disk arm-seek performance.
In the following report, the "bare-machine" time of 12.9 sec/job was typically over 30 seconds/job for an MFT14 built using the standard "stand-alone" SYSGEN process (effectively an increase in arm-seek elapsed time). Also, the standard OS "fix/maintenance" process involved replacing PDS-members, which resulted in destroying the careful member placement. Even with an optimally built system, "six months" of OS maintenance would result in performance degrading to over 20 secs/job.
A non-optimally built OS system actually would make CP/67 performance look "better" (i.e. the ratio of CP/67 times to "bare-machine" times). CP/67 overhead (elapsed time increase) was proportional to the simulation activity for various "kernel" activities going on in the virtual machine. I/O elapsed time was not affected by running under CP/67. Keeping the simulation overhead fixed, but doubling (or tripling) the elapsed time with longer I/O service time, would improve the CP/67/bare-machine ratios.
The modified CP/67 was based on numerous pathlength performance changes that I had done between Jan of 1968 and Sept of 1968, i.e. reducing CP/67 elapsed time from 856 sec. to 435 sec. (a reduction in CP/67 pathlength CPU time from 534 secs to 113 secs).
From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: Kernel swapping itself out ?
Newsgroups: comp.society.folklore
Date: 25 Oct 1997 18:34:21 -0800

I implemented kernel paging on cp/67 (ibm 360/67) in the summer of 1969.
the initial implementation of pageable MVT was called SVS (single virtual storage) ... it basically allowed MVT to operate as if it was running on a real 16mbyte machine (although the actual real machine memory size was much smaller). certain areas of the kernel had to remain fixed in real storage. This was later rewritten as MVS (multiple virtual storage).
The distinction of what is kernel and non-kernel in MVT, SVS, and MVS is less distinct (than in some other systems) ... since the operating system occupied the same address space as the application program space (in MVS each application could be provided a distinct address space with the common MVS operating system code continuing to reside in the same address space). The distinction between application, operating system "non-privilege" and operating system "privilege" was more a case of the system mode ... not a clear-cut address space.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Kernel swapping itself out ?
Newsgroups: comp.society.folklore
Date: 26 Oct 1997 05:54:27 -0800

i also somewhat facetiously claim that the person responsible for closing the burlington center contributed nearly as much to VMS as anybody else.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: Early RJE Terminals (was Re: First Network?)
Newsgroups: alt.folklore.computers
Date: 01 Nov 1997 08:11:50 -0800

the HASP we were running while i was an undergraduate (circa 68 timeframe) had 2780 support in it. We didn't have any 2780s (at the time) ... so when I was trying to adapt some CMS-like editor support to OS ... I replaced the 2780 support (to get addressability) with TTY & 2741 support ... along with an interactive editor. Line communication (2780, tty, 2741, etc) came into the 360 thru the 2702 controller. For a number of reasons, 4 of us built our own replacement for the 2702 controller to add both dynamic speed recognition and dynamic terminal type identification.
I had earlier ... rewritten the mainframe support to utilize the 2702 SAD command to re-associate the different types of line scanners with a specific line ... as part of dynamic terminal type identification ... only to find it worked somewhat problematically. Turns out somewhere along the way in 2702 development ... they went to hard-wiring a specific frequency oscillator to each specific line ... with the result that redirecting the line-scanner association was only useful if the bit-rate was identical (which then prompted us to build our own replacement ... originating the ibm oem controller market).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IA64 Self Virtualizable?
Newsgroups: comp.arch
Date: 18 Nov 1997 11:00:08 -0800

i started with cp/67 having about 150-200% overhead for some typical MFT jobstreams and rewrote a lot of code to get it down around 10%. Going to MVT drove it back up into the 30% range because of the increased ratio of kernel-mode and supervisor instructions to application program execution (MVT compared to MFT).
Supporting SVS and then MVS drove that way back up. Effectively VM had to simulate a hardware TLB in software for virtual operating systems that used virtual memory. There were later machines with hardware assists that would handle the software TLB loading (semi-)automatically, which significantly reduced the overhead (there were still a couple of storage cycles: miss in the hardware TLB, check the virtualized software TLB, miss in it, extract the entry from the virtualized page table, translate it, stuff it into the software TLB, re-execute the instruction, miss in the hardware TLB again, check the virtualized software TLB, extract the entry, load it into the hardware TLB, and effectively re-execute the instruction).
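in outline (a sketch with simplified structures and stand-in functions; real shadow-table handling is considerably messier):

    #include <stdint.h>

    #define SWTLB 256
    struct swtlb_entry { uint32_t vpage, hframe; int valid; };
    static struct swtlb_entry swtlb[SWTLB];

    /* stand-ins: walk the guest's (virtualized) page table, and map the
     * guest's notion of a real frame to an actual host frame */
    static uint32_t guest_page_table(uint32_t vpage)    { return vpage; }
    static uint32_t guest_real_to_host(uint32_t gframe) { return gframe; }

    /* invoked on a hardware-TLB miss: check the software TLB; on a miss
     * there, do the double translation and install the entry.  the
     * instruction is then re-executed and the hardware TLB loaded from here */
    uint32_t software_tlb_translate(uint32_t vpage)
    {
        struct swtlb_entry *e = &swtlb[vpage % SWTLB];
        if (!e->valid || e->vpage != vpage) {
            uint32_t gframe = guest_page_table(vpage);   /* first level  */
            e->hframe = guest_real_to_host(gframe);      /* second level */
            e->vpage  = vpage;
            e->valid  = 1;
        }
        return e->hframe;
    }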
The interesting thing is that the increase in operating system overhead and elapsed time for an application was significantly greater going from MVT to MVS (than the overhead added by running MVT under VM). In fact, in one exercise, somebody took a copy of VS1, laid it out in VM virtual memory, tweaked some of its code ... and had a multi-stream VS1 under VM significantly outperforming a "stand-alone" MVS.
I periodically have comparisons with the Multics guys about some of their op system stuff on the upper floors of tech sq versus what we did on the first couple floors of the same building (responsible for virtual machines, SGML, the internal network). In numbers the internal network was larger than the internet up thru about 84/85. For virtual machines, it seemed fair to compare real customer accounts ... so we only compared internal accounts where we did all the direct support (distributed the material, answered the calls, fixed things) ... w/o any help from other organizations or the company's marketing & support teams. That would get the number of large mainframes down into the couple hundred range ... which seemed to be a better way of comparing the Multics efforts to our efforts.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IA64 Self Virtualizable?
Newsgroups: comp.arch
Date: 18 Nov 1997 12:18:44 -0800

... also there is some difference between self-virtualizing and hypervisor support. hypervisors tend to have lots of hardware support for switching between different supervisors. self-virtualizing is an operating system that can run virtual copies of itself (potentially nested several layers deep) as well as copies of other operating systems (self-virtualizing may have hardware assists ... but hypervisors aren't necessarily self-virtualizing).
at one point we had:
1) copy of cp/67 running on the real 360/67 hardware that was
self-virtualizing
2) modified copy of cp/67 running in a 67 virtual machine (provided
by #1) providing 370 virtual machines
3) modified copy of cp/67 running in a 370 virtual machine (provided
by #2) providing 370 virtual machines
4) 370 operating system running a 370 virtual machine (provided
by #3)
i.e. three levels of virtualization ... including a somewhat "foreign" architecture ... all being handled purely in software (no virtualizing hardware assists). For some secure environments (restricted population of users), "level 2" could be run directly on the native hardware (i.e. didn't want to expose the general public to the ability of having 370 virtual machines before the product/hardware had been announced).
An important characteristic making self-virtualization possible on the 67 was the ability to switch both addressing mode and supervisor state in a single instruction (as well as interrupts being able to switch addressing mode and supervisor state in a single operation). It also required a strong separation of state-changing instructions. Requiring separate instructions for switching addressing modes and supervisor/kernel state cripples self-virtualization (unless there is separate, custom hardware support for virtualization).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IA64 Self Virtualizable?
Newsgroups: comp.arch
Date: 19 Nov 1997 08:49:36 -0800

... attached is the handout i used at a talk in fall of '68 on some early performance work I had done as an undergraduate. it primarily shows the effect of fast-path and careful tuning of simulation paths ... along with some restructuring of the internal program call infrastructure. It doesn't show the effect of the work on fair share scheduling and fixing various non-linear scale-ups dealing with large numbers of tasks. It also doesn't really show the work I had done on early versions of clock-like page replacement algorithms.
OS Performance Studies With CP/67

OS MFT 14, OS nucleus with 100 entry trace table, 105 record in-core job queue, default IBM in-core modules, nucleus total size 82k, job scheduler 100k.

HASP 118k Hasp with 1/3 2314 track buffering

Job Stream: 25 FORTG compiles

Bare machine times
   Time to run: 322 sec. (12.9 sec/job)
   Time to run just JCL for above: 292 sec. (11.7 sec/job)

Orig. CP/67 times
   Time to run: 856 sec. (34.2 sec/job)
   Time to run just JCL for above: 787 sec. (31.5 sec/job)

Ratio CP/67 to bare machine
   2.65  Run FORTG compiles
   2.7   to run just JCL
   2.2   Total time less JCL time

1 user, OS on with all of core available less CP/67 program.

Note: No jobs run with the original CP/67 had ratio times higher than the job scheduler. For example, the same 25 jobs were run under WATFOR, where they were compiled and executed. Bare machine time was 20 secs., CP/67 time was 44 sec. or a ratio of 2.2. Subtracting 11.7 sec. for bare machine time and 31.5 for CP/67 time, a ratio for WATFOR less job scheduler time was 1.5.

I hand built the OS MFT system with careful ordering of cards in the stage-two sysgen to optimize placement of data sets, and members in SYS1.LINKLIB and SYS1.SVCLIB.

MODIFIED CP/67

OS run with one other user. The other user was not active, was just available to control amount of core used by OS. The following table gives core available to OS, execution time and execution time ratio for the 25 FORTG compiles.

CORE (pages)    OS with Hasp        OS w/o HASP
104             1.35 (435 sec)
 94             1.37 (445 sec)
 74             1.38 (450 sec)      1.49 (480 sec)
 64             1.89 (610 sec)      1.49 (480 sec)
 54             2.32 (750 sec)      1.81 (585 sec)
 44             4.53 (1450 sec)     1.96 (630 sec)

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MISC. footnotes:
I had started doing hand-built "in-queue" SYSGENs starting with MFT11. I would manually break all the stage2 SYSGEN steps into individual components, provide "JOB" cards for each step and then effectively run the "stand-alone" stage2 SYSGEN in the standard, production job-queue.
I would also carefully reorder the steps/jobs in stage2 (as well as reordering MOVE/COPY statements for PDS member order/placement) so as to appropriately place data on disk for optimal disk arm-seek performance.
In the report, the "bare-machine" time of 12.9 sec/job was typically over 30 seconds/job for an MFT14 built using the standard "stand-alone" SYSGEN process (effectively an increase in arm-seek elapsed time). Also, the standard OS "fix/maintenance" process involved replacing PDS-members, which resulted in destroying the careful member placement. Even with an optimally built system, "six months" of OS maintenance would result in performance degrading to over 20 secs/job.
A non-optimally built OS system actually would make CP/67 performance look "better" (i.e. the ratio of CP/67 times to "bare-machine" times). CP/67 overhead (elapsed time increase) was proportional to the simulation activity for various "kernel" activities going on in the virtual machine. I/O elapsed time was not affected by running under CP/67. Keeping the simulation overhead fixed, but doubling (or tripling) the elapsed time with longer I/O service time, would improve the CP/67/bare-machine ratios.
The modified CP/67 was based on numerous pathlength performance changes that I had done between Jan of 1968 and Sept of 1968, i.e. reducing CP/67 elapsed time from 856 sec. to 435 sec. (a reduction in CP/67 pathlength CPU time from 534 secs to 113 secs).
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Anne & Lynn Wheeler <lynn@garlic.com>
Subject: Re: IA64 Self Virtualizable?
Newsgroups: comp.arch
Date: 20 Nov 1997 17:03:30 -0800

the whole count-key-data out-board search was a design point trade-off regarding 64kbyte operating system memory and several hundred kbyte/sec smart adapters. configurations were on the wrong side of the trade-off by the mid-70s; system memory for caching disk locations was by then cheaper than tieing up the I/O subsystem with linear searches.
I could get a 3* speed-up for most i/o intensive workloads with fixed-block architecture and a reasonable filesystem.
it wasn't just the fancy ISAM stuff ... but just the simple vtoc and pds operations.
The problem with vtoc, pds, and fancy ISAM is that they did multi-track searches which tied up the device, controller, and channel (bus) for the duration of the search (multiple revolutions).
i was called in as a solution of last resort to a very large customer with a large cluster of mainframes managing a large national business. they had a frequent performance bottleneck which brought all processors in the complex to a grinding halt at the same time. It had been several months and they had been unable to figure out the problem. They gave me a classroom of 15-20 tables ... all covered with paper listings 3' high of performance data. After about 3hrs of eye-balling the data ... the only correlation pattern that I observed was that one drive (out of upwards of a hundred) would peak around 6-7 I/Os per second. In non-bottleneck periods ... the drive would only be doing 3-4 I/Os per second (these are drives commonly clocked at 40-60 I/Os per second).
Turns out that the shared application program library was located on that drive ... with a large number of members and a three cylinder vtoc. Most application program loads would require a PDS vtoc search; on average the vtoc search covered 1.5 cylinders. For drives spinning at 3600rpm, and 19 tracks/cylinder ... a full-cylinder multitrack search would take .3 seconds elapsed time (during which time the drive, controller, and channel/bus were all totally locked out). A typical program load was taking 3 disk I/Os (two searches and a read) which took an aggregate elapsed time of approximately .45 seconds.
The effect of doing a multi-track search (of a full cylinder) slowed the disk I/O rate down from 40-60 per second to 4-6 per second (max. thruput). furthermore the multitrack search not only tied up the disk drive, but tied up a significant portion of the rest of the (shared) I/O resources in the system.
The Q&D solution for the customer was to spread the program library across multiple disks with a limit of no more than 1/2 cylinder on the size of each vtoc.
Another scenario where we ran into the obsolescence of search-id technology was a large shared disk configuration consisting of both MVS and VM processors. The "rule" was that the shared disk configuration only existed for availability and NEVER was an MVS disk to be mounted on a "string" controlled by a "VM" controller.
The problem was that placing an MVS disk on a string belonging to a nominally VM controller would subject the VM controller to the same (but less severe) multi-track search "lock-out" scenarios as the shared program library problem. MVS users nominally never realized the performance degradation caused by multi-track search. However, a single MVS drive on a VM controller could be immediately perceived as a 20-30% degradation in thruput (i.e. MVS users didn't know any better ... it was only if you were used to running in a non-multi-track search environment that you would perceive the significant slow down).
The counter that was used when the MVS group accidentally mis-mounted a disk was to bring up a souped up VS1 system on VM ... and turn it loose on the mis-mounted mvs drive. In the severe case, they could bring up a souped up VS1 on a fully loaded VM system running at 100% utilization, and bring the MVS system to its knees (even when the MVS system was only moderately loaded and had a processor 4* faster) .... i.e. the souped-up VS1 with about 3% of the resources of a stand-alone MVS system ... could still turn SIOs to the disk around faster.
The killer that has been with us for a long time ... was that even tho I could show a 300% speed-up for a common set of disk intensive workloads converting to a fixed-block infrastructure ... the business case to rewrite MVS PDS & VTOC support was set at something like $26m.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: How is CICS pronounced?
Newsgroups: alt.folklore.computers
Date: 25 Nov 1997 17:01:13 -0800

i was at a site that was beta-test for cics ... 1969, and my impression was that it originated at some customer site before being picked up by ibm.
i remember shooting a bunch of cics bugs in the code ... one that took a while to track down was an invalid bdam dcb specification for open ... as close as i could tell ... the cics code had only been developed against a single bdam file mode ... and we were using something different.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key
From: Lynn Wheeler <lynn@garlic.com>
Subject: Re: The Bad Old Days
Newsgroups: comp.society.folklore
Date: 25 Nov 1997 18:07:00 -0800

i lucked out in the bad ol' days ... when i was an undergraduate I would get the keys to the machine room and all the machines from 8am saturday until 8am monday; everything in the room was my personal computer for 48hrs ... monday classes were a little hard, not having slept for over 48hrs.
--
Anne & Lynn Wheeler | lynn@garlic.com, lynn@netcom.com
| finger for pgp key