KataGo takes on Igo Hatsuyoron problem 120
April 23, 2020 10:20 From: 九点圆
https://blog.janestreet.com/deep-learning-the-hardest-go-problem-in-the-world/
Basically, they ran an entire training run from scratch for this one problem alone.
And the conclusion: White wins by 2 points...
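![]() | llp wrote on 2020-4-23 2:23, #1: Scary |
![]() | ran2009 wrote on 2020-4-23 2:26, #2: ???? |
![]() | 斩杀呉暨 wrote on 2020-4-23 2:27, #3: ...... |
![]() | 刘泊含 wrote on 2020-4-23 2:29, #4: I've found a problem with my account: when I click the Go book lookup, I get a page full of code. I don't dare post it, in case I get accused of spamming and banned. |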
![]() | llp wrote on 2020-4-23 2:30, #5: "Whereas the tentative best known human line led to a win by 3 points for Black, as a result of these new moves, KataGo seems to believe that White will win by 2 points, or at least by 1 point." |
![]() | 斩杀呉暨 wrote on 2020-4-23 2:30, #6: Updates and a New Run. First, some brief updates: Earlier this year, I posted about our project KataGo and research to improve self-play learning in Go, with an initial one-week run showing highly promising results. Several months later in June, KataGo performed a second, longer 19-day run with some major bugfixes and minor optimizations. Starting from scratch and with slightly less hardware than before, up to 28 V100 GPUs, it reached and surpassed the earlier one-week run in barely more than the first three days. By the end of the 19 days, it had reached the strength of ELF OpenGo, Facebook AI Research’s multi-thousand-GPU replication of one of AlphaZero’s runs - equating to roughly a factor of 50 reduction in computation required. Our paper has since been significantly revised and updated with the new results: Accelerating Self-Play Learning in Go (arxiv paper). This version of KataGo has also been released to the Go player community for several months now. With the results of this second run, KataGo is comparable to other top-level open-source bots and is now one of the more popular bots online for its ability to play handicap games - to come up with strong moves even starting at a great disadvantage by secondarily optimizing for score, something that ELF and other “AlphaZero”-trained bots fail to do well [1]. KataGo is also popular for game analysis using some of the more major analysis GUIs also due to its ability to estimate the score, ownership, and to adjust to alternate board sizes and rules. [Figure: Elo strength versus Leela Zero and ELF.] Relative Elo rating (y-axis) of KataGo's run versus Leela Zero (LZ) and ELF, plotted by *rough* self-play computation required (x-axis, log scale). Note: Leela Zero is likely depicted quite over-favorably compared to either KataGo or ELF, but also a little under-favorably for the very earliest parts of their run due to a variety of technical details and differences in their run - see paper. |
![]() | 罗桢子 wrote on 2020-4-23 2:38, #7: Fancy! |
![]() | 刘泊含 wrote on 2020-4-23 6:5, #8: It's true, I just finished watching it on Bilibili. |
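![]() | 睿趣刘嘉博 wrote on 2020-4-23 6:20, #9: ? |
![]() | 睿趣刘嘉博 wrote on 2020-4-23 6:21, #10: The best solution is so, so, so strong |
![]() | kenny wrote on 2020-4-23 6:26, #11: Has KataGo really learned Japanese rules? |
![]() | 睿趣刘嘉博 wrote on 2020-4-23 6:27, #12: Uhhhhh |
![]() | 九点圆 wrote on 2020-4-23 6:56, #13: It was probably done under Chinese rules... |
![]() | 马帅123 wrote on 2020-4-27 10:49, #14: KataGo has had Japanese rules and ancient Chinese rules since 1.3 (never mind the Tromp-Taylor rules... I can't make sense of them at all). By the way, you are all English geniuses (if you used Baidu Translate, forget I said anything). |
![]() | llp wrote on 2020-4-27 11:21, #15: My English is passable; I've scored above 145. |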
![]() | 斩杀呉暨 wrote on 2020-4-27 11:29, #16: Passable; I've been on 101 [note: not the official site]. |
![]() | 马帅123 wrote on 2020-4-27 13:12, #17: Then please translate what "Tromp-Taylor" means... |
![]() | 马帅123 wrote on 2020-4-27 13:13, #18: Baidu Translate says: 特朗普·泰勒 ("Trump Taylor") hhh |
![]() | 斩杀呉暨 wrote on 2020-4-27 13:13, #19: Go is played on a 19x19 square grid of points, by two players called Black and White. Each point on the grid may be colored black, white or empty. A point P, not colored C, is said to reach C, if there is a path of (vertically or horizontally) adjacent points of P's color from P to a point of color C. Clearing a color is the process of emptying all points of that color that don't reach empty. Starting with an empty grid, the players alternate turns, starting with Black. A turn is either a pass; or a move that doesn't repeat an earlier grid coloring. A move consists of coloring an empty point one's own color; then clearing the opponent color, and then clearing one's own color. The game ends after two consecutive passes. A player's score is the number of points of her color, plus the number of empty points that reach only her color. The player with the higher score at the end of the game is the winner. Equal scores result in a tie. |
![]() | 马帅123 wrote on 2020-4-27 13:15, #20: Uh... where's the Chinese? |
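![]() | 斩杀呉暨 wrote on 2020-4-27 13:15, #21: Ha. 1. Go is played on a 19x19 board; the two players are called Black and White; 2. Each intersection has one of three colors: black, white, or empty; 3. A point P whose color is not C is said to "reach C" if there is a path of (horizontally or vertically) adjacent points, all of P's color, from P to some point of color C (that is, you can walk from P to a point of color C without the color changing along the way); 4. Turning every point of a given color that cannot "reach empty" into empty is called "clearing" that color; 5. Starting from an empty board, the two sides alternate "turns", Black first; 6. A "turn" is either a pass or a "move" that does not make the whole board repeat an earlier position; 7. A "move" consists of the following steps: first color an empty point one's own color, then "clear" the opponent's color, and then "clear" one's own color; 8. The game ends when two consecutive passes occur; 9. A side's score is the number of points of its color plus the number of empty points that "reach" only that color; 10. The side with the higher score wins. Equal scores are a tie. These rules were devised by John Tromp and Bill Taylor and are also known as the logical rules of Go; they try to simplify the rules as far as possible and to eliminate ambiguity. |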
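Rules 3, 4 and 7 above translate almost line for line into code. The following is only a minimal Python sketch under stated assumptions, not Tromp's own Haskell version: the board is a dict from `(row, col)` to a color character, the names `empty_board`, `reaches`, `clear` and `play` are made up for illustration, and the no-repetition check from rule 6 is deliberately left out.

```python
from typing import Dict, Iterator, Tuple

Point = Tuple[int, int]
Board = Dict[Point, str]
EMPTY, BLACK, WHITE = ".", "X", "O"  # illustrative encoding, not from the rules text


def empty_board(size: int) -> Board:
    """Rule 5 starting position: every point is empty."""
    return {(r, c): EMPTY for r in range(size) for c in range(size)}


def neighbors(p: Point, size: int) -> Iterator[Point]:
    """Vertically or horizontally adjacent points that lie on the board."""
    r, c = p
    for q in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= q[0] < size and 0 <= q[1] < size:
            yield q


def reaches(board: Board, size: int, p: Point, color: str) -> bool:
    """Rule 3: p (assumed not colored `color`) reaches `color` through a
    path of adjacent points that all share p's own color."""
    own = board[p]
    seen, stack = {p}, [p]
    while stack:
        for q in neighbors(stack.pop(), size):
            if board[q] == color:
                return True
            if board[q] == own and q not in seen:
                seen.add(q)
                stack.append(q)
    return False


def clear(board: Board, size: int, color: str) -> None:
    """Rule 4: empty every point of `color` that does not reach empty."""
    doomed = [p for p, c in board.items()
              if c == color and not reaches(board, size, p, EMPTY)]
    for p in doomed:
        board[p] = EMPTY


def play(board: Board, size: int, p: Point, color: str) -> Board:
    """Rule 7: color an empty point, clear the opponent, then clear oneself.
    The repetition check of rule 6 is NOT performed in this sketch."""
    if board[p] != EMPTY:
        raise ValueError("point is not empty")
    new = dict(board)
    new[p] = color
    clear(new, size, WHITE if color == BLACK else BLACK)
    clear(new, size, color)
    return new
```

For example, on a 3x3 board with a white stone at `(0, 0)` and a black stone at `(0, 1)`, `play(board, 3, (1, 0), BLACK)` empties `(0, 0)`. Because the opponent is cleared first, a capturing move never removes the capturing stones, and suicide falls out of the second `clear` without a special case.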
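![]() | 马帅123 wrote on 2020-4-27 13:16, #22: Go is carried out by two players called Black-White on 19x19 square-grid points. Each point on the grid can be black, white, or empty. If there is a path (vertical or horizontal), then point P (not C) means reaching C P's color from P to C's adjacent points. Clearing a color is the process of emptying all points in that color that have not reached the null value. Starting from an empty grid, the ball players take turns turning around, starting from black. A turn-around is either a pass of the ball or an action that does not repeat an earlier grid coloring. An action consists of painting an empty point one's own color; then clearing the opponent's color, then clearing one's own color. The match ends after two consecutive ball passes. A player's score is the number of points of her color, plus the number of empty points of only her color. The contestant with the higher score at the end of the match wins. Equal scores lead to a draw. |
![]() | 马帅123 wrote on 2020-4-27 13:16, #23: It's different... that was Baidu Translate. |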
![]() | 斩杀呉暨 wrote on 2020-4-27 13:16, #24: Research Interests: Board Games and Artificial Intelligence, Algorithms, Complexity, Algorithmic Information Theory, Distributed Computing, Computational biology, and what not. Recently I've been designing a new Proof-of-Work system called Cuckoo Cycle. Less recent research has focused on the Combinatorics of Go, specifically counting the number of legal positions. I've been playing with a Lambda Calculus based utterly simple computer model. My Erdös number is 2, courtesy of Jeffrey Shallit, my favourite CS lecturer, who coauthored papers with both me and Erdös. A while ago I studied the complexity of OriMazes. |
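![]() | 马帅123 wrote on 2020-4-27 13:17, #25: Research interests: board games and artificial intelligence, algorithms, complexity, algorithmic information theory, distributed computing, computational biology, and so on. Recently I designed a new proof-of-work system called Cuckoo Cycle. Recently, less research has concentrated on the combinatorics of Go, especially counting the number of legal job positions. I have been playing a completely simple computer model based on Lambda calculus. My Erdós number is 2, which was provided by my favorite CS lecturer Jeffrey Shallit, who co-wrote papers with me and Erdós. Not long ago, I studied the complexity of orimaze. |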
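![]() | llp wrote on 2020-4-27 13:17, #26: If I'm not mistaken, it should be two people's names, one called Tromp and one called Taylor. As for how it gets translated, that can come out all sorts of ways... |
![]() | 马帅123 wrote on 2020-4-27 13:17, #27: I can't understand it even when it's translated into Chinese... let alone in English. |
![]() | 马帅123 wrote on 2020-4-27 13:18, #28: @llp, that seems to make sense... |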
![]() | 斩杀呉暨 wrote on 2020-4-27 13:18, #29: The Logical Rules. Go is played on a 19x19 square grid of points, by two players called Black and White. Each point on the grid may be colored black, white or empty. A point P, not colored C, is said to reach C, if there is a path of (vertically or horizontally) adjacent points of P's color from P to a point of color C. Clearing a color is the process of emptying all points of that color that don't reach empty. Starting with an empty grid, the players alternate turns, starting with Black. A turn is either a pass; or a move that doesn't repeat an earlier grid coloring. A move consists of coloring an empty point one's own color; then clearing the opponent color, and then clearing one's own color. The game ends after two consecutive passes. A player's score is the number of points of her color, plus the number of empty points that reach only her color. The player with the higher score at the end of the game is the winner. Equal scores result in a tie. Here are the rules expressed in Haskell. An array based version with parametrized board topology. Comments: The grid of points is usually marked by a set of 19x19 lines on a wooden board. Each player has an arbitrarily large set of stones of his own color. By prior agreement a rectangle of different dimensions may be used. Using boards, coloring a point (intersection) black or white means placing a stone of that color on the point. Coloring a point empty, i.e. emptying a point, means removing the stone from it. Connected stones of the same color, sometimes called strings, all reach the same colors. Reaching empty means having empty points adjacent to the string, called liberties. Strings without liberties cannot exist on the board at the end of a turn. For handicap games, the weaker player, taking black, may be given an n stone handicap; these are n consecutive moves played before the first white move. 
This is the positional superko (PSK) rule, while the situational superko (SSK) rule forbids repeating the same grid coloring with the same player to move. Only in exceedingly rare cases does the difference matter, sufficient reason for the simpler PSK rule to be preferred. For any specific move, at most one of the clearing processes can have effect; the first is called capture, the second suicide. Allowing suicide means that a play on an empty point can be illegal only due to superko. As a practical shortcut, the following amendment allows dead stone removal: After only 2 consecutive passes, the players may end the game by agreeing on which points to empty. After 4 consecutive passes, the game ends as is. This is called area scoring. An almost equivalent result is reached by territory scoring where in addition to empty surrounded space we count opponent stones captured instead of own stones not captured. By prior agreement, for games between equals, a fixed amount can be added to white's final score. This is called komi, and can be chosen a non-integer such as 5.5 to avoid ties. Background (by Bill Taylor) These are essentially the New Zealand rules, re-worded to be as simple and elegant as possible. The NZ rules are in turn the simplest version of Chinese-style rules around. The NZ rules are worded with definitions given recursively, which is elegant and a joy to computer scientists, logicians and mathematicians, but perhaps not so nice for most others. John Tromp came up with the key idea of a stone "seeing" (or as I've presently worded "reaching") a different color. This was the brilliant step which enabled such succinct rules. My modest contribution was the wording for the end-of-game criterion, and putting an expansion into a second tier of interpretations rather than rules. This was to keep the logical rules as simple as possible, and to keep things close to how the game is actually played by humans. 
If you like the Tromp/Taylor rules as presented here, you are earnestly enjoined to present a copy to your local, national or international clubs and committees. This is especially the case for the European Go Association, which is long overdue in making the inevitable shift to Chinese-style rules, thus following the excellent American precedent set about four years ago; (even though the Americans sadly couldn't quite admit they'd in fact shifted to Chinese rules, as they have done, in effect). If you DON'T like the rules presented here, please contact us, explaining why. Let me just re-iterate the motivations for wanting to adopt Chinese-style rules. They are by far the simplest, most elegant, most easily worded, and most easily umpired of the main rule sets. The matter of when the game finishes, and what is dead/removable, in particular, is far more logical and simple than in the Japanese variants, where the main motivations seem to be undue respect for tradition, and a feeling for the "beauty of omission", (a criterion possibly more appropriate to Noh opera than to a worldwide game of strategy). It is of particular concern that the rules be made as "natural" and comprehensible as possible for beginners, so that they not be turned away from the game by puzzlement or outrage, notably at the unfair-looking "free removal" of scoring prisoners at the end of the game. Many of us have known this to happen with promising beginners. Western countries especially cannot afford this kind of wastage of recruits. Another point which has come up in email is this. There are four main areas in which Chinese and Japanese rules differ, and are effectively independent of one another. So in principle there are 2x2x2x2 = 16 ways of forming the rules, in these respects. Only the first difference is crucial. The whole network of rules concerning scoring; prisoners; end of the game; passing; removable stones; special positions; when are extra moves needed. 
This is "the" defining difference between the two rulesets. Chinese is far and away the simpler, by a country mile. The ko rules. Japanese is simpler, but has an annoying gap:- non-games resulting from long cycles. Chinese is more elegantly wordable. Suicide. Neutral; I have a slight preference for allowing it. (It allows slightly more options, thus more exercise of skill.) Where to put handicap stones. I much prefer the Chinese "free-placing" style - more game variation, and more opportunity for exercising skill. Some people may object that I've "cheated" by relegating many concepts to `comments and interpretations', and have thus kept the core rules artificially concise. However I don't think so. The core rules are precisely those that (e.g.) a computer or game-theoretician needs to know; which surely qualifies them as being the "real" rules. The remaining `comments and interpretations' are merely about those matters that real live players have to worry about for reasons of convenience, impatience, and a desire (usually) to play with physical equipment. It should be noted that (especially for tournaments) there would need to be a further layer of rules and proprieties concerning things like clocks and time, physical disturbances, ambiguous placements, getting unfair advice, and so on. (What Barry Phease succinctly dubbed "not rules of the game, but rules about playing the game".) I have completely ignored such matters. for all x there exists y such that y is not equal to x Preliminary version of Ladders are PSPACE complete |
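The difference between positional superko (PSK) and situational superko (SSK) described in the quoted comments comes down to what gets remembered about earlier positions. Here is a minimal Python sketch, assuming a grid coloring is already serialized as a string; the class name `KoHistory` and its methods are hypothetical and do not come from the quoted page or from any Go engine.

```python
from typing import Hashable, Set

BLACK, WHITE = "X", "O"  # illustrative labels for the player to move


class KoHistory:
    """Remembers earlier grid colorings so that repeats can be rejected.

    With situational=False this behaves like positional superko (PSK):
    a coloring may never recur. With situational=True it behaves like
    situational superko (SSK): a coloring may not recur with the same
    player to move.
    """

    def __init__(self, situational: bool = False) -> None:
        self.situational = situational
        self._seen: Set[Hashable] = set()

    def _key(self, coloring: str, to_move: str) -> Hashable:
        # PSK ignores whose turn it is; SSK makes it part of the key.
        return (coloring, to_move) if self.situational else coloring

    def record(self, coloring: str, to_move: str) -> None:
        """Store a position that has occurred in the game."""
        self._seen.add(self._key(coloring, to_move))

    def is_repeat(self, coloring: str, to_move: str) -> bool:
        """True if a move producing this position would be a forbidden repeat."""
        return self._key(coloring, to_move) in self._seen
```

After recording the same coloring with Black to move, a PSK history reports it as a repeat whoever is to move, while an SSK history reports it only when Black is to move again. That one-field difference in the key is the whole distinction.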
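![]() | llp wrote on 2020-4-27 13:18, #30: Had a look; it really does describe the rules of Go concisely. Very impressive. |
![]() | 斩杀呉暨 wrote on 2020-4-27 13:19, #31: The original is at http://tromp.github.io/go.html |
![]() | 马帅123 wrote on 2020-4-27 13:19, #32: Baidu Translate has a 5000-character limit; "ura" and everything after it didn't get translated! I'm done for. |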
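![]() | llp wrote on 2020-4-27 13:21, #33: But these rules don't seem to define the concept of "life and death". If they were really put into practice, every dead stone would have to be captured off the board. |
![]() | 斩杀呉暨 wrote on 2020-4-27 13:22, #34: Next, let me finish this unfinished legacy [funny face] |
![]() | 马帅123 wrote on 2020-4-27 13:22, #35: But what's with that 29*29 board (my eyes are bad, that's a guess) at the end of the original? |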
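A small worked example makes this point about dead stones concrete. The sketch below scores a position by the stated rule: stones of a color, plus empty points that reach only that color. The function name `area_score` and the encoding of the board as a list of strings are assumptions made for illustration.

```python
from typing import List, Tuple

EMPTY, BLACK, WHITE = ".", "X", "O"  # illustrative encoding


def area_score(rows: List[str]) -> Tuple[int, int]:
    """Return (black, white): stones of each color plus the empty points
    that reach only that color."""
    height, width = len(rows), len(rows[0])
    score = {BLACK: 0, WHITE: 0}
    seen = set()
    for r in range(height):
        for c in range(width):
            color = rows[r][c]
            if color != EMPTY:
                score[color] += 1
                continue
            if (r, c) in seen:
                continue
            # Flood-fill one empty region and note which colors border it.
            region, borders, stack = 0, set(), [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                region += 1
                for nr, nc in ((cr + 1, cc), (cr - 1, cc),
                               (cr, cc + 1), (cr, cc - 1)):
                    if not (0 <= nr < height and 0 <= nc < width):
                        continue
                    n = rows[nr][nc]
                    if n != EMPTY:
                        borders.add(n)
                    elif (nr, nc) not in seen:
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            # The region counts only if every stone it touches is one color.
            if len(borders) == 1:
                score[borders.pop()] += region
    return score[BLACK], score[WHITE]
```

On the 3x3 position `[".O.", "XXX", "XXX"]`, where the lone white stone would be called dead by a human, the result is `(6, 1)`: the white stone still counts for White and the two empty corners touch both colors. Once the stone is off the board, `["...", "XXX", "XXX"]` scores `(9, 0)`.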
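![]() | 斩杀呉暨 wrote on 2020-4-27 13:22, #36: Logical Rules. Go is carried out on a 19x19 square grid of points and is made up of two ball players named "Black and White". Each point on the grid can be black, white or empty. Point P, rather than color C, is said to be reach-reach c, if there exists a (vertical or horizontal) path P's color from point P to the adjoining points of point C. Clear-clearing a color refers to the process of emptying all the points in that color that have not reached the empty point. Starting from an empty raster, the players alternately rotate, starting from black. A turn of the corner is either passing the ball or a movement that does not repeat an earlier grid coloring. An action includes painting an empty point with one's own color; then clearing the opponent's color, then clearing one's own color. The match ends after two consecutive ball passes. A ball player's score is the number of points of her color, plus the number of empty points of only her color. The ball player with the higher score at the end of the match is the victor. Equal scores will lead to an even split. Here are the rules expressed in Haskell. An array-based version with parametrized board topology. Comments and opinions: The grid of points is usually marked on a wooden plank with a set of 19x19 lines. Each player has a set of arbitrarily large rocks of their own color. By prior agreement, rectangles of different sizes may be used. Using a plank, painting a point (intersection) black or white means placing a rock of that color on the point. Painting a point empty, i.e. emptying a point, means moving the rock off it. Connected rocks of the same color, sometimes called chords, all reach the same colors. "Reaching empty" means there are empty points next to the character string, called freedoms. A character string without freedom cannot appear on the board of directors at the end of a round. For disability games, the weaker player, if black, may be given a stone disability; these are movements carried out consecutively before the first white movement. This is the positional super-piece (PSK) rule, whereas the situational super-piece (SSK) rule forbids repeating the same grid coloring with the same media player in order to move. Only in extremely rare cases is the difference important, and there is sufficient reason to prefer the simpler PSK rule. For any particular movement, at most one liquidation process can take effect; the first is called capture, the second is suicide. Allowing suicide means that a game on an empty point may be illegal, because of superheroes. As a practical shortcut, the following modification allows rock removal: after two consecutive ball passes, the players might end the match. Agree on which point is empty. After 4 consecutive ball passes, the match is over. This is called area rating. An almost equivalent result is territory rating here; besides the empty, surrounded space, we also count the rocks captured by the opponent, rather than our own rocks. By prior agreement, for games between equivalents, a fixed quantity may be added to Mr. White's final score. This is called Komi, and a non-integer such as 5.5 may be chosen in order to avoid contact.
Background (Bill Taylor). These are basically New Zealand's rules, reworded as simply and elegantly as possible. The New Zealand rules are in turn the simplest version of Chinese-style rules. The definitions of the NZ rules are given recursively, which is elegant and also a pleasure for computer scientists, logicians and mathematicians, but perhaps not so good for most people. John Tromp put forward a key idea, namely that a rock "sees" (or, as I now put it, "sense of touch") a different color. This was the glorious step that made such concise rules possible. My meager contribution was the wording of the "game-over criterion", and expanding it into a second layer of interpretation, rather than rules. This is to make the logical rules as simple as possible, and to bring the game closer to the way humans play games. If you like the Tromp/Taylor rules put forward here, please submit a copy to your local, national or international clubs and committees. This is especially the case for the European Go Association (EuropeanGo Association), which should long ago have made the inevitable shift to Chinese-style rules, thereby following the outstanding American precedent set about four years ago (although the Americans regrettably cannot admit that they have in fact already shifted to China's rules). If you do not like the rules put forward here, please contact us and explain why. Let me repeat once more the motivation for wanting to adopt Chinese-style rules. So far, they are the simplest, most elegant, most easily expressed, and also the most easily understood of the main rule sets. When the game ends, and especially what is dead/removable, is far more logical and simple than the Japanese variants, whose main motivation seems to be excessive respect for tradition, and a feeling for "the beauty of omission" (a standard possibly more suited to Noh opera than to a worldwide strategy game). It is. Especially people worry that these rules are "natural" and also understandable for beginners, so that they will not be shut out because of confusion or anger, especially the unfair-looking "free removal" of scoring prisoners at the end of the game. Many of us know that this happens to promising beginners. Western countries especially cannot afford this kind of lost new recruits. Another point mentioned in the emails is this one. The rules of China and Japan have four major differences, which are in fact independent of each other. In these respects, there are in principle 2x2x2x2=16 ways of forming the rules. Only the first difference is crucial. The whole network of rules about scoring; prisoners; the end of the game; passing the ball; movable rocks; special positions; when extra actions are needed. This is the "defining" difference between the two rule sets. Chinese people are far simpler than a country. KO rules. The Japanese language is relatively simple, but has an annoying gap: the wording of the Chinese language is more refined. Suicide. Neutral; I lean slightly towards allowing it. (It allows more choices, thereby increasing the exercise of skill.) Where to put the disabled rocks. I prefer China's "free placement" style: more game variants, more opportunities to exercise skill. Some people may object to my demoting many concepts to "comments and interpretations", thereby artificially keeping the core rules concise, thereby making me "cheat". But I don't think so. The core rules are exactly those that (for example) a computer or game theorist needs to know, which undoubtedly makes them the "real" rules. The remaining 'comments and interpretations' are merely about the things that real live players have to worry about, because of convenience, impatience, and a desire (usually) to play with physical equipment. It should be pointed out that (especially for tournaments), in areas such as clocks and time, bodily interference, unclear placement, receiving unfair advice and so on, more rules and etiquette are still needed. (Barry Phease (BarryPhease) concisely calls this "not the rules of the game, but the rules of the game".) I pay no attention to these matters at all. |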
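![]() | llp wrote on 2020-4-27 13:23, #37: It lays out the superko (no whole-board repetition) and ko rules very concisely; it's just probably very unfriendly to newbies. |
![]() | 马帅123 wrote on 2020-4-27 13:24, #38: What does "passing the ball" mean?? "Prisoners"? |
![]() | llp wrote on 2020-4-27 13:24, #39: @斩杀呉暨 Forget Baidu Translate... what comes out is utter nonsense. |
![]() | llp wrote on 2020-4-27 13:25, #40: The English word "prisoner" literally means a jailed captive; in Go it refers to dead stones. |
![]() | kenny wrote on 2020-4-27 13:27, #41: Read it through carefully; it's spot on. |
![]() | 睿趣孙奕楠 wrote on 2020-4-27 13:27, #42: "Ball players"? |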
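![]() | 马帅123 wrote on 2020-4-27 13:27, #43: In yzy_lizzie, the Tromp-Taylor rules are described as the rules used for matches between AIs. Deciding the winner: area counting (counting stones). Ko rule: strict superko (no whole-board repetition). Group suicide not allowed. No group tax. Handicap compensation (N-stone handicap): N points returned. ------- fancy divider ------- This is the only part I can understand. |
![]() | kenny wrote on 2020-4-27 13:27, #44: My English isn't great, but I more or less got the gist. |
![]() | 马帅123 wrote on 2020-4-27 13:28, #45: I misspoke: group suicide is allowed. |
![]() | 九点圆 wrote on 2020-4-27 15:55, #46: By the way, T-T rules end the game strictly on two consecutive passes, which is still non-trivially different from the commonly used version that includes a life-and-death confirmation phase. |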